A Shadow Execution and Dynamic Analysis Framework for LLVM IR and JavaScript

Liang Gong and Cuong Nguyen∗
University of California, Berkeley

ABSTRACT

Dynamic program analysis is a widely used technique for analysing programs by executing them on a real or virtual processor. However, implementing a robust and scalable dynamic analysis tool for a full-fledged language such as C/C++ or JavaScript is challenged by many factors: arbitrary pointer arithmetic, un-interpreted functions and arbitrary dynamic features. As a consequence, many existing dynamic analysis tools either ignore these features or do not support them completely. In this paper, we propose a robust and scalable dynamic analysis framework, based on shadow execution, that supports full-fledged LLVM IR and the JavaScript language. Our framework can be easily extended to implement almost any dynamic analysis of interest.

We implemented several dynamic analysis techniques on top of our framework, each in fewer than a few hundred lines of code. In particular, one of our analyses found bugs in real-world programs and websites, including Facebook and jQuery. Finally, we show that our framework is able to deal with real-world programs of medium and large size within an acceptable overhead, which is comparable to that of existing state-of-the-art dynamic analysis tools such as Pin and Valgrind.

Categories and Subject Descriptors
D.2.5 [Software Engineering]: Testing and Debugging - symbolic execution, testing tools

General Terms
Verification

Keywords
Dynamic Analysis; LLVM; JavaScript

1. INTRODUCTION

Dynamic program analysis is the analysis of software applications by running the programs with a set of test inputs and observing their behaviour during execution. Dynamic program analysis is widely used for software testing, software profiling and understanding software behaviour, among other things [12, 5, 9]. We observed that most program analysis tools are implemented in two phases: first instrument the program at the instructions of interest, then compute additional information at these instructions. However, implementing a robust and scalable dynamic analysis tool for a full-fledged language such as C/C++ or JavaScript is challenged by many factors: arbitrary pointer arithmetic, un-interpreted functions and arbitrary dynamic features. As a consequence, many existing dynamic analysis tools either ignore these features or do not support them completely.

Indeed, on the C/C++ side, consider for example the tool Klee, which is a state-of-the-art test generation tool based on LLVM IR [7]. A recent study has shown that the tool does not support several features of C programs: symbolic sizes passed to malloc, calls to unknown instructions, calls to an invalid function pointer, inlined assembly, and external functions with symbolic arguments [13]. Some other state-of-the-art analysis tools for C programs do not mention these issues at all [4, 1]. The analysis tools most similar to ours are Valgrind and Pin, which also allow code instrumentation at different levels of code granularity (e.g. function, block or instruction level) [15, 14]. However, they do not allow altering the semantics of the program, and are thus less general than our approach.

On the JavaScript side, there exists no dynamic analysis framework for front-end, in-browser analysis of JavaScript similar to Valgrind [15], Pin [14] or DynamoRIO [6] for x86. Based on this general concept, we implement an analysis framework for front-end JavaScript.
Our framework intercepts and transforms every JavaScript code snippet in either an HTML web page or an external JavaScript file.

Concretely, the contributions of this paper are as follows:

1. We design and implement a general LLVM IR and JavaScript analysis framework. To the best of our knowledge, this is the first general-purpose dynamic analysis framework for LLVM IR and front-end JavaScript based on shadow execution.

2. We demonstrate that our analysis framework preserves the semantics of the target code and does not cause obvious slowdown on real-world programs.

3. On top of our framework, we implement several simple but powerful dynamic analysis modules which find bugs in real-world websites including jQuery and Facebook.

4. Based on our framework, analysis module developers are not required to have deep knowledge of the low-level language and virtual machine mechanisms.

The rest of this paper is structured as follows. In Section 2 we introduce shadow execution and show an example in which we compute long double values along with concrete values. In Section 3 we go through the technical details of the LLVM and JavaScript shadow execution systems. We then show some dynamic analysis applications we built on top of our framework in Section 4. We then conclude with an evaluation section (Section 5), related work (Section 6) and a conclusion (Section 7).

∗Names are in alphabetical order. Each author makes equal contribution to the paper. This work was done as a final project for CS262A in Fall 2013 at UC Berkeley.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
CS262a '13 UC Berkeley
Copyright 20XX ACM X-XXXXX-XX-X/XX/XX ...$15.00.

2. BACKGROUND

2.1 Shadow Execution

In shadow execution, each concrete value in the program has an associated shadow value. A shadow value can carry useful information about the concrete value, such as how the error was propagated, the array bound of a pointer, the symbolic value of the concrete value, etc. In our implementation, the shadow execution is performed side-by-side with the concrete execution, and the concrete execution assists shadow execution to guarantee correctness. The cases in which the concrete execution needs to assist shadow execution are interpreting the return value of an un-interpreted function and replicating the side effects of an un-interpreted function.

    Shadow Value                Shadow Info
      VALUE  concrete             long double  value
      TYPE   primitive
      void*  shadow

Figure 1: Class diagram of a shadow value.

Figure 1 depicts the design of our shadow value. Each shadow value maintains three fields: the concrete value as computed in the concrete execution, called concrete; the type of the value, called primitive; and a void pointer, called shadow. For modelling LLVM IR, we only need values of primitive types; arrays, structures and their combinations are modelled using primitive types. The shadow void pointer can point to any object defined by the user. It serves as a carrier for the extra information the user wants to tag along. The extra information differs depending on the analysis. In the example in Figure 1, the shadow information contains a long double value, which carries the concrete value computed in long double precision.

Figure 2 depicts an example of how shadow execution is performed for three LLVM instructions: alloca, store and fadd. Firstly, at the alloca double %a instruction, the shadow execution creates a shadow value object of type double. Shadow information is not initialized at this step. Secondly, at the store %b, %a instruction, the shadow execution first assigns the concrete value of b's shadow object (my_b) to the concrete value of a's shadow object (my_a). It then initializes the shadow information of a's shadow object from the shadow value of my_b. Finally, for the %c = fadd %a, %b instruction, the shadow execution first creates a shadow object for c, called my_c. Then it sets the concrete value of my_c to the sum of the concrete values of my_a and my_b. Lastly, it initializes the shadow information of my_c to the sum of the shadow information of my_a and my_b. Recall that the shadow information in this case is the long double value of the concrete value. In this way, the long double value of the concrete value is computed along the execution.

    // alloca double %a
    my_alloca(double) {
      %my_a = new ShadowValue(primitive: double)
    }

    // store %b, %a
    my_store(%my_b, %my_a) {
      %my_a.concrete = %my_b.concrete
      %my_a.shadow = new ShadowInfo(%my_b.shadow.value)
    }

    // %c = fadd %a, %b
    my_fadd(%my_a, %my_b) {
      %my_c = new ShadowValue(primitive: %my_a.primitive)
      %my_c.concrete = %my_a.concrete + %my_b.concrete
      %my_c.shadow = new ShadowInfo(%my_a.shadow.value + %my_b.shadow.value)
    }

Figure 2: An example of shadow execution.

3. TECHNICAL DETAILS

We are now ready to discuss the technical details of our shadow execution systems. We implemented two shadow execution systems, one for LLVM IR and one for JavaScript. Each system employs the same underlying shadow execution techniques, but they differ in the way different language features are handled. In particular, the LLVM shadow execution system faces challenges in implementing a memory model for pointer arithmetic and handling un-interpreted calls, among others, while the challenge for JavaScript is to address the dynamic nature of the language. This section discusses the technical details of the LLVM and JavaScript shadow execution systems in Sections 3.1 and 3.2, respectively.

3.1 LLVM Shadow Execution System

3.1.1 Overview

Figure 3 illustrates the architecture of our LLVM shadow execution framework. The input to our framework is the LLVM IR of the program of interest. The output of our framework is an instrumented LLVM IR program with some default hooks. By default, our hooks re-interpret the program using shadow execution. Users can extend these hooks to add more shadow information and define the semantics of how this shadow information is computed, in order to implement their analysis of interest. As these hooks are implemented as dynamically loaded libraries, they can be plugged in (possibly several at the same time) or plugged out easily.