Abstract Interpretation and Low-Level Code Optimization

Saumya Debray
Department of Computer Science
University of Arizona
Tucson, AZ

This work was supported in part by the National Science Foundation under grant CCR […].

Abstract

Abstract interpretation is widely accepted as a natural framework for semantics-based analysis of program properties. However, most formulations of abstract interpretation are in terms of high-level semantic entities that do not adequately address the needs of low-level optimizations. In this paper we discuss the role of abstract interpretation in low-level compiler optimizations, examine some of its limitations, and consider ways in which they might be addressed.

Introduction

The process of compilation, by which executable code is generated from a source program, can be thought of as a series of transformations and translations through a succession of languages, starting at the source language and ending at the target language. In this picture we can distinguish between two kinds of transformations: translations, which take a program in a language and produce a program in a different (usually lower-level) language; and optimizations, which transform a program in a language to another program in the same language. As an example, a compiler that we have implemented for a logic programming language called Janus works by translating the input programs into C, then invoking a C compiler to generate executable code. In this system we can identify the following language levels: (1) the source language; (2) the Janus virtual machine language; (3) C; (4) the intermediate representations within the C compiler; and (5) the target machine language. In principle, optimizing transformations can be applied at each of these five language levels; our current implementation applies optimizations at levels 2, 4, and 5: the Janus virtual machine, the intermediate representations of the C compiler, and the target machine code (the last two within the C compiler). In each case, the optimizations can be seen as program transformations at a particular language level.

A fundamental requirement of the compilation process is that it should be semantics-preserving, in the sense that the meaning or behavior of the executable code should conform to what the semantics of the source program says it should be. For this to happen, it is necessary, in general, that both translations and optimizations be semantics-preserving in this sense. Since our primary focus is on optimizations rather than translations, we will assume here that our translations satisfy this requirement, and focus our attention on optimizations.

It is very often the case that an optimization is not universally applicable. In other words, in order to ensure that an optimization does not alter the observable behavior of a program in unacceptable ways, we have to ensure that certain preconditions, particular to that optimization, are satisfied. As an example, consider register allocation in a C compiler: the value of a variable can be kept in a register only if certain conditions regarding aliasing are fulfilled. In general, this means that it may be necessary to examine a program and extract some information about its behavior, which can then be used for optimization purposes. Further, in order to verify that the properties so inferred describe all possible runtime behaviors of a program, it is necessary to be able to relate the analyses to the semantics of the language in a precise way.

Semantics-based techniques such as abstract interpretation provide a natural framework for such program analyses. The general idea is to rely on the formal semantics of a program to specify all of its possible computational behaviors, and to derive finitely computable descriptions of such behaviors by systematically approximating the operational behavior of the program. The correctness of an analysis can then be derived from the mathematical relationships between the actual computational domain of the program and the domain of descriptions manipulated by the analysis, and between the actual operations executed by the program and the approximations to those operations used during the analysis.

[Table 1: Performance improvements due to low-level optimizations (jc on a Sparcstation IPC). Columns: execution time (secs) and heap usage (words), each given as no-opt, opt, and no-opt/opt, for the benchmarks aquad, bessel, binomial, chebyshev, e, fib, log, mandelbrot, muldiv, nrev, pi, sum, and tak, with their geometric mean.]

Optimizing program transformations can be viewed at many levels, corresponding to the different levels of languages encountered during compilation. At a high level, for example, we have transformations such as finite differencing, recursion removal (i.e., transformation of recursive programs to tail recursive form), deforestation, and transformations for parallelization and vectorization (see, for example, […]), as well as various transformations described by Bacon et al. At the level of intermediate code, we have machine-independent low-level optimizations such as induction variable elimination, closure representation optimization in functional languages, and dereferencing optimizations in logic programming languages. At a lower level still, we have machine-dependent transformations such as register allocation and instruction scheduling.

Conceptually, we can divide these various optimizations into two classes: high-level optimizations, which correspond roughly to optimizations that can be expressed in terms of transformations on the source program or its abstract syntax tree; and low-level optimizations, which involve constructs and objects that are not visible at the source level, and which therefore cannot be so expressed. This classification is not absolute, of course: whether or not an optimization is to be considered low-level depends, among other things, on the language being considered. For example, in a language with explicit constructs for iteration, the implementation of a tail recursive procedure in terms of iteration could be considered a high-level optimization; in a language without source-level iterative constructs, however, this would be a low-level optimization.

There are two reasons why low-level optimizations are important. The first is that they are beyond the reach of the user. The point is that when faced with a compiler that does not do much in the way of high-level optimizations, the determined user can, in principle, carry out the transformations manually where necessary in order to obtain code with good performance. With a compiler that does not perform low-level optimizations, however, there is little that even the most determined of users can do. In particular, this implies that in the absence of low-level optimizations, even carefully crafted programs written by skilled programmers will incur performance penalties over which they have little control.

The second reason such optimizations are important is that they can produce substantial performance improvements. As an example of this, Table 1 gives some performance numbers for jc, an implementation of a dynamically typed logic programming language. The jc compiler currently performs only low-level optimizations: call forwarding, which is a form of jump redirection at the intermediate code level; a simple form of interprocedural register allocation for output value placement; and representation optimization, i.e., using unboxed values where possible, for numerical values. As Table 1 indicates, for the benchmarks tested, these optimizations more than double the speed of the programs on the average, and also lead to significant improvements in heap memory usage. The speed of the resulting code is competitive with that of optimized C code written in a natural imperative style: on the benchmarks shown, the Janus programs, which are dynamically typed and use dataflow synchronization between producers and consumers, are on the average only […] slower than C code compiled with gcc -O[…], about […] faster than C compiled with cc -O[…], and […] faster than C compiled with cc -O[…]. This indicates that low-level optimizations can be a valuable source of performance improvements.

The appeal of semantics-based program manipulation techniques is that they allow us to reason formally about the manipulations themselves, and certify, with some confidence, that such manipulations will not cause bad things to happen. This paper considers the applicability and relevance of semantics-based program anal […]

[…] about machine-level entities has been abstracted away. The problem, of course, is that usually we think of the process of abstraction as forgetting about irrelevant aspects of the behavior of a program, while in this case it is precisely the most relevant aspects of the program's behavior that are being forgotten.

The problem can be addressed by abstract interpretation based on a low-level semantics. While this does not seem different from any other sort of abstract interpretation at a conceptual level, the practical details can become messy. As an example, it is very likely simpler and more convenient to manipulate a high-level representation of a program, such as an abstract syntax tree, for such analyses, since the number of different kinds of objects and operations that have to be dealt with for such representations is relatively small. However, it is not clear that a high-level program representation can encode low-level information in a reasonable way without implicit or explicit assumptions about the behavior of the code generator. This in turn […]
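
As a rough illustration of what abstract interpretation over a low-level semantics might look like, the sketch below analyzes a toy register-based instruction sequence and approximates each register by a description of the kind of value it holds. Everything in it is an assumption made for illustration: the instruction names (`load_int`, `load_float`, `add`, `mov`), the register names, the rule that adding an integer to a float yields a float, and the four-point lattice are hypothetical and are not taken from jc or the Janus virtual machine.

```python
# Hypothetical toy example: abstract interpretation over a register-level
# instruction sequence. The opcodes and the lattice are illustrative
# assumptions, not the instruction set of any real virtual machine.

BOTTOM, INT, FLOAT, TOP = "bottom", "int", "float", "top"


def join(a, b):
    """Least upper bound in the flat lattice BOTTOM < {INT, FLOAT} < TOP."""
    if a == BOTTOM:
        return b
    if b == BOTTOM:
        return a
    return a if a == b else TOP


def abstract_add(a, b):
    """Abstract counterpart of a dynamically typed 'add' instruction."""
    if BOTTOM in (a, b):
        return BOTTOM
    if TOP in (a, b):
        return TOP
    # Assumed concrete rule: int + int is int, any other mix is float.
    return INT if a == b == INT else FLOAT


def step(state, instr):
    """Apply one instruction to an abstract state (register -> description)."""
    op, dst, *args = instr
    new_state = dict(state)
    if op == "load_int":
        new_state[dst] = INT
    elif op == "load_float":
        new_state[dst] = FLOAT
    elif op == "mov":
        new_state[dst] = state.get(args[0], BOTTOM)
    elif op == "add":
        new_state[dst] = abstract_add(
            state.get(args[0], BOTTOM), state.get(args[1], BOTTOM)
        )
    else:
        raise ValueError(f"unknown opcode: {op}")
    return new_state


def analyze(block, state=None):
    """Abstractly execute a straight-line block from a given entry state."""
    state = dict(state or {})
    for instr in block:
        state = step(state, instr)
    return state


def join_states(s1, s2):
    """Merge the abstract states of two paths at a control-flow join."""
    return {r: join(s1.get(r, BOTTOM), s2.get(r, BOTTOM)) for r in set(s1) | set(s2)}


# Two paths that share an entry state and meet again at a join point.
entry = analyze([("load_int", "r1")])
then_state = analyze([("load_int", "r2"), ("add", "r3", "r1", "r2")], entry)
else_state = analyze([("load_float", "r2"), ("add", "r3", "r1", "r2")], entry)
merged = join_states(then_state, else_state)
```

In this sketch, `r3` is described as `int` on the first path, so an optimizer could in principle use unboxed integer arithmetic there; after the join its description is `top`, so a tagged representation would have to be assumed. The description of each register is a property of a machine-level entity, which is exactly the kind of information a purely source-level analysis has no direct way to state.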