Macroscopic Data Structure Analysis and Optimization

Total pages: 16

File type: PDF, size: 1020 KB

Macroscopic Data Structure Analysis and Optimization

by Chris Lattner
B.S., University of Portland, 2000
M.S., University of Illinois at Urbana-Champaign, 2002

Dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science in the Graduate College of the University of Illinois at Urbana-Champaign, 2005. Urbana, Illinois.

Abstract

Providing high performance for pointer-intensive programs on modern architectures is an increasingly difficult problem for compilers. Pointer-intensive programs are often bound by memory latency and cache performance, but traditional approaches to these problems usually fail: pointer-intensive programs are often highly irregular, and the compiler has little control over the layout of heap-allocated objects.

This thesis presents a new class of techniques named "Macroscopic Data Structure Analyses and Optimizations", a new approach to the problem of analyzing and optimizing pointer-intensive programs. Instead of analyzing individual load/store operations or structure definitions, this approach identifies, analyzes, and transforms entire memory structures as a unit. The foundation of the approach is an analysis named Data Structure Analysis and a transformation named Automatic Pool Allocation. Data Structure Analysis is a context-sensitive pointer analysis which identifies data structures on the heap and their important properties (such as type safety). Automatic Pool Allocation uses the results of Data Structure Analysis to segregate dynamically allocated objects on the heap, giving the compiler control over the layout of each data structure in memory.

Based on these two foundation techniques, this thesis describes several performance-improving optimizations for pointer-intensive programs. First, Automatic Pool Allocation itself provides important locality improvements for the program. Once the program is pool allocated, several pool-specific optimizations can be performed to reduce inter-object padding and pool overhead. Second, we describe an aggressive technique, Automatic Pointer Compression, which reduces the size of pointers on 64-bit targets to 32 bits or less, increasing effective cache capacity and memory bandwidth for pointer-intensive programs.

This thesis describes the approach, analysis, and transformation of programs with macroscopic techniques, and evaluates the net performance impact of the transformations. Finally, it describes a large class of potential applications for the work in fields such as heap safety and reliability, program understanding, distributed computing, and static garbage collection.

To Tanya, for her unwavering love and support.

Acknowledgments

This thesis would not be possible without the support of many people who have helped me in ways both large and small. In particular, I would like to thank my advisor, Vikram Adve, for his support, patience, and especially his trust and respect. He taught me how to communicate ideas more effectively, gave me the freedom to investigate the (broad and sometimes strange) areas that interest me, and contributed many ideas to this work. Few other advisors would allow a motivated student to go off and build an entire compiler to support their research.

My wife Tanya is an unshakable source of support, understanding, and love. She helped me make it through the occasionally grueling all-nighters and other challenging parts of these last five years, selflessly supporting me even while under pressure from her own research work and job. In addition to supporting my research, she continues to enrich my life as a whole.

I have deeply enjoyed my interactions and friendships with the members of the LLVM research group as well as the open-source community we have built around LLVM. Both have provided important insights, hard problems, and a desire to make LLVM as stable, robust, and extensible as possible. Special thanks go to Misha Brukman for reading (and rereading) many of my papers with his particularly critical eye for incorrect hyphenation and misspellings.

I would like to thank the UIUC Classical Fencing Club as a whole, and John Mainzer and Luda Yafremava in particular, for absorbing many of the frustrations and craziness accumulated over the course of this work. They provided an important outlet and taught me physical awareness, flexibility, and dexterity that I did not think was possible. They are also responsible for keeping random workplace violence to a tolerable level, for which my colleagues are undoubtedly thankful!

Finally, I would like to thank Steven Vegdahl, who encouraged me to pursue graduate studies and whose infectious love of compilers started me on this path in the first place.

Table of Contents

Chapter 1  Introduction . 1
  1.1  Foundations of the Macroscopic Approach . 4
    1.1.1  Data Structure Analysis . 4
    1.1.2  Automatic Pool Allocation . 5
  1.2  Applications of Macroscopic Techniques . 7
    1.2.1  Simple Pool Allocation Optimizations . 7
    1.2.2  Transparent Pointer Compression . 8
    1.2.3  Other Macroscopic Techniques . 8
  1.3  Research Contributions of this Thesis . 9
  1.4  Thesis Organization . 10
Chapter 2  The LLVM Compiler Infrastructure . 11
  2.1  Introduction . 11
  2.2  Program Representation . 15
    2.2.1  Overview of the LLVM Instruction Set . 15
    2.2.2  Language-Independent Type Information, Cast, and GetElementPtr . 16
    2.2.3  Explicit Memory Allocation and Unified Memory Model . 19
    2.2.4  Function Calls and Exception Handling . 19
    2.2.5  Plain-text, Binary, and In-memory Representations . 22
  2.3  The LLVM Compiler Architecture . 23
    2.3.1  High-Level Design of the LLVM Compiler Framework . 23
    2.3.2  Compile-Time: External Front-end and Static Optimizer . 25
    2.3.3  Linker & Interprocedural Optimizer . 26
    2.3.4  Offline or JIT Native Code Generation . 27
    2.3.5  Runtime Path Profiling & Reoptimization . 27
    2.3.6  Offline Reoptimization with End-user Profile Information . 28
  2.4  Applications and Experiences . 29
    2.4.1  Representation Issues . 29
    2.4.2  Example Applications of LLVM . 34
  2.5  Related Work . 36
  2.6  Conclusion . 39
Chapter 3  Data Structure Analysis . 40
  3.1  The Data Structure Graph . 42
    3.1.1  Graph Nodes and Fields . 45
  3.2  Construction Algorithm . 49
    3.2.1  Primitive Graph Operations . 49
    3.2.2  Local Analysis Phase . 51
    3.2.3  Bottom-Up Analysis Phase . 53
    3.2.4  Top-Down Analysis Phase . 59
    3.2.5  Complexity Analysis . 60
    3.2.6  Bounding Graph Size . 60
  3.3  Engineering an Efficient Pointer Analysis . 61
    3.3.1  The Globals Graph . 61
    3.3.2  Efficient Graph Inlining . 63
    3.3.3  Partitioning EV for Efficient Global Variable Iteration . 64
    3.3.4  Shrinking EV with Global Value Equivalence Classes . 64
    3.3.5  Avoiding N² Inlining for Function Pointers . 66
    3.3.6  Merge Call Nodes for External Functions . 66
    3.3.7  Direct Call Nodes . 67
  3.4  Experimental Results . 67
    3.4.1  Benchmark Suite and Simple Measurements . 67
    3.4.2  Analysis Time & Memory Consumption . 70
    3.4.3  Inferred Type Information . 71
  3.5  Related Work . 74
    3.5.1  Shape Analyses . 75
    3.5.2  Cloning-based Context-Sensitive Analyses . 75
    3.5.3  Non-cloning Context Sensitive Analyses . 76
  3.6  Data Structure Analysis: Summary of Contributions . 77
Chapter 4  Using Data Structure Analysis for Alias and IP Mod/Ref Analysis . 79
  4.1  Alias Analysis and Mod/Ref Information . 79
    4.1.1  Alias Analysis Assumptions and Applications . 80
    4.1.2  Mod/Ref Analysis Assumptions and Applications . 81
  4.2  Implementing Alias and Mod/Ref Analysis with DSA: ds-aa . 82
    4.2.1  Computing Alias Analysis Responses . 83
    4.2.2  Computing Mod/Ref Responses . 84
  4.3  Alias Analysis Implementations for Comparison . 87
    4.3.1  local Alias Analysis . 87
    4.3.2  steens-fi Alias Analysis . 88
    4.3.3  steens-fs Alias Analysis . 88
    4.3.4  anders Alias Analysis . 89
  4.4  Analysis Precision with a Synthetic Client . 89
    4.4.1  Alias Precision . 90
    4.4.2  Mod/Ref Precision . 95
  4.5  Analysis Precision with Scalar Loop Optimizations . 97
    4.5.1  Number of Transformations Performed . 98
    4.5.2  Alias and Mod/Ref Queries . 103
  4.6  Observations and Conclusions . 104
Chapter 5  Automatic Pool Allocation . 108
  5.1  The Transformation Overview and Example . 110
    5.1.1  Pool Allocator Runtime Library . 111
    5.1.2  Overview Using an Example . 113
  5.2  The Core Pool Allocation Transformation . 114
    5.2.1  Analysis: Finding Pool Descriptors for each H Node . 114
    5.2.2  The Simple Transformation (No Indirect Calls) . 115
    5.2.3  Passing Descriptors for Indirect Function Calls . 118
  5.3  Algorithmic Complexity . 120
  5.4  Simple Pool Allocation Refinements . 121
    5.4.1  Argument Passing for Global Pools . 121
    5.4.2  poolcreate/pooldestroy Placement . 121
  5.5  Experimental Results . 123
    5.5.1  Methodology and Benchmarks . 123
    5.5.2  Pool Allocation Statistics . 124
    5.5.3
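To make the pool-allocation idea described in the abstract concrete, here is a rough sketch of what an allocation site might look like after the transformation. The runtime names (Pool, poolcreate, poolalloc, pooldestroy) are assumed for illustration only, loosely following the poolcreate/pooldestroy calls mentioned in Section 5.4.2; they are not presented as the thesis's exact runtime interface.

    #include <stddef.h>

    /* Hypothetical pool runtime, assumed for illustration only. */
    typedef struct Pool Pool;
    Pool *poolcreate(size_t node_size);
    void *poolalloc(Pool *pool, size_t size);
    void  pooldestroy(Pool *pool);

    struct node { int value; struct node *next; };

    /* After the transformation, every node of this list comes from one pool,
     * so the compiler (not the general-purpose allocator) decides how the
     * list is laid out in memory. */
    static struct node *build_list(Pool *pool, int n) {
        struct node *head = NULL;
        for (int i = 0; i < n; i++) {
            struct node *nd = poolalloc(pool, sizeof *nd); /* was malloc(...) */
            nd->value = i;
            nd->next  = head;
            head = nd;
        }
        return head;
    }

    void example(void) {
        Pool *p = poolcreate(sizeof(struct node)); /* one pool per data structure */
        struct node *list = build_list(p, 100);
        (void)list;
        pooldestroy(p);                            /* releases the whole structure */
    }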
Recommended publications
  • Generalized Jump Threading in libFIRM
    Institut für Programmstrukturen und Datenorganisation (IPD), Chair of Prof. Dr.-Ing. Snelting. Generalized Jump Threading in libFIRM. Master's thesis by Joachim Priesner at the Fakultät für Informatik. First reviewer: Prof. Dr.-Ing. Gregor Snelting. Second reviewer: Prof. Dr.-Ing. Jörg Henkel. Supervising staff member: Dipl.-Inform. Andreas Zwinkau. Working period: 5 October 2016 – 23 January 2017. KIT – The Research University in the Helmholtz Association, www.kit.edu.

    Abstract: Jump threading (German: "Sprünge fädeln") is a compiler optimization that converts statically predictable conditional jumps into unconditional jumps. At execution time, a processor can initially evaluate conditional jumps only heuristically, with the help of branch prediction; they therefore generally constitute a performance obstacle. The conversion is possible even when the jump target can be determined statically on only a subset of the execution paths leading to the jump. In this case, which accounts for the vast majority of the jumps optimized by jump threading, the optimization must duplicate basic blocks in order to isolate those execution paths. Current compilers contain very different implementations of jump threading. This Master's thesis first presents a theoretical framework for jump threading. It then develops, implements, and evaluates a general formulation of a jump-threading algorithm in various respects, in particular its interactions with other optimizations such as If
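    As a hand-written illustration of the transformation described above (not taken from the thesis): when a branch's outcome is statically known along one incoming path, that path can be rewired to reach the known target directly, duplicating the intervening block so the remaining paths keep the original branch.

        /* Before: on the path where x > 0, 'flag' is known to be nonzero,
         * yet the branch on 'flag' is still evaluated at run time. */
        int before(int x, int *counter) {
            int flag = (x > 0);
            (*counter)++;            /* intervening work that does not change flag */
            if (flag)
                return 1;
            return 0;
        }

        /* After jump threading (conceptually): the x > 0 path reaches the
         * 'return 1' block directly; the intervening block was duplicated so
         * the other path keeps its original behaviour. */
        int after(int x, int *counter) {
            if (x > 0) {
                (*counter)++;        /* duplicated copy of the intervening block */
                return 1;
            }
            (*counter)++;            /* original copy */
            return 0;
        }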
  • StackwalkerAPI Programmer's Guide
    Paradyn Parallel Performance Tools, StackwalkerAPI Programmer's Guide, 9.2 Release, June 2016. Computer Sciences Department, University of Wisconsin–Madison, Madison, WI 53706; Computer Science Department, University of Maryland, College Park, MD 20742. Email [email protected], web www.dyninst.org.

    Contents: 1 Introduction; 2 Abstractions (2.1 Stackwalking Interface, 2.2 Callback Interface); 3 API Reference (3.1 Definitions and Basic Types: 3.1.1 Definitions, 3.1.2 Basic Types; 3.2 Namespace StackwalkerAPI; 3.3 Stackwalking Interface: 3.3.1 Class Walker, 3.3.2 Class Frame; 3.4 Accessing Local Variables; 3.5 Callback Interface: 3.5.1 Default Implementations, 3.5.2 Class FrameStepper, 3.5.3 Class StepperGroup, 3.5.4 Class ProcessState, 3.5.5 Class SymbolLookup); 4 Callback Interface Default Implementations (4.1 Debugger Interface: 4.1.1 Class ProcDebug; 4.2 FrameSteppers: 4.2.1 Class FrameFuncStepper, 4.2.2 Class SigHandlerStepper, 4.2.3 Class DebugStepper, 4.2.4 Class AnalysisStepper, 4.2.5 Class StepperWanderer, 4.2.6 Class BottomOfStackStepper).

    1 Introduction. This document describes StackwalkerAPI, an API and library for walking a call stack. The call stack (also known as the run-time stack) is a stack found in a process that contains the currently active stack frames. Each stack frame is a record of an executing function (or function-like object such as a signal handler or system call).
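    For intuition about what "walking a call stack" involves, here is a minimal frame-pointer walk in C. This is not StackwalkerAPI's interface, only an assumed sketch; it relies on frame pointers being preserved (e.g. code built with -fno-omit-frame-pointer) and on the conventional x86-64 frame layout, and it stops at NULL as a simplification.

        #include <stdio.h>

        /* Conventional frame layout: the saved caller frame pointer, then the
         * return address pushed by the call instruction. */
        struct frame {
            struct frame *prev;
            void         *return_addr;
        };

        void walk_stack(void) {
            /* GCC/Clang builtin: address of the current function's frame. */
            struct frame *fp = (struct frame *)__builtin_frame_address(0);
            while (fp != NULL) {
                printf("return address: %p\n", fp->return_addr);
                fp = fp->prev;   /* follow the saved frame-pointer chain */
            }
        }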
  • Procedures & the Stack
    Procedures & The Stack, Spring 2016.

    Announcements: Lab 1 graded (late days, extra credit); HW 2 out; Lab 2 prep in section tomorrow (bring laptops!); slides. (http://xkcd.com/1270/)

    Roadmap: memory & data; integers & floats; machine code & C; x86 assembly; procedures & stacks; arrays & structs; memory & caches; processes; virtual memory; memory allocation; Java vs. C. (The roadmap slide contrasts C such as "car *c = malloc(sizeof(car)); c->miles = 100; c->gals = 17; float mpg = get_mpg(c); free(c);" with the equivalent Java using a Car object, and shows the assembly for get_mpg, "pushq %rbp / movq %rsp, %rbp / ... / popq %rbp / ret", down to raw machine code.)

    Mechanisms required for procedures: passing control (to the beginning of the procedure's code, and back to the return point); passing data (procedure arguments and the return value); memory management (allocate during procedure execution, deallocate upon return). All of these are implemented with machine instructions, and an x86-64 procedure uses only those mechanisms required for that procedure. The running example is a caller P(...) { ... y = Q(x); print(y); ... } and a callee int Q(int i) { int t = 3*i; int v[10]; ... return v[t]; }.

    Questions to answer about procedures: How do I pass arguments to a procedure? How do I get a return value from a procedure? Where do I put local variables? When a function returns, how does it know where to return? To answer some of
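    The P/Q example from the slides, reconstructed here in runnable form. The register comments describe what a typical x86-64 System V compilation does (an assumption about the target convention, not something the C code itself enforces).

        #include <stdio.h>

        /* Callee: the argument i arrives in a register (%edi under the x86-64
         * System V convention), locals t and v live in Q's stack frame, and the
         * result travels back in %eax. Assumes 0 <= i <= 3 so v[t] stays in bounds. */
        int Q(int i) {
            int t = 3 * i;
            int v[10] = {0};
            v[t] = t;
            return v[t];
        }

        /* Caller: the 'call' instruction pushes the return address so that Q's
         * 'ret' can transfer control back to the instruction after the call. */
        void P(int x) {
            int y = Q(x);
            printf("%d\n", y);
        }

        int main(void) {
            P(2);   /* prints 6 */
            return 0;
        }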
  • MIPS Calling Convention
    MIPS Calling Convention. CS 64: Computer Organization and Design Logic, Lecture #9, Fall 2018. Ziad Matni, Ph.D., Dept. of Computer Science, UCSB.

    Administrative: Lab #5 this week, due on Friday. Grades will be up on GauchoSpace today by noon; if you want to review your exams, see your TAs (last names A thru Q: see Bay-Yuan, Th. 12:30 – 2:30 pm; last names R thru Z: see Harmeet, Th. 9:30 – 11:30 am). Mid-quarter evaluations for TAs: links on the last slide and on Piazza too; optional, but very appreciated by us all! Midterm exam: average 86.9%, median 87%.

    Lecture outline: the MIPS calling convention; functions calling functions; recursive functions.

    Function calls within functions: given what we've said so far, what about this code makes our previously discussed setup break? You would need multiple copies of $ra; you'd have to copy the value of $ra to another register (or to memory) before calling another function, and you could run out of registers. Another example: you can't fit all variables in registers at the same time, and how do you know which registers are even usable without looking at the code? Solution: store certain information in memory only at certain times; ultimately, this is where the call stack comes from. So what (registers/memory) saves what?
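    A small sketch of the fix these slides are leading up to: a non-leaf MIPS function must spill $ra to the stack before making its own call. The C below is illustrative; the assembly in the comments is the typical sequence a MIPS compiler emits, not material from these slides.

        int g(int x);            /* some other function; 'jal g' overwrites $ra */

        int f(int x) {
            /* Typical MIPS prologue for a non-leaf function:
             *   addiu $sp, $sp, -8      # grow the stack frame
             *   sw    $ra, 4($sp)       # save the return address              */
            int y = g(x) + 1;    /* the call clobbers $ra */
            /* Epilogue:
             *   lw    $ra, 4($sp)       # restore the return address
             *   addiu $sp, $sp, 8       # pop the frame
             *   jr    $ra               # return to the caller                 */
            return y;
        }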
  • Understanding Stack Alignment in 64-Bit Calling Conventions for x86-64 Assembly Programmers
    UNDERSTANDING STACK ALIGNMENT IN 64-BIT CALLING CONVENTIONS For x86-64 Assembly Programmers This article is part of CPU2.0 presentation Soffian Abdul Rasad Date : Oct 2nd, 2017 Updated: 2019 Note: This article is released in conjunction with CPU2.0 to enable users to use CPU2.0 correctly in regards to stack alignment. It does not present full stack programming materials. MS 64-bit Application Binary Interface (ABI) defines an x86_64 software calling convention known as the fastcall. System V also introduces similar software calling convention known as AMD64 calling convention. These calling conventions specify the standards and the requirements of the 64-bit APIs to enable codes to access and use them in a standardized and uniform fashion. This is the standard convention being employed by Win64 APIs and Linux64 respectively. While the two are radically different, they share similar stack alignment policy – the stack must be well-aligned to 16-byte boundaries prior to calling a 64-bit API. A memory address is said to be aligned to 16 if it is evenly divisible by 16, or if the last digit is 0, in hexadecimal notation. Please note that to simplify discussions, we ignore the requirement for shadow space or red zone when calling the 64-bit APIs for now. But still with or without the shadow space, the discussion below applies. Also note that the CPU honors no calling convention. Align 16 Ecosystem This requirement specifies that for the 64-bit APIs to function correctly it must work in an ecosystem or runtime environment in which the stack memory is always aligned to 16-byte boundaries.
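    A small C check of the rule described above, under the stated definition that an address is 16-byte aligned when it is evenly divisible by 16 (its last hexadecimal digit is 0):

        #include <stdint.h>
        #include <stdio.h>

        static int is_aligned_16(const void *p) {
            return ((uintptr_t)p & 0xF) == 0;   /* evenly divisible by 16? */
        }

        int main(void) {
            _Alignas(16) char buf[32];
            printf("%p -> %d\n", (void *)buf,       is_aligned_16(buf));       /* 1 */
            printf("%p -> %d\n", (void *)(buf + 8), is_aligned_16(buf + 8));   /* 0 */
            return 0;
        }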
  • Intermediate Representation and LLVM
    CS153: Compilers, Lecture 6: Intermediate Representation and LLVM. Stephen Chong, https://www.seas.harvard.edu/courses/cs153. Contains content from lecture notes by Steve Zdancewic and Greg Morrisett.

    Announcements: Homework 1 grades returned (style, testing). Homework 2: X86lite, due Tuesday Sept 24. Homework 3: LLVMlite, will be released Tuesday Sept 24.

    Today: continue intermediate representation; intro to LLVM.

    Low-Level Virtual Machine (LLVM): an open-source compiler infrastructure (see llvm.org for full documentation), created by Chris Lattner (advised by Vikram Adve) at UIUC. Papers: "LLVM: An Infrastructure for Multi-stage Optimization", 2002; "LLVM: A Compilation Framework for Lifelong Program Analysis and Transformation", 2004. 2005: adopted by Apple for XCode 3.1. Front ends: llvm-gcc (a drop-in replacement for gcc); Clang, the C, Objective-C, and C++ compiler supported by Apple; various languages: Swift, Ada, Scala, Haskell, and more. Back ends: x86, Arm, PowerPC, etc. Used in many academic and research projects. (Diagram, LLVM Compiler Infrastructure [Lattner et al.]: front ends such as 'clang' produce a typed SSA-like IR, which flows through optimizations/transformations and analysis to the llc back-end code generator or the JIT.)

    Example LLVM code (factorial-pretty.ll, shown alongside the C source factorial64.c): LLVM offers a textual representation of its IR, in files ending in .ll. The excerpt reads:

        define @factorial(%n) {
          %1 = alloca
          %acc = alloca
          store %n, %1
          store 1, %acc
          br label %start
        start:
          %3 = load %1
          %4 = icmp sgt %3, 0
          br %4, label
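    For reference, a plausible C source that would lower to IR of the shape sketched above. The correspondence is assumed (the slide's listing is truncated): %acc plays the role of acc, and %1 holds n's stack slot.

        long factorial(long n) {
            long acc = 1;          /* 'store 1, %acc' in the IR */
            while (n > 0) {        /* '%4 = icmp sgt %3, 0' and the conditional br */
                acc = acc * n;
                n   = n - 1;
            }
            return acc;
        }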
  • Ryan Holland CSC415 Term Paper Final Draft Language: Swift (OSX)
    Ryan Holland, CSC415 Term Paper, Final Draft. Language: Swift (OSX). Apple released its new programming language "Swift" on June 2nd of this year. Swift replaces current versions of Objective-C used to program in the OSX and iOS environments. According to Apple, Swift will make programming apps for these environments "more fun". Swift is also said to bring a significant boost in performance compared to the equivalent Objective-C. Apple has not released an official document stating the reasoning behind the launch of Swift, but speculation is that it was a move to attract more developers to the iOS platform. Chris Lattner began developing Swift in 2010, and many other programmers later contributed. Swift draws ideas from many languages, not only Objective-C: mainly Rust, Haskell, Ruby, Python, C#, and CLU, but many others as well. Apple says that most of the language is built for Cocoa and Cocoa Touch. Apple claims that Swift is friendlier to new programmers, and that the language is just as enjoyable to learn as scripting languages. It supports a feature known as playgrounds, which allows programmers to make modifications on the fly and see results immediately without the need to build the entire application first. Apple controls most aspects of this language, and since it controls so much of it, it keeps detailed records of information about the language on its developer site. Most information within this paper is based upon those records. Unlike most languages, Swift has a very small set of syntax that needs to be shown compared to other languages.
  • Improving the Compilation Process Using Program Annotations
    Politecnico di Milano, Master's degree program in Computer Engineering (Corso di Laurea Magistrale in Ingegneria Informatica), Dipartimento di Elettronica, Informazione e Bioingegneria. Improving the Compilation Process Using Program Annotations. Advisor: Prof. Giovanni Agosta. Co-advisor: Prof. Lenore Zuck. Thesis by: Niko Zarzani, student ID 783452. Academic year 2012-2013.

    To my family, to my girlfriend, to my friends.

    Acknowledgments: First of all I thank my advisors, Prof. Giovanni Agosta and Prof. Lenore Zuck, for their availability, their precious advice, and their support. Thank you for guiding me both during the thesis and throughout my university career. I then thank all those with whom I was able to exchange ideas during my research, Dr. Venkat N. Venkatakrishnan, Dr. Rigel Gjomemo, Dr. Phu H. H. Phung and Giacomo Tagliabure, who have been at my side since the very beginning of my thesis work. I want to thank with all my heart my family for their precious support during these years of study, and Camilla for all the love she has given me even in the most critical moments of this journey. I could not have made it through this adventure without you by my side. I thank my dearest friends Ilaria, Carolina, Elisa, Riccardo and Marco for our special long-distance friendship and all the laughs we shared. Finally, all my housemates, from the most to the least nerdy, for the good times spent together. I will always remember these last years as a wonderful experience that you made memorable. I will miss you all.

    Contents: 1 Introduction; 2 Background (2.1 Annotated Code, 2.2 Sources of annotated code .
  • Topic 1D Slides
    Topic I (d): Static Single Assignment Form (SSA).

    Reading list: slides (Topic Ix); other readings as assigned in class.

    ABET outcomes: ability to apply knowledge of the SSA technique in compiler optimization; ability to formulate and solve the basic SSA construction problem based on the techniques introduced in class; ability to analyze the basic algorithms that use SSA form to express and formulate dataflow analysis problems; knowledge of contemporary issues on this topic.

    Roadmap: motivation; introduction to SSA form; construction method; application of SSA to dataflow analysis problems; PRE (Partial Redundancy Elimination) and SSAPRE; summary.

    Prelude. SSA: a program is said to be in SSA form iff each variable is statically defined exactly once, and each use of a variable is dominated by that variable's definition. So, is straight-line code in SSA form?

    Example (diagram): a control flow graph in which X1 and X2 are defined on different paths and merged as X3 = φ(X1, X2), followed by a use X4. In general, how do we transform an arbitrary program into SSA form? Does the definition of X2 dominate its use in the example?

    SSA motivation: provide a uniform IR basis for solving a wide range of classical dataflow problems; encode both dataflow and control flow information; an SSA form can be constructed and maintained efficiently; many SSA dataflow analysis algorithms are more efficient (have lower complexity) than their CFG counterparts.

    Algorithm complexity: assume a 1 GHz machine and an algorithm that takes f(n) steps (1 step = 1 nanosecond).
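    A small, assumed example (not from the slides) of the renaming SSA construction performs; the phi in the comments is the merge point that the slides' X3 = φ(X1, X2) refers to.

        /* Original form: x has two static definitions reaching its use. */
        int select_value(int cond, int a) {
            int x;
            if (cond)
                x = a;        /* SSA: x1 = a */
            else
                x = 0;        /* SSA: x2 = 0 */
            return x;         /* SSA: x3 = phi(x1, x2); return x3 */
        }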
  • A General Compiler Framework for Speculative Optimizations Using Data Speculative Code Motion
    A General Compiler Framework for Speculative Optimizations Using Data Speculative Code Motion. Xiaoru Dai, Antonia Zhai, Wei-Chung Hsu, Pen-Chung Yew. Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN 55455. {dai, zhai, hsu, yew}@cs.umn.edu

    Abstract. Data speculative optimization refers to code transformations that allow load and store instructions to be moved across potentially dependent memory operations. Existing research work on data speculative optimizations has mainly focused on individual code transformations: the speculative analysis that identifies data speculative optimization opportunities and the recovery code generation that guarantees the correctness of their execution are handled separately for each optimization. This paper proposes a new compiler framework to facilitate the design and implementation of general data speculative optimizations such as dead store

    From the introduction: obtaining precise data dependence analysis is both difficult and expensive for languages such as C, in which dynamic and pointer-based data structures are frequently used. When the data dependence analysis is unable to show that there is definitely no data dependence between two memory references, the compiler must assume that there is a data dependence between them. Quite often such an assumption is overly conservative. The examples in Figure 1 (only fragments of which survive in this excerpt, e.g. "S1: = *q; S2: *p = b" and a loop testing "if (p->f == 0)") illustrate how such conservative data dependences may affect compiler optimizations.
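    In the spirit of the paper's Figure 1 (reconstructed here only as an assumed illustration): when the compiler cannot prove that two pointers never refer to the same location, it must keep a potentially dependent load and store in order, even if at run time they rarely or never conflict.

        /* If p and q may alias, the load of *q cannot be moved above the
         * store to *p without speculation and recovery code. */
        int blocked_reordering(int *p, int *q) {
            *p = 42;      /* potentially aliasing store */
            return *q;    /* must stay after the store under a conservative analysis */
        }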
  • Control Flow Graphs
    Control Flow Graphs. Program Analysis and Optimization – DCC888. Fernando Magno Quintão Pereira, [email protected]. The material in these slides has been taken from the "Dragon Book", Secs. 8.5 – 8.7, by Aho et al.

    Intermediate program representations: optimizing compilers and human beings do not see the program in the same way. We are more interested in source code, but source code is too different from machine code. Besides, from an engineering point of view, it is better to have a common way to represent programs in different languages and to target different architectures. (Diagram: front ends for Fortran, COBOL, Lisp, etc. feed a common optimizer, which feeds back ends for PowerPC, x86, ARM, etc.)

    Basic blocks and flow graphs: compilers usually represent programs as control flow graphs (CFGs). A control flow graph is a directed graph: nodes are basic blocks, and there is an edge from basic block B1 to basic block B2 if program execution can flow from B1 to B2. Before defining basic blocks, we illustrate the notion by showing the CFG of the function below. What does this program do?

        void identity(int** a, int N) {
            int i, j;
            for (i = 0; i < N; i++) {
                for (j = 0; j < N; j++) {
                    a[i][j] = 0;
                }
            }
            for (i = 0; i < N; i++) {
                a[i][i] = 1;
            }
        }

    The Low Level Virtual Machine: we will be working with a compilation framework called the Low Level Virtual Machine, or LLVM for short. LLVM is today the most used compiler in research, and it is also used by many important companies: Apple, Cray, Google, etc. (Diagram of the LLVM pipeline: the front end that parses C into bytecodes, machine-independent optimizations such as constant propagation, and machine-dependent optimizations such as register allocation.)

    Using LLVM to visualize a CFG: we can use the opt tool, the LLVM machine-independent optimizer, to visualize the control flow graph of a given function:

        $> clang -c -emit-llvm identity.c -o identity.bc
        $> opt -view-cfg identity.bc

    We will be able to see the CFG of our target program, as long as we have the DOT tool installed on our system.
  • Effective Representation of Aliases and Indirect Memory Operations in SSA Form
    Effective Representation of Aliases and Indirect Memory Operations in SSA Form. Fred Chow, Sun Chan, Shin-Ming Liu, Raymond Lo, Mark Streich. Silicon Graphics Computer Systems, 2011 N. Shoreline Blvd., Mountain View, CA 94043. Contact: Fred Chow (E-mail: [email protected], Phone: USA (415) 933-4270).

    Abstract. This paper addresses the problems of representing aliases and indirect memory operations in SSA form. We propose a method that prevents explosion in the number of SSA variable versions in the presence of aliases. We also present a technique that allows indirect memory operations to be globally commonized. The result is a precise and compact SSA representation based on global value numbering, called HSSA, that uniformly handles both scalar variables and indirect memory operations. We discuss the capabilities of the HSSA representation and present measurements that show the effects of implementing our techniques in a production global optimizer.

    Keywords. Aliasing, factoring dependences, hash tables, indirect memory operations, program representation, static single assignment, value numbering.

    1 Introduction. The Static Single Assignment (SSA) form [CFR+91] is a popular and efficient representation for performing analyses and optimizations involving scalar variables. Effective algorithms based on SSA have been developed to perform constant propagation, redundant computation detection, dead code elimination, induction variable recognition, and others [AWZ88, RWZ88, WZ91, Wolfe92]. But until now, SSA has mostly been used only for distinct variable names in the program. When applied to indirect variable constructs, the representation is not straightforward, and it results in added complexities in the optimization algorithms that operate on the representation [CFR+91, CG93, CCF94].
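    A tiny example (mine, not the paper's) of the representation problem being addressed: an indirect store may define any variable it can alias, so plain SSA would need a new version of every such variable at that point, whereas HSSA factors this into annotations on the indirect operation itself (the paper's may-def/may-use operators).

        int a, b;

        /* '*p = 1' may define a, b, or neither; in an HSSA-style representation
         * this is captured by may-def annotations on the store rather than by
         * creating new SSA versions of every variable p might alias. */
        void update(int *p) {
            *p = 1;
            a  = a + 1;   /* which value of a this reads depends on the store above */
        }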