Profile-Guided Automatic Inline Expansion for C Programs


Pohua P. Chang, Scott A. Mahlke, William Y. Chen and Wen-mei W. Hwu
Center for Reliable and High-performance Computing, Coordinated Science Laboratory, University of Illinois, Urbana-Champaign

SUMMARY

This paper describes critical implementation issues that must be addressed to develop a fully automatic inliner. These issues are: integration into a compiler, program representation, hazard prevention, expansion sequence control, and program modification. An automatic inter-file inliner that uses profile information has been implemented and integrated into an optimizing C compiler. The experimental results show that this inliner achieves significant speedups for production C programs.

Keywords: Inline expansion, C compiler, Code optimization, Profile information

INTRODUCTION

Large computing tasks are often divided into many smaller subtasks which can be more easily developed and understood. Function definition and invocation in high-level languages provide a natural means to define and coordinate subtasks to perform the original task. Structured programming techniques therefore encourage the use of functions. Unfortunately, function invocation disrupts compile-time code optimizations such as register allocation, code compaction, common subexpression elimination, constant propagation, copy propagation, and dead code removal.

Emer and Clark reported that, for a composite VAX workload, 4.5% of all dynamic instructions are function calls and returns [1]. If we assume equal numbers of call and return instructions, the above number indicates that there is a function call instruction for every 44 instructions executed. Eickemeyer and Patel reported a dynamic call frequency of one out of every 27 to 130 VAX instructions [2]. Gross et al. cited a dynamic call frequency of one out of every 25 to 50 MIPS instructions [3]. Patterson and Sequin reported that the function call is the most costly source language statement [4]. All these previous results argue for an effective approach to reducing function call costs.

Inline function expansion (or simply inlining) replaces a function call with the function body. Inline function expansion removes the function call/return costs and provides enlarged and specialized functions to the code optimizers. In a recent study, Allen and Johnson identified inline expansion as an essential part of a vectorizing C compiler [5]. Scheifler implemented an inliner that takes advantage of profile information in making inlining decisions for the CLU programming language. Experimental results, including function invocation reduction, execution time reduction, and code size expansion, were reported based on four programs written in CLU [6].

Several code improving techniques may be applicable after inline expansion. These include register allocation, code scheduling, common subexpression elimination, constant propagation, and dead code elimination. Richardson and Ganapathi have discussed the effect of inline expansion and code optimization across functions [7]. Many optimizing compilers can perform inline expansion. For example, the IBM PL.8 compiler does inline expansion of low-level intrinsic functions [8]. In the GNU C compiler, the programmer can use the keyword inline as a hint to the compiler for inline expanding function calls [9].
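As a minimal sketch of such a programmer-directed hint in GNU C (the function names are hypothetical, and inline is only advisory, so the compiler may still emit an ordinary call):

    /* Hypothetical example of the GNU C "inline" hint. */
    static inline int square(int x)
    {
        return x * x;
    }

    int sum_of_squares(int a, int b)
    {
        /* If the compiler honors the hint, each call below is
           replaced by the body x * x, removing the call/return
           overhead and exposing the arithmetic to the optimizer. */
        return square(a) + square(b);
    }

The inlining decision here still rests with the programmer; the inliner described in this paper makes such decisions automatically, guided by profile information.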
In the Stanford MIPS C compiler, the compiler examines the code structure (e.g., loops) to choose the function calls for inline expansion [10]. Parafrase has an inline expander based on program structure analysis to increase the exposed program parallelism [11]. It should be noted that the careful use of macro expansion and language preprocessing utilities has the same effect as inline expansion, where the inline expansion decisions are made entirely by the programmers. Davidson and Holler have developed an automatic source-to-source inliner for C [12, 13]. Because their inliner works on the C source program level, many existing C programs for various computer systems can be optimized by their inliner. The effectiveness of their inliner has been confirmed by strong experimental data collected for several machine architectures.

In the process of developing an optimizing C compiler, we decided to allocate about 6 man-months to construct a profile-guided automatic inliner [14]. We expect that an inliner can enlarge the scope of code optimization and code scheduling, and eliminate a large percentage of function calls. In this paper, we describe the major implementation issues regarding a fully automatic inliner for C, and our design decisions. We have implemented the inliner and integrated it into our prototype C compiler. The inliner consists of approximately 5200 lines of commented C code, not including the profiler that is used to collect profile data. The inliner is part of a portable C compiler front-end that has been ported to Sun3, Sun4 and DEC-3100 workstations running UNIX operating systems.

CRITICAL IMPLEMENTATION ISSUES

The basic idea of inlining is simple. Most of the difficulties are due to hazards, missing information, and reducing the compilation time. We have identified the following critical issues of inline expansion:

1. Where should inline expansion be performed in the compilation process?
2. What data structure should be employed to represent programs?
3. How can hazards be avoided?
4. How should the sequence of inlining be controlled to reduce compilation cost?
5. What program modifications are made for inlining a function call?

A static function call site (or simply call site) refers to a function invocation specified by the static program. A function call is the activity of invoking a particular function from a particular call site. A dynamic function call is an executed function call. If a call site can potentially invoke more than one function, the call site has more than one function call associated with it. This is usually due to the use of the call-through-pointer feature provided in some programming languages (see the sketch below). The caller of a function call is the function which contains the call site of that function call. The callee of a function call is the function invoked by the function call.
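To make these definitions concrete, the following hypothetical C fragment contains one static call site that has two function calls associated with it:

    int add(int x, int y) { return x + y; }
    int sub(int x, int y) { return x - y; }

    int apply(int (*op)(int, int), int x, int y)
    {
        /* A single static call site. Because op may point to either
           add or sub, two function calls are associated with it. The
           caller of both function calls is apply; the callee is
           whichever function op designates when the call executes. */
        return op(x, y);
    }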
[Figure 1: Separate compilation paradigm. Each source file (a.c, b.c, c.c, d.c) is independently parsed, optimized, and code-generated into its object file (a.o, b.o, c.o, d.o); the linker then resolves inter-object file references to produce the executable image.]

Integration into the compilation process: The first issue regarding inline function expansion is where inlining should be performed in the translation process. In most traditional program development environments, the source files of a program are separately compiled into their corresponding object files before being linked into an executable file (see Figure 1). The compile time is defined as the period of time when the source files are independently translated into object files. The link time is defined as the duration when the object files are combined into an executable file. Most of the optimizations are performed at compile time, whereas only a minimal amount of work to link the object files together is performed at link time. This simple two-stage translation paradigm is frequently referred to as the separate compilation paradigm.

A major advantage of the separate compilation paradigm is that when one of the source files is modified, only the corresponding object file needs to be regenerated before linking the object files into the new executable file, leaving all the other object files intact. Because most of the translation work is performed at compile time, separate compilation greatly reduces the cost of program recompilation when only a small number of source files are modified. Therefore, the two-stage separate compilation paradigm is the most attractive for program development environments where programs are frequently recompiled and usually a small number of source files are modified between each recompilation. There are programming tools, such as the UNIX make program, to exploit this advantage.

Our extension to the separate compilation paradigm to allow inlining at compile time is illustrated in Figure 2. Performing inline function expansion before the code optimization steps ensures that these code optimization steps benefit from inlining. For example, functions are often created as generic modules to be invoked for a variety of purposes. Inlining a function call places the body of the corresponding function into a specific invocation, which eliminates the need to cover the service required by the other callers. Therefore, optimizations such as constant propagation, constant folding, and dead code removal can be expected to be more effective with inlining (see the sketch below).¹ Performing inline function expansion at compile time requires the callee function source or

¹ Note that it is possible to do link-time inline expansion and still perform global optimization. In the MIPS compiler, for instance, a link phase with function inlining can optionally occur before the optimizer and subsequent compilation phases [15].
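As a sketch of the specialization described above (the functions and their names are hypothetical), consider a generic module inlined into one specific invocation:

    /* A generic module invoked for a variety of purposes. */
    int scale(int x, int factor, int saturate)
    {
        int r = x * factor;
        if (saturate && r > 255)  /* needed only by some callers */
            r = 255;
        return r;
    }

    int copy_pixel(int pixel)
    {
        return scale(pixel, 1, 0);
    }

Once the call in copy_pixel is inlined, constant propagation substitutes factor = 1 and saturate = 0, constant folding reduces pixel * 1 to pixel, and dead code removal deletes the saturation test, leaving only "return pixel;". None of this is possible while scale must also serve callers that require saturation.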