Freelance Graphics

AIM Compilation Products and Technology Compilers: Still going strong after ~50 years Kevin Stoodley IBM Distinguished Engineer Toronto Laboratory [email protected] April 1, 2003 © 2002 IBM Corporation 1 AIM Compilation Products Outline IBM Toronto Lab stuff Proebsting's and Moore's laws Compilers then and now What's new and interesting (at least in my little world) Proebsting's law revisited (and hopefully refuted) © 2002 IBM Corporation 2 AIM Compilation Products IBM Toronto Lab: Canada's Premier Software Development Facility © 2002 IBM Corporation 3 AIM Compilation Products IBM Toronto Lab: Canada's Premier Software Development Facility Largest software development facility in Canada Third largest lab in the IBM Software Group Worldwide Missions Data Management Application Development Tools ISO 9001:2000 Electronic Commerce Certified Leading e-business Technologies Relational Database Java Development Computer Language Compilers and Tools WebSphere Development Tools Media Design Globalization © 2002 IBM Corporation 4 AIM Compilation Products Compilation Products and Technology Group ~210 reg. staff (+ ~28 coop terms/yr) who cover development, test, support and information development 20 year organizational history in compiler technology Strong ties to IBM Research (TJ Watson, Haifa, Tokyo) Academic collaborations (U of A, U of T, Catalunya, Illinois) Designated core competency site in IBM for this technology Product missions: C/C++ p-Series running AIX and Linux C/C++ on z- and i-Series running zOS and OS/400 Fortran 95+ on p-Series running AIX and Linux Component missions: Java JIT compilers for servers (PowerPC, X86, IA64, 390) Java JIT compilers for embedded (PowerPC, X86, ARM, MIPS, SH4) XML parsers (numerous platforms) XSLT processors (numerous platforms) Miscellaneous pretty much any IBM compiler that needs an optimizing backend for a platform already supported (Cobol, PL/I, Pascal (!), etc) Secret stuff (but nothing that would impress my kids) © 2002 IBM Corporation 5 AIM Compilation Products Moore's Law "CPU performance doubles roughly every 18 months" Diverse sources of improvement lead to this simply expressed but remarkably long-standing and robust "result": Fabrication/process directly and indirectly, this is the "biggee" Low level circuit design High level design Macro architecture (memory nest, interconnect, etc) Micro architecture (out-of-order, superscalar, etc) Instruction Set Architecture (ISA) despite the "press", this is perhaps the weakest lever it's certainly the most controversial and prone to zealotry Compiler technology improvements © 2002 IBM Corporation 6 AIM Compilation Products Proebsting's Law (you can look this up in google) Compiler optimization R&D has led to a four-fold performance improvement over the last 36 years Roughly this is 4% a year (not quite as good as 60) Based on the (generous) observation of the ratio between the performance of an unoptimized (CPU bound) program and the same program compiled with full optimization Implication, according to Proebsting, is that we optimizing compiler researchers should turn our talents to more fruitful endeavours © 2002 IBM Corporation 7 AIM Compilation Products So, I didn't change jobs. Why not? Well, 4X is actually pretty good and it's what we signed up for Compilers don't directly get to manipulate the biggest levers (process, circuit design, logic design) Compiler's major levers ISA Micro-architecture Memory nest (cache geometry, latency, etc) 4X is very good bang for the buck semiconductor, CPU architecture and system design R&D spending dramatically outstrip compiler optimization R&D spending Those CPU people keep stealing/obviating compiler optimizer ideas and putting them into hardware out-of-order and speculative execution (twice now) instruction trace caches Important class of programs does get more than 4X Mostly scientific code © 2002 IBM Corporation 8 AIM Compilation Products 4X is a point in time, but the ground isn't stable Compiler has to keep re-winning that 4X improvement ISA changes CISC, RISC, VLIW, EPIC cool new instruction from some bright hardware person Micro-architectural changes in-order v out-of-order speculative execution branch prediction hints and caches instruction grouping Macro-architectural changes large pages pre-fetch streams High level language "improvements" polymorphism, inheritance, type opacity (OO) dynamic typing and loading lazy evaluation generics/templates the next big leap in costly, syntactic sugar © 2002 IBM Corporation 9 AIM Compilation Products We have to work pretty hard just to keep from falling behind! 1200 +12% 1000 * PA-8700 750MHz 800 SPARC 900MHz +21% * Itanium 800MHz 600 Pentium 2.2GHz Alpha 1.0GHz 400 Power4 1.3GHz * estimate Power4 + compiler 200 0 SPECint2000 SPECfp2000 © 2002 IBM Corporation 10 AIM Compilation Products Attaining high performance on Power4 class processors Significant challenges to achieving correctness and high performance Weakly consistent storage model nagging problems with correctness due to absent or subtly incorrect synchronization increased impact of synchronization sequences on overall performance even with lwsync Microarchitectural changes completely different modelling required for instruction scheduling instruction selection needs to be tuned to avoid previously common instructions which are "cracked" Miscellaneous large page support streams branch prediction hints Let's not even talk about IA64 © 2002 IBM Corporation 11 AIM Compilation Products And then there's the cost... Cost (order of Cost magnitude) Growth New Silicon Process $ 1 billion very fast New CPU Architecture $ 100 million fast New Compiler $ 2 million slow © 2002 IBM Corporation 12 AIM Compilation Products Inside a (static) Compilation (then) C source C++ source Fortran source C Front C++ Front Fortran End End Front End Wcode Wcode Wcode Driver Wcode Wcode TOBEY Back End Object Code © 2002 IBM Corporation 13 AIM Compilation Products Inside a (static) Compilation (now) C source C++ source Fortran source C Front C++ Front Fortran End End Front End Wcode Wcode Wcode++ Driver TPO ASTI Wcode Wcode Wcode TOBEY Wcode Back End Profile data Object Code © 2002 IBM Corporation 14 AIM Compilation Products Inside TPO Compile Time Optimization Wcode from FE Decode Control Flow Analysis Store Motion Constant Propagation Redundant Condition Elimination Control or Intraprocedural Copy Propagation Loop Normalization Alias Changed? Optimizations Alias Analysis Loop Unswitching Dead Store Elimination Loop Unrolling NSSA Loop Fusion Scalar Replacement Loop Loop Distribution Loop Parallelization Unimodular Trans Loop Vectorization Optimizations Unroll-and-jam Code Motion and Commoning NLOOPOPT Collection Wcode Encode to BE © 2002 IBM Corporation 15 AIM Compilation Products Data Flow Optimization Loop Simplify Build NVNSPFY NCPSPFY control flow Build SSA Value Constant Copy numbering propagation propagation Shadow NVNSSA NGCPSSA NCOPYSSA NSHADSSA analysis Store Dead code Derived motion elimination NDSTORES stores NFSM,NSMSSA NDSSSA Right Pointer number of alias Control flow shadows analysis or alias changed NRNSSSA NPTRSSA Partial Loop constant Loop unrolling normalization First time only propagation NPCPSSA NUNRTPO NLNORM © 2002 IBM Corporation 16 AIM Compilation Products Late Data Flow Optimizations Data Flow More Granular Optimization Aliasing Loop (MGA) NMGATPO Loop Idioms NIIDIOM Redundant condition elimination NRCE Loop unswitching NUNSW © 2002 IBM Corporation 17 AIM Compilation Products Loop Optimization Early Data Control Flow Optimization Data Flow Optimization Flow Loop Normalization Loop Nest Aggressive Copy Propagation Canonization Maximal Loop Fusion Parallel Loops Loop Nest Partitioning High Level Loop Interchange Transformations Loop Unroll and Jam Loop Parallelization Serial Parallel Loop Loops Outlining Inner Loop Unrolling Low Level Loop Vectorization Transformations Strength Reduction Redundancy Elimination Code Motion © 2002 IBM Corporation 18 AIM Compilation Products Loop Nest Canonization and Distribution Loop Early Loop Elimination Interchange NLOOPRM NEUNIMOD Data Loop Dependence Fusion Analysis NLOOPFUSE Find Gather for Reductions Blocks NREDFIND NGATHBLK Early Early Scalar Loop Commoning Replacement Distribution NECSE NESCALREP Node NLOOPDIST Splitting NNSPLIT © 2002 IBM Corporation 19 AIM Compilation Products Loop Nest Optimization (pre-Parallelization) Gather for Nests Loop NGATHNEST Distribution Temp Vector Vectorization Allocation NLOOPDIST NLOOPVECT Data Dependence Analysis Guard Code Folding Sinking NFOLDGRD NSINKCODE Loop Loop Unroll Loop Interchange and Jam Parallelization NUNIMOD NOUTUNROLL Prefetch Scalar NLOOPPAR Optimization Replacement NDUMLOAD NSCALREP © 2002 IBM Corporation 20 AIM Compilation Products Inside an Link-time Compilation Imports/Exports Object files Libraries Linker Options TPO Wcode partitions Driver TOBEY Object Files Linker Executable or shared library © 2002 IBM Corporation 21 AIM Compilation Products Inside TPO Link Time Optimization Symbol Backward parameter & global def/use Resolution Alias backward properties Analysis NMU# Call Graph LEVEL(2) copy and constant Completion Forward Data-flow propagation LEVEL(1) Analysis pointer alias analysis dead code elimination NFPASS Inlining closure of context Alias NINLINE sensitive LEVEL(0) Closure pointer alias relationships Data NINTPTR Coalescing invariant code motion Backward NGLOBCOAL common subexpression Data-flow elimination Analysis Function loop optimization Partitioning NBPASS © 2002 IBM Corporation

Freelance Graphics

Expression Rematerialization for VLIW DSP Processors with Distributed Register Files ?

User-Directed Loop-Transformations in Clang

Elimination of Memory-Based Dependences For

A General Compilation Algorithm to Parallelize and Optimize Counted Loops with Dynamic Data-Dependent Bounds Jie Zhao, Albert Cohen

Foundations of Scientific Research

Polyhedral-Model Guided Loop-Nest Auto-Vectorization Konrad Trifunović, Dorit Nuzman, Albert Cohen, Ayal Zaks, Ira Rosen

Compiler Construction

Synthesis and Exploration of Loop Accelerators for Systems-On-A-Chip

Portable Section-Level Tuning of Compiler Parallelized Applications

Mipsprotm Fortran 77 Programmer's Guide

Unified Polyhedral Modeling of Temporal and Spatial Locality

Power and Energy Impact by Loop Transformations