Demystifying GCC: PLDI 2008 Under the Hood of the GNU Compiler Collection

Demystifying GCC
Under the Hood of the GNU Compiler Collection

Morgan Deters
Programming Logic Group, Technical University of Catalonia, Barcelona, Spain

Ron Cytron
Distributed Object Computing Laboratory, Washington University, St. Louis, Missouri

Copyright is held by the author/owner(s). PLDI'08, June 9, 2008, Tucson, Arizona, USA.
Copyright © 2005–2008 Morgan Deters and Ron Cytron

Tutorial Objectives

• Introduce the internals of GCC 4.3.0 (March 2008)
  - Java and C++ front-ends
  - Optimizations
  - Back-end structure
• How to modify, or write your own
  - Front end: new languages, new features
  - Middle end: analysis, optimization
  - Back end: machine-specific targets
• How to debug/improve GCC

Copyright (c) 2005-2008 Morgan Deters and Ron Cytron June 2008 Demystifying GCC: PLDI 2008 Under the Hood of the GNU Compiler Collection

Introduction

• What is GCC?
• Why use GCC?
• What does compilation with GCC look like?

What is GCC?

• A compiler for multiple languages…
  - C
  - C++
  - Java
  - Objective-C/C++
  - FORTRAN
  - Ada
• …supporting multiple targets

  arc arm avr bfin c4x cris crx fr30 frv h8300 i386 ia64 iq2000 m32c m32r m68hc11 m68k mcore mips mmix mn10300 mt pa pdp11 rs6000 s390 sh sparc stormy16 v850 vax xtensa

  These are code generators; variants are also supported (e.g. powerpc is a "variant" of the rs6000 code generator).

What GCC is not

• GCC is not
  - an assembler (see GNU binutils)
  - a C library (see glibc)
  - a debugger (see gdb)
  - an IDE

Why use GCC as an R&D platform?

• Research is immediately usable by everyone
  - Large development community and user base
  - GCC is a modern, practical compiler
    * multiple architectures, full standard languages, optimizations
    * debugging support
• You can meet GCC halfway
  - modular: hack some parts, rely on the others
• Can incorporate bug fixes that come along
  - minor version upgrades (e.g. 3.3.x → 3.4.x): no big deal
  - major version upgrades (e.g. 3.x → 4.x): more of a pain
• Need not maintain code indefinitely (if incorporated)

The GCC project and the GPL

• Open-source
  - GNU General Public License (GPL)
    * Changes made to GCC source code or associated libraries must also be GPLed
    * However, compiler and libraries can be used/linked against in non-GPL development

Your improvements to GCC must be open-source, but your customers need not open-source their programs to use your stuff.

Typical structure of GCC compilation

source program → [ gcc/g++/gcj: compiler → assembly program → assembler → linker ] → ELF object

Inside the compiler

compiler (C, C++, Java):

  parser / semantic checker → gimplifier → tree optimizations → expander → RTL passes → target arch instruction selection

  (the stages before the expander work on trees; the expander and later stages work on RTL)

GCC Basics

• How do you build GCC?
• How do you navigate the source tree?

GCC Basics: Getting Started

• Requirements to build GCC
  - usual suite of tools (C compiler, assembler/linker, GNU Make, tar, awk, POSIX shell)
• For development
  - GNU m4 and GNU autotools (autoconf/automake/libtool)
  - gperf
  - bison, flex
  - autogen, guile, gettext, perl, Texinfo, diffutils, patch, …
• Obtaining GCC sources
  - gcc.gnu.org or local mirror (see gcc.gnu.org/mirrors.html)
  - get gcc-core package, then language add-ons
    * gcc-java requires gcc-g++

Source Metrics

• 492 Mbytes of material downloaded to build GCC
• 2.7 Gbytes after build
• As of 4.3.0
  - Need mpfr: multiple-precision floating-point arithmetic
  - Need gmp: multiple-precision integer arithmetic
  - Need Eclipse: for Java front-end

Building GCC from sources

• Configure it in a separate build directory from sources
  - /path/to/source/directory/configure options…
  - --prefix=install-location
  - --enable-languages=comma-separated-language-list
    * To see the list of available languages: grep language= */config-lang.in
  - --enable-checking
    * turns on sanity checks (especially on intermediate representation)
• Build it!
  - Environment variables useful when debugging compiler/runtime

      CFLAGS               stage 1 flags (using host C compiler)
      BOOT_CFLAGS          stage 2 and stage 3 flags (using stage 1 GCC)
      CFLAGS_FOR_TARGET    flags for new GCC building target binaries
      CXXFLAGS_FOR_TARGET  flags for new GCC building libstdc++/others
      GCJFLAGS             flags for new GCC building Java runtime

    * '-O0 -ggdb3' is recommended when debugging
  - make bootstrap (to bootstrap) or make (to not)
    * bootstrap useful when compiling with non-GCC host compiler
    * during development, non-bootstrap is faster and also better at recompiling just those sources that have changed
  - use make's -j option to speed things up on MP/dual-core machines
  - make bootstrap-lean
    * cleans up between stages, uses less disk
  - make profiledbootstrap
    * faster compiler produced, but need GCC host
    * -j unsupported
• Install it!
  - make install

Building a cross-compiler

• Code generator can be built for any target
  - runtime libraries then are built using that code generator
• Since GCC outputs assembly, you actually need a full cross development toolchain
  - Dan Kegel's crosstool automates a GNU/Linux cross chain for popular configurations:
    * Linux kernel headers
    * GNU binutils
    * glibc
    * gcc
  - see kegel.com/crosstool

GCC Basics: Getting Around

• Other tools recommended when hacking GCC

  GNU Screen  attach/reattach terminal sessions
  etags       navigation to source definitions (emacs)
  ctags       navigation to source definitions (vi)
  c++filt     demangle C++/Java mangled symbols
  readelf     decompose ELF files
  objdump     object file dumper/disassembler
  gdb         GNU debugger

GCC Drivers

• gcc, g++, gcj are drivers, not compilers
  - They will execute (as appropriate):
    * compiler (cc1, cc1plus, jc1)
    * Java program main entry point generation (jvgenmain)
    * assembler (as)
    * linker (collect2)
• Differences between drivers include active #defines, default libraries, other behavior
  - but can use any driver for any source language

Most useful driver options for debugging

  -E                  preprocess, don't compile
  -S                  compile, don't assemble
  -H                  verbose header inclusion
  -save-temps         save temporary files
  -print-search-dirs  print search paths
  -v                  verbose (see what the driver does)
  -g                  include debugging symbols

  --help              get command line help
  --version           show full version info
  -dumpversion        show minimal version info

For extra help

  man gcc      basic option assistance
  info gcc     using gcc in-depth; language extensions etc.
  info gccint  internals documentation

Top-level INSTALL directory in distribution provides help on configuring and building GCC.

Tour of GCC source

  INSTALL             configuration/installation documentation
  boehm-gc            the Boehm garbage collector
  config              architecture-specific configure fragments
  contrib             contributed scripts
  gjar                a replacement for the jar tool
  fixincludes         source for a program to fix host header files when they aren't ANSI-compliant
  gcc                 the main compiler source
  include             headers used by GCC (libiberty mostly)
  intl                support for languages other than English
  libcpp              source for C preprocessing library
  libffi              Foreign Function Interface library (allows function callers and receivers to have different calling conventions)
  libiberty           useful utility routines (symbol tables etc.) used by GCC, and replacement functions for common things not provided by host
  libjava             source for standard Java library
  libmudflap          source for a pointer instrumentation library
  libstdc++-v3        source for standard C++ library
  maintainer-scripts  utility scripts for GCC maintainers
  zlib                compression library source

The GCC Front-End

• Option processing
• Controlling drivers and hooking up front-ends
• The C, C++, and Java front-ends
• The GENERIC high-level intermediate representation

The GCC Front-End

• gcc, g++, gcj driver entry point
  - main (gcc/gcc.c)
• cc1, cc1plus, jc1 share a common entry point
  - toplev_main (gcc/toplev.c)
    * actual main in gcc/main.c: just calls toplev_main(); can be overridden by front-end

Command-line option processing

• In gcc/ directory

  common.opt     option definitions
  opts.{c,h}     common_handle_option()
  c-opts.c       c_common_handle_option()
  c.opt          C compiler option definitions
  java/lang.opt  Java compiler option definitions
  java/lang.c    java_handle_option()

• These are cc1, cc1plus, jc1 option handling routines
  - drivers just pass on arguments as declared in spec files

common.opt

• Parsed by awk scripts at build time to generate options.c, options.h
• Simple format
  - Language specifications and option stanzas
• Each option stanza contains
  1. option name
  2. space-separated options list
  3. documentation string for --help output

Properties of command-line options

• Available properties for use in .opt option spec files are

  Common           option is available for all front-ends
  Target           option is target-specific
  Joined           argument is mandatory and may be joined
  Separate         argument is mandatory and may be separate
  JoinedOrMissing  optional argument, must be joined if present
  RejectNegative   there is not an associated "no-" option
  UInteger         argument expected is a nonnegative integer
  Undocumented     undocumented; do not include in --help output
  Report           -fverbose-asm should report the state of this option
  Var(var-name)    set var-name to true (or argument) if present
  VarExists        do not define variable in resulting options.c
  Init(value)      static initializer for variable
  Mask(name)       associated with a bit in target_flags bit vector; MASK_name is automatically #defined to the bitmask; TARGET_name is automatically #defined as an expression that is 1 when the option is used, 0 when not
  InverseMask(other, [this])
                   option is inverse of another option with Mask(other); if this is given, #define TARGET_this
  MaskExists       don't #define again; use for synonymous options
  Condition(cond)  option permitted iff preprocessor cond is true
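As a rough illustration of what the Mask(name) property describes, the sketch below hand-writes the kind of macros the slide says are generated. The option names FANCY and TURBO, the helper toy_set_option, and the local target_flags variable are all hypothetical stand-ins; in GCC the real definitions come out of the build-time awk scripts.

```c
/* Hand-written stand-ins for what Mask(FANCY) and Mask(TURBO) are said
   to generate. FANCY and TURBO are hypothetical option names. */
#define MASK_FANCY (1 << 0)
#define MASK_TURBO (1 << 1)

/* Toy bit vector standing in for GCC's target_flags. */
static int target_flags;

/* Each TARGET_name is an expression that is 1 when the option is used,
   0 when not. */
#define TARGET_FANCY ((target_flags & MASK_FANCY) != 0)
#define TARGET_TURBO ((target_flags & MASK_TURBO) != 0)

/* Hypothetical helper: on = 1 models "-mfancy", on = 0 models the
   associated "no-" form. */
void toy_set_option(int mask, int on)
{
    if (on)
        target_flags |= mask;
    else
        target_flags &= ~mask;
}
```

Code elsewhere would then test TARGET_FANCY rather than poking at the bit vector directly, which is the convenience the automatic #defines provide.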

Language-specific options

• gcc/c.opt, gcc/java/lang.opt, gcc/cp/lang.opt
• Special processing in gcc/java/lang.c
• Specify valid language-names as an option

In Greater Depth: Adding Command-Line Options

(Slide revised from OOPSLA 2006, Portland, Oregon.)

Controlling the drivers: spec files

  gcc/gcc.c               specs for gcc driver
  gcc/cp/lang-specs.h     additional specs for g++ driver
  gcc/java/lang-specs.h   additional specs for gcj driver

Example (adapted from gcc/gcc.c):

    %{E|M|MM:%(trad_capable_cpp) %(cpp_options) %(cpp_debug_options)}
    %{!E:%{!M:%{!MM:
      %{traditional|ftraditional:
        %eGNU C no longer supports -traditional without -E}
      %{save-temps|traditional-cpp|no-integrated-cpp:%(trad_capable_cpp)
        %(cpp_options) -o %{save-temps:%b.i} %{!save-temps:%g.i} \n
        cc1 -fpreprocessed %{save-temps:%b.i} %{!save-temps:%g.i} %(cc1_options)}
      %{!save-temps:%{!traditional-cpp:%{!no-integrated-cpp:
        cc1 %(cpp_unique_options) %(cc1_options)}}}
      %{!fsyntax-only:%(invoke_as)}}}

• gcc/gcc.c contains documentation on spec language
• Use -dumpspecs to see specifications

The C front-end

• C front-end is in gcc/ directory
  - parse entry point c_common_parse_file (c-opts.c)
    * workhorse is c_parse_file (c-parser.c)

  c-common.def      IR codes for C compiler
  c-common.c        functions for C-like front-ends
  c-convert.c       type conversion
  c-cppbuiltin.c    built-in preprocessor #defines
  c-decl.c          declaration handling
  c-dump.c          IR-dumping
  c-errors.c        pedantic warning issuance
  c-format.c        format checking for printf-like functions
  c-gimplify.c      lowering of IR (and documentation)
  c-incpath.c       include path generation for preprocessor
  c-lang.c          language infrastructure, front-end hookups
  c-lex.c           lexical analyzer (manually coded)
  c-objc-common.c   some functions for C and Objective-C
  c-opts.c          option processing, some init stuff
  c-parser.c        parser (based on an old bison parser)
  c-pch.c           precompiled header support
  c-ppoutput.c      preprocessing-only support (-E option)
  c-pragma.c        support for #pragma pack and #pragma weak
  c-pretty-print.c  used to pretty-print expressions in error messages
  c-semantics.c     statement list handling in IR
  c-typeck.c        functions to build IR, type checks
  gccspec.c         driver-specific tasks for gcc driver

The C++ front-end

• In subdirectory gcc/cp/
  - same parse entry point as C compiler

  call.c              function/method invocation lookup and handling
  class.c             building (the runtime artifacts of) classes etc.
  cp-gimplify.c       IR lowering
  cp-lang.c           language hooks for C++ front-end
  cp-objcp-common.c   common bits for C++ and Objective-C++
  cvt.c               type conversion
  cxx-pretty-print.c  C++ pretty-printer
  decl.c              declaration and variable handling
  decl2.c             additional declaration and variable handling
  dump.c              IR dumping
  error.c             C++ error-reporting callbacks
  except.c            C++ exception-handling support
  expr.c              IR lowering for C++
  friend.c            C++ "friend" support
  init.c              data initializers and constructors
  lex.c               the C++ lexical analyzer
  mangle.c            C++ name mangling
  method.c            method handling; default constructor generation
  name-lookup.c       context-aware name (type, var, namespace) lookup
  optimize.c          constructor/destructor cloning
  parser.c            the C++ parser
  pt.c                parameterized type (template) support
  ptree.c             IR pretty-printing
  repo.c              C++ template repository support
  rtti.c              support for run-time type information
  search.c            type search in the presence of multiple inheritance
  semantics.c         semantic checking
  tree.c              C++ front-end specific IR functionality
  typeck.c            functionality dealing with types, conversion
  typeck2.c           types, conversion, type errors
  g++spec.c           driver-specific tasks for g++ driver

The Java front-end

• In subdirectory gcc/java/
  - parse entry point java_parse_file (jcf-parse.c)

  boehm.c          per-type bitmask building for Boehm GC
  buffer.{c,h}     expandable buffer data type
  builtins.c       builtin/inline functions for Java (like Math.min())
  check-init.c     checks over IR for uninitialized variables
  class.c          IR building of classes, class-references, vtables, etc.
  constants.c      class file constant pool handling
  decl.c           Java declaration support (misc.)
  except.c         Java exception support
  expr.c           Java expressions (misc.)
  gjavah.c         source for gcjh program
  java-gimplify.c  IR lowering
  jcf-depend.c     class file dependency tracking
  jcf-dump.c       source for jcf-dump program
  jcf-io.c         class file I/O utility functions
  jcf-parse.c      entry point for compiling Java files
  jcf-path.c       CLASSPATH-sensitive search
  jcf-reader.c     generic, pluggable class file reader
  jcf-write.c      class file writer
  jv-scan.c        source for jv-scan program
  jvgenmain.c      source for jvgenmain program
  jvspec.c         Java option specs
  lang.c           language hooks, options processing
  mangle.c         symbol-mangling routines
  mangle_name.c    symbol-mangling routines
  parse-scan.y     minimal, fast parser for syntax checking (gone)
  parse.y          Java (source-language) parser (gone in 4.3)
  resource.c       support for --resource option
  typeck.c         routines related to types and type conversion
  verify-glue.c    interface between verifier and compiler
  verify-impl.c    bytecode verifier
  win32-host.c     for Windows; case-sensitive filename matching
  zextract.c       read class files from zip/jar archives
  keyword.gperf    Java keyword specification

Multiple "front-ends" for Java (< 4.3.x)

• common entry point at java_parse_file
  - gcc/java/jcf-parse.c
• compile .java → .o
  - gcc/java/parse.y
• compile .class → .o (or .jar → .so)
  - gcc/java/expr.c (with gcc/java/jcf-reader.c)
  - expand_byte_code, process_jvm_instruction
• compile .java → .class (with -C option)
  - gcc/java/parse.y with flag_emit_class_files set
  - unusual back-end (as if syntax checking only)
• In 4.3.x, ecj1 (Eclipse) used for .java → .class

The "treelang" front end: essential front-end components

• configure fragment (config-lang.in)
• language-specific options (lang.opt)
• filename handling for driver (lang-specs.h)
• treelang-specific tree codes (treelang-tree.def)
• front-end hookups to toplev.c (treetree.c)
  - see gcc/langhooks.h for documentation
• flex scanner (lex.l)
• bison parser (parse.y)
• structural functions (tree1.c)

In Greater Depth: Adding a new front-end to GCC

GENERIC trees

• Front-ends are written in C!
• We'd like to have…
  - tree node base class
    * subclasses for expressions etc.
• Instead we have
  - union tree_node (gcc/tree.h)
    * each component of the union is a struct

Structs vs. unions

struct: fields are laid out one after another, from low memory to high memory

  field 1 | field 2 | field 3 | field 4

union: all fields start at the same address

  field 1, field 2, field 3, field 4

  fields overlap in memory; you're on your own for type safety!
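The layout difference can be checked directly in C. This is a minimal sketch with made-up types (pair_s, pair_u, tagged); the explicit kind tag is one common way to supply the type safety that a bare union lacks, shown here only as an illustration.

```c
#include <stddef.h>

struct pair_s { int i; double d; };   /* fields get separate storage */
union  pair_u { int i; double d; };   /* fields share one storage area */

/* In a struct, writing one field leaves the other intact. */
int struct_keeps_both(void)
{
    struct pair_s s;
    s.i = 42;
    s.d = 1.5;
    return s.i == 42 && s.d == 1.5;
}

/* In a union, every field starts at offset 0. */
int union_fields_share_address(void)
{
    return offsetof(union pair_u, i) == 0
        && offsetof(union pair_u, d) == 0;
}

/* Recording which field is live is the programmer's job. */
enum kind { KIND_INT, KIND_DOUBLE };
struct tagged { enum kind kind; union pair_u u; };

struct tagged tagged_from_int(int v)
{
    struct tagged t;
    t.kind = KIND_INT;
    t.u.i = v;
    return t;
}

struct tagged tagged_from_double(double v)
{
    struct tagged t;
    t.kind = KIND_DOUBLE;
    t.u.d = v;
    return t;
}

/* Read back through the field that was actually written. */
double tagged_as_double(struct tagged t)
{
    return t.kind == KIND_INT ? (double)t.u.i : t.u.d;
}
```

The same idea, a tag stored alongside overlapping variants, is what the "code" field in the common part of a tree node provides.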

The tree_node union

Everything is a tree!

union tree_node overlays the following components in memory (the common part comes first, at low memory):

  common
  int_cst
  type
  identifier
  exp
  …
  field_decl

typedef union tree_node *tree;

The tree_node union

• The "common" part contains
  - code (kind of tree: declaration, expression, etc.)
  - chain (for linking trees together)
  - type (type of the represented item, also a tree)
  - flags
    * side effects
    * addressable
    * access flags (used for other things in non-declarations)
    * 7 language-specific flags

Macros for accessing tree parts

• In the common part
  - TREE_*
    * TREE_CODE(tree)
    * TREE_TYPE(tree)
    * TREE_SIDE_EFFECTS(tree) etc.
• For specific trees
  - type trees
    * TYPE_*
      TYPE_FIELDS(tree) gets a list of fields in the type
      TYPE_NAME(tree) gets the type's associated decl
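To make the union-plus-macros design concrete, here is a toy version in the same spirit. Every name prefixed toy_ or TOY_ is invented for this sketch; GCC's real definitions in gcc/tree.h are far larger and differ in detail.

```c
#include <stdlib.h>

/* Toy tree codes; GCC's real codes are defined in gcc/tree.def. */
enum toy_code { TOY_INTEGER_CST, TOY_PLUS_EXPR };

union toy_node;
typedef union toy_node *toy_tree;

/* The "common" part: every variant begins with it. */
struct toy_common {
    enum toy_code code;         /* kind of tree */
    toy_tree chain;             /* for linking trees together */
    toy_tree type;              /* type of the item, also a tree */
    unsigned side_effects : 1;  /* one of the flag bits */
};

struct toy_int_cst { struct toy_common common; long value; };
struct toy_exp     { struct toy_common common; toy_tree operands[2]; };

union toy_node {
    struct toy_common  common;
    struct toy_int_cst int_cst;
    struct toy_exp     exp;
};

/* Accessor macros in the style of TREE_CODE and TREE_TYPE. */
#define TOY_CODE(t)          ((t)->common.code)
#define TOY_TYPE(t)          ((t)->common.type)
#define TOY_SIDE_EFFECTS(t)  ((t)->common.side_effects)
#define TOY_INT_CST_VALUE(t) ((t)->int_cst.value)
#define TOY_OPERAND(t, i)    ((t)->exp.operands[i])

/* Allocate a zeroed node and set its code. */
static toy_tree toy_alloc(enum toy_code code)
{
    toy_tree t = calloc(1, sizeof *t);
    if (t == NULL)
        abort();
    TOY_CODE(t) = code;
    return t;
}

toy_tree toy_make_int(long value)
{
    toy_tree t = toy_alloc(TOY_INTEGER_CST);
    TOY_INT_CST_VALUE(t) = value;
    return t;
}

toy_tree toy_make_plus(toy_tree lhs, toy_tree rhs)
{
    toy_tree t = toy_alloc(TOY_PLUS_EXPR);
    TOY_OPERAND(t, 0) = lhs;
    TOY_OPERAND(t, 1) = rhs;
    return t;
}
```

Because every variant starts with the same common struct, TOY_CODE works on any node, and the code it returns tells the caller which of the other accessors are safe to use.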

Expression trees

• Lots of tree codes used for expressions
  - gcc/tree.def defines all standard tree codes

      LT_EXPR           less-than conditional
      TRUTH_ORIF_EXPR   short-circuiting OR conditional
      MODIFY_EXPR       assignment
      NOP_EXPR          type promotion (typically)
      SAVE_EXPR         store in temporary for multiple uses
      ADDR_EXPR         take address of

• Front-end extensions to GENERIC permitted
  - gcc/c-common.def
  - gcc/cp/cp-tree.def, e.g. DYNAMIC_CAST_EXPR
  - gcc/java/java-tree.def, e.g. SYNCHRONIZED_EXPR

A few useful front-end functions

  build()       expression tree building: pass tree code, tree type, and (arbitrary number of) operands

  fold()        simple tree restructuring and optimization; mostly useful for constant folding

  gcc_assert()  assertion verification: if it fails it gives an "internal compiler error" report with source file and line number under compilation (as well as source file and line number in compiler code)
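The division of labor between building and folding can be sketched with a toy expression tree. The mini_ names below are hypothetical and the node layout is deliberately simplified (no types, two operands at most); this is not GCC's build() or fold(), only an illustration of the constant-folding idea.

```c
#include <stdlib.h>

enum mini_code { MINI_INTEGER_CST, MINI_VAR_DECL, MINI_PLUS_EXPR, MINI_MULT_EXPR };

typedef struct mini_node {
    enum mini_code code;
    long value;               /* meaningful for MINI_INTEGER_CST only */
    struct mini_node *op[2];  /* operands of PLUS/MULT expressions */
} *mini_tree;

static mini_tree mini_alloc(enum mini_code code)
{
    mini_tree t = calloc(1, sizeof *t);
    if (t == NULL)
        abort();
    t->code = code;
    return t;
}

mini_tree mini_build_int(long value)
{
    mini_tree t = mini_alloc(MINI_INTEGER_CST);
    t->value = value;
    return t;
}

/* A variable whose value is unknown at compile time. */
mini_tree mini_build_var(void)
{
    return mini_alloc(MINI_VAR_DECL);
}

/* Build a binary expression from a code and two operands. */
mini_tree mini_build2(enum mini_code code, mini_tree lhs, mini_tree rhs)
{
    mini_tree t = mini_alloc(code);
    t->op[0] = lhs;
    t->op[1] = rhs;
    return t;
}

/* Bottom-up constant folding: fold the operands first, then replace
   "constant op constant" by a single constant node. */
mini_tree mini_fold(mini_tree t)
{
    if (t->code != MINI_PLUS_EXPR && t->code != MINI_MULT_EXPR)
        return t;
    t->op[0] = mini_fold(t->op[0]);
    t->op[1] = mini_fold(t->op[1]);
    if (t->op[0]->code == MINI_INTEGER_CST && t->op[1]->code == MINI_INTEGER_CST) {
        long a = t->op[0]->value;
        long b = t->op[1]->value;
        return mini_build_int(t->code == MINI_PLUS_EXPR ? a + b : a * b);
    }
    return t;  /* not foldable: an operand is not a constant */
}
```

Folding 2 + (3 * 4) yields a single constant node, while an expression involving a variable is returned structurally unchanged.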

Code naming conventions

• Preprocessor macros ALL UPPERCASE
• Variables/functions all lowercase with underscores
• Predicates end in "_P" or "_p"
• Global flags start with "flag_"
• Global trees (vary somewhat with front-end)
  - null_node (or null_pointer_node)
  - integer_zero_node
  - void_type_node
  - integer_unsigned_type_node (or unsigned_int_type_node)
• Tree accessor macros FROM_TO (e.g. TYPE_DECL)

In Greater Depth: Modifying the front-end

Gimplification

• GENERIC + extensions → GIMPLE
  - GIMPLE is a subset of GENERIC
  - based on SIMPLE from McGill's McCAT group
• GIMPLE is just like GENERIC but
  - no language extensions
    * front-end gimplify_expr callback
  - 3-address form (with temporary variables)
  - control structures lowered to goto
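The two lowering steps named above (3-address form and control structures lowered to goto) can be shown by hand in plain C. The function names and temporaries t1, t2 are made up for this sketch, and the second function is a hand-lowered approximation, not actual gimplifier output.

```c
/* Structured source form. */
int sum_scaled(int n, int a, int b)
{
    int s = 0;
    for (int i = 0; i < n; i++)
        s = s + a * b + i;
    return s;
}

/* The same computation lowered by hand: each statement has at most one
   operator (temporaries t1, t2 hold intermediate values), and the loop
   is expressed with labels and goto. */
int sum_scaled_lowered(int n, int a, int b)
{
    int s, i, t1, t2;

    s = 0;
    i = 0;
    goto test;
body:
    t1 = a * b;
    t2 = s + t1;
    s = t2 + i;
    i = i + 1;
test:
    if (i < n) goto body; else goto done;
done:
    return s;
}
```

Both functions compute the same result; the lowered form is simply easier for later passes to analyze because every statement has a uniform, flat shape.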

The GCC Middle-End

• Optimization of trees
• Static Single-Assignment form
• The Register Transfer Language intermediate representation

The middle-end in context

Pipeline from the front-end to the back-end (the middle-end covers the stages in between):

  Front-end
    ↓
  Gimplification
    ↓
  Tree optimizations (many passes)
    ↓
  Expansion into RTL
    ↓
  RTL passes (many)
    ↓
  Register allocation
    ↓
  RTL passes (many)
    ↓
  Back-end

Optimizations over the tree representation

• Managed by pass manager in gcc/passes.c
  - init_optimization_passes orders the passes
  - passes represented by a tree_opt_pass struct (tree-pass.h), even though it does RTL now too
    * "gate" function: whether or not to run optimization
    * "execute" function: implementation of pass
    * property bitmaps: properties required, destroyed, and created
    * "todo" bitmaps: run internal GC, dump the tree, verify SSA form, etc.

Passes and subpasses

• Passes can be used to group subpasses
• all_passes contains all_optimization_passes
  - all_optimization_passes has optimizations in order
• pass_tree_loop contains loop optimizations
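The gate/execute idea and the grouping of subpasses can be sketched with a small pass runner. The struct toy_pass below is a simplified stand-in whose field names are chosen for this sketch (the property and todo bitmaps are left out), and the pass list in toy_demo is hypothetical.

```c
#include <stddef.h>

/* Simplified stand-in for a pass descriptor. */
struct toy_pass {
    const char *name;
    int (*gate)(void);      /* NULL means: always run */
    void (*execute)(void);  /* NULL means: pure grouping pass */
    struct toy_pass *sub;   /* first subpass, run only if this pass runs */
    struct toy_pass *next;  /* next pass at the same level */
};

static int toy_executed;    /* number of execute functions that ran */

static void count_execution(void) { toy_executed++; }
static int gate_always(void) { return 1; }
static int gate_never(void) { return 0; }

/* Walk a pass list in order, honoring gates and descending into subpasses. */
void toy_run_passes(struct toy_pass *pass)
{
    for (; pass != NULL; pass = pass->next) {
        if (pass->gate != NULL && !pass->gate())
            continue;       /* gated off: skip the pass and its subpasses */
        if (pass->execute != NULL)
            pass->execute();
        toy_run_passes(pass->sub);
    }
}

/* Hypothetical pipeline: a "loop" group containing two subpasses (one of
   them gated off), followed by a top-level "dce" pass. */
int toy_demo(void)
{
    struct toy_pass vect   = { "vect",   gate_never,  count_execution, NULL,    NULL  };
    struct toy_pass unroll = { "unroll", NULL,        count_execution, NULL,    &vect };
    struct toy_pass dce    = { "dce",    gate_always, count_execution, NULL,    NULL  };
    struct toy_pass loop   = { "loop",   gate_always, NULL,            &unroll, &dce  };

    toy_executed = 0;
    toy_run_passes(&loop);
    return toy_executed;    /* unroll and dce ran; vect was gated off */
}
```

A grouping pass such as "loop" here has no execute function of its own; its gate decides whether the whole group of subpasses is considered at all.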

In Greater Depth: Adding a tree optimization pass

Debugging middle-end tree passes

Command-line options for dumping trees:

  -fdump-tree-X          output after pass X
  -fdump-tree-original   output initial tree (before all opts)
  -fdump-tree-optimized  output final GIMPLE (after all opts)
  -fdump-tree-gimple     dump before & after gimplification
  -fdump-tree-inlined    output after function inlining
  -fdump-tree-all        output after each pass

(Make sure you specify an -O level or you might not get anything.)

Passes available for dumping in GCC 4.1.1 (see info page): cfg, vcg, ch, ssa, salias, alias, ccp, storeccp, pre, fre, copyprop, store_copyprop, dce, mudflap, sra, sink, dom, dse, phiopt, forwprop, copyrename, nrv, vect, vrp

Demystifying GCC Morgan Deters and Ron Cytron Debugging middle-end tree passes

Can specify options for tree dumps: • address print address of each tree node • slim less output; don’t dump all scope bodies • raw raw tree output (rather than pretty-printed C-like trees) • details detailed output (not supported by all passes) • stats statistics (not supported by all passes) • blocks basic block boundaries • vops output virtual operands for each statement • lineno output line #s • uid output decl’s unique ID along with each variable • all all except raw, slim, and lineno\ e.g. -fdump-tree-dse-details detailed post-DSE output -fdump-tree-all-all (almost) everything

Static Single-Assignment (SSA) form

Cytron et al. Efficiently computing static single assignment form and the control dependence graph. ACM TOPLAS, October 1991.

• (Pure) functional languages have nice properties for optimization
  - single-assignment: one assignment to each variable
  - static single-assignment: next best thing
    • each variable assigned at one static location in the program
  - makes it clearer where data is produced
    • reduces complexity of many optimization algorithms
    • removes association of variable uses over its lifetime

SSA renaming (1)

    y = 10;
    /* compute 2^y */
    x = 1;
    while (y > 0) {
      x = x * 2;
      y = y - 1;
    }

Step 1: model control flow.

[Figure: control-flow graph. Entry block (y = 10; x = 1) leads to the loop test (y > 0 ?). On true, the loop body (x = x * 2; y = y - 1) runs and control returns to the test. On false, control goes to EXIT.]

SSA renaming (2)

Step 2: version all variables.

    y1 = 10;
    /* compute 2^y */
    x1 = 1;
    while (y1 > 0) {
      x2 = x1 * 2;
      y2 = y1 - 1;
    }

[Figure: the same control-flow graph with versioned names. Entry block (y1 = 10; x1 = 1), loop test on y1, loop body (x2 = x1 * 2; y2 = y1 - 1), EXIT.]

SSA renaming (3)

Step 3: insert “phi” nodes.

    y1 = 10;
    /* compute 2^y */
    x1 = 1;
    while (true) {
      x3 = φ(x1, x2);
      y3 = φ(y1, y2);
      if (y3 <= 0)
        break;
      x2 = x3 * 2;
      y2 = y3 - 1;
    }

[Figure: control-flow graph in SSA form. Entry block (y1 = 10; x1 = 1); loop header (x3 = φ(x1, x2); y3 = φ(y1, y2)) with the test on y3; loop body (x2 = x3 * 2; y2 = y3 - 1) looping back to the header; EXIT.]
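The φ nodes can be made concrete with a small simulation. This is only a sketch under stated assumptions: the helper phi and the from_entry flag are hypothetical devices for modelling "which edge did control arrive on", not anything GCC defines. Each versioned name still has exactly one assignment site in the text, even though x2 and y2 are assigned many times at run time; that is the "static" in static single assignment.

```cpp
#include <cassert>

// Hypothetical helper: a phi node yields the value that flows in along
// the edge actually taken to reach the loop header.
static int phi(bool from_entry, int entry_val, int backedge_val) {
    return from_entry ? entry_val : backedge_val;
}

// Computes 2^y_init using the versioned names from the SSA form above.
int pow2_ssa(int y_init) {
    const int y1 = y_init;
    const int x1 = 1;
    int x2 = 0, y2 = 0;        // assigned only in the loop body
    bool from_entry = true;    // true on the entry edge, false on the back edge
    for (;;) {
        const int x3 = phi(from_entry, x1, x2);
        const int y3 = phi(from_entry, y1, y2);
        if (y3 <= 0)
            return x3;         // loop exit
        x2 = x3 * 2;
        y2 = y3 - 1;
        from_entry = false;    // next arrival is via the back edge
    }
}
```

Calling pow2_ssa(10) follows the same path as the original loop with y = 10.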

Into and out of SSA form in GCC

pass_build_ssa (gcc/tree-into-ssa.c)  →  SSA optimizations  →  pass_del_ssa (gcc/tree-outof-ssa.c)

Dealing with SSA form in GCC

• Given a tree node n with code = PHI_NODE
    PHI_RESULT(n)          get lhs of φ
    PHI_NUM_ARGS(n)        get rhs count
    PHI_ARG_DEF(n, i)      get ssa-name
    PHI_ARG_EDGE(n, i)     get edge
    PHI_ARG_ELT(n, i)      tuple (ssa-name, edge)

• Given a tree node n with code = SSA_NAME
    SSA_NAME_DEF_STMT(n)   get defining statement
    SSA_NAME_VERSION(n)    get SSA version #

A few useful functions in the middle-end

walk_use_def_chains(var, func, data)
  start at ssa-name var, calling func at each point up the chain; data is a generic pointer for use by func (see tree-ssa.c and the internals docs, info gccint)

walk_dominator_tree(dom-walk-data, basic-block)
  start at basic-block and walk children in dominator relationship; dom-walk-data provides several callbacks (see domwalk.h and the internals docs, info gccint)
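The shape of a callback-driven dominator-tree walk can be sketched without any GCC headers. Everything below is a simplified model: Block, WalkCallbacks, and walk_dom_tree are hypothetical names, and GCC's real dom-walk-data structure carries more callbacks and state than the two shown here (see domwalk.h for the actual interface).

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a basic block: an id plus its children in
// the dominator tree (the blocks it immediately dominates).
struct Block {
    int id;
    std::vector<Block*> dom_children;
};

// Hypothetical stand-in for the callback bundle passed to the walker.
struct WalkCallbacks {
    std::function<void(const Block&)> before_children;
    std::function<void(const Block&)> after_children;
};

// Visit bb, then recursively everything it dominates.
void walk_dom_tree(const Block& bb, const WalkCallbacks& cb) {
    if (cb.before_children) cb.before_children(bb);
    for (const Block* child : bb.dom_children)
        walk_dom_tree(*child, cb);
    if (cb.after_children) cb.after_children(bb);
}

// Example: in a diamond-shaped CFG (0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3),
// block 0 immediately dominates 1, 2, and the join block 3.
std::string trace_diamond() {
    Block left{1, {}}, right{2, {}}, join{3, {}};
    Block entry{0, {&left, &right, &join}};
    std::string trace;
    WalkCallbacks cb;
    cb.before_children = [&](const Block& b) { trace += "+" + std::to_string(b.id); };
    cb.after_children  = [&](const Block& b) { trace += "-" + std::to_string(b.id); };
    walk_dom_tree(entry, cb);
    return trace;
}
```

A before/after pair like this is what lets a pass push facts when entering a block and pop them when leaving everything that block dominates.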


In Greater Depth

Implementing an optimization from start to finish

RTL expansion and optimization

• Expansion performed by pass_expand (gcc/cfgexpand.c)
  - Back-end has a say in this

• As of GCC 4.1.x, RTL passes are carried out by same pass manager that works on trees

• pass_final (at end) outputs assembly

RTL expansion

• Machine description (.md) files for target CPU
  define_expand and define_insn match standard names and generate RTL; assist in expansion of GIMPLE

define_expand takes four args:
  1. name
  2. RTL template
  3. condition (C)
  4. preparation statements (C)

define_insn takes five args:
  1. name (or empty string)
  2. RTL template
  3. condition (C)
  4. output template (assembly, or C that generates assembly)
  5. attributes (optional)

e.g. in gcc/config/i386/i386.md

    (define_expand "movhi"
      [(set (match_operand:HI 0 "nonimmediate_operand" "")
            (match_operand:HI 1 "general_operand" ""))]
      ""
      "ix86_expand_move (HImode, operands); DONE;")

(define_expand name RTL condition prep-stmts)
  expands operation name (here, movhi) to RTL; (if applicable) runs prep-stmts
(match_operand[:mode] N predicate constraint)
  tries to match operand N of mode if predicate
general_operand
  is any imm/mem/reg valid for mode
(set dest src)
  RTL for assignment; here, the implementation for movhi

e.g. in gcc/config/i386/i386.md (2)

    (define_insn "jump"
      [(set (pc)
            (label_ref (match_operand 0 "" "")))]
      ""
      "jmp\t%l0"
      [...attributes...])

(define_insn name RTL condition output attributes)
  expands operation name (here, jump) to RTL
  (if applicable) condition can emit additional instructions
(match_operand[:mode] N predicate constraint)
  tries to match operand N of mode if predicate
(set (pc) label)
  RTL for an unconditional jump

Standard instruction names

  movM             mulhisi3    btruncM      fixMN2
  reload_inM       mulqihi3    roundM       fixunsMN2
  movstrictM       umulqihi3   ceilM        ftruncM
  movmisalignM     umulsidi3   nearbyintM   fix_truncMN2
  load_multiple    smulM       rintM        fixuns_truncMN2
  store_multiple   umulM       copysignM    truncMN2
  vec_setM         divmodM     ffsM         extendMN2
  vec_extractM     udivmodM    clzM         zero_extendMN2
  vec_initM        ashlM       ctzM         extv
  pushM            ashrM       popcountM    extzv
  addM             lshrM3      parityM      insv
  subM             rotlM3      one_cmplM    movMODE
  mulM3            rotrM3      cmpM         addMODE
  sminM            negM        tstM         sCOND
  smaxM3           absM        movmemM      bCOND
  reduc_smin_M     sqrtM       movstr       cbranchMODE
  reduc_smax_M     cosM        setmemM      jump
  reduc_umin_M     sinM        cmpstrnM     call
  reduc_umax_M     expM        cmpstrM      call_value
  reduc_splus_M    logM        cmpmemM      call_pop
  reduc_uplus_M    powM        strlenM      untyped_call
  vec_shl_M        atan2M      floatMN2     return
  vec_shr_M        floorM      floatunsMN2  untyped_return

Standard instruction names (more!)

  nop, indirect_jump, casesi, tablejump, doloop_end, doloop_begin,
  save_stack_block, allocate_stack, check_stack, nonlocal_goto,
  nonlocal_goto_receiver, exception_receiver, builtin_setjmp_setup,
  builtin_setjmp_receiver, builtin_longjmp, eh_return, prologue, epilogue,
  sibcall_epilogue, trap, conditional_trap, prefetch, memory_barrier,
  sync_compare_and_swapMODE, sync_compare_and_swap_ccMODE, sync_addMODE,
  sync_subMODE, sync_old_addMODE, sync_old_subMODE, sync_new_addMODE,
  sync_new_subMODE, sync_lock_test_and_setMODE, sync_lock_releaseMODE,
  stack_protect_set, stack_protect_test, decrement_and_branch_until_zero,
  canonicalize_funcptr_for_compare

Uses for the machine description

1. RTL expansion
   - define_expand and define_insn with standard names
   - ones with unknown names are ignored
   - if a needed name isn’t given, GCC crashes
2. RTL adjustments (optimization passes)
   - define_split, define_peephole
3. Hard register allocation (reloading) (optimization passes)
   - register description, preferencing
4. RTL template matching
   - define_insn (name ignored)
   - assembly generated

RTL optimization passes (gcc/passes.c)

• RTL expansion
• CFG cleanup/“jump optimization”
• EH landing pad injection
• Local CSE
• Global CSE
• Loop CSE/unroll/peel/unswitch
• Jump bypassing
• Branch prob. instrumentation/analysis
• If conversion (conditional moves etc.)
• Tracer (tail dup for superblock formation)
• Web construction (split pseudoreg’s)
• Pseudoregister liveness analysis (DSE, auto-inc/dec addressing)
• Instruction combination
• Partition basic blocks (hot and cold)
• Register movement (avoid rt moves)
• Optimize mode switching
• Modulo scheduling (loop pipelining)
• Instruction scheduling
• Register allocation (reloading)
• Optimize stack operations
• Peephole optimizations (define_peephole)
• Basic block reordering (profile-driven)
• Variable tracking (debug support)
• Delayed branch scheduling
• Branch shortening
• Register-to-stack conversion
• Final
• Debugging information dump

The GCC Back-End

Register allocation

Instruction selection

Debugger support

Register allocation

• RTL pseudo-registers → hard registers

• Proceeds in several passes
  1. Register class scan (preference registers)
  2. Register allocation within basic blocks
  3. Register allocation for remaining registers
  4. Reload (renumbering, spilling)

Instruction selection

    (define_insn "jump"
      [(set (pc)
            (label_ref (match_operand 0 "" "")))]
      ""
      "jmp\t%l0"
      [...attributes...])

(define_insn name RTL condition output attributes)
  matches RTL (if applicable)
(match_operand[:mode] N predicate constraint)
  tries to match operand N of mode if predicate
jmp\t%l0
  generates x86 jmp with %l0 (operand 0 as label)


In Greater Depth

A machine description tour

Debugger support

• Specifying -g to the compiler inserts debugging symbols in the assembly output
• DWARF2 format
  - embedded within ELF
  - a tree of debug info entries (compilation unit at the root)
    • each with a linked list of attributes
  - DWARF2 manual: ftp.freestandards.org/pub/dwarf/dwarf-2.0.0.pdf
• Once assembled, “readelf -w” interprets them

Runtime Issues

Object layout
Virtual method lookup
The Boehm garbage collector
crt stuff

Simple object layout (C++)

    class A {
    public:
      int x;
      virtual void myMethod();
      virtual void other();
    };

    class B : public A {
    public:
      int y;
      virtual void myMethod();
      virtual void third();
    };

vtable for A: A::myMethod, A::other
instances of A: vtable pointer, x

vtable for B: B::myMethod, A::other, B::third
instances of B: vtable pointer, x, y

Simple object layout (C++), cont’d

In the vtable for B, the leading entries (B::myMethod, A::other) form the A sub-vtable of B. In instances of B, the leading fields (vtable pointer, x) form the A subobject of B.
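The observable consequences of this layout can be checked from ordinary C++. The sketch below reuses the class shapes from the slide but, as an illustrative departure, has each method return its own name so the dispatch result is visible; the helper functions are hypothetical. The offset-zero check reflects what GCC does for a single non-virtual base and is not something the C++ standard promises.

```cpp
#include <cassert>
#include <string>

class A {
public:
    int x = 0;
    virtual std::string myMethod() { return "A::myMethod"; }
    virtual std::string other()    { return "A::other"; }
};

class B : public A {
public:
    int y = 0;
    std::string myMethod() override { return "B::myMethod"; }  // replaces A's slot
    virtual std::string third()     { return "B::third"; }     // appended slot
};

// Calls through an A reference are resolved via the object's vtable.
std::string call_my_method(A& a) { return a.myMethod(); }
std::string call_other(A& a)     { return a.other(); }

// True when the A subobject starts at the same address as the whole B,
// which is what lets a B be used directly wherever an A is expected.
bool a_subobject_at_offset_zero(B& b) {
    A* as_a = &b;
    return static_cast<void*>(as_a) == static_cast<void*>(&b);
}
```

Passing a B to call_other still reaches A::other, because B's table reuses that entry unchanged.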

Object layout (Java)

vtable for B (top to bottom): class B pointer, GC descriptor, finalize, hashCode, equals, toString, clone, myMethod, other, third
  - the entries through clone form the Object sub-vtable of B
  - the entries through other form the A sub-vtable of B

instances of B: vtable pointer, x, y
  - vtable pointer and x form the A subobject of B

But more complicated for C++ !

First, classes might not have virtual functions!

    class A {
    public:
      int x;
      void myMethod();
      void other();
    };

    class B : public A {
    public:
      int y;
      void myMethod();
      void third();
    };

instances of A: x
instances of B: x (the A subobject of B), y

But more complicated for C++ !

Second, classes might have multiple bases!

    class A {
    public:
      int x;
      virtual void one();
    };

    class B {
    public:
      int y;
      virtual void two();
    };

    class C : public A, public B {
    public:
      int z;
      virtual void three();
    };

vtable for A: A::one
instances of A: vtable pointer, x

vtable for B: B::two
instances of B: vtable pointer, y

vtable for C: ?
instances of C: ?

Object layout for multiple bases

vtable for A: A::one
instances of A: vtable pointer, x

vtable for B: B::two
instances of B: vtable pointer, y

vtable for C: A::one, (empty), (empty), B::two, C::three
instances of C: vtable pointer, x (the A subobject of C); vtable pointer, y (the B subobject of C); z

Requires “this pointer-adjustment”
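The pointer adjustment is visible from ordinary C++. In this sketch the classes follow the slide, except that the virtual methods return a field so the calls can be checked; the helper functions are hypothetical. The exact byte offset of the B subobject depends on the target, so the code only reports it rather than assuming a value.

```cpp
#include <cassert>
#include <cstddef>

class A { public: int x = 1; virtual int one()   { return x; } };
class B { public: int y = 2; virtual int two()   { return y; } };
class C : public A, public B {
public:
    int z = 3;
    virtual int three() { return z; }
};

// Byte distance from the start of the C object to its B subobject.
std::ptrdiff_t b_offset_in_c(C& c) {
    B* as_b = &c;  // the implicit conversion adjusts the pointer
    return reinterpret_cast<char*>(as_b) - reinterpret_cast<char*>(&c);
}

// Converting back from B* to C* undoes the adjustment.
bool round_trips(C& c) {
    B* as_b = &c;
    return static_cast<C*>(as_b) == &c;
}

// B::two receives a this pointer that points at the B subobject.
int call_two_through_b(C& c) {
    B* as_b = &c;
    return as_b->two();
}
```

Because the A and B subobjects cannot both start at the same address, at least one base-class view of a C needs a shifted this pointer.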

Multiple bases, cont’d

But what about dynamic_cast ?!

vtable for C: A::one, [offset = –4], (empty), B::two, C::three
instances of C: vtable pointer, x (the A subobject of C); vtable pointer, y (the B subobject of C); z

Multiple bases, cont’d

Top-level offset

vtable for C: [offset = 0], (empty), A::one, [offset = –4], (empty), B::two, C::three
instances of C: vtable pointer, x (the A subobject of C); vtable pointer, y (the B subobject of C); z

Multiple bases, finished*

But what about C++ type info ?!

vtable for C: [offset = 0], ptr. typeinfo C, A::one, [offset = –4], ptr. typeinfo C, B::two, C::three
instances of C: vtable pointer, x (the A subobject of C); vtable pointer, y (the B subobject of C); z

* there are further complications, but we’ll leave it here
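The offset and typeinfo entries are what make the following standard C++ operations work when all the program holds is a pointer to the B subobject. This is a sketch of the observable behaviour only; the helper names are hypothetical, and it assumes RTTI is enabled (the default).

```cpp
#include <cassert>
#include <typeinfo>

class A { public: int x = 1; virtual void one() {} };
class B { public: int y = 2; virtual void two() {} };
class C : public A, public B { public: int z = 3; virtual void three() {} };

// dynamic_cast<void*> yields the address of the most-derived object,
// so starting from the B subobject it must step back to the start of C.
bool finds_whole_object(C& c) {
    B* as_b = &c;
    return dynamic_cast<void*>(as_b) == static_cast<void*>(&c);
}

// typeid on a polymorphic object reports the dynamic type, not B.
bool typeid_sees_c(C& c) {
    B* as_b = &c;
    return typeid(*as_b) == typeid(C);
}

// A sideways cast from one base to another goes through the same data.
bool cross_cast_works(C& c) {
    B* as_b = &c;
    A* as_a = dynamic_cast<A*>(as_b);
    return as_a == static_cast<A*>(&c);
}
```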

Java and C++ share object layout

vtable for (Java) B: [offset = 0], null typeinfo, class B pointer, GC descriptor, finalize, hashCode, equals, toString, clone, myMethod, other, third

Virtual method lookup (C++, Java)

Now, virtual method invocation is a snap!

• Compiler knows method offset within vtable
• So it generates an indirect access through instance pointer…
• …and invokes the method through the pointer found in vtable

instance of B: vtable pointer, x, y
vtable for (Java) B: [offset = 0], null typeinfo, class B pointer, GC descriptor, finalize, hashCode, equals, toString, clone, myMethod, other, third
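The lookup sequence can be written out by hand with plain function pointers. This is a model, not the code GCC emits: Object, Method, Slot, and the slot numbering are hypothetical, every method is given the same signature to keep the table simple, and the header entries (offset, typeinfo, class pointer, GC descriptor) are left out.

```cpp
#include <cassert>

struct Object;                    // hypothetical instance type
using Method = int (*)(Object*);  // one signature for every slot, for simplicity

struct Object {
    const Method* vtable;  // first word of the instance: pointer to the table
    int x;
    int y;
};

// Slot numbers are fixed at compile time (hypothetical numbering).
enum Slot { MY_METHOD = 0, OTHER = 1, THIRD = 2 };

int A_myMethod(Object* self) { return self->x; }
int A_other(Object* self)    { return self->x + 100; }
int B_myMethod(Object* self) { return self->x + self->y; }
int B_third(Object* self)    { return self->y; }

// B's table starts with A's slots in the same positions, then adds its own.
const Method vtable_A[] = { A_myMethod, A_other };
const Method vtable_B[] = { B_myMethod, A_other, B_third };

// What a virtual call boils down to: load the table pointer from the
// instance, index it by the known slot, and call through the result.
int invoke(Object* obj, Slot slot) {
    return obj->vtable[slot](obj);
}
```

The caller never needs to know whether it holds an A or a B; the same slot number works for both.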

The Boehm garbage collector

Boehm, H., Space Efficient Conservative Garbage Collection. In ACM PLDI’93.

• Conservative mark & sweep garbage collector
  - designed to operate in a hostile environment as a drop-in replacement for malloc
  - “conservative” means it cannot distinguish between pointers and non-pointers
  - Java is considerably less “hostile” than C/C++
    • can’t hide pointers from the compiler
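What "cannot distinguish between pointers and non-pointers" means in practice can be shown with a toy marker. This is a deliberately simplified sketch and not how the Boehm collector is implemented: HeapBlock, conservative_mark, and count_unmarked are hypothetical, the "heap" is just a list of address ranges, and marked blocks are not scanned in turn.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical description of one allocated block.
struct HeapBlock {
    std::uintptr_t start;
    std::size_t size;
    bool marked;
};

// Conservative rule: any word whose value falls inside an allocated
// block is treated as a pointer to it, whatever its real type was.
void conservative_mark(const std::vector<std::uintptr_t>& words,
                       std::vector<HeapBlock>& heap) {
    for (std::uintptr_t w : words)
        for (HeapBlock& b : heap)
            if (w >= b.start && w - b.start < b.size)
                b.marked = true;
}

// Blocks left unmarked are the ones a sweep phase could reclaim.
std::size_t count_unmarked(const std::vector<HeapBlock>& heap) {
    std::size_t n = 0;
    for (const HeapBlock& b : heap)
        if (!b.marked) ++n;
    return n;
}
```

An integer that happens to hold a value inside a block keeps that block alive, which is the cost of being conservative; nothing reachable is ever freed by mistake.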

Java and Boehm GC

• Java front-end generates class pointer masks
  - stows them in vtable
  - computed in gcc/java/boehm.c
• Class too big for a pointer mask ?
  - use a count of reference fields
  - use a “mark procedure”
• Where to look …
  - boehm-gc/doc contains docs
  - libjava/prims.cc contains GC-aware allocation routines

[Figure: the start of the vtable for (Java) B: [offset = 0], null typeinfo, class B pointer, GC descriptor, finalize, …]

crt stuff (“C runtime”)

• crt1.o, crti.o, crtn.o* provided by glibc
    crt1.o       sets up libc before main() is even invoked
    crti.o       prologue for .init and .fini
    crtn.o       epilogue for .init and .fini
• crtbegin.o, crtend.o* provided by GCC
    crtbegin.o   contributes frame_dummy() call to .init; calls static data destructors in .fini
    crtend.o     calls static data constructors in .init
  - code in gcc/crtstuff.c

* and some variations
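The effect of this startup machinery is easy to observe from C++: a global object with a constructor is already built by the time main() runs. The sketch below shows only that observable behaviour; events, Tracer, and constructed_before_main are hypothetical names, and nothing here inspects the .init or .fini sections directly.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical event log. A function-local static avoids depending on
// the initialization order between global objects.
std::vector<std::string>& events() {
    static std::vector<std::string> log;
    return log;
}

struct Tracer {
    Tracer()  { events().push_back("ctor"); }  // runs during program startup
    ~Tracer() { /* runs during exit processing, after main() returns */ }
};

Tracer global_tracer;  // static data with a constructor and a destructor

// Called from main(): reports whether the constructor has already run.
bool constructed_before_main() {
    return !events().empty() && events().front() == "ctor";
}
```

Calling constructed_before_main() as the first statement of main() is a quick check that static construction happened before user code started.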

Miscellany

Tips on modifying code

• Find something similar
  - browse source code
  - build off it
• Use a debugger
  - -fdump-tree-*
  - -d*
  - debug_tree/browse_tree

Wrap-up

Running GCC under GDB
Obtaining development versions of GCC
Reporting bugs in GCC
What’s next for GCC

Running GCC under GDB

• Inevitably, hacking a compiler will result in
  - segfault
  - assertion fault
  - incorrect code generation
• Remember to attach debugger to the compiler, not the driver
• “gcc -v …”, then use GDB on the actual front-end


In Greater Depth

Debugging GCC

Obtaining development versions of GCC

• All GCC development is in the open
  - design discussions
  - change logs
  - bugs

• Subversion (SVN) repository
  - public read access
  - for details: gcc.gnu.org/svn.html
  - clients available from subversion.tigris.org/

What to do if you find a bug in GCC

• Check to see if bug is present in SVN version
• Check to see if bug is in bug database
  - http://gcc.gnu.org/bugzilla/
• Collect version information (gcc --version)

• Guidelines: http://gcc.gnu.org/bugs.html
• Report it: http://gcc.gnu.org/bugzilla/

Thanks!

Morgan Deters, Programming Logic Group, Technical University of Catalonia, Barcelona, Spain
Ron Cytron, Distributed Object Computing Laboratory, Washington University, St. Louis, Missouri

Copyright © 2005–2008 Morgan Deters and Ron Cytron