Submitted by Jacob Kreindl, BSc.

Submitted at Institut für Systemsoftware

Supervisor: o.Univ.-Prof. Dipl.-Ing. Dr. Dr.h.c. Hanspeter Mössenböck

Co-Supervisors: Dipl.-Ing. Dr. Matthias Grimmer, Dipl.-Ing. Manuel Rigger

Source-Level Debugging Support in an LLVM-IR Interpreter

April 2018

Master Thesis to obtain the academic degree of Diplom-Ingenieur in the Master’s Program

JOHANNES KEPLER UNIVERSITY LINZ, Altenbergerstraße 69, 4040 Linz, Österreich, www.jku.at, DVR 0093696

Abstract

Sulong executes programs that are compiled to LLVM IR, the language-independent intermediate representation of source code used by the LLVM infrastructure, on the Java Virtual Machine (JVM). The interpreter is based on the Truffle language implementation framework and is part of the GraalVM project. Truffle provides a versatile debugger backend which enables source-level inspection of programs it executes, even across language boundaries. This thesis describes how Sulong leverages debug information available in LLVM IR bitcode files to support this feature.

First, the thesis describes how Sulong relates programs it executes to locations in their original source code. It further presents the necessary extensions to the interpreter's execution data structures that enable the debugger backend to provide source-level single-stepping and breakpoints.

Next, the thesis defines multiple layers of abstraction from LLVM IR as well as Sulong’s internal data model that provide a source-level view of an interpreted program’s runtime state which includes language-specific display of types, scopes and values. It also introduces specialized data structures to efficiently represent runtime debug information.

This thesis demonstrates the capabilities of the presented approach by inspecting a native Ruby extension implemented in C++ at runtime. A performance evaluation further shows that runtime overhead in terms of execution time introduced by instrumentation and symbol inspection is negligible in many cases.

Kurzfassung

Sulong führt Programme auf der Java Virtual Machine (JVM) aus, die zu LLVM IR kompiliert wurden, der sprachunabhängigen Repräsentation von Quellcode, die die LLVM Compiler Infrastruktur verwendet. Der Interpreter basiert auf dem Truffle Framework zur Implementierung von Programmiersprachen und ist Teil des GraalVM Projekts. Truffle stellt ein vielseitiges Debugger Backend zur Verfügung, welches das Inspizieren von Programmen, die es ausführt, ermöglicht, auch über mehrere Programmiersprachen hinweg. Diese Masterarbeit beschreibt, wie Sulong Debug Information in LLVM IR Bitcode Dateien verwendet, um dieses Feature zu unterstützen.

Die Masterarbeit beschreibt zunächst, wie Sulong die Programme, die es ausführt, auf den ursprünglichen Quellcode zurückführt. Des weiteren präsentiert sie die notwendigen Erweiterungen zu den Datenstrukturen des Interpreters, die es dem Debugger Backend ermöglichen, Gastprogramme auf Level derer ursprünglichen Programmiersprache Schritt-für-Schritt auszuführen und Haltepunkte in ihnen zu setzen.

Des weiteren definiert die Masterarbeit mehrere Abstraktionsschichten von LLVM IR und Sulongs internem Datenmodell, die eine Sicht auf den Laufzeitzustand eines Gastprogramms ermöglichen, die dessen ursprünglicher Programmiersprache entspricht, was sprachspezifische Darstellung von Typen, Scopes und Werten beinhaltet. Sie beschreibt auch spezialisierte Datenstrukturen zur effizienten Repräsentation von Debug Information zur Laufzeit.

Diese Masterarbeit demonstriert die Fähigkeiten des präsentierten Ansatzes durch das Inspizieren einer Ruby Erweiterung zur Laufzeit, die in C++ implementiert wurde. Eine Performance-Evaluierung zeigt des weiteren, dass der Laufzeit Overhead in Form von Ausführungszeit, den Instrumentierung und Symbolinspektion erzeugen, in vielen Fällen vernachlässigbar ist.

Contents

1 Introduction
  1.1 Motivation
  1.2 Goals and Scope
  1.3 Thesis Structure

2 System Overview
  2.1 The LLVM Compiler Infrastructure
  2.2 GraalVM
    2.2.1 Truffle
    2.2.2 Sulong

3 LLVM IR
  3.1 General Structure
    3.1.1 Scopes
    3.1.2 Data flow
    3.1.3 Control flow
    3.1.4 Type system
  3.2 Representations
  3.3 Binary Encoding
    3.3.1 Symbol Table
    3.3.2 High-Level File Structure

4 Debug Information
  4.1 LLVM IR Metadata
    4.1.1 Structure
    4.1.2 Encoding
  4.2 LLVM IR Debug Information
    4.2.1 Locations
    4.2.2 Symbols
    4.2.3 Types
    4.2.4 Scopes
  4.3 Value Mapping
    4.3.1 Locals
    4.3.2 Globals

5 Stepping & Breakpoints
  5.1 Location Information
    5.1.1 Representation in Truffle
    5.1.2 Representation in Sulong
  5.2 Truffle Instrumentation
    5.2.1 Node Implementation
    5.2.2 Wrapper Implementation
  5.3 Dynamically Halting Execution in an Instrumented AST

6 Source-Level Symbol Inspection
  6.1 Symbol Information
  6.2 Symbol Inspection in Truffle
  6.3 Symbol Inspection in Sulong
    6.3.1 Value Layer
    6.3.2 Representation Layer
    6.3.3 Value Tracking

7 Case Study
  7.1 Program Description
  7.2 Analysis
    7.2.1 Stepping
    7.2.2 Symbol Inspection

8 Evaluation

9 Future Work

10 Related Work

11 Conclusion

Bibliography

Chapter 1

Introduction

This chapter states the motivation for implementing support for source-level debugging in an LLVM IR interpreter. It also defines the scope of this thesis and provides an outline of its structure.

1.1 Motivation

Real-world software systems tend to be highly complex collections of source code frequently written in multiple programming languages. They are also often implemented and maintained by many different people. Intricate semantic nuances of many programming languages and necessarily subjective interpretation of even well-defined requirements already make it hard for developers to specify large programs entirely correctly. Different execution environments as well as optimizing compilers and interpreters add another source for possibly subtle bugs. This raises demand for versatile debugging tools that help developers to detect errors at runtime under realistic conditions. However, most traditional approaches fail to support source-level inspection across language boundaries. As a result, developers often need to use multiple debugger frontends to debug native extensions for programs in dynamic languages such as Ruby.

Sulong is an interpreter for LLVM IR, a common representation of code in various low-level programming languages, including C, C++ and Fortran. It is based on the Truffle framework for implementing high-performance, interoperable Abstract Syntax Tree (AST) interpreters. Sulong is also part of the GraalVM project where its main use-case is the execution of native extensions for various other Truffle-based implementations of dynamic programming languages.

Truffle contains a framework for source-level program instrumentation and debugging. It requires individual language implementations to provide additional information in the AST and to define a language-specific representation of their runtime state. However, supporting it is optional. The goal of this thesis is to implement the necessary features in Sulong. Truffle generally supports source-level debugging across language boundaries in the same frontend. By implementing the corresponding API in Sulong we enable the important use-case of debugging native extensions for dynamic languages.

1.2 Goals and Scope

Enabling support for source-level debugging in Sulong requires applying debug information in LLVM IR to relate nodes in the Truffle AST to locations in the source code, to reconstruct the original program state from the interpreter's execution of compiled code and to provide a language-specific representation of it to Truffle's debugger backend. The concrete scope of this thesis consists of the following goals.

• Analyzing debug information in LLVM IR: This entails evaluating the information it contains as well as its encoding.

• Relating Truffle AST and source code: This includes marking AST nodes that correspond to statements or other programming constructs, relating them to their respective locations in the source code, and retaining the source-level scope hierarchy.

• Supporting Truffle instrumentation: Extend Sulong's implementation of AST nodes as required by Truffle's framework for source-level program instrumentation. The debugging framework requires this to enable single-stepping and breakpoints.

• Source-level symbol inspection: This requires abstracting the interpreter's runtime state to provide a language-specific representation of source-level scope entries and their values in a format suitable for the debugging framework.

1.3 Thesis Structure

This thesis starts with a general overview of the technologies it uses in Chapter 2. Following this, Chapter 3 introduces the basic concepts of LLVM IR and its different formats. Chapter 4 continues with a more in-depth analysis of the content and encoding of debug information in LLVM IR files. Next, Chapter 5 describes how Truffle implements program instrumentation and what this requires of Sulong. After this, the thesis describes in detail how the interpreter provides a language-specific view of the runtime state of programs it executes in Chapter 6. Chapter 7 then exercises a case study to demonstrate the previously discussed features. Following this, Chapter 8 provides a short evaluation of the impact enabling debugging support has on execution performance. Subsequently, Chapters 9 and 10 present future and related work. The thesis concludes with a summary of the presented accomplishments in Chapter 11.

Chapter 2

System Overview

This chapter provides an overview of the various projects and technologies used in the thesis. We start with a short overview of the LLVM project. Following this, we discuss the GraalVM project with a special focus on the Truffle framework and some of its features. The chapter concludes with a short introduction to the LLVM IR interpreter Sulong.

2.1 The LLVM Compiler Infrastructure

The LLVM Compiler Infrastructure consists of a highly extensible collection of tools and frameworks facilitating program analysis and transformation. It is based around a common, language-agnostic intermediate representation of source code called LLVM IR [1]. The project's flexibility and extensive ecosystem have gained it a large following both in terms of commercial and open-source projects as well as academic research. The name LLVM is not an acronym, though many people mistakenly refer to the project as Low Level Virtual Machine.

Figure 2.1: Concept of the LLVM compilation tool-chain

Figure 2.1 shows the general structure of an LLVM compilation toolchain. Its entry point is a compiler frontend whose task it is to verify the syntactic correctness of the given source code and parse it into LLVM IR. This is the only language-specific part of the toolchain, though any particular LLVM frontend may support multiple programming languages. Perhaps the most popular project of this kind, Clang, aims to provide a frontend for the C/C++ family of programming languages [2]. It is developed as part of the LLVM project. DragonEgg [3], on the other hand, is a plug-in for the Gnu Compiler Collection (GCC) which enables it to generate LLVM IR from C and C++ as well as Ada and Fortran. Although the project is no longer under active development, its source code is openly available and remains in use due to its ability to process Fortran.

LLVM can apply various transformation and analysis passes on the parsed IR. In the context of compilation the project uses this concept to implement and perform program optimization, but other use-cases also include static analysis and program verification. While frontends, e.g. Clang, usually provide the option of applying selections of passes themselves, LLVM also contains the opt tool which gives users more fine-grained control over which transformations or analyses to perform. The project also provides developers with a powerful framework to implement custom passes. This extensibility has greatly aided LLVM in gaining popularity among compiler researchers.

To produce a final artifact LLVM provides multiple backends which are able to generate target-specific executable code for different platforms. For our use-case this is not needed though. Tools based on the LLVM framework can also export a binary or textual representation of their in-memory LLVM IR to a file for further processing.

Figure 2.2: Structure of the GraalVM

2.2 GraalVM

The GraalVM project provides efficient, interoperable implementations and tooling support for various popular programming languages such as JavaScript, Ruby, R and more [4] on top of the Java Virtual Machine. It leverages the power of the Truffle language implementation framework [5] and the Graal dynamic compiler. GraalVM also uses Sulong to execute LLVM-based languages and implement native extensions of the included dynamic languages. Figure 2.2 further illustrates the structure of the GraalVM and its components.

2.2.1 Truffle

Truffle is a framework for implementing Abstract Syntax Tree (AST) interpreters in Java. It provides built-in partial evaluation capabilities to facilitate program optimization. Truffle can also use Graal to dynamically compile parts of the AST to machine code based on profiling feedback collected at runtime in order to further improve execution performance. In addition to that, Truffle contains an interoperability API which enables the sharing of functions and values between different language implementations. Language implementers can also benefit from low-overhead tooling support by implementing the framework's Instrumentation API.

int abs(int a) {
    if (a < 0) {
        return -a;
    } else {
        return a;
    }
}

Figure 2.3: AST of the abs function

2.2.1.1 AST Implementation

To create a new Truffle language a developer needs only to implement a parser that translates functions of the source program into an AST representation that implements Truffle’s node interfaces. The framework provides a sophisticated DSL to greatly simplify this task. As an example, Figure 2.3 shows the C code and AST of a function that computes the absolute value of an integer.

A Truffle AST node must inherit from the Node class. It represents an operation in the source-level program and can have an arbitrary number of children. An implementation needs to define at least one execute method to retrieve the node's value. This function receives the stack-frame for the source-level program as its first argument. Truffle uses it to store the current values for all necessary source-level local variables, including function arguments. When compiling the AST to machine code Graal includes the typed slots of the frame in register allocation.

The root of a Truffle AST is an instance of the RootNode class. It commonly has only one direct child node which represents the function's body. This node is typically responsible for copying the arguments of a call to this function to the frame.
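The following Java sketch illustrates this structure. It is only a minimal outline: the ExampleExpressionNode, ReadArgumentNode and ExampleRootNode classes are hypothetical, only Node, RootNode, VirtualFrame and the @Child annotation come from Truffle, and constructor signatures may differ slightly between framework versions.

import com.oracle.truffle.api.TruffleLanguage;
import com.oracle.truffle.api.frame.VirtualFrame;
import com.oracle.truffle.api.nodes.Node;
import com.oracle.truffle.api.nodes.RootNode;

// Hypothetical base class: every operation defines an execute method that
// receives the current stack frame of the guest-level function.
abstract class ExampleExpressionNode extends Node {
    abstract Object executeGeneric(VirtualFrame frame);
}

// Reads the i-th argument passed to the current function call.
final class ReadArgumentNode extends ExampleExpressionNode {
    private final int index;

    ReadArgumentNode(int index) {
        this.index = index;
    }

    @Override
    Object executeGeneric(VirtualFrame frame) {
        return frame.getArguments()[index];
    }
}

// Root of a function AST: its single child represents the function body.
final class ExampleRootNode extends RootNode {
    @Child private ExampleExpressionNode body;

    ExampleRootNode(TruffleLanguage<?> language, ExampleExpressionNode body) {
        super(language);
        this.body = body;
    }

    @Override
    public Object execute(VirtualFrame frame) {
        return body.executeGeneric(frame);
    }
}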

A Truffle language implementation must also provide a subclass of the generic TruffleLanguage class. Upon launch, Truffle instantiates it as an entry point to request parsing of a supported file into its AST representation. This class also provides information about the guest language, e.g. its name and the MIME-types it supports. Instances of it also manage the language context, i.e. an object that holds the global scope and additional runtime state, for the current execution. The framework also uses this class to provide more information about guest language values.

2.2.1.2 Language interoperability

Truffle's language interoperability is based around TruffleObject. This interface defines a language-agnostic abstraction of language-specific complex values that can be shared with other Truffle language implementations. Truffle also defines a message-based API to interact with these objects. Among other features, the set of predefined messages allows for invocation in case the object is callable, accessing named members if the object is structured or indexed members if it is array-like, as well as testing for any of the aforementioned properties.

A Truffle-based language interpreter parses the functions of a source-level program into an AST representation. The framework defines common interfaces for its elements. Coupled with the TruffleObject value abstraction, this generalization of executable code allows Truffle to combine the function ASTs generated by different interpreters. As a result, code in one guest language can call functions defined in another.

Truffle provides the Polyglot API to support language interoperability. That includes features to facilitate parsing of files by the appropriate interpreters as well as explicitly importing functions and values from their runtime contexts.
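As a small host-side illustration of this API (not taken from the thesis), the following sketch evaluates a JavaScript function through the org.graalvm.polyglot classes and calls it from Java. It assumes a GraalVM with the JavaScript language installed; the guest code snippet is a placeholder.

import org.graalvm.polyglot.Context;
import org.graalvm.polyglot.Value;

public class PolyglotSketch {
    public static void main(String[] args) {
        try (Context context = Context.create()) {
            // Ask the JavaScript interpreter to parse and evaluate a function.
            Value doubler = context.eval("js", "(function (x) { return x * 2; })");
            // Shared values can be called from the host or from other guest languages.
            System.out.println(doubler.execute(21).asInt()); // prints 42
        }
    }
}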

2.2.1.3 Debugging

Truffle provides an API for efficient source-level instrumentation of executed programs with low runtime overhead [6]. This forms the basis of tooling support for Truffle languages. While interpreters can choose whether or not to provide it at all, this feature requires language implementations to provide additional information about the programs they execute. The Truffle Debugging API uses these instrumentation capabilities to provide a language-agnostic debugger backend. This framework also enables Truffle language implementations to provide access to source-level information to the frontend.

A compatible debugger frontend can use the debugging API to control execution in a guest language program. This enables it to dynamically suspend the debuggee once it reaches arbitrary locations in its source code, i.e. setting breakpoints. The API also supports source-level stepping strategies which include single-stepping as well as stepping into or out of function calls. The instrumentation framework avoids significant runtime overhead in terms of execution performance by making heavy use of Truffle's partial evaluation and the Graal compiler as JIT.

The debugging API provides compatible frontends with the means to display the source-level state of a running guest language program. This includes accessing the names and types of local and global variables as well as language-specific representations of their values at runtime. Additionally, the framework can associate these entities with their declaration sites in the original source files. The debugging API even provides abstractions for source-level scopes.
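To give an impression of how a frontend drives this API, the following sketch installs a line breakpoint and performs one source-level step whenever it is hit. It is only an outline: the entry points shown (Debugger.find, startSession, Breakpoint.newBuilder) exist in Truffle's debug API, but exact signatures vary between releases, and the language id, file name and line number are placeholders.

import com.oracle.truffle.api.debug.Breakpoint;
import com.oracle.truffle.api.debug.Debugger;
import com.oracle.truffle.api.debug.DebuggerSession;
import java.io.File;
import org.graalvm.polyglot.Context;
import org.graalvm.polyglot.Engine;
import org.graalvm.polyglot.Source;

public class DebuggerSketch {
    public static void main(String[] args) throws Exception {
        Engine engine = Engine.create();
        Debugger debugger = Debugger.find(engine);

        // Suspend on breakpoint hits, report the location and do one source-level step.
        try (DebuggerSession session = debugger.startSession(event -> {
                System.out.println("suspended at " + event.getSourceSection());
                event.prepareStepOver(1);
            })) {

            Source source = Source.newBuilder("llvm", new File("abs.bc")).build();
            session.install(Breakpoint.newBuilder(source.getURI()).lineIs(3).build());

            try (Context context = Context.newBuilder().engine(engine).build()) {
                context.eval(source); // runs the guest program under the debugger session
            }
        }
    }
}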

GraalVM contains the Chrome Inspector (CI), a tool which acts as a bridge between the Truffle debugging API and any debugger using the Chrome Devtools protocol [7]. It enables developers to debug programs running in a Truffle language interpreter using the debugger frontend of the Chrome Devtools [8]. This collection of development tools is integrated into the Google Chrome [9] web browser and the open-source project it is based on, Chromium [10].

The maintainers of Truffle's debugging API also provide a plug-in for the Netbeans IDE [11] which enables debugging support for Truffle languages. While it allows developers to benefit from the IDE's integrated support for multiple programming languages, including syntax highlighting, this debugger frontend has one major drawback. Since the plug-in is essentially an extension for Netbeans' existing Java debugger, the IDE does not associate it with any non-Java source files, which makes it impossible to set breakpoints in guest language programs. However, Truffle also enables developers to wrap Java functions in an interop object so that interpreters can call them as part of guest language execution. Since the framework cannot instrument this code, CI has no access to it. What is more, Netbeans enables developers to debug the interpreter itself while it executes a guest language program. Fortunately, CI does not prevent developers from attaching additional Java debuggers to the same process. In practice it is very valuable for Truffle language implementers to debug the guest language program in CI while at the same time debugging the interpreter in the Java debugger included in their IDE of choice.

2.2.2 Sulong

Sulong is a Truffle language implementation for LLVM IR [12]. Though GraalVM includes it as a standalone interpreter as well, Sulong’s primary use case is the execution of native extensions for other Truffle languages. This approach opens up opportunities for runtime optimization, e.g. instruction inlining or partial evaluation, across language boundaries. Truffle also provides a Native Function Interface to directly interact with system libraries and other native code. However, language implementations still prefer to use Sulong due to its potential for superior execution performance. This interpreter mainly targets executing IR compiled from C, C++ and Fortran code, though multiple efforts exist to extend this support to other LLVM-based languages, e.g. Haskell and Rust.

Sulong translates LLVM IR's unstructured control flow to an AST interpreter by dynamically dispatching basic blocks. That is, the single child node of a RootNode for a function in Sulong itself has an array of basic-block nodes as children. When execution enters this dispatch node it executes the first block node to retrieve the index of its current successor block. It repeats this dispatch until either a block indicates that it has no successor and the function should return or it throws an exception.
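The following framework-free Java sketch shows the idea behind this dispatch loop. It is a simplification: the real Sulong nodes are Truffle nodes that operate on the execution frame, and all names below are made up for illustration.

// Simplified model of a basic block: executing it returns the index of the
// successor block, or NO_SUCCESSOR if the function should return.
interface BasicBlock {
    int NO_SUCCESSOR = -1;

    int execute();
}

// Simplified model of the dispatch node: it repeatedly executes the block
// selected by the previously executed block until one signals a return.
// Exceptions thrown by a block simply propagate and end the loop as well.
final class BlockDispatch {
    private final BasicBlock[] blocks;

    BlockDispatch(BasicBlock[] blocks) {
        this.blocks = blocks;
    }

    void execute() {
        int next = 0; // execution always starts at the first block
        while (next != BasicBlock.NO_SUCCESSOR) {
            next = blocks[next].execute();
        }
    }
}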

Sulong's integrated parser targets the binary encoding produced by LLVM in versions 3.8 to 5.0. Additionally, Sulong can parse and execute LLVM IR compiled by the DragonEgg GCC plug-in, specifically the version based on LLVM 3.2. This LLVM frontend remains among the few capable of processing Fortran code. Since many native modules in the R ecosystem were implemented in Fortran and GraalVM also includes FastR, a Truffle language implementation of R, this support is still necessary.

Chapter 3

LLVM IR

This chapter will give readers a general overview of LLVM IR, its structure and elements as well as how different representations reflect it.

LLVM Intermediate Representation (LLVM IR) is the language-independent program representation used by all tools of the LLVM ecosystem. It features an assembly-like structure and syntax that lacks the syntactic conciseness of higher-level languages such as Ruby or Python, but is designed to be versatile enough to express any of their semantic constructs. As such, LLVM IR itself can be used as a general-purpose programming language. However, this is not its intended goal. The representation is not platform-agnostic as it includes target-specific memory alignment information. LLVM IR uses a RISC-like instruction set enriched with certain higher-level information such as type information and an explicit data and control flow graph to allow for effective analyses [1].

3.1 General Structure

In the following we will use the common abs function, as already shown in Figure 2.3, to illustrate some important concepts of LLVM IR. We used the Clang frontend in version 5.0.1 to generate and export the IR from the given C code. Listing 3.1 shows the textual representation of it, though we omitted some less important details for brevity.

define i32 @abs(i32) {
  %2 = alloca i32, align 4
  %3 = alloca i32, align 4
  store i32 %0, i32* %3, align 4
  %4 = load i32, i32* %3, align 4
  %5 = icmp slt i32 %4, 0
  br i1 %5, label %6, label %9

; <label>:6:
  %7 = load i32, i32* %3, align 4
  %8 = sub nsw i32 0, %7
  store i32 %8, i32* %2, align 4
  br label %11

; <label>:9:
  %10 = load i32, i32* %3, align 4
  store i32 %10, i32* %2, align 4
  br label %11

; <label>:11:
  %12 = load i32, i32* %2, align 4
  ret i32 %12
}

Listing 3.1: abs function in LLVM IR

3.1.1 Scopes

LLVM IR distinguishes only between two scopes. The Module scope can contain global variables, constants, definitions of structured types and declarations as well as definitions of functions. The Function scope, on the other hand, contains the local constants and variables of the function, but it cannot contain nested functions. In addition, both local and global scopes may optionally include metadata about the entities they contain. The function scope has access to everything contained in the module scope, but not vice versa.

In our example the global scope contains only a single entry, the abs function. LLVM IR prefixes the identifiers of global symbols with an @ and local symbols with a %. Definitions of structured types are the only exception to this rule. They can only occur in the global scope but use the same prefix as local identifiers.

3.1.2 Data flow

LLVM IR is in Static Single Assignment (SSA) form. In practice, this means that all instructions in LLVM IR that produce a value store it into a potentially infinite set of typed (virtual) registers, each of which can only be assigned once. In essence, this makes the instruction synonymous with the value it produces. This property, however, does not extend to memory. In general, each value needs to be defined before an instruction can use it. An exception to this are special Phi-instructions which determine their actual value by selecting from one of several possible values based on prior control flow. For example, in the case of a loop variable this may require a forward reference.

In the textual representation most virtual registers have an explicit name merely to improve readability. Frontends can specify this name, but usually LLVM derives it implicitly from the defining instruction’s index in the function’s value list. This list also contains function parameters and labels. In our example its 0th entry is the input parameter to the @abs function while indices 1, 6, 9 and 11 correspond to labels. Instructions that do not produce a value, e.g. memory writes, branches or calls to functions without return value, do not have a name in the value list.

The example also shows a common paradigm for how LLVM frontends translate variables that are mutable at source-level to SSA form. They use the alloca instruction to allocate stack memory for the variable and the store and load instructions to change and retrieve the variable's value where required. This paradigm greatly simplifies frontend implementation. Since stack variables exhibit poor runtime performance compared to register-allocated ones, LLVM provides the mem2reg optimization pass to promote such stack slots to SSA registers wherever possible.

In our example, the program allocates two stack variables. The register %2 contains a pointer to the return value of the function while the allocation %3 contains the value of the source-level variable a.

3.1.3 Control flow

LLVM IR models the control flow graph of a function explicitly by using conditional and unconditional jumps between the basic blocks of a function. In this context, a basic block refers to a series of subsequent instructions that does not contain inner jumps. Branching instructions may only occur at the end of a basic block and may target only the beginning of one. Function invocations are the only exception to this rule. The instruction set only defines two higher-level control flow instructions, unwind and invoke, in order to represent exception handling paradigms. The IR does not, however, directly include information about loops, though LLVM is capable of loop detection since some passes require it.

The example in Listing 3.1 contains four basic blocks. Since the frontend does not specify their names, LLVM generates an implicit label for each of them. The entry block allocates space for the used variables on the stack and checks the condition of the if-statement. The second and third basic blocks, identified by the labels %6 and %9, then model its branches. Both read the value of a from %3 and write a result to %2 and then transfer control to the last block, %11, which retrieves the return value and terminates execution of the function with it as result.

3.1.4 Type system

LLVM IR includes a source-language-independent type system. As primitives it defines integer types of arbitrary width as well as floating point types of different precision. LLVM leaves the choice to treat integer values as signed or unsigned up to the instructions operating on them; the type does not have a notion of this distinction. LLVM IR's type system defines only five derived types: functions, arrays, vectors, typed pointers and structs. The difference between arrays and vectors is only in their range of supported element types; neither can be of dynamic length. While only primitives and pointers can act as element type of a vector, any type can do so for an array. A struct can contain an arbitrary number and choice of member types. Members of a struct type do not have names; they are identified and accessed only by their offset and size.

Our example uses only two types. i32 denotes a 32-bit integer type and i32* the type of a pointer to a value of it. The return type of any instruction producing a value also defines the type of the virtual register used to store it.

3.2 Representations

LLVM defines 3 distinct but equivalent representations of LLVM IR and provides the tools necessary to transform between them.

• in-memory

All tools based on the compiler infrastructure operate on a highly-compact in-memory representation that is optimized for easy modifiability. LLVM provides a sophisticated API which allows tools to build and manipulate this representation, which is a vital requirement for allowing developers to implement new LLVM frontends and passes.

• binary file

The binary encoding is mainly optimized for space-efficiency but its structure also enables developers to implement fast parsers. This format is subject to frequent change and, while the LLVM developers do try to maintain a degree of backward compatibility [13] in the parser, this happens mostly on a best-effort basis. The LLVM project provides documentation for the binary encoding in [14]. We refer to LLVM IR in this format as LLVM bitcode. Though there is no official naming convention for the different representations of LLVM IR, it makes sense to define one for this thesis.

• textual file

The textual encoding of LLVM IR aims to be human readable. This format is mostly stable, but LLVM does not promise any kind of backward compatibility for parsing this encoding. Listing 3.1 shows the textual representation as used by LLVM in the 5.0.1 release. The LLVM project provides documentation for the textual encoding in [15].

Standalone LLVM-based tools usually accept LLVM IR files in either encoding as input and can also export into either. For example, the Clang frontend can produce an LLVM IR file in the binary encoding by specifying the arguments -c -emit-llvm, or in the textual encoding by using the arguments -S -emit-llvm. LLVM also provides the llvm-dis tool to disassemble a file in the binary encoding to the textual representation, and the llvm-as tool to do the reverse transformation.

3.3 Binary Encoding

We refer to the binary encoding of LLVM IR as LLVM bitcode. This format encodes the IR as information entries in a structure of nested blocks. Though it is similar to the textual representation, there are key differences that make bitcode files much smaller and easier to parse. LLVM's own parser implementation refers to information entries as Records. These are indexed sequences that can contain any combination of fixed as well as variable-width integers, 6-bit characters, a typed array or at most one blob, i.e. arbitrary binary data of any size.

On the most basic level, an LLVM bitcode file is just a sequence of five distinct primitive directives. These direct a parser how to build the higher-level block structure. We briefly describe them in the following; a sketch of the corresponding parser dispatch follows the list.

• Enter subblock

This directive instructs the parser to begin parsing a subblock. It also provides an operand to indicate which kind of block to start. Coupled with another operand, the number of words in the bitcode file until the first directive after the subblock, this allows the parser to skip or defer parsing of certain blocks until their content is needed.

• Exit block

Quite self-explanatory, this directive instructs the parser to close the current block and continue in its parent block. This directive is zero-padded in the bitcode file to ensure that the first directive after the block begins on an addressable word.

• Define record

This directive gives the parser a sequence of data types to interpret as a parsing template for future records. Record definitions are identified by the order in which the parser encounters them; they do not have an explicit ID. The first entry in a record is usually an integer as it, together with the kind of block that immediately contains the record, defines its semantics. This directive essentially allows a bitcode file to define its own encoding and is an important factor for the binary encoding's space efficiency.

• Defined record

The argument to this directive identifies a previously defined record. The parser needs to parse it immediately after this directive.

• Undefined record

This directive instructs the parser to read first an integer denoting a record’s ID, followed by another integer containing the number of operands to parse and, in sequence, this number of operands encoded also as integers. This record encoding is highly inefficient compared to defined records. However, bitcode files still use it in some places.
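Assuming a hypothetical bit-stream reader, the following Java skeleton sketches how a parser might dispatch on these directives. The numeric IDs of the four builtin directives and the rule that IDs of 4 and above select previously defined records follow the LLVM bitcode documentation [14]; everything else, including the reader interface, is made up for illustration.

// Hypothetical reader; a real parser also tracks the current abbreviation
// width and the stack of open blocks.
interface BitStream {
    long read(int bits);

    void alignToWord();
}

final class DirectiveDispatcher {
    // Builtin directive IDs as documented for the bitstream format [14].
    static final int END_BLOCK = 0;       // exit block
    static final int ENTER_SUBBLOCK = 1;  // enter subblock
    static final int DEFINE_ABBREV = 2;   // define record
    static final int UNABBREV_RECORD = 3; // undefined record
    // IDs >= 4 identify previously defined records (defined record).

    void parseDirective(BitStream in, int directiveWidth) {
        int id = (int) in.read(directiveWidth);
        switch (id) {
            case ENTER_SUBBLOCK:  /* read the block kind and word count, then descend */ break;
            case END_BLOCK:       in.alignToWord(); /* return to the parent block */     break;
            case DEFINE_ABBREV:   /* remember a new parsing template */                  break;
            case UNABBREV_RECORD: /* read id, operand count and operands as integers */  break;
            default:              /* parse record data according to template #id */      break;
        }
    }
}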

An LLVM bitcode file usually uses the *.bc file extension. While its content is not intended to be human-readable, LLVM includes the llvm-bcanalyzer tool that, when given the -dump option, prints the file content in an XML-like syntax.

3.3.1 Symbol Table

LLVM implements the symbol table as a Value List. The parser enters all records denoting symbols or values into it in the order the records occur in the file. Local scopes append their values to the global list but pop them off again after the corresponding blocks have been parsed.

To reference another value or symbol a record states the offset of the corresponding entry's index in the value list from the top of the list when the parser reaches the record. This concept has the distinct advantage of keeping the numbers in the records small. Paired with the binary format's concept of variable-width integers it greatly reduces the size of bitcode files. In addition, negative numbers enable forward references. Even though LLVM avoids emitting these whenever possible, they are necessary in some instances. Wherever a record contains a forward reference to a symbol whose type is unknown at the time of parsing the record, it also contains an explicit reference to the type. Older versions of LLVM bitcode instead explicitly stated the referenced value's index in the value list.
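Expressed as code, resolving such a reference is a simple subtraction; the helper below is a hypothetical illustration of the scheme just described, not part of Sulong's parser.

// Resolves a relative value reference: the record stores the offset from the
// current top of the value list. A negative offset yields an index beyond the
// current end of the list, i.e. a forward reference.
final class ValueReferences {
    static int resolve(int currentListSize, int storedOffset) {
        return currentListSize - storedOffset;
    }
}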

3.3.2 High-Level File Structure

The following is a list of the most important blocks that occur in a bitcode file. We describe their overall purpose and the kind of records they contain. This list is not meant to be exhaustive. It is only intended to give the reader an idea of how the binary encoding represents the structure presented in Section 3.1.

• Module block

This block is never part of another block. Analogous to the structure outlined in Section 3.1, its records denote mostly global variables and headers for both only declared and fully defined functions, but also some target-specific information like data layout in memory or mangling.

• Type block

This block occurs towards the beginning of the Module block. Its records build up a type table that is used throughout the bitcode file. As such, the type block precedes any records that describe a value. Nested blocks do not contain additional Type blocks. Definitions of named as well as structured types are also part of this type table. This is contrary to the textual representation of LLVM IR, where they would rather appear as top-level entries of the global scope.

• Function block

The records in a Function block encode the instructions that form the body of a function. There is no explicit record to assign an instruction to a basic block. A new one starts implicitly at the first record and after each record that encodes a control flow instruction.

Since LLVM IR does not support nested functions, function blocks only occur as direct children of the Module block.

Function blocks do not contain an explicit reference to a function header. Instead, they occur only after the last record in the Module block and in the same order as their headers in the value list. To allow parsers to exclude headers of functions that are only declared in the module from this enumeration, function headers indicate whether or not the module also has a defining block for the function.

• Constant block

These blocks populate the value list with typed constant values. By extending the value-numbering strategy to constants as well LLVM bitcode avoids the need to distinguish between constant and dynamic value references in records. This not only simplifies parsing, it also prevents LLVM from having to repeatedly store type information for constant values with multiple uses. Constant blocks precede any value-producing records in their parent block. They can occur multiple times in both Module and Function blocks. Their presence marks perhaps the most significant divergence between textual and binary encodings as the former inlines constant values wherever they are used as operands.

• ValueSymTab & StrTab

The records in Value Symtab blocks assign explicit names to the symbols in the value list as well as labels to instruction blocks. This also includes the names of named globals and functions. Since LLVM uses these names for linking it seems counter-intuitive not to store or reference them in the record that defines the corresponding symbol. To address this issue LLVM 5.0 introduced the StrTab block. Its only record contains a sequence of strings used throughout the bitcode file as a binary blob that can easily be mapped to memory. Function and global records now point directly to the offset and length of their names in this string-table. Interestingly, this block occurs only as a sibling but not as a descendant to the Module block.

Chapter 4

Debug Information

This chapter describes the extent and representation of debug information in LLVM IR. In it, we first introduce the general metadata format. Following this, we explain the concrete structures LLVM uses to describe the original program. Lastly, we state how LLVM links debug information to values and instructions to describe a program's source-level state at runtime.

4.1 LLVM IR Metadata

LLVM frontends can include metadata in LLVM IR to preserve information that can be used in further processing steps. While debug information remains the main use case for this feature, the encoding is flexible enough to allow storing arbitrary, even tool-specific data. In practice, LLVM frontends always identify themselves in metadata of the modules they produce.

LLVM transformation passes usually make an effort to preserve metadata. This is especially true in the case of debug information. While it is in the nature of program optimization to remove or replace symbols and instructions, transformations that do so attempt to counteract the information loss by adapting the available metadata to reflect their changes. As an example, loop unrolling may remove an explicit index variable. However, the optimization can still preserve its value as a constant in debug information.

1   define i32 @abs(i32) !dbg !7 {
2   ;...
3     ret i32 %12, !dbg !25
4   }
5
6   !llvm.dbg.cu = !{!0}
7   !llvm.ident = !{!6}
8
9   !0 = distinct !DICompileUnit(language: DW_LANG_C99, file: !1, ...)
10  !1 = !DIFile(filename: "abs.c", directory: "<...>")
11  ;...
12  !6 = !{!"clang version 5.0.1 (tags/RELEASE_501/final)"}
13  !7 = distinct !DISubprogram(name: "abs", file: !1, line: 1, ...)
14  ;...
15  !25 = !DILocation(line: 7, column: 1, scope: !7)

Listing 4.1: Partial metadata for the LLVM IR in Listing 3.1

As an example, Listing 4.1 shows some parts of the metadata for the abs function in textual format as introduced in Listing 3.1. We can infer that we used Clang to compile the example since it included its version string as shown in line 7. In the following we will use this example to illustrate some important concepts of LLVM metadata.

4.1.1 Structure

LLVM generally represents metadata as entries in a metadata list. This is analogous to the approach for the symbol table which we described in Section 3.3.1. To keep in line with the LLVM language reference [15] we will refer to an entry in the list as a node. We diverge from this document, however, in that we treat metadata strings only as a special kind of a metadata node rather than as a distinct concept. This notion reflects their representation in the binary encoding. Only the textual representation inlines metadata strings as unnamed constants when it uses them as operands to a node.

LLVM IR provides multiple different means to relate metadata to other bitcode entities. To begin with, a metadata attachment can link a node to a symbol or instruction and even express the kind of the association in the form of a name. Our example in Listing 4.1 uses the dbg kind to state that the attached node contains debug information. Other kinds include the llvm.loop metadata. Attached to a branch instruction, it identifies the back-edge of a loop to its header block. The language reference [15] defines additional metadata attachment kinds. Additionally, frontends are free to define their own custom kinds. There is also no defined limit for the number of attachments any particular symbol or instruction may hold.

The example in Listing 4.1 also shows named metadata. Similar to the value attachments we described earlier these nodes are implicitly attached to the module. They are always generic metadata nodes with usually at least one entry. Our example shows two instances of this. The llvm.ident metadata is always present in any bitcode file. It references a metadata string which contains the name and version number of the tool that originally produced the module. Other LLVM tools or transformation passes usually do not change it. The llvm.dbg.cu metadata, on the other hand, is part of debug information. We will discuss its semantics later on.

Lastly, certain intrinsic functions require metadata nodes as operands. For this purpose, LLVM IR's type system includes the metadata type. In bitcode, a value of this type is always an index into the metadata list. This feature allows LLVM IR to dynamically relate metadata to runtime values. In practice, debug information uses it to track the current values of source-level symbols. While backends typically do not emit instructions with metadata operands to executable machine code directly, they may still include the information they contain into other sections of the object files they produce.

4.1.2 Encoding

In the binary encoding, LLVM stores metadata nodes as records in special Metadata blocks. These can occur as children of both Module and Function blocks. The separation into global and function-local metadata is not present in the textual representation. As Listing 4.1 reflects, it features only module-level metadata.

LLVM bitcode in version 3.2 defines only three kinds of metadata nodes. We refer to this version of metadata as the old format. The new format used by LLVM version 3.8 and newer features significant structural changes. This includes a clear focus on debug information as it introduces a great number of additional records particular to that use-case. LLVM itself does not provide an upgrade path to the new format. Its parser simply drops all metadata when it detects a record exclusive to the old format. In the following we will present the most important records used for general metadata. In the interest of structure, we defer introducing those specific to debug information to future sections.

• MD_Old_Node

In the old format, this record describes an all-purpose node that may contain an arbitrary number of typed entries. It is encoded as pairs of integers. The first value of each pair is an index into the type table. If it refers to the metadata type, the second value is a 0-based index into the metadata list. The void type instead indicates a NULL reference, regardless of the second value. If the first value refers to any other type, the second one is a 0-based index into the value list as well.

• MD_Old_Fn_Node

In the old format, this record contains only the 0-based index of a defined function in the value list. It is actively used in debug information, though its advantage over MD_Old_Node is unclear.

• MD_String

This record represents a Unicode string encoded as a byte array. In bitcode, metadata string values are a distinct kind of node. This is contrary to the textual representation of LLVM IR where they are instead inlined into node definitions as nameless constants.

• MD_Value

Similar to MD_Old_Fn_Node this record contains an index into the value list. However, unlike its predecessor, the targeted value is not limited to a specific type.

• MD_Node

In the new format, this record replaces MD_Old_Node. It also defines a generic node, but it only contains metadata references. This avoids the need to explicitly emit type information for any of the entries. To retain the ability to reference values, the new format wraps them into MD_Value records. Since in practice other metadata forms the clear majority of node entries, this strategy significantly reduces metadata's overhead on file size. The new format also uses 1-based indices into the metadata list. This allows it to encode NULL references efficiently as index 0.

• MD_Strings

Starting from LLVM 3.9 this record replaces the less memory-efficient MD_String. Similar to the single record in a StrTab block, it contains all strings used throughout the metadata block as a single binary blob. The two concepts differ in two key points though. First, the binary data's layout is different. It starts with a variable-width integer denoting the number of strings the blob contains. Another variable-width integer denoting its byte-size then precedes each of the subsequent strings. As the second difference, other nodes still expect the parser to generate an entry in the metadata list for each string as it parses the record.

• MD_Name & MD_Named_Node

Both formats use the same records to relate metadata to other bitcode entities. The MD_Name record encodes only a name. However, contrary to MD_String, a bitcode parser must not generate a node for it. In bitcode, an MD_Named_Node record always follows directly after MD_Name. Like MD_Node it contains any number of indexes into the metadata list. As the only difference, these are 0-based also in the new format. The parser must remember the mapping between the name and the subsequent node.

A metadata attachment in the textual representation is stated together with the attached-to value. Listing 4.1 shows this with the dbg attachments to both the function and an instruction. The binary format, however, encodes these attachments separately from any non-metadata records. The Metadata_Kind block appears only directly under the module block. Its MD_Kind records contain the attachment names used throughout the module. Parsers are not supposed to enter these into the metadata list. Instead, each record also contains an integer number to identify the kind.

The binary encoding then uses the MD_Attachment records in the Metadata_Attachment block to actually represent the attachment. This block appears in both Function and Module blocks. If an MD_Attachment record has an odd number of entries, then the first one is the 0-based number of the targeted instruction. Separate from the value list, this numbering includes also instructions that do not produce a value. Otherwise, the target is implicitly the function whose body the parser is currently parsing. MD_Attachment can describe multiple attachments. It encodes each of them as a pair of integers. The first contains the ID of the attachment kind, the second the 0-based index of the attached node in the metadata list. Similarly, the MD_Global_Attachment record defines metadata attachments for a global symbol in the value list.
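The following sketch restates this decoding rule as code. It is purely illustrative: ops stands for the record's integer operands, and the Sink callback and all other names are hypothetical.

// Decodes an MD_Attachment record as described above (hypothetical helper).
final class AttachmentRecords {

    interface Sink {
        // instructionIndex is -1 when the attachment targets the enclosing function.
        void attach(int instructionIndex, int kindId, int metadataIndex);
    }

    static void decode(long[] ops, Sink sink) {
        int start = 0;
        int instructionIndex = -1;
        if (ops.length % 2 != 0) {
            // Odd number of operands: the first one selects the target instruction.
            instructionIndex = (int) ops[0];
            start = 1;
        }
        for (int i = start; i < ops.length; i += 2) {
            // Each remaining pair is (attachment kind ID, index into the metadata list).
            sink.attach(instructionIndex, (int) ops[i], (int) ops[i + 1]);
        }
    }
}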

4.2 LLVM IR Debug Information

LLVM IR encodes debug information in the form of metadata nodes. Most LLVM frontends do not generate it by default and instead provide a command-line option to trigger this explicitly. Both Clang and DragonEgg use -g for this purpose.

The new metadata format in LLVM 3.8 and later uses a number of specialized records to encode descriptors for various constructs such as types, scopes and symbols. The old representation used in LLVM 3.2 ultimately contained the same structure. However, it encoded it within generic MD_Old_Node records. The first entry of each specialized node would be a 32-bit integer which identified the kind of specialization with its lower half and an encoding version in its upper 16 bits. However, the debug information representation is far from stable even in the new metadata format. In practice, each LLVM release also introduces a revision to the debug information format. This can range from repurposing or deprecating individual fields to significant structural changes. A parser that wants to support multiple LLVM versions invariably ends up using very inefficient node representations. Sulong applies the visitor pattern to its parser AST to transform it into a common runtime representation tailored to its specific needs.

Listing 4.1 contains the llvm.dbg.cu named metadata. This node always links to at least one DI_Compileunit. As the name suggests, it contains information about an individual compilation unit. This includes references to certain entities, e.g. all globals and enums, as well as general information like imports and the code's original programming language. Until LLVM 3.8 it used to reference all nodes describing source-level functions as well. As a result, this required nearly all metadata nodes to be encoded in the Module block as the function descriptors would reference local variables and those, in turn, their scopes.

4.2.1 Locations

LLVM IR encodes location information in the form of the DI_Location metadata node. Its format has remained identical in both the old and the new metadata format. Besides a line and a column number the node also contains a reference to another node representing the location's scope which itself can reference a parent scope. Regardless of its length, however, this chain always leads to a DI_Subprogram, a metadata node describing the function or method that contains the original source location. Additionally, the DI_Location record contains as a fourth argument an optional reference to another node of the same kind. If present, it indicates that LLVM inlined the instruction at the described location. This behavior can be caused by compiler optimizations, C++'s inline keyword or preprocessor directives like macros.

The binary format does contain DI_Location nodes as records in the corresponding function's Metadata block, but it uses these records only to store the targets for inlined instructions. Since debug information requires location nodes quite frequently, using the regular metadata attachment for this purpose, like the textual representation does, would be rather wasteful. To reduce the file-size overhead, the binary encoding uses the Debug_Loc record directly in the function block to contain the required data. A parser must not enter this node into the metadata list, but instead attach it to whatever instruction it parsed last. To further avoid unnecessary overhead, Function blocks use the Debug_Loc_Again record to indicate that the last parsed Debug_Loc applies to another record as well. This case is especially prominent with LLVM 3.2 since in that version, although an explicit field for a column number exists, its value is always set to 0.
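The following sketch outlines how a parser might apply these two records while reading a Function block. All types and names are hypothetical stand-ins for the parser's data structures.

// Tracks the most recently seen source location while parsing a Function block.
final class LocationTracker {

    // Minimal stand-ins for the parser's data structures.
    static final class SourceLocation {
        final int line;
        final int column;

        SourceLocation(int line, int column) {
            this.line = line;
            this.column = column;
        }
    }

    static final class Instruction {
        SourceLocation location;
    }

    private SourceLocation lastLocation;

    // Debug_Loc: attach the described location to the instruction parsed last
    // and remember it for possible Debug_Loc_Again records.
    void onDebugLoc(Instruction lastInstruction, SourceLocation location) {
        lastInstruction.location = location;
        lastLocation = location;
    }

    // Debug_Loc_Again: the previous location also applies to this instruction.
    void onDebugLocAgain(Instruction lastInstruction) {
        lastInstruction.location = lastLocation;
    }
}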

4.2.2 Symbols

LLVM IR metadata contains information about source-level symbols. This includes the kind and structure of their types as well as the scope and location where they are declared. Metadata encodes data about strictly function-local symbols in the DI_Local_Variable node. Its members include metadata references to the scope and type of the symbol as well as its declaration site in the form of a line-number and a link to a DI_File node. For local variables that represent implicit object pointers, e.g. the this argument to instance methods in C++, DI_Local_Variable includes a flag to indicate this. DI_Global_Variable extends this range of information with several optional attributes, e.g. whether this node represents a static type member and, if so, a link to the corresponding type. The name of both nodes refers to whether the values of the symbols they describe need to exist outside of a particular function execution. As an example, DI_Global_Variable would describe a local variable in a C program if it is declared as static. The node contains a boolean field to distinguish whether it describes an actually global symbol in terms of scope.

DI_Subprogram describes a function or method. In addition to its type, this includes its source-level name for display as well as its linkage name. The node also references its lexical location in the form of a file and line-number.

4.2.3 Types

LLVM IR debug information mostly uses three particular nodes to encode source-level types. Common information among them includes the type's name, its bitsize and alignment, its offset in its parent type, an optional integer value containing bitflags as well as an equally noncompulsory file reference and line-number to describe the declaration sites of user-defined types. For custom types the nodes also include an additional entry to reference their scopes. LLVM IR debug information contains no specific node to encode a void type; it simply uses a null metadata reference for this purpose.

• DI_Basic_Type

This node extends the common members only with an encoding kind for its values. This information specifies whether a debugger should interpret the bitpattern of such a value as a signed or unsigned integer or character, a floating point number, a memory address or as a boolean value. In combination with the bitsize this information is sufficient to describe values of common elementary types, e.g. C’s float or Fortran’s integer.

• DI_Composite_Type

A composite type describes a value with possibly several fields or different views to it. Besides several optional entries this metadata node always contains an integer tag to describe the kind of composition as well as a link to a generic metadata node which holds descriptors for the member types. It also references an identifier, a string that contains the type's linkage name. Until LLVM 3.8 other metadata nodes commonly used this name to reference the composite type rather than point to it directly.

– Arrays & Vectors

For this composition kind the member node references one or more DI_Subrange nodes whose single entries denote the number of elements in each array dimension. Interestingly, DI_Composite_Type contains a reference to the elementtype directly rather than in its membertype field. The corresponding field remains unused for other composition kinds.

– Classes, Structs & Unions

For these composition kinds the node references derived types which relate the individual fields' types to their names and declaration sites. This strategy also allows LLVM to encode the members' offsets in the intermediary rather than in a new copy of the basetype with the same information. Functions, however, do not require an intermediary node as the corresponding DI_Subprogram already contains all required data. There are other non-member types in this aggregate too. LLVM uses specific kinds of derived types to describe inheritance from or special relations to another composite.

– Enumerations

The members of an enumeration type map the specific integer values that LLVM uses at IR-level for the possible values to their string representations. LLVM encodes this as one DI_Enumerator node per value.

• DI_Derived_Type

This node derives a new type by decorating another with additional information. As such, its noncompulsory members include the metadata reference to its basetype and an integer id which describes the kind of derivative. The most common derived types are pointers and references. Like constant and volatile value qualifications or inheritance links, they do not require additional information. None of these derivation kinds provide specific keywords or other formatting instructions to apply to the basetype's name. Since symbols of such a type, at least indirectly, reference a DI_Compileunit and in turn the programming language they were defined in, a debug information consumer requires knowledge of that language to derive an accurate name. The derivation kind specific to type aliases, e.g. C's typedef, necessarily contains the new name explicitly. The same holds true for the composite member type we already mentioned above. A sketch of such name derivation follows at the end of this section.

An additional field of bitflags in the node defines further attributes about the type. For a composite member this includes whether LLVM introduced it artificially or if it has a static value. In the latter case, the module also contains a DI_Global_Variable that links to this node. Another flag defines the member as a bitfield. In that construct, its bitsize overrides that of its basetype and no alignment applies. Lastly, the only important flag for pointer types identifies object pointers, e.g. the implicit this argument of C++ instance methods. Like values of reference type, a debugger can safely dereference them as the LLVM frontend ensures that they point to allocated memory. The DI_Local_Variable for an object pointer also contains such a flag.

LLVM uses the DI_Subroutine node to describe function types. It does not include the other type nodes' common members, but like DI_Composite_Type it links to an array of other type nodes. Its first entry denotes the function's return type. It is always present, even if it is just a null reference to indicate that functions of this type do not return a value. Any following nodes specify the parameter types. A null reference at the end of this list indicates that the function type supports variadic arguments.
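To illustrate how a consumer might turn the derivation kinds described above into source-level type names, the following Java sketch applies C/C++ formatting rules to a basetype name. The enum, class name and formatting rules are illustrative assumptions and not part of LLVM or Sulong.

// Hypothetical sketch: deriving a display name for a derived type under C/C++
// formatting rules. The DerivationKind enum and its members are assumptions made
// for illustration; LLVM encodes the kind as a tag in DI_Derived_Type.
public final class DerivedTypeNames {

    enum DerivationKind { POINTER, REFERENCE, CONST, VOLATILE, TYPEDEF }

    static String deriveName(DerivationKind kind, String baseName, String aliasName) {
        switch (kind) {
            case POINTER:   return baseName + "*";
            case REFERENCE: return baseName + "&";
            case CONST:     return "const " + baseName;
            case VOLATILE:  return "volatile " + baseName;
            case TYPEDEF:   return aliasName; // aliases carry their new name explicitly
            default:        throw new AssertionError(kind);
        }
    }

    public static void main(String[] args) {
        System.out.println(deriveName(DerivationKind.POINTER, "int", null));               // int*
        System.out.println(deriveName(DerivationKind.CONST, "int", null));                 // const int
        System.out.println(deriveName(DerivationKind.TYPEDEF, "unsigned long", "size_t")); // size_t
    }
}

A debugger that only knows the C/C++ rules would produce inaccurate names for, e.g., Fortran programs, which is exactly why knowledge of the source language is required.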

4.2.4 Scopes

LLVM defines several debug information nodes to describe source-level scopes. They generally contain a link to their parent scope; a sketch of resolving such a parent chain into a qualified name follows the list below. With C/C++ source programs, the scope of an instruction or a local variable can only be either the containing function directly or a series of nested lexical blocks.

• Files

The DI_File node describes a file by two string entries which denote its name and its absolute directory path. For C/C++, the containing file is the default scope of a top-level subprogram.

• Blocks

LLVM IR uses two separate debug information nodes to describe lexical blocks. To begin with, DI_Lexical_Block_File references both a parent scope as well as the source code file that contains the block in the form of a DI_File node. The DI_Lexical_Block node additionally contains the line and column in the referenced file at which the block starts.

• Functions

The DI_Subprogram node describes a function. It references the containing file as well as the starting lines of both its definition and its body.

• Compileunits

Global variables use their DI_Compileunit as a default scope. For this purpose, the node also links to a DI_File. Functions link to it too, separately from their parent scope.

• Named Scopes

The DI_Namespace node describes a named scope. Up until LLVM 5.0 it used to contain a file and line-number in addition to a name, but since namespaces are generally not restricted to specific files the removal seems sensible. It can reference a parent scope, but this is optional here. Similarly, DI_Module also describes a named scope. However, this node has several entries unrelated to that purpose.

• Types

DI_Composite_Type acts as a scope for its member types, which also include functions. The node links to both the file and the line-number to describe its declaration site.
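As a toy illustration of following the parent-scope links mentioned above, the sketch below builds a C++-style qualified name from a chain of named scopes. The Scope class and the separator are assumptions chosen for illustration only; Sulong's actual scope representation is described in Chapter 6.

// Hypothetical sketch: walking parent-scope links to build a qualified name.
// The Scope class is an illustrative stand-in for LLVM's scope metadata nodes.
import java.util.ArrayDeque;
import java.util.Deque;

public final class QualifiedNames {

    static final class Scope {
        final String name;   // e.g. a namespace, class or function name
        final Scope parent;  // null for the outermost scope

        Scope(String name, Scope parent) {
            this.name = name;
            this.parent = parent;
        }
    }

    static String qualify(Scope scope) {
        Deque<String> parts = new ArrayDeque<>();
        for (Scope s = scope; s != null; s = s.parent) {
            parts.addFirst(s.name);
        }
        return String.join("::", parts);
    }

    public static void main(String[] args) {
        Scope ns = new Scope("mylib", null);
        Scope cls = new Scope("Widget", ns);
        Scope method = new Scope("resize", cls);
        System.out.println(qualify(method)); // mylib::Widget::resize
    }
}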

declare void @llvm.dbg.declare(metadata, metadata, metadata)
declare void @llvm.dbg.value(metadata, i64, metadata, metadata)

Listing 4.2: Signatures of LLVM IR's value tracking intrinsics

4.3 Value Mapping

LLVM relates source-level symbols to their runtime values in different ways. While it uses a static mapping for global symbols, it updates the values of local ones as they change at runtime.

4.3.1 Locals

When compiling a source file to LLVM IR, LLVM frontends insert calls to the llvm.dbg.declare and llvm.dbg.value intrinsics in function bodies to associate source-level local variables with their runtime values. Subsequent transformation passes recognize and adapt these calls as needed so that they describe the original program state at any statement a debugger may halt at as accurately as possible. However, as optimizations often remove redundant variables, this is not always possible.

As Listing 4.2 shows, both intrinsics have three common parameters. Always first among them is the IR-level value wrapped as metadata. The second argument is the DI_Local_Variable metadata node that describes the source-level symbol. We will explain the separate scheme to track the state of a DI_Global_Variable later. Lastly, the intrinsics receive further information about how to interpret the first argument in the form of an MD_Expression node.

The llvm.dbg.declare intrinsic describes a local variable that resides on the stack. Its first argument is the address of the corresponding allocation wrapped as metadata. LLVM guarantees that this pointer is always safe to dereference and that the instruction that produces it dominates the call to the intrinsic. When optimizations lower a symbol from the stack whose address the unmodified program would later dereference, they often change this argument to an explicitly undefined constant in addition to adding a call to llvm.dbg.value. This is quite redundant as a debugger could also infer from the presence of the replacing definition that the value is no longer in memory. The third and last parameter of llvm.dbg.declare, the metadata expression, is likely a remnant of earlier versions of debug information. Using LLVM 3.2 and later releases to compile Sulong's extensive testsuites at various optimization levels, we could not identify a single case in which it was not empty.

LLVM uses the llvm.dbg.value intrinsic to track optimized local variables. As such, its value argument is not restricted to a specific type, any entry of the value list is valid here. This includes constants as well as dynamic SSA values, even global variables. The MD_Expression argument to llvm.dbg.value is often empty. In general, it represents a list of operands that push or modify values on an implicit stack. Expressions for the llvm.dbg.value intrinsic have only two common uses. To begin with, a special operand in combination with a pointer as value argument declares that the actual value lies at that location in memory. As the only difference to llvm.dbg.declare, this address is not restricted to a stack allocation. The second possible expression describes a bitoffset and length. It declares that the value argument defines only a part of the symbol. LLVM optimizations generally try to lower as many values from memory to SSA level as possible to gain significant runtime performance improvements. For composite symbols such as structures or fixed-length arrays this often entails splitting them into multiple primitive values that fit into registers. As a result, this partial declaration occurs quite frequently.
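The two common expression shapes described above can be pictured with the following Java sketch. The enum, the fields and the describe method are illustrative assumptions and do not correspond to LLVM's or Sulong's actual data structures.

// Hypothetical sketch of the common llvm.dbg.value expression shapes: an empty
// expression, a dereference of a pointer argument, and a bit fragment of the symbol.
public final class ExpressionShapes {

    enum Kind { EMPTY, DEREF, FRAGMENT }

    private final Kind kind;
    private final Object rawValue;        // SSA value, constant or pointer argument
    private final int fragmentOffsetBits; // only meaningful for FRAGMENT
    private final int fragmentLengthBits; // only meaningful for FRAGMENT

    ExpressionShapes(Kind kind, Object rawValue, int offsetBits, int lengthBits) {
        this.kind = kind;
        this.rawValue = rawValue;
        this.fragmentOffsetBits = offsetBits;
        this.fragmentLengthBits = lengthBits;
    }

    String describe() {
        switch (kind) {
            case EMPTY:
                return "the symbol's value is " + rawValue;
            case DEREF:
                return "the symbol's value lies in memory at address " + rawValue;
            case FRAGMENT:
                return "bits [" + fragmentOffsetBits + ", "
                        + (fragmentOffsetBits + fragmentLengthBits) + ") of the symbol are " + rawValue;
            default:
                throw new AssertionError(kind);
        }
    }

    public static void main(String[] args) {
        System.out.println(new ExpressionShapes(Kind.EMPTY, 42, 0, 0).describe());
        System.out.println(new ExpressionShapes(Kind.DEREF, 0x7fff1234L, 0, 0).describe());
        System.out.println(new ExpressionShapes(Kind.FRAGMENT, 7, 32, 32).describe());
    }
}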

While frontends never emit more than a single call to llvm.dbg.declare per DI_Local_Variable, at higher optimization levels they often end up populating function bodies with more invocations of the llvm.dbg.value intrinsic than actual instructions. Common bugs in optimization passes further contribute to this problem as they often define multiple values for a symbol between two subsequent instructions, even in the same basic block.

4.3.2 Globals

Perhaps the biggest change in the debug information format between LLVM 3.8 and 5.0 concerns the association between an IR-level global symbol and the corresponding DI_Global_Variable. Originally, the DI_Compileunit used to reference a node linking to all DI_Global_Variable descriptors which in turn linked to the value's entry in the value list. This was all but reversed in LLVM 4.0 with the introduction of the DI_Global_Variable_Expression. That node just comprises links to a traditional DI_Global_Variable without a value reference and an optional MD_Expression. This intermediary now occurs as a metadata attachment of the well-known dbg kind to the IR-level value. Its expression is still usually empty. However, on rare occasions LLVM actually uses it to define a value for the symbol. For this purpose, valid operands can include integer literals, up to 64-bit dynamic integer values as well as addition and subtraction. In general though, LLVM still prefers constants in the value list over this encoding.

Chapter 5

Stepping & Breakpoints

In this chapter we will first describe how Truffle encodes source-level location information and the strategies Sulong uses to relate LLVM IR debug information to that format. Following this we explain Truffle's approach to instrumentation and what it requires from language implementations, in particular Sulong. We end the chapter with an overview of how Truffle's debugging API uses these concepts to implement conditional halting of guest language program execution, i.e. single-stepping and breakpoints. We also discuss what particular challenges this poses for an interpreter for compiled languages and how Sulong solves them.

5.1 Location Information

Truffle and LLVM IR use different formats for location information. Figure 5.1 shows its representation in Sulong's Truffle AST at runtime. In the following we will describe the components of this structure, how LLVM bitcode encodes location information and why Truffle's own data structures for this purpose are not sufficient for bytecode interpreters.

Figure 5.1: Source-Level Location Information in Sulong

5.1.1 Representation in Truffle

To represent source-level location information Truffle provides standard abstractions of plaintext containers and subsections of their content. These are only intended to represent lexical scopes. As we will explain in the next chapter, the Debugging API provides language implementers with separate means of exposing information about semantic ones.

Truffle provides the Source class as an abstraction of plaintext. In addition to the source code itself, these immutable objects provide additional metadata which includes the contained code's MIME-type as well as a name and URI for it. Interpreters cannot define their own implementations of this class. Instead, the framework enables developers to create instances of it from regular files, Java Strings and also URIs, using a builder pattern. While this restriction is unproblematic for non-compiled languages, it poses unique problems for bytecode interpreters. On the one hand, binary files usually do not conform to any text encoding such as UTF-8. When working with sources, Truffle insists on loading and analyzing the text they contain and inevitably fails. To work around this problem, Truffle languages apply Base64 encoding to the file content before converting it to a String and subsequently to a Source object to pass to Sulong. On the other hand, debug information in bytecode files usually uses absolute paths and filenames. It cannot detect, let alone track, modification or relocation of the original source files. In consequence, these files may be inaccessible or of unexpected content when or where the interpreter executes the compiled code.

The Truffle framework uses its SourceSection class to describe lexical regions in an associated Source. Language implementations cannot create subclasses of it either. To obtain instances they can instead use several overloaded, dynamic factory methods that the corresponding Source objects provide. Internally the class uses the section’s starting index and length in terms of characters to represent it, but it also defines convenience methods to retrieve this as line and column numbers for easier interpretation. Like Source objects, SourceSection instances are immutable.

Each implementation of a Truffle AST node must be derived from the abstract Node class. Its interface contains the aptly named getSourceSection method whose default implementation simply returns null. Node instances need to define a dedicated field to store the object. Developers should either declare that field as final or annotate it as @CompilationFinal. Coupled with the lack of mutable members in both Source and SourceSection this allows Graal to efficiently treat the values as constants in compiled code.

Truffle uses the location information optionally attached to AST nodes in several ways. To begin with, the presence of this data is a prerequisite to allow language implementations to provide meaningful stack-traces. It is also invaluable for debugging the interpreters themselves as it enables tools, e.g. the IGV Truffle AST visualizer, to relate source code to execution nodes. The framework’s instrumentation API, and in direct consequence the Debugging API, requires it as well.

5.1.2 Representation in Sulong

Debug metadata in LLVM IR uses a combination of absolute filepaths, line and column numbers to reference source code. At the time Sulong executes a bitcode file, the parser cannot assume that this information still leads to a valid target. The user may have moved the original source code or deleted it altogether. Similarly, projects may decide to ship precompiled bitcode files to their users separately from the corresponding source code. If a programmer forgets to recompile the source code after making changes, it may no longer contain certain lines or columns. In all these situations Sulong can still use the available debug information to generate accurate stack-traces in case the executed code triggers a user-visible exception or an error in the interpreter. Sanity checks in Truffle's Source and SourceSection classes actively prevent them from supporting this use-case though. They require each file to be present and readable and each combination of line and column to be valid in their associated text. To work around this problem Sulong does not attach SourceSection objects to its Truffle AST nodes directly. As Figure 5.1 shows, it instead uses a proxy interface called LLVMSourceLocation. If it is at all possible to build a SourceSection object for a location, the parser will prefer this representation. The alternative proxy implementation for unavailable code positions instead stores filename, line and column explicitly. Retaining debug information even in the absence of the referenced source code is not the only advantage of the proxy class though. As we will describe in the next chapter, Sulong also uses it to describe source-level scopes. Though this strategy undeniably causes significant runtime overhead in terms of memory, the Instrumentation API's dependency on the unmodifiable SourceSection class leaves no viable alternative.
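The following Java sketch illustrates the proxy idea with one implementation backed by resolvable source text and one that merely records filename, line and column. The interface and class names are simplified assumptions; Sulong's actual LLVMSourceLocation carries additional responsibilities such as scope information.

// Hypothetical sketch of the proxy idea behind LLVMSourceLocation. The types are
// simplified stand-ins, not Sulong's actual classes.
public interface SourceLocationSketch {

    String describe(); // e.g. for stack-traces

    // Used when the referenced source file is available and still valid.
    final class Available implements SourceLocationSketch {
        private final String path;
        private final int line;
        private final int column;
        private final String text; // the resolved source text of the section

        Available(String path, int line, int column, String text) {
            this.path = path;
            this.line = line;
            this.column = column;
            this.text = text;
        }

        @Override
        public String describe() {
            return path + ":" + line + ":" + column + " -> \"" + text + "\"";
        }
    }

    // Used when the source file is missing or no longer matches the debug information.
    final class Unavailable implements SourceLocationSketch {
        private final String fileName;
        private final int line;
        private final int column;

        Unavailable(String fileName, int line, int column) {
            this.fileName = fileName;
            this.line = line;
            this.column = column;
        }

        @Override
        public String describe() {
            return fileName + ":" + line + ":" + column + " (source unavailable)";
        }
    }
}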

Curiously, LLVM's debug metadata describing source-level scopes, e.g. DI_Lexical_Block or DI_Subprogram nodes, only states where the corresponding lexical region starts, but never where it ends. One possible explanation for this deficiency may be a lack of interested consumers for this information. Common debug information encodings for executable files, e.g. DWARF, do not require it. For Sulong, however, this makes it hard to accurately convert location information to Truffle's SourceSection. To solve this problem the project's parser assumes a default length of 0 characters. In practice, this does not impact the functionality of debugger frontends as they do not use this information in any case.

Truffle interpreters for uncompiled languages, e.g. TruffleRuby for the Ruby programming language, represent a source-level statement by a single node with possibly multiple descendants for its subexpressions. Sulong, on the other hand, executes LLVM IR. This representation loses the notion of source-level statements almost entirely. Here, subexpressions do not form a tree but rather a sequential list of instructions. LLVM debug information is not exhaustive enough to allow for easy reconstruction of the original program. In the interest of reducing runtime overhead, Sulong does not even attempt to do so. As a result, Sulong only supports stepping on expression-level rather than statement-level. LLVM optimization passes only add to this problem by deleting instructions or even just dropping debug information. This is especially problematic at higher optimization levels.

5.2 Truffle Instrumentation

The Truffle framework provides an API for source-level instrumentation. It implements this feature by dynamically inserting guest-language-defined wrappers between nodes that represent source-level statements and their parents in the AST at runtime [6]. Such an intermediate node provides the same interface as the instrumented one; however, it notifies an attached probe of each execution event it encounters. A Truffle instrument, e.g. a debugger, can observe these events and react to them. The Truffle Debugging API uses this functionality to suspend execution at certain program locations.

/* ... package declaration and imports ... */
@NodeChildren({
    @NodeChild(value = "childA", type = LLVMExpressionNode.class),
    @NodeChild(value = "childB", type = LLVMExpressionNode.class)
})
@Instrumentable(factory = NodeImplWrapper.class)
public abstract class NodeImpl extends LLVMExpressionNode {

    private final LLVMSourceLocation sourceLocation;

    public NodeImpl(LLVMSourceLocation sourceLocation) {
        this.sourceLocation = sourceLocation;
    }

    public NodeImpl(NodeImpl delegate) {
        this(delegate.getSourceLocation());
    }

    @Specialization
    protected Object executeObject(VirtualFrame frame, Object a, Object b) {
        Object result;
        /* ... compute result ... */
        return result;
    }

    @Override
    protected boolean isTaggedWith(Class<?> tag) {
        return tag == StandardTags.StatementTag.class || super.isTaggedWith(tag);
    }

    @Override
    public LLVMSourceLocation getSourceLocation() {
        return sourceLocation;
    }

    @Override
    public SourceSection getSourceSection() {
        return sourceLocation != null ? sourceLocation.getSourceSection() : null;
    }
}

Listing 5.1: Example implementation of a Truffle AST node in Sulong

5.2.1 Node Implementation

The example in Figure 5.1 shows an instrumented node with two children. In order to further explain how wrappers function and what their implementation looks like we define a class for it, NodeImpl, in Listing 5.1. The code is specific to Sulong and thus reflects its programming conventions and the representation of location information we described in the previous section. As a first detail, with the exception of those related to control flow, nodes in Sulong generally subclass LLVMExpressionNode. In most cases they also expect their children in the AST to be of that type. LLVMExpressionNode itself inherits from Node, Truffle’s baseclass for AST nodes.

Truffle provides a sophisticated DSL to simplify defining new nodes. The various annotations the abstract NodeImpl class uses are a testament to that. The framework's DSL processor uses them to create a concrete class whose code conforms to an executable tree model. This includes the required instructions to track and execute the child nodes specified by the @NodeChild and @NodeChildren annotations. It also entails efficiently choosing and invoking the correct one of possibly several functions annotated with @Specialization based on what type of values the children produced. Developers can, of course, specify all of this manually as well. The implementation of e.g. an if or switch statement even necessitates it as these constructs require executing only a select subset of the available children.

For Truffle to consider instrumenting a node its getSourceSection method must return a non-null object. In the case of Sulong this requires an available LLVMSourceLocation. This is far from being the case for all nodes though. A typical Truffle interpreter, e.g. TruffleRuby or FastR, represents a source-level statement by a subtree of the AST. Its root represents the statement itself, but its descendants do not only correspond to subexpressions. They can be auxiliary nodes without a direct lexical location in the source file as well. The additional transformation of the source to LLVM IR only amplifies this for Sulong. LLVM optimization passes, especially those only included at higher optimization levels, often drop location information. Sulong does not build an LLVMSourceLocation for instructions without it.

Truffle's Node class provides the isTaggedWith method as a means for guest languages to categorize nodes. A tag in this context is an arbitrary class rather than a primitive. This allows individual interpreters as well as instruments and APIs in Truffle to define their own categories independently of each other. A Truffle language can decide which tags it supports in its nodes; however, it must list all of them as arguments of the ProvidedTags annotation on its concrete subclass of TruffleLanguage. The baseclass for Truffle nodes, Node, declares isTaggedWith such that it never returns true. NodeImpl overrides it to mark itself as tagged by the StatementTag. Defined by the instrumentation API, this category identifies a node that represents a source-level statement.

/* ... package declaration and imports ... */
@GeneratedBy(NodeImpl.class)
public final class NodeImplWrapper implements InstrumentableFactory<NodeImpl> {

    @Override
    public WrapperNode createWrapper(NodeImpl delegateNode, ProbeNode probeNode) {
        return new NodeImplWrapper0(delegateNode, probeNode);
    }

    @GeneratedBy(NodeImpl.class)
    private static final class NodeImplWrapper0 extends NodeImpl implements WrapperNode {

        @Child private NodeImpl delegateNode;
        @Child private ProbeNode probeNode;

        private NodeImplWrapper0(NodeImpl delegateNode, ProbeNode probeNode) {
            super(delegateNode); // uses NodeImpl's copy constructor
            this.delegateNode = delegateNode;
            this.probeNode = probeNode;
        }

        @Override
        public NodeImpl getDelegateNode() {
            return delegateNode;
        }

        @Override
        public ProbeNode getProbeNode() {
            return probeNode;
        }

        /* ... overloads for additional execute* methods ... */

        @Override
        protected Object executeObject(VirtualFrame frame, Object a, Object b) {
            Object returnValue;
            for (;;) {
                boolean wasOnReturnExecuted = false;
                try {
                    probeNode.onEnter(frame);
                    returnValue = delegateNode.executeObject(frame, a, b);
                    wasOnReturnExecuted = true;
                    probeNode.onReturnValue(frame, returnValue);
                    break;
                } catch (Throwable t) {
                    Object result = probeNode.onReturnExceptionalOrUnwind(frame, t, wasOnReturnExecuted);
                    if (result == ProbeNode.UNWIND_ACTION_REENTER) {
                        continue;
                    } else if (result != null) {
                        returnValue = result;
                        break;
                    }
                    throw t;
                }
            }
            return returnValue;
        }
    }
}

Listing 5.2: DSL-generated wrapper for the NodeImpl class in Listing 5.1

5.2.2 Wrapper Implementation

For Truffle to instrument a node, the framework requires its class or any of its superclasses to bear the @Instrumentable annotation. It declares the class of an abstract factory which is able to provide appropriate wrappers. In order to transparently replace an instrumented node in its parent in the AST, the intermediate must be of the same dynamic type. As a result, the instrumentation framework cannot just use one generic wrapper class. However, the DSL is able to generate these factory classes and the according wrapper implementations automatically. It does so if the annotation's single argument references a non-existing class. Though developers can implement these manually as well, the Truffle project discourages them from doing so. In practice, most Truffle languages that are part of GraalVM, including Sulong, follow this suggestion. Though the general structure for a wrapper rarely changes, this strategy also has the advantage of requiring less manual maintenance. Listing 5.2 shows the wrapper factory class the Truffle DSL generated for our NodeImpl example. Per @Instrumentable's requirement it implements the generic InstrumentableFactory interface and overrides its single defined method to wrap a node and attach the specified probe to it.

The instrumentation API defines the WrapperNode interface which all wrapper nodes must implement. It defines getters for the two child nodes each wrapper must have: the instrumented node, which we refer to as the delegate, and the attached probe, an instance of the ProbeNode class. Additionally, the wrapper's class must extend that of the delegate to provide its interface. This poses some restrictions on the implementation of instrumentable nodes. To begin with, their classes need to define either a parameterless constructor or a copy constructor visible to subclasses. Also, this strategy only applies to classes that are not final. It is suitable for abstract ones though. If a node implementation contains several internal fields or multiple child nodes, the wrapper class inherits them as well, causing unnecessary memory overhead at runtime. To avoid this, language implementers can instead place the @Instrumentable annotation on an abstract class that defines the delegate's interface, rather than on the delegate's type itself. The DSL is in theory always able to implement abstract methods by chaining them to the delegate node. In practice, earlier versions of Truffle contained a bug that prevented it from doing so. For this reason, Sulong used to use manually implemented wrappers for its nodes related to control flow. However, after Truffle developers fixed this flaw, it returned to exclusively auto-generated wrappers.

Typically, an InstrumentableFactory contains the wrapper implementation as a static inner class. As NodeImplWrapper0 in Listing 5.2 shows, this is the DSL's strategy as well. The inner structure of a wrapper implementation is always the same. Besides implementing WrapperNode's methods, it overrides each function whose name begins with execute or that is annotated with @Specialization. A node's parent in the AST calls these methods to obtain a value from it. Such an overridden method invokes the probe's onEnter method before handing control over to the delegate by calling the intercepted method on it. This is where a debugger has the opportunity to suspend execution before the source-level statement that the instrumented node represents can be executed. After the delegate returns with a value the method provides it to the probe by calling its onReturnValue function. The instrumentation framework also supports stack unwinding. It implements this feature by throwing an appropriate exception. Wrapped nodes propagate it up in the execution stack, popping off frames in the process, until a probe stops this. Due to this, and to support exceptional return from a delegate in general, the intercepted methods encapsulate the regular calls to the delegate and the probe in a try-block. The according catch-block relays the exception to the probe via its onReturnExceptionalOrUnwind function. Its result indicates whether stack unwinding has halted and execution should enter the delegate again, an exception should be propagated or a value returned instead.

The ProbeNode class represents an event sink for execution events from an instrumented node. It relays them to its internal chain of registered callbacks. Truffle instruments, e.g. a debugger, can dynamically attach and detach them as needed. In uncompiled code this event propagation and evaluation causes dramatic runtime overhead as typically most nodes of a function will be instrumented. The situation is different in compiled code however. Truffle developers specifically design the default wrapper code, the ProbeNode class as well as any callbacks they may register to it for partial evaluation. In the case of debugging this can go so far that the Graal compiler is able to produce machine code for the instrumented AST that does not even contain the wrappers on the fast-path anymore. As a result, debuggers based on Truffle instrumentation can achieve a very low runtime overhead.

5.3 Dynamically Halting Execution in an Instrumented AST

The Truffle instrumentation API defines three distinct tags to identify certain constructs of the guest language program in the Truffle AST. Language implementations need to provide them to enable proper source-level instrumentation. These tags are defined as static inner classes of the StandardTags class. Since they all have private constructors, they are effectively uninstantiable.

StatementTag: This tag identifies a node that represents a statement in the source program. As of Truffle's 0.32 release the framework does not support the finer expression-level granularity. For Sulong, all nodes currently provide this tag. This implementation strategy works correctly as Truffle only considers nodes for instrumentation that make a non-null SourceSection available. Only nodes that represent top-level LLVM IR instructions with attached debug information can fulfill this requirement. On the downside, what Sulong declares to be statements are actually subexpressions. We already explained in Section 5.1.2 that this is a direct consequence of the translation to LLVM IR.

CallTag: This tag identifies a node which performs a source-level function call. In Sulong only the node representations of the call and invoke instructions provide it. Fortunately, LLVM optimization passes usually preserve debug information for these instructions and, in turn, Truffle's ability to instrument the corresponding nodes.

RootTag: This tag marks nodes which represent the body of a function. The instrumentation API expects that only one node in a function AST provides it. This should be the same node that copies the arguments the function is called with to the frame. Commonly, it is the only child of the function’s RootNode. In Sulong, this is also the node that performs basic block dispatch. Since it does not have an equivalent in LLVM IR, the interpreter compensates for the lack of explicit debug information by simply copying the RootNode’s LLVMSourceLocation.

The presence of these three tags on the nodes in a Truffle AST provides the debugging framework with enough information about the guest language program's original control flow to single-step through it. This includes the different stepping strategies. To step into a function call, the debugging framework suspends execution at the first statement it encounters after resuming from a suspended state. If the guest language program stopped immediately before a function call, i.e. at a node that provides the CallTag, this will be the first statement in the called function. Otherwise, the behavior is effectively the same as only stepping over the current statement. To explicitly do just that, the debugging framework only halts guest language execution again at statements that it encounters at the same or at a lower stack depth than where it resumes from. Limiting the selection criteria instead to only the function itself would not be sufficient as that strategy fails to support recursive calls. To step out of the current guest language function, the debugging framework halts execution of the guest language program at the first statement after a node with the RootTag at the same stack depth as the node from which it resumed. This requires the previously mentioned structure in which a node that a guest language tags with RootTag must be the only child of the function's RootNode.

Truffle's debugging API also defines a tag of its own. AlwaysHalt, an inner class of the aptly named DebuggerTags, describes a node at which a debugger should always suspend guest-language execution. One can think of this as a programmatically defined, unconditional, always enabled breakpoint. Sulong uses this tag only on the node that implements the "int $3" assembly instruction. This software interrupt is specifically intended to be interpreted as a breakpoint by a debugger but not to cause any side-effects otherwise. Sulong's implementation exhibits exactly that behavior as the corresponding node's execute method performs no operations. It only has an effect when it is being instrumented.

Truffle's debugging framework supports setting breakpoints at arbitrary lines in a guest language program's source files. Since Sulong considers, for reasons we already explained, each expression of a source program as a distinct statement, this causes breakpoints to usually trigger several times per line. The API also allows frontends to set breakpoints based on a SourceSection rather than a line number, which would enable column-based breakpoints, but they usually do not make use of this functionality. This feature would require a debugger to know the exact character length of the designated subexpression as stored in the Truffle AST. Determining this in a language-agnostic way seems a daunting task. A debugger frontend specific to Sulong could make use of the fact that it is always 0 for this interpreter though.

In most Truffle language interpreters the RootNode of each function AST provides a SourceSection that lexically contains all of those held by its statement nodes. For the debugging framework this assumption has even become a requirement. It silently refuses to set and correctly trigger breakpoints for nodes that do not fulfill this condition. As we already mentioned, LLVM debug information gives no direct information about where a lexical scope ends. For statements, Sulong can simply set the character length to 0 per default, but the SourceSection for a RootNode requires a different strategy. Inspecting the contained nodes to compute a minimal lexical region would require additional parsing overhead. Since debuggers only require the starting position to be semantically correct in any case, Sulong just assumes that each function's lexical scope extends to the end of the source file. This works for statements defined within the function itself. However, many low-level languages, e.g. those of the C family and, in turn, LLVM IR, support preprocessor macros which allow developers to define statements at arbitrary locations in a source file. What is more, LLVM frontends can apply inlining as part of optimization passes. To support such constructs, Sulong derives the corresponding nodes' LLVMSourceLocation from where the instructions are inlined rather than where they are defined. This mitigation strategy has the significant disadvantage of dropping column information for the individual expressions. On the upside, it keeps debuggers from halting at a location where they cannot set a breakpoint. A better solution for this issue would require support from Truffle itself.

Chapter 6

Source-Level Symbol Inspection

In this chapter we explain how Sulong makes the source-level program state available to debugger frontends. We begin by introducing Sulong's representation of symbol information and follow this with an overview of how the Truffle debugging framework enables interpreters to provide a language-specific representation of their runtime state to a debugger frontend. Lastly, we go into detail as to how Sulong restores the source-level state of local and global variables from LLVM IR's runtime values.

6.1 Symbol Information

Sulong represents source-level symbol information at runtime in highly compact data structures. Their organization and extent differ from LLVM IR's representation to better suit the specific needs of Truffle's debugging framework. As Figure 6.1 shows, this leads to a composition that makes plentiful use of abstraction to avoid unnecessary fields in individual elements. Its main artifacts are LLVMSourceSymbol, LLVMSourceLocation and LLVMSourceType.

An LLVMSourceSymbol stores a source-level symbol as a name, type and location. It specializes into Static and Dynamic instances. While this distinction is not strictly required, it does enable Sulong to apply different visibility styles. As an example, the frontend can display the value of a function-local static symbol even at statements prior to its declaration to express its persistence beyond a particular method execution.

Figure 6.1: Source-Level Symbol and Scope Information in Sulong

We already introduced the LLVMSourceLocation class in Chapter 5. It represents a location in a source-file as either a Truffle SourceSection or, if the file is not accessible, a combination of filename and line/column number. However, as Figure 6.1 shows, there is a second level of distinction to it. The format we already introduced only applies to statements and symbols. A separate specialization pertains to named scopes. In contrast to LLVM metadata, this representation references the contained symbols directly. However, each kind of LLVMSourceLocation still retains a direct link to the parent scope as well. In combination, these reference chains allow Sulong to quickly determine which source-level symbols are accessible at a particular statement. Function scopes bear the source-level name of the method they represent. In order to provide such unmangled names in stack-traces, Sulong returns these strings in the getName methods of its RootNode instances.

As we already described, LLVM tries to encode diverse kinds of source-level type information in only four different kinds of metadata nodes, which ultimately leads to multiple optional entries in the according records. The frontend still needs to emit them to bitcode files, regardless of their actual use. Sulong avoids this memory overhead by defining multiple specializations, each with only the appropriate fields, for a common baseclass to represent source-level type information.

LLVM IR type          Tag               LLVMSourceType
DI_Basic_Type         -                 Primitive
DI_Composite_Type     Array             Array
DI_Composite_Type     Vector            Array
DI_Composite_Type     Class             Struct
DI_Composite_Type     Struct            Struct
DI_Composite_Type     Union             Struct
DI_Composite_Type     Enumeration       Enum
DI_Derived_Type       Pointer           Pointer
DI_Derived_Type       Reference         Pointer
DI_Derived_Type       Constant          Decorator
DI_Derived_Type       Volatile          Decorator
DI_Derived_Type       Inheritance       Member
DI_Derived_Type       Alias             Decorator
DI_Derived_Type       Member (dynamic)  Member
DI_Derived_Type       Member (static)   StaticMember
DI_Derived_Type       Others            None (dropped)
DI_Subroutine         -                 Decorator
null                  -                 Void

Table 6.1: Mapping LLVM source-level type nodes to LLVMSourceType

LLVMSourceType itself contains the common attributes of any type, those being a name to describe it, its size and alignment in memory as well as its offset within an according value. The baseclass also defines methods to access them. While this structure is ultimately apt for efficient storage, it still requires interfaces that interact with these objects to manually distinguish between their concrete classes.

Table 6.1 defines a mapping between the LLVM IR source-level type nodes and Sulong's LLVMSourceType. Figure 6.1 lists all these specializations as well. As most of them do not encode any additional information compared to what we already explained in Section 4.2.3, we refrain from describing them any further. The Decorator subclass does not have an equivalent in LLVM IR though. Like some kinds of DI_Derived_Type it enriches a basetype with additional information. However, this type does not specify a kind of qualification. It instead stores an explicit name for the basetype. Sulong does not use it for named members though. Their corresponding LLVMSourceType stores its declaration site and, to support bitfields, its actual size in memory. Also, Sulong uses an explicit singleton instance of LLVMSourceType, rather than a null value, to represent the Void type. Readers may notice the lack of a specific function type. Since values of this kind would not have any displayable members, Sulong uses a Decorator to apply the type's name, including parameters and return type, to a Void type.
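The following Java sketch illustrates the general shape of such a type hierarchy: a baseclass with the common attributes and a decorator that merely renames its basetype. Class and method names are simplified assumptions and do not mirror Sulong's actual implementation.

// Hypothetical sketch of a source-level type hierarchy with a renaming decorator.
// Names and fields are illustrative; Sulong's real classes carry more information.
public abstract class SourceTypeSketch {

    private final long size;   // in bits
    private final long align;  // in bits
    private final long offset; // in bits, within an enclosing value

    protected SourceTypeSketch(long size, long align, long offset) {
        this.size = size;
        this.align = align;
        this.offset = offset;
    }

    public abstract String getName();

    public long getSize()   { return size; }
    public long getAlign()  { return align; }
    public long getOffset() { return offset; }

    /** A primitive type with a fixed name, e.g. "int" or "float". */
    public static final class Primitive extends SourceTypeSketch {
        private final String name;

        public Primitive(String name, long size, long align) {
            super(size, align, 0);
            this.name = name;
        }

        @Override
        public String getName() { return name; }
    }

    /** Applies an explicit name to a basetype, e.g. for typedefs or function types. */
    public static final class Decorator extends SourceTypeSketch {
        private final String name;
        private final SourceTypeSketch baseType;

        public Decorator(String name, SourceTypeSketch baseType) {
            super(baseType.getSize(), baseType.getAlign(), baseType.getOffset());
            this.name = name;
            this.baseType = baseType;
        }

        @Override
        public String getName() { return name; }

        public SourceTypeSketch getBaseType() { return baseType; }
    }
}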

Figure 6.2: Truffle debugging API access model

6.2 Symbol Inspection in Truffle

The Truffle debugging API defines an abstract representation of source-level program state which it provides to debugger frontends. It consists of classes for nested scopes and corresponding symbol entries. Once it has halted execution of a guest language function, the framework calls the Truffle language implementation that produced the corresponding AST to provide concrete values for this structure. Language implementers can override the corresponding methods to return a proper view of source-level execution state rather than Truffle's default strategy of presenting the entries in the stack-frame. Figure 6.2 reflects the general structure of this model.

When the debugging framework suspends guest language program execution at a node, it calls the findLocalScopes function on the TruffleLanguage instance that originally produced it to retrieve the scope hierarchy of the statement it represents. This includes values for all symbols actually visible at that location. The default implementation of this method returns the entries on the guest language function's stack-frame as the only local scope. However, a Truffle interpreter can override it to provide a more detailed and accurate representation. On request, the debugging framework also calls findLocalScopes for each instrumented node with the CallTag on the execution stack. This allows a frontend to inspect all entries on the source-level call stack. As this function is never called on the fast-path, its implementation has no impact on execution performance.

Truffle’s debugging framework defines the Scope class to represent the runtime state of a single source-level scope. While it also stores a name and optionally an associated node, it is ultimately a wrapper around a TruffleObject. These delegates need to support the KEYS message which the debugger frontend sends to retrieve the scope entries’ names. It then uses such a String identifier as argument to a READ message to the same object in order to obtain the symbol’s current value. Truffle does not provide a default implementation for these delegates. Individual languages need to instead define their own in order to build the sequence of Scope instances they return in findLocalScopes.

In addition to the statement node, findLocalScopes receives its corresponding language context and the materialized stack-frame of its execution as arguments. Implementations can use them to compute or access the scope entries’ runtime values. In principle, these can be of any type, even implicitly wrapped Java primitives. Debugger frontends display them as their self-defined toString representation. However, the debugging framework can only access and display the structure of complex objects if they implement the TruffleObject interface at all levels. As with the scope delegates we described earlier, Truffle accesses their fields via dynamic messages.

The debugging framework passes the runtime values of scope entries to the findMetaObject method to obtain a language-defined String representation of its type. In Sulong, this is the name of the associated LLVMSourceType. If the method returns null, Truffle uses an empty String instead. Unfortunately, the debugging framework calls this function on the TruffleLanguage instance of the currently inspected statement or call, rather than on that of the guest language that actually produced the value. Truffle does not provide a way to access the appropriate target either. As a result, debugger frontends fail to show type information for interop values.

TruffleLanguage further defines the findSourceLocation method to retrieve the SourceSection of an arbitrary object. The debugging framework uses it to locate the declaration site of source-level values, including members of complex objects, as well as their types. This function may return null or a valid SourceSection.

public interface LLVMDebugValue {

    String describeValue(long bitOffset, int bitSize);

    boolean canRead(long bitOffset, int bits);

    Object readBoolean(long bitOffset);

    Object readFloat(long bitOffset);

    Object readDouble(long bitOffset);

    Object read80BitFloat(long bitOffset);

    Object readAddress(long bitOffset);

    Object readUnknown(long bitOffset, int bitSize);

    Object readBigInteger(long bitOffset, int bitSize, boolean signed);

    LLVMDebugValue dereferencePointer(long bitOffset);

    boolean isInteropValue();

    Object asInteropValue();
}

Listing 6.1: Definition of Sulong's LLVMDebugValue interface

6.3 Symbol Inspection in Sulong

When the debugging framework calls findLocalScopes as we discussed previously, Sulong traverses the LLVMSourceLocation scope hierarchy of the node under inspection and converts it into a list of Truffle Scope objects. However, this is not a bijective mapping. Sulong combines all function-local scopes into one. Judging by graphical debuggers in popular IDEs, e.g. Netbeans, this seems to be a popular strategy as it leads to a leaner display without losing relevant information. Any visibility problems would become obvious in the compiler frontend in any case. Sulong also retains only those non-static scope entries whose declaration lexically precedes the node's location, i.e. those that are actually visible. If two function-local LLVMSourceLocation scopes contain symbols with the same name, the interpreter prefers the entry in the more deeply nested one unless the shadowing local variable is not yet declared at that location.
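The sketch below illustrates this flattening and shadowing logic on a simplified scope model. The classes, the line-based visibility test and the traversal order are assumptions chosen for illustration; they do not reproduce Sulong's actual implementation.

// Hypothetical sketch: flattening nested function-local scopes into a single symbol
// table, letting inner declarations shadow outer ones only once they are declared.
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public final class ScopeFlattening {

    static final class Symbol {
        final String name;
        final int declaredAtLine;

        Symbol(String name, int declaredAtLine) {
            this.name = name;
            this.declaredAtLine = declaredAtLine;
        }

        @Override
        public String toString() {
            return name + "@" + declaredAtLine;
        }
    }

    static final class LocalScope {
        final List<Symbol> symbols;
        final LocalScope parent;

        LocalScope(List<Symbol> symbols, LocalScope parent) {
            this.symbols = symbols;
            this.parent = parent;
        }
    }

    // Collects all symbols visible at the given line, innermost scope first, so an
    // inner declaration shadows an outer one of the same name.
    static Map<String, Symbol> flatten(LocalScope innermost, int currentLine) {
        Map<String, Symbol> visible = new LinkedHashMap<>();
        for (LocalScope scope = innermost; scope != null; scope = scope.parent) {
            for (Symbol symbol : scope.symbols) {
                boolean declared = symbol.declaredAtLine <= currentLine;
                if (declared && !visible.containsKey(symbol.name)) {
                    visible.put(symbol.name, symbol);
                }
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        LocalScope function = new LocalScope(Arrays.asList(new Symbol("x", 2)), null);
        LocalScope block = new LocalScope(Arrays.asList(new Symbol("x", 8)), function);
        System.out.println(flatten(block, 5));  // {x=x@2}: inner x not yet declared
        System.out.println(flatten(block, 10)); // {x=x@8}: inner x shadows the outer one
    }
}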

The values Sulong provides to the debugging framework as scope entries use two layers of abstraction. The lower one, which we will refer to as the value layer in the following, hides their low-level runtime representation and storage location behind a common interface. The representation layer then applies Sulong's runtime debug information to it to provide a source-level view on the value.

6.3.1 Value Layer

As we already mentioned in Section 4.3, source-level symbols can receive their value from diverse locations. Some reside in memory, others in SSA-registers. Some exist only as constants in debug information while others frequently change their value at runtime. LLVM optimization passes can even split source-level symbols into multiple parts, each of which they track separately. What is more, Sulong adds another level of diversity as it supports values of other Truffle languages and generally represents its own IR-level ones with Java primitives or custom objects. To mitigate this issue, it defines a common abstraction for values in the form of the LLVMDebugValue interface as shown in Listing 6.1. Each implementation abstracts a specific value to a bitpattern and provides random access within it.

As Listing 6.1 shows, LLVMDebugValue defines methods to derive common primitive interpretations from any region within the bitpattern it wraps. This includes signed and unsigned integers and characters of arbitrary bitwidth, single- and double-precision floating point values as well as addresses. LLVM IR also supports 80-bit wide floating point values with its x86fp80 type. To allow it to display symbols for which compilers emit values of this type, LLVMDebugValue also provides an accessor for it. Furthermore, the interface provides a method to dereference a pointer within its bitpattern and return the result as another LLVMDebugValue. This is only possible in certain cases though, and callers need to be certain that the address is valid.

To illustrate the concept of LLVMDebugValue with an example, take the C language structured type struct {int a; float b;}. At higher optimization levels LLVM often represents such a composite value as a single 64-bit integer. In turn, Sulong stores it as a Java long. The corresponding LLVMSourceType describes how to interpret its bitpattern. The LLVMDebugValue implementation for long constants lets users conveniently retrieve member a in the form of a 32-bit signed integer in two's complement encoding at offset 0 and member b as a single-precision floating point value at offset 32.
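The following self-contained Java sketch demonstrates this bit-level access on a long constant. The class name and the assumption that bit offset 0 maps to the least-significant bits are illustrative and not taken from Sulong's code base.

// Hypothetical sketch: interpreting regions of a long constant's bitpattern as the
// members of struct { int a; float b; }, assuming offset 0 is the least-significant bit.
public final class LongConstantDebugValue {

    private final long bits;

    public LongConstantDebugValue(long bits) {
        this.bits = bits;
    }

    /** Reads a 32-bit signed integer starting at the given bit offset. */
    public int readInt(int bitOffset) {
        return (int) (bits >>> bitOffset);
    }

    /** Reads a single-precision float starting at the given bit offset. */
    public float readFloat(int bitOffset) {
        return Float.intBitsToFloat(readInt(bitOffset));
    }

    public static void main(String[] args) {
        // struct { int a; float b; } with a = 42 and b = 1.5f, packed into one long
        long packed = (Integer.toUnsignedLong(Float.floatToIntBits(1.5f)) << 32)
                | Integer.toUnsignedLong(42);
        LongConstantDebugValue value = new LongConstantDebugValue(packed);
        System.out.println("a = " + value.readInt(0));    // 42
        System.out.println("b = " + value.readFloat(32)); // 1.5
    }
}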

6.3.1.1 Representation Abstraction

Sulong implements LLVMDebugValue for every kind of value it uses at runtime. Besides Java primitives, these include classes for 80-bit floats, variable-width integers, function pointers, memory addresses and various kinds of vectors. Since LLVM IR debug information uses regular value list entries to encode constants, Sulong already has a scheme to convert them to these runtime representations and thus avoids requiring an additional abstraction for this case.

Values from other languages in the form of TruffleObject instances usually do not provide a defined memory representation. As a result, LLVMDebugValue cannot apply its usual bitpattern interpretation. The representation layer accounts for this by explicitly displaying this case to the user in combination with the unindexed value. Some TruffleObject implementations can store their content into native memory and provide a pointer to it, but this often entails changing their internal state. As symbol inspection in a debugger should take care to avoid side effects, LLVMDebugValue does not use this feature.

As we mentioned previously, the llvm.dbg.value intrinsic supports tracking defined parts of local variables separately. For this case, Sulong defines a specific LLVMDebugValue implementation that can aggregate multiple instances of this interface. When a caller tries to access a value at a particular offset into it, this class determines which of its fields represents that location and relays the call to it with an adapted offset. The LLVMDebugValue specialization for Sulong's internal vector classes follows the same approach.
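As a purely illustrative sketch of this aggregation idea, the class below routes a read at a given bit offset to the fragment covering that offset and rebases the offset accordingly. The interfaces and names are assumptions, not Sulong's actual classes.

// Hypothetical sketch: aggregating separately tracked fragments of one symbol and
// relaying offset-based reads to the fragment that covers the requested bits.
import java.util.ArrayList;
import java.util.List;

public final class AggregatingDebugValueSketch {

    /** Minimal stand-in for an offset-addressable debug value. */
    interface DebugValueSketch {
        long readBits(long bitOffset, int bitSize);
    }

    static final class Fragment {
        final long startBit;   // offset of this fragment within the whole symbol
        final int lengthBits;
        final DebugValueSketch value;

        Fragment(long startBit, int lengthBits, DebugValueSketch value) {
            this.startBit = startBit;
            this.lengthBits = lengthBits;
            this.value = value;
        }
    }

    private final List<Fragment> fragments = new ArrayList<>();

    void addFragment(Fragment fragment) {
        fragments.add(fragment);
    }

    /** Reads bits of the aggregated symbol by delegating to the covering fragment. */
    long readBits(long bitOffset, int bitSize) {
        for (Fragment fragment : fragments) {
            long end = fragment.startBit + fragment.lengthBits;
            if (bitOffset >= fragment.startBit && bitOffset + bitSize <= end) {
                // Rebase the offset so the fragment sees it relative to its own start.
                return fragment.value.readBits(bitOffset - fragment.startBit, bitSize);
            }
        }
        throw new IllegalArgumentException("no fragment covers the requested bits");
    }

    public static void main(String[] args) {
        AggregatingDebugValueSketch symbol = new AggregatingDebugValueSketch();
        // Two 32-bit fragments of a 64-bit symbol, each backed by a constant.
        symbol.addFragment(new Fragment(0, 32, (off, size) -> 42L));
        symbol.addFragment(new Fragment(32, 32, (off, size) -> 7L));
        System.out.println(symbol.readBits(0, 32));  // 42
        System.out.println(symbol.readBits(32, 32)); // 7
    }
}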

6.3.1.2 Location Abstraction

Source-level symbols can reside in native memory. This includes global variables, object pointers and local variables that llvm.dbg.declare describes. To support this case, Sulong defines an implementation of LLVMDebugValue that dereferences a pointer to locate and access the bitstream it wraps. This relies on Java's Unsafe class and, thus, using it with bad pointers can result in a segmentation fault. Sulong requires LLVM debug information to be accurate to avoid these problems. Concretely, it assumes that all values that are of source-level reference types or represent artificial object pointers are always safe to dereference unless they are null. Sulong relies on LLVM frontends to ensure that this assumption is accurate. Although at least in C++ it is possible to create invalid references, the corresponding language standard treats this as undefined behavior and compilers should detect it. We believe that the enhanced usability this assumption allows for justifies the risk. On the other hand, pointers to local variables that the llvm.dbg.declare intrinsic describes are provably safe to dereference as frontends necessarily emit the corresponding stack allocation. Other debuggers, e.g. GDB or LLDB, are usually implemented in low-level languages. They can use signal handlers to safely recover from segmentation faults. Sulong cannot use the same strategy though, as the JVM actively prevents it from handling the corresponding interrupt.
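For illustration, the sketch below reads a value from a native address via sun.misc.Unsafe, which is the general mechanism referred to above. The class itself, its allocation of test memory and its error handling are assumptions and not Sulong's implementation.

// Hypothetical sketch: reading a primitive value from a raw native address with
// sun.misc.Unsafe. Dereferencing an invalid address crashes the JVM, which is why
// the surrounding debugger logic must only pass addresses it trusts.
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public final class NativeReadSketch {

    private static final Unsafe UNSAFE = loadUnsafe();

    private static Unsafe loadUnsafe() {
        try {
            Field field = Unsafe.class.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            return (Unsafe) field.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Unsafe not available", e);
        }
    }

    /** Reads a 32-bit integer stored at the given native address. */
    static int readInt(long address) {
        return UNSAFE.getInt(address);
    }

    public static void main(String[] args) {
        // Allocate a small native buffer so the example has a valid address to read.
        long address = UNSAFE.allocateMemory(4);
        try {
            UNSAFE.putInt(address, 42);
            System.out.println(readInt(address)); // 42
        } finally {
            UNSAFE.freeMemory(address);
        }
    }
}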

LLVM treats IR-level global variables like pointers and assumes frontends will allocate native memory for each of them. Sulong, however, requires a more sophisticated storage scheme that also supports Truffle interop values without a defined native representation. As a solution, the interpreter stores global variables in the language context and uses special descriptors rather than pointers to identify them. However, this approach fails once a program needs to actually retrieve the address of the global variable. In this case, Sulong falls back to a different storage scheme that does allocate native memory and accepts the limitations this poses. The LLVMDebugValue implementation for global variables wraps both the language context as well as Sulong’s descriptor for the IR-level symbol. It determines the current storage strategy whenever one of its methods is called and builds the appropriate specialization for either the current value within the managed container or the allocated memory region.

6.3.2 Representation Layer

Sulong uses different implementations of its abstract LLVMDebugObject class to represent the values of source-level symbols. Each of them specializes to a certain kind of LLVMSourceType to provide a language-specific view of a generic LLVMDebugValue. This includes overriding toString to return a source-level representation of the actual value. For the same reason LLVMDebugObject also implements the TruffleObject interface and supports the KEYS and READ messages to optionally provide access to any fields the corresponding type defines.

In the aforementioned findLocalScopes method the interpreter uses a factory method in this class to create appropriate instances for a specific type on demand. It then supplies them as entries to a Truffle Scope. Since the framework calls both findMetaObject and findSourceLocation only on a symbol’s value rather than on its name, LLVMDebugObject needs to store both its LLVMSourceLocation and its LLVMSourceType for access in these methods. This is due to the restriction that, in the debugging framework, names must be Java Strings rather than arbitrary objects with a toString representation. Sulong does not just store an LLVMSourceSymbol in LLVMDebugObject since it also instantiates this class for a symbol’s members. For the same reason it stores an offset into its LLVMDebugValue instance at which the field it represents begins. Conveniently, its type already encodes its length. Source-Level Symbol Inspection 61

Primitive: This implementation of LLVMDebugObject uses the functions of LLVMDebugValue to retrieve the appropriate interpretation of its wrapped value based on the kind of primitive type. The factory method instantiates this subclass for the Primitive subclass of LLVMSourceType or Decorator instances that wrap it.

Structured: A Structured source-level object can provide named fields. Upon access to one of them this class instantiates another LLVMDebugObject with its offset. In its toString method, this subclass optionally provides the memory address of its value to the user.

Static Member: As Figure 6.1 shows, static members of source-level structured types also have their own subclass of LLVMSourceType. However, contrary to this implementation of LLVMDebugObject, it only represents a single named member with an attached LLVMDebugValue. Sulong uses the field's static value and its type to instantiate another subclass of LLVMDebugObject. This one instead aggregates all static fields of a parent type into a group to separate them from dynamic ones and focus the user's attention on the latter as they are typically more interesting.

Array: Similar to Structured, this class provides fields. While internally they are indexed numerically, this subclass simply wraps the corresponding integers into Java Strings to adhere to the debugging framework's expectation of purely name-based access to members of scope objects. The toString method of this subclass expects an array to be allocated in memory and simply returns its address per default. However, Sulong also uses Array for LLVMDebugValue instances based on its internal classes for IR-level vectors. These do not have a memory address. For such a case, toString returns “” while the object still provides access to the values of its fields. The only other exception to the String representation is character arrays. For the users' convenience, toString returns not only the string's address but also its content.

Enum: This subclass does not provide any members. Instead, it interprets its LLVMDebug- Value instance as an integer. It then retrieves the label corresponding to that number from its LLVMSourceType enumeration type and returns it in its toString representation.

Pointer: This subclass represents all values of Pointer type. This also includes source-level reference types. In general, it does not provide members and simply interprets its LLVMDebugValue as a memory address to display to the user. However, for reference types and object pointers, this subclass actively dereferences the address to instead display the value it refers to as well as its members if it has any. This constitutes the only use for the dereferencePointer function that we can see in Listing 6.1. Upon access, Pointer also wraps the resulting value in a new LLVMDebugObject with its own pointer type’s basetype. A simplified sketch of this dereferencing decision is shown after the description of the Interop subclass below.

Interop: This subclass of LLVMDebugObject wraps a value from another Truffle guest lan- guage. It does not attempt any interpretation of it, but merely displays it as the string “” to indicate the symbol’s state. We already explained in Section 6.3.1.1 why we cannot do better. However, this LLVMDebugObject provides the foreign value as its single member. Users can still inspect its internal structure since it is necessarily an instance of TruffleObject. For reasons we already stated previously in this chapter though, the debugger is unable to display its source-level type.
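As announced above for the Pointer subclass, the following minimal Java sketch illustrates the dereferencing decision: unsafe pointers are only rendered as an address, while reference types and object pointers are resolved so the pointed-to value can be displayed. All names (PointerDebugObjectSketch, Dereferencer, isSafeToDereference) are illustrative assumptions rather than Sulong's actual API.

// Illustrative sketch of the pointer display logic described above.
final class PointerDebugObjectSketch {

    private final PointerSourceType type;
    private final long address;
    private final Dereferencer dereferencer;

    PointerDebugObjectSketch(PointerSourceType type, long address, Dereferencer dereferencer) {
        this.type = type;
        this.address = address;
        this.dereferencer = dereferencer;
    }

    Object resolve() {
        if (type.isSafeToDereference()) {
            // wrap the target value with the pointer's base type so its members can be shown
            return dereferencer.dereference(address, type.getBaseType());
        }
        return String.format("0x%x", address); // plain address display for unsafe pointers
    }

    interface PointerSourceType {
        boolean isSafeToDereference();
        Object getBaseType();
    }

    interface Dereferencer {
        Object dereference(long address, Object baseType);
    }
}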

6.3.3 Value Tracking

Sulong tracks source-level global variables, as well as static local ones, in the language context. It maps their corresponding LLVMSourceSymbol to an LLVMDebugValue instance whose concrete class depends on the IR-level value. In most cases this will be an LLVM IR global variable; however, constants are also possible. In any case, tracking source-level static variables never causes runtime overhead for Sulong. Even a change to an IR-level global variable would not invalidate Sulong’s descriptor for it. Therefore, the LLVMDebugValue implementation that wraps it is essentially a constant as well.

LLVM inserts a call to the llvm.dbg.declare intrinsic for each local variable of a function that spends its entire lifetime at a particular region in memory. Unless its first argument is explicitly undefined, it references the LLVM IR instruction that produced the pointer. Since Sulong creates a slot on the Truffle stack-frame for each SSA-value, it can determine already in the parser which of them will contain the pointer at runtime. It uses the language context to store this information and map the corresponding LLVMSourceSymbol to it. When the debugging framework calls findLocalScopes as we discussed previously, Sulong traverses the LLVMSourceLocation scope hierarchy of a suspended node and computes the current value of each symbol. For non-static symbols it queries the context object whether it contains such a declaration. If so, Sulong obtains the actual memory address from the stack-frame and uses it to build the corresponding LLVMDebugValue.
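A minimal Java sketch of this resolution step follows; it assumes illustrative names (DeclarationRegistry, AddressBackedValue) and is not Sulong's actual implementation.

// Illustrative sketch: resolving a local declared via llvm.dbg.declare when
// the debugger requests the scopes of a suspended node.
final class DeclaredLocalResolver {

    static Object resolve(SourceSymbol symbol, DeclarationRegistry registry, Frame frame) {
        Integer pointerSlot = registry.getDeclaredSlot(symbol);
        if (pointerSlot == null) {
            return null; // the symbol is not backed by an llvm.dbg.declare entry
        }
        long address = frame.getAddress(pointerSlot); // pointer stored by the IR instruction
        return new AddressBackedValue(address);       // stands in for the LLVMDebugValue built here
    }

    interface SourceSymbol { }

    interface DeclarationRegistry {
        Integer getDeclaredSlot(SourceSymbol symbol);
    }

    interface Frame {
        long getAddress(int slot);
    }

    static final class AddressBackedValue {
        final long address;
        AddressBackedValue(long address) { this.address = address; }
    }
}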

LLVM inserts calls to the llvm.dbg.value intrinsic wherever the value of a local variable changes in a function. It tracks symbols as they switch between multiple SSA-registers, stack as well as heap memory, and even constants. Unlike with llvm.dbg.declare, the containing function’s control flow at runtime determines which call to llvm.dbg.value describes a local variable at a particular statement. This requires Sulong to implement the intrinsic and emit calls to it into the Truffle AST. The corresponding node stores the symbol’s new value, together with a factory object to wrap it into an LLVMDebugValue, onto its stack-frame. The implementation of this intrinsic also considers the special case where LLVM tracks individual parts of a local variable separately. In this case, Sulong reads the aggregated LLVMDebugValue which is already present on the stack-frame and relays the partial value as well as the factory object to it. As an identifier, the target frame-slot references the LLVMSourceSymbol that describes the local variable. This allows Sulong to identify all tracked symbols in findLocalScopes by analyzing the provided stack-frame.
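The following minimal Java sketch shows what a node implementing this intrinsic could look like: on execution it writes the symbol's new value, wrapped for later inspection, into the frame slot associated with the symbol. Frame, ValueWrapper and the slot identifier are simplified placeholders for the corresponding Truffle and Sulong concepts, not their real API.

// Illustrative sketch of an llvm.dbg.value node.
final class DbgValueNodeSketch {

    private final int targetSlot;        // frame slot identified by the LLVMSourceSymbol
    private final ValueWrapper wrapper;  // factory that turns an IR-level value into a debug value

    DbgValueNodeSketch(int targetSlot, ValueWrapper wrapper) {
        this.targetSlot = targetSlot;
        this.wrapper = wrapper;
    }

    void execute(Frame frame, Object newIrLevelValue) {
        // overwrite the previously tracked value of the local variable
        frame.setObject(targetSlot, wrapper.wrap(newIrLevelValue));
    }

    interface Frame {
        void setObject(int slot, Object value);
        Object getObject(int slot);
    }

    interface ValueWrapper {
        Object wrap(Object irLevelValue);
    }
}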

Tracking values of source-level local variables with llvm.dbg.value has a significant impact on runtime performance. At the very least it requires additional copy operations on the stack-frame. What is more, this tracking even includes symbols that LLVM had lowered to constants. Sulong tries to mitigate this issue by handling effectively final local variables, i.e. those for which only a single call to llvm.dbg.value exists, in a more efficient way. Depending on the corresponding IR-level value, it stores the symbol either as a constant in the language context, like it does for static symbols, or as a reference to a value in a specific frame-slot, similar to its implementation of llvm.dbg.declare. However, performance is still far from optimal. In consequence, Sulong does not enable value tracking by default. Instead, its users need to explicitly set the --llvm.enableLVI argument to true when launching the interpreter.
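A simplified sketch of this parser-time decision follows; all names are illustrative stand-ins rather than Sulong's actual classes.

import java.util.List;

// Illustrative sketch: deciding at parse time whether a local variable needs
// dynamic value tracking or can be recorded statically in the language context.
final class LocalVariableRegistration {

    static void register(Object symbol, List<DbgValueCall> calls, Registry registry) {
        if (calls.size() == 1) {
            // effectively final: no runtime tracking needed
            DbgValueCall only = calls.get(0);
            if (only.describesConstant()) {
                registry.storeConstant(symbol, only.getConstant());
            } else {
                registry.storeFrameSlotReference(symbol, only.getFrameSlot());
            }
        } else {
            // the value changes over time and must be tracked dynamically
            registry.markAsDynamicallyTracked(symbol);
        }
    }

    interface DbgValueCall {
        boolean describesConstant();
        Object getConstant();
        int getFrameSlot();
    }

    interface Registry {
        void storeConstant(Object symbol, Object constant);
        void storeFrameSlotReference(Object symbol, int frameSlot);
        void markAsDynamicallyTracked(Object symbol);
    }
}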

Without applying any LLVM transformation passes, both Clang and DragonEgg describe local variables exclusively with llvm.dbg.declare. What is more, frontends always emit source-level static or global variables as IR-level global variables. Since Sulong can effectively avoid dynamically tracking either kind of these symbols at runtime, debugging unoptimized bitcode files does not exhibit performance overhead beyond the low impact of Truffle’s AST instrumentation.

Chapter 7

Case Study

This case study demonstrates how debugging in Sulong works using a Ruby native extension implemented in C++. The extension computes the result of an arithmetic expression in postfix notation. The program uses Ruby code only to parse the input expression and to pass it to a C++ class that actually evaluates it. To execute and debug the case study we use TruffleRuby and Sulong in GraalVM 0.32 and its integrated connector to the Chrome Devtools.

In this case study we demonstrate debugging C++ code that we implemented as a native extension for a Ruby program. It consists of a calculator for arithmetic expressions in postfix notation. The program calc.rb merely parses the input and invokes our actual demo code. We will inspect our calculator in the Chrome Devtools as it evaluates the expression “3 4 + =”.

To begin our case study we use the command “$GRAALVM/bin/ruby --llvm.enableLVI=true --inspect=true calc.rb 3 4 + =” to execute calc.rb on TruffleRuby in GraalVM. That Ruby interpreter automatically loads Sulong, the llvm Truffle language, to execute native extensions compiled to LLVM IR. To enable source-level symbol inspection, or more precisely value tracking, we specify --llvm.enableLVI=true. The second option, --inspect=true, instructs GraalVM to enable the Chrome Inspector debugger tool. Before it begins executing the guest language program, this also forces Truffle to instrument all AST nodes for which this is possible.

Upon launch, Chrome Inspector prints a URL to stdout for opening the Chrome Devtools in a supported browser. It then blocks the interpreter until the user opens that link. Once that happens, Chrome Inspector’s default behavior is to suspend guest language execution on the first source-level statement it encounters.

7.1 Program Description

The C++ file calc.cpp, as shown in Listing 7.1, defines a stack machine for simple integer arithmetic calculations in the form of the Calculator class. Its internal members include a singly-linked list, whose nodes are C-style structs, to represent a number stack and a counter to track the number of elements on it. Calculator defines instance methods to push numbers onto and pop them off its stack. The class further supports binary addition, subtraction, multiplication and division as well as an auxiliary command to print the current content of its stack. These operations are encoded as an enum. Users can pass its values as arguments to the doOp instance method, which also receives a pointer to a function for printing stack entries. Lastly, the class defines the getResult method to pop off and return the top of the stack after the expression has been evaluated in its entirety.

The C++ file calcExt.cpp shown in Listing 7.2 wraps an instance of Calculator in a Ruby module with the name CalcExt. The Ruby program calc.rb, as shown in Listing 7.3, imports this native extension and uses it to compute the result of an arithmetic expression in postfix notation which it receives as arguments. It also defines a lambda for printing entries of the calculator’s number stack in doOp.

 1  #include <stdlib.h>
 2  #include <stdio.h>
 3
 4  /* Entry in the value stack */
 5  struct StackEntry {
 6      int num;
 7      struct StackEntry *next;
 8  };
 9
10  /* Supported operations */
11  typedef enum Op { ADD = 0, SUB = 1, MUL = 2, DIV = 3, PRINT = 4 } Op;
12
13  class Calculator {
14  private:
15      int stackSize;
16      struct StackEntry *top;
17
18      /* Pop a value from the stack */
19      int pop() {
20          if (this->stackSize < 1) {
21              printf("Error: Popping from empty stack!\n");
22              abort();
23          }
24          struct StackEntry *topEntry = this->top;
25          int val = topEntry->num;
26          this->top = topEntry->next;
27          free(topEntry);
28          this->stackSize -= 1;
29          return val;
30      }
31
32  public:
33      Calculator() : stackSize(0), top(0) {
34      };
35
36      /* Push a number onto the stack */
37      void push(int num) {
38          struct StackEntry *newEntry;
39          newEntry = (struct StackEntry *) calloc(1, sizeof(struct StackEntry));
40          newEntry->num = num;
41          newEntry->next = this->top;
42          this->top = newEntry;
43          this->stackSize += 1;
44      };
45
46      /* Apply a binary operation to the top two stack elements and push the result,
47       * or print the current content of the stack top-down */
48      void doOp(Op &op, void (*printStackEntry)(int)) {
49          if (op == PRINT) {
50              for (struct StackEntry *entry = this->top; entry; entry = entry->next) {
51                  const int val = entry->num;
52                  printStackEntry(val);    /* |Inspection Point| */
53              }
54              return;
55          }
56          const int rhs = pop();
57          const int lhs = pop();
58          int result = 0;
59          switch (op) {
60              case ADD: result = lhs + rhs; break;
61              case SUB: result = lhs - rhs; break;
62              case MUL: result = lhs * rhs; break;
63              case DIV: result = lhs / rhs; break;
64              default: printf("Unknown Op: %d\n", op); abort(); break;
65          }
66          push(result);
67      };
68
69      /* Return the only value on the stack as result */
70      int getResult() {
71          if (this->stackSize != 1) {
72              printf("Error: Stack contains %d elements!\n", this->stackSize);
73              abort();
74          }
75          return pop();
76      };
77  };

Listing 7.1: C++ code file (calc.cpp) that defines a stack-based calculator

 1  #include "ruby.h"
 2  #include "calc.cpp"
 3
 4  /* Backing Calculator */
 5  Calculator *Calc = new Calculator();
 6
 7  /* Ruby method stubs */
 8  extern "C" VALUE method_pushNumber(VALUE self, int num) {
 9      Calc->push(num);
10      return (VALUE) 0;
11  }
12  extern "C" VALUE method_doOp(VALUE self, Op op, void (*printStackEntry)(int)) {
13      Calc->                          /* |Stepping Point 1| */
14          doOp(op, printStackEntry);  /* |Stepping Point 2| */
15      return 0;                       /* |Stepping Point 3| */
16  }
17  extern "C" VALUE method_getResult(VALUE self) {
18      int result = Calc->getResult();
19      return INT2NUM(result);
20  }
21
22  /* setup Ruby module */
23  extern "C" void Init_CalcExt() {
24      VALUE CalcExt = rb_define_module("CalcExt");
25
26      /* initialize module methods */
27      rb_define_method(CalcExt, "pushNumber", (VALUE (*)(...)) method_pushNumber, 1);
28      rb_define_method(CalcExt, "doOp", (VALUE (*)(...)) method_doOp, 2);
29      rb_define_method(CalcExt, "getResult", (VALUE (*)(...)) method_getResult, 0);
30
31      /* initialize module fields */
32      rb_define_const(CalcExt, "OP_ADD", INT2NUM(Op::ADD));
33      rb_define_const(CalcExt, "OP_SUB", INT2NUM(Op::SUB));
34      rb_define_const(CalcExt, "OP_MUL", INT2NUM(Op::MUL));
35      rb_define_const(CalcExt, "OP_DIV", INT2NUM(Op::DIV));
36      rb_define_const(CalcExt, "OP_PRINT", INT2NUM(Op::PRINT));
37  }

Listing 7.2: Ruby C++ extension (calcExt.cpp).

 1  require "./CalcExt.su"
 2  include CalcExt
 3
 4  stack_entry_printer = -> (num) { puts ' -> ' + num.to_s }
 5  for operand in ARGV
 6      case operand
 7      when '+'
 8          CalcExt::doOp(CalcExt::OP_ADD, stack_entry_printer)
 9      when '-'
10          CalcExt::doOp(CalcExt::OP_SUB, stack_entry_printer)
11      when '*'
12          CalcExt::doOp(CalcExt::OP_MUL, stack_entry_printer)
13      when '/'
14          CalcExt::doOp(CalcExt::OP_DIV, stack_entry_printer)
15      when '='
16          puts 'Stack Content (top-down):'
17          CalcExt::doOp(CalcExt::OP_PRINT, stack_entry_printer)   # |Point 1|
18      else
19          num = Integer(operand)
20          CalcExt::pushNumber(num)
21      end
22  end
23  result = CalcExt::getResult()
24  puts 'The result is: ' + result.to_s

Listing 7.3: Ruby program (calc.rb) to compute the value of an arithmetic expression in postfix notation

In order to execute the program of this case study one first needs to compile the C++ extension it uses. Listing 7.4 defines another Ruby program, extconf.rb, which generates a Unix Makefile for this purpose. When generated by TruffleRuby, that Makefile invokes Clang to compile the C++ files into LLVM bitcode files that include debug information, calls LLVM’s opt tool to apply the mem2reg, always-inline and constprop optimization passes on them, and finally combines them into a single *.su file for Sulong to execute. As Listing 7.3 shows in line one, calc.rb loads this file prior to importing the CalcExt module. GraalVM’s ruby executable detects this file type and uses Sulong to parse it.

1  require 'mkmf'
2  extension_name = 'CalcExt'
3  dir_config(extension_name)
4  $CPPFLAGS = ' -std=c++11'
5  create_makefile(extension_name)

Listing 7.4: Ruby program (extconf.rb) to create a Makefile for the native extension CalcExt

This program is suitable for demonstrating Sulong’s debugging support as it exercises core C++ language features. What is more, this case study demonstrates the possibility of debugging both a Ruby program as well as a native extension at source-level in the same debugger.

7.2 Analysis

In this section we analyze how Sulong enables various debugging features. First we describe Sulong’s Truffle AST for the method_doOp function and how it enables source-level stepping. Then we take a closer look at how the interpreter applies LLVM IR debug information to facilitate language-specific display of the guest language program’s runtime state.

After launching calc.rb as we described previously and connecting the Chrome Devtools debugger, the user can step through the Ruby program. The first statement in it (at line 1 in calc.rb) loads our C++ extension and executes its initialization function. As this constitutes a function call, the user can step into it and inspect the execution of the Init_CalcExt function which is defined in calcExt.cpp. In traditional setups, developers can debug either the Ruby code or the C/C++ code at source-level, but not both in the same frontend at the same time. Truffle debugging tools, however, do support this scheme.

7.2.1 Stepping

The function method_doOp acts as a proxy between the doOp functions on the CalcExt Ruby module and a Calculator instance. For the input expression we selected, “3 4 + =”, the program calls it twice: once to relay the addition and a second time to facilitate the stack inspection. In this section we analyze the function’s representation at runtime and how this enables various stepping actions. For this purpose we provide Listing 7.5 to show the relevant parts of its representation in LLVM IR and Figure 7.1 to display the Truffle AST that Sulong produces for it. To keep the example concise, we removed the calls to the llvm.dbg.* intrinsics in both.

 1  define i8* @method_doOp(i8*, i32, void (i32)*) !dbg !119 {
 2    %4 = alloca i32, align 4
 3    store i32 %1, i32* %4, align 4
 4    %5 = load %class.Calculator*, %class.Calculator** @Calc, align 8, !dbg !128
 5    call void @_ZN10Calculator4doOpER2OpPFviE(%class.Calculator* %5, i32* dereferenceable(4) %4, void (i32)* %2), !dbg !129
 6    ret i8* null, !dbg !130
 7  }
 8
 9  define void @_ZN10Calculator4doOpER2OpPFviE(%class.Calculator*, i32* dereferenceable(4), void (i32)*) !dbg !131 {
10    %4 = load i32, i32* %1, align 4, !dbg !138
11    ; ...
12  }
13
14  ; ...
15  !119 = !DISubprogram(name: "method_doOp", scope: !47, file: !47, line: 12, type: !120, scopeLine: 12, unit: !2, ...)
16  ; ...
17  !128 = !DILocation(line: 13, column: 5, scope: !119)
18  !129 = !DILocation(line: 14, column: 9, scope: !119)
19  !130 = !DILocation(line: 15, column: 5, scope: !119)
20  !131 = !DISubprogram(name: "doOp", scope: !49, file: !6, line: 48, type: !64, scopeLine: 48, unit: !2, ...)
21  ; ...
22  !138 = !DILocation(line: 49, column: 13, scope: !139)
23  !139 = !DILexicalBlock(scope: !131, file: !6, line: 49, column: 13)

Listing 7.5: Partial LLVM IR of the method_doOp function at calcExt.cpp:12

Listing 7.5 shows the LLVM IR that represents method_doOp. The function receives three arguments which LLVM implicitly names %0, %1 and %2. This corresponds to their index in the value list. The first argument is unused and only required by Ruby. The second one contains an integer primitive that refers to a value of the Op enum, and the third a pointer to a function that consumes a number. method_doOp contains five instructions. First, the program allocates memory on the stack to hold an integer (line 2) and stores %1 into it (line 3). This is necessary since the doOp instance method of Calculator expects its Op argument as a reference rather than a value. The third instruction (line 4) then dereferences the memory location of the global variable Calc to retrieve the pointer to a Calculator instance that it stores. This is also the first instruction with available debug information. Like its successors, its dbg attachment designates a DILocation descriptor that links to the DISubprogram of method_doOp as its parent. The fourth instruction (line 5) then performs the actual call to Calculator::doOp that we see in the function’s C++ code in Listing 7.2. In LLVM IR that function uses C++ linkage while method_doOp uses the simpler C linkage. Hence both functions feature vastly dissimilar naming conventions at IR-level. Finally (line 6), the function returns a null pointer.

Sulong represents each basic block of an LLVM IR function as a node that sequentially executes its instructions and returns the block index of its successor at runtime. The single child of a RootNode, which Figure 7.1 labels body, is the parent for these nodes in the AST and handles their dynamic dispatch. It is also responsible for copying the arguments to the virtual stack-frame and thus the interpreter applies the RootTag to it. Sulong also links it to the root’s LLVMSourceLocation to enable Truffle to instrument this node. Figure 7.1 shows that individual basic block nodes, on the other hand, remain neither tagged nor instrumentable. The same holds for the RootNode itself. For the purpose of instrumentation, providing a node with the RootTag is sufficient. Figure 7.1 further illustrates that only LLVM IR instructions with a dbg attachment are instrumented in the AST. Each of them also provides the StatementTag and links to an LLVMSourceLocation that is within the scope of the one attached to the RootNode. Function calls additionally bear the CallTag. The RootNode also references the function’s source-level name to enable proper display of the call stack at runtime.

Figure 7.1: Sulong AST of the method_doOp function in Listing 7.5

Figure 7.2: Call stack for method_doOp at Stepping Point 2 in Chrome Devtools

Stepping Point 1 When a Truffle debugger steps into a function call, the framework halts execution at the first instrumented node with a StatementTag it encounters. For method_doOp this is the load instruction that is shown on line 4 in Listing 7.5 and corresponds to Stepping Point 1 in Figure 7.1 and Listing 7.2. Stepping over or stepping into at this point in the program both lead Truffle to suspend execution at the node marked as Stepping Point 2 since there is no function call in between. Though the attached LLVMSourceLocation puts the AST nodes of both points on different lines, they belong to the same C++ statement in calcExt.cpp. As we explained in Chapter 5, a dbg attachment to a particular instruction in LLVM IR only indicates that it belongs to at least a sub-expression in the original program. LLVM debug information does not explicitly identify individual statements.

Stepping Point 2 Stepping Point 2 marks a node with a CallTag. Originally, the Truffle debugging framework treated such locations as the boundaries between source-level functions. While stepping into a function call only requires halting the debuggee at the next StatementTag on the execution path, stepping over one entails suspending this search until after the boundary node has finished executing. Ultimately, these semantics shifted from the CallTag to the RootTag. However, Truffle still uses the CallTag to reconstruct the source-level call hierarchy. Figure 7.2 shows the call stack of method_doOp at Stepping Point 2 for this case study in the Chrome Devtools debugger. All entries besides method_doOp belong to the Ruby program. As the figure shows, Sulong displays the source-level name of the executing method together with the filename and the line number of the currently executing statement.

Stepping Point 3 The debugging framework can step out of a function by resuming execution until it encounters a RootTag at a higher stack-depth than the last halted node and then suspending the guest language program at the first node with a StatementTag. The return statement at Stepping Point 3 concludes the body of method_doOp. Stepping into and stepping over at this point both have the same result as stepping out of the function.

Source-level breakpoints do not depend on tags. Truffle can suspend execution at any instrumented node whose SourceSection matches the requested location. The framework only imposes the restriction that the lexical scope of the corresponding RootNode must contain the statement’s declaration. To fulfill this condition despite LLVM debug information not describing the extent of scopes, Sulong declares, as Figure 7.1 also shows, the LLVMSourceLocation of each function to extend until the end of the file.
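To make this condition concrete, the following minimal Java sketch illustrates a breakpoint check of this kind. It is a simplified illustration under assumed names (BreakpointMatcher, NodeInfo), not Truffle's or Sulong's actual API.

// Illustrative sketch: an instrumented node qualifies for a breakpoint if its
// source section covers the requested line and the enclosing function's scope
// (which Sulong extends to the end of the file) contains that line as well.
final class BreakpointMatcher {

    static boolean matches(NodeInfo node, String file, int line) {
        return node.getSourceFile().equals(file)
                && node.getStartLine() <= line && line <= node.getEndLine()
                && node.getRootScopeStartLine() <= line;
    }

    interface NodeInfo {
        String getSourceFile();
        int getStartLine();
        int getEndLine();
        int getRootScopeStartLine();
    }
}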

7.2.2 Symbol Inspection

Figure 7.3: Symbol Inspection in the Chrome Devtools at the Inspection Point in calc.cpp

The doOp function of the Calculator class performs either a binary arithmetic operation on the top two stack elements or prints the content of the stack. The instruction marked Inspection Point at line 52 in calc.cpp is part of the latter action. At that point in the program, the source-level scope hierarchy contains diverse kinds of symbols with different runtime representations in Sulong. Figure 7.3 illustrates the language-specific view that Sulong provides on them in the Chrome Devtools. We demonstrate the concept of variable inspection in this interpreter by exercising the execution of findLocalScopes for the corresponding node. To support this, we provide an abstract view of the source-level scope hierarchy at Inspection Point in Figure 7.4 and the relevant parts of the function’s representation in LLVM IR in Listing 7.6.

 1  @Calc = global %class.Calculator* null, align 8, !dbg !0
 2
 3  define void @_ZN10Calculator4doOpER2OpPFviE(%class.Calculator*, i32*, void (i32)*) {
 4    call void @llvm.dbg.value(metadata %class.Calculator* %0, i64 0, metadata !132, metadata !85)
 5    call void @llvm.dbg.value(metadata i32* %1, i64 0, metadata !134, metadata !85)
 6    call void @llvm.dbg.value(metadata void (i32)* %2, i64 0, metadata !136, metadata !85)
 7    %4 = load i32, i32* %1, align 4, !dbg !138
 8    %5 = icmp eq i32 %4, 4, !dbg !140
 9    br i1 %5, label %6, label %18, !dbg !141
10
11  ; ...

41  !132 = !DILocalVariable(name: "this", ...)
42  !134 = !DILocalVariable(name: "op", ...)
43  !136 = !DILocalVariable(name: "printStackEntry", ...)
44  !145 = !DILocalVariable(name: "entry", ...)
45  !153 = !DILocalVariable(name: "val", ...)

Listing 7.6: Partial LLVM IR of the Calculator::doOp function at calc.cpp:48

In findLocalScopes, Sulong traverses the chain of LLVMSourceLocation instances attached to the node for which the debugging framework requested the scopes. In the process it resolves each LLVMSourceSymbol that is visible at the node’s location to an LLVMDebugObject that attempts to provide a language-specific view of the symbol’s current value for display in the debugger frontend. In the following we describe the abstractions Sulong builds for each of the available scope members. Figure 7.5 shows a visual representation of this information.

• const int val

The instruction that marks our Inspection Point resides at line 52 in calc.cpp. Its immediate scope is the block that forms the body of the for loop in which the program prints the current content of the calculator’s number stack. The arithmetic expression we supplied as input for our program lets it reach Inspection Point exactly once. At that point, the number 7 is the first and only entry on its stack.

The local variable val is declared prior to the suspended statement in the loop body. Listing 7.6 shows that it corresponds to the SSA value %13 and is of the i32 type. The LLVM IR function contains only a single call to llvm.dbg.value to describe val (line 26 in Listing 7.6). Sulong detects situations like this during parsing and notes the SSA register in the language context. This tactic allows the interpreter to avoid dynamically tracking values that are effectively final on IR-level. In findLocalScopes it queries the provided language context for the symbol to detect the statically known location of its value. Subsequently Sulong obtains the content of the SSA register, in this case a Java int, and creates an LLVMDebugValue from it.

As Figure 7.4 shows, the LLVMSourceType of val is a Decorator that assigns the name const int to a dynamic, signed 32bit integer primitive type. We also see that descriptor in Figure 7.3. The LLVMDebugObject that represents val is a Primitive which, per the decorated type, interprets its internal value as a 32bit signed integer.

• struct StackEntry* entry

The for loop that prints the calculator’s stack content iterates through the nodes in that linked list and uses a pointer as loop variable. As we mentioned, the number stack at Inspection Point contains only a single entry. Figure 7.3 shows that the value of entry is the same memory address that also marks the top of the stack in the this pointer.

Unlike val, Sulong needs to track the value of entry since the LLVM IR function describes it in three separate calls to llvm.dbg.value (lines 14, 19 and 33 in Listing 7.6), one each for the initialization and increment in the loop header and the phi selection between both in the loop body. Barring a static alternative, Sulong resorts to dynamic value tracking and emits the intrinsic to the Truffle AST. In findLocalScopes the interpreter analyzes the provided stack-frame and detects the tracked symbol. The corresponding LLVMDebugValue wraps an instance of Sulong’s custom class for memory addresses which itself is only a wrapper around a Java long.

The LLVMSourceType of entry is a Pointer that is marked unsafe to dereference. That class derives its name by appending an asterisk to that of the pointed-to type. This notation is specific to the C family of programming languages. Support for other styles, e.g. the pointer keyword in Fortran, is part of future work. Similarly, the Structured basetype of entry implicitly prepends the keyword struct to the StackEntry name that the corresponding DICompositeType with the structured tag provides. The LLVMDebugObject for this symbol is a specialization for pointers. As Figure 7.3 shows, it displays the IR-level value in hexadecimal notation. However, Sulong does not dereference the pointer to provide access to the members of the composite it points to.

• Calculator* this

The this pointer is an implicit argument to Calculator::doOp. As such it is declared di- rectly in the scope of the function body. Like val, the LLVM IR function describes this symbol only once (line 4 in Listing 7.6) and Sulong tracks the location of its value in the language context. At runtime, the corresponding SSA register contains an instance of Su- long’s internal class for memory addresses which Sulong resolves to an LLVMDebugValue in findLocalScopes.

LLVM debug information describes this as an artificial object pointer. For this reason the symbol’s LLVMSourceType, an instance of the Pointer subclass, declares that its target address is safe to dereference. While it still formats the value of the local variable as the pointer it is, the corresponding LLVMDebugObject also resolves the memory address to provide the internal fields of the Calculator class as its members. Figure 7.4 shows the first one to be a 32bit signed integer with the name stackSize at offset 0. The second field, like entry, is a pointer to a struct StackEntry with the name top located 64 bits after the first. The gap of 32 bits between the members is a result of memory alignment. While the this pointer is safe to dereference, the same is not true for top. As the constructor of Calculator shows, for a stackSize of 0 it is an explicit null-pointer.

• Op& op

The operation to perform is the first explicit argument of Calculator::doOp. Like this, op is a direct member of the function scope and Sulong statically remembers the SSA register that holds its value in the form of a memory address (the single declaration with llvm.dbg.value is on line 5 in Listing 7.6).

While the LLVMSourceType of this symbol is a Pointer, it is also marked as a reference type. In contrast to a regular pointer, these addresses are more of an implementation detail. Sulong assumes that references always point to valid memory locations and dis- plays the value at that region rather than its address. To indicate the pointer’s reference nature, Sulong appends the & operator rather than the usual asterisk to the type-name. In Figure 7.4 we can see that the basetype of op is a Decorator. This corresponds to the typedef of the Op enum defined in calc.cpp at line 11. Without it, the type would rather be displayed as enum Op, following the regular declaration style of such values in C.

On IR-level, op is a memory address in an SSA register. The corresponding LLVMDebugObject dereferences this pointer to another LLVMDebugValue, from which it then reads an integer value. It then uses the decorated Enum subclass of LLVMSourceType to resolve it to the corresponding label. At Inspection Point, this is the word PRINT. We also see this in Figure 7.3.

• void(*)(int) printStackEntry

The second explicit parameter to Calculator::doOp is a pointer to a function that receives each value on the number stack to print it. Like the other function arguments, this is a direct member of the function scope and the SSA register that holds the runtime value is statically known (the single declaration with llvm.dbg.value is on line 6 in Listing 7.6).

Since its formal parameter and return types are ultimately unimportant for its purposes, Sulong represents the source-level function type as a Decorator which assigns the appropriate name to a Pointer to Void. Figure 7.4 also shows this composition.

How Sulong displays values of a source-level function pointer depends on their runtime representation. Memory addresses that point to native functions are formatted as such. However, the more typical case here would be Sulong’s internal representation of func- tion pointers. For them, the interpreter shows the name of the function instead. The Chrome Devtools in Figure 7.3 show a third case. Here, the value of printStackEntry is a Truffle interop object which corresponds to the lambda in calc.rb at line 4. Sulong uses a specialized subclass of LLVMDebugObject for cases where an LLVMDebugValue declares itself an interop object. It only provides the implicit member Unindexed Interop Value to indicate its inability to format the content at any offset within a foreign TruffleObject.

• Calculator* Calc

The only entry in the scope hierarchy at Inspection Point that is static and not local to the function is Calc. calcExt.cpp uses it to reference an instance of the Calculator class which it wraps as a Ruby module. Consequently, in Calculator::doOp the same value appears as the this pointer.

In LLVM IR, this pointer resides in a global variable called @Calc. We see this already in the first line of Listing 7.6. Similar to effectively final descriptions with llvm.dbg.declare, Sulong stores this information in the language context and retrieves it again in findLocalScopes. However, there is still a difference in storage location, as Sulong does not retrieve its value from the execution stack but rather from a static container within the language context. Barring this difference in storage though, the access model is nearly the same as for this. The content of the global variable is still a memory address which an instance of the Pointer subclass of LLVMDebugObject displays. In contrast to the dynamic object pointer though, LLVM debug information does not identify Calc to be safe to dereference. Figure 7.4 also shows this difference in LLVMSourceType instances. Consequently, the Chrome Devtools debugger cannot access the fields of the pointed-to value in Figure 7.3.

In calc.cpp we can see that Calculator::doOp contains additional entries in the function scope. The integers rhs, lhs and result are all declared after the location of Inspection Point though. For this reason, Sulong does not include them in the list of available symbols that it computes in findLocalScopes. As an additional deviation from the actual source-level scope hierarchy, the interpreter aggregates all function-local symbols into a single scope. This avoids adding unnecessary detail in the debugger frontend. Figure 7.4 shows that especially blocks often have only a few scope members or none at all. The aggregation spares the user from having to manually expand several scopes after each stepping action to get a complete picture of all available symbols.
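The following minimal Java sketch illustrates this aggregation step; the names used here are illustrative assumptions rather than Sulong's actual implementation.

import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch: merge all function-local scopes visible at the
// suspended node into a single flat scope for the debugger frontend.
final class LocalScopeAggregation {

    static Map<String, Object> aggregate(List<Scope> functionLocalScopes) {
        Map<String, Object> combined = new LinkedHashMap<>();
        for (Scope scope : functionLocalScopes) {
            // assuming inner scopes come first in the list, keep the innermost
            // definition of a name if it occurs more than once
            for (Map.Entry<String, Object> symbol : scope.getSymbols().entrySet()) {
                combined.putIfAbsent(symbol.getKey(), symbol.getValue());
            }
        }
        return combined;
    }

    interface Scope {
        Map<String, Object> getSymbols();
    }
}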

Figure 7.4: Scope hierarchy at Inspection Point in Calculator::doOp

Figure 7.5: Value abstraction for symbols at Inspection Point in Calculator::doOp

Chapter 8

Evaluation

This chapter presents a short evaluation of the overhead in terms of execution time that instrumentation and value tracking in Sulong impose. For this purpose we compare the interpreter’s performance in several micro benchmarks which we compiled to LLVM IR with different levels of optimization.

We evaluated the runtime overhead of instrumentation and value tracking in Sulong using its own benchmark suite. In the following we discuss the results of the most interesting cases. We used Clang in the released version 5.0.1 on a 64bit installation of Ubuntu 14.04 (Linux 4.4.0-116), running on an Intel Core i7 2670QM quad-core CPU with 8GB of memory, to compile the benchmarks to LLVM IR twice. For the first set we chose the compiler’s O0 optimization level, which does not apply any transformation passes. For the second one, however, we used LLVM’s opt tool to apply a predefined selection of optimizations in Sulong’s benchmark harness to the unoptimized bitcode files. We then executed the bitcode on the same platform with Sulong in revision 26254e71 together with the open-source Graal compiler and Truffle, both in revision ee8f1caae1. For each benchmark we counted only the results of ten subsequent executions after fifty warmup iterations. The evaluation uses the Oracle Hotspot JVM that is part of the JVMCI-enabled Oracle Labs JDK in version 0.40. A more elaborate evaluation of the runtime overhead of instrumentation in a Truffle AST is performed in [6], which finds that overhead only occurs for breakpoints set at locations that are actually reached during execution. As we would likely arrive at the same conclusion, we do not repeat that experiment and restrict ourselves to analyzing the impact of value tracking, since this poses the only significant difference between the two debuggers and is independent of breakpoint activity.
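The following minimal Java sketch illustrates the measurement scheme described above; it is not the actual benchmark harness, and the names are illustrative.

import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: run fifty warmup iterations whose results are
// discarded, then time the following ten iterations.
final class MeasurementSketch {

    static List<Long> measure(Runnable benchmark) {
        final int warmupIterations = 50;
        final int measuredIterations = 10;
        for (int i = 0; i < warmupIterations; i++) {
            benchmark.run(); // warmup results intentionally discarded
        }
        List<Long> timingsNs = new ArrayList<>();
        for (int i = 0; i < measuredIterations; i++) {
            long start = System.nanoTime();
            benchmark.run();
            timingsNs.add(System.nanoTime() - start);
        }
        return timingsNs;
    }
}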

Figure 8.1: Runtime overhead of debugging unoptimized bitcode in Sulong

The results in Figure 8.1 show that at O0, i.e. for unoptimized bitcode files, value tracking and Truffle node instrumentation do not cause any significant impact on execution time. What is more, debugging support even slightly improves performance in the case of the lnxaddseq and lnxdivseq benchmarks. As we already stated in Section 6.3.3, at O0 Clang produces code that tracks dynamic local variables only using llvm.dbg.declare. Since Sulong does not execute this intrinsic at runtime, value tracking cannot be the cause of this improvement. We consider it possible that the additional wrapper nodes for instrumentation coincidentally led the Graal compiler to make optimization decisions that are more beneficial to the benchmark’s AST. [6] notices a similar effect and also attributes it to non-determinism in the Graal compiler. However, we did not investigate this any further.

Figure 8.2: Runtime overhead of debugging optimized bitcode in Sulong

Other Truffle languages rarely use Sulong to execute unoptimized bitcode files. Instead, they typically apply several LLVM optimization passes to them in order to prevent situations where the interpreter needs to store interop values into native memory. Unfortunately, this also requires LLVM IR to use llvm.dbg.value to track local variables at runtime. In order to evaluate the impact of Sulong’s debugging support in a more common usage scenario, we performed a second benchmark run with optimized bitcode files. Concretely, we used LLVM’s opt tool with the -mem2reg -globalopt -simplifycfg -constprop -instcombine -dse -loop-simplify -reassociate -licm -gvn arguments to produce optimized LLVM IR. This choice follows a predefined selection in Sulong’s benchmark harness.

As Figure 8.2 shows, the runtime overhead of value tracking highly depends on the concrete benchmark. deltablue, fasta.cint and knucleotide.cint use a more functional programming style where most local variables are effectively final either on source-level or on IR-level. Sulong optimizes for this case and, as the results show, can avoid significant runtime impact in these cases. lnxaddseq, lnxdivseq and lnxsubseq, on the other hand, frequently reassign local variables, also as part of nested loops. Tracking these values at runtime incurs significant overhead, even decreasing performance below the level of O0. However, even in this situation Sulong is able to reconstruct the source-level state to a high degree. LLDB fails to provide a similar level of accuracy when debugging the same files compiled to native code. Enabling developers to conveniently debug the code that is actually being run in production environments, rather than an explicitly “debugified” version of it, allows them to better reproduce actual error conditions and exclude (or detect) compiler optimizations as the source of bugs. Especially since Sulong does not exhibit the performance impact of value tracking unless users explicitly enable it, its benefits greatly outweigh the drawbacks.

Chapter 9

Future Work

At the time of writing, Sulong supports source-level debugging of C and C++ programs compiled to LLVM IR. This implementation has already gone beyond a mere proof-of-concept. However, several areas for improvement remain.

Source Languages: Sulong only supports displaying source-level symbols in a notation specific to C and C++ programs. While this is a limited issue in terms of value display, since programming languages vary little in how they display primitive values, it is distinctly noticeable in the formatting of source-level type names. LLVM debug information contains a reference to the original programming language of the entities within a compile unit. Sulong already implements its framework for source-level types with extensibility in mind.

Linking: Sulong currently does not merge source-level scopes between multiple bitcode files. For scopes whose members are defined in multiple compile units, e.g. a C++ namespace, this may lead to an incomplete listing of members. Since the language context is available during parsing, storing these necessarily named scopes in it would enable merging them across files.

Performance: Sulong already employs different strategies to avoid tracking values of source-level symbols at runtime. However, there is still much room for improvement. Possible approaches could, to name just one example, leverage partial evaluation in Truffle to further reduce the impact on peak performance.

Memory Safety: Sulong relies on certain assumptions based on available debug information to avoid accessing unallocated native memory. While reliable in most cases, they are susceptible to undefined behavior at the levels of both the source language as well as LLVM IR.

Safe Sulong: Safe Sulong [16] is an extension to Sulong that replaces the interpreter’s use of native memory with managed Java objects to enable memory-safe execution of LLVM IR. Supporting source-level debugging in that project requires implementing the LLVMDebugValue interface for its additional data structures.

State Manipulation: In principle, the Truffle debugging framework enables frontends to manipulate the runtime state they display by changing entire values or fields within them. Sulong does not support this capability yet. While program optimization performed by LLVM as well as the general abstraction to LLVM IR would make this a challenging task, the ability to change primitive values or fields when debugging unoptimized bitcode files is a realistic goal.

Chapter 10

Related Work

This chapter discusses other work related to source-level debugging in a multi- language environment.

Microsoft’s Common Language Runtime (CLR) [17] is the basis of the .NET framework. Similar to Sulong, it is an interpreter for an intermediate representation that supports language interoperability and source-level debugging. The main difference lies in the bytecode that is executed: rather than LLVM IR, the CLR uses its own intermediate representation of source code.

The LLDB debugger [18] is part of the LLVM project and also uses LLVM IR internally. However, unlike Sulong it does not support interpreting the intermediate representation directly and instead requires it to be compiled to native code. The LLI tool is an LLVM IR interpreter integrated in LLVM [19]. However, it lacks debugging features.

Ryu and Ramsey [20] describe an approach for providing source-level debugging support for programming languages. They propose extending compilers to emit a specific representation of debug information which a preexisting debugger can use. This encoding contains executable code to print values in a language-specific way. By shifting this task from the debugger to the compiler, the former can be language-independent and, as a result, debug even programs implemented in multiple languages. Sulong, on the other hand, uses an already existing format of debug information and reconstructs a source-level representation of the runtime state from it.

[21] describes an approach to debugging that allows inspecting interpreted programs both at the source level and at the interpreter level. Truffle enables a similar strategy. Developers can attach a Java debugger to inspect the interpreter while debugging the guest language program in Chrome Inspector. In Sulong, this allows users to debug on the level of LLVM IR as well. LLVM used to be able to generate debug information for bitcode files that targets a file containing their textual representation rather than the original source code. If the active plans to revive this feature [22] succeed, Sulong could support debugging on the IR-level even more explicitly.

Blink [23] is a debugger that also supports source-level inspection of multiple languages in the same frontend. It controls and coordinates other debuggers attached to the debuggee and aggregates their information in a single frontend. In contrast to GraalVM/Sulong it requires external debugging tools, like gdb for C code or jdb for Java, to perform the actual language-specific state reconstruction.

Chapter 11

Conclusion

This thesis describes the realization of source-level debugging support in the LLVM IR interpreter Sulong. It enriches the Truffle AST with a specialized representation of debug information which LLVM frontends can emit to bitcode files. Combined with additional modifications to Sulong’s various node implementations, this data enables Truffle to instrument programs it executes at the language level. The interpreter also supports the framework’s debugging API, for which multiple frontends exist. These support single-stepping through the program as well as setting breakpoints in its original code. By applying multiple levels of abstraction to its internal execution environment Sulong can further provide a source-level view on a debuggee’s state at runtime, including language-specific display of values, types and scopes. By tracking symbols as they transition between different representations the interpreter can debug even optimized programs. The performance evaluation shows that while this imposes a significant impact on execution time for some programs, Sulong is successful in avoiding this overhead in most cases. What is more, the project seamlessly integrates with other Truffle-based interpreters, enabling users to debug even across language boundaries. The thesis also demonstrates this using a native Ruby extension implemented in C++.

Acknowledgements

First and foremost, I would like to thank Matthias Grimmer for his constant and valuable feedback on my work in the Sulong project. His ideas and suggestions on any issues that I encountered in the development process never failed to be of use. I would also especially like to thank Manuel Rigger for his invaluable help and feedback in writing this thesis. Then I would like to thank my colleagues in the Oracle project, especially Roland Schatz, and my professor Hanspeter Mössenböck for their constant support of my work.

This work was performed in a research cooperation with, and supported by, Oracle.

Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.

List of Figures

2.1 Concept of the LLVM compilation tool-chain ...... 10
2.2 Structure of the GraalVM ...... 12
2.3 AST of the abs function ...... 13

5.1 Source-Level Location Information in Sulong ...... 41

6.1 Source-Level Symbol and Scope Information in Sulong ...... 53
6.2 Truffle debugging API access model ...... 55

7.1 Sulong AST of the method_doOp function in Listing 7.5 ...... 71
7.2 Call stack for method_doOp at Stepping Point 2 in Chrome Devtools ...... 72
7.3 Symbol Inspection in the Chrome Devtools at the Inspection Point in calc.cpp ...... 73
7.4 Scope hierarchy at Inspection Point in Calculator::doOp ...... 80
7.5 Value abstraction for symbols at Inspection Point in Calculator::doOp ...... 81

8.1 Runtime overhead of debugging unoptimized bitcode in Sulong ...... 83
8.2 Runtime overhead of debugging optimized bitcode in Sulong ...... 84

List of Tables

6.1 Mapping LLVM source-level type nodes to LLVMSourceType ...... 54

Listings

3.1 abs function in LLVM IR ...... 18

4.1 Partial metadata for the LLVM IR in Listing 3.1 ...... 27
4.2 Signatures of LLVM IR’s value tracking intrinsics ...... 37

5.1 Example implementation of a Truffle AST node in Sulong ...... 44
5.2 DSL-generated wrapper for the NodeImpl class in Listing 5.1 ...... 46

6.1 Definition of Sulong’s LLVMDebugValue interface ...... 57

7.1 C++ code file (calc.cpp) that defines a stack-based calculator ...... 65
7.2 Ruby C++ extension (calcExt.cpp) ...... 67
7.3 Ruby program (calc.rb) to compute the value of an arithmetic expression in postfix notation ...... 67
7.4 Ruby program (extconf.rb) to create a Makefile for the native extension CalcExt ...... 68
7.5 Partial LLVM IR of the method_doOp function at calcExt.cpp:12 ...... 69
7.6 Partial LLVM IR of the Calculator::doOp function at calc.cpp:48 ...... 74

Bibliography

[1] Chris Lattner and Vikram Adve. LLVM: A compilation framework for lifelong program analysis & transformation. In Proceedings of the International Symposium on Code Generation and Optimization: Feedback-directed and Runtime Optimization, CGO ’04, pages 75–, Washington, DC, USA, 2004. IEEE Computer Society.

[2] Clang: a C language family frontend for LLVM. https://clang.llvm.org/. Accessed: 2018-01-30.

[3] DragonEgg - using LLVM as a GCC backend. https://dragonegg.llvm.org/. Accessed: 2018-01-30.

[4] Thomas Würthinger, Christian Wimmer, Andreas Wöß, Lukas Stadler, Gilles Duboscq, Christian Humer, Gregor Richards, Doug Simon, and Mario Wolczko. One VM to rule them all. In Proceedings of the 2013 ACM International Symposium on New Ideas, New Paradigms, and Reflections on Programming & Software, Onward! 2013, pages 187–204, New York, NY, USA, 2013. ACM.

[5] Christian Wimmer and Thomas Würthinger. Truffle: A self-optimizing runtime system. In Proceedings of the 3rd Annual Conference on Systems, Programming, and Applications: Software for Humanity, SPLASH ’12, pages 13–14, New York, NY, USA, 2012. ACM.

[6] Chris Seaton, Michael L. Van De Vanter, and Michael Haupt. Debugging at full speed. In Proceedings of the Workshop on Dynamic Languages and Applications, Dyla’14, pages 2:1–2:13, New York, NY, USA, 2014. ACM.

[7] Chrome DevTools protocol. https://chromedevtools.github.io/devtools-protocol/. Accessed: 2018-01-30.

[8] Chrome DevTools. https://developers.google.com/web/tools/chrome-devtools/. Accessed: 2018-01-30.

[9] Google Chrome web browser. https://www.google.com/chrome/index.html. Accessed: 2018-01-30.

[10] Chromium project. https://www.chromium.org/. Accessed: 2018-01-30.

[11] Truffle debugging support for the NetBeans IDE. http://plugins.netbeans.org/plugin/68647/truffle-debugging-support. Accessed: 2018-01-30.

[12] Manuel Rigger, Matthias Grimmer, and Hanspeter Mössenböck. Sulong - execution of LLVM-based languages on the JVM: Position paper. In Proceedings of the 11th Workshop on Implementation, Compilation, Optimization of Object-Oriented Languages, Programs and Systems, ICOOOLPS ’16, pages 7:1–7:4, New York, NY, USA, 2016. ACM.

[13] LLVM 5.0.1 developer policy regarding backwards compatibility of LLVM IR. https://releases.llvm.org/5.0.1/docs/DeveloperPolicy.html#ir-backwards-compatibility. Accessed: 2018-01-30.

[14] LLVM 5.0.1 documentation for binary encoding of LLVM IR. https://releases.llvm.org/5.0.1/docs/BitCodeFormat.html. Accessed: 2018-01-30.

[15] LLVM 5.0.1 language reference manual for textual encoding of LLVM IR. https://releases.llvm.org/5.0.1/docs/LangRef.html. Accessed: 2018-01-30.

[16] Manuel Rigger, Roland Schatz, René Mayrhofer, Matthias Grimmer, and Hanspeter Mössenböck. Sulong, and thanks for all the bugs: Finding errors in C programs by abstracting from the native execution model. In Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS ’18, pages 377–391, New York, NY, USA, 2018. ACM.

[17] Jennifer Hamilton. Language integration in the common language runtime. SIGPLAN Not., 38(2):19–28, February 2003.

[18] The LLDB debugger. https://lldb.llvm.org/. Accessed: 2018-03-30.

[19] The LLVM compiler infrastructure. https://www.llvm.org/. Accessed: 2018-01-30.

[20] Sukyoung Ryu and Norman Ramsey. Source-level debugging for multiple languages with modest programming effort. In Rastislav Bodik, editor, Compiler Construction, pages 10–26, Berlin, Heidelberg, 2005. Springer Berlin Heidelberg.

[21] Bastian Kruck, Tobias Pape, Tim Felgentreff, and Robert Hirschfeld. Crossing abstraction barriers when debugging in dynamic languages. In Proceedings of the Symposium on Applied Computing, SAC ’17, pages 1498–1504, New York, NY, USA, 2017. ACM.

[22] Reviving the DebugIR pass. https://reviews.llvm.org/D40778. Accessed: 2018-04-06.

[23] Byeongcheol Lee, Martin Hirzel, Robert Grimm, and Kathryn S. McKinley. Debug all your code: Portable mixed-environment debugging. In Proceedings of the 24th ACM SIGPLAN Conference on Object Oriented Programming Systems Languages and Applications, OOPSLA ’09, pages 207–226, New York, NY, USA, 2009. ACM.

Curriculum Vitae

Personal Information

Name Kreindl, Jacob
Address Zeppelinstraße 11a, 4030 Linz, Austria
Telephone +43 664 3562712
Email [email protected]

Professional Experience

2016-present Student Researcher, Institute for System Software, Linz, Austria.
2016 Tutor in Practical Computer Science, Institute for System Software, Linz, Austria.
2015-2016 Tutor in Basics of Programming, Institute for System Software, Linz, Austria.
2015 Tutor in Software Development 2, Institute for System Software, Linz, Austria.

Education

2016-present MSc in Computer Science, Johannes Kepler University, Linz, Austria.
2013-2016 BSc in Computer Science, Johannes Kepler University, Linz, Austria.
2004-2012 Matura, Akademisches Gymnasium Linz, Austria.

Other Interests

Research Implementation and Tooling Support for Programming Languages
Hobbies Science Fiction

Eidesstattliche Erklärung

I hereby declare in lieu of an oath that I have written this master's thesis independently and without outside help, that I have used no sources or aids other than those indicated, and that I have clearly marked all passages taken from other sources, either literally or in substance, as such. The present master's thesis is identical to the electronically submitted text document.

Linz, am 10. April 2018

Jacob Kreindl, BSc.