Memory Leak Detection and Automatic Garbage Collection for the C-Based Compiler Framework CoCoNut

Bachelor Informatica

Mehmet Faruk Unal
January 20, 2020

Informatica, Universiteit van Amsterdam
Supervisor(s): Clemens Grelck

Abstract

The CoCoNut framework is built to support users in developing their own C-based compiler. It does this by providing an abstraction layer for interacting with the abstract syntax tree (AST). However, if memory leaks occur, there are currently no means to identify them or clean them up automatically. In this thesis we extend the CoCoNut framework with support for a memory-leak detector for the set of data objects that form the AST. This is done by creating a wrapper that manages all dynamically allocated data objects belonging to the AST. Because the CoCoNut framework generates all constructor and destructor functions of the AST nodes, the implementation and operation of the wrapper can be hidden from the user. With this wrapper in place, it is possible to exert more control over the data objects that form the AST, which in turn makes an automatic garbage collector or a memory-leak reporter possible. With the automatic garbage collector and the memory-leak reporter implemented, the CoCoNut framework can support the user in finding the causes of memory leaks without additional effort. This helps users boost their productivity when developing their compiler.

Contents

1 Introduction
    1.1 Research Question
    1.2 Methodology
    1.3 Organisation of thesis
2 Background
    2.1 CoCoNut
        2.1.1 Abstract Syntax Tree
        2.1.2 Domain-specific Language
        2.1.3 Nodes
        2.1.4 Passes
        2.1.5 Traversals
        2.1.6 Phases & Cycles
        2.1.7 Meta-compiler
        2.1.8 Code Generation
    2.2 Memory Leak
    2.3 Reference Counting
    2.4 Tracing
    2.5 Reclamation of Garbage objects
        2.5.1 Sweeping
        2.5.2 Compacting
        2.5.3 Copying
3 Memory Management and Memory Leak Detection for the AST
    3.1 Methodology
    3.2 AST Memory manager
    3.3 Memory leak detection of the AST
    3.4 Detection of pointer errors in the AST
4 Automatic Garbage Collector and Memory Leak Reporter
    4.1 Garbage Collector
        4.1.1 Mark&Sweep Implementation
    4.2 Memory Leak Reporter
5 Related Work
    5.1 Boehm-Demers-Weiser Garbage collector
6 Conclusions

CHAPTER 1 Introduction

There are two ways to run code written in a high-level language: translate it with a compiler, or run it through an interpreter. An interpreter is a program which directly executes instructions written in a high-level language. A compiler reads code written in a high-level language and generates semantically equivalent code in a target language. Developing a compiler for any given platform requires knowledge of the following elements:

• The high-level language which it is going to parse.
• A method to parse the high-level language.
• A way to store the parsed information in the compiler.
• Comprehension of the target language the developer wants the compiler to translate to.
• Finally, a method to generate semantically equivalent code for the target language from the stored information.

With these challenges in mind, the CoCoNut framework is designed to assist the user by providing tools to create a compiler in the C language. The framework provides methodological support for compiler construction to reduce errors and boost productivity. One way it reduces errors is by reducing the boilerplate code the user would otherwise have to write. It does this with its own DSL (domain-specific language), in which the user describes properties of the compiler, traversals, phases and the model of the AST (abstract syntax tree). These descriptions are then processed by the meta-compiler, which generates boilerplate code for the common structure.
After the user completes the compiler, it is able to read the programming language it was built to parse. From the parsed code the compiler builds an IR (Intermediate Representation), which is an AST consisting of nodes described in the DSL. These nodes are dynamically allocated, since the number of nodes depends on the parsed code. During the run-time of the compiler, traversals are performed on the IR. During a traversal, transformations of the IR can occur in which nodes get detached from the AST. When detached nodes are not freed due to user negligence, they become garbage objects, that is, memory leaks which the compiler can no longer reach.

A regular C program has no built-in feature that detects and collects these garbage objects, because in the C programming language the user is responsible for de-allocating data objects stored on the heap. If the user neglects to free these objects, they become unreachable for the program and end up as memory leaks.

Unlike a regular C program, the CoCoNut framework does provide the means to support the implementation of a memory-leak detector and, by extension, a garbage collector. The Intermediate Representation used by the compiler can be traversed with the framework's traversal system, and the constructor functions which build the nodes of the Intermediate Representation are generated by the meta-compiler. These features overlap with some of the features a memory-leak detector and a garbage collector require in order to function. To make them operate properly, however, certain choices have to be made and implemented, which are discussed in this thesis.
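The overlap described above can be illustrated with a minimal sketch in C. It is not CoCoNut's generated code: the node type `Node`, the constructor `node_new`, the list `tracked` and the function `sweep` are hypothetical names chosen for this illustration, with `node_new` standing in for a constructor that the meta-compiler would generate. Every node is registered when it is allocated, a trace from the root marks what is still reachable, and whatever stays unmarked is either counted (reporting) or freed (collection).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical AST node; real CoCoNut node types are generated from the DSL. */
typedef struct Node {
    struct Node *left, *right;   /* child links inside the AST */
    struct Node *next_tracked;   /* link in the list of all allocated nodes */
    int marked;                  /* set while tracing from the root */
} Node;

static Node *tracked = NULL;     /* every node created through node_new() */

/* Stand-in for a generated constructor: allocate the node and register it. */
Node *node_new(Node *left, Node *right) {
    Node *n = malloc(sizeof *n);
    if (n == NULL) return NULL;
    n->left = left;
    n->right = right;
    n->marked = 0;
    n->next_tracked = tracked;
    tracked = n;
    return n;
}

/* Mark phase: flag every node reachable from n. */
static void mark(Node *n) {
    if (n == NULL || n->marked) return;
    n->marked = 1;
    mark(n->left);
    mark(n->right);
}

/* Sweep phase: nodes left unmarked are unreachable from root.
 * With collect != 0 they are freed, otherwise they are only counted.
 * Returns the number of unreachable nodes found. */
size_t sweep(Node *root, int collect) {
    size_t unreachable = 0;
    mark(root);
    Node **link = &tracked;
    while (*link != NULL) {
        Node *n = *link;
        if (n->marked) {
            n->marked = 0;               /* reset for the next run */
            link = &n->next_tracked;
        } else {
            unreachable++;
            if (collect) {
                *link = n->next_tracked; /* unlink from the registry, then free */
                free(n);
            } else {
                link = &n->next_tracked;
            }
        }
    }
    return unreachable;
}

/* A transformation detaches a subtree and forgets to free it. */
size_t demo_detach_without_free(void) {
    Node *leaf_a = node_new(NULL, NULL);
    Node *leaf_b = node_new(NULL, NULL);
    Node *inner  = node_new(leaf_a, leaf_b);
    Node *root   = node_new(inner, NULL);
    assert(leaf_a && leaf_b && inner && root);

    root->left = NULL;                   /* inner and both leaves are now garbage */
    size_t reported = sweep(root, 0);    /* reporter mode: count only */
    sweep(root, 1);                      /* collector mode: free the garbage */
    sweep(NULL, 1);                      /* no root: release the rest of the tree */
    return reported;
}
```

Under these assumptions, `demo_detach_without_free()` returns 3, since the detached inner node and its two leaves are no longer reachable from the root. Keeping the registration inside the constructor is what would allow such bookkeeping to stay hidden from the user.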
1.1 Research Question

The CoCoNut framework has the necessary tools to support a garbage collector. To make this possible, the following research questions have to be answered:

• In what way can memory-leak detection and automatic garbage collection be implemented in the CoCoNut framework without adding any more workload for the user of the framework?
• With the information available from the CoCoNut framework, what reporting can be provided to the user of the framework to help them find their memory leaks?

1.2 Methodology

To answer the research questions, the following steps have to be taken. First, the current feature set which can support the operation of a garbage collector has to be determined. Second, the available garbage collection techniques, with their benefits and disadvantages, have to be researched. Third, a viable garbage collection algorithm has to be chosen for implementation in the CoCoNut framework. Based on that choice, the features that are still missing to make the algorithm work have to be determined and implemented in the framework. Finally, once the missing features are present in the CoCoNut framework, the garbage collector itself has to be built.

As an alternative to garbage collection, memory leaks can instead be reported to the user by building a memory-leak reporter. The CoCoNut framework has some degree of reporting, but what information it can currently provide has to be determined. Once that is known, the information still has to be recorded so that it can be reported to the user. The features required to record that information overlap with the features required by the garbage collector. At this point, however, those features have already been built and can be expanded to record additional information, so that a detailed report can be provided to the user. In the end the user has the option to use either the garbage collector or the memory-leak reporter.

1.3 Organisation of thesis

In this thesis we discuss the following subjects.
In Chapter 2 the background information necessary to follow the thesis is provided. In Chapter 3 a memory manager for the IR (Intermediate Representation) is introduced. In Chapter 4 the garbage collector and the memory-leak reporter are presented.

CHAPTER 2 Background

To help answer the research questions, information about the CoCoNut framework and about garbage collection concepts has to be presented. In section 2.1, the features of the CoCoNut framework are presented to show what it can currently provide to assist the implementation of a garbage collector.

A garbage collector is a means of reclaiming data that the application no longer needs. Such data can arise intentionally, as in Java, or unintentionally, as in C. Before a garbage collector can reclaim garbage objects, it first needs to identify the data objects which are no longer reachable. In this thesis we present two different methods to identify garbage objects in memory: reference counting, presented in section 2.3, and tracing, presented in section 2.4.

When the garbage collector reaches the reclamation stage, there are different options to reclaim the garbage objects, depending on the method. Reclamation of garbage objects with reference counting is presented in section 2.3. When the garbage collector uses tracing to identify garbage objects, reclamation can be handled differently depending on the strategy used by the garbage collector. In section 2.5 three strategies are presented which could be used to manage the memory.

2.1 CoCoNut

CoCoNut was introduced in 2017 as a new framework, a joint effort of Maico Timmermans and Lorian Coltoff under the supervision of Clemens Grelck. The framework provides methodological support for compiler construction to reduce errors and boost productivity. It makes use of a domain-specific language (DSL) to describe properties of the compiler, traversals, phases and the model of the AST.
From this a compiler is built which converts programs written in the experimental programming language to the C programming language.
