Do We Still Need New Alias Analyses? Maroua Maalej, Laure Gonnord

Do we still need new Alias Analyses? Maroua Maalej, Laure Gonnord To cite this version: Maroua Maalej, Laure Gonnord. Do we still need new Alias Analyses?. [Research Report] RR-8812, Université Lyon Claude Bernard / Laboratoire d’Informatique du Parallélisme. 2015. hal-01228581 HAL Id: hal-01228581 https://hal.inria.fr/hal-01228581 Submitted on 13 Nov 2015 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Do we still need new Alias Analyses? Maroua MAALEJ, Laure GONNORD RESEARCH REPORT N° 8812 November 2015 Project-Team ROMA ISSN 0249-6399 ISRN INRIA/RR--8812--FR+ENG Do we still need new Alias Analyses? Maroua MAALEJ∗†‡, Laure GONNORD †‡ Project-Team ROMA Research Report n° 8812 — November 2015 — 18 pages Abstract: Alias analysis is one of the most used techniques that aim to optimize languages with pointers. It is no surprise that this topic has received much attention in many fields including compilation and program verification. However, despite all this attention, a deep study of its state-of-the art shows that it is not handled satisfactorily, by far. In this paper we demonstrate that, although many approaches have been proposed, we still cannot rely on alias analysis inside compilers. We illustrate our statement with an overview of the capabilities of state-of-the-art compilers. Key-words: Static Analysis, Alias Analysis, Optimization, Compilers ∗ This work was supported by the LABEX MILYON (ANR-10-LABX-0070) of Université de Lyon, within the program "Investissements d’Avenir" (ANR-11-IDEX-0007) operated by the French National Research Agency (ANR). † Laboratoire de l’Informatique du Parallélisme, 46 Allée d’Italie, Lyon, France ‡ Université Lyon 1, 43 Boulevard du 11 Novembre, Lyon, France RESEARCH CENTRE GRENOBLE – RHÔNE-ALPES Inovallée 655 avenue de l’Europe Montbonnot 38334 Saint Ismier Cedex A-t-on besoin de nouvelles analyses d’alias ? Résumé : L’analyse d’alias est l’une des techniques les plus utilisées pour optimiser les lan- gages avec pointeurs. C’est donc sans surprise que beaucoup d’approches et d’analyses ont été proposées, à des fins d’optimisation et de vérification de programmes. Cependant, une étude approfondie de l’état de l’art montre que les résultats établis dans le domaine de l’analyse de pointeur sont loin d’être satisfaisants. Dans ce rapport nous montrons qu’en dépit de toutes les approches proposées, nous ne pouvons toujours pas compter sur les analyses d’alias dans les compilateurs. Nous illustrons notre propos par une vue d’ensemble des analyses de pointeurs dans les compilateurs actuels. Mots-clés : Analyses statiques, Analyses d’alias, Optimisation, Compilateurs Do we still need new Alias Analyses? 3 1 Introduction Low-level languages like C are widely used to write embedded software and manage hardware in order to make smaller or more efficient programs. They enable the programmer to finely adapt their code to their hardware, so that it runs faster with minimum memory usage. Today variants of the C language are still used to program on multicore or GPU machines. However, since these languages lack abstractions and give the programmer ability to manip- ulate memory, they are hard to program and are vulnerable to errors. To overcome this problem and to help the user produce correct and optimized code, analysis techniques have been widely investigated in both verification [20] and compilation fields [9]. In this paper we will mainly discuss the field of static alias analyses also known as pointer analyses or points-to analyses that have been shown to be undecidable [32]. This kind of analysis attempts to discover, statically (at compile-time), the possible values of a pointer at runtime. Typically, this information is used to optimize programs or prove their correctness, which can be done by eliminating dead code and null pointers, parallelizing, rescheduling, etc. Many approaches that trade precision for speed, speed for precision, or even attempt to balance between both, have been proposed and improved in time and space. However, we shall demonstrate that despite all the attention this topic has received, today’s compilers still rely on rudimentary and imprecise alias analyses, which provide conservative information insufficient to generate efficient code. The contributions of this paper are the following: • an extensive study of the state-of-the-art approaches for alias analyses (Section 2) and their use for compiler optimizations (Section 3); • an experimental evaluation of strengths and weaknesses of current production C compilers (Section 4); • a discussion of the kind of information that would be useful for designing new pointer analyses for compilers (Section 5). RR n° 8812 4 Maroua MAALEJ & Laure GONNORD 2 Alias analysis: context and related work In this section, we introduce the problem of alias analyses and make an overview of existing approaches. We show that the trade-off between precision and scalability is difficult to find, especially when it comes to compiler optimization. 2.1 Problem statement Alias analysis (or “pointer analysis”) is a technique widely used to perform, on languages with pointers, transformations and optimizations such as parallelization [17] and code motion. It is also useful for program verification such as detecting bugs due to array out of bounds accesses [26, 37] and integer overflow [15, 38]. Some of alias analyses are performed for security and performance enforcement [5, 28], deadlock or race detection [33, 14], memory errors detection [7], etc. Such an analysis aims to disambiguate memory locations pointed by pointer variables to find out whether they dereference overlapping regions or not. At runtime, a pair of pointers may have the following behaviors, depicted in Figure 1: • No Alias: p1 and p2 are disjoint. They do not reference overlapping memory regions. This is the favorable case toward optimizations such as rescheduling and code motion, as we will see in Section 3. • Alias : p1 and p2 reference regions that overlap. In this case, we can distinguish between partial alias where p1 and p2 overlap in some way but do not start at the same address and must alias where p1 and p2 alias and start at the same address. ... ... ... ... ... ... ... p p1 p2 p1 2 p2 p1 noalias partial alias must alias Figure 1: Concrete representation of memory regions in case of no, partial and must aliasing. Each represents a memory slot. The literature of alias analysis is abundant and much work have been done in the last few decades. For a quite exhaustive bibliography, see for instance [12] and [21]. In this paper, we shall be limited to related work that are the most relevant for our study of interest. 2.2 Static analyses of pointers Historically, starting from Andersen’s work [1], static analysis for pointers has received much attention [12, 9, 31, 37]. The current state-of-the-art shows numerous analysis approaches. Constraint-based pointer analyses. Andersen [1] and Steensgaard [36] propose to compute points-to sets (the set of variables that a given pointer can reference) by means of solving constraint systems. In these two algorithms, a set of constraints is derived from program statements involving assignments, loads and stores. For instance, for any assignment a=b, the points-to set of b is included in the points-to set of a (for Andersen), or the two points-to sets are equal (for Steensgaard). Solving this set of constraints is done with efficient graph algorithms. The output of these algorithms is not a comparison between pointers but a points-to set for each pointer which would enable further use by means of alias queries. The main drawback of these Inria Do we still need new Alias Analyses? 5 algorithms is that they are not precise enough to properly handle pointer arithmetic: after the assignment py=px+4, px and py are inside the same points-to set. Pointer analyses by Model Checking. In [20], alias analysis information is collected by means of model checking techniques. To do so, the relevant features of the C code is represented as a finite state automaton and the alias query is translated into a property verified as a reacha- bility problem. Although the performance in terms of verification time and in scalability seems promising, we believe that such kinds of program transformation cannot be embedded in compilers, in particular because tracking back information from results obtained after transformation is tedious and error prone. Shape Analyses. Shape analysis [35, 4], also known as storage analysis, is a kind of static alias analysis mostly used to verify programs that perform destructive updating on dynamically allocated storage. Its goal is to give, for each program point, a finite and conservative character- ization of the possible shapes of program data structures allocated in the heap. A Static Shape Graph (SSG), which is a finite, directed and labeled graph, is computed at each program point to approximate stores done during execution until this point. The SSG, where nodes are structures and edges are pointers, is used to check memory safety (looks for memory leaks), and to find out of bounds accesses and discover properties of linked data structures dynamically allocated. Graphs are used to model program stores. Much attention has been paid to this technique in the last decade. However, the effort has mainly been done on precision issues.

Do We Still Need New Alias Analyses? Maroua Maalej, Laure Gonnord

Using the GNU Compiler Collection (GCC)

Expression Rematerialization for VLIW DSP Processors with Distributed Register Files ?

Equality Saturation: a New Approach to Optimization

Memory Tagging and How It Improves C/C++ Memory Safety Kostya Serebryany, Evgenii Stepanov, Aleksey Shlyapnikov, Vlad Tsyrklevich, Dmitry Vyukov Google February 2018

Statically Detecting Likely Buffer Overflow Vulnerabilities

Precise Null Pointer Analysis Through Global Value Numbering

A Formally-Verified Alias Analysis

Aliasing Restrictions of C11 Formalized in Coq

Practical and Accurate Low-Level Pointer Analysis

Aliases, Intro. to Optimization

Teach Yourself Perl 5 in 21 Days

User-Directed Loop-Transformations in Clang