Taming Undefined Behavior in LLVM

Abstract At the programming language level, Scheme R6RS [21, A central concern for an is the design of p. 54] mentions that “The effect of passing an inappro- its intermediate representation (IR) for code. The IR should priate number of values to such a continuation is unde- make it easy to perform transformations, and should also fined.” However, the best-known examples of undefined be- afford efficient and precise static analysis. haviors in programming languages come from C and C++, In this paper we study an aspect of IR design that has re- which have hundreds of them, ranging from simple local ceived little attention: the role of undefined behavior. The IR operations (overflowing signed integer arithmetic) to global for every optimizing compiler we have looked at, including program behaviors (race conditions and violations of type- GCC, LLVM, Intel’s, and Microsoft’s, supports one or more based aliasing rules). Undefined behaviors facilitate opti- forms of undefined behavior (UB), not only to reflect the se- mizations by permitting a compiler to assume that programs mantics of UB-heavy programming languages such as C and will only execute defined operations, and they also support C++, but also to model inherently unsafe low-level opera- error checking, since a conforming compiler implementation tions such as memory stores and to avoid over-constraining can cause the program to abort with a diagnostic when an IR semantics to the point that desirable transformations be- undefined operation is executed. come illegal. The current semantics of LLVM’s IR fails to Some intermediate representations (IRs), such as Java justify some cases of , global value num- bytecode, have been designed to minimize or eliminate un- bering, and other important “textbook” optimizations, caus- defined behaviors. On the other hand, compilers often sup- ing long-standing bugs. port one or more forms of undefined behavior in their IR: We present solutions to the problems we have identified all of LLVM, GCC, the Intel C/C++ compiler, and the Mi- in LLVM’s IR and show that most optimizations currently in crosoft Visual C++ compiler do. At this level, undefined be- LLVM remain sound, and that some desirable new transfor- havior must walk a very thin line: the semantics of the IR mations become permissible. Our solutions do not degrade have to be tight enough that source-level programs can be compile time or performance of generated code. efficiently compiled to IR, while also being weak enough to allow desirable IR-level optimizations to be performed and 1. Introduction to facilitate efficient mapping onto target instruction sets. Undefined behavior (UB) in LLVM falls into two cat- Some programming languages, intermediate representa- egories. First, “immediate UB” for serious errors, such as tions, and hardware platforms define a set of erroneous op- dividing by zero or dereferencing an invalid pointer, that erations that are untrapped and that may cause the system have consequences such as a processor trap or RAM cor- to behave badly. These operations, called undefined behav- ruption. Second, “deferred UB” for operations that produce iors, are the result of design choices that can simplify the unpredictable values but are otherwise safe. Deferred UB is implementation of a platform, whether it is implemented necessary to support speculative execution, such as hoisting in hardware or software. The burden of avoiding these be- potentially undefined operations out of loops. Deferred UB haviors is then placed upon the platform’s users. Because in LLVM comes in two forms: an undef value that mod- undefined behaviors are untrapped, they are insidious: the els a register with indeterminate value, and poison, a slightly unpredictable behavior that they trigger often only shows more powerful form of UB that taints the dataflow graph and itself much later. triggers immediate UB if it reaches a side-effecting opera- The AVR32 processor architecture document [2, p. 51] tion. provides an example of hardware-level undefined behavior: The presence of two kinds of deferred UB, and in par- If the region has a size of 8 KB, the 13 lowest bits in ticular the interaction between them, has often been consid- the start address must be 0. Failing to do so will result ered to be unsatisfying, and has been a persistent source of in UNDEFINED behaviour. discussions and bugs. LLVM has long contained optimiza- tions that are inconsistent with the documented semantics The hardware developers have no obligation to detect or han- and that are inconsistent with each other. The LLVM com- dle this condition. ARM and x86 processors (and, indeed, munity has dealt with these issues by usually fixing a prob- most CPUs that we know of) also have undefined behaviors.

1 2016/11/15 lem when it can be demonstrated to lead to an end-to-end IR—mapped onto basic processor operations with little or no miscompilation: a optimizer malfunction that leads to the overhead. As a concrete example, consider this Swift code: wrong machine code being emitted for some legal source func add(a : Int, b : Int)->Int { program. The underlying, long-standing problems—and, in- return (a & 0xffff) + (b & 0xffff); deed, a few opportunities for end-to-end miscompilations— } remain unsolved. To prevent miscompilation, to permit rigorous reasoning Although a Swift implementation must trap on integer over- about LLVM IR, and to support formal-methods-based tools flow, the compiler observes that overflow is impossible and such as superoptimizers, translation validators, and program emits this LLVM IR: verifiers, we have attempted to redefine the UB-related parts define i64 @add(i64 %a, i64 %b) { of LLVM’s semantics in such a way that: %0 = and i64 %a, 65535 %1 = and i64 %b, 65535 • Compiler developers can understand and work with the %2 = add nuw nsw i64 %0, %1 semantics. ret i64 %2 • Long-standing optimization bugs can be fixed. } • Few optimizations currently in LLVM need to be re- Not only has the checked addition operation been lowered moved. to an unchecked one, but in addition the add instruction has • Compilation time and execution time of generated code been marked with LLVM’s nsw and nuw attributes, indicat- are largely unaffected. ing that both signed and unsigned overflow are undefined. In isolation these flags provide no benefit, but they may enable • The LLVM code base can be migrated to the new se- additional optimizations after this function is inlined. When mantics in a series of modest stages that cause very little the Swift benchmark suite1 is compiled to LLVM, about one breakage. in eight addition instructions has an attribute indicating that This paper describes and evaluates our ongoing efforts. integer overflow is undefined. In this particular example the nsw and nuw flags are re- dundant since an optimization pass could re-derive the fact 2. Undefined Behavior in the IR that the add cannot overflow. However, in general these flags Undefined-behavior-related compiler optimizations are of- and others like them add real value by avoiding the need ten thought of as black magic, even by compiler developers. for potentially expensive static analyses to rediscover known In this section we introduce IR-level undefined behavior and program facts. Also, some facts cannot be rediscovered later, show examples where it enables useful optimizations. even in principle, since information is lost at some compila- tion steps. 2.1 Undefined Behavior 6= Unsafe Programming 2.2 Enabling Speculative Execution Despite the very poor example set by C and C++, there is no The C code in Figure 1 executes undefined behavior if x is inherent connection between undefined behavior (UB) and INT MAX and n > 0, because in this case the signed addition unsafe programming. Rather, UB simply reflects a refusal x + 1 overflows. A straightforward translation of the C code to systematically trap program errors at one particular level into LLVM IR, also shown in Figure 1, has the same domain of the system: the responsibility for avoiding these errors of definedness as the original code: the nsw modifier to the is delegated to a higher level of abstraction. For example, add instruction indicates that it is defined only when signed of course, many safe programming languages have been overflow does not occur. compiled to machine code, the unsafety of which in no We would like to optimize the loop by hoisting the invari- way compromises the high-level guarantees made by the ant expression x + 1. If integer overflow triggered imme- language implementation. Swift and Rust are compiled to diate undefined behavior, this transformation would be ille- LLVM IR; some of their safety guarantees are enforced by gal because it makes the domain of definedness smaller: the dynamic checks in the emitted code, other guarantees are code becomes incorrect when x is INT MAX, regardless of the made through type checking and have no representation at value of n. LLVM works around this problem by adding the the LLVM level. Even C can be used safely if some tool in concept of deferred undefined behavior: the undefined addi- the development environment ensures—either statically or tion is allowed, but the resulting value cannot be relied upon. dynamically—that it will not execute UB. It is easy to see that after hoisting the add, the code remains The essence of undefined behavior is the freedom to avoid safe in the n = 0 case, because x1 is not used. While de- a forced coupling between error checks and unsafe opera- ferred UB is useful, it is not appropriate in all situations. For tions. The checks, once decoupled, can be optimized, for ex- example, division by zero can trigger a processor trap and ample by being hoisted out of loops or eliminated outright. The remaining unsafe operations can be—in a well-designed 1 https://swift.org/blog/swift-benchmark-suite/

2 2016/11/15 for (int i = 0; i < n; ++i) { any particular behavior would require some platforms to in- a[i] = x + 1; troduce potentially expensive code. } 2.4 Beyond Undef In C and C++, a + b > a is semantically identical to b > 0 init: because signed overflow is undefined (assuming a and b are of a signed type like int). If the original expression is br %head translated to this LLVM: head: %add = add %a, %b %i = phi [ 0, %init ], [ %i1, %body ] %cmp = icmp sgt %add, %a %c = icmp slt %i, %n br %c, %body, %exit the optimization becomes illegal since the add instruction wraps around on overflow. Moreover, this problem cannot be body: fixed by defining a version of add that returns undef when %x1 = add nsw %x, 1 there is a signed integer overflow. %ptr = getelementptr %a, %i To see the inadequacy of undef, let a = INT MAX and store %x1, %ptr b = 1. The addition overflows and the expression simplifies %i1 = add nsw %i, 1 to undef > INT MAX, which is always false since there is no br %head value of integer type that is larger than INT MAX. However, the desired optimized expression, b > 0, simplifies to 1 > 0, Figure 1. C code and its corresponding LLVM IR. We want which is true. Thus, the optimization is illegal: it would to hoist the invariant add out of the loop. The nsw attribute change the semantics of the program. means the add is undefined for signed overflow. To justify this transformation, LLVM has a second kind of deferred undefined behavior, the poison value. The original expression is compiled to this code instead: an out-of-bounds store can corrupt RAM. These operations, and a few others in LLVM IR, are immediate undefined be- %add = add nsw %a, %b haviors and programs must not execute them. %cmp = icmp sgt %add, %a The nsw (no signed wrap) attribute on the add instruction 2.3 Undefined Value indicates that it returns a poison value on signed overflow. We need a semantics for deferred undefined behavior. A rea- Poison values, unlike undef, are not restricted to being a sonable choice is to specify that an undefined value repre- value of a given type. Most instructions return poison if any sents any value of the given type. A number of compiler IRs of their inputs is poison. Thus, poison is a stronger form of support this abstraction; in LLVM it is called undef.2 UB than undef. Undef is useful because it lets the compiler avoid mate- Figure 3 shows another example motivating the poison rializing an arbitrary constant in situations—such as the one value. The getelementptr instruction (GEP for short) per- shown in Figure 2—where the exact value does not matter. In forms pointer arithmetic. The GEP there is computing a + this example, assume that cond2 implies cond in some non- i ∗ 4, assuming that a is an array of 4-byte integers. trivial way such that the compiler cannot see it. Thus, there The sign extend operation sext in the loop body handles is no need to initialize variable x at its point of declaration the mismatch in bitwidth between the 32-bit induction vari- since it is only passed to g after being assigned a value from able and 64-bit pointer size. It is the low-level equivalent of f’s return value. If the IR lacked an undef value, the com- casting an int to long in C. Therefore, the GEP in the pro- piler would have to use an arbitrary constant, perhaps zero, gram is actually computing a + sext(i) ∗ 4. We would like to initialize x on the branch that skips the first if statement. to optimize away the sext instruction since sign extension, This, however, increases the code size by one instruction and unlike zero extension, is usually not free at runtime. two bytes on x86 and x86-64. Little optimizations like this If we convert the loop i into long we can add up across a large program. can remove the sign extension within the loop body (at the LLVM uses undef in a few other situations. First, to rep- expense of adding a sign extend of n to the entry basic resent the values of padding in structures, arrays, and bit block). This transformation improves performance by up to fields. Second, operations such as shifting a 32-bit value by 39%, depending on the microarchitecture, since we save one 33 places evaluate to undef. LLVM’s shift operators result in instruction per iteration (cltq — sign extend eax into rax). deferred UB for shift-past-bitwidth because different proces- The transformation is only valid if pointer arithmetic sors produce different results for this condition; mandating overflow is undefined. If it is defined to wrap around, the transformation is not semantics-preserving, since a sequence 2 http://nondot.org/sabre/LLVMNotes/UndefinedValue.txt of values of a signed 32-bit counter is different from a signed

3 2016/11/15 ; test cond entry: testb %dil, %dil br %cond, %ctrue, %cont je ctrue ; return value goes in %eax ctrue: callq f %xf = call @f() br %cont ctrue: ; test cond2 int x; cont: testb %bl, %bl if (cond) %x = phi [ %xf, %ctrue ], [ undef, %entry ] je exit x = f(); br %cond2, %c2true, %exit ; whatever is in %eax ; gets passed to g() if (cond2) c2true: movl %eax, %edi g(x); call @g(%x) callq g (a) (b) (c)

Figure 2. If cond2 implies cond, the C code in (a) does not perform UB by accessing x before it is assigned a value. (b) is Clang’s translation into LLVM IR and (c) is the eventual x86-64. for (int i = 0; i <= n; ++i) { parison at %c would always be true if %n = INT MAX. On the a[i] = 42; other hand, the comparison with 64-bit integers would return } false instead. If overflow is defined to return poison, an induction vari- able overflow would result in %iext = sext(poison), entry: which is equal to poison, which would make the compari- br %head son at %c equal to poison as well. Therefore, this semantics justifies induction variable widening. head: %i = phi [ 0, %entry ], [ %i1, %body ] 3. Inconsistencies in LLVM %c = icmp sle %i, %n In this section we present several examples of problems with br %c, %body, %exit the current LLVM IR semantics. body: 3.1 Duplicate SSA uses %iext = sext %i to i64 %ptr = getelementptr %a, %iext In some CPU micro-architectures, addition is cheaper than store 42, %ptr multiplication. It may therefore be beneficial to rewrite 2×x %i1 = add nsw %i, 1 as x + x. In LLVM IR we want to rewrite: br %head %y = mul %x, 2 Figure 3. C code and corresponding LLVM IR on x86-64. as: We want to eliminate the sext instruction in the loop body. %y = add %x, %x Algebraically, these two expressions are equivalent. How- 64-bit counter’s. Therefore, we would be changing the set of ever, consider the case where %x is undef. In the original stored locations in case of overflow. code, the result can be any even number, while in the trans- For a compiler to perform the aforementioned transfor- formed code the result can be any number. Therefore, the mation, it needs to prove that either the induction variable transformation is wrong because we have increased the set does not overflow, or if it does it is a signed operation and of possible outcomes. therefore it does not matter. As we have seen before, signed This problem happens because each use of undef in integer overflow cannot be immediate UB since that would LLVM can yield a different result. Therefore, it is not cor- prevent hoisting math out of loops. If signed integer overflow rect in general to increase the number of uses of a given SSA returns undef, the resulting semantics are too weak to jus- register in an expression tree, unless it can be proved to not tify the desired optimization: on overflow we would obtain hold the undef value. Even so, LLVM incorrectly performs sext(undef) for %iext, which has all the most-significant similar transformations. bits equal to either zero or one. Therefore, the maximum There are, however, multiple advantages to defining value %i1 could take would be INT MAX and thus the com- undef as yielding a possibly different value on each use.

4 2016/11/15 For example, it helps reduce register pressure since we do } not need to hold the value of an undef in a register to give This transformation assumes that branching on poison is the same value to all uses. Secondly, peephole optimiza- not UB, but is rather a non-deterministic choice. Otherwise, tions can easily assume than an undef takes whatever value if c2 was poison, then loop unswitching would be introduc- is convenient to do a particular transformation, which they ing UB if c was always false (i.e., if the loop never executed). could not easily do if undef had to remain consistent over The goal of global (GVN) is to find multiple uses. Another advantage is to allow duplication of equivalent expressions and then pick a representative one memory loads given that loads from uninitialized memory and remove the remaining (redundant) computations. For yield undef. If undef was defined to return a consistent example, in the following code, variables t, w, and y all hold value for all uses, a duplicated load could potentially re- the same value within the “then” block: turn a different value if loading from uninitialized memory, which would be incorrect. t = x + 1; if (t == y) { 3.2 Hoisting operations past control-flow w = x + 1; Consider this example: foo(w); } if (k != 0) { while (c) { Therefore, GVN can pick y as the representative value and use(1 / k); transform the code into: } t = x + 1; } if (t == y) { Since 1/k is loop invariant, LLVM would like to hoist it foo(y); out of the loop. Hoisting the division seems safe because the } top-level if-statement ensures that division by zero will not However, if y is a poison value and w is not poison, happen. This gives: then we have changed the code from using a regular value if (k != 0) { as function argument to passing a poison value to foo. If int t = 1 / k; GVN followed loop unswitching’s interpretation of branch- while (c) { on-poison (non-deterministic branch), then the transforma- use(t); tion would be unsound. However, if we decide instead that } branch-on-poison is UB, then GVN is fine, since the com- } parison “t == y” would be poison and therefore the orig- Now consider the case where k is undef. Since each inal program would be already executing UB. This, how- use of undef can yield a different result, we can have the ever, contradicts the assumption made by loop unswitching. top-level if-condition being true and still divide by zero, In other words, loop unswitching and GVN require different when this could not have happened in the original program semantics for branch on poison in LLVM IR in order to be if the execution never reached the division (e.g., if c was correct. By assuming different semantics, they perform con- false). Thus, this transformation is unsound. LLVM used to flicting optimizations, enabling end-to-end miscompilations. do it, but stopped after it was shown to lead to end-to-end 3.4 Select and Poison miscompilation.3 LLVM’s ternary select instruction, like the ?: opera- 3.3 Global Value Numbering vs. Loop Unswitching tor in C/C++, uses a Boolean to choose between its argu- When c2 is loop-invariant, LLVM’s loop unswitching opti- ments. Either choice for how select deals with poison— mization transforms code of this form: producing poison if its not-selected argument is poison, or not—could be used as the basis for a correct optimizer. How- while (c) { ever, LLVM’s optimization passes have not consistently im- if (c2) { foo } plemented either choice. The LLVM Language Reference else { bar } Manual4 implies that if either argument to a select is poi- } son, the output is poison. to: The SimplifyCFG pass tries to convert control flow into select instructions: if (c2) { while (c) { foo } br %cond, %true, %false } else { true: while (c) { bar } br %merge

3 http://llvm.org/PR21412 4 http://llvm.org/docs/LangRef.html

5 2016/11/15 false: to: br %merge %v = %x merge: %x = phi [ %a, %true ], [ %b, %false ] This is wrong because %x could be poison, and poison is stronger than undef. Gets transformed into: 3.5 Summary br %merge merge: In this section we showed that undefined behavior, which %x = select %cond, %a, %b was added to LLVM’s IR to justify certain desirable trans- formations, is exceptionally tricky and has lead to conflicting For this transformation to be correct, select on poison assumptions among compiler developers. These conflicts are cannot be UB if branching on poison is not. Moreover, it can reflected in the code base. Although the LLVM developers only be poison when the chosen value at runtime is poison almost always fix overt problems that can be demonstrated to (in order to match the behavior of phi). lead to end-to-end miscompilations, the latent problems we LLVM also performs the reverse transformation, usually have shown here are long-standing and have so far resisted late in the pipeline and for target ISAs where it is preferable attempts to fix them (any fix that makes too many existing to branch rather than do a conditional move. For this trans- optimizations illegal is unacceptable). In the next section we formation to be correct, branch on poison can only be UB if introduce a modified semantics for UB in LLVM that we be- select on a poison condition is also UB. Since we want both lieve fixes all known problems and is otherwise acceptable. transformations to be feasible, we can conclude that the be- havior of branching on poison and select with a poison con- 4. Proposed Semantics dition has to be equivalent. In Sec. 2 we showed that undef and poison enable useful op- If select on a poison condition is UB, it makes it very hard for the compiler to introduce select instructions in replace- timizations that programmers might expect. In Sec. 3, how- ment of arithmetic. E.g., the following transformation that ever, we showed that undef and poison, as currently defined, replaces an unsigned division with a comparison would be are inconsistent with other desirable transformations and that they interact poorly with each other. Our proposal—arrived invalid (which ought to be valid for any constant C < 0): at after many iterations and much discussion, and currently %r = udiv %a, C under discussion with the broader LLVM community—is to tame undefined behavior in LLVM as follows: to: • Remove undef and use poison instead. %c = icmp ult %a, C %r = select %c, 0, 1 • Introduce a new instruction: This transformation is desirable since it removes a poten- %y = freeze %x tially expensive operation like division. However, if select on freeze is a nop unless its input is poison, in which case poison is UB, the transformed program would execute UB if it non-deterministically chooses an arbitrary value of the %a was poison, while the original program would not. As type. All uses of a given freeze return the same value, but we have seen previously, if select (and therefore branch) on different freezes of a value may return different constants. poison is not UB, GVN is unsound, but that is incompatible with the transformation above. • All operations over poison unconditionally return poison Finally, it is often desirable to view select as arithmetic, except phi, select, and freeze. allowing transformations like: %x = select %c, true, %b • Branching on poison is immediate UB. to %x = or %c, %b. This property of equivalence with We believe that the presence of two kinds of deferred arithmetic, however, requires making the return value poison undefined behavior is simply too difficult for developers to if any of the arguments is poison, which breaks soundness reason about: one of them had to go. We define phi and for the phi/branch to select transformation (SimplifyCFG in select to conditionally return poison, and branching on LLVM) above. poison to be UB, because these decisions reduce the number There is a tension between the different semantics that of freeze instructions that would otherwise be needed. select can take and which optimizations can be made sound. A risk of using freeze is that it disables subsequent opti- Currently, different parts of LLVM implement different se- mizations that take advantage of poison. Our observation is mantics for select, which risks end-to-end miscompilation. that many of these optimizations were illegal anyway, and Finally, it is very easy to make mistakes when both un- that it is better to disable them explicitly rather than implic- def and poison are involved. LLVM currently performs the itly. Also, as we show later, we usually do not need to intro- following substitution: duce many freeze instructions. We experimentally show that %v = select %c, %x, undef freeze does not unduly impact performance.

6 2016/11/15 λ . poison if v =poison isz↓(v) or ty∗↓(v) = stmt :: = reg = inst | br op, label, label | store op, op (std) otherwise inst :: = binop attr op, op | conv op | bitcast op | select op, op, op | icmp cond, op, op | hsz ×tyi↓(v) = ty↓(v[0]) ++ ... ++ ty↓(v[sz − 1]) phi ty, [op, label] ..., [op, label] | freeze op | poison if ∃i. b[i]=poison getelementptr op,..., op | load op | isz↑(b) or ty∗↑(b) = (std) otherwise extractelement op, constant | insertelement op, op, constant hsz ×tyi↑(b) = hty↑(b0),..., ty↑(bsz−1)i cond :: = eq | ne | ugt | uge | slt | sle where b = b0 ++ ... ++ bsz−1 ty :: = isz | ty∗ | binop :: = add | udiv | sdiv | shl | and | or For base types, ty↓ transforms poison into the bitvector of attr :: = nsw | nuw | exact all poison bits, and defined values into their standard low- op :: = reg | constant | poison level representation. For vector types, ty↓ transforms val- conv :: = zext | sext | trunc ues element-wise, where ++ denotes the bitvector concate- nation. Conversely, for base types, ty↑ transforms bitwise Figure 4. Partial syntax of LLVM IR statements. Types representations with at least one poison bit into poison, and include arbitrary bitwidth integers, pointers ty∗, and vectors transforms fully defined ones in the standard way. For vector < elems × ty > that have a statically-known number of types, ty↑ transforms bitwise representations element-wise. elements elems. Now we give semantics to selected instructions in Fig- ure 5. It shows how each instruction updates the register file R ∈ Reg and the memory M ∈ Mem, denoted R,M,→ 4.1 Syntax 0 0 R ,M . The value op R of operand op over R is given by: Figure 4 gives the partial syntax of LLVM IR statements. J K LLVM IR is typed, but we omit operand types for brevity, r R = R(r) //register and throughout the paper when these are implicit or non- J K C R = C //constant essential. The IR includes standard unary/binary arithmetic J K poison R = poison //poison instructions, load/store operations, a phi node, a compari- J K son operator, multiple type casting instructions, conditional The load operation Load(M, p, sz) successfully returns the branching, instructions to access and modify vectors, etc. We loaded bit representation only if p is a non-poison address also include the new freeze instruction and the new poison pointing to a valid block of size at least sz in the memory value, while removing the old undef value. M. The store operation Store(M, p, b) successfully stores the bit representation b in the memory M and returns the 4.2 Semantics updated memory only if p is a non-poison address pointing We first define the semantic domains as follows. to a valid block of size at least sizeof(b).

sz Num(sz) :: = { i | 0 ≤ i < 2 } 5. Illustrating the New Semantics isz :: = Num(sz) ]{ poison } Jty∗K :: = Num(64) ]{ poison } In this section we show how the proposed semantics enable Jhsz ×K tyi :: = {0,..., sz − 1} → ty optimizations that cannot be performed soundly today in LLVM. We also show how to encode certain C/C++ idioms MemJ K :: = Num(64) 9 h8×iJ1i K Name :: = { %x, %y,... }J K in LLVM IR for which changes are required in the front-end Reg :: = Name → { (ty, v) | v ∈ ty } (Clang), as well as optimizations that need tweaks to remain J K sound. Here ty denotes the set of values of type ty, which are ei- 5.1 Loop Unswitching ther poisonJ K or fully defined for base types, and are element- wise defined for vector types. The memory Mem is bitwise We showed previously that GVN and loop unswitching defined since it has no associated type. Specifically, Mem could not be used together. With the new semantics, GVN partially maps a 64-bit address to a bitwise defined byte (we becomes sound, since we chose to trigger UB in case of assume, with no loss of generality, that pointers are 64 bits). branch on poison value. Loop unswitching, however, re- The register file Reg maps a name to a type and a value of quires a simple change to become correct. When a branch is that type. hoisted out of a loop, the condition needs to be frozen. E.g., We define two meta operations: conversion between val- while (c) { ues of types and low-level bit representation. These opera- if (c2) { foo } tions are used later for defining semantics of instructions. else { bar } } ty↓ ∈ ty → hsizeof(ty)×i1i ty↑ ∈ Jhsizeof(K Jty)×i1i → tyK is transformed into: J K J K

7 2016/11/15 (r = freeze isz op) (r = freeze ty op) for ty = hn×iszi op = poison v ∈ Num(sz) (r = phi ty [op1,L1],..., [opn,Ln]) R op R = hv0, . . . , vn−1i J K  J K 0  R,M,→ R[r 7→ v],M ∀i. (vi = poison ∧ vi ∈ Num(sz)) opi R = vi 0 J K (coming from Li) ∨ (vi = vi 6= poison) op R = v 6= poison R,M,→ R[r 7→ vi],M J K 0 0 R,M,→ R[r 7→ v],M R,M,→ R[r 7→ hv0, . . . , vn−1i],M

(r = select op, ty op1, op2)

op R = poison op R = 1 op1 R = v1 op R = 0 op2 R = v2 J K J K J K J K J K R,M,→ R[r 7→ poison],M R,M,→ R[r 7→ v1],M R,M,→ R[r 7→ v2],M

(r = add nsw isz op1, op2)

(r = and isz op1, op2) op1 R = poison op2 R = poison J K J K op1 R = poison op2 R = poison R,M,→ R[r 7→ poison],M R,M,→ R[r 7→ poison],M J K J K R,M,→ R[r 7→ poison],M R,M,→ R[r 7→ poison],M op1 R = v1 op2 R = v2 v1 + v2 overflows (signed) J K J K op1 R = v1 6= poison op2 R = v2 6= poison R,M,→ R[r 7→ poison],M J K J K R,M,→ R[r 7→ v1&v2],M op1 R = v1 op2 R = v2 v1 + v2 no signed overflow J K J K R,M,→ R[r 7→ v1 + v2],M

(r = load ty, ty∗ op) (store ty op1, ty∗ op)

(r = bitcast ty1 op to ty2) Load(M, op R, sizeof(ty)) fails Store(M, op R, ty↓( op1 R)) fails J K J K J K op R = v R,M,→ UB R,M,→ UB J K 0 R,M,→ R[r 7→ ty 2↑(ty 1↓(v))],M Load(M, op R, sizeof(ty)) = v Store(M, op R, ty↓( op1 R)) = M J K J K J K R,M,→ R[r 7→ ty↑(v)],M R,M,→ R,M 0

Figure 5. Semantics of selected instructions. if (freeze(c2)) { br %merge while (c) { foo } false: } else { br %merge while (c) { bar } merge: } %c = phi [ %a, %true ], [ %b, %false ]

By using the freeze instruction, we avoid introducing Freeze ensures that no UB is triggered if %c is poison. UB in case c2 is poison and force a non-deterministic choice We believe, however, that this kind of transformation may between the two loops instead. This is a refinement of the be delayed to lower-level IRs where poison usually does not original code fragment, which would trigger UB if c2 was exist. poison and the loop executed at least once. Freeze can be avoided if the branch on c2 is placed in the 5.3 Bit Fields loop pre-header (since then the loop is guaranteed to execute C and C++ have bit fields in structures. These fields are often at least once). The compiler further needs to prove that the packed together to form a single word-sized field (depending branch on c2 was always reachable (e.g., that all function on the ABI). Since in our semantics loads of uninitialized calls before the “if (c2)” statement always return). data yield poison, and bit-field store operations also require 5.2 Reverse Predication a load (even the first store), extra care is needed to ensure that a store to a bit field does not always yield poison. In some CPU architectures it is beneficial to compile a Therefore we propose to lower the following C code: select instruction into a set of branches rather than a con- ditional move. We support this transformation using freeze: mystruct.myfield = foo; %x = select %c, %a, %b into: can be transformed to: %val = load %mystruct %c2 = freeze %c %val2 = freeze %val br %c2, %true, %false %val3 = ...combine %val2 and %foo... true: store %val3, %mystruct

8 2016/11/15 We need to freeze the loaded value, since it might be the while (...) { first store to the bit field and therefore it might be uninitial- x = a / b; ized. If the stored value foo is poison, this bit field store y = freeze(x); operation contaminates the adjacent fields when it is com- use(y) bined through bit masking operations. This is fine, however, } since if foo is poison then UB must have already occurred in the source program and so we can taint the remaining fields. 5.6 Pitfall 2: Semantics of static analyses An alternative way of lowering bit fields is to use vec- Static analyses in LLVM usually return a value that holds tors or use the structure type. These are superior alternatives, only if all of the analyzed values are not poison. For exam- since they allow perfect store-forwarding (no freezes), but ple, if we run the isKnownToBeAPowerOfTwo analysis on currently they are both not well supported by LLVM’s back- value “%x = shl 1, %y”, we get a statement that %x will end. E.g., with vectors: be always a power of two. However, if %y is poison, then %x will also be poison, and therefore it could take any value, %val = load <32 x i1> %mystruct including a non-power-of-two value. %val2 = insertelement %foo, %val, ... Many LLVM analyses are not sound over-approximations store %val2, %mystruct with respect to poison. The main reason is that if poison was Here we assume the word size is 32 bits, and therefore taken into account then most analyses would return the worst we ask LLVM to load a vector of 32 bits instead of loading result (top) most of the time, rendering them useless. a whole word. Since our semantics for vectors define that This semantics is generally fine when the result of the poison is determined per element, a poison bit field cannot analyses are used for expression rewriting, since the original contaminate adjacent fields. and transformed expressions will yield poison when any of the inputs is poison. However, this is not true when dealing 5.4 Load Combining and Widening with code movement past control-flow. For example, we Sometimes it is profitable to combine or widen loads to align would like to hoist the division out of the following loop with the word size of a given CPU. However, if the compiler (assuming a is loop invariant): chooses to widen, say, a 16-bit load into a 32-bit load, then while (c) { care must be taken because the remaining 16 bits may be b = 1 / a; poison or uninitialized and they should not poison the value } the program was originally loading. To solve the problem, we also resort to vector loads, e.g., If the isKnownToBeAPowerOfTwo analysis states that %a = load i16, %ptr a is always a power of two, we are tempted to conclude that hoisting the division is safe since a cannot possibly be can be transformed to: zero. However, a may be poison, and therefore hoisting the division would introduce UB if the loop did not execute. %tmp = load <2 x i16>, %ptr In summary, there is a trade-off for the semantics of %a = extractelement %tmp, 0 static analysis regarding how they treat poison. LLVM is As for bit fields, vector loads make it explicit to the considering extending APIs of relevant analyses to return compiler that we are loading unrelated values, even though up-to results with respect to poison, i.e., the result of an at assembly level it is the still the same load of 32 bits. analysis is sound if a set of values is non-poison. Then it is up to the client of the analysis to ensure this is the case if it 5.5 Pitfall 1: Freeze duplication wants to use the result of the analysis in a way that requires Duplicating freeze instructions is not allowed, since each the value to be non-poison (e.g., to hoist instructions that freeze instruction may return a different value if the input may trigger UB past control-flow). is poison. For example, this blocks loop sinking optimiza- tion (dual of loop invariant code motion). Loop sinking is 6. Implementation beneficial if, e.g., a loop is rarely executed. For example, it We prototyped our new semantics in a development version is not sound to perform the following transformation: of LLVM 3.9 (LLVM/Clang svn revision r273843, from x = a / b; June 27 2016). We made the following modifications to y = freeze(x); LLVM, changing a total of 574 lines of code: while (...) { • Added a new freeze instruction to the IR and to Se- use(y) } lectionDAG (SDAG), and added appropriate translation from IR’s freeze into SDAG’s freeze and then to Ma- to: chineInstruction (MI).

9 2016/11/15 • Fixed loop unswitching to freeze the hoisted condition three instructions (over 2-bit integer arithmetic) and then we (as described in Section 5.1). used Alive [15] to validate both individual passes (InstCom- • Fixed several unsound InstCombine (peephole) transfor- bine, GVN, Reassociation, and SCCP) and -O2. This way mations handling select instructions (e.g., the problems we ensure that Alive and LLVM agree on the semantics of outlined in Section 3.4). the IR and that there are no more invalid optimizations under the new semantics. • Added simple transformations to InstCombine to op- timize spurious uses of freeze, such as transforming freeze(freeze(x)) to freeze(x) and freeze(const) Opportunities for Improvement Our prototype is not per- to const. fect and we leave ample room for future improvement. For example, we discussed in Section 3.2 that hoisting divisions We made a single change to Clang, modifying just one out of loops is currently disabled in LLVM. We did not at- line of code: we changed the lowering of bit field stores to tempt to reactivate this optimization. freeze the loaded value (as described in Section 5.3). Some optimizations need to be made freeze-aware. At Lowering freeze LLVM IR is lowered into SDAG. We in- present they will conservatively fail to optimize since they do troduced a freeze operation in SDAG as well, so a freeze not recognize the new instruction. For example, GVN does in LLVM IR maps directly into a freeze in SDAG. Addi- not yet know how to fold equivalent freeze instructions. We tionally, we had to teach type legalization (SDAG level) to consulted with a GVN expert and we were told it is possible handle freeze instructions with operands of illegal type. For to extend the algorithm to support freeze instructions, with (i.e., when going from SDAG to MI), the caveat that for GVN to be sound for freeze it has to we eliminate freeze operations and we convert poison values replace all uses of a given freeze instruction if it wants to into pinned undef registers. At MI level there is no poison, so replace one of the uses. we do not need freeze anymore. MI’s undef, like LLVM IR’s, The lowering of freeze into assembly is currently sub- may yield a different value for each use. We work around this optimal. Our prototype reserves a register for each poison behavior by pinning undef into a register and introducing a value within a function. However, in certain architectures it copy for each use. We confirmed LLVM does not propagate is possible to either reuse an already pinned register that does undef further and copies are kept. not change within the function (such as EBP or ESP on x86) or use a constant register (such as R0 on ARM that is usually Optimizations We had to implement a few optimizations zero). This strategy would allow us to save up to one register to recover some performance regressions we observed in per poison value and therefore reduce register pressure. early prototypes. These regressions were due to LLVM op- timizers not recognizing the new freeze instruction and con- servatively giving up. For example, on x86 it is usually Limitations of the prototype Our prototype has a few lim- preferable to lower a branch on an and/or operation into a itations that make it unsound in theory, even though we pair of jumps rather than do the and/or operation and then do did not detect any end-to-end miscompilations. These lim- a single jump. This transformation got blocked if the branch itations do not reflect fundamental problems with our pro- was done on a frozen and/or operation. We modified Code- posed semantics, but they require more aggressive changes GenPrepare (a phase right before lowering IR to SDAG) to to LLVM than we have performed so far. Also, bear in mind support freeze. that LLVM was already unsound before our changes, but in For x86, a comparison used only by a conditional branch ways that are harder to fix. is usually moved so that it is placed right before the branch, InstCombine performs a few transformations taking a se- since it is often preferable to repeat the comparison (if lect instruction and producing arithmetic operations. For ex- needed) than save the result to reuse later. Since freeze ample, “select %c, true, %x” is transformed into “or instructions cannot be sunk into loops, this transformation %c, %x”. This transformation is incorrect if %c may be poi- is blocked if the branch is over a frozen comparison. We son. A safe version requires freezing %c for the or operation. changed CodeGenPrepare to transform “freeze(icmp %x, Alternatively, we could change LLVM to canonicalize on the const)” to “icmp (freeze %x), const” when deemed select instead, but that requires improvements to the backend profitable. Note that we cannot do this transformation early to make it recognize the idiom to produce efficient code. in the pipeline since it would break some static analyses Another limitation is related to vectors. We have shown (like scalar evolution)—the transformed expression is a re- that widening can be done safely by using vector operations. finement of the original one. However, LLVM does not yet handle vectors as first-class values, which frequently results in generation of sub-optimal Testing the prototype To test the correctness of the proto- code when vectors are used. Therefore, we did not fix any type, we used LLVM’s and Clang’s test suites. We also used widening done by LLVM (e.g., in GVN, in Clang’s lowering opt-fuzz5 to exhaustively generate all LLVM functions with of bit-fields, or in Clang’s lowering of certain parameters that 5 https://github.com/regehr/opt-fuzz require widening by some ABIs).

10 2016/11/15 7. Performance Evaluation of the total number of IR instructions. The gcc benchmark, The section evaluates the performance of our prototype in however, had 2,198 freeze instructions (0.16% of total), terms of compile time and size and speed of generated code. since it contains a large number of bit-field operations. 7.1 Experimental Setup Run time Change in performance for SPEC CPU 2006 is shown in Figure 6. The results are in the range of ±1.5%, Environment We used two machines with different micro- with slightly different results on the two machines. architectures for evaluation. Machine 1 had an Intel Core i7 For LNT benchmarks, only 26% had different IR after op- 870 CPU at 2.93 GHz, and Machine 2 had an Intel Core i5 timization, and only 77% of those produced different assem- 6600 CPU at 3.30 GHz. Both machines had 8 GB of RAM bly (20% overall resulted in a different binary). Excluding and run Ubuntu 16.04. To get consistent results, we disabled noisy tests, we observed a range of -2% to 1% performance HyperThreading, SpeedStep, Turbo Boost, and ASLR. We change on machine 1 and -4% to 1% on machine 2. 6 used the cpuset tool to grant exclusive hardware resources It is normal that run time results fluctuate a bit when a to the benchmark process. new instruction is added to an IR, since some optimizations Benchmarks We used three benchmarks: SPEC CPU and heuristics need to learn how to handle the new instruc- 2006, LLVM Nightly Test (LNT), and five large single- tion. We did only a fraction of the required work, but the file programs ranging from 7k to 754k lines of code each.7 results are already reasonable, which shows that the seman- SPEC CPU consists of 12 integer-only (CINT) and seven tics can be deployed incrementally. floating-point (CFP) benchmarks (we only consider C/C++ benchmarks). LNT consists of 280 benchmarks with about 8. Related Work 1.5 million lines of code in total. We are not aware of any work directly studying the seman- Measurements We measured running time and peak mem- tics of undefined behavior of IRs. We instead survey work ory consumption of the compiler, running time of compiled related to orthogonal improvements and extensions to IRs, programs, and generated object file size. on undefined behavior, and on formalization of IRs. To estimate compilation and running time, we ran each Compiler IRs Most of the past work on compiler IRs has benchmark three times (except for LNT, where we run five focused on extensions to improve efficiency and enable more times to cope with shorter running times) and took the me- complex static analyses. However, most of this work has dian value. To estimate peak memory consumption, we used ignored undefined behavior. the ps tool and recorded the rss and vsz columns every 0.02 Static single assignment form (SSA [6]) is a functional seconds. For the single file programs, we used the compiler’s IR where each variable is assigned only once. This enables -c option to exclude linking time. To measure object file (time and space) efficient implementations of sparse static size, we recorded the size of .o files and the number of IR analyses. SSA is now used by most production compilers instructions in LLVM bitcode files. All programs were com- (for imperative languages). piled with -O3 and the comparison was done between our There have been multiple proposals to extend SSA, in- prototype and the version of LLVM from which we forked. cluding gated SSA [18] and static single information form 7.2 Results (SSI [1]), whose goal is to enable efficient path-sensitive analyses. In memory SSA form [17, 22], memory is treated Compile time On both machines, compile time was largely like scalar variables and it is put in SSA form, such that unaffected by our changes. Most benchmarks were in the load/store instructions take memory as a parameter. This en- range of ±1%. There were a few exceptions with small ables, for example, simple identification of redundant loads files, such as the “Shootout nestedloop” benchmark where and easy movement of memory-related instructions. compilation time increased by 21% to 36 ms. The reason was More recently, Horn clauses have been proposed as a that an optimization () did not kick in because compiler IR [9], following the trend of recent software veri- of not knowing about freeze, which then caused a different fication tools [10]. There have also been proposals to expose set of optimizations to fire in the rest of the pipeline. parallel constructs in IRs using multiple paradigms, such as Memory consumption For most benchmarks, peak mem- π-threads [19], INSPIRE [11], SPIRE [13], and Tapir [20]. ory consumption was unchanged, and we observed a maxi- Undefined Behavior Undefined behavior is not a topic mum increase of 2% for bzip2, gzip, and oggenc. upon which there is widespread agreement either at the Object code size We observed changes in the range of source or IR levels [8, 16]. Several tools have been devel- ±0.5%. Freeze instructions represented about 0.03%–0.05% oped in the past few years to detect execution of undefined behavior in programs, such as IOC [7] and STACK [23]. 6 https://github.com/lpechacek/cpuset.git Kang et al. [12] give a formalization of C’s integer to 7 http://people.csail.mit.edu/smcc/projects/ single-file-programs/ and https://sqlite.org/2016/ pointer cast, which contains implementation-defined and un- sqlite-amalgamation-3140100.zip defined behavior features.

11 2016/11/15 1.5 Machine 1 1.5 Machine 2 Machine 1 1 Machine 2 1

0.5 0.5

0 0

-0.5 -0.5

Change in Performance (%) -1 -1 Change in Performance (%)

-1.5 -1.5

gcc mcf lbm astar milc bzip2 sjeng namd dealIIsoplex gobmk hmmer h264ref povray omnetpp sphinx3 perlbench libquantum xalancbmk Figure 6. Change in performance in % for SPEC CPU 2006: CINT on the left, CFP on the right. Positive values indicate that performance improved, and negative values indicate that performance degraded.

Formalization of IRs Vellvm [24] is a formalization of UB; arithmetic operations taking poison as input often just parts of the LLVM IR in Coq. It has limited support for yield poison). undef, and none for poison. ICC exploits UB due to signed overflow11 as does MSVC12 Alive [15] formalizes parts of the LLVM IR as part of its and GCC.13 As far as we know, they have not formalized SMT-based VCGen. Alive supports undef and poison values, their semantics, but they appear to use a similar concept but has limited support for control-flow manipulating trans- to LLVM’s poison. At least MSVC seems to suffer from formations. Souper8 (a superoptimizer for LLVM) includes similar problems as the ones we have outlined in this work limited support for undefined behavior. for LLVM. It is likely that MSVC could fix their IR in a CompCert [14] includes full formalization of multiple way similar to our solution. Similarly, the Firm develop- three-address code-based IRs that are used throughout its ers acknowledge several bugs with their handling of “Bad” pipeline. There is also an extension to CompCert that in- values; it is not clear whether it is a fundamental problem cludes the formalization of an SSA-based IR [3]. with the semantics of their IR or if these are implementation Chakraborty and Vafeiadis [5] formalize the semantics of problems. parts of the concurrency-related instructions of LLVM IR. CompCert [14] IR also has a deferred UB value called undef, which is essentially the same as poison in LLVM. 9. Undefined Behavior in Other Compilers Since branching on undef triggers UB in CompCert, certain optimizations like loop unswitching are unsound. Most production compilers have a concept like LLVM’s In summary, most modern compiler IRs support reason- undef, since it is simple, innocent-looking, and has tangible ing based on undefined behavior, but this reasoning has re- benefits. There are two common semantics for undef: one ceived little up-front design work or formal attention. where each use of undef may get a different value, as in LLVM and the Microsoft Phoenix compiler; and another where all uses of undef get the the same value, as in Firm [4], 10. Conclusion the Microsoft Visual C++ compiler (MSVC) and the Intel Undefined behavior in a compiler IR, which is not necessar- C/C++ Compiler (ICC). ily related to undefined behavior in any given source lan- GCC attempts to explicitly initialize uninitialized regis- guage, gives optimizers the freedom to perform desirable ters to zero while other implementation details work together transformations. We have presented the first detailed look to provide consistent values in other cases. However, this at IR-level undefined behavior that we are aware of, and we does not appear to be part of GCC’s semantics because its have described difficult, long-standing problems with the se- optimizations are sufficiently aggressive to come up with mantics of undefined behavior in LLVM IR. These problems multiple values for the same uninitialized variable.9 are present to some extent in other modern optimizing com- Firm additionally has the concept of a “Bad” value,10 the pilers. We developed and prototyped a modified semantics use of which triggers UB. This semantics is stronger than for undefined behavior that meets our goals of justifying LLVM’s poison (where the use of poison is not necessarily most of the optimizations that LLVM currently performs,

8 https://github.com/google/souper 11 https://godbolt.org/g/egCqqm 9 https://godbolt.org/g/r4PX4A 12 https://godbolt.org/g/ojfRVd 10 http://pp.ipd.kit.edu/firm/Unknown_and_Undefined 13 https://godbolt.org/g/gtEbXx

12 2016/11/15 putting the semantics of LLVM IR on firm ground, and not [18] K. J. Ottenstein, R. A. Ballance, and A. B. MacCabe. The pro- significantly impacting either compile time or quality of gen- gram dependence web: A representation supporting control- erated code. , data-, and demand-driven interpretation of imperative lan- guages. In PLDI, 1990. References [19] F. Peschanski. Parallel computing with the pi-calculus. In DAMP, 2011. [1] C. S. Ananian. The static single information form. Master’s thesis, MIT, 1999. [20] T. B. Schardl, W. S. Moses, and C. E. Leiserson. Tapir: Embedding fork-join parallelism into LLVM’s intermediate [2] Atmel Inc. AVR32 architecture document, Apr. 2011. URL representation. In LCPC, 2016. http://www.atmel.com/images/doc32000.pdf. [21] M. Sperber, R. K. Dybvig, M. Flatt, A. van Straaten, [3] G. Barthe, D. Demange, and D. Pichardie. Formal verification R. Kelsey, W. Clinger, J. Rees, R. B. Findler, and J. Matthews. of an SSA-based middle-end for CompCert. ACM Trans. Revised6 report on the algorithmic language Scheme, Sept. Program. Lang. Syst., 36(1):4:1–4:35, Mar. 2014. 2007. URL http://www.r6rs.org/final/r6rs.pdf. [4] M. Braun, S. Buchwald, and A. Zwinkau. Firm—a graph- [22] B. Steensgaard. Sparse functional stores for imperative pro- based intermediate representation. Technical Report 35, Karl- grams. In ACM SIGPLAN Workshop on Intermediate Repre- sruhe Institute of Technology, 2011. URL http://digbib. sentations, 1995. ubka.uni-karlsruhe.de/volltexte/1000025470. [23] X. Wang, N. Zeldovich, M. F. Kaashoek, and A. Solar- [5] S. Chakraborty and V. Vafeiadis. The formalization of LLVM Lezama. Towards optimization-safe systems: analyzing the concurrency. In CGO, 2017. impact of undefined behavior. In SOSP, 2013. [6] R. Cytron, J. Ferrante, B. K. Rosen, M. N. Wegman, and F. K. [24] J. Zhao, S. Nagarakatte, M. M. Martin, and S. Zdancewic. For- Zadeck. Efficiently computing static single assignment form malizing the LLVM intermediate representation for verified and the control dependence graph. ACM Trans. Program. program transformations. In POPL, 2012. Lang. Syst., 13(4):451–490, Oct. 1991. [7] W. Dietz, P. Li, J. Regehr, and V. Adve. Understanding integer overflow in C/C++. In ICSE, 2012. [8] M. A. Ertl. What every compiler writer should know about programmers. In KPS, 2015. [9] G. Gange, J. A. Navas, P. Schachte, H. Søndergaard, and P. J. Stuckey. Horn clauses as an intermediate representation for program analysis and transformation. Theory and Practice of Logic Programming, 15(4-5):526–542, July 2015. [10] S. Grebenshchikov, N. P. Lopes, C. Popeea, and A. Ry- balchenko. Synthesizing software verifiers from proof rules. In PLDI, 2012. [11] H. Jordan, S. Pellegrini, P. Thoman, K. Kofler, and T. Fahringer. INSPIRE: The Insieme parallel intermediate rep- resentation. In PACT, 2013. [12] J. Kang, C.-K. Hur, W. Mansky, D. Garbuzov, S. Zdancewic, and V. Vafeiadis. A formal C memory model supporting integer-pointer casts. In PLDI, 2015. [13] D. Khaldi, P. Jouvelot, F. Irigoin, C. Ancourt, and B. Chap- man. LLVM parallel intermediate representation: design and evaluation using OpenSHMEM communications. In Work- shop on the LLVM Compiler Infrastructure in HPC, 2015. [14] X. Leroy. Formal verification of a realistic compiler. Commun. ACM, 52(7):107–115, July 2009. [15] N. P. Lopes, D. Menendez, S. Nagarakatte, and J. Regehr. Provably correct peephole optimizations with Alive. In PLDI, 2015. [16] K. Memarian, J. Matthiesen, J. Lingard, K. Nienhuis, D. Chis- nall, R. N. M. Watson, and P. Sewell. Into the depths of C: Elaborating the de facto standards. In PLDI, 2016. [17] D. Novillo. Memory SSA – a unified approach for sparsely representing memory operations. In Proc. of the GCC Devel- opers’ Summit, 2007.

13 2016/11/15