Modeling the Invariance of Virtual Pointers in LLVM

Modeling the Invariance of Virtual Pointers in LLVM Piotr Padlewski Krzysztof Pszeniczny Richard Smith Google Research Google Research Google Research [email protected] [email protected] [email protected] Abstract Indirect calls also limit the extent of compiler analyses and optimizations, making it much more difficult to Devirtualization is a compiler optimization that replaces perform inter-procedural reasoning and thus hinder many indirect (virtual) function calls with direct calls. It is useful intra-procedural optimizations, e.g. inlining, intra- particularly effective in object-oriented languages, such procedural register allocation, constant propagation or as Java or C++, in which virtual methods are typically function attributes inference. abundant. Modeling the lifetime of a C++ pointer in the LLVM We present a novel abstract model to express the life- IR is nontrivial, because bit-wise equal C++ pointers, times of C++ dynamic objects and invariance of virtual viz. having the same bit pattern, may nevertheless table pointers in the LLVM intermediate representation. denote objects with different lifetimes. They are thus The model and the corresponding implementation in not equivalent and cannot be replaced with each other. Clang and LLVM enable full devirtualization of virtual However, this information is not reflected in the LLVM calls whenever the dynamic type is statically known and IR, because equal values are interchangeable there. The elimination of redundant virtual table loads in other following C++ code, when translated to the LLVM IR cases. and optimized1, makes the issue apparent: Due to the complexity of C++, this has not been achieved 1 void test() { by any other C++ compiler so far. Although our model 2 auto*a= newA; 3 external_fun(a); was designed for C++, it is also applicable to other 4 languages that use virtual dispatch. Our benchmarks 5 A* b = new(a) B; 6 external_fun(b); show an average of 0.8% performance improvement on 7 } real-world C++ programs, with more than 30% speedup in some cases. The implementation is already a part of 1 define void @test() { 2 %new= call i8* @NEW(i64 8) the upstream LLVM/Clang and can be enabled with the 3 %a= bitcast i8* %1 to %struct.A* -fstrict-vtable-pointers flag. 4 call void @A_CTOR(%struct.A* %a) 5 call void @external_fun(%struct.A* %a) 6 %b= bitcast i8* %1 to %struct.B* Keywords: devirtualization, invariant, virtual pointer, 7 call void @B_CTOR(%struct.B* %b) 8 ; Because %a and %b are known to be equal, indirect call, fat pointer 9 ; the compiler decided to use %a. 10 call void @external_fun(%struct.A* %a) 11 ret void 1 Introduction 12 } Devirtualization is a compiler optimization that changes Here, the so-called “placement new” was used to con- arXiv:2003.04228v1 [cs.PL] 22 Feb 2020 virtual (dynamic) calls to direct (static) calls. The former struct a new object in pre-allocated memory (here: in the introduce a performance penalty compared to the latter, memory pointed by a). As can be seen in this example, as they cannot be inlined and are harder to speculatively %a is passed to the second call of external_fun instead execute. of %b, as the compiler figured out that they are equivalent. On the other hand, if external_fun(b) had been Indirect calls can also serve as an attack vector if an replaced with external_fun(a), the behavior would be adversary can replace the virtual pointer of an object, undefined per the C++ standard. e.g. by performing buffer overflow. Moreover, a recently discovered vulnerability called Spec- In this paper, we present a novel way to model C++ tre [13, 17] showed that the indirect branch predictor is dynamic objects’ lifetimes in LLVM capable of leveraging susceptible to side-channel attacks where observable side their virtual pointer’s invariance to aid devirtualization. effects could lead to private data leakage. The currently 1The LLVM IR code was simplified to make it clearer. Mangled used mitigation technique (the so-called ‘retpolines’ [26]) names, types or unimportant attributes will be similarly simplified effectively eliminates indirect branch prediction entirely. in other listings. 1 Piotr Padlewski, Krzysztof Pszeniczny, and Richard Smith This is a previously unresolved problem, blocking the Similar optimizations can be also achieved using just-in- wide deployment of devirtualization techniques. time compilation. However, it is rarely used in the case of C++. Another technique is to leverage the invariance of a 2 Related Work virtual pointer and having loaded the pointer once, reuse it for multiple virtual calls, or even devirtualize all virtual There are multiple optimizations done in hardware that calls if the dynamic type of object is statically known. speed up the execution of virtual calls. The most im- The latter holds e.g. for automatic objects in C++. portant one is the indirect branch predictor that makes In Java, the virtual pointer is set only once in the most speculative execution of indirect calls possible. More- derived constructor. There is also no mechanism for over, some CPU architectures provide store buffers [11, changing the dynamic type of an existing object, and 11.10 Store Buffer] [4, 2.4.5.2 L1 DCache] [2, 8.5.1. the lack of manual memory management means that no Store buffer] [18], which temporarily hold writes before reference ever points to a dead object. Hence, one can committing them to memory. When a load operation is simply inform the optimizer that virtual pointers are performed, the CPU returns a value from the store buffer invariant. This technique was used in Falcon [9, 24]– if it exists for the given address. Thus, after construction an LLVM-based Java JIT compiler, resulting in 10-20% of an object, as long as its virtual pointer’s value is in speedups2. the store buffer, loading it is more efficient. Moreover, some LLVM front-ends decided to create a Additionally, there are also multiple software techniques richer language-specific intermediate language used be- used for devirtualization. Speculative devirtualization [14, tween abstract syntax tree and the LLVM IR, such as 15] introduces direct calls to known implementations by SIL (Swift), MIR (Rust), or Julia IR (Julia). Unfortu- guarding them with comparisons of virtual pointers or nately, Clang translates its AST directly to the LLVM virtual function pointers. This enables inlining and re- IR. lies on the branch predictor instead of the indirect call Previous experimental LLVM-based models [19–22] that predictor, which is more accurate especially on older tried to express lifetime of pointers to dynamic objects architectures. However, only implementations visible were unfortunately flawed. They failed to prevent the in the current translation unit can be speculated. An the compiler from being easily confused by equality of indirect call is thus still required for unknown implemen- pointers to objects with different lifetimes. Hence, they tations. could not be enabled by default, as they could lead to a miscompilation, e.g.: A further similar technique called indirect call promo- tion [12] uses profiling data of the program to pick the 1 void g() { most frequent callsites and insert direct calls guarded by 2 A *a = newA; 3 a->virt_meth(); comparison of function pointers similarly to speculative 4 A *b = new(a) B; devirtualization. 5 if (a == b) { 6 // Here the compiler is exposed to the fact 7 // thata ==b, so it may replace theSSA Another approach is to use information about the entire 8 // value ofb witha, which would result in 9 // an erroneous call toA::virt_meth. program using a family techniques called whole program 10 b->virt_meth(); optimizations (WPO), also referred as link-time opti- 11 } 12 } mization (LTO) [3, 7, 8, 16, 25] as it often happens during link-time. Having information about the whole program, a compiler can derive facts that might be unknown or 3 Problems Specific To C++ hard to model in a single translation unit, e.g. a virtual pointer being invariant, or the definition of virtual tables, In C++ functions are allowed to modify any accessible or that a function is effectively final (meaning it is not memory, so the compiler needs to be very conservative overridden by other types). Using information about the with any assumptions. Moreover, compilation of sepa- whole program, GCC in LTO mode can prune unreach- rate modules – called translation units [10, 5.2 Phases able branches created by speculative devirtualization. of translation] – is usually independent, which makes LLVM’s whole-program devirtualization [1] is also able compilation highly parallelizable, but on the other hand to identify virtual functions that only return constants, and then put these constants in the vtable itself and 2Personal communication from Philip Reames, Director of Com- replace calls with simple loads. piler Technology at Azul Systems 2 Modeling the Invariance of Virtual Pointers in LLVM limits the ability of the compiler to reason about func- 1 structA{ tions from other translation units. 2 int value; 3 }; 4 External functions are functions defined in a different 5 structB:A{}; translation unit than the currently compiled one, so 6 7 void test() { that the body of a function is not visible to the compiler, 8 auto *a= newA; unless performing whole program optimization. External 9 a->value = 0; 10 A *b = new(a)B; functions are problematic when reasoning about virtual 11 b->value = 42; calls in C++ because without additional knowledge the 12 if (a == b) { 13 printf ("%d%d", a->value, b->value); compiler has to conservatively assume that the virtual 14 } pointer might be clobbered (overwritten). 15 } The following code snippet demonstrates some of the The behavior of this piece of code is undefined, because shortcomings of traditional C++ devirtualization tech- although the pointers a and b refer to the same memory niques that do not rely on any additional notion of virtual location and are bit-wise equal, a cannot be derefer- pointer invariance.

Load more