Semantic Equivalence Checking for HHVM Bytecode
Nick Benton
Facebook London
[email protected]

ABSTRACT

We describe a semantic differencing tool used to compare the bytecodes generated by two different compilers for Hack/PHP at Facebook. The tool is a prover for a simple relational Hoare logic for low-level code and is used in testing, allowing the developers to focus on semantically significant differences between the outputs of the two compilers.

CCS CONCEPTS

• Theory of computation → Logic and verification; Hoare logic; Invariants; Pre- and post-conditions;
• Software and its engineering → Software verification; Automated static analysis; Compilers; Software testing and debugging; Object oriented languages;

ACM Reference Format:
Nick Benton. 2018. Semantic Equivalence Checking for HHVM Bytecode. In The 20th International Symposium on Principles and Practice of Declarative Programming (PPDP ’18), September 3–5, 2018, Frankfurt am Main, Germany. ACM, New York, NY, USA, 8 pages. https://doi.org/10.1145/3236950.3236975

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
PPDP ’18, September 3–5, 2018, Frankfurt am Main, Germany
© 2018 Association for Computing Machinery.
ACM ISBN 978-1-4503-6441-6/18/09...$15.00
https://doi.org/10.1145/3236950.3236975

1 INTRODUCTION

HHVM, the HipHop Virtual Machine [14], is a high performance, JIT-compiled implementation of PHP. HHVM generates the Facebook site, as well as being used by Wikipedia, Slack, and others. Most of the code running on HHVM at Facebook is now written in Hack, a new language based on PHP, whose features include an expressive gradual type system, improved collections, lambdas, and support for asynchronous programming [16]. Hack was originally implemented using a combination of an independent typechecker plus some extensions to HHVM’s PHP front-end. We have now built HackC, a new compiler for Hack. HackC emits HHAS, which is a textual form of HHBC, the HipHop ByteCode already used as an intermediate language inside HHVM.

Moving to an entirely new compiler for Hack code is not something to be undertaken lightly. The existing HHVM implementation is mature and well-tested, and we needed to have a similar level of confidence in the new compiler. A conservative decision was therefore taken that, in the first instance, the new compiler should produce the same target code as the old one for our existing codebase and tests.

But what do we mean by ‘the same’? Just running diff on the output of the two compilers requires literal textual equality, which is too strict a requirement. Two compilers for the same language will naturally produce code that varies in inessential details, such as the use of local variables, ordering of basic blocks, etc. Trying to make a new code generator (written in OCaml) agree on all such details with an existing one (written in C++) would both be unnecessary work and create technical debt in the form of contorted code. The unappealing alternative is manually checking huge diff outputs and attempting to classify individual differences as significant or unimportant. What we really wanted was a mechanical way to check that the two compilers produce code that is behaviourally equivalent, i.e. produces the same observable behaviour in all contexts. (We also do not want performance regressions, but ignore that aspect of testing here.) This paper describes an automated tool, semdiff, that we built to check HHAS files for behavioural equivalence.

Behavioural equivalence is obviously undecidable, and writing a sound-but-incomplete fully automatic analysis that can cope with significant equivalence-preserving differences, such as those between optimized and unoptimized code, is generally extremely hard. However, in our case, the two compilers already produced (or were supposed to produce) ‘essentially’ the same code. Thus we were able to get good results with a fairly simple, fast analysis that concentrates on ‘trivial’ differences.

semdiff is essentially a prover for a restricted version of Relational Hoare Logic (RHL) for low-level code. It was integrated into the test process for HackC, running over the compiled code from millions of lines of Hack and PHP, and freed the HackC team to concentrate on eliminating significant differences in the output of the two compilers.

1.1 semdiff architecture

semdiff takes two hhas files as input. These are each parsed in an unsurprising way into the representation of hhbc that is used by HackC. There are around 20 OCaml record types defining the nested structure of a bytecode file. For example, a program has a list of literal array constants, a list of top-level function definitions, a list of class definitions, a list of type definitions, and a distinguished main function. A function has a list of attributes, a name, a body, and a collection of boolean flags saying if it is async, a generator, and so on. Some of the record types have a non-trivial number of fields (e.g. a class currently has 24) and for nearly all of them, we just want to check for literal equality, set equality, or map equality. We reduce the pain of writing the associated boilerplate by using combinators:

    type 'a compare = {
      comparer : 'a -> 'a -> int * (int * edit_sequence);
      size_of : 'a -> int;
      string_of : 'a -> string
    }

An 'a compare comprises a comparison function for values of type 'a, a function that computes a rough indication of the size of values of type 'a, and a string_of function. The comparison function returns an edit distance (which is zero if the two values are appropriately equal), a size, and an edit sequence, which can be thought of as a string representation of the delta between the two values. Values of type 'a compare are built compositionally according to the structure of 'a and the desired notion of equality. For example, there is a generic comparer for 'a lists that takes a comparer for 'a as argument, and computes the Levenshtein edit distance between the two lists. There is also a join combinator that takes two (independent) comparers for the same type and combines them into one that does both comparisons, and so on. The combinators are used to build up a value of type program compare, and it is the result of the associated comparer that is returned by semdiff.

The interesting comparer is that for the lists of instructions that are part of the bodies of functions and methods. Here we do not want to use our generic list comparer, as that would just yield a syntactic, rather than a semantic, comparison. Instead, we implement a comparer that tries to prove that two sequences of instructions have the same observable behaviour under the assumption that they are started in equivalent contexts (comprising the rest of the program, the parameters passed in, etc.). The intention is that this comparison is sound: if it reports that two bodies are equivalent then they really are according to the (intended) semantics of the bytecode. If the comparison fails, then it might be that there is an observable difference, or it might be that our analysis is too weak to establish equivalence. In such a case, we report the state of the prover when it got stuck and the syntactic difference between the two sequences, and hand over to a human being to decide whether this is a false positive or not.¹

[...] iterators, and memo tables, two kinds of exception handler, and instructions for dynamically loading classes and functions.

As well as the stacks, hhas allows the use of named locals, such as $x, and ‘unnamed’ numbered temporaries, such as _3. If v is a variable, then the instruction SetL v copies the value on the top of the stack into v (without popping), PopC discards the value on the top of the stack, and CGetL v pushes the contents of v onto the evaluation stack. (We will describe other instructions as we encounter them.)

Figure 1 shows an example of some rather artificial Hack source code. Two different bytecode sequences that might correspond to that source are shown in Figure 2.

One can see that the two target files differ in their control flow, with simpler exception handling on the right, and different labels and jumps. For example, if $pairs is empty then the iterator created on the left on line 12 will jump to L0 and then to L2 before returning, whereas the corresponding one on the right will just jump to L4 and return. The uses of local variables are also different. Local _6 on the left, which stores the current value of the iterator, corresponds to local _8 on the right. More subtly, local _7 on the left, holding the result of the call to $f, also corresponds to local _8 on the right. Other differences include the reversal of the tests on line 50 on the left and line 44 on the right, and the way UnsetL instructions on the left are not matched on the right.

Nevertheless, most of the instructions do correspond in a fairly simple way, and the two implementations are semantically equivalent.

3 RELATIONAL HOARE LOGIC FOR LOW-LEVEL CODE

Traditional Hoare logic for structured programs works with judgements $\vdash \{P\}\; C\; \{Q\}$, meaning that if command C is started in a state satisfying the precondition P then, if it terminates, it does so in a state satisfying postcondition Q.
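As a small illustration of this partial-correctness reading (our own example, assuming a simple imperative language with assignment and while loops, not one of the paper's figures), both of the following judgements are valid:

```latex
\[
  \vdash \{x = 1\}\; x := x + 1 \;\{x = 2\}
  \qquad
  \vdash \{\mathit{true}\}\; \mathtt{while}\ \mathit{true}\ \mathtt{do}\ \mathtt{skip} \;\{\mathit{false}\}
\]
```

The second judgement holds vacuously: the loop never terminates, so there is no final state that would have to satisfy the postcondition.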