A Formal Verification Tool for Ethereum VM Bytecode

Daejun Park (University of Illinois at Urbana-Champaign, USA; Runtime Verification, Inc., USA)
Yi Zhang (University of Illinois at Urbana-Champaign, USA; Runtime Verification, Inc., USA)
Manasvi Saxena (University of Illinois at Urbana-Champaign, USA; Runtime Verification, Inc., USA)
Philip Daian (Cornell Tech, USA; IC3, USA; Runtime Verification, Inc., USA)
Grigore Roșu (University of Illinois at Urbana-Champaign, USA; Runtime Verification, Inc., USA)

ABSTRACT

In this paper, we present a formal verification tool for the Ethereum Virtual Machine (EVM) bytecode. To precisely reason about all possible behaviors of the EVM bytecode, we adopted KEVM, a complete formal semantics of the EVM, and instantiated the K-framework's reachability logic theorem prover to generate a correct-by-construction deductive verifier for the EVM. We further optimized the verifier by introducing EVM-specific abstractions and lemmas to improve its scalability. Our EVM verifier has been used to verify various high-profile smart contracts including the ERC20 token, Ethereum Casper, and DappHub MakerDAO contracts.

Demo Video URL: https://youtu.be/4XBcAclq0Vk

CCS CONCEPTS

• Software and its engineering → Software verification;

KEYWORDS

Ethereum, smart contracts, formal verification, K framework

ACM Reference Format:
Daejun Park, Yi Zhang, Manasvi Saxena, Philip Daian, and Grigore Roșu. 2018. A Formal Verification Tool for Ethereum VM Bytecode. In Proceedings of the 26th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE '18), November 4–9, 2018, Lake Buena Vista, FL, USA. ACM, New York, NY, USA, 4 pages. https://doi.org/10.1145/3236024.3264591

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
© 2018 Association for Computing Machinery. ACM ISBN 978-1-4503-5573-5/18/11...$15.00

1 INTRODUCTION

Smart contract failures have caused millions of dollars of lost funds (https://blog.ethereum.org/2016/06/19/thinking-smart-contract-security/), and rigorous formal methods are required to ensure the correctness and security of contract implementations (https://blog.ethereum.org/2016/09/01/formal-methods-roadmap/). A smart contract is usually written in a high-level language such as Solidity (http://solidity.readthedocs.io/en/v0.4.24/) or Vyper (https://vyper.readthedocs.io/en/latest/index.html), and is then compiled down to the Ethereum Virtual Machine (EVM) bytecode (http://yellowpaper.io/) that actually runs on the blockchain.

In this paper, we present a formal verification tool for the EVM bytecode. We chose the EVM bytecode as the verification target language so that we can directly verify what is actually executed, without the need to trust the correctness of the compiler. To precisely reason about the EVM bytecode without missing any EVM quirks, we adopted KEVM [4], a complete formal semantics of the EVM, and instantiated the K-framework's reachability logic theorem prover [10] to generate a correct-by-construction deductive program verifier for the EVM. While it is sound, the initial out-of-the-box EVM verifier was relatively slow and failed to prove many correct programs. We further optimized the verifier by introducing custom abstractions and lemmas specific to the EVM that expedite proof search in the underlying theorem prover. We have been using the EVM verifier to verify the full functional correctness of high-profile smart contracts, including multiple ERC20 token contracts [13], Ethereum's Casper contract (https://eips.ethereum.org/EIPS/eip-1011), and DappHub's MakerDAO contract (https://makerdao.com/). Our verification tool and artifacts are publicly available at [11].

Contributions. We describe our primary contributions:

• We present a formal verification tool for the EVM bytecode that is capable and scalable enough to verify various high-profile, safety-critical smart contracts. Moreover, our verifier is the first tool, to the best of our knowledge, that adopts a complete formal semantics of the EVM, and is thus able to reason about all possible corner-case behaviors of the EVM bytecode. See Section 5 for a comparison to other tools.
• We enumerate important, concrete challenges in verifying the EVM bytecode, and propose EVM-specific abstractions and lemmas to mitigate the challenges. (Sections 2 and 3)
• We present a case study of completely verifying high-profile ERC20 token contracts. We enumerate divergent behaviors we found across these tokens, illuminating potential security vulnerabilities for any API clients that assume consistent behavior across ERC20 implementations. (Section 4)

2 EVM VERIFICATION CHALLENGES

Verifying the EVM bytecode is challenging, especially due to the internal byte-manipulation operations that require non-linear integer arithmetic reasoning, which is undecidable in general [7]. Here we provide a few examples of the challenges.

Byte-Manipulation Operations. The EVM provides three types of storage structures: a local memory, a local stack, and the global storage. Of these, only the local memory is byte-addressable (i.e., represented as an array of bytes), while the others are word-addressable (i.e., each represented as an array of 32-byte words). Thus, a 32-byte (i.e., 256-bit) word needs to be split into 32 single-byte chunks to be stored in the local memory, and those 32 chunks need to be merged back to be loaded into either the local stack or the global storage. These byte-wise splitting and merging operations can be formalized using non-linear integer arithmetic operations, as follows. Suppose $x$ is a 256-bit integer. Let $x_n$ be the $n$-th byte of $x$ in its two's complement representation, where the index 0 refers to the least significant byte (LSB), defined as follows:

\[ x_n \stackrel{\text{def}}{=} (x / 256^n) \bmod 256 \]

Let merge be a function that takes as input a list of bytes and returns the corresponding integer value under the two's complement interpretation, recursively defined as:

\[ \mathrm{merge}(x_i \cdots x_{j+1} x_j) \stackrel{\text{def}}{=} \mathrm{merge}(x_i \cdots x_{j+1}) * 256 + x_j \quad \text{when } i > j \]
\[ \mathrm{merge}(x_i) \stackrel{\text{def}}{=} x_i \]

where $*$ and $+$ are multiplication and addition over words (modulo $2^{256}$). If the byte-wise operations are blindly encoded as SMT theorems, then Z3, a state-of-the-art SMT solver, times out attempting to prove "$x = \mathrm{merge}(x_{31} \cdots x_0)$". The SMT query can be simplified to allow Z3 to terminate efficiently, for example, by omitting the modulo reduction for multiplication and addition in merge, with additional reasoning about the soundness of the omission. Despite these improvements, the merge operation still incurs severe performance penalties, because the large formula must be solved for every load from or store into memory, an extremely common operation.

Arithmetic Overflow. Since EVM arithmetic instructions perform [...] trivial once the above is compiled down to EVM. The compiled EVM bytecode of the above conditional expression can be encoded in the SMT-LIB format as follows:

    (not (= (chop (+ (bool2int (= b 0))
                     (bool2int (> (chop (+ a b)) a)))) 0))

where (chop x) denotes ($x \bmod 2^{256}$), and (bool2int x) is defined by (ite x 1 0). However, Z3 fails (by timing out) to prove that the above SMT formula is equivalent to $a + b < 2^{256}$.

Hash Collision. Precise reasoning about the SHA3 hash is critical. Since it is not practical to consider the details of the hash algorithm every time the hash function is called in the EVM bytecode, an abstraction for the hash function is required. Designing a sound but efficient abstraction is not trivial: while the SHA3 hash is not cryptographically collision-free, contract developers assume that collisions will not occur during normal execution of their contracts. A naive way of capturing this assumption would be to simply abstract the SHA3 hash as an injective function. However, this is not sound, simply because of the pigeonhole principle, and thus we need to be careful when abstracting the hash function.

3 EVM-SPECIFIC ABSTRACTIONS

K's reachability logic theorem prover can be seen as a symbolic model checker equipped with coinductive reasoning about loops and recursions (refer to [10] for details of the underlying theory and implementation). The prover, in its current form, often delegates domain reasoning to SMT solvers, so the performance of the underlying SMT solvers is critical for the overall performance. The domain reasoning involved in EVM bytecode verification is not tractable in many cases, especially due to non-linear integer arithmetic. We had to design custom abstractions and lemmas to avoid the intractable domain reasoning and improve scalability.

Abstraction for Local Memory. We present an abstraction for the EVM local memory to allow word-level reasoning. As mentioned in Section 2, since the local memory is byte-addressable, the load and store operations involve the conversion between a word and a list of bytes, which is not tractable to reason about in general. Our abstraction makes the reasoning easier by abstracting away the byte-manipulation details of the conversion.
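
The byte-wise splitting and merging formalized in Section 2 can be made concrete with a small executable sketch. The Python below is only an illustration of those two definitions on concrete integers; it is not part of KEVM or the verifier, and the names `nth_byte`, `split_word`, and `merge` (as a Python function) are hypothetical helpers introduced here.

```python
from typing import List

WORD_BYTES = 32        # a 256-bit EVM word is 32 bytes
WORD_MOD = 2 ** 256    # word arithmetic is taken modulo 2^256


def nth_byte(x: int, n: int) -> int:
    """x_n = (x / 256^n) mod 256, where index 0 is the least significant byte."""
    return (x // 256 ** n) % 256


def split_word(x: int) -> List[int]:
    """Split a word into the byte list [x_31, ..., x_0] (most significant first)."""
    return [nth_byte(x, n) for n in reversed(range(WORD_BYTES))]


def merge(byte_list: List[int]) -> int:
    """Fold the recursive definition from left to right:
    merge(x_i ... x_{j+1} x_j) = merge(x_i ... x_{j+1}) * 256 + x_j  (mod 2^256).
    An empty list yields 0, a case the paper's definition does not cover.
    """
    acc = 0
    for b in byte_list:
        acc = (acc * 256 + b) % WORD_MOD
    return acc


if __name__ == "__main__":
    # Round trip on a few concrete words, mirroring x = merge(x_31 ... x_0).
    for x in (0, 1, 0x1234, WORD_MOD - 1):
        assert merge(split_word(x)) == x
    print("round trip holds on the sampled words")
```

Checking the round trip on sample values is cheap, but it is not a proof. The difficulty described in the text arises when `x` is symbolic and the solver must establish the identity for all 256-bit values, which is why a word-level abstraction of memory is attractive.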
