Towards a Toolchain for Exploiting Smart Contracts on the Ethereum Blockchain
by Sebastian Kindler
M.A., University of Bayreuth, 2011

Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of Bachelor of Science in the Computer Science Program
Faculty of Computer Science

Supervisor: Prof. Dr. Stefan Traub
Second Assessor: Prof. Dr. Markus Schäffter
External Assessor: Dr. Henning Kopp

Ulm University of Applied Sciences
March 22, 2019

Abstract

The present work introduces the reader to the Ethereum blockchain. First, on a conceptual level, general blockchain concepts are explained and the Ethereum blockchain in particular is viewed from different perspectives. Second, on a practical level, the main components that make up the Ethereum blockchain are explained in detail. In preparation for the objective of the present work, which is the analysis of EVM bytecode from an attacker's perspective, smart contracts are introduced, both at the level of EVM bytecode and at the level of Solidity source code. In addition, critical assembly instructions relevant to the exploitation of smart contracts are explained in detail. With a definition of what constitutes a vulnerable contract in place, further practical and theoretical aspects are discussed: the present work introduces requirements for a possible smart contract analysis toolchain. The requirements are viewed individually, and theoretical focus is put on automated bytecode analysis and symbolic execution, as the latter is the technique underlying automated smart contract analysis tools. The importance of semantics is highlighted with respect to designing automated tools for smart contract exploitation. Finally, a minimal toolchain is presented that allows beginners to efficiently analyze smart contracts and develop exploits.

Contents

Introduction
1 Preliminaries
1.1 Blockchain
1.2 Ethereum
1.2.1 Ethereum from Different Perspectives
1.2.2 Ethereum World State σ
1.2.3 Ethereum Account Types
1.2.4 Ethereum Transactions
1.2.5 Ethereum Virtual Machine (EVM)
1.2.6 Ethereum Peer-to-Peer Network
1.3 Smart Contracts
1.3.1 Smart Contracts at EVM Bytecode Level
1.3.2 Solidity and the Structure of Smart Contracts
1.4 Vulnerability of Smart Contracts
1.4.1 Critical Bytecode Instructions in Smart Contracts
1.4.2 Exploitation of Critical Instructions
1.4.3 Defining Vulnerable Smart Contracts and Exploits
2 Towards a Smart Contract Exploit Development Toolchain
2.1 Requirements Analysis
2.2 Requirement 1: EVM Bytecode Deployment
2.3 Requirement 2: Manual EVM Bytecode Analysis (Tools)
2.4 Requirement 3: Automated Analysis (Theory)
2.4.1 Symbolic Execution
2.5 Requirement 4: Automated Exploit Development
2.6 Toolchain for Automated EVM Bytecode Analysis and Exploit Development
Conclusions

Introduction

The concept of smart contracts was introduced in 1994 by cryptographer Nick Szabo [46], who offers the following definition:

"A smart contract is a computerized transaction protocol that executes the terms of a contract. The general objectives of smart contract design are to satisfy common contractual conditions (such as payment terms, liens, confidentiality, and even enforcement), minimize exceptions both malicious and accidental, and minimize the need for trusted intermediaries. Related economic goals include lowering fraud loss, arbitration and enforcement costs, and other transaction costs¹."
A smart contract is as binding as a legal contract. The bytecode of a smart contract constitutes the contractual conditions to which users subject themselves when they execute the respective smart contract. Unlike a legal contract, however, a smart contract can neither be circumvented nor contested in court. As programs, and from the perspective of the end user, smart contracts execute precisely the way they are designed. In this sense, Ethereum blockchain technology is an implementation of a decentralized crypto-law system [51]. In contrast to national legal systems, Ethereum non-contract account owners do not decide which law they want to abide by, but rather which law they want to be bound by. Once a smart contract has been called via a message call transaction, its execution cannot be stopped, and the contractual conditions are binding in the absolute sense. However, if program code is what constitutes the law, then programming errors are part of the law as well. Hence, by definition, the abuse of programming errors in a decentralized system such as Ethereum cannot constitute a violation of the system's crypto-law. Any condemnation of attacks against error-prone smart contracts builds on the remnants of thinking in centralized legal systems. A decentralized system thus eradicates such thinking.

¹ The term transaction costs goes back to the article "The Problem of Social Cost" by Ronald Coase [10], the theses of which were later summarized as the Coase theorem [12]. Transaction costs have a negative connotation and refer to the time and effort, as well as the resources, required to negotiate the exchange of legal entitlements. According to the interpretation [12] of Coase's proposal, from the perspective of efficiency, the original allocation of resources is of no concern as long as transactions of legal entitlements are costless. Reducing transaction costs facilitates the efficient exchange of legal entitlements and increases cooperation between competing parties.
Public blockchain implementations such as Ethereum are transparent but trustless environments: the trust people put in Ethereum does not depend on centralized legal institutions. Rather, people put their trust in the algorithms [35] on which the security of blockchain technology is built. Regarding the consensus algorithm, i.e., proof-of-work, such trust may be justified. However, complete trust in the correct execution of arbitrary programs seems ill-placed, especially when these programs manage 'real' people's money: Ethereum smart contracts own Ether worth millions of US dollars, and heists [33] on the Ethereum blockchain have shown how vulnerable and insecure smart contracts can be. To convey the severity of smart contract vulnerabilities as well as the importance of a toolchain for smart contract vulnerability analysis, the following work serves as a thorough introduction to the Ethereum blockchain.

The present work introduces the reader to the Ethereum blockchain. First, on a conceptual level, general blockchain concepts are explained and the Ethereum blockchain in particular is viewed from different perspectives. Second, on a practical level, the main components that make up the Ethereum blockchain are explained in detail. In preparation for the objective of the present work, which is the analysis of EVM bytecode from an attacker's perspective, smart contracts are introduced, both at the level of EVM bytecode and at the level of Solidity source code. In addition, critical assembly instructions relevant to the exploitation of smart contracts are explained in detail. With a definition of what constitutes a vulnerable contract in place, further practical and theoretical aspects are discussed: the present work introduces requirements for a possible smart contract analysis toolchain.
The requirements are viewed individually, and theoretical focus is put on automated bytecode analysis and symbolic execution, as the latter is the technique underlying automated smart contract analysis tools. The importance of semantics is highlighted with respect to designing automated tools for smart contract exploitation. Finally, a minimal toolchain is presented that allows beginners to efficiently analyze smart contracts and develop exploits.

1 Preliminaries

1.1 Blockchain

Blockchain as an append-only linked list

The term blockchain refers to a data structure that can be loosely described as an append-only linked list whose data content and sequence of data elements are immutable. In comparison, a standard singly linked list is a linear sequence of individual data elements, each of which contains some data and a reference to the next element. That is, the elements themselves implement the list by referencing the address of the respective next element, as shown in Figure 1. Each of the elements in a singly linked list can be modified, moved within the sequence, or deleted. Moreover, new elements can be inserted at any point in the sequence: at the beginning, in the middle, or at the end. Thus, a singly linked list is neither append-only nor immutable with regard to data content and sequence of data elements.

[Diagram: First element (address 0x0004, data 12, next 0x000A) -> Second element (address 0x000A, data 35, next 0x0032) -> Last element (address 0x0032, data 17, next Null)]
Figure 1: A singly linked list consisting of three elements, each of which stores an integer value and references the next element by pointing to its address.

In contrast, a blockchain is designed as an append-only linked list: the linear sequence of previously added data elements is immutable, and so is the data stored in the data elements. The data elements on a blockchain are referred to as blocks.
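The contrast between a mutable singly linked list and an append-only list can be illustrated with a short Python sketch. It rebuilds the list from Figure 1 and applies the three operations named above: modification, insertion and deletion. It then contrasts this with a hypothetical `AppendOnlyChain` class whose elements are frozen dataclasses and whose only mutating method is `append`. The names `Node`, `Block`, `AppendOnlyChain` and `to_list`, as well as the address 0x0040 of the inserted element, are illustrative assumptions and not part of any Ethereum client.

A frozen element cannot be given a `next` pointer after it has been created. For that reason, each new element in this sketch references its predecessor instead. Python's `frozen=True` only approximates immutability, because it can be bypassed, for example with `object.__setattr__`.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import Optional


@dataclass
class Node:
    """Mutable element of a singly linked list, as in Figure 1."""
    address: int                     # label from Figure 1; illustration only
    data: int
    next: Optional["Node"] = None    # the Python reference is the actual link


def to_list(head: Optional[Node]) -> list[int]:
    """Follow the next references from the first element, collecting data."""
    values = []
    node = head
    while node is not None:
        values.append(node.data)
        node = node.next
    return values


# Rebuild Figure 1: 12 -> 35 -> 17.
last = Node(0x0032, 17)
second = Node(0x000A, 35, last)
first = Node(0x0004, 12, second)

# Every operation named in the text is allowed on a singly linked list.
second.data = 99                      # modify an element's data
first.next = Node(0x0040, 7, second)  # insert a new element in the middle
second.next = None                    # delete the last element
print(to_list(first))                 # [12, 7, 99]


@dataclass(frozen=True)
class Block:
    """Frozen element: its fields cannot be reassigned after creation."""
    data: int
    prev: Optional["Block"]           # reference to the previously added element


class AppendOnlyChain:
    """Hypothetical append-only list; append is the only mutating method."""

    def __init__(self) -> None:
        self._head: Optional[Block] = None  # most recently appended element

    @property
    def head(self) -> Optional[Block]:
        return self._head

    def append(self, data: int) -> None:
        # The new element points back to the existing chain, which stays untouched.
        self._head = Block(data, self._head)

    def to_list(self) -> list[int]:
        values = []
        block = self._head
        while block is not None:
            values.append(block.data)
            block = block.prev
        return values[::-1]               # oldest element first


chain = AppendOnlyChain()
for value in (12, 35, 17):
    chain.append(value)
print(chain.to_list())                # [12, 35, 17]

try:
    chain.head.data = 99              # attempt to modify an existing element
except FrozenInstanceError:
    print("existing elements cannot be modified")
```

In this sketch, the sequence itself cannot be rewritten through the class interface, and neither can each element's data. Real blockchains rely on mechanisms other than language features to achieve this property.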
Each block consists of two sections: (1) a block header that contains various pieces of information particular