<<

Evaluating Memory-Hard Proof-of-Work on Three Processors

Zonghao Feng Qiong Luo Hong Kong University of Science and Hong Kong University of Science and Technology Technology Kowloon, Hong Kong Kowloon, Hong Kong [email protected] [email protected]

ABSTRACT Recently, database researchers have started analyzing pri- Most public systems, exemplified by cryptocur- vate [20], scaling up blockchains [19], and devel- rencies such as and , use memory-hard oping blockchain-based storage engines [43], databases [23, proof-of-work (PoW) algorithms in consensus protocols to 35], cloud data management [33], and data provenance [39]. maintain fair participation without a trusted third party. In this paper, we study the performance of several represen- The memory hardness, or the amount of memory access, tative memory-hard proof-of-work (PoW) algorithms, which of these PoW algorithms is to prevent the dominance of are key components in leading systems. custom-made hardware of massive computation units, in are public blockchain systems, where particular, application-specific integrated circuit (ASIC) and distributed participants do not fully trust each other but field-programmable gate array (FPGA) machines, in the sys- must reach consensus for their data blocks (transactions) tem. However, it is unclear how effective these algorithms to be appended to the blockchain. PoW algorithms [22] are are on general-purpose processors. used to reach such consensus. The core idea is that a service In this paper, we study the performance of representa- requestor must prove to a service provider that the requestor tive memory-hard PoW algorithms on the CPU, the Graph- has spent a certain amount of computational effort, and that ics Processing Unit (GPU), and the Knights Landing the provider can verify this effort in a short time. (KNL) processors. We first optimize each for indi- Because the success of appending data blocks to the chain vidual processors, and then measure their performance with depends on the speed of executing PoW algorithms, cryp- number of threads and memory size varied. Our experimen- tocurrency participants, or called miners, utilize powerful tal results show that (1) the GPU dominates the CPU and computers for this purpose. These computers include not the KNL processors on each algorithm, (2) all algorithms only general-purpose CPUs and GPUs, but also application- scale well with number of threads on the CPU and KNL, specific integrated circuit (ASIC) and field-programmable and (3) the size of accessed memory area affects each al- gate array (FPGA) machines. Such imbalance in comput- gorithm differently. Based on these results, we recommend ing power between participants challenges the democratiza- CryptoNight with scratchpads of different sizes as the most tion of the system, and may create opportunities for security egalitarian PoW algorithm. threats. To prevent the dominance of powerful computers, in par- PVLDB Reference Format: ticular, ASICs and FPGAs, recent cryptocurrencies adopt Zonghao Feng and Qiong Luo. Evaluating Memory-Hard Proof- memory-hard PoW algorithms, which require a large amount of-Work Algorithms on Three Processors. PVLDB, 13(6): 898- of memory access. The goal is to have a similar PoW perfor- 911, 2020. mance on various processors. These PoW algorithms effec- DOI: https://doi.org/10.14778/3380750.3380759 tively discourage the use of ASICs and FPGAs because their memory performance is limited and will be costly to boost. 1. 
INTRODUCTION However, it is unclear how effective these memory-hard PoW algorithms achieve the democratization purpose on general- Blockchain technologies have been under rapid develop- purpose computers, as CPUs and GPUs differ considerably ment, and a major application is cryptocurrencies, such as on processor architectures as well as memory performance. [34], Ethereum [44], and Monero [42]. Promising ap- Therefore, we study the performance of memory-hard PoW plications have also been demonstrated in healthcare [21], algorithms on three types of general-purpose processors - supply chain [36], financial services [46], and other areas. the CPU, the GPU, and the Intel Xeon Phi Knights Land- ing (KNL) processors. We select three representative memory-hard PoW algo- This work is licensed under the Creative Commons Attribution- rithms for our study. The first algorithm under study is NonCommercial-NoDerivatives 4.0 International License. To view a copy CryptoNight [18], which is one of the most popular memory- of this license, visit http://creativecommons.org/licenses/by-nc-nd/4.0/. For hard PoW algorithms, designed by the founda- any use beyond those covered by this license, obtain permission by emailing [email protected]. Copyright is held by the owner/author(s). Publication rights tion. Its performance is bound by memory latency because licensed to the VLDB Endowment. the algorithm performs a sequence of random memory ac- Proceedings of the VLDB Endowment, Vol. 13, No. 6 cesses mixed with reads and writes in a 2MB scratchpad ISSN 2150-8097. in the memory. The second algorithm we study is Ethash DOI: https://doi.org/10.14778/3380750.3380759

898 [24], which is another notable memory-hard PoW algorithm, asset database that is shared among multiple sites. In this used by Ethereum [44]. It works by fetching data from a database, transactions are recorded in a chain of blocks con- randomly generated dataset, specifically a directed acyclic nected by hash pointers, as shown in Figure 1. This chain graph (DAG) in memory, which is typically several gigabytes of blocks is called the blockchain, and the first block of the in size. The third algorithm under study is Cuckoo Cycle blockchain is called the genesis block. Except the genesis [41]. It finds cycles in a randomly generated bipartite graph block, each block contains the hash value of the previous by visiting all edges and marking vertices whose degrees are block. Thus, the content of each block cannot be modified, less than or equal to one. The bipartite graph takes several given that the hash function is secure. Nodes (participants) gigabytes of memory, the accesses to the graph are sequen- can freely join or leave the , and every node tial reads, and those to the mark arrays are random reads can verify the validity of past transactions. This way, every and writes. node can be the validator. To save storage space and reduce In this paper, we study memory-hard PoW algorithms verification time, the (a binary hash tree) is used on general-purpose processors, because these techniques are to store the transactions. not only essential for public blockchains, but also relevant to in-memory data-intensive operations. However, few stud- Block ies have been performed to compare the performance of a Block Header Block Header memory-hard PoW algorithm between general-purpose pro- Previous Hash Previous Hash cessors or analyze the performance characteristics of these algorithms. To the best of our knowledge, this study is the Timestamp Difficulty Nonce Timestamp Difficulty Nonce first to fill the gap. Merkle Root Merkle Root Specifically, we first take the open-source versions of the memory-hard PoW algorithms, and implement and optimize Hash01 Hash23 them on the CPUs, the GPUs, and the KNL. We then evalu- ate the overall performance of these optimized implementa- Hash0 Hash1 Hash2 Hash3 tions in their default parameter settings. Next, we vary the number of threads and the size of accessed memory area to Tx0 Tx1 Tx2 Tx3 examine the scalability and memory performance character- istics. Finally, we analyze the results on processor profiling, memory throughput, and power consumption. Figure 1: The Bitcoin blockchain Our contributions can be summarized as follows:

• We evaluate the overall performance of each optimized Blockchain systems can be categorized into two types - algorithm on seven processors (two CPUs, one KNL, public or private. Public blockchains or permissionless block- and four GPUs). chains keep the feature, i.e., nodes are un- trusted and can freely join or leave the network. In contrast, • We identify the performance factors of each algorithm in private blockchains or permissioned blockchains, the iden- and examine their effect. tities of participants are known, the data might be private, and trusted third parties may act as transaction validators. • We collect and analyze various hardware profiling re- Due to propagation delays or malicious attacks, nodes sults, such as cache hit rate, memory throughput, and may have different views of the blockchain, called forks. The power consumption. forks may result in double spending problems, i.e., an asset is spent in two distinct transactions. To resolve this issue, con- • We provide suggestions for selecting PoW algorithms sensus protocols must be developed. For example, Bitcoin’s in blockchains and designing egalitarian PoW algo- consensus protocol is that only the longest chain among the rithms based on our experiment results. forks is accepted as the finalized transactions, because the The remainder of this paper is organized as follows. We longest chain is likely to be the work of honest nodes. The first introduce the background on blockchain, PoW algo- assumption is that honest nodes in the blockchain system rithms, and computer processors in Section 2. Then we possess more than half of the computation power, so that describe details of the memory-hard PoW algorithms under these honest nodes can always produce the longest chain. study in Section 3. The implementations and optimizations Finally, forks are not always undesirable. Some consensus of each algorithm on different processors are presented in protocols accept forks and turn the blockchain into a tree or Section 4, and the experimental results along with our anal- graph, in order to improve the performance [32]. ysis and findings are shown in Section 5. Finally, we discuss 2.2 Proof-of-Work Algorithms related work in Section 6 and conclude in Section 7. Proof-of-Work (PoW) algorithms are run by service re- questors to solve some computationally hard puzzles for 2. BACKGROUND their requests to the service provider. The service provider will first verify if the requestor has finished the work, before 2.1 Blockchain the requestor is allowed to have the service. In the context We first introduce the concepts and terminologies in block- of blockchain systems, the service requestors are the users chain systems using the Bitcoin as an example. The Bitcoin who want to append a new block to the blockchain, and the whitepaper [34] was published in 2008 by , service provider is all the users in the blockchain network. whose identity remains unknown. It proposed a fully de- Based on PoW algorithms, consensus protocols in block- centralized implementation of a distributed , i.e., an chain systems can be formally defined as follows. Given a

899 Table 1: Comparison of PoW algorithms Name Hash Function Memory-Hardness Application Hardware [11] SHA-256 None Bitcoin [34] ASIC [37] Scrypt Latency [30] ASIC [15] BLAKE2b Bandwidth [12] ASIC CryptoNight [18] CryptoNight Latency Monero [42] CPU, KNL, GPU Ethash [24] SHA-3 (Keccak) Bandwidth Ethereum [44] GPU Cuckoo Cycle [41] - Latency and Bandwidth Grin [28] GPU cryptographic hash function H, the block content b, and the design of consensus protocols in distributed databases is a difficulty of the work d, the block can be appended to the classic problem [33], and PoW-based consensus protocols in chain if the PoW algorithm can find a nonce n such that blockchains are a special case of the problem. H(n||H(b)) < d (|| represents concatenation). For exam- In summary, memory-hard PoW algorithms and their per- ple, in Bitcoin, the hash function H is SHA256d (SHA256 formance on modern processors are of significance to the done twice), the block b contains the transactions produced database area. Therefore, we are motivated to conduct this during a period of time, and the difficulty d represents the experimental study. Based on our study, we make recom- threshold for the final hash value and will increase over time. mendations for selecting and adjusting PoW algorithms in Since the hash function used in the PoW algorithm of Bit- permissionless blockchains. coin is simple and fast to run on ASICs, ASIC miners domi- nate the computing power of the Bitcoin network. This dom- 2.4 Processors ination is detrimental to the decentralization of Bitcoin be- Modern CPUs consist of up to a few tens of cores, and use cause individuals without powerful hardware cannot succeed multiple levels of caches to improve memory access speed. in executing transactions. To address this problem, complex Specifically, the L1 and L2 caches are privately owned by PoW algorithms are designed and adopted by blockchains. each core, and the L3 cache is shared among all the cores. Table 1 shows the PoW algorithms in use by mainstream Depending on the memory type, high-end server CPUs can blockchain applications. They usually combine several hash achieve a bandwidth of more than 100 GB/s. Even though functions and require a certain amount of memory access. the CPU is the most versatile general-purpose processor, its In our study, we focus on the most popular and represen- thread parallelism is limited compared with the GPU. tative memory-hard PoW algorithms, namely CryptoNight, GPUs are many-core processors that contain thousands of Ethash, and Cuckoo Cycle. We categorize these algorithms streaming processors (cores), organized into tens of stream- by the types of memory-hardness, i.e., latency bound and ing multiprocessors. GPUs use the single instruction, mul- bandwidth bound. Specifically, CryptoNight is memory la- tiple threads (SIMT) execution model, i.e., each instruction tency bound, Ethash is bounded by memory bandwidth, is executed concurrently in multiple threads. Threads on and Cuckoo Cycle combines both types of memory-hardness. the GPU are organized in thread blocks, and all threads in The two PoW algorithms we omitted from study are Scrypt a thread block must be assigned to the same multiproces- and Equihash, because they are less memory hard than sor. A single routine on the GPU, called a kernel, can be CryptoNight and Ethash correspondingly. run in different thread configurations, in terms of number of thread blocks and number of threads per block. 
Since the 2.3 Blockchain, PoW, and Database thread scheduling unit, called warp, consists of 32 consecu- Blockchain is an increasingly important topic in database tive threads, usually the number of threads per block is set research. On the one hand, blockchains have the potential to a multiple of 32. The number of thread blocks is set to of enhancing various database applications. Recent papers a multiple of the number of multiprocessors. The memory have explored blockchain applications in relational databases bandwidth of GPUs is high, which can be close to 900 GB/s. [35], data provenance [39], and storage engines [43]. On the The CPU and the GPU differ significantly on processor other hand, database technologies can be applied to improve architecture and memory hierarchy. The second genera- blockchain performance. For example, the sharding strat- tion Intel Xeon Phi processor, code named Knights Landing egy widely used in distributed databases has been applied (KNL), is a many-core processor that represents a middle-of- to blockchains [19, 23] to increase the throughput and lower the-road approach. The number of cores on KNL is 64, and the latency. its instruction set and programming APIs are compatible to PoW algorithms play an indispensable role in permis- the CPUs. It is a fully independent self-bootable processor, sionless blockchains, because they are the core component which does not rely on the CPU and eliminates the data that enables consensus between untrusted parties. Further- transfer cost. KNL has two vector processing units (VPU) more, memory-hard PoW algorithms are closely related to per core, which support the AVX-512 SIMD instruction set. in-memory query processing. For example, the Cuckoo Cy- KNL contains 16 GB on-board MCDRAM, which can pro- cle PoW algorithm is inspired by the Cuckoo hash algorithm, vide a memory bandwidth of 450 GB/s, about five times so they share common optimizations with hash probing in higher than DDR RAM. Although existing CPU-based code query processing [38]. can run on KNL without any modifications due to the In practice, memory-hard PoW algorithms are acceler- compatibility, architecture-aware optimizations can further ated by hardware. Similarly, modern parallel processors are improve the performance. Through tuning programs on widely used to improve the performance of query process- KNL, some optimizations can also boost the performance ing with memory intensive operations [40, 45]. Finally, the on the CPU, which is called the “dual-tuning” effect [29].

900 Compared with modern multi-core CPUs, many-core pro- 524,288 iterations, so a sequence of 2 million random reads cessors have a large number of low-performance cores and and writes in total will be performed on the scratchpad. are especially suitable for data-parallel tasks. Therefore, to achieve the best performance on many-core processors, we need to reduce serial execution and increase parallelism in the program. AES XOR MUL ADD XOR A In addition to general-purpose processors, specialized pro- (1) (2) (4) (4) (6) cessors, e.g., FPGAs and ASICs, are widely used by non memory-hard PoW algorithms for cryptocurrency mining, because customized hardware provides faster instruction ex- ecution and reduces power consumption. For example, the B C D mining of Bitcoin is dominated by ASIC miners, which are (7) thousands of times faster than GPUs. However, for memory- hard PoW algorithms, specialized processors have little ad- (1) (2) (3) (5) vantage over CPUs and GPUs due to the complexity of algo- Read rithms and high cost of memory. For example, the Antminer S[A] S[C] E3 [16] for Ethereum has a hash rate of 190 MH/s, which Write Scratchpad Memory (2 MB) is only 2 times faster than the V100 GPU at the cost of consuming 3 times more energy. Therefore, we target our Figure 2: The memory-hard loop of the Cryp- study on general-purpose processors, because they are most toNight algorithm. The numbers in parentheses cor- suitable for memory-hard PoW algorithms. respond to the line numbers in Algorithm 1.

3. MEMORY-HARD POW ALGORITHMS Algorithm 1: The Memory-hard Loop of the CryptoNight algorithm 3.1 CryptoNight Input : Integer A, B, Scratchpad memory S The CryptoNight algorithm was released in 2013, as part Output: Modified scratchpad memory S of the CryptoNote blockchain system. It contains a memory- for i ← 1 to 524288 do hard loop, which performs a sequence of random reads and 1 C ← AES(S[A],A); writes in a small memory area, called a scratchpad. It was 2 S[A] ← XOR(B,C); designed for running on the CPUs efficiently, because the 3 D ← S[C]; scratchpad fit into the CPU L2 cache. In contrast, such 4 A ← ADD(A, MUL(C,D)); sequences of random memory accesses were inefficient on 5 S[C] ← A; the GPU and ASIC machines. 6 A ← XOR(A, D); The CryptoNight algorithm consists of three steps. The 7 B ← C; first step is scratchpad initialization. The size of the scratch- 8 end pad memory is 2 MB. First, we use the Keccak (SHA3) [13] function to generate a result of 200 bytes. We use the first 32 bytes, byte 0 ··· 31, as an AES-256 key. Then we split The final step is result calculation. Similar to the first the next 128 bytes, bytes 64 ··· 191, into eight blocks of 16 step, bytes 32 ··· 63 of the initial Keccak result are used as bytes each, and encrypt each of these eight blocks for 10 an AES-256 key, and bytes 64 ··· 191 are encrypted with the AES rounds using the AES-256 key. The encryption result contents in the scratchpad memory, 128 bytes a time. Then is stored as the first 128 bytes of the scratchpad memory. the Keccak permutation is performed once on the encryption Then we iteratively perform AES encryption on the current result. Depending on the two least significant bits of the first 128 bytes of the scratchpad and append the result as the byte of the Keccak result, one of the four hash algorithms next 128 bytes in the scratchpad. This process continues will be chosen: BLAKE-256, Groestl-256, JH-256, or - until the entire scratchpad is filled. 256. The selected hash algorithm is applied on the Keccak The second step is the memory-hard loop, as listed in result to produce the final output of CryptoNight. Algorithm 1 and illustrated in Figure 2. Both A and B With CryptoNight as the consensus protocol of the block- are 16-byte integers, and are initialized to the XOR result chain, the users who request to append a new block will be of bytes 0 ··· 31 and 32 ··· 63 generated using the Keccak required to perform the CryptoNight algorithm repeatedly function. These integers are then served as addresses in the until they find such a nonce n that holds H(n||b)×d < 2256, scratchpad by looking up the 21 least significant bits. We where H is the CryptoNight hash function, b is the content read the value S[A] in the scratchpad using A as the address, of the new block, and d is difficulty. When a new block and perform AES encryption of S[A] with A to get result C. is broadcast to the blockchain network, the validator will Then we perform XOR on B and C, and write the XOR perform the CryptoNight hash function on the new block result back to S[A]. Next we use C as the address, read S[C] to verify if the hash value of the new block is less than the into D, multiply C and D, add the multiplication result to given threshold. A, and store A to S[C]. Finally, we get the XOR result of A and D as the new A, and C as the new B, and feed the 3.2 Ethash new A and B to the next iteration. 
Since the output of the Ethash [24], another memory-hard PoW algorithm, is used AES encryption is random, we can not predict the address, by Ethereum, which is the second largest cryptocurrency in thus the loop can not be parallelized. The loop contains the world. The memory-hard loop randomly reads slices of

901 a memory area, called the DAG, which is significantly larger ory capacity requirement for the validator and also improves than the scratchpad in CryptoNight. The DAG area is ini- the efficiency of validation. tially 1GB and increases over time. Each slice of the DAG is 128 consecutive bytes. 3.3 Cuckoo Cycle Ethash updates its parameters every 30000 blocks, which The Cuckoo Cycle [41] algorithm solves a PoW puzzle that is called an epoch. In each epoch, the Ethash algorithm finds cycles or other structures in large random graphs. It works in four steps. Firstly, the seed of the epoch is gener- has the simplest specification among all the PoW puzzles, ated by calculating the SHA3-256 hash value of the previous but its security is not comprised, and has been used in many seed. The seed of the first epoch is set to 32 bytes of zeros. cryptocurrencies, e.g., Grin [28] and Aeternity [27]. In practice, the seed can be determined by the current block Initially, a bipartite graph is generated using the hash number. value of the block as the seed. The size of the graph is In the second step, Ethash allocates a light cache, whose defined by edge bits N, i.e., the number of edges is 2N and size is determined by the current block number and increases the number of vertices on each side of the bipartite graph is over time. At block number 0, the size of the light cache is 2N . The task is to find a list of edges that form a cycle of a 16 MB. The light cache is generated by first filling up the given length in the graph. The difficulty of this puzzle can memory sequentially with SHA3-512 hash values, and then be adjusted by limiting the hash value of the edge list to be applying the RandMemoHash algorithm [31] twice. less than a threshold. When the graph is big (e.g., N = 29 The third step generates the DAG. The initial size of the in Grin), the memory capacity requirement is high. DAG is 1 GB. This step is intended to prevent efficient ex- To solve the Cuckoo Cycle puzzle, one can maintain a ecution on ASICs, since large memory is very costly. The directed forest and perform union-find on the forest to locate DAG is only updated at the start of each epoch. Each 64- cycles. However, this simple solver is inefficient in practice. byte item in the DAG depends on 256 items in the light The edge trimming algorithm [8], as shown in Algorithm 3, cache and is calculated by the Fowler–Noll–Vo (FNV) hash can greatly reduce the running time and memory usage. The function [25]. idea is to preprocess the graph by deleting all the vertices of degrees less than or equal to one, which are not in any cycle. Since the bipartite graph is sparse, nearly 2/e ≈ Algorithm 2: The Main Loop of Ethash 73% of the vertices can be excluded after the first round of Input : header, nonce, dag, dag size trimming. After several rounds of trimming, the trimmed Output: mix graph is passed to the simple solver to get the result. In 1 MIX BYTES ← 128; practice, edge trimming is the most time-consuming step of 2 HASH BYTES ← 64; the Cuckoo Cycle PoW. 3 mixhashes ← MIX BYTES/HASH BYTES; 4 n ← dag size/HASH BYTES; Algorithm 3: The Edge Trimming Step of Cuckoo 5 w ← MIX BYTES/4; Cycle 6 s ← sha3 512(header + nonce); Input : edge bits N, trim rounds r, edge N 7 mix ← []; 1 n ← 2 ; 8 for j ← 0 to MIX BYTES/HASH BYTES do 2 live left ← [true, n]; . 
New array of size n 9 mix[j] ← s; 3 live right ← [true, n]; 10 end 4 for i ← 0 to r do 11 for i ← 0 to 64 do 5 count ← [0, n]; 12 p ← fnv(i ∧ s[0], mix[i%w])%(n/mixhashes) ∗ 6 for j ← 0 to n do mixhashes; 7 if live right[edge[j].right] then 13 newdata ← []; 8 count[edge[j].left] + +; 14 for j ← 0 to MIX BYTES/HASH BYTES do 9 end 15 newdata[j] ← dag[p + j]; 10 end 16 end 11 for j ← 0 to n do 17 mix ← fnv(mix, newdata); 12 if count[j] < 2 then 18 end 13 live left[j] = false; 14 end 15 end The final step is known as the mining process, as shown 16 Repeat for the right side in Algorithm 2. It is a memory-hard loop with 64 iterations. 17 end Each iteration fetches 128 bytes of data from the DAG and mixes these bytes with the results from the previous itera- tion. Since this loop requires randomly reading data from a large area of memory, which cannot be cached, it is bound 4. IMPLEMENTATION & OPTIMIZATION by memory latency. However, because the accesses to the DAG in the loop are all reads, a large number of reads can 4.1 Implementation be parallelized to utilize memory bandwidth effectively. We first introduce the implementation details of each algo- To verify if the Ethash PoW output is valid, a validator rithm. Since the memory sizes and access patterns of these will calculate the light cache of the corresponding epoch and algorithms vary, we select the most suitable parallelization regenerate the involved pieces of data in the DAG, without strategies for individual algorithms to improve the perfor- constructing the entire DAG. This design reduces the mem- mance.

902 For the CryptoNight algorithm, we let each thread execute and writes happen in a single page, and page faults are ef- the entire CryptoNight function with its private scratchpad fectively eliminated. and nonce value, because each scratchpad is small, and the accesses to a single scratchpad cannot be parallelized (a sub- 4.2.4 Loop Unrolling sequent access depends on the address obtained from a pre- Most PoW algorithms require multiple rounds of compu- vious access). With private nonce values and scratchpads, tation, and the number of iterations is known. Therefore, multiple instances of the PoW puzzle are running indepen- we unroll the loops in the CPU implementations through dent from each other, and no exchange of information occurs the #pragma unroll directive. Through loop unrolling, the between threads. branch instructions are minimized, and the resource used Similar to CryptoNight, we parallelize Ethash by having per thread can be reduced. We also fine tune the unroll each thread execute the entire Ethash main loop. The differ- factor to achieve best performance. ence is that the DAG of Ethash is shared by all the threads, because the DAG size is at several gigabytes and is read-only 4.3 Optimizations on KNL by all threads. As a result, each thread is running its own in- Most optimizations on the CPU are applicable to the KNL stance of Ethash without communicating to each other, and processor as well. Moreover, the KNL processor supports no conflict exists between threads even if they may access additional features including the AVX-512 instruction set the same proportion of the DAG. and high-bandwidth MCDRAM. Similar to Ethash, the edge trimming step of the Cuckoo Three memory modes are available on the KNL. They dif- Cycle algorithm has read-only access to a graph of several fer by the strategy of utilizing the MCDRAM on the KNL. gigabytes in size. Different from the other two algorithms, In the cache mode, the MCDRAM is completely assigned each round of the edge trimming step visits all edges of the as a cache between the KNL processor and the CPU’s DDR bipartite graph, and updates the counts of the neighbors of main memory. In the flat mode, the MCDRAM is used each vertex. Therefore, we parallize each round by parti- as program-allocatable memory to increase the total size of tioning the edge set and having each thread visit a subset of available memory for the KNL. In the hybrid mode, the MC- edges and update the neighbor counts. The read accesses to DRAM is partitioned into a cache and a program-allocatable the edges have no conflict; however, different threads may memory area with a given parameter on the portion [29]. update the neighbor count and livelihood mark of the same We use the flat mode in our experiments, because the vertex, which will result in a write-write conflict. Therefore, programmer can explicitly manage the MCDRAM in the we use atomic operations on count and livelihood mark up- flat mode, so that frequently accessed data can be manu- dates to ensure the correctness of results. ally allocated in the MCDRAM. In contrast, in the cache mode the MCDRAM cannot be explicitly managed in the 4.2 Optimizations on CPU programs. As a KNL program can program both the CPU’s DDR 4.2.1 SIMD Vectorization RAM and its own MCDRAM in the flat mode, we use The cryptographic hash functions are important primi- the numactl utility to specify the memory type, MCDRAM tives in PoW algorithms. The computation in these func- and DDR RAM, for memory allocation. 
The MCDRAM tions is usually suitable for vectorization. Therefore, we has higher bandwidth but larger latency than DDR RAM. use SIMD instructions to vectorize the hash functions in Therefore, using MCDRAM in bandwidth-bound PoW al- PoW algorithms. Specifically, the SHA-256 hash function gorithms can improve the performance. in Hashcash, the Keccak hash function in CryptoNight and Ethash, and the Siphash [10] hash function in Cuckoo Cy- 4.4 Optimizations on GPU cle are vectorized. We use the AVX2 instruction set on the The implementations of PoW algorithms on the GPU CPU, which has a vector width of 256 bits. share the main parallelization strategy with the CPU and the KNL. Some optimizations, e.g., loop unrolling, can also 4.2.2 AES Instruction Set be applied to the GPU. However, other optimizations must be adjusted to suit the GPU architecture. The Advanced Encryption Standard (AES) encryption is Firstly, the number of threads in the CPU and KNL pro- widely used and is an important component of the Cryp- grams is limited roughly to the number of cores, and the toNight algorithm. Currently, most CPUs have built-in in- SIMD vectorization is also limited to the vector width. In structions for AES encryption, called AES-NI (Advanced contrast, the GPU program can be executed by a massive Encryption Standard New Instructions). We utilized these number of light-weight threads organized in thread blocks, instructions to improve the efficiency of CryptoNight on the so the execution model is SIMT (single instruction, multiple CPU and KNL. threads). Therefore, we experiment with a wide parameter range on number of thread blocks and number of threads 4.2.3 Huge Pages per block for each algorithm on the GPU. The default page size on the CPU is 4 KB. Memory-hard Secondly, the GPU cache sizes are very limited, but its PoW algorithms may incur a large number of page faults global memory bandwidth is high. Furthermore, the con- due to random memory access, so pages will be frequently current memory accesses of a warp of threads to consecu- swapped. The working set of CryptoNight is 2 MB, which tive addresses can be coalesced. Specifically, the memory requires at least 512 pages of 4 KB page size. By enabling coalescing optimization is suitable for many cryptographic the huge pages option on the CPU and increasing the page hash functions, which are essential for PoW algorithms. For size to 2 MB, we can fit the entire scratchpad memory of CryptoNight, we achieve memory coalescing for AES en- CryptoNight into a single page. As such, the random reads cryption using the CUDA build-in vector data type. The

903 Table 2: Hardware configurations Processor CPU1 CPU2 KNL GPU1 GPU2 GPU3 GPU4 Model i7-3770 Xeon Gold 5115 Xeon Phi 7210 GTX 670 Tesla K80 GTX 1080 Ti Tesla V100 # Cores 4 20 64 1344 4992 3584 5120 Base Frequency (MHz) 3400 2400 1300 915 562 1481 1246 Max Frequency (MHz) 3900 3200 1500 980 824 1582 1380 Memory Type DDR3 DDR4 MCDRAM GDDR5 GDDR5 GDDR5X HBM2 Memory Size (GB) 32 256 16 4 24 11 16 Bandwidth (GB/s) 25.6 115.2 450 192 480 484 897 L1 Cache (KB) 32 32 32 16 16 48 128 L2 Cache (MB) 8 20 32 0.5 1.5 2.75 6 TDP (W) 77 85 215 170 300 250 250

Table 3: Performance of PoW algorithms. H/s denotes hashes per second, MH/s denotes million hashes per second, and G/s denotes graphs per second. CPU1 CPU2 KNL GPU1 GPU2 GPU3 GPU4 Hashcash (MH/s) 25.21 190.26 191.18 159.31 577.74 1832.95 2724.92 CryptoNight (H/s) 128.513 552.95 1005.2 91.21 478.57 780.14 1541.10 Ethash (MH/s) 0.75 2.31 3.09 16.30 27.97 32.63 94.21 Cuckoo Cycle (G/s) 0.24 0.67 0.59 - 0.85 3.92 5.39

SHA3 hash functions in Ethash are optimized in a similar by the elapsed time. In the case of Cuckoo Cycle, the hash manner. The Cuckoo Cycle algorithm, however, contains rate is replaced by the number of graphs processed divided no hash function, but its sequential access to the edge set is by the elapsed time. However, the absolute hash rates of coalesced. different algorithms are not comparable, since they are de- termined by the algorithm design and parameter settings. 5. EXPERIMENTS Rather, we mainly compare the hash rate of each algorithm on different processors to examine its effectiveness in achiev- After implementing and optimizing the PoW algorithms, ing the original goal of memory-hard PoW algorithms. In we compare and analyze their performance. Firstly, we com- addition, we include Hashcash, Bitcoin’s PoW algorithm as pare the overall performance of each algorithm on different a baseline, since it has no memory-hardness design. processors and identify the key performance factors. Sec- ondly, we investigate the impact of these performance fac- tors. Finally, we examine the hardware profiling results on 5.2 Performance Overview latency, bandwidth, and power consumption. Table 3 shows the performance of PoW algorithms on dif- ferent processors. The performance number of Cuckoo Cy- 5.1 Experimental Setup cle on GPU1 is missing because Cuckoo requires 5.7 GB of The list of processors is shown in Table 2. We use both memory to run on the test dataset, which exceeds the 4 GB low-end and high-end CPUs and GPUs. In the experiments memory capacity of GTX 670. that perform comparison between different platforms, un- For each algorithm, we observe the performance increase less specified, we select the processors with the best perfor- from CPU1 to CPU2, and from GPU1 to GPU4. The KNL mance, i.e., CPU2, KNL, and GPU4. All the experiments performance is mostly between that on CPU2 and GPU4 are conducted on the CentOS system. with the exception of Cuckoo Cycle where the KNL perfor- Our implementations are based on the state-of-the-art mance is slightly lower than CPU2. This increasing overall open-source projects [1, 2, 3, 4, 5] with our optimizations performance trend is due to the increase of computation and modifications. Our programs are compiled using GCC 5.5 for CPU and KNL, and NVCC 10.1 for GPU. To collect statistics of hardware events, we use the GNU perf tool to Hashcash Ethash monitor the performance on CPU and KNL, and nvprof to CryptoNight Cuckoo Cycle record GPU events and metrics. The workloads of the algorithms under study are set as 103 follows, following typical setups of the original algorithms. For Hashcash and CryptoNight, the input data is randomly 102 generated. For Ethash, we start from block number 10000, which has a DAG size of 1 GB. For Cuckoo Cycle, we set 101 edge bits to 29, which corresponds to the number of 229 Memory usage (MB) 29 vertices on each side of the bipartite graph, and 2 edges 100 between the two sides. CPU KNL GPU The PoW algorithm performance is usually measured by the hash rate, i.e., the number of hash function calls divided Figure 3: Comparison of memory footprint

904 power and/or memory bandwidth. In the end, the GPU 5.3 Scalability is the best performer for each algorithm with a speedup of In order to understand the effect of parallelization of each 1.5x-30x over KNL and 1.8x-40x over CPU2. This wide algorithm on each processor, we measure the scalability of performance gap between processors suggests that current PoW algorithms with respect to the number of threads on memory-hard PoW algorithms are in general ineffective in each processor. maintaining the democratization of blockchain systems. The The speedups with the number of threads varied on the best algorithm for the democratization purpose is Cryp- CPU are shown in Figure 4. All the PoW algorithms achieve toNight, which yields the least performance difference across nearly linear speedups with the number of threads up to processors among the four algorithms. the number of cores. When hyperthreading is enabled, only Compared to CPU2, KNL has higher parallelism since Ethash keeps speeding up, because it is bandwidth bounded it has 3 times as many as number of cores, and 3.9 times and the peak bandwidth is not reached yet. Cuckoo Cycle of the memory bandwidth. This performance advantage is shows the lowest speedup among all algorithms. best taken by CryptoNight, achieving a speedup of 1.8x on

KNL over CPU2. This speedup is mainly from the larger Linear speedup Ethash number of threads on the KNL than on the CPU2, each of Hashcash Cuckoo Cycle which executes the CryptoNight function on its small-sized CryptoNight private scratchpad. In comparison, KNL has no advantage 256 over CPU2 on Hashcash, is slightly slower than CPU2 on 128 # cores Cuckoo Cycle, and outperforms Ethash by 30% on Ethash. 64 32 These puzzling results suggest that KNL’s large memory 16

bandwidth or high core-counts do not benefit these algo- Speedup 8 rithms, including the memory-hard ones. 4 Next we compare GPU4 with CPU2, which constitutes 2 a 100X difference in computation power and eight times 1 of difference in memory bandwidth. The highest speedup 1 2 4 8 16 32 64 128 256 Number of threads of GPU4 over CPU2 is achieved by Ethash, which benefits from both the high memory bandwidth and parallel com- putation. The second highest speedup of GPU4 over CPU2 Figure 5: Scalability on KNL is on Hashcash, which is computation bound and benefits from the parallel computation. In contrast, CryptoNight is The speedups with number of threads varied on the KNL memory latency bound, which sharply reduces the advan- are shown in Figure 5. The curves are similar to those on the tage of the GPU. Also, the GPU cache size is much smaller CPU, which are close to linear, when the number of threads than the CPU’s (see Table 2), which is insufficient for storing are less than the number of cores. The curves diverge consid- the scratchpad of CryptoNight, thus frequent global mem- erably when the number of threads goes beyond the num- ory accesses occur. Finally, Cuckoo Cycle is bounded by ber of cores. Both CryptoNight and Ethash keep scaling global memory access bandwidth, and benefits greatly from whereas Hashcash scaling slows down and Cuckoo Cycle’s the high memory bandwidth. The memory footprint of the performance drops when the number of threads reaches 256. four algorithms is illustrated in Figure 3. In summary, CryptoNight is memory latency-bound and Linear speedup Ethash favors processors with large caches, such as CPU and KNL, Hashcash Cuckoo Cycle CryptoNight but not friendly to GPU. Ethash is bandwidth-bound, which 32 is ideal for GPU but not good for CPU or KNL. Cuckoo Cycle is both latency and bandwidth bound, and can benefit 16 from the GPU significantly. Overall, CryptoNight achieves 8 the best fairness among all PoW algorithms.

Speedup 4

2

1 Linear speedup Ethash 32 64 128 256 512 1024 Hashcash Cuckoo Cycle Number of threads per block CryptoNight 40 # cores Figure 6: Scalability on GPU with the number of 20 16 threads per block varied

8 As for the GPU, since the threads are organized in thread

Speedup 4 blocks, we evaluate the effect of both the number of threads per block and the number of thread blocks. We first fix 2 the number of thread blocks to 80, the number of streaming 1 multiprocessors on GPU4, and measure the speedups with 1 2 4 8 16 20 40 the number of threads per block varied, as shown in Fig- Number of threads ure 6. The Hashcash PoW algorithm achieves a nearly linear speedup, since it is purely computation bound. The Cryp- Figure 4: Scalability on CPU toNight algorithm cannot run with 512 threads or more,

905 due to the limit of global memory size. Cuckoo Cycle scales that the performance of the CPU significantly drops when better than Ethash, because its workload does not change the scratchpad memory increases from 512 KB to 1 MB. The whereas Ethash’s workload increases with the number of reason is that the L2 cache available for each thread on the threads, and saturates the bandwidth earlier than Cuckoo CPU is 512 KB. When the scratchpad memory exceeds the Cycle. capacity of L2 cache, reads and writes to the scratchpad be- Next, we fix the number of threads per block to 32, the come costly due to cache misses. Similarly, the hash rate of warp size, and change the number of thread blocks. The KNL reduces by half when the scratchpad memory increases speedups are shown in Figure 7. The bottom left of the to 256 KB, because the L2 cache per thread is 128 KB. As figure (i.e., 80 to 2560 thread blocks) resembles Figure 6, for the GPU, the performance is nearly constant when the because the total number of threads is in the same range as scratchpad memory is less than 16 MB. This is because reads in Figure 6 (80*32 to 2560*32, or 32*80 to 1024*80). When and writes to the scratchpad memory are all random global the number of thread blocks goes beyond 5120, Hashcash memory accesses on the GPU, regardless of the scratchpad still maintains the best speedup. Both Ethash and Cuckoo memory size. The shared memory of the GPU cannot be uti- Cycle’s performance flattens out with more than 5120 thread lized for storing the scratchpad, since it is dedicated to AES blocks, because they both saturate the memory bandwidth. encryptions. When the scratchpad memory size exceeds 16 MB, the parallelism of GPU is limited by its global memory Linear speedup Ethash size, thus the hash rate continuously decreases and eventu- Hashcash Cuckoo Cycle ally becomes worse than the CPU and KNL. The crossovers CryptoNight of the performance curves of CryptoNight on three proces- 1024 512 sors suggest that it is feasible for CryptoNight to democ- 256 ratize the blockchain by including multiple scratchpads of 128 64 different sizes, for example, hundreds of kilobytes, tens of 32 megabytes, and a few gigabytes. This way, the algorithm 16 Speedup 8 poses memory hardness to all three kinds of processors, and 4 it will be challenging to design ASICs to perform well on all 2 1 scratchpads of various sizes. 80 320 1280 5120 20480 81920 Number of threads blocks CPU KNL GPU 100 Figure 7: Scalability on GPU with the number of thread blocks varied

default 10 size 5.4 Effect of Memory Size

In real-life blockchain applications, the block update in- Hash rate (MH/s) terval needs to be stable, which is maintained by monitoring 1 the hash rate of the network and adjusting the difficulty of 1 2 3 4 5 6 PoW. Therefore, most PoW algorithms support difficulty DAG size (GB) control by changing parameters. In addition to changing the thresholds of hash values, memory-hard PoW algorithms Figure 9: The hash rates of Ethash with DAG size can be tuned by changing the size of accessed memory area. varied Therefore, we study the performance effect of the memory size in each algorithm. We next investigate how the DAG size of Ethash impacts the performance. The initial size of the DAG is 1 GB, and CPU KNL GPU increases with the block number. The DAG size of Ethash in the real world was 3.13 GB at block number 8195807 as of July 2019. We show the hash rates of our Ethash with 1000 DAG size varied from 1GB to 6GB in Figure 9. The DAGs of size 1 GB to 6 GB correspond to block number 10000, 3839999, 7679999, 11519999, 15359999, and 19199999, re- Hash rate (H/s) 100 default spectively. With the increase of the DAG size, the hash rate size on both the KNL and the GPU slightly decreased by about 32 256 2048 16384 131072 1%. The hash rate on the CPU drops about 20% when the Scratchpad memory size (KB) DAG size increases from 1GB to 6 GB. Overall, the DAG size of Ethash has little impact on its hash rate, and the Figure 8: The hash rates of CryptoNight with performance advantage of the GPU is significant in Ethash. scratchpad memory size varied Lastly, we analyze the impact of graph size in the Cuckoo Cycle algorithm. The Cuckoo Cycle algorithm generates a We first evaluate the performance of CryptoNight with bipartite graph with N edges and 2N vertices, where N is the size of the scratchpad memory varied from 32 KB to set to 2i. We show the hash rate of Cuckoo Cycle with 128 MB. The hash rates on different processors are shown i = {25,..., 29} in Figure 10. In the figure, the hash rate H in Figure 8. On all three processors, the hash rate decreases drops logarithmically on the CPU and GPU with N. Even with the increase of scratchpad memory size. We observe though the hash rates all drop on three processors with the

906 increase of graph size, the performance gap between the GPU and the CPU is wide, and the performance of KNL Table 5: Types of stalls on GPU. The reasons that is worse than the CPU. Since Cuckoo Cycle is more efficient account for more than 20% in each algorithm are on small graphs, the graph size can be used for adjusting highlighted. Cuckoo difficulty. Hashcash CryptoNight Ethash Cycle

CPU KNL GPU Mem. dependency 0.00% 97.65% 94.82% 67.62% Mem. throttle 0.00% 0.00% 0.00% 22.86% default size Exec. dependency 11.88% 1.97% 2.35% 6.37% Synchronization 0.00% 0.00% 0.00% 1.13% 10.0 Instruction fetch 5.23% 0.14% 0.70% 1.11% Not selected 40.05% 0.00% 1.20% 0.46% Pipe busy 41.6% 0.11% 0.64% 0.35% Others 1.24% 0.13% 0.29% 0.10% Hash rate (G/s) 1.0

25 26 27 28 29 i than 90 % of the stalls in CryptoNight and Ethash, and 67 Graph size (2 ) % in Cuckoo Cycle. A memory dependency stall occurs in a warp when a load/store instruction cannot be issued. In Figure 10: The hash rates of Cuckoo Cycle with Cuckoo Cycle, memory throttle stall is another significant graph size varied stall reason, which takes about 22 %. Memory throttle stall is usually caused by an overwhelming number of memory instructions, which exceeds the capacity of memory data 5.5 Effect of Memory Type on KNL paths. The MCDRAM of KNL provides five times higher band- The domination of memory dependency and throttle stall width than DDR RAM. However, the latency of MCDRAM indicates that these algorithms are indeed memory-hard. As is nearly 20% higher than DDR RAM. Therefore, utilizing a comparison, the Hashcash algorithm has no memory stall MCDRAM on memory latency-sensitive applications will at all, but is mainly stalled by the shortage of compute re- usually have negligible impact, or even result in worse per- sources, such as pipeline busy and not selected (the instruc- formance. tion is ready to issue, but the warp of threads is not selected Table 4 shows the performance of PoW algorithms on to issue because other warps are selected). KNL using DDR RAM and MCDRAM. The performance of Hashcash is almost identical on the two types of memory, 5.6.2 Cache hit rate because it is computation-bound with few memory opera- We use cache hit rate to examine whether an algorithm is tions. Surprisingly, CryptoNight and Ethash show speedups latency bound or not. To calculate cache hit rate precisely, of 1.95 times and 1.52 times respectively using MCDRAM we directly read the hardware event counters of processors. than DDR RAM, whereas the performance of Cuckoo Cycle On CPU, we have L1 cache hit rate: remains the same. These results suggest that the amounts of parallel memory accesses in CryptoNight and Ethash are mem load uops.l1 hit large enough to hide the access latency whereas those in mem load uops.l1 hit + mem load uops.l1 miss Cuckoo Cycle are not. and L2 cache hit rate:

Table 4: Comparison between memory types on mem load uops.l2 hit KNL mem load uops.l2 hit + mem load uops.l2 miss DDR RAM MCDRAM Speedup . Hashcash (MH/s) 191.06 190.79 0.99 We calculate hit rates using counters of retired memory CryptoNight (H/s) 514.6 1004.0 1.95 operations, i.e., operations that have been completely exe- Ethash (MH/s) 2.03 3.09 1.52 cuted, so that failed branch prediction and prefetch opera- Cuckoo Cycle (G/s) 0.59 0.59 1.00 tions are excluded. Similarly, we can calculate the L1 cache hit rate of KNL:

5.6 Latency Analysis mem uops.all loads − mem uops.l1 miss loads 5.6.1 GPU Issue Stall mem uops.all loads We analyze the reasons of GPU issue stalls to show the and L2 cache hit rate of KNL: microperformance characteristics of these PoW algorithms. Specifically, we measured eight types of issue stall reasons, mem uops.l2 hit loads namely instruction fetch, execution dependency, memory mem uops.l2 hit loads + mem uops.l2 miss loads dependency, synchronization, pipe busy, not selected, mem- . ory throttle, and others. For GPU, we use the metrics provided by nvprof, includ- The statistics of issue stall reasons are shown in Table 5. ing l1 cache global hit rate, l1 cache local hit rate, and Memory dependency stall is the major type of stalls in all l2 l1 read hit rate. We repeatedly execute the kernel and three memory-hard PoW algorithms, which counts for more take the average of hit rates within a small fluctuation range.

907 Table 6: L1 cache hit rate Table 7: Peak memory read throughput (GB/s) CPU KNL CPU KNL GPU Hashcash 99.9% 99.9% Hashcash 0.47 4.67 0 CryptoNight 96.8% 94.2% CryptoNight 6.89 51.32 102.63 Ethash 99.0% 97.2% Ethash 7.69 119.48 708.75 Cuckoo Cycle 94.5% 96.8% Cuckoo Cycle 5.77 15.92 176.53

The hit rates of L1 cache on CPU and KNL are shown in We also calculate the theoretical maximum hash rate of Table 6. Most algorithms have a hit rate higher than 90%. bandwidth-bound PoW as follows. Given the memory band- The L1 cache hit rate of GPUs are not included, since global width of the processor B and the size of memory fetched per loads are cached in L2 cache only. hash m, the hash rate is H = B/m. For the Ethash algo- rithm, 128 bytes of memory is fetch from the 2GB DAG

Hashcash Ethash per step, and each hash requires 64 steps. Therefore, m = CryptoNight Cuckoo Cycle 128bytes × 64 = 8KB memory read is performed for each 100 hash. The memory bandwidth of V100 is B = 897 GB/s, so the theoretical maximum hash rate of Ethash using GTX 670 is H = 897/8 = 112.125 MH/s. We see that the actual 1 10− hash rate is 84 percent of the theoretical value. In conclu- sion, Ethash is the only PoW algorithm that can make full use of the high bandwidth of GPUs. L2 cache hit rate 2 10− 5.8 Power Consumption CPU KNL GPU One important performance metric of PoW algorithms is power consumption. Since cryptocurrency mining is prof- Figure 11: Comparison of L2 cache hit rate on dif- itable, professional miners have been actively searching for ferent processors hardware that can achieve high performance while maintain- ing a small energy footprint. To compare PoW algorithms The L2 cache hit rates of three processors are shown in on the energy consumption, we use s-tui and nvidia-smi Figure 11. Hashcash has the best hit rates, since it involves to monitor the peak power consumption on CPU/KNL and few memory operations. Cuckoo Cycle achieves a cache hit GPU, respectively. rate of more than 70 % on all processors, because the access on the edge set is sequential access. CryptoNight’s hit rates Hashcash Ethash are low on the CPU and KNL due to random accesses, but CryptoNight Cuckoo Cycle TDP are high on the GPU due to coalesced access of the AES 250 encryption. In comparison, Ethash’s L2 cache hit rates are TDP low on all processors, but on the GPU the latency resulted 200 from the low cache hit rate can be hidden by overlapping 150 data transfer. 100 TDP

50

5.7 Bandwidth Analysis Power consumption (W) To study the memory hardness of the PoW algorithm, we 0 CPU KNL GPU also measure the peak memory throughput. On CPU, the memory throughput can be measured using the perfmem tool. On KNL, the metrics of perf are not available, so we Figure 12: Comparison of power consumption calculate throughput using event counters. The through- put of reading and writing DDR RAM can be calculated by As shown in Figure 12, Hashcash is the algorithm that unc m cas count.rd and unc m cas count.wr, respectively. consumes the most power on all three processors, because The throughput of reading and writing MCDRAM can be the intensive computation keeps processors busy. As for calculated by unc e rpq inserts and unc e wpq inserts, re- memory-hard PoW algorithms, the power consumption is spectively. On GPU, we can get the throughput of all mem- 80.3%, 77.3%, and 39.4% of that of Hashcash on CPU, KNL, ory types by nvprof metrics, including the read and write and GPU, respectively. This is due to the stalls resulted by throughput of L2 cache, shared memory, device memory, memory operations. The latency-bound CryptoNight algo- and system memory. rithm uses much less power on the GPU, because frequent The peak memory read throughput is shown in Table 7. global memory accesses make the processor waiting for the On all three processors, the largest throughput is achieved execution of load/store instructions. by Ethash, since it is a bandwidth-bound PoW algorithm. We also calculate Hashes per Joule of each processor to Among three processors, GPU has the highest throughput, compare their energy efficiency. The results are shown in which is 79 % of its bandwidth. KNL has demonstrated the Table 8. GPU achieves the best energy efficiency among advantage of MCDRAM over DDR RAM, but only 27% of all three processors, while KNL spends the most energy for its bandwidth is achieved. each hash. On the Ethash algorithm, GPU is 28.3 times

908 functions. They define a graph pebbling game to abstract Table 8: Energy efficiency of PoW algorithms on parallel computation. In their subsequent work [6], they different processors used their model to prove the theoretical lower bound of the CPU KNL GPU memory-hard PoW algorithm Scrypt. Hashcash (MH/J) 2.668 1.068 13.423 Blockbench [20] is a framework for benchmarking the data CryptoNight (H/J) 9.650 5.966 19.264 processing capability of blockchains. It can measure the Ethash (MH/J) 0.038 0.018 0.509 throughput, latency, and scalability of blockchain systems Cuckoo Cycle (G/J) 0.011 0.004 0.033 under various workloads. It is specifically focused on private blockchains, e.g., Fabric [9], where traditional tolerant algorithms [17] rather than PoW algorithms are used for consensus. more energy efficient than KNL, which shows the advan- Han et al. [26] conducted a study about memory-hard tage of GPUs on bandwidth bound PoW algorithms. The PoW algorithms used in cryptocurrency mining. It proposed CryptoNight algorithm has the smallest energy gap between a general structure of memory-hard PoW algorithms and processors, because GPUs are not suitable to latency bound benchmarked three PoW algorithms, namely Ethash, Cryp- tasks. However, GPU still outperforms CPU and KNL by toNight, and Scrypt on GPUs. The comparison between 1.9x and 3.2x, respectively. Therefore, the manycore design the results of [26] and our GPU results on common exper- of GPU makes it the most profitable mining device among iments are as follows. Firstly, we measured higher memory the general-purpose processors. read throughput than their results, since the GPU we used 5.9 Discussion (V100) has higher bandwidth than theirs (Titan XP). Sec- ondly, we obtained similar results on peak memory usage, The selection of PoW algorithms is a critical part of block- i.e., CryptoNight consumes more memory than Ethash on chain system design, because it directly affects the secu- the GPU. Thirdly, our experimental results suggested that rity and performance of the blockchain. We observe diverse memory dependency is the major type of stalls on the GPU, choices of PoW algorithms in current blockchain systems. which is consistent with their conclusion. The difference is Most blockchains are designed to be egalitarian, i.e., aim that we distinguish between latency-bound and bandwidth- to eliminate the advantage of specialized hardware and en- bound algorithms and perform detailed comparison between courage more participants to join. Other blockchains adopt GPU and other processors. PoW algorithms that are friendly to specialized hardware. Therefore, there is no standard answer to the selection of PoW algorithm, since it depends on the objective of the 7. CONCLUSION blockchain system. Nevertheless, in our study we focus on The memory-hard PoW algorithms in cryptocurrencies the representative memory-hard PoW algorithms, which are were designed to prevent ASIC miners from having great aimed for egalitarian blockchain systems. performance advantages over CPUs and GPUs, as it is im- To the best of our knowledge, this study is the first to mea- portant for the fairness to allow miners on general-purpose sure and analyze the performance of representative memory- computing hardware to participate. In this paper, we per- hard PoW algorithms across three types of general-purpose formed an experimental study of memory-hard PoW algo- processors. We identify the performance factors including rithms on three types of processors. 
5.9 Discussion

The selection of PoW algorithms is a critical part of blockchain system design, because it directly affects the security and performance of the blockchain. We observe diverse choices of PoW algorithms in current blockchain systems. Most blockchains are designed to be egalitarian, i.e., they aim to eliminate the advantage of specialized hardware and encourage more participants to join. Other blockchains adopt PoW algorithms that are friendly to specialized hardware. Therefore, there is no standard answer to the selection of a PoW algorithm, since it depends on the objective of the blockchain system. Nevertheless, in our study we focus on representative memory-hard PoW algorithms, which are aimed at egalitarian blockchain systems.

To the best of our knowledge, this study is the first to measure and analyze the performance of representative memory-hard PoW algorithms across three types of general-purpose processors. We identify thread parallelism and memory size as general performance factors and study their impact. Furthermore, we obtain hardware profiling results on instruction stalls, cache hit ratios, peak memory bandwidth, and power consumption to understand the performance characteristics of the algorithms.

We share a few lessons learned through the study, which would be useful for guiding the design of memory-hard PoW algorithms. Firstly, due to their different designs, the processors are suitable for different tasks: latency-bound tasks are friendly to the CPU and KNL because of their mature cache hierarchies, whereas bandwidth-bound tasks can fully exploit the potential of the GPU. Secondly, since processors are evolving rapidly, PoW algorithms should be adjustable and flexible so that their properties can be maintained as long as possible. In particular, the performance gaps between the GPU and the CPU are widening for all three memory-hard algorithms, and CryptoNight or other algorithms need to be adjusted to reduce the gap.

6. RELATED WORKS

Memory-hard hash functions have been studied in cryptography. MTP and MHE [14] are two memory-hard functions designed for password hashing and encryption, respectively. These algorithms are memory-hard to prevent attackers from using ASICs to break ciphers and decrypt credential information.

Alwen and Serbinenko [7] developed theoretical tools for measuring the amortized memory complexity of memory-hard functions. They define a graph pebbling game to abstract parallel computation. In their subsequent work [6], they used their model to prove the theoretical lower bound of the memory-hard PoW algorithm Scrypt.

Blockbench [20] is a framework for benchmarking the data processing capability of blockchains. It can measure the throughput, latency, and scalability of blockchain systems under various workloads. It is specifically focused on private blockchains, e.g., Hyperledger Fabric [9], where traditional Byzantine fault tolerant algorithms [17] rather than PoW algorithms are used for consensus.

Han et al. [26] conducted a study of memory-hard PoW algorithms used in cryptocurrency mining. They proposed a general structure of memory-hard PoW algorithms and benchmarked three PoW algorithms, namely Ethash, CryptoNight, and Scrypt, on GPUs. The comparison between the results of [26] and our GPU results on common experiments is as follows. Firstly, we measured higher memory read throughput than they reported, since the GPU we used (V100) has higher bandwidth than theirs (Titan XP). Secondly, we obtained similar results on peak memory usage, i.e., CryptoNight consumes more memory than Ethash on the GPU. Thirdly, our experimental results suggested that memory dependency is the major type of stall on the GPU, which is consistent with their conclusion. The difference is that we distinguish between latency-bound and bandwidth-bound algorithms and perform a detailed comparison between the GPU and the other processors.

7. CONCLUSION

The memory-hard PoW algorithms in cryptocurrencies were designed to prevent ASIC miners from having great performance advantages over CPUs and GPUs, as it is important for fairness to allow miners on general-purpose computing hardware to participate. In this paper, we performed an experimental study of memory-hard PoW algorithms on three types of processors. We first demonstrated the impact of thread parallelism, the size of the accessed memory area, and memory types on the overall performance. Then, we compared the PoW algorithms through a quantitative analysis of low-level metrics including types of instruction stalls, cache hit rates, memory throughput, and power consumption.

Based on our experimental results, we summarize the characteristics of the three memory-hard PoW algorithms. First, CryptoNight is the most egalitarian PoW algorithm. Second, Ethash is the best memory bandwidth utilizer across all three types of processors. Finally, all three PoW algorithms reduce power consumption through memory stalls.

In addition, we give our suggestion based on our experimental results: a memory-hard PoW algorithm that minimizes the difference between processors can be designed to include multiple scratchpads of different sizes.
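To illustrate this suggestion, the sketch below is a minimal, purely illustrative memory-hard loop over several scratchpads of different sizes; it is not an algorithm proposed or evaluated in this paper. Its data-dependent reads and writes create a working set that no single cache level holds on any of the three processors. The mixing function, scratchpad sizes, and iteration count are placeholders, and a real design would use cryptographic rounds such as those in CryptoNight.

    #include <cstdint>
    #include <cstdio>
    #include <vector>

    // Toy mixing step; a real design would use a cryptographic round function.
    static uint64_t mix(uint64_t x) {
        x ^= x >> 33;
        x *= 0xff51afd7ed558ccdULL;
        x ^= x >> 33;
        return x;
    }

    // Memory-hard loop over scratchpads of different sizes (2 MB, 16 MB, 64 MB).
    static uint64_t multi_scratchpad_pow(uint64_t seed, int iterations) {
        const size_t sizes[] = {2u << 20, 16u << 20, 64u << 20};
        std::vector<std::vector<uint64_t>> pads;
        for (size_t s : sizes)
            pads.emplace_back(s / sizeof(uint64_t), seed);   // fill each pad with the seed

        uint64_t state = mix(seed);
        for (int i = 0; i < iterations; ++i) {
            for (auto& pad : pads) {                 // touch every scratchpad each round
                size_t idx = state % pad.size();     // data-dependent address
                state = mix(state ^ pad[idx]);       // random read
                pad[idx] = state;                    // random write
            }
        }
        return state;
    }

    int main() {
        printf("digest: %016llx\n",
               (unsigned long long)multi_scratchpad_pow(42, 1 << 16));
        return 0;
    }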

Our source code and performance test scripts used in this study are publicly available at https://github.com/RapidsAtHKUST/pow-bench.

8. REFERENCES

[1] ccminer. https://github.com/tpruvot/ccminer.
[2] cpuminer-multi. https://github.com/tpruvot/cpuminer-multi.
[3] cuckoo. https://github.com/tromp/cuckoo.
[4] ethash. https://github.com/chfast/ethash.
[5] ethminer. https://github.com/ethereum-mining/ethminer.
[6] J. Alwen, B. Chen, K. Pietrzak, L. Reyzin, and S. Tessaro. Scrypt is maximally memory-hard. In Proceedings of the 36th Annual International Conference on the Theory and Applications of Cryptographic Techniques, EUROCRYPT 2017, pages 33–62. Springer, 2017.
[7] J. Alwen and V. Serbinenko. High parallel complexity graphs and memory-hard functions. In Proceedings of the Forty-Seventh Annual ACM Symposium on Theory of Computing, STOC ’15, pages 595–603. ACM, 2015.
[8] D. G. Andersen. Exploiting Time-Memory Tradeoffs in Cuckoo Cycle. https://www.cs.cmu.edu/~dga/crypto/cuckoo/analysis.pdf, 2014.
[9] E. Androulaki, A. Barger, V. Bortnikov, C. Cachin, K. Christidis, A. D. Caro, D. Enyeart, C. Ferris, G. Laventman, Y. Manevich, S. Muralidharan, C. Murthy, B. Nguyen, M. Sethi, G. Singh, K. Smith, A. Sorniotti, C. Stathakopoulou, M. Vukolic, S. W. Cocco, and J. Yellick. Hyperledger fabric: a distributed operating system for permissioned blockchains. In Proceedings of the Thirteenth EuroSys Conference, EuroSys ’18, pages 30:1–30:15. ACM, 2018.
[10] J. Aumasson and D. J. Bernstein. Siphash: A fast short-input PRF. In Proceedings of the 13th International Conference on Cryptology in India, INDOCRYPT 2012, pages 489–508. Springer, 2012.
[11] A. Back. Hashcash - a denial of service counter-measure. http://www.hashcash.org/papers/hashcash.pdf, 2002.
[12] E. Ben-Sasson, A. Chiesa, C. Garman, M. Green, I. Miers, E. Tromer, and M. Virza. Zerocash - Decentralized Anonymous Payments from Bitcoin. In 2014 IEEE Symposium on Security and Privacy, pages 459–474. IEEE, 2014.
[13] G. Bertoni, J. Daemen, M. Peeters, and G. V. Assche. Keccak. In Proceedings of the 32nd Annual International Conference on the Theory and Applications of Cryptographic Techniques, EUROCRYPT 2013, pages 313–314. Springer, 2013.
[14] A. Biryukov and D. Khovratovich. Egalitarian computing. In 25th USENIX Security Symposium, USENIX Security 16, pages 315–326. USENIX Association, 2016.
[15] A. Biryukov and D. Khovratovich. Equihash: Asymmetric proof-of-work based on the generalized birthday problem. Ledger, 2(0):1–30, 2017.
[16] Bitmain. Antminer e3. https://shop.bitmain.com/product/detail?pid=00020181012201405249d5Ax96qj0686, 2018.
[17] M. Castro and B. Liskov. Practical byzantine fault tolerance. In Proceedings of the Third Symposium on Operating Systems Design and Implementation, OSDI ’99, pages 173–186. USENIX Association, 1999.
[18] CryptoNote. Cryptonight hash function. https://cryptonote.org/cns/cns008.txt, 2013.
[19] H. Dang, T. T. A. Dinh, D. Loghin, E.-C. Chang, Q. Lin, and B. C. Ooi. Towards scaling blockchain systems via sharding. In Proceedings of the 2019 International Conference on Management of Data, SIGMOD ’19, pages 123–140. ACM, 2019.
[20] T. T. A. Dinh, J. Wang, G. Chen, R. Liu, B. C. Ooi, and K.-L. Tan. Blockbench: A framework for analyzing private blockchains. In Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD ’17, pages 1085–1100. ACM, 2017.
[21] A. Dubovitskaya, P. Novotný, S. Thiebes, A. Sunyaev, M. Schumacher, Z. Xu, and F. Wang. Intelligent health care data management using blockchain: Current limitation and future research agenda. In Heterogeneous Data Management, Polystores, and Analytics for Healthcare, VLDB 2019 Workshops, pages 277–288. Springer, 2019.
[22] C. Dwork and M. Naor. Pricing via processing or combatting junk mail. In Proceedings of the 12th Annual International Cryptology Conference, CRYPTO ’92, pages 139–147. Springer, 1992.
[23] M. El-Hindi, C. Binnig, A. Arasu, D. Kossmann, and R. Ramamurthy. Blockchaindb - A shared database on blockchains. PVLDB, 12(11):1597–1609, 2019.
[24] Ethereum. Ethash. https://github.com/ethereum/wiki/wiki/Ethash.
[25] G. Fowler, L. C. Noll, and K.-P. Vo. Fnv hash. http://www.isthe.com/chongo/tech/comp/fnv/index.html, 1991.
[26] R. Han, N. Foutris, and C. Kotselidis. Demystifying crypto-mining: Analysis and optimizations of memory-hard pow algorithms. In IEEE International Symposium on Performance Analysis of Systems and Software, ISPASS 2019, pages 22–33. IEEE, 2019.
[27] Z. Hess, Y. Malahov, and J. Pettersson. Aeternity blockchain - the trustless, decentralized and purely functional oracle machine. https://www.aeternity.com/aeternity-blockchain-whitepaper.pdf, 2017.
[28] T. E. Jedusor. Grin. https://grin-tech.org, 2016.
[29] J. Jeffers, J. Reinders, and A. Sodani. Intel Xeon Phi Processor High Performance Programming, Knights Landing Edition. Morgan Kaufmann Publishers Inc., 2016.
[30] C. Lee. Litecoin. https://litecoin.org, 2011.
[31] S. D. Lerner. Strict memory hard hashing functions. http://www.hashcash.org/papers/memohash.pdf, 2014.
[32] C. Li, P. Li, D. Zhou, W. Xu, F. Long, and A. Yao. Scaling nakamoto consensus to thousands of transactions per second. arXiv preprint arXiv:1805.03870, 2018.
[33] S. Maiyya, F. Nawab, D. Agrawal, and A. El Abbadi. Unifying consensus and atomic commitment for effective cloud data management. PVLDB, 12(5):611–623, 2019.
[34] S. Nakamoto. Bitcoin: A peer-to-peer electronic cash system. https://bitcoin.org/bitcoin.pdf, 2008.
[35] S. Nathan, C. Govindarajan, A. Saraf, M. Sethi, and P. Jayachandran. Blockchain meets database: Design and implementation of a blockchain relational database. PVLDB, 12(11):1539–1552, 2019.
[36] S. R. Niya, D. Dordevic, A. G. Nabi, T. Mann, and B. Stiller. A platform-independent, generic-purpose, and blockchain-based supply chain tracking. In IEEE International Conference on Blockchain and Cryptocurrency, ICBC 2019, pages 11–12. IEEE, 2019.
[37] C. Percival. Stronger key derivation via sequential memory-hard functions. http://www.tarsnap.com/scrypt/scrypt.pdf, 2009.
[38] K. A. Ross. Efficient hash probes on modern processors. In Proceedings of the 23rd International Conference on Data Engineering, ICDE 2007, pages 1297–1301. IEEE, 2007.
[39] P. Ruan, G. Chen, A. Dinh, Q. Lin, B. C. Ooi, and M. Zhang. Fine-grained, secure and efficient data provenance for blockchain. PVLDB, 12(9):975–988, 2019.
[40] M. Sha, Y. Li, B. He, and K. Tan. Accelerating dynamic graph analytics on gpus. PVLDB, 11(1):107–120, 2017.
[41] J. Tromp. Cuckoo cycle: A memory bound graph-theoretic proof-of-work. In BITCOIN, WAHC, and Wearable, Financial Cryptography and Data Security, FC 2015 International Workshops, pages 49–62. Springer, 2015.
[42] N. van Saberhagen. Monero. https://www.getmonero.org, 2014.
[43] S. Wang, T. T. A. Dinh, Q. Lin, Z. Xie, M. Zhang, Q. Cai, G. Chen, B. C. Ooi, and P. Ruan. Forkbase: An efficient storage engine for blockchain and forkable applications. PVLDB, 11(10):1137–1150, 2018.
[44] G. Wood. Ethereum: A Secure Decentralised Generalised Transaction Ledger. http://gavwood.com/paper.pdf, 2016.
[45] Y. Yuan, R. Lee, and X. Zhang. The yin and yang of processing data warehousing queries on GPU devices. PVLDB, 6(10):817–828, 2013.
[46] R. Zhu, C. Ding, and Y. Huang. Efficient publicly verifiable 2pc over a blockchain with applications to financially-secure computations. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, CCS ’19, pages 633–650. ACM, 2019.