
Blockchain-based Metadata Protection for Archival Systems
Practical Experience Report

Arnaud L'Hutereau∗, Dorian Burihabwa†, Pascal Felber†, Hugues Mercier†, Valerio Schiavoni†

∗Polytech Nantes, France. E-mail: [email protected]
†University of Neuchâtel, Switzerland. E-mail: fi[email protected]

Abstract—Long-term archival storage systems must protect from powerful attackers that might try to corrupt or censor (part of) the documents. They must also protect the corresponding metadata, which is essential to maintain and rebuild the stored data. In this practical experience report, we present METABLOCK, a metadata protection system leveraging the Ethereum distributed ledger. We combine METABLOCK with an existing secure long-term data archival system to provide a scalable design that allows external auditing, data validation and efficient data repair. We reflect on our experiences in using a blockchain for metadata protection, with the goal of providing valuable insights and lessons for developers of such secure systems, by highlighting the potential and limitations of the approach. Our prototype is available on Github.

I. INTRODUCTION

Despite the recent explosion of cryptocurrencies [48], [32], [31] and blockchain technologies [45], [16], protecting large amounts of archived data against tampering remains a difficult problem, both in theory and in practice. Tamper-proof data archival has traditionally been achieved with write once read many (WORM) hardware approaches and restrictions on physical access to storage media [44]. Blockchains, although well-suited for financial transactions and smart contracts, essentially massively replicate data at all system nodes and are utterly impractical [51] for large amounts of data such as medical records and genomic data. But depending on the application, metadata can typically be orders of magnitude smaller than the data itself, and is thus better suited for blockchain technologies.

Metadata management in distributed storage systems has been extensively studied for the past decades [43]. With a focus on dynamicity and hashing-based techniques [23], [54], solutions have mostly focused on smart metadata partitioning. Only a small number of these solutions (e.g., [29]) have focused on maintenance of consistency in the metadata. Entering the age of the cloud, the shared-nothing philosophy was leveraged by systems architects to scale their storage service. Replicating metadata in prototype systems such as [46] has been proposed to boost performance. In widespread production storage systems such as Hadoop [42], where the metadata service is typically centralized, replicating metadata has been known to bring direct benefits [38] including better load balancing and general throughput improvements. However, these approaches do not provide solutions in case of catastrophic failures wiping the entire metadata storage service.

This practical experience report proposes and evaluates METABLOCK, a system to protect archived metadata against tampering, censorship and failures that leverages smart contracts executed on the Ethereum blockchain. METABLOCK is generic and can securely and efficiently store metadata, validate metadata, and communicate over an authenticated channel with any storage system. To illustrate our contribution, we instantiate METABLOCK on top of an existing archival system building on top of STEP-archives [36] and RECAST [19]. Both these systems protect archived data using data entanglement [17] and error-correcting codes [39], and do not require massive replication. However, STEP-archives and RECAST consider the protection of metadata as an orthogonal problem, and thus do not protect it. METABLOCK studies how to build mechanisms to further protect metadata against tampering and failures on top of such systems, filling this gap.

The remainder of this paper is organized as follows. Section II presents the related work, in particular recent proposals for metadata protection. Then, in Section III we provide a brief background on STEP-archives, RECAST and Ethereum, as these are the three existing building blocks upon which we report our experience. We detail our threat model in Section IV. The architecture and our implementation are presented respectively in Section V and Section VI. We evaluate the performance of METABLOCK to execute validation, reconstruction and auditing operations in Section VII. Finally, we report on our lessons learned in Section VIII and conclude in Section IX.

II. RELATED WORK

Protecting archived data against tampering and censorship has a rich history and includes, among others, censorship-resistant file sharing [53], data entanglement [52], WORM storage [44], and proofs of storage [41], [30]. We refer readers to [36] for a recent overview of the state of the art.

Many solutions were proposed to protect metadata using blockchains and other technologies, although most of them either only provide data access-control and assume that metadata cannot be corrupted, or do not provide implementations. The closest solutions to METABLOCK are the decentralized storage systems Sia [50], Storj [14], IPFS [21], as well as


Fig. 1. The RECAST architecture
Fig. 2. Ethereum blockchain

recently ICO-founded utility coins such as Filecoin [33] and Maidsafe [6] (yet to launch). Each of these systems includes its own storage marketplace, where storage space is sold or bought by its users. These solutions rely on erasure codes to protect data against misbehaving storage providers and node failures. They can use blockchains for contracts and payments for storage. These blockchains can also protect metadata in the form of lists of hashes and Merkle trees generated from data files for recovery from failures and unavailable storage nodes, as well as proofs of storage. By comparison, the metadata protection offered by METABLOCK is conceptually simpler: we do not use Merkle trees and therefore cannot leverage proofs of storage. This design choice stems from the fact that the data protection guarantees inherited from STEP [36] and RECAST [19] are stronger.

III. BACKGROUND AND THREAT MODEL

To better grasp how METABLOCK works, we briefly introduce two of the main building blocks of the system, namely RECAST, based on STEP-archives, and the Ethereum blockchain. We observe however that our proposed metadata protection techniques could be easily adapted to alternative implementations of archival systems or distributed ledgers.

A. RECAST Long-Term Archival System and STeP-archives

RECAST [19] is a distributed censorship-resistant archival system with strong durability guarantees based on STEP [35], [36]. It combines erasure coding and data entanglement [18] to provide increased redundancy and protection against data corruption and tampering. Figure 1 depicts the high-level architecture of RECAST. The numbers inside the black circles indicate the execution flow of the operations. Erasure coding ensures that the system tolerates and recovers from the loss of erased blocks composing documents up to a configurable erasure threshold. Data entanglement further increases this threshold: blocks are randomly linked across documents, allowing recursive reconstruction of failed blocks.

In a STEP-archive with parameters (S, T, e, P), each newly archived document is partitioned into S source blocks, entangled with T blocks already archived (pointers), and encoded using an erasure code to generate P parity blocks that can correct e block erasures. Only the P parity blocks are archived in the system, although the T pointers to entangled blocks are part of the codeword and increase the size of the document metadata. The S source blocks are not archived. The strength of the entanglement can be tweaked by changing the values of the parameters, and how the entangled blocks are chosen. The randomness of the entanglement introduces asymmetry in the system. On the one hand, the system can quickly and recursively repair recoverable damage. On the other hand, precomputing an efficient irrecoverable attack is NP-hard [35], [36], and an attacker wishing to destroy a target document must destroy a large number of other documents. Entanglement also affects how RECAST manages metadata: since entangled blocks are part of the codewords, RECAST must keep track of these entanglement links in order to perform decoding and repair operations. In the existing proposal [19], the architecture of RECAST suffers from a single point of failure on its metadata service. An attacker could easily take down the entire system by compromising the key-value store holding the metadata. While a quick workaround to this problem could be the replication of this metadata service (potentially to numerous remote locations), managing trust and availability of these replicas might prove challenging.

B. Distributed ledgers

Distributed ledgers (also called blockchains) are gaining a lot of traction in several sectors of the information world. Through the combination of a peer-to-peer network and an electronic currency, Bitcoin [37] offers participants a traceable transaction processing system. Participants store bitcoins in wallets, later exchanged via transactions. The transactions are sent to the network and collected by miner nodes. Their goal is to mine the transactions in a block for a fee. Indeed, miners have to solve a difficult and expensive cryptographic challenge, known as proof-of-work (PoW), in order to add the block to the chain. Once successfully mined, the new block is added to the blockchain, leaving an unalterable trace of the transaction, and the miners are paid for the successful work. The combination of massive replication in the blockchain and an auditable transaction log offers a transparent and resilient storage mechanism for small data payloads.

The Ethereum [24] blockchain is similar to the Bitcoin ledger. The structure of the chain is depicted in Figure 2. Ethereum integrates alternative consensus protocols such as proof-of-stake [25], mining that favors transactions from bigger wallets, and proof-of-authority [8].
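The "unalterable trace" property comes from the fact that each block header commits to the hash of its predecessor. The following stdlib-only Python sketch (illustrative only, not the actual Ethereum header format) shows why modifying any archived payload invalidates every subsequent block:

```python
import hashlib
import json

def block_hash(block: dict) -> str:
    """Hash the canonical JSON encoding of a block."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def append_block(chain: list, payload: str) -> None:
    """Link a new block to the chain by storing the hash of the previous block."""
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"prev_hash": prev, "payload": payload})

def verify(chain: list) -> bool:
    """Check that every block commits to the hash of its predecessor."""
    for i in range(1, len(chain)):
        if chain[i]["prev_hash"] != block_hash(chain[i - 1]):
            return False
    return True

chain = []
for doc in ("metadoc-1", "metadoc-2", "metadoc-3"):
    append_block(chain, doc)
assert verify(chain)

chain[0]["payload"] = "tampered"   # any in-place modification...
assert not verify(chain)           # ...is detected by re-hashing the chain
```

A real chain additionally requires consensus (PoW, PoS or PoA) so that no single node can simply re-mine the suffix after tampering.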

Proof-of-authority is a mining-free consensus algorithm based on majority voting among authenticated nodes named validators. In addition, Ethereum popularizes the concept of smart contracts [10]. Specifically, a smart contract allows the execution of custom code on the blockchain for a fee measured in gas. These contracts are written in Solidity [26], a contract-oriented programming language for the Ethereum virtual machine (EVM), and can operate on transaction data. The correct implementation of smart contracts still presents several open challenges. While approaches to write secure contracts exist [22], [47], open vulnerabilities can be and have been exploited by participants in the blockchain. For instance, contracts have been linked to fraudulent activities [20], and caused significant damages (e.g., the DAO theft incident [40]).

IV. THREAT MODEL

We assume that the attacker is able to tamper with the metadata service of a storage system. We consider tampering with the metadata service as any action undertaken to maliciously modify, censor, corrupt or otherwise delete part or all of the metadata. These actions include completely shutting down access to the metadata service, as it would preclude access to centralized metadata services acting as single points of failure. Consequently, we design METABLOCK as an independent system capable of durably storing and validating metadata and communicating over a secure channel with the storage system. We discuss the advantages and drawbacks of this low coupling of the two systems (data and metadata) in further sections.

V. ARCHITECTURE

This section presents the design and architecture of METABLOCK. To better illustrate its behaviour, Figure 3 depicts METABLOCK integrated with RECAST, although its design could easily accommodate the integration with alternative storage archival systems.

Fig. 3. Architecture of METABLOCK, integrated with RECAST and the Ethereum blockchain.
Fig. 4. METABLOCK: validation workflow.

METABLOCK is made of two components: a message queue and a blockchain. The message queue serves as the interface for the storage system (e.g., RECAST), whereas the blockchain serves as the backend for metadata storage. The heart of METABLOCK connects these two components through a series of event-driven routines that perform work on messages placed in the queue. These METABLOCK internal messages can be divided into two categories: validation and reconstruction messages. With these messages, RECAST can issue requests to store metadata in the blockchain or signal when an entry from its metadata service is corrupt or unavailable. In addition, the METABLOCK operator can launch audits and reuse the reconstruction pipeline to correct errors detected during that process.

RECAST generates two types of metadata entries for each archived document, one for the document itself (called metadoc) and another for each of the P chunks of data archived as part of the document (called metablocks). The information for each of these metadata entries is then stored, using an appropriate data structure for each type, in a dedicated smart contract. Hence, a STEP-archive with parameter P requires P+1 transactions per archived document, one for the metadoc and P for the metablocks. The metadoc of a document includes the address of all the blocks that make up the codeword of the document, including the pointers to entangled blocks. These pointers reference blocks already archived and as such do not require new metablocks. We emphasize that our work is generic and can easily be tailored to file systems that use replication or standard erasure-correcting codes without entanglement.

We detail the interactions between the different components of METABLOCK in the remainder of this section.

A. Validation

The main feature of METABLOCK lies in the resilience and integrity guarantees it provides for the information it stores. This is made possible by the use of Ethereum's smart contracts to process and store the metadata in a three-step validation process, which we detail next. The first step of this process is the reception of the new metadata. In addition to storing the information in the storage system, the metadata must also be sent to METABLOCK. This is done by sending a copy of the metadata to METABLOCK's message queue for validation. The following steps are executed asynchronously to minimize the impact on the performance of the original storage system and keep the latencies below safety margins. To operate on the input data, METABLOCK formats the metadata to send into the smart contract; this is the second step of the workflow. This operation produces a new transaction that is processed by the blockchain.
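To make the P+1 accounting concrete, the sketch below (stdlib-only, with hypothetical field names not taken from the actual prototype) builds the per-document entries for an n-(S, T, e, P) archive and checks that each document yields one metadoc plus P metablocks, with the T pointers carried inside the metadoc:

```python
def entries_for_document(S, T, e, P, doc_id="doc-0"):
    """Build the metadata entries METABLOCK must store for one document:
    one metadoc plus one metablock per parity block (field names illustrative)."""
    metablocks = [{"type": "metablock", "block": f"{doc_id}/p{i}"} for i in range(P)]
    metadoc = {
        "type": "metadoc",
        "doc": doc_id,
        # the codeword references the P new parity blocks...
        "blocks": [b["block"] for b in metablocks],
        # ...plus T pointers to blocks that are already archived
        "pointers": [f"entangled-{i}" for i in range(T)],
    }
    return [metadoc] + metablocks

# the three configurations n-(S, T, e, P) evaluated later, with their entry counts
for (S, T, e, P), expected in [((1, 10, 2, 3), 4),
                               ((1, 20, 2, 3), 4),
                               ((5, 10, 2, 7), 8)]:
    entries = entries_for_document(S, T, e, P)
    assert len(entries) == expected == P + 1   # one blockchain transaction per entry
    assert len(entries[0]["pointers"]) == T    # pointers need no new metablocks
```

This is why the configuration with P = 7 later shows roughly twice the metadata storage cost of the P = 3 configurations.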

This results in a hash used to guarantee that the metadata has properly been stored in the blockchain and thus validated. The entry is now available and can be queried by sending a read request to any of the nodes in the blockchain. The archival system (e.g., RECAST) can now rely on the correctness of the metadata to perform data entanglement, leveraging the availability of this hash to ensure that the entries selected are trustworthy. In other words, the entanglement process is guaranteed to be performed on documents whose metadata is stored and validated in the blockchain; this is fundamental because in our entangled system discrepancies between a document and its metadata can propagate to other documents. The third and final step of the validation signals to the storage system's metadata service that the corresponding entry has been stored in the blockchain.

While METABLOCK's validation process can be broken down into simple steps, it is important to remember its asynchronous nature at integration time. Indeed, while a metadata entry is being validated, it is already available within the storage system's metadata service, possibly for a long time, before it is considered valid and trustworthy. The goal of a METABLOCK administrator should be to minimize this useless time by increasing the number of instances reading the metadata from the message queue, or decreasing the block generation time. Increasing the number of processes in charge of reading new metadata from the message queue is a good way to deal with sudden spikes in validation requests but does not minimize the average validation time, whereas decreasing the block generation time leads to a lower average validation time but incurs a greater number of blocks and a greater storage overhead in the blockchain. It is therefore recommended that users of METABLOCK fine-tune these two parameters in order to find the best validation performance-cost tradeoff given their priorities.

B. Reconstruction

The reconstruction feature of METABLOCK operates asynchronously. The entry point of this feature is a dedicated channel of the asynchronous message queue. Reconstruction requests can come from two different places: the backing storage system, when it detects inconsistencies during the retrieval of metadata in the metadata database, and the audit feature of METABLOCK, described in the next section, which verifies the completeness of the metadata system. Figure 5 shows the four steps of metadata reconstruction: (1) read an entry from the message queue, (2) read metadata from the blockchain, (3) delete useless metadata entries in the database (if any), and (4) format and replace old metadata in the host system database by METABLOCK metadata. METABLOCK can either do an on-the-fly reconstruction initiated by the host system following a request from a potential client, or a full reconstruction during a self-audit.

Fig. 5. METABLOCK: reconstruction workflow.

C. Audit

The METABLOCK audit aims to check all metadata from the blocks stored in the blockchain of METABLOCK against those stored in the host system of RECAST. This process operates on a snapshot of the metadata's host system database, to avoid concurrent modifications while the auditing is ongoing. Audit works in two distinct steps. First, for each metadata entry stored in METABLOCK, it checks if this entry is present in the metadata database of the host system, RECAST. To do this, it compares all fields, one by one, to make sure that both structure and values are identical. If a different value is found for a field, or if an entry is not found in the metadata database, the entry is sent to the reconstruction engine. Otherwise, when both entries match exactly, a new "audit" field is created to indicate that this entry has been verified during the first step. Second, METABLOCK scans the metadata in the other direction, comparing each entry of the metadata database and matching it to the corresponding entry in METABLOCK. If an entry is not present in the blockchain and the validation field is not set to true (the entry is not being validated in a block), it means that the entry is useless and it is removed from the database. Likewise, if an entry in the database is different from the corresponding entry in METABLOCK, it is removed, and the name is sent to the reconstruction engine. In order to keep the process as light as possible and to mitigate the cost of the second step, METABLOCK only integrally checks the entries in the metadata database without an "audit" field.

METABLOCK's validation and repair capabilities could benefit other storage solutions beyond RECAST. Its asynchronous nature ensures that its integration comes with a minimal performance impact while still providing better integrity guarantees than traditional approaches. While some specific engineering work may be needed, the bulk of the integration boils down to implementing a producer for METABLOCK's insertion queue and a consumer for its repair queue.

VI. IMPLEMENTATION

As described in Section V, METABLOCK is made of two main components, namely a message queue and a blockchain. These two components are tied together by event-triggered scripts. In the remainder of this section, we provide details on our implementation of the system as well as the technologies deployed and their use.

The first component is the message queue that serves as the interface between METABLOCK and the host system. We use RabbitMQ 3.7.7 [9], an open-source and highly efficient message broker implemented in Erlang.
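The event-driven coupling between the queue and the blockchain can be summarized with a small stdlib-only model (the real prototype talks to RabbitMQ via Pika; handler names and message fields here are illustrative): one channel per message category, and a dispatcher that routes each message to the matching routine.

```python
import queue

# one channel per message category, as in the METABLOCK design
channels = {"validation": queue.Queue(), "reconstruction": queue.Queue()}

processed = []

def on_validation(meta):
    # the real routine would format the entry and send a smart-contract transaction
    processed.append(("validated", meta["doc"]))

def on_reconstruction(meta):
    # the real routine would re-read the blockchain copy and repair the host database
    processed.append(("rebuilt", meta["doc"]))

handlers = {"validation": on_validation, "reconstruction": on_reconstruction}

def drain():
    """Event-loop body: pop every pending message and trigger its routine."""
    for name, chan in channels.items():
        while not chan.empty():
            handlers[name](chan.get())

channels["validation"].put({"doc": "doc-1"})
channels["reconstruction"].put({"doc": "doc-2"})
drain()
assert processed == [("validated", "doc-1"), ("rebuilt", "doc-2")]
```

Separating the two channels keeps a burst of repair traffic from delaying the validation of freshly archived documents.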

We configure this component with separate channels to handle the two types of messages: validation and reconstruction. Note that the channels are configured to allow the transit of messages in the clear, without any use of encryption techniques. We further discuss this choice in Section VIII.

The second component of METABLOCK is the underlying blockchain. We based and built our prototype on top of Ethereum (introduced in Section III). We leverage Ethereum for its private deployment capabilities, its alternative consensus algorithms, the easy deployment of smart contracts and its configuration flexibility to realize the features listed in Section V. We settle on the proof-of-authority [8] consensus algorithm. This choice has several advantages: (1) it removes the expensive mining process of proof-of-work [28]; (2) it saves on resource usage; and (3) it limits participation to our blockchain to authenticated nodes.

A smart contract is used to store the metadata in dedicated data structures. These are maintained, queried and modified using their built-in public interfaces, i.e., without the need of further customization. As METABLOCK is designed to be (loosely) coupled with storage systems that split documents in separate blocks (a typical scenario for archival storage systems leveraging erasure-correcting codes), we make the distinction between document metadata and block metadata. For this reason, we use different hash tables to store the document and block information. In addition, an index is maintained for faster querying of both structures. The smart contract is implemented in Solidity [12] (v0.4.20) and consists of 80 lines of code running on top of Ethereum (v1.8.14).

The core of the METABLOCK prototype consists of 4150 lines of Python (v3.6) as well as various shell scripts. To communicate with the message queue and the blockchain, we use existing open-source tools, respectively Pika [7] (v0.12.0) and Web3.py [13] (v4.6.0). METABLOCK is packaged using Docker [1]; it can be easily deployed using Docker-Compose [2] and distributed over multiple machines via Swarm [3]. Our prototype is open-source and available at https://github.com/ArnaudLhutereau/mb.

VII. EVALUATION

In this section, we present our evaluation of METABLOCK integrated with RECAST. The rest of this section introduces the evaluation settings, the configuration used for evaluation and our results.

A. Evaluation Settings

We evaluate METABLOCK in a distributed setting on a local cluster of machines. The host machines have 8GB of RAM and 8-Core Xeon CPUs and are interconnected over a 1 Gb/s switched network. More precisely, we use three virtual machines (VM) using the KVM hypervisor [5]. Each VM runs Ubuntu 16.04 and has direct (e.g., host-passthrough)¹ access to the underlying Xeon CPU cores using KVM. Given the default configuration of Geth [4] (a command-line interface for running a full Ethereum node), we allocate 1 GB of RAM per available CPU core.

¹http://www.linux-kvm.org/page/Tuning_KVM

The different components, for both METABLOCK and RECAST, are orchestrated using Docker Swarm [15]. We rely on the default scheduling strategy in Docker Swarm (i.e., emptiest [11]) to achieve the best load balancing outcome in terms of number of containers per VM. Note however that this choice has no effect on the observed performances.

TABLE I
RECAST CONFIGURATIONS USED IN THE EVALUATION (LEFT-MOST COLUMN). WE ALSO INDICATE THE NUMBER OF METADATA ENTRIES THEY GENERATE PER DOCUMENT AS WELL AS THE NUMBER OF POINTERS THEY REFERENCE.

  Configurations n-(S,T,e,P)   Entries   Pointers
  x n-(1,10,2,3)                  4         10
  y n-(1,20,2,3)                  4         20
  z n-(5,10,2,7)                  8         10

B. Storage overhead

The results of our storage overhead study are presented in Figure 6. In this experiment, we insert representative metadata of 400 documents and measure the cost of storage after every 100th insertion. We perform this insertion with a changing number of key-value pairs per document based on the three RECAST configurations described in Table I.

Fig. 6. Storage overhead

We run these tests in isolation on METABLOCK but send realistic metadata payloads that could be generated from storage client requests to RECAST. From this simulation, we observe that the initialization cost of an empty blockchain is approximately 60kB, which can fluctuate by around 2kB depending on the state initialization, the artifacts generated during the bootstrap of the database, as well as external conditions (e.g., time, network, logs, ...). This cost is negligible as the archive (and the blockchain) grows.

Having recorded the storage space every 100 documents, we can see from Figure 6 that the size of the metadata grows linearly with the number of documents.
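This linear growth follows directly from the per-document entry counts of Table I. A stdlib-only sketch of the cost model (the per-transaction byte cost is an assumption for illustration, close to the ~1.3 kB per transaction measured in this experiment):

```python
# Rough linear model of metadata storage growth (sizes in kB).
INIT_COST_KB = 60        # approximate cost of an empty blockchain (measured)
COST_PER_TX_KB = 1.3     # assumed average cost of one metadata transaction

def projected_size_kb(num_documents: int, entries_per_doc: int) -> float:
    """Each document needs one transaction per metadata entry (metadoc + metablocks)."""
    return INIT_COST_KB + num_documents * entries_per_doc * COST_PER_TX_KB

# Configurations x/y store 4 entries per document, configuration z stores 8,
# so z's metadata footprint (minus the fixed setup cost) grows twice as fast.
size_x = projected_size_kb(400, 4)
size_z = projected_size_kb(400, 8)
assert (size_z - INIT_COST_KB) == 2 * (size_x - INIT_COST_KB)
```

The model ignores empty blocks generated while the system is idle, which are discussed separately below as they accumulate at a rate set by the block generation time rather than by the workload.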

The major difference between Configurations x and y on one side and Configuration z on the other side is that Configuration z requires eight transactions per document (one for the document itself (metadoc) and seven for the new data chunks (metablocks)), whereas Configurations x and y only require four transactions per document (one metadoc and three metablocks). Consequently, the metadata storage cost for Configuration z is twice as large.

The cost per transaction is 1.3kB for Configuration x, 1.45kB for Configuration y, and 1.25kB for Configuration z. Configuration y is more costly because it must store 20 pointers to existing data blocks in each document transaction, whereas Configurations x and z only store 10 pointers. Furthermore, transactions for the data chunks are less costly than transactions for the documents themselves, which explains why transactions for Configuration z are on average less costly.

Note that Ethereum generates empty blocks (i.e., blocks that do not contain any transaction) at a rate determined by the block generation time (5s in our setup). The default block size being 4kB, a back-of-the-envelope estimation tells us that our blockchain, left idle, could grow in storage space by up to 25GB every year. A long-term archival strategy could be to roll the system on to a fresh blockchain on a regular basis and keep a copy of the older blockchains on a small number of nodes.

C. Validation

This experiment measures the validation time of a document sent by a client to the storage system and validated by the blockchain as described in Figure 7. We evaluate the validation of 100 documents using the three different configurations of RECAST listed in Table I. The number of validating nodes is kept to 2 for all these experiments and the number of concurrent documents being validated is locked to 8. The results from an average of 10 runs are presented in Figure 7. For the purpose of this experiment and to minimize network influence, we run RECAST and METABLOCK on the same machine: an Intel(R) Xeon(R) CPU E5-2683 with 64 cores and 128GB of RAM backed by an SSD.

Fig. 7. Metadata validation per document, different RECAST configurations

Similarly to what we observe in the storage overhead experiment (see VII-B), the RECAST configuration has little impact on the overall validation time. Neither does the size of the blockchain cluster. However, we can see that a greater number of entries making up the document's metadata can slow down the process, as the median time for Configuration z lags behind the other two Configurations. This slowdown can be explained by a greater number of transactions necessary (1 per metadata entry) for the document to be stored in the blockchain.

The most important factor is the block generation time set on the Ethereum blockchain. In METABLOCK, the block generation time is set to 5 seconds. As a consequence, validation times fall around that number for all configurations tested. This choice of block generation time is arbitrary but stems from the need to avoid the creation of numerous empty blocks when idle while still offering acceptable validation time. The block generation time being a value in METABLOCK's configuration, it may be adjusted to fit the needs of the storage system it supports.

A solution to minimize the validation time in this implementation could be to dynamically increase the number of connections from the entrypoint script to the blockchain to improve resource utilization and increase system throughput. Such a modification could help bridge the gap between the 5s limit and the results presented in Figure 7.

D. Reconstruction

Figure 8 shows the results obtained when rebuilding the metadata of 100 documents using the three different RECAST configurations listed in Table I. Following the workflow of Figure 5, note that this rebuild erases all entries from the host's metadata database for the document that is being rebuilt, and recopies them from the entries stored in the blockchain.

Similarly to the validation experiment, the greater the number of entries, the longer the reconstruction. Between Configurations x and y (entanglement with 20 pointers instead of 10), only the size of a stored field increases, which does not impact the overall reconstruction time. In contrast, Configuration z stands out, with a reconstruction time significantly longer due to the larger number of data chunks, and thus of transactions, for each archived document.

To repair the metadata of a corrupted document, the reconstruction routine sequentially reads all the entries related to the document from the blockchain before writing them back to the host storage system. While the sequential nature of this process suggests that the reconstruction time should be linear in the number of entries, the results prove otherwise. Indeed, where we would expect the median time to reconstruct a document to double from Configurations x to z, we see it is multiplied by at least three. We explain this behaviour by the event-driven nature of the repair routines triggered as soon as a request is read from the message queue. A large number of reconstruction requests may produce a heavy load on the nodes queried and in consequence introduce a greater slowdown to the reconstruction time than expected.


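The bulk rebuild just described (erase every entry of a document from the host metadata database, then recopy it from the blockchain) can be sketched as follows; the class and function names are hypothetical stand-ins with in-memory containers, not METABLOCK's actual API:

```python
# Sketch of the sequential metadata reconstruction routine (hypothetical
# names; dicts stand in for the real blockchain and host metadata store).

class Chain:
    """Stand-in for the blockchain: the trusted copy of every entry."""
    def __init__(self, entries):
        self.entries = entries                  # {doc_id: {key: value}}

    def read_entries(self, doc_id):
        return self.entries[doc_id].items()     # one blockchain read per entry

class Store:
    """Stand-in for the host's metadata database (possibly corrupted)."""
    def __init__(self, entries):
        self.entries = entries

def rebuild_document(doc_id, chain, store):
    # Bulk rebuild: drop all (possibly corrupted) entries of the document,
    # then sequentially recopy them from the blockchain.
    store.entries[doc_id] = {}
    for key, value in chain.read_entries(doc_id):
        store.entries[doc_id][key] = value

chain = Chain({"doc1": {"block0": "h0", "block1": "h1"}})
store = Store({"doc1": {"block0": "h0", "block1": "CORRUPTED"}})
rebuild_document("doc1", chain, store)
print(store.entries["doc1"])  # {'block0': 'h0', 'block1': 'h1'}
```

A more selective variant would compare each host entry against its blockchain copy and rewrite only mismatches, saving reads and bandwidth for configurations with many blocks per document.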
Fig. 8. Reconstruction time for different RECAST configurations.

Fig. 9. Metadata auditing.

Great care must be taken in the event of numerous repair requests, in terms of load balancing of the read requests. Another improvement over the current algorithm would be to rebuild more selectively: instead of bulk rewrites of all the metadata for a document and its blocks, the reconstruction would only fix the corrupted entries. Reconstruction time and bandwidth consumption would benefit for configurations with numerous blocks per document, such as Configuration z.

E. Audit

Figure 9 shows the results of the audit experiment performed on a set of 100 document metadata, following the procedure explained in Section V. The figure shows the average time to verify that all entries related to a document are valid in the host storage system, using METABLOCK as the correct source. Once again, we see close results between Configurations x and y of RECAST, which shows that a difference in the length of a stored field has little impact on the auditing time, as opposed to a change in the number of metadata entries for Configuration z, which directly slows down the process. This slowdown is due to the number of additional transactions needed to read the different entries from the blockchain. Overall, we see a linear relation between the number of metadata entries in a configuration and the auditing time. In the current version of METABLOCK the auditing is sequential, verifying one entry at a time, but parallelization by document or even by entry could easily be achieved to improve the process throughput.

VIII. LESSONS LEARNED

Through the implementation, deployment and evaluation of METABLOCK we acquired insight into several aspects of such a system. This section highlights the most important lessons learned.

Blockchain performance: not there yet. The Ethereum ecosystem is rich and varied, and offers strong out-of-the-box anti-tampering properties, in particular due to its by-design immutable nature. Furthermore, leveraging smart contracts to secure metadata in METABLOCK did not require major efforts. However, although many optimizations are possible (as mentioned previously, for instance parallelizing the auditing routine), in their current state blockchain operations are slow and unsuitable for storage systems targeting latencies measured in milliseconds. In fact, latencies are in the order of seconds [27], results that have been further confirmed by our own deployments. Similar observations have recently been reported in the most recent version of the Storj white paper [14], in which the authors mention that "waiting for a cluster of nodes to probabilistically come to agreement on a shared global ledger is a non-starter."

Applicability perspectives. A promising application to use in conjunction with METABLOCK is write-once-read-many (WORM) systems, which must offer stringent and highly regulated security as well as durability guarantees. WORM technologies can typically be used for long-term archival of large, cold and immutable data blocks [34], thus limiting the number of blockchain operations and the overall latency of metadata validation. In METABLOCK, once data is archived and the corresponding metadata is validated, the bottleneck disappears, and reading data is orthogonal to the technology used to secure the metadata. Metadata of large data objects is orders of magnitude smaller than the data objects themselves, which ensures the good scalability of METABLOCK in this setting.

Lack of encryption is not necessarily bad. Our architecture exposes a message queue used to exchange blocks back and forth with the blockchain. As explained before (Section V), and perhaps counter-intuitively, we do not enforce any form of encryption, such as TLS, over the channels. By doing so, we avoid negatively affecting the achievable throughput of the message channel. Furthermore, this offers malicious users (compromised operating systems, hypervisors, system administrators) unrestricted inspection capabilities over the metadata messages, a scenario fully supported by our threat model (Section IV).

Choice of consensus algorithm. This choice plays an important role in the performance of the blockchain component. We settled on the proof-of-authority [8] consensus algorithm. This choice brings several advantages: no expensive mining stemming from proof-of-work, a lighter resource load (opening the path to deploying this system on edge devices with lower computing capabilities), and the ability to control the membership of the nodes in the ledger.

The size of the ledger network does not matter. We tested METABLOCK on a small private ledger network, with

few nodes running on premises. A bigger network would have no impact on the storage overhead per node (as seen in Figure 7). Also, there would be similar side effects on the costs of reconstruction and auditing (scaling linearly with the size of the network).

Integration is smooth with containers. Our micro-service prototype extensively exploits Docker containers and orchestrators. This allows deployers to switch between alternative storage backends, different archival storage systems or coding schemes, and even to switch to other blockchains. In practice, the integration of a new system is done by automating the sending of metadata to METABLOCK's message queue using the standard AMQP protocol (as seen in Figure 10). Provisioning the blockchain with a custom smart contract may be needed to cover the specific needs of the new system; however, our implementation for RECAST should cover most metadata storage needs. Finally, the audit and reconstruction processes require a more involved effort, as authors must be able to read and reconcile data from their own system and METABLOCK.

Migration: think twice about your data schema. While storing data in Ethereum offers strong integrity guarantees, it presents challenges in terms of schema migration. Should the layout of the metadata change at some point in time, one would need to deploy a new contract with the modified schema and proceed to pipe the data from the older contract to the new one. In addition to the extra validation cost, the extra storage overhead increases the toll of any migration operation. In the future, we expect these operations to be facilitated by specialized migration tools.

EU-based developers and GDPR. Finally, an interesting observation is the relevance of using blockchains for protecting metadata of personal data (e.g., medical records) in relation to the European general data protection regulation (GDPR) [49]. The current consensus of legal experts is that destroying decryption keys satisfies the right of erasure as defined in the GDPR, even if the encrypted data and the metadata are not physically destroyed. This does not, therefore, preclude the use of immutable blockchains for metadata protection.

IX. CONCLUSION

We presented METABLOCK, a metadata protection system built on top of the Ethereum blockchain. With tools to validate, audit and reconstruct corrupt records in a storage system, METABLOCK harnesses the integrity features of the blockchain for metadata protection. The asynchronous design of METABLOCK makes its integration into existing storage systems seamless and minimizes its impact on the overall system performance. We showed that METABLOCK is well suited to work with immutable storage systems using erasure-correcting codes with realistic and practical parameters. Finally, with some additional engineering efforts, METABLOCK could be extended to support other systems, as well as have some of its processes improved by work parallelization.

Fig. 10. Integrating METABLOCK with other systems.

ACKNOWLEDGMENT

This research has received funding from the European Union's Horizon 2020 - The EU Framework Programme for Research and Innovation 2014-2020, under grant agreement No. 653884.

REFERENCES

[1] Docker. https://www.docker.com/.
[2] Docker compose. https://docs.docker.com/compose/.
[3] Docker swarm mode. https://docs.docker.com/engine/swarm/.
[4] Geth Ethereum Command Line Interface. https://github.com/ethereum/go-ethereum/wiki/geth.
[5] KVM. http://www.linux-kvm.org/page/Main Page.
[6] Maidsafe. https://maidsafe.net.
[7] Pika. https://pika.readthedocs.io/en/stable/.
[8] Proof-of-Authority. https://github.com/paritytech/parity/wiki/Proof-of-Authority-Chains.
[9] RabbitMQ. https://www.rabbitmq.com/.
[10] Smart Contracts. http://www.fon.hum.uva.nl/rob/Courses/InformationInSpeech/CDROM/Literature/LOTwinterschool2006/szabo.best.vwh.net/smart.contracts.html.
[11] Swarm Emptiest Strategy. https://docs.docker.com/docker-cloud/infrastructure/deployment-strategies/.
[12] The Solidity Programming Language. https://solidity.readthedocs.io/en/v0.4.0/.
[13] Web3.py. https://github.com/ethereum/web3.py.
[14] Storj: A decentralized cloud storage network framework, v3.0. https://storj.io/storj.pdf, 2018.
[15] Swarm mode overview, Aug. 2018.
[16] E. Androulaki, A. Barger, V. Bortnikov, C. Cachin, K. Christidis, A. De Caro, D. Enyeart, C. Ferris, G. Laventman, Y. Manevich, et al. Hyperledger Fabric: a distributed operating system for permissioned blockchains. In Proceedings of the Thirteenth EuroSys Conference, page 30. ACM, 2018.
[17] J. Aspnes, J. Feigenbaum, A. Yampolskiy, and S. Zhong. Towards a theory of data entanglement. Theoretical Computer Science, 389(1-2):26–43, 2007.
[18] J. Aspnes, J. Feigenbaum, A. Yampolskiy, and S. Zhong. Towards a theory of data entanglement. Theoretical Computer Science, 389(1):26–43, Dec. 2007.
[19] R. Barbi, D. Burihabwa, P. Felber, H. Mercier, and V. Schiavoni. Recast: Random entanglement for censorship-resistant archival storage. In 2018 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), pages 171–182, June 2018.
[20] M. Bartoletti, S. Carta, T. Cimoli, and R. Saia. Dissecting Ponzi schemes on Ethereum: identification, analysis, and impact. arXiv:1703.03779 [cs], Mar. 2017.
[21] J. Benet. IPFS - content addressed, versioned, p2p file system, 2014.
[22] K. Bhargavan, A. Delignat-Lavaud, C. Fournet, A. Gollamudi, G. Gonthier, N. Kobeissi, N. Kulatova, A. Rastogi, T. Sibut-Pinote, N. Swamy, et al. Formal verification of smart contracts: Short paper. In Proceedings of the 2016 ACM Workshop on Programming Languages and Analysis for Security, pages 91–96. ACM, 2016.
[23] S. A. Brandt, E. L. Miller, and D. D. E. Long. Efficient metadata management in large distributed storage systems. In 20th IEEE/11th NASA Goddard Conference on Mass Storage Systems and Technologies (MSST 2003), pages 290–298, 2003.

[24] V. Buterin. Ethereum: A next-generation smart contract and decentralized application platform. Technical report, 2013. http://ethereum.org/ethereum.
[25] V. Buterin. Slasher: A punitive proof-of-stake algorithm. https://blog.ethereum.org/2014/01/15/slasher-a-punitive-proof-of-stake-algorithm, 2014.
[26] C. Dannen. Introducing Ethereum and Solidity. Springer, 2017.
[27] T. T. A. Dinh, J. Wang, G. Chen, R. Liu, B. C. Ooi, and K.-L. Tan. BLOCKBENCH: A framework for analyzing private blockchains. In Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD '17, pages 1085–1100, New York, NY, USA, 2017. ACM.
[28] C. Dwork and M. Naor. Pricing via processing or combatting junk mail. In Annual International Cryptology Conference, pages 139–147. Springer, 1992.
[29] Y. Gao, X. Gao, X. Yang, J. Liu, and G. Chen. An efficient ring-based metadata management policy for large-scale distributed file systems. pages 1–1.
[30] S. Halevi, D. Harnik, B. Pinkas, and A. Shulman-Peleg. Proofs of ownership in remote storage systems. In Proceedings of the 18th ACM Conference on Computer and Communications Security, pages 491–500. ACM, 2011.
[31] G. Hileman and M. Rauchs. Global cryptocurrency benchmarking study. Cambridge Centre for Alternative Finance, 2017.
[32] S. King. Primecoin: Cryptocurrency with prime number proof-of-work. July 7th, 2013.
[33] P. Labs. Filecoin: A decentralized storage network. https://filecoin.io/filecoin.pdf, 2017.
[34] D. Lomet and B. Salzberg. Access methods for multiversion data. In Proceedings of the 1989 ACM SIGMOD International Conference on Management of Data, SIGMOD '89, pages 315–324, New York, NY, USA, 1989. ACM.
[35] H. Mercier, M. Augier, and A. K. Lenstra. STeP-archival: Storage integrity and anti-tampering using data entanglement. In Proceedings of the 2015 International Symposium on Information Theory (ISIT), pages 1590–1594, 2015.
[36] H. Mercier, M. Augier, and A. K. Lenstra. STeP-archival: Storage integrity and tamper resistance using data entanglement. IEEE Transactions on Information Theory, 64(6):4233–4258, June 2018.
[37] S. Nakamoto. Bitcoin: A peer-to-peer electronic cash system. 2008.
[38] S. Niazi, M. Ismail, S. Haridi, J. Dowling, S. Grohsschmiedt, and M. Ronström. HopsFS: Scaling hierarchical file system metadata using NewSQL databases. In 15th USENIX Conference on File and Storage Technologies (FAST 17), pages 89–104.
[39] W. W. Peterson and E. J. Weldon, Jr. Error-Correcting Codes. MIT Press, 1972.
[40] R. Price. Digital currency Ethereum is cratering because of a $50 million hack.
[41] H. Shacham and B. Waters. Compact proofs of retrievability. In International Conference on the Theory and Application of Cryptology and Information Security, pages 90–107. Springer, 2008.
[42] K. Shvachko, H. Kuang, S. Radia, and R. Chansler. The Hadoop distributed file system. In Proceedings of the 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), MSST '10, pages 1–10. IEEE Computer Society.
[43] H. J. Singh and S. Bawa. Scalable metadata management techniques for ultra-large distributed storage systems – a systematic review. 51(4):82:1–82:37.
[44] R. Sion. Strong WORM. In Proceedings of the 2008 28th International Conference on Distributed Computing Systems, ICDCS '08, pages 69–76, Washington, DC, USA, 2008. IEEE Computer Society.
[45] M. Swan. Blockchain: Blueprint for a New Economy. O'Reilly Media, Inc., 2015.
[46] A. Thomson and D. J. Abadi. CalvinFS: Consistent WAN replication and scalable metadata management for distributed file systems. In 13th USENIX Conference on File and Storage Technologies (FAST 15), pages 1–14.
[47] P. Tsankov, A. Dan, D. Drachsler-Cohen, A. Gervais, F. Buenzli, and M. Vechev. Securify: Practical security analysis of smart contracts. arXiv preprint arXiv:1806.01143, 2018.
[48] P. Vigna and M. J. Casey. The Age of Cryptocurrency: How Bitcoin and the Blockchain Are Challenging the Global Economic Order. Macmillan, 2016.
[49] P. Voigt and A. Von dem Bussche. The EU General Data Protection Regulation (GDPR). A Practical Guide, 1st Ed., Cham: Springer International Publishing, 2017.
[50] D. Vorick and L. Champine. Sia: Simple decentralized storage. https://sia.tech/sia., 2014.
[51] M. Vukolić. The quest for scalable blockchain fabric: Proof-of-work vs. BFT replication. In International Workshop on Open Problems in Network Security, pages 112–125. Springer, 2015.
[52] M. Waldman and D. Mazières. Tangler: a censorship-resistant publishing system based on document entanglements. In Proceedings of the 8th ACM Conference on Computer and Communications Security, pages 126–135. ACM, 2001.
[53] M. Waldman, A. D. Rubin, and L. F. Cranor. Publius: A robust, tamper-evident, censorship-resistant web publishing system. In 9th USENIX Security Symposium, pages 59–72, 2000.
[54] S. A. Weil, K. T. Pollack, S. A. Brandt, and E. L. Miller. Dynamic metadata management for petabyte-scale file systems. In Proceedings of the 2004 ACM/IEEE Conference on Supercomputing, SC '04, page 4. IEEE Computer Society.
