Stampery Blockchain Timestamping Architecture (BTA)
Adán Sánchez de Pedro Crespo (Stampery, CTO), [email protected]
Luis Iván Cuende García∗ (Aragon, CEO), [email protected]

Version 6 – LTS
Published on October 16, 2016
Last reviewed on November 13, 2017
arXiv:1711.04709v1 [cs.CR] 13 Nov 2017

∗ Luis Iván, Stampery’s former CTO and now one of our advisors, is credited here for being the original deviser and developer of BTA’s very first version back in 2014.

Abstract

A method for timestamping, anchoring and certification of a virtually unlimited amount of data in one or more blockchains, focusing on scalability and cost-effectiveness while ensuring existence, integrity and ownership by using cryptographic proofs that are independently verifiable by anyone in the world without disclosure of the original data and without the intervention of the certifying party.

I. Introduction

Blockchains, understood as immutable, decentralized, distributed and unedited digital ledgers, have already proved their undeniable capability to serve as a universal support for a distributed monetary system. Since the appearance of Bitcoin[1] and its revolutionary approach to digital cash, there have been many attempts to develop methods allowing anyone to leverage the immutability of blockchain technology for purposes other than the transfer of currency.

Since version 0.9.0 of Bitcoin Core, released in March 2014, the Bitcoin protocol itself provides an opcode[2] called OP_RETURN[3] that lets the participants of the network embed arbitrary data into the transactions they broadcast to the Bitcoin blockchain. Since OP_RETURN was made available, tools like Proof of Existence[4], Stampery[5] or Tierion[6] have been making use of the Bitcoin blockchain to timestamp and verify data.

Nevertheless, the OP_RETURN method has several limitations that are intrinsic to the specifics of the Bitcoin protocol:
• In order to keep the average transaction and block size within acceptable bounds, the maximum length for a single OP_RETURN operation is 80 bytes.
• It is considered non-standard for a transaction to carry more than one OP_RETURN output.
• Due to the existing block size limit, the Bitcoin network can handle up to 7 transactions per second under ideal circumstances, but the actual value is even lower.
• The fee that must be paid to the network to be confident that a transaction is accepted and validated within a reasonable time span is high and may keep rising over time as the value of Bitcoin fluctuates. Broadcasting a transaction containing a single OP_RETURN with a certain guarantee that it will be written into the next block and confirmed nearly immediately can cost up to 120,000 satoshi (0.0012 Bitcoins), which at the time of the last review of this document roughly amounts to $3.12.

In a first approach to overcoming some of those limitations, the aforementioned timestamping services do not actually embed the whole piece of data to be timestamped. Instead, they calculate a cryptographic hash of the data, which serves as a unique identifier with a length between 32 and 64 bytes and therefore fits into a single Bitcoin transaction. This is what we refer to as data anchoring, blockchain timestamping or simply stamping.

Nevertheless, some use cases require bulk timestamping of hundreds or thousands of files, or timestamping the same piece of data over and over in its successive versions. This would require spending a huge amount of money in transaction fees and would quickly saturate the previously mentioned Bitcoin block size limit. These consequences render the OP_RETURN method unsuitable for such use cases.
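As a minimal sketch of the hashing approach described above, the following Python snippet fingerprints a piece of data with SHA-256 and SHA-512 from the standard library and compares the digest sizes with the 80-byte OP_RETURN limit. The sample payload, the helper name `fingerprint` and the constant `OP_RETURN_MAX_BYTES` are illustrative assumptions and not part of any Stampery API.

```python
import hashlib

# Maximum payload of a single OP_RETURN operation, as stated in the list above.
OP_RETURN_MAX_BYTES = 80


def fingerprint(data: bytes, algorithm: str = "sha256") -> bytes:
    """Return the raw digest of `data` for a given hashlib algorithm name."""
    return hashlib.new(algorithm, data).digest()


# Hypothetical document contents, used only for illustration.
document = b"example contract, version 1"

sha256_digest = fingerprint(document, "sha256")  # 32-byte digest
sha512_digest = fingerprint(document, "sha512")  # 64-byte digest

for name, digest in (("sha256", sha256_digest), ("sha512", sha512_digest)):
    fits = len(digest) <= OP_RETURN_MAX_BYTES
    print(f"{name}: {len(digest)} bytes, fits in one OP_RETURN: {fits}")
```

However large the input is, the digest length depends only on the hash function, which is why a single OP_RETURN is enough for one stamp but not for thousands of them.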
It is worth mentioning that other Proof-of-Work blockchains like Ethereum’s or Litecoin’s do not offer a definitive solution for those problems. Although they may benefit from lower transaction fees at the moment, given a huge volume of stamping actions, the limitations would be roughly the same.

This technical paper proposes a method and architecture that allows (1) carrying out scalable and cost-effective blockchain data anchoring (overcoming all the limitations referred to above), (2) generating irrefutable proofs of anchoring and (3) verification of the validity of such proofs by any individual or system in the world.

II. Blockchain Anchoring

Blockchain anchoring or blockchain timestamping consists of leveraging the immutability and security features of public blockchain technologies to certify, ensure and prove that a document, communication or dataset existed at a certain point in time by generating a digital proof of existence.

Unlike signature-based timestamping and other digital evidentiary tools that existed before the emergence of blockchains and their consensus algorithms, blockchain anchoring does not require relying on a potentially corruptible third party as a source of truth. Instead, blockchain anchoring relies on mathematically provable digital documents that are backed by the same consensus algorithms found in the blockchains used to perform the anchoring process. The conflicting interests of the numerous and heterogeneous participants of blockchain networks make it practically impossible for a majority of them to collude with the purpose of altering or tampering with the timestamps.
Given the historic evolution of the volume of monetary value stored in the form of tokens or digital coins in the main public blockchains, the chance of these completely disappearing in the next decades or even centuries is very small, which ensures that proofs generated by blockchain anchoring methods will be available and enforceable far into the future.

Most blockchain anchoring services make use of one-way “hash functions” to generate unique fingerprints of the data being anchored and publicly distribute those in such a manner that anyone holding the original data can verify that they match, but it is impossible for anyone else to retrieve or reconstruct the original data knowing only the fingerprint. This property guarantees that the proofs are completely free of privacy concerns and therefore can be published safely on public blockchains.

From a technical standpoint, the stamping or anchoring itself is performed by:

1. applying a hash function (commonly SHA-256[7]) to the data unit to be anchored,
2. writing the resulting hash into a blockchain transaction,
3. broadcasting the transaction into one or more blockchains, and finally
4. distributing or storing the data of the blockchain transaction along with the metadata of the anchored data unit.

The time assumed to be the timestamp is the one contained in the information of the block in which the broadcast transaction gets included, which can be easily looked up in block explorer websites and services. While the precision and accuracy of these timestamps are not comparable to those of atomic clocks, they are good enough for most use cases.
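The steps above can be sketched in Python as follows. This is a hedged illustration rather than Stampery’s implementation: step 3 (broadcasting) is left out because it needs a wallet and a network connection, so the transaction identifier stays empty. The helper names, the record layout and the sample file name are assumptions made for the example. The only Bitcoin-specific facts used are that OP_RETURN is opcode 0x6a and that a data push of 1 to 75 bytes is encoded as a single length byte followed by the data.

```python
import hashlib

OP_RETURN = 0x6A  # Bitcoin script opcode for OP_RETURN


def hash_data(data: bytes) -> bytes:
    """Step 1: apply SHA-256 to the data unit to be anchored."""
    return hashlib.sha256(data).digest()


def op_return_script(digest: bytes) -> bytes:
    """Step 2: output script embedding the digest (OP_RETURN, length, data)."""
    if not 1 <= len(digest) <= 75:
        # Longer payloads need OP_PUSHDATA1, which this sketch does not cover.
        raise ValueError("this sketch only handles direct pushes of 1 to 75 bytes")
    return bytes([OP_RETURN, len(digest)]) + digest


def make_record(name: str, digest: bytes, txid=None) -> dict:
    """Step 4: keep the transaction reference next to the data unit's metadata."""
    return {"name": name, "sha256": digest.hex(), "txid": txid}


def matches(data: bytes, record: dict) -> bool:
    """Anyone holding the original data can recompute the hash and compare it."""
    return hash_data(data).hex() == record["sha256"]


data = b"quarterly report, final version"  # hypothetical data unit
digest = hash_data(data)
script = op_return_script(digest)
record = make_record("report.txt", digest)  # txid is None until step 3 happens

print(script.hex())
print(matches(data, record), matches(b"tampered copy", record))
```

The record never contains the original data, only its fingerprint, which mirrors the privacy property described earlier in this section.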
Because the cost of broadcasting transactions in the main public blockchains keeps increasing, it pays to apply cryptographic constructions that aggregate several data units into a single hash and send it in a single blockchain transaction, much as transactions themselves are aggregated into every block of a blockchain. This makes the cost of the anchoring process depend on the period between transactions that the anchoring party decides to establish rather than on the number of data units being anchored. This way, a virtually unlimited number of data units can be anchored in a single blockchain transaction, making the unit cost of each stamp approach zero asymptotically.

Let tc(n) be the total cost and cpt(n) the cost per timestamp when anchoring n timestamps.

Without aggregation:

$$tc(n) = n \cdot fee \in O(n), \qquad cpt(n) = \frac{tc(n)}{n} = fee \in O(1)$$

[Plot of tc(n) and cpt(n) against n]

Figure 1: Without aggregation, the total cost is directly proportional to the number of timestamps (linear) while the cost per timestamp always equals the transaction fee (constant).

With aggregation:

$$tc(n) = fee \in O(1), \qquad cpt(n) = \frac{tc(n)}{n} = \frac{fee}{n} \in O\left(\frac{1}{n}\right)$$

[Plot of tc(n) and cpt(n) against n]

Figure 2: With aggregation, the total cost always equals the transaction fee (constant) while the cost per timestamp is inversely proportional to the number of timestamps (rectangular hyperbola).

III. Hash Aggregation

Many single hashes corresponding to different pieces of data or files can be compiled into a single hash by building a binary tree in which every leaf node is populated with one of the hashes while every non-leaf node is populated with the hash of the merger of its child nodes. This type of tree is commonly known as a Merkle tree[8].

[Tree diagram: leaves α, β, γ, δ; α and β merge into αβ, γ and δ merge into γδ; αβ and γδ merge into the root αβγδ]

Figure 3: Example of Merkle tree with 4 hashes as leaf nodes.

Merkle trees have been used for decades to perform efficient and secure verification of the contents of large data structures[9].
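A minimal Python sketch of this aggregation is given below. Two details are assumptions of the sketch, since the text above does not fix them: the “merger” of two child nodes is taken to be the SHA-256 hash of their plain concatenation, and a node left without a partner on an odd-sized level is promoted unchanged to the next level. The function names are illustrative.

```python
import hashlib
from typing import List, Sequence


def h(data: bytes) -> bytes:
    """SHA-256 digest of `data`."""
    return hashlib.sha256(data).digest()


def combine(left: bytes, right: bytes) -> bytes:
    """Parent node: hash of the two children (assumption: plain concatenation)."""
    return h(left + right)


def merkle_root(leaves: Sequence[bytes]) -> bytes:
    """Aggregate leaf hashes level by level until a single hash remains."""
    if not leaves:
        raise ValueError("at least one leaf hash is required")
    level: List[bytes] = list(leaves)
    while len(level) > 1:
        parents: List[bytes] = []
        # Pair up neighbours: (0, 1), (2, 3), ...
        for i in range(0, len(level) - 1, 2):
            parents.append(combine(level[i], level[i + 1]))
        if len(level) % 2 == 1:
            # Assumption: an unpaired node moves up to the next level as it is.
            parents.append(level[-1])
        level = parents
    return level[0]


# Four hypothetical data units playing the roles of α, β, γ and δ in Figure 3.
leaves = [h(x) for x in (b"alpha", b"beta", b"gamma", b"delta")]
root = merkle_root(leaves)
print(root.hex())
```

Only `root` would have to be written to the blockchain, regardless of how many leaves were aggregated.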
Indeed, popular blockchain systems like Bitcoin, Ethereum or Litecoin rely on Merkle trees for gathering together transactions into every block of the chain.

The tip node in the tree (the hash resulting from aggregation of all the hashes in the leaves) is called Merkle root or simply root. The binary hash chain[10] proving that a certain hash belongs to a tree, further discussed in Section V, is called Merkle proof or simply proof.

[Hash chain diagram: α merged with β gives αβ; αβ merged with γδ gives the root αβγδ]

Figure 4: Binary hash chain representing the Merkle proof for the α hash in the tree depicted in Figure 3.

P(α) = {β, γδ}

By aggregating multiple hashes into a Merkle tree and then publishing only the root, it is possible to anchor large volumes of data in a blockchain with a single transaction, which dramatically reduces costs and avoids the aforementioned bottlenecks.
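The proof P(α) = {β, γδ} can be checked with a short Python sketch. It rebuilds the four-leaf tree of Figure 3 by hand and then walks the hash chain of Figure 4. As assumptions of this sketch, nodes are merged by hashing their plain concatenation with SHA-256, and each proof entry also records on which side the sibling sits ("L" or "R"), because concatenation is order-sensitive; the notation above lists only the siblings themselves.

```python
import hashlib


def h(data: bytes) -> bytes:
    """SHA-256 digest of `data`."""
    return hashlib.sha256(data).digest()


def combine(left: bytes, right: bytes) -> bytes:
    """Parent node: hash of the two children (assumption: plain concatenation)."""
    return h(left + right)


def verify_proof(leaf: bytes, proof, root: bytes) -> bool:
    """Fold the siblings into the leaf and compare the result with the root."""
    node = leaf
    for sibling, side in proof:
        if side == "L":
            node = combine(sibling, node)  # sibling sits to the left
        else:
            node = combine(node, sibling)  # sibling sits to the right
    return node == root


# Hypothetical data units standing in for α, β, γ and δ.
alpha, beta, gamma, delta = (h(x) for x in (b"alpha", b"beta", b"gamma", b"delta"))
alpha_beta = combine(alpha, beta)
gamma_delta = combine(gamma, delta)
root = combine(alpha_beta, gamma_delta)

# P(α) = {β, γδ}, annotated with the side of each sibling.
proof_alpha = [(beta, "R"), (gamma_delta, "R")]
# The analogous proof for γ is {δ, αβ}.
proof_gamma = [(delta, "R"), (alpha_beta, "L")]

print(verify_proof(alpha, proof_alpha, root))
print(verify_proof(gamma, proof_gamma, root))
print(verify_proof(h(b"forged"), proof_alpha, root))
```

A verifier needs only the leaf hash, the proof and the published root, so the other leaves of the tree never have to be disclosed.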