NORTHWESTERN UNIVERSITY

Unchaining the Network Layer

A DISSERTATION

SUBMITTED TO THE GRADUATE SCHOOL

IN PARTIAL FULFILLMENT OF THE REQUIREMENTS

for the degree

DOCTOR OF PHILOSOPHY

Field of Computer Science

By

Uri Klarman

EVANSTON, ILLINOIS

March 2019 2

c Copyright by Uri Klarman 2019

All Rights Reserved 3

ABSTRACT

Unchaining the Blockchain Network Layer

Uri Klarman

Blockchains are an exciting new type of Peer-to-Peer (P2P) distributed systems, which enable parties to transact directly, and maintain the record of said interactions in a dis- tributed manner. A unique feature of is their ability to maintain a consensus without requiring knowledge on the number of participants, nor their identities, opening the door for cross-border, self-organized, decentralized, open ecosystems. The original blockchain, , aims to be a Global P2P Electronic Cash System, and to replace fiat money, banks, payment-processors, and other financial middlemen. Other blockchains, e.g., Ethereum, aim to remove middlemen from other types of interactions, such as replac- ing custodians with Smart Contracts, and to securely store credentials, identities, health records, and private information.

While blockchains have the potential to be transformative in many fields, their real- world usage is held back by practical limitations. First and foremost, for a blockchain to be used at a global scale, it must be capable of handling a high volume of transactions; for

Bitcoin to replace Visa, MasterCard, PayPal and other payment processors, it must be 4 capable of processing roughly 5, 000 transactions per second (TPS), the average number of

TPS these companies process today [26, 74]. To support online shopping, it must support peak demand, which for Alibaba stands at 325, 000 TPS [109]. In contrast to these significant requirements, Bitcoin can only process 3–4 TPS. Other significant limitations include the centralization and wastefulness of blockchain mining, the procedure which records transactions in the blockchain.

Research in the blockchain field had thus far focused on new cryptographic primi- tives and alternative blockchain protocols to address these real-world challenges, while the networking aspects were largely ignored. In this thesis, I propose a definition for the blockchain Network Layer, and provide evidence that the Network Layer is the bottleneck and root-cause for some of the most pressing challenges blockchains face today. I further propose new networking primitives and novel network utilization methods, and explore how they can be used to overcome said challenges, including scalability, mining central- ization, and mining wastefulness, as well as to utilize blockchains to decentralize existing knowledge silos. First, I provide the necessary background to understand the operation of blockchains, and a definition for the blockchain Network Layer. Then, I present an analysis which outlines how the Network Layer is the bottleneck for blockchain scalabil- ity, and suggest a new networking primitive, the Blockchain Distribution Network (BDN) to overcome both blockchain scalability and mining centralization. Lastly, I present novel networking methods which enable blockchains to be utilized in new fields, focusing on the decentralization of the search engines market, and mitigate mining wastefulness. 5

Thesis Committee

Aleksandar Kuzmanovic, Northwestern University, Committee Chair

Fabi´anBustamante, Northwestern University, Committee Member

Peter A. Dinda, Northwestern University, Committee Member

Cristina Nita-Rotaru, Northeastern University, Committee Member 6

Acknowledgements

First, I would like to thank my advisor, mentor, and partner, Aleksandar Kuzmanovic, for his continuous support of my research, and for his wise guidance. His focus on exciting ideas, his confidence in our ability to overcome any obstacle we encounter, and his genuine enthusiasm about our joint work, are the cornerstones for my approach towards science, research, and academia.

I would further like to express my gratitude to my co-advisor, Fabi´anBustamante, for supporting my research, for sharing his insights on conducting research across multiple domains and its implications, and for his valuable feedback and advice. My sincere thanks also go to the other members of my committee, Peter Dinda and Cristina Nita-Rotaru, for their valuable insight and candid feedback. Their insights and comments during my proposal had a significant effect on the shaping of my thesis, for which I am grateful.

I thank my lab-mates and fellow students at the Northwestern Networking Group:

Marcel Flores, Ning Xia, Marc Warrior, Qurat-Ul-Ann Akbar, Andrew Kahn, and Alexan- der Wenzel, for their camaraderie, technical assistance, and wise advice. It had been my pleasure to work alongside each and every one of you. Additional thanks go to the entire bloXroute Labs team, and to Soumya Basu in particular, for investing countless hours in the development and deployment of bloXroute, bridging the gap between clever ideas and solid reality. 7

I would like to thank my parents, brothers, in-laws, nephews and nieces, for contin- uously supporting me in this endeavor, and throughout my life. Finally, I would like to thank my wife Yaara, without whom I would have never set out on this exciting academic journey, nor would it had been half as exciting. 8

Table of Contents

ABSTRACT 3

Thesis Committee 5

Acknowledgements 6

List of Figures 12

Chapter 1. Introduction 16

1.1. The Blockchain Network Layer 17

1.2. Blockchain Networking Resources 18

1.3. Thesis Organization 18

Chapter 2. Thesis 20

Chapter 3. Blockchain Fundamentals and the Blockchain Network Layer 21

3.1. The Operation of Bitcoin 21

3.1.1. Bitcoin Users 21

3.1.2. Nodes, Miners and Validators 22

3.1.3. Block Mining 24

3.1.4. Blockchain Security 25

3.1.5. Forks 26

3.1.6. Mining Pools 28 9

3.1.7. Alternative Blockchain Designs 30

3.2. The Blockchain Stack and the Network Layer 31

3.2.1. Layer-1: Consensus Layer 31

3.2.2. Layer-2: Intermittent Consensus Layer 32

3.2.3. Layer-3: Inference Layer 33

3.2.4. Layer-0: Network Layer 34

Chapter 4. bloXroute: the First Blockchain Distribution Network (BDN) 35

4.1. Analysis of the Blockchain Scalability Problem 35

4.1.1. Scalability Constraints 36

4.1.2. Block Propagation Time Analysis 37

4.1.3. Forks, Security and Usability 39

4.1.4. Decentralization 41

4.1.5. Block Size and Inter-Block Time Interval 41

4.2. Related Work 42

4.2.1. Centralized Propagation Systems 42

4.2.2. Off-Chain Scaling Solutions 44

4.2.3. On-Chain Scaling Solutions 44

4.3. Blockchain Distribution Network (BDN): a New Networking Primitive 45

4.3.1. Trust Model 46

4.3.2. System Components 47

4.3.3. bloXroute Scalability Improvement 48

4.3.4. bloXroute’s Provable Neutrality 50

4.3.5. Adversary Behaviors, System Failure, and Safety Measures 55 10

4.4. bloXroute Empirical Evaluation 62

4.4.1. Real-World Performance 62

4.4.2. High TPS Performance 66

4.5. Mining Decentralization 70

Chapter 5. Webcoin: Decentralizing Search Engines with Blockchain Networking

Resources 73

5.1. From Bitcoin to Webcoin 77

5.1.1. Webcoin’s Goals 77

5.1.2. Webcoin’s Non-Goals 78

5.1.3. Proof-of-Work Wastefulness 79

5.2. The Webcoin Protocol 81

5.2.1. Webcoin’s Proof-of-Work 81

5.2.2. Index Collectors 83

5.3. Web Indexing and Index Validation 85

5.3.1. Statistical Index Validation 86

5.3.2. Statistical Validation Accuracy 87

5.4. Webcoin’s Security 88

5.4.1. Block Mining 90

5.4.2. Block Validation 94

5.4.3. Resource Aggregation and Security 95

5.4.4. Minimal Inter-Block Time 96

5.5. Webcoin Evaluation 97

5.5.1. Crawling Feasibility 97 11

5.5.2. Network Usage 100

5.5.3. Mining at Large Scales 103

5.5.4. Collectors’ Properties. 105

5.6. Related Work 105

5.7. Discussion 107

Chapter 6. Conclusion 111

References 113 12

List of Figures

3.1 The blockchain stack. The Network Layer (Layer-0) supports the

operation of the Consensus Layer (Layer-1), which encapsulates

blockchain functionalities. The Intermittent Consensus Layer (Layer-2)

extends these functionalities outside the consensus, while the Inference

Layer (Layer-3) adds new functionalities outside the consensus. 31

4.1 The components of the bloXroute system: the bloXroute BDN, and

the Peer Network nodes utilizing it. Each Peer Network node runs a

Gateway process as an intermediary between its blockchain application

and the bloXroute BDN. 47

4.2 Naive block propagation using bloXroute. (1) Blockchain node passes

block to local bloXroute Gateway. (2) Gateway transmits block to

bloXroute server. (3) bloXroute server streams block to all bloXroute

servers as first bytes arrive. (4) bloXroute servers stream block to all

Gateways as first bytes arrive. (5) Gateways validate block structure,

and pass block to blockchain nodes. 51

4.3 Encrypted block propagation using bloXroute. (1) Blockchain node

passes block to local bloXroute Gateway, which encrypts block. Then,

propagation follows path of naive propagation (solid line). (2) Gateways 13

form a P2P network, where each notifies its peer of encrypted blocks

received. p1 and p2 both notify psource of receiving encrypted block.

(3) Once psource learns its peers received the encrypted block, it sends

encryption key to its peers, to be propagated to all Gateways. (4) Upon

receiving the encryption key, each gateway decrypts the encrypted block

and pass it to the blockchain node it serves. 52

4.4 Indirect relay of encrypted block to bloXroute. (1) As in encrypted

block propagation, blockchain node passes block to local bloXroute

Gateway, to be encrypted. (2) Gateway transmits encrypted block to

a Gateway peer, to be relayed to a bloXroute server. Encrypted block

then propagates, followed by its encryption key, as in encrypted block

propagation (solid line). 53

4.5 The ratios by which bloXroute compresses the blocks mined on January

1st, 2019 on the Bitcoin (BTC), (BCH), and Ethereum

(ETH). 63

4.6 The average timeline for the average block to propagate in the

bloXroute testbed. Blocks requires roughly 1 to be propagated in a

provably-neutral fashion to the entire deployed in the

testbed. 68

4.7 CDF of the transactions per second (TPS) processed in blocks mined

during 20 trials of the bloXroute testbed. Each trial includes the mining

of 20 consecutive blocks. 69 14

5.1 Webcoin mining process: (left) (1) Miner crawls webpages along a

directed path, (2) miner indexes webpages, hashes the index, propagates

digest to peers, (3) collectors request and receive compressed indices

from miners. (right) (4) Miners wait until some miner becomes eligible

to mine a new block, (5) eligible miner propagates block and index,

accelerated by collectors, (6) each miner verifies the block, compares

index to past digest, and verifies a small subset of webpages. 80

5.2 The mining process of a new Webcoin block, performed every second. 91

5.3 CDF of the number of seconds between the mining of consecutive blocks.

Vertical line represents target at 600 seconds. 97

5.4 The partitioning of miners to quintiles (20% units) according to their

download bandwidth, and the percentage of blocks mined by nodes in

each quintile. 99

5.5 CDF of the number of seconds required for miners to crawl webpages

for indexing, per index size. 100

5.6 Network usage of a Webcoin miner deployed on a PlanetLab node,

during the mining of two consecutive blocks utilizing an index size

of 1000 webpages. Dashed lines mark the periods in which mining is

prohibited (between 1 and 7). Numbers mark the events of the mining

process: (1) block is mined by another miner, (2) block and index

download from peers, (3) block and index propagation on the upload

direction, index validation on the download direction, (4) Web crawling 15 and indexing, (5) block and index propagation completion, (6) Web crawling completion, (7) block mining begins. 101 16

CHAPTER 1

Introduction

The Bitcoin white paper [75], published in October 2008, describes a new type of

Peer-to-Peer (P2P) distributed system – the blockchain. In a blockchain ecosystem, par- ticipants may transact among themselves, and their transactions are periodically batched into blocks which are recorded by all peers. This results in chain of blocks, the blockchain, which defines a consensus over which transactions took place. To add a new block of trans- actions to the blockchain, the Bitcoin protocol requires a Proof-of-Work (PoW), i.e., a proof of performing a difficult computational-intensive task. This requirement prevents sybil attacks from influencing the consensus, and alleviate the need for peers to have knowledge of the number of participants or their identities. In the decade since Bitcoin’s inception, many blockchains were launched in an attempt to improve upon, repurpose, substitute, or complement Bitcoin. While some blockchains are almost identical to Bit- coin, others had experimented with wide range of changes to the protocol, including altering its PoW, replacing its PoW with Proof-of-Stake (PoS) and other alternatives, al- tering the time interval between blocks and the number of transactions each can contain, running complex scripts, obfuscating transactions, and others.

Despite Bitcoin becoming a household name, and despite a market cap of hundreds of billions of dollars for blockchain-based , blockchain usage remains negligi- ble when compared to major payment processors such as MasterCard, Visa, and PayPal, 17 or to online shopping giants such as Alibaba and Amazon. While the credit card com- panies and PayPal combined handle roughly 5, 000 transactions per second (TPS) on average, and Alibaba had processed over 325, 000 TPS at peak demand, Bitcoin can only process 3–4 TPS. This limitation, which is common to all blockchains, is often referred to as the “scalability problem”, and is considered amongst the most pressing and difficult challenges blockchains face. Other significant challenges include mining centralization, i.e., the coalescence of miners into mining pools which undermine blockchains’ security, blockchain utilization, and the wasteful nature of PoW mining.

1.1. The Blockchain Network Layer

Possible solutions for the scalability problem and other pressing challenges had been the source of heated discussions among Bitcoin and blockchain enthusiasts, researchers, and developers. However, due to to cyber-security and cryptography origins of this com- munity, these discussions had mostly focused on novel cryptographic primitives, new protocols, and alternative consensus schemes. The networking aspects of blockchains had largely been ignored, and often considered to be a blackbox infrastructure which is ag- nostic to the operation of blockchains. Indeed, it is not uncommon to see all networking aspects abstracted away in these discussions, marked only as edges in graph representa- tions of blockchain systems.

In this dissertation I propose a definition for the blockchain Network Layer, and the functionalities which fall under it. I further show that the Network Layer is the root- cause of the scalability problem, and in contrast to common belief, it is the Network 18

Layer which holds the key for some of the most pressing challenges blockchains face, including scalability and mining centralization.

1.2. Blockchain Networking Resources

Most of the networking resources consumed by Blockchain systems are utilized at the blockchain Network Layer. However, it is possible for blockchains to utilize their network- ing resources in other fashions to achieve additional goals. In this dissertation, I present how novel network utilization techniques enable blockchains to consume networking re- sources, rather than computational resources, which can be leveraged to decentralize critical networking tasks and to mitigate mining wastefulness, focusing on Web search engines.

1.3. Thesis Organization

The remainder of this dissertation is structured as follows: First, in Chapter 2 I present the thesis statement, which captures the overarching scheme of the individual chapters incorporated in this dissertation. Then, in Chapter 3, I provide the necessary background to understand the operation of blockchains and some of the the major challenges they face, and provide a definition for the blockchain Network Layer. In Chapter 4, I present an analysis of the scalability problem, and demonstrate that its root-cause is a bottleneck at the blockchain Network Layer. I further outline a new networking primitive, the

Blockchain Distribution Network (BDN), which enables blockchains to overcome both the scalability problem and blockchain mining centralization, and present empirical results from the deployment of bloXroute, the first Blockchain Distribution Network. Next, in

Chapter 5 I present a novel networking utilization method to enable the application of 19 blockchains in new fields, focusing on the decentralization of the search engine industry, and to mitigate mining wastefulness. Finally, in Chapter 6 I summarize my contributions and conclude. 20

CHAPTER 2

Thesis

The blockchain Network Layer and the networking resources utilized by blockchains hold the key to some of the most pressing challenges blockchains face, including scalability, mining centralization, utility, and wastefulness. 21

CHAPTER 3

Blockchain Fundamentals and the Blockchain Network Layer

In this chapter I provide an overview of the operation of Bitcoin to demonstrate the fundamental principals of blockchains, and the necessary background to understand the importance of the blockchain Network layer. I then continue to outline the blockchain stack, and provide a definition for the blockchain Network Layer and the functionalities it includes.

3.1. The Operation of Bitcoin

3.1.1. Bitcoin Users

Bitcoin [75, 10] is the first blockchain system, and the first to gain signif- icant popularity and to become a household name. While some cryptocurrencies predate

Bitcoin [9, 29], none had gained any significant traction outside the cypherpunk com- munity [50]. At its core, Bitcoin is a Peer-to-Peer (P2P) Electronic Cash System which allows its users to hold a balance of , and to transact Bitcoins among themselves.

Said transactions are batched into blocks, which are recorded by all peers, and the re- sulting chain of blocks, i.e., the blockchain, defines a consensus over which transactions took place. While the Bitcoin white paper itself describes the blockchain as a distributed timestamp server, it is oftentimes referred to as a distributed , as its main function is to record financial transactions and balances. 22

To understand how Bitcoin transactions are created and the blockchain maintained, assume a user, Alice, wishes to buy an item from a different user, Bob, and to pay in

Bitcoin. Each Bitcoin user controls a wallet, which is a simple private and public key pair. The public key (or addresses derived from the public key) are used to maintain user balances, while the private key is used to prove ownership over said balances. To pay Bob,

Alice locally creates a new transaction tA→B, which specifies an amount of Bitcoins which is to be passed from her address to Bob’s address, and then signs it using her private key.

Alice then sends tA→B to the Bitcoin nodes she is connected to, to be propagated to the entire Bitcoin P2P network.

It is worth noting that Alice and Bob are Bitcoin users, and therefore in control of their Bitcoin wallets, and that any Bitcoin user may control any number of wallets, each of which controlling any number of public keys and addresses. However, Alice and Bob may or may not be running a Bitcoin node.

3.1.2. Nodes, Miners and Validators

The Bitcoin white paper refers to participants in the P2P network which maintain a local copy of the blockchain and attempt to produce new blocks as “nodes”. However, later work had led to a more fine-grained, yet somewhat ambiguous, terminology to describe the different kinds of blockchain participants. First, a comparison made in the Bitcoin white paper between the creation of a new block and the mining of gold had led to the use of the term mining to describe the creation of new blocks using Proof-of-Work (PoW), and the term miner to describe a node engaged in mining. Second, some [4] consider the term

“node” to also describe participants which maintain a local copy of the blockchain without 23 attempting to produce new blocks. These participants are often referred to as non-mining nodes. Lastly, some blockchains had experimented with the use of Proof-of-Stake (PoS) and other mechanisms to replace Bitcoin’s use of PoW, where the producers of new blocks are referred to as validators [114, 113] or block-producers (BPs)[24]. Throughout this thesis I use the term “miner” to describe participants which attempt to produce new blocks by any means, and the term “non-mining node” to describe participants which do not attempt to produce new blocks. Lastly, I use the general term “nodes” to describe participants of all kinds, including both miners and non-mining nodes.

Every Bitcoin node to receive tA→B would validate that:

(1) The balance of Alice’s addresses specified in tA→B is greater or equal to the

amount spent.

(2) tA→B contains a signature which can only be created with knowledge of Alice’s

private key.

If these two conditions are met, tA→B is deemed valid, and any node to receive it will propagate it to its peers. While all Bitcoin nodes would receive and propagate transac- tions, only miners attempt to aggregate the transactions they receive into a new block, to be appended to the blockchain. It is only after transactions are included in the blockchain that they are considered to have taken place, while transactions which still await to be included are not.

There are two monetary incentives for miners to engage in mining. First, each block contains a unique transaction, called the transaction, which passes some amount of Bitcoins to its miner’s address. Thus, miners do not attempt to produce a new block from the kindness of their hearts. Rather, they compete, as each miner attempts to 24 mine the next block and earn its reward. The blocks’ coinbase transactions are also the source for the supply of Bitcoins, as it creates Bitcoins “out of thin air”, and the source of the Bitcoins spent in each transaction can be traced back to a previous transaction, and eventually to the coinbase transactions which created them. The amount of Bitcoins produced in each block decreases exponentially, i.e., the amount is halved every 210, 000 blocks, limiting the total supply to 21 million Bitcoins.

The block reward provides an incentive for miners to attempt to produce a new block, however, it does not provide them an incentive to include transactions in the blocks they produce. To address this matter, the Bitcoin protocol includes a second monetary incentive – the mining fee. Each Bitcoin transaction may carry a fee to whichever miner who successfully include it in a block, incentivizing miners to include the transaction in the block they attempt to produce. Since the Bitcoin protocol limits the size of each block to 1 MB, it is often the case that users are required to include significant mining fees to incentivize miners to choose their transactions over others.

3.1.3. Block Mining

A Bitcoin block consists of two parts: an 80 bytes header, and a list of the transactions included in the block. To mine a new block, a miner hashes all the transactions to be included in the block, using a double SHA-256 hashing function to create a Merkle Tree [4].

The Merkle root is then included in the header, as well as a timestamp, the hashing the previous block, and an arbitrary binary value, known as a nonce. For a new block to be successfully mined, the hashing of these fields must yield a value which is very small.

Thus, miners exhaustively try different nonce values, in an attempt to find one which 25 produces a small enough value. The exact target value is also included in the header and is also included in the hashing process, and its value changes over time in an attempt to maintain an average of one block every 10 minutes, based on blocks’ timestamps. In addition to (i) the Merkle root, (ii) the time stamp, (iii) the hash of the previous block,

(iv) the nonce, and (v) the target, the header also includes (vi) a version field, and (vii) a transactions count field.

Once a new block is found, it is propagated to the entire Bitcoin network, similarly to transactions, i.e., it is validated by each node prior to its propagation. It is safe for the successful miner to propagate its block, including the nonce, and no adversary would be able to steal its block reward by altering the coinbase transaction, since any change would alter the result of the hashing, thus almost certainly invalidating the altered block.

3.1.4. Blockchain Security

The most critical aspect of the blockchain security is that the hashing of each block includes the value yielded from hashing the block preceding it. The immediate result of this inclusion is that any attacker attempting to alter the history of transactions, i.e., the inclusion, exclusion, or alteration of a transaction in some previous block, will change the value its hashing yields, which in turn will affect the hashing value of all consecutive blocks, and will almost certainly invalidate each and every one of them. This critical aspect, combined with the fact that each node maintains its own copy of the blockchain, and will only accept the longest1 blockchain when presented with multiple copies, has significant ramifications on the resources required to successfully launch such an attack.

1To be precise, the criteria is the amount of hash power a blockchain required to produce, rather then the number of blocks it contains 26

For the attacker to succeed in altering the blockchain, it must sequentially find a new nonce for every block it had invalidated, which it must produce at a higher rate than the rate at which all other miners extend the original unaltered blockchain. Thus, for such an attack to succeed, and for the other Bitcoin nodes to accept the altered copy of the blockchain, the attacker must control the majority of hashing power in the Bitcoin system [36]. While it had been shown that entities controlling less than 50% of the hashing power can gain an unfair advantage [36], and can eventually eliminate smaller miners, and gain majority of hashing power, it is this unique primitive which differentiate

Bitcoin and blockchain systems from previous decentralized systems.

It is worth noting that even if such an attack, commonly known as a “51% attack”, is successfully launched, the attacker remains incapable of adjusting user balances arbi- trarily, as any transfer of funds requires a transaction signed by their owner. The main concern related to the 51% attack is for an attacker to make a large payment, receive goods in return, and then revert said payment. Unlike the traditional banking system, it would be impossible to seize or freeze the attacker funds in order to compensate the merchant.

3.1.5. Forks

A unique feature of Bitcoin, and of blockchain systems in general, is their inherent ability to overcome inconsistent views of the transaction history in a distributed manner, by defining the blockchain which required the most computation to produce as the “true” blockchain. To demonstrate this ability, consider two Bitcoin miners which happen to successfully mine a new block at approximately the same time. The two blocks differ 27 from each other as they will contain different coinbase transactions, use different nonces, and it is very likely for them to contain slightly different transactions. Once the two blocks are mined, they are propagated in parallel to the entire Bitcoin network, resulting in some portion of the network considering one history of transactions to take place, while others consider a slightly different version of transaction history to take place. Such a situation, where two or more equally valid blockchain versions coexist is called a . While a fork remains unresolved, there exists some ambiguity regarding which transactions had taken place.

Forks are resolved once a new block is mined, as it causes one prong of the fork to become longer than the other prongs, which in turn incentivizes miners to abandon the shorter prongs and attempt to mine over the longest blockchain, as any rewards gained on shorter prongs are likely to be orphaned, i.e., discarded. Thus the system converges to the longest blockchain due to the selfish interests of the miners. It is possible, yet rare, for blocks to be mined over two prongs of a fork at approximately the same time, which keeps the fork unresolved, until eventually one becomes longer than the other.

Due to the possibility of forks, it is possible for a transaction to be included in a block, yet not to be included in the blockchain if said block is orphaned. However, the probability for a block to be orphaned exponentially decreases as more blocks are mined on top of it, in direct relation to the probability for two blocks to be mined on two prongs of a fork at approximately the same time. Thus, transactions are considered more secure as additional blocks are mined on top of the blocks containing them, commonly referred to as the number of confirmations a transaction has. The two terms, fork and orphan, are often used interchangeably, despite that the latter is the result of the former. 28

3.1.6. Mining Pools

Bitcoin’s PoW is statistical in nature, as each miner tries arbitrary nonce values to produce pseudo-random results, in the hope of producing a valid block. This Poisson process produces new blocks such that the time between blocks follows an exponential distribution with a mean of 10 minutes, causing significant variation in miners earnings [88]. Indeed, it is likely for a miner to mine for months and years before successfully mining a new block. While the expected reward for all miners, large and small, is proportional to their share of the hash rate, the relative variance of a miner reward exacerbates the smaller the portion of hash rate the miner controls. Simply put, smaller miners receive proportionally smaller rewards, yet experience disproportional higher relative variance.

Since mining operations incur significant costs, consisting mostly of hardware and electricity costs, miners had found it difficult to operate without incurring debts due to the high variance in mining rewards. To overcome this issue, a new type of entities had emerged, the mining pools. Mining pools coordinate the mining of numerous miners, collect their block rewards, and distribute them among the pool members based on the amount of hash power each miner had contributed. To achieve fair reward distribution, i.e., each miner is compensated based on the hash power it contributes, and no successful miner can keep the reward to itself, mining pools take advantage of Bitcoin’s PoW.

The main principle in the operation of mining pools is that block rewards are sent directly to the pool’s address, and pool members are compensated based on proofs of such blocks they attempted to mine. One common practice for pools is to put an easier target, e.g., 10% of the difficulty of the actual target, and distribute each block reward among the miners who present it with proofs of mining a block which (i) its hashing is smaller than 29 said target, and (ii) pass its reward to the pools address. These proofs, often referred to as shares, allow the pool administrator to divide the reward among the miners based on the hash power they have invested, since on average their proportional hash rate will translate to the number of shares each mines. The use of shares also prevents miners from sending their rewards to their own address, since a miner can either mine blocks which pass their rewards to the pool’s address, and thus produce shares, or mine blocks which pass their rewards to the miner’s address, which would not produce shares.

In an attempt to increase efficiency and reduce complexity, a protocol for pools named

Stratum [87] was developed. Stratum allows pool administrators to construct a new block using a local Bitcoin node, and to delegate to miners only the hashing operation, based on parameters provided by the pool administrator. Thus, miners no longer run their own

Bitcoin node in majority of cases, and instead run a Stratum process, which receives its instructions from their administrator, who runs both a Bitcoin node and a

Stratum manager.

The emergence of mining pools had allowed miners to receive their proportional share of the blocks reward, minus the fees pool administrators collect, while considerably de- creasing their reward variance. However, their prevalence has significant ramifications on the security of Bitcoin, since Bitcoin’s security model depends on attackers’ inability to control majority of the hashing power in the system. The coalescing of miners into mining pools had concentrated the hashing power at the hands of very few actors, who might collude or coerced to launch a 51% attack, to ignore the product of other miners, and even to ban specific wallets by rejecting any transaction which attempts to transact with them. It is worth noting that it is not strictly required for miners to delegate all decisions 30 regarding the construction of a new block to the mining pool administrator. The main value proposition pools offer miners it the reduced variance in their rewards, however it is technically simpler for a miner to delegate decisions regarding block construction to the pool administrator, and receive instructions on the hashing it should perform, than to share the block reward (and reduce its variance) while maintaining control over the construction of a block.

3.1.7. Alternative Blockchain Designs

While the above describes Bitcoin in particular, and alternative blockchain designs are continuously being explored, the underlying principles are common to all blockchain sys- tems. While the other blockchain designs might consider different interaction types among users, a different sybil-resistant mechanism to define the eligibility to extend the blockchain, and even allow blocks to be appended to create a Directed Acyclic

Graph [93, 92] rather than a chain of blocks, they are all derived from Bitcoin’s gen- eral design. 31

Figure 3.1. The blockchain stack. The Network Layer (Layer-0) sup- ports the operation of the Consensus Layer (Layer-1), which encapsulates blockchain functionalities. The Intermittent Consensus Layer (Layer-2) ex- tends these functionalities outside the consensus, while the Inference Layer (Layer-3) adds new functionalities outside the consensus.

3.2. The Blockchain Stack and the Network Layer

Blockchain systems are P2P systems designed to maintain a consensus over the inter- actions that took place among their users or participants, and maintain said consensus in a distributed manner. While blockchain protocols differ in the interactions they record, in the methods they employ to maintain a consensus, and in the messages their nodes exchange, they share a common architecture, depicted in 3.1. Below I provide an outline of the blockchain stack, which consists of 4 main layers:

(0) Network Layer

(1) Consensus Layer

(2) Intermittent Consensus Layer

(3) Inference Layer

3.2.1. Layer-1: Consensus Layer

The fashion in which blockchain nodes maintain their consensus is encapsulated in the

Consensus Layer, often referred to as Layer-1. A significant amount of research had 32 focused on studying and exploring Layer-1 [93, 92, 5, 4, 60, 46, 21, 8, 59, 49], as it consists of the functionalities and core principles which differentiate blockchains from other distributed systems. Theses functionalities include the definition of the messages nodes exchange, the data each node stores locally, and the state-machine logic each node follows to maintain the consensus.

3.2.2. Layer-2: Intermittent Consensus Layer

The Intermittent Consensus Layer (Layer-2) extends the functionality of the Consensus

Layer beyond the system’s consensus, by avoiding the continuous recording of interactions in the blockchain. These interactions, often referred to as “off-chain transactions”, are recorded in the blockchain only intermittently, often only once in the setup phase and once they are concluded. Thus, the Intermittent Consensus Layer provides blockchain users or participants with the same type of interactions as the Consensus Layer, without utilizing the blockchain to record them.

One example for interactions at the Intermittent Consensus Layer is the use of state channels [83, 21, 40] to maintain a frequently-changing balance between two parties.

Assuming two companies which transact frequently, with payments made in both direc- tions, said companies are not required to make frequent transactions using the blockchain.

Instead, these companies can each transfer a balance to a unique joint account, a state channel, which is recorded in the blockchain, such that each party is entitled to receive its original balance once their joint account is closed. Afterwards, the parties can privately send money to each other by using their cryptographic keys to update the state of their joint account, without revealing the updated state to the rest of the P2P network, and 33 without recording these updates in the blockchain. The balance of the state channel thus changes over time, based on the transactions between the companies. When any of the parties wishes to close the state channel, it can submit the most updated state to the rest of the network to be recorded in the blockchain, closing the state channel and passing the funds back to the companies based on their most updated balance. Note that state channels enable the same type of interactions as the Consensus Layer, without recording them in the blockchain, however it also introduces new complexities. For example, safety measures are required to prevent a dishonest party from closing of a state channel using an earlier version of the state channel, thus omitting recent payments.

3.2.3. Layer-3: Inference Layer

The Inference Layer (Layer-3) provides new types of interaction and new functionalities, utilizing the consensus achieved in Layer-1 and possibly in Layer-2. While these inter- actions and functionalities utilize the blockchain’s consensus, they are not part of the consensus themselves. For example, a Twitter-like [61] micro-blogging system can encode each message in a blockchain transaction, thus making it very difficult for any entity to delete or censor content. It is also possible for such a system to implement editing and deletion functionalities, using an append-only protocol to define how transactions should be inferred [48]. However, only the blockchain transactions themselves and the data they contain are governed by the Consensus Layer (and possibly the Intermittent Consensus

Layer), while the protocol used to infer the data stored in the blockchain transactions is not governed by the blockchain’s consensus [23]. The inference layer may even support 34 distributed systems which maintain a consensus of their own [82, 68], yet the consen- sus of such systems is not inherently maintained by the blockchain’s consensus, which is oblivious and agnostic to their rules.

3.2.4. Layer-0: Network Layer

While the Consensus layer (Layer-1) defines the operation of each node individually, the interactions among nodes, and the messages they exchange, the Network Layer (Layer-

0) encapsulates the network functionalities and the measures used for the exchange of messages among nodes, required by Layer-1, utilizing different technologies across the network stack. These functionalities include node discovery, data routing, the encoding and transmission of data, and error correction, as well as measurements of the network performance. When compared with the OSI model [115], the Network Layer (Layer-0) encapsulates the functionalities of the 4 bottom layers (Physical, Data Link, Network, and Transport layers), while the functionalities of the top 3 layers (Session, Presentation, and Application layers) are provided by the blockchain Consensus Layer (Layer-1), the

Intermittent Consensus Layer (Layer-2), and the Inference Layer (Layer-3).

The main functionality provided by the blockchain Network Layer is the unicast ex- change of messages among nodes. However, it also includes additional functionalities, such as multicast and broadcast dissemination of information, the caching of data to improve performance or functionality, and providing meta-data regarding disseminated information. Other functionalities include succinct aggregation of messages, providing statistics and/or measurements of the disseminated data, including the performance of the Network Layer itself. 35

CHAPTER 4

bloXroute: the First Blockchain Distribution Network (BDN)

In this chapter I analyze the blockchain “Scalability Problem”, arguably the greatest challenge blockchains face today, using the Bitcoin network as an example, and show how the bottleneck at the Network Layer is the root-cause for the Scalability Problem. I then continue to describe a new networking primitive, the Blockchain Distribution Net- work (BDN), which resolves this bottleneck, and evaluate the performance bloXroute, the

first BDN. Lastly, I explore how bloXroute alleviates the incentives for miners to coa- lesce into mining pools, and relieve the centralization pressure which negatively affects blockchain systems’ centralization and security.

4.1. Analysis of the Blockchain Scalability Problem

In Bitcoin, as in other cryptocurrencies and blockchain systems, system throughput is measured by the number of transactions per second (TPS) it supports. Since the

Bitcoin network produces one 1 MB block once every 10 minutes on average, and since the average Bitcoin transaction size is 544 bytes [12], it should append to the blockchain roughly 1927 transactions every 10 minutes on average, or 3.21 TPS. In reality, due to some inefficiencies, the average block contains only 1764 transactions, which translates to 2.94 TPS. In comparison, major payment processors such as MasterCard, Visa, and

PayPal, handle an average of roughly 5, 000 TPS (combined), and online shopping giants such as Alibaba and Amazon may require over 325, 000 TPS at peak demand time [109]. 36

Moreover, cryptocurrencies aim to enable machine-to-machine low fee transactions of very small sums (micropayments), which are expected to require a continuous throughput which is orders of magnitude higher than Visa, MasterCard, and PayPal’s [97].

For any given transaction size, a blockchain system throughput directly depends on 2 parameters: the block size (B), i.e., the number of bytes which can contain transactions in each block, and the inter-block time interval (tB), i.e., the average time required for the system to mine a new block. As noted above, in Bitcoin B = 1 MB and tB = 600 seconds, which allows it to process roughly 3 TPS. To improve Bitcoin’s throughput, it is possible to increase B to include more transactions, and to reduce tB, so that blocks are mined at a higher rate. However, as I show below, increasing these parameters to increase throughput has significant ramifications.

4.1.1. Scalability Constraints

In Bitcoin, it has been shown that a single consumer-grade processor can support thou- sands of TPS, while disk I/O can support hundreds of thousands of TPS [28]. In contrast, the capacity of the P2P propagation model is orders of magnitude more restricted, and is insufficient for wide real-world adoption.

Consider the state of the Bitcoin network in the year 2017. The network consists of approximately 9, 000 nodes (N = 9000) [17], the majority of which are connected to 8–12 of their peers [73], with a median latency between peers of approximately 110

th milliseconds [95]. At the 50 percentile, nodes upload rate is 56 Mbps (bw50th = 56 Mbps), while the 10th and 1st percentiles have a rate of 3.96 Mbps and 438 Kbps, respectively.

Thus, the upload rate at the 50th, 10th and 1st percentile supports 13, 000, 943 and 100 37

TPS, respectively. It is worth noting that global bandwidth measurements [94] show that download rates exceed upload rates by a factor of 1.85–5.81, with the exception of less-constrained regions, where the average bandwidth exceeds 100 Mbps.

It is evident that the bandwidth of individual Bitcoin nodes supports increasing the system throughput by orders of magnitude, and the latency among peers is not abnormally high as to pose a barrier. However, while individual nodes can easily support higher TPS, it is the distributed propagation which Bitcoin employs, also used in other blockchain systems, which significantly limits the system throughput.

Traditional P2P systems, such as BitTorrent [86], have been shown to quickly prop- agate data among peers. However, Bitcoin and other blockchain systems differ from such traditional P2P networks by requiring the continuous delivery of all blocks to all peers. They further differ by the different incentives they provide to their participants.

Specifically, distributed denial of service (DDoS) attacks are prevalent in blockchain sys- tems [53, 101], and are used to gain advantages in mining, voting, and other business- and protocol-related activities. To prevent malicious nodes from flooding the network with invalid blocks, nodes use a store-and-forward propagation model, where each node downloads the full block and verifies it prior to propagating it to its peers. This model allows nodes to identify any node which propagates invalid blocks as malicious, and limits the effect of such attacks to the nodes which are directly attacked.

4.1.2. Block Propagation Time Analysis

In order for Bitcoin to function as a decentralized system, it must allow nodes to receive blocks at a higher rate than the rate at which blocks are being produced. Indeed, if blocks 38 are produced at a higher rate than a node is capable of receiving them, then said node cannot keep track of balances stored in the blockchain, cannot determine whether or not transactions and blocks are valid, and is in effect excluded from the Bitcoin network. The block propagation time to the majority of the network (t50th ) does not depend solely on a receiving node bandwidth, rather, it depends on network topology, the bandwidth of all nodes, and the manner in which blocks propagate.

The time required for a block to propagate through the system, and how it is affected by the block size (B), can be roughly approximated based on the number of nodes (N) and their median bandwidth (bw50th ). For the median Bitcoin node, the time required to transmit a single 1 MB block to a single peer (tpeer) is roughly

B 1MB tpeer = = = 0.143sec bw50th 56Mbps

Assuming 8 peers, the average Bitcoin node will require approximately 8tpeer to propagate a block to its peers, regardless whether if done sequentially or in parallel. However, sequential propagation allows the node’s first peer to propagate the received block after tpeer had passed, while parallel propagation will only allow peers to propagate the block after 8tpeer have passed. Thus, to hasten block propagation when bandwidth is limited, nodes would ideally propagate blocks to their peers sequentially, rather than in parallel.

Using sequential propagation, a newly-mined block is known only to a single node, i.e., its miner, at time t = 0, to two nodes, the miner and its first peer, at time t = tpeer, to 4 nodes at time t = 2tpeer, and to the majority of the network at time

t50th = blog2(N)ctpeer = 13tpeer = 1.86sec 39

While this approximation does not account for network congestion, download bandwidth, messages exchange overhead, latency, packet loss, processing delay, the arbitrary topology of the nodes, and bandwidth consumed for transactions propagation, it does provide insights regarding block propagation time (t50th ).

Note that while the network size (N) effect over the block propagation time (t50th ) is logarithmic, the block size’s (B) effect is linear. For example, increasing the system’s

TPS by a factor of 10 by increasing the block size to B = 10 MB would increase the time required for the median node to transmit a block to a single peer by the same factor, to

10 MB tpeer = 56 Mbps = 1.43 seconds. This, in turn, would increase the block propagation time to the majority of the network by the same factor to t50th = 13tpeer = 18.6 seconds. The linear effect of block size (B) on block propagation time (t50th ) was also empirically found in previous studies [28, 31].

4.1.3. Forks, Security and Usability

The most crucial effect of the block propagation time is on the possibility for transactions to be undone, i.e., removed from the blockchain. A transaction can be undone if a fork occurs and the block which contains it is orphaned. Broadly speaking, a fork occurs when a miner mines a new block on top of a previous block, rather then on top of the most recent block. Since the blockchain incentive system incentivizes mining on top of the most recent block, forks occur because miners have not yet received the most recent block. The block propagation time, i.e., the time required for a new block to propagate throughout the system, therefore defines the window of opportunity in which forks may occur. The longer the propagation time, the higher the probability for a fork to occur, which can be 40 approximated based on the exponential distribution of mining using [69]:

−block propagation time P (fork) = 1 − e inter−block time interval

Consider Bitcoin’s mining, which follows the exponential distribution with a mean of 600 seconds (tB = 600), mining a new block every 10 minutes on average. Further consider the time required for a block to reach 90% of the network (t90th ) to be the block propagation time. The probability for a fork to occur therefore approximates:

−t 90th P (fork|tB = 600) = 1 − e 600

Based on the above, the probability for a fork to occur is P (fork) = 1.915% for a propagation time of t90th = 11.6 seconds, which was the average propagation time observed in March, 2017 [16]. Due to the non-negligible probability for a fork to occur, it is considered best practice to wait for several blocks, e.g., 6, to be mined on top of a transaction before deeming it secure, and wait longer times for larger transactions. For such a transaction to be undone, a fork must not be resolved for 6 consecutive blocks, which has a probability of: P (6 blocks fork) = P (fork)6 ≈ 10−10.

An attempt to increase the block size (B) by a factor of 10, which would increase system capacity to ∼ 30 TPS, would increase the propagation time to approximately t90th = 116 seconds. This in turn would increase the probability for a fork to occur to P (fork) = 17.58%, which is unacceptable for real-world usability, and increase the probability for a 6-block fork to P (6 blocks fork) = P (fork)6 ≈ 10−5.

If the the block size (B) were to be increased by a factor of 100, which remains orders of magnitude smaller than current requirements for a global-scale payment processor, the 41 block propagation is expected to surpass the average time between blocks, causing forks to occur for the majority of blocks. In such a scenario, forks are no longer eventually resolved to a single blockchain. Instead, forks lead to forks-of-forks, and to forks-of-forks- of-forks, causing the blockchain to unravel to more and more forks, preventing nodes from knowing which fork to follow, and breaking the blockchain’s consensus over which transactions took place. It is this phenomena which prevents blockchains from scaling, and as outlined above, it is the Network Layer task of timely block propagation which is the root-cause for the Scalability Problem.

4.1.4. Decentralization

The block propagation time, which is directly affected by the block size, also affects the ability of nodes to participate in the Bitcoin network, as nodes must be capable of receiving blocks at a higher rate than they are produced. Failing to achieve this, nodes cannot track the balances stored in the blockchain, and thus cannot determine the validity of transactions and blocks, and are in effect excluded from the Bitcoin Network. To allow

90% of nodes to remain in the network, the propagation time to 90% of the network must be smaller than the inter-block time interval:

t90th < tB

4.1.5. Block Size and Inter-Block Time Interval

The positive and negative effects of increasing block size (B), i.e., increased system throughput, reduced blockchain security, and node exclusion, are symmetric to the ef- fects of reducing the average time between blocks (tB) by the same factor. For example, 42 doubling the block size (B) would increase the system throughout by a factor of 2, and the same is achieved by halving the inter-block time interval (tB). Similarly, doubling the block size will approximately double the block propagation time (t90th ), which in turn will double the probability of forks, while halving the inter-block time interval (tB) will have the same effect. Lastly, to include 90% of the nodes in the network, the block propagation

th time at the 90 percentile must be smaller than the inter-block time interval (t90th < tB).

Doubling the block size (B) would roughly double the block propagation time (t90th ), which will have the same node exclusion effect as halving tB.

4.2. Related Work

While the Scalability Problem remains unresolved, it had been addressed by existing work, as was the matter of block propagation time.

4.2.1. Centralized Propagation Systems

While bloXroute is the first propagation system that addresses the blockchain scaling problem without a centralized trusted intermediary, block propagation systems do exist in Bitcoin. In particular, in order to minimize block propagation times, and to put smaller miners on equal terms with larger mining farms, centralized Bitcoin relay networks were deployed.

Bitcoin Fast Relay Network / FIBRE. The first relay network to be deployed, the Bitcoin Fast Relay Network (BFRN) relayed blocks using multiple gateways around the globe to reduce block propagation time for miners. BFRN focused on utilizing low- latency connections and block compression to reduce the block propagation time. BFRN 43 was later replaced by FIBRE, which uses a similar architecture while utilizing fiber-optic wires and forward error correction (FEC) to further reduce latency and packet error rate, aiming to minimize the number of RTTs required to propagate a block.

The Falcon Network. Falcon was deployed after BFRN deployment and prior to

FIBRE’s deployment, and aims to reduce block propagation time by using cut-through routing [55], where relay-nodes relay the first bytes of an inbound block as soon as they arrive rather than wait for the entire block to arrive.

Both Falcon and FIBRE have greatly reduced block propagation times and block orphan rates in the Bitcoin network, to the level at which it is no longer possible to measure the real-world performance of Bitcoin’s P2P propagation. However, it is important to note that neither was designed, nor is suitable, to scale Bitcoin. Bitcoin cannot rely on these relay networks to achieve higher throughput, since the use of a relay network to scale places the control over which transactions are included in the blockchain, and which miners may participate, in the hands of its operator. For example, the relay network operator may choose (or coerced) to propagate blocks only from one group of miners, and reject all others, or to propagate blocks only to one group of miners, and not to others. Worse still, it might reject all blocks which contain transactions involving a specific address [100], effectively banning its owner from using it. As was established previously, the P2P propagation is insufficient to propagate blocks at scale, causing any blocks which are rejected by the relay network to be orphaned by the blocks which the relay network does propagate, with a high likelihood. Thus, the use of a centralized relay network to scale a blockchain significantly alters the blockchain security model, as a party which does not control the majority of hash power is granted control over the 44 transactions recorded in the blockchain. bloXroute, the first BDN, addresses this issue by being inherently ignorant of blocks’ content, origins, and their receivers, and by making itself auditable and replaceable by the global Peer Network it serves.

4.2.2. Off-Chain Scaling Solutions

An alternative approach, using off-chain transactions, aim to reduce some of the load from the blockchain, by recording transactions only intermittently. Generally speaking, an off-chain scaling solution will open up a payment channel between two parties, have the parties exchange funds while keeping track of intermediate balances, and then post a settlement transaction on the blockchain. These solutions include the Lightning Net- work [21], TeeChan [66], and more.

These solutions are promising, however they also introduce significant complexities, and had yet to achieve production maturity despite the years of development invested in them by numerous teams. It is important to note that theses off-chain solutions are complementary to bloXroute’s value proposition. If the underlying blockchain can support

1, 000 times the number of transactions as before thanks to bloXroute, and if off-chain transactions increase the throughput by another factor of 1, 000, then that blockchain’s throughput has increased by 6 orders of magnitude.

4.2.3. On-Chain Scaling Solutions

On-chain scaling solutions usually involve modifying the consensus protocol in some way to achieve higher throughput. One such approach, known as “sharding”, splits the blockchain into several smaller “shards”, which are maintained and interleaved in a fashion that aims 45 to keep blockchain’s original security properties while only requiring a full node to track one shard instead of the full blockchain.

Other approaches, such as Bitcoin-NG [35], suggest to replace blocks by a stream of transactions, or forgo them altogether, while still other systems aim for nodes to place trust in specific nodes, and to assure their honest behavior through the ability to re- place them. There are also newer consensus protocols based on Proof-of-Stake, such as

Casper [5] or Algorand [42]. I point interested readers to a survey of consensus pro- tocols for tolerating Byzantine faults, that are used in the state-of-the art blockchain systems [22].

While the above approaches show potential, their robustness, security, usability, and adoption rates in practice remains to be seen. However, all on-chain scaling solutions will perform strictly better with a faster Network Layer, which is where bloXroute improves their performance. Indeed, in every distributed consensus protocol, every honest node must reach the same decision. Thus, regardless of the consensus protocol, every honest peer must obtain information about each transaction in the system. bloXroute focuses on this particular problem, which is fundamentally a broadcast problem, since every valid piece of information (transaction/block) must be propagated to every honest peer in the system. bloXroute is thus complementary to a native consensus protocol used, and it is capable of boosting up the performance, often dramatically, for any blockchain. 46

4.3. Blockchain Distribution Network (BDN): a New Networking Primitive

The Blockchain Distribution Network (BDN) is a new networking primitive, whose goal is to enable cryptocurrencies and blockchain systems to scale to thousands of on- chain transactions per second. Unlike other networking primitives, a BDN is a centrally controlled relay network designed to improve the performance and functionality of a P2P system, without jeopardizing its unique distributed attributes. though centrally controlled

CDNs [96] had experimented with hybrid systems, which offloads some of their work to

P2P components [112, 110], to the best of my knowledge the BDN is the first primitive to do the opposite – a centralized system designed to serve a P2P system.

Below I present the design, the trust model, and the components of bloXroute, the

first BDN, which aims to provide scalability to numerous cryptocurrencies and blockchains simultaneously, utilizing a global infrastructure to support distributed blockchain systems in a provably neutral fashion.

4.3.1. Trust Model bloXroute’s trust model is based on two observations. First, I observe that long block propagation times will not allow trustless P2P blockchains, e.g., Bitcoin, to scale to thousands of on-chain transactions per second. Second, I observe that small centralized systems scale very well by placing trust in a small subset of participants, and passing them the control over the transactions included in the blockchains, e.g., Ripple [7], EOS [8],

BitShares [71], Steem [56]. However, such centralization defeats the single most notable aspect of blockchains and cryptocurrencies: the distribution and decentralization of con- trol over transactions. Providing control over a blockchain’s transactions to a limited 47

bloX route Server Peer N ode

Figure 4.1. The components of the bloXroute system: the bloXroute BDN, and the Peer Network nodes utilizing it. Each Peer Network node runs a Gateway process as an intermediary between its blockchain application and the bloXroute BDN.

number of participants allows said participants to collude, censor, and discriminate be- tween users, nodes, and miners. A limited participant set also reduces the number of nodes a malicious actor has to compromise to control the system.

bloXroute gets around this tradeoff by reversing the direction of trust in centralized systems. While centralized systems place trust in a subset of nodes to enable scalability, bloXroute enables scalability by using a small set of servers which place trust in the entire network. The system utilizes a Blockchain Distribution Network (BDN) to enable scaling, yet nodes need not place any trust in the BDN. Instead, the BDN blindly serves the nodes, without knowledge of the blocks it propagates, their origin, or their destination.

Moreover, its behavior is constantly audited by the nodes it serves, and it is incapable of discriminating against individual nodes, blocks, and transactions. While such a design places the BDN at a disadvantage compared to the nodes it serves, its robustness allows it to withstand dishonest and malicious behaviors. 48

4.3.2. System Components

The bloXroute system consists of two types of operational networks, as shown in Fig- ure 4.1:

• bloXroute is a high-capacity, low-latency, global BDN network, optimized to

quickly propagate transactions and blocks for multiple blockchain systems simul-

taneously.

• Peer Networks are P2P networks of nodes which utilize bloXroute to propagate

transactions and blocks, while carefully auditing its behavior. Each Peer Network

consists of all the nodes using a specific protocol, each running also a bloXroute

Gateway. For example, all the Bitcoin nodes utilizing bloXroute form a single

Peer Network, and each runs a Bitcoin node and a bloXroute Gateway. At the

same time, all Ethereum nodes utilizing bloXroute form a different Peer Network.

bloXroute propagates blocks on behalf of the Peer Networks’ nodes. However, contrary to relay networks, bloXroute propagates blocks without knowledge of the transactions they contain, their number, their sums, the wallets or addresses involved, the miner to produce each block, nor the original node propagating the block. Furthermore, bloXroute cannot infer the above characteristics even when colluding with other nodes of the Peer

Network, or by analyzing blocks’ timing and size. bloXroute cannot favor specific nodes by providing them blocks ahead of others, and cannot prevent any node from joining the system and utilizing it. bloXroute can only propagate all blocks to all its Gateways fairly.

The bloXroute system as a whole is protocol-agnostic, providing its scaling services to numerous cryptocurrencies and blockchains simultaneously. 49

4.3.3. bloXroute Scalability Improvement

To better understand bloXroute’s operation, first consider the naive block propagation depicted in Figure 4.2. Note that each node in the Peer Network runs a bloXroute

Gateway process locally, in addition to its blockchain node. For node psource to propagate a new block b to the rest of the Peer Network, psource’s blockchain node locally passes the block to its Gateway, which transmits the block to the closest server of bloXroute’s

BDN. As the server begins to receive the block’s first bytes, its starts transmitting, i.e., streaming, the block to the other bloXroute servers, which in turn begin to stream block b to the Gateways which they are connected to. Soon after psource’s Gateway completes the transmission of b to its closest bloXroute server, all Gateways are expected to complete the download of block b. Once completed, each Gateway passes block b to the blockchain node it serves.

4.3.3.1. Broadcast Infrastructure. By combining a broadcast primitive with the use of cut-through routing [55], i.e., by streaming blocks as they arrive to all Gateways simul- taneously, bloXroute is capable of propagating blocks significantly faster than blockchains’

P2P block propagation, similarly to relay networks. Since bloXroute is a broadcast prim- itive, blocks don’t require 10–20 hops to reach the entire network, each incurring addi- tional processing time. The superior networking infrastructure bloXroute employs also significantly improves propagation speed, and the expected speedup allows bloXroute to propagate a 10–100 MB block at a similar speed to a 1 MB propagating over the P2P network.

4.3.3.2. Block Compression. In the common case, transactions are first propagated throughout a blockchain P2P network, before being included in a block by a miner. 50

Thus, the propagation of each transactions twice, once by itself and once inside a block, is wasteful for a system which aims to minimize block propagation time. bloXroute takes advantage of the prevalence of its Gateways to continuously monitor the appearance of new transactions. Once a Gateway learns of a new transaction from the blockchain node it serves, it sends the transaction to the bloXroute server it is connected to, to be prop- agated to the entire Peer Network. While propagating such transactions, bloXroute also internally assigns each with a 4 Bytes short ID, reusing the same 4 Bytes address space for this purpose. bloXroute thus continuously push updates regarding the mapping between transactions and said internal IDs to all its Gateways, and each Gateway maintains a hashtable of this mapping, which is consistent across all Gateways.

Once a Gateway receive a block from the blockchain node it serves, it first replace each transaction with its internal ID, before transmitting it to the closest bloXroute server. The replacement of each transaction by its ID compacts the size of the block by a factor of ∼ 100, reducing the propagation time even further. Once the compacted block is received by a different Gateway, said Gateway would reconstruct the original block using its hashtable. Once completed, the Gateway would pass the reconstructed block to the blockchain node it serves, which remains oblivious to the fact that the block was compacted during the propagation process.

The combination of the two methods described above, cut-through broadcasting and block compacting, has the potential to allow bloXroute to propagate propagate a 1 GB block at the speed a 1 MB block can propagate over the P2P system. 51

Figure 4.2. Naive block propagation using bloXroute. (1) Blockchain node passes block to local bloXroute Gateway. (2) Gateway transmits block to bloXroute server. (3) bloXroute server streams block to all bloXroute servers as first bytes arrive. (4) bloXroute servers stream block to all Gate- ways as first bytes arrive. (5) Gateways validate block structure, and pass block to blockchain nodes.

4.3.4. bloXroute’s Provable Neutrality

For a BDN to be safely used to scale blockchains, it must not be capable of discriminating among nodes, blocks, wallets, or transactions. bloXroute achieves this by being provably neutral; it can only propagate all blocks to all its Gateways fairly, and it is incapable of discrimination or censorship due to the inherent encryption and relay countermeasures it employs, and to the auditing performed by the Peer Network nodes. I consider poten- tial adversary behaviors, a complete system failure, and the safety measures bloXroute employs, in the next subsection, in Section 4.3.5.

4.3.4.1. Encrypted Blocks and the Gateway P2P Network. To prevent bloXroute from stopping the propagation of any block based on its content, i.e., based on the wal- lets, addresses and sums of a block’s transactions, its timestamp, its coinbase transaction, or any other attribute, blocks are propagated encrypted, obscuring their content from bloXroute. The key principal behind the encryption of blocks is that gateways encrypt 52 their blocks, and only reveal the encryption key after they have learned from other gate- ways that the block was propagated, i.e., after bloXroute had already delivered the en- crypted block. The gateways’ encryption also alters the block size, hiding both the size and the number of transactions it contains.

Figure 4.3 illustrates bloXroute’s encryption scheme. First, psource’s blockchain node passes a new block (b) to its Gateway, similarly to the naive propagation depicted in

Figure 4.2. Then, its Gateway encrypts block b using a newly generated encryption key k, producing an encrypted block b0. Afterwards, the Gateway transmits b0 to its closest bloXroute server, to be propagated to all other Gateways, similarly to the naive propa- gation (solid line). In order for gateways to learn of their block propagation, gateways form their own P2P network, and each gateway notifies its peers of any encrypted blocks its receive. To reduce bandwidth consumption, gateways hash each encrypted block they receive and send only the result of this hashing, rather than the entire encrypted block.

0 Once psource’s gateway learns that its direct peers, p1 and p2, they have received b , it sends them the encryption key k, to be propagated to the entire Gateway P2P network.

4.3.4.2. Partial Disclosure of Peers. A key attribute of the bloXroute system is the ability of Peer Network nodes to audit the behavior of bloXroute. To that end, nodes relay test-blocks to bloXroute, and validate their peers quickly receive them. However, bloXroute and/or colluders might attempt to relay a node’s blocks only to its immediate peers, and not to the entire Peer Network, causing the node to falsely believe its blocks are relayed to the entire network. To prevent such a behavior, Peer Network nodes do not reveal all the nodes they are aware of. Instead, nodes conceal half the nodes they are aware of, including half of their immediate peers. Thus, if an adversary were to analyze 53

Figure 4.3. Encrypted block propagation using bloXroute. (1) Blockchain node passes block to local bloXroute Gateway, which encrypts block. Then, propagation follows path of naive propagation (solid line). (2) Gateways form a P2P network, where each notifies its peer of encrypted blocks re- ceived. p1 and p2 both notify psource of receiving encrypted block. (3) Once psource learns its peers received the encrypted block, it sends encryption key to its peers, to be propagated to all Gateways. (4) Upon receiving the en- cryption key, each gateway decrypts the encrypted block and pass it to the blockchain node it serves.

Figure 4.4. Indirect relay of encrypted block to bloXroute. (1) As in en- crypted block propagation, blockchain node passes block to local bloXroute Gateway, to be encrypted. (2) Gateway transmits encrypted block to a Gateway peer, to be relayed to a bloXroute server. Encrypted block then propagates, followed by its encryption key, as in encrypted block propaga- tion (solid line). a nodes’ known peers, it will be unable to determine which nodes are its immediate peers and which are not. 54

4.3.4.3. Indirect Relay. To prevent bloXroute from rejecting blocks based on their origin, i.e., based on the node attempting to propagate the block, blocks are not sent directly to bloXroute’s servers. Rather, blocks are relayed to bloXroute’s servers via a

Gateway peer, obscuring their origins from bloXroute. Figure 4.4 presents how the block propagation process is further extended to allow peer-relayed encrypted block propagation.

Similarly to the encrypted propagation outlined above, psource’s blockchain node passes a new block (b) to its Gateway, to be encrypted. However, rather than transmitting the encrypted block (b0) directly to a bloXroute server, it is transmitted to a peer Gateway,

0 p1. Said peer Gateway then relays b to its closest bloXroute sever, to be streamed to all

Gateways. psource’s Gateway then continue as per the encrypted propagation, revealing k after it learns from its peers that b0 was propagated.

0 Since both p1 and bloXroute stream b as it is is being uploaded by psource, psource is expected to receive notifications from its other peers regarding the receipt of b0 within tens of milliseconds after the upload completion. Failing to receive such notifications, whether because p1, bloXroute, or both had rejected it, psource would repeat the process using a

0 different Gateway peer, e.g., p2, using a new encryption key (k ).

In addition to indirectly relaying blocks to bloXroute, nodes may request their peers to relay them incoming blocks arriving from bloXroute. This ensures that bloXroute cannot discriminate against nodes by delaying or avoiding the delivery of blocks. Thus nodes are not required to directly interact with bloXroute at all in order to benefit from its service. The short delay (0.5 RTT) such nodes experience overlaps with the time required for nodes to receive the encryption key (k), minimizing or nulling any negative effect. 55

4.3.4.4. Test-Blocks. While bloXroute is oblivious to which node originated each block, it may attempt to block or stall blocks arriving from some subset of nodes, affecting all the blocks they relay. In order to detect and prevent such behavior, nodes must be capable of continuously monitoring bloXroute’s service. Such monitoring is achieved by allowing nodes to send encrypted invalid blocks, test-blocks, directly to bloXroute, and measuring the time required for peers to report the arrival of the test-blocks. bloXroute is unable to employ discriminatory policies over valid blocks alone, and to faithfully propagate test-blocks, since the two are indistinguishable until their keys are published. The use of test blocks allows Gateways to avoid relaying blocks via peer Gateways which are dis- criminated against, negating the effect of discriminatory policies against individual nodes, while discrimination against a significant portion of the Gateways would be detected and handled similarly to a complete system failure.

4.3.5. Adversary Behaviors, System Failure, and Safety Measures

To better understand how the use of peer-relayed encrypted blocks to obscure a block’s content, origin, and destination, and the use of test-blocks to continuously monitor bloXroute’s behavior, prevents bloXroute from employing discriminatory policies, I pro- vide an overview of bloXroute’s potential adversary behaviors. The key principal behind many of bloXroute’s safety measures is the prevention of discrimination on an individual or small scale basis, and the detection of large-scale discrimination, which is treated as a complete system failure.

4.3.5.1. Content-based Discrimination. For bloXroute to slow or reject a block based on its content, i.e, based on the wallets, addresses and sums of a block’s transactions, its 56 timestamp, its coinbase transaction, its size, or any other attribute, it must have access to these attributes prior to making a choice whether or not to propagate the block. Since these attributes are obscured by the encryption Gateways employ, bloXroute remains oblivious to these attributes until the encryption key (k) is revealed. The encryption key (k) is only revealed after the encrypted block (b0) was propagated to all gateways, which themselves are open-source and are run locally by node operators, at which time bloXroute is incapable of “retracting” them. bloXroute also cannot deceive a node to reveal k prematurely, since it is oblivious to which nodes it is connected, and it would statistically need to propagate b0 to, e.g., 90% of the Gateways, to persuade it that the block was indeed propagated to 90%.

The involvement of colluding Gateways, whether operated by a colluding party or directly by bloXroute, does not affect bloXroute inability to identify the blocks or their timing, since blocks are encrypted before being relayed to a peer Gateway, using a different encryption key (k) every time. bloXroute also cannot discriminate blocks based on the time of their mining, since, lacking the ability to distinguish between test-blocks and real blocks, nor to infer their origins, it must reject all encrypted blocks from all Gateways to do so. This discrimination would be identified by the Gateways, and would be treated identically to a complete system failure.

4.3.5.2. Transaction Discrimination. bloXroute maps between transactions and their internal short IDs to allow gateways to compact blocks, as outlined in Section 4.3.3. To prevent bloXroute from rejecting transactions and avoid assigning them with IDs, Gate- ways are allowed not to compact all transactions in a block. Since blocks are encrypted when relayed through bloXroute, miners are free to include any transaction in the block 57 without interference from bloXroute, whether compacted or not. Simply put, bloXroute cannot reject specific transactions, nor can bloXroute avoid relaying blocks which contain specific transactions. Indeed, bloXroute avoidance of assigning an ID to a transaction does not even prevent said transaction from using bloXroute’s fee-reduction service.

4.3.5.3. Individual Node Discrimination. A different form of discrimination is to prevent individual nodes from propagating their blocks, based on nodes IP address, their operators’ identity, node implementation, or any other node attribute. To en- sure bloXroute cannot discriminate in this fashion, Gateways relay their blocks indi- rectly. Rather than transmitting a block directly via bloXroute, nodes propagate blocks to bloXroute through their peers, preventing bloXroute from knowing the origin of the blocks it receives. Similarly, Gateways may request other Gateways to rely them blocks, avoiding any direct interaction with bloXroute servers altogether.

The use of test-blocks, which are indistinguishable from real encrypted blocks, al- lows each Gateway to periodically test the behavior of bloXroute and the Gateways it utilizes. On the upstream direction, a Gateway can periodically send a test-block, and test whether it is received by its peer Gateways. Any discrimination, whether continuous or intermittent, would be identified by the test-blocks. Test-blocks can also be sent to peers, to test the service they (and bloXroute) provide. On the downstream direction, each Gateway can test that it receives all the encrypted blocks it was notified by majority of its Gateway peers. For bloXroute to reject or delay a block from being uploaded, it must blindly discriminate all blocks and test-blocks received from all honest nodes, as it is oblivious of their real origins. For bloXroute to prevent a block from reaching individual nodes in a timely manner, it would need to blindly discriminate against all honest nodes, 58 as each of them might be relaying blocks. Such extreme means would be identified by gateways, and be treated identically to a complete system failure.

The involvement of colluding Gateways, whether operated by a colluding party or di- rectly by bloXroute, does not affect bloXroute inability to discriminate against individual nodes. If an honest Gateway relays blocks via a colluding Gateway, it would identify any discrimination through the use of test blocks, and replace it with an honest peer Gateway.

For colluding nodes to cause an honest Gateway to release its encryption key(k) prema- turely, they must control the vast majority of Gateways. However, the attack vector of partitioning a P2P network thru the control over the vast majority of nodes is agnostic to bloXroute, and applies directly to blockchain P2P propagation. Thus, bloXroute does not negatively affect the blockchain security model.

bloXroute also cannot discriminate in favor or against individual nodes, by propagat- ing the encrypted block (b0) to some Gateways ahead or after others. Gateways which receive the b0 ahead of other would have to wait for the propagation of the encryption key

(k), which would only occur after the block had been propagated to majority of Gateways, which negates any benefit. Gateways which are discriminated against and their propaga- tion is delayed would identify this behavior based on the notifications they receive from their peer Gateways, and would request one of their peer Gateways to relay blocks to them. Since Gateways stream blocks to one another, it will place the discriminated nodes on par with their peers, as the short delay they suffer (0.5 RTT) overlaps with the time required for k to propagate.

4.3.5.4. Large Scale Node Discrimination and Complete System Failure. As outlined above, bloXroute cannot discriminate against blocks, transactions, or individual 59

Gateways without resorting to discriminate against large portions of the network, which is easily detected by the Gateway P2P network. However, it is possible for bloXroute to be shut down, to suffer from a complete system failure, to intentionally stop its service all together, and to provide its service only to colluding nodes. Regardless of the scenario, it is critical for bloXroute to avoid becoming a single-point-of-failure, especially since blockchains aim to be an always-on, uncensorable, decentralized systems.

For bloXroute not to become a single-point-of-failure, it must be replaceable, such that in the case of a complete system failure or malicious behavior, blockchains can continue to operate at scale. To this end, bloXroute open-source its entire codebase, including easy deployment scripts, and introduces the notion of Backup Networks. Backup Networks are idle instances of bloXroute, deployed by other parties in multiple jurisdictions using different cloud providers, which are ready to “hot-swap” bloXroute in the event of a complete system failure. Since Backup Networks are idle, they incur negligible costs to operate, yet they provide an alternative if bloXroute is ever to be compromised or fail.

Backup Networks are meant to be deployed by entities which are economically in- vested in blockchains, such as miners, exchanges, large investors protocol developers, and blockchain companies. For such entities, the low operational costs of running a backup network in the event of a complete system failure is insignificant to the cost of losing the ability to propagate blocks or transact at scale. Unlike blockchain nodes, which are generally anonymous, the operators of Backup Networks are expected and incentivized to publish their networks, such that Gateways could be both manually and automatically configured to use multiple Backup Networks operated by reputable entities in multiple jurisdictions. 60

While the use of multiple Backup Networks is not as cost-efficient as the use of bloXroute for all blockchains, it does protect blockchains from falling back to the slow

P2P propagation mechanism. Gateways would also automatically test bloXroute using test-blocks to measure its performance, and in the event of a short-lived temporary system failure, would revert to using bloXroute once the matter is resolved. In the event of mali- cious behavior, or of a permanent system failure, each blockchain ecosystem, i.e., miners, developers, node operators, etc., could operate using Backup Networks for months and possibly years, as its members decide of a suitable long-term replacement for bloXroute.

4.3.5.5. Sustainability through Peer Auditing. bloXroute provides a provably neu- tral block dissemination service by allowing the Peer Network nodes, via Gateways, to continuously monitor its behavior using test-blocks, and through its willingness to prop- agate un-validated encrypted data. This puts bloXroute at a disadvantage in comparison to its users, and opens the door to resource-wasting malicious behavior, which bloXroute is provisioned to withstand.

bloXroute’s robustness and service incurs costs, namely, the delivery of large traffic volumes to a large number of nodes. For example, assuming a network of 10, 000 full nodes, i.e., similar in size to today’s Bitcoin network, each of which creating four 1 MB test-blocks per day, bloXroute would deliver 100 TB per day, or 9.26 Gbps. At the same time, assuming transactions are created at a rate of 3, 000 TPS, their delivery would require additional 132 Gbps. Note that the bandwidth required to support test-blocks increases exponentially as the number of full nodes increases, while the bandwidth required to support higher TPS does not. The cost of supporting a reliable, low-latency, global infrastructure which immediately delivers these large traffic volumes is non-negligible. 61

To assure bloXroute’s sustainability, transactions may include a minuscule, optional and voluntary payment to bloXroute, which provides greater incentives for miners to include them. The validation of such payments is done by the Gateways; the Gateways validate that as blockchains process ever increasing volumes of transactions per second, and as the cost of supporting them increases, a fraction of the capacity bloXroute creates is dedicated for transactions which contain such payments. Thus, transactions can opt-in to this additional capacity by including a payment to bloXroute, which will reduce the overall fee they must carry. Since this additional capacity has lower demand and excess capacity, miners require a smaller incentive to include a transaction in this excess capacity, i.e., they require a smaller transactions fee. Thus, bloXroute reduces transaction fees to outweighed the payment to bloXroute, creating a “fee-reduction service”.

4.3.5.6. Onboarding Process. A blockchain ecosystem whose members wish to take advantage of bloXroute can do so in the following steps. The first nodes and miners of a specific blockchain who wish to utilize bloXroute are required to do nothing more than simply running bloXroute’s Gateway process in parallel to their blockchain node, and adding the Gateways to the node peer list. For onboarding purposes, bloXroute will run a sufficient number of servers around the world, so that Gateways can propagate blocks and receive transactions faster. As more nodes use bloXroute, blockchains will see considerably fewer forks and stronger security guarantees that result from bloXroute’s superior block distribution. As a second step, blockchains can adjust their protocol to capture more of the capacity increase provided by bloXroute, e.g., increasing the block size and reducing the inter-block time interval. bloXroute requires no further changes to the protocol, and will allow cryptocurrencies to use the network for free as long as 62 they produce no more than 100 transactions per second (TPS). As a reference, Bitcoin supports only 3 TPS. Thus, bloXroute allows to increase capacity by a factor of 33 today, without requiring any fees and any protocol change beyond adjusting the block size and inter-block time interval.

Blockchains require no protocol change beyond adjusting the block size and inter- block time interval to fully utilize bloXroute’s capacity. Once the 100 TPS threshold is reached, there becomes an increasing incentive for users to make the minuscule payments to bloXroute to reduce their fees. This in turn would eventually cause users to demand the implementation of making such payments easily from their wallets and nodes. Note that the protocol itself does not change; the validity requirements remain the same, as is the structure of blocks and transactions, and all the messages among nodes.

4.4. bloXroute Empirical Evaluation

bloXroute had been developed and deployed by bloXroute Labs, and its operational deployment is supporting multiple real-world blockchains simultaneously, Including Bit- coin (BTC), Bitcoin Cash (BCH), and Ethereum (ETH). It is important to note that since bloXroute is a commercialized system, it must address issues which are usually ignored by research prototypes, including the prevention of nodes from dropping their connections, recovery from failures, a diverse and complex deployment to allow for continuous updates, and others. As such, some known performance issues are not yet addressed, despite the availability of suitable solutions. Therefore, the measurements of bloXroute’s performance presented below outline bloXroute’s potential, rather than its possible capacity. 63

500 BTC 400 BCH ETH 300

200

100 Compression Ratio

0 1/1/19 00:00 1/1/19 06:00 1/1/19 12:00 1/1/19 18:00 1/1/19 23:59 Time

Figure 4.5. The ratios by which bloXroute compresses the blocks mined on January 1st, 2019 on the Bitcoin (BTC), Bitcoin Cash (BCH), and Ethereum (ETH).

4.4.1. Real-World Performance

In order to assess bloXroute’s ability to compress the size of the blocks it propagates, as outlined in Section 4.3.3, I measured the ratio by which bloXroute had compressed the blocks of Bitcoin (BTC), Bitcoin Cash (BCH), and Ethereum (ETH), which were mined and propagated on January 1st, 2019, presented in Figure 4.5. During this time, 154 new blocks had been mined on the Bitcoin (BTC) blockchain, 138 new blocks had been mined on the Bitcoin Cash (BCH) blockchain, and 5889 new blocks had been mined on the

Ethereum (ETH) blockchain, and the factors by which these blocks were compressed are shown. Since the Ethereum (ETH) blockchain produces smaller and more frequent blocks than Bitcoin (BTC) and Bitcoin Cash (BCH), and to allow a clearer presentation of the data, I average the compression ratios of Ethereum (ETH) blocks in every 10 minutes interval in the time period presented,

First, consider the the blocks produced on the Bitcoin (BTC) blockchain. bloXroute’s

Gateways compress Bitcoin blocks by a factor of 107.6 on average. At the 10th, 50th, and 64

90th percentiles, bloXroute compress Bitcoin blocks by a factor of 15.5, 77.5, and 139.5, respectively. These results confirm bloXroute’s ability to compress propagated blocks by

1–2 orders of magnitude, allowing it to propagate significantly larger blocks. It is worth pointing out two blocks in particular, which provide additional insights on bloXroute’s performance. The first of the two is the Bitcoin block at height 556, 611, which carries a timestamp of 18:38:03. Block 556, 611 is compressed by a factor of 468, significantly higher than other blocks observed, despite being of similar size to blocks 556, 610 and

556, 612 which immediately precede and succeed it. In fact, block 556, 611 is ∼ 50 KB smaller than blocks 556, 610 and 556, 612. Unlike other blocks observed during this period, block 556, 611 contains a small number of very large transactions, only 309 transactions with an average size of 3.5 KB. For comparison, blocks 556, 610 and 556, 612 contain over 2, 500 transactions each, with an average size of 426 bytes. bloXroute is especially effective for the propagation of blocks containing large transactions, since all transactions are compressed to the same size, enabling the use of blockchains for more complex and sophisticated interactions.

It is also worth considering block 556, 554, which carries a timestamp of 10:40:49, which is compressed by a factor of 1, i.e., it is not compressed at all. A deeper inspection of the blockchain reveals that block 556, 554 is extremely small, only 241 bytes, and contains only the coinbase transaction. Further examination reveals that its predecessor, block 556, 553, carries a timestamp of 10:40:18, presumably mined only 30 second earlier, which is a normal and common phenomena, caused by the probabilistic nature of Bitcoin’s mining. It is thus likely that the mining pool to produce block 556, 554 had learned of the mining of block 556, 553 and the content of its header, which had allowed it to start 65 mining on top of it, yet it did not receive and process the entire block, which prevented it from deciding which transactions were valid and could be safely included in a new block.

To maximize its profits, it began mining an “empty” block until such time it could safely construct a valid block, and had successfully mined block 556, 554 before such time had arrived.

Given that block 556, 554 only contains its coinbase transaction, which is unknown prior to the block’s mining, bloXroute is unable to compress it to hasten its propagation.

However, since its size is roughly half the size of an average transaction, its propagation to the entire network is likely extremely fast. When considering the circumstances that had led to the mining of an “empty” block, it is very likely that the slow propagation of its predecessor, block 556, 553, had caused it, since the processing time it requires is at least an order of magnitude shorter than both the propagation time and the supposed time interval between blocks 556, 553 and 556, 554. That said, it is impossible to know for certain the actual mining time of these 2 blocks.

In addition to Bitcoin (BTC), I measure the factors by which bloXroute had com- pressed Bitcoin Cash’s (BCH) blocks, which had diverged from Bitcoin (BTC) on August of 2017 to allow larger blocks. Despite allowing larger blocks, the Bitcoin Cash (BCH) blockchain sees less usage than Bitcoin (BTC), and its average block contains only 50 transactions, in comparison to Bitcoin’s 1, 500 transactions. The lack of usage provides bloXroute with fewer opportunities to compress Bitcoin Cash’s blocks, and increase the relative overhead of blocks’ headers and coinbase transactions. However, even in this less- than-ideal scenario, bloXroute is able to compress blocks by a factor of 14.3 on average.

The Ethereum (ETH) blockchain, which rapidly produces small blocks, draws a similar 66 picture. Ethereum’s average block size is 14.38 KB, out which over 500 bytes are used for its header. Thus, Ethereum blocks have significant overhead in comparison to Bitcoin’s.

Despite this overhead, bloXroute is able to compress Ethereum (ETH) blocks by a factor of 16 on average. bloXroute is therefore capable of compressing blocks by more than two orders of magnitude in favorable real-world conditions, and by more than an order of magnitude in less-favorable real-world conditions.

4.4.2. High TPS Performance

While bloXroute is designed for real-world usage, and is indeed connected to multiple real-world blockchain networks, the low volume of transactions on the Bitcoin, Bitcoin

Cash, and Ethereum blockchains does not allow to test whether bloXroute truly enables blockchains to process high volumes of TPS. To better understand bloXroute’s perfor- mance, I deploy a network of 30 real Bitcoin nodes, using Bitcoin Unlimited’s [111] implementation of Bitcoin, and adjust their parameters to mine a new block (of any size) every 30 seconds. Said Bitcoin nodes are deployed in 3 AWS datacenters in 3 different regions: North America, Europe, and Asia-Pacific. The deployment utilizes Docker [18] to manage for the automation of resource allocation and peering, and each node runs the

Linux operating system using 4 cores, 16 GB of RAM, and a bandwidth of 100 Mbps, comparable to the resources of a high-end consumer PC [95]. Each Bitcoin node performs all its normal operations, e.g., validation of transactions and blocks utilizes the normal

Bitcoin messages, and incurs all the delays and overhead of a normal Bitcoin node. In addition, each Bitcoin utilizes its own Gateway, which compacts blocks, encrypts them, notifies peer Gateways of encrypted blocks, propagates encryption keys, decrypts blocks, 67 and uses Bitcoin messages to send blocks to the Bitcoin node it serves. New unique trans- actions are generated and sent to all nodes, to resemble organic transaction generation.

This experiment is part of a multi-year effort to enable a Bitcoin network consisting of thousands of nodes to process thousands of TPS, however this goal is yet to be achieved.

I acknowledge the need to deploy a larger network on a more diverse infrastructure to assess bloXroute’s capacity more accurately. That said, the experiment below provides an indication of that the bottleneck for blockchain scalability is indeed at the blockchain

Network Layer, and of bloXroute’s potential to solve it.

Figure 4.6 presents the different steps of the propagation of the average block in bloXroute’s testbed and its time line. The average block contains over 40, 000 transactions, which must be propagated significantly faster than the 30 seconds block time interval rate used in the testbed. As Figure 4.6 shows, the time required for a block to be passed from a

Bitcoin node to its Gateway is negligible. The time required for the Gateway to compact the block is similarly negligible, only 170 microseconds on average, which compacts the block by a factor of 50, since the transactions generated in the testbed are only half the size of the average Bitcoin transaction. The most time consuming procedure Gateways perform is the encryption of blocks, which takes on average 618 ms for the average block.

While the encryption of blocks is vital to bloXroute’s neutrality, it is likely that faster encryption methods can be used to achieve the same level of security, and the bloXroute

Labs team is now considering alternative, less-costly encryption methods.

Once the block is encrypted, Gateways would require 3 ms to begin its transmission, and 2 additional milliseconds for the closest bloXroute server to receive the encrypted block’s first bytes, which would be immediately relayed to all other servers and Gateways. 68

0 ms : Mined 618 ms : Encrypted 650 ms : More Gateways Send Notifications

0 ms : Received at Source Gateway 621 ms : Being Send 742 ms : Server-III Begins Receive

0.2 ms : Compacted 623 ms : First Bytes at Server-I 744 ms : Server-III Begins Send

623 ms : Server-I Begins Send 751 ms : Last Gateways Begin Receive

628 ms : Gateways Begin Receive 755 ms : Last Gateways Send Notifications

629 ms : Server-I Ends Send 881 ms : Source Gateway receives last notifications

634 ms : Gateways Send Notifications 882 ms : Source Gateway sends key

637 ms : Source Gateway Receives Notifications 1094 ms : All Gateways receive key

641 ms : Server-II Begins Receive 1094 ms : Block Decrypt

641 ms : Server-II Begins Send 1094 ms : Block Reconstruct

645 ms : More Gateways Begin Receive 1094 ms : Reach Nodes

Figure 4.6. The average timeline for the average block to propagate in the bloXroute testbed. Blocks requires roughly 1 to be propagated in a provably-neutral fashion to the entire Bitcoin network deployed in the testbed.

These short time scales are in all likelihood the result of both Gateways and servers being hosted in the same datacenters. However, the latency between a Gateway and its closest bloXroute server in a real-world deployment should not be much longer, in the order of 10–20 ms. After 5 ms, which is 628 ms after the block’s mining, the first Gateways begin to receive the block. At time 629 ms, the closest bloXroute server completes the transmission to all Gateways and servers, and at time 634 ms the closest Gateways send notifications to their peers regarding the encrypted block they received, which the source

Gateway receives at time 637 ms. This too is expected to take a few tens of milliseconds longer in a real world deployment, where the closet Gateways are not running in the same datacenter. At time 641 ms, on average, the first bytes of the encrypted block are received at a bloXroute server which runs in a different region, which immediately propagates them to the Gateways it is connected to and to other bloXroute servers. Said Gateways receive 69

1.0

0.8

0.6

0.4

CDF of Blocks 0.2

0.0 0 250 500 750 1000 1250 1500 1750 2000 Transaction per Second (TPS)

Figure 4.7. CDF of the transactions per second (TPS) processed in blocks mined during 20 trials of the bloXroute testbed. Each trial includes the mining of 20 consecutive blocks.

the first bytes of the encrypted block at time 645 ms on average, and 5 ms afterwords send notifications to their peers regarding the encrypted block they have received.

At time 742 ms, 121 ms after the Gateway began its transmission, the block’s first bytes arrive to the third bloXroute server, which begins relaying them 2 ms afterwards. At

751 ms, the Gateways which are furthest away from the source Gateway receive the block’s

first bytes, and by 755 ms, they send notifications to their peers regarding the encrypted block they received. At 881 ms, the source Gateway receives the last notifications from his peers, and sends out the encryption key at 882 ms. Lastly, at time 1094 ms, slightly over a second after the block was mined, the encryption key is received by all Gateways, its content is revealed, and each Gateway passes it to the Bitcoin node it serves.

Figure 4.7 presents the CDF of the TPS processed by blocks during 20 trials of the bloXroute testbed. Each trial allows for 20 consecutive blocks to be mined by the Bitcoin nodes running in the testbed, whose parameters were adjusted to mine a new block 70 approximately every 30 seconds. Thus, the data presented in Figure 4.7 is collected from

400 blocks mined in the bloXroute testbed.

It is apparent that bloXroute indeed allows the Bitcoin nodes running in the testbed to process orders of magnitude higher volumes of TPS. In the median case, bloXroute enables the Bitcoin nodes to process 868.6 TPS, while the block at the 10th percentile processes 495.8 TPS, and the block at the 90th percentile processes over 1542 TPS.

In addition, the bloXroute testbed shows that blockchains are capable of processing a high volume of TPS, assuming a blockchain Network Layer to support them. Nodes and miners are capable of processing a high volume of transactions using consumer-grade hard- ware, even when presented with large blocks which are mined frequently. Furthrermore, it is evident that bloXroute’s propagation scheme is capable of quickly disseminating blocks despite the overhead of compression, encryption, key propagation, decryption and reconstruction, as can be seen both in Figure 4.6 and in Figure 4.7.

4.5. Mining Decentralization

A unique feature of blockchains is that their decentralized and distributed nature makes it very difficult for any adversary to freeze funds, reverse transactions, or prevent participation, as outlined in Chapter 3. These properties makes Bitcoin, and blockchains in general, a valuable method to safely store and transfer wealth, and for permissionless, cross-border transactions. However, these properties depend on the inability of any single entity to control the majority of hash power, either directly, through collusion with other miners, or by forcing other miners to comply using legal or illegal means. The coalescence 71 of miners into mining pools significantly exacerbates this risk, as only 4–6 pool operators needs to be compromised to gain control over the majority of hash power [41].

While mining pools had emerged to reduce the high variance of miner rewards, it is not strictly necessary for miners to transfer the control over the construction of blocks, and thus the power to reject transactions, to pool operators. One solution to reduce the variation in miner rewards without centralizing the control over the construction of new blocks is P2Pool [102], which had been in use since 2014, yet had failed to gain significant traction. With P2Pool, each miner runs its own blockchain node, e.g., a Bitcoin node, which constructs new blocks locally. In the process of mining, any P2Pool member to mine a block to earn a share, as outlined in Chapter 3, would propagate said block to the other P2Pool members, which would verify (i) its validity, (ii) it splits its reward proportionally with the miners of previous shares, and (iii) it contains the hash of the previous share, using bytes which may contain arbitrary data. Thus, P2Pool miners maintain a “share chain”, where each share-block has a pointer to its predecessor, and each miner is incentivized to mine on top of the longest chain since it knows other miners are doing the same, as per Bitcoin.

While P2Pool has the potential to alleviate the concerns regarding mining central- ization, and despite the support of the blockchain community, P2Pool had not gained significant traction with miners. The main reason for P2Pool’s lack of traction is rooted at the blockchain Network Layer; a P2Pool miner could only start mining on top of a newly-found block after it had received it, which could take up to tens of seconds.

In comparison, pool members only receive the parameters required for hashing for the 72 pool operator, and it is only the pool operator which must receive the new blocks. Fur- thermore, large pool operators run multiple geographically-dispersed, well-provisioned, well-connected nodes, as well as connect to other pools (“spy-mining”), to assure they learn of new blocks as soon as possible. Since pool operators hear of new blocks sooner, they enable pool members to begin mining on top of new blocks sooner, which in turn increases miner profitability, in comparison to P2Pool.

The use of a BDN levels the field for all miners, as it broadcasts new blocks to all miners and pools simultaneously. Indeed, the profitability of P2Pool miners is expected not only to match that of a pool member, but to surpass it, since pool operators collect fees from their members while P2Pool does not. Thus, for the first time, the deployment of bloXroute creates an incentive for miners not to coalesce into mining pools. With this incentive in place, new development is now underway to further reduce the complexity of

P2Pool, and to allow independent miners (“solo-miners”) to share their rewards without connecting directly. 73

CHAPTER 5

Webcoin: Decentralizing Search Engines with Blockchain

Networking Resources

Networking tasks, such as Web crawling, webpage scraping, Web indexing, routing measurements, as well as latency and bandwidth measurements, are required by CDNs, cloud services, search engines, ISPs, and researchers. Traditionally, networking tasks are considered difficult to perform and hard to estimate, since their results depend heavily on an observer’s network characteristics. At the same time, incentivizing users to leverage their positions in the network to perform these networking tasks remains a challenging open problem. I propose an alternative and comprehensive solution to this problem; namely, I show how blockchains can utilize their networking resources to crowdsource complex networking tasks. Considering the large space of networking tasks which can be outsourced, I necessarily limit the scope of this work to one particular subproblem in this domain, and focus on utilizing a blockchain’s networking resources for continuous

Web indexing, a necessary step to enable Web search, and arguably first among these networking tasks, in terms of real-world influence.

The Web search market, for both mobile and desktop clients, is extremely consolidated and dominated by Google, which holds a share of over three-quarters of the market [76].

Furthermore, the 4 largest search engines, Google, Bing, Baidu and Yahoo, together control over 98% of the market. Each one of these search engines maintains an index of 74 the Internet’s webpages and their content in order to return appropriate responses to their clients’ queries. Google, which research had shown to hold the largest index, maintains an index of approximately 46.5 billion webpages [20, 30].

One of the key barriers that hinders the proliferation of a larger number of search en- gines is the daunting, multi-billion-dollar-worth task of crawling and indexing the exabyte- scale Web space1. Hence, this task is carried out by a small number of powerful players capable of committing substantial resources to conduct continuous 24/7 Web crawling and indexing. Despite efforts to democratize this space, i.e., promote the use of dis- tributed, objective and unbiased index instead, e.g.,[37, 67, 108, 84], the use of such distributed systems had not become common. This is mostly due to inferior results as- sociated with small Web indices used by such systems, induced by the lack of economic incentives for participation in distributed Web crawling, or due to the small-scale nature of the communities of users interested in collaborative Web indexing.

I propose a design for a new distributed digital currency, Webcoin, which crowdsources networking tasks, using a novel method to utilize the networking resources of Webcoin miners. Specifically, Webcoin rewards Web crawling, scraping, and indexing, by producing money, for those who participate in these tasks. Systems with a similar economic model, which produce money for participation, had already been successfully implemented and are gaining traction. For example, Steem [56] rewards users with money for the blogs, posts, comments and other social media content they produce, and had reached a market capitalization of over 300 million USD. However, Webcoin’s novelty, focus, and unique contribution lies in being the first to incentivize blockchain participants to utilize their

1as disclosed by an executive of a major search engine 75 networking resources, rather than their computational resources or personal content, to produce Google-scale Web indices within days.

The starting point in designing Webcoin is Bitcoin [75], the first peer-to-peer currency to gain considerable traction globally. Bitcoin is a mechanism to maintain and update a ledger of transactions in a distributed manner. Its security and integrity are achieved through a process called mining, where every node (miner) conducts intensive compu- tations in an attempt to add a valid block of transactions to the ledger. To incentivize mining, each block produces new Bitcoins which are awarded to its creator, an incen- tivizing mechanism which Webcoin utilizes to perform non-computational tasks. Unlike

Webcoin, which aims to democratize the search engine industry, the unparalleled compu- tation power involved in Bitcoin mining, currently at 15–20 exahashes per second, serves no other purpose beyond Bitcoin. For comparison, Bitcoin is currently consuming roughly

43.7 TWh annually, similarly to Hong Kong’s annual energy consumption [32].

Webcoin’s goal is to utilize Bitcoin’s principles, yet force miners to crawl and index the Web instead of conducting purposeless computations. However, whereas creating a valid Bitcoin block is computationally hard, and thus the production of such a block is a Proof-of-Work of the computational resources invested, it is challenging to utilize networking resource and the content of indexed webpages to produce a Proof-of-Work, while providing the same security guarantees as Bitcoin. This is because both legitimate and fraudulent webpages are susceptible to manipulation. To overcome this challenge, I decouple the webpage indexing Proof-of-Work from the nodes’ success, remodeling it as a necessary but insufficient requirement for successful Webcoin mining. I thus remove any 76 incentive for miners to manipulate webpages, as such efforts will not affect their chances to be rewarded.

A second major challenge in Webcoin’s design is the requirement for Web indices to be collected from all Webcoin miners, not only from the few which successfully mined a

Webcoin block. Unlike Bitcoin, where only the miner which had successfully mined a new block publishes it, Webcoin must force every single Webcoin miner to publish its index, so that the entire product of the indexing effort is made available to all. I accomplish this by requiring miners to publish a hash of their Web index as a way to qualify for mining

Webcoin, and to upload their compressed Web index to ensure their competitiveness.

The third challenge is associated with Web index validation, i.e., how to ensure that the Web indices submitted by miners are accurate? Contrary to Bitcoin, where a Proof- of-Work validation lasts several nanoseconds, validating all indices does not scale. I thus provide a mechanism that incentivizes miners to submit valid Web indices. In particular, once a miner mines a new Webcoin, its webpage indexing Proof-of-Work is validated. As a result, while only a small fraction of submitted indices is actually verified, the scheme ensures that nodes have no chance of winning Webcoins if they do not properly index the

Web. This guarantees the validity of the Web index collectively produced by the miners.

I note that the creation of a competitive real-world search engine requires significant amounts of additional work, in fields ranging from Human-Computer Interaction to In- formation Retrieval. However, I argue that no such additional work can begin without an access to a global Web index, which is limited today to the employees of the largest search engines. I further note that reviewing all technical aspects of a crypto-currency designed for a new domain exceeds the scope of this work. My goal is thus to present 77 the principles which enable and incentivize the crowdsourcing of networking tasks, rather than to review all of Webcoin’s underlying low-level details.

5.1. From Bitcoin to Webcoin

5.1.1. Webcoin’s Goals

The ultimate goal in the creation of Webcoin is to incentivize distributed Web indexing, in a manner which enables unrestricted access to the resulting indices. To achieve this goal,

Webcoin migrates Bitcoin’s principles from the computational domain to the networking domain, and rewards miners for performing Web indexing, unlike any other digital cur- rency, to the best of my knowledge. To achieve the above goal in practice, Webcoin must satisfy the following goals:

(1) Security. Webcoin must provide the same security and integrity guarantees as

Bitcoin to achieve practicality.

(2) Scalability. Webcoin must scale to the Bitcoin network size and beyond, without

increasing the load on the miners as the index size reaches Google-scale. Webcoin

must also not increase the load on Web servers as the number of miners increases.

(3) High Participation. Webcoin’s design must allow and incentivize the partici-

pation of miners with limited networking capacities, i.e., with low bandwidth and

high latency. Large user participation is essential for gaining momentum with

system’s popularity, particularly in early days of the system adoption. I discuss

the economic effects of larger mining operations in Section 5.4.

(4) Indices Availability and Reliability. Unlike Bitcoin, where only the first

node to mine a block publishes it, to be propagated to the entire Bitcoin network, 78

Webcoin must not only allow, but force every single Web-coin miner to publish

its index, so the entire product of the indexing effort is made available to all.

Moreover, Webcoin aims to verify the indices’ correctness, and to incentivize

miners to reliably report the results of their indexing.

5.1.2. Webcoin’s Non-Goals

While Webcoin is designed with real-world deployment in mind, some aspects of its op- eration go beyond the scope of this work. I define these items as non-goals, as listed below:

(1) Bitcoin’s Imperfections. Bitcoin’s design is not necessarily optimal. The

concerns regarding Bitcoin include the threat which mining pools pose to Bit-

coin’s security [36], vulnerability of wallets to common attacks, e.g., phishing

attacks [52], and Bitcoin’s time interval between blocks, which could be reduced

to incorporate transactions at a higher rate [10, 106]. In designing Webcoin, I

do not attempt to address these inherited concerns, and I focus on the utiliza-

tion of Bitcoin’s principles in the networking domain, and on the utilization of

blockchains’ networking resources.

(2) Indexing and Crawling Frameworks. Webpage indexing, and document

indexing in general, can be performed in multiple fashions, depending on the

desired features of the resulting index, and numerous indexing frameworks are

available [27, 108, 11, 44, 45, 65, 70, 80, 84, 89]. However, the design

of an index depends on the prioritization among its features, requires ongoing

adjustments, and is outside the scope of this work. I thus refrain from integrating 79

Webcoin with any of the above frameworks, and implement an inverted indexing

procedure [3]. Future implementation could integrate with any of the above

frameworks.

Similarly to indexing, multiple frameworks could be utilized for scalable and

distributed web crawling [19, 51, 64]. Webcoin utilizes Scrapy [90], as in [103],

a highly popular and powerful open-source framework for web-scraping. Alterna-

tive frameworks, such as [64], might be even better suited for Webcoin, as their

capacity to filter on-the-fly content-generating webpages, spam, and “crawler

traps” exceeds Scrapy’s. Such improvements to the crawling and indexing pro-

cedures are outside the scope of this work.

5.1.3. Proof-of-Work Wastefulness

Bitcoin is the first blockchain-based crypto-currency, and as noted above, its mining consumes growing amounts of resources. The energy consumed in the mining process can be viewed as energy invested in the security of Bitcoin. However, the increased investment does not translate to additional functionality, which bears asking how much energy should be invested for additional security.

One approach to reduce crypto-currencies’ energy footprint is to shift from Bitcoin’s

Proof-of-Work (PoW ) to the use of Proof-of-Stake (PoS)[57]. In PoS, the exhaustive search for a nonce is eliminated, and the eligibility to produce a new block is distributed among those who hold crypto-currency funds. The more coins one holds, the higher its probability to produce the next block. However, the security guarantees and economical 80

(a) (b)

Figure 5.1. Webcoin mining process: (left) (1) Miner crawls webpages along a directed path, (2) miner indexes webpages, hashes the index, prop- agates digest to peers, (3) collectors request and receive compressed indices from miners. (right) (4) Miners wait until some miner becomes eligible to mine a new block, (5) eligible miner propagates block and index, acceler- ated by collectors, (6) each miner verifies the block, compares index to past digest, and verifies a small subset of webpages.

model of PoS systems remains to be proven. A different approach aims to replace the com- putational intense hashing with alternatives which require frequent memory access [81] or storage access [79, 72].

Webcoin takes a different approach, aiming to crowdsource a beneficial work that has significant real-world implications, to miners. In addition, the footprint is consider- ably reduced by networking tasks, as they consist of long waiting times and infrequent computations. It is worth noting that Permacoin [72] suggests it is possible to use its storage-lookup PoW to crowdsource the archiving of large data sets. However, it lacks mechanisms to make the archived information available to all. 81

5.2. The Webcoin Protocol

The Webcoin protocol is identical to Bitcoin on most aspects; it uses the same mes- sages, the same gossip protocol, the same transactions, and the same blockchain mecha- nism. Webcoin only differs from Bitcoin in the requirements a newly-mined block must fulfill in order to be accept by nodes. In Bitcoin, a node will only accept a new block if hashing the new block with its predecessor (including its timestamp, nonce, target, and its transactions’ Merkle root) yields a small enough value. In Webcoin, each new block is accompanied by an index, and the block will be considered valid only if:

(1) Block is valid.

(2) Index is valid.

(3) Index reported ahead of time.

(4) Index includes webpages of a specific crawl path, determined by miner’s IP ad-

dress, also included in block’s header.

(5) The hashing of some fields in the new block, including said IP address, with the

fields of previous blocks yields a value smaller than Webcoin’s target.

I explain these requirements and the mining process, depicted in Figure 5.1, below.

5.2.1. Webcoin’s Proof-of-Work

In earlier versions of Webcoin, I have made attempts to use the index as a na¨ıve Proof- of-Work, i.e., to mine a new block, a miner must simply present a valid index, where the hashing of a new block, its predecessor, and the index, produces a value smaller than

Webcoin’s target. Unfortunately, this approach has two significant drawbacks. First, it only incentivizes the miner who successfully mines a new block to publish the result 82 of his efforts, rather than incentivizing all miners to publish their indices. Second, it opens the door to numerous types of index and block manipulation; rather than invest networking resources to crawl and index the Web, a dishonest miner might produce valid block and index much faster by altering webpages under its control, or by manipulating the transactions included in the block.

To overcome these challenges, the Webcoin protocol uses its Proof-of-Work as a nec- essary but inefficient condition. In every round where miners attempt to mine a new block, each miner must produce a valid index and publish its hash to be eligible to mine a block in the following round (not the present round). However, the index by itself does not determine whether a miner will successfully mine the following block, it is merely a prerequisite. In addition to this prerequisite, miners compete on who will be the first to mine the next block and receive its reward, as in Bitcoin, however their competition is quiescent. Each miner waits until such time that the hashing of the current time (in seconds) and the IP address it used to construct its index, as they appear in the header of the new block, and a new header field anybit, and the hash of past transactions, as they appear in the headers of past blocks, produce a digest which is smaller than Webcoin’s target. I continue to outline here the mining process in its entirety, and provide a detailed explanation of these parameters, their effect on security, and the implications of utilizing multiple IP addresses in Section 5.4.

Simply put, a Webcoin miner first produces a new Web index and publishes its hashing along with its IP address. Then, in the next round, it awaits until such time it can successfully mine a new block, or more likely, it hears of a new block mined by another miner, accompanied by the index from its miner. Thus, in every round the miner both 83 quiescently tries to mine a new block and actively produces a Web index to be eligible to mine a block in the following round.

Upon receiving a new block, which is accompanied by its miner’s index, a miner will verify that the block is valid, that the index is accurate, and that it had been accurately reported in the previous round. To validate the block itself, the miner will validate all transactions are valid, as in Bitcoin. It will also verify the block contains an IP address which matches the index’s IP address, and that it had received the index’s digest and IP address during the previous mining round. The miner will then test the miner’s control over said IP address by sending it a single-packet request for the block’s hash, and finally it will statistically validate the index by sampling a small subset of its webpages, a process detailed in Section 5.3. If block and index are both valid, the miner will accept the block.

Note that the blockchain does not contain any indices. The miner of a new block will send its index alongside its newly-mined block, and they will both propagate throughout the Webcoin network, however the index is discarded soon after the block is accepted.

5.2.2. Index Collectors

The prerequisite to publish the indices’ digests ahead of time incentivizes all miners to produce Web indices and publish their digests. While it is clearly possible to require all miners to publish their indices, rather than the digests, such undiscriminating flood of data is wasteful. Most miners have no interest in collecting the indices for their own purposes, and are only interested to receive the index of the miner who successfully mined a new block, in order to validate it. However, Webcoin provides both the means and the incentives to allow interested parties to collect all the indices from all miners. 84

Nodes interested in collecting indices, collectors, e.g., new search engines, existing search engines aiming to cut costs, companies performing analysis of big-data from the

Web, and researchers, can gain knowledge of all the indices produced through the propa- gation of their digests. Collectors may request miners to provide them with their indices, and such collaboration is facilitated based on their respective interests, in a tit-for-tat manner. While collectors have an interest in receiving indices for their own goals, miners have an interest to minimize the time required for their block and index to reach the majority of nodes. Miners are interested in to shorten their propagation time since it reduces the chances for a competing block, mined by a different miner at a similar time, to reach and be accepted by the majority of nodes first, thus to prevent their block from being included in the blockchain. A miner can thus increase its profitability by sending its index before the next round begins.

It is therefore in the best interest of a miner to comply with requests for its index it receives from collectors, so in the event it successfully mines a new block in the following round, the index and block he must present would propagate faster through the network, assisted by the upload bandwidth of the collectors. The cost for a miner to comply with such requests, and send his index in a compressed form to collectors is minimal.

Collectors, on the other hand, do not earn Webcoins by participating, the way miners do.

However, by assisting with the propagation of a single index and block in each round, collectors get access to all indices, which they can use for their own goals. To be useful to miners, collectors secure sufficient upload capacity and maintain a long list of outbound connections. 85

5.3. Web Indexing and Index Validation

Each Webcoin miner is required to crawl 1,000 webpages every 10 minutes, a value experimentally determined in Section 5.5. As the number of miners increases, so does the likelihood of different miners to index the same webpages. To increase Webcoin’s efficiency, and to allow for a truly scalable Web indexing, Webcoin limits the possible webpages each miner can crawl and index. Miners must comply with the following restrictions for their index to be considered valid.

First, miners must create their indices using continuous crawls, i.e., their indices must contain one or more series of webpages, each referencing those following it. Each index must specify its crawls’ order in a designated data structure. Second, the first webpage of each crawl must be selected from among the indices accompanying the previous two blocks in the blockchain. An additional crawl is only valid if the current crawl had exhausted its webpages. Third, in order for a webpage to be included in a crawl, the hashing of the miner’s IP address, the previous block’s digest, the webpage’s URL, and the URLs

2256−1 of all the webpages which led to it via crawl must be smaller than b 10 c. Simply put, only 10% of the webpages a miner encounters are valid for it to index, and their validity differs across miners and for different crawl paths. Necessarily, the eligible webpages are unknown to a miner ahead of time.

Each index produced by a miner contains the URLs crawled to produce it, the crawl path used, the miner’s IP address (which had determined the crawl path), and a hashtable mapping between every word encountered and its appearances in the different URLs.

Given the power-law distribution of the Web [58], a collective “Brownian crawling motion” will effectively cover the most prominent portions of the Web, assuming crawlers have 86 the ability to systematically avoid, on-the-fly, content-generating webpages, spam, and

“crawler traps” [64].

I note that the Web indexing which Webcoin incentivizes contains a computational aspect, namely, mapping between each word and its occurrences. However, I consider

Webcoin’s Web indexing to reside mostly in the network domain, rather than the com- putational, since additional computational resources would not improve a miner’s success rate, while additional networking resources would (as I explain in Section 5.4.3). Addi- tionally, while Web crawling requires 1–2 minutes to perform, the time required for the computational aspect is shorter by an order of magnitude.

5.3.1. Statistical Index Validation

Validating that the miner of a new block had indeed performed Web crawling and index- ing prior to its mining raises several main challenges. One challenge is overcoming the

Internet’s natural churn. Webpages must remain stable enough to allow their validation for at least several minutes after they have been indexed. Otherwise, an index might be found accurate by the first miners to validate it, and inaccurate by those following. A second challenge is to avoid placing a heavy load on the webpages included in the index, as a large number of clients, e.g., 100K miners, will attempt to contact them over a short period of time, a scenario known as a “flash event” [54]. A third challenge is to reduce the validation time to a minimum, so that even low-bandwidth miners will be able to perform the validation in a timely manner.

To overcome these challenges, Webcoin uses statistical index validation, which requires only the validation of a very small subset of webpages, yet can accurately predict the 87 index validity. To assess webpages churn, I have measured the change rate of over 100, 000 webpages, arbitrarily selected from among the webpages indexed by Webcoin miners in my experiments, detailed in Section 5.5. I define a change to be any change in the HTML body

(or in text for non-HTML webpages) which causes the Web indexing to yield a different result. Thus, changes of HTML tags, whitespaces, and metadata, are not considered to be a change in the webpage. I have found that only 1.97% of webpages change within 4 minutes, and of the remaining, only 0.231% change within the following 30 minutes, when excluding dates and timestamps fields, as well as regular expression string-matching. By allowing indices for inaccuracies in up to 3.5% of their webpages without deeming them invalid, not a single index had failed its validation in my experiments.

I thus relax the accuracy requirements, and require every miner to randomly sample 30

(out of 1000) webpages from the index, download them, and validate they were properly indexed. In the event that up to 1 webpage is inaccurate, the miner will deem the index valid. If more than 6 webpages are inaccurate, it is deemed invalid. If 2–6 inaccurate webpages are found, the process (of randomly sampling new 30 webpages) will be repeated, up to 4 times, after which it will be deemed invalid if 12 or more inaccurate webpages were encountered, and valid otherwise.

5.3.2. Statistical Validation Accuracy

The use of statistical validation to assess the accuracy of the index accompanying a newly- mined block, rather than requiring every miner to validate every webpage included in said index, reduces the load from both miners and servers. At the same time, it remains infeasible for a valid index to be rejected by the Webcoin network, nor for an invalid 88 block to be accepted. I compute that there is a 71.7% probability that a valid index will be labeled correctly in the first iteration, relieving most miners from doing additional iterations, and thus reducing both the time required and the load on webpages. The probability increases to 92% by the second iteration, and to 99.36% by the fourth. The resulting load for each webpage indexed, for a network of 100K miners, is only 6.9 requests per second, on average, and for a 1 Mbps miner, the validation requires 6.1 seconds, on average.

There is 0.0001% probability that a valid index will be rejected by a miner, and the erroneous miner would amend its mistake when the consecutive block is mined. For a network of 100K miners, the probability for an index to be wrongly rejected by even 0.2% of the network is approximately 10−11, and is expected to occur once every hundreds of thousands of mining years. The probability for an inaccurate index to be validated is very small; for an index which half of its webpages contain inaccuracies, it is infeasible to be accepted by the network, or even by 0.01% of the network. An almost accurate index, which 85% of its webpages are accurately indexed, has a probability of 95.8% to labeled invalid by a miner. This translates into a probability of 10−8 to be accepted by 5% of the network. Thus, Webcoin ensures that valid indices are accepted by the network, while invalid indices are rejected. Any forks induced at individual nodes are quickly resolved by the mining of the next block.

5.4. Webcoin’s Security

The transition of the cryptographic principles upon which Bitcoin is built into the networking domain is non-trivial. A major difference between the domains is that while 89 the first is governed by a priori mathematical and statistical principles, the latter depends on real-world state, and is thus susceptible to manipulation.

In earlier versions of Webcoin, where I have tested the usability of Web indices as

Proof-of-Work, I have found that the miners’ ability to affect the content of webpages opens the door to the manipulation of their indices, in an attempt to improve their mining success rate. Despite numerous counter-measurements tested, I have found it infeasible to guarantee that such manipulation does not take place, as the nodes’ level of control over arbitrary Web domains is unbounded. At the same time, miners may attempt to manipulate the blocks they produce, i.e., to add, remove and reorder a block’s transactions, to increase their mining success rate. To that end, miners might produce new public keys and transactions between their own wallets.

The manipulation of either webpages or blocks allows dishonest miners to move the task of mining Webcoins back to the computational domain, where they can explore the space of possible adjustments to webpages and transactions permutations. This space is larger by several orders of magnitude, and can be explored faster by several orders magnitude, than the space explored by a miner in the networking domain, i.e, the space of crawled webpages. A dishonest miner capable of such manipulations voids all of Webcoin’s security guarantees, as it is capable of producing blocks at a higher rate than all other nodes combined, and can thus alter the blockchain at will. Webcoin’s mining process thus must answer a novel challenge, consisting of six requirements:

• Webcoin’s mining must require the publication of valid Web indices.

• A block’s validity must not rely on said indices. 90

• A block’s validity must not rely on the transactions it contains, nor on any

element under its miner’s control.

• A block’s transactions cannot be selected or manipulated to affect its miner’s

future success rate.

• A block’s validity must rely on transactions of previous blocks, as per Bitcoin.

• Transactions of previous blocks, even if no block’s validity is yet to rely on them,

must not be susceptible to manipulation.

To address all six requirements stated above, Webcoin employs three unique principles.

First, it requires a valid Web index for a block to be mined, yet the success of the process is not determined by its content. Second, the ability of miners to produce new block relies on elements outside their control. Third, Webcoin employs a novel hashing mechanism which breaks the bidirectional relation between blocks; while it allows a block’s validity to rely on its predecessors, the miners of said predecessors are prevented from directly affecting the success in mining the blocks succeeding them.

5.4.1. Block Mining

Figure 5.2 depicts Webcoin’s block mining process, which is repeated every second. We- bcoin utilizes Bitcoin’s SHA-256 double-hashing to hash 4 elements. First, every block

Bi contains an additional field, which contains the digest yielded by hashing of the entire block preceding it by 32 blocks (Bi−32), denoted TXhash. Second, every block Bi contains an additional 1 bit field, denoted anybit, which its value is randomly selected by the miner, and is introduced to counter pre-computation of future blocks. Third, Webcoin utilizes the current time, in seconds, based on the Coordinated Universal Time (UTC) [6]. 91

Figure 5.2. The mining process of a new Webcoin block, performed every second.

Fourth, the success of each miner also depends on the hashing of its IP address, and it must prove its ability to communicate through it in order to validate its block. The mining of a new block Bi is as follows.

(1) The IP address of the miner and the current time, in seconds, are hashed with

the mining time of Bi−1 and with its miner’s IP address, both specified in Bi−1.

Denote the result IPhash.

(2) The resulting 256 bit digest is divided into 51 values of 5 bits, discarding the last

bit, and each of these values is used as a pointer to a block Bj in the range Bi−32

to Bi−1.

(3) For each block Bj specified by such a 5 bit value, the block’s TXhash, which holds

the digest yielded from the hashing of block Bj−32, is divided into two 118 bit

binary values, and the remaining 20 bits are discarded. 92

(4) Each such 118 bit binary value is translated to a permutation of the 32 blocks

in the range between Bi−32 and Bi−1. Using each permutation, the anybit fields

are extracted from the blocks according to their order, to create a single 32 bit

binary value per permutation. Hence, 102 32-bit values are created.

(5) Lastly, these 102 values are hashed according to their order, and the resulting

digest is compared to Webcoin’s target, which determines whether the miner can

produce a new block.

Webcoin’s hashing procedure serves multiple purposes. First, from the perspective of

Bi’s miner, its success depends exclusively on its IP address and the current time. It is thus unable to manipulate the block in order to affect its success rate. Second, from the perspective of every miner of a block

Bj ∈ Bi−32,...,Bi−1,

i.e, one of the 32 blocks preceding Bi, it must accurately include the hashing of block

Bj−32 in Bj’s TXhash field, otherwise Bj will be rejected by the network. Thus, Bj’s miner can only affect the mining of Bi through its selection of anybit. This effect is very limited, as it is bounded to two possible values, 0 and 1, without any guarantees that neither will better serve the miner.

In the edge case, a miner M of infinite computational power and perfect system knowledge, attempting to improve its likelihood to mine block Bj+1, is twice as likely to select the value to better serve it, in comparison to a coin toss. This does not mean the miner will be able to arbitrarily set itself as the miner of Bj+1, but, for example, that it will set anybit to 1 if it will shorten the period of time until M is eligible to mine Bj+1, 93 in comparison to setting it to 0. However, it is extremely likely that a different miner will become eligible prior to M, for both values, a fact M cannot change. Furthermore, M’s ability to determine the effect of its selection decreases exponentially for blocks beyond

Bj+1, as their mining depends on anybit values not yet determined, until the value of no anybit is known for the 32nd consecutive block. From the perspective of the miners of blocks

Bk ∈ Bi−64,...,Bi−33,

they hold the largest possibility space to affect the mining of Bi, as their hashing will define the permutations of anybit values to be used in the mining process. However, due to Webcoin’s novel mining scheme, this possibility space only affects the permutations of value to be used, prior to the selection of these values, i.e., they dictate permutations of anybit fields of blocks which have not been mined yet. By design, a block’s hashing can only affect the permutation of values of future blocks.

While the mining procedure does not allow manipulation to affect the mining of Bi, it is worth noting that the possibility space it provides through the 102 permutations of

32 bits values exceeds the possibility space of the hash function by more than a hundred orders of magnitude, and thus does not negatively affect the security guarantees. In addition, although a block Bi’s transactions do not affect its hashing by design, their hashing is included in Bi+32, and affects all consecutive blocks in the blockchain. The requirement for 32 blocks for transactions security is not disproportional, in comparison for the 100 to 120 blocks required before Bitcoin’s reward can be used.

In the event that a miner can indeed produce a new block, the miner will report its IP address in a designated field, the timestamp which had produced the new block, 94

the hashing of Bi−32 in the T Xhash field, the resulting digest, and a random bit in the anybit field. The miner will then include transactions in the block, and will publish the newly-mined block to the Webcoin network, along with its Web index. While the entire procedure can be done in advance, the block and index will be rejected by peer miners until the time used for the successful mining had been reached.

5.4.2. Block Validation

Upon receiving a new block, a miner will ensure that it is indeed valid. Blocks can be easily validated by repeating the above process, using the time and IP address specified in the new block. However, in order to assure that the block’s miner had indeed produced the block using an IP address under its control, the validating miner sends a single packet to the IP address over a designated port number (Webcoin’s current implementation supports one miner per NAT). Furthermore, in order to assure the transactions contained in the block had not been altered, the packet will include the hashing of the entire block, to which the block’s miner will reply with a single packet to confirm both that it indeed controls the IP address, and that the transactions had not been altered.

For a block to be accepted as valid, it must be accompanied by a valid index which its miner had reported in advance, as detailed in Section 5.3. Webcoin thus requires miners to hash their indices and publish only the resulting 256 bit digests and their IP addresses, in order for their blocks to be accepted in the following round. Miners are prevented from reporting a digest in hopes of creating an index to match it at a later time, as finding an input to produce a desired digest is computationally infeasible, an infeasibility upon which Bitcoin’s security model relies. 95

5.4.3. Resource Aggregation and Security

It is clearly possible for a miner to obtain more than a single IP address, whether legally from ISPs, or illegally, through the use of malicious bots [38]. It is also possible for numerous miners to form large mining pools, and for resource-rich entities, e.g., search engines, to participate in the mining of Webcoin. Here, I consider how these phenomenas affect Webcoin’s security model.

Similarly to Bitcoin, the reward of a Webcoin miner is proportional to the portion of resources it controls, and any miner to control over 50% of the resources voids all of

Webcoin security guarantees and controls the blockchain. However, unlike Bitcoin, the resources required for Webcoin mining are two-fold: an IP address, and a bandwidth of approximately 1 Mbps. The latter of the two makes Webcoin more resilient to resources aggregation than Bitcoin.

Any Webcoin miner wishing to utilize multiple IP addresses to increase its mining capacity will need to control enough bandwidth to support crawling the different paths required for each IP address, consuming approximately 1 Mbps per IP address. However, the bandwidth resource does not scale similarly to Bitcoin’s computational resources. For comparison, a Bitcoin miner using a home desktop can increase its capacity by 8 orders of magnitude, by purchasing a single Antminer S9 [104] for $2400, and these machines can be easily stacked. For a similar investment of capital, a Webcoin miner using a residential internet connection can only increase it capacity by 2 orders of magnitude, and its ability to scale it much further is uncertain.

While it is expected for resource-rich entities and large mining pools to participate in

Webcoin’s mining, as is the case with Bitcoin, they do not jeopardize Webcoin’s security 96 model any more than they jeopardize Bitcoin’s. All miners are constantly competing over the mining of the next block, and must invest ever-growing resources to remain competitive. The investment of considerable resources by such entities will be met by the investment of other miners, which aim to increase their own share of the reward, which prevents the formation of a monopoly [33].

5.4.4. Minimal Inter-Block Time

In my experiments, I have found that miners controlling more bandwidth require less time to crawl webpages and to produce a web indices, as to be expected. Since despite a new block is mined every 10 minutes on average, the exact inter-block time varies due to its statistical nature, the inter-block time between some blocks is very short. In such cases, since miners must produce a new Web index prior to the mining of the next block to be eligible to participate in the next round, limited-capacity miners often fail to produce a

Web index before the block is mined, and are thus excluded from participating in the following round. To mitigate this phenomena, I add an additional requirement to the quiescent mining competition among miners: a new block’s timestamp must exceed its predecessor by at least 180 sec. This requirements prevents a new block to be mined very shortly after its predecessor, which would exclude limited-capacity miners from partici- pation in the next round. I provide additional insights regarding this design choice in

Section 5.5. 97

Figure 5.3. CDF of the number of seconds between the mining of consecu- tive blocks. Vertical line represents target at 600 seconds.

5.5. Webcoin Evaluation

To evaluate Webcoin’s ability to crawl and index the Web, I deploy a network of

200 Webcoin miners on PlanetLab [25]. My implementation of a Webcoin miner is built on top of Pyminer, a python implementation of a Bitcoin miner [85]. I set miners to commence mining Webcoins using random transactions, using blocks with an average size of 700 KB. Miners utilize download bandwidths ranging between 0.5 and over 300 Mbps, with a median of 19.68 Mbps. Like in Bitcoin, each miner has 8 outbound connections, i.e., 8 peer-miners to which it propagates blocks and indices, which I select randomly from among all other nodes.

5.5.1. Crawling Feasibility

Here, I evaluate miners’ ability to crawl the Web and to produce indices of sufficient size in a timely manner, as to be useable in the context of block mining time scales. I utilize 4 index sizes, of 100, 500, 1000 and 2000 webpages per index, and I measure the performance of the miners over a period of one week for each of the index sizes. 98

The above index sizes were chosen with the desire to allow even miners of limited capacity, i.e., miners utilizing a bandwidth in the order of 1 Mbps, to participate, while still producing Google-scale indices. Indeed, if 100K miners, which is the estimated number of Bitcoin miners [78], were to participate using an index of 100 webpages, the resulting index will reach Google’s index size approximately within a month, and if an index of

1000 webpages were to be used, approximately within 3 days.

Figure 5.3 presents a CDF of the time intervals between the mining of new blocks for each of the index sizes I have tested. Note that identically to Bitcoin, Webcoin adjusts its difficulty target in an attempt to reach an average of 10 minutes, i.e., 600 seconds. It is evident that the index sizes I have tested allow for mining in the desired time scales.

The average time intervals between consecutive blocks are 582.75, 608.4, 608.2 and 610.1 seconds, for indices of 100, 500, 1000 and 2000 webpages, respectively. Webcoin thus had missed its target of 600 seconds by up to 17.25 seconds. For comparison, Bitcoin which its large miners community makes its mining time more statistically reliable, regularly misses its target of 600 seconds by similar and larger scales.

Another phenomena which can be observed is the similar distribution of mining times experienced by smaller index sizes, in contrast to that of indices consisting of 2000 web- pages. As the larger indices require longer times to produce, mining time as a whole lengthen. To compensate for this lengthening, Webcoin increases its target value, i.e., making mining easier by relaxing the validity requirements, in order to maintain an aver- age mining time of 600 seconds. The combined effect is of a denser distribution function, as the same number of blocks are mined during shorter time interval. 99

Figure 5.4. The partitioning of miners to quintiles (20% units) according to their download bandwidth, and the percentage of blocks mined by nodes in each quintile.

Figure 5.4 presents the distribution of mined blocks over all miners, which are parti- tioned to quintiles (20% units) based on their bandwidth, using a time span of 180 seconds in which block mining is prohibited. The distribution of mined blocks of size 1000 (blue) is almost equally distributed, ranging between 19% of blocks mined by 2nd quintile min- ers, and a maximum of 21% is mined by 5th quintile miners. Similar results are achieved when using smaller index sizes of 100 and of 500 webpages. In contrast, when using an index size of 2000 webpages (red), the 1st quintile miners mine only 11.2% of the blocks, while the miners of the other quintiles mine 23.1%, 22.2% 21.6% and 21.9% of blocks, respectively.

The low success-rate of mining the 1st quintile miners when using an index size of 2000 webpages is explained by the longer time they require to produce such indices, which often causes them to be ineligible to mine the following block. Figure 5.5 shows the CDF of the crawl times for the 200 miners as a function of the number of webpages crawled.

Necessarily, as the number of pages increases, so does the time required for nodes to 100

Figure 5.5. CDF of the number of seconds required for miners to crawl webpages for indexing, per index size. download them. In the case of an index size of 2000 webpages, I observe that the low- bandwidth miners can spend between 250 and 400 seconds in crawling. This corresponds to the tail shown in the figure for index size of 2000. Hence, such timescales overlap with inter-block timescales shown previously in Figure 5.3. Whenever the crawling time exceeds the inter-block times, a miner fails to qualify for mining a block in the following round. This explains the reduced performance of low-bandwidth miners for the index size of 2000 in Figure 5.4. I thus conclude that an index of size 2000 is too large to allow for fair mining. I further conclude that miner equality and performance can be balanced through the use of an index size of 1000, combined with a time span of 180 seconds in which mining is invalid.

5.5.2. Network Usage

Figure 5.6 presents the networking usage of a single Webcoin miner during the mining of two consecutive blocks, using an index size of 1000 webpages. The miner’s bandwidth, as measured periodically, is approximately 96 Mbps in the download direction and 4.1 Mbps in the upload direction. The figure presents both its download (red) and upload (blue) 101

Figure 5.6. Network usage of a Webcoin miner deployed on a PlanetLab node, during the mining of two consecutive blocks utilizing an index size of 1000 webpages. Dashed lines mark the periods in which mining is prohibited (between 1 and 7). Numbers mark the events of the mining process: (1) block is mined by another miner, (2) block and index download from peers, (3) block and index propagation on the upload direction, index validation on the download direction, (4) Web crawling and indexing, (5) block and index propagation completion, (6) Web crawling completion, (7) block mining begins. bandwidth utilization, as well as the events occurring during the mining process. The networking usage plot provides several insights regarding the dynamics of Webcoin mining, and the resources it consumes. First, I note the seven events which consists the mining of a single block, which are divided into 3 phases, with very different characteristics.

The first phase starts with the mining of a new block by a peer miner (denoted Line1), which indicates the beginning of the 180 seconds time span in which blocks cannot be mined. The block’s mining is closely followed by the arrival of the block and its index

(denoted Line2), and upon the block’s successful validation, the propagation of both to the miner’s peers, using its upload bandwidth (denoted Line3). At the same time, the miner utilizes its download bandwidth to validate the index he had received, its completion 102 denoted Line4. These events, which make up the first phase of the mining, occur in a rapid succession, and are completed within 12 seconds from the mining of the new block.

Note that the bandwidth requirements for this phase, including the statistical validation of the new index, are very limited, under 5 Mbps.

In the second phase, with the successful validation of the index (Line4), the miner begins its Web crawling and indexing on the download direction, while in parallel it con- tinues to propagate the block and the index it had received to its peers. Said propagation is completed at Line5, and the Web crawling and indexing at Line6, marking the end of the second phase. The second phase is considerably longer than the first, and it is there where the miner invests its networking resources to produce a new Web index. However, even in this more network-intensive phase the miner utilizes only a very small portion of its bandwidth. Indeed, despite a download bandwidth of almost 100 Mbps, the miner surpasses the use of 10 Mbps only thrice during the mining of the first block presented, and similar characteristics can be observed during the mining of the consecutive block.

This pattern allows for miners of low capacity to compare on even footing with miners of much greater capacity.

In the third phase, upon completing the Web crawl (Line6), the miner propagates the hashing of the index to its peers, and then awaits for the mining to resume with the passing f 180 seconds from the mining of the previous block, denoted Line7. Following this, the miner will quiescently await until such time that a new block is mined, which occurs at 563 seconds, denoted Line1. This phase requires very little resources, as it consists mostly of idle waiting for a new block to be mined. 103

It is essential to understand that the index hash, published by the miner at Line6, at time 152 seconds, makes the miner eligible to compete in the next round, the one for which mining starts at Line7 at 743 seconds. A miner thus has a sufficient time to crawl the Web and propagate its index hash. Observing the network usage as a whole, it can be seen that the networking resources needed for participation are meager.

5.5.3. Mining at Large Scales

I consider the scale’s effect on Webcoin’s ability to produce a new block every 10 minutes, for a Bitcoin-scale network of 100K miners, and the ability of a miner with a capacity of

1 Mbps bandwidth (both upload and download) to participate on equal grounds, despite increased volume of traffic.

Upon the mining of a new block, said block is propagated to all miners, followed by its index. According to Bitcoin measurements in [13, 15], the average time required for a block to reach the majority of nodes, over a 9 week period, was 5.9 seconds, and the average block size had been 700.7 KB. As Webcoin’s blocks are almost identical to

Bitcoin’s, and are of the same size, their propagation across a large network of 100K miners is expected to be identical. As Webcoin’s blocks are followed by their indices, which are approximately a half of a block’s size on average, yet may reach as much as twice the blocks’ size, I estimate their combined propagation to the majority of nodes to be less than 20 seconds, on average.

During the propagation, every miner to receive the block queries the block’s miner, to verify it indeed controls the IP address used for mining. In the event that the block’s miner has a bandwidth of 1 Mbps, it is likely to become a bottleneck. Each such request 104 contains 16 Bytes of the block’s digest, yet together with TCP/IP headers the load totals to 2.67 MB for the majority of requests. A low capacity miner will require approximately

21.3 seconds to respond to them. However, as such requests start to arrive to the miner almost immediately after the new block’s mining, this time is incurred in parallel to the propagation time, and the completion of both is expected within approximately 25 seconds from the block’s publishing time.

Following the block validation and the propagation of both block and index, each miner statistically validates the index, which for a low capacity miner will require 6.1 seconds, on average. Thus, a new block and its Web index can be propagated and validated by the majority of the system, within approximately 30 seconds from the new block’s mining, for a network of 100K miners.

The last factor which might affect scalability is the propagation of all the indices’ digests, sent for every Web index produced by the system. Whereas the digests are only

32 Bytes long, they incur an overhead which doubles their size when sent individually over TCP. They are thus aggregated by miners prior to their transmission. For a network of 100K miners, each miner must download 3.05 MB of hashes during a time span of

10 minutes, on average, which consumes only 0.04 Mbps of the miner’s bandwidth. I conclude that, in addition to empirical evidence, the analysis of the resources required to mine Webcoins in a network of 100K miners shows that even limited-capacity miners can participate on equal grounds in Webcoin’s mining. 105

5.5.4. Collectors’ Properties.

For a collector to be capable of collecting all the Web indices produced by a network of

100K miners, it must be capable of downloading them in the timespan required to mine a new block, i.e., 10 minutes on average. This is because miners have an incentive to publish their index only as long as there is a chance they will need to propagate it, in the event of successful mining in the next round. While miners must download 20.45 MB of data, on average, to create an index of 1000 webpages, the compressed index itself is approximately 2 orders of magnitude smaller. Thus, for a collector to collect all the indices produced by a network of 100K miners, it must download 20 GB in the span of 10 minutes, i.e., it requires only a consumer-grade bandwidth of 266 Mbps.

Collectors may use the indices they collect from miners for any purpose, yet in the context of Webcoin’s effort to democratize Web search, it is worth noting that the indices they collect must be merged and aggregated to enable a competitive real-world web search.

Identifying the best methods to construct a large-scale index from these indices is outside the scope of this work.

5.6. Related Work

Guided Tour Puzzle [1] is a Proof-of-Work in the networking domain, aiming to filter the originators of spam emails from legitimate senders, a task for which traditional Proof- of-Work had been proved inadequate [63]. Rather than requiring computational power, the puzzle requires a node to sequentially retrieve information from multiple servers in or- der to successfully create its Proof-of-Work. While successfully reducing the performance 106 variance among nodes, the Guided Tour Puzzle (i) is centrally managed, (ii) utilizes ded- icated servers, (iii) does not support any additional purpose such as Web search, and (iv) does not support any distributed consensus similar to crypto-currencies.

The authors of [47] suggest incorporating Internet transactions, e.g., BGP advertise- ments, in a blockchain to distribute the Internet management, and a similar suggestion to utilize a blockchain to store DNS entries is suggested in [2]. Yet, they do not change any of the Bitcoin mechanisms, but simply use Bitcoin for a different type of transactions.

Webcoin fundamentally alters Bitcoin mechanisms by utilizing networking resources.

The authors of [107] suggest DDoSCoin, a blockchain which uses a malicious Proof- of-Work, where miners are rewarded for participating in a DDoS attack on websites using

Transport Layer Security (TLS), and must present a signed response from the server which is smaller than DDoSCoin’s target. While this work provides the incentive to perform a malicious network task, the DDoSCoin’s Proof-of-Work is computational in nature, generated by the targeted website. Webcoin, in contrast, can be used for constructive purposes, and its Proof-of-Work does not require an objective computational component to fairly reward miners for the resources they have invested.

Several “alt-coins” aimed to become ASIC-resistant. First among these solutions, is the use of Scrypt [81] as a hashing function, rather than Bitcoin’s SHA-256 [43]. Scrypt requires frequent memory access, instead of relying solely on computational power [34].

Unfortunately, the first generation of scrypt-mining ASIC machines have been introduced by multiple vendors during 2014 [14, 91]. Webcoin utilizes a novel block mining algorithm which enables miners with moderate network capacities to participate. 107

5.7. Discussion

Practical Challenges in Crawling and Indexing. While Webcoin is the first primitive designed to crowdsource complex networking tasks, and the first digital currency to incentivize open Web indexing, it must surmount several significant challenges in order to disrupt the Web search market.

First, Modern webpages are complex; they contain dependancies among their com- ponents, and consists of dynamic content, images, video clips, and code scripts. The challenge in crawling and indexing them in a consistent manner across all miners is sig- nificant, and it must be resolved for Webcoin to enable new high-quality search engines.

Second, Webcoin must overcome crawling permissions conflicts. Webpage administrators aim to allow search engines to access to their content, while denying access from others who might take advantage of their content, e.g., making their content freely available and harm their revenues. While known search engines’ crawlers can be identified based on their IP addresses, the same cannot be easily done for Webcoin. For Webcoin to gain traction, it should incentivize webpage administrators to allow miners to crawl their web- pages, possibly by improving webpage visibility and by providing proof of past Webcoin blocks they produced.

Another real-world challenge is to prevent firewalls from interfering with Webcoin’s operation, as is often the case with peer-to-peer communication patterns. This matter would have to be addressed in order for Webcoin to gain traction. The fashion in which webpages to be crawled are selected can also be improved upon, as more popular webpages should be crawled and indexed more frequently than less popular ones. It’s likely possible to improve the indices Webcoin produces by utilizing Webcoin’s blockchain to reach a 108 consensus regarding the popularity of domains and webpages, which would affect the frequency at which they are crawled.

IP Addresses. Webcoin uses IP addresses to direct miners on different crawl paths.

While miners can use multiple IP addresses, and they may have some level of control over their IP address, each address would direct the miner to crawl a different path, and a miner would require a bandwidth of roughly 1 Mbps per IP address, which does not easily aggregate by the same orders of magnitude. Thus, the use of IP addresses does not jeopardize Webcoin’s security. However, I believe Webcoin would benefit from avoiding any reference to miners’ IP addresses in the mining process altogether, as they make miners susceptible to DDoS, spoofing, and BGP hijacking attacks. In addition, the usage of IP addresses allows for only a single miner behind each NAT, which is a significant limitation. I am actively exploring changes to the mining process which would remove this dependency on miners’ IP addresses.

Additional Networking Tasks. Webcoin’s goal is to incentivize miners to perform

Web crawling and indexing to democratize the Web search market. As such, its attributes were chosen to allow even miners with limited capacity to perform these tasks, and to minimize the load on miners and servers. However, Bitcoin’s incentive mechanism can be similarly utilized for a wide range of networking tasks. For example, miners can be incentivized to measure latencies and routing paths towards a changing subset of miners, where each miner requests for signed proofs from the miners he’s targeting, and the

“winner” miner must present these proofs in order for its block to be considered valid.

Combining traceroute and reverse-traceroute, and utilizing a concept similar to collectors 109 for entities interested in such measurements, e.g., CDNs, VPNs, and ISPs, can provide additional accuracy and reliability.

Diverging from Bitcoin’s Proof-of-Work. While I have made every attempt to keep Webcoin as close to Bitcoin as possible, the need to prevent Webcoin’s Proof-of-

Work from affecting a miner’s success rate had forced me to make adjustments. While some adjustments, e.g., miners’ quiescent mining competition, have equivalences in the

Proof-of-Stake domain, others, such as determining the “winner” miner based on arbi- trary attributes of multiple past blocks, are unique to Webcoin. Most notably, while the correctness of Bitcoin’s Proof-of-Work is not time-sensitive, the correctness of Webcoin’s

Proof-of-Work is temporal due to the Web’s ever-changing nature. As such, Webcoin’s consensus requires witnessing blocks as they come, which is a substantial divergence from

Bitcoin’s design. It is worth noting that Webcoin still requires the majority of mining power to alter the blockchain, however it appears that introducing “check-points” as in

PoS, where the state of the blockchain is finalized periodically, will improve Webcoin’s security and robustness.

Additional Network Load. As Webcoin miners crawl the Web and report their indices to collectors, they increase the load on the servers hosting said webpages, and on the networks which connect them. This load is similar to the load of every other search engine’s crawler, centralized or decentralized, and I thus consider such load reasonable, similarly to the legitimacy of an additional search engine. I acknowledge the need for additional real-world measurements at a larger scale to assess how the size of the Webcoin network affects the load on hosting servers, and possibly to optimize the parameters of the index statistical validation accordingly. 110

Economical Incentives. Webcoin’s economic model follows the footsteps of other single-purpose blockchain-based ecosystems; the value of Webcoins is derived both from its usage and from its exchange rate [107, 39, 77, 98]. As the importance of online privacy and the drawbacks of ad-based revenue models become apparent in mainstream media [99,

105], the use of Webcoins can appeal to many who value their privacy. Incorporating “hot wallets” in Web browsers, similarly to Google’s recent patent [62], can seamlessly make micro-payments to online services in Webcoins, which in turn increases their demand, increases their price, and incentivizes Webcoin miners. Users can thus attain Webcoins through mining or exchange, and search engines and similar online services can offer their services either under the existing ad-based model, or as seamless “almost free” service.

More radical and complex models, where online services can sign long-term contract with miners to buy Webcoins at a discount, and sell these Webcoin to users, allowing both miners and services to profit, or forgoing user payments altogether and require users to provide a cryptographic proof of holding some amount of Webcoins instead, which similarly increases Webcoin’s demand, are also possible. While I have designed Webcoin as a system in and by itself, it might be expanded to other networking tasks to create a more robust ecosystem of online services.

Real-World Implications. I conclude by emphasizing that Webcoin does not dis- rupt the Web search industry by itself. Rather, it provides the key prerequisite for such disruption to occur: an open access to a high-quality global Web index for all to study, experiment, and use. 111

CHAPTER 6

Conclusion

Blockchains are a novel and disruptive P2P technology, yet they had thus far failed to gain significant real-world traction in comparison to their centralized counterparts. Their lack of success is mostly due to their inherent real-world constraints, and most notably, the blockchain ”scalability problem”. I had defined the blockchain Network Layer, and demonstrated how it is the root-cause of the scalability problem. I further showed that the blockchain Network Layer, and the utilization of blockchains’ networking resources, hold the key to some of the most pressing challenges blockchains face, including scalability, mining centralization, wastefulness and utilization.

First, I provided an overview of blockchains, their operation, and their different par- ticipants. I outlined the four layers in the blockchain stack and defined was the first to define the blockchain Network Layer, and the functionalities of which it consists.

Next, I presented an analysis of Bitcoin’s scalability, and demonstrated how scaling a blockchain affects its security, usability, and level of decentralization, due to the linear effect larger blocks and shorter inter-block time intervals have on block propagation. I further showed how the root-cause of the blockchain scalability problem is rooted in the blockchain Network Layer, and the need for continuous swift propagation of data to all blockchain participants.

I then presented a new networking primitive to resolve the bottleneck at the blockchain

Network Layer, the Blockchain Distribution Network (BDN), and detailed the operation 112 of bloXroute, the first BDN. I described the methods bloXroute use to quickly dissemi- nate information, enabling blockchains to process orders of magnitude larger blocks, and the means used to prevent bloXroute from censorship or discrimination. I evaluated the performance of bloXroute using empirical data from its real-world deployment, and ex- plained how bloXroute provides for the first time an incentive for miners not to coalesce into mining pools.

Lastly, I presented how blockchains can utilize their networking resources to enable new functionalities, disrupt new industries, and mitigate their wastefulness. I described the design, implementation, and evaluation of Webcoin, the first blockchain to utilize networking resources, rather than computational resources, which is crowdsources the production of a Google-scale Web index to its miners. I emphasized the importance of

Webcoin in “democratizing” the search engine industry, and its implications on informa- tion availability on a global scale.

In the decade which had passed since Bitcoin’s inception and the invention of the blockchain, significant efforts had been invested in developing cryptographic solutions at the Consensus Layer, in order to advance blockchain technology. During this time, the blockchain Network Layer was often overlooked and largely ignored. However, as I demonstrated, the underlying cause for some of the most significant and pressing chal- lenges blockchains face today is rooted at the blockchain Network Layer, and it is the blockchain Network Layer and the networking resources utilized by blockchains which hold the key to overcome these challenges. 113

References

[1] Abliz, M., and Znati, T. A Guided Tour Puzzle for Denial of Service Prevention.

In Computer Security Applications Conference, 2009. ACSAC’09. Annual (2009),

IEEE, pp. 279–288.

[2] Ali, M., Nelson, J. C., Shea, R., and Freedman, M. J. Blockstack: A global

naming and storage system secured by blockchains. In USENIX Annual Technical

Conference (2016), pp. 181–194.

[3] Anh, V. N., and Moffat, A. Inverted Index Compression using Word-Aligned

Binary Codes. Information Retrieval 8, 1 (2005), 151–166.

[4] Antonopoulos, A. M. Mastering Bitcoin: unlocking digital cryptocurrencies.”

O’Reilly Media, Inc.”, 2014.

[5] Antonopoulos, A. M., and Wood, G. Mastering Ethereum: Building Smart

Contracts and Dapps. O’Reilly Media, 2018.

[6] Arias, E., and Guinot, B. Coordinated Universal Time UTC: Historical Back-

ground and Perspectives. In Journees systemes de reference spatio-temporels (2004). 114

[7] Armknecht, F., Karame, G. O., Mandal, A., Youssef, F., and Zen-

ner, E. Ripple: Overview and outlook. In International Conference on Trust and

Trustworthy Computing (2015), Springer, pp. 163–180.

[8] Bach, L., Mihaljevic, B., and Zagar, M. Comparative analysis of blockchain

consensus algorithms. In 2018 41st International Convention on Information and

Communication Technology, Electronics and Microelectronics (MIPRO) (2018),

IEEE, pp. 1545–1550.

[9] Back, A., et al. Hashcash-a denial of service counter-measure, 2002.

[10] Barber, S., Boyen, X., Shi, E., and Uzun, E. Bitter to Better – How to Make

Bitcoin a Better Currency. In International Conference on Financial Cryptography

and Data Security (2012), Springer, pp. 399–414.

[11] Bender, M., Michel, S., Triantafillou, P., Weikum, G., and Zimmer,

C. Minerva: Collaborative P2P Search. In Proceedings of the 31st international

conference on Very large data bases (2005), VLDB Endowment, pp. 1263–1266.

[12] Bitcion Charts and Graphs - Blockchain, 2017. www.blockchain.info/charts.

[13] Bitcoinity.org, 2018. http://www.data.bitcoinity.org.

[14] ASICS for Litecoin. Here They Come. https://bitcoinmagazine.com/articles/

asics-litecoin-come-1394826069/.

[15] Bitcoinstats, 2018. http://www.bitcoinstats.com. 115

[16] BitcoinStats, 2018. www.bitcoinstats.com/network/propagation/.

[17] Global Bitcoin Nodes Distribution - Bitnodes, 2017. www.bitnodes.21.co.

[18] Boettiger, C. An introduction to docker for reproducible research. ACM SIGOPS

Operating Systems Review 49, 1 (2015), 71–79.

[19] Boldi, P., Codenotti, B., Santini, M., and Vigna, S. Ubicrawler: A Scal-

able Fully Distributed Web Crawler. Software: Practice and Experience 34, 8 (2004),

711–726.

[20] Bosch, A., Bogers, T., and Kunder, M. Estimating Search Engine Index Size

Variability: a 9-Year Longitudinal Study. Scientometrics 107, 2 (2016), 839–856.

[21] Burchert, C., Decker, C., and Wattenhofer, R. Scalable funding of bitcoin

micropayment channel networks. Royal Society open science 5, 8 (2018), 180089.

[22] Cachin, C., and Vukolic,´ M. Blockchains consensus protocols in the wild. arXiv

preprint arXiv:1707.01873 (2017).

[23] Cashaccount, 2018. www.cashaccount.info.

[24] Chauhan, A., Malviya, O. P., Verma, M., and Mor, T. S. Blockchain and

scalability. In 2018 IEEE International Conference on Software Quality, Reliability

and Security Companion (QRS-C) (2018), IEEE, pp. 122–128. 116

[25] Chun, B., Culler, D., Roscoe, T., Bavier, A., Peterson, L., Wawrzo-

niak, M., and Bowman, M. Planetlab: an Overlay Testbed for Broad-Coverage

Services. ACM SIGCOMM Computer Communication Review 33, 3 (2003), 3–12.

[26] Bitcoin Will Need to Scale to Levels Much Higher Than Visa, Mastercard, and

PayPal Combined, 2017. www.coinjournal.net/bitcoin-will-need-to-scale-to-levels-

much-higher-than-visa-mastercard-and-paypal-combined/.

[27] Compass. www.compass-project.org/.

[28] Croman, K., Decker, C., Eyal, I., Gencer, A. E., Juels, A., Kosba, A.,

Miller, A., Saxena, P., Shi, E., Sirer, E. G., Song, D., and Watten-

hofer, R. On scaling decentralized blockchains. In International Conference on

Financial Cryptography and Data Security (2016), Springer, pp. 106–125.

[29] Dai, W. b-money, 1998. www.weidai.com/bmoney.txt.

[30] De Kunder, M. The Size of the World Wide Web. WorldWideWebSize (2012).

[31] Decker, C., and Wattenhofer, R. Information propagation in the bitcoin

network. In Peer-to-Peer Computing (P2P), 2013 IEEE Thirteenth International

Conference on (2013), IEEE, pp. 1–10.

[32] Bitcoin Energy Consumption Index - Digiconomist. www.digiconomist.net/

bitcoin-energy-consumption.

[33] Dimitri, N. Bitcoin mining as a contest. Ledger 2 (2017), 31–37. 117

[34] Dziembowski, S. Introduction to Cryptocurrencies. In Proceedings of the 22nd

ACM SIGSAC Conference on Computer and Communications Security (2015),

ACM, pp. 1700–1701.

[35] Eyal, I., Gencer, A. E., Sirer, E. G., and Van Renesse, R. Bitcoin-ng: A

scalable blockchain protocol. In NSDI (2016), pp. 45–59.

[36] Eyal, I., and Sirer, E. G. Majority is Not Enough: Bitcoin Mining is Vulnerable.

In International Conference on Financial Cryptography and Data Security (2014),

Springer, pp. 436–454.

[37] Faroo Search Engine. http://www.faroo.com/index.en.html.

[38] Feily, M., Shahrestani, A., and Ramadass, S. A Survey of Botnet and Bot-

net Detection. In Emerging Security Information, Systems and Technologies, 2009.

SECURWARE’09. Third International Conference on (2009), IEEE, pp. 268–273.

[39] Freund, A., and Stanko, D. The wolf and the caribou: Coexistence of decen-

tralized economies and competitive markets. Journal of Risk and Financial Man-

agement 11, 2 (2018), 26.

[40] Gazi, P., Kiayias, A., and Zindros, D. Proof-of-stake sidechains. In IEEE

Symposium on Security & Privacy (2019).

[41] Gencer, A. E., Basu, S., Eyal, I., van Renesse, R., and Sirer, E. G.

Decentralization in bitcoin and ethereum networks. arXiv preprint arXiv:1801.03998

(2018). 118

[42] Gilad, Y., Hemo, R., Micali, S., Vlachos, G., and Zeldovich, N. Algo-

rand: Scaling byzantine agreements for cryptocurrencies. In Proceedings of the 26th

Symposium on Operating Systems Principles (2017), ACM, pp. 51–68.

[43] Gilbert, H., and Handschuh, H. Security Analysis of SHA-256 and Sisters.

In International Workshop on Selected Areas in Cryptography (2003), Springer,

pp. 175–193.

[44] Gormley, C., and Tong, Z. Elasticsearch: The Definitive Guide. ” O’Reilly

Media, Inc.”, 2015.

[45] Grainger, T., Potter, T., and Seeley, Y. Solr in Action. Manning, 2014.

[46] Hammi, M. T., Hammi, B., Bellot, P., and Serhrouchni, A. Bubbles of

trust: A decentralized blockchain-based authentication system for iot. Computers

& Security 78 (2018), 126–142.

[47] Hari, A., and Lakshman, T. The Internet Blockchain: A Distributed, Tamper-

Resistant Transaction Framework for the Internet. In Proceedings of the 15th ACM

Workshop on Hot Topics in Networks (2016), ACM, pp. 204–210.

[48] Honest.cash, 2018. www.honest.cash.

[49] Hopwood, D., Bowe, S., Hornby, T., and Wilcox, N. Zcash protocol spec-

ification. Tech. rep., Technical report, 2016–1.10. Zerocoin Electric Coin Company,

2016. 119

[50] Hughes, E. A cypherpunk’s manifesto. URL (accessed 3 August 2004):

http://www. activism. net/cypherpunk/manifesto. html (1993).

[51] Isele, R., Umbrich, J., Bizer, C., and Harth, A. LDspider: An Open-

Source Crawling Framework for the Web of Linked Data. In Proceedings of the 2010

International Conference on Posters & Demonstrations Track-Volume 658 (2010),

CEUR-WS. org, pp. 29–32.

[52] Jagatic, T. N., Johnson, N. A., Jakobsson, M., and Menczer, F. Social

Phishing. Communications of the ACM 50, 10 (2007), 94–100.

[53] Johnson, B., Laszka, A., Grossklags, J., Vasek, M., and Moore, T.

Game-theoretic analysis of ddos attacks against bitcoin mining pools. In Interna-

tional Conference on Financial Cryptography and Data Security (2014), Springer,

pp. 72–86.

[54] Jung, J., Krishnamurthy, B., and Rabinovich, M. Flash Crowds and Denial

of Service Attacks: Characterization and Implications for CDNs and Web Sites. In

Proceedings of the 11th international conference on World Wide Web (2002), ACM,

pp. 293–304.

[55] Kermani, P., and Kleinrock, L. Virtual cut-through: A new computer com-

munication switching technique. Computer Networks (1976) 3, 4 (1979), 267–286. 120

[56] Kiayias, A., Livshits, B., Mosteiro, A. M., and Litos, O. S. T. A

puff of steem: Security analysis of decentralized content curation. arXiv preprint

arXiv:1810.01719 (2018).

[57] King, S., and Nadal, S. Ppcoin: Peer-to-Peer Crypto-Currency with Proof-of-

Stake. self-published paper, August 19 (2012). www.wallet.peercoin.net/assets/

paper/peercoin-paper.pdf.

[58] Kong, J. S., Sarshar, N., and Roychowdhury, V. P. Experience versus

talent shapes the structure of the web. Proceedings of the National Academy of

Sciences 105, 37 (2008), 13724–13729.

[59] Kosba, A., Miller, A., Shi, E., Wen, Z., and Papamanthou, C. Hawk:

The blockchain model of cryptography and privacy-preserving smart contracts. In

2016 IEEE symposium on security and privacy (SP) (2016), IEEE, pp. 839–858.

[60] Kumar, A., Fischer, C., Tople, S., and Saxena, P. A traceability analysis

of monero’s blockchain. In European Symposium on Research in Computer Security

(2017), Springer, pp. 153–173.

[61] Kwak, H., Lee, C., Park, H., and Moon, S. What is twitter, a social network

or a news media? In Proceedings of the 19th international conference on World wide

web (2010), AcM, pp. 591–600.

[62] Langschaedel, J., Armstrong, B. D., and Ehrsam, F. E. Hot wallet for

holding bitcoin, Sept. 17 2015. US Patent App. 14/660,418. 121

[63] Laurie, B., and Clayton, R. “Proof-of-Work” Proves Not to Work; version 0.2.

In Workshop on Economics and Information, Security (2004).

[64] Lee, H.-T., Leonard, D., Wang, X., and Loguinov, D. IRLbot: Scaling to

6 Billion Pages and Beyond. ACM Transactions on the Web (TWEB) 3, 3 (2009),

8.

[65] Leela, K., and Haritsa, J. SphinX: Schema-Conscious XML Indexing, 2001.

http://dsl.cds.iisc.ac.in/pub/TR/TR-2001-04.pdf.

[66] Lind, J., Eyal, I., Pietzuch, P., and Sirer, E. G. Teechan: Pay-

ment channels using trusted execution environments, 2017. URL https://arxiv.

org/pdf/1612.07766. pdf.[Online (2017).

[67] Majestic 12 Search Engine. https://www.majestic12.co.uk/about.php.

[68] Maker DAO, 2018. www.makerdao.com.

[69] Marshall, A. W., and Olkin, I. A multivariate exponential distribution. Jour-

nal of the American Statistical Association 62, 317 (1967), 30–44.

[70] McCandless, M., Hatcher, E., and Gospodnetic, O. Lucene in Action:

Covers Apache Lucene 3.0. Manning Publications Co., 2010.

[71] Menegay, P., Salyers, J., and College, G. Secure communications us-

ing blockchain technology. In MILCOM 2018-2018 IEEE Military Communications

Conference (MILCOM) (2018), IEEE, pp. 599–604. 122

[72] Miller, A., Juels, A., Shi, E., Parno, B., and Katz, J. Permacoin: Repur-

posing Bitcoin Work for Data Preservation. In 2014 IEEE Symposium on Security

and Privacy (2014), IEEE, pp. 475–490.

[73] Miller, A., Litton, J., Pachulski, A., Gupta, N., Levin, D., Spring, N.,

and Bhattacharjee, B. Discovering bitcoin’s public topology and influential

nodes. et al. (2015).

[74] Morgan Stanley Research: Global Big Debates - 2018, 2018. https:

//pwm.morganstanley.com/weilmangrasman/mediahandler/media/118821/

GLOBAL_20180119_0000.pdf.

[75] Nakamoto, S. Bitcoin: A Peer-to-Peer Electronic Cash System, 2008.

www.bitcoin.org.

[76] Operating system market share, 2017. http://www.netmarketshare.com/

operating-systemmarket-share/.

[77] of Money Research Collaborative:, F., Nelms, T. C., Maurer, B.,

Swartz, L., and Mainwaring, S. Social payments: Innovation, trust, bitcoin,

and the sharing economy. Theory, Culture & Society 35, 3 (2018), 13–33.

[78] Neighborhood Pool Watch, 2018. http://www.organofcorti.blogspot.mx/.

[79] Park, S., Pietrzak, K., Alwen, J., Fuchsbauer, G., and Gazi, P. Space-

coin: A Cryptocurrency Based on Proofs of Space. Tech. rep., IACR Cryptology

ePrint Archive, 2015: 528, 2015. 123

[80] Parreira, J. X., Donato, D., Michel, S., and Weikum, G. Efficient and

Decentralized Pagerank Approximation in a Peer-to-Peer Web Search Network. In

Proceedings of the 32nd international conference on Very large data bases (2006),

VLDB Endowment, pp. 415–426.

[81] Percival, C. Stronger Key Derivation via Sequential Memory-Hard Functions.

Self-published (2009), 1–16. www.bsdcan.org/2009/schedule/attachments/87_

scrypt.pdf.

[82] Peterson, J., and Krug, J. Augur: a decentralized, open-source platform for

prediction markets. arXiv preprint arXiv:1501.01042 (2015).

[83] Poon, J., and Buterin, V. Plasma: Scalable autonomous smart contracts. White

paper (2017).

[84] Pujol, J., and Rodriguez, P. Porqpine: a Distributed Social Search Engine. In

In Proc. of WWW’09 (Madrid, Spain, Apr. 2009).

[85] Pyminer. http://www.github.com/jgarzik/pyminer.

[86] Qiu, D., and Srikant, R. Modeling and performance analysis of bittorrent-like

peer-to-peer networks. In ACM SIGCOMM computer communication review (2004),

vol. 34, ACM, pp. 367–378.

[87] Recabarren, R., and Carbunar, B. Hardening stratum, the bitcoin pool min-

ing protocol. Proceedings on Privacy Enhancing Technologies 2017, 3 (2017), 57–74. 124

[88] Rosenfeld, M. Analysis of bitcoin pooled mining reward systems. arXiv preprint

arXiv:1112.4980 (2011).

[89] Sankaralingam, K., Sethumadhavan, S., and Browne, J. C. Distributed

Pagerank for P2P Systems. In High Performance Distributed Computing, 2003.

Proceedings. 12th IEEE International Symposium on (2003), IEEE, pp. 58–68.

[90] Scrapy. http://www.scrapy.org.

[91] Serapiglia, A., Serapiglia, C. P., and McIntyre, J. Crypto Currencies:

Core Information Technology and Information System Fundamentals Enabling Cur-

rency without Borders. Information Systems Education Journal 13, 3 (2015), 43.

[92] Sompolinsky, Y., Lewenberg, Y., and Zohar, A. Spectre: A fast and scal-

able cryptocurrency protocol. IACR Cryptology ePrint Archive 2016 (2016), 1159.

[93] Sompolinsky, Y., and Zohar, A. Phantom: A scalable blockdag protocol, 2018.

[94] Speedtest Market Report, 2017. www.speedtest.net/reports/.

[95] State of the Bitcoin Network, 2017. www.hackingdistributed.com/2017/02/15/

state-of-the-bitcoin-network.

[96] Su, A.-J., Choffnes, D. R., Kuzmanovic, A., and Bustamante, F. E.

Drafting behind akamai: Inferring network conditions based on cdn redirections.

IEEE/ACM Transactions on Networking (TON) 17, 6 (2009), 1752–1765.

[97] Swan, M. Blockchain: Blueprint for a new economy. ” O’Reilly Media, Inc.”, 2015. 125

[98] Swartz, L. Blockchain dreams: Imagining techno-economic alternatives after bit-

coin. Another economy is possible (2017), 82–105.

[99] Yes Facebook is using your 2FA phone number to tar-

get you with ads. https://www.techcrunch.com/2018/09/27/

yes-facebook-is-using-your-2fa-phone-number-to-target-you-with-ads/.

[100] Treasury Designates Iran-Based Financial Facilitators of Malicious Cyber Activ-

ity and for the First Time Identifies Associated Digital Currency Addresses, 2018.

https://home.treasury.gov/news/press-releases/sm556.

[101] Vasek, M., Thornton, M., and Moore, T. Empirical analysis of denial-of-

service attacks in the bitcoin ecosystem. In International Conference on Financial

Cryptography and Data Security (2014), Springer, pp. 57–71.

[102] Velner, Y., Teutsch, J., and Luu, L. Smart contracts make bitcoin mining

pools vulnerable. In International Conference on Financial Cryptography and Data

Security (2017), Springer, pp. 298–316.

[103] Wang, J., and Guo, Y. Scrapy-Based Crawling and User-Behavior Characteris-

tics Analysis on Taobao. In Cyber-Enabled Distributed Computing and Knowledge

Discovery (CyberC), 2012 International Conference on (2012), IEEE, pp. 44–52.

[104] Wheeler, K. A., Bowers, A. W., Wong, C. H., Palmer, J. Y., and

Wang, X. A power quality and load analysis of a cryptocurrency mine. In 2018

IEEE Electrical Power and Energy Conference (EPEC) (2018), IEEE, pp. 1–6. 126

[105] GOOGLE’S PRIVACY WHIPLASH SHOWS BIG TECH’S

INHERENT CONTRADICTIONS. www.wired.com/story/

googles-privacy-whiplash-shows-big-techs-inherent-contradictions/.

[106] Wood, G. Ethereum: A Secure Decentralised Generalised Transaction Ledger.

Ethereum Project Yellow Paper (2014).

[107] Wustrow, E., and VanderSloot, B. Ddoscoin: Cryptocurrency with a ma-

licious proof-of-work. In Proceedings of the 10th USENIX Workshop on Offensive

Technologies (2016), USENIX Association.

[108] YaCy Search Engine. http://www.yacy.net.

[109] Yin, F., Dong, D., Li, S., Guo, J., and Chow, K. Java performance trou-

bleshooting and optimization at alibaba. In Proceedings of the 40th International

Conference on Software Engineering: Software Engineering in Practice (2018),

ACM, pp. 11–12.

[110] Yin, H., Liu, X., Zhan, T., Sekar, V., Qiu, F., Lin, C., Zhang, H., and

Li, B. Design and deployment of a hybrid cdn-p2p system for live video streaming:

experiences with livesky. In Proceedings of the 17th ACM international conference

on Multimedia (2009), ACM, pp. 25–34.

[111] Zhang, R., and Preneel, B. On the necessity of a prescribed block validity

consensus: Analyzing bitcoin unlimited mining protocol. In Proceedings of the 13th 127

International Conference on emerging Networking Experiments and Technologies

(2017), ACM, pp. 108–119.

[112] Zhao, M., Aditya, P., Chen, A., Lin, Y., Haeberlen, A., Druschel, P.,

Maggs, B., Wishon, B., and Ponec, M. Peer-assisted content distribution in

akamai netsession. In Proceedings of the 2013 conference on Internet measurement

conference (2013), ACM, pp. 31–42.

[113] Zheng, Z., Xie, S., Dai, H., Chen, X., and Wang, H. An overview of

blockchain technology: Architecture, consensus, and future trends. In Big Data

(BigData Congress), 2017 IEEE International Congress on (2017), IEEE, pp. 557–

564.

[114] Zheng, Z., Xie, S., Dai, H.-N., Chen, X., and Wang, H. Blockchain chal-

lenges and opportunities: A survey. International Journal of Web and Grid Services

14, 4 (2018), 352–375.

[115] Zimmermann, H. Osi reference model–the iso model of architecture for open sys-

tems interconnection. IEEE Transactions on communications 28, 4 (1980), 425–432.