
Knowledge Discovery on Blockchains: Challenges and Opportunities for Distributed Event Detection under Constraints*

Cedric Sanders^1 and Thomas Liebig^1,2,3

^1 TU Dortmund, Artificial Intelligence Unit, 44269 Dortmund, Germany
[email protected], [email protected]
http://www-ai.cs.uni-dortmund.de/index.html
^2 University of Nicosia, Artificial Intelligence Unit, PO 24005 CY-1700 Nicosia, Cyprus
^3 Materna Information & Communications SE, Artificial Intelligence Unit, 44141 Dortmund, Germany

Abstract. We study the applicability of blockchain technology for distributed event detection under resource constraints. To this end, we provide a test suite with several promising consensus methods (Proof-of-Work, Proof-of-Stake, Distributed Proof-of-Work, and Practical Proof-of-Kernel-Work). This is the first work analyzing the communication costs of blockchain consensus methods for knowledge discovery tasks on resource-constrained devices. The experiments reveal that our proposed implementations of Distributed Proof-of-Work and Practical Proof-of-Kernel-Work provide a benefit over Proof-of-Work in CPU usage and communication costs. The tests further show that the proposed blockchain implementations could be integrated in cases of low data rates, where latencies caused by mining do no harm. However, using a blockchain requires data broadcasts, which lead to communication overhead as well as memory requirements that depend on the address list.

Keywords: Blockchain · Consensus Method · Ubiquitous Knowledge Discovery.

arXiv:1904.07104v2 [cs.DC] 6 Oct 2020

1 Introduction

The current shift towards edge analysis and distributed knowledge discovery [8, 11] is mostly driven by making use of large computation clusters and the internet of things. Indeed, applications that benefit from decentralized data management and analysis are, amongst others, sensor networks and mobility-based services. In both scenarios, a potentially large number of heterogeneous devices is connected and forms a system. The differences amongst the devices could be vast: computation power, memory limitations, energy consumption, etc. Besides, possible applications pose varying requirements for data management. While security is more critical when processing vulnerable private information (e.g., medical data), memory consumption or power consumption could be more essential for other application domains. Once the sensor data in the mesh is to be analyzed, one faces the challenge of how to store the data in a distributed way and how to perform the analysis on this data. This also incorporates the problem of keeping the information amongst the devices consistent.

A possible technology that might provide a solution to these issues is the blockchain. A blockchain is a sequence of unbreakably linked tuples, so-called blocks, each consisting of data or transactions, a timestamp, and the hash value of the ancestral block. A consensus method is required to extend such a blockchain; this is a procedure by which multiple network participants find a new block that is added to the blockchain. Existing consensus methods have requirements in computation costs that ubiquitous devices hardly meet.

Thus, the paper-at-hand fits under the topic of ubiquitous knowledge discovery [13], which connects current advances of data mining and machine learning with the latest developments in internet-of-things and mobile, distributed systems of heterogeneous devices. This work, therefore, aims at answering the question of whether blockchains are a technically mature method to process data in distributed heterogeneous networks. We examine various consensus methods and provide experimental figures that assist in assessing the general utility of the different blockchain technologies.

* Supported by German Research Foundation DFG under grant SFB 876 "Providing Information by Resource-Constrained Data Analysis", project B4 "Analysis and Communication for Dynamic Traffic Prognosis".
This raises the following questions:

– How should a consensus method operate so that it meets the requirements on CPU usage, memory usage, and power consumption that originate from decentralized usage?
– Which drawbacks make current consensus methods inapplicable, and how could they be tackled?
– Which challenges and requirements remain after the analysis of the consensus methods?

Many domains for decentralized knowledge discovery could be imagined. Especially citizen science projects, where citizens build sensors and voluntarily collect data, pose opportunities for distributed immutable data storage without any centralized coordinator. As a blockchain neither alters the data nor restricts access, the analysis results will not differ from those of a knowledge discovery in databases [5] or a streaming method [4]. However, some consensus methods are more suitable than others. To assess the methods and evaluate communication load, memory consumption, and CPU load, we carry out experiments using publicly available distributed sensor data from opensensemap^1. It is a citizen science project (maintained by the University of Münster) and guides the direction for future applications. We use a state-of-the-art event monitoring method, geometric monitoring [19], which uses very little communication to monitor a global function. Thus, we propose a fully decentralized application of the previously coordinated geometric monitoring process. As another contribution, we are the first to implement and evaluate the initially vague proposal of Distributed Proof-of-Work [2]. For this, we borrow some concepts from Practical Proof-of-Kernel-Work. The latter is included in our software library and (in contrast to the previously proprietary implementation) made available for open public development for the first time^2.

The second section of the paper presents related work. It is followed by a general introduction to the functionality of a blockchain. The fourth section describes different approaches for achieving consensus in a blockchain and analyzes their advantages and disadvantages. The fifth section puts the methods to the test in an experiment and evaluates the results. The last section concludes the paper.

^1 https://opensensemap.org/

2 Related Work

The field of ubiquitous knowledge discovery is established [13, 20] and nowadays receives much attention [8, 11], not only at major data mining and machine learning symposiums but, with the spread of Industry 4.0 and internet-of-things, also in application domains. Nevertheless, just a few works focus on the chances a decentralized immutable storage of data could have for knowledge discovery and information retrieval. One famous exception is the application to health care data [7], which focuses on automated distributed monitoring of patients. Another highlight was the recent initial coin offering of a machine learning blockchain [3]. The authors offered a market space for algorithms and data, based on smart contracts, but it lacked a smart consensus method for balancing the workload. In the following, we briefly describe how a blockchain operates.

3 Blockchain Fundamentals

In the following, we give a brief introduction to blockchain technology. Hash functions will play an important role in the next sections. It is therefore important to recall that they are one-way functions, which are easy to compute but hard to reverse. A common choice for such a hash function is SHA256 [18]. This hash function is a combination of bitwise logical functions (AND, OR, XOR) and shifts (LSHIFTS, RSHIFTS); for the details, we refer the interested reader to the secure hash standard definition in [18]. Bitwise manipulation is part of the basic instruction set of most computer chips nowadays, which speeds up computation. Another important property of these hash functions is that they most likely map different inputs to different outputs^3.

^2 Our sources and link to the data are available at https://bitbucket.org/cedric_sanders/abschlussarbeit/src/master/.

Blockchains first gained attention with the publication of the white paper "Bitcoin: A Peer-to-Peer Electronic Cash System" by Satoshi Nakamoto [16]. The blockchain is described as a data structure that consists of smaller elements, the so-called blocks. A block comprises:

1. data^4: contains the actual observations (e.g., transactions or sensor readings),
2. timestamp: is used to define a temporal order on blocks,
3. hash: hash value of the previous block.

Every block contains the hash of the previous block, which in turn holds the hash of its predecessor. In case one of the old blocks is modified it is simple to recognize in future blocks as the hash value will not fit the one stored previously. To use this data structure in a decentralized network, a consensus method has to be added.
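The linking of blocks and the detection of a modified block can be illustrated with a minimal Python sketch. It is not taken from the authors' library; the function names (`block_hash`, `make_block`, `chain_is_consistent`), the JSON encoding of a block, and the all-zero marker for the first block are assumptions made for illustration.

```python
import hashlib
import json
import time


def block_hash(block):
    """Return the SHA-256 hex digest of a block's canonical JSON encoding."""
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def make_block(data, previous_block=None, timestamp=None):
    """Build a block from data, a timestamp, and the predecessor's hash."""
    if previous_block is None:
        # Assumption: the first block has no predecessor and stores a marker.
        previous_hash = "0" * 64
    else:
        previous_hash = block_hash(previous_block)
    return {
        "data": data,
        "timestamp": time.time() if timestamp is None else timestamp,
        "previous_hash": previous_hash,
    }


def chain_is_consistent(chain):
    """Check that every block stores the hash of the block before it."""
    return all(
        current["previous_hash"] == block_hash(previous)
        for previous, current in zip(chain, chain[1:])
    )


# Three blocks with hypothetical temperature readings.
first = make_block({"temperature": 12.5}, timestamp=0)
second = make_block({"temperature": 12.9}, first, timestamp=40)
third = make_block({"temperature": 13.1}, second, timestamp=80)
chain = [first, second, third]
```

Changing the data of `first` afterwards alters its hash, so the value stored in `second` no longer fits and `chain_is_consistent(chain)` returns `False`.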

4 Consensus Methods

Consensus is an essential part of distributed systems. With blockchains, consensus methods are the class of algorithms that describe how multiple parties agree on blocks and which novel blocks are added to the chain. Nakamoto describes Proof-of-Work in his work [16]; it is one of these methods and is still in use in Bitcoin today. In the meantime, a number of new methods have been introduced, for example, Proof-of-Burn [17], Proof-of-Luck [15], Proof-of-Stake [10], and Proof-of-Authority. Ethereum (a distributed platform empowering developers to develop blockchains on existing infrastructure) uses a slightly modified version of Proof-of-Work that may cope with large memory requirements of the participating devices [1, 21].

Recent developments with the potential to solve the current problem are Distributed Proof-of-Work (vaguely proposed in [2]) and the Practical Proof-of-Kernel-Work of Xain [12]. Cicada [2] is a group of programmers who suggest a decentralized democratic system based on unique human identifiers. Their consensus method, although only vaguely described, has some strong points. In the paper at hand, we are the first to implement and analyze this consensus method. Xain [12] originates from a group of British scientists with a background in machine learning. Their primary focus is the improvement of blockchains by adjusting block mining difficulties online using reinforcement learning. Their distributed method to grant temporary access to physical doors was successfully implemented in vehicular prototypes.

^3 We are aware that collisions must occur because of the reduction of dimensionality, but as the hash function is hard to reverse, the collisions are also hard to find.
^4 Due to the strong connection to cryptocurrencies, often named transactions.

4.1 Proof-of-Work

As described beforehand, Proof-of-Work (compare Algorithm 1.1) is the original method for consensus on a blockchain, introduced in [16]. The basic idea is that a party has to earn the right to publish a novel block. It does so by proving that it spent work, in terms of computational power, on the generation of the block. The proof is enabled by requiring the answer to a complex problem for publishing a block. Usually, a so-called nonce has to be found that, in combination with the new block, yields a hash value ending on a specific sequence. By changing the length of this predefined sequence, the hardness of the proof can be adjusted.

As no party knows in advance which party will add a new block, data needs to be broadcast to all parties. Parties that aim to publish a block have to collect the data and combine them into a novel block. Afterward, they can start searching for a nonce such that the necessary hash condition is met. Whoever performs these steps fastest may publish the new block.

It might happen that multiple parties mine a block at the same time, which creates branches (alternative versions of the blockchain). Proof-of-Work tolerates them for a couple of iterations until one of the branches is longer than its alternatives. As a longer branch represents more computational power, it is considered correct, and the alternatives are deleted. Blockchain participants do not need to participate in the mining process. As an incentive, there is a reward in cryptocurrency per block as well as processing fees for transactions.

Input : Last Block
Output : New Block
nonce ← 0
while proof is not valid do
    proof ← ProofofWork(Last Block, nonce)
    nonce ← nonce + 1
end while
RewardMiner()
CreateBlock(Last Block, proof)

Algorithm 1.1: Mining with Proof-of-Work
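The mining loop of Algorithm 1.1 can be sketched in Python as follows. The sketch rests on assumptions that the text does not fix: the validity condition is a number of leading zero hex digits of a SHA-256 digest (the text speaks of a hash ending on a specific sequence, while the experiments in Section 5 use leading zeros), and the names `proof_is_valid` and `mine` are hypothetical. Rewarding the miner and creating the block are left out.

```python
import hashlib


def proof_is_valid(last_hash, data, nonce, difficulty):
    """Assumed condition: the digest starts with `difficulty` zero hex digits."""
    payload = f"{last_hash}{data}{nonce}".encode("utf-8")
    digest = hashlib.sha256(payload).hexdigest()
    return digest.startswith("0" * difficulty)


def mine(last_hash, data, difficulty):
    """Try nonces 0, 1, 2, ... until the hash condition is met."""
    nonce = 0
    while not proof_is_valid(last_hash, data, nonce, difficulty):
        nonce += 1
    return nonce


# Hypothetical inputs; difficulty 3 needs about 16**3 = 4096 attempts on average.
found_nonce = mine("previous-block-hash", "temperature=12.5", difficulty=3)
```

Each additional zero digit multiplies the expected number of attempts by 16, which is how the length of the required sequence adjusts the hardness of the proof. Verifying a proposed nonce takes a single hash evaluation.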

Proof-of-Work is the oldest consensus method incorporated in this study, and since it is the most popular, it is the one used most often in the literature. The techniques described in the following sections were introduced to cope with its challenges. Proof-of-Work was introduced to implement a decentralized cryptocurrency. Therefore, safety, decentralism, and resilience were the most critical aspects of its design. In his paper, Nakamoto describes attack scenarios and clarifies that one party needs to be faster in adding blocks to the chain than all others to cause damage. He also estimates the probability that an attacker could achieve this [16, Section 11, Calculations].

However, this safety has its cost. The blockchain is safe as long as a majority of the computation power is used for mining novel blocks. This has several consequences. First, the energy consumption of parties that aim at maintaining consistency is exceptionally high, from the very beginning until a possible shutdown. This is a problem especially for potentially small or mobile devices with a limited energy budget. Second, the integrity of the chain is directly coupled with the distribution of computational power. This problem is often called the 51% problem: any cooperating group of parties that holds 51% of the computational power gains a higher impact on the system, and attacks on the integrity get easier. Third, parties may join or leave the network at any time without causing inconsistencies (e.g., duplicate or missing data). Data storage is therefore redundant, so blockchains are excellent for applications where memory consumption does not matter and high resilience needs to be guaranteed. The last point is the high communication cost of blockchains in general. Distributed storage in a blockchain requires a lot of communication, as broadcasts are necessary for the transmission of transactions or data, for the distribution of novel blocks, and for finding the longest blockchain and resolving branches.
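Resolving branches by preferring the longest one can be sketched as below. This is an illustrative assumption about how a node might pick among the chains it received from its peers, not the authors' implementation; `resolve_branches` and the `is_valid` callback are hypothetical names, and chains are represented as plain lists of blocks.

```python
def resolve_branches(branches, is_valid=lambda chain: True):
    """Pick the longest valid branch; on ties, keep the branch listed first."""
    candidates = [chain for chain in branches if is_valid(chain)]
    if not candidates:
        raise ValueError("no valid branch available")
    return max(candidates, key=len)


# Two hypothetical branches that share the blocks "g" and "b1".
branch_a = ["g", "b1", "a2"]
branch_b = ["g", "b1", "b2", "b3"]
winner = resolve_branches([branch_a, branch_b])
```

Every node needs all competing branches to make this decision, which is one reason why the procedure relies on broadcasts.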

4.2 Proof-of-Stake

Proof-of-Stake (compare Algorithm 1.2) is a concept introduced by Sunny King and Scott Nadal for PPCoin [10]. It follows an entirely different approach that is more strongly coupled with the application as a currency. A party does not need to prove an amount of work but has to prove that it owns a certain amount of the currency. For this purpose, Proof-of-Stake uses coinage (coin age), which describes the ownership of coins over a certain period. If a party owns 100 coins over 10 time slices, it holds a coinage of 1,000. The coinage sinks when coins are spent. If a party wants to add a block, it transfers coins to itself, which reduces its coinage. As a reward for this transaction, the mining problem is simplified for that party.

Input : Last Block
Output : New Block
coinage ← CalculateCoinAge()
investment ← Random(coinage)
MakeInvestment()
nonce ← 0
while proof is not valid do
    proof ← ProofofWork(Last Block, nonce, investment)
    nonce ← nonce + 1
end while
RewardMiner()
CreateBlock(Last Block, proof)

Algorithm 1.2: Mining with Proof-of-Stake
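A minimal Python sketch of Algorithm 1.2 follows. The text only states that an investment simplifies the mining problem; the rule used here, that the hash target grows linearly with the invested coinage, is an assumption chosen for illustration. The names `coin_age`, `stake_proof_is_valid`, `mine_with_stake`, and the parameter `base_target` are hypothetical.

```python
import hashlib
import random

MAX_HASH = 2 ** 256


def coin_age(coins, time_slices):
    """Coinage as described in the text: coins held times time slices."""
    return coins * time_slices


def stake_proof_is_valid(last_hash, nonce, investment, base_target):
    """Assumed rule: a larger investment raises the target the hash must stay below."""
    payload = f"{last_hash}{nonce}{investment}".encode("utf-8")
    value = int.from_bytes(hashlib.sha256(payload).digest(), "big")
    return value < min(MAX_HASH, base_target * investment)


def mine_with_stake(last_hash, coinage, base_target, rng=random):
    """Invest a random share of the coinage, then search for a nonce."""
    investment = rng.randint(1, coinage)
    nonce = 0
    while not stake_proof_is_valid(last_hash, nonce, investment, base_target):
        nonce += 1
    # The invested coinage is consumed by the transfer.
    return nonce, investment, coinage - investment


available = coin_age(100, 10)  # 100 coins over 10 time slices
base = MAX_HASH // 20_000      # hypothetical base target per unit of coinage
result = mine_with_stake("previous-block-hash", available, base, random.Random(7))
```

Under this assumed rule, a party that invests ten times as much coinage needs on average a tenth of the hash evaluations, which is the trade-off a device with little computation power could exploit.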

Proof-of-Stake requires some form of ownership that can be invested to gain impact on the blockchain. This reduces its applicability in practice. Application domains could certainly be extended with such a concept (as done with Ether tokens in Ethereum). But if the consensus method does not provide a considerable benefit over the other methods, this workaround should be scrutinized.

Implementing this method raises new questions, for example, which investment strategy miners should follow when spending their tokens. This provides vast potential for ubiquitous devices with limited computational power: they could make high investments at a low frequency to use their limited computation capabilities optimally.

4.3 Distributed Proof-of-Work

The novel consensus method of Cicada is called Distributed Proof-of-Work and is based on Proof-of-Work [2]. It structures the mining process into small contests called mining races. In contrast to Proof-of-Work, access to mining is limited: for each mining race, a set of participants is randomly selected. With Distributed Proof-of-Work, every party needs to take part in mining, which makes each of them eligible for mining races.

Distributed Proof-of-Work restricts access to the original Proof-of-Work to overcome some of its problems. However, the description by the authors is somewhat vague, for example regarding the selection of the miners, which causes difficulties for an implementation. Practical Proof-of-Kernel-Work (Section 4.4) applies verifiable random functions (VRF) for that purpose. Thus, we borrow this concept for Distributed Proof-of-Work as well. A VRF is a concept to execute a random function, i.e., a function with an unknown result, such that the party that evaluates it can prove to other parties that the obtained value is correct. Verifiable random functions can be constructed with different cryptographic methods. One example is the approach by Goldberg [6], which uses elliptic curve cryptography. The basic procedure follows these steps [14]:

– A so-called generator provides a public key pk and a private key sk to each party.
– Given its private key sk and a publicly known seed x, a party is capable of computing a random function f, which calculates a proof p.
– Every other party can now verify that the proof is the result of the random function, given p, pk, and x.
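The three steps above can be made concrete with a toy example. The sketch below does not use the elliptic curve construction of [6]; it swaps in a simpler RSA-based construction (a signature over the hashed seed serves as the proof) because it fits into a few lines of standard-library Python. The key is built from two small Mersenne primes and offers no real security. All names (`vrf_prove`, `vrf_verify`, `vrf_output`) are hypothetical.

```python
import hashlib

# Toy RSA key pair (assumption for illustration, far too small to be secure).
P = 2 ** 61 - 1
Q = 2 ** 89 - 1
N = P * Q                          # public modulus, part of pk
E = 65537                          # public exponent, part of pk
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent sk (needs Python 3.8+)


def _hash_to_int(seed):
    """Map the public seed x to a number below the modulus."""
    return int.from_bytes(hashlib.sha256(seed).digest(), "big") % N


def vrf_prove(seed, private_exponent=D):
    """Compute the proof p for seed x with the private key sk."""
    return pow(_hash_to_int(seed), private_exponent, N)


def vrf_output(proof):
    """Derive the pseudo-random value from the proof."""
    proof_bytes = proof.to_bytes((N.bit_length() + 7) // 8, "big")
    return int.from_bytes(hashlib.sha256(proof_bytes).digest(), "big")


def vrf_verify(seed, proof, public_exponent=E):
    """Check with the public key pk that the proof belongs to seed x."""
    return pow(proof, public_exponent, N) == _hash_to_int(seed)


seed_x = b"seed-embedded-in-the-last-block"  # hypothetical public seed
proof_p = vrf_prove(seed_x)
```

Only the holder of the private exponent can compute `proof_p`, while anyone who knows `N` and `E` can call `vrf_verify`. The value of `vrf_output(proof_p)` is fixed by the key and the seed, so a party cannot retry until it obtains a favorable result.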

Input : Last Block
Output : New Block
selected ← False
if enough nodes selected then
    if selected then
        nonce ← 0
        while proof is not valid do
            proof ← ProofofWork(Last Block, nonce)
            nonce ← nonce + 1
        end while
        RewardMiner()
        CreateBlock(Last Block, proof)
    else
        Wait for next mining race
    end if
else
    selected ← VerifiableRandomFunction(seed, difficulty)
    if selected then
        Let the other nodes verify the Verifiable Random Function
    else
        Wait for other nodes to do the lottery
        if not enough nodes selected then
            Reduce the difficulty of being selected
        end if
    end if
end if

Algorithm 1.3: Mining with Distributed Proof-of-Work
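The lottery part of Algorithm 1.3, including the reduction of the difficulty when too few nodes are selected, can be sketched as a small simulation. As a loudly stated simplification, a plain hash of the seed and the node identifier stands in for the verifiable random function, so the verification step is omitted. Halving the difficulty per round is also an assumption; the text does not say how the difficulty is reduced.

```python
import hashlib

MAX_TICKET = 2 ** 256


def lottery_ticket(seed, node_id):
    """Stand-in for a VRF output: hash of the shared seed and the node id."""
    digest = hashlib.sha256(f"{seed}:{node_id}".encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def select_miners(node_ids, seed, needed, difficulty):
    """Repeat the lottery with a lower difficulty until enough nodes win."""
    if needed > len(node_ids):
        raise ValueError("cannot select more miners than there are nodes")
    while True:
        threshold = MAX_TICKET // difficulty
        selected = [n for n in node_ids if lottery_ticket(seed, n) < threshold]
        if len(selected) >= needed:
            return selected, difficulty
        # Assumed reduction rule: halve the difficulty, but never go below 1.
        difficulty = max(1, difficulty // 2)


nodes = [f"node-{i}" for i in range(20)]  # hypothetical node identifiers
miners, final_difficulty = select_miners(nodes, "seed-of-last-block", 3, 64)
```

With a difficulty of 1 every ticket wins, so the loop always terminates. Only the returned `miners` would then run the Proof-of-Work loop for this mining race.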

4.4 Practical Proof-of-Kernel-Work

Practical Proof-of-Kernel-Work (compare Algorithm 1.4) is also based on Proof-of-Work but includes access control. The methods used are more complex than those of Distributed Proof-of-Work. Three mechanisms control participation in Proof-of-Work:

– A whitelist that lists trustable parties.
– A set of dynamic rules. For example, the creator of the last block could be banned for the next three iterations.
– A random selection of parties from the whitelist.

The latter selection routine uses a continuous seed that is embedded in the blockchain. The parties may use this seed to perform a lottery based on verifiable random functions. The selection process not only guarantees a random selection but also prevents others from obtaining any knowledge about the selected parties. This prevents an attacker from performing targeted attacks on parties. The chosen parties, in turn, can prove that they have been selected. The access control reduces energy consumption and weakens the 51% problem described above (compare Section 4.1), as neither computation power nor ownership has an impact on future selection. Practical Proof-of-Kernel-Work also reduces the likelihood of branches, as fewer parties participate in mining.

Storing a whitelist on the blockchain poses potential problems for scalability. In large networks, it would cause huge memory consumption, which devices with restricted memory may have issues with. Also, the computation of a verifiable random function poses challenges to computationally weak devices.
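How the three mechanisms could be combined into one eligibility check is sketched below. The rule set contains only the example rule from the text (no mining for the creators of the last three blocks), and a plain hash again stands in for the verifiable random function, which is a simplification. The function names and the list `recent_creators` are hypothetical.

```python
import hashlib

MAX_TICKET = 2 ** 256


def passes_access_control(node_id, whitelist, recent_creators, ban_length=3):
    """Whitelist check plus the example rule: recent block creators are banned."""
    if node_id not in whitelist:
        return False
    # recent_creators lists the creators of past blocks, newest last.
    banned = recent_creators[-ban_length:] if ban_length > 0 else []
    return node_id not in banned


def wins_lottery(node_id, seed, difficulty):
    """Stand-in for the VRF lottery: a hash of seed and node id below a threshold."""
    digest = hashlib.sha256(f"{seed}:{node_id}".encode("utf-8")).digest()
    return int.from_bytes(digest, "big") < MAX_TICKET // difficulty


def is_selected(node_id, whitelist, recent_creators, seed, difficulty):
    """A node may mine only if it passes access control and wins the lottery."""
    return (passes_access_control(node_id, whitelist, recent_creators)
            and wins_lottery(node_id, seed, difficulty))


trusted = {"a", "b", "c"}            # hypothetical whitelist
creators = ["c", "a", "b", "b"]      # "a" and "b" created the last three blocks
```

The membership test on the whitelist hints at the scalability concern: every node has to keep the complete list available in order to run or verify this check.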

5 Experiments

For the two novel and promising consensus methods Distributed Proof-of-Work and Practical Proof-of-Kernel-Work (compare Sections 4.3 and 4.4), no implementation was available, and their descriptions were rather vague. Thus, we contribute implementations of these consensus methods. With the focus on potentially heterogeneous devices and the latest developments of MicroPython and CircuitPython as rapid prototyping 'operating systems' for ultra-low-power devices, we picked Python as the programming language^5. To obtain comparable results, the two established consensus methods Proof-of-Work and Proof-of-Stake are also part of our library. Using a blockchain is expected to neither improve nor degrade the analysis results. Thus, we do not report on the performance of geometric monitoring but on our measures of interest: CPU load, memory usage, throughput, and communication costs. In the following, we briefly describe the problems we faced. The analysis presumes random access to the data, so the choice of the algorithm is not crucial for a comparison. We implemented nodes that run as processes on a system and hold essential functions, which are listed after Algorithm 1.4.

^5 The resulting sources are made publicly available at https://bitbucket.org/cedric_sanders/blockchain-experiments/src/master/.

Input : Last Block
Output : New Block
selected ← False
if enough nodes selected then
    if selected then
        nonce ← 0
        while proof is not valid do
            proof ← ProofofWork(Last Block, nonce)
            nonce ← nonce + 1
        end while
        RewardMiner()
        CreateBlock(Last Block, proof)
    else
        Wait for next mining race
    end if
else
    if Node on Whitelist then
        if CheckRuleset() then
            if VerifiableRandomFunction(seed, difficulty) then
                selected ← True
            end if
        end if
    end if
    if selected then
        Let the other nodes verify the Selection
    else
        Wait for other nodes to do the lottery
        if not enough nodes selected then
            Reduce the difficulty of being selected
        end if
    end if
end if

Algorithm 1.4: Mining with Practical Proof-of-Kernel-Work

Each node provides the following functions:

– running transactions,
– mining blocks,
– syncing the blockchain,
– reading a stream of observations from a file and inserting observations into the blockchain,
– logging of metrics for the analysis we present next,
– running a local model for data analysis.

The overall goal is to keep the different methods comparable. Thus, we did not focus on implementing individual cases but on a plain structure of the methods, as described in Section 4. All our implementations consist of three building blocks: 1) a RESTful server that provides an API to other parties, 2) a part for mining and maintenance of the blockchain, and 3) a part that performs the actual calculations.

For the basic Proof-of-Work, the difficulty was set to 6 leading zeros, which corresponds to a block time of 40 seconds and fits quite well to the sampling interval of the opensensemap data^6 we use.

Practical Proof-of-Kernel-Work requires the inclusion of verifiable random functions. We applied the approach of [6, Definition 4.1] for the lottery. Syncing the required seed amongst the parties, however, caused some unexpected problems: the operations are not atomic, and there might already be consensus on a new seed once a node has finished mining. We overcame this chance of asynchronicity by relaxing the verification, so that the ancestor and the successor of the current seed are accepted as valid, too.

Another important decision is when the results of the lottery are broadcast. If the set of miners is published directly (before the actual creation of the block), the other parties can send their data or transactions directly to the selected ones. An alternative would be to reveal the decision of the lottery after the new block has been mined, which requires a broadcast of all observations. The latter technique would be more secure, as it prevents targeted attacks on single nodes. But as we want to see the potential benefit of the consensus method, we decided on the first option, which reduces energy consumption and communication cost.
Besides, we added a whitelist that keeps a record of the trustable devices. As soon as a malicious party sends fraudulent blocks to the network, it is removed from the whitelist. Distributed Proof-of-Work operates similarly to Practical Proof-of-Kernel-Work, but the time- and memory-consuming additions, namely the dynamic rules and the whitelist, are removed.

^6 https://opensensemap.org/

To compare the consensus methods regarding their feasibility, we need to test them with a distributed data analysis task. Usage of the blockchain does not alter the data, and neither do the considered consensus methods. A distributed analysis, therefore, produces the same result as without a blockchain. The application we are aiming for is a distributed monitoring task with multiple sensors. We perform the analysis with the well-suited geometric monitoring approach [19], which reduces communication costs and is based on a simple concept. Recent improvements were published in [9, 11]. The primary task is to monitor a global threshold function without communicating every single data item. The communication is reduced by introducing local threshold conditions, which have to be violated before a party starts communicating with the coordinator. The coordinator then checks the global function and updates the threshold parameters of the parties. The challenge for the application of geometric monitoring is the design of the local conditions. The requirements for local conditions are:

– Correctness: As long as all local conditions are below the threshold, the global threshold is not reached either.
– Communication efficiency: The number of necessary communications is minimal.
– Efficient computation: The calculation required to test local conditions is low.
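For the average of sensor readings, these requirements can be illustrated with a very simple local condition: a node stays silent as long as its own reading does not exceed the global threshold. This is a deliberately simplified sketch and not the full geometric monitoring method of [19]; the function names and the readings are hypothetical.

```python
from statistics import mean


def violated_nodes(readings, threshold):
    """Return the nodes whose local condition (reading <= threshold) is broken."""
    return [node for node, value in readings.items() if value > threshold]


def monitoring_round(readings, threshold):
    """Evaluate the global average only if some local condition is violated.

    Returns (global_alarm, global_check_needed).
    """
    if not violated_nodes(readings, threshold):
        # Correctness: if every reading is <= threshold, so is the average.
        return False, False
    return mean(readings.values()) > threshold, True


quiet = {"box-1": 11.0, "box-2": 12.5, "box-3": 9.0}
one_hot = {"box-1": 11.0, "box-2": 16.0, "box-3": 9.0}
heat = {"box-1": 18.0, "box-2": 16.0, "box-3": 14.0}
```

With a threshold of 15.0, the `quiet` round needs no communication at all, the `one_hot` round triggers a global check that finds an average of 12.0 and raises no alarm, and only the `heat` round (average 16.0) reports that the global threshold is exceeded. Testing the local condition costs a single comparison per reading.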

As recently shown in [11], finding these local threshold functions is more straightforward when the global function is convex. Then it is sufficient to find a close upper bound for the global function.

In combination with a blockchain, the geometric monitoring approach can be applied coordinator-free and fully decentralized, since every node has the information required to test global functions.

As described above (compare Section 1), we perform the tests using data from opensensemap^7. This is a network of citizen sensor data consisting of 3909 so-called senseboxes. They are situated around the globe, but most of them are in Germany. The attributes recorded by the sensors differ considerably. While just a few sensors record special features such as gamma radiation, temperature and wind speed are prevalent attributes. To perform tests with geometric monitoring in this study, we decided on the temperature feature^8. Setting a global threshold on the average temperature is easy.

The experiments were conducted on a cluster of multi-core computers, each running a process of a node. We tested with 5, 10, 20, and 40 participants. To validate even larger networks, future implementations could make use of MPI or other interprocess communication protocols. A direct test in a distributed sensor mesh is another option, but in a fully distributed, coordinator-free setup, the analysis of the experiment also requires centralization and eventually clock synchronization.

We analyze four aspects: communication, mining, memory, and CPU usage. The communication analysis is split into the types data request, transaction received, and the coordination of blocks and transactions. Figures 1 to 4 reveal that Proof-of-Stake requires more communication than the other methods. One reason is the additional transactions needed to communicate the coinage. The additional checks of the coinage cause more communication rounds on the blockchain. Most communication originates from access to the blockchain data.
The transmission of transactions and blocks is negligible.

Next, we test the temporal performance of the blockchain. How much time is required for block generation, and what is the block throughput? In Figures 5 to 8, Proof-of-Stake stands out again. The time between consecutive mining processes of one party (Figures 9 to 12) could become quite high. Therefore, the broadcast of the transactions is important; otherwise, they would become available only with a huge delay.

Next, Figures 13 to 16 depict the memory consumption of the methods. All four methods require a similar amount of memory: Proof-of-Work a little less, whereas the two methods with access control need a bit more.

The CPU usage is depicted in Figures 17 to 20. Proof-of-Work continuously requires a high percentage of the calculation power. The two methods Practical Proof-of-Kernel-Work and Distributed Proof-of-Work distribute the computation load more evenly amongst the partners, and thus each CPU is used less.

^7 https://opensensemap.org/
^8 The data we apply the method to was obtained in the interval from March 23rd, 2019 till March 24th, 2019, in the WGS 84 box [5.98865807458, 47.3024876979, 15.0169958839, 54.983104153].

[Figures 1 to 4: bar charts of the number of communication processes per consensus method (PoW, PoS, DPoW, PPoKW), broken down into blockchain requested, transaction received, block received, mining coordination, and connection error.]
Fig. 1. Average communication cost for 5 nodes over 1 hour
Fig. 2. Average communication cost for 10 nodes over 1 hour
Fig. 3. Average communication cost for 20 nodes over 1 hour
Fig. 4. Average communication cost for 40 nodes over 1 hour

6 Conclusion and Future Works

The paper-at-hand assessed the suitability of blockchains for decentralized data processing scenarios. After discussing promising consensus methods, we performed an event monitoring task, for which we used fully decentralized geometric monitoring. The analysis reveals that the methods could be integrated in cases of low data rates, where latencies caused by mining do no harm. A major drawback of blockchains is the requirement for broadcasts in the network. Besides communication costs, this also limits the number of participants of a blockchain on memory-restricted parties, because every party has to store the address list of the parties. CPU usage does not cause a problem anymore, as Distributed Proof-of-Work and Practical Proof-of-Kernel-Work overcome shortcomings of Proof-of-Work. The current analysis reveals that Proof-of-Work and Proof-of-Stake are not well suited for resource-constrained devices.

In future work, the hardness of the proof and the verifiable random functions can be studied further. Also, investment strategies for coinage in combination with the restrictive consensus methods are promising.

[Figures 5 to 8: line charts of the block count (created blocks and length of the chain) over time in minutes for PoW, PoS, DPoW, and PPoKW.]
Fig. 5. Average created/total blocks for 5 nodes over 1 hour
Fig. 6. Average created/total blocks for 10 nodes over 1 hour
Fig. 7. Average created/total blocks for 20 nodes over 1 hour
Fig. 8. Average created/total blocks for 40 nodes over 1 hour

Fig. 9. Average Mining Time/Block Time for 5 nodes over 1 hour
Fig. 10. Average Mining Time/Block Time for 10 nodes over 1 hour
Fig. 11. Average Mining Time/Block Time for 20 nodes over 1 hour
Fig. 12. Average Mining Time/Block Time for 40 nodes over 1 hour
(Each panel plots time in seconds against time in minutes for PoW, PoS, DPoW, and PPoKW, showing mining time and block time.)

Fig. 13. Average memory usage for 5 nodes over 1 hour
Fig. 14. Average memory usage for 10 nodes over 1 hour
Fig. 15. Average memory usage for 20 nodes over 1 hour
Fig. 16. Average memory usage for 40 nodes over 1 hour
(Each panel plots memory in MB against time in minutes for PoW, PoS, DPoW, and PPoKW.)

Fig. 17. Average CPU-Workload for 5 nodes over 1 hour
Fig. 18. Average CPU-Workload for 10 nodes over 1 hour
Fig. 19. Average CPU-Workload for 20 nodes over 1 hour
Fig. 20. Average CPU-Workload for 40 nodes over 1 hour
(Each panel plots the CPU core workload in % against time in minutes for PoW, PoS, DPoW, and PPoKW.)

References

1. Buterin, V., et al.: A next-generation smart contract and decentralized application platform. White paper (2014)
2. CICADA: Cicada: A distributed direct democracy and decentralized application platform. Web: iamcicada.com/whitepaper/, last accessed 30.03.2019
3. DML: Decentralized machine learning. Web: https://decentralizedml.com/DML_whitepaper_31Dec_17.pdf, last accessed 02.04.2019
4. Domingos, P., Hulten, G.: Mining high-speed data streams. In: KDD. vol. 2, p. 4 (2000)
5. Fayyad, U., Piatetsky-Shapiro, G., Smyth, P.: From data mining to knowledge discovery in databases. AI Magazine 17(3), 37 (1996)
6. Goldberg, S., Naor, M., Papadopoulos, D., Reyzin, L.: NSEC5 from elliptic curves: Provably preventing DNSSEC zone enumeration with shorter responses. IACR Cryptology ePrint Archive 2016, 83 (2016)
7. Griggs, K.N., Ossipova, O., Kohlios, C.P., Baccarini, A.N., Howson, E.A., Hayajneh, T.: Healthcare blockchain system using smart contracts for secure automated remote patient monitoring. Journal of Medical Systems 42(7), 130 (2018)
8. Kamp, M., Adilova, L., Sicking, J., Hüger, F., Schlicht, P., Wirtz, T., Wrobel, S.: Efficient decentralized deep learning by dynamic model averaging. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases. pp. 393–409. Springer (2018)
9. Keren, D., Sagy, G., Abboud, A., Ben-David, D., Schuster, A., Sharfman, I., Deligiannakis, A.: Geometric monitoring of heterogeneous streams. IEEE Transactions on Knowledge and Data Engineering 26(8), 1890–1903 (2014)
10. King, S., Nadal, S.: PPCoin: Peer-to-peer crypto-currency with proof-of-stake. Self-published paper, August 19 (2012)
11. Lazerson, A., Keren, D., Schuster, A.: Lightweight monitoring of distributed streams. ACM Transactions on Database Systems (TODS) 43(2), 9 (2018)
12. Lundbæk, L.N., Beutel, D.J., Huth, M., Kirk, L.: Practical proof of kernel work & distributed adaptiveness. Manuscript, version 1.2 (2018)
13. May, M., Berendt, B., Cornue, A., et al.: Research challenges in ubiquitous knowledge discovery. In: Next Generation of Data Mining, pp. 154–173. Chapman and Hall/CRC (2008)
14. Micali, S., Rabin, M., Vadhan, S.: Verifiable random functions. In: Foundations of Computer Science, 1999. 40th Annual Symposium on. pp. 120–130. IEEE (1999)
15. Milutinovic, M., He, W., Wu, H., Kanwal, M.: Proof of luck: An efficient blockchain consensus protocol. In: Proceedings of the 1st Workshop on System Software for Trusted Execution. p. 2. ACM (2016)
16. Nakamoto, S.: Bitcoin: A peer-to-peer electronic cash system (2008)
17. P4Titan: Slimcoin: A peer-to-peer crypto-currency with proof-of-burn (2014)
18. PUB, F.: Secure hash standard (SHS). FIPS PUB 180(4) (2012)
19. Sharfman, I., Schuster, A., Keren, D.: A geometric approach to monitoring threshold functions over distributed data streams. ACM Transactions on Database Systems (TODS) 32(4), 23 (2007)
20. Wolff, R., Bhaduri, K., Kargupta, H.: A generic local algorithm for mining data streams in large distributed systems. IEEE Transactions on Knowledge and Data Engineering 21(4), 465–478 (2009)
21. Wood, G.: Ethereum: A secure decentralised generalised transaction ledger. Ethereum project yellow paper 151, 1–32 (2014)