Knowledge Discovery on Blockchains: Challenges and Opportunities for Distributed Event Detection Under Constraints*

Cedric Sanders¹ and Thomas Liebig¹,²,³

¹ TU Dortmund, Artificial Intelligence Unit, 44269 Dortmund, Germany
[email protected], [email protected]
http://www-ai.cs.uni-dortmund.de/index.html
² University of Nicosia, Artificial Intelligence Unit, PO 24005 CY-1700 Nicosia, Cyprus
³ Materna Information & Communications SE, Artificial Intelligence Unit, 44141 Dortmund, Germany

arXiv:1904.07104v2 [cs.DC] 6 Oct 2020

Abstract. We study the applicability of blockchain technology for distributed event detection under resource constraints. To this end, we provide a test suite with several promising consensus methods (Proof-of-Work, Proof-of-Stake, Distributed Proof-of-Work, and Practical Proof-of-Kernel-Work). This is the first work analyzing the communication costs of blockchain consensus methods for knowledge discovery tasks on resource-constrained devices. The experiments reveal that our proposed implementations of Distributed Proof-of-Work and Practical Proof-of-Kernel-Work provide a benefit over Proof-of-Work in CPU usage and communication costs. The tests further show that, in cases of low data rates where the latencies caused by mining do no harm, the proposed blockchain implementations could be integrated. However, using a blockchain requires data broadcasts, which lead to communication overhead as well as memory requirements that depend on the address list.

Keywords: Blockchain · Consensus Method · Ubiquitous Knowledge Discovery.

1 Introduction

The current shift towards edge analysis and distributed knowledge discovery [8, 11] is mostly driven by the use of large computation clusters and the internet of things. Indeed, applications that benefit from decentralized data management and analysis are, amongst others, sensor networks and mobility-based services. In both scenarios, a potentially large number of heterogeneous devices is connected and forms a system.
The differences amongst the devices could be vast: computation power, memory limitations, energy consumption, etc. Besides, possible applications pose varying requirements for data management. While security is more critical when processing sensitive private information (e.g., medical data), memory consumption or power consumption could be more essential in other application domains.

* Supported by the German Research Foundation (DFG) under grant SFB 876 "Providing Information by Resource-Constrained Data Analysis", project B4 "Analysis and Communication for Dynamic Traffic Prognosis".

Once the sensor data in the mesh is to be analyzed, one faces the challenge of how to store the data in a distributed fashion and how to perform the analysis on it. This also incorporates the problem of keeping the information consistent amongst the devices. A possible technology that might provide a solution to these issues is the blockchain. A blockchain is a sequence of unbreakably linked tuples, so-called blocks, each consisting of data, transactions, a timestamp, and the hash value of the ancestral block. A consensus method is required to extend such a blockchain; this is a procedure by which multiple network participants find a new block that is added to the blockchain. Existing consensus methods have computation costs that ubiquitous devices can hardly meet. Thus, the paper at hand fits under the topic of ubiquitous knowledge discovery [13], which connects current advances in data mining and machine learning with the latest developments in the internet of things and in mobile, distributed systems of heterogeneous devices.

This work, therefore, aims at answering the question of whether blockchains are a technically ready method to process data in distributed heterogeneous networks. We examine various consensus methods. Based on suitable experiments, we provide figures that assist in assessing the general utility of the different blockchain technologies.
This raises the following questions:

- How should a consensus method operate so that it meets the requirements on CPU usage, memory usage, and power consumption that arise from decentralized usage?
- Which drawbacks make current consensus methods inapplicable, and how could they be tackled?
- Which challenges and requirements remain after analysis of the consensus methods?

Many domains for decentralized knowledge discovery could be imagined. Especially citizen science projects, where citizens build sensors and voluntarily collect data, offer opportunities for distributed immutable knowledge extraction without any centralized coordinator. As a blockchain neither alters the data nor restricts access, the analysis results will not differ from those of knowledge discovery in databases [5] or of a streaming method [4]. However, some consensus methods are more suitable than others. To assess the methods and evaluate communication load, memory consumption, and CPU load, we carry out experiments using publicly available distributed sensor data from openSenseMap¹. It is a citizen science project (maintained by the University of Münster) and guides the direction for future applications. We use a state-of-the-art event monitoring method, geometric monitoring [19], which uses very little communication to monitor a global function. Thus, we propose a fully decentralized application of the previously coordinated geometric monitoring process. As another contribution, we are the first to implement and evaluate the initially vague proposal of Distributed Proof-of-Work [2]. To do so, we borrow some concepts from Practical Proof-of-Kernel-Work. The latter is included in our software library and (in contrast to the previously proprietary implementation) made available for open public development for the first time².

¹ https://opensensemap.org/

The second section of the paper presents works related to this topic.
It is followed by a general introduction to the functionality of a blockchain. The fourth section describes different approaches to achieving consensus in a blockchain and analyzes their advantages and disadvantages. The methods are then put to the test in an experiment, which is evaluated in the fifth section. The last section concludes the paper.

2 Related Work

While the field of ubiquitous knowledge discovery is established [13, 20] and nowadays receives much attention [8, 11], not only at major data mining and machine learning symposiums but, with the spread of Industry 4.0 and the internet of things, also in application domains, just a few works focus on the opportunities that decentralized immutable data storage could offer for knowledge discovery and information retrieval. One well-known exception is the application to health care data [7], which focuses on automated distributed monitoring of patients. Another highlight was the recent initial coin offering of a machine learning blockchain [3]. The authors offered a marketplace for algorithms and data, based on smart contracts, but it lacked balancing the workload with a smart consensus method.

3 Blockchain Fundamentals

In the following, we give a brief introduction to blockchain technology and describe how a blockchain operates. Hash functions will play an important role in the next sections. It is therefore important to recall that they are one-way functions, which are easy to compute but hard to reverse. A common choice for such a hash function is SHA256 [18]. This hash function is a combination of bitwise logical functions (AND, OR, XOR, NOT), shifts and rotations, and additions modulo 2^32; for the details, we refer the interested reader to the Secure Hash Standard definition in [18].
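The role of the hash value in linking blocks can be illustrated with a minimal sketch. It assumes Python's standard hashlib module and a hypothetical block layout (a dictionary holding data, a timestamp, and the hash of the previous block, following the tuple described in the introduction). The helper names sha256_hex, make_block, block_hash, chain_is_consistent, and build_chain are illustrative only and are not taken from our software library; the JSON serialization and the all-zero hash of the first block are likewise assumptions.

```python
import hashlib
import json


def sha256_hex(payload: str) -> str:
    """Return the SHA-256 digest of a string as 64 hexadecimal characters."""
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def make_block(data, prev_hash: str, timestamp: float) -> dict:
    """Build a block as a (data, timestamp, previous-hash) record (assumed layout)."""
    return {"data": data, "timestamp": timestamp, "prev_hash": prev_hash}


def block_hash(block: dict) -> str:
    """Hash a block over a canonical (key-sorted) JSON serialization."""
    return sha256_hex(json.dumps(block, sort_keys=True))


def chain_is_consistent(chain: list) -> bool:
    """Check that every block stores the hash of its direct predecessor."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )


def build_chain(readings, start_time: float = 0.0) -> list:
    """Append one block per reading, each linked to the block before it."""
    # The first block has no predecessor; the all-zero hash is a placeholder.
    chain = [make_block(None, "0" * 64, start_time)]
    for offset, reading in enumerate(readings, start=1):
        chain.append(
            make_block(reading, block_hash(chain[-1]), start_time + offset)
        )
    return chain


if __name__ == "__main__":
    chain = build_chain([21.5, 21.7, 22.0])  # hypothetical sensor readings
    print(chain_is_consistent(chain))  # True
    chain[1]["data"] = 99.9            # tamper with an old block
    print(chain_is_consistent(chain))  # False
```

Changing a reading in an old block changes the digest of that block, so the stored value in its successor no longer matches and the check fails. The sketch covers only the data structure; it does not include any consensus method.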
The bitwise manipulations are part of the basic instruction set of most computer chips nowadays, which speeds up computation. Another important property of these hash functions is that they most likely map different inputs to different outputs³.

² Our sources and a link to the data are available at https://bitbucket.org/cedric_sanders/abschlussarbeit/src/master/.

Blockchains first gained attention with the publication of the white paper "Bitcoin: A Peer-to-Peer Electronic Cash System" by Satoshi Nakamoto [16]. The blockchain is described as a data structure that consists of smaller elements, the so-called blocks. A block comprises

1. data⁴: contains the actual observations (e.g., transactions or sensor readings),
2. timestamp: is used to define a temporal order on blocks,
3. hash: hash value of the previous block.

Every block contains the hash of the previous block, which in turn holds the hash of its predecessor. If one of the old blocks is modified, this is simple to recognize in later blocks, as its hash value will no longer match the one stored previously. To use this data structure in a decentralized network, a consensus method has to be added.

4 Consensus Methods

Consensus is an essential part of distributed systems. For blockchains, consensus methods are the class of algorithms that describe how multiple parties agree on blocks and which new blocks are added to the chain. Nakamoto describes in his work [16] Proof-of-Work, which is still in use nowadays in Bitcoin, as one of these methods. In the meantime, a number of new methods have been introduced, for example, Proof-of-Burn [17], Proof-of-Luck [15], Proof-of-Stake [10], and Proof-of-Authority. Ethereum (a distributed platform empowering develop-
