A Blockchain-Based Storage System for Data Analytics in the Internet of Things

Quanqing Xu, Khin Mi Mi Aung, Yongqing Zhu and Khai Leong Yong

Abstract Without a central authority, blockchains can readily manage transactions. Smart contracts stored on blockchains are self-executing contractual states that are not controlled by anybody, so they can be trusted. In addition, owing to continuing improvements in processor and memory technology, IoT (Internet of Things) devices have more processing power and more memory, which allow them to execute user-defined programs, e.g., smart contracts. Shifting part of an application’s tasks to IoT devices reduces the amount of data transferred over the IoT network. The parallelism of large-scale storage systems is exploited to reduce the execution time of many basic data analytics tasks. Blockchains can host smart contracts that facilitate and enforce the negotiation of a contract in the IoT. This chapter proposes a blockchain-based storage system, named Sapphire, for data analytics applications in the Internet of Things. All the IoT data from the devices forms objects with IDs, attributes, policies, and methods. We present an OSD-based smart contract (OSC) approach employed in Sapphire as a transaction protocol, through which IoT devices interact with such blockchains. For data analytics applications, the IoT device processors execute application-specific operations. As a result, only the results are returned to clients, rather than the data files themselves. Therefore, the Sapphire system can greatly decrease the overhead of data analytics in the Internet of Things.

Keywords Internet of things ⋅ Blockchain ⋅ Storage system ⋅ Data analytics ⋅ Smart contract

Q. Xu (✉) ⋅ K.M.M. Aung ⋅ Y. Zhu ⋅ K.L. Yong
Data Storage Institute, A*STAR, Singapore, Singapore

© Springer International Publishing AG 2018
R.R. Yager and J. Pascual Espada (eds.), New Advances in the Internet of Things, Studies in Computational Intelligence 715, DOI 10.1007/978-3-319-58190-3_8

1 Introduction

The IoT (Internet of Things) is a network that connects many objects to the Internet via a large number of devices, e.g., sensors, cameras, smart phones, and RFID (Radio-frequency identification) readers. All common physical objects in the IoT have an IP address or URI, and they can exchange information with one another. The ultimate goal is intelligent management and recognition. The IoT devices (or things) are seamlessly linked into a virtual world via the IoT network, enabling anywhere and anytime connectivity. An estimated 50 billion devices are expected by 2020, as the number of smart devices per person keeps growing [1]. By 2025, there may be 100 billion IoT connections, and the financial impact of the IoT on the global economy may be as much as $3.9 to $11.1 trillion [2]. Data in the IoT environment comes from a large number of different devices. It represents billions of objects, so it is large enough that we must build a scalable distributed storage system.

The IoT has recently received extensive attention from both academia and industry, and its basic idea is to integrate things into the Internet and provide various services to users [3]. Typical killer applications of the IoT include the smart home [4, 5], the smart grid [6], and the smart building [7]. As the number of IoT devices grows, blockchain technologies [8] can be used to manage numerous unspecified devices and processes, including communications and transactions among the devices. Without a central authority, blockchain technologies guarantee the security and credibility of data.
Transactions among IoT devices are recorded on blockchains as smart contracts [9] and executed automatically, which greatly improves transaction efficiency. For example, transactions and settlements are completed automatically irrespective of past relationships with other relevant IoT devices. Without central third parties, mutually distrustful IoT devices can transact safely in emerging smart contract systems. The decentralized blockchain [10] ensures that IoT devices obtain commensurate compensation even in the event of contractual breaches or aborts. For example, the Ethereum Virtual Machine executes code in the form of so-called smart contracts on Ethereum [11], which is a Turing-complete decentralized smart contract system. Many companies and organizations have been building smart contract applications on Ethereum.

The rapid growth of data-driven applications is changing the nature of distributed storage systems. In object-based storage (or object storage) [12] systems, each object has a unique OID that allows a server or client to retrieve it without knowing its physical location, and all objects reside in a flat address space. An object-based storage device (OSD) manages lower-level space functions after allocating space for objects. Upper-level applications and users access objects through APIs. Increasingly, distributed storage systems are designed mainly for capacity rather than performance. Distribution and tiering are vital, and analytics applications are both essential and routine across data of any heat and any dimension. The applications primarily depend on semi-structured or unstructured data that is inexpensive and easy to create. As a consequence, the value of data management and analytics fuels the growth of data preservation in storage systems. To accommodate this explosive storage growth, we need to remove layers of inefficiency from traditional storage system architectures and present a new method optimized for scale-out application requirements.

This chapter presents a blockchain-based distributed storage system, called Sapphire, as an evolution of Gem [13, 14], for large-scale data analytics applications in the IoT. This system is able to support diverse data-intensive applications. We first describe the blockchain-based large-scale storage system and then present data analytics built on it. We present an OSD-based smart contract (OSC) as a transaction protocol, through which IoT devices interact with such blockchains in Sapphire. We develop blockchain-based storage and processing techniques in which object storage devices use their embedded processors to process data in addition to storing it. Processing data directly in the drives can improve performance dramatically by avoiding redundant data transfers across storage buses and networks.

The rest of this chapter is organized as follows. We introduce background and motivation in Sect. 2. We describe the system architecture of Sapphire in Sect. 3. We propose a location- and type-sensitive hashing mechanism and a dynamic load balancing method in Sect. 4. In Sect. 5, we present an OSD-based smart contract (OSC) mechanism. We present IoT data analytics in Sect. 6. We summarize this chapter in Sect. 7.

2 Background and Motivation

In some applications, e.g., IoT storage, social network services, and cloud storage, object storage performs better than SAN (Storage Area Network) and NAS (Network-Attached Storage) [15] for large-scale semi-structured or unstructured data sets.

2.1 Object Storage

Object storage can easily support the explosive growth of data since OSDs can be scaled out geographically. Distributing data replicas across multiple storage nodes can enhance data protection.
Object storage is an attractive solution for efficiently managing large-scale semi-structured or unstructured data sets from the Internet of Things. An object, as shown in Fig. 1a, consists of its ID, data, and attributes, which include metadata, policies (e.g., replication), methods (e.g., encryption/decryption), and user-/application-defined functions. As shown in Fig. 1b, each object has a unique ID and pathname in the object mapping. Object storage can be used to archive IoT data, e.g., sensor, camera, and smart phone data, with high compliance. Object storage systems offer the benefit of releasing storage space by enabling users to correctly differentiate data.

Fig. 1 Object-based storage

2.2 Requirements of IoT

The Internet of Things enables the most efficient and effective stack, including systems, interfaces, protocols, and devices, to optimize distributed applications. In addition, it enables object-oriented distributed applications to use storage directly and fuels scale-out distributed systems. In this way, it also enables significant gains in performance and TCO (Total Cost of Ownership). IoT data comes from a large number of different devices generating billions of data objects, and is sampled by various perception devices, e.g., cameras, smart phones, sensors, and RFID (Radio-frequency identification) readers. However, IoT data from different devices has distinct structures and semantics. The IoT network consists of a large number of perception devices that automatically and continuously collect information, resulting in explosive growth in data scale. In addition, IoT applications usually integrate a large number of sensors to simultaneously monitor many indicators, such as humidity, light, pressure, and temperature, so the sampled data is usually multidimensional.
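As a rough illustration of how the object model of Sect. 2.1 could hold such multidimensional samples, the following Python sketch bundles an ID, data, metadata, policies, and methods into one object. The class `IoTObject`, the function `mean_temperature`, and all field names and sample values are hypothetical and chosen only for illustration; they are not Sapphire's actual interface.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Any, Callable, Dict, List


@dataclass
class IoTObject:
    """Hypothetical object: an ID, data, and attributes (metadata, policies, methods)."""
    oid: str                                  # unique object ID
    data: List[Dict[str, float]]              # multidimensional samples
    metadata: Dict[str, Any] = field(default_factory=dict)
    policies: Dict[str, Any] = field(default_factory=dict)  # e.g., replication
    methods: Dict[str, Callable[["IoTObject"], Any]] = field(default_factory=dict)

    def invoke(self, name: str) -> Any:
        """Run a named method next to the data and return only its result."""
        return self.methods[name](self)


def mean_temperature(obj: IoTObject) -> float:
    """Example application-defined function attached to an object."""
    return mean(sample["temperature"] for sample in obj.data)


# Made-up sensor readings, for illustration only.
obj = IoTObject(
    oid="obj-0001",
    data=[
        {"time": 0.0, "humidity": 40.0, "light": 300.0,
         "pressure": 1013.0, "temperature": 21.0},
        {"time": 1.0, "humidity": 42.0, "light": 310.0,
         "pressure": 1012.0, "temperature": 23.0},
    ],
    metadata={"device": "sensor-1", "location": "room-12"},
    policies={"replication": 3},
    methods={"mean_temperature": mean_temperature},
)

result = obj.invoke("mean_temperature")
print(result)
```

In this sketch the caller receives a single number rather than the raw samples, which mirrors the idea of returning only results to clients.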
Unlike traditional Internet data, IoT data inherently carries two attributes, time and space, which depict the dynamic state changes of object locations. Most IoT applications are isolated, but the IoT network must ultimately support data sharing to facilitate collaboration among different IoT applications.

2.3 Smart Contract

Because of its decentralized nature, blockchain technology has been widely adopted by companies and organizations as a means to reorganize their centralized networks. A blockchain is an append-only distributed database that stores transactions as a time-ordered set of records. Transactions are grouped into blocks and form a cryptographic hash chain in a decentralized network.
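The hash-chain structure described above can be sketched in a few lines of Python. This is a minimal, single-process illustration under stated assumptions: the names `Block`, `append_block`, `verify_chain`, and `GENESIS_PREV`, the JSON encoding, and the choice of SHA-256 are all assumptions made for this example, and the sketch does not model consensus or the decentralized network.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import List, Tuple

GENESIS_PREV = "0" * 64  # assumed placeholder "previous hash" for the first block


@dataclass(frozen=True)
class Block:
    index: int
    prev_hash: str                 # digest of the preceding block
    transactions: Tuple[str, ...]  # time-ordered records, JSON-encoded

    def digest(self) -> str:
        """SHA-256 over the block's fields, including the previous hash."""
        payload = json.dumps(
            {"index": self.index, "prev_hash": self.prev_hash,
             "transactions": list(self.transactions)},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain: List[Block], transactions: List[dict]) -> Block:
    """Group transactions into a new block linked to the current chain tip."""
    prev_hash = chain[-1].digest() if chain else GENESIS_PREV
    encoded = tuple(json.dumps(t, sort_keys=True) for t in transactions)
    block = Block(len(chain), prev_hash, encoded)
    chain.append(block)
    return block


def verify_chain(chain: List[Block]) -> bool:
    """Check that every block stores the digest of its predecessor."""
    expected = GENESIS_PREV
    for block in chain:
        if block.prev_hash != expected:
            return False
        expected = block.digest()
    return True


chain: List[Block] = []
append_block(chain, [{"device": "sensor-1", "t": 1, "temp": 21.5}])
append_block(chain, [{"device": "camera-7", "t": 2, "frames": 30}])
print(verify_chain(chain))  # True

# Rewriting an earlier block breaks the link stored in its successor.
chain[0] = Block(0, GENESIS_PREV, ('{"device": "sensor-1", "t": 1, "temp": 99.9}',))
print(verify_chain(chain))  # False
```

Because each block's digest covers the previous block's digest, changing any earlier record invalidates every later link, which is what makes the append-only history tamper-evident.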
