
Blockchain as a Service: A Decentralized and Secure Computing Paradigm

Gihan J. Mendis∗, Yifu Wu∗, Jin Wei∗, Moein Sabounchi∗, and Rigoberto Roché†
∗Department of Computer and Information Technology, Purdue University, West Lafayette, Indiana
†NASA Glenn Research Center, Cleveland, Ohio

arXiv:1807.02515v3 [cs.CR] 4 Sep 2019

Abstract—Thanks to the advances in machine learning, data-driven analysis tools have become valuable solutions for various applications. However, there still remain essential challenges to developing effective data-driven methods because of the need to acquire a large amount of data and to have sufficient computing power to handle the data. In many instances these challenges are addressed by relying on a dominant cloud computing vendor, but, although commercial cloud vendors provide valuable platforms for data analytics, they can suffer from a lack of transparency, security, and privacy-preservation. Furthermore, reliance on cloud servers prevents applying big data analytics in environments where the computing power is scattered. To address these challenges, a decentralized, secure, and privacy-preserving computing paradigm is proposed to enable an asynchronized cooperative computing process amongst scattered and untrustworthy computing nodes that may have limited computing power and computing intelligence. This paradigm is designed by exploring blockchain, decentralized learning, homomorphic encryption, and software-defined networking (SDN) techniques. The performance of the proposed paradigm is evaluated via different scenarios in the simulation section.

Index Terms—Blockchain, Decentralized and Secure Learning, Machine Learning, Privacy, Security

I. INTRODUCTION

Due to the advances in sensing and computing, data-driven methods, such as machine learning techniques [1], have become very promising solutions for different applications [2]–[4]. However, there are two essential challenges in implementing these techniques: (1) the acquisition of a large amount of data, and (2) the requirement of enough computing power, which enforces the reliance on a dominant cloud computing vendor. Although the cloud servers provide valuable platforms for big data analytics, there remain several essential challenges in adopting the commercial cloud vendors: (1) transparency, (2) security, and (3) privacy [5]. With the rise of awareness for data privacy, end-users have become reluctant to share their data. Also, from another perspective, user data are becoming a valuable asset. In some domains, such as the healthcare sector and federal and civil service offices, there is an abundance of valuable data; however, due to privacy laws and regulations, these data cannot be shared with third parties. Furthermore, the reliance on cloud servers also limits the potential of applying big data analytics in environments where the computing power is scattered. Therefore, it is meaningful to develop a reliable infrastructure in which end-users or data creators are able to secure the ownership of data while being able to contribute to machine learning tasks in a privacy-preserving manner with reasonable financial incentivization. To achieve this goal, in this paper we develop a decentralized and secure computing infrastructure that enables an effective and privacy-preserving collaboration between the available end-users and data creators, which are called computing nodes in this paper. These computing nodes can have restricted computing power and limited computing intelligence. Additionally, they can be scattered and untrustworthy to each other.

In recent years, several techniques have been proposed to achieve decentralized and privacy-preserving computing. In [6], Shokri et al. proposed a privacy-preserving deep-learning mechanism with secure multi-party computations to update the single initial deep learning model. However, the deep learning model is centralized while multiple parties contribute to the model training in a manner that guarantees privacy-preservation. Federated learning, introduced in [7]–[10], is a distributed machine learning method that enables model training on decentralized data. In this method, multiple copies of the central model can be available in devices for training, which eliminates the single point of failure. However, both of these methods must be managed by one centralized authoritative controlling agent, which may raise security and privacy concerns. To address these issues, in our work we develop a decentralized and secure computing paradigm, which does not have any centralized authoritative controlling agents, by exploiting blockchain, machine learning, and homomorphic encryption technologies.

Blockchain is an emerging technology that can be considered as an immutable and decentralized digital ledger [11], [12]. A blockchain is a growing list of records, called blocks, which are linked via cryptography. Each block contains the hash value of the previous block, the transaction data, and the timestamp. Due to the inherent cryptographic chaining, if a malicious party tries to manipulate certain transaction data, it causes changes to the hash values of the block containing the transaction and those of all the subsequent blocks, which can be easily detected, as the toy example below illustrates. Therefore, generally speaking, blockchain technology provides a very promising solution for integrity security. Amongst various existing blockchain platforms, Bitcoin and Ethereum are two of the most widely adopted ones [11]–[13]. Compared with Bitcoin, the Ethereum platform provides a trustful automation of programs via smart contracts that run on virtual machines. In our work, we exploit the Ethereum blockchain to execute the secure and privacy-preserving decentralized computing functionalities automatically.
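As a toy illustration of this chaining property (our sketch, not code from the paper), the following Python snippet builds a three-block chain and shows that manipulating one transaction invalidates the stored hashes:

```python
# Toy hash-chained ledger: tampering with any block breaks verification.
import hashlib, json, time

def make_block(prev_hash, transactions):
    block = {"prev": prev_hash, "time": time.time(), "tx": transactions}
    payload = json.dumps({k: block[k] for k in ("prev", "time", "tx")},
                         sort_keys=True).encode()
    block["hash"] = hashlib.sha256(payload).hexdigest()
    return block

chain = [make_block("0" * 64, ["genesis"])]
chain.append(make_block(chain[-1]["hash"], ["pay A->B 5"]))
chain.append(make_block(chain[-1]["hash"], ["pay B->C 2"]))

chain[1]["tx"] = ["pay A->B 500"]          # malicious manipulation

def verify(chain):
    for prev, cur in zip(chain, chain[1:]):
        payload = json.dumps({k: prev[k] for k in ("prev", "time", "tx")},
                             sort_keys=True).encode()
        # the stored hash no longer matches the recomputed one after tampering
        if prev["hash"] != hashlib.sha256(payload).hexdigest() \
                or cur["prev"] != prev["hash"]:
            return False
    return True

print(verify(chain))                        # False: the manipulation is detected
```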

Furthermore, our computing paradigm enables effective decentralized and cooperative learning via an effective learning-model fusion mechanism. Fusing multiple learning models is an active area of research, and the fusion strategies in the literature can be divided into two main categories: (1) late fusion, which comprises predicting the labels based on the labels given by each learning model to be fused, and (2) early fusion, which takes the feature vectors given by the individual learning models as the inputs and learns a classifier on top of them. Although late fusion requires a lower computational cost compared with early fusion in many practical applications, as stated in [14]–[16], early fusion can achieve a more optimal way to combine learned models compared with late fusion [17]. In this work, our learning-model fusion mechanism belongs to the type of early fusion, in which the feature vectors to be fused present features with the highest level of abstraction. Additionally, we consider two strategies for designing the fusion mechanism: (1) using a fully connected structure with a single hidden layer to map concatenated features to labels and (2) implementing gradual fusion to explore the uniqueness of the individual learning models and the correlation amongst the learning models.

To further enhance the security and achieve privacy-preservation, we design an encryption interface with a zero-knowledge proof protocol by exploiting homomorphic encryption (HE), which enables evaluating the performance of the contributed learning models without revealing the sensitive details of the learning models. HE technology is one form of encryption that allows computation operations to be directly implemented in cipherspace and achieves encrypted results that, when decrypted, match the results of the operations as if they had been performed on the plainspace [18]. The existing HE technologies can be generally classified into three main groups: (1) fully HE schemes, (2) partially HE schemes, and (3) somewhat HE schemes [19]–[25]. Considering the fact that somewhat HE schemes support more operations compared with partially HE schemes and require less computation power compared with fully HE schemes, we exploit the Integer-Vector HE scheme [24], [25], which is a somewhat HE scheme, to develop the encryption interface in our computing paradigm. The authors would like to state that the technologies presented in this paper have been included in a provisional patent [26].

The next section describes the problem setting for our work. Section III describes our proposed blockchain-powered decentralized and secure computing paradigm, followed by the details of the implementation in Section IV. Simulation results and the conclusions are shown in Sections V and VI, respectively.
II. PROBLEM SETTING

Two of the main factors for the thriving of machine learning are: (1) the availability of sizable datasets with generalized distributions, commonly known as big data, and (2) the availability of the computational power to process this big data, mainly in the form of large-scale GPU clusters. Because of this, the most profitable parties in the field of machine learning are large organizations, which hold both valuable big data and sufficient computational power to process it. As illustrated in Fig. 1(a), these large organizations collect data from data contributors to advance the capabilities of their machine learning techniques. One of the essential challenges for creating large-scale datasets via data acquisition from multiple parties is the issue of privacy and the related concerns. Potential data providers may not be motivated to share their data because of the high potential for data privacy violations. Additionally, collecting tremendous raw data from multiple parties results in a huge demand on communication bandwidth and a dramatically increased attack surface. Furthermore, a large amount of computational power is required by the central server to process the collected big data.

One solution is the implementation of distributed learning architectures [6]–[10]. As shown in Fig. 1(b), in distributed learning, rather than collecting and processing data in a single central server, data processing is distributed partially to the individual data providers. The distributed learning is implemented in such a way that the computing contributors process their local data by training the given machine learning models or their own machine learning models and then share the trained model with a central controlling agent. Since the data are not shared, we can say that data privacy is preserved in this architecture. Additionally, the machine learning models are trained in distributed locations with smaller sets of data, and thus the computational power required by the individual computing contributors is much lower compared with that of a central server. However, in this solution, the machine learning architecture is fully controlled by an authoritative agent in a centralized manner. It relies on the central authority to coordinate the activities of each entity in the system. Therefore, it is required that the computing contributors trust the central controlling agent, which may raise security and privacy concerns.
To mitigate this, we improve the distributed machine learning architecture presented in Fig. 1(b) and achieve the decentralized and cooperative machine learning architecture shown in Fig. 1(c). In a decentralized system, each entity is completely autonomous and responsible for its own individual behavior. In this architecture, the untrustworthy computing contributors have full control over their own deep learning models and private data. Additionally, the individual contributors are able to participate in or leave the computing architecture without disturbing the functionality and efficiency of the overall learning process. Also, the participation of the computing contributors is motivated by the financial compensation that they will receive according to the value of their contribution. To achieve these objectives, we exploit the Ethereum blockchain and design the smart contract to secure the peer-to-peer transactions between the multiple untrustworthy parties to enable autonomous decentralized and cooperative deep learning.


Fig. 1. (a) Centralized machine learning architecture, where data are collected to a centralized server with high processing and storage capability; (b) distributed machine learning architecture, where part of the training is distributed to the data contributors and the training process is fully controlled by a central controlling agent; (c) autonomous, cooperative, and decentralized machine learning architecture with no central agents, facilitated by the blockchain service infrastructure.

Fig. 2. Overview of our blockchain-powered decentralized and secure computing paradigm.

III. PROPOSED BLOCKCHAIN-EMPOWERED COOPERATIVE MACHINE LEARNING PLATFORM

The overview of our proposed blockchain-powered decentralized and secure computing paradigm is illustrated in Fig. 2. As shown in Fig. 2, our proposed mechanism is designed to enable effective cooperation between the available and possibly scattered computing nodes to accomplish data-driven tasks that may require high computing power and intelligence and a large dataset. The individual computing nodes can participate in the computing paradigm by playing one of three roles: (1) application initiators, (2) computing contributors, or (3) verification contributors. As detailed in Fig. 3(a), if a computing node acts as an application initiator, it announces the data-driven applications and is responsible for defining the computing tasks, such as the objectives, the constraints, the suggestions on the computing model structure, and the financial compensation commitments, via the blockchain smart contract. The application initiators also provide the verification contributors with a sample set of data to evaluate the performance of the learning models contributed by the computing contributors. If necessary, the application initiators also provide the computing contributors, which have computing power and computing intelligence, with the dataset to conduct the local training. The sharing of the dataset is realized via decentralized storage (DS) such as the InterPlanetary File System (IPFS) [27]. As shown in Fig. 3(b), the computing contributors train the machine-learning models locally for a given data-driven task by using a certain local data asset or the data shared by the associated application initiator. After training the local learning model successfully according to the criteria defined by the application initiator via the smart contract, such as the accuracy being above 90%, the computing contributors announce the completion of the training via the blockchain platform and share the achieved machine-learning model with the randomly selected verification contributors via the DS such as IPFS. The available verification contributors are passively and randomly selected to provide the hardware resources and verify the contributions of the locally trained learning models, which are claimed by the computing contributors, in a random and decentralized manner. As illustrated in Fig. 3(c), the verification is conducted according to the criteria defined by the application initiator via the smart contract, such as whether the accuracy can be improved after fusing the claimed model. Majority voting amongst the verification contributors is used to determine the contribution of the corresponding computing contributors. The application initiator is informed about the majority-voting conclusion. If this conclusion is positive, the transaction of the verified locally-trained machine-learning model, also called the private model, is established between the application initiator and the associated computing contributor, in which the computing contributor receives the financial compensation and the application initiator obtains access to the learning model in IPFS. Additionally, the verification contributors involved in the task also get compensated by the initiators for their effort. After the time window assigned to the data-driven task ends, the application initiator fuses all the received verified private models to achieve the MetaModel that will be applied to address the data-driven task. The architecture of our proposed computing paradigm is considered asynchronous and adaptive since the computing and verification contributors can leave or join the task at their convenience. A sketch of how an initiator might drive this workflow through a smart contract follows.
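The sketch below is a minimal illustration (assumptions, not the authors' code) of how an application initiator could publish a task and watch for model announcements through a hypothetical TaskManager contract using web3.py (6.x API); the contract address, ABI, and the publishTask/ModelAnnounced names are illustrative only:

```python
from web3 import Web3

w3 = Web3(Web3.HTTPProvider("http://127.0.0.1:8545"))

TASK_MANAGER_ADDRESS = "0x0000000000000000000000000000000000000000"  # placeholder
TASK_MANAGER_ABI = [...]  # ABI of the deployed (hypothetical) contract

task = w3.eth.contract(address=TASK_MANAGER_ADDRESS, abi=TASK_MANAGER_ABI)

# Publish the task: objectives/constraints live in IPFS, referenced by hash;
# the committed compensation is escrowed as the transaction value.
tx = task.functions.publishTask("QmTaskSpec...", 90).transact(
    {"from": w3.eth.accounts[0], "value": w3.to_wei(1, "ether")})
w3.eth.wait_for_transaction_receipt(tx)

# React to computing contributors announcing trained private models.
announcements = task.events.ModelAnnounced.create_filter(fromBlock="latest")
for event in announcements.get_new_entries():
    print("announced model (IPFS hash):", event["args"]["modelHash"])
```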

Fig. 3. The schematic diagram showing workflows for (a) application initiators, (b) computing contributors, and (c) verification contributors.

IV. IMPLEMENTATION OF PROPOSED COMPUTING PARADIGM

In this section, we describe the implementation of our proposed blockchain-powered decentralized and secure computing paradigm in detail. The implementation of our proposed computing paradigm comprises three layers: (1) an application layer to conduct decentralized and secure computing, also called the computing layer, (2) a blockchain middleware to enable the secure peer-to-peer communications between the computing nodes, including the transactions of the private models and the data sharing, and (3) a software-defined networking (SDN)-enabled networking layer.

A. Decentralized Storage System

As shown in Fig. 2, the decentralized storage (DS) system is one essential component of the computing layer to secure the sharing of data and private machine-learning models. The DS system makes the access to these data and learning models more affordable, faster, and safer. Firstly, expensive centralized data servers are no longer needed in a DS system because every peer can cache the data required by other peers. Secondly, optimized downloading from multiple local peers provides higher traffic throughput than that from a remote centralized data server. In addition, as shown in Fig. 3, after the data or private models are stored in the DS system, the unique hashes characterizing the fingerprints of these cyber assets are generated and shared, which ensures the integrity and the authorization of the sharing process.

B. Blockchain Middleware

Ethereum blockchain-based middleware is designed to automatically control, manage, and secure the system processes. First of all, the consensus protocol of the blockchain, such as Proof-of-Work (PoW), Proof-of-Stake (PoS), or Proof-of-Authority (PoA), provides an unsupervised secure environment where authoritative agents are no longer necessary for the decentralized computing. Additionally, the distributed database of the blockchain provides a shared, irremovable ledger of any events that happened on the system, in time order. It is convenient to trace the ins and outs of an event on the blockchain ledger. Furthermore, blockchain smart contracts enable the automation of system processes, including the training, verification, transaction, and fusion processes of the decentralized and cooperative learning.

C. Homomorphic Encryption Interface

As shown in Fig. 4, the encryption interface is designed to enhance the security and enable the privacy-preservation of our computing paradigm. In this work, the encryption interface is developed by exploiting the Integer-Vector Homomorphic Encryption (HE) scheme. Figure 4 illustrates the overall mechanism of the encryption interface, which mainly consists of eight steps. Step 1: The encryption interface client of the application initiator generates the public key M and the secret key S0 according to the Integer-Vector HE scheme, as illustrated in Fig. 3(a). Step 2: The generated public key M is shared amongst the active computing and verification contributors, as illustrated in Figs. 3(a)-(c). Steps 3 and 4: The computing contributors apply the received public key M to encrypt the machine learning models achieved locally and share the encrypted private models with the passively selected verification contributors for quantitative verification, as illustrated in Fig. 3(b). Step 5: After receiving the encrypted private model, the verification contributors verify the performance of the private models by conducting the quantitative verification in cipherspace with the public key M. In the quantitative verification, the verification contributor fuses the received private model with the existing learning model in cipherspace and concludes that the private model is valuable if the overall accuracy increases after the model fusion. Step 6: Majority voting amongst all the associated verification contributors is used to determine the contribution of the private model. The application initiator is informed about the majority-voting conclusion. Step 7: If the majority conclusion is positive, the transaction of the encrypted private model between the associated computing contributor and the application initiator is established. Step 8: At last, the application initiator decrypts the shared model with the secret key S0.
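As a concrete illustration of the DS-based sharing used in Steps 3 and 4, the sketch below adds an encrypted model file to IPFS and recomputes its fingerprint; it assumes a local IPFS daemon and the py-ipfs-http-client package (the testbed in Section V used the older ipfs-api.py bindings):

```python
import hashlib
import ipfshttpclient

client = ipfshttpclient.connect()        # default local daemon at /ip4/127.0.0.1/tcp/5001

# Share an (encrypted) model file through the DS system.
res = client.add("encrypted_model.bin")
print("IPFS content hash:", res["Hash"])  # this fingerprint is shared via the blockchain

# Any peer can verify integrity after download: the content hash is derived
# from the data itself, so tampering is detectable.
data = client.cat(res["Hash"])
print("sha256 fingerprint:", hashlib.sha256(data).hexdigest())
```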

Fig. 4. The illustration of the mechanism of our encryption interface.

1) Integer-Vector Homomorphic Encryption: In our work, the integer-vector homomorphic encryption scheme, which supports three fundamental operations (addition, linear transformation, and weighted inner products), is exploited to develop the encryption interface. Letting $x \in \mathbb{Z}^m$ be the plaintext vector, $S \in \mathbb{Z}^{m \times n}$ be the secret key, and $c \in \mathbb{Z}^n$ be the ciphertext vector, the encryption can be formulated as:

$$Sc = wx + e \quad (1)$$

where $e$ is a randomization term introduced to enable the encryption, whose elements are smaller than $w$, and $w$ is a large integer that controls the appropriate ratio between the plaintext and the introduced randomization.

Given the secret key $S$, the decryption can be performed as:

$$x = \left\lceil \frac{Sc}{w} \right\rfloor \quad (2)$$

where $\lceil \cdot \rfloor$ denotes rounding to the nearest integer. In this homomorphic encryption scheme, a key-switching method was proposed to convert the ciphertext in one cipherspace to another without decryption. This conversion is realized via a public key $M$ that is calculated as follows:

$$M = \begin{bmatrix} S^{*} - TA + E \\ A \end{bmatrix} \quad (3)$$

where $S$ and $c$ are the original secret key and ciphertext, respectively, $S^{*}$ is an intermediate key satisfying $S^{*}c^{*} = Sc$, $c^{*}$ is the representation of $c$ with each digit of $c$, $c_i$, represented as an $l$-bit binary number, and $A$ and $E$ are random and bounded matrices, respectively. The scalar $l$ is selected to be large enough such that $|c| < 2^{l}$, which determines the maximum value of $c$ represented with $l$ bits. Letting $b_i$ be the bit representation of the value $c_i$, we can obtain $c^{*}$ as follows:

$$c^{*} = [b_1, b_2, \ldots, b_n] \quad (4)$$

where $n$ is the length of vector $c$. Similarly, $S^{*}$ can be obtained as follows:

$$B_{ij} = \left[ 2^{l-1} S_{ij}, \ldots, 2 S_{ij}, S_{ij} \right] \quad (5)$$

where $B_{ij}$ is the sub-vector of $S^{*}$ that corresponds to the element $S_{ij}$.

Additionally, since the initial secret key $S$ is an identity matrix of dimension $n \times n$, the original ciphertext $c$ is actually the original plaintext $x$ itself. Let $S_0 = [I, T]$, where $I$ is an identity matrix and $T$ is a desired secret vector. By using the public key defined in Eq. (3), the ciphertext corresponding to the desired secret key $S_0$ can be calculated as:

$$c' = M c^{*} \quad (6)$$

where $M$ is an $(n+1) \times nl$ dimension matrix. Therefore, the resulting ciphertext $c'$ is an integer vector with length $n+1$.
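The core of the scheme can be demonstrated in a few lines of NumPy. The sketch below is a deliberately simplified toy that keeps the trivial secret key S = I for clarity (so it is illustrative, not secure); the actual scheme switches to the secret key S0 = [I, T] via the public key M of Eq. (3):

```python
import numpy as np

rng = np.random.default_rng(0)
n, w = 8, 2**20                     # vector length; large scaling integer w

S = np.eye(n, dtype=np.int64)       # toy secret key (identity)

def encrypt(x):
    e = rng.integers(-5, 6, n)      # small randomization term, |e| << w
    return w * x + e                # with S = I, this satisfies S c = w x + e  (Eq. 1)

def decrypt(c):
    return np.rint((S @ c) / w).astype(np.int64)   # x = round(S c / w)  (Eq. 2)

x1 = rng.integers(-10, 10, n)
x2 = rng.integers(-10, 10, n)
c1, c2 = encrypt(x1), encrypt(x2)

assert np.array_equal(decrypt(c1 + c2), x1 + x2)   # addition in cipherspace
assert np.array_equal(decrypt(3 * c1), 3 * x1)     # linear transformation by a scalar
```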
of convolution filters in the first convolutional layer and l is As shown in Fig 5(a), the fully-connected ANN has an input the number of convolution filters in the second convolutional vectors with length of n, a weight matrix, Weight Matrix 1, layer. with the dimension of n × m resulting in a hidden layer of In our encryption strategy, the dimension of encryption is size m adopting sigmoid function as the activation function, carefully selected such that there is no dimension mismatch a weight matrix, Weight Matrix 2, with dimension m × k, while executing the CNN-based machine learning model in and a output vector of length k. To enable the implantation cipherspace. Additionally, the first convolution layer is en- of our element-wise encryption strategy, Weight Matrices crypted via IVHE scheme by leveraging the 3rd dimension, 1 and 2 are represented with dimensions n × m × 1 and which results in a scaled and encrypted convolutional filter n × k × 1, respectively, and multiplied with a scaling factor with dimension m × 1 × (r + 1). Similarly, the second p to ensure all the matrix elements to be integer. Another convolution layer is encrypted on the 2nd dimension, which scaling factor q is introduced to convert the input elements results in a scaled and encrypted convolutional filter with to be integer. Considering the fact that, if Weight Matrix dimension m × (r + 1) × l. By doing so, the convolution is encrypted, the following structure of the fully-connected operation can be performed as a pair without any dimension NN is meaningless for a malicious party, it is reasonable mismatch. to focus on encrypting Weight Matrix 1 only to achieve a tradeoff between the high security and low computational complexity. As shown in Figs. 5(a) and (b), we consider D. Learning-Model Fusion Mechanism each element of Weight Matrix 1 as a vector with length 1 in the 3rd dimension, and apply the element-wise encryption As illustrated in Fig. 2, the success of our decentralized strategy, which results in a weight matrix with the dimension and secure computing paradigm requires an efficient learning- of n×m×2 in cipherspace. The dot product operation between model fusion mechanism, with which the application initiator the scaled and encrypted Weight Matrix 1 and the scaled input is able to integrate the verified private models and achieve vectors is executed by using IVHE, which results in a pair of an effective MetaModal for the targeted data-driven task. weighted inputs in cipherspace. By achieving a polynomial For designing the learning-model fusion mechanism, it is approximation of the sigmoid activation function, the hidden necessary to treat the computing models, provided by the layer with the approximated sigmoid functions is encrypted computing contributors, as separate entities, to ensure the via the homomorphic encryption resulting in a hidden layer fused structure is dynamic. pair in cipherspace. For each of the encrypted hidden layer Figure 7 illustrates the structure of our proposed fusion in the pair, a dot product operation is performed with the mechanism. In our mechanism, the upper-layer feature vectors scaled Weight Matrix 2, which results in an output pair in fi from individual verified private models are concatenated to cipherspace. Additionally, as shown in Fig 5(b), a scaling- form a concatenated feature layer fc. 
D. Learning-Model Fusion Mechanism

As illustrated in Fig. 2, the success of our decentralized and secure computing paradigm requires an efficient learning-model fusion mechanism, with which the application initiator is able to integrate the verified private models and achieve an effective MetaModel for the targeted data-driven task. For designing the learning-model fusion mechanism, it is necessary to treat the computing models provided by the computing contributors as separate entities, to ensure the fused structure is dynamic.

Figure 7 illustrates the structure of our proposed fusion mechanism. In our mechanism, the upper-layer feature vectors $f_i$ from the individual verified private models are concatenated to form a concatenated feature layer $f_c$. As shown in Fig. 7, the upper-layer feature vector $f_i$ can be the input of the output layer of the private model $i$. One fully-connected neural network (NN) is designed to fuse the features characterized by the individual private models. This fully-connected NN uses the concatenated feature layer $f_c$ as its input layer and has a hidden layer $h$ with length $\sum_{i=1}^{n} |l_i|$, where $|l_i|$ is the number of labeled classes in the $i$-th private model. To design the fully-connected NN, it is essential to design its weight matrices $A$ and $B$. Currently, we consider two strategies for learning the optimum values of the weight matrices $A$ and $B$ and designing the learning-model fusion mechanism.

1) Strategy I: In this strategy, the weight matrices $A$ and $B$ are initialized randomly without any additional constraints and are optimized via the backpropagation algorithm [28], in which the $j$-th element of the hidden layer $h$ and the $j$-th element of the output layer $y$ are calculated as follows:

$$h_j = \sum_{i=1}^{|f_c|} A_{ij}^{T} \cdot f_{c_i} \quad (7)$$

$$y_j = \frac{\exp\left(\sum_{i=1}^{|h|} B_{ij}^{T} \cdot h_i\right)}{\sum_{k=1}^{d} \exp\left(\sum_{i=1}^{|h|} B_{ik}^{T} \cdot h_i\right)} \quad (8)$$
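A NumPy sketch of this forward pass (with illustrative sizes; the training of A and B by backpropagation is omitted):

```python
import numpy as np

rng = np.random.default_rng(1)
f = [rng.standard_normal(16), rng.standard_normal(16)]  # features f_i from two private models
fc = np.concatenate(f)                                  # concatenated feature layer f_c
d, h_len = 10, 2 * 10                                   # class labels; |h| = sum_i |l_i|

A = rng.standard_normal((len(fc), h_len))               # weight matrix A
B = rng.standard_normal((h_len, d))                     # weight matrix B

h = A.T @ fc                                            # Eq. (7)
z = B.T @ h
y = np.exp(z) / np.exp(z).sum()                         # Eq. (8): softmax output

print(y.argmax(), y.sum())                              # predicted label; probabilities sum to 1
```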

Fig. 5. The illustration of our element-wise encryption strategy: (a) using a fully-connected NN as an example, (b) detailing the execution of the fully-connected NN in cipherspace, and (c) using a CNN as an example.

Fig. 6. Illustration of our matrix-pair-wise encryption strategy: (a) a CNN-based machine learning model in plainspace and (b) the corresponding CNN-based machine learning model in cipherspace.

2) Strategy II: This strategy is developed to achieve two goals: (1) learning the uniqueness of the features characterized by the individual verified private models, and (2) exploring the correlation amongst the features presented by the individual private models. To achieve these goals, a gradual fusion is designed, in which the weight matrix $A$ is initialized as the concatenated matrix formulated in Eq. (9) and the weight matrix $B$ is initialized with the concatenation of identity matrices, as formulated in Eq. (10):

$$A_{\text{init}} = \begin{bmatrix} W_1 & 0 & 0 & \cdots & 0 \\ 0 & W_2 & 0 & \cdots & 0 \\ 0 & 0 & W_3 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & W_n \end{bmatrix} \quad (9)$$

where each diagonal weight matrix $W_i$, which has dimension $(|f_i|, |l_i|)$, is initialized randomly, and

$$B_{\text{init}} = \begin{bmatrix} w_{111}{=}1 & 0 & \cdots & 0 \\ 0 & w_{122}{=}1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & w_{1dd}{=}1 \\ w_{211}{=}1 & 0 & \cdots & 0 \\ 0 & w_{222}{=}1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & w_{2dd}{=}1 \\ \vdots & \vdots & & \vdots \\ w_{n11}{=}1 & 0 & \cdots & 0 \\ 0 & w_{n22}{=}1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & w_{ndd}{=}1 \end{bmatrix} \quad (10)$$

where $n$ is the number of verified private models to be fused and $d$ denotes the number of class labels.

The elements of the weight matrices $A$ and $B$ are optimized by using our gradual fusion method, which consists of two stages. In the initial stage, only the diagonal non-zero weights of matrix $A$ are updated by using the backpropagation algorithm, which targets learning the uniqueness of the features characterized by the individual private models. In the second stage, all of the weights in $A$ are updated by using the backpropagation algorithm, which targets exploring the correlations between the features characterized by the individual private models. Accordingly, the $j$-th element of the hidden layer $h$ and the $j$-th element of the output layer $y$ are calculated as follows:

$$h_j = \sum_{i=1, A_{ij} \in \bar{W}}^{|f_c|} A_{ij}^{T} \cdot f_{c_i} + \gamma \sum_{i=1, A_{ij} \notin \bar{W}}^{|f_c|} A_{ij}^{T} \cdot f_{c_i} \quad (11)$$

where the parameter $\gamma$ is set to 0 in the initial stage and to 1 in the final stage, and $\bar{W} = W_p \leftarrow \sum_{k=1}^{p-1} |l_k| < i \leq \sum_{k=1}^{p} |l_k|$, and

$$y_j = \frac{\exp\left(\sum_{i=1, B_{ij}=w_{jpp}}^{|h|} B_{ij}^{T} \cdot h_i + \gamma \sum_{i=1, B_{ij} \neq w_{jpp}}^{|h|} B_{ij}^{T} \cdot h_i\right)}{\sum_{k=1}^{d} \exp\left(\sum_{i=1, B_{ik}=w_{ipp}}^{|h|} B_{ik}^{T} \cdot h_i + \gamma \sum_{i=1, B_{ik} \neq w_{ipp}}^{|h|} B_{ik}^{T} \cdot h_i\right)} \quad (12)$$

where $p \in \{1, 2, 3, \ldots, d\}$ and $d$ is the number of labeled classes.
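The following NumPy sketch (ours, with illustrative sizes) shows the block-diagonal initialization of Eq. (9) and how a 0/1 mask plays the role of γ, freezing the off-block weights during the first training stage:

```python
import numpy as np

rng = np.random.default_rng(2)
f_sizes, l_sizes = [16, 16], [10, 10]           # |f_i| and |l_i| for two private models

blocks = [rng.standard_normal((fi, li)) for fi, li in zip(f_sizes, l_sizes)]
A = np.zeros((sum(f_sizes), sum(l_sizes)))
r = c = 0
for Wb in blocks:                                # A_init of Eq. (9): block diagonal
    A[r:r + Wb.shape[0], c:c + Wb.shape[1]] = Wb
    r += Wb.shape[0]; c += Wb.shape[1]

mask = (A != 0).astype(float)                    # 1 on the diagonal blocks, 0 elsewhere

def masked_gradient_step(A, grad, gamma, lr=0.01):
    # gamma = 0: update only the block-diagonal weights (stage one);
    # gamma = 1: update every weight (stage two).
    effective = mask + gamma * (1.0 - mask)
    return A - lr * effective * grad

grad = rng.standard_normal(A.shape)              # placeholder gradient
A1 = masked_gradient_step(A, grad, gamma=0.0)    # stage one: only blocks move
A2 = masked_gradient_step(A1, grad, gamma=1.0)   # stage two: all weights move
print(np.count_nonzero(A1 - A), np.count_nonzero(A2 - A1))
```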

Fig. 7. Illustration of our proposed learning-model fusion mechanism.
V. SIMULATION RESULTS

In this section, the performance of our proposed blockchain-powered decentralized and secure computing paradigm is evaluated by considering four case studies. To achieve this goal, we develop a blockchain-powered software-defined networking (SDN)-based testbed, as detailed in Section V-A.

A. Blockchain-Powered SDN-Based Testbed

Fig. 8(a) shows a picture of our blockchain-powered SDN-based testbed, in which the computing nodes, including one application initiator, three computing contributors, and three verification contributors, are simulated by using two popular embedded systems, Raspberry Pi and NVIDIA Jetson. The communications and cooperative computing amongst the computing nodes are supported by the blockchain middleware and the SDN-based peer-to-peer networking layer. The integration of the computing layer, blockchain middleware, and SDN-enabled networking layer in our computing testbed is illustrated in Fig. 9. Each computing node, which is simulated via Raspberry Pi or NVIDIA Jetson, has (1) one or multiple blockchain clients to interact with the blockchain middleware, (2) multiple Ethernet ports to interact with the SDN-enabled networking layer, and (3) one or more decentralized storage (DS) clients to enable the decentralized storage systems.

In our testbed, the blockchain middleware is developed by exploiting a decentralized application (DApp) on the Ethereum platform, in which the engine and language of the smart contract are the Ethereum Virtual Machine (EVM) and Solidity. The consensus protocol is set to be Proof-of-Authority (PoA). The block interval is set to 15 s, and all sealers are initialized with a small number of tokens. Additionally, the smart contract in Ethereum is written in Solidity, and a blockchain event-listener client is leveraged to provide the interface to the logging facilities of the Ethereum Virtual Machine. Every computing node that is willing to participate in a certain decentralized computing task via the blockchain smart contract is required to send a transaction through the Ethereum blockchain to the contract with a considerable security deposit of tokens. By doing so, the smart contract is able to record the identities of all the participants and forward tasks such as training and verification to the participants appropriately. The deployment of the smart contract is considered a transaction as well, which provides the other nodes with the address necessary to access the smart contract. As shown in Fig. 9, the DS system in our testbed is realized by using the InterPlanetary File System (IPFS), which utilizes a peer-to-peer network to share and store hypermedia on the Internet. Since the whole simulation runs on a private network, a shared swarm key is used to authorize reliable nodes to join IPFS. Furthermore, our computing layer is developed in Python because most machine learning frameworks are realized there. Web3.py and ipfs-api.py provide access to the Ethereum blockchain and IPFS clients to realize the transactions and file sharing, respectively. TensorFlow is a deep learning framework with Python bindings, which is adopted for the machine learning tasks in our testbed.

To further demonstrate our testbed, the screenshots of the terminals for an application initiator, a computing contributor, and a verification contributor during one experiment are shown in Figs. 8(b)-(d), respectively. As shown in Fig. 8(b), the application initiator's operation comprises three main processes, identified with three red rectangular boxes: (1) being initialized to publish one data-driven task, characterize the objectives and constraints of the task via the smart contract, upload the data to be shared to IPFS, and share the public key generated via the homomorphic encryption interface, (2) receiving the verified models that are concluded according to the majority votes of all the active verification contributors, and (3) implementing decryption and fusion on the received verified models. As shown in Fig. 8(c), the computing contributor's operation mainly consists of four processes: (1) participating in the data-driven task, (2) terminating the training task when the accuracy meets the given/self-defined criterion, (3) encrypting the achieved private model via the homomorphic encryption interface and uploading the encrypted private model to IPFS, and (4) announcing the encrypted private model for assessment. As shown in Fig. 8(d), the verification contributor's operation includes three main processes: (1) being passively and randomly selected and initialized for the given data-driven task, (2) receiving all the announced private models, and (3) verifying the received private models according to the criterion defined in the smart contract and publishing the verification results.
B. Case Study I

In this case study, we focus on evaluating the performance of our proposed blockchain-powered decentralized and secure computing paradigm in a secure environment. In other words, the functionality of the homomorphic encryption interface is not considered. We assume one application initiator publishes a data-driven task on classifying the 10 image classes provided by the MNIST handwritten digit database [29], and three computing contributors and three verification contributors, which are considered to be randomly selected, participate in the task. We consider that each computing contributor develops its own CNN-based machine learning model, whose structure parameters are summarized in Table I. As shown in Table II, the local data available to each computing contributor only present a partial view of the whole dataset. The accuracy of the CNN models developed by the individual computing contributors for classifying their local verification data is shown in Fig. 10; all of the accuracies are above 90%. In this scenario, the criterion for determining whether a local learning model is trained successfully is achieving an accuracy of 90%. Therefore, all three private CNN models are encrypted and published to the three verification contributors for assessment. After receiving the verified private models according to the majority voting amongst the verification contributors, the application initiator fuses the models by using the two strategies introduced in Section IV-D to achieve the MetaModel. Assuming that the application initiator decrypts and fuses the private models as soon as it receives them, the classification accuracy achieved by the MetaModel is shown in Fig. 11. From Fig. 11, it is clear that the classification accuracy increases as more private models are fused. This is reasonable since the individual private models are achieved using local data that only characterize partial features of the whole dataset. Furthermore, it can also be seen from Fig. 11 that Fusion Strategy II slightly outperforms Fusion Strategy I when fusing multiple private models.

C. Case Study II

In this case study, we focus on evaluating the performance of the homomorphic encryption interface in our proposed decentralized and secure computing paradigm by using the testbed shown in Figs. 8(a) and 9. The local data that can be accessed by the individual computing contributors are detailed in Table III, which shows that the individual computing contributors only have a partial view of the entire dataset. Additionally, we assume that each computing contributor locally trains a CNN private learning model with a similar structure, as illustrated in Table IV. To perform our homomorphic encryption interface, the floating-point input data and weight parameters of the CNN-based learning models are converted to integers by introducing appropriate scaling factors p and q, as shown in Fig. 5. In this simulation, we evaluate the performance of the final MetaModel when the computing contributors select the scaling factors p = 2^5 or 2^7 and q = 1000. Furthermore, we assume that in the encryption interface of the computing contributors, the private CNN-based learning models are encrypted via the first convolution layers. In the encryption interface of the application initiator, the verified private models are decrypted and fused to achieve MetaModels via Fusion Strategy II. The accuracies of the MetaModel achieved with and without the homomorphic encryption interface are shown in Table V. From Table V, we can see that the MetaModel achieved by using the encryption interface with the scaling factor p = 2^7 outperforms that with the scaling factor p = 2^5 and achieves comparable accuracy to the original MetaModel. This is reasonable because the errors caused by rounding the parameters of the CNN-based private models to integers increase when the scaling factor is lower.

D. Case Study III

In this case study, we compare our proposed computing paradigm to a widely recognized distributed machine learning paradigm, federated learning [30], in a data-driven task on classifying the 10 image classes provided by the MNIST handwritten digit database [29]. In federated learning, the local contributors initialize their models by referring to the global initialized weights and then update their own models for a certain number of local epochs. The weights of the local models are averaged as the new global weights, which completes the first round of model updating. The updated global weights are distributed to the local computation contributors for the next round of the model updating procedure. This process is called Federated Averaging (FedAvg).

To achieve a fair comparison, we assume the decentralized computing is conducted in a secure environment. Therefore, in our computing paradigm, the functionality of our homomorphic encryption interface is not considered and only one verification contributor is needed. For federated learning, we assume that the global averaging process ignores the random selection of local weights and collects all the weights achieved by the available local models. In addition, we assume that each computing contributor adopts a three-layer CNN model in both of the computing paradigms. In our computing paradigm, there are three computing contributors, each of which conducts up to 100 training epochs locally, one verification contributor, and one application initiator. In federated learning, there are four local contributors training the local models, each of which is involved in five rounds of model updating and conducts 20 training epochs locally in each round.
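A compact sketch of this FedAvg baseline, where local_train is a placeholder standing in for the actual local training loop:

```python
import numpy as np

def local_train(weights, epochs):
    # placeholder: pretend training perturbs the weights
    return [w + 0.01 * np.random.randn(*w.shape) for w in weights]

def fed_avg(global_weights, n_contributors=4, rounds=5, local_epochs=20):
    for _ in range(rounds):
        locals_ = [local_train(global_weights, local_epochs)
                   for _ in range(n_contributors)]
        # average layer-by-layer over all contributors (no random selection)
        global_weights = [np.mean([lw[i] for lw in locals_], axis=0)
                          for i in range(len(global_weights))]
    return global_weights

init = [np.zeros((784, 10)), np.zeros(10)]   # toy model weights
final = fed_avg(init)
print([w.shape for w in final])
```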

Fig. 8. (a) A picture of the testbed, and the terminals of (b) Application Initiator, (c) Computing Contributor, and (d) Verification Contributor.

Furthermore, in this case study, we consider two scenarios. In the first scenario, we consider a diverse set of local training samples of the MNIST data available to the local contributors. Each of these local training samples includes data associated with all of the 10 image class labels, and the distributions of the local training samples are not identical amongst these contributors. In the second scenario, we consider that the data are distributed amongst the contributors in such a way that only part of the 10 image classes is available to most of the contributors; the data distribution used in this scenario is similar to Table II. A sketch of how such splits can be constructed is given below.
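The following sketch (ours; the paper's exact splits are given in Table II) shows how the two scenarios can be constructed from a label array:

```python
import numpy as np

labels = np.random.randint(0, 10, 60000)        # stand-in for the MNIST training labels

# Scenario I: all 10 classes per contributor, non-identical proportions.
idx = np.random.permutation(len(labels))
scenario1 = np.array_split(idx, 4)

# Scenario II: each contributor sees only a subset of the classes
# (subsets chosen here to mirror Table II).
subsets = [{0, 1, 2, 3, 4}, {0, 6, 7, 8, 9}, {5, 6, 7, 8, 9}]
scenario2 = [np.where(np.isin(labels, list(s)))[0] for s in subsets]
print([len(s) for s in scenario2])
```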

Fig. 9. Illustration of implementing our blockchain-powered and SDN-based decentralized and secure computing testbed.

TABLE I
PARAMETERS OF THE LOCAL LEARNING MODELS CONSIDERED IN CASE STUDY I

| CNN parameters | Computing Contributor 1 | Computing Contributor 2 | Computing Contributor 3 |
|---|---|---|---|
| Inputs | 28x28 images | 28x28 images | 28x28 images |
| Convolution layer 1 | 32 5x5 kernels | 64 5x5 kernels | 32 5x5 kernels |
| Pooling layer 1 | 2x2 maximum pooling | 2x2 maximum pooling | 2x2 maximum pooling |
| Convolution layer 2 | 16 5x5 kernels | 32 5x5 kernels | 32 10x10 kernels |
| Pooling layer 2 | 2x2 maximum pooling | 2x2 maximum pooling | 2x2 maximum pooling |
| Convolution layer 3 | 8 2x2 kernels | 16 2x2 kernels | 16 4x4 kernels |
| Reshaped vector (convolution layer 3 output flattened) | 7x7x8 = 392 | 7x7x16 = 784 | 7x7x16 = 784 |
| Hidden layer | fully connected, size 500 | fully connected, size 500 | fully connected, size 500 |
| Output | 10 labels with softmax activation | 10 labels with softmax activation | 10 labels with softmax activation |
| Training method | Adam optimizer | Adam optimizer | Adam optimizer |
| Batch size | 50 | 50 | 50 |
| Learning rate | 0.0001 | 0.0001 | 0.0001 |
| Maximum number of epochs | 100 | 100 | 100 |

The comparison result is shown in Fig. 12, from which it can be seen that both computing paradigms achieve a comparable classification accuracy. In the first scenario, where all the contributors have a good view of the entire dataset, federated learning slightly outperforms our framework. In the second scenario, where most contributors only have partial views of the entire dataset, our computing solution is slightly better than federated learning. Furthermore, our computing solution provides higher integrity security via removing the authoritative controlling agent and introducing the verification contributors. The main functionality of the verification contributors is for security purposes rather than for increasing the final accuracy, which also results in the relatively lower accuracy achieved by our solution in the first scenario shown in Fig. 12. Furthermore, the communication cost required by federated learning is higher than that required by our solution. This is caused by the centralized management and the repeated collection of the local weights during the FedAvg process.
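For concreteness, Computing Contributor 1's model from Table I can be rendered in Keras roughly as follows (a sketch; the convolution and hidden-layer activations are our assumption, since Table I does not specify them):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 5, padding="same", activation="relu",
                           input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(2),              # -> 14x14x32
    tf.keras.layers.Conv2D(16, 5, padding="same", activation="relu"),
    tf.keras.layers.MaxPooling2D(2),              # -> 7x7x16
    tf.keras.layers.Conv2D(8, 2, padding="same", activation="relu"),
    tf.keras.layers.Flatten(),                    # 7x7x8 = 392, as in Table I
    tf.keras.layers.Dense(500, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),   # learning rate 0.0001
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()   # train with batch_size=50, up to 100 epochs, per Table I
```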

TABLE II TABLE IV SUMMARY OF THE DATASET AVAILABLE TO THE INDIVIDUAL PARAMETERSOFTHE CNN-BASED PRIVATE MODELS IN CASE STUDY II CONTRIBUTORSIN CASE STUDY I CNN parameters No. No. Set of Inputs 28x28 images Contributor Training Verification Labels Convolution layer l 32 5x5 kernels Data Data 2x2 {0, 1, 2, 3, 4, Pooling layer 1 Verifiers 1000 1000 5, 6, 7, 8, 9} average pooling Computing Convolution layer 2 16 5x5 kernels {0, 1, 2, 3, 4} 1000 1000 2x2 Contributor 1 Pooling layer 2 Computing average pooling {0, 6, 7, 8, 9} 1000 1000 Contributor 2 Reshaped vector Computing {5, 6, 7, 8, 9} 1000 1000 (Pooling layer 2 output 7x7x16=784 Contributor 3 are flatten as a vector of size) Output 10 labels with softmax activation 100 Training method Adam Optimizer Batch size 50 Learning rate 0.0001 98 Maximum number of epochs 100

96 TABLE V 94 COMPARISON OF THE ACCURACY ACHIEVEDBY METAMODALSIN DIFFERENT SITUATIONS IN CASE STUDY II

92 Indices of Classification Accuracy for Fused Models (%) Classification Acccuracy on on Acccuracy Classification Private Verification Data (%) Data Verification Private Fused Models Encrypted Encrypted Original 90 (p = 27) (p = 25) 1 2 3 {1} 91.5 91.5 89.6 Index of the Computing Contributor {1, 2} 94.8 94.7 93.0 {1, 2, 3} 95.8 95.8 93.8 Fig. 10. Accuracy obtained by each computing contributor by using their local verification data.

driven task of detecting a fading channel by using link power 100 Strategy I data. The confidential data used in this task are from Link Strategy II 95 Power database provided by NASA Glenn Research Center. In the simulation, we assume that one application initiator 90 publishes this data-driven task and there are three computing

85 contributors and three randomly selected verification con-

tributors participating in the task. The link power data are Classification Accuracy Classification

Obtained by MetaModal (%) MetaModal by Obtained 80 {1} {1, 2} {1, 2, 3} randomly divided into 4 sets each of which contains 10000 Indices of the Computing Contributors Contributing for the MetaModal training data and 1000 testing data. Of these four sets, three sets are used as the local data, each of which is available Fig. 11. Comparison of average classification accuracy obtained by Meta- Modal using strategies I and II versus the indices of the computing contributors to one of the three computing contributors. The other set contributing to the MetalModal. is considered accessible by assigned to the three randomly selected verification contributors in the testbed. Additionally, we assume that each computing contributor participating in collection of the local weights during the process FedAvg. the task adopts a CNN model with 2 convolution layers and ReLU activation function. In this case study we consider both E. Case Study IV

TABLE III
SUMMARY OF THE LOCAL DATA AVAILABLE TO THE INDIVIDUAL CONTRIBUTORS IN CASE STUDY II

Contributor             | Set of Labels         | No. Training Data | No. Verification Data
Computing Contributor 1 | {0, 1, 2, 3, 4, 5, 6} | 1000              | 1000
Computing Contributor 2 | {0, 1, 2, 3, 4, 8, 9} | 1000              | 1000
Computing Contributor 3 | {0, 1, 2, 6, 7, 8, 9} | 1000              | 1000

[Fig. 12: grouped bars for Federated Learning and Proposed Solution; y-axis "Classification Accuracy (%)", roughly 90-100; x-axis "Two Different Scenarios" (Scenario I, Scenario II).]
Fig. 12. Performance comparison between our computing paradigm and federated learning in two different scenarios.

To demonstrate the operation of our homomorphic encryption interface, Table VI summarizes the accuracies obtained by each computing contributor with its private model and local testing data, with and without the encryption interface. From these results, it is clear that implementing the encryption interface slightly reduces the classification accuracies, which is the cost of achieving higher security and privacy preservation. Additionally, Table VII shows the classification accuracy achieved by the MetaModal of the application initiator, which is obtained by fusing the individual verified private models via fusion strategy II. From the results in Table VII, which are obtained considering the functionality of the homomorphic encryption interface, it is clear that, although the final result is satisfactory, the encryption interface slightly reduces the accuracy due to the introduction of randomization, as explained in Section IV-C, and rounding error, as discussed in Section V-C.

Furthermore, we study the impact of the homomorphic encryption interface on the execution time. Table VIII summarizes the overhead for encryption and the time consumed to achieve a local private model by each computing contributor with and without the encryption interface. The homomorphic encryption interface here is based on Element-wise Encryption. The results are obtained on a Core i5 CPU for 1000 testing data; ML models are executed as TensorFlow graphs, and encryptions and decryptions are performed using the Python NumPy API. It can be seen that the execution time with the encryption interface is only about two times higher than that without the encryption interface.
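The measurement pattern behind Table VIII can be reproduced with a small timing harness like the one below. The element-wise cipher here is a deliberately trivial stand-in (the real scheme, with its randomization and rounding, is defined earlier in the paper); only the with/without-encryption timing comparison is the point.

# Timing harness in the spirit of Table VIII. toy_encrypt/toy_decrypt are
# placeholders, NOT the paper's element-wise homomorphic scheme.
import time
import numpy as np

def toy_encrypt(x, key):
    return x + key   # placeholder for the element-wise encryption step

def toy_decrypt(c, key):
    return c - key   # placeholder for the element-wise decryption step

def mean_inference_ms(model, x, key=None, runs=10):
    # Average wall-clock time (ms) per run over x, optionally wrapping each
    # run in an encrypt/decrypt round trip.
    start = time.perf_counter()
    for _ in range(runs):
        batch = toy_decrypt(toy_encrypt(x, key), key) if key is not None else x
        model.predict(batch, verbose=0)
    return 1000.0 * (time.perf_counter() - start) / runs

# key = np.random.default_rng(0).normal(size=x_test.shape[1:])
# without = mean_inference_ms(model, x_test)            # "execution without encryption"
# with_enc = mean_inference_ms(model, x_test, key=key)  # "execution with encryption"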
VI. CONCLUSIONS

Availability of computation power and data are two of the main reasons for the success of machine learning in a variety of application areas. However, both the acquisition of processing power and of data can be expensive. In many instances these challenges are addressed by relying on an outsourced cloud-computing vendor, but, although these commercial cloud vendors provide valuable platforms for data analytics, they can suffer from a lack of transparency, security, and privacy-preservation. Furthermore, reliance on cloud servers prevents applying big data analytics in environments where the computing power is scattered. Therefore, more effective computing paradigms are required to process private and/or scattered data in suitable decentralized ways for machine learning. To pave the way to achieve this goal, a decentralized, secure, and privacy-preserving computing paradigm is proposed in this paper to enable an asynchronized cooperative computing process amongst scattered and untrustworthy computing nodes that may have limited computing power and computing intelligence. This paradigm is designed by exploring blockchain, decentralized learning, homomorphic encryption, and software-defined networking (SDN) techniques. The performance of the proposed paradigm is evaluated by considering different scenarios and comparing it to a widely recognized distributed machine learning paradigm, federated learning, in the simulation section.

ACKNOWLEDGMENT

This research work was supported by NASA under Grant 80NSSC17K0530. The authors would like to thank Praveen Fernando for assistance with the preliminary work.

TABLE VI
FADING CHANNEL DETECTION ACCURACY USING THE PRIVATE CNN-BASED LEARNING MODELS

Model                   | Detection Accuracy (%)
                        | Without Encryption | Element-wise Encryption | Matrix-Pair-wise Encryption
Computing Contributor 1 | 93.0               | 91.6                    | 91.7
Computing Contributor 2 | 93.1               | 92.4                    | 92.0
Computing Contributor 3 | 93.2               | 92.3                    | 92.1

TABLE VII
DETECTION ACCURACY ACHIEVED BY USING METAMODAL

Model     | Detection Accuracy (%)
          | Element-wise Encryption | Matrix-Pair-wise Encryption
{1, 2}    | 94.5                    | 94.5
{1, 2, 3} | 95.5                    | 95.0

TABLE VIII
THE IMPACT OF THE ELEMENT-WISE ENCRYPTION INTERFACE ON EXECUTION TIME

Model                   | Execution Time (ms)
                        | Overhead for Encryption | Execution without Encryption | Execution with Encryption
Computing Contributor 1 | 7.7                     | 60                           | 126
Computing Contributor 2 | 7.6                     | 57                           | 123
Computing Contributor 3 | 7.6                     | 59                           | 125

REFERENCES

[1] M. van Gerven and S. Bohte, Artificial Neural Networks as Models of Neural Information Processing. Frontiers Media SA, 2018.
[2] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015.
[3] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, 2016, vol. 1.
[4] J. Schmidhuber, “Deep learning in neural networks: An overview,” Neural Networks, vol. 61, pp. 85–117, 2015.
[5] S. Pandiri, A. Al-Refai, and L. Lundberg, Cloud Computing - Trends and Performance Issues: Major Cloud Providers, Challenges of Cloud Computing, Load Balancing in Clouds. LAP LAMBERT Academic Publishing, 2012.
[6] R. Shokri and V. Shmatikov, “Privacy-preserving deep learning,” in Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security. ACM, 2015, pp. 1310–1321.
[7] J. Konečný, H. B. McMahan, F. X. Yu, P. Richtárik, A. T. Suresh, and D. Bacon, “Federated learning: Strategies for improving communication efficiency,” arXiv preprint arXiv:1610.05492, 2016.
[8] J. Konečný, H. B. McMahan, D. Ramage, and P. Richtárik, “Federated optimization: Distributed machine learning for on-device intelligence,” arXiv preprint arXiv:1610.02527, 2016.
[9] B. McMahan and D. Ramage, “Federated learning,” https://research.googleblog.com/2017/04/federated-learning-collaborative.html.
[10] K. Bonawitz, H. Eichner, W. Grieskamp, D. Huba, A. Ingerman, V. Ivanov, C. Kiddon, J. Konečný, S. Mazzocchi, H. B. McMahan, T. Van Overveldt, D. Petrou, D. Ramage, and J. Roselander, “Towards federated learning at scale: System design,” arXiv preprint arXiv:1902.01046, 2019.
[11] V. Buterin, “Ethereum: A next generation smart contract & decentralized application platform,” Ethereum White Paper, 2013.
[12] H. Diedrich, Ethereum: Blockchains, Digital Assets, Smart Contracts, Decentralized Autonomous Organizations. CreateSpace Independent Publishing Platform, September 2016.
[13] S. Nakamoto, “Bitcoin: A peer-to-peer electronic cash system,” September 2009.
[14] V. Vielzeuf, S. Pateux, and F. Jurie, “Temporal multimodal fusion for video emotion classification in the wild,” in Proceedings of the 19th ACM International Conference on Multimodal Interaction. ACM, 2017, pp. 569–576.
[15] S. E. Kahou, C. Pal, X. Bouthillier, P. Froumenty, Ç. Gülçehre, R. Memisevic, P. Vincent, A. Courville, Y. Bengio, R. C. Ferrari et al., “Combining modality specific deep neural networks for emotion recognition in video,” in Proceedings of the 15th ACM International Conference on Multimodal Interaction. ACM, 2013, pp. 543–550.
[16] G. Ye, D. Liu, I.-H. Jhuo, and S.-F. Chang, “Robust late fusion with rank minimization,” in Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012, pp. 3021–3028.
[17] N. Neverova, C. Wolf, G. Taylor, and F. Nebout, “ModDrop: Adaptive multi-modal gesture recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 8, pp. 1692–1706, 2016.
[18] R. L. Rivest, L. Adleman, and M. L. Dertouzos, “On data banks and privacy homomorphisms,” Foundations of Secure Computation, vol. 4, no. 11, pp. 169–180, 1978.
[19] R. L. Rivest, A. Shamir, and L. Adleman, “A method for obtaining digital signatures and public-key cryptosystems,” Communications of the ACM, vol. 21, no. 2, pp. 120–126, 1978.
[20] X. Yi, R. Paulet, and E. Bertino, Homomorphic Encryption and Applications. Springer, SpringerBriefs in Computer Science, November 2014.
[21] C. Gentry and D. Boneh, A Fully Homomorphic Encryption Scheme. Stanford University, 2009, vol. 20, no. 09.
[22] Y. Masahiro, Fully Homomorphic Encryption without Bootstrapping. LAP Lambert Academic Publishing, March 2015.
[23] T. ElGamal, “A public key cryptosystem and a signature scheme based on discrete logarithms,” IEEE Transactions on Information Theory, vol. 31, no. 4, pp. 469–472, 1985.
[24] H. Zhou and G. Wornell, “Efficient homomorphic encryption on integer vectors and its applications,” in 2014 Information Theory and Applications Workshop (ITA). IEEE, 2014, pp. 1–9.
[25] A. Yu, W. Lai, and J. Payor, “Efficient integer vector homomorphic encryption,” May 2015.
[26] USA Patent provisional 62/663,287, 2018.
[27] Agorise, “c-ipfs: IPFS implementation in C,” Github.com, October 2017.
[28] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, 2016, http://www.deeplearningbook.org.
[29] Y. LeCun, “The MNIST database of handwritten digits,” http://yann.lecun.com/exdb/mnist/, 1998.
[30] H. B. McMahan, E. Moore, D. Ramage, S. Hampson et al., “Communication-efficient learning of deep networks from decentralized data,” arXiv preprint arXiv:1602.05629, 2016.