IMPROVED DISTRIBUTED LEDGER TRANSACTIONS WITH HOMOMORPHIC COMPUTATIONS

by

HANES BARBOSA MARQUES DE OLIVEIRA

B.S., Universidade Potiguar, Brazil, 2009

A thesis submitted to the Graduate Faculty of the

University of Colorado Colorado Springs

in partial fulfillment of the

requirements for the degree of

Master of Science

Department of Computer Science

2020

© Copyright by Hanes Barbosa Marques de Oliveira 2020

All Rights Reserved

This thesis for the Master of Science degree by

Hanes Barbosa Marques de Oliveira

has been approved for the

Department of Computer Science

by

Edward C. Chow, Chair

Carlos Paz de Araujo

Philip N. Brown

Date: 3 December 2020

Barbosa Marques de Oliveira, Hanes (M.S., Computer Science)

Improved Distributed Ledger Transactions with Homomorphic Computations

Thesis directed by Professor Edward C. Chow

ABSTRACT

This thesis proposes a model to protect and transfer data ownership securely, driven by Homomorphic Encryption (HE) primitives and protocols applied to private blockchain technologies. Regarding Blockchain-as-a-Service (BaaS), our contribution seeks to alleviate issues such as third-party custody of digital assets, opportunism, and other adversarial behavior. We address a recurrent challenge in Distributed Ledger Technology (DLT) by adapting an HE scheme to the inherent principles of blockchain. Both blockchain and HE technologies suffer from performance issues, and combining them can compound the problem. Therefore, we apply Clifford Algebra as the algebraic structure for devising an efficient solution. We show the relationships between each aspect of the proposed model, the architecture and related principles, along with the concepts and values expected for a BaaS environment. Our prototype performs well with larger keys: the system demonstrated linear performance for keys ranging from 128 to 4096 bits in operations such as encryption, addition of ciphertexts, and decryption. Although the key generation procedure showed exponential behavior for keys larger than 2048 bits, symmetric schemes such as the one analyzed can apply lower security parameters than asymmetric constructions, staying within the performance range of distributed ledgers.

DEDICATION

This work is dedicated to a very distinct group of individuals called family, my great booster in all of my efforts. To my wife and sons that I love with all my heart. To my beloved parents, my brother, and my sister. I know I am always in your prayers. To “vovó” and all my family abroad, always rooting for me. May God bless your heart!

ACKNOWLEDGEMENTS

First and foremost, I thank the world’s architect and most prestigious scientist, the Almighty God, Who gave me breath. What an unspeakable privilege! I am grateful for the life of Dr. Carlos Paz de Araujo, a visionary not just for the greatness in the astonishing things but also for the perception of the hidden treasures in the small ones. I also express my gratitude to Greg Jones for his encouragement in moments where his support was paramount. To Dr. Edward C. Chow, for sharing his expertise in the guidance of this research. Also, my appreciation for the insights of distinguished professors, Dr. Jonathan Ventura, Dr. Sang-Yoon Chang, and Dr. Philip N. Brown. To Ben Westrich, for his enlightenment on economics. I want to express my special gratitude to David Honorio, my brother in arms in a journey full of challenges. David proposed the cryptographic scheme for this work, accepting my suggestions on how to mold the construction into a blockchain suited proposition. Also, to Marcelo Xavier, Bryan Sosa, Bhagiradh Kantheti, and Jordan Pattee, a distinct breed of partners. Finally, my appreciation to all my family, colleagues, and professors that somehow helped in this achievement by pouring their advice and knowledge over me.

Table of Contents

CHAPTER

1 Introduction
  1.1 The Beginning of Blockchain
    1.1.1 The Ideological Design of Bitcoin
    1.1.2 Blockchain Fundamental Properties
  1.2 Challenges in the Opportunity
    1.2.1 Trust Rooted Problems
    1.2.2 Custody of Data and Property of Information
  1.3 Relevant Solutions
    1.3.1 Cryptography as a Common Factor
    1.3.2 Private Blockchains
  1.4 Our Contribution
    1.4.1 Objectives
    1.4.2 Organization
    1.4.3 Methodology

2 Related Work
  2.1 Bitcoin: the Blockchain Seminal Work
    2.1.1 Transactions and Scripts
    2.1.2 Mining Blocks for Consensus
    2.1.3 The Economic Vocation of Blockchain
      2.1.3.1 Opportunism and Adversarial Problems
      2.1.3.2 Property and Ownership
  2.2 Distributed Systems
    2.2.1 What is a Distributed System?
    2.2.2 Fault Tolerance
    2.2.3 Byzantine Agreement
    2.2.4 Consensus
  2.3 Permissioned Blockchains
  2.4 Blockchain as a Service
    2.4.1 Old Problems for new Chains
    2.4.2 The Ownership Management
      2.4.2.1 Information as the Data Asset
    2.4.3 BaaS Adversarial Problem
  2.5 Promising Candidates
    2.5.1 Homomorphic Encryption as a Tool
    2.5.2 Geometric Algebra as the Catalyst

3 Theoretical Analysis and Discussion
  3.1 Theorizing a Solution
    3.1.1 Contractual Solutions
    3.1.2 Renegotiation Mechanism
    3.1.3 Ownership Transfer
  3.2 A Theoretical Cryptographic Model
    3.2.1 The Encryption Scheme
    3.2.2 A Protocol For Exchanging Keys
    3.2.3 Transferring Ownership by Key Update
  3.3 Summary of Our Contribution

4 Research Results and Propositions
  4.1 Applied Objects and Operations
  4.2 Blockchain Framework
  4.3 Cryptographic Scheme
    4.3.1 Key Generation
    4.3.2 Encryption
    4.3.3 Addition
    4.3.4 Scalar Division
    4.3.5 Decryption
  4.4 Key Exchange Protocol
  4.5 Key Update Protocol
    4.5.1 Token Generation
    4.5.2 Key Update
  4.6 Conclusion

5 Software Architecture of SCOT
  5.1 System Development
    5.1.1 Use Case Scenario
    5.1.2 Requirement Analysis
    5.1.3 System Constraints
  5.2 SCOT System Design
    5.2.1 SCOT Architecture View
    5.2.2 Why a CLI Prototype?
    5.2.3 The Application Lifecycle
      5.2.3.1 Persisting Information
      5.2.3.2 Generating a Report
      5.2.3.3 Generating a Result
  5.3 Application
    5.3.1 Prototype Execution
    5.3.2 Inserting Sensitive Data Into The Blockchain
    5.3.3 Creating Analysis Over Third-Party Data
    5.3.4 The Ownership Transfer
    5.3.5 Analyzing the Information Transferred

6 Performance Evaluation of SCOT
    6.0.1 Key Generation
    6.0.2 Encryption
    6.0.3 Addition
    6.0.4 Decryption
  6.1 Experience Outline

7 Lessons Learnt
  7.1 A Chain of Events
  7.2 Misleading Conceptions
  7.3 Analytical Benchmarking
    7.3.1 Generated Artifacts
  7.4 Development Challenges

8 Future Directions

9 Conclusion

Bibliography

APPENDIX

A Setup And Configuration Guides
  A.1 Prerequisites
    A.1.1 Git
    A.1.2 Fabric Samples Repository
    A.1.3 Golang
    A.1.4 Docker And Docker Compose
  A.2 Setup
    A.2.1 Preparing The Smart Contract
    A.2.2 Installing And Instantiating The Smart Contract

List of Tables

TABLE

6.1 Bit lengths for different SCOT security levels.
6.2 Bit lengths for different security levels.
6.3 Equivalence of security parameters.
6.4 Key generation data.
6.5 Encryption data.
6.6 Addition data.
6.7 Decryption data.

List of Figures

FIGURE

2.1 Double-spending solution decision flow.
2.2 structure.
2.3 Transaction structure and dependency.
2.4 Locking and unlocking scripts.
2.5 Locking and unlocking scripts evaluation.
2.6 Bitcoin block structure.
2.7 Bitcoin network.
2.8 Bitcoin block linked structure.
2.9 Bitcoin sequential transactions.
2.10 Groups of processes communicating as logical units.
2.11 Fault-tolerant strategies.
2.12 Inconsistent replication of the state.
2.13 Byzantine failures seminal works.
2.14 Flat and hierarchical groups.
2.15 Byzantine agreement evolution.
2.16 Correlation between characteristics of ownership.
2.17 IBM BaaS trust model.
2.18 Main private frameworks timeline.

3.1 Mapping and abstract model contribution.
3.2 Abstract solution triad for ownership management.

4.1 Concrete model contribution.

5.1 Complete contributions diagram.
5.2 Abstract Architecture
5.3 Key Generation And New Record
5.4 Report Generation
5.5 Token generation and key update.
5.6 List of available functions.
5.7 Key generation output.
5.8 Key ring file created by the key generation command.
5.9 Chaincode invoke successful output.
5.10 Chaincode query output.
5.11 Proposal query output.
5.12 Decryption output.
5.13 Token generation output.
5.14 Chaincode query output.
5.15 Chaincode query output.

6.1 Key generation benchmark.
6.2 Encryption benchmark.
6.3 Addition benchmark.
6.4 Decryption benchmark.

7.1 Simplified blockchain decision tree.

A.1 Docker compose command output
A.2 Docker ps command output

CHAPTER 1

Introduction

Blockchain combines existing contributions in an evolutionary development toward a new approach to electronic cash. It rose to prominence with the Bitcoin system [1], a solution for transactional autonomy. Its motivation was shaped by the perception that the government controlled society as a whole through the manipulation of information and means of exchange [2]. The nascent concept relied heavily on cryptography to guarantee trustworthy transactions along with anonymity [3–5]. Also, as a Peer-to-Peer (P2P) distributed network, the model enforced that no third party should control commercial relations, manipulate transactions, or impose censorship. Since then, blockchain has become a trendy buzzword, denoting electronic money, a distributed ledger, or just the next Ponzi scheme. Nonetheless, the idea has drawn attention from many sectors of industry and academia in recent years and has been scrutinized with scientific rigor. As the technology gained popularity, its vulnerabilities and limitations became the subject of active debate [6–8], and components of the system were modified to accommodate new functional objectives. Today we can observe two main lines of development with two different core philosophies, represented by public and private blockchains. The public model, also known as permissionless, retains the initial principle of protecting the anonymity of its users. This is aligned with the crypto-anarchical utopia that influenced the first archetype. The private model goes in the opposite direction, requiring all users to identify themselves and thus restricting relations to an authorized group. This model is known as permissioned. Blockchain has thus evolved while still relying partially on the principles of its seminal construction. However, industry-driven principles now guide the choice of which components are valuable for the task at hand.
Concerns about centralized systems have diminished, and private solutions are exploring the advantages of more efficient architectures. In fact, the technology used for digital cash was already present in academic circles [9]. Fault-tolerant distributed computation and consensus mechanisms were proposed long before Bitcoin, and some of these earlier ideas are now finding their way into Distributed Ledger Technology (DLT) applications (see footnote 1). Nevertheless, the cryptocurrency movement, combined with a context of bankruptcy, was the catalyst for the adoption of Bitcoin and, consequently, of blockchain. Choosing the right model, whether public or private, depends on the alignment between the premises of each business and the principles behind a solution. However, some building blocks remain indispensable for both. Although not every blockchain shares the same origin, secrecy has permeated the concept since the beginning. Regardless of the design used to keep data consistent, cryptography has been used in every case. For instance, permissionless solutions can use public-key cryptography to allow one to create transactions inside a network. Permissioned solutions, on the other hand, demand a certified identity, which falls under the same requirement of cryptographic signing. Hence, it is easy to see why each blockchain version makes use of cryptography at its core. When it comes to cryptocurrencies, cryptographic proof is, among all the building blocks, the one essential to establishing an electronic payment system. Many derivations of the initial ideas, even those not related to an electronic currency, rely heavily on cryptography to improve security and privacy because participants’ data is permanent and widely distributed. The immutability of transactions in an ever-growing ledger also creates an extensive database, which can become a problem when it comes to forward secrecy. In this work, we investigate the private blockchain model, analyzing its privacy.
Our purpose is to assess how the private sector can benefit from computing data with secrecy so that commercial transactions can be efficiently undertaken while adversarial issues are addressed. Among others, we delve into the problem of data custody and the risk of economic loss that inhibits blockchain adoption in an enterprise setting.

Footnote 1: The state replication in a private blockchain setting is comparable to 2-phase protocols [10], a traditional mechanism of shared databases. Also, its level of centralization makes the blockchain classification not completely accurate [11]. However, we explore attributes that share similarities with the core principles of a blockchain. Therefore, for generality, we use DLT and blockchain interchangeably.

In our approach, executable code can be shared with peers while the sensitive data is kept private. The results of such computation are then accessible to all authorized participants. We propose an architecture in which the encryption and decryption mechanism resides outside the chain, while parties persist on the chain previously agreed-upon scripts for every operation applied to the ciphered data. The executable code is known to all parties, and each party can test it independently. This algorithmic scrutiny favors trust by ensuring that the immutable code is accurate. The propositions here rely on the assumption that, regardless of the distributed ledger construction, the availability of a cryptographic model for the management of assets will favor privacy, security, and efficiency, which may in turn foster acceptance of private blockchain solutions.

1.1 The Beginning of Blockchain

The core description of a blockchain is usually associated with a chained track of events, assembled into an immutable ledger that is distributed and maintained by a set of participants in a network. This description is arbitrary, as are many of the definitions of the technology. Blockchain does not have an official definition [9]; it is a loose umbrella for many technologies that resemble, to some degree, the components and principles behind Bitcoin, the peer-to-peer electronic cash system [1]. Bitcoin is not the first electronic cash system, nor is it made of any cutting-edge component that, in isolation, represents the contribution of its creator(s). For instance, digicash [12], hashcash [13], b-money [4], and bit gold [14] had already been proposed as means for digital money when Bitcoin was implemented. Bitcoin’s success and recognition come from how the technology combines older academic ideas into a new approach for a decentralized system. Additionally, its first appearance coincided with a context in which the predominant trade model had weakened the notion of trust. These factors drove the adoption of the new exchange method. Therefore, the Bitcoin initiative is considered the model that brought to life the economic perspective of the cypherpunks [2], and thus the work that bears the philosophy and technology comprising the blockchain concept.

We investigate the model’s evolution, from its crypto-anarchical vocation to its adoption by enterprise-grade applications and cloud services. We start by analyzing Bitcoin’s ideological and technical background. First, we present its political and philosophical roots. Second, we show how these beliefs connect to technological choices. This is important for understanding why the constituent parts were chosen, examining not just the technical aspects but also the economic motivation for bringing these technologies together. In doing so, we can distinguish which aspects of a blockchain are appropriate when devising new solutions. Although blockchain has branched into many different evolutionary paths, our focus is on the private sector and on how the technology has been adapted for joint efforts.

1.1.1 The Ideological Design of Bitcoin

When public-key cryptography became available, the US government opposed its circulation, fearing that it would give an advantage to America’s enemies, criminals, and terrorists abroad [2]. Consequently, under the cloud of an authoritarian state, an insurgency of dissatisfied individuals versed in mathematics, cryptography, computer science, and economics emerged. With a strong political and philosophical motivation, a small group started to hold regular meetings, resulting in the creation of the cypherpunks. The name was coined after the cyberpunk sci-fi genre, which also inspired the group’s ideology. For instance, a Frequently Asked Questions (FAQ) section in a cypherpunk document would recommend reading “True Names,” a sci-fi novel and seminal work of this writing style [15], and “The Machinery of Freedom,” a nonfiction book that advocates for an anarcho-capitalist society [16]. The cypherpunks consisted mostly of individuals spread across a spectrum of anarchism that targeted the state as the enemy [2]. What all of them had in common was the belief that cryptography and the fight for privacy and freedom in cyberspace were of the utmost importance. As the main political voice in the group, Timothy C. May nurtured the movement with his thoughts. May was a former chief scientist at Intel and one of the group’s founders, along with Eric Hughes, a Berkeley mathematician, and John Gilmore, a former computer scientist at Sun Microsystems. May regarded the state as a source of evil and advocated for tax avoidance, money laundering, and markets for information. Some of his ideas were considered extreme, such as “assassination markets” that inspired a lottery fund for the assassination of politicians [2]. In his work Cyphernomicon [5], May defines what a crypto-anarchy is, inspiring Wei Dai to share b-money [4]. This protocol proposes a medium of exchange (money) and a way to enforce contracts using untraceable entities. Afterward, Nakamoto used b-money as a reference for the design of Bitcoin, which has no trusted party. For the cypherpunks, the most important tasks were related to writing software that would protect communications. For instance, each new subscriber to the mailing list received a message written by Eric Hughes stating the core principle laid down by his formative Cypherpunk Manifesto [3]: the devotion to applying cryptography in defense of privacy. They also encouraged discussions about politics, economics, public-key cryptography, Perl scripts, digital cash, DC-nets, mail handlers, and remailers. Nevertheless, their focus leaned towards public-key schemes and their combination with remailers and digital cash ideas. First, remailers would apply encryption to allow anonymous communication through encrypted e-mail traffic. Second, an electronic cash solution based on cryptographic proofs would be used as a way of disguising transactions, thereby realizing a means for markets (e.g., markets for information). In this context, digital cash came to play an essential role in the cypherpunk agenda, because it would create the crypto-anarchy that had been romanticized among the rebels. However, while remailers were rapidly created in the community, digital cash had not yet emerged after years of active discussion, due to its complexity [5]. Bitcoin was thus proposed in an environment eager for a working concept that would embody the philosophy and principles defended by the community.

1.1.2 Blockchain Fundamental Properties

Before defining the general properties of blockchains, it is crucial to narrow down the attributes most common among known blockchain initiatives. These properties come from the characteristics of the underlying technologies. For example, the non-repudiation that is frequently associated with blockchain is inherited from the digital signatures used in the technology. Likewise, the meaning of blockchain has expanded from a system that implemented a cryptocurrency to a data store and a computational infrastructure. The technology is now also used to manage assets and guarantee the execution of agreed-upon business logic.

A blockchain is a distributed ledger structured in a chain of blocks, where each block is linked to its predecessor by a cryptographic hash [17]. A distributed ledger is an immutable store of transactions distributed across many machines. Immutability, in this context, means that transactions can only be appended; they cannot be deleted or modified without the change being noticed. Xu et al. define five properties that are enhanced by a blockchain, namely immutability, non-repudiation, integrity, transparency, and equal rights [17]. Bashir also emphasizes high availability, simplification, and cost saving as advantages, along with the benefits mentioned above [11]. In summary, blockchain establishes trust in the cryptographic proof, enabling all the advantages mentioned. Additionally, it avoids an imbalance of power when sharing data, creating a common ground where parties can cooperate in a consensual manner. For our purposes, we focus on the capacity of blockchain to provide immutability, non-repudiation, transparency, cost saving, and a balanced management of power over assets. Transparency lies at the heart of our discussion, since data privacy is one point of criticism for blockchains, mainly the public ones [17]. It is also part of the custody problem that arises when using blockchain through a third-party provider or cloud service. Cost saving, in turn, has incentivized the adoption of Blockchain-as-a-Service (BaaS) and goes hand in hand with the problem of managing information property. BaaS is the third-party creation and operation of cloud-based software for companies interested in developing blockchain applications.
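The hash linkage described at the start of this subsection can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the block fields (`index`, `prev_hash`, `transactions`) and the helper names `block_hash`, `append_block`, and `first_broken_link` are hypothetical and do not reproduce the block format of Bitcoin or of any framework discussed in this thesis.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """Hash a block's contents deterministically with SHA-256."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_block(chain: list, transactions: list) -> None:
    """Append a block that stores the hash of its predecessor."""
    # The first block has no predecessor, so it uses an all-zero placeholder.
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({
        "index": len(chain),
        "prev_hash": prev,
        "transactions": transactions,
    })


def first_broken_link(chain: list):
    """Return the index of the first block whose stored prev_hash no longer
    matches its predecessor, or None if every link is intact."""
    for i in range(1, len(chain)):
        if chain[i]["prev_hash"] != block_hash(chain[i - 1]):
            return i
    return None


if __name__ == "__main__":
    ledger = []
    append_block(ledger, ["alice pays bob 5"])
    append_block(ledger, ["bob pays carol 2"])
    assert first_broken_link(ledger) is None

    # Modifying an earlier block is noticed at the next link in the chain.
    ledger[0]["transactions"] = ["alice pays bob 500"]
    assert first_broken_link(ledger) == 1
```

The sketch shows only why a modification cannot go unnoticed. It omits signatures, consensus, and replication, which are what make a real ledger tamper-resistant rather than merely tamper-evident.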

1.2 Challenges in the Opportunity

BaaS has been adopted by many industry players [18], with the Fintech sector creating momentum for the technology [19]. However, besides the lack of skilled workers to drive the change and the skepticism that permeates companies, businesses may retreat because of problems rooted mostly in trust [20]. We first introduce the dilemma through the trust model, a notion typically present in commercial relations. This lays the groundwork for understanding a more critical problem related to data custody. These issues arise when businesses start evaluating their own set of rules or the rules set by their business partners.

1.2.1 Trust Rooted Problems

In his cypherpunk manifesto [3], Eric Hughes stated that faceless organizations would not grant us privacy out of their beneficence. The cypherpunk community was explicitly against the reputation-based business model known as the trust model. According to their beliefs, this model helps perpetuate many of the flaws that compromise our privacy and anonymity. Therefore, when considering transactions and exchanges for goods and services, they insisted on cryptographic proof instead of trust. We can see the implications of such a model when an individual applies for a personal loan. Before applying at a bank, the applicant first needs to present proper identification documents issued by an accredited authority. Second, the applicant must have built a good reputation and positive credit scores over time, so that the institution can check this track record to calculate the best interest rate available. If the outcome is favorable, the applicant signs a contract and agrees to a specific monthly payment. On the other hand, after disbursing the loan amount, the bank has no guarantee that it will receive all monthly payments. Although collateral can be included in the documentation, there is no guarantee that it will still be available in the future. The trust model also presents problems when it comes to data and information. One of them is that a contract is the only foundation for expecting compliance, yet no contract can prevent adversarial actions by business partners. Even a legally enforceable contract does not guarantee that parties will abide by the agreement.

1.2.2 Custody of Data and Property of Information

When the participants of a blockchain network decide to share a common ledger, they persist data by aggregating transactions into that ledger. This structure is available to all participants through some synchronization process. Consequently, storing data in a distributed database is equivalent to sharing ownership over the information inferred from it. Because agreements are made on the basis of trust, the issue of custody arises. Data has special characteristics. For instance, unlike physical assets, it can be replicated without losing its original counterpart. Data assets can also be recreated in several ways, such as orally, in writing, or electronically. Consequently, when data represents an advantage as a competitive asset, a great part of its value relies on its exclusivity [17]. Sharing such resources does not eliminate the original form, but it does affect the value of possession. Parties can behave in conflicting ways in a business relationship [21–24], and legal contracts cannot limit how partners interact with third-party data [25–27], since such documents, where applicable, depend on law enforcement processes with corresponding punishment. Consequently, as stated earlier, there are two likely outcomes in this case:

1. If one company perceives a high probability of misconduct from its partners and the asset’s value exceeds the relationship’s worth, then agreements (e.g., data sharing) will not be realized;

2. After deciding to share data and perceiving that improper use of assets occurred, one company will go to court and probably count the losses.

Neither result is desirable, and some firms will avoid the struggle of sharing data by acquiring their partners rather than cooperating [26].

1.3 Relevant Solutions

To transact in a blockchain setting, organizations will either share information in its plain form or use privacy tools to transact securely, although the latter reduces the usefulness of the shared data. Many solutions are available, either on-chain or off-chain, but in general these tools segregate participants or data, keeping different versions of the historical truth, also known as a fork. In doing so, computation over that information is not allowed unless the data owner grants other participants access for analysis. Many initiatives already offer privacy at some level. Regardless of their permissioned or permissionless construction, their conceptual ideas are promising. For instance, Enigma [28], Solidus [29], Confidential Assets [30], and Hawk [31] are examples of protocols for confidentiality. Private solutions like Hyperledger Fabric [32] have proposals for on-chain secure MPC [33], along with its existing channels and private collections. In the permissioned arena, interesting features are also offered by R3 Corda [34] and Quorum [35]. In Section 1.4.1, we investigate which principles will be considered when analyzing existing frameworks.

1.3.1 Cryptography as a Common Factor

Every blockchain framework applies some cryptographic resource when implementing private communications and the sharing of data. Cryptography has become a subject of study for many fields and applications. According to Katz and Lindell [36], it is the scientific study of techniques for securing digital information, transactions, and distributed computations. Initially used in military settings to enable secret communications, it has spread to every digital form of industry interaction. Furthermore, advances in cryptography enabled new protocols, from the One-Time Pad (OTP) to private-key and public-key schemes. Nonetheless, the ever-growing need for communication, storage, and computation pushed the use of additional functionalities, such as Multi-party Computation (MPC), Zero-Knowledge Proof (ZKP), and Oblivious Transfer. One of the most prominent ideas that emerged is Homomorphic Encryption (HE). Although HE is still a work in progress in cryptography, it promises a solution that combines many of the desired attributes of a secrecy system. An HE scheme allows computation over encrypted data while preserving its meaning: decrypting the result of an operation on ciphertexts yields the result of the corresponding operation on the plaintexts. The idea was initially proposed by Rivest et al. [37], but the first fully homomorphic construction came from Gentry [38]. Since then, various schemes have been presented to improve some aspects of the concept. However, efficient, secure, and lightweight solutions that can meet the requirements of industry-grade products are yet to be demonstrated. When we look at the history of blockchain, one of its pillars is the cryptographic proof. It enforces a cryptographic correlation between blocks, supporting what is claimed to be its immutability. With the requirement for ciphered data inside transactions, a homomorphic scheme would greatly benefit the private distributed ledger scenario, mainly because organizations rely on third-party architectures for computation and storage (e.g., cloud).
Therefore, we started our research by analyzing a scheme initially designed for Internet of Things (IoT) devices called Enhanced Data-Centric Homomorphic Encryption (EDCHE) [39]. The scheme shows satisfactory performance and efficiency, and is adaptable to meet market standards in new forms, such as in [40]. However, security concerns drew our attention to a new construction that still uses the same mathematical framework but offers better security assumptions [41]. The HE scheme in this work was chosen to compute transactions through smart contracts on-chain. Additionally, a mechanism for changing the key homomorphically is investigated. Again, operations happen only inside the blockchain, and no external resource is used to deliver the outcome. Therefore, every participant has access to the scripts, although each agent may have different encrypted data.
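To make the homomorphic property discussed in this subsection concrete, the following Python sketch uses the textbook Paillier cryptosystem with toy parameters. This is a stand-in chosen because it is well known: Paillier is an asymmetric, additively homomorphic scheme and is not the Clifford-algebra-based symmetric construction adopted in this work. The function names, the fixed generator g = n + 1, and the small primes are assumptions made purely for illustration and offer no real security.

```python
import math
import random


def keygen(p: int, q: int):
    """Build a toy Paillier key from two distinct primes (illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    # With the generator fixed to g = n + 1, mu is the inverse of lam mod n.
    mu = pow(lam, -1, n)  # modular inverse via pow needs Python 3.8+
    return n, (lam, mu)


def encrypt(n: int, m: int) -> int:
    """Encrypt 0 <= m < n as (n + 1)^m * r^n mod n^2 for a random unit r."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def add(n: int, c1: int, c2: int) -> int:
    """Add two plaintexts by multiplying their ciphertexts; no key is needed."""
    return (c1 * c2) % (n * n)


def decrypt(n: int, private_key, c: int) -> int:
    """Recover m as L(c^lam mod n^2) * mu mod n, where L(x) = (x - 1) // n."""
    lam, mu = private_key
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n


if __name__ == "__main__":
    n, private_key = keygen(1009, 1013)  # toy primes, far too small for real use
    c_sum = add(n, encrypt(n, 20), encrypt(n, 22))
    assert decrypt(n, private_key, c_sum) == 42
```

Note that `add` touches only ciphertexts and the public modulus. This mirrors, in a very reduced form, the division of roles described above: a script that every participant can inspect performs the computation, while encryption and decryption stay with the data owner.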

1.3.2 Private Blockchains

Organizations and firms have been adopting blockchain for its immutable track of events, provenance, and shared ownership. In the beginning, public versions of the technology kept their users anonymous, preserving their identities. Participants had to agree on a single history, and all transactions were made public. In the case of Bitcoin, the design aimed to avoid censorship by the government. In the private sector, though, the government is part of the equation from the moment a firm is created, in a scenario where deals are enforced by law and regulations. In the case of private organizations, interaction between companies relies on identification and contract settlement. All of these stages happen before sharing information or cooperating in a system. For that reason, a blockchain solution must meet the needs of this niche, promoting governance and accountability. Consequently, the presumed risk comes from partners, and trust must be constantly managed. For those cases, permissioned blockchain solutions are more appropriate because the objective is to put in place a tool to mediate expectations. Solutions like Hyperledger Fabric or Quorum are sufficient to promote this type of supervised cooperation. Private architectures also make it possible to execute operations through scripts (i.e., smart contracts). The data generated is shared within a consortium; therefore, no hierarchy is enforced over it without previous agreement. Consequently, institutions have balanced ownership through the implementation of a consensus protocol and related policies. Additionally, consensus is designed to be efficient in this context, not depending on computationally expensive procedures or economic incentives. Nevertheless, since companies may have a delicate commercial relationship to navigate, as well as different ways of sharing their information, they will probably want to intentionally remove meaning from the data. However, deriving an analysis from a shared asset is still a benefit of the partnership and can be leveraged in negotiations. In this situation, what counts as sensitive in a database is arbitrary. Nonetheless, meaning could be extracted from ciphered data while avoiding legal consequences, as in medical applications [42] or data shared by individuals [43].

1.4 Our Contribution

This work introduces a new combination of existing solutions to yield a framework for sharing information in a distributed ledger environment while protecting ownership over information. Our goal is to enable analysis through homomorphic operations over encrypted data inside a private blockchain architecture. If desired, our construction can be further expanded to a BaaS setting by applying different technologies through our implementation-agnostic model. Experimental constructions addressing the problem of combining a HE scheme with mechanisms to manage the ciphertext’s ownership have been discussed in [44]. Those propositions are based on non-conventional mathematical resources for cryptographic solutions [40,45,46], which can eventually be considered strategic candidates for the realization of ideas discussed in this manuscript. The present work focuses on formal definitions that are meant to be technology and implementation agnostic. We hope to establish a theoretical ground that supports several practical applications. However, as a fundamental first step towards proper constructions, we also propose an architecture to translate the formal model into a practical approach. Additionally, we implement a reduced version of our theoretical model in order to test its feasibility. Finally, we perform a comparative benchmark for a focused analysis of our cryptographic library. The contributions above are presented through two artifacts. First, a thesis report introduces the theoretical background and discusses solutions based on an analysis of the problem at hand. Second, a proof-of-concept system implements a reduced version of the solution proposed by the thesis. These deliverables are listed with their constituent parts below:

• Thesis report:

1. Analysis: We provide an analysis of the problem, along with a correlation between theoretical solutions and technological measures that can be applied to BaaS environments;

2. Solution: We propose a solution as a combination of existing technologies that can alleviate the effects of the studied problem;

3. Model: We devise an abstract model of the proposed solution that serves as a scaffold for understanding how to encrypt, compute, and transfer ownership of data homomorphically in a DLT setting;

4. Implementation: We create a prototype that implements our formal model into a working system, following the design and constraints of a private blockchain framework.

• System:

– Network scripts;

– Docker scripts:

∗ Dockerfile script for Hyperledger Fabric CA (Certificate Authority) Image;

∗ Dockerfile script for Hyperledger Fabric Orderer Image;

∗ Dockerfile script for Hyperledger Fabric Peer Image.

– Bash scripts:

∗ CA Identity scripts;

∗ Deployment scripts;

∗ Blockchain query scripts.

– FHE Golang library;

– Smart contract;

– FHE Golang library Command Line Interface (CLI) client.

To summarize, our contribution comprises individual and shared efforts towards the realization of a conceptual mapping, a design pattern, a concrete model, and a prototype. We list these contributions here with a clear separation of roles in the achievement of our final proposition. The conceptual mapping (1) is a combination of components 1a, 1b, and 1c. The analysis showing that component 1a is suitable to enforce contractual solutions is covered by the literature [25,47,48]. However, the relationship of items 1b and 1c to theoretical economic propositions is my contribution. The combination of the previous items (i.e., 1a, 1b, 1c) as a unique block is also my contribution. The design pattern (2) is an organization of items 2a, 2b, and 2c. I worked with David W. H. A. da Silva to devise a separation of essential functions that would serve as abstract components of a blockchain. Both the conceptual mapping and the design pattern are discussed in Chapter 3. The concrete model (3) is the mathematical implementation, through items 3a, 3b, and 3c, of the placeholders defined in 2. Along with da Silva, in [41] we proposed a construction that has useful characteristics for DLTs. The model uses concepts from the aforementioned design pattern (2) that make a scheme useful for a blockchain setting. The concrete model is covered in Chapter 4. The prototype (4) is composed of a blockchain testing network based on Hyperledger Fabric (4a) and three Golang-based components: library (4b), smart contract (4c), and command line system (i.e., CLI) (4d). The test network 4a is an available resource from the Hyperledger Fabric open source project. However, the library, smart contract, and CLI are my contribution to test the concepts covered in Chapter 3 and Chapter 4. The prototype is discussed in Chapter 5.

1. Conceptual mapping:

(a) Blockchain as a contractual solution enforcer;

(b) HE as a renegotiation mechanism;

(c) Key update protocol as an ownership transfer mechanism.

2. Design pattern:

(a) Encryption scheme pattern;

(b) Key exchange pattern;

(c) Key update pattern.

3. Concrete model:

(a) Encryption scheme;

(b) Key exchange;

(c) Key update.

4. Prototype:

(a) Blockchain test network;

(b) Library;

(c) Smart contract;

(d) CLI.

1.4.1 Objectives

Companies are already implementing permissioned versions of blockchain for their niches. In doing so, they minimize operational costs by using third-party facilities like cloud services, code versioning tools, or proprietary software. Many organizations also count on off-chain computation and third-party custody. Unfortunately, some of these building blocks have their privacy relying on legal statements. We assume that companies are willing to cooperate to a higher degree if their tools preserve data ownership to a chosen extent. For that, we suggest that a private blockchain can allow encrypted data to be computed on-chain, relying on cryptographic proof instead of trust. This work does not aim to inhibit the use of BaaS architectures but to promote research and analysis of the best practices to use these services securely. In our proposition for a solution, we defined some principles to guide our choices, namely Confidentiality, Autonomy, Self-Sovereignty, and Computability (CASC). These concepts were chosen based on blockchain principles and the ownership management analysis presented in Chapter 2. Therefore, CASC stands for:

• Confidentiality: sensitive data is always encrypted and decrypted outside the blockchain environment (off-chain), remaining encrypted inside the blockchain application (on-chain);

• Autonomy: members of a blockchain network can compute encrypted third-party’s data;

• Self-Sovereignty: the transfer of ownership of a specific datum can only be realized by its owner;

• Computability: computations over encrypted data can only be executed through previously agreed upon smart contracts.
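To make the Confidentiality and Autonomy principles concrete, the following is a minimal sketch of computing on data that stays encrypted. It uses a toy Paillier cryptosystem (the baseline this thesis later compares against), not the Clifford-algebra scheme proposed here. The primes, function names, and the split between "off-chain" and "on-chain" steps are illustrative assumptions, and the key size is far too small for real use.

```python
import secrets
from math import gcd


def keygen(p=293, q=433):
    """Toy Paillier key generation with generator g = n + 1 (tiny primes, illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)                          # inverse of lambda mod n (valid for g = n + 1)
    return n, (lam, mu)


def random_unit(n):
    """Pick a random r in [1, n) that is coprime to n."""
    while True:
        r = secrets.randbelow(n - 1) + 1
        if gcd(r, n) == 1:
            return r


def encrypt(n, message, r):
    """Off-chain step: c = (n + 1)^m * r^n mod n^2."""
    n2 = n * n
    return (pow(n + 1, message, n2) * pow(r, n, n2)) % n2


def add_ciphertexts(n, c1, c2):
    """On-chain step: multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % (n * n)


def decrypt(n, secret, ciphertext):
    """Off-chain step: m = L(c^lambda mod n^2) * mu mod n, with L(u) = (u - 1) // n."""
    lam, mu = secret
    return ((pow(ciphertext, lam, n * n) - 1) // n * mu) % n


n, secret = keygen()
c1 = encrypt(n, 1200, random_unit(n))
c2 = encrypt(n, 345, random_unit(n))
total = decrypt(n, secret, add_ciphertexts(n, c1, c2))
print(total)
```

Only `add_ciphertexts` would need to run inside a smart contract in this reading; the party executing it never sees 1200 or 345.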

1.4.2 Organization

We outline here the order of information for the thesis. The research and proposed solutions are presented in the following structure and sequence:

Chapter 2: We introduce the seminal work of Bitcoin, along with distributed systems’ concepts that set the stage for blockchain, while looking into the philosophical, political, and economic reasons why such advances were created, adopted, and evolved. Furthermore, the data ownership and related custody problem is analyzed for commercial relationships and cloud service providers. Finally, we discuss available solutions and propositions for the privacy and secrecy of data.

Chapter 3: In this chapter, we present our theoretical contribution, explaining the rationale behind our assumptions. This section lays down the foundation for the implementation that takes place in Chapter 4. First, we map theoretical propositions from economics to cryptographic solutions in a novel approach, gathering the resulting combination into a distributed ledger environment. Moreover, we investigate the proper administration of digital assets by using key management protocols that possess attributes comparable to the introduced theoretical solutions.

Chapter 4: This chapter takes as input the abstraction presented in Chapter 3 and shows an implementation-agnostic model. Furthermore, existing technologies are chosen to define the components for a private blockchain framework, cryptographic scheme, and key update protocol.

Chapter 5: Here, we present our proof of concept, named Smart Contract for Ownership Transfer (SCOT). This system applies the abstract model proposed in Chapter 3 and the cryptographic construction devised in Chapter 4. First, we define a fictional scenario with restrictions and policies that resemble the adversarial problem defined in Section 2.4.3. Second, we show an application that transfers ownership in the scenario mentioned above by implementing the cryptographic scheme in an industry-ready framework. We also discuss the artifacts generated by the application by comparing its results to an implementation of the Paillier cryptosystem.

Chapter 6: After being introduced in Chapter 5, the SCOT system has its performance evaluated here. Again, we use the Paillier implementation introduced in Chapter 5 as a baseline. We analyze the main functionalities common to both implementations, such as key generation, encryption, addition, and decryption.

Chapter 7: In this section, we list misconceptions and challenges found in this research. We explain how we overcame problems and describe our learning process in accomplishing the research goal.

Chapter 8: To improve this research, we suggest avenues for continuation and open the discussion with questions that can nurture future enrichments.

Chapter 9: This chapter summarizes our significant contributions and discusses our concrete results.

1.4.3 Methodology

Our motivation was influenced primarily by the study and analysis of problems related to the custody of data and its effects on DLTs. With the adoption of cloud-based services (i.e., BaaS), the role of this new intermediary was also analyzed as a concern when designing secure applications. Initially, blockchain technology was studied with a careful analysis of its origin and evolution. We also examined how distributed systems contributed to modern DLTs. Next, we analyzed the problems and threats encountered by market players when taking on shared ledgers as a common ground. In addition, we investigated which technologies are available to manage the privacy and secrecy of data for blockchains. Furthermore, we traced the root cause of the adversarial problems and how theoretical propositions in the literature devised solutions. As a result, we stated our assumptions and choices for the right technological representatives in order to realize a practical solution. We describe below the sequence of steps taken in this research:

1. Research on blockchain;

(a) The blockchain seminal work;

(b) Fault-tolerant systems and consensus mechanisms;

(c) Fundamental properties of distributed ledgers;

(d) BaaS and adversarial problems.

2. Research on the economics of assets;

(a) The relations between data, information and ownership;

(b) Contracts and opportunism;

(c) Analysis of theoretical solutions.

3. Analysis of available solutions for DLTs;

4. Definition of principles for an intended solution;

5. Definition of candidates for a new solution model;

6. Applicability by a proof of concept;

7. Definition of metrics for evaluation.

(a) Metrics to evaluate the mathematical operations;

i. Encryption;

ii. Decryption;

iii. Key Generation;

iv. Computation on ciphertexts.

(b) Metrics to evaluate generated artifacts.

i. Keys;

ii. Ciphertexts.

CHAPTER 2

Related Work

Bitcoin was proposed by Satoshi Nakamoto in 2008. The new payment system relied on cryptographic proof to oppose models that depended on trusted third parties (e.g., banks). When a centralized authority is introduced, the fate of the entire system depends on the institution behind it. For instance, this organization can be censored, becoming a single point of failure. Hence, the system was intended to be distributed in a peer-to-peer fashion. The Bitcoin paper is essentially an intricate combination of existing solutions to solve the double-spending problem and its derivations. Since computers have virtually no cost to copy data, a design for a distributed cash system must encompass a trustworthy mechanism to consolidate payments while avoiding fraud. A trivial solution is the use of a mint. In this model, the mint decides the validity and chronology of all transactions, but it incorporates a trusted party subject to intervention. We depict an overall correlation of the decisions presented in the Bitcoin paper. The relationships between problems and the main concepts behind the solutions presented by Nakamoto are shown in figure 2.1.

Figure 2.1: Double-spending solution decision flow. (Diagram labels: State Replication Mechanism (Consensus), Centralized, Proof-of-Work, Distributed, Majority Decision, Alternate Chain, Incentives, Probability, Double-Spending, Timestamp Server, Public Key Cryptography, Documents, Transactions, Privacy, Data Structure (State).)

In figure 2.1, we separate topics into concepts related to the state to be persisted and the replication mechanism that promotes the distribution of state. Nakamoto proposed a solution beginning with a timestamp server [1], although the service analyzed was a centralized system. Also, this system used generic documents as inputs. So, a data structure was proposed for the state, where the users’ privacy would be satisfied by the use of public-key cryptography as digital signatures. The centralized timestamp server idea was modified into a distributed one by the use of a Proof-Of-Work (PoW). Moreover, other problems were foreseen, such as an attack based on collusion. Therefore, a proper measure was devised with the application of probability and economic incentives.

2.1 Bitcoin: the Blockchain Seminal Work

The Bitcoin solution starts with the definition of a peer-to-peer timestamp server. Although Nakamoto introduces a transaction structure first, the proposition to solve the double-spending problem has the TIMESEC project [49] as its initial blueprint, along with publications from Haber and Stornetta [50, 51], and Bayer and Haber [52]. TIMESEC was a Federal Government project in Belgium introducing a design for a digital timestamping system [49]. Its design aimed to lower the trust needed in the centralized component by increasing the amount of work used to verify timestamps in the Secure Time Authority (STA). A digital timestamp is a digital certificate that assures the existence of a digital document at a certain time [53]. Verifying when a document was created or modified is useful in many situations. For example, a sequentially numbered notebook stamped by a public notary could support the claim that the ideas in it were written before a specific date. However, in this case, there is a trusted party (i.e., the notary) that is assumed not to be corrupt. Similarly, when Massias et al. presented TIMESEC as a service with a minimal trust requirement, they concentrated on the trusted third party approach (i.e., centralized) because the distributed techniques were considered “not really workable in a professional environment” [53]. The method used a binary tree structure described by Haber and Stornetta in [50, 51], based on Merkle’s tree authentication method [54]. Initially, Haber and Stornetta suggested two different approaches for a solution. The first is a centralized and potentially untrustworthy Time-Stamping Service (TSS). The second is a distributed trust mechanism among a large set of users that allows each user to sign messages, relying on the assumption that it would not be feasible to corrupt all participants.
In both approaches, a collision-free hash function $h : \{0,1\}^* \to \{0,1\}^l$ is used, where $h$ converts, in polynomial time, an arbitrary-length bit-string into a fixed-length $l$-bit string. The process timestamps documents in rounds of a fixed duration. For instance, the service can listen for timestamp requests coming from users between times $T_0$ and $T_1$, comprising a Round. In order to use the service, a client converts a document into a hash string using the hash function $h$ (e.g., $H_n = h(D_n)$). In figure 2.2, four different documents (i.e., $D_1, D_2, D_3, D_4$) are hashed, generating respectively four hashes (i.e., $H_1, H_2, H_3, H_4$).

Furthermore, hashes $H_1$, $H_2$, $H_3$, and $H_4$ are sent to the server, which concatenates and hashes the leaf values of this Round in pairs to obtain the parent values $H_{12} = h(H_1 \mid H_2)$ and $H_{34} = h(H_3 \mid H_4)$, until a single value (Round Root Value) is obtained, $H_{14} = h(H_{12} \mid H_{34})$. At this point, the Round Root Value (i.e., $H_{14}$) is concatenated with the previous round value, forming the Round Value $RH(i) = h(RH(i-1) \mid H_{14})$. Figure 2.2 shows the structure resulting from the aforementioned procedure in a Round.

Figure 2.2: Merkle tree structure. (Diagram levels: Documents, Hashed Documents.)
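The round procedure described above can be sketched in a few lines. This is a minimal illustration that assumes SHA-256 for the generic hash $h$ and byte concatenation for the $\mid$ operator; the document contents, the placeholder for $RH(i-1)$, and the handling of odd levels are assumptions that the text does not specify.

```python
import hashlib


def h(data):
    """Hash function h; SHA-256 is an assumed stand-in for the generic h."""
    return hashlib.sha256(data).digest()


def round_root(leaf_hashes):
    """Hash adjacent pairs level by level until one Round Root Value remains."""
    level = list(leaf_hashes)
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Assumption: an odd level repeats its last node (the text only covers four leaves).
            level.append(level[-1])
        level = [h(level[k] + level[k + 1]) for k in range(0, len(level), 2)]
    return level[0]


def round_value(previous_round_value, root):
    """RH(i) = h(RH(i-1) | Round Root Value)."""
    return h(previous_round_value + root)


# Four documents D1..D4, hashed by their owners before submission.
H1, H2, H3, H4 = (h(doc) for doc in (b"D1", b"D2", b"D3", b"D4"))
H14 = round_root([H1, H2, H3, H4])
RH_prev = h(b"hypothetical previous round")  # placeholder for RH(i-1)
RH_i = round_value(RH_prev, H14)
print(RH_i.hex())
```

Because each Round Value includes the previous one, changing any earlier document would change every later Round Value.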

In response, users receive a timestamp $t$ containing the values needed to rebuild the branch of the tree for their documents. For instance, the timestamp returned for $D_3$ in this procedure is $t_3 = \{(H_4, R), (H_{12}, L), (RH_{i-1}, L)\}$. Moreover, these Round Values can be published in order to be widely witnessed (e.g., in a newspaper). By doing so, the procedure performed by the STA can be audited, reducing the probability of fraud. Publication in an unmodifiable medium creates a basis for trusting that these Round Values were pinned to a given moment in time. This is the basis for the immutability concept of blockchains.
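A holder of the timestamp for $D_3$ can recompute the published Round Value without seeing the other documents. The sketch below assumes SHA-256 for $h$ and reads the labels L and R as "the sibling is concatenated on the left" and "on the right"; the function name and sample documents are hypothetical.

```python
import hashlib


def h(data):
    """Hash function h; SHA-256 is an assumed stand-in."""
    return hashlib.sha256(data).digest()


def recompute_round_value(document_hash, timestamp):
    """Fold the (sibling, side) pairs of a timestamp into a candidate Round Value."""
    current = document_hash
    for sibling, side in timestamp:
        # "L": sibling goes on the left of the running hash; "R": on the right.
        current = h(sibling + current) if side == "L" else h(current + sibling)
    return current


# Server side: rebuild the round from the text (four documents).
H1, H2, H3, H4 = (h(doc) for doc in (b"D1", b"D2", b"D3", b"D4"))
H12, H34 = h(H1 + H2), h(H3 + H4)
H14 = h(H12 + H34)
RH_prev = h(b"hypothetical previous round")
RH_i = h(RH_prev + H14)  # the published Round Value

# Client side: timestamp t3 returned for D3.
t3 = [(H4, "R"), (H12, "L"), (RH_prev, "L")]
genuine = recompute_round_value(H3, t3) == RH_i
forged = recompute_round_value(h(b"altered D3"), t3) == RH_i
print(genuine, forged)
```

The check needs only three hashes for four documents, and the proof grows logarithmically with the number of documents in a round.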

2.1.1 Transactions and Scripts

The messages chronologically organized by the timestamp server were documents. For Bitcoin, Nakamoto proposed a “usual framework of coins” [1] (i.e., a transaction structure) to substitute documents as the state to be transmitted. A transaction in Bitcoin defines the property of electronic money. It is the state that is distributed and persisted on nodes participating in the network. In figure 2.3, the structure of a transaction in Bitcoin and a transfer of ownership are displayed. The property of any value is recognized when a quantity defined in a transaction is transferred from a payer to a payee. In our example, $Owner_i$ transfers money to payee $Owner_{i+1}$. The coin is transferred when $Owner_i$ digitally signs the hash $Hash_{i+1}$ generated by the combination of $Tx_i$, which represents his property, and the public key $PbK_{i+1}$ of payee $Owner_{i+1}$. A hash algorithm is used to generate a digest, a condensed version of the data. Then the digest is used when computing and verifying a digital signature through a Digital Signature Algorithm (DSA) [55].

Figure 2.3: Transaction structure and dependency.

As a result, with the payer’s public key $PbK_i$, everyone can check that the coin was given to payee $Owner_{i+1}$.
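The signing step can be illustrated with a small sketch. Note the substitutions: instead of DSA, it uses textbook RSA with tiny hard-coded keys, purely so that the example runs with the standard library alone; the key values, the byte encoding of public keys, and all function names are assumptions made for illustration and offer no security.

```python
import hashlib

# Toy textbook-RSA key pairs as (n, e, d); a stand-in for DSA, illustration only.
OWNER_I = (3233, 17, 2753)    # n = 61 * 53
OWNER_NEXT = (2773, 17, 157)  # n = 47 * 59


def public_bytes(key):
    """Hypothetical byte encoding of a public key (n, e)."""
    n, e, _ = key
    return f"{n}:{e}".encode()


def digest_as_int(data, n):
    """Condense data with SHA-256 and reduce it into the toy key's range."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n


def sign(data, key):
    """The payer signs the digest with the private exponent d."""
    n, _, d = key
    return pow(digest_as_int(data, n), d, n)


def verify(data, signature, key):
    """Anyone checks the signature using only the public part (n, e)."""
    n, e, _ = key
    return pow(signature, e, n) == digest_as_int(data, n)


tx_i = b"Tx_i: coin currently owned by Owner_i"
payload = tx_i + public_bytes(OWNER_NEXT)  # Tx_i combined with PbK_{i+1}
signature = sign(payload, OWNER_I)         # Owner_i signs Hash_{i+1}

accepted = verify(payload, signature, OWNER_I)
tampered = verify(payload, (signature + 1) % OWNER_I[0], OWNER_I)
print(accepted, tampered)
```

Binding the payee's public key into the signed payload is what lets the next transfer be checked the same way, forming the chain of ownership.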

The sequence of property transfers can be checked by anyone in the network by first verifying that the public key $PbK_i$ of payer $Owner_i$ was recorded in the previous transaction $Tx_{i-1}$, assuring his property. Second, the payer’s private key $PvK_i$ signs the new transaction, and the signature can be verified with his public key $PbK_i$. This solution can assure ownership but cannot avoid double-spending. Hence, to avoid a trusted authority, nodes in the network share all transactions by broadcasting every new payment. This is a way of being aware of duplication. However, it poses a problem for the secrecy of identities, and since the privacy of users was held in the highest regard, public-key cryptography and digital signatures with anonymous public keys were used to protect identities while providing control of ownership. In order to execute a transfer, specific scripts must be triggered. So, Bitcoin implements a scripting language called Script. It is a simple, stack-based language that is not Turing complete, thus limiting the complexity. Although this script concept was a precursor of advanced smart contracts, it is still very limited. To make a transfer possible, a locking script is placed on the data output of a transaction (figure 2.4). An output is either an Unspent Transaction Output (UTXO) or a spent transaction output. For instance, the locking script may be one of the Bitcoin scripts, such as scriptPubKey. On the other hand, an unlocking script is placed on the data input of a new transaction that transfers ownership of a coin (i.e., $Tx_1$). In this case, it may be represented by scriptSig. These scripts are concatenated and evaluated in order to check whether the unlocking script satisfies the conditions defined in the locking script (e.g., $1 = Eval(scriptSig \mid scriptPubKey)$). We use a hypothetical function called $Eval$ for simplification. In figure 2.4, the locking script scriptPubKey on the $Tx_0$ output defines that the owner of the specific private key $PvK_i$, from which public key $PbK_i$ was derived and the hash $PbK_i^{hash}$ was therefore generated, can unlock the amount kept by the output. As a result, $Tx_1$ can be a valid transaction.

Figure 2.4: Locking and unlocking scripts. (Diagram shows the inputs and outputs of the transactions involved.)

In this scenario, scriptPubKey has various commands, along with a hash of the public key $PbK_i$ (i.e., $PbK_i^{hash}$) that identifies the owner of the unspent amount. On the other hand, scriptSig is placed in the input of transaction $Tx_1$. scriptSig comprises a signature $Sign_i$ and a public key $PbK_i$. Figure 2.5 shows the sequence of evaluated parameters and commands of the concatenated scripts.

Figure 2.5: Locking and unlocking scripts evaluation.

The public key $PbK_i$, given by scriptSig, is hashed first with SHA-256 and then with RIPEMD-160 by the command OP_HASH160, such that

$$PbK_i^{hash} = H_{\text{RIPEMD-160}}(H_{\text{SHA-256}}(PbK_i)). \tag{2.1}$$

The hashed public key $PbK_i^{hash}$ is then compared by OP_EQUALVERIFY to the existing hashed public key representing the owner of the UTXO from $Tx_0$. If the computed $PbK_i^{hash}$ equals the stored $PbK_i^{hash}$, then the command OP_CHECKSIG uses public key $PbK_i$ to check signature $Sign_i$, allowing the currency to be spent.
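The evaluation order can be sketched as a tiny stack machine. This is not Bitcoin's Script interpreter: OP_DUP is included because the standard pay-to-public-key-hash template needs a copy of the key for OP_CHECKSIG, the hash is a labeled stand-in for OP_HASH160 (RIPEMD-160 is not available in every `hashlib` build), and the signature check uses a toy textbook-RSA key. All names and values are illustrative assumptions.

```python
import hashlib

N, E, D = 3233, 17, 2753  # toy textbook-RSA key (61 * 53); illustration only


def hash160_stand_in(data):
    """Stand-in for OP_HASH160: double SHA-256 truncated to 20 bytes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()[:20]


def check_sig(pubkey, signature, message):
    """Toy signature check; pubkey is the hypothetical encoding b'n:e'."""
    n, e = (int(part) for part in pubkey.decode().split(":"))
    digest = int.from_bytes(hashlib.sha256(message).digest(), "big") % n
    return pow(signature, e, n) == digest


def evaluate(script, message):
    """Evaluate scriptSig | scriptPubKey left to right on a stack."""
    stack = []
    for item in script:
        if item == "OP_DUP":
            stack.append(stack[-1])
        elif item == "OP_HASH160":
            stack.append(hash160_stand_in(stack.pop()))
        elif item == "OP_EQUALVERIFY":
            if stack.pop() != stack.pop():
                return False  # hashes differ: the script fails immediately
        elif item == "OP_CHECKSIG":
            pubkey, signature = stack.pop(), stack.pop()
            stack.append(check_sig(pubkey, signature, message))
        else:
            stack.append(item)  # anything else is pushed as data
    return stack == [True]


message = b"Tx1 spending the output of Tx0"
pubkey = f"{N}:{E}".encode()
digest = int.from_bytes(hashlib.sha256(message).digest(), "big") % N
signature = pow(digest, D, N)

script_pubkey = ["OP_DUP", "OP_HASH160", hash160_stand_in(pubkey),
                 "OP_EQUALVERIFY", "OP_CHECKSIG"]
valid = evaluate([signature, pubkey] + script_pubkey, message)
wrong_owner = evaluate([signature, b"2773:17"] + script_pubkey, message)
bad_signature = evaluate([(signature + 1) % N, pubkey] + script_pubkey, message)
print(valid, wrong_owner, bad_signature)
```

The first failure path corresponds to OP_EQUALVERIFY rejecting a key that does not match the stored hash, and the second to OP_CHECKSIG rejecting a signature that the matching key did not produce.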

2.1.2 Mining Blocks for Consensus

In Bitcoin, the batch of organized transactions is a block. Along with other information, it comprises a Merkle Tree and a nonce (figure 2.6). The nonce in this context is a 32-bit value that allows the implementation of a PoW, a condition for the acceptance of a new block by the network. When participants of the network accept the proposed block and enough blocks are added to this chain, the transfer of ownership is consolidated.

Figure 2.6: Bitcoin block structure. (Diagram parts: Block Header, Merkle’s Tree.)

The first transaction of a block is a coinbase transaction. This is a special type of transaction that rewards a miner with a subsidy (newly generated coins) and all the fees paid by the transactions included in the block. Blocks in Bitcoin are units for packing transactions into a deliverable that represents the next round of transferred values. When a new block is created, it means that participants are transferring units of a coin, and miners are earning fees and rewards. Every transferred amount in the system represents a snapshot of all money available at a specific instant of time. In fact, this is the state that must be kept consistent. Therefore, a procedure for consensus is necessary to replicate the state and make those transactions accepted by all participants. The consensus in Bitcoin is implemented through a PoW, transforming the centralized timestamp server presented in Section 2.1 into a distributed mechanism. The Bitcoin PoW is an improvement based on the Hashcash denial-of-service countermeasure [13]. In that work, a CPU cost-function computes a token that is used as proof that enough CPU effort was expended. A cost-function must be expensive to compute but efficient to verify in order to delay contributions. For instance, it could be used to avoid flooding attacks, although Adam Back also suggests it as an option for the minting mechanism in the electronic cash proposal from Wei Dai [4].

Similarly, to contribute a new block in Bitcoin, a network participant must scan for a value that, when inserted in a block that is then hashed, yields a hash with a specific number of leading zeros. According to the mode of operation, there are two main types of nodes in a Bitcoin network (figure 2.7). The first type comprises miners, which compete to find the next nonce and therefore create a new block. The second type comprises wallets, which perform many functions, such as generating private keys and their respective public keys, monitoring spent outputs, and creating and signing transactions.

Figure 2.7: Bitcoin network. (Diagram nodes: Miner, Wallet.)

Mining is realized by iterating over the value of a nonce in the block header (figure 2.8), and the node executing the brute force procedure is called a miner. That allows the block to be accepted, unlocking its capacity to grow the chain.

Figure 2.8: Bitcoin block linked structure.

The process iterates the nonce value and uses it as input for the block header. The block header is then hashed and checked against a difficulty threshold. If a hash that satisfies the targeted threshold is found, the block is broadcast to the network. If not, the nonce is incremented. Wattenhofer defines a boolean PoW function $F_d(c,x) \to \{true, false\}$ as

$$F_d(c,x) \to \text{SHA256}(\text{SHA256}(c \mid x)) < \frac{2^{224}}{d}, \tag{2.2}$$

where difficulty $d$ is a positive number and the challenge $c$ and nonce $x$ are bit-strings [56]. The longest chain of blocks resulting from the PoW mechanism represents the decision of the majority of nodes, having the greatest CPU effort invested and therefore being a consensus between participants.

Every transaction has inputs and outputs, although only UTXOs can be used for payment, consequently becoming a new input. A transaction can have many inputs and outputs. However, an output can be referenced by only one input of a later transaction, because any additional input originating from the same output would be an attempt to double spend the same value, and is therefore forbidden. In figure 2.9 we show an example of sequentially transferred amounts. The Bitcoin (BTC) is the currency unit of the cryptocurrency implemented by the Bitcoin system. It has the Satoshi (SAT) as its smallest unit, representing $1 \times 10^{-8}$ BTC. In this scenario, we use an amount of 100k Satoshis to be transferred between users of the system.
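Returning to the PoW function of equation (2.2), the nonce search can be sketched as below. The header bytes and the 4-byte big-endian nonce encoding are assumptions and do not follow Bitcoin's real serialization. The demonstration also swaps the numerator $2^{224}$ for a much easier $2^{244}$ so that the loop ends after a few thousand attempts; with the original numerator and $d = 1$, about $2^{32}$ attempts would be expected.

```python
import hashlib


def pow_ok(challenge, nonce, difficulty, numerator=2**224):
    """F_d(c, x): true when SHA256(SHA256(c | x)) < numerator / d."""
    data = challenge + nonce.to_bytes(4, "big")  # assumed encoding of c | x
    value = int.from_bytes(hashlib.sha256(hashlib.sha256(data).digest()).digest(), "big")
    # value < numerator / d is checked as value * d < numerator to stay in integers.
    return value * difficulty < numerator


def mine(challenge, difficulty, numerator=2**224, max_nonce=2**32):
    """Try 32-bit nonces in order and return the first one that satisfies F_d."""
    for nonce in range(max_nonce):
        if pow_ok(challenge, nonce, difficulty, numerator):
            return nonce
    return None  # nonce space exhausted; a real miner would alter the header


header = b"previous-hash|merkle-root|timestamp"  # hypothetical header fields
TOY_NUMERATOR = 2**244                            # easier than 2**224, for the demo only
nonce = mine(header, 1, TOY_NUMERATOR)
print(nonce, pow_ok(header, nonce, 1, TOY_NUMERATOR))
```

Verification is a single call to `pow_ok`, while finding the nonce takes many attempts, which is the asymmetry the cost-function relies on.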

Figure 2.9: Bitcoin sequential transactions. (Diagram amounts: 100K SAT, 60K SAT, 50K SAT, 30K SAT, and several 10K SAT outputs and fees, with the unspent outputs marked UTXO.)

Figure 2.9 shows that the owner referenced by $Tx_0$ had 100k SAT and transferred 60k SAT to one user through $Tx_1$, and 30k SAT to another one through $Tx_2$. In this process, a fee of 10k SAT was reserved for whoever decides to include transaction $Tx_0$ in a new block. Furthermore, transaction $Tx_1$ transfers 50k SAT to $Tx_3$, reserving 10k SAT for fees. $Tx_2$ transfers 10k SAT to $Tx_3$ and 10k SAT to $Tx_4$, paying 10k SAT in fees. After these transfers are completed, the final state of all unspent money in the system is the sum of all UTXOs, along with the fees and rewards (coinbase transaction) earned by miners. The value in a transaction is also completely spent after a payment. One or more fractions of the value go to one or more payees, and the remainder (change) can go back to the payer. In this case, if the values registered for inputs exceed the outputs, then the difference is considered a fee, to be given to any miner that includes the new transaction in a block. If the values on the outputs exceed the inputs, then the transaction is rejected [57], because users should not be able to spend more than their earnings. Hence, we have that $\forall\, fee \geq 0$:

$$fee = \sum_{n=0}^{i} input_n - \sum_{n=0}^{j} output_n. \tag{2.3}$$

When $fee = 0$, no compensation was reserved for miners. Consequently, miners might prefer to create blocks with transactions that carry higher fees, resulting in longer times for transactions with lower fees to be consolidated in the system. A miner first selects transactions and creates the structure of the block in order to generate the block header. Then, the nonce can be iterated, changing the resulting hash of the header. Finally, when the proper nonce is found, the miner receives a reward for mining the block.
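The fee rule of equation (2.3) and the single-use rule for outputs can be replayed on the amounts of figure 2.9. The sketch below is one possible reading of that figure: the dictionary-based UTXO set, the `(transaction id, output index)` references, and the `funding` entry that supplies the initial 100k SAT are all assumptions made for illustration.

```python
def apply_transaction(utxos, tx):
    """Spend tx's inputs from the UTXO set, add its outputs, and return the fee."""
    refs = tx["inputs"]
    if len(set(refs)) != len(refs) or any(ref not in utxos for ref in refs):
        raise ValueError("input is unknown or already spent")
    # Equation (2.3): fee = sum of inputs - sum of outputs.
    fee = sum(utxos[ref] for ref in refs) - sum(tx["outputs"])
    if fee < 0:
        raise ValueError("outputs exceed inputs")
    for ref in refs:
        del utxos[ref]  # an output can be consumed only once
    for index, amount in enumerate(tx["outputs"]):
        utxos[(tx["id"], index)] = amount
    return fee


# Amounts in SAT; "funding" is a hypothetical source of the initial 100k SAT.
utxos = {("funding", 0): 100_000}
transactions = [
    {"id": "Tx0", "inputs": [("funding", 0)], "outputs": [60_000, 30_000]},
    {"id": "Tx1", "inputs": [("Tx0", 0)], "outputs": [50_000]},
    {"id": "Tx2", "inputs": [("Tx0", 1)], "outputs": [10_000, 10_000]},
]
fees = [apply_transaction(utxos, tx) for tx in transactions]

try:
    # Attempt to spend the 60k output of Tx0 a second time.
    apply_transaction(utxos, {"id": "Tx5", "inputs": [("Tx0", 0)], "outputs": [1]})
    double_spend_rejected = False
except ValueError:
    double_spend_rejected = True

print(fees, sum(utxos.values()), double_spend_rejected)
```

The unspent outputs plus the collected fees add up to the original 100k SAT, matching the statement that the final state is the sum of all UTXOs together with what miners earned.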

2.1.3 The Economic Vocation of Blockchain

Blockchain became popular as electronic cash, more specifically through Bitcoin. However, since computers have virtually no cost to replicate information, double-spending is a risk for digital money analogous to printing fiat currency with no collateral. Additionally, a centralized trusted party (i.e., a mint) would be a single point of failure. Thus, Bitcoin was a proposition for a distributed solution to the double-spending problem to represent virtual cash properly. The motivation behind it relied on the cypherpunk ideology that opposed how governments keep societies captive by controlling their means of exchange [4]. Since a market is a medium for exchanges [21] and, to participate, one must be able to trade, the system implemented the transaction concept, representing the exchange of goods, services, or, specifically in this case, currency. From the ground up, the approach is aligned with economic principles discussed by Oliver Williamson [58] and Ronald Coase [59], who considered market transactions a core discussion for microeconomics [60] [61]. Around the concept of a transaction, other elements of economics were built. For instance, the investment (CPU and electricity) spent to put coins into circulation would be rewarded by system-generated fees, also helping the system to stay inflation free. As a result, a greedy attacker would find it more profitable to cooperate instead of assembling CPU power to defraud users, because the creation of electronic cash relied on a proof-of-work mechanism [13], which made it impractical for a minority group of attackers to perpetrate a fraud. Bitcoin thus intended to represent mechanisms of the market, anticipated opportunistic behaviors from an economic perspective, and offered technological solutions to remediate them. Therefore, blockchain-based systems can emulate a microeconomic¹ system.

2.1.3.1 Opportunism and Adversarial Problems

In the cypherpunk cryptoanarchy [2], a society is defined by the cooperation between its members, a medium of exchange, and no government interference [62]. However, to free such a community from controlling institutions, one should not rely on law enforcement mechanisms [4] or third-party regulations [3], creating a laissez-faire² (free trade) environment. When representing markets, a model must consider problems arising from human traits such as greed. Therefore, which mechanisms could implement control and still be compliant with a cryptoanarchy? Nakamoto considered a scenario where a greedy attacker would create an alternative chain. He characterized the race between the attacker and honest nodes as a Binomial Random Walk [63], analyzing the probability of success from the perspective of the Gambler’s Ruin Problem [64]. This insight related the adversarial behavior to a known issue, helping devise the use of a PoW [13] in order to hinder collusion against the system. However, Bitcoin implemented a hard-coded solution for a specific adversarial behavior. Although Nakamoto mentioned escrow mechanisms in the seminal paper, smart contracts were not presented as mature concepts. On the other hand, newer blockchains developed a more flexible model to define behavior, where parties can agree on business rules that govern the interaction. Therefore, Bitcoin was designed to avoid the double-spending problem by adapting theoretical solutions into computational mechanisms to deal with an opportunistic participant. Hence, blockchain-based systems can create control mechanisms to deal with opportunism.

2.1.3.2 Property and Ownership

Nowadays, many of the most valuable businesses have their operations based on digital assets [65]. Hart and Grossman showed that a firm is a group of assets owned by some entity [66]. Therefore, in this scenario, jeopardizing data means threatening the very existence of such companies. Bitcoin was designed to manage the property of electronic money. The ownership of the intangible coin was based on the public announcement of its transfer, unlike the mint-based model (similar to the expensive reconciliation process performed by banks) where the centralized design decides the sequence and validity of transactions. In the Bitcoin case, property awareness was the knowledge that a quantity was awarded to someone. New DLT environments have similarities with this concept, although their state replication is comparable to traditional shared databases’ mechanisms, such as 2-phase protocols [10]. Also, their level of centralization and identification makes the blockchain description not entirely accurate [11]. However, the attributes explored in this work share similarities with the traditional model of blockchain. Unlike Bitcoin, the value of a transaction in a DLT environment mostly relies on the confidentiality and commercial advantage of the shared information. For many BaaS consortiums, the value lies in the sense that can be extracted from data. Therefore, companies want to own the meaning, hiding its knowledge from an undesired observer, consequently preserving ownership. As can be seen in Coase’s theory of the firm [67], there is an equivalency observed between the corporate structure, assets, behaviors, and how blockchains can represent them [19]. Therefore, DLTs have the potential to simulate currencies, assets, contracts, and enforcement mechanisms, creating an environment where ownership of goods is transferred. Consequently, blockchain-based systems can manage property with provenance and traceability.

1 Economists find it challenging to separate the concepts of microeconomics and macroeconomics [21].
2 Economic school emphasizing a limited interference of governments on markets [21].

2.2 Distributed Systems

At the beginning of modern computation, issuing a mathematical operation to a machine would not just need extensive knowledge but also would take a long time to complete. Later on, computers started to get physically reduced and even more abstracted. Consequently, less specialization led to broader use, where generalized machines started to be employed in

a variety of applications. Furthermore, the advent of the Internet created the scenario for worldwide interactions along with new problems to be solved [68]. In the current platform economy, with companies operating on a global scale, computer systems are essentially distributed. In general, most web services, social media applications, games, and video streaming products inherently work in a distributed fashion and consequently can benefit from the same advantages and suffer from the same shortcomings. They share common requirements of high reliability, availability, safety, and maintainability, attributes that would only have been demanded from mission-critical applications decades before. Thus, options for boosting the overall performance were found in parallelism, replication, and other techniques. In doing so, systems did benefit from increased speed, availability, and virtually infinite storage. However, distributed architectures brought new problems with regard to synchronization, coordination, and communication. Blockchains aggregate many concepts from distributed systems. For instance, coordination mechanisms (e.g., consensus) are at the heart of every blockchain discussion. Coordinating a decision in a distributed architecture is a complex task because it depends on the agreement of participants of a certain network. In addition to that, failures can occur, making the creation of fault-tolerant protocols even more difficult [69]. We define what a distributed system is and its main constituent parts, laying down the necessary information to approach fault tolerance. Moreover, we discuss fault-tolerant strategies such as redundancy and how systems can coordinate replicated components through consensus. As a result, the introduced concept of Bitcoin and its correlation with the evolution of the blockchain model as a fault-tolerant system can set the knowledge for blockchains in a cloud service environment.

2.2.1 What is a Distributed System?

A distributed system is a group of autonomous computing components that collaborate to serve users, performing as a unique system [69]. Components in a distributed system can be either software (i.e., process) or hardware elements. In both cases, each logical unit is considered a node. However, in order to accomplish the goal of behaving as a single coherent system, components must cooperate. Although nodes can act independently, collaboration

lies as a central matter when developing and analyzing distributed systems. As a result, components must communicate properly through the exchange of messages. However, the acknowledgment of a message and effective response can raise concerns related to latency and fault detection. These cases lead to questions regarding synchronization and how nodes can decide on the output, also bringing the subject of coordination and decision making. Nodes can also work as a group inside the system, generating outputs as a logical unit. Since nodes can represent either software or hardware components, a group can be a mixture of elements acting as one node. In this case, a node may be a server, client, or both, communicating over a network. These topologies are often organized as overlay networks, where each process keeps a list of its neighbors in order to connect [70]. For instance, Bitcoin is a type of overlay network, precisely a P2P network. In figure 2.10, groups of processes G1, G2, and G3 communicate as three different nodes, where the organization of the network architecture can be a P2P setting.

Figure 2.10: Groups of processes communicating as logical units.

Processes can cooperate, giving the perception of a unique, coherent system where operating members can compensate for the effects of faulty ones (e.g., P1, P2, P3). One of

the goals for a distributed system is to be able to fail partially while the system continues to provide its services and recover [69]. In this sense, a distributed system that tolerates faults can be considered to some extent a trustworthy or dependable system. The dependability of a system is measured by its degrees of availability, reliability, safety, and maintainability [71]. When a system fails (i.e., system failure), the purpose for which it was designed cannot be accomplished, and this happens when one or more of its services cannot be provided. The root cause for a failure is a fault, for instance, caused by a poor transmission medium. In this case, a fault generated an error in the state, and the system was not able to tolerate a faulty component by detecting errors, providing correction, or compensating the function performed by an absent element. Achieving fault tolerance is strongly related to the reliability aspect of dependability. Reliable communications are part of a successful decision-making process for groups, such as consensus mechanisms. A fault-tolerant system promotes a reliable inter-operative environment. The development of dependable systems is, therefore, a matter of controlling faults, hence making the system fault-tolerant [69]. Although a system can be designed to prevent, forecast, or remove faults [72], in this work, the tolerance aspect is more suitable for what blockchains employ to mitigate failures. Also, different faults may lead to different types of failures. Sometimes, a new proposed algorithm or architecture aims to prevent one type of failure by focusing on specific types of faults, consequently improving the whole system. Classification schemes have been developed for the seriousness of failures. Cristian [73] and Hadzilacos and Toueg [74] provide a model where we can classify failures as crash, omission, timing, response and arbitrary. Arbitrary failures are also known as byzantine, the most serious kind of failure [69]. 
In this case, a component produces outputs that may not be trusted but also cannot be detected as erroneous. Pease et al. [75] and Lamport et al. [76] were the first to analyze byzantine failures. This is the main type of failure that is approached by blockchain consensus mechanisms and algorithms.

2.2.2 Fault Tolerance

Many applications adopted critical mission architecture for systems. In some cases (e.g., nuclear plants, flight controls, etc.), a malfunction can result in catastrophic scenarios.

Therefore, avoiding a failure depends on finding a way to mitigate logical faults [77]. We focus on techniques that employ redundancy, where redundant information outweighs incorrect information [77]. For instance, majority voting is a fault-masking procedure that greatly evolved in blockchain circles. When increasing reliability, two approaches can be used: fault avoidance (fault intolerance) and fault tolerance [77]. Fault avoidance tries to reduce the probability of failure, whereas fault tolerance uses redundancy to alleviate the effects of a faulty system (figure 2.11).
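As a minimal illustration of masking redundancy through majority voting, the following Python sketch returns the value reported by a strict majority of replicas and masks a single faulty output. The function name `majority_vote` and the sample outputs are hypothetical and serve only as an example; they are not part of any cited system.

```python
from collections import Counter


def majority_vote(outputs):
    """Return the value reported by a strict majority of replicas, or None."""
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    # A strict majority is required; otherwise the fault cannot be masked.
    return value if count > len(outputs) / 2 else None


# Three replicas compute the same function; one of them is faulty (assumed data).
replica_outputs = [42, 42, 17]
masked = majority_vote(replica_outputs)
```

When no value reaches a strict majority, the sketch returns `None` rather than guessing, which reflects that redundancy only masks faults while correct replicas outnumber faulty ones.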

[Figure 2.11 diagram: dependability leads to reliability, which branches into fault tolerance and fault avoidance; fault tolerance branches into fault detection, masking redundancy, and dynamic redundancy.]

Figure 2.11: Fault-tolerant strategies.

As seen in figure 2.11, fault-tolerant techniques are divided into fault detection, masking redundancy, and dynamic redundancy. Redundancy allows a distributed system to achieve enough resilience for processes. In this situation, tolerance grows when processes are organized into groups of similar processes, where a failing process can be masked by a surrogate in the group. Depending on how they are organized, some groups will have a hierarchy for decisions, while other groups may decide collectively. Non-hierarchical groups have no single point of failure, whereas groups with coordinators must perform some sort of leader-election algorithm when the leadership fails. In this scenario, the replication of processes (e.g., state

machine replication), states, and a way of reaching consensus are important in order to implement a fault-tolerant service. State replication is successfully achieved when every participant of a distributed system comes to an agreement on the sequence and integrity of a set of transactions [56]. For instance, a set of nodes achieves proper replication of the state if all nodes execute a sequence of commands c1, c2, c3, ..., cn in the same order. However, servers can be geographically located closer to some clients and consequently receive some messages faster. If the interaction does not follow a suitable protocol for this task, it can lead to an inconsistent state among servers (e.g., nodes in a distributed system), where each machine has a different version of the truth.

This is demonstrated by a construction in figure 2.12, using two servers (i.e., s1, s2) and two clients (i.e., c1, c2). In this setting, both clients send a command at time t0 to update variable x on servers s1 (i.e., x1) and s2 (i.e., x2), where initially $x = 0$. Client c1 sends the command $x = x + 1$, whereas c2 sends the command $x = x \times 2$.

Figure 2.12: Inconsistent replication of the state.

Considering that messages had different delays, server s1 receives the message from client c1 first, at time t1. Similarly, server s2 receives the message from client c2 first, at time t2. Consequently, after receiving messages from c1 and c2, s1 computes $x = (0 + 1) \times 2$ yielding $x = 2$, and s2 computes $x = (0 \times 2) + 1$ yielding $x = 1$.
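The divergence described for figure 2.12 can be reproduced with a short Python sketch. The helper names (`increment`, `double`, `apply_in_order`) are hypothetical; the only assumption taken from the text is that each server applies the two commands in the order in which they arrive.

```python
def increment(x):
    """Command sent by client c1: x = x + 1."""
    return x + 1


def double(x):
    """Command sent by client c2: x = x * 2."""
    return x * 2


def apply_in_order(initial, commands):
    """Apply each command to the state in arrival order and return the result."""
    state = initial
    for command in commands:
        state = command(state)
    return state


# Server s1 receives c1's command first; server s2 receives c2's command first.
x1 = apply_in_order(0, [increment, double])   # (0 + 1) * 2
x2 = apply_in_order(0, [double, increment])   # (0 * 2) + 1
```

Because the two commands do not commute, the servers end with different values of x even though both received exactly the same set of messages.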

The above example shows how a system can become inconsistent and highlights the importance of coordination. Fault-tolerant services are usually implemented through active (i.e., state machine approach) or passive replication (i.e., primary-backup approach) [78]. In the active replication, the service state is replicated at all servers, where all non-faulty machines receive client requests. On the other hand, the passive replication works by designating one server as the primary and the rest of them as backups. The primary receives messages, and if it fails, one of the backups is designated to take over. In many blockchain initiatives, the state machine replication [79] is applied as part of their state replication strategy, where every node executes every transaction in order.
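A minimal Python sketch of the state machine approach is given below, assuming that some ordering mechanism (not modelled here) has already assigned a sequence number to every command. The `Replica` class and its method names are hypothetical. Each replica buffers commands that arrive early and applies them strictly in sequence order, so replicas that receive the same commands in different network orders still reach the same state.

```python
class Replica:
    """Hypothetical replica that applies commands in agreed sequence order."""

    def __init__(self):
        self.state = 0
        self.next_seq = 0   # sequence number of the next command to apply
        self.pending = {}   # commands that arrived before their turn

    def deliver(self, seq, command):
        self.pending[seq] = command
        # Apply every buffered command whose turn has come.
        while self.next_seq in self.pending:
            self.state = self.pending.pop(self.next_seq)(self.state)
            self.next_seq += 1


# Agreed order (an assumption of this sketch): first x = x + 1, then x = x * 2.
ordered = [(0, lambda x: x + 1), (1, lambda x: x * 2)]

s1, s2 = Replica(), Replica()
for seq, command in ordered:            # s1 receives the messages in order
    s1.deliver(seq, command)
for seq, command in reversed(ordered):  # s2 receives them in reverse order
    s2.deliver(seq, command)
```

The sketch deliberately leaves out how the sequence numbers are agreed upon; that agreement is the role of the consensus mechanisms discussed in the remainder of this section.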

2.2.3 Byzantine Agreement

Since the 1940s, studies on the tolerance of failures in computational systems have gained interest [77], even when engines were built upon relays [80]. Furthermore, fault-tolerant techniques from dynamic redundancy were used to roll back processes to a consistent point. Concepts such as retry, journaling, and checkpointing were investigated in the pursuit of mission-critical computers. In the 1960s, Algirdas Avizienis, sponsored by NASA’s Jet Propulsion Laboratory (JPL), started to search for a more fault-tolerant computer. The effort resulted in the Self Testing And Repairing (STAR) computer [81]. It employed checkpointing, as did many other long-life application architectures [82,83]. Later on, NASA’s Software Implemented Fault Tolerance (SIFT) project was designed in order to meet critical flight safety requirements [84]. It used mainly processor replication and voting and started the study of byzantine failures [56]. The reliability of the system was achieved by majority voting, where tasks were executed by different computers communicating over unidirectional links [77]. Leslie Lamport participated in the project and collaborated on the design and early specification hierarchy [85]. Byzantine failures may result in a distributed ledger having forks with different coexisting versions of the history. This could happen due to an adversary trying to disrupt the ledger’s operation, and linked timestamping is not sufficient to resolve forks [86]. Such byzantine faults were presented in the work “The byzantine generals problem” [76], written by Lamport, Shostak, and Pease. Also, a work on how to reach agreement in the

presence of faults [75] preceded the generals’ problem, creating a track for prolific literature that followed. Figure 2.13 shows a chronological proposition of ideas for byzantine agreements in the area of fault tolerance along with other blockchain influential works.

[Figure 2.13 timeline, organized in four tracks (timestamping, cryptography, digital cash, fault-tolerance) with axis marks at 1980 and 1990: Diffie & Hellman, key exchange; Rivest et al., public-key cryptography; NASA, SIFT; Merkle, Merkle tree; Pease et al., Byzantine agreement; Andrew Yao, millionaires' problem; Chaum, untraceable pseudonyms; Lamport, Byzantine generals; Chaum, blind signatures; Chaum, security without identification; Chaum, untraceable electronic cash; Goldwasser et al., zero-knowledge proof; Lamport, Paxos.]

Figure 2.13: Byzantine failures seminal works.

A byzantine agreement is achieved when n nodes reach consensus, where f of these components are byzantine ones [56]. Pease et al. showed that byzantine agreement is unsolvable for $n \le 3f$ [75]. Paxos [87,88] was an early solution proposed by Lamport for a system where processes may exhibit crash failures, but not byzantine ones. It considered state replication for unreliable communication channels when a minority of nodes exhibit certain faults [9]. Later on, Castro and Liskov introduced Practical Byzantine Fault Tolerance (PBFT) [89], a protocol that could handle byzantine failures. It offered a practical implementation for a difficult combination of safety, liveness, and performance. As a result, it has been adopted for permissioned blockchains.

2.2.4 Consensus

In Section 2.2.1, we explained how systems could achieve protection against failures by replicating processes as groups [69]. This redundancy can be approached mainly by primary-based (primary-backup protocol or passive replication) or replicated-write protocols (active replication) [69]. In a primary-based replication, a group is organized as a hierarchical group where a designated process coordinates write operations, choosing which processes (i.e., workers) will be better suited for the job. In replicated-write protocols, a flat group organization is used, with no hierarchy and consequently no single point of failure, such as a coordinator.

In figure 2.14, G1 is a flat group composed of processes P11, P12, P13 and P14. On the right side, G2 is a hierarchical group composed of processes P21, P22, P23 and P24, where process P21 is a coordinator and the other processes are workers.

Figure 2.14: Flat and hierarchical groups.

However, it is important to understand how much redundancy is necessary in order to properly choose a process resilience strategy and consensus. A system is k-fault tolerant if it can meet its specifications in the presence of k faulty components [69]. It means that such a system tolerates failures in at most k nodes. For instance, for fail-silent failures, a system may work properly if it has $k + 1$ nodes. But the k-fault tolerance factor changes according to the types of failures encountered in a system. The consensus in a system with fail-stop

failures (i.e., crash failures) may be feasible with $2k + 1$ nodes. When facing fail-arbitrary failures (i.e., Byzantine failures), a set of $3k + 1$ nodes is required to reach an agreement [69]. In Section 2.2.2, we discussed a client-server architecture example, where a client A could send commands to server S. When transitioning to a multiple server scenario, such an algorithm could be improved, but again we could meet inconsistency. In this case, an option would be to have a server act as a coordinator in the network, organizing acknowledgments from other machines. Yet, this arrangement easily shows a downside since the coordinator is a single point of failure. Also, the difficulty quickly escalates when multiple clients are considered. An improvement for this setting would be the use of mutual exclusion [90], guaranteeing that only one client is sending one command at a time and assuring that every server would have the same instruction at a given moment. Once a client has a lock from every server, it can send a command to each one of them. But in this case, we are assuming that all servers are responsive, and now we go back to the same problem of the coordinator availability. This concept of acquiring locks from the servers is present in two-phase locking (2PL) protocols [10], where concurrency control is implemented for transactions by serialization so that it can organize requests in order to achieve state replication. Also, another distributed algorithm for coordination is the two-phase commit (2PC) protocol, where the figure of a coordinator is paramount to achieve consensus. This is an approach usually found in databases. The latter resembles modern versions of consensus protocols in blockchains, but these algorithms are not yet the best option when it comes to handling failures. A more suitable algorithm for fault tolerance is Paxos [91]. In Paxos, locks are represented by tickets and issued by servers, where only the most recent one will be accepted. In this protocol, client failures are expected since tickets are counted, and their expiration can be determined. A client can stop its attempt to process a command amongst the servers and start a new one anytime, eliminating the need for establishing proper values for timeouts. Paxos assumes that a system is partially synchronous, and processes may exhibit crash failures, but not byzantine ones [69]. In this setting, groups consist of $2k + 1$ processes in order to mask k crashed members.
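The ticket idea can be sketched in Python as a simplified, single-command simulation. This is an illustrative reduction under stated assumptions, not the full Paxos protocol: direct method calls stand in for network messages, no crashes or message losses are modelled, and the names `Server`, `run_client`, `t_max`, and `t_store` are hypothetical. Servers only honor the most recently issued ticket, and a client that finds an already stored command adopts it instead of its own.

```python
class Server:
    """Hypothetical server that issues tickets and stores one command."""

    def __init__(self):
        self.t_max = 0       # largest ticket issued so far
        self.t_store = 0     # ticket under which `stored` was accepted
        self.stored = None   # command accepted but not necessarily chosen
        self.decided = None  # command this server was told to execute

    def request_ticket(self, t):
        # Phase 1: issue a ticket only if it is newer than any issued before.
        if t > self.t_max:
            self.t_max = t
            return (self.t_store, self.stored)
        return None

    def propose(self, t, command):
        # Phase 2: only the most recently issued ticket is still valid.
        if t == self.t_max:
            self.t_store, self.stored = t, command
            return True
        return False

    def execute(self, command):
        # Phase 3: the command was accepted by a majority.
        self.decided = command


def run_client(servers, t, command):
    """Try to get a command chosen with ticket t; return it, or None on failure."""
    majority = len(servers) // 2 + 1

    granted = []
    for server in servers:
        reply = server.request_ticket(t)
        if reply is not None:
            granted.append((server, reply))
    if len(granted) < majority:
        return None  # ticket too old; the client may retry with a larger one

    # If some server already stored a command, adopt the most recent one.
    t_store, stored = max((reply for _, reply in granted), key=lambda r: r[0])
    if t_store > 0:
        command = stored

    accepted = [server for server, _ in granted if server.propose(t, command)]
    if len(accepted) < majority:
        return None

    for server in servers:
        server.execute(command)
    return command


servers = [Server() for _ in range(3)]        # 2k + 1 servers with k = 1
first = run_client(servers, 1, "x = x + 1")   # succeeds with ticket 1
stale = run_client(servers, 1, "x = x * 2")   # ticket 1 is no longer fresh
later = run_client(servers, 2, "x = x * 2")   # adopts the stored command
```

In this run the second client fails because its ticket is stale, and the third client, despite asking for a different command, ends up confirming the command that was already stored, so all servers agree on a single value.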

Similar to Paxos, PBFT also assumes that communication between processes is unreliable, and the system is partially synchronous [69]. However, two significant differences are assumed. First, a faulty process can exhibit arbitrary behavior. Second, the sender is identified by signing its messages. Under those premises, PBFT uses a primary-backup approach with $3k + 1$ nodes. Figure 2.15 shows a chronology of Paxos and PBFT with regard to other technologies that influenced Bitcoin and, consequently, the concept of blockchain.
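The group sizes quoted in this section can be summarized in a small Python helper. The function names and the failure-model labels are hypothetical; the formulas are the ones stated in the text ($k + 1$ for fail-silent components, $2k + 1$ for consensus under crash failures, and $3k + 1$ for byzantine agreement).

```python
def min_group_size(k, failure_model):
    """Smallest group that tolerates k faulty members under the given model."""
    sizes = {
        "fail-silent": k + 1,              # one correct member is enough
        "crash-consensus": 2 * k + 1,      # a majority must remain correct
        "byzantine-agreement": 3 * k + 1,  # more than two thirds must be correct
    }
    return sizes[failure_model]


def max_byzantine_faults(n):
    """Largest f such that n >= 3f + 1, i.e., agreement is still solvable."""
    return (n - 1) // 3


paxos_group = min_group_size(1, "crash-consensus")     # size for k = 1
pbft_group = min_group_size(1, "byzantine-agreement")  # size for k = 1
```

For example, a group of three nodes cannot tolerate any byzantine member under this bound, which is consistent with the result that byzantine agreement is unsolvable for $n \le 3f$.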

[Figure 2.15 timeline, organized in four tracks (timestamping, cryptography, digital cash, fault-tolerance) with axis marks at 1990, 2000, and 2010: Lamport, Paxos; Haber & Stornetta, document timestamping; Bayer et al., efficiency for timestamping; NIST, SHA-0; Eric Hughes, Cypherpunk’s manifesto; Szabo, smart contracts; Netscape, SSL 1.0; Haber & Stornetta, bit-string naming; IETF, TLS 1.0; Dai, B-money; Massias et al., TIMESEC; Castro & Liskov, PBFT; Fox & Brewer, CAP theorem; NIST, SHA-2; Back, Hashcash; Gilbert & Lynch, CAP proof; Nakamoto, Bitcoin.]

Figure 2.15: Byzantine agreement evolution.

A consensus mechanism is used when a protocol must be followed by replicated parts of a system in order to agree on the results produced by the group. In this case, a distributed system is designed with a certain level of tolerance, assuming which faults are going to be mitigated. Figure 2.15 shows the evolution of technologies that influenced public and private blockchains, where Bitcoin was the catalyst for adoption. Private blockchains are known for identifying their members and consequently have been adopting primary-backup approaches for consensus mechanisms. Public blockchains, such as Bitcoin, work with anonymous participants, which also influences their choices for state replication. In summary, choosing the level of centralization, anonymity, and protocols will inevitably define the performance and efficiency of a blockchain-based system.

2.3 Permissioned Blockchains

The majority of blockchain services are offered with private frameworks and have been adopted by companies that are transitioning into a more cooperative version of their businesses. We center our attention on permissioned blockchains and how providers are offering the respective services. Therefore, what is a private or permissioned blockchain? To answer this question, we revisit concepts of redundancy, state replication, and consensus. As explained in Section 2.2.4, redundancy at the process level is approached by passive or active protocols. Active protocols have a flat group of processes, consequently showing inferior performance due to their voting process [69]. Although these concepts refer to traditional distributed systems, many authors will refer to blockchains as decentralized instead of distributed [11]. In fact, in a distributed system, components collaborate to behave as a unique element (Section 2.2.1). In some blockchains (e.g., public), nodes may not collaborate but compete in order to keep a consistent state. In that sense, blockchain has a distributed state (i.e., ledger) replicated by a coordination mechanism (i.e., consensus), but nodes in a blockchain are independent, not depending on a trusted third party or central authority, creating a type of decentralized consensus [11]. For instance, Bitcoin works in a P2P fashion, where nodes resemble the organization of a flat group. However, the consensus mechanism (PoW) is more related to a leader-election (active approach) procedure, where a winner

proposes the next block. Bashir describes two main categories of consensus mechanisms for blockchains [11]:

• Proof-based (Nakamoto consensus), where a leader (winner) proposes a final value in a permissionless type of consensus;

• BFT-based (Byzantine Fault Tolerance based), as a traditional approach with votes, also known as permissioned type of consensus.
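The proof-based category can be illustrated with a toy proof-of-work puzzle in Python. This is a simplified sketch and not Bitcoin's actual procedure (Bitcoin applies double SHA-256 to a block header and compares the result against a numeric target); here the puzzle is a single SHA-256 digest that must start with a given number of zero hex digits. The function names `mine` and `verify`, the payload, and the difficulty value are assumptions chosen for illustration.

```python
import hashlib


def mine(block_data: bytes, difficulty: int):
    """Search for a nonce whose digest starts with `difficulty` zero hex digits."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(block_data + str(nonce).encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1


def verify(block_data: bytes, nonce: int, difficulty: int) -> bool:
    """Checking a claimed solution costs one hash, unlike the search itself."""
    digest = hashlib.sha256(block_data + str(nonce).encode()).hexdigest()
    return digest.startswith("0" * difficulty)


# Hypothetical block payload and a low difficulty so the search finishes quickly.
nonce, digest = mine(b"example block", 3)
is_valid = verify(b"example block", nonce, 3)
```

The node that first finds a valid nonce plays the role of the winner that proposes the next block, while every other node can verify the claim cheaply.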

Since redundancy is used to tolerate faults, a mechanism must be used to coordinate the decision between the replicated parts. Consequently, the designed consensus will influence the characteristics of the system. For instance, BFT-based consensus mechanisms identify their nodes (e.g., PBFT), while in the Nakamoto consensus (e.g., Bitcoin), nodes are anonymous. Bitcoin nodes behave like processes in asynchronous systems, leaving or rejoining the network at will [1]. On the other hand, synchronous systems can set time boundaries for their components and assume failures, discarding the participation of some processes. That also improves the performance and the liveness of systems such as PBFT [69]. In summary, a permissionless or public blockchain is open to anyone willing to participate (anonymously) in the decision-making process, while in the permissioned or private blockchain, only allowed nodes can join (onymously) a consortium or group [11]. Consensus mechanisms that rely on computational efforts, such as the Nakamoto consensus (public blockchain), have a lower rate of added blocks in comparison to private initiatives, which use a smaller number of trustworthy nodes [17]. On the other hand, private consortia such as Hyperledger, R3, and Ripple are working on solutions that support more confidentiality and performance while tolerating some degree of centralization [17].

2.4 Blockchain as a Service

Back in the 1970s, virtualization was introduced to run legacy software on mainframes [69]. Virtualization extends or replaces the interface of a system by mimicking its behavior, providing interoperability between components (e.g., hardware, software, etc.). A common use is to provide a system as an interface, offering a complete instruction set as a virtual

machine. That interface may be available simultaneously to many programs, such as operating systems. In that setting, virtualization provides enough isolation in shared physical resources for users, although it can compromise the performance to some extent. Cloud computing (cloud) is a resourceful pool of computation shared between users [92], and it is the most important application of virtualization, from a distributed system’s perspective [69]. Due to the virtualized resources in a cloud setting, a company can rent out computation where customers can share the same physical machine. Cloud providers offer basically three types of services, named Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). IaaS is the hardware and infrastructure layer, while PaaS covers the platform layer. The PaaS layer would be comparable to what an operating system offers through its system calls. SaaS, in turn, is a service where applications can be executed and customized. BaaS combines blockchain and cloud computing into a utility service, where one can manage various elements of the technology such as nodes, smart contracts, ledgers, and applications, similar to SaaS [18]. In an economy of platforms where competition urges companies to reduce costs and overheads, BaaS brings efficiency when adopting distributed ledgers and blockchain technologies [20]. Although the environment raises trust concerns, the facilitation of development and deployment, together with accessible expertise, compensates for the risk of adoption. Also, BaaS promotes the sharing of a reliable resource to communicate information between companies, departments, and individuals [93]. These organizations already exist under the strict scrutiny of governments and regulating agencies. Additionally, many of them already have a commercial relationship with cloud solutions and providers under Service Level Agreements (SLAs), creating an acceptance towards third-party services. Therefore, the use of blockchain services is a natural choice when exploring the new technology since BaaS is just another layer of a cloud computing service [94] by which the economy of scale provides an attractive cost-benefit factor.

2.4.1 Old Problems for new Chains

Data is becoming ubiquitous, and expectations on its high availability are pushing the adoption of virtualized services (e.g., SaaS, BaaS, etc.). As a result, this new paradigm is

challenging the existing regulations on data custody and the management of privacy and ownership [95]. In this section, we analyze the problem related to the property of data assets in a BaaS setting or third-party custody. We approach the subject by expanding the management of ownership into three nested layers, named data control, information ownership, and ownership transfer (figure 2.16).

[Figure 2.16 diagram: three nested layers labeled ownership transfer, information ownership, and data control.]

Figure 2.16: Correlation between characteristics of ownership.

Such classification will help us investigate solutions for problems that usually unfold in that order. For instance, power imbalance is a problem encountered when companies are trying to approach data control and governance in a partnership. Nonetheless, when they manage to cooperate in a distributed database, the exclusive ownership of information is lost. In some cases, it is acceptable to reveal the data, but sometimes it may be undesirable or illegal, leading companies to encrypt sensitive data. As a result, market players may refrain from using third-party facilities or engaging in a shared data effort. Additionally, encrypted data imposes a challenge for computability in many of the service providers, making the analysis over encrypted data unfeasible. Many of these problems may share a root cause, namely opportunism, although not every issue is ignited by the same trait. However, the assumption of opportunism and adversarial behavior may help devise strict solutions. Furthermore, we will discuss an instance of a problem that is a blueprint for the BaaS adversarial problem, consequently examining respective solutions that can be replicated for our purposes.

2.4.2 The Ownership Management

The International Data Corporation (IDC) estimates that in 2025, public cloud environments will store 49% of the world’s data [96]. Inevitably, data centers are becoming the new repository for enterprises, and digital assets became a new competitive advantage [65]. Therefore, the volume of data is increasing, and consequently, higher demand for storage and computation will follow. In parallel, blockchain has been adopted as a new virtualized resource, along with machine learning, Virtual Reality (VR), and IoT. For instance, IBM, Hewlett Packard (HP), Amazon, Microsoft, and Oracle offer the new cloud feature as a service (i.e., BaaS) [18]. However, its underlying cloud infrastructure reintroduces a third-party intermediary (i.e., service provider), raising concerns regarding privacy and custody [20,97]. We first analyze a data control situation through IBM’s TradeLens [98]. TradeLens is a platform, initially created as a jointly-owned product between IBM and Maersk, an ocean and inland shipping company [99]. The initiative is powered by IBM Cloud and IBM Blockchain and aims to digitalize paperwork for the global supply chain industry [100]. However, the service struggled to enroll Maersk’s competitors [101], since IBM and Maersk had full control over the platform. For instance, a spokesman of Hapag-Lloyd, the world’s fifth-largest ocean carrier, expressed disagreements with the initial governance model proposed for TradeLens [99]. In another case, Nestlé partnered with the OpenSC blockchain platform to develop a distributed ledger system [102] instead of using its previous partnership with the IBM Food Trust blockchain [103]. What these companies could foresee was a model where control, legally or physically, belongs to a third party such as in figure 2.17. As a result, businesses may run their blockchain application using an in-house approach, such as the Kuehne + Nagel’s Verified Gross Mass (VGM) system [104].
Kuehne + Nagel, the number one sea freight company in the world, applied blockchain to its VGM system in order to verify the mass of its containers. Its guiding principle of governance is that the ownership of the platform will not be restricted to the founding parties [104]. In the VGM solution, all the data is stored on-chain, and the information is encrypted. The company hosts its own nodes in an on-premise setting, whereas partner nodes can be hosted anywhere else. However, on-premise implementations are highly expensive in contrast to BaaS [18]. Additionally, from the partners’ perspective, the VGM initiative may be considered another model based on the trust that Kuehne + Nagel will abide by agreements.

Figure 2.17: IBM BaaS trust model.

Companies are concerned with who has control over their assets, but BaaS environments can reduce the cost of deploying, computing, and maintaining a blockchain application once partners find a neutral environment. For the cases where a common ground is found for data control, even when equal rights are settled, information ownership becomes the next layer of concern when a firm wants to protect the meaning carried by its digital assets. In these cases, SLAs are not sufficient for property protection, such as in a data breach incident. A data breach is unauthorized access to sensitive data, compromising the confidentiality, integrity, and availability (CIA) of an information asset [105]. Once a breach becomes public, users realize that agreements were not sufficient to protect their information, as in the Equifax incident [106, 107]. Therefore, companies are also concerned about information ownership, along with the control and custody over their data.

The ownership concern can be illustrated by the disagreements between the Democratic National Committee (DNC) and its state parties. The DNC argued that a data trust should be created as a for-profit entity in order to manage voter data, benefiting candidates with the information inferred from it [108]. On the other side, the head of the Association of State Democratic Committees alleged that they could lose data ownership to a group with unknown motives and political bias. Also, the data in question was considered the main source for raising money for operations, being their only asset, as stated by some operatives [27]. Similarly, in healthcare, legal restrictions can discourage further cooperation due to privacy challenges [109,110]. Although encryption has been used for data protection and computation, analysis over encrypted data is still a challenge that some initiatives are addressing through different methods. Additionally, transferring the ownership of calculated results (i.e., changing the keys) should not rely on trusted environments (more on this topic is presented in Section 2.5). Consequently, we need to assure data control, information ownership, and an ownership transfer mechanism.

2.4.2.1 Information as the Data Asset

The concepts of information and data are often used interchangeably, but we abstract data as the result of a transmitted or stored signal, whereas information is its semantic aspect. Information Theory (IT) formalizes communication systems as stochastic processes [111]. The work of Shannon [112] is the landmark for IT [113, 114]. However, although IT may suggest the semantic aspect of information, it is focused on transmission, where the semantics of the transmitted data is irrelevant [114,115]. Therefore, the idea of data is more applicable when analyzing communication, noise, and entropy.

On the other hand, information results from pursuing meaning while performing statistical observations or experiments [111]. Therefore, we analyze information as a byproduct of intelligible (i.e., readable) data, making our approach primarily data-centric. A change in the data (e.g., noise) can modify its significance [116], so data must be noiseless to be an object of interpretation. Conversely, as a random phenomenon, noise can become a desirable attribute (e.g., in encryption) for data preservation because it governs the meaning of a transmitted signal [113]. Although the amount of information in a message can be measured probabilistically, cryptography may keep its intrinsic value when used as an ownership-preserving tool because the value of information can only be perceived on unencrypted data.

Noise can be seen as added randomness in the processing of a ciphertext [117]. When homomorphic operations (e.g., multiplication) are performed, such noise is propagated, leading to a need for “resetting” the ciphertext to keep performing functions over it, in a process called bootstrapping [38, 118–120]. Otherwise, the ciphertext would reach a limit on how many operations it could bear while keeping its intrinsic meaning. For FHE schemes that make use of noise, most of the efficiency problems come from managing such inserted “randomness”. Wang introduced two noise-free FHE schemes [121] that were considered practical by Barenghi; however, other noise-free schemes faced practical breaks [122–125]. Hence, when approaching our construction, we consider the randomization factor and the perception that a noise-free scheme employs randomization but does not need bootstrapping [121].
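The noise propagation described above can be made concrete with a minimal Python sketch. It assumes a toy symmetric scheme over the integers, in the style of integer-based somewhat homomorphic constructions, and is not the scheme developed in this thesis. The modulus `P`, the parameters `noise_bits` and `q_bits`, and all function names are hypothetical choices made only for illustration, with sizes far too small to be secure.

```python
import secrets

P = (1 << 61) - 1  # secret odd modulus; toy size chosen for illustration


def encrypt(p, bit, noise_bits=8, q_bits=128):
    """Hide one bit as bit + 2*r + p*q; the part bit + 2*r is the noise."""
    r = secrets.randbelow(2 ** noise_bits - 1) + 1  # small nonzero randomness
    q = secrets.randbits(q_bits) + 1                # random multiple of p
    return bit + 2 * r + p * q


def decrypt(p, c):
    """Recover the bit; correct only while the noise stays below p."""
    return (c % p) % 2


def noise_bit_length(p, c):
    """Size of the noise term, visible only to the key holder."""
    return (c % p).bit_length()


a, b = encrypt(P, 1), encrypt(P, 1)
added = a + b        # noise terms add up; decrypts to 1 XOR 1
multiplied = a * b   # noise terms multiply; decrypts to 1 AND 1

c = encrypt(P, 1)
for depth in range(1, 5):
    c = c * encrypt(P, 1)  # each multiplication enlarges the noise
    print(depth, noise_bit_length(P, c), decrypt(P, c))
```

With these toy parameters the noise stays below `P` for the few multiplications shown. Once the noise exceeds `P`, decryption is no longer reliable, which is the point at which a noise-based scheme would need bootstrapping.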

2.4.3 BaaS Adversarial Problem

The aforementioned problems share a common source: the expectation or detection of opportunism. Williamson refers to opportunism as “a lack of candor or honesty in transactions, to include self-interest seeking with guile” [60]. It is the root cause of adversarial behavior, a frequent issue associated with trust-based designs [47]. In general, however, companies are better off with the specialized assistance of a cloud provider. In this setting, business partnerships are preceded by signed agreements in order to regulate the interaction, bringing back characteristics of the trust model. Unfortunately, contracts cannot guarantee that parties will abide by a set of rules; they only define rights, duties, and the consequences for not obeying them. Due to the human limitation in foreseeing all outcomes (bounded rationality) [126], legal contracts are essentially incomplete [127], unable to list all contingencies (smart contracts included). This incompleteness is the cause of most legal disputes [128]. In cases of physical assets, a law enforcement mechanism can execute a repossession. But how can information be repossessed? Unfortunately, failures or malicious servers cannot be avoided by legal artifacts. Nevertheless, contracts are essential to facilitate trade [129], and their electronic counterparts (i.e., smart contracts) are agreements between the involved parties on how data can be modified in a blockchain consortium.

We show instances of this issue by comparing it with the classical Hold-Up Problem, a recurrent discussion in economics [60]. This problem encompasses issues arising from the custody of data by a third-party provider, such as the incapacity to preserve the CIA of digital assets or honest-but-curious behavior. It also represents adversarial or opportunistic actions coming from business partners in a data-sharing consortium. For instance, a company may improperly use the commercial advantage given by the information shared. As a result, by foreseeing the possibility of misconduct, companies may avoid cooperating due to legal or financial risks. The problem is also influenced by human factors [60] and can arise when one party makes investments (e.g., sharing data) before being compensated [21], so that partners can later refuse to perform as expected. It can emerge when hiring workforce [22], licensing software [23], issuing a patent [24], or sharing information [27]. In many cases, it cannot be avoided due to the incompleteness of contracts [25,26], and law enforcement mechanisms (i.e., courts) may not guarantee repossession or compensation.

The issue can also be experienced when information is the asset. Since information property implies exclusivity [17], sharing intelligible data may result in power imbalance, failures, or loss of ownership. When companies share their information with other firms, some sort of opportunism can take place, and legal contracts cannot prevent the use of insider knowledge. Again, arbitration is a costly way to be compensated. Therefore, a modified version of the problem arises for intangible assets, exhibiting the same traits as the classic problem.
We specify the traits of a hold-up in a shared database environment in order to define instances of the problem that can be addressed:

1. Without the right instrument for data sharing, a system can apply a hierarchical structure, creating imbalanced management over data;

2. Unencrypted data loses the exclusivity of the information it bears, and encrypted data may not be computable in a third-party facility, losing the cost-benefit of BaaS;

3. Even if a dataset is encrypted and computable, it carries no value for the partnership if partners cannot analyze it.

2.5 Promising Candidates

When using a service provider to implement a blockchain platform, the risks related to data control, information ownership, and ownership transfer will arise from the respective third-party service, business partners, and adopted technologies. Consequently, companies will take measures focusing on governance, frameworks, and cryptographic resources. Concerns about control can be addressed by a proper combination of architectures and cooperative tools. Private blockchain frameworks were designed with resources that can arrange the right combination of consensus protocols, smart contract policies, and network artifacts to give balanced management over data. Notwithstanding the governance issue, much of the research effort goes towards privacy issues, through which information ownership can be managed.

Innovative ideas towards distributed ledgers started as specific research initiatives, focusing on consensus and theoretical cryptographic models. Some initiatives started with a business-driven motivation, receiving contributions from the industry and accommodating such insights in their distributed ledger construction. As an example, the r3 consortium was formed with a focus on complex and highly regulated markets. As a result, r3 Corda [34] was devised. Also, the Linux Foundation created the Hyperledger project to contribute to the blockchain ecosystem. Furthermore, Hyperledger Fabric [32] was proposed by IBM, Digital Asset, and Blockstream in an effort to advance cross-industry collaboration on distributed ledgers. Additionally, Quorum [35], a JPMorgan blockchain based on Ethereum, was launched following a similar path by accommodating the goals of the private market (figure 2.18). Bitcoin, Ethereum, Hyperledger Fabric, and r3 Corda are the most referenced platforms when it comes to blockchain [17], and Hyperledger Fabric is the leading framework in most of the BaaS services (e.g., Amazon, SAP, Oracle, Microsoft, Google, IBM) [18].

Figure 2.18: Main private frameworks timeline. [Timeline spanning 1993 to 2016; labeled events include the Cypherpunk’s Manifesto, B-money, the introduction of the euro, the Bitcoin white paper, the first Bitcoin purchase, the Ethereum white paper, Hyperledger Fabric, R3 Corda, and Quorum, set against milestones of the financial crisis.]

These frameworks, along with some cryptocurrency-based blockchains, also offer solutions for privacy-preserving computation and the sharing of information. The most relevant options use ZKP (Zero-Knowledge Proof) [130], privacy-preserving smart contracts, isolation of participants or transactions, or specialized hardware. Zcash [131], Solidus [29], and Confidential Assets [30] can use the blind acknowledgement procedures of ZKPs, but these solutions may lack the efficiency or flexibility to manipulate arbitrary data. Hawk [31] can preserve the privacy of smart contracts, while other frameworks segment participants or data, such as Hyperledger Fabric [32], r3 Corda [34], and Quorum [35]. r3 Corda has a pseudo-homomorphic computation mechanism through SGX (Intel’s Software Guard eXtensions) hardware enclaves [132]. However, those enclaves assume that side-channel attacks will not be performed. Similarly, other options for trusted environments are available, such as AMD SEV (Secure Encrypted Virtualisation) [133], ARM’s TrustZone [134], and CHERI [135], but one must completely trust the chip manufacturer. On the side of MPC (Multi-Party Computation), initiatives like Enigma [28] and a secure MPC for HF [33] can compute data privately. However, the latter uses an insecure implementation, whereas Enigma has an off-chain trusted-party design.

When considering a reduced number of operations and a feasible scheme, some HE schemes can offer some degree of analysis, such as Benaloh [136], ElGamal encryption [137], Goldwasser-Micali [138], and Paillier [139]. These propositions can be practical and meet performance criteria for many applications such as voting and secure computation. In our view, the implementation of such schemes in a smart contract that can be analyzed by all the parties is more aligned with blockchain core principles and resembles our design decision for this work. To summarize, most solutions segment data, segregate participants, and hide scripts or encrypt data in a way that can underuse BaaS capabilities or threaten the management of ownership shown in Section 2.4.2, opposing our CASC guidelines explained in Section 1.4.1.
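As an illustration of how a partially homomorphic scheme can serve an application such as voting, the following Python sketch tallies encrypted ballots with a textbook Paillier construction. It is a minimal sketch under stated assumptions: the primes are tiny and fixed, the generator is assumed to be n + 1, and the function names are hypothetical. Paillier is asymmetric, unlike the symmetric scheme targeted by this thesis, so the sketch only shows the additive property.

```python
import math
import secrets


def keygen(p=61, q=53):
    """Toy key generation from two small fixed primes (insecure sizes)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse; valid for the generator g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt m < n under public modulus n with fresh randomness r."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, priv, c):
    """Recover m using the private pair (lam, mu)."""
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n


def add(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % (n * n)


n, priv = keygen()
votes = [1, 0, 1, 1, 0, 1]          # hypothetical yes/no ballots
tally = encrypt(n, 0)
for v in votes:
    tally = add(n, tally, encrypt(n, v))
print(decrypt(n, priv, tally))      # number of "yes" votes
```

Only the holder of the private pair can read the tally, while any party can recompute the aggregation from the ciphertexts, which is the property that makes such schemes attractive inside a publicly inspectable smart contract.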

2.5.1 Homomorphic Encryption as a Tool

Rivest et al. considered the possibility of performing computations on encrypted data without prior decryption [37], emphasizing the notion of privacy homomorphisms. Since 1978, solutions have been devised for HE [140–144]. The first Fully Homomorphic Encryption (FHE) scheme was proposed by Gentry in 2009 [38]. After that, contributions have been made towards improving the idea in order to make it practical and applicable to real-world applications [145–150]. According to which operations $\circ$ can be performed over the encrypted data, and how many times, a scheme can be defined as Partially Homomorphic Encryption (PHE), Somewhat Homomorphic Encryption (SWHE), or FHE [151,152]. In a PHE scheme, one operation can be performed without a limit on the number of executions. A SWHE scheme limits the number of times some operations can be performed, and a FHE scheme allows for an unlimited number of executions for all operations.

An arbitrary function $f$, mapping elements of a set $A$ to elements of a set $B$, is said to be homomorphic with regard to an operation $\circ$ if $f(a \circ b) = f(a) \circ f(b)$ for all $a, b \in A$ [153]. In this case, $f$ and $A$ are related, forming an algebraic structure [154]. The conjunction of one or more non-empty sets with one or more binary operations, along with a set of identities that such operations satisfy, is considered an algebraic structure. For clarity, we exemplify a mapping function (i.e., encryption function) $\mathrm{Enc}(k, \cdot)$ for an encryption scheme $\Pi$, where the secret key is denoted by $k$. The set of all messages and the set of all ciphertexts are denoted respectively by $M$ and $C$, such that $\mathrm{Enc} : M \to C$. Therefore, $\Pi$ is homomorphic with regard to $\circ$ if the encryption function $\mathrm{Enc}$ is such that

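\[ \mathrm{Enc}(k, m_1 \circ m_2) = \mathrm{Enc}(k, m_1) \circ \mathrm{Enc}(k, m_2) \tag{2.4} \]

$\forall\, m_1, m_2 \in M$. In a more specific case, where scheme $\Pi$ is additive homomorphic, the encryption function $\mathrm{Enc}$ is such that

\[ \mathrm{Enc}(k, m_1 + m_2) = \mathrm{Enc}(k, m_1) + \mathrm{Enc}(k, m_2), \tag{2.5} \]

and multiplicative homomorphic when

\[ \mathrm{Enc}(k, m_1 m_2) = \mathrm{Enc}(k, m_1)\, \mathrm{Enc}(k, m_2). \tag{2.6} \]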

For our purposes, we focus on a scheme that is additive homomorphic, rests on a fair security assumption, and shows performance compatible with the framework that will be chosen. This is an important criterion, since blockchain frameworks have a writing cycle bounded by the consensus mechanism in effect.
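Equations (2.5) and (2.6) can be checked numerically with two deliberately trivial maps. The Python sketch below assumes arithmetic modulo a public prime `P` and uses the hypothetical functions `enc_add` (multiplication by the key) and `enc_mul` (exponentiation by the key). Both are deterministic and insecure, and neither is the scheme proposed in this thesis; they only make the algebraic definitions tangible.

```python
P = 2_147_483_647  # public prime modulus (2**31 - 1), an illustrative choice


def enc_add(k, m):
    """Toy additive map: Enc(k, m) = k * m mod P."""
    return (k * m) % P


def enc_mul(k, m):
    """Toy multiplicative map: Enc(k, m) = m**k mod P."""
    return pow(m, k, P)


k = 65_537
m1, m2 = 1234, 5678

# Equation (2.5): encrypting the sum equals summing the encryptions.
lhs_add = enc_add(k, (m1 + m2) % P)
rhs_add = (enc_add(k, m1) + enc_add(k, m2)) % P

# Equation (2.6): encrypting the product equals multiplying the encryptions.
lhs_mul = enc_mul(k, (m1 * m2) % P)
rhs_mul = (enc_mul(k, m1) * enc_mul(k, m2)) % P

print(lhs_add == rhs_add, lhs_mul == rhs_mul)
```

A practical scheme additionally needs randomized encryption, so the equalities are usually stated after decryption rather than on raw ciphertexts.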

2.5.2 Geometric Algebra as the Catalyst

Clifford Geometric Algebra (GA for short) is a powerful system [155]. It is a simple and flexible mathematical framework from which applications may benefit through compactness, implicit parallelism, and higher performance [156]. Since the works of Descartes, the characterization of quantities has been known for geometric constructions. However, Grassmann theorized on extended quantities, expanding the geometric representation beyond the notion of Cartesian points and vectors [157]. Also, Hamilton created quaternions for algebraic 3-dimensional rotations, and Clifford defined a vector product that would allow rigid body motions. Furthermore, David Hestenes realized that representational techniques in quantum physics were essentially related to some algebra of spatial relationships and advocated its practical use [157].

GA combines many areas of mathematics, such as Grassmann algebra, quaternions, and Euclidean geometries. GA extends the real numbers to include vectors and their products, so that higher-dimensional objects can be modeled [158]. Amongst its benefits for our purposes, we can mention the ability to gather many of the mathematical concepts already present in computer science circles, such as modular arithmetic and matrix algebra. Although eminently suitable for computer science fields such as computer graphics, robotics, and computer vision [157], it has also been employed for cryptography [40, 44–46]. Additionally, a set of computationally efficient algebraic tools allows for algorithmic compactness, favoring maintenance and straightforward constructions. We use the capabilities of GA to enhance the performance of the HE scheme, creating a feasible construction for modern DLTs.

CHAPTER 3

Theoretical Analysis and Discussion

Blockchain is a modern method of coordinating economic operations and a native digital medium [159]. Its implementations range from cryptocurrencies to securities tokens [19], and its fit for economics has reached the point where some of the world’s biggest banks, such as JPMorgan and Goldman Sachs, are advancing the technology. To evaluate its intrinsic ability to mimic elements of a microeconomy¹, we investigated the intentional micro-economic aptitude of Bitcoin in Section 2.1.3. Our purpose was to show that a blockchain-based infrastructure can be an instance of a market and can evolve by implementing economic solutions for new problems. Therefore, if such a system can emulate a market, as we also discussed in Section 1.1.1, then it can be extended to implement solutions for market distortions, such as opportunistic actions.

This chapter initiates our contributions by establishing the theoretical foundation for the next chapters. To the best of our knowledge, this is the first time that the specific propositions in Section 3.1 are mapped to a cryptosystem that allows a homomorphic key update protocol in a blockchain setting. Whether blockchain can solve the Hold-Up Problem for data has already been asked [25,47,48]. Economists have also devised alternatives, such as renegotiation design and processes for transparency, but they are difficult to enforce, as they involve a deep commitment to specific trades, a feature that current contracts and the court system cannot provide. When dealing with electronic contracts (i.e., smart contracts), however, the object to be controlled is data, something already in the software realm. Still, it seems that many of those mechanisms cannot implement a structured bargain due to its complexity, as mentioned by Holden and Malani [25]. Additionally, blockchain is still maturing in how to design solutions [17], even the enforceable ones.

¹ Microeconomics is the analysis of the economic activity in the economic environment [21].

We offer a solution in three sequential and complementary layers, along with an abstract model that organizes how each concept can be placed in a practical application. We demonstrate our analysis through the following sections:

1. In Section 3.1.1 we explain how contractual solutions can be represented by modern DLT environments;

2. In Section 3.1.2 we show why HE can be the renegotiation mechanism that prevents business partners from becoming hostages in a commercial relationship;

3. In Section 3.1.3 we explain why a homomorphic key update protocol can efficiently provide secure ownership transfer;

4. Finally, in Section 3.2 we show how these abstract concepts can be placed in a real application. We also provide a suggestion for possible functions and parameters as a placeholder.

In this chapter, we do not define which tools, systems, or schemes will be used, but we devise theoretical relationships, patterns, and interfaces for concrete constructions that will be shown in Chapter 4.

3.1 Theorizing a Solution

In Section 2.4.3 we described a problem that is pervasive in commercial relationships, namely, the Hold-Up Problem. Usually, this is an issue related to tangible assets. However, as stated in Section 2.4.2.1, we consider intangible assets and the insights extracted from data, since information is our focus. We analyze existing theoretical solutions, such as renegotiation, revision, and revelation mechanisms, in order to suggest technological counterparts.

3.1.1 Contractual Solutions

Klein et al. show that post-contractual opportunistic behaviors can be solved either by vertical integration or by contractual solutions [26]. Vertical integration occurs when a company controls its distributors or suppliers, owning their operations. In that case, one company ends up buying its partner to have full access to its information, since “ownership is the power to exercise control” [66]. Transactions will be internalized if the cost of doing so is lower than the cost of using the market [61]. However, after a cost-benefit analysis, vertical integration may not be feasible.

We have discussed how blockchains have intrinsic competence to simulate markets and contracts and to promote free trade. In fact, when companies join a data-sharing effort, they are going to trade in a virtual market. Hence, we chose to analyze contractual solutions. First, vertical integration would be equivalent to not using the market (i.e., the blockchain consortium), thus solving the problem by avoidance. Second, contractual solutions treat the Hold-Up Problem as a possible cost of trading in the market and hence can equip legal contracts with proper measures. Such solutions can suit smart contracts since they also suffer from incompleteness, which makes them susceptible to instances of hold-up or technical problems. For instance, an infinite loop cannot, in general, be detected in a script beforehand since the halting problem is undecidable [160]. It could be an honest mistake or a denial-of-service (DoS) attack [32]. Also, an undesirable analysis would not be anticipated by a dataset owner. Therefore, smart contracts are proper candidates for the measures proposed by contractual solutions.

However, companies need tools that can help implement contractual solutions collaboratively by design while keeping balanced control over data. Launching a shared database effort is challenging due to governance decisions, relationship-specific investments, and property rights [47]. Blockchains can reduce risks and provide neutral territory for organizations.
The concept of trust in a DLT arises from the knowledge that no party can unilaterally modify the assets invested in that joint effort. Additionally, DLTs can implement smart contracts to regulate interactions between participants by previously agreed program code [17]. Also, an immutable history enables traceability against fraud and substitution. Therefore, a blockchain-based system can provide immutability, non-repudiation, integrity, transparency, and equal rights [47].

3.1.2 Renegotiation Mechanism

Humans are limited by bounded rationality, the inability to predict all future outcomes. This is the root cause of the incompleteness of legal contracts and, consequently, of smart contracts. Chung suggests that a way of creating complete contracts (prepared for all outcomes) is to introduce the possibility of renegotiation [161]. Hart also points out that renegotiation could be anticipated for a further revision process of contracts [127]. In general, the literature assumes that renegotiation will occur because parties want to readjust when better information is available [162]. Therefore, parties would be willing to include renegotiation in a contract if clauses were written unambiguously, in a way that could be enforced [127].

In a DLT, business rules can be enforced once parties agree to persist them in a smart contract. The script can also be the mechanism that homomorphically computes encrypted data, generating analysis for a second party. That is why the HE component is essential to postpone the acquisition of meaning, creating a further renegotiation and revision process. Employing HE on a shared dataset keeps control over the interpretation, since the company that owns the data cannot anticipate all the information that could be inferred by a second party. It forces the requesting company to wait until the owner can analyze and possibly transfer the ownership of the aggregated data. But to allow renegotiation, parties need to include some bargaining mechanisms in the contract beforehand [161]. In our case, the HE scheme and the smart contract (which defines the homomorphic operations) provide the renegotiation mechanism, also ensuring confidentiality in a BaaS environment.

3.1.3 Ownership Transfer

When a consortium is established through a DLT and uses HE, many problems related to the secrecy of information in a BaaS may be remediated. Nevertheless, if the secure transfer of ownership is not enabled, the utility of the data for business partners is reduced. Therefore, we need a mechanism to execute the trade.

Aghion et al. [163] and Chung [161] show that the first-best can be achieved if the parties can allocate the bargaining power at the renegotiation game, giving all the control to one of the parties (e.g., data owner). The first-best is the resulting equilibrium of a perfectly competitive market, where all resources are privately owned, and the policy is free trade. Nöldeke and Schmidt [164] corroborate, arguing that the first-best can also be achieved by giving the seller the option to trade. In summary, a market without distortions would be feasible if the power over the asset (i.e., information) would not be lost during the renegotiation stages. HE keeps the bargaining power with the owner during the computation and renegotiation, although a mechanism for transferring property must be implemented. Hart shows that first-best investments can be implemented by writing the trading rule in a contract [127]. We abstract this concept by the analysis that the secret key represents the ownership, and the settlement of trade is made by the key update of a dataset. The key update protocol is also implemented by a smart contract that runs under the consent of the parties. Thus, not only the scheme and computations are known, but also the trading rule. The key update is preceded by a key exchange procedure, where parties will create a new common key. The following update of the key is the step where the property of a specific asset is transferred. However, opportunism can still play a role in this scenario. For instance, the owner could persist useless data, such as registers filled with zeros. That could generate a worthless analysis for a buyer that, after generating an encrypted report, executes a payment in order to have property over a result. Hart makes an observation about legal consequences and disputes when trading. He affirms that “results are sensitive to the trading mechanism and to what the courts can retrospectively determine” [127]. 
Therefore, the mechanism and the sequence of events need to be verifiable when arbitration is involved, and we have already shown that the mechanism is verifiable (e.g., smart contract, transactions). Additionally, in this case a court would be able to verify the calculated result through the generation of a new key for verification. In this homomorphically computed blockchain, the delivery of goods is defined by the generation of a report encrypted under the seller’s key, followed by the substitution of the buyer’s new key in a second report. The mathematical operations can be unambiguously verified by analyzing the smart contract prior to its persistence in the blockchain. Moreover, for verification of meaning, new keys (e.g., for a court) could be generated. Therefore, the model is compliant with the verifiable and renegotiable approaches.
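The idea that the secret key represents ownership, and that trade is settled by a key update, can be sketched in Python with a toy linear map. This is only an illustration under strong assumptions: `enc` is a deterministic, insecure multiplication by the key modulo a public prime, `update_token` and `key_update` are hypothetical names, and the new key is drawn directly instead of being derived through a key exchange. It is not the key update protocol defined later in this thesis.

```python
import secrets

P = 2**127 - 1  # public prime modulus (a Mersenne prime), illustrative choice


def gen():
    """Draw a nonzero secret key."""
    return secrets.randbelow(P - 1) + 1


def enc(k, m):
    """Toy encryption: multiply the message by the key."""
    return (k * m) % P


def dec(k, c):
    """Toy decryption: multiply by the inverse of the key."""
    return (c * pow(k, -1, P)) % P


def add(c1, c2):
    """Homomorphic addition of two ciphertexts under the same key."""
    return (c1 + c2) % P


def update_token(k_old, k_new):
    """Value that re-keys a ciphertext without exposing the plaintext."""
    return (k_new * pow(k_old, -1, P)) % P


def key_update(c, token):
    """Move a ciphertext from the old key to the new key."""
    return (c * token) % P


seller_key = gen()
records = [120, 75, 305]                      # hypothetical dataset
report = 0
for value in records:
    report = add(report, enc(seller_key, value))  # encrypted aggregate

new_key = gen()  # stands in for the key agreed through a key exchange
transferred = key_update(report, update_token(seller_key, new_key))
print(dec(new_key, transferred))              # readable under the new key only
```

The aggregate is computed and re-keyed without ever being decrypted, which mirrors the sequence of a first report under the seller’s key followed by a second report under the new key.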

3.2 A Theoretical Cryptographic Model

This section proposes an implementation-agnostic model to mold a design that can evolve into the solution for the confidentiality and ownership protection problems. Our construction is intended to work as a meta framework for cryptographic tools (e.g., cryptographic schemes, mathematical operations), where such tools can be placed according to their attributes and compatibility. We introduce model ∆, a tuple of cryptographic tools such that

\[ \Delta = \left( \prod, \sum, \bigcup \right), \tag{3.1} \]

where $\prod$ is a private-key HE scheme, $\sum$ is a symmetric-key exchange protocol, and $\bigcup$ is a key update protocol. We describe the general syntax before implementation. This is beneficial for several reasons; for instance, a general syntax may admit various constructions and different schemes that still satisfy the same rationale. Also, a general idea of the interfaces helps in the development of scattered components. Since blockchain design patterns are still a challenge, it may define a starting blueprint for new strategies in development circles.

3.2.1 The Encryption Scheme

The abstract model above does not require a FHE scheme, which would imply an unlimited number of computations of any type, for all applications. For demonstration purposes, our application only needs addition and scalar division executed a reasonable number of times. Therefore, the inference of analysis or aggregated results over a set of transactions can rely on a SWHE scheme. Because this abstract model is a scaffold, a different scheme can be placed in the respective slot in a different construction.

Our model starts with the definition of the private-key scheme $\prod$. We introduce, as a high-level definition, what a basic homomorphic symmetric encryption scheme would be. We define $\prod$ as a tuple of polynomial-time efficient algorithms such that

\[ \prod = (\mathrm{Gen}, \mathrm{Enc}, \mathrm{Dec}, \mathrm{Eval}). \tag{3.2} \]

For security reasons, Gen and Enc are probabilistic algorithms, whereas for correctness Dec and Eval are deterministic algorithms. We basically define that this system will need a way to generate keys, encrypt and decrypt messages, and execute a few mathematical operations. The Eval function is a generalization of a binary operation that can be further expanded.

Gen accepts a security parameter $1^{\lambda}$ as input and returns a private key $sk$ and a public evaluation key $ek$. The unary notation of the security parameter $1^{\lambda}$ expresses that Gen will be executed in polynomial time with regard to $\lambda$. We define the function as

\[ (sk, ek) \leftarrow \mathrm{Gen}\left(1^{\lambda}\right), \tag{3.3} \]

where the message space is implicitly defined as $M = \{0,1\}^{\lambda-1}$, while the key space, comprising all the possible keys generated by Gen, is explicitly defined by $K$.

Enc receives a private key $sk \in K$ and a message $m \in M$ as inputs, returning a ciphertext $c$. We define the function as

\[ c \leftarrow \mathrm{Enc}(sk, m), \tag{3.4} \]

where the ciphertext space is explicitly defined as $C$, comprising all the possible ciphertexts generated by Enc.

Dec accepts a private key $sk \in K$ and a ciphertext $c \in C$ as inputs, returning message $m \in M$. We define the function as

\[ m = \mathrm{Dec}(sk, c). \tag{3.5} \]

Eval accepts ek, a binary operation ˝ P O and ciphertexts c1,c1 P C as inputs, returning

the evaluated ciphertext ce P C as the result for operation ˝ between operands c1 and c2. We define the function as


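$$c_e = \mathrm{Eval}(ek, \circ, c_1, c_2), \tag{3.6}$$

where all the generated $c_e$ belong to $\mathcal{C}$. For correctness, $\forall\, \Pi$, $\forall\, sk \in \mathcal{K}$, $\forall\, m_1, m_2, m_3 \in \mathcal{M}$ and $\forall\, \circ \in \mathcal{O}$ it follows that

$$\mathrm{Dec}(sk, \mathrm{Enc}(sk, m_1)) = m_1, \tag{3.7}$$

and

$$\mathrm{Dec}(sk, \mathrm{Eval}(ek, \circ, \mathrm{Enc}(sk, m_2), \mathrm{Enc}(sk, m_3))) = m_2 \circ m_3. \tag{3.8}$$

$\Pi$ is homomorphic with respect to $\circ \in \mathcal{O}$. For our purposes we let $\mathcal{O} = \{\circ \mid \mathrm{Add}, \mathrm{SDiv}\}$, where Add is a deterministic polynomial-time addition algorithm and SDiv is a deterministic polynomial-time scalar division algorithm. Add receives as inputs ciphertexts $c_1$ and $c_2$, returning a ciphertext $c_e$. We define the function as

$$c_e = \mathrm{Add}(ek, c_1, c_2). \tag{3.9}$$

SDiv receives as inputs a ciphertext $c_1$ and a scalar $\alpha$, returning a ciphertext $c_e$. We define the function as

$$c_e = \mathrm{SDiv}(ek, c_1, \alpha). \tag{3.10}$$

For correctness, it follows that $\forall\, c_i \leftarrow \mathrm{Enc}(sk, m_i)$, $i = 1, 2$, and $\forall\, \alpha \in \mathbb{R}$, the following holds:

$$\mathrm{Dec}(sk, \mathrm{Eval}(ek, \mathrm{Add}, \mathrm{Enc}(sk, m_1), \mathrm{Enc}(sk, m_2))) = m_1 + m_2, \tag{3.11}$$

and

$$\mathrm{Dec}(sk, \mathrm{Eval}(ek, \mathrm{SDiv}, \mathrm{Enc}(sk, m_1), \mathrm{Enc}(sk, \alpha))) = m_1 / \alpha. \tag{3.12}$$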
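The syntax of the scheme can be exercised with a deliberately insecure toy instantiation. The sketch below is not the scheme adopted later in this thesis; it is a hypothetical additive masking over $\mathbb{Z}_q$ whose only purpose is to show the four interfaces and to check the correctness conditions (3.7) and (3.11). The modulus `Q`, the function names, and the ciphertext layout are assumptions made for illustration. Scalar division follows the signature of (3.10), taking $\alpha$ as a plaintext scalar, and returns $m_1 \alpha^{-1} \bmod q$.

```python
import secrets

Q = 257  # small public prime, chosen only for illustration


def gen():
    """Toy Gen: a secret nonzero element of Z_q and a trivial evaluation key."""
    sk = secrets.randbelow(Q - 1) + 1
    ek = Q  # the evaluation key only carries the modulus
    return sk, ek


def enc(sk, m):
    """Toy probabilistic Enc: mask m with sk * r for fresh randomness r."""
    r = secrets.randbelow(Q)
    return (r, (m + sk * r) % Q)


def dec(sk, c):
    """Toy deterministic Dec: remove the mask sk * r."""
    r, body = c
    return (body - sk * r) % Q


def eval_op(ek, op, c1, operand):
    """Toy Eval for the operation set {Add, SDiv}."""
    if op == "Add":   # operand is a second ciphertext
        return tuple((x + y) % ek for x, y in zip(c1, operand))
    if op == "SDiv":  # operand is a plaintext scalar alpha, invertible mod ek
        inv = pow(operand, -1, ek)
        return tuple((x * inv) % ek for x in c1)
    raise ValueError(f"unsupported operation: {op}")
```

Any concrete scheme with the same four entry points could be placed in this slot, which is the sense in which the abstract model acts as a scaffold.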


3.2.2 A Protocol For Exchanging Keys

We formalize an abstract key-exchange protocol $\Sigma$ as a tuple of algorithms, such that $\Sigma = (\mathrm{IDGen}, \mathrm{SubKey}, \mathrm{KeyExch})$. All elements of $\Sigma$ are efficient polynomial-time algorithms. The algorithms are intended to be processed in the listed sequence and could therefore be organized in a nested structure; however, they are presented separately for clarity and definition of purpose.

The first algorithm of the sequence is IDGen, a probabilistic ID generator that accepts a security parameter $1^{\lambda}$ as input and outputs a private ID $\mathit{ID}_{privA}$ and a public ID $\mathit{ID}_{pubA}$. We define the syntax as

$$(\mathit{ID}_{privA}, \mathit{ID}_{pubA}) \leftarrow \mathrm{IDGen}(1^{\lambda}), \tag{3.13}$$

where $\mathit{ID}_{privA}$ and $\mathit{ID}_{pubA}$ belong to the same user or node (i.e., A). In this case, both parties are trying to reach a common secret symmetric key, and after that $\mathit{ID}_{privA}$ and $\mathit{ID}_{pubA}$ are disposable.

The second algorithm of the tuple is SubKey, a deterministic sub-key generator. A node running SubKey provides as inputs its private ID $\mathit{ID}_{privA}$ and a public ID received from the node it is trying to share a common secret with (e.g., $\mathit{ID}_{pubB}$). For instance, if SubKey is run by party A, then A provides $\mathit{ID}_{privA}$ and B's $\mathit{ID}_{pubB}$, outputting A's sub-key $\mathit{subkey}_A$. We define the syntax as

$$\mathit{subkey}_A = \mathrm{SubKey}(\mathit{ID}_{privA}, \mathit{ID}_{pubB}). \tag{3.14}$$

After executing SubKey, A sends $\mathit{subkey}_A$ to B through an insecure channel. When the algorithm is run by B, SubKey receives as inputs $\mathit{ID}_{privB}$ and A's $\mathit{ID}_{pubA}$, outputting sub-key $\mathit{subkey}_B$. Following that, B sends $\mathit{subkey}_B$ to A through an insecure channel.

Finally, KeyExch is a deterministic key-exchange algorithm, accepting as input a private ID $\mathit{ID}_{privA}$ and the other party's sub-key $\mathit{subkey}_B$. We define its syntax as

$$\mathit{symkey}_{AB} = \mathrm{KeyExch}(\mathit{ID}_{privA}, \mathit{subkey}_B). \tag{3.15}$$


When KeyExch is run by node A, it accepts A's $\mathit{ID}_{privA}$ and B's sub-key $\mathit{subkey}_B$ as inputs, outputting a secret symmetric key $\mathit{symkey}_{AB}$. Conversely, when KeyExch is run by node B, it accepts B's $\mathit{ID}_{privB}$ and A's sub-key $\mathit{subkey}_A$, outputting the same secret key $\mathit{symkey}_{AB}$. For correctness, the resulting outputs from IDGen and SubKey are required to satisfy the following condition:

$$\mathrm{KeyExch}(\mathit{ID}_{privA}, \mathit{subkey}_B) = \mathrm{KeyExch}(\mathit{ID}_{privB}, \mathit{subkey}_A) = \mathit{symkey}_{AB}. \tag{3.16}$$

For our definitions of IDGen (3.13), SubKey (3.14), and KeyExch (3.15), it is implied that the function is executed from the perspective of user A. Therefore, the variable $\mathit{symkey}_{AB}$ symbolizes that a request for a key exchange procedure was initiated from A towards B.

3.2.3 Transferring Ownership by Key Update

We suggest a definition for a key update protocol $\bigcup$ as a tuple of deterministic polynomial-time algorithms such that $\bigcup = (\mathrm{TokGen}, \mathrm{KeyUpd})$.

TokGen is a deterministic polynomial-time algorithm that accepts as input an old secret key $sk_{old}$ and old evaluation key $ek_{old}$, a new secret key $sk_{new}$ and a new evaluation key $ek_{new}$, and outputs a token $t$. The syntax is defined as $t = \mathrm{TokGen}(sk_{old}, ek_{old}, sk_{new}, ek_{new})$. The token $t$ is sent to the new owner of $sk_{new}$ via an insecure channel.

KeyUpd is a deterministic polynomial-time algorithm that accepts as input a token $t$ and a ciphertext $c_{old}$ and outputs a ciphertext $c_{new}$. The syntax is defined as $c_{new} = \mathrm{KeyUpd}(t, c_{old})$.

For all $sk_{old}, sk_{new}$ output by Gen, all $t$ output by TokGen and $c_{old}$ output by Enc, given $c_{old} = \mathrm{Enc}(sk_{old}, m)$, correctness requires the following:

$$\mathrm{Dec}(sk_{new}, \mathrm{KeyUpd}(t, c_{old})) = m. \tag{3.17}$$

$\bigcup$ is secure if no efficient adversary $\mathcal{A}$ can solve for $sk_{old}, sk_{new}$ by knowing $t, c_{new}, c_{old}$.

3.3 Summary of Our Contribution

In this chapter, we explained how blockchain could be modeled into a particular solution for a problem with technical and ideological sides. With the evolution of the concept, new issues emerged, showing the same long-standing commercial problems. To focus on feasible alleviation measures, we summarized instances of the problem as it can occur in a BaaS environment. The theoretical solution was designed from the information perspective, because data is regarded as unintelligible if it is impossible to comprehend its meaning. Consequently, an encrypted datum has its ownership defined by the possession of a key that decrypts it, making the originating plaintext intelligible. Therefore, for all $c$ returned by an encryption function $\mathrm{Enc}(sk, m)$, it holds that $c$ is an unintelligible datum for anyone not possessing $sk$. In addition, for all $c_{old}$ output from $\mathrm{Enc}(sk_{old}, m)$, a homomorphic transfer of ownership consists of a transformation of $c_{old}$ into $c_{new}$ via the homomorphic change of keys, such that $sk_{new}$ represents the possession of $c_{new}$'s meaning.

Our contributions target the instances of the problem by managing (1) power imbalance, (2) confidentiality, and (3) ownership. First, we contributed a mapping from concepts in economics to technological solutions. Second, we devised an abstract model that serves as a design pattern to guide the development of new applications. Figure 3.1 shows our contributions in this chapter, where we distinguish them from the concepts given by the existing literature.

[Figure: stacked blocks labeled MAPPING (contractual solutions, renegotiation mechanism, ownership transfer), ABSTRACT MODEL / DESIGN PATTERN, and BLOCKCHAIN; legend: LITERATURE, CONTRIBUTION]

Figure 3.1: Mapping and abstract model contribution.

Furthermore, we define our mapping between concepts and the abstract model as follows:

1. Analytical mapping:

(a) In a given set of organizations, a tool must implement a distributed data sharing consortium without an imbalance of power over the database. In this case, a private blockchain (i.e., distributed ledger) can reduce power imbalance through coordination, commitment to the rules (i.e., smart contracts) and control. However, a blockchain framework must implement the intended application under the CASC principles;

(b) If a technological tool provides mediation of data between commercial partners (a), then an efficient mechanism to protect data ownership could be added, allowing privacy-preserving computation. HE can allow that, serving as a renegotiation mechanism, while enabling privacy-preserving computation through smart contracts;

(c) If both previously mentioned tools can be incrementally implemented (a, b), they could be equipped with the capacity to transfer the encrypted data ownership without decrypting it. Secure instances of a key update protocol can securely transfer ownership without losing bargaining power in that setting;


2. Finally, an abstract model for theoretically scaffolding a solution is provided. The model is intended to guide the placement of cryptographic mechanisms and is implemented on top of a blockchain setting that follows the aforementioned criteria.

The HE scheme and the key update protocol are contributions from [41]. In figure 3.2, we summarize the concepts approached in this chapter, giving an overall perception of the solution triad offered by the improvement of a framework with our cryptographic contributions.

[Figure: triad of BLOCKCHAIN (integration mechanism, immutability, shared ownership), HOMOMORPHIC ENCRYPTION (renegotiation design, secrecy, computability), and OWNERSHIP TRANSFER MECHANISM (revelation mechanism, bargain power, selective disclosure)]

Figure 3.2: Abstract solution triad for ownership management.

CHAPTER 4

Research Results and Propositions

Aleksandr Yakovlevich Khinchin, a notable mathematician of the Soviet school of probability theory, quotes in his book [113] an interesting observation of P. L. Chebyshev: “The bringing together of theory and practice leads to the most favorable results; not only does practice benefit, but the sciences themselves develop under the influence of practice, which reveals new subjects for investigation and new aspects of familiar subjects.” Khinchin pointed out the important relationship between theoretical science and its practical counterpart. This chapter intends to turn the abstract model introduced in Chapter 3 into an implementation-friendly document. We choose the encryption scheme and cryptographic protocols that fit our goals, along with a DLT framework, providing practical instructions implemented by the SCOT CLI application presented in Chapter 5.

4.1 Applied Objects and Operations

The basic object of our operations is a multivector. In GA, these elements consist of a linear combination of scalars and vectors, and they also arise as the result of a geometric product of vectors. We use multivectors as our vessels to carry encrypted data, subsequently applying transformations. In Section 4.3 we define how to encode data into these objects and properly execute the necessary operations.

Multivectors are formed by elements described as blades or grades. The number of blades can range from 0 to $n$, where $n$ represents the dimension of the multivector, forming an $n$-blade object $\bar{M} \in \mathbb{G}^n$, where $\mathbb{G}^n$ denotes the geometric product space in the $n$-th dimension. To distinguish a multivector from other structures, we use a capital letter with an overbar, such as $\bar{M}$. A multivector $\bar{M}$ can be written as

$$\bar{M} = \langle B \rangle_0 + \langle B \rangle_1 + \dots + \langle B \rangle_n, \tag{4.1}$$

where $\langle \cdot \rangle_i$ is the multivector blade, for $i \in \{0, 1, \dots, n\}$.

We consider any message $m \in \mathbb{Z}_q$ as our input for encryption. However, such an integer must go through an encoding process to become a multivector. Once it is encoded into a $\mathbb{G}^3_q$ object, many operations become available in the geometric product space $\mathbb{G}^3_q$. A given modulus $q$ reduces our inputs and also the computation of the multivector's coefficients. The main operations performed for analysis over encrypted multivectors are addition and scalar division.

The addition of two multivectors is a component-wise sum of coefficients. For instance, for $\bar{A}, \bar{B} \in \mathbb{G}^2$, where $\bar{A} = a_0 + a_1 e_1 + a_2 e_2 + a_{12} e_{12}$ and $\bar{B} = b_0 + b_1 e_1 + b_2 e_2 + b_{12} e_{12}$, the addition is defined as

$$\bar{A} + \bar{B} = (a_0 + b_0) + (a_1 + b_1) e_1 + (a_2 + b_2) e_2 + (a_{12} + b_{12}) e_{12}. \tag{4.2}$$

Likewise, the subtraction follows the same pattern such that

$$\bar{A} - \bar{B} = (a_0 - b_0) + (a_1 - b_1) e_1 + (a_2 - b_2) e_2 + (a_{12} - b_{12}) e_{12}. \tag{4.3}$$

The scalar multiplication of a multivector $\bar{M}$ by $\alpha$, where $\alpha \in \mathbb{Z}_q$, is computed by multiplying each coefficient of $\bar{M}$ by $\alpha$, such that

$$\alpha \bar{M} = \alpha \langle B \rangle_0 + \alpha \langle B \rangle_1 + \dots + \alpha \langle B \rangle_n. \tag{4.4}$$

On the other hand, the scalar division $\bar{M} / \alpha$ is computed by multiplying each coefficient of $\bar{M}$ by $x = \alpha^{-1}$, where $x$ is the multiplicative inverse of $\alpha$ such that $\alpha x \equiv 1 \pmod{q}$. Therefore

$$x \bar{M} = \frac{1}{\alpha} \langle B \rangle_0 + \frac{1}{\alpha} \langle B \rangle_1 + \dots + \frac{1}{\alpha} \langle B \rangle_n. \tag{4.5}$$

However, encryption and decryption make use of other transformations, such as the multiplication of two multivectors, which follows the standard geometric product. For us, decryption depends on the invertibility of a multivector, where the inverse is denoted by $\bar{M}^{-1}$. The inverse of a multivector is a combination of geometric products and involutions such as the reverse and the Clifford conjugation, respectively denoted by $\bar{M}^{\dagger}$ and $\overline{\overline{M}}$. An inverted multivector is computed as

$$\bar{M}^{-1} = \overline{\overline{M}} \left( \bar{M}\, \overline{\overline{M}} \right)^{\dagger} \left( \bar{M}\, \overline{\overline{M}} \left( \bar{M}\, \overline{\overline{M}} \right)^{\dagger} \right)^{-1}, \tag{4.6}$$

such that $\bar{M} \bar{M}^{-1} = 1$. More on GA, its objects and related operations can be found in [157,165–167].
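The operations above can be sketched for the three-dimensional case with coefficients reduced modulo a prime. This is a minimal illustration and not the SCOT library: the modulus `Q`, the bitmask storage order, and all function names are assumptions. Coefficients are indexed by bitmask (bit 0 for $e_1$, bit 1 for $e_2$, bit 2 for $e_3$), so the storage order is $(1, e_1, e_2, e_{12}, e_3, e_{13}, e_{23}, e_{123})$, which differs from the order $(0, 1, 2, 3, 12, 13, 23, 123)$ used in the text. The inverse follows the structure of (4.6): the product of a multivector with its Clifford conjugate keeps only grades 0 and 3, and multiplying that by its reverse leaves a scalar.

```python
Q = 257  # illustrative prime modulus (assumption)

# Grade of each basis blade when indexed by bitmask.
GRADE = [bin(i).count("1") for i in range(8)]


def add(a, b, q=Q):
    """Component-wise addition, as in (4.2)."""
    return [(x + y) % q for x, y in zip(a, b)]


def sdiv(a, alpha, q=Q):
    """Scalar division: multiply every coefficient by alpha^-1 mod q."""
    inv = pow(alpha, -1, q)
    return [(x * inv) % q for x in a]


def blade_sign(i, j):
    """Sign produced when reordering the product of basis blades i and j."""
    swaps = 0
    i >>= 1
    while i:
        swaps += bin(i & j).count("1")
        i >>= 1
    return -1 if swaps % 2 else 1


def gp(a, b, q=Q):
    """Geometric product in a Euclidean 3-D algebra (every e_i squares to 1)."""
    out = [0] * 8
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i ^ j] = (out[i ^ j] + blade_sign(i, j) * x * y) % q
    return out


def reverse(a, q=Q):
    """Reverse: flips the sign of grades 2 and 3."""
    return [(-x if GRADE[i] in (2, 3) else x) % q for i, x in enumerate(a)]


def conjugate(a, q=Q):
    """Clifford conjugation: flips the sign of grades 1 and 2."""
    return [(-x if GRADE[i] in (1, 2) else x) % q for i, x in enumerate(a)]


def inverse(a, q=Q):
    """Inverse following (4.6); returns None when no inverse exists."""
    n = gp(a, conjugate(a, q), q)   # only grades 0 and 3 remain
    n_rev = reverse(n, q)
    d = gp(n, n_rev, q)[0]          # scalar denominator
    if d == 0:
        return None
    return sdiv(gp(conjugate(a, q), n_rev, q), d, q)
```

For example, `inverse([1, 1, 0, 0, 0, 0, 0, 0])` returns `None`, because $1 + e_1$ is a zero divisor: $(1 + e_1)(1 - e_1) = 0$.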

4.2 Blockchain Framework

When analyzing a solution, we proposed a DLT to be used as a tool to establish common ground between parties. Now, we suggest an application that offers enough flexibility to apply the abstract cryptographic model. With an industry-ready framework at hand, the $\Delta$ model can be implemented as a library, so that it can be imported into a CLI or a Software Development Kit (SDK). Moreover, smart contracts can use it to define operations. In doing so, members of a consortium can approve the behavior of a script before having it persisted in the blockchain. Also, critical operations such as Eval and KeyUpd are executed under the governance of a DLT environment. For the generation of encrypted artifacts, a CLI can be used outside the bounds of the blockchain, because any sensitive data sent through an insecure channel or persisted in a blockchain must be encrypted.

Amongst the private blockchain applications analyzed, we chose the Hyperledger Fabric framework. Fabric is an extensible distributed system with modular components. For example, its consensus protocol is pluggable, allowing the system to be tailored to specific contexts. Additionally, the framework pioneered the use of general-purpose languages for its smart contracts (e.g., Go, Java, Node.js), unlike many other initiatives that require domain-specific languages. In Fabric, smart contracts are delivered in packages called chaincodes; both terms are used interchangeably throughout this work.


Fabric applies passive (i.e., primary-backup replication) and active replication, favoring performance and flexibility. In the endorsement process, only a subset of peers executes every transaction, addressing any non-deterministic algorithm. Also, endorsement policies define which peers will decide about the correct execution of the chaincode. The script is executed in isolation and the result is checked by every peer. Although it incorporates some degree of centralization, the tool shows a balanced mixture between the security expected by purists and the performance and governance practiced by the industry.

4.3 Cryptographic Scheme

We choose the HE scheme introduced in [41], which is based on the underdetermination of a system of equations, since that is considered a computationally hard problem. Also, the encryption function applies randomization by design, transforming an input message $m$ into a randomized multivector $\bar{M} \in \mathbb{G}^3_q$. As a result, the generated ciphertext differs between encryptions, even for the same input. Additionally, a modular multiplication is performed with a private factor, meaning that the recovery of $m$ requires the modular multiplicative inverse of a secret operand. The procedure is then finalized with a triple geometric product with two secret multivectors, previously generated as keys. This construction poses a difficulty equivalent to the resolution of an underdetermined system of equations that is not redundant.

This encryption scheme works with integers in $\mathbb{Z}_q$. Hence, its encryption function Enc receives integers as inputs, creating ciphertexts where the computation is executed over operands modulo a prime number. Likewise, the decryption function Dec also returns integers in $\mathbb{Z}_q$. The operations allowed over encrypted multivectors are the homomorphic addition, represented by the algorithm Add, and the scalar division, defined in the algorithm SDiv. The homomorphic addition always results in an integer. On the other hand, a division may result in a rational number, which is not an element of $\mathbb{Z}_q$. Therefore, the division is executed through the modular multiplicative inverse, and the resulting integer is mapped to its equivalent fractional format through the use of the Extended Euclidean Algorithm (EEA). Consequently, the EEA is an auxiliary implementation inside the scheme.


This scheme is denoted by $\Pi$ as the following tuple of efficient (e.g., probabilistic polynomial-time) algorithms:

$$\Pi = (\mathrm{Gen}, \mathrm{Enc}, \mathrm{Dec}, \mathrm{Add}, \mathrm{SDiv}) \tag{4.7}$$

4.3.1 Key Generation

The key generation is implemented by the algorithm Gen (equation 4.8), which receives a security parameter $1^{\lambda}$ in order to output a tuple comprised of a secret key $sk = (\bar{K}_1, \bar{K}_2, g)$ and a public evaluation key $pk = (b, q)$.

$$(sk, pk) \leftarrow \mathrm{Gen}(1^{\lambda}). \tag{4.8}$$

The security parameter $1^{\lambda}$ defines the number of bits used in the generation of the multivectors' coefficients, along with $q$ and $g$. First, Gen sets $b = \lambda / 8$, then finds the first prime greater than $2^b$ and defines it as $q$. After that, 16 uniform $b$-bit integers are chosen, from which 8 integers become the coefficients of $\bar{K}_1$ and the remaining 8 integers define the coefficients of $\bar{K}_2$. If $\bar{K}_1$ or $\bar{K}_2$ does not have an inverse, then the process of finding new integers restarts. Finally, a uniform $b$-bit integer is taken to represent $g$. An algorithmic form of Gen can be seen in algorithm 1.
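The parameter step of Gen, deriving $b$ and $q$ from $\lambda$, can be sketched as follows. The function names are hypothetical, and the Miller-Rabin test used to find the prime is an assumption: the text only states that $q$ is the first prime greater than $2^b$ and does not say how primality is checked. The sampling of the key multivectors and the invertibility check are not covered here.

```python
import random


def is_probable_prime(n, rounds=32):
    """Miller-Rabin test; never rejects a prime, may rarely accept a composite."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29):
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2^r with d odd.
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a witnesses that n is composite
    return True


def gen_params(lam):
    """Return (b, q): b = floor(lam / 8), q = smallest prime above 2^b."""
    b = lam // 8
    q = 2 ** b + 1
    while not is_probable_prime(q):
        q += 1
    return b, q
```

Under these assumptions, `gen_params(64)` yields `b = 8` and `q = 257`, since $2^8 + 1 = 257$ is prime.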

Algorithm 1: Key generation algorithm Gen
    noKeys := true; NUMBERKEYS := 2;
    $b := \lfloor \lambda / 8 \rfloor$;
    $q :=$ the smallest prime $> 2^b$;
    $g :=$ uniformly chosen from $\{0, \dots, 2^b - 1\}$;
    k := [NUMBERKEYS];
    while noKeys do
        i := 0;
        $s := (s_l)_{l=0}^{15}$, with every $s_l$ uniformly chosen from $\{0, \dots, 2^b - 1\}$;
        while i < NUMBERKEYS do
            $j := i \times 8$;
            $\bar{M} := \sum_{l=0}^{7} s_{l+j}\, \bar{e}_{n_l}$;
            if HasInverse($\bar{M}$) then
                $k_i := \bar{M}$;
                i := i + 1;
                if i > 1 then
                    noKeys := false;
                    break;
                end
            else
                break;
            end
        end
    end
    $sk := (k_0, k_1, g)$; $pk := (b, q)$;
    return $(sk, pk)$;

4.3.2 Encryption

The probabilistic function Enc receives as input the secret key $sk = (\bar{K}_1, \bar{K}_2, g)$, the public parameter $pk$, and a message $m \in \mathbb{Z}_q$ to output a ciphertext $\bar{C}$ as follows:

$$\bar{C} \leftarrow \mathrm{Enc}(sk, pk, m). \tag{4.9}$$

Internally, the message $m$ is encoded into a multivector $\bar{M}$, and since $\bar{M} \in \mathbb{G}^3_q$, 8 coefficients must be defined. First, 7 coefficients (i.e., $m_0, m_1, m_2, m_3, m_{13}, m_{23}, m_{123}$) are uniformly selected from the set $\{0, \dots, 2^b - 1\}$. Second, the $m_{12}$ coefficient is defined as

$$m_{12} = (-m_0 - m_1 + m_2 + m_3 - m_{13} + m_{23} + m_{123} + m) \bmod q, \tag{4.10}$$

where message $m$ becomes part of $m_{12}$. Finally, the encoded multivector $\bar{M}$ is defined as

$$\bar{M} = m_0 \bar{e}_0 + m_1 \bar{e}_1 + m_2 \bar{e}_2 + m_3 \bar{e}_3 + m_{12} \bar{e}_{12} + m_{13} \bar{e}_{13} + m_{23} \bar{e}_{23} + m_{123} \bar{e}_{123}. \tag{4.11}$$

Lastly, Enc computes $\bar{M}' = \bar{M} g$ and returns the encrypted multivector $\bar{C}$, defined by $\bar{C} = \bar{K}_1 \bar{M}' \bar{K}_2$. Algorithm 2 describes the encryption function Enc.

Algorithm 2: Encryption algorithm Enc
    $s := (s_i)_{i=0}^{7}$, with every $s_i$ uniformly chosen from $\{0, \dots, 2^b - 1\}$;
    $s_4 := (-s_0 - s_1 + s_2 + s_3 - s_5 + s_6 + s_7 + m) \bmod q$;
    $\bar{M} := s_0 \bar{e}_0 + s_1 \bar{e}_1 + s_2 \bar{e}_2 + s_3 \bar{e}_3 + s_4 \bar{e}_{12} + s_5 \bar{e}_{13} + s_6 \bar{e}_{23} + s_7 \bar{e}_{123}$;
    $\bar{M}' := \bar{M} g$;
    $\bar{C} := \bar{K}_1 \bar{M}' \bar{K}_2$;
    return $\bar{C}$;
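The encoding step alone, before the keys are applied, can be illustrated with a short sketch. It shows how $m$ is hidden in $m_{12}$ according to (4.10) and recovered by the signed sum used for decoding in (4.15). The function names and the constant `SIGNS` are hypothetical helpers; coefficients are stored in the order $(m_0, m_1, m_2, m_3, m_{12}, m_{13}, m_{23}, m_{123})$.

```python
import secrets

# Signs of the decoding sum (4.15), in the coefficient order
# (m0, m1, m2, m3, m12, m13, m23, m123).
SIGNS = (1, 1, -1, -1, 1, 1, -1, -1)
IDX_M12 = 4  # position of the coefficient that absorbs the message


def encode(m, b, q):
    """Pick 7 random b-bit coefficients and solve (4.10) for m12."""
    coeffs = [secrets.randbelow(2 ** b) for _ in range(8)]
    coeffs[IDX_M12] = 0
    partial = sum(s * c for s, c in zip(SIGNS, coeffs))
    coeffs[IDX_M12] = (m - partial) % q
    return coeffs


def decode(coeffs, q):
    """Signed sum of the coefficients modulo q, as in (4.15)."""
    return sum(s * c for s, c in zip(SIGNS, coeffs)) % q
```

Because decoding is a linear map, the component-wise sum of two encodings decodes to the sum of the two messages modulo $q$, which is the property the homomorphic addition relies on.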

4.3.3 Addition

The Add function takes as input a public evaluation key $pk$ and two encrypted multivectors $\bar{A}$ and $\bar{B}$ to return another encrypted multivector $\bar{C}$, such that

$$\bar{C} = \mathrm{Add}(pk, \bar{A}, \bar{B}). \tag{4.12}$$

The function executes a component-wise addition of every coefficient of $\bar{A}$ and $\bar{B}$. Algorithm 3 describes the steps for the homomorphic addition performed by Add.

Algorithm 3: Homomorphic addition algorithm Add
    $n := (0, 1, 2, 3, 12, 13, 23, 123)$;
    $\bar{C} := \sum_{i=0}^{7} ((a_i + b_i) \bmod q)\, \bar{e}_{n_i}$;
    return $\bar{C}$;

4.3.4 Scalar Division

The scalar division algorithm SDiv receives as input a public evaluation key $pk$, a scalar $\alpha \in \mathbb{Z}_q$, and an encrypted multivector $\bar{A}$, returning an encrypted multivector $\bar{C}$ such that

$$\bar{C} = \mathrm{SDiv}(pk, \alpha, \bar{A}). \tag{4.13}$$

The returned multivector is computed as a scalar division of all coefficients in $\bar{A}$ by $\alpha$. Algorithm 4 describes the steps for the homomorphic scalar division performed by SDiv.

Algorithm 4: Homomorphic scalar division algorithm SDiv
    $n := (0, 1, 2, 3, 12, 13, 23, 123)$;
    $\bar{C} := \sum_{i=0}^{7} \left( \frac{a_i}{\alpha} \bmod q \right) \bar{e}_{n_i}$;
    return $\bar{C}$;

4.3.5 Decryption

The decryption function Dec takes the secret key $sk$, a public evaluation key $pk$ and an encrypted multivector $\bar{C}$ as input, returning the original message $m$ as follows:

$$m = \mathrm{Dec}(sk, pk, \bar{C}). \tag{4.14}$$

The algorithm retrieves the original message by first calculating $\bar{M}'$ such that $\bar{M}' = \bar{K}_1^{-1} \bar{C} \bar{K}_2^{-1}$. The triple geometric product is calculated with the inverses of keys $\bar{K}_1$ and $\bar{K}_2$. Moreover, the encoding multivector $\bar{M}$ is obtained as $\bar{M} = \bar{M}' / g$. Then, the message is decoded through

$$m = (m_0 + m_1 - m_2 - m_3 + m_{12} + m_{13} - m_{23} - m_{123}) \bmod q. \tag{4.15}$$

However, the resulting value of $m$ must be mapped to a rational representation by the Extended Euclidean Algorithm (EEA), such that $m = a / b = \mathrm{EEA}(q, m)$. The EEA takes a prime $q$ and a positive integer $m \in \mathbb{Z}_q$ and is computed by algorithm 5.

Consequently, even when using the function SDiv, the EEA allows the decryption of a ciphertext by mapping the output to a rational form of $m$. The decryption algorithm 6 with the inclusion of algorithm 5 (i.e., EEA) is shown below.

Algorithm 5: Extended Euclidean Algorithm EEA (inputs: prime $p$, integer $c$)
    $a := [p, c]$; $b := [0, 1]$; $i := 1$;
    while $a_i > \lfloor \sqrt{p / 2} \rfloor$ do
        $q := \lfloor a_{i-1} / a_i \rfloor$;
        $a_{i+1} := a_{i-1} - q a_i$;
        $b_{i+1} := b_{i-1} - q b_i$;
        $i := i + 1$;
    end
    return $a_i / b_i$;

Algorithm 6: Decryption algorithm Dec
    $\bar{M}' := \bar{K}_1^{-1} \bar{C} \bar{K}_2^{-1}$;
    $\bar{M} := \bar{M}' / g$;
    $m := (m_0 + m_1 - m_2 - m_3 + m_{12} + m_{13} - m_{23} - m_{123}) \bmod q$;
    $m := \mathrm{EEA}(q, m)$;
    return $m$;
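The mapping from a residue back to a fraction can be sketched in a few lines. The function name `eea` mirrors the algorithm's name, while the loop bound $\lfloor \sqrt{p/2} \rfloor$ is the usual rational-reconstruction bound and is treated here as an assumption. The example recovers $3/4$ from its residue modulo 101: since $4^{-1} \equiv 76 \pmod{101}$, the residue of $3/4$ is $3 \cdot 76 \bmod 101 = 26$.

```python
from fractions import Fraction
from math import isqrt


def eea(p, c):
    """Map a residue c modulo the prime p to a fraction a/b with small a."""
    a_prev, a_cur = p, c
    b_prev, b_cur = 0, 1
    bound = isqrt(p // 2)  # floor(sqrt(p / 2))
    while a_cur > bound:
        quot = a_prev // a_cur
        a_prev, a_cur = a_cur, a_prev - quot * a_cur
        b_prev, b_cur = b_cur, b_prev - quot * b_cur
    return Fraction(a_cur, b_cur)


residue = (3 * pow(4, -1, 101)) % 101  # 26
recovered = eea(101, residue)          # Fraction(3, 4)
```

Residues that already lie below the bound are returned unchanged, so small integers survive the mapping as integers.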

4.4 Key Exchange Protocol

In this section we present a concrete implementation of the algorithms for the key-exchange protocol $\Sigma = (\mathrm{IDGen}, \mathrm{SubKey}, \mathrm{KeyExch})$. In our example, if users A (e.g., Alice) and B (e.g., Bob) want to devise a private symmetric key using a public communication channel, they first must execute the IDGen algorithm. Since the next steps performed by SubKey and KeyExch use GA operations over multivectors, IDGen outputs multivectors for the private and public IDs. First, user A generates a pair of IDs by issuing the number of bits (i.e., $\lambda$) to function IDGen:

$$(\bar{A}_{priv}, \bar{A}_{pub}) \leftarrow \mathrm{IDGen}(1^{\lambda}). \tag{4.16}$$

On the other side, user B also generates a pair of IDs by issuing the same number of bits:

$$(\bar{B}_{priv}, \bar{B}_{pub}) \leftarrow \mathrm{IDGen}(1^{\lambda}). \tag{4.17}$$

Both users generate private and public multivectors that do not have an inverse. Function IDGen is defined in algorithm 7.

Algorithm 7: IDGen
    i := 0; ids := [2];
    while i < 2 do
        $p := \mathrm{GeneratePrime}(\lambda)$;
        $\bar{M} := \mathrm{NumberToMultivector}(p)$;
        if HasNoInverse($\bar{M}$) then
            $ids[i] := \bar{M}$;
            i := i + 1;
        end
    end
    return $[\bar{A}_{priv}, \bar{A}_{pub}]$;

Furthermore, A and B can establish communication and send their public IDs to each other. Both A and B locally compute $\bar{G}$, a public non-invertible multivector defined as follows:

$$\bar{G} = \bar{A}_{pub} \bar{B}_{pub}. \tag{4.18}$$

Both $\bar{G}$ and the respective sub-key of each user are calculated through the SubKey function, inside algorithm 8. A and B have, respectively, private multivectors $\bar{A}_{priv}$ and $\bar{B}_{priv}$ as secret identities. Therefore, A must compute $\bar{A}_{sub} = \bar{A}_{priv} \bar{G}$ and send it to B. Likewise, B computes $\bar{B}_{sub} = \bar{G} \bar{B}_{priv}$ and sends it to A. Function SubKey is defined as follows:

$$\bar{A}_{sub} = \mathrm{SubKey}(\bar{A}_{priv}, \bar{B}_{pub}, \mathit{isClient}), \tag{4.19}$$

where $\mathit{isClient} \in \{\mathrm{true}, \mathrm{false}\}$.


Algorithm 8: SubKey
    if isClient then
        $\bar{G} := \bar{A}_{pub} \bar{B}_{pub}$;
        $\bar{A}_{sub} := \bar{A}_{priv} \bar{G}$;
    else
        $\bar{G} := \bar{B}_{pub} \bar{A}_{pub}$;
        $\bar{A}_{sub} := \bar{G} \bar{A}_{priv}$;
    end
    return $\bar{A}_{sub}$;

The geometric product originating $\bar{G}$ is non-commutative; therefore, isClient defines who is initiating the request in order to keep operands in a fixed order, such that the public identifications of client and server do not change positions in equation 4.18, regardless of who is executing the function. Since $\bar{G}$ is not invertible, an attacker cannot directly recover the private ID from another party's sub-key. For instance, A cannot recover $\bar{B}_{priv}$ from $\bar{B}_{sub}$ because $\bar{G}^{-1}$ does not exist, so $\bar{B}_{priv}$ cannot be obtained as $\bar{G}^{-1} \bar{G} \bar{B}_{priv}$. Similarly, B cannot recover $\bar{A}_{priv}$ from $\bar{A}_{sub}$ because $\bar{A}_{priv}$ cannot be obtained as $\bar{A}_{priv} \bar{G} \bar{G}^{-1}$.

After computing and sending $\bar{A}_{sub}$ and $\bar{B}_{sub}$, A and B have what they need to generate the shared symmetric secret key. The definition of KeyExch changes to receive isClient as a parameter such that

$$\bar{K}_{sym} = \mathrm{KeyExch}(\bar{A}_{priv}, \bar{B}_{sub}, \mathit{isClient}), \tag{4.20}$$

so that the algorithm can manage the specifics of the non-commutative characteristics of the geometric product.

The parties now compute $\bar{K}_{sym}$ locally through algorithm 9 (i.e., KeyExch).

Algorithm 9: KeyExch
    if isClient then
        $\bar{K}_{sym} := \bar{A}_{priv} \bar{B}_{sub} + \bar{G}$;
    else
        $\bar{K}_{sym} := \bar{A}_{sub} \bar{B}_{priv} + \bar{G}$;
    end
    return $\bar{K}_{sym}$;

As it happens with Diffie-Hellman, the shared secret key is never transmitted. A computes $\bar{K}_{sym} = \bar{A}_{priv} \bar{B}_{sub} + \bar{G}$ whereas B computes $\bar{K}_{sym} = \bar{A}_{sub} \bar{B}_{priv} + \bar{G}$. Now, A and B have the same shared key. Since $\bar{B}_{sub} = \bar{G} \bar{B}_{priv}$, when A computes $\bar{K}_{sym}$ it follows that

$$\bar{K}_{sym} = \bar{A}_{priv} \bar{B}_{sub} + \bar{G} = \bar{A}_{priv} \bar{G} \bar{B}_{priv} + \bar{G}, \tag{4.21}$$

and, since $\bar{A}_{sub} = \bar{A}_{priv} \bar{G}$, when B computes $\bar{K}_{sym}$ it follows that

$$\bar{K}_{sym} = \bar{A}_{sub} \bar{B}_{priv} + \bar{G} = \bar{A}_{priv} \bar{G} \bar{B}_{priv} + \bar{G}. \tag{4.22}$$
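The agreement in (4.21) and (4.22) relies only on the associativity of the geometric product, which a short sketch can check numerically. Several parts of it are assumptions: the modulus `Q`, the bitmask storage order $(1, e_1, e_2, e_{12}, e_3, e_{13}, e_{23}, e_{123})$, and in particular `id_gen`, which stands in for IDGen by multiplying a random multivector by the zero divisor $1 + e_1$ to obtain a non-invertible result, because the text does not specify NumberToMultivector. Unlike (4.19) and (4.20), the helper functions take the caller's own public ID and $\bar{G}$ as explicit arguments to stay self-contained.

```python
import random

Q = 257  # illustrative prime modulus (assumption)


def gp(a, b):
    """Geometric product in a Euclidean 3-D algebra, coefficients mod Q."""
    out = [0] * 8
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            swaps = sum(bin((i >> k) & j).count("1") for k in (1, 2))
            out[i ^ j] = (out[i ^ j] + (-1) ** swaps * x * y) % Q
    return out


def add(a, b):
    return [(x + y) % Q for x, y in zip(a, b)]


ZERO_DIVISOR = [1, 1, 0, 0, 0, 0, 0, 0]  # 1 + e1 has no inverse


def id_gen(rng):
    """Hypothetical IDGen: returns (private ID, public ID), both non-invertible."""
    def non_invertible():
        x = [rng.randrange(Q) for _ in range(8)]
        return gp(x, ZERO_DIVISOR)
    return non_invertible(), non_invertible()


def sub_key(own_priv, own_pub, other_pub, is_client):
    """Return (sub-key, G); the client's public ID always comes first in G."""
    if is_client:
        g = gp(own_pub, other_pub)
        return gp(own_priv, g), g
    g = gp(other_pub, own_pub)
    return gp(g, own_priv), g


def key_exch(own_priv, other_sub, g, is_client):
    """Shared key: A_priv * G * B_priv + G, computed from either side."""
    prod = gp(own_priv, other_sub) if is_client else gp(other_sub, own_priv)
    return add(prod, g)


rng = random.Random(7)
a_priv, a_pub = id_gen(rng)
b_priv, b_pub = id_gen(rng)
a_sub, g_a = sub_key(a_priv, a_pub, b_pub, is_client=True)
b_sub, g_b = sub_key(b_priv, b_pub, a_pub, is_client=False)
k_a = key_exch(a_priv, b_sub, g_a, is_client=True)
k_b = key_exch(b_priv, a_sub, g_b, is_client=False)
```

Both parties end with the same multivector, `k_a == k_b`, while only the public IDs and the sub-keys cross the channel.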

4.5 Key Update Protocol

In this section, we propose a construction that fulfills the definitions of the key update protocol introduced previously in the abstract model. We present a key update mechanism that safely permits one to update the secret key of an existing ciphertext without decryption and without any trusted third-party component in the process. Our process consists of generating a token that reveals nothing about the old and new secret keys. Once created, the token can be shared over an insecure channel since the protocol is designed based on underdeterminacy, meaning that no one should be able to derive the original message. The key update protocol $\bigcup$ is a tuple comprised of the following efficient algorithms:

$$\bigcup = (\mathrm{TokGen}, \mathrm{KeyUpd}) \tag{4.23}$$

4.5.1 Token Generation

The TokGen algorithm receives as input the key $sk_{old}$, representing the old secret key to be changed in a ciphertext, and the new key $sk_{new}$, as the private key to overwrite the previous one. Also, parameters $pk_{old}$ and $pk_{new}$ are provided. The public parameters are compared to check the compatibility between $sk_{old}$ and $sk_{new}$; if they are incompatible, the substitution does not allow correct decryption of old messages. Since $pk$ holds the number of bits $b$, used to generate the key, and the modulus $q$, used for calculations, these values must be equal. The function is defined as

$$t = \mathrm{TokGen}(sk_{old}, sk_{new}, pk_{old}, pk_{new}). \tag{4.24}$$

The algorithm computes $t$ as a tuple $(\bar{T}_1, \bar{T}_2)$, where $\bar{T}_1 = \bar{K}_{21} \bar{K}_{11}^{-1} g_1^{-1} g_2$ and $\bar{T}_2 = \bar{K}_{12}^{-1} \bar{K}_{22}$. Here $sk_{old} = (\bar{K}_{11}, \bar{K}_{12}, g_1)$ and $sk_{new} = (\bar{K}_{21}, \bar{K}_{22}, g_2)$. Algorithm 10 shows the steps taken in the calculation of $t$.

Algorithm 10: Token generation algorithm TokGen
    if $q_{old} = q_{new}$ then
        $\bar{T}_1 := \bar{K}_{21} \bar{K}_{11}^{-1} g_1^{-1} g_2$;
        $\bar{T}_2 := \bar{K}_{12}^{-1} \bar{K}_{22}$;
    end
    return $[\bar{T}_1, \bar{T}_2]$;

4.5.2 Key Update

The procedure of updating keys is performed by the KeyUpd algorithm, where the function receives as input a token $t = (\bar{T}_1, \bar{T}_2)$, a public parameter $pk$ and an old ciphertext $\bar{C}_{old}$. Then, the algorithm calculates a triple geometric product, generating a new ciphertext $\bar{C}_{new} = \bar{T}_1 \bar{C}_{old} \bar{T}_2$, such that

$$\begin{aligned}
\bar{C}_{new} &= \bar{T}_1 \bar{C}_{old} \bar{T}_2 \\
&= \bar{K}_{21} \bar{K}_{11}^{-1} g_1^{-1} g_2 \, \bar{K}_{11} \bar{M} g_1 \bar{K}_{12} \, \bar{K}_{12}^{-1} \bar{K}_{22} \\
&= \bar{K}_{21} \bar{M} g_2 \bar{K}_{22} \\
&= \bar{K}_{21} \bar{M}'_{new} \bar{K}_{22},
\end{aligned} \tag{4.25}$$

where $\bar{M}'_{new} = \bar{M} g_2$ and the scalars $g_1$ and $g_2$ commute with every multivector. The KeyUpd algorithm returns a new version of the previous ciphertext $\bar{C}_{old}$ updated with the new secret key $sk_{new}$ (i.e., $\bar{C}_{new}$). The function is defined as

$$\bar{C}_{new} = \mathrm{KeyUpd}(pk, t, \bar{C}_{old}). \tag{4.26}$$

The following algorithm 11 represents the computational steps taken by the KeyUpd function.

Algorithm 11: Key update algorithm KeyUpd
    $\bar{C}_{new} := \bar{T}_1 \bar{C}_{old} \bar{T}_2$;
    return $\bar{C}_{new}$;
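An end-to-end sketch can check the derivation in (4.25) numerically: encrypt under an old key, generate a token, update the ciphertext, and decrypt under the new key. It is an illustration under stated assumptions, not the SCOT implementation: the parameters `Q = 257` and `B = 8` are toy values, coefficients are stored by bitmask in the order $(1, e_1, e_2, e_{12}, e_3, e_{13}, e_{23}, e_{123})$, `random.Random` is used only for reproducibility and is not a cryptographic generator, $g$ is drawn nonzero so that $g^{-1}$ exists, and the mapping of the decrypted value to a fraction is omitted. The public parameter $pk$ is left out of the signatures because both keys share the same $q$.

```python
import random

Q, B = 257, 8  # toy modulus and bit length (assumptions)
GRADE = [bin(i).count("1") for i in range(8)]
# Decoding signs of (4.15) rearranged to the bitmask order
# (m0, m1, m2, m12, m3, m13, m23, m123).
SIGNS = [1, 1, -1, 1, -1, 1, -1, -1]


def gp(a, b):
    """Geometric product in a Euclidean 3-D algebra, coefficients mod Q."""
    out = [0] * 8
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            swaps = sum(bin((i >> k) & j).count("1") for k in (1, 2))
            out[i ^ j] = (out[i ^ j] + (-1) ** swaps * x * y) % Q
    return out


def scale(a, s):
    return [(x * s) % Q for x in a]


def inverse(a):
    """Inverse following (4.6); None when the multivector is not invertible."""
    conj = [(-x if GRADE[i] in (1, 2) else x) % Q for i, x in enumerate(a)]
    n = gp(a, conj)
    n_rev = [(-x if GRADE[i] in (2, 3) else x) % Q for i, x in enumerate(n)]
    d = gp(n, n_rev)[0]
    return scale(gp(conj, n_rev), pow(d, -1, Q)) if d else None


def gen(rng):
    """Return sk = (K1, K2, g) with invertible K1, K2 and nonzero g."""
    def key():
        while True:
            k = [rng.randrange(2 ** B) for _ in range(8)]
            if inverse(k) is not None:
                return k
    return key(), key(), rng.randrange(1, 2 ** B)


def enc(sk, m, rng):
    k1, k2, g = sk
    c = [rng.randrange(2 ** B) for _ in range(8)]
    c[3] = 0
    c[3] = (m - sum(s * x for s, x in zip(SIGNS, c))) % Q  # m12, as in (4.10)
    return gp(gp(k1, scale(c, g)), k2)                     # K1 * (M g) * K2


def dec(sk, c):
    k1, k2, g = sk
    m = scale(gp(gp(inverse(k1), c), inverse(k2)), pow(g, -1, Q))
    return sum(s * x for s, x in zip(SIGNS, m)) % Q


def tok_gen(sk_old, sk_new):
    (k11, k12, g1), (k21, k22, g2) = sk_old, sk_new
    t1 = scale(gp(k21, inverse(k11)), pow(g1, -1, Q) * g2)
    t2 = gp(inverse(k12), k22)
    return t1, t2


def key_upd(token, c_old):
    t1, t2 = token
    return gp(gp(t1, c_old), t2)
```

With two keys produced by `gen`, a ciphertext of 42 under the old key still decrypts to 42 under the new key after `key_upd`, and the plaintext never appears during the update.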

4.6 Conclusion

We started this chapter by defining components for the model presented in Chapter 3. We selected an existing private blockchain framework in Section 4.2, whose architecture is able to apply the principles defined in Section 1.4.1. With such a tool at hand, one can implement the contractual solution mentioned in Section 3.1.1. After that, we presented our implementation-wise contribution shown in [41], comprised of a cryptographic scheme that allows a key update protocol. Finally, we defined concrete counterparts for our proposition through functions and algorithms. Via functional constructions, we have demonstrated the implementation of an HE scheme and a key update protocol that match our model introduced in Section 3.2, as a way to extend existing blockchain capabilities. With a limited set of basic functions contained in GA, we provide simple but effective cryptographic protocols to equip a DLT with a homomorphic smart contract. Without weakening current business principles regarding the BaaS environment, our proposition can be used to homomorphically compute encrypted data and to grant data ownership safely. Figure 4.1 shows how the concrete model develops on top of the design pattern introduced in Section 3.2, based on the assumptions of Section 3.1. The diagram shows the existing concepts in the literature and all the analytical foundation that will be implemented in Chapter 5.

[Figure: stacked blocks labeled CONCRETE MODEL (HE scheme, key exchange, key update), ABSTRACT MODEL / DESIGN PATTERN, MAPPING (contractual solutions, renegotiation mechanism, ownership transfer), and BLOCKCHAIN; legend: LITERATURE, CONTRIBUTION]

Figure 4.1: Concrete model contribution.

CHAPTER 5

Software Architecture of SCOT

In this chapter, we aim to materialize the concepts introduced in Chapter 3 and expanded in Chapter 4. We do so by demonstrating a CLI system, an HE library, and a smart contract in a simple private blockchain environment. This chapter contrasts with the previous ones by its software engineering character, where we discuss implementation choices for theoretical counterparts. Although modeled in a reduced scope, this proof of concept is used to show that an instance of the problem defined in Chapter 2 can have a feasible solution. We start by analyzing the problem again in a fictional context, creating a small example with characteristics similar to the original issue. However, we focus on real-world technical and social constraints in order to present an implementation. When presenting the practical solution, we first explain our choice of a local interactive CLI application as a BaaS environment prototype. This is followed by an explanatory sequence of events that turns the abstract model presented in Section 3.2 into a concrete interaction of blockchain components. Additionally, we define the functionalities necessary for a complete case of persistence, computation, and retrieval of information. Finally, we provide a guide for the system's setup and execution, defining the prerequisites and commands for a straightforward demonstration. Figure 5.1 shows the relationships between the theoretical concepts laid down in Chapter 3 (i.e., mapping, abstract model), the concrete model defined in Chapter 4, and the implementation of the cryptographic scheme and key update protocol through the SCOT application. The system is built on top of the Hyperledger Fabric framework, the most adopted tool for BaaS cloud environments.

[Figure 5.1 diagram labels: SCOT (CLI; Smart Contract; Library: HE Scheme, Key Exchange, Key Update); Concrete Model; Blockchain; Abstract Model / Design Pattern; Mapping; Contractual Solutions; Renegotiation Mechanism; Ownership Transfer; Literature; Contribution]

Figure 5.1: Complete contributions diagram.

5.1 System Development

We contextualize the problem by defining a case. The story describes a relationship between organizations whose interactions impose constraints that help us visualize the issues of data custody and privacy. These concerns can also arise when companies start to analyze which of their partner’s rules will play an important role in a legal contract. In our example, these rules are described by the restrictions imposed in Section 5.1.3.

5.1.1 Use Case Scenario

We start by defining a fictional story in which the world becomes aware of a new infectious disease. The new illness is spreading fast; however, nobody can determine how hazardous the new threat is, since the information available has been recognized as inconclusive and sometimes deceiving. Therefore, in order to establish a statistical foundation for further government decisions, a research institution called Schwarz in Los Angeles, CA, was nominated to make an initial analysis of the mortality. Schwarz represents the actions of the United States government in this matter. The company tries to find a pattern among the patients who did not recover from hospitalization.

On the other side, there is a country called Val Verde, a Central American location that was reported as the ground zero of the outbreak. DeSouza is the main hospital in Val Verde and was the first institution to report an increasing number of cases, thus getting a lot of international media attention. Consequently, different countries started to ponder measures and restrictions on their borders. However, overly strict measures could harm a country in other areas, while less rigorous actions could overload its capacity to deal with the problem.

Now, the governments of both countries are willing to cooperate and share information. Understanding which factors make people more vulnerable to the disease will guide the authorities in creating measures that protect those in jeopardy and avoid extreme decisions that compromise the whole economy and diplomacy. Their representatives met in a last-minute call, and both countries decided to put together a system where data will be shared. To develop such a system, the authorities of each country appointed one technology team per institution. Those teams will be in charge of implementing solutions based on the restrictions of the government and health institutions. They will create, modify, and operate the system, and finally generate reports.

5.1.2 Requirement Analysis

The situation came to a point where each country’s government representatives met and decided to work together, forming a joint council. They also organized teams to support the setup and development of a distributed system. However, since they started this effort, the number of infections has increased daily, and the hospital started to report an alarming number of deaths. The new report changed how the technology teams were designing the new system. Consequently, the council understood that a continuous delivery approach would better suit the current situation. Notwithstanding the urgency of such a scenario, they need to define which cryptography will be used and test its feasibility. Therefore, both companies agreed to implement a proof-of-concept system in order to test the application. This demo will be developed jointly by both parties, and in the first stage, they will avoid any obscurity in the application by executing each component directly. The system under consideration must show the following attributes:

1. A private blockchain system;

• Both companies need to approve the scripts to be deposited into the system;

• Functions available in the system should not be modified without the approval of both parties;

• A company can compute data from another company, but the result should not be disclosed until revealed by the owner of the key;

• The means by which parties will exchange keys are not in the scope.

2. Homomorphic computation;

• Sensitive data must be encrypted, but descriptive unencrypted data is allowed for record selection;

• The chosen scheme must allow the following operations:

– Addition;

– Scalar division.

• The cryptographic library must show performance compatible with the consensus used by the blockchain framework.

3. Homomorphic update of ciphertext’s keys.

Our demonstration comprises this first round of development.

5.1.3 System Constraints

There is a federal law in Val Verde to protect the health information of its citizens. This law encompasses all forms of medical records, including electronic, written, and oral. Therefore, a solution must preserve the privacy of patients’ data, although, in the prototype stage, fake patients’ names will be saved in plain text in order to facilitate the identification of records. In this case, the information to be encrypted is the number of pre-existing conditions of each individual admitted to the hospital. Their names can be encrypted with any other non-HE scheme when the system goes to production.

In terms of roles, Schwarz Research uses data from DeSouza hospital and performs the analysis. DeSouza hospital feeds the system by inserting information daily. The hospital also reviews the computations made by Schwarz and either denies or approves them, in the latter case disclosing the processed information to the research company. The mortality data will also be inserted by the hospital when a death certificate is issued for an individual. From this registration, Schwarz can know that a patient was marked as "expired." This information is not encrypted since it is used for the selection of patients. With that information at hand, the research company can group those patients by their diagnoses and analyze them. Schwarz Research will calculate the average number of pre-existing conditions for this group of patients. The result will be analyzed by DeSouza hospital to make sure it is compliant with its legal agreement.
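The split between plaintext selection fields and encrypted sensitive fields can be sketched in Go as follows. This is only an illustration under stated assumptions: the `Patient` type, its field names other than the preExistingConditions idea, the `SelectExpired` helper, the `Expired` values, and the third record are all hypothetical and are not taken from the SCOT chaincode.

```go
package main

import "fmt"

// Patient is a hypothetical ledger record: descriptive fields stay in
// plaintext, while the sensitive count is stored only as ciphertext.
type Patient struct {
	ID                    string
	Name                  string // fake name, plaintext in the prototype stage
	Expired               bool   // plaintext, used for record selection
	PreExistingConditions string // ciphertext placeholder (a multivector string)
}

// SelectExpired returns the IDs of the patients marked as expired. It reads
// only plaintext fields and never touches the encrypted value.
func SelectExpired(records []Patient) []string {
	ids := []string{}
	for _, p := range records {
		if p.Expired {
			ids = append(ids, p.ID)
		}
	}
	return ids
}

func main() {
	// Illustrative data only; the Expired flags and PATIENT3 are assumptions.
	ledger := []Patient{
		{ID: "PATIENT1", Name: "Kevin Peter Hall", Expired: true, PreExistingConditions: "<ciphertext 1>"},
		{ID: "PATIENT2", Name: "Shane Black", Expired: true, PreExistingConditions: "<ciphertext 2>"},
		{ID: "PATIENT3", Name: "Placeholder Name", Expired: false, PreExistingConditions: "<ciphertext 3>"},
	}
	fmt.Println(SelectExpired(ledger)) // prints [PATIENT1 PATIENT2]
}
```

The selected identifiers are what the research company would then pass to the homomorphic computation, so the analysis never requires decrypting individual records.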

5.2 SCOT System Design

We previously presented a problem in Section 2.4.1, where we explained the trust model and its influence on the issue of data custody. A proposed application should therefore address those concerns by suggesting tools designed to map theoretical solutions to practical functionalities. We bind those issues to a solution that preserves privacy, offering analysis over data without loss of ownership.

First, we suggest a private blockchain framework to be adopted by the parties [32]. Private blockchains are designed to be more efficient than their public counterparts because their consensus mechanisms are usually not based on brute force. Additionally, all parties involved need to be identified, which mirrors the situation of companies that have already agreed to a legal contract, therefore limiting participants to semi-honest nodes. A distributed system like a private blockchain can alleviate the problem of ownership by assuring that every party has a copy of the data, reducing power imbalance.

Second, we choose a HE scheme introduced in our publication [41]. The reason for a homomorphic scheme is that sharing non-homomorphic encrypted data has very limited use for partners. Additionally, a blockchain is not the right place to store private data if it is not intended to also be used by others. Therefore, the encrypted data needs to be computable, allowing analysis and data aggregation. Also, large datasets are not the best choice for blockchains and can be stored elsewhere. As a result, we focus on small datasets that can offer insights.

Finally, even with a distributed system assuring the sharing of data and a cryptographic scheme that allows computation, we may not have a solution for this problem if we cannot transfer the ownership of computed results. When computing over homomorphically encrypted data, the system generates results that are intelligible only to key owners. In addition, the transfer of results (i.e., changing the keys) must occur in sight of the participants, through agreed-upon algorithms. No third-party environment should be used to perform this operation. Hence, we need to change the keys without decrypting the content in the environment that is common to both parties. For that reason, we apply a key update mechanism that transfers ownership.

With those technologies combined, we can promote cooperation in the system, minimizing the risk. To summarize, our proposition comprises:

1. Distributed Ledger Technology;

2. HE scheme;

3. Ownership transfer mechanism.

5.2.1 SCOT Architecture View

We use the abstract model presented in Chapter 3 to describe how each component in our system relates to the tuple ∆ depicted in figure 5.2. The abstract model ∆ was not meant to represent concrete components, but it is an isolation of concepts that can be implemented by any number of software artifacts. We decided to implement the elements of ∆ through three main components: a system library, a CLI program, and a smart contract.

The library component (1 in figure 5.2) implements all the necessary mathematical functions used by the encryption scheme ∏, the key exchange protocol ∑, and the key update protocol ⋃. The CLI component (2 in figure 5.2) is a program that imports the library and extends its properties by creating interactive user functions. The smart contract component (3 in figure 5.2) also imports the library in order to compute ciphertexts homomorphically.

[Figure 5.2 diagram: (1) LIBRARY; (2) CLI, used by USER A and USER B; (3) BLOCKCHAIN with SMART CONTRACT]

Figure 5.2: Abstract Architecture

Although the library implements all the core functions of the scheme, the CLI component has a smaller scope. For instance, it uses the Gen, Enc, and Dec functions from the encryption scheme ∏ to equip users with the basic functionalities that should be performed outside the blockchain. Users in this context can also be other systems (e.g., web systems, database management systems, etc.) that can trigger these functions as part of their procedures. Also, the CLI makes use of the key-update protocol ⋃ function TokGen to generate a token that can be sent later to the blockchain.

The smart contract in this model uses Eval from the encryption scheme ∏. In this case, Eval represents any available homomorphic operation implemented by the library and is employed in the computation of ciphertexts resulting from Enc. In our calculation of a mean, we will use the homomorphic addition of ciphertexts and the homomorphic division of a ciphertext by a scalar. The smart contract component also applies the KeyUpd operation from ⋃ to update the keys of a ciphertext, using a received token that was previously generated and sent by a CLI.

5.2.2 Why a CLI Prototype?

A valid question at this point is why our results are demonstrated through a command-line program when the main problem introduced in this work is bound to the virtualized service of the blockchain (i.e., BaaS). It is indeed a reasonable inquiry if we take into account that private blockchains have mostly been experienced through online services. In order to answer this question, we first analyze the blockchain framework used for this proof of concept.

The Hyperledger Fabric blockchain framework (Fabric for short) has been adopted by the main cloud and BaaS providers (e.g., Amazon, IBM, Oracle, SAP, etc.), and companies started to use the tool due to its open-source nature, its modularity, and the general-purpose programming languages of its smart contracts. Such a combination of factors has put Fabric in a leading position among these providers. Therefore, how does Fabric work?

When looking into the Fabric project, what can be seen is a codebase of many separate sub-projects that, when compiled into binaries for the target machine, run as separate commands. For instance, a peer node in a Fabric blockchain network is a machine (virtual or physical) that runs the peer binary. The orderer node runs an orderer binary and communicates with the permissioned nodes. Along with the peer and orderer binaries, there are also commands for creating configuration files and network artifacts, such as configtxlator and configtxgen, utility programs for managing Fabric channel configurations. The behavior of these programs depends on environment variables set in the operating system or on parameters passed to the command at the time of invocation. Since these commands are precompiled, they look for configuration files and variables to decide which channels must be created, where the orderer is in the network, or where the certificates are that allow a given company to participate in a consortium.

Consequently, all the cloud services offering Fabric in a virtualized fashion use these commands in the lower level of their services. Therefore, if a concept can work at this level, then it can be applied to all of those providers. Additionally, distributed systems are complex environments, and errors propagate easily in a cumbersome way that can obscure the focus and effort employed in the development of a solution. It is important to note that every company created layers of abstraction to offer perks to their users, since the open-source solution does not have many of the amenities present in their paid versions. These layers are not necessary for checking our assumptions and can hide important steps that show which benefits come from the framework and which improvements are given by our contribution.

5.2.3 The Application Lifecycle

We organize our demonstration in three main events that can occur at different moments in time. This separation resembles a common use case for a real-world application. First, we show a cycle for persisting new encrypted information in the system. Second, we show the generation of a report or proposal that registers the intention of one of the parties to get insights on proprietary data. Lastly, a result is generated, and the demanding party is able to read the granted information. In this sequential demonstration, we use our theoretical model for the transfer of ownership, not defining any specific technology, but defining steps that could be performed with any tool that is able to do so.

5.2.3.1 Persisting Information

In figure 5.3, we can see a USER A that owns the information to be persisted in the blockchain. But first, he needs to generate a tuple of keys in order to encrypt a dataset. Therefore, he executes Gen through the CLI component, passing a security parameter $1^{\lambda}$. This operation is executed locally without transmitting any data. The CLI component saves a file containing his secret key $sk_1$ and a public key $pk_1$.

[Figure 5.3 diagram: LIBRARY, CLI, USER A, USER B, BLOCKCHAIN]

Figure 5.3: Key Generation And New Record

With both keys in hand, the user can encrypt data $m_1$ into $\bar{C}_1$ and $m_2$ into $\bar{C}_2$. Both multivectors are returned as strings and represent the ciphered results of the equivalent messages. Now, USER A can connect to the blockchain and invoke a smart contract function to persist the encrypted data along with other information.

5.2.3.2 Generating a Report

At some point in the future, USER B wants to analyze the data persisted by USER A. To do so, the user chooses a specific selection of transactions from which to homomorphically compute an aggregated result. All of his actions are limited by the functions available in the smart contract. This script was persisted after careful analysis by both parties so that the code that can be executed does not contravene any of the partnership rules. One of the functionalities agreed between the companies is GenReport, a scalar division of an arbitrary number of ciphered values (figure 5.4).

[Figure 5.4 diagram: LIBRARY, CLI, USER A, USER B, BLOCKCHAIN]

Figure 5.4: Report Generation

To execute the function, USER B must provide the $ids_{ledger}$ of the records to be calculated. At the end of the procedure, a result $\bar{C}_3$ will be persisted in the blockchain, and the $id_{ledger}$ related to the result will be returned to the user. Furthermore, USER B can negotiate with USER A the ownership transfer of result $\bar{C}_3$.

5.2.3.3 Generating a Result

After being notified that USER B solicited access to the information encrypted in $\bar{C}_3$, USER A downloads $\bar{C}_3$ from the blockchain and checks its value locally by decrypting it. Once USER A decides to grant USER B access to $\bar{C}_3$, he generates a new key pair $(sk_2, pk_2)$ as depicted in figure 5.5. These new keys, along with the old keys $(sk_1, pk_1)$, are used in the process of generating a token $t$. Again, the CLI component is used to locally generate the token that will soon be sent over an insecure channel.

When the key pair $(sk_2, pk_2)$ is generated, USER A must find a way to share this information with USER B. The generation of new keys could be the result of a key exchange protocol between USER A and USER B, before USER A generates a token $t$. The CLI component is the system that could implement such a procedure in a client/server fashion. In this version of the prototype, we did not implement the key exchange mechanism.

[Figure 5.5 diagram: LIBRARY, CLI, USER A, USER B, BLOCKCHAIN]

Figure 5.5: Token generation and key update.

Having generated token $t$, USER A can now connect to the blockchain and invoke function GenResult of the smart contract, passing the $id_{ledger}$ of the encrypted result and token $t$. The smart contract will use function KeyUpd from the library to update the keys defined by $\bar{T}_1$ and $\bar{T}_2$.

5.3 Application

This section is a walk-through of the prototype, where we first define the tools and prerequisites that allow proper execution of commands and systems. Our focus is to favor the simulation on a local machine, avoiding the use of online services that can blur the understanding of roles. Before running the prototype, follow the instructions available in appendix A for configuring a proper environment.

All the code used to create our library, CLI program, benchmarks, and Fabric test network is available in a public GitHub repository and can be managed with the Git tool. Git is a distributed version control system that helps to manage the code base of projects of any size. Due to its distributed, free, and open-source nature, Git is used by software development teams to organize and control the creation of code. In general, developers store their changes to the code in a remote (i.e., online) repository (i.e., repo) such as GitHub. A Git installation guide can be found in Section A.1.1 of the appendix.

In order to generate a Fabric blockchain network, we cloned and modified the Hyperledger fabric-samples repo and hosted it at the GitHub online service. Therefore, it is necessary to have the Git tool installed in order to clone (i.e., download) the repo onto the machine that will run its files. These samples were originally created by the Hyperledger community to facilitate experimentation with Hyperledger Fabric by exploring features, building applications, and interacting with blockchain networks through SDKs. We modified the original files to create a new version that is better adapted to our context. A guide for cloning the fabric-samples repository can be found in Section A.1.2 of the appendix.

When starting the test blockchain network from the local fabric-samples repository, Docker images will be downloaded and run as containers on the local machine. Docker is a development platform that separates the application being developed from the infrastructure underlying it. Its engine allows an application to be containerized, making it easier to deploy and run the software. A container packages a system in such a way that all its dependencies (e.g., libraries, code, settings, etc.) are included. In our case, the Hyperledger Fabric binaries come installed within the Docker images. Additionally, we modified one of the Docker images to be compiled with our CLI program, making the command available when connecting a terminal to the container that executes the smart contract.

Along with the Docker engine, the Docker Compose program helps to manage multi-container applications in a Docker environment, using YAML files to define configurations and settings for the containers at hand. The command-line tool can orchestrate the execution of the containers defined in the configuration file. A guide for installing the Docker platform can be found in Section A.1.4 of the appendix.

We programmed our library, CLI command, and smart contract using the Go language, also known as Golang. Go is an open-source programming language developed at Google that compiles to native binaries. The Hyperledger Fabric framework was built in Go, and the language can also be used to write smart contracts for this private blockchain tool. A guide for installing the Go language can be found in Section A.1.3 of the appendix.

To summarize, in order to properly execute the tutorial introduced in the next steps, it is assumed that the following tools and files are installed and available in the system:

• Git;

• The fabric-samples cloned repository (latest version available);

• Go 1.14.x or higher;

• Docker;

• Docker Compose.

5.3.1 Prototype Execution

The prototype execution starts with the use of our CLI utility program, named phe-cli. The program is already installed in the cli container, which was instantiated by the docker-compose command (instructions available in Section A.2). When the Docker command docker exec -it cli sh is issued in the terminal, the terminal connects to the container much as a user connects to a cloud environment via Secure Shell (SSH).

When the phe-cli name is typed without arguments, the program lists a menu with the functions that are available to the user (figure 5.6). This resource is useful for checking the mandatory parameters of a given functionality.

Figure 5.6: List of available functions.

5.3.2 Inserting Sensitive Data Into The Blockchain

As discussed previously, one user will first generate his keys in order to encrypt sensitive data. For our purposes, the sensitive data is the number of pre-existing conditions of a patient. The generated keys will be saved in a keyring file. A keyring file groups all the generated keys sequentially, with the most recently created key being the valid key for encryption. When creating a token, the system automatically combines the last and previous keys referenced in the keyring file. A user may have many different keyring files, but the sequence is only kept consistent per file.
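The keyring behavior described above can be pictured with a small Go sketch. The `KeyPair` and `Keyring` types and their methods are hypothetical names introduced for illustration; they do not reproduce the actual file format or API of phe-cli, and the keys are shown as opaque strings.

```go
package main

import (
	"errors"
	"fmt"
)

// KeyPair holds one generated key pair as opaque strings (hypothetical type).
type KeyPair struct {
	SK string
	PK string
}

// Keyring keeps the generated keys in creation order (hypothetical type).
type Keyring struct {
	Keys []KeyPair
}

// Add appends a newly generated key pair to the end of the sequence.
func (kr *Keyring) Add(k KeyPair) { kr.Keys = append(kr.Keys, k) }

// Current returns the most recently created pair, the one valid for encryption.
func (kr *Keyring) Current() (KeyPair, error) {
	if len(kr.Keys) == 0 {
		return KeyPair{}, errors.New("keyring is empty")
	}
	return kr.Keys[len(kr.Keys)-1], nil
}

// TokenInputs returns the previous and the last pair, which are the two
// keys combined when a token is created.
func (kr *Keyring) TokenInputs() (prev, last KeyPair, err error) {
	n := len(kr.Keys)
	if n < 2 {
		return KeyPair{}, KeyPair{}, errors.New("need at least two keys for a token")
	}
	return kr.Keys[n-2], kr.Keys[n-1], nil
}

func main() {
	kr := &Keyring{}
	kr.Add(KeyPair{SK: "sk[0]", PK: "pk[0]"})
	kr.Add(KeyPair{SK: "sk[1]", PK: "pk[1]"})

	cur, _ := kr.Current()
	prev, last, _ := kr.TokenInputs()
	fmt.Println(cur.SK, prev.SK, last.SK) // prints sk[1] sk[0] sk[1]
}
```

Keeping the order inside one file is what lets a token be derived without the user naming the keys explicitly, which is also why the sequence is only meaningful per keyring file.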

Now, we start our keyring file by using the phe-cli command combined with the flag --keygen. Additionally, we define a security parameter -l of 256. We also give a reference to the keyring file to be created by using the -kr flag:

phe-cli --keygen -l 256 -kr ./vault.kr

As a result, sk and pk are created for any following encryption or decryption (figure 5.7).

This list of keys can always be printed by using the flag --listkeys.

Figure 5.7: Key generation output.

The keys now are saved in a file called vault.kr (figure 5.8). From now on, this file will be used as input for the next cryptographic commands.

Figure 5.8: Key ring file created by the key generation command.

We also make use of environment variables to temporarily save outputs from some commands. Sometimes, the results of our CLI program or smart contract invocation can be placed in a variable in order to be used as input for another command. In fact, we are going to encrypt an integer by issuing the flag --encrypt to the phe-cli command. The right format for the encryption is:

phe-cli --encrypt -kr ./vault.kr -m 3


In this case, -m stands for the message to be encrypted: number three. It means that our next patient has three pre-existing diseases. The aforementioned version of the command prints the encrypted message to the screen. But for practical reasons, we will execute this command, saving the result in a variable called NUMBER_PRECONDITIONS:

NUMBER_PRECONDITIONS=$(phe-cli --encrypt -kr ./vault.kr -m 3)

After that, we can call the Fabric binary peer, a command that can be used to invoke functions of our smart contract. The next command creates a new patient in the ledger (i.e., Kevin Peter Hall) with the encrypted number of conditions:

peer chaincode invoke -n mycc -c '{"Args":["CreatePatient", "PATIENT1", "Kevin Peter Hall", "'$NUMBER_PRECONDITIONS'", "1", "1", "KEY1"]}' -C myc

The function CreatePatient will be called in the script that was installed in the previous section and will return a response. If the function was executed with no errors, then a message of success is returned (figure 5.9).

Figure 5.9: Chaincode invoke successful output.
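The -c argument of the invocation above is a JSON object whose Args array starts with the function name. When such calls are issued from a program rather than typed by hand, building the string with a JSON encoder avoids shell quoting mistakes. The following Go sketch is an assumption about how a client could do this: the `chaincodeInput` type and `BuildArgs` helper are hypothetical, and the ciphertext is a placeholder, not real output of phe-cli.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// chaincodeInput mirrors the {"Args": [...]} shape passed to the -c flag.
type chaincodeInput struct {
	Args []string `json:"Args"`
}

// BuildArgs returns the JSON string for a chaincode call: the function
// name comes first, followed by its parameters in order.
func BuildArgs(function string, params ...string) (string, error) {
	in := chaincodeInput{Args: append([]string{function}, params...)}
	b, err := json.Marshal(in)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// Placeholder multivector; a real value would come from the encrypt step.
	ciphertext := "1e0+2e1+3e2+4e3+5e12+6e13+7e23+8e123"

	s, err := BuildArgs("CreatePatient", "PATIENT1", "Kevin Peter Hall", ciphertext, "1", "1", "KEY1")
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```

The printed string can be passed as a single argument to the peer command, so the ciphertext never has to be spliced into a hand-written quoted literal.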

We now repeat the same process for a second patient and save a new number of pre-existing conditions:

NUMBER_PRECONDITIONS=$(phe-cli --encrypt -kr ./vault.kr -m 5)

This time, the second patient has 5 conditions. Likewise, we again invoke the smart contract, persisting the information in the ledger:

peer chaincode invoke -n mycc -c '{"Args":["CreatePatient", "PATIENT2", "Shane Black", "'$NUMBER_PRECONDITIONS'", "1", "1", "KEY1"]}' -C myc

After that, we check that both patients were saved and inspect the data format of their conditions.

If the conditions are encrypted, the preExistingConditions field contains a multivector. The following command queries the ledger and returns a set of records:

peer chaincode query -n mycc -c '{"Args":["AllPatients", "PATIENT1", "PATIENT3"]}' -C myc


The resulting output shows that the sensitive data was encrypted in a multivector (figure 5.10).

Figure 5.10: Chaincode query output.

5.3.3 Creating Analysis Over Third-Party Data

The next step is performed by the user that does not have the keys. Since the smart contract imports the HE library, it has the desired mathematical operations, and any member of this consortium can use it to compute an analysis homomorphically. In fact, this is the first operation of the blockchain script that uses the specified library. To do so, the CreateProposal function needs the public parameter pk[0].Q. We call this record a proposal because one user is proposing a result that still needs to be analyzed by the owner. If the owner agrees with the calculated result, he can grant the right to read the information by executing a transfer of ownership. The owner does so only when the result does not compromise the originating data.

You can list the available keys again by executing the phe-cli command with the flag --listkeys:

phe-cli --listkeys -kr ./vault.kr

The output will be similar to figure 5.7, where the existing keys are listed sequentially.

Now, an environment variable PK0Q can be created with the value of pk[0].Q:

PK0Q="4294967311"

We now invoke the smart contract function, passing the identification of patients to be summarized and the public parameter:

peer chaincode invoke -n mycc -c '{"Args":["CreateProposal", "PROPOSAL1", "USER_2", "USER1", "PATIENT1,PATIENT2", "KEY1", "'$PK0Q'"]}' -C myc


Again, if the command was successfully executed, an output such as figure 5.9 will be returned. Now the user can check that a proposal or report was created by querying the ledger with the identification PROPOSAL1:

peer chaincode query -n mycc -c '{"Args":["FindProposal", "PROPOSAL1"]}' -C myc

The output is presented in figure 5.11. The resulting value is a multivector that encrypts the sum of both patients’ pre-existing conditions divided by 2, the number of patients.

Figure 5.11: Proposal query output.

5.3.4 The Ownership Transfer

The user owning the information is informed that an analysis was generated by a participant of the consortium. Therefore, he first checks the result by querying the blockchain and decrypting the value set for PROPOSAL1. We create a regular expression to filter the multivector value into a variable, and then we execute the peer binary to find the newly created proposal. Create a multivector pattern as a string in an environment variable:

MULTIVECTOR_PATTERN="[0-9]+e0\+[0-9]+e1\+[0-9]+e2\+[0-9]+e3\+[0-9]+e12\+[0-9]+e13\+[0-9]+e23\+[0-9]+e123"

Now invoke the smart contract and save the proposal value by filtering only the multivector value:

PROPOSAL_VALUE=$(peer chaincode query -n mycc -c '{"Args":["FindProposal", "PROPOSAL1"]}' -C myc | egrep -o "$MULTIVECTOR_PATTERN")
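The same filtering step can be expressed in Go, which may be convenient when the query is issued from a client program instead of the shell. The sketch below reuses the multivector pattern with Go's standard regexp package; the `ExtractMultivector` helper and the JSON shape of the sample response (the "id" and "value" field names) are assumptions for illustration, while the sample multivector is a value that appears later in this walkthrough.

```go
package main

import (
	"fmt"
	"regexp"
)

// multivectorPattern matches eight coefficients tagged with the basis
// blades e0, e1, e2, e3, e12, e13, e23, and e123, joined by plus signs.
var multivectorPattern = regexp.MustCompile(
	`[0-9]+e0\+[0-9]+e1\+[0-9]+e2\+[0-9]+e3\+[0-9]+e12\+[0-9]+e13\+[0-9]+e23\+[0-9]+e123`)

// ExtractMultivector returns the first multivector found in a response
// and reports whether one was present.
func ExtractMultivector(response string) (string, bool) {
	m := multivectorPattern.FindString(response)
	return m, m != ""
}

func main() {
	// Hypothetical response body; only the multivector format matters here.
	response := `{"id":"PROPOSAL1","value":"2872409417e0+85502643e1+1643346174e2+44363374e3+3769890559e12+868734026e13+1796679143e23+4006194681e123"}`

	value, ok := ExtractMultivector(response)
	fmt.Println(ok)
	fmt.Println(value)
}
```

Matching the full eight-term shape, rather than any run of digits, keeps identifiers such as PROPOSAL1 from being mistaken for ciphertext.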

Now, the value can be decrypted with his locally persisted keys:

phe-cli --decrypt -kr ./vault.kr -c "$PROPOSAL_VALUE"

If the operation is executed successfully, the user sees that a mean was calculated for the pre-existing conditions. Figure 5.12 shows the result of decrypting the proposal value.


Figure 5.12: Decryption output.

At this point, the owner of the data agrees to transfer the result to another user. He cannot do that by giving a key for the specified result because this result uses the same keys encrypting the original data of patients. As a solution, this result can be duplicated with keys that can be shared with the requester. In doing that, two results will exist in the ledger showing that a transfer was made. To execute a transfer, we will generate new keys and a token. Execute the following command:

phe-cli --tokgen -kr ./vault.kr

The new command creates a token based on the old and new keys. The new keys (sk[1], pk[1]) can be shared with the user requesting the analysis. The token tk[1] generated by the previous command (figure 5.13) will be used by the owner to duplicate the result and change its keys.

Figure 5.13: Token generation output.

Copy the value of tk[1].TK1 and generate an environment variable named TK1TK1:

TK1TK1="2872409417e0+85502643e1+1643346174e2+44363374e3+3769890559e12+868734026e13+1796679143e23+4006194681e123"

Similarly, copy the value of tk[1].TK2 and generate an environment variable named TK1TK2:

TK1TK2="2753682028e0+158373017e1+1637534782e2+3969427492e3+1070903132e12+508056651e13+3985828380e23+907784257e123"


Now, the owner can create a granted result for the analysis proposed by the requesting user. Using the tk[1].TK1 and tk[1].TK2 components of the token, and the public parameter pk[0].Q that remains in variable PK0Q, invoke the smart contract function CreateResult:

peer chaincode invoke -n mycc -c '{"Args":["CreateResult", "PROPOSAL1", "'$TK1TK1'", "'$TK1TK2'", "KEY1", "'$PK0Q'"]}' -C myc

Finally, a new result was generated by the key update mechanism that was performed inside function CreateResult.

5.3.5 Analyzing the Information Transferred

The requesting user is informed that a result of his proposition was granted. Hence, he can query the blockchain for the related result and decrypt the data with the key given by or exchanged with the owner. He can invoke the function FindResult and check if the ledger has any information regarding his report:

peer chaincode query -n mycc -c '{"Args":["FindResult", "RESULT1"]}' -C myc

The output for querying the blockchain is shown in figure 5.14. We can see that a result for PROPOSAL1 was generated, but we still need to check its plaintext version.

Figure 5.14: Chaincode query output.

We can copy the resulting value or use our multivector pattern that is still saved in the environment variable MULTIVECTOR_PATTERN. Use the following command to isolate the value that will be furthermore decrypted:

RESULT_VALUE=$(peer chaincode query -n mycc -c '{"Args":["FindResult", "RESULT1"]}' -C myc | egrep -o $MULTIVECTOR_PATTERN)

Now the user can decrypt the value using the CLI program, passing the ciphertext flag -c and the variable RESULT_VALUE:

phe-cli --decrypt -kr ./vault.kr -c $RESULT_VALUE

Figure 5.15 shows that the duplicated, key-updated result is equal to the first calculation. From the requesting user’s perspective, the analysis can now be carried out on the decrypted value.

Figure 5.15: Chaincode query output.

The mechanism allowed both parties to interact over proprietary data and follow a protocol that preserves the ownership but also promotes a cooperative partnership.

CHAPTER 6

Performance Evaluation of SCOT

The number of bits needed to achieve the same level of security differs between an asymmetric scheme and a symmetric one. Therefore, we purposefully increased the size of the keys used by our symmetric construction. The goal is not to compare security but performance, and although the aforementioned schemes are essentially different, we can approximate the efficiency comparison by creating keys of equivalent size before computing a benchmark. A symmetric algorithm with a security level of b usually has a key of length b bits [168]. However, the SCOT scheme shows that a security parameter of $b$ yields a key of length $\approx 2b$ (table 6.1), due to multivectors $\bar{K}_1$, $\bar{K}_2$ and the parameter $g$.

Security Level (λ)   Bits   sk (bits)   pk (bits)
128                  16     272         17
256                  32     544         33
512                  64     1088        65
1024                 128    2176        129
2048                 256    4352        257
4096                 512    8704        513

Table 6.1: Bit lengths for different SCOT security levels.
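The rows of table 6.1 can be reproduced from the security parameter alone. The sketch below assumes the relations used elsewhere in this thesis: b = λ/8 bits per coefficient, a secret key made of 17 such coefficients (8 for each of the two multivectors plus g), and a prime q that falls into the next bit size, b + 1. The function name `scot_key_bits` is hypothetical.

```python
def scot_key_bits(security_level):
    """Return (b, sk_bits, pk_bits) for a SCOT security parameter lambda."""
    b = security_level // 8   # bits per coefficient
    sk_bits = 17 * b          # 8 + 8 multivector coefficients, plus g
    pk_bits = b + 1           # q assumed to need one more bit than b
    return b, sk_bits, pk_bits

rows = [(lam, *scot_key_bits(lam)) for lam in (128, 256, 512, 1024, 2048, 4096)]
```

Each tuple in `rows` corresponds to one line of the table, in the same column order.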

It is recommended that a Paillier key be around 2048 bits to be secure; however, we ran our experiment over a range from 128 to 4096 bits in order to show the tendency of runtime performance graphically. It is also important to note that the relationship between key length and security is not as straightforward in the asymmetric case. However, table 6.2 shows generally recommended equivalent bit lengths [168]. Therefore, in order to balance the comparison, we increase the security parameter $\lambda$ given as input to our scheme so that each key $\bar{K}_1$ and $\bar{K}_2$ has a similar number of bits.

Family                  Cryptosystem        Security Level (bits)
Symmetric-key           AES, 3DES           80      128     192     256
Integer factorization   RSA                 1024    3072    7680    15360
Discrete logarithm      DH, DSA, Elgamal    1024    3072    7680    15360
Elliptic curves         ECDH, ECDSA         160     256     384     512

Table 6.2: Bit lengths for different security levels.

Table 6.3 shows a relation of our security parameter in comparison to the number of bits achieved in scheme Π in order to match this specific implementation of Paillier.

Π Sec. Level (λ)   Π Bits   Paillier Bits
8192               1024     1024
16384              2048     2048
32768              4096     4096

Table 6.3: Equivalence of security parameters.

We use two 64-bit integer messages $m_1$ and $m_2$, where $m_1 = 100$ and $m_2 = 200$, as inputs for the comparative runtime. Also, the key size used to generate keys ranges from 128 to 4096 bits.

6.0.1 Key Generation

In key generation, Paillier showed better performance overall. The Π scheme showed a steep runtime increase for larger keys, especially over 2048 bits. Table 6.4 shows the data generated according to the number of input bits.

Bits   Π (µs)          Paillier (µs)
128    415.8810        328.0080
256    2876.7540       746.7560
512    3467.7330       2766.4240
1024   77330.2400      15448.5160
2048   614013.5320     177937.8090
4096   7112782.1650    1272091.9210

Table 6.4: Key generation data.

Figure 6.1 gives a visual perception of how Π departs from a balanced comparison to Paillier.

[Plot: runtime in microseconds (thousands, 0 to 8000) against key size in bits (128 to 4096); series: PHE and Paillier.]

Figure 6.1: Key generation benchmark.

6.0.2 Encryption

When encrypting message $m_1$ with the same input parameters, we can see quite the opposite effect. Scheme Π kept a linear behavior, while Paillier’s runtime grew much more steeply, as would be expected from constructions that rely on modular exponentiation (table 6.5).

Bits   Π (µs)      Paillier (µs)
128    30.4300     18.3830
256    42.7700     64.3260
512    74.9200     386.1410
1024   148.9750    2044.3790
2048   357.6550    14188.9460
4096   944.5440    105029.2370

Table 6.5: Encryption data.
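One way to quantify the growth seen in table 6.5 is to fit a straight line to the data on a log-log scale. The slope is an empirical scaling exponent: a value near 1 means the runtime grows linearly with the key size, and larger values mean steeper growth. The sketch below is our own illustration over the table values, not part of the benchmark tooling, and the function name `scaling_exponent` is hypothetical.

```python
import math

BITS = [128, 256, 512, 1024, 2048, 4096]
PI_US = [30.4300, 42.7700, 74.9200, 148.9750, 357.6550, 944.5440]
PAILLIER_US = [18.3830, 64.3260, 386.1410, 2044.3790, 14188.9460, 105029.2370]

def scaling_exponent(sizes, runtimes):
    """Least-squares slope of log2(runtime) against log2(size)."""
    xs = [math.log2(s) for s in sizes]
    ys = [math.log2(t) for t in runtimes]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    denominator = sum((x - x_mean) ** 2 for x in xs)
    return numerator / denominator

pi_exponent = scaling_exponent(BITS, PI_US)
paillier_exponent = scaling_exponent(BITS, PAILLIER_US)
```

On these six data points the exponent for Π comes out close to 1, while the one for Paillier is roughly 2.5. With so few samples, these figures should be read as a rough indication only.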

The graphical plotting in figure 6.2 of both functions gives a better perception of how Paillier encryption departs from the linearity of Π.

[Plot: runtime in microseconds (thousands, 0 to 120) against key size in bits (128 to 4096); series: PHE and Paillier.]

Figure 6.2: Encryption benchmark.

6.0.3 Addition

Addition is frequently used in our context, as in most analytical calculations. It is important that such a procedure can be done with some degree of linearity, even in the presence of larger keys. In table 6.6, we report how the runtime of $m_1 + m_2$ behaves for different input bit sizes.

Bits   Π (µs)    Paillier (µs)
128    1.8490    0.3350
256    1.8860    0.5020
512    1.9680    0.8790
1024   2.1090    1.8490
2048   2.3470    4.8500
4096   3.3930    36.5120

Table 6.6: Addition data.

Paillier started with better performance, but its runtime spiked from 2048 bits on. Addition in Π kept a runtime under 5 µs, even with an input of 4096 bits.
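The crossover described above can be read directly from table 6.6. The following small sketch locates the first key size at which Π is faster than Paillier. The helper name `crossover` is hypothetical.

```python
BITS = [128, 256, 512, 1024, 2048, 4096]
PI_ADD_US = [1.8490, 1.8860, 1.9680, 2.1090, 2.3470, 3.3930]
PAILLIER_ADD_US = [0.3350, 0.5020, 0.8790, 1.8490, 4.8500, 36.5120]

def crossover(sizes, first, second):
    """Return the first size at which `first` is faster than `second`, or None."""
    for size, t_first, t_second in zip(sizes, first, second):
        if t_first < t_second:
            return size
    return None

crossover_bits = crossover(BITS, PI_ADD_US, PAILLIER_ADD_US)
```

For the reported data, `crossover_bits` is 2048, which matches the point where the Paillier curve starts to rise sharply.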

[Plot: runtime in microseconds (0 to 35) against key size in bits (128 to 4096); series: PHE and Paillier.]

Figure 6.3: Addition benchmark.

6.0.4 Decryption

The decryption procedure had a steady performance for both schemes, with a faster throughput for Π (table 6.7).

Bits   Π (µs)       Paillier (µs)
128    6290.1590    28909.3550
256    6292.7000    28363.4820
512    6196.6530    28755.1690
1024   6274.2660    28032.2870
2048   6309.2970    28012.8890
4096   5824.1550    28468.3490

Table 6.7: Decryption data.
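To make "steady performance" concrete, the sketch below computes the mean decryption time of each scheme from table 6.7, the relative spread of the measurements (the difference between the slowest and fastest run divided by the mean), and the resulting speed-up of Π over Paillier. The spread metric is our own choice for illustration.

```python
from statistics import mean

PI_DEC_US = [6290.1590, 6292.7000, 6196.6530, 6274.2660, 6309.2970, 5824.1550]
PAILLIER_DEC_US = [28909.3550, 28363.4820, 28755.1690, 28032.2870, 28012.8890, 28468.3490]

def relative_spread(samples):
    """(max - min) / mean: a small value means the runtime is nearly constant."""
    return (max(samples) - min(samples)) / mean(samples)

pi_spread = relative_spread(PI_DEC_US)
paillier_spread = relative_spread(PAILLIER_DEC_US)
speedup = mean(PAILLIER_DEC_US) / mean(PI_DEC_US)
```

Both spreads stay below 10%, so the decryption time is largely independent of the key size in this range, and Π is about 4.6 times faster on average.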

Figure 6.4 shows a flat, nearly constant pattern in both constructions.

[Plot: runtime in microseconds (thousands, 0 to 35) against key size in bits (128 to 4096); series: PHE and Paillier.]

Figure 6.4: Decryption benchmark.

6.1 Experience Outline

Although different schemes were compared, both have similarities in their functionalities, such as privacy-preserving computation. The asymmetric scheme has the advantage of public keys to encrypt and delegate computation to a third party. However, we provide a mechanism to transfer the ownership of the computed data. Additionally, a key exchange mechanism can diminish the risk of exchanging and managing keys, a disadvantage tied to the social aspects of symmetric schemes. It is essential to mention that blockchain performance is a limiting factor for adopting any scheme. Regardless of their symmetric or asymmetric nature, the industry will analyze the cryptosystem’s security, along with the size of generated artifacts and the construction’s throughput. A symmetric scheme can be secure with lower security parameters, and we greatly increased this input for the experiment to slow down our performance, reaching roughly a common ground with the asymmetric scheme. With proper parameters, our suggestion can have affordable ciphertext sizes that meet DLTs’ criteria.

Our scheme has considerably larger keys, but the keys should be stored outside the blockchain. Additionally, the industry will push the size of the keys, influencing the performance of computed ciphertexts. Therefore, for our purposes, the throughput when computing ciphertexts and the size of generated artifacts are more important than the runtime of other processes, such as key generation.

CHAPTER 7

Lessons Learnt

This chapter describes our journey in the research of blockchain and the maturing process we went through to propose advances for the field. We believe that one can take advantage of our advancements, along with the lessons taught by our mistakes. In a first contact with the subject, it is common to be misled by many conflicting sources. With a trending topic at hand, publishers often look for authors in order to sell books and take advantage of the excitement. As a result, the market can be flooded with publications that carry misconceptions or narrow definitions, delaying the proper acquisition of knowledge and making it harder to build a solid foundation. In Section 7.1, we briefly describe our search for proper literature and how we defined trustworthy sources. Although it seems obvious that academic research is conducted in specific circles and therefore can be found through certain channels, the blockchain track does not follow an institutionalized pathway, but quite the opposite. For instance, the cypherpunks mailing list was the de facto peer-reviewing process for many of the ideas implemented by its participants. Additionally, there is no official definition for blockchain, and the meaning that it bears now was built by conceptually reverse engineering its most popular application (i.e., Bitcoin). We discuss our misconceptions in Section 7.2 and explain why such perceptions can arise even in the presence of seminal works. In Section 7.3, we present a benchmark that compares our applied scheme to a construction based on the Paillier cryptosystem, a cryptographic scheme that has been considered for blockchain systems. We finish by sharing, in Section 7.4, our challenges when applying the acquired knowledge to a running prototype.

7.1 A Chain of Events

Our path started with the motivation to improve blockchain transactions with homomorphic computations. Previously, we had been applying experimental cryptographic schemes to still images and then video streams. Later on, a publication organized our findings in a more concise manner, demonstrating transformations over images and discussing the role of cloud computing providers when we pondered trust in virtualized environments. Along with those findings, GA properties were analyzed, such as the invertibility of multivectors and how the non-commutative property of the geometric product could be used as a key update mechanism. From that point, we could trace some similarities between the environment based on trust and blockchain’s new directions. We initiated the studies of blockchain with Bitcoin’s seminal work, following the cited papers and understanding the underlying concepts discussed by Nakamoto. One of the first mistakes that can occur when analyzing the document is considering that Bitcoin had a technological goal. The document is primarily based on strong ideological motivation, which can only be understood by examining the cypherpunks’ track of e-mails, the political and social environment at that time, and influential works. Being aware of this context helped us understand the dissociation of blockchain from the academic world, at least to some extent. Although Nakamoto used concepts nurtured in research circles, Bitcoin was more related to underground culture, only lately reconciling with academia. That helps explain why some concepts rely on opinions or educated guesses regarding the motivation behind some components (e.g., the relation between PoW and the byzantine problem). Therefore, to be pragmatic towards the subject, we tracked the origins of the concepts approached by Nakamoto. This led the research to investigate distributed systems, cryptography, probability, and economics.
Consequently, we reached concepts such as redundancy, replication, consensus, adversarial problems, and trust. While we were pursuing a clear understanding of the topic, the offers for virtualized versions of blockchains grew, in an economy of scale that matched our first analysis of cloud providers, posing the same threats previously investigated. The targeted customers were mainly companies experimenting with blockchain and migrating their businesses to the cloud. We then checked our assumptions on our parallel cryptographic venue and found important similarities that would benefit the BaaS model. Our steps involved building the core functions to test the scheme’s feasibility, along with performance, security, and adherence to an industry-ready framework. There is a lot of debate about whether or not to use a specific tool. What is relatively unanimous is the perception that one should know clearly the purpose for which the framework is going to be applied. A concept that is too broad may not fit well with some of the available options. Therefore, we aimed for an application that was predominant in most cloud services to prototype in a reduced but representative ecosystem.

Finally, we searched for a baseline scheme that could show similarities with our experimental cryptographic system in order to compare. We focused mainly on the homomorphic operations offered, the acceptance of the construction by the blockchain community as a valid candidate, and the adherence to our premises listed in Section 1.4.1. We avoided any inefficient proposition that relies on specialized components, trusted parties, or data segregation. As a result, Paillier was chosen to represent a valid option that could partially fit our abstract model and, if necessary, could be executed in a smart contract without much added complexity.

7.2 Misleading Conceptions

Our first misconception was related to the degree of novelty in the blockchain concept. In fact, this is a recurrent misleading thought that is reinforced by a wave of books that overextends the perception of usefulness for blockchain systems. This overvalued notion goes hand in hand with the idea that such a system is applicable to every case. After careful consideration, we realized that blockchain is far more restrictive than initially thought. It not only suffers in performance but also has limitations in scalability. Additionally, limitations on the amount of data that can be persisted must be taken into account. A simplified decision diagram can be seen in figure 7.1.

[Decision diagram with the questions “Shared data?”, “Many parties?”, “Partial trust?”, “Same rules?”, “Lasting log?”, “Lasting rules?” and “Public transactions?”, and the outcomes “DON’T NEED BLOCKCHAIN!”, “PRIVATE BLOCKCHAIN” and “PUBLIC BLOCKCHAIN”.]

Figure 7.1: Simplified blockchain decision tree.
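The decision process can also be sketched as a short function. The branch structure below is an assumption, since only the question and outcome labels of figure 7.1 are available in this text: we assume that a negative answer to any of the first six questions leads to "DON'T NEED BLOCKCHAIN!" and that the last question separates a public from a private blockchain. The identifiers are hypothetical.

```python
# Question keys mirror the labels of figure 7.1; their ordering and the
# branching rule are assumptions made for this illustration only.
REQUIRED = (
    "shared_data",
    "many_parties",
    "partial_trust",
    "same_rules",
    "lasting_log",
    "lasting_rules",
)

def recommend(answers):
    """Map yes/no answers (a dict of booleans) to one of the three outcomes."""
    for question in REQUIRED:
        if not answers.get(question, False):
            return "DON'T NEED BLOCKCHAIN!"
    if answers.get("public_transactions", False):
        return "PUBLIC BLOCKCHAIN"
    return "PRIVATE BLOCKCHAIN"

consortium = dict.fromkeys(REQUIRED, True)
consortium["public_transactions"] = False
decision = recommend(consortium)
```

Under these assumptions, a consortium that answers yes to every question except the last one ends up with a private blockchain, which is the setting targeted by the BaaS model discussed in this work.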

With regard to security, blockchains are also at great risk when considering forward secrecy. Attack techniques keep improving, and users can be reckless with sensitive data (e.g., private keys). In this case, immutability brings the risk of tying the whole history of events to a compromised secret key, creating a permanently exposed track of events. In the original Bitcoin paper, the proposition aimed to create a means by which individuals could transact without intermediaries. Therefore, a solution was devised with a reduced set of threats in mind, not foreseeing many of the attacks that were devised later. That makes clear that no single blockchain is the best option for all scenarios, and a case-by-case approach is recommended.

7.3 Analytical Benchmarking

We initiated our search for a baseline scheme considering a homomorphic construction with an unlimited number of executions for all the operations. Furthermore, we discovered a security issue in the scheme selected for the experiment. Although our construction would be conducted as an experimental effort, we aimed for a solution that could have better chances of adoption by guaranteeing better security assumptions. The first conclusion when looking for a more secure scheme was that we did not need an FHE scheme for this prototype. Our first plan was to calculate the hypothetical mortality of a fictional case. In reality, we did not even need a homomorphic scheme for this calculation, since the expiration field of a patient was not intended to be encrypted. Therefore, we identified a flaw in our analysis and changed our goals. Moreover, we chose the number of pre-existing conditions to be encrypted. In doing so, the average number of pre-existing conditions would be analyzed for expired patients, creating insights on a pattern shown by the disease. That could be realized by a scheme offering homomorphic addition and division by a scalar.

A fundamental characteristic of our first scheme was the ability to execute a key update on encrypted data. That was the differentiator that opened a new world of possibilities for the private blockchain. Hence, the mathematical framework behind our candidate should bear this attribute, mainly because we favor computations that do not use a trusted environment to re-encrypt data. Therefore, we reduced the number of operations, favoring the security aspect in a new proposition. The new scheme relies on the geometric algebra framework and keeps the remarkable ability to execute a homomorphic key update.

With the new scheme at hand, we chose the Paillier cryptosystem for a comparative analysis. The first attribute considered was its additively homomorphic nature, which, along with its scalar multiplication operation, makes the scheme a choice for blockchain applications such as voting. Although the Paillier cryptosystem is an asymmetric scheme and our construction is symmetric, equivalent security parameters can be found between both constructions in order to compare.

7.3.1 Generated Artifacts

As shown in equation 4.8, the Gen function receives a security parameter $1^{\lambda}$ and outputs a secret key $sk = (\bar{K}_1, \bar{K}_2, g)$ and a public parameter $pk = (b, q)$. It follows that $\bar{K}_1, \bar{K}_2 \in \mathbb{G}^3_q$; therefore, each multivector $\bar{K}_1$ and $\bar{K}_2$ has 8 coefficients of $b$-bit integers each, where $b = \lambda / 8$. Component $g$ is also defined as a uniform $b$-bit integer, hence $|\bar{K}_1| = 8 \times 2^b$, $|\bar{K}_2| = 8 \times 2^b$ and $|g| = 2^b$. The resulting secret key size is

\[ |sk| = |\bar{K}_1| + |\bar{K}_2| + |g| = 8 \times 2^b + 8 \times 2^b + 1 \times 2^b = 17 \times 2^b. \tag{7.1} \]

The public evaluation parameter $pk = (b, q)$ comprises $b$ and $q$, where $q$ is the smallest prime greater than $2^b$. We consider that the smallest prime greater than $2^b$ falls into the next bit size; therefore, we assume that $|q| = 2^{b+1}$. Our public evaluation parameter is then

\[ |pk| = |b| + |q| = b + 2^{b+1}. \tag{7.2} \]

Our encryption function Enc works with integers in $\mathbb{Z}_q$; consequently, a ciphertext $\bar{C} \in \mathbb{G}^3_q$ has 8 coefficients of $b$-bit integers each, bounded by $q$ on every operation. It holds that if $|q| = 2^{b+1}$, then each coefficient is at most $|q|$, resulting in a ciphertext $\bar{C}$ of size

\[ |\bar{C}| = 8 \times 2^{b+1}. \tag{7.3} \]

The Paillier cryptosystem is an asymmetric scheme and therefore generates a private and a public key. Although our scheme defines a public evaluation parameter $pk = (b, q)$, $pk$ does not relate to the functionality of a public key in the asymmetric sense. The Paillier key generation function, called here GenModulus, receives a parameter $n$ defining the number of bits to be used in the generation of $p$ and $q$, such that

\[ (N, p, q) \leftarrow \mathrm{GenModulus}(1^n). \tag{7.4} \]

$N = pq$ is the public key, where $p$ and $q$ are $n$-bit primes [36]. Our function Gen receives a security parameter $1^{\lambda}$, where $\lambda$ does not relate directly to the number of bits. In our case, $b = \lambda / 8$ is the parameter equivalent to $n$ in the Paillier scheme. Therefore, for clarity, we refer to $b$ as the number of bits in the following definitions of the ciphertext and the secret and public keys in Paillier.

Since a ciphertext $c \in \mathbb{Z}^*_{N^2}$, we can define that $c$ is at most $N^2$. The public key is $N$, and the secret key is $\langle N, \phi(N) \rangle$, where $\phi(N) = (p - 1)(q - 1)$, resulting in $N + \phi(N)$. We can see that these artifacts are defined in terms of $N$, and the primes $p$ and $q$ have $b$ bits. The product $pq$ cannot have fewer than $b$ bits or more than $2b$ bits. As a result, we can assume the worst-case scenario that $N$ has at most $2b$ bits, since $N = 2^b 2^b = 2^{b+b} = 2^{2b}$. Consequently, $c$ has at most $4b$ bits due to the bounded ciphertext space $\mathbb{Z}^*_{N^2}$, because $N^2 = 2^{2b} 2^{2b} = 2^{4b}$. The public key $N$ has a size of $2b$ bits, and the secret key $\langle N, \phi(N) \rangle$ can be rewritten as $\langle 2^{2b}, (2^b - 1)^2 \rangle$, yielding a size of $2^{2b} + (2^b - 1)^2$, or $\approx 4b$ bits.
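The bit-length bounds above can be checked on a concrete instance. The sketch below uses two 8-bit primes chosen only for illustration and reads the size expressions as bit counts, which is how table 6.1 reports them (an interpretation on our part). The function names are hypothetical.

```python
def paillier_artifact_bits(b):
    """Upper bounds, in bits, for Paillier artifacts built from b-bit primes."""
    return {
        "public_key": 2 * b,   # N = p * q
        "ciphertext": 4 * b,   # c is an element modulo N^2
        "secret_key": 4 * b,   # the pair (N, phi(N)), 2b bits each
    }

def scot_artifact_bits(b):
    """Bit counts for the proposed scheme, assuming q needs b + 1 bits."""
    return {
        "secret_key": 17 * b,        # two 8-coefficient multivectors plus g
        "ciphertext": 8 * (b + 1),   # 8 coefficients, each reduced modulo q
    }

# Concrete check with b = 8 (toy primes, not a secure choice).
p, q = 251, 241
N = p * q
phi = (p - 1) * (q - 1)
bounds = paillier_artifact_bits(8)
observed = {
    "public_key": N.bit_length(),
    "ciphertext": (N * N).bit_length(),
    "secret_key": N.bit_length() + phi.bit_length(),
}
```

For this instance every observed length reaches its bound exactly. In general, the product of two b-bit primes has either 2b - 1 or 2b bits, so the bounds are tight but not always attained.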

7.4 Development Challenges

Development in the context of blockchain can be overwhelming. Although the technology relies upon known concepts from distributed systems and correlated fields, it also requires an interdisciplinary effort, and it may take considerable time to consolidate all the necessary knowledge. The first and most important decision relates to the applicability of blockchain to the task, because the technology may not be the most suitable tool for the job. Once it is established that this is the right choice, an analysis must be done of the available frameworks and the prerequisites for the intended project. We learned that these are the initial decisions guiding the development of a new system because they will define the architecture, programming languages, and distribution of functionalities. For instance, a blockchain can be used as data storage or as a computational infrastructure, being treated as a constituent part of a broader system. In this sense, another important design decision relates to the allocation of functionalities to the components of a framework or tool. It means that crucial decisions will be made on which functionalities will be computed on-chain and off-chain, along with which and how much data the blockchain can afford to store, considering that a ledger will be an ever-growing track of data. This may involve not just technical decisions but also legal agreements. After deciding on one framework, one may end up working with a still-maturing code base, with hard-to-detect errors and tutorials that do not work as intended. In addition to that, many of the blockchain development tools aim to reproduce the environment of a multi-machine distributed system. This creates a complex interaction, where the root cause of problems can be masked. It can drive the development effort into a trial-and-error journey, where help can come from the interaction with an open-source community whose priorities are naturally not aligned with your project deadline.
Our mature takeaway from this work is that blockchain, like many other niches, is meant to be conducted by multidisciplinary teams, although prototypes can take advantage of testing components available for experimentation. Also, a solid understanding of the disciplines behind distributed systems, computer networking, cryptography, and computer architecture is essential to conduct businesses that rely on DLTs.

CHAPTER 8

Future Directions

We suggest some future directions for this work, where contributions can reaffirm our findings, challenge our conceptions, or improve our results. In this research process, some questions were answered, but new inquiries arose while developing solutions. Some of the questions not yet properly answered are related to the combination of homomorphic computations. The greatest challenge is to realize efficient constructions (i.e., running in polynomial time) that properly satisfy all the model definitions. It is important to notice that the encryption scheme in the model does not need to be fully homomorphic to enable the possible applications we discuss in the text. Instead, an efficient and reasonably compact CPA-secure SWHE scheme is probably enough for a large array of relevant real-world homomorphic DLT-based applications. For instance, we need to analyze the factors that define how many times certain operations can be executed in the scheme. Along the same lines, an analysis of bootstrapping techniques to define the number of times an operation can be executed will be extremely useful. All in all, we believe that any further proposition should follow our CASC principle defined in Section 1.4.1. Those guidelines can help to stay aligned with core blockchain principles, avoiding the loss of the specifics that made this approach special.

CHAPTER 9

Conclusion

In this work, we described blockchain’s current stage as a mechanism of business facilitation in an environment where trust remains fragile. The growth of the technology and its subsequent adoption can now be perceived through a wide range of service offerings, where companies can take a controlled risk while experimenting with the new approach in a controlled environment. Although portrayed as a revolutionary technology, the blockchain advancements relied on a track of inherited contributions, from scholars to underground hackers, that we carefully discussed to set the stage for a comprehensive analysis of the subject at hand. Aware of its historical lineage, we could understand that the ingenious combination of existing inventions aimed for revolution but ended up being adopted by the same institutions targeted as adversarial. Still, business relationships continue to face problems that resemble the issues the tool was meant to solve. We presented these problems not as exclusive to the technological realm but as a constituent part of the human traits still pervasive in commercial relationships. Therefore, we analyzed specific solutions that could be represented technologically. We formalized a detailed investigation of solutions that can be combined into a pragmatic model for ownership management. In this analysis, we described a HE scheme. Due to its underlying mathematical framework, it allows for a key exchange protocol and a key update protocol, and the construction can feasibly be adapted to DLT frameworks. Furthermore, we created a proof of concept, where homomorphic computations could be executed in a blockchain environment, in a full cycle of a smart contract invocation. In this scenario, the theoretical model was successfully applied in a working example. Additionally, an analytical benchmark was conducted in comparison to a strong candidate for adoption by blockchain circles.
We summarize our contributions:

• We provided a formal description of a model consisting of a HE scheme, a key exchange protocol and a key update protocol, along with an architecture built upon these homomorphic solutions to be applied in a DLT;

• We proposed the use of HE in DLT as a mechanism of ownership protection, renegotiation, and revision, offering key exchange and key update protocols for ownership transfer;

• We found compelling evidence that the data custody problem is directly connected with the solution of critical open problems in DLT-based applications such as the Hold-Up and opportunism problems;

• We discussed the characteristics and implications of each problem associated with DLT and how our contributions addressed them;

• We illustrated the use of the model and architecture for solving the problem of ownership transfer between the original data owner and a designated third party through the use of a prototype that applies our theoretical propositions.

Bibliography

[1] S. Nakamoto et al., “Bitcoin: A peer-to-peer electronic cash system,” 2008.

[2] R. Manne, Short Black 9 Cypherpunk Revolutionary: On Julian Assange, vol. 9. Black Inc., 2015.

[3] E. Hughes, “A cypherpunk’s manifesto,” URL (accessed 3 August 2004): http://www.activism.net/cypherpunk/manifesto.html, 1993.

[4] W. Dai, “B-money,” Consulted, vol. 1, p. 2012, 1998.

[5] T. C. May, “The cyphernomicon: Cypherpunks faq and more, version 0.666,” Cypherpunks electronic mailing list, 1994.

[6] M. Carlsten, H. Kalodner, S. M. Weinberg, and A. Narayanan, “On the instability of bitcoin without the block reward,” in Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 154–167, 2016.

[7] I. Eyal and E. G. Sirer, “Majority is not enough: Bitcoin mining is vulnerable,” in International conference on financial cryptography and data security, pp. 436–454, Springer, 2014.

[8] B. Johnson, A. Laszka, J. Grossklags, M. Vasek, and T. Moore, “Game-theoretic analysis of ddos attacks against bitcoin mining pools,” in International Conference on Financial Cryptography and Data Security, pp. 72–86, Springer, 2014.

[9] A. Narayanan and J. Clark, “Bitcoin’s academic pedigree,” Communications of the ACM, vol. 60, no. 12, pp. 36–45, 2017.

[10] P. A. Bernstein, V. Hadzilacos, and N. Goodman, Concurrency control and recovery in database systems, vol. 370. Addison-wesley New York, 1987.

[11] I. Bashir, Mastering blockchain. Packt Publishing Ltd, 2017.

[12] W. Abrar, “Untraceable electronic cash with digicash,” 1900.

[13] A. Back et al., “Hashcash-a denial of service counter-measure,” 2002.

[14] N. Szabo, “Bit gold.(2005),” 2005.

[15] V. Vinge, “True names, 1981,” True Names and the Opening of the Cyberspace Frontier, (ed. James Frenkel), TOR, New York, 2001.

[16] F. David, “The machinery of freedom: Guide to a radical capitalism,” 1973.

[17] X. Xu, I. Weber, and M. Staples, Architecture for blockchain applications. Springer, 2019.

[18] M. M. H. Onik and M. H. Miraz, “Performance analytical comparison of blockchain-as-a-service (baas) platforms,” in International Conference for Emerging Technologies in Computing, pp. 3–18, Springer, 2019.

[19] D. Tapscott and A. Tapscott, Blockchain Revolution: How the Technology Behind Bitcoin and Cryptocurrency is Changing the World. Portfolio Penguin, 2016.

[20] J. Singh and J. D. Michels, “Blockchain as a service (baas): providers and trust,” in 2018 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW), pp. 67–74, IEEE, 2018.

[21] D. Rutherford, Routledge dictionary of economics. Routledge, 2013.

[22] D. L. Threedy, “Labor disputes in contract law: The past and present of alaska packers’ ass’n v. domenico,” Tex. Wesleyan L. Rev., vol. 10, p. 65, 2003.

[23] M. Schwarz and Y. Takhteyev, “Half a century of public software institutions: Open source as a solution to hold-up problem,” Journal of Public Economic Theory, vol. 12, no. 4, pp. 609–639, 2010.

[24] B. Ganglmair, L. M. Froeb, and G. J. Werden, “Patent hold-up and antitrust: How a well-intentioned rule could retard innovation,” The Journal of Industrial Economics, vol. 60, no. 2, pp. 249–273, 2012.

[25] R. T. Holden and A. Malani, “Can blockchain solve the hold-up problem in contracts?,” tech. rep., National Bureau of Economic Research, 2019.

[26] B. Klein, R. G. Crawford, and A. A. Alchian, “Vertical integration, appropriable rents, and the competitive contracting process,” The journal of Law and Economics, vol. 21, no. 2, pp. 297–326, 1978.

[27] I. Lapowsky, “Progressive democrats fight for access to the party’s voter data.”

[28] G. Zyskind, O. Nathan, and A. Pentland, “Enigma: Decentralized computation platform with guaranteed privacy,” arXiv preprint arXiv:1506.03471, 2015.

[29] E. Cecchetti, F. Zhang, Y. Ji, A. Kosba, A. Juels, and E. Shi, “Solidus: Confidential distributed ledger transactions via pvorm,” in Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp. 701–717, ACM, 2017.

[30] A. Poelstra, A. Back, M. Friedenbach, G. Maxwell, and P. Wuille, “Confidential assets,” in International Conference on Financial Cryptography and Data Security, pp. 43–63, Springer, 2018.

[31] A. Kosba, A. Miller, E. Shi, Z. Wen, and C. Papamanthou, “Hawk: The blockchain model of cryptography and privacy-preserving smart contracts,” in 2016 IEEE symposium on security and privacy (SP), pp. 839–858, IEEE, 2016.

[32] E. Androulaki, A. Barger, V. Bortnikov, C. Cachin, K. Christidis, A. De Caro, D. Enyeart, C. Ferris, G. Laventman, Y. Manevich, et al., “Hyperledger fabric: a distributed operating system for permissioned blockchains,” in Proceedings of the Thirteenth EuroSys Conference, p. 30, ACM, 2018.

[33] F. Benhamouda, S. Halevi, and T. T. Halevi, “Supporting private data on hyperledger fabric with secure multiparty computation,” IBM Journal of Research and Development, 2019.

[34] M. Hearn, “Corda: A distributed ledger,” Corda Technical White Paper, vol. 2016, 2016.

[35] J. Morgan, “Quorum whitepaper,” New York: JP Morgan Chase, 2016.

[36] J. Katz and Y. Lindell, Introduction to modern cryptography. CRC press, 2014.

[37] R. L. Rivest, L. Adleman, M. L. Dertouzos, et al., “On data banks and privacy homomorphisms,” Foundations of secure computation, vol. 4, no. 11, pp. 169–180, 1978.

[38] C. Gentry et al., “Fully homomorphic encryption using ideal lattices.,” in Stoc, vol. 9, pp. 169–178, 2009.

[39] D. W. H. A. da Silva, Fully Homomorphic Encryption Over Exterior Product Spaces. PhD thesis, University of Colorado Colorado Springs. Kraemer Family Library, 2017.

[40] D. W. H. A. da Silva, H. B. M. de Oliveira, E. Chow, B. S. Barillas, and C. P. de Araujo, “Homomorphic image processing over geometric product spaces and finite p-adic arithmetic,” in 2019 IEEE International Conference on Cloud Computing Technology and Science (CloudCom), pp. 27–36, IEEE, 2019.

[41] D. W. H. A. da Silva, H. Oliveira, M. A. Xavier, E. Chow, and C. P. de Araujo, “Homomorphic key update protocol based on clifford geometric algebra for distributed ledger technology,” in SCIS&ISIS, Joint 11th International Conference on Soft Computing and Intelligent Systems and 21st International Symposium on Advanced Intelligent Systems, 2020. To appear.

[42] V. Puri, S. Sachdeva, and P. Kaur, “Privacy preserving publication of relational and transaction data: Survey on the anonymization of patient data,” Computer Science Review, vol. 32, pp. 45–61, 2019.

[43] P. Vezyridis and S. Timmons, “Understanding the care.data conundrum: New information flows for economic growth,” Big Data & Society, vol. 4, no. 1, p. 2053951716688490, 2017.

[44] D. W. da Silva, C. P. de Araujo, and E. Chow, “Fully homomorphic key update and key exchange over exterior product spaces for cloud computing applications,” in 2019 IEEE 24th Pacific Rim International Symposium on Dependable Computing (PRDC), pp. 25–251, IEEE, 2019.

[45] D. W. da Silva, C. P. de Araujo, E. Chow, and B. S. Barillas, “A new approach towards fully homomorphic encryption over geometric algebra,” in 2019 IEEE 10th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), pp. 0241–0249, IEEE, 2019.

[46] D. W. da Silva, C. P. de Araujo, and E. Chow, “An efficient homomorphic data encoding with multiple secret hensel codes,” International Journal of Information and Electronics Engineering, vol. 10, no. 1, 2020.

[47] C. Barrera and S. Hurder, “Can blockchain solve the hold-up problem for shared databases?,” Prysm White Paper May, vol. 13, 2019.

[48] A. J. Casey and A. Niblett, “Self-driving contracts,” J. Corp. L., vol. 43, p. 1, 2017.

[49] B. Preneel, B. Van Rompay, J. J. Quisquater, H. Massias, and J. S. Avila, “Design of a timestamping system,” tehnicno porocilo projekta TIMESEC, 1998.

[50] S. Haber and W. S. Stornetta, “How to time-stamp a digital document,” in Conference on the Theory and Application of Cryptography, pp. 437–455, Springer, 1990.

[51] S. Haber and W. S. Stornetta, “Secure names for bit-strings,” in Proceedings of the 4th ACM Conference on Computer and Communications Security, pp. 28–35, Acm, 1997.

[52] D. Bayer, S. Haber, and W. S. Stornetta, “Improving the efficiency and reliability of digital time-stamping,” in Sequences Ii, pp. 329–334, Springer, 1993.

[53] H. Massias, X. S. Avila, and J.-J. Quisquater, “Design of a secure timestamping service with minimal trust requirement,” in the 20th Symposium on Information Theory in the Benelux, Citeseer, 1999.

[54] R. C. Merkle, “Protocols for public key cryptosystems,” in 1980 IEEE Symposium on Security and Privacy, pp. 122–122, IEEE, 1980.

[55] D. W. Kravitz, “Digital signature algorithm,” July 27 1993. US Patent 5,231,668.

[56] R. Wattenhofer, Distributed Ledger Technology: The Science of the Blockchain. CreateSpace Independent Publishing Platform, 2017.

[57] D. Harding, “Bitcoin developer guide,” 2015.

[58] J. Nickerson, “Oliver williamson and his impact on the field of strategic management,” Journal of Retailing, vol. 86, no. 3, pp. 270–276, 2010.

[59] A. L. Schlafly, “The coase theorem: the greatest economic insight of the 20th century,” Journal of American Physicians and Surgeons, vol. 12, no. 2, pp. 45–48, 2007.

[60] W. S. O. Williamson, “Markets and hierarchies: Analysis and antitrust implications: A study in the economics of internal organization,” 1975.

[61] R. H. Coase, “The nature of the firm,” economica, vol. 4, no. 16, pp. 386–405, 1937.

[62] D. Chaum, “Security without identification: Transaction systems to make big brother obsolete,” Communications of the ACM, vol. 28, no. 10, pp. 1030–1044, 1985.

[63] W. Feller, “An introduction to probability theory and its applications,” 1957.

[64] J. L. Coolidge, “The gambler’s ruin,” The Annals of Mathematics, vol. 10, no. 4, pp. 181–192, 1909.

[65] M. A. Cusumano, A. Gawer, and D. B. Yoffie, The business of platforms: Strategy in the age of digital competition, innovation, and power. Harper Business New York, 2019.

[66] S. J. Grossman and O. D. Hart, “The costs and benefits of ownership: A theory of vertical and lateral integration,” Journal of political economy, vol. 94, no. 4, pp. 691–719, 1986.

[67] R. H. Coase, “The nature of the firm,” in Essential readings in economics, pp. 37–54, Springer, 1995.

[68] B. Burns, Designing Distributed Systems: Patterns and Paradigms for Scalable, Reliable Services. O’Reilly Media, Inc., 2018.

[69] M. Van Steen and A. S. Tanenbaum, Distributed systems. Maarten van Steen Leiden, The Netherlands, 2017.

[70] S. Tarkoma, Overlay Networks: Toward Information Networking. CRC Press, 2010.

[71] H. Kopetz and P. Verissimo, “Real time and dependability concepts,” in Distributed systems (2nd Ed.), pp. 411–446, 1993.

[72] A. Avizienis, J.-C. Laprie, B. Randell, and C. Landwehr, “Basic concepts and taxonomy of dependable and secure computing,” IEEE transactions on dependable and secure computing, vol. 1, no. 1, pp. 11–33, 2004.

[73] F. Cristian, “Understanding fault-tolerant distributed systems,” Communications of the ACM, vol. 34, no. 2, pp. 56–78, 1991.

[74] V. Hadzilacos and S. Toueg, “Fault-tolerant broadcasts and related problems,” in Distributed systems (2nd Ed.), pp. 97–145, 1993.

[75] M. Pease, R. Shostak, and L. Lamport, “Reaching agreement in the presence of faults,” Journal of the ACM (JACM), vol. 27, no. 2, pp. 228–234, 1980.

[76] R. Shostak, M. Pease, and L. Lamport, “The byzantine generals problem,” ACM Transactions on Programming Languages and Systems, vol. 4, no. 3, pp. 382–401, 1982.

[77] D. P. Siewiorek, “Architecture of fault-tolerant computers: An historical perspective,” Proceedings of the IEEE, vol. 79, no. 12, pp. 1710–1734, 1991.

[78] N. Budhiraja, K. Marzullo, F. B. Schneider, and S. Toueg, “The primary-backup approach,” Distributed systems, vol. 2, pp. 199–216, 1993.

[79] F. B. Schneider, “Implementing fault-tolerant services using the state machine approach: A tutorial,” ACM Computing Surveys (CSUR), vol. 22, no. 4, pp. 299–319, 1990.

[80] E. F. Moore and C. E. Shannon, “Reliable circuits using less reliable relays,” Journal of the Franklin Institute, vol. 262, no. 3, pp. 191–208, 1956.

[81] A. Avizienis, G. C. Gilley, F. P. Mathur, D. A. Rennels, J. A. Rohr, and D. K. Rubin, “The star (self-testing and repairing) computer: An investigation of the theory and practice of fault-tolerant computer design,” IEEE Transactions on Computers, vol. 100, no. 11, pp. 1312–1321, 1971.

[82] R. L. Heacock, “The voyager spacecraft,” Proceedings of the Institution of Mechanical Engineers, vol. 194, no. 1, pp. 211–224, 1980.

[83] D. Fischer, Mission Jupiter: The Spectacular Journey of the Galileo Spacecraft. Springer Science & Business Media, 2001.

[84] J. H. Wensley, L. Lamport, J. Goldberg, M. W. Green, K. N. Levitt, P. M. Melliar-Smith, R. E. Shostak, and C. B. Weinstock, “Sift: Design and analysis of a fault-tolerant computer for aircraft control,” Proceedings of the IEEE, vol. 66, no. 10, pp. 1240–1255, 1978.

[85] L. Moser, M. Melliar-Smith, and R. Schwartz, “Design verification of sift,” 1987.

[86] M. Just, “Some timestamping protocol failures.,” in NDSS, vol. 98, pp. 89–96, 1998.

[87] L. Lamport, “The part-time parliament,” ACM Transactions on Computer Systems (TOCS), vol. 16, no. 2, pp. 133–169, 1998.

[88] L. Lamport et al., “Paxos made simple,” ACM Sigact News, vol. 32, no. 4, pp. 18–25, 2001.

[89] M. Castro, B. Liskov, et al., “Practical byzantine fault tolerance,” in OSDI, vol. 99, pp. 173–186, 1999.

[90] A. S. Tanenbaum and H. Bos, Modern operating systems. Pearson, 2015.

[91] L. Lamport, “The part-time parliament,” ACM Transactions on Computer Systems, vol. 16, no. 2, pp. 133–169, 1998.

[92] P. Mell, T. Grance, et al., “The nist definition of cloud computing,” 2011.

[93] A. Deshpande, K. Stewart, L. Lepetit, and S. Gunashekar, “Distributed ledger technologies/blockchain: Challenges, opportunities and the prospects for standards,” Overview report The British Standards Institution (BSI), vol. 40, p. 40, 2017.

[94] M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica, et al., “A view of cloud computing,” Communications of the ACM, vol. 53, no. 4, pp. 50–58, 2010.

[95] L. Duo, “Legal challenges for data-driven society,” in 2017 ITU Kaleidoscope: Challenges for a Data-Driven Society (ITU K), pp. 1–6, IEEE, 2017.

[96] D. Reinsel, J. Gantz, and J. Rydning, “The digitization of the world from edge to core,” Framingham: International Data Corporation, 2018.

[97] S. Tahir and M. Rajarajan, “Privacy-preserving searchable encryption framework for permissioned blockchain networks,” in 2018 IEEE International Conference on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData), pp. 1628–1633, IEEE, 2018.

[98] “Tradelens.” https://www.tradelens.com. Accessed: 2020-09-17.

[99] I. Allison, “Ibm-maersk shipping blockchain gains steam with 15 carriers now on board.” https://www.coindesk.com/ibm-maersk-shipping-blockchain-gains-steam-with-15-carriers-now-on-board. Accessed September 17, 2020.

[100] N. Morris, “How sap’s blockchain strategy is fundamentally different.” https://www.ledgerinsights.com/sap-blockchain-strategy. Accessed September 17, 2020.

[101] D. Paluch, “Nobody wants to join maersk and ibm.” https://www.blockchain24.co/nobody-wants-to-join-maersk-and-ibm. Accessed September 17, 2020.

[102] D. Kuhn, “Nestle announces new blockchain initiative separate from ongoing ibm project.” https://www.coindesk.com/nestle-announces-new-blockchain-initiative-separate-from-ongoing-ibm-project. Accessed September 17, 2020.

[103] “Ibm food trust.” https://www.ibm.com/blockchain/solutions/food-trust. Accessed: 2020-09-17.

[104] N. Morris, “Kuehne + nagel, largest freight forwarder adopts blockchain.” https://www.ledgerinsights.com/kuehne-nagel-freight-forwarder-blockchain. Accessed September 17, 2020.

[105] L. A. Gordon, M. P. Loeb, and L. Zhou, “The impact of information security breaches: Has there been a downward shift in costs?,” Journal of Computer Security, vol. 19, no. 1, pp. 33–56, 2011.

[106] W. Primoff and S. Kess, “The equifax data breach: What cpas and firms need to know now,” The CPA Journal, vol. 87, no. 12, pp. 14–17, 2017.

[107] Y. Zou, A. H. Mhaidli, A. McCall, and F. Schaub, “"i’ve got nothing to lose": Consumers’ risk perceptions and protective actions after the equifax data breach,” in Fourteenth Symposium on Usable Privacy and Security (SOUPS 2018), pp. 197–216, 2018.

[108] A. Thompson, “Dnc chair tom perez goes to war with state parties.” https://www.politico.com/story/2018/12/16/democrats-perez-state-parties-1066665. Accessed September 17, 2020.

[109] M. Dawoud and D. T. Altilar, “Cloud-based e-health systems: Security and privacy challenges and solutions,” in 2017 International Conference on Computer Science and Engineering (UBMK), pp. 861–865, IEEE, 2017.

[110] L. Chen and D. B. Hoang, “Novel data protection model in healthcare cloud,” in 2011 IEEE International Conference on High Performance Computing and Communications, pp. 550–555, IEEE, 2011.

[111] S. Kullback, Information theory and statistics. Courier Corporation, 1997.

[112] C. E. Shannon, “A mathematical theory of communication,” The Bell system technical journal, vol. 27, no. 3, pp. 379–423, 1948.

[113] A. Y. Khinchin, Mathematical foundations of information theory. Courier Corporation, 2013.

[114] A. Ben-Naim, Information Theory - Part I: An Introduction To The Fundamental Concepts. World Scientific, 2017.

[115] E. C. Cherry, “A history of the theory of information,” 1953.

[116] Y. Bar-Hillel, “An examination of information theory,” Philosophy of Science, vol. 22, no. 2, pp. 86–105, 1955.

[117] A. Barenghi, N. Mainardi, and G. Pelosi, “Comparison-based attacks against noise-free fully homomorphic encryption schemes,” in International Conference on Information and Communications Security, pp. 177–191, Springer, 2018.

[118] L. Ducas and D. Micciancio, “Fhew: bootstrapping homomorphic encryption in less than a second,” in Annual International Conference on the Theory and Applications of Cryptographic Techniques, pp. 617–640, Springer, 2015.

[119] C. Gentry, S. Halevi, and N. P. Smart, “Better bootstrapping in fully homomorphic encryption,” in International Workshop on Public Key Cryptography, pp. 1–16, Springer, 2012.

[120] J. Fan and F. Vercauteren, “Somewhat practical fully homomorphic encryption.,” IACR Cryptol. ePrint Arch., vol. 2012, p. 144, 2012.

[121] Y. Wang and Q. M. Malluhi, “Privacy preserving computation in cloud using noise-free fully homomorphic encryption (fhe) schemes,” in European Symposium on Research in Computer Security, pp. 301–323, Springer, 2016.

[122] D. Liu, “Practical fully homomorphic encryption without noise reduction.,” IACR Cryptol. ePrint Arch., vol. 2015, p. 468, 2015.

[123] A. Kipnis and E. Hibshoosh, “Efficient methods for practical fully homomorphic symmetric-key encrypton, randomization and verification.,” IACR Cryptol. ePrint Arch., vol. 2012, p. 637, 2012.

[124] J. Li and L. Wang, “Noise-free symmetric fully homomorphic encryption based on noncommutative rings.,” IACR Cryptol. ePrint Arch., vol. 2015, p. 641, 2015.

[125] K. Nuida, “A simple framework for noise-free construction of fully homomorphic encryption from a special class of non-commutative groups.,” IACR Cryptol. ePrint Arch., vol. 2014, p. 97, 2014.

[126] H. Simon, “Models of man. 1957,” New York, 1957.

[127] O. Hart and J. Moore, “Incomplete contracts and renegotiation,” Econometrica: Journal of the Econometric Society, pp. 755–785, 1988.

[128] D. G. Baird, J. P. Dawson, W. B. Harvey, and S. D. Henderson, “Contracts: Cases and comment,” 2008.

[129] W. J. Baumol, “Williamson’s the economic institutions of capitalism,” 1986.

[130] S. Goldwasser, S. Micali, and C. Rackoff, “The knowledge complexity of interactive proof systems,” SIAM Journal on computing, vol. 18, no. 1, pp. 186–208, 1989.

[131] D. Hopwood, S. Bowe, T. Hornby, and N. Wilcox, “Zcash protocol specification,” GitHub: San Francisco, CA, USA, 2016.

[132] F. McKeen, I. Alexandrovich, I. Anati, D. Caspi, S. Johnson, R. Leslie-Hurd, and C. Rozas, “Intel® software guard extensions (intel® sgx) support for dynamic memory management inside an enclave,” in Proceedings of the Hardware and Architectural Support for Security and Privacy 2016, pp. 1–9, 2016.

[133] M. Li, Y. Zhang, Z. Lin, and Y. Solihin, “Exploiting unprotected i/o operations in amd’s secure encrypted virtualization,” in 28th USENIX Security Symposium (USENIX Security 19), pp. 1257–1272, 2019.

[134] S. Pinto and N. Santos, “Demystifying arm trustzone: A comprehensive survey,” ACM Computing Surveys (CSUR), vol. 51, no. 6, pp. 1–36, 2019.

[135] R. N. Watson, J. Woodruff, P. G. Neumann, S. W. Moore, J. Anderson, D. Chisnall, N. Dave, B. Davis, K. Gudka, B. Laurie, et al., “Cheri: A hybrid capability-system architecture for scalable software compartmentalization,” in 2015 IEEE Symposium on Security and Privacy, pp. 20–37, IEEE, 2015.

[136] J. Benaloh, “Dense probabilistic encryption,” in Proceedings of the workshop on selected areas of cryptography, pp. 120–128, 1994.

[137] T. ElGamal, “A public key cryptosystem and a signature scheme based on discrete logarithms,” IEEE transactions on information theory, vol. 31, no. 4, pp. 469–472, 1985.

[138] S. Goldwasser and S. Micali, “Probabilistic encryption & how to play mental poker keeping secret all partial information,” in Providing Sound Foundations for Cryptography: On the Work of Shafi Goldwasser and Silvio Micali, pp. 173–201, 2019.

[139] P. Paillier, “Public-key cryptosystems based on composite degree residuosity classes,” in International conference on the theory and applications of cryptographic techniques, pp. 223–238, Springer, 1999.

[140] R. Cramer, I. Damgård, and J. B. Nielsen, “Multiparty computation from threshold homomorphic encryption,” in International conference on the theory and applications of cryptographic techniques, pp. 280–300, Springer, 2001.

[141] C. Fontaine and F. Galand, “A survey of homomorphic encryption for nonspecialists,” EURASIP Journal on Information Security, vol. 2007, pp. 1–10, 2007.

[142] M. Hirt and K. Sako, “Efficient receipt-free voting based on homomorphic encryption,” in International Conference on the Theory and Applications of Cryptographic Techniques, pp. 539–556, Springer, 2000.

[143] I. Damgard, M. Geisler, and M. Kroigard, “Homomorphic encryption and secure comparison,” International Journal of Applied Cryptography, vol. 1, no. 1, pp. 22–31, 2008.

[144] H.-Y. Lin and W.-G. Tzeng, “An efficient solution to the millionaires’ problem based on homomorphic encryption,” in International Conference on Applied Cryptography and Network Security, pp. 456–466, Springer, 2005.

[145] M. Van Dijk, C. Gentry, S. Halevi, and V. Vaikuntanathan, “Fully homomorphic encryption over the integers,” in Annual International Conference on the Theory and Applications of Cryptographic Techniques, pp. 24–43, Springer, 2010.

[146] C. Gentry and S. Halevi, “Implementing gentry’s fully-homomorphic encryption scheme,” in Annual international conference on the theory and applications of cryptographic techniques, pp. 129–148, Springer, 2011.

[147] Z. Brakerski and V. Vaikuntanathan, “Efficient fully homomorphic encryption from (standard) lwe,” SIAM Journal on Computing, vol. 43, no. 2, pp. 831–871, 2014.

[148] Z. Brakerski, C. Gentry, and V. Vaikuntanathan, “(leveled) fully homomorphic encryption without bootstrapping,” ACM Transactions on Computation Theory (TOCT), vol. 6, no. 3, p. 13, 2014.

[149] N. P. Smart and F. Vercauteren, “Fully homomorphic encryption with relatively small key and ciphertext sizes,” in International Workshop on Public Key Cryptography, pp. 420–443, Springer, 2010.

[150] D. Stehlé and R. Steinfeld, “Faster fully homomorphic encryption,” in International Conference on the Theory and Application of Cryptology and Information Security, pp. 377–394, Springer, 2010.

[151] A. Acar, H. Aksu, A. S. Uluagac, and M. Conti, “A survey on homomorphic encryption schemes: Theory and implementation,” ACM Computing Surveys (CSUR), vol. 51, no. 4, p. 79, 2018.

[152] F. Armknecht, C. Boyd, C. Carr, K. Gjøsteen, A. Jäschke, C. A. Reuter, and M. Strand, “A guide to fully homomorphic encryption.,” IACR Cryptol. ePrint Arch., vol. 2015, p. 1192, 2015.

[153] N. Jacobson, “Basic algebra i. basic algebra,” 2009.

[154] P. M. Cohn, Classic algebra. Wiley, 2000.

[155] D. Hildenbrand, Introduction to geometric algebra computing. Chapman & Hall/CRC, 2018.

[156] D. Hildenbrand, “Foundations of geometric algebra computing,” in AIP Conference Proceedings, vol. 1479, pp. 27–30, American Institute of Physics, 2012.

[157] L. Dorst, D. Fontijne, and S. Mann, Geometric algebra for computer science: an object-oriented approach to geometry. Elsevier, 2010.

[158] W. E. Baylis, Clifford (Geometric) Algebras: with applications to physics, mathematics, and engineering. Springer Science & Business Media, 2012.

[159] S. Davidson, P. De Filippi, and J. Potts, “Blockchains and the economic institutions of capitalism,” Journal of Institutional Economics, vol. 14, no. 4, 2017.

[160] M. Sipser, Introduction to the Theory of Computation. Cengage learning, 2012.

[161] T.-Y. Chung, “Incomplete contracts, specific investments, and risk sharing,” The Review of Economic Studies, vol. 58, no. 5, pp. 1031–1042, 1991.

[162] W. P. Rogerson, “Contractual solutions to the hold-up problem,” The Review of Economic Studies, vol. 59, no. 4, pp. 777–793, 1992.

[163] P. Aghion, M. Dewatripont, and P. Rey, “Renegotiation design with unverifiable information,” Econometrica: Journal of the Econometric Society, pp. 257–282, 1994.

[164] G. Nöldeke and K. M. Schmidt, “Option contracts and renegotiation: a solution to the hold-up problem,” The RAND Journal of Economics, pp. 163–179, 1995.

[165] J. Landin, An introduction to algebraic structures. Courier Corporation, 2012.

[166] M. Josipovic, “Geometric multiplication of vectors,” 2019.

[167] A. Rosén, Geometric multivector analysis. Springer, 2019.

[168] C. Paar and J. Pelzl, Understanding cryptography: a textbook for students and practitioners. Springer Science & Business Media, 2009.

APPENDIX A

Setup And Configuration Guides

The prototype walkthrough in Chapter 5 was developed and run on a MacBook Pro with a 2.7 GHz quad-core Intel Core i7 processor and 16 GB of memory, running macOS Catalina version 10.15.7. Note that the same version of a tool or framework may behave differently on other operating systems (e.g., Microsoft Windows or Linux).

A.1 Prerequisites

This section lists trusted sources for installing the prerequisites needed to run our prototype.

A.1.1 Git

If Git is already installed, you can update it to the latest version. Otherwise, install it by following the steps described at: https://github.com/git-guides/install-git

A.1.2 Fabric Samples Repository

Open a new terminal window and go to a directory where the application files can be downloaded. Once there, clone (i.e., download) the fabric-samples repository with the command:

git clone "https://github.com/hanesbarbosa/fabric-samples"

A.1.3 Golang

To install the Go language, follow the instructions listed at: https://golang.org/dl/ Make sure to also carry out any instructions on environment variables or other configuration specific to your operating system.

A.1.4 Docker And Docker Compose

To install the Docker platform, download it and follow the steps for your operating system listed at: https://docs.docker.com/get-docker/
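Before moving on to the setup, it can help to confirm that the prerequisites above are reachable from the terminal. The following is a minimal sketch, not part of the prototype: the function name check_tools is illustrative, and the tool names (git, go, docker, docker-compose) are assumed to match the executables installed by the guides in this section.

```shell
#!/bin/sh
# Illustrative helper (not part of the prototype): reports whether each
# named tool can be found on the PATH.
# Returns non-zero if at least one tool is missing.
check_tools() {
    status=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            status=1
        fi
    done
    return "$status"
}

# Tool names are assumptions based on the prerequisites listed in A.1.
check_tools git go docker docker-compose \
    || echo "Install the missing tools before continuing."
```

The check only confirms that each executable is on the PATH; it does not verify versions or that the Docker daemon is running.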

A.2 Setup

Once the prerequisites are installed as described in A.1, you can prepare your environment for running the prototype.

Change the working directory to the chaincode-docker-devmode folder inside the fabric-samples repository by issuing the command:

cd ./fabric-samples/chaincode-docker-devmode

In the chaincode-docker-devmode folder, the docker-compose-simple.yaml file defines the configuration used when instantiating the containers. The Docker images defined in this file were previously compiled with the Hyperledger Fabric binaries and the phe-cli command, our CLI that uses the homomorphic encryption library. These containers behave like individual machines in a network, standing in for a user who interacts with a smart contract from a terminal with access to a blockchain network. The Docker containers representing a simple Fabric network can be downloaded and started with the command:

docker-compose -f docker-compose-simple.yaml up -d

The output of the docker-compose command in Figure A.1 shows that the containers were downloaded and set up.

Figure A.1: Docker compose command output

As soon as the containers are started, you can check that four of them are running, named cli, chaincode, peer and orderer. You can verify which containers are up by executing the command:

docker ps

The output for the command docker ps can be seen in figure A.2.

Figure A.2: Docker ps command output
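Comparing the docker ps output against the four expected names can also be done programmatically. The sketch below is an illustration under stated assumptions: the function name missing_containers and the EXPECTED variable are hypothetical, and the container names are taken from the list above. It relies on the --format option of docker ps to print only container names.

```shell
#!/bin/sh
# Containers the setup expects after `docker-compose ... up -d`.
EXPECTED="cli chaincode peer orderer"

# Illustrative helper: reads container names (one per line) on stdin and
# prints every expected name that is absent, one per line.
missing_containers() {
    running=$(cat)
    for name in $EXPECTED; do
        # -x matches the whole line, -F treats the name literally, -q is quiet
        printf '%s\n' "$running" | grep -qxF "$name" || printf '%s\n' "$name"
    done
}

# Only query Docker when the client is installed.
if command -v docker >/dev/null 2>&1; then
    missing=$(docker ps --format '{{.Names}}' 2>/dev/null | missing_containers)
    if [ -n "$missing" ]; then
        echo "Missing containers:" $missing
    else
        echo "All expected containers are running."
    fi
fi
```

If the Docker daemon is not running, docker ps prints no names and all four containers are reported as missing.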

If any of the aforementioned containers are missing, restart the process by stopping all running containers and starting over through the previous docker-compose command. You can stop all the containers by issuing:

docker-compose -f docker-compose-simple.yaml down --remove-orphans

If the error persists and not all the containers are running after stopping and restarting the Docker network, then remove the remaining Docker images and start the setup once again. To remove all the Docker images, you can use the command:

docker image rm -f $(docker image ls -q)

A.2.1 Preparing The Smart Contract

The chaincode will be started manually in the chaincode container. First, connect the terminal to the container using the following command:

docker exec -it chaincode sh

After that, from inside the container, change the working directory to the contract-tutorial directory and compile the smart contract along with its dependencies. The directories in the image are mapped to volumes on the host operating system on which the Docker containers run. Use the command below to execute the procedure:

cd contract-tutorial && go mod vendor && go build

Run the binary contract-tutorial by executing:

CORE_CHAINCODE_ID_NAME=mycc:0 CORE_PEER_TLS_ENABLED=false ./contract-tutorial -peer.address peer:7052

A.2.2 Installing And Instantiating The Smart Contract

Now, open another terminal window and connect to the cli container. Make sure that the working directory is still the fabric-samples/chaincode-docker-devmode directory. Connect to the Docker container with:

docker exec -it cli sh

This terminal, which is connected to the cli container, is the interface through which the user interacts with the smart contract. The image was also modified to include the phe-cli command. You can check its availability by executing:

which phe-cli

If phe-cli is installed, the which command prints the path to the executable. The next step is to install the chaincode with the command:

peer chaincode install -p chaincodedev/chaincode/contract-tutorial -n mycc -v 0

Now the chaincode can be instantiated. We instantiate it by passing empty arguments, without calling any specific function of the contract. Therefore, execute:

peer chaincode instantiate -n mycc -v 0 -c '{"Args":[]}' -C myc

Once these steps have completed successfully, the functions of the smart contract can be invoked.
