
DEGREE PROJECT IN TIDAB, FIRST LEVEL, STOCKHOLM, SWEDEN 2015

SecuRES: Secure Resource Sharing System

AN INVESTIGATION INTO USE OF PUBLIC LEDGER TECHNOLOGY TO CREATE DECENTRALIZED DIGITAL RESOURCE-SHARING SYSTEMS

DANIEL SVENSSON AND PHILIP LEUNG

KTH ROYAL INSTITUTE OF TECHNOLOGY

TRITA-ICT-EX-2015:157

www.kth.se

Abstract

The project aims at solving the problem of non-repudiation, integrity and confidentiality of data when digitally exchanging sensitive resources between parties that need to be able to trust each other without the need for a trusted third party. This is done in the framework of answering to what extent digital resources can be shared securely in a decentralized public ledger-based system compared to trust-based alternatives. A background of existing resource-sharing solutions is explored, which shows an abundance of third-party trust-based systems, but also an interest in public ledger solutions in the form of the Storj network, which uses such technology but focuses on storage rather than sharing. The proposed solution, called SecuRES, is a communication protocol based on public ledger technology which acts similarly to Bitcoin. A prototype based on the protocol has been implemented, which demonstrates the ability to share encrypted files with one or several recipients through a decentralized public ledger-based network. It was concluded that the SecuRES solution could do away with the requirement of trust in third parties for all but some optional operations using external authentication services. This is done while still maintaining data integrity to a similar or greater degree than trust-based solutions, and offers the additional benefits of non-repudiation, high confidentiality and high transparency, from the ability to make source code and protocol documentation openly available without endangering the system. Further research is needed to investigate whether the system can scale up for widespread adoption while maintaining security and reasonable performance requirements.

Keywords: Public ledger, Blockchain, Bitcoin, Non-repudiation, Trust, Secure transactions, Resource sharing, Decentralisation, Elliptic curve cryptography, Integrity, Confidentiality.

Sammanfattning

The project aims to solve the problems of non-repudiation, integrity and confidentiality when sharing sensitive data between parties that need to trust each other without the involvement of a trusted third party. This is discussed in order to answer to what extent digital resources can be shared securely in a decentralized system based on public ledgers, compared to existing trust-based alternatives. A survey of current resource-sharing solutions shows that many trust-based systems exist, but also a growing share of solutions based on public ledgers. One interesting solution highlighted is Storj, which uses such technology but focuses on resource storage rather than sharing. The project's proposed solution, called SecuRES, is a communication protocol based on a public ledger, similar to Bitcoin. A prototype based on the protocol has been developed which shows that it is possible to share encrypted files with one or several recipients through a decentralized network based on public ledgers. The conclusion drawn is that SecuRES manages without trusted third parties for sharing resources, while certain operations can be made more user-friendly through external authentication services. The solution itself guarantees data integrity and brings further benefits such as non-repudiation, confidentiality and high transparency, since the source code and protocol documentation can be made freely readable without endangering the system. Further research is needed to investigate whether the system can be scaled up for general use while still maintaining security and performance requirements.

Keywords: Public ledger, Blockchain, Bitcoin, Non-repudiation, Trust, Secure transactions, Resource sharing, Decentralisation, Elliptic curve cryptography, Integrity, Confidentiality.

Glossary

Bitcoin A collective term for the entire network, the currency and the technology behind it. 12

bitcoin The denomination of the currency. Bitcoins can be further divided into millibitcoins, etcetera, down to the smallest denomination, which is a satoshi. 11, 12, 50

block A collection of transactions. Part of the blockchain, which constitutes the public ledger containing all the verified transactions in the network. 12

blockchain The public ledger that contains all the verified transactions in the network. 12

business logic The model of a system where calculations and manipulation of data occur. 39, 40

confidentiality Confidentiality means that only those that are supposed to be able to read something are able to do so. 31

DAO Data Access Object. 67

DMZ Demilitarized Zone. 27

DTO Data Transfer Object, a data container without logic. 44, 67

ECC Elliptic Curve Cryptography. 13, 32, 33, 68

Git A distributed version control system. 47

GUI Graphical User Interface. 69

IDS Intrusion Detection System. 27

JPA Java Persistence API. 44, 45, 67

JUnit Testing framework for Java. 44

JVM Java Virtual Machine, the virtual machine constituting the environment wherein Java code executes. 44

LaTeX Typesetting system. 47

markdown Markup language for typesetting text with the possibility to export to many different formats. 47

MySQL Popular relational database. 68

non-functional requirements Requirements that describe not what feature to implement but rather how it should perform. 46

OpenPGP Encryption standard for email. 28

P2P Peer-To-Peer. 17

POM Project Object Model. 45

public interface The accessors controlling access to entities within a class. Entities prefixed by modifiers such as public and protected are part of this interface. 39

RIPEMD160 RACE Integrity Primitives Evaluation Message Digest, which produces 160-bit output. 34

RSA Public-key encryption algorithm. 32, 33, 66–68

Scrum Iterative agile software development methodology. 8, 45, 46

secp256k1 Standard defining a specific elliptic curve and mathematical constants. 32, 33

SecuRES Secure Resource sharing protocol. 8, 49–56, 66–68, 71–73, 76–79

SHA256 Secure Hashing Algorithm that produces 256-bit output. 19, 21, 34

SPV Simple Payment Verification. Does not depend on the entire blockchain. 17, 18, 23, 65, 78

UML Unified Modeling Language. 41

UTXO Unspent Transaction Output. 15–18

Preface

The authors would like to thank:

• Sead Muftic, our examiner, for giving us this opportunity

• Nazri Abdullah for acting as discussion partner for our designs

• Christian Gotare, our corporate supervisor, for his experienced insights

• Anders Sjögren, our head of programme, for always taking the time to answer our questions

• Our loved ones for their patience with us while we were absent during this project

Contents

1 Introduction 1
1.1 Background ...... 1
1.1.1 Sharing a sensitive Contract ...... 1
1.1.2 Secure Email ...... 2
1.1.3 Cloud Storage ...... 2
1.1.4 Public Ledgers ...... 2
1.2 Problem ...... 2
1.3 Purpose ...... 4
1.4 Goal ...... 5
1.4.1 Expected Deliverables ...... 5
1.4.2 Benefits ...... 6
1.4.3 Ethics ...... 6
1.4.4 Sustainability ...... 7
1.5 Methodology Overview ...... 7
1.5.1 Feasibility Phase ...... 7
1.5.2 Design and Implementation Phases ...... 7
1.6 Delimitations ...... 8
1.7 Outline ...... 8
1.8 Contributions ...... 9

2 Theoretical Background 11
2.1 Bitcoin ...... 11
2.1.1 Overview, [1, p.1-2] ...... 11
2.1.2 Transactions ...... 13
2.1.3 Decentralized Peer-To-Peer Network ...... 17
2.1.4 The Blockchain ...... 19
2.1.5 Alternative Chains, Currencies and Applications ...... 22
2.1.6 Conclusion ...... 23
2.2 Storj ...... 24
2.2.1 Storage ...... 24
2.2.2 Heartbeats ...... 24
2.2.3 Implementation ...... 24
2.2.4 Sharing ...... 25
2.2.5 Ownership Verification ...... 25
2.2.6 Conclusion ...... 25
2.3 Dropbox ...... 25
2.3.1 Product Features ...... 25
2.3.2 Architecture ...... 26
2.3.3 Reliability ...... 26
2.3.4 Security ...... 27
2.3.5 Conclusion ...... 27
2.4 Secure Email ...... 28
2.4.1 Confidentiality ...... 28
2.4.2 Authentication ...... 29
2.4.3 Conclusion ...... 29
2.5 Git ...... 29
2.5.1 Snapshots ...... 29
2.5.2 Branching ...... 30
2.5.3 Conclusion ...... 30
2.6 Cryptography ...... 30
2.6.1 Symmetric Cryptography ...... 31
2.6.2 Asymmetric Cryptography ...... 31
2.6.3 Hashing ...... 33
2.6.4 Digital Signatures ...... 34
2.6.5 Public-Key Certificates ...... 34
2.6.6 Digital Envelopes ...... 34
2.6.7 Conclusion ...... 35
2.7 Summary ...... 35

3 Methodology 37
3.1 Literature Study ...... 37
3.2 Development ...... 38
3.2.1 Analysis ...... 38
3.2.2 Design ...... 38
3.2.3 Coding ...... 42
3.2.4 Implementation ...... 44
3.2.5 Development Methodology ...... 45
3.3 Project management ...... 46
3.3.1 Project ...... 46
3.3.2 Documentation ...... 47
3.3.3 Collaboration ...... 47

4 Solution 49
4.1 Requirements ...... 49
4.1.1 Functional Requirements ...... 49
4.1.2 Security Requirements ...... 50
4.2 Network ...... 50
4.3 Concepts ...... 50
4.3.1 Sharing a File ...... 50
4.3.2 Confidentiality and Integrity ...... 51
4.3.3 File Slice ...... 51
4.3.4 File Crumb ...... 51
4.3.5 Updating a File ...... 51
4.3.6 File Management ...... 52
4.3.7 Slice Verification ...... 52
4.3.8 Access Permissions ...... 53
4.3.9 Branching ...... 53
4.3.10 Splitting ...... 53
4.3.11 Joining ...... 54
4.3.12 Double Spending ...... 54
4.3.13 Transaction Verification ...... 54
4.3.14 Transactions ...... 54
4.3.15 Blockchain ...... 55
4.4 Protocol ...... 56
4.4.1 filedesc ...... 57
4.4.2 singletx ...... 58
4.4.3 putslice ...... 62
4.4.4 slice ...... 62
4.4.5 slicesuperv ...... 64
4.4.6 verifyslice ...... 64
4.4.7 slicestat ...... 65
4.5 Implementation ...... 65
4.5.1 Message-handling ...... 66
4.5.2 Blockchain ...... 66
4.5.3 System ...... 66
4.5.4 Client ...... 66
4.5.5 Mining ...... 67
4.5.6 Utility ...... 67
4.5.7 Peer ...... 67
4.5.8 Storage ...... 67
4.5.9 Integration ...... 67
4.5.10 Not Implemented ...... 68
4.6 Prototype ...... 68
4.6.1 Demo ...... 69

5 Solution Evaluation 71
5.1 Requirements Fulfilment ...... 71
5.1.1 Functional Requirements Fulfilment ...... 71
5.1.2 Security Requirements Fulfilment ...... 72
5.2 Security ...... 72
5.2.1 Potential Security Threats ...... 73
5.3 Performance and Scalability ...... 76

6 Conclusions 77
6.1 Evaluation of SecuRES ...... 77
6.2 Evaluation of Methodology ...... 78
6.3 Future work ...... 79

Bibliography 81

Appendices 83

A Planned demo scenarios 85
A.1 Bitcoin ...... 89
A.1.1 Fundamental Concepts ...... 89

B Bitcoin Appendices 91
B.1 Transaction Verification ...... 91

C SecuRES Protocol 93
C.1 Standards and Concepts ...... 93
C.2 Common Data Structures ...... 96
C.3 Messages ...... 103

D Logic Database Model 113

E Deployment 115

Chapter 1

Introduction

With the recent rise in interest in decentralized cryptocurrencies, following the success of Bitcoin and its derivatives based on an underlying public ledger technology, there have been increased efforts to find alternative ways of deriving value from such designs [10]. This report is inspired by the premise that the advantages of this technology can bring new value to digital resource-sharing systems, and aims to investigate the validity of such a solution while attempting to create a proof of concept implementation.

1.1 Background

This project has been commissioned as part of a larger effort to find new ways to share digital documents and will focus on the use of public ledger technology for that purpose.

There are many possible scenarios and settings in which the need for sharing files arises. The following subsections each describe one such possible scenario.

1.1.1 Sharing a sensitive Contract

An example of the current practice of secure document sharing is a sensitive contract that needs to be signed and shared by two companies residing in different parts of the world. A common approach would be to meet in person and sign paper copies of said contract. The exchange could possibly be mediated by a trusted third party responsible for delivering the contract back and forth amongst the involved parties; such a third party could be Dropbox, an FTP server, some other file sharing service, or possibly physical couriers. The essential concepts here are those of data integrity, confidentiality and non-repudiation.

1 CHAPTER 1. INTRODUCTION

1.1.2 Secure Email

One of the existing solutions for exchanging the aforementioned sensitive contract is to send secure emails cryptographically enveloped using the recipients’ public keys. Secure email ensures that integrity and confidentiality are enforced between sender and recipient, so that only they have access to the transmitted data.
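The enveloping step described above can be sketched with the standard Java cryptography API: a fresh symmetric session key encrypts the message, and the recipient's public key encrypts only that session key. This is a minimal illustration of the digital-envelope idea, not the OpenPGP packet format, and the key sizes and cipher modes chosen here are assumptions for the sake of brevity (a real system would use an authenticated mode such as AES-GCM).

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.SecretKeySpec;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.util.Arrays;

public class EnvelopeDemo {
    public static void main(String[] args) throws Exception {
        // Recipient's key pair (in practice the public key would come from a certificate).
        KeyPairGenerator kpg = KeyPairGenerator.getInstance("RSA");
        kpg.initialize(2048);
        KeyPair recipient = kpg.generateKeyPair();

        byte[] message = "sensitive contract".getBytes("UTF-8");

        // 1. Encrypt the message with a fresh symmetric session key.
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        SecretKey sessionKey = kg.generateKey();
        Cipher aes = Cipher.getInstance("AES/ECB/PKCS5Padding"); // demo only; use GCM in practice
        aes.init(Cipher.ENCRYPT_MODE, sessionKey);
        byte[] ciphertext = aes.doFinal(message);

        // 2. Encrypt (wrap) the session key with the recipient's public key.
        Cipher rsa = Cipher.getInstance("RSA/ECB/PKCS1Padding");
        rsa.init(Cipher.ENCRYPT_MODE, recipient.getPublic());
        byte[] wrappedKey = rsa.doFinal(sessionKey.getEncoded());

        // Recipient side: unwrap the session key, then decrypt the message.
        rsa.init(Cipher.DECRYPT_MODE, recipient.getPrivate());
        SecretKey unwrapped = new SecretKeySpec(rsa.doFinal(wrappedKey), "AES");
        aes.init(Cipher.DECRYPT_MODE, unwrapped);
        byte[] plaintext = aes.doFinal(ciphertext);

        System.out.println(Arrays.equals(message, plaintext)); // prints "true"
    }
}
```

Only the holder of the recipient's private key can recover the session key, which is what gives the envelope its confidentiality between sender and recipient.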

1.1.3 Cloud Storage

Another possible scenario is that of sharing files among friends or co-workers. This is usually achieved by using, once again, third party services, such as Dropbox, that store data in the cloud for easy access. People are invited to access resources uploaded by others in a swift and efficient manner, and data is encrypted in transit to ensure confidentiality between clients and servers.

1.1.4 Public Ledgers

A common denominator in the aforementioned scenarios is that they often require trust in third party entities and that the services themselves often rely on centralized servers. Public ledger-based systems are mainly used as a means to remove this need for trust and to distribute the responsibility for maintaining a service across an entire network, instead of focusing that effort on central servers. The public ledger itself allows for non-repudiation, which means that events occurring in the system cannot be denied, as a common consensus is formed around the state of the ledger.

Perhaps the most well-known system making use of a public ledger is the decentralized digital currency known as Bitcoin. This system enables financial transactions without the need for a bank by using a public ledger distributed across a peer-to-peer network, in combination with public-key cryptography. All transactions occurring in the Bitcoin network are registered in a blockchain, the state of which is eventually agreed upon by all nodes in the network through independent validation.

The concept of using a public ledger has given rise to many new ideas, and developers are working to enable file sharing, voting systems, secure email transactions etcetera using such technology [33]. This project will focus on how a file sharing system can be implemented using the decentralized consensus system that the blockchain provides.
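The tamper evidence that underlies such independent validation can be illustrated in a few lines: each block stores a hash committing to its predecessor, so altering any historical entry breaks every later link, and any node can detect this by recomputing the chain. This is an illustrative sketch (Java 16+ for `record`), not the actual Bitcoin block format.

```java
import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.List;

public class LedgerDemo {
    // Hex-encoded SHA-256 of a string.
    static String sha256(String s) throws Exception {
        byte[] d = MessageDigest.getInstance("SHA-256").digest(s.getBytes("UTF-8"));
        StringBuilder hex = new StringBuilder();
        for (byte b : d) hex.append(String.format("%02x", b));
        return hex.toString();
    }

    // A block commits to the previous block via prevHash.
    record Block(String prevHash, String data) {}

    public static void main(String[] args) throws Exception {
        List<Block> chain = new ArrayList<>();
        chain.add(new Block("0".repeat(64), "genesis"));
        for (String tx : new String[] {"Alice pays Bob", "Bob pays Carol"}) {
            Block prev = chain.get(chain.size() - 1);
            chain.add(new Block(sha256(prev.prevHash() + prev.data()), tx));
        }

        // Independent validation: recompute and check every link.
        boolean valid = true;
        for (int i = 1; i < chain.size(); i++) {
            Block prev = chain.get(i - 1);
            valid &= chain.get(i).prevHash().equals(sha256(prev.prevHash() + prev.data()));
        }
        System.out.println(valid); // prints "true"
    }
}
```

Changing the data of any early block would change its hash, so every subsequent `prevHash` check would fail, which is why nodes can agree on history without trusting any single data provider.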

1.2 Problem

When analysing the first scenario, described in section 1.1.1, several problems can be identified:

1) The parties need to physically meet, at potentially substantial financial cost.


2) The parties may have to trust a third party, as well as the security of the communication channels carrying this sensitive information, which raises the question of how to ensure confidentiality.

3) One of the parties might claim never to have received the contract after the exchange. Since there is no non-repudiation, this could lead to costly legal processes, hampering future cooperation between the parties.

4) Physical objects also have a tendency to get lost over time. If both parties lose their copies of a contract, there is no concrete basis to support references to the agreement.

The potential solution to some of the problems above, mentioned in section 1.1.2, also poses several challenges:

1) Secure email limits the lifespan of the data for each user, as the sender no longer has access to it after it has been encrypted and sent, unless a copy is also sent back to the sender.

2) Each recipient is responsible for storing the information over time, which means that availability might become an issue.

3) There is no definitive confirmation that each party has received a copy of the mail.

4) Most often, secure email also requires third party trust, since it is common for users to rely on external rather than private mail services.

Finally, the scenario described in section 1.1.3, using third party services for sharing files, also comes with its fair share of issues:

1) Most notably, this scenario requires extensive trust in third parties: trust that providers will continue to offer the service, deliver content as expected, not hold resources hostage, prevent government seizure, etcetera.

2) Confidentiality is not ensured. Data in transit between clients and servers is encrypted, but the third party has clear access to the content, which raises privacy issues.

3) Integrity is not ensured. There is nothing preventing the third party from subtly modifying data when serving it to recipients.

4) The central servers used by the service provider represent single points of failure that disrupt the service if attacked and brought down. Redundant servers help alleviate this problem, but only to a certain extent.

It is possible to ensure confidentiality and integrity by encrypting data prior to upload to the service providers’ servers. However, this requires an additional step in the process and is the responsibility of each user, who might not be knowledgeable enough to enforce this. Furthermore, there is no way to remove the use of central servers or third party trust. Users are at the mercy of the service providers in this aspect.

As has been seen in recent high-profile hacker attacks on popular services and organizations [24], personal data stored on servers used by such providers is constantly at risk of being stolen or destroyed. There is also the possibility of forced government seizure or back doors that could threaten privacy. This means that third party trust should by no means be taken lightly, nor the fact that all data is stored on central servers.

If one removes the need for third party trust, several new problems arise that need to be addressed. First of all there is the issue of availability. Dropbox, for instance, guarantees a certain degree of service availability by implementing backups and redundancy in its systems, see section 2.3 for further details. In a decentralized system this is not the case, since there are no central servers to query for data; the responsibility for availability instead falls on the users of the system. This means that users have to supply file storage space as well as serve metadata describing events in the network.

Another issue is how to introduce trust. For a system to work there must be some form of trust, since future actions build on previous actions, which have to be validated. The validity of such actions can only be ensured by trusting data supplied by the decentralized system. However, trust must come from independent validation of data and not from trusting individual data providers. Furthermore, users of the system must be able to form a common consensus around which previous actions have occurred. It is not enough to verify actions independently; everyone has to agree on the history, otherwise they will all form separate futures.

Finally, the implementation itself would have to be open source to allow users to verify the functionality of the program rather than blindly trust a supplier.
Several issues with current ways of sharing files have been outlined and the question is whether or not they can be solved by using a decentralized consensus mechanism combined with cryptography. That is:

To what extent can digital resources be shared securely in a decentralized public ledger-based system compared to trust-based alternatives?

1.3 Purpose

The purpose of this project is to examine and attempt to offer a solution to the issues and problem stated in section 1.2, by using a decentralized consensus system in the form of a blockchain, combined with cryptography. By evaluating and combining ideas from different technologies and concepts related to file sharing, the project aims to explore the possibilities for file sharing within the confines of such a decentralized system.


The project will evaluate whether or not it is possible to ensure confidentiality when sharing files, as well as whether or not non-repudiation can be incorporated when sharing sensitive documents, such as contracts. For the scope of the project, the purpose is not to provide a complete implementation, but rather a proof of concept that can be extended in the future to solve the identified issues related to file sharing.

1.4 Goal

By the end of the project the goal is to present a feasibility evaluation of, and create a rudimentary proof of concept for, a resource-sharing system based on public ledger technology that can handle common point-A-to-B file-sharing operations while enforcing the requirements of non-repudiation, data integrity, authenticity and confidentiality, without relying on trust in third parties, in relation to the problem statement in section 1.2.

1.4.1 Expected Deliverables

In order to reach the specified goals the following deliverables are expected:

Protocol A concept and protocol capable of sharing files while dealing with issues of confidentiality, integrity, authenticity and non-repudiation, and that does not require trust in a third party. The protocol itself is expected to combine the concept of a decentralised consensus system delivered over a peer-to-peer network, cryptography, and concepts from other solutions for sharing files such as Dropbox, Git and secure email. A blockchain similar to that in Bitcoin is expected to be an integral part of the protocol. This blockchain is also expected to be distributed across the entire network and independently validated by all peers.

Prototype A rudimentary proof-of-concept implementation of a file sharing platform based on the aforementioned protocol. This prototype will not be a complete system, but rather an implementation of key features of such a platform.

Report An investigation into the feasibility of using public ledger technology to solve the problem of non-repudiation, integrity and confidentiality of data when sharing and exchanging sensitive resources digitally between parties that need to be able to trust each other without the need for a trusted third party.


1.4.2 Benefits

The beneficiaries of this technology can potentially include any organizations, companies or individuals with a need to exchange critical information in a way where security, non-repudiation, confidentiality, integrity and the persistent storage of data are of importance. Additionally, the fact that the service can potentially be set up and used by anybody means that there is no central point of attack, and civil rights groups which are under pressure from big organizations or oppressive states can operate without fear of having the service completely shut down.

1.4.3 Ethics

Providing a secure file-sharing service comes with a number of ethical responsibilities and potential issues. Simply marketing the system as secure creates an implicit expectation of integrity and stability, which may cause users to share information over it which they would not have shared over systems that had made no such claims of security. Similarly, the fact that the proposed system can be used as a means of persistent storage means that users may trust the system enough not to bother with keeping backups of uploaded documents, leading to files being permanently destroyed should the system fail. Importantly, the currently proposed storage solution makes no guarantees that a file will always be accessible, though it does attempt to maintain a certain redundancy of storage while taking steps to ensure that each node can deliver what it claims to be able to.

On the opposite side of the issue, the distributed and peer-based nature of the system makes it impossible to guarantee that files, once out in the network, can be completely removed. Though several steps have been taken to protect uploaded files, this becomes a big issue if these security levels are ever breached.

A final issue relating to trust is the claim of providing non-repudiation of transactions. This comes as a benefit from the use of the public ledger, but if some loophole is found in non-repudiation, unbeknownst to the majority of users, a malicious party may use the chain as a way to lie about their own transactions or to falsely accuse others of wrongdoing.

All these factors vastly increase the responsibility to create a system that delivers all that is expected of it, clearly presents users with information to bring user expectations in line with reality, and ensures that potential risks are handled properly.
Even a perfect service which performs all that it is supposed to do has the problem that users are able to share illicit material at very low risk, due to the level of anonymity provided. This is an issue which is difficult to handle without compromising quality in other parts of the system.


1.4.4 Sustainability

Since the public ledger records all transactions performed in the network, it is expected to grow very large over time. A prime example of this is the Bitcoin blockchain, which currently takes up over 30 GB of storage space [7]. This could potentially end up being a sustainability issue if the benefits of using the service become less than the value of the storage space sacrificed for it.

As for increases in the number of nodes and users in the network, the decentralized nature of the system should make it less susceptible to bottlenecks compared to traditional server-based secure resource-sharing systems. Additionally, the open nature of the protocol makes it possible for users to come to a consensus on decisions requiring a unanimously adopted change in the system, called "hard forks", which would then allow the majority to adapt the system to make it more sustainable for the future.

In terms of environmental sustainability, the processing power needed to secure the blockchain and prevent fraudulent behaviour could be considered wasted power, in that the computations are used only as proof-of-work. A way of minimizing the waste would be to perform calculations with scientific or societal value, such as finding large prime numbers, as the proof-of-work, but if such a problem is not random enough the integrity of the chain may suffer, and lower-powered miners may not have a fair shot at mining a block. Other popular solutions considered by the cryptocurrency community include the use of proof-of-stake or proof-of-burn, which would involve connecting personal investment or monetary value to the system, but as it would be quite impossible to accurately assess the value of transactions in the network, it makes little sense to use such a system here.

1.5 Methodology Overview

This section will briefly introduce the methodology used in the different phases of the project.

1.5.1 Feasibility Phase

This was the initial phase of the project, in which a literature study was performed to summarize relevant information about different technologies and solutions currently implemented in the problem domain. Such a literature study would serve to provide the foundation on which to build the protocol that is expected to be one of the deliverables of the project.

1.5.2 Design and Implementation Phases

During these phases of the project the protocol created during the design phase was implemented in code, resulting in a prototype platform as a proof of concept.


An agile work methodology was applied to ensure incremental and rapid development; parts of Scrum were adapted into the project development methodology. For designing the implementation, principles from object-oriented design were applied. To ensure robustness in the code, unit testing was performed to the extent possible.

1.6 Delimitations

There are many active areas of research regarding blockchain technology; however, this project will only focus on the suitability of such technology as a file sharing platform. The project investigates the concepts behind Bitcoin and whether or not they can be applied to sharing files. However, the economic aspects of Bitcoin will not be investigated.

In Bitcoin, the concepts of mining and proof-of-work, see section 2.1.4, are what help secure the blockchain in this decentralised consensus system. Mining requires processing power, which is expensive. That is the reason why an incentive is necessary to motivate users of the system to participate in the mining process; in the case of Bitcoin the solution is a financial reward in the form of bitcoins. This thesis will not investigate the aspects of such an incentive mechanism in any great detail. Rather, the thesis will focus on other aspects of the underlying concepts behind Bitcoin.

As the suggested 10-week duration for completing a bachelor’s thesis project does not lend itself to the rigorous testing, quality assurance and professional scrutiny needed to create a suitably secure system for handling confidential information, the delivered code should be treated as a proof of concept which should not be used with any presumption of security. Similarly, the time limit does not allow for thorough performance optimization of the code, and as such the delivered system is not expected to be able to handle the effective loads which it may be put under during intended use. Any analysis regarding this aspect of the project may thus prove less accurate.

1.7 Outline

Chapter 2: Theoretical Background Gives a background of current technologies for sharing files, as well as a detailed description of Bitcoin for the rest of the report to build upon.

Chapter 3: Methodology Presents the methodology used in the research, design and implementation of the system.

Chapter 4: Solution Presents the solution which is the SecuRES protocol and prototype implementation.


Chapter 5: Solution Evaluation This chapter evaluates the provided solution.

Chapter 6: Conclusions Summarizes and discusses the findings in the report.

1.8 Contributions

Though there has been significant overlap, the main contributions of each author are generally as follows.

Philip Leung Protocol design, network and peer handling, blockchain implementation, mining, message expectation handling, handling logic triggered by messages, report

Daniel Svensson Protocol design, client functionality, encryption, protocol implementation, literature study, database design and implementation, integration layer, setting up working environment, report


Chapter 2

Theoretical Background

The purpose of this chapter is to give the reader a thorough background on the diversity of current technologies for sharing files. The underlying concepts of Bitcoin that are relevant to this project will also be outlined; their structure and meaning will be explained, as they serve as potential building blocks for the solution provided by this project. Each section is followed by a short summary explaining what parts might be relevant to this project.

2.1 Bitcoin

This section focuses on Bitcoin and describes this system in greater detail than any other technology in this chapter. The reason for this is that the concepts behind Bitcoin are the main focus of the report.

2.1.1 Overview, [1, p.1-2]

Bitcoin is a system that provides an ecosystem for bitcoins, the digital currency of the same name. It was invented by someone, or a group of people, using the pseudonym Satoshi Nakamoto and was first launched in 2009. Using the Bitcoin network it is possible to transfer value, represented digitally by bitcoins, between members of the network.

Users communicate using an open source Bitcoin protocol which can be run on multiple platforms, ranging from smartphones to laptops, allowing for great accessibility. Transactions are issued that transfer value from one address to another, much in the same way that money in other currencies moves from one account to another. In fact, what is possible with other currencies is generally also possible with bitcoin. One major difference between bitcoins and traditional currencies is the fact that bitcoin is entirely virtual and has no physical counterpart. “Coins” themselves are implied in the value transferred in transactions and move from sender to recipient.


As mentioned earlier, these transactions are sent to different addresses, which are hashes of keys; keys uniquely identify users and allow them to prove ownership of outputs in transactions and thereby value in the network. These keys are often stored in the equivalent of wallets on each user’s computer and they are the only thing necessary to unlock transactions. Bitcoin is a distributed peer-to-peer system, which means that there is no central server validating and approving transactions. Peers in the network perform this validation independently and achieve a common consensus by using a public ledger where all transactions are logged. This public ledger is distributed throughout the entire network and contains all information about performed transactions, thus alleviating the need for a trusted third party, such as a bank. Currency issuance is dictated by the network itself and is set to increase the number of available bitcoins at a steady pace to a final limit of 21 million coins in total by the year 2140. This makes it impossible to inflate the currency by issuing new money beyond the expected rate. New bitcoins are generated through a process called mining, which involves competing to solve mathematical problems while verifying newly issued transactions. Any user on the network can participate in this mining process, and approximately every 10 minutes new coins are issued and awarded to those that helped solve the problems. The mining process is regulated across the network by built-in algorithms. At its core, the problem to be solved consists of hashing data to give a value below a certain threshold, where a lower threshold corresponds to higher difficulty. The data being hashed are new transactions in the network along with meta-information, and the process results in a block: a collection of all the transactions that were hashed in the process.
The block is then appended to the blockchain, which constitutes the public ledger of the network. The hash of the block header constitutes the proof-of-work, showing that a certain amount of effort has been put into solving the mathematical problem. The solution proposed by Satoshi Nakamoto provides a practical way to solve the Byzantine Generals’ Problem, a well-known problem within distributed computing: that of exchanging information, on which decisions need to be made, over an unreliable and possibly compromised network. The idea is that the network can agree on a general consensus based on the proof-of-work performed by miner nodes in the network. For this project, what is interesting is not the currency itself but the underlying technology that makes all of the aforementioned possible and provides a solution to the Byzantine Generals’ Problem.

The Bitcoin System

What constitutes the inner workings of Bitcoin can be summarized in the following list [1, p.3]:


• Decentralized peer-to-peer network (the bitcoin protocol).

• A public ledger (the blockchain).

• Decentralized mathematical and deterministic currency issuance (the distributed mining).

• Decentralized transaction verification system.

The rest of this section describes this list and the concepts mentioned in further detail.

2.1.2 Transactions

This section discusses some of the concepts surrounding transactions in the bitcoin network. For those who are not familiar with bitcoin concepts it might be beneficial to have a look at appendix A.1.1. Before discussing transactions directly it is first necessary to cover some other concepts.

Keys

Bitcoin uses cryptographic keys as the basis for transaction addresses. Bitcoin makes use of ECC, described in more detail in section 2.6, along with other cryptographic concepts. Keys are generated and stored in a user’s private wallet and are all part of a cryptographic key pair, where each pair comprises a public and a private key, both unique. The purpose of the public key is generally to provide a target address for transactions as well as to verify signatures created by the corresponding private key. The private key serves another purpose: each transaction is signed by the private key of its creator, and that signature is supplied with the transaction along with the corresponding public key. In this way everyone in the network can verify that the transaction has not been tampered with and that the owner of the transaction has the right to spend the bitcoins being spent in it. As the names suggest, the public key may be openly shared with others but the private key should be kept private at all times. If someone gets hold of both keys in a pair they will have access to all transactions currently linked to that key pair and thereby all the bitcoins they represent, [1, p.61].

Addresses

Addresses represent recipients in bitcoin transactions and usually represent the owner of a public/private key pair, but they can be something else entirely, [1, p.70].


If the address represents a key pair, it is derived by hashing the public key in that pair. In section 2.6 hashing is described in further detail.
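As an illustration of the encoding step that follows the hashing: a bitcoin address is additionally encoded with Base58Check, which appends a double-SHA256 checksum and converts to a compact alphabet. The following Python sketch (names are illustrative, and the RIPEMD-160 hashing step described above is assumed to have already been applied to the payload) shows the idea:

```python
import hashlib

# Bitcoin's Base58 alphabet (no 0, O, I, l, to avoid visual ambiguity)
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def base58check_encode(version: bytes, payload: bytes) -> str:
    """Append a 4-byte double-SHA256 checksum, then encode in Base58."""
    raw = version + payload
    raw += hashlib.sha256(hashlib.sha256(raw).digest()).digest()[:4]
    num = int.from_bytes(raw, "big")
    digits = ""
    while num > 0:
        num, rem = divmod(num, 58)
        digits = ALPHABET[rem] + digits
    # Each leading zero byte is represented by the first alphabet character
    pad = len(raw) - len(raw.lstrip(b"\x00"))
    return "1" * pad + digits

def base58check_decode(text: str) -> bytes:
    """Invert the encoding and verify the checksum."""
    num = 0
    for char in text:
        num = num * 58 + ALPHABET.index(char)
    pad = len(text) - len(text.lstrip("1"))
    raw = b"\x00" * pad + num.to_bytes((num.bit_length() + 7) // 8, "big")
    body, checksum = raw[:-4], raw[-4:]
    assert hashlib.sha256(hashlib.sha256(body).digest()).digest()[:4] == checksum
    return body
```

The checksum means that a mistyped address is rejected rather than silently sending funds to a non-existent recipient.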

Double Spending

Double spending is the act of spending the same bitcoins multiple times, even though the first transaction should already have transferred ownership to the recipient. As part of the validation process for a transaction it is verified that no double spending occurs.

Transaction Details

A transaction message in the bitcoin protocol contains all the information necessary for transferring money from one party to another. Assuming that the message is well-formed and that the creator can prove ownership of the value he or she is transferring, it is propagated throughout the bitcoin network. It does not matter how a transaction reaches the bitcoin network as, once in the network, it will be verified and rejected if not valid. This means that it can be transferred in any way imaginable as long as it ultimately reaches the bitcoin network for verification, propagation and inclusion in the blockchain, [1, p.110]. Nodes independently validate transactions and propagate them to the rest of the network if they are valid; otherwise they are rejected. The validation process is described in detail in appendix B.1.

Transaction Structure

The transaction itself is a data structure that consists of several fields of data describing how value is transferred from a source of funds (inputs) to destinations (outputs).

Table 2.1. Transaction structure

Size (byte)   Type             Description
4             Version          Specifies which rules this transaction follows
1-9           Input Counter    Number of included inputs
Variable      Inputs           One or more transaction inputs
1-9           Output Counter   Number of included outputs
Variable      Outputs          One or more transaction outputs
4             Locktime         A Unix timestamp or block number

Table 2.1 shows the structure of a transaction message from the bitcoin protocol. As seen, there are inputs and outputs as well as a locktime. The locktime specifies when the transaction may be included in the blockchain: either after a certain timestamp or at a specific block height, [1, p.111].
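The 1–9 byte counter fields in table 2.1 are variable-length integers, known as CompactSize in the bitcoin protocol: values below 0xfd fit in one byte, while larger values are prefixed with a marker byte followed by 2, 4 or 8 little-endian bytes. A minimal Python sketch of this encoding:

```python
def encode_varint(n: int) -> bytes:
    """Encode a bitcoin CompactSize integer (1 to 9 bytes)."""
    if n < 0xfd:
        return n.to_bytes(1, "little")
    if n <= 0xffff:
        return b"\xfd" + n.to_bytes(2, "little")
    if n <= 0xffffffff:
        return b"\xfe" + n.to_bytes(4, "little")
    return b"\xff" + n.to_bytes(8, "little")

def decode_varint(data: bytes) -> tuple[int, int]:
    """Return (value, number of bytes consumed)."""
    prefix = data[0]
    if prefix < 0xfd:
        return prefix, 1
    size = {0xfd: 2, 0xfe: 4, 0xff: 8}[prefix]
    return int.from_bytes(data[1:1 + size], "little"), 1 + size
```

The variable width keeps typical transactions small, since most transactions have only a handful of inputs and outputs.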


Table 2.2. Transaction output Structure

Size (byte)   Type                  Description
8             Amount                Bitcoin value in satoshis
1-9           Locking-Script Size   Locking-Script length in bytes, to follow
Variable      Locking-Script        A script defining the conditions needed to spend the output

Transaction Outputs

All outputs in a transaction are classified as unspent transaction outputs, UTXO, until they are referenced by inputs in another transaction. A UTXO is an indivisible amount of bitcoin value that only the correct key can unlock and spend. If the amount in the UTXO is greater than what should be transferred, the difference is returned to the creator of the transaction. This means that value moves from owner to owner via inputs and outputs through a chain of transactions. An exception to this chain is coinbase transactions, which have no input and are created during the mining process. These transactions are what creates new value in the bitcoin network; see section 2.1.4 on mining. The total balance for a bitcoin user is not stored in one place, but exists as UTXO locked to the keys currently in that user’s possession. The structure of transaction outputs is detailed in table 2.2. The locking script, or encumbrance, specifies what requirements must be met to be able to spend the output, [1, p.114]. Usually what is required is a signature from the private key matching the public key to which the output is locked.
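To make the change mechanism concrete, the following hypothetical Python sketch shows a simplified wallet routine that greedily selects UTXO until they cover the amount plus the fee, and returns the difference as change. The function and the tuple layout are invented for illustration; real coin selection is considerably more sophisticated:

```python
def select_utxo(utxos, amount, fee):
    """Greedily pick unspent outputs until they cover amount + fee.

    utxos: list of (txid, output_index, value) tuples, values in satoshis.
    Returns (selected inputs, change value to send back to the sender).
    """
    target = amount + fee
    selected, total = [], 0
    # Largest-first selection keeps the number of inputs small
    for utxo in sorted(utxos, key=lambda u: u[2], reverse=True):
        selected.append(utxo)
        total += utxo[2]
        if total >= target:
            return selected, total - target
    raise ValueError("insufficient funds")
```

The change value would become an extra output locked back to one of the sender’s own keys, which is exactly how the indivisibility of UTXO is reconciled with arbitrary payment amounts.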


Table 2.3. Transaction Input

Size (byte)   Field                   Description
32            Transaction Hash        Pointer to the transaction containing the UTXO to be spent
4             Output Index            The index number of the UTXO to be spent, 0-indexed
1-9           Unlocking-Script Size   Unlocking-Script length in bytes, to follow
Variable      Unlocking-Script        A script that fulfils the conditions of the UTXO locking script
4             Sequence Number         Currently disabled, set to 0xFFFFFFFF

Transaction Inputs

Inputs in transactions are pointers to UTXO, by way of the hash of the transaction containing the output (the transaction id) and the index of the referenced output within that transaction, [1, p.118]. The structure of a transaction input is described in table 2.3. The unlocking script contains instructions to unlock the locking script of the referenced output. Normally this is a signature by a private key, proving ownership of the bitcoin address that the output was set to.

Transaction Fees

Almost all transactions include transaction fees. The fees go to bitcoin miners as compensation, and thereby as an incentive to keep mining and maintaining the security of the network. They also help reduce spam transactions and other abuse. The mining process is discussed in further detail in section 2.1.4. The fees are set by market forces within the network and affect how miners prioritize transactions, along with other criteria. Transactions without fees might be delayed in processing so that they are not included in the blockchain right away. The fee for a transaction is calculated by subtracting the sum of the outputs from the sum of the inputs, [1, p.119].

Orphan Transactions

Sometimes an entire chain of transactions that depend on each other is created at the same time to satisfy some complex workflow. As they are distributed across the bitcoin network, child transactions might reach nodes before their parents, which is why each node maintains a pool of orphan transactions: transactions without known parents. These transactions are removed from the pool and revalidated as soon as their parents arrive.
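The orphan-pool bookkeeping described above can be sketched as a mapping from missing parent ids to waiting children. This is a simplified illustration; a real client also bounds the size of the pool:

```python
from collections import defaultdict

class OrphanPool:
    """Holds transactions whose parent transactions have not been seen yet."""

    def __init__(self):
        # missing parent txid -> list of orphan transactions waiting for it
        self.waiting = defaultdict(list)

    def add_orphan(self, tx, missing_parent_txid):
        self.waiting[missing_parent_txid].append(tx)

    def parent_arrived(self, parent_txid):
        """Release every orphan that waited for this parent, for revalidation."""
        return self.waiting.pop(parent_txid, [])
```

When a parent transaction finally arrives and validates, the released children are fed back into normal transaction validation, which may in turn release their own children.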


Script Language

The locking and unlocking scripts that are set for each transaction output and input, respectively, are written in a Forth-like scripting language, [1, p.121]. Whenever a transaction is validated, the unlocking script of each input is executed together with the corresponding locking script to check that the spending conditions are met. If the conditions are not fulfilled the transaction is not validated and will not propagate or be mined into the blockchain. This validation process is performed by every node in the network. The scripting language makes it possible to create many different contracts with complex conditions that have to be met before an output can be spent. The most common contract locks an output to a public key, so that only the corresponding private key is capable of unlocking it.
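To illustrate how an unlocking script and a locking script execute together on a shared stack, here is a toy Forth-like interpreter in Python. It supports only a few opcodes and replaces signature checking with a plain hash comparison, so it is a sketch of the execution model rather than of bitcoin Script itself:

```python
import hashlib

def run_script(unlocking, locking):
    """Execute the unlocking script followed by the locking script on one stack.

    Tokens are either data (pushed as-is) or one of a few toy opcodes. Real
    bitcoin Script has a much richer opcode set; OP_CHECKSIG is replaced here
    by a hash comparison purely for the sake of illustration.
    """
    stack = []
    for token in unlocking + locking:
        if token == "OP_DUP":
            stack.append(stack[-1])
        elif token == "OP_SHA256":
            stack.append(hashlib.sha256(stack.pop()).hexdigest())
        elif token == "OP_EQUALVERIFY":
            if stack.pop() != stack.pop():
                return False
        else:
            stack.append(token)  # plain data push
    return True

# A pay-to-hash style contract: spendable only by revealing a secret whose
# SHA256 digest matches the one committed in the locking script.
secret = b"correct horse battery staple"
locking = ["OP_SHA256", hashlib.sha256(secret).hexdigest(), "OP_EQUALVERIFY"]
assert run_script([secret], locking) is True
assert run_script([b"wrong secret"], locking) is False
```

The same pattern underlies the common pay-to-key contract: the locking script commits to a key, and the unlocking script supplies the proof that satisfies it.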

2.1.3 Decentralized Peer-To-Peer Network

The bitcoin network is a peer-to-peer network, meaning that there is no central control; the network has a flat structure. P2P networks are naturally resilient, decentralized and open, [1, p.137]. BitTorrent is a well-known example of such a network. The bitcoin network refers to the collection of nodes running the bitcoin P2P protocol. There are other protocols that extend the functionality of the bitcoin network, such as the Stratum protocol, which is used by miners and lightweight wallets. This extended bitcoin network consists of the bitcoin P2P protocol, pool-mining protocols, the Stratum protocol and several other protocols connecting other components to the bitcoin system, [1, p.138].

Node types

All nodes in the bitcoin P2P network are considered equivalent, but they provide different sets of functionality depending on what type of node they are. There are four major functions supplied by nodes: Wallet, Miner, Full blockchain and Routing. All nodes must include the routing function to be able to participate in the network. Furthermore, all nodes validate and propagate transactions and blocks and maintain connections to other peers.

Full nodes contain all four major functionalities. These nodes maintain a complete copy of the current blockchain which means that they can verify any transaction without external resources. This is the only node type that is completely independent in terms of verifying transactions since it does not have to trust any other party in the process.

Other, so-called lightweight nodes maintain only a subset of the blockchain and verify transactions using SPV, or Simplified Payment Verification. These nodes download only block headers from the blockchain, without all transactions. Transaction verification is performed by requesting partial views of relevant data in the blockchain from other peers.

Mining nodes compete to create new blocks by solving the proof-of-work algorithm (repeatedly hashing new transactions until a target hash value is achieved). These nodes are either full or lightweight nodes; if lightweight, they participate in pool mining and depend on a pool server acting as a full node.

Wallet nodes might be full nodes, such as desktop clients, but are increasingly SPV nodes. Wallets keep track of users’ keys and their related UTXO, and help create new transactions, calculate fees, et cetera.


There are other specialized node types, but they are not relevant for this project.

Verification of Transactions

Full nodes verify transactions by their height in the blockchain, whereas SPV nodes verify transactions by their depth. Block-height verification starts with the transaction that should be verified and moves back all the way to the genesis block (the first block of the blockchain). Block-depth verification verifies that a transaction belongs to a particular block via a Merkle path; see section 2.1.4 for details about Merkle paths. The node performing the verification then waits for that transaction to get more confirmations, or reach an increased block depth, as more blocks are appended to the blockchain. The fact that other nodes in the network accepted the block that the particular transaction is a part of, and mined more blocks on top of it, is sufficient to verify that the transaction was not a double spend, [1, p.149]. Regarding SPV nodes there are some issues that need to be addressed. It is not possible to persuade an SPV node that a transaction exists in a block when it does not. It is, however, possible to hide existing transactions from such a node, since it relies on information from other nodes for its verification. This could be used to stage a denial-of-service attack or a double-spending attack, where the same UTXO is spent twice without the SPV node finding out. This risk is mitigated if the node connects to random nodes, thereby reducing the likelihood of them all being malicious, [1, p.149]. Well-connected SPV nodes are considered secure enough. Another issue is that SPV nodes mostly request specific transactions for verification, which poses a privacy risk since it is possible to deduce which addresses reside in a node’s wallet based on those transactions. This is the reason why bloom filters were added: these filters allow for fetching a subset of transactions without revealing exactly which addresses are of interest, [1, p.150].

Bloom filters

These filters are probabilistic data structures used to test for set membership. The technical implementation of a Bloom filter consists of a set of M hashing functions, each producing values between 1 and N corresponding to indexes in an N-length bit array, [1, p.151]. Adding data to the filter means hashing it once with each hashing function to get M indices in the bit array, and then setting those positions to 1. To check whether or not an element is contained in the set, the hashing procedure is repeated and the positions are checked for ones. As more elements are added to the set, more positions in the bit array will contain ones, which increases the probability of the contains operation returning true even for an element that has never been added to the set. This probabilistic aspect of the filter is what the bitcoin network utilises to improve privacy for SPV nodes, since the filter will match more elements than it actually contains. This means that by initialising Bloom filters on other nodes, SPV nodes can mask which transactions they are actually looking for by requesting everything that matches the filters. The probability can be fine-tuned to ensure either greater accuracy or greater privacy, [1, p.151].
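A minimal Python implementation of the structure just described might look as follows. The parameters and the seeded-SHA256 construction are illustrative; bitcoin’s own filters use the Murmur hash family:

```python
import hashlib

class BloomFilter:
    """A simple Bloom filter: M hash functions over an N-bit array.

    Membership tests can yield false positives but never false negatives,
    which is exactly the property SPV nodes exploit for privacy.
    """

    def __init__(self, n_bits=64, m_hashes=3):
        self.n, self.m = n_bits, m_hashes
        self.bits = [0] * n_bits

    def _indices(self, item: bytes):
        # Derive M independent indices by prefixing the item with M seeds
        for seed in range(self.m):
            digest = hashlib.sha256(bytes([seed]) + item).digest()
            yield int.from_bytes(digest[:4], "big") % self.n

    def add(self, item: bytes):
        for i in self._indices(item):
            self.bits[i] = 1

    def might_contain(self, item: bytes) -> bool:
        return all(self.bits[i] for i in self._indices(item))
```

Shrinking `n_bits` or adding more elements raises the false-positive rate, which here corresponds to more privacy for the SPV node at the cost of downloading more irrelevant transactions.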


2.1.4 The Blockchain

The blockchain is a type of public ledger where all transactions are registered in a back-linked list of blocks. Each block has precisely one parent, but a block may have many children due to forks; forks are discussed later in this section. The first ever block in the chain is named the genesis block and is statically encoded into each Bitcoin Core client, [1, p.162]. This provides a secure root that each new node entering the network can start building on when downloading descendant block data from other peers in the network. Each block is identified by the SHA256 hash of its header, the block hash. The header contains a pointer back to its parent block, which means that the child’s identity changes if the parent id changes, due to how hashing functions work. Thus changing a parent also means changing the child and all other descendants, since they are linked back to this block via the chain of previous-block pointers. This means that when a block is deep down in the chain, a substantial amount of processing power must be used to modify it together with all its descendants, which represents the key feature of bitcoin’s security, [1, p.160]. A depth of a few thousand blocks (about a month of time) means that a block is virtually unmodifiable.
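The back-linking property, where modifying a parent invalidates every descendant, can be demonstrated with a toy Python chain. The JSON serialization and the field names are simplifications; real block hashing is performed over the 80-byte binary header:

```python
import hashlib
import json

def block_hash(header: dict) -> str:
    """Identify a block by the SHA256 hash of its serialized header."""
    return hashlib.sha256(json.dumps(header, sort_keys=True).encode()).hexdigest()

def build_chain(payloads):
    """Build a toy back-linked chain: each header points to its parent's hash."""
    chain, prev = [], "0" * 64  # genesis block has an all-zero parent pointer
    for payload in payloads:
        header = {"prev_hash": prev, "payload": payload}
        chain.append(header)
        prev = block_hash(header)
    return chain

def valid(chain) -> bool:
    """Check every back-link; a modified ancestor breaks all its descendants."""
    for parent, child in zip(chain, chain[1:]):
        if child["prev_hash"] != block_hash(parent):
            return False
    return True
```

Altering any block changes its hash, so the next block’s `prev_hash` no longer matches and validation fails, illustrating why rewriting a deep block requires redoing the work for every block above it.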

Block Structure

A block is conceptually a container for transactions that is appended to the blockchain. It consists of a header with metadata and a body consisting of a long list of transactions. A block header is about 80 bytes in size and each transaction is about 250 bytes. Since each block on average contains more than 500 transactions, the complete block is more than 1,000 times larger than the block header, [1, p.160].


Table 2.4 shows the structure of a block and table 2.5 shows the structure of a block header.

Table 2.4. Block Structure

Size (byte)   Field                 Description
4             Block Size            The size of the block following this field
80            Block Header          The header of the block
1-9           Transaction Counter   The number of transactions in this block
Variable      Transactions          The transactions recorded in this block

Table 2.5. Block Header Structure

Size (byte)   Field                 Description
4             Version               A version number to track software/protocol upgrades
32            Previous Block Hash   A pointer to the hash of the previous block in the chain
32            Merkle Root           A hash of the root of the merkle tree of this block’s transactions
4             Timestamp             The approximate creation time of this block (seconds from Unix Epoch)
4             Difficulty Target     The proof-of-work algorithm difficulty target for this block
4             Nonce                 A counter used for the proof-of-work algorithm

The difficulty, timestamp and nonce are necessary for the mining process, and the Merkle root is the root of a Merkle tree summarizing all the transactions contained in the block. See the next section for a discussion of the Merkle tree data structure.

Merkle Trees

Blocks represent all their transactions by using what is known as a Merkle tree. By use of this data structure, nodes are able to quickly verify whether or not a transaction belongs to a block.


The Merkle tree is a binary hash tree used for efficiently summarizing and verifying the integrity of large sets of data, [1, p.164]. The tree is constructed by recursively hashing pairs of nodes until there is only one node left, the Merkle root. This is the fingerprint for the entire set of data that was used to create the tree. In the case of bitcoin, each node is hashed twice using the SHA256 algorithm, or double-SHA256. Transactions themselves are not stored in the Merkle tree, only their hashes. The tree is built from the bottom up by hashing the transactions in the block, thus creating the leaf nodes. Each pair of leaf nodes is concatenated and hashed to create a parent node, with the rightmost node duplicated if there is an odd number of transactions. This process is repeated, again duplicating the last node on levels with an odd number of nodes, until it converges to the root node. Due to the nature of binary trees, it is possible to check whether or not an element is included in a tree of size N using at most 2 log2(N) calculations, which makes for an efficient algorithm, [1, p.164]. To prove that a transaction is included in a block, it is only necessary to produce log2(N) hashes, called the Merkle path, connecting the specific transaction to the Merkle root, [1, p.167]. A Merkle path is much smaller than the complete block with all its transactions. This means that by supplying a transaction and its Merkle path it is possible to verify that the transaction is part of a block without downloading complete blocks, and possibly the entire blockchain with gigabytes of data, [1, p.170]. If the transaction hashed together with its Merkle path results in the Merkle root in the block’s header, the transaction is in fact part of that block.
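The construction and Merkle-path verification described above can be sketched in Python, using double-SHA256 and the duplicate-last-node rule:

```python
import hashlib

def dhash(data: bytes) -> bytes:
    """Double-SHA256, as used for the nodes of bitcoin's Merkle tree."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def merkle_root(transactions):
    """Fold the hashed transactions pairwise up to a single root hash."""
    level = [dhash(tx) for tx in transactions]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [dhash(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_path(transactions, index):
    """Collect the sibling hashes linking one transaction to the root."""
    level = [dhash(tx) for tx in transactions]
    path = []
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        sibling = index + 1 if index % 2 == 0 else index - 1
        path.append((level[sibling], index % 2 == 0))
        level = [dhash(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return path

def verify_path(tx, path, root) -> bool:
    """Hash the transaction along its Merkle path and compare with the root."""
    h = dhash(tx)
    for sibling, tx_on_left in path:
        h = dhash(h + sibling) if tx_on_left else dhash(sibling + h)
    return h == root
```

An SPV node only needs `verify_path` plus a block header: the path is a handful of 32-byte hashes, regardless of how many transactions the block contains.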

Mining

The purpose of mining is to create new bitcoins, but most of all to protect the network against fraudulent transactions and attempts at double spending. By solving a mathematical problem based on a cryptographic hashing algorithm, miners are rewarded with transaction fees and new bitcoins from a so-called coinbase transaction, [1, p.185]. The Bitcoin solution to the problem is called the proof-of-work. In the mining process a new block is created, and the work itself consists of repeatedly hashing the header of the new block, changing a nonce every time, until the resulting hash has a numerical value that is below a difficulty target, [1, p.191]. The fact that the hash of the header is below the difficulty target is what constitutes the proof-of-work. As soon as a miner node mines a new block by hashing the block header below the difficulty target, that node propagates the block to all its peers, who then independently validate the block and all its transactions. If the block is considered valid it is further propagated by those peers and appended to the blockchain. Fees and new bitcoins are incentives for nodes to perform this mining process, which is essential to the bitcoin network as the miners act as a decentralized security mechanism validating and clearing new transactions, [1, p.173]. Blocks are created about every ten minutes, and it is not until a transaction is part of a block that it is considered confirmed and that new owners may spend their newly received bitcoins, [1, p.173].
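The core of the proof-of-work loop can be illustrated in a few lines of Python. The target here is deliberately easy so the loop terminates quickly; a real bitcoin header is an 80-byte binary structure with the fields of table 2.5:

```python
import hashlib

def mine(header_data: bytes, target: int):
    """Increment the nonce until the double-SHA256 of the header is below target."""
    nonce = 0
    while True:
        attempt = header_data + nonce.to_bytes(4, "little")
        digest = hashlib.sha256(hashlib.sha256(attempt).digest()).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce, digest  # the winning nonce is the proof-of-work
        nonce += 1
```

Verification is asymmetric: finding the nonce may take an enormous number of attempts, but any peer can check the result with a single hash, which is what makes the proof-of-work practical as a consensus mechanism.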


Blockchain Forks

It is possible for forks in the blockchain to occur. For instance, if several miners mine the next block at approximately the same time there will, for a period of time, be several branches off the main chain. For this reason nodes in the bitcoin network maintain three different sets of blocks: blocks connected to the main blockchain, blocks that form branches off the main chain (secondary chains) and blocks whose parent is unknown (orphan blocks). Invalid blocks are rejected as soon as they fail validation, [1, p.198]. The main blockchain at any time is the chain with the most accumulated proof-of-work. If two chains coincide in terms of accumulated proof-of-work, one of them is chosen randomly. The branches off the main chain are kept for future reference, should one of those branches ever pass the main chain in accumulated proof-of-work. Whenever a secondary chain passes the main chain it becomes the main blockchain, and the blocks in the old main chain are moved to the secondary set. Since all nodes consistently choose the chain with the most accumulated proof-of-work, the network will eventually converge to a consistent state, [1, p.200].
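The fork-resolution rule, always preferring the chain with the most accumulated proof-of-work, reduces to a comparison like the following sketch, in which blocks are reduced to (id, work) pairs purely for illustration (in bitcoin the per-block work is derived from the difficulty target, not stored directly):

```python
def chain_work(chain):
    """Accumulated proof-of-work of a candidate chain."""
    return sum(work for _, work in chain)

def select_main_chain(chains):
    """The main chain is the candidate with the most accumulated work."""
    return max(chains, key=chain_work)
```

A longer secondary chain with more accumulated work immediately displaces the current main chain, which is how all nodes converge on the same history without coordinating explicitly.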

Decentralized Consensus

The main invention of Satoshi Nakamoto is the decentralized mechanism for emergent consensus. The consensus is not explicit, since there is no single moment when consensus occurs. Instead it is achieved as the result of the asynchronous interaction of many independent nodes following the same set of rules, [1, p.177]. What is agreed upon is the state of the public ledger, the blockchain, dictating who owns what and when, which is verified and created independently by peers in the bitcoin network. This consensus emerges from the interaction of four independent processes across the network:

• Each transaction is independently verified by each full node based on a certain set of criteria.

• New transactions are independently aggregated and mined into new blocks with proof- of-work.

• New blocks are independently verified and assembled into the blockchain.

• The longest blockchain with the most accumulated work through proof-of-work, is chosen independently.

The combination of these mechanisms makes it possible for every single full node to create its own copy of the blockchain, the public authoritative ledger and thus reach common consensus [1, p.177].

2.1.5 Alternative Chains, Currencies and Applications

Bitcoin is an open source project that has spawned many alternative implementations of decentralized currencies. There are also implementations that share the same concepts and building blocks, but apply them to other areas such as domain names [1, p.216].


Meta Coin Platforms

Meta coins and meta chains are software layers implemented on top of the bitcoin protocol to, for instance, implement a currency within a currency or build a complete platform on top of the bitcoin protocol. The purpose of these platforms is to encode extra metadata into bitcoin transactions, thereby extending their functionality. Examples of such meta coins are Colored Coins and Mastercoin, [1, p.216].

Alt(ernative) Coins

These are alternative decentralized currencies. Some are forks of bitcoin’s source code and others are completely new implementations. These coins differ from bitcoin in three primary areas:

• Their monetary policy is different, such as having a different block generation time. Examples of this are Litecoin and Dogecoin, which both have shorter block generation times to allow for faster confirmation of transactions.

• The consensus and proof-of-work mechanism is different, entailing different algorithms for deriving the proof-of-work.

• They implement specific features such as strong anonymity. An example of this is Darkcoin.

An alt coin with a different proof-of-work approach is Myriad, which uses five different proof-of-work algorithms. The purpose is to make the network more resistant to consensus attacks and to ASIC specialization (ASICs are a special type of hardware that performs better than regular CPUs when mining bitcoins), [1, p.222]. Another example is Primecoin, whose proof-of-work is searching for prime numbers and thus performs useful work, instead of wasting processing power on solving artificial problems as is the case with bitcoin, [1, p.223].

Noncurrency Alt(ernative) Chains

These are implementations that share the blockchain design pattern with bitcoin but whose transactions and blocks are not necessarily used as currency. An example of this is Namecoin, a direct fork of the bitcoin code which, instead of being used as a currency, is used to register domain names, file signatures, voting systems and much more, [1, p.227].

2.1.6 Conclusion

The aspects of bitcoin that are most relevant for this project are its protocol, the blockchain and the decentralized consensus mechanism it offers. It is highly likely that, for this project, the same type of network structure, node types, and all the major concepts and security mechanisms, such as mining and incentives, along with much of the protocol, can be reused. The main difference is that instead of transferring value, files or file meta-information will be the payload. Rather than trying to encode more data into actual bitcoin transactions, the authors feel that it is more flexible, and gives better cohesion, to develop a new protocol inspired by bitcoin. What should be further considered is the use of lightweight clients and SPV, since this allows for validating transactions without downloading the complete blockchain while still being considered secure enough. For some types of file sharing (for instance sharing non-important, non-confidential data) it simply does not make sense to download an entire blockchain for the sake of verification, because the files themselves are not that important.

2.2 Storj

Storj [38] is a blockchain-based platform for anonymously storing files in the cloud. It is currently in a beta stage and seems to focus on the storage and retrieval of individual files by a single person, without tracking file modifications.

2.2.1 Storage

Physical storage and redundancy are provided by the DriveShare service [36], in which users can donate storage and bandwidth for monetary return in the form of crypto-currencies. These so-called Drive Farmers make a certain amount of bandwidth and hard drive space on their computers available to the Storj network and offer payment contracts to users who want to store files in that space. The contracts dictate how small fees will be paid out to the space provider during Heartbeats, see section 2.2.2, in order to incentivise them to keep the space available and the data unaltered. There is currently no way to reliably remove files uploaded to the network, as Drive Farmers control their own hardware and can simply disconnect from the network while keeping the files. There is, however, the possibility of initiating a “soft delete” by halting payments or Heartbeats to the storage provider, as this would break the payment contract, remove the incentive to continue providing the service and most likely prompt an automatic removal of the file to free up the space for new contracts.

Shards

Uploaded files are split into encrypted fragments of fixed size called shards, which are redundantly spread out over the network so that no complete file is stored at one location. This facilitates storage of larger files and increases security: even if an attacker were able to break the encryption of a shard, they would not be able to access the complete file.
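The splitting step can be sketched as follows. The per-shard encryption and the redundant distribution across the network are deliberately omitted, and the zero-padding of the last shard is an assumption made for this sketch, so this only illustrates the fixed-size fragmentation:

```python
def shard_file(data: bytes, shard_size: int):
    """Split a file into fixed-size fragments; the last shard is zero-padded."""
    shards = []
    for offset in range(0, len(data), shard_size):
        shard = data[offset:offset + shard_size]
        shards.append(shard.ljust(shard_size, b"\x00"))
    return shards

def reassemble(shards, original_size: int) -> bytes:
    """Concatenate the shards and strip the padding from the final one."""
    return b"".join(shards)[:original_size]
```

Because every shard has the same size, an observer holding a single shard learns neither the file boundaries nor which fragment of which file it belongs to.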

2.2.2 Heartbeats

In order to make sure that storage providers do not delete or tamper with files which they agreed to host, Storj periodically audits storage nodes by issuing “challenges” called Heartbeats, to which the nodes must respond with information proving that they still have access to the encrypted shards. These audits can be implemented in several different ways. One is having the client pre-generate seeds and use them together with the shard to build a Merkle tree, the root of which is then mined into the blockchain. Proof-of-storage can then be verified by the client issuing challenges to the nodes, sending them seeds which are hashed together with the shard and compared against the Merkle tree root [40].
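A heavily simplified sketch of one audit round: the client precomputes the expected responses for a set of pre-generated seeds, and the storage node proves possession by hashing the challenge seed together with the shard. The real scheme commits a Merkle root to the blockchain rather than keeping a table of hashes, and all names here are illustrative:

```python
import hashlib
import os

def expected_responses(shard: bytes, seeds):
    """Client side: precompute hash(seed + shard) for every pre-generated seed."""
    return {seed: hashlib.sha256(seed + shard).hexdigest() for seed in seeds}

def respond_to_challenge(stored_shard: bytes, seed: bytes) -> str:
    """Farmer side: prove possession by hashing the challenge seed with the shard."""
    return hashlib.sha256(seed + stored_shard).hexdigest()

# One audit round, assuming the client generated the seeds up front
shard = b"encrypted shard bytes"
seeds = [os.urandom(16) for _ in range(4)]
expected = expected_responses(shard, seeds)
assert respond_to_challenge(shard, seeds[0]) == expected[seeds[0]]
assert respond_to_challenge(b"tampered shard", seeds[1]) != expected[seeds[1]]
```

Since each seed is only usable once and the farmer cannot predict it, a correct response is strong evidence that the unmodified shard is still held at audit time.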

2.2.3 Implementation

At the moment, the only known operational client running on the Storj platform is Metadisk [37], a prototype web application which allows users to drag and drop files into the browser

which then get encrypted and uploaded into the cloud. Following a successful upload, a URL containing a hash of the file and the private key is returned, which can later be used to download and decrypt the file.

2.2.4 Sharing

Sharing files with other people in the current version of Metadisk is done by retrieving a link to the resource and having the user send the URL using third-party tools. If a recipient then downloads the file and wants to make some modifications, this is handled as a new upload with no relation to the original file, which makes collaboration and tracking of file changes more difficult.

2.2.5 Ownership Verification

The current use of the blockchain by Metadisk is as a means of proving ownership of a file. When a file is uploaded to the network, metadata such as the hash of the encrypted file and its storage locations is stored in the chain, visible to anybody who has a copy of the blockchain. This serves as a public record in the ledger stating that the user identified by their public key is the owner of a file which can be retrieved from certain storage locations.

2.2.6 Conclusion

This is indeed an interesting solution that seems to have already tackled many of the problems that arise when integrating a blockchain with file sharing. For this project the ideas of shards and Heartbeats seem especially interesting and will be evaluated further for inclusion in the protocol and prototype. Storj focuses heavily on incentive mechanisms for making sure that files are kept available in the cloud for an extended period of time, as well as on the privacy issues concerning such storage. The authors behind that project seem to have solved many of the problems in that area, which is partly why it will not be the focus of this project. The fact that the goal of the platform is not to share files but to use it as a personal cloud storage solution means that the ideas need to be expanded and adapted to include the sharing aspect.

2.3 Dropbox

Dropbox is a third-party service allowing users to store, sync and share files across different devices. Dropbox for Business implements and extends that same service to give administrators extra control over the way an organization’s resources are managed. In all versions they strive to provide an easy-to-use interface backed by an infrastructure geared towards fast and reliable uploads and downloads in a secure manner [17, p. 3].

2.3.1 Product Features

Dropbox provides the following features in addition to cloud storage:

25 CHAPTER 2. THEORETICAL BACKGROUND

Administrator capabilities It is possible for the administrator to set sharing permissions for items currently uploaded, thereby limiting who may share what with other parties. Furthermore, the administrator can monitor current access to team resources, see which sessions are active, remotely wipe repositories, unlink devices that are connected to a user account and perform several other administrative tasks.

User features Apart from password protection, end-users can further protect their accounts by enabling two-step verification, which means using two different sources of identification. It is also possible to recover an unlimited number of previous versions of a file, which means that changes to data can be tracked and retrieved.

2.3.2 Architecture

The underlying architecture comprises several layers of protection and services.

Encryption and application service This service is responsible for processing file data arriving from clients. Dropbox applications communicate with this service when they detect changes to files that should be synchronized with the servers. Each file that arrives is split into blocks which are then encrypted. If the file has been stored before, only the modified blocks are synchronized. The blocks are then passed on to the storage service.

Storage service Responsible for storing users’ files, acting as a Content-Addressable Storage (CAS) system where each encrypted block is addressable via its hash.

Metadata service Manages metadata about user data, such as file names and types linked to Dropbox accounts. No files are stored or transferred by this service.

Notification service Monitors changes to accounts and signals to clients when files have been modified.

The services connect to one another using SSL where necessary. Currently it is possible for a user to access Dropbox via a web browser, desktop application and/or mobile applications. Capabilities and security considerations differ depending on the interface of choice. For instance, mobile applications never download files automatically unless otherwise specified.
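The block-splitting and content-addressable storage described above can be sketched as follows. The block size, function names and the in-memory store are invented for illustration, and encryption is omitted.

```python
import hashlib

BLOCK_SIZE = 4  # bytes, tiny for readability; real implementations use much larger blocks

def blocks(data: bytes):
    """Split data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]

store = {}  # block hash -> block: a content-addressable store

def upload(data: bytes) -> int:
    """Store only blocks not already present; return how many were sent."""
    new = 0
    for block in blocks(data):
        key = hashlib.sha256(block).hexdigest()
        if key not in store:
            store[key] = block
            new += 1
    return new

v1 = b"aaaabbbbcccc"
v2 = b"aaaaXXXXcccc"        # only the middle block differs
assert upload(v1) == 3      # first sync: every block is new
assert upload(v2) == 1      # second sync: only the modified block is transferred
```

Addressing blocks by hash gives deduplication for free: identical blocks, whether within one file or across files, are stored only once.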

2.3.3 Reliability

The managed metadata is distributed within a data center in an N+2 availability model (capacity for the expected load plus two redundant components), with backups being performed regularly. The file block storage is designed to provide more than 99.9999% durability using third-party providers [17, p. 5]. This aims at protecting user data and, coupled with load balancing across multiple servers, ensures high service availability and redundancy.


2.3.4 Security

Overall, Dropbox has enforced several policies for how access to data should be handled. Policies have been devised for information security, physical security, incident response, logical access, physical production access, change management, support as well as privacy considerations and the process for incorporating changes into the system [17, p. 13]. These policies also, to some extent, extend to their third-party service suppliers. Dropbox submits to regular third-party audits to ensure that they live up to their claimed standards and has achieved certification under the ISO 27001 standard [17, p. 15].

Data in Transit To ensure data confidentiality in transit between Dropbox applications and servers, SSL/TLS is enforced using 128-bit or higher AES encryption. The clients use public certificates to verify the identity of the Dropbox servers and thus prevent man-in-the-middle attacks.

Data at Rest All Dropbox data at rest is encrypted using the 256-bit AES algorithm.

Key Management The file encryption keys are managed by Dropbox to “remove complexity, enable advanced product features and strong cryptographic control” [17].

Network Security The back-end network is diligently maintained and audited by internal and external security teams. Network traffic is controlled by firewalls, network security monitoring and intrusion detection systems (IDS) to prevent malicious traffic.

The network itself is divided into:

• Internet-facing DMZ
• VPN front-end DMZ
• Production network
• Corporate network

Access to the more sensitive sections of the network is limited to approved personnel and IP addresses.

2.3.5 Conclusion

What is relevant for this project in the Dropbox implementation is possibly their concept of splitting files into blocks and only updating modified blocks whenever files change. This results in less data in transit over time, saving bandwidth and speeding up the service.


Interesting to note is the amount of effort Dropbox has invested in upholding their service’s security, availability and reliability, as well as the trust of customers. Third-party audits build trust, and many different subsystems provide security, redundancy and so on. All of this is inherent to a decentralized peer-to-peer network, which is naturally resilient and redundant, allowing for great availability and reliability. In terms of trust, it is necessary for users to trust that Dropbox does not misuse the data stored on their servers. Can users trust that their data is not analysed, possibly breaching personal integrity? If some government pressures Dropbox to reveal data belonging to an account, can they guarantee that this will not happen? Can users trust in the redundancy, reliability and availability of the service, and will it or the pricing change over time? Consequently this system incurs major third-party trust with no absolute guarantees. This is the purpose of a decentralized consensus mechanism such as the blockchain, which can be independently verified by each and every peer in the network, thus building trust without a central party.

2.4 Secure Email

Secure email is used to send data from a sender to one or more recipients, using encryption to ensure data integrity and confidentiality. One of the most widely used encryption standards for email communication is OpenPGP [26]. This standard is defined by RFC 4880 [11], which describes the format and methods that should be used for handling such communication. OpenPGP uses digital signatures, encryption, compression, Radix-64 conversion, key management and certificate services to provide data integrity services. Signatures and encryption are discussed further in section 2.6.

2.4.1 Confidentiality

To ensure data confidentiality, OpenPGP combines symmetric-key and public-key encryption. The object to be sent is encrypted using symmetric encryption and a random, used-only-once key called the session key. This key is then encrypted using the receiver’s public key and appended to the message. The following sequence describes this process in more detail and what it usually entails:

1. The sender creates a message.

2. A random number is generated, to be used as the session key for this message only.

3. The session key is encrypted using each recipient’s public key and prefixed to the message.

4. The session key is used to encrypt the message, which is usually compressed.

5. The recipients then decrypt the session key using their private keys.

6. Using the decrypted session key the rest of the message is decrypted and possibly also decompressed.


It is possible to use only symmetric encryption if the key is derived from a passphrase or other shared secret, or to use a two-stage method similar to what is described above, where the session key is encrypted using a symmetric key based on some shared secret. The encryption process can be combined with a digital signature, in which case the message is first signed and then both the signature and the message are encrypted according to the above method. Cryptography is discussed in more detail in section 2.6.

2.4.2 Authentication

A digital signature is created using a hashing algorithm combined with a public-key signature algorithm, according to the following sequence.

1. The sender
   a) creates a message,
   b) generates a hash code of the message,
   c) generates a signature from the hash code using a private key.
2. The signature is attached to the message.
3. The recipients generate a new hash code for the message and compare it to the signature.

If the hash of the message matches the one supplied by the signature, then the message is authentic, which means it was indeed created by the sender.

2.4.3 Conclusion

Secure email is interesting for this project in that it ensures confidentiality via end-to-end encryption using digital enveloping. It also ensures authenticity via signatures and offers the possibility to share resources with multiple parties. However, the fact that it is every recipient’s responsibility to store the received resource is not satisfactory; resources need to be available in the cloud over time.

2.5 Git

Git is probably not a system usually associated with file sharing, since its forte is version control. However, it is this version control, and the fact that it is tightly coupled to sharing modifications to files, that makes it interesting for this project. Git is a Distributed Version Control System, or DVCS [13], which means that it is used for keeping track of the version history of files amongst distributed nodes. The central repository of files is mirrored on each client, which ensures that a complete recovery is possible even if the central server should go down, since each clone is a full backup of the entire repository.

2.5.1 Snapshots

Git takes a snapshot of the entire repository when committing (saving) changes, which means that the current state of all the files is saved into a database.


To make this more storage-efficient, Git links to a previous version of a file if it has not been modified. Packfiles [14] are also used; these consist of modified data stored in one large indexed blob, storing only the changes from one version of a file to another.
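The linking to unchanged files works because Git addresses every file by a hash of a small header plus its contents, so an unchanged file in a new snapshot maps to an already stored object. A minimal sketch follows; the snapshot dictionaries are invented, while the blob-id format follows Git's documented object format.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Git object id of a file: SHA-1 over a 'blob <size>' header,
    a NUL byte, and the file content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()

# Two snapshots of a small repository; the file "a" is unchanged.
snapshot1 = {"a": b"alpha\n", "b": b"beta\n"}
snapshot2 = {"a": b"alpha\n", "b": b"beta v2\n"}

ids1 = {path: git_blob_id(data) for path, data in snapshot1.items()}
ids2 = {path: git_blob_id(data) for path, data in snapshot2.items()}

# The unchanged file maps to the same object id, so the second snapshot
# links to the already stored blob instead of saving a duplicate.
assert ids1["a"] == ids2["a"]
assert ids1["b"] != ids2["b"]
```

The same content-addressing idea could reduce redundant storage of unmodified resources in a sharing network.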

2.5.2 Branching

In most version control systems there is the concept of branching. This entails diverging from the main line of development and initiating a separate branch of changes [12]. Since Git takes snapshots of a repository, it is easy to create branches; the system needs only adjust pointers to different snapshots to manage which snapshots belong to which branches. Normally, two branches will eventually need to be merged. This means that changes made in those separate branches have to be combined into one of the branches, and conflicts may arise. The conflict-resolution capabilities of Git work well when dealing with text files but less so for binary files, the simple reason being that it is not easy to make sense of conflicting binary data.

2.5.3 Conclusion

Since Git focuses on version control rather than plain file sharing, it is most often used as a collaborative tool, much like how it was applied for coding and documentation in this project. The concepts of branching and merging would be interesting to incorporate in this project, but they are by no means necessary for a file sharing platform. How Git handles modifications is of interest, since it could possibly be applied in a way that reduces unwanted redundancy from storing similar information over and over again in the network. Most of this functionality will likely reside on clients, and it is uncertain how much of it the protocol and other server nodes in the network would need to handle. This project will not attempt to implement a fully functional Git client. Git will, however, serve as an inspiration for what can and should be implemented in the resulting protocol.

2.6 Cryptography

This section outlines different cryptographic concepts that are necessary to include in a project such as this.

According to [34, p. 12], some of the security objectives that should be considered when evaluating any system are:

confidentiality Only those that are authorized to access a resource should be able to do so.

integrity Improper modification of information should not be possible, in order to ensure information non-repudiation and authenticity.

authenticity The property of being able to be verified and trusted. It should be possible to verify that users are who they say they are and that data comes from a trusted source.

Cryptography can be used to ensure all three objectives mentioned above. By encrypting data it is possible to obscure it, so that only authorized users are able to access it [34, p. 39]. By signing data it is possible to ensure data integrity and non-repudiation (not being able to deny what has happened) as well as authenticity, since it can be proven that data has not been modified and that it originates from some particular user [34, p. 59].

The foundation of cryptography is mathematical functions that are easy to perform one way but very difficult to perform the other way. For instance, it is easy to calculate the public key from a given elliptic curve private key, but the other way around requires substantial effort. This permits creating unforgeable signatures and digital secrets.

The security of any encryption scheme depends on how difficult it is to deduce the original data from what has been encrypted. There are generally two attack vectors: cryptanalysis and brute force. The first means analysing the algorithm used in the encryption process to try to find weaknesses, whereas the other attempts all possible inputs to a decrypting algorithm until something resembling coherent data, possibly the original data, is output [34, p. 40].

2.6.1 Symmetric Cryptography

This type of cryptography is the universal method for providing confidentiality for data. A symmetric (single-key or secret-key) encryption scheme consists of five elements [34, p. 40]:

plaintext Data that is to be encrypted.

ciphertext Output from the encryption.

secret key Data that affects how the algorithm works.

encryption Algorithm taking the plaintext and secret key as input, performing various substitutions and transformations to produce the final ciphertext.

decryption Essentially the encryption algorithm run backwards; it takes the secret key and ciphertext to produce the original plaintext.

2.6.2 Asymmetric Cryptography

The list in section 2.6.1 applies to asymmetric encryption as well, with the modification that the secret key is replaced by a keypair containing a public and a private key. Normally the keys complement each other in the sense that what is encrypted with one key can be decrypted with the other [34, p. 54]. Asymmetric cryptography allows both for encryption schemes ensuring confidentiality and for signing data to ensure integrity and authenticity. This type of cryptography is generally slower than symmetric encryption [34, p. 54], which is why the general applications for this kind of cryptosystem are digital signatures, symmetric key distribution and encryption of secret keys [34, p. 57]. Diffie and Hellman (inventors of the first public-key algorithm) postulated a list of requirements that all public-key encryption schemes must fulfil [34, p. 57]. Easy in this list refers to computationally easy, and hard means computationally infeasible; whether something is computationally easy is determined by how much processing power the operation requires.


• It is computationally easy for a party to generate a key pair.
• It is easy for sender A to produce ciphertext knowing a public key and the message to be encrypted.
• It is easy for receiver B to decrypt the ciphertext using a private key to recover the original message.
• It is hard for an opponent to determine the private key from the public one.
• It is hard for an opponent to determine the original message based on a public key and ciphertext.

Some schemes, such as RSA, additionally fulfil the requirement that either the private or the public key can be used for encryption, with the other used for decryption.

Elliptic Curve Cryptography ECC is a type of public-key cryptography based on the discrete logarithm problem that arises when adding and multiplying points on an elliptic curve. Bitcoin uses the elliptic curve and mathematical constants defined in the standard secp256k1 [5]. The curve used is described by

y^2 = x^3 + 7 over F_p, or y^2 mod p = (x^3 + 7) mod p (2.1)

Here mod p indicates that the curve is defined over a finite field of prime order p, F_p, with p = 2^256 − 2^32 − 2^9 − 2^8 − 2^7 − 2^6 − 2^4 − 1. This means that the curve is defined on a finite set of points, contrary to an elliptic curve over the real numbers, which is continuous [1, p. 66]. The high order of p makes for an unimaginably large two-dimensional grid. The following sections describe ECC according to the standards used by Bitcoin.

Elliptic Curve Mathematics To understand how key pairs are generated in ECC, one has to understand some basic elliptic curve mathematics, in which there are generalised concepts of addition and multiplication. The + operator signifies addition, P3 = P2 + P1, where all the points fall on the elliptic curve. This means that for each pair of points on the curve, their sum is also a point on the curve. Geometrically, P3 is defined by drawing a line between P1 and P2 and reflecting its point of intersection with the curve, Pi, around the x-axis: P3 = (xi, −yi). If P1 and P2 coincide, the line between the points becomes the tangent line at that point, and P3 is thus derived from the point where the tangent line intersects the curve. There is also the concept of the point at infinity, O, which is a point infinitely far away on the curve without finite coordinates. Should two points with the same x-coordinate be added together, the line between them will be vertical; in this case P3 is defined as the point at infinity, O. This point plays the role of 0 in regular addition, in that P1 + O = P1. Based on repeated addition, multiplication can be defined: kP = P + P + ... + P (k terms).


Private Key The private key, k, is a number chosen randomly between 1 and approximately 2^256. The process for choosing this number needs to be cryptographically secure, which means that it should not be possible to guess how the key was chosen [1, p. 64]. More formally, the key can be any number between 1 and n − 1, where n ≈ 1.158 · 10^77 is defined as the order of the elliptic curve used in Bitcoin.

Public Key The public key, K, in ECC is produced by using elliptic curve multiplication, multiplying the private key, k, with a so-called generator point, G, on the elliptic curve. For Bitcoin the generator point is a fixed point on the curve, as defined by the secp256k1 standard [5], which means that the same point is used to generate all keys.

K = (x, y) = kG (2.2)

The public key is a point on the elliptic curve and can be shared with anyone, since it is computationally infeasible to calculate the private key from the public key; doing so is equivalent to guessing all possible k values that could produce K in equation 2.2 [1, p. 65]. This is where the security of this crypto scheme stems from. The fact that the public key can be derived from the private key means that it is not possible to use one key for encryption and the other for decryption, as is the case with RSA.
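Equation 2.2 can be made concrete with a short, non-constant-time Python sketch of double-and-add scalar multiplication over secp256k1. The curve constants are from the standard [5]; the toy private key is arbitrary, and Python 3.8+ is assumed for modular inverses via `pow`.

```python
# secp256k1 curve parameters
p = 2**256 - 2**32 - 2**9 - 2**8 - 2**7 - 2**6 - 2**4 - 1
n = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

O = None  # the point at infinity

def ec_add(P, Q):
    """Add two points on y^2 = x^3 + 7 over F_p (see equation 2.1)."""
    if P is O:
        return Q
    if Q is O:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return O                                   # vertical line
    if P == Q:
        m = 3 * x1 * x1 * pow(2 * y1, -1, p) % p   # tangent slope
    else:
        m = (y2 - y1) * pow(x2 - x1, -1, p) % p    # chord slope
    x3 = (m * m - x1 - x2) % p
    return (x3, (m * (x1 - x3) - y1) % p)

def ec_mul(k, P):
    """Scalar multiplication kP by repeated doubling and adding."""
    R = O
    while k:
        if k & 1:
            R = ec_add(R, P)
        P = ec_add(P, P)
        k >>= 1
    return R

k = 0x1E240                 # toy private key; real keys come from a CSPRNG
K = ec_mul(k, G)            # public key K = kG, equation 2.2
x, y = K
assert (y * y - (x**3 + 7)) % p == 0   # K is a point on the curve
```

A real implementation would use a vetted, constant-time library; this sketch only illustrates the group operations described above.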

RSA This is one of the first public-key schemes invented and is the most widely accepted and implemented approach to public-key encryption. Compared to elliptic curve cryptography this scheme requires more overhead, but it has existed longer, which lends it more credibility [34, p. 59]. RSA is a block cipher, which means that data is encrypted in blocks, where the plaintext and ciphertext are represented as integers between 0 and n − 1 for some n. The encryption and decryption process can be described by the following algorithm:

C = M^e mod n
M = C^d mod n = (M^e)^d mod n = M^(ed) mod n (2.3)

where C represents the ciphertext and M the plaintext. e, d and n are constants that constitute the public and private keys: the public key is {e, n} and the private key is {d, n}, with n = pq where p and q are different large prime numbers. The strength of this algorithm rests on the fact that it is computationally infeasible to factorize large numbers, in this case n, which is necessary if one wishes to deduce the plaintext from the ciphertext [9, p. 168].
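Equation 2.3 can be verified directly with toy primes; the numbers below are far too small to be secure and serve only to make the algebra concrete (Python 3.8+ assumed for the modular inverse).

```python
# Toy primes purely to illustrate equation 2.3; real RSA moduli
# are at least 2048 bits.
p, q = 61, 53
n = p * q                      # 3233
phi = (p - 1) * (q - 1)        # 3120
e = 17                         # public exponent, coprime with phi
d = pow(e, -1, phi)            # private exponent: e*d = 1 (mod phi)

M = 65                         # plaintext block, an integer in [0, n-1]
C = pow(M, e, n)               # encryption: C = M^e mod n
assert pow(C, d, n) == M       # decryption: C^d mod n recovers the plaintext
```

Because e*d = 1 (mod phi), raising M to the power e*d modulo n returns M, which is exactly the identity chain written out in equation 2.3.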

2.6.3 Hashing

Secure hashing functions, usually denoted by H, are functions that accept variable-length input and give a fixed-length output. Their purpose is to produce fingerprints of some block of data, and they should fulfil the following requirements [34, p. 52]:

• The function can be applied to any size block of data.
• The output produced must have a fixed length.


• H(x), that is the hash of some data, must be relatively easy to calculate.
• It must be computationally infeasible to find x such that H(x) = h. Such a hash function is referred to as one-way or pre-image resistant. That is, given the hash it should not be possible to determine what data it was derived from.
• For any given block of data x, it must be computationally infeasible to find y ≠ x such that H(y) = H(x). This property is referred to as second pre-image resistant or weak collision resistant. This means that different data should produce different hashes.
• It must be computationally infeasible to find any pair of data (x, y) such that H(x) = H(y). This property is referred to as collision resistant or strong collision resistant. This means that it should not be possible to find any pair of data that hash to the same value.

Hashing algorithms that fulfil these requirements are considered secure. Of course it is also possible to use encryption to produce fingerprints of data, but this is a much slower process and thereby more costly [34, p. 51]. In Bitcoin, the SHA-256 and RIPEMD-160 algorithms are used.
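Two of the properties above, fixed-length output and drastically different fingerprints for similar inputs, are easy to demonstrate with SHA-256 from Python's standard library (the example strings are arbitrary):

```python
import hashlib

# Fixed-length output regardless of input size:
short = hashlib.sha256(b"a").hexdigest()
long_ = hashlib.sha256(b"a" * 1_000_000).hexdigest()
assert len(short) == len(long_) == 64   # 256 bits = 64 hex characters

# Nearly identical inputs produce completely different fingerprints:
h1 = hashlib.sha256(b"SecuRES").hexdigest()
h2 = hashlib.sha256(b"SecuRFS").hexdigest()
assert h1 != h2
```

The one-way and collision-resistance properties, by contrast, cannot be demonstrated by running code; they rest on the absence of known attacks against the algorithm.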

2.6.4 Digital Signatures

Digital signatures can be used for authenticity and integrity and are usually a combination of public-key encryption and hashing. A message for which integrity and authenticity should be provided is first hashed using some hashing function, whereupon that hash is encrypted using the sender’s private key to produce the signature of the message. To verify a message, the recipient hashes it using the same hashing function that the sender used and decrypts the signature using the sender’s public key. If the two hashes match, the recipient can be sure that the message is valid and originated from the sender.
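The sign-and-verify round trip can be sketched with toy RSA numbers; the parameters and helper names are illustrative, and a real implementation would use full-size keys with a padding scheme such as RSASSA-PSS rather than raw modular exponentiation.

```python
import hashlib

# Toy RSA key for illustration only; the primes are far too small
# to offer any security.
p, q, e = 1009, 1013, 17
n = p * q
d = pow(e, -1, (p - 1) * (q - 1))

def h_int(message: bytes) -> int:
    """Hash the message, then reduce the digest into the range [0, n-1]."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n

def sign(message: bytes) -> int:
    return pow(h_int(message), d, n)      # "encrypt" the hash with the private key

def verify(message: bytes, signature: int) -> bool:
    return pow(signature, e, n) == h_int(message)   # check with the public key

msg = b"shared resource metadata"
sig = sign(msg)
assert verify(msg, sig)                   # hashes match: authentic message
assert not verify(b"tampered metadata", sig)  # modification is detected
```

The structure mirrors the prose above exactly: hash, encrypt the hash with the private key, and verify by comparing the decrypted signature with a freshly computed hash.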

2.6.5 Public-Key Certificates

When exchanging public keys as part of some public-key encryption scheme, it is pertinent that each party can verify that the public key they receive actually belongs to the person who sent it. The solution to this problem is the public-key certificate. A certificate comprises the public key and identifying data for some user, along with a signature over that block of data created by a trusted third party, usually a Certificate Authority (CA). The CA has verified the identity behind the public key and guarantees that the key belongs to that entity [34, p. 60].

2.6.6 Digital Envelopes

This technique allows a message to be protected for multiple parties without having to share a secret in advance, using a combination of public-key and symmetric encryption. A random symmetric key is generated that will only be used once. The message is encrypted using this key, whereupon the key itself is encrypted using each of the recipients’ public keys and appended to the message. This envelope is finally distributed to the recipients, who are the only people capable of decrypting the symmetric key, using their private keys, and ultimately recovering the original message [34, p. 61].
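A toy sketch of the envelope flow follows. The stream cipher below is an insecure stand-in for a real symmetric cipher such as AES, and the tiny RSA keypairs are purely illustrative; only the overall shape of the scheme matches the description above.

```python
import hashlib
import secrets

def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR data with a SHA-256-based keystream.
    A stand-in for a real symmetric cipher such as AES."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(b ^ s for b, s in zip(data, stream))

# Tiny, insecure RSA keypairs for two recipients (see section 2.6.2).
recipients = {
    "alice": {"n": 61 * 53, "e": 17, "d": pow(17, -1, 60 * 52)},
    "bob":   {"n": 67 * 71, "e": 17, "d": pow(17, -1, 66 * 70)},
}

# 1. Generate a single-use session key and encrypt the message with it.
session_key = secrets.randbelow(3000) + 1     # fits below both toy moduli
message = b"confidential resource"
ciphertext = keystream_xor(session_key.to_bytes(2, "big"), message)

# 2. Encrypt the session key once per recipient using their public key.
envelope = {name: pow(session_key, key["e"], key["n"])
            for name, key in recipients.items()}

# 3. Each recipient recovers the session key and then the message.
for name, key in recipients.items():
    sk = pow(envelope[name], key["d"], key["n"])
    assert keystream_xor(sk.to_bytes(2, "big"), ciphertext) == message
```

Note that the message body is encrypted only once, however many recipients there are; only the small session key is re-encrypted per recipient.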


2.6.7 Conclusion

All concepts covered in this section will be used in the project in some form or another. Cryptography is necessary to ensure confidentiality, integrity of data and authenticity, which is a requirement for decentralized file sharing to work. Asymmetric encryption will be used to sign data that is transferred between recipients to ensure authenticity and integrity. Symmetric encryption will be used to encrypt sensitive data and file data to ensure confidentiality, the reason being that symmetric-key encryption is faster than asymmetric encryption. When distributing the encrypted files across the network, it is likely that a combination of both these concepts will be implemented using some form of digital envelopes. Public-key certificates can be used to identify other recipients using the service if deemed useful; for instance, it might be of interest when sharing sensitive information that only the correct recipients may access. However, the use of public-key certificates requires third-party trust (trust in the Certificate Authority that issued the certificate), and the infrastructure to distribute and validate such certificates would have to be implemented as a separate service complementing the file sharing platform. Hashing will be used to a large extent in this project, similar to how Bitcoin hashes all its block headers and transactions to ensure set membership, integrity and identification of a resource (for instance, a transaction is identified by its hash).

2.7 Summary

The technologies that have been presented in this chapter all relate to the area of interest as described by the problem statement in section 1.2. Git and Dropbox are means for sharing and collaborating on resources in projects or similar settings. They are not related to the decentralised consensus network that Bitcoin represents, nor do they ensure confidentiality from sender to recipient in the way that secure email does. Storj in itself solves the problem of storing files using public ledger technology. However, it is not possible to directly share files with different parties; instead you share links giving access to the resource at hand. This means that the user does not share a resource specifically with another recipient, but instead uploads the file to the network, giving access to it via links. Furthermore, Storj does not seem to offer the possibility to track changes to a file, which is possible with both Git and Dropbox, where Git is probably the best-suited solution in this aspect. The idea behind Storj resembles a personal cloud storage solution more than an actual file sharing platform. Bitcoin itself does not offer any file sharing capabilities; it is presented to describe underlying concepts and ideas that are to be evaluated during this project. In conclusion, further research within the domain as stated by the problem statement in section 1.2 seems justified. This research will not create something conceptually new, but rather merge existing technologies into something new and evaluate whether or not they are suited for use in file sharing over a decentralised network. Concepts and solutions from different types of file sharing platforms and collaborative tools, such as Dropbox, secure email and Git, will serve as inspiration to see if it is possible to incorporate these into a new solution.
Storj has already proven that it is possible to store files using public ledger technology, but the feasibility of sharing files in a practical manner, as is already implemented with non-public-ledger technologies in the other products, has yet to be fully explored.


Chapter 3

Methodology

This chapter discusses methodologies applied in this project.

3.1 Literature Study

A literature review of existing information was performed to investigate the current state of the problem domain as dictated by the problem section, 1.2. The review consisted of collecting relevant information for the area at hand and evaluating that data. According to [39], a literature review should consist of an overview, a summary and an evaluation. In this project these aspects are covered in chapter 2. To keep in mind during this process is that the information analysed must be relevant to the thesis and state what is and what is not known. Furthermore, it is prudent to identify any areas of controversy in the literature and formulate questions for further research.

Taylor [39] proposes a number of questions that should have clear answers when performing the literature review process. These questions include the following:

• What type of literature study is performed? In this case it was a quantitative study used to determine technologies and protocols for solving the problem at hand.
• Has the search for information been wide enough to include all relevant material?
• Has irrelevant material been pruned away?
• Is there a relationship between the information at hand and the problem statement that is being investigated?
• Is the literature critically analysed and have the studies been discussed in light of the researcher’s own opinions?

For each source of information being analysed, a further set of questions should be used as a guideline to help the researcher arrive at relevant conclusions. This is an excerpt from what is proposed by Taylor [39]:

• Has the author formulated a problem or issue that is being analysed?
• What are the strengths and limitations in the way the author formulates that problem?
• How does this source of information relate to the thesis question at hand and what are its strengths and weaknesses?


• Has the author evaluated literature relevant to the issue at hand? Are sources included that disagree with the author's point of view?

• Is the author objective in arguments and claims?

The resulting literature study for this project is more of a compilation of already known technologies and concepts and does not involve contrasting opposing opinions. The sole purpose of the study was to evaluate information about the particular areas of interest and then apply that knowledge in an attempt to achieve the project goals.

3.2 Development

The following sections each describe one aspect of the development process during the project. The ordering of the sections adheres to the sequence in which they were applied in the actual process.

3.2.1 Analysis

The first step in the development process is identifying the requirements of the system that is to be produced. These requirements are either directly stated in an existing specification of requirements or should be derived from use cases that dictate what the system is expected to achieve. In the context of this project the requirements were directly derived from the problem statement and the challenges it presented, as well as the technologies explored in chapter 2. Once use cases and requirements have been determined they act as input to the next step in the development process, which is the design phase.

3.2.2 Design

The purpose of good design is to facilitate the production of code that is flexible and also easy to maintain, develop and understand. This is highly desirable in large projects with multiple developers to avoid misunderstandings and reduce redundant work. This project has applied several design principles which are outlined below.

Object-oriented design

The foundation on which all object-oriented design should rest can be boiled down to the following three basic principles:

1. Low coupling

2. High cohesion

3. Encapsulation

Low Coupling

This principle states that classes should have as few dependencies on other classes as possible. This minimizes the effect on a particular class when modifications are made in other classes. The more unstable the referenced class, the greater the impact of modifications therein [22, p.285].


High Cohesion

High cohesion states that classes should have well-defined knowledge and purpose, which means that a class should only know what it needs to know and only perform actions that it should perform [22, p.291].

Encapsulation

This concept revolves around the accessibility of declared entities in code. Usually this is governed by modifiers, such as public, private and protected, or a lack thereof. The purpose of encapsulation is to separate the public interface from the underlying implementation. By defining a good public interface it is possible to modify the implementation without affecting dependent code since the interface can remain intact.
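As a hypothetical illustration (the class and method names below are invented for this sketch, not taken from the project code), an encapsulated class exposes a small public interface while hiding its representation:

```java
// Hypothetical example: the balance field is hidden behind a small
// public interface, so the internal representation can change freely.
class Account {
    private long balanceInCents; // implementation detail, not exposed

    public void deposit(long cents) {
        if (cents < 0) throw new IllegalArgumentException("negative deposit");
        balanceInCents += cents;
    }

    public long balance() {
        return balanceInCents;
    }
}
```

Since callers only see deposit and balance, the stored representation can later be changed without touching any dependent code.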

Other Principles

There are other principles and patterns that can and should be applied depending on the situation. Some of the more noteworthy are:

1. Polymorphism

2. Inheritance

3. Composition and Aggregation

Polymorphism

This means that a particular declaration can have different underlying implementations [22, p.414]. Polymorphism allows for higher cohesion, since the code for choosing implementation is determined by the object that accesses the declaration. It also results in lower coupling, since classes making use of a particular interface need not depend on the underlying implementation.
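As an illustrative sketch (the Storage name and methods are invented, not from the project), a caller can be written against an interface while the concrete implementation varies:

```java
// Callers depend only on the Storage interface, not on any
// concrete implementation (hypothetical example).
interface Storage {
    void save(String data);
    int count();
}

// One possible implementation; a disk- or network-backed variant
// could be substituted without changing any caller.
class MemoryStorage implements Storage {
    private final java.util.List<String> items = new java.util.ArrayList<>();
    public void save(String data) { items.add(data); }
    public int count() { return items.size(); }
}
```

Code depending only on Storage is unaffected if MemoryStorage is swapped out, which is exactly the lower coupling described above.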

Inheritance

Inheritance is when a subclass inherits properties from a superclass. Which properties should be inherited can be dictated by the use of modifiers such as protected and private.

Composition and Aggregation

These terms describe forms of association. Composition describes the relation between an enveloping object and a contained object [22, p.519], much in the same way that a car is composed of wheels and an engine. The analogy is not perfect since composition describes a relation where the contained objects can not exist outside of the enveloping object, which is not the case for wheels and an engine as they are objects in their own right. If the relation in a composed object points to an object that is capable of existing on its own, that relation is instead characterized as an aggregation. Consider the driver of a car: the car contains the driver, but the driver also exists outside of the car.
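The car analogy above can be sketched as follows (hypothetical classes, not project code): the engine is created and owned by the car (composition), while the driver is an independent object that the car merely references (aggregation):

```java
class Engine {
    boolean running;
    void start() { running = true; }
}

class Driver {
    final String name;
    Driver(String name) { this.name = name; }
}

class Car {
    // Composition: the engine is created and owned by the car.
    private final Engine engine = new Engine();
    // Aggregation: the driver exists independently and is merely referenced.
    private Driver driver;

    void setDriver(Driver d) { driver = d; }
    Driver driver() { return driver; }
    void drive() { engine.start(); }
    boolean isMoving() { return engine.running; }
}
```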

Patterns

Patterns are general and well-tested solutions to problems of a certain type. They are guidelines for how solutions should be designed and implemented. Each of the outlined patterns has been implemented in the project code.

Layer

This pattern attempts to minimize the rippling effect of modifying code in one place and as a result having to change code in other places. The idea is that business logic should not be mixed with code for the user interface or with services such as persistence, leading to low coupling. This is achieved by dividing the system into different layers, or subsystems, where each layer has high cohesion [22, p.204] and serves a well-defined purpose, realised in Java by using packages. Ideally this set of layers forms a hierarchy and the goal is to achieve low coupling between subsystems and have no dependencies going from layers further down the stack to those closer to the top. The top of the stack is where the user resides. A variant of the Layer pattern is MVC, where the hierarchy of layers is composed of, from top to bottom: the View, Controller and Model.

View

This contains the entire user interface with all that it entails, from handling user actions and events to window management. Whenever an application needs to communicate with the model to manipulate data due to user actions the View calls the relevant procedures in the Controller.

Controller

This package and ultimately its classes are responsible for delegating incoming system calls to the appropriate subsystem. This concept is a pattern in itself and in terms of MVC, the subsystem is the Model. All system calls from the View reach the Controller which in turn calls the relevant procedures in the Model.

Model

The Model consists of all the classes maintaining the application's view of reality. All business logic resides here and this is where modification of data takes place.

Singleton

This pattern was widely used in the project implementation and deals with the problem of having exactly one instance of an object, which should be accessible throughout the system [22, p.442].
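A minimal sketch of the pattern (the Configuration name is invented for illustration): a private constructor prevents outside instantiation, and a single instance is created lazily behind a static accessor:

```java
// Hypothetical Singleton sketch: one globally accessible instance.
final class Configuration {
    private static Configuration instance;

    private Configuration() { } // no outside instantiation

    static synchronized Configuration getInstance() {
        if (instance == null) {
            instance = new Configuration();
        }
        return instance;
    }
}
```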

Observer

It is common for different objects in a system to monitor changes in other objects. The Observer pattern proposes a solution to this without making all those objects depend on each other, leading to lower coupling. By creating an Observer interface and possibly also an Observable interface, objects can register as observers with another object and thus be notified each time a modification occurs in the observed object [22, p.465].
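A minimal sketch of the pattern (names invented for illustration): observers register with an observable object and are notified on every modification, while both sides depend only on the Observer interface:

```java
interface Observer {
    void notifyChanged(int newValue);
}

// The observable keeps a list of observers and notifies each one
// whenever its state changes.
class Counter {
    private final java.util.List<Observer> observers = new java.util.ArrayList<>();
    private int value;

    void register(Observer o) { observers.add(o); }

    void increment() {
        value++;
        for (Observer o : observers) o.notifyChanged(value);
    }
}
```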

Factory

A Factory serves the sole purpose of creating other objects whose construction is possibly too complicated or would lead to low cohesion if performed elsewhere.
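A minimal sketch (names and defaults are invented for illustration): construction details are centralised in the factory rather than repeated at every call site:

```java
class Connection {
    final String host;
    final int port;
    Connection(String host, int port) { this.host = host; this.port = port; }
}

// Hypothetical factory: the knowledge of how to assemble a Connection
// (defaults, validation and so on) lives in one place.
class ConnectionFactory {
    static Connection createDefault() {
        return new Connection("localhost", 8333);
    }
}
```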

DAO

DAO stands for Data Access Object and such an object has the single purpose of communicating with a database. This type of object should not have any dependencies on the logic layer, nor should it contain any business logic in itself. The public interface of such an object is designed to satisfy the business layer.
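A minimal sketch (names invented; the project's actual DAOs are not shown): the business layer programs against the interface, while the implementation encapsulates the data store. An in-memory map stands in for a real database here:

```java
// Hypothetical DAO interface: the business layer sees only these
// methods and knows nothing about the underlying data store.
interface UserDao {
    void store(String id, String name);
    String findName(String id);
}

// In-memory stand-in; a real implementation would issue SQL
// (or JPA calls) against a database.
class InMemoryUserDao implements UserDao {
    private final java.util.Map<String, String> rows = new java.util.HashMap<>();
    public void store(String id, String name) { rows.put(id, name); }
    public String findName(String id) { return rows.get(id); }
}
```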

Modelling

This section describes how the prototype was first modelled.

System


With the aforementioned principles and patterns in mind the design of the system needs to be modelled. A tool well suited for use in this process is UML. UML defines diagrams, objects and relations by specifying their look, what they may consist of and their associations. Using these diagrams and their associated objects and relations it is possible to model different types of systems, concepts or other phenomena. The standard itself says nothing about what the diagrams should be used for, only how they should be constructed and what they should consist of. In this project the primary diagrams used were Sequence, Class and Communication diagrams, as well as the concepts of a Domain model and a System Sequence Diagram.

Sequence diagram

Such a diagram models the interaction between objects and how they pass messages to each other in sequence [22, p.222]. In terms of software development this is useful when designing method calls and in what sequence they should occur to achieve a particular goal.

Communication diagram

This type of diagram is similar to the sequence diagram in functionality. However, the representation differs and places less focus on the sequence of events [22, p.224]. These diagrams are better at giving an overview.

Class diagram

A class diagram gives a static view of something and describes how different objects are related and what they consist of. In this project this type of diagram was used to give an overview of how different classes and subsystems relate and what they are comprised of.

Domain model

Based on the requirements determined during the analysis phase, 3.2.1, possible candidates for objects or classes that should appear in the domain model are identified. The domain model itself depicts the problem domain in which the application operates. This is often represented as a class diagram where the objects represent entities with real-world counterparts [22, p.134].

System Sequence Diagram

The purpose of this type of diagram is to show all system operations (operations that a user can execute with the application). The diagram depicts the system itself as a black box and shows the interactions that are possible between the system and the rest of the world, to give a clear overview of the capabilities of the system [22, p.176].

Database

Based on the domain model, also called the conceptual model, of the application a logical database model is created. Since the underlying data store for this project was a relational database (see section 3.2.4), a brief description of that concept follows.

Relational database

In a relational database data is stored in relations, or tables, where each table consists of rows with columns that each have domains. This relates to the object-oriented approach in that a table can be considered an object, columns attributes of that object, domains possible values for each attribute, and rows actual instantiations of the object. Tables in a database usually have inter-table relations realised by the use of so-called foreign keys which point to rows with primary keys in other tables. These relations can have multiplicities ascribed to them, giving rise to relations such as one-to-one (a row in one table points to exactly one row in another table and vice versa) or many-to-many. A primary key is either a single column or a set thereof, whose combined value must be unique in a particular table. This way it is possible to use primary keys to point to rows in a table. The relational model underlying a relational database is based upon relational algebra [16, p.88] which, among other things, allows for projecting, selecting and joining different datasets in a swift and efficient manner.

Logical Database Model

Classes, attributes and associations in the conceptual model are translated to tables, columns and keys. The model is then normalised to avoid redundancy being built into the database. This process involves removing functional and transitive dependencies. Extra tables might be introduced to handle many-to-many relations and surrogate keys may be used to replace composite primary keys to remove redundant data in referencing tables.

Tools

For this part of the development process Astah [15], a powerful UML editing tool, was used for the creation of UML diagrams. The main reason for choosing this tool was the possibility to merge different versions of a design into one. This trait is highly appreciated when working in a project with several developers as it is possible to modify the design separately and then meld those versions. For the database design MySQL Workbench was used, with its ability to export SQL scripts. See section 3.2.4 for more detail.

3.2.3 Coding

Coding is the process of translating design into code. During this process errors and limitations in the design are detected. If it is pertinent, the design should be updated to reflect modifications that were necessary to correct the issues. However, the design does not necessarily have to perfectly match the code.

Coding Conventions

Coding conventions exist to standardize how the code looks when it is implemented to facilitate reading and extending it. Listing 3.1 and subsequent comments present conventions used in this project.

package com.mycompany.productidentifier.subsystem;

/**
 * Description of what the class does
 *
 * @author John Doe
 */
public class DescriptiveClassName {

    ...

    /**
     * Description of what the method does.
     * What are its parameters and results
     *
     * @param param1 Description of the purpose of the parameter
     *
     * @return Description of the return data
     */
    public descriptiveMethodName(param1 ...) {
        ...
    }

    private descriptiveMethodName2(param1 ...) {
        ...
    }

    ...

}

Listing 3.1. Code Conventions

The general package structure should serve the purpose of minimizing inter-package dependencies. Thus, each package should consist of classes with a similar purpose or functionality to maximize cohesion and minimize coupling among packages. Each class should be put in a package whose name clearly relates to the functionality of the class itself. If the class can be placed in many different packages and/or supplies widely used functionality, such a class should be placed in a util package. Each object or class should have a clear description of the function of that particular object as well as who originally authored it. Each method in the public interface should also be documented with a description of its functionality as well as its parameter list and return data. The public interface comprises all outwardly exposed methods with public, protected or package visibility. Only private methods are exempt from the need of a descriptive comment. The comments preceding all class and public method declarations should, to the extent possible, follow Javadoc syntax and conventions [27]. Comments inside method bodies should, if possible, be avoided, since they are generally not maintained when code is updated.

Best Practices

Some best practices which have been applied in the project are outlined in the following sections.

Avoid duplicate Code

Instead of copying code the principle should be: write once and reuse.

Avoid long Methods

If a method is long and complex it is likely that it will be difficult to follow the author's train of thought throughout the method body. The solution is to divide the body into several sub-methods where each method name clearly dictates what functionality is provided.


Avoid long Classes

A long class is usually an indication of low cohesion, which means that the class can probably be divided into several classes with separate responsibilities and functionalities.

Avoid long Parameter Lists

Use DTOs and other container objects to group long parameter lists.

Use descriptive Names

All declarations should have names that clearly state their function and purpose, and the reuse of variables should if possible be avoided, since that deviates from the concept of high cohesion.

3.2.4 Implementation

Several technologies were combined during the implementation phase of this project and the most noteworthy are covered in this section.

Language

The language of choice for this project was Java. The reason for this is primarily familiarity and its flexibility, in that it can execute in many different environments as long as the JVM has been compiled for that particular environment. The choice of Java is further motivated by the fact that there exist well established technologies such as JPA and JUnit and that it is a widely adopted programming language, facilitating the development process.

Unit testing

To ensure that the code produced is robust and performs as expected, unit testing is an invaluable tool. With well designed tests it is possible to be confident that the code executes the way it is supposed to, once the tests pass. Once it is possible to verify the behaviour of the code, it becomes more flexible, the reason being that modifications to the code can be made without fear. As long as the tests pass it is plausible that the code performs as anticipated. This facilitates implementing new functionality and adapting to changing requirements. The objectives for tests are that they should be complete, in that they cover all relevant parts of the code; reproducible, in that they can be performed over and over again without leaving side effects; and self evaluating, in that they should explicitly indicate whether or not they passed. A unit test tests some part of the code that in itself comprises a coherent unit of code. For instance, a single method is tested to ensure that it executes as expected.

Testing Framework

Since the project was implemented using Java, the testing framework of choice was JUnit.
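A unit test follows the structure sketched below. The example is hand-rolled so it is self-contained (names invented); in the project, JUnit's test methods and assertion helpers play the role of testAdd and the explicit AssertionError:

```java
class Calculator {
    int add(int a, int b) { return a + b; }
}

// Sketch of a self-evaluating unit test: it either returns normally
// (pass) or throws an AssertionError (fail), leaving no side effects.
class CalculatorTest {
    static void testAdd() {
        Calculator c = new Calculator();
        int result = c.add(2, 3);
        if (result != 5) throw new AssertionError("expected 5, got " + result);
    }
}
```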

Object-relational Mapping

This concept deals with how data in a database is mapped to objects in an object oriented programming language, such as Java. In Java a natural choice is JPA [28], which provides an API for all such mappings. This way it is possible to have a standardized way of communicating with an underlying data store, be it MySQL, PostgreSQL or something else, similar to the concept of polymorphism: the implementation behind the declaration can easily be changed. The result is that the application calls methods in JPA which are mapped to a database provider such as EclipseLink [18] that in turn communicates with the underlying data store and delivers the result back to the calling application.

Database

As the backing data store for the prototype implementation MySQL [30] was chosen. The reasoning behind this choice was once again one of familiarity and ease of implementation. A product tightly coupled to MySQL is MySQL Workbench [29], which enables efficient generation of SQL statements based on a logical database model. Since the model is likely to change as new requirements emerge during the development process, it is efficient to only have to modify the model and then automatically generate the code necessary to update the database. This automated process greatly improves development speed and makes for ease of use. For details about relational databases see section 3.2.2.

Protocol Development

Since the main focus of this project was to design and implement a protocol, a way of easily translating an evolving protocol definition into code swiftly and repeatedly was desired. Messages in the protocol should also be serialised and de-serialised in an efficient manner and leave a small footprint in transit. Google's protocol buffers [20] fit all those criteria. Protocol buffers allow for defining the messages that constitute the protocol in so called .proto files. Those files are then compiled into, for instance, Java code that creates and handles the messages. Protocol buffers are especially well suited for communication protocols or data storage and are, compared to XML, simpler, 20-100 times faster, up to 10 times smaller, and the access classes generated are easier to use programmatically [20].
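A message definition in a .proto file might look like the following sketch (the message and field names are invented for illustration, not taken from the SecuRES protocol); the protocol buffer compiler then generates Java classes for building and parsing such messages:

```proto
// Hypothetical message definition; names and fields are illustrative only.
message SliceInfo {
    required bytes slice_hash = 1;   // hash identifying the encrypted slice
    required uint32 index = 2;       // position of the slice within the file
    repeated bytes crumb_hashes = 3; // hashes of the crumbs in this slice
}
```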

Tools

The IDE used in this project was Netbeans [31] due to its integration with Maven [2], a software project management and comprehension tool. Maven can be used for building and managing any Java-based project. It revolves around the POM (Project Object Model), which is used to, for instance, specify the build process and facilitate management and testing.

3.2.5 Development Methodology

To facilitate an adaptable and flexible development process, it was decided that parts of the Scrum software development methodology should be applied in this project.

Scrum

In Scrum there are three primary roles: the product owner, who represents the stakeholders and customers and is responsible for ensuring that value is ultimately delivered; the development team, whose responsibility is to deliver potentially shippable versions of the product by the end of each sprint; and the Scrum master, who has the job of facilitating the work effort by coaching the team, keeping it on track, adhering to Scrum principles, etcetera.

The requirements that have been identified for a certain product are collected in a product backlog which is maintained for the duration of the development work, containing features, bug fixes, non-functional requirements and whatever else might be necessary to deliver a functional product by the end of development.

A sprint, or iteration, is the basic unit of development, which means that all work is performed in an iterative and incremental fashion. Sprints are limited in time and often stretch from a week to a month. Each sprint begins with a planning session where the work and goal for the coming iteration are identified and a sprint backlog is put together. The sprint backlog contains items that should be handled during the coming iteration. They are displayed on a task board and sorted into their current state: To do, In progress and Done, with possible variations. During the sprint there is what is called the Daily Scrum, where each member answers three questions: what was achieved yesterday, what needs doing today, and have any problems arisen preventing the team from reaching the sprint goal. By the end of the sprint the progress made and whether or not the goals have been achieved, as well as other issues, are reviewed in a sprint retrospective. This serves the purpose of improving the process for future sprints. During the sprint completed items are moved to the Done category and a burn-down chart is updated to monitor progress. By the end of each iteration there should be a working product, which means that the software should be integrated, fully tested, documented and possibly ready for delivery.

Chosen Methodology

This project adapted some of the Scrum methodology. The parts that were deemed to contribute more than the effort they required were chosen. The product and sprint backlogs as well as the sprint methodology were incorporated into the development process, which meant that the work flow was iterative. Each sprint was one week long and was backed up by a sprint backlog as well as planning and review sessions.

3.3 Project management

This section outlines administrative and communication methodologies that were applied in the project.

3.3.1 Project

To facilitate the process of planning, organizing and controlling resources and ideas during the lifetime of the project a project management application was deployed. Redmine [32] was chosen to facilitate these tasks. Using this tool it was possible to track bugs, create and assign tasks and roles to members of the team as well as create Gantt charts to monitor progress and follow up on issues.


3.3.2 Documentation

All documentation was written in either LaTeX [23] or Markdown. The reason behind choosing LaTeX is that it provides professional typesetting capabilities coupled with powerful bibliography management and an extensive set of functions expandable by a plethora of plugins. Markdown was used for all documentation but the report, which was written in LaTeX, since it allows for clear and concise formatting with the possibility to export to many different formats. Both LaTeX and Markdown are plain text formats, which makes them well suited for version control and conflict resolution.

3.3.3 Collaboration To aid development and documentation a decision was made to utilise Git throughout the entire project. One of the reasons is the inherent redundancy built into the Git implemen- tation, since each participating node contains a complete copy of the repository. Another reason is the possibility to branch and explore different ideas as well as the possibility to handle merge conflicts that arise when multiple developers work on the same code. For a detailed description of Git see section, 2.5.


Chapter 4

Solution

The solution is based on the Bitcoin concept with a similar network structure and protocol. The protocol and network have been modified to take into account the requirements specified in section 4.1 and incorporate ideas from all the technologies described in Chapter 2. The solution itself has been named SecuRES.

4.1 Requirements

Based on the different technologies presented in Chapter 2, and their capabilities, there are a number of requirements that the authors believe an implementation should fulfil.

4.1.1 Functional Requirements

1. Sharing must be possible to multiple recipients. The process should be similar to that of Dropbox.

2. Branching should be possible. This means exploring different versions of a file in parallel and is directly related to the branching concept of Git.

3. Permissions should be a part of the platform. Similar to how access control lists are implemented in Linux, it should be possible to dictate who is allowed to do what with a file within the confines of the network.

4. The platform must allow for different types of file sharing. Both sharing copies (double spending) and transferral of ownership (no double spending) should be possible.

5. It must be possible to upload files to a network and have those files be available for recipients.

6. It must be possible to verify that files uploaded to the network are still available and have not been corrupted, without downloading the entire file. This requirement is directly derived from the idea of heartbeats described in section 2.2.

7. The file storage must be resilient in that there is no single point of failure. This means that redundancy must be built into the file storage solution.


4.1.2 Security Requirements

1. Confidentiality must be guaranteed. Akin to how secure mail guarantees that only the recipients have access to the data being shared, it should not be possible for anyone else to deduce that information.

2. Integrity of data must be guaranteed. This means that if data has been modified there must be a way to tell. This is achieved in secure email as well as in Bitcoin.

3. Non-repudiation must be guaranteed. No one should be able to deny what has happened. This is implemented in the Bitcoin protocol via the public ledger registering every transaction.

4. Trust in a third party must not be necessary for the continuation or usage of the service. This stems from the concept of a decentralised peer-to-peer network in Bitcoin.

4.2 Network

To achieve a decentralized solution for file sharing the authors propose a network that is similar in structure to that of Bitcoin, described in section 2.1.3. The same types of nodes will be used, along with the functionalities they provide. However, similar to Storj it is pertinent to introduce one more node functionality due to the storage requirements when uploading files to the network. This functionality has been named Storage, and a node providing it is able to store parts of files as well as supervise the health of those parts in the rest of the network. Appendix E shows an overview of the decentralized SecuRES network. The appendix also shows some integration towards other systems that are not relevant for this report, such as the IDMS nodes.

4.3 Concepts

The authors have introduced several concepts that differ from Bitcoin. Some of these concepts are implemented in Storj. However, SecuRES expands these concepts and introduces new ideas as well.

4.3.1 Sharing a File

The first time a file is uploaded to the network a genesis transaction is created for this particular file. During this process a file ID, the hash of the genesis transaction, is assigned to the file. A transaction contains inputs and outputs, similar to Bitcoin, and encodes information about the file and its recipients. The outputs of a transaction specify the recipients of the file and what their permissions are within the confines of the SecuRES network and the file in question. The file itself is split into similarly sized parts, named slices, that are encrypted using a randomly generated encryption key and distributed across the network independently of the transaction. The information about the slices is encrypted, reusing the previous encryption key, and appended to the transaction before it is distributed to the rest of the network.


Once the transaction reaches the recipients, they decrypt the file- and slice-information and request slices from the network to construct a local copy of the file. Interesting to note is that any type of digital resource that can be packaged in a file, be it contracts, documents, securities, music or movies can potentially be shared using the SecuRES platform.

4.3.2 Confidentiality and Integrity

Just like in Bitcoin, transactions for sharing files are made out to hashes of public keys and it is possible for clients to set bloom filters to aid privacy. All confidential information about a file in a transaction, and the file itself, is encrypted using a secret key that is in turn encrypted using each recipient's public key, to ensure that only they have the means necessary to access that information. All transactions reference a file ID but since it is a cryptographic hash, it is not possible to discern any information about the file from this hash.
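The key handling described above can be sketched in Java using the standard JCA API (a simplified illustration; the algorithm and key-size choices are assumptions, not taken from the SecuRES implementation): a random secret key protects the file, and only that key is encrypted per recipient:

```java
// Simplified sketch of hybrid encryption: the secret (AES) key protects
// the data, and only that key is encrypted under each recipient's
// (RSA) public key. Algorithm choices here are illustrative assumptions.
class HybridEncryptionSketch {

    // Encrypt the file key under a recipient's public key.
    static byte[] wrapKey(javax.crypto.SecretKey fileKey,
                          java.security.PublicKey recipient) throws Exception {
        javax.crypto.Cipher rsa = javax.crypto.Cipher.getInstance("RSA");
        rsa.init(javax.crypto.Cipher.ENCRYPT_MODE, recipient);
        return rsa.doFinal(fileKey.getEncoded());
    }

    // Recover the file key using the recipient's private key.
    static javax.crypto.SecretKey unwrapKey(byte[] wrapped,
                                            java.security.PrivateKey recipient) throws Exception {
        javax.crypto.Cipher rsa = javax.crypto.Cipher.getInstance("RSA");
        rsa.init(javax.crypto.Cipher.DECRYPT_MODE, recipient);
        return new javax.crypto.spec.SecretKeySpec(rsa.doFinal(wrapped), "AES");
    }
}
```

Each recipient's wrapped copy of the same secret key can then be attached to the transaction, so only the holders of the matching private keys can recover the file contents.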

4.3.3 File Slice

When files are uploaded to the network they are divided into smaller pieces to make them more manageable. Each slice is encrypted in such a way that confidentiality is ensured and only relevant parties will be able to access the original content. The slices are distributed across the network to storage nodes, without reference to the file to which they belong, to ensure privacy and reduce the risk of hostage situations. A hostage situation would, for example, be when someone withholds a slice of a file, thereby making it incomplete, if some condition is not met. A file slice is the direct equivalent of a Storj shard. Each slice is identified by its hash after it has been encrypted.

4.3.4 File Crumb

Even smaller divisions of the files, called crumbs, are used to make the slices even more manageable and make the concept of slice verification using proof-of-storage, or Merkle tree audits, possible. They enable verification without the need to download complete slices of a file. The file crumb is the smallest part of a file that circulates the network. This is ultimately the shape that files will have in the peer-to-peer network and how they are distributed. A collection of a certain number of crumbs constitutes a slice.

4.3.5 Updating a File

Whenever a file that is shared on the network is locally updated and a client wishes to share that modification with the rest of the recipients that have access to the file, a new transaction is created. Only slices of the file that differ from those currently residing in the network are uploaded. This helps reduce unwanted redundancy on the network, similar to how Dropbox (section 2.3) manages file changes. Updated slices are encrypted using the same encryption key that was first used for this file ID, which means that a new key need not be distributed to the users currently sharing the file.


The new transaction will contain only information about the newly uploaded slices. No transaction outputs are necessary unless new recipients are also added to the file, as the old recipients are already logged in the blockchain. When clients request new transactions pertaining to this file ID this transaction will be delivered and they can download the new slices and update their local version of the file. It is possible to track the entire version history of a file through its transactions stored in the blockchain.

4.3.6 File Management

The health of slices in the network is constantly monitored to ensure that slices do not disappear or become corrupt and that the specified redundancy for each slice is maintained. Each storage node that agrees to store a slice also agrees to monitor its health across the rest of the network, and each client to whom the slice is relevant aids in this supervision. The auditing process itself is performed by issuing regular slice verification requests to other nodes that store the slice in question. If a node storing a slice goes down, the auditing nodes negotiate to upload the slice to a new storage node as a substitute for the failed node. If the failed node ever comes back up again, the redundancy of the slice in the network will temporarily be greater than what has been requested. However, the recovering node will detect that the redundancy has already been satisfied and eventually drop the slice. Slices are completely decoupled from files and transactions, which means that the network cannot link any slice to any particular transaction or file, which in turn helps prevent targeted attacks on files.

4.3.7 Slice Verification

The SecuRES approach to slice verification is a modified version of the concept of heartbeats in Storj. Heartbeats are discussed in section 2.2 and implemented using Merkle trees. However, Storj currently only allows verification to be performed by the client that first shared the file. SecuRES allows for a verification process that can be performed by both clients and storage nodes in the network without having to download entire slices. For each slice a Merkle tree is constructed based on the crumbs that constitute that slice. The Merkle root for that tree is included in the transaction that shares the slice, ensuring that storage nodes and auditing nodes cannot fake what the Merkle root should be. As explained in section 2.1.4, a Merkle root represents a fingerprint for a large set of data, in this case a collection of crumbs. Whenever a node wishes to verify that a slice is still available and has not been tampered with, it requests a pair of crumbs, as well as their Merkle path, from a storage node. By calculating the Merkle root based on that information and verifying the result, the node knows with high probability that the corresponding slice is valid. The reason the node cannot be absolutely certain of the outcome of the verification is that the entire slice was not checked. It is possible that the storage node kept only those crumbs and the corresponding Merkle path for that particular slice and nothing else. However, this is mitigated by requesting random crumbs every time, which ensures that the storage node needs to store the entire slice.


The requirement for a node or client to be able to perform this verification process is that it has access to information about the file slice itself. This is only possible if a node either stores the particular slice or is able to decrypt the relevant information from a transaction (as is the case with client nodes).
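An audit round of this kind can be sketched as follows. This is an assumed construction, not the thesis code: it verifies a single crumb rather than the pair described above, and uses a simple duplicate-last-node pairing convention (the thesis does not specify these details). The verifier recomputes the Merkle root from one crumb and its Merkle path and compares it with the root recorded in the transaction.

```java
import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Assumed sketch of a slice audit: the auditor knows the Merkle root from
// the transaction; the storage node returns a crumb plus its Merkle path;
// the auditor recomputes the root and compares.
public class MerkleAudit {
    static byte[] sha256(byte[]... parts) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        for (byte[] p : parts) md.update(p);
        return md.digest();
    }

    // All tree levels, leaves (crumb hashes) first, root level last.
    public static List<byte[][]> levels(byte[][] crumbs) throws Exception {
        List<byte[][]> out = new ArrayList<>();
        byte[][] level = new byte[crumbs.length][];
        for (int i = 0; i < crumbs.length; i++) level[i] = sha256(crumbs[i]);
        out.add(level);
        while (level.length > 1) {
            byte[][] next = new byte[(level.length + 1) / 2][];
            for (int i = 0; i < level.length; i += 2) {
                byte[] right = (i + 1 < level.length) ? level[i + 1] : level[i];
                next[i / 2] = sha256(level[i], right);
            }
            out.add(next);
            level = next;
        }
        return out;
    }

    // Sibling hashes from leaf to root: the Merkle path for one crumb.
    public static List<byte[]> path(List<byte[][]> levels, int leaf) {
        List<byte[]> p = new ArrayList<>();
        int idx = leaf;
        for (int l = 0; l < levels.size() - 1; l++) {
            byte[][] lvl = levels.get(l);
            int sib = (idx % 2 == 0) ? idx + 1 : idx - 1;
            p.add(lvl[sib < lvl.length ? sib : idx]); // odd node: its own sibling
            idx /= 2;
        }
        return p;
    }

    // Recompute the root from one crumb and its path; compare with the known root.
    public static boolean verify(byte[] crumb, int leaf, List<byte[]> path,
                                 byte[] root) throws Exception {
        byte[] h = sha256(crumb);
        int idx = leaf;
        for (byte[] sib : path) {
            h = (idx % 2 == 0) ? sha256(h, sib) : sha256(sib, h);
            idx /= 2;
        }
        return Arrays.equals(h, root);
    }
}
```

In a real audit the storage node would compute `path` from its stored slice, while the auditor only runs `verify` against the root from the transaction.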

4.3.8 Access Permissions The SecuRES protocol allows permissions to be set on a per-file-ID and per-recipient basis. Once a permission has been set, it is not possible to revoke it by any other means than those discussed in section 4.3.10. The permissions dictate what types of transactions are allowed for a particular file ID and user within the confines of the SecuRES network. Nothing prevents someone from uploading a shared file with a new file ID and thereby circumventing the previous permissions set for that user with regard to the specific file. However, since the file ID is new, the transaction chain pertaining to this ID is clearly distinguishable from the original chain. Permissions are part of the validation of transactions, which is why each transaction references previous transactions that explicitly grant a user the necessary permissions for what is attempted. This is also the reason why it is not possible to revoke permissions, since it is not possible to remove transactions from the blockchain. A user granting permissions can never grant permissions to him- or herself, nor grant permissions that they do not already possess.

4.3.9 Branching The protocol allows for branching by assigning a branch ID to transactions. Based on these branch IDs, client nodes can then reconstruct branches locally for each file. Branch IDs are only relevant for transactions that describe modifications to a file. Permissions are set globally for a particular file ID, independent of branches. When the client creates a transaction that contains file modifications, the branch ID is set in the transaction, indicating to other clients which branch the modifications belong to so that they can locally reconstruct the branching tree. If a user wishes to work on a different branch, the workflow is similar to that of checking out a branch in Git: either an existing branch is used or a new one is generated. Each transaction contains a timestamp to allow for a sequential ordering of the modifications of a file.

4.3.10 Splitting Revoking permissions for users is only possible in one of two ways: either a new genesis transaction is created for the file in question or a split is performed. By creating a new genesis transaction the file is given a new file ID and encrypted using a new encryption key, effectively blocking anyone not on the recipient list from accessing this version of the file. The old transaction chain for the old file ID will still be available on the network and evolve over time through the users still using it, but nothing related to the new file ID will be available to users that were not specifically invited. A split transaction is similar to a genesis transaction in that a new encryption key and a new file ID (the hash of the split transaction) are generated and only those users authorized to be recipients of the transaction will have access to the new version of the file. The difference is that this transaction will contain references back to transactions in the old transaction chain so as not to lose file history. This means that only modified slices need to be encrypted using the new encryption key, and old slices that have not been modified can remain intact. This helps reduce unwanted redundancy. Even though only modified slices, encrypted using the new encryption key, are uploaded to the network, the split transaction needs to contain slice descriptions for all slices that constitute the version of the file that is split. This snapshot of the file ensures that slices from previous transactions can be reused.

4.3.11 Joining

If there are many splitting transactions over time, there will be many encryption keys in circulation for a particular file. Sharing this file with a new recipient will then require the encryption of each such key using the recipient's public key. This potentially makes for large transactions, which is why the join transaction was introduced. This type of transaction collects slices encrypted using different encryption keys, re-encrypts them using one common key and assigns a previous file ID (matching the key chosen). For instance, if one wishes to revert to a permission state from before a split, this is the transaction type to use.

4.3.12 Double Spending

In Bitcoin, double spending means spending the same bitcoins multiple times, which is not allowed. Translated to a file-sharing scenario, double spending would entail sharing the same resource multiple times. For regular files this is not an issue; they can be shared over and over again. However, for certain resources such as contracts, the act of double spending should probably not be allowed if it involves value changing hands.

4.3.13 Transaction Verification

Since there are different types of transactions that can be issued in the SecuRES protocol, there are also different validation procedures. TRANSFER transactions do not allow double spending. This has to be set in the genesis transaction, forcing all subsequent transactions to also be of the TRANSFER type. For this type of transaction there are no permissions; ownership of the resource is transferred from the sender to a single recipient. A NORMAL, SPLIT or JOIN type transaction allows double spending as long as the user creating the new transaction has the necessary permissions. These types of transactions are all validated in the same way according to their permissions. The inputs in a transaction are required to reference previous transaction outputs granting the user sufficient permissions for what is currently being attempted.
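Since permissions are carried as a bitfield in transaction outputs (see Table 4.5), the core of this validation can be sketched as a bitwise check. The flag names below are illustrative assumptions; the thesis does not enumerate the actual permission bits.

```java
// Hypothetical sketch of the permission check: a transaction is valid only
// if its referenced inputs grant every bit the new transaction needs, and a
// granter can never hand out bits it does not itself hold.
public class Permissions {
    // Illustrative flags, not from the thesis.
    public static final int MODIFY = 1;      // may upload modified slices
    public static final int SHARE  = 1 << 1; // may add new recipients
    public static final int SPLIT  = 1 << 2; // may issue split transactions

    public static boolean allowed(int granted, int required) {
        return (granted & required) == required;
    }

    // Restrict a grant to the bits the granter actually possesses.
    public static int grantable(int own, int requested) {
        return own & requested;
    }
}
```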

4.3.14 Transactions

There are a number of possible transactions created for different purposes.


Genesis Transaction This is the first transaction for a file that is shared on the network. The hash of this transaction will also act as the ID of the file in question. Using this ID it is possible for clients to request all transactions that are related to a particular file. This means that transactions that contain only modifications of a file can also be requested, since they do not necessarily have outputs.

Modification Transaction This type of transaction will contain only encrypted slice descriptions for those slices of the file that have been modified. Thus, no recipients need to be specified. The encryption key will be reused from a previous transaction.

Permission Transaction These transactions are used to grant permissions to authorized file recipients. That is, recipients that have already received transactions pertaining to a particular file ID. The file description part of the transaction will not contain any encrypted slice descriptions.

Split Transaction This type of transaction is discussed in section 4.3.10.

Join Transaction This type of transaction is discussed in section 4.3.11.

Transfer Transaction This type of transaction is used to prevent double spending. This is specified in the genesis transaction and entails transferring ownership to a single recipient in each transaction.

Sharing Transaction Whenever a file is shared with a new party, the encryption keys for slices of the file are encrypted using the recipient’s public key and the permissions for the user are set in the transaction’s output. It is possible to have multiple new recipients specified in the same transaction.

4.3.15 Blockchain The blockchain will consist of blocks identical to those in Bitcoin. Similar to Bitcoin, there will also be a secure genesis block for nodes to start building their blockchains on when synchronising data with other peers in the network. The genesis block will contain a single genesis transaction for a file containing a string reflecting when the SecuRES system was first put into use.


4.4 Protocol

The SecuRES protocol is a peer-to-peer communication protocol for sharing files between multiple recipients, aimed at providing a solution to the problem statement as well as the requirements formalised in section 4.1. The protocol itself is a derivative of the Bitcoin protocol, using many of the same messages and the same structure. Storj has also acted as a source of inspiration for file audits. SecuRES comprises transactions and blocks, just like Bitcoin, but its transactions are specific to sharing files and there are different types of transactions. Hashes, Merkle trees, signatures, addresses and the general message structure are implemented in the same way as in Bitcoin. This section describes only messages and data structures that differ from their Bitcoin counterparts or that are not present in the Bitcoin implementation. For a complete presentation of the SecuRES protocol see appendix C. For a complete presentation of the Bitcoin protocol see [3].


File Information Messages

The following protocol messages contain information about files.

4.4.1 filedesc A file description message is part of a transaction message and contains data about the file that is being shared. Table 4.1 shows the structure of this message.

Table 4.1. The Structure of the filedesc Message

Field Size | Description | Data type | Comments

32 | file id | file_id | File ID. Hash of a genesis transaction.
32 | current state | char[32] | Current hash of the file.
1+ | redundancy | var_int | The redundancy setting for this file and its slices. 0 indicates industry standard (currently 3). 8 is the chosen maximum at the moment.
32 | branch_id | char[32] | The ID of the branch that this transaction relates to.
1+ | slice_count | varint | Number of slices that the file was split into. If the transaction does not involve a modified file this will be 0.
? | slicedescs | slicedesc | Slice descriptions. This will be encrypted, using the same key that encrypted the file, to prevent the network from determining which slices belong to which transaction.

slicedesc A slice description message is part of a file description message and is always encrypted. The message describes a particular slice of a file and its structure is shown in table 4.2.


Table 4.2. The structure of a slicedesc Message

Field Size | Description | Data type | Comments

32 | slice_id | char[32] | The slice ID (hash of the slice) being described.
32 | slice merkle root | char[32] | Hash representing the Merkle root for the crumbs that this slice consists of.
1+ | slice_index | varint | The index of the slice within the file. Used when locally reconstructing a file on a client.
4 | crumb_len | uint32_t | Size of a crumb.
4 | slice_len | uint32_t | Size of the slice.

Transaction Messages

The following messages describe a transaction message and what it consists of.

4.4.2 singletx This message is what constitutes a transaction for sharing and/or modifying a file. The structure of this message is described in Table 4.3.


Table 4.3. The Structure of the singletx Message

Field Size | Description | Data type | Comments

4 | version | uint32_t | Protocol version used for this transaction.
4 | type | txtype | The type of this transaction.
1 | tx_in count | varint | The number of inputs.
41+ | tx_in | txin[] | Inputs for this transaction. Similar to those of Bitcoin.
1+ | tx_out count | varint | Number of recipients.
9 | tx_out | txout[] | A list of 1 or more transaction outputs for this file.
8 | creation_time | uint64_t | Creation time for this transaction.
4 | lock_time | uint32_t | Same locktime as in the Bitcoin protocol.
? | file_desc | filedesc | Description of the file that is being shared.

txtype This data structure specifies the type of a transaction and indicates how it should be validated in the network. The possible values are shown in Table 4.4.


Table 4.4. Transaction Types

Value | Description | Comments

0 | NORMAL | Normal transaction with double spending allowed.
1 | TRANSFER | No double spending allowed. Ownership of a file is transferred. This is set in the genesis transaction and descendant transactions must be of this type.
2 | SPLIT | A splitting transaction, which means that a new file ID is generated but the transaction still points back to a previous transaction.
3 | JOIN | A joining transaction collecting slices encrypted using different encryption keys under one previous encryption key and accompanying file ID.

txout Transaction outputs are included in a singletx message and detail information about the recipients of a file. Their structure is described in Table 4.5.


Table 4.5. The structure of a Transaction Output

Field Size | Description | Data type | Comments

1+ | pk_script length | varint | Length of the pk_script.
? | pk_script | uchar[] | Public key script that must be matched to be able to claim this output.
1 | permissions | permissions | Bitfield for permissions granted to this recipient.
1+ | number of enc_keys | varint | The number of ensuing encrypted encryption keys.
? | enc_keys | enc_key[] | The encrypted encryption keys used to encrypt the slices and slice descriptions for a file.

enc_key This message is contained in a txout message and contains an encryption key used for encrypting a file slice and file description information. The structure of this message is described in table 4.6.

Table 4.6. The structure of an enc_key message

Field Size | Description | Data type | Comments

1+ | enc_key length | varint | Length of the encrypted encryption key.
? | enc_key | uchar[] | An encryption key used for encrypting slices and slice descriptions, encrypted using a recipient's public key.


Slice Messages

The following messages concern slices and how they are handled by the protocol.

4.4.3 putslice This message stores a slice of a file at the receiving node. Its structure is shown in Table 4.7.

Table 4.7. The structure of a putslice-message

Field Size | Description | Data type | Comments

? | slice | slice | A slice message containing the slice itself.

4.4.4 slice This message contains a slice of an encrypted file. The structure is shown in Table 4.8. The message is either sent as part of a putslice message or in response to a request for a particular slice.

Table 4.8. The structure of a Slice message

Field Size | Description | Data type | Comments

32 | slice_id | sliceid | ID of the slice (the hash of the encrypted slice).
1 | redundancy | varint | Required redundancy for this slice.
1+ | crumb_len | varint | Crumb length.
? | crumbs | crumb[] | Crumb messages constituting this slice.

crumb This message contains a crumb, which is the smallest part that a file slice is divided into. The structure of this message is described in table 4.9. Each crumb is accompanied by an index that points to the position in the Merkle tree where the hash of this crumb is located. Depending on the index it is possible to deduce whether the data belongs to an actual crumb or to a hash that is part of the Merkle path for some crumb. Indices less than n indicate a node inside the Merkle tree and indices greater than or equal to n indicate an actual file crumb.

Table 4.9. The structure of a crumb Message

Field Size | Description | Data type | Comments

4 | version | uint32_t | Protocol version that was used to create this message.
32 | slice_parent | char[32] | The slice ID of the slice that this crumb belongs to.
4 | crumb_index | uint32_t | Index of the crumb in the slice Merkle tree. Also represents the position of the crumb within the slice, used when locally reconstructing a file on a client.
4 | crumb_len | uint32_t | Crumb length.
? | crumb_data | char[] | The crumb itself.


4.4.5 slicesuperv

This is a slice supervision message exchanged by storage nodes that share in monitoring a particular slice on the network. This message is described in table 4.10. The purpose of this message is to inform slice holders of other nodes in the network that also store a particular slice, to facilitate the slice auditing process.

Table 4.10. The structure of a slicesuperv message.

Field Size | Description | Data type | Comments

32 | slice_id | slice_id | ID of the slice that is to be supervised.
4 | redundancy | uint32_t | Required redundancy for this slice throughout the network.
? | storage locations | netaddr | Current locations for the slice. Basically a network address.

4.4.6 verifyslice

This message is used for requesting the status of a particular slice on a storage node. The response data will be a slicestat message. Apart from verifying the status of a slice the requesting node will also ensure that there is enough redundancy in the network so that the slice is not lost. If there is not enough redundancy, the node will duplicate the slice to other nodes. Verifying the slice is achieved by comparing the recalculated Merkle Root based on the response data, with what is known either from a transaction or the fact that the node already owns the slice in question. Table 4.11 shows the structure of this message.

Table 4.11. Structure of a verifyslice message

Field Size | Description | Data type | Comments

32 | slice_id | char[32] | Hash identifying the slice in question.
4 | challenge count | uint32_t | Number of challenges that follow.
8+ | challenges | challenge[] | The challenges themselves.

challenge A challenge message consists of one crumb index and points to a bottom node in the slice Merkle tree. The bottom nodes are file crumbs that are hashed together to build the rest of the Merkle tree. The peer that receives this challenge responds with the file crumb that the index points to, as well as its neighbour and the Merkle path necessary to recalculate the Merkle root for the slice.

4.4.7 slicestat This message is sent as a response to either a verifyslice or putslice message. It contains two or more actual crumbs as well as the corresponding Merkle Paths from the slice Merkle tree to allow for the requesting node to recalculate the Merkle root and verify that the slice is still valid. The structure of a slicestat message is shown in Table 4.12

Table 4.12. The structure of a slicestat Message

Field Size | Description | Data type | Comments

4 | crumb_count | uint32_t | Number of crumbs, including possible hashes inside the slice Merkle tree.
? | crumbs | crumb[] | Contains the crumbs or Merkle tree hashes.

4.5 Implementation

The system is structured into several subsystems performing specific tasks according to the design principle of high cohesion. Each subsystem contains its own set of worker threads and a workmanager controlling those threads. The workmanagers are, in turn, controlled by a global system manager that is able to control how load is distributed and managed in the system as a whole. Most subsystems interact by delegating asynchronous tasks to the worker threads of each other's workmanagers, which means that the entire implementation functions as an asynchronous message-passing system. Each subsystem is responsible for verifying and validating the data it handles. This means, for instance, that the message-handler verifies transaction message syntax whereas the blockchain system validates transaction data. The idea behind the subsystems is that it should be possible to combine them into different types of nodes. For instance, full nodes would include everything, client nodes would include only the subsystems necessary for SPV, and miners would need another subset of functionality. The implementation itself borrows some of its functionality and concepts, such as hashing, Merkle tree construction and Bloom filters, from [6] to avoid reinventing the wheel to some extent.


The following subsections each describe one of the main subsystems that make up the entire system.

4.5.1 Message-handling This subsystem handles incoming messages and delegates them to appropriate subsystems after verifying their syntactic correctness and adherence to the rules of the protocol. This subsystem is also responsible for creating messages that should be broadcast to the rest of the network or sent in response to individual peers. This component of the system makes intensive use of the protocol buffers described in section 3.2.4, which are used to translate the SecuRES protocol into code.

4.5.2 Blockchain This part of the system handles transactions and blocks, thereby maintaining a local copy of the SecuRES blockchain. All incoming blocks and transactions are validated by this subsystem. For instance, the type of transaction is checked to determine whether or not double spending is allowed, and the required permissions for the transaction are checked against the transaction inputs. This subsystem knows how to create the public-key scripts and signature scripts that are ultimately used to prove ownership of transaction outputs.

4.5.3 System There are overall system monitoring processes that manage and distribute load amongst subsystems. Furthermore an expectation-handler keeps track of what information the system is ex- pecting from other nodes, triggering actions when that information finally arrives. This means that if a client node is expecting a number of slices for a file, that file is rebuilt locally when those slices arrive.

4.5.4 Client The client subsystem is used by users to initiate new transactions, upload files and to present various information about the network to the user. It is capable of encrypting and decrypting data. This is not implemented anywhere else, since the system was built for end-to-end confidentiality between senders and recipients. Consequently, clients can decrypt slice descriptions included in transactions as well as collections of crumbs (slices) that are requested from the network by extracting encryption keys from relevant transactions. Using slices and their crumbs the client is also capable of reconstructing files to create local copies. This means that the reverse is also possible, the client can encrypt files and then slice them and crumble those slices to prepare them for distribution in the network. Confidentiality is achieved by combining asymmetric encryption using RSA algorithms and symmetric encryption, using the Blowfish algorithm, in a process similar to that of digital enveloping, described in section 2.6.6. Symmetric encryption is used to encrypt files before they are sliced and crumbled. The key used in this process is also used to encrypt confidential file metadata in the transactions

66 4.5. IMPLEMENTATION themselves. Once that has been performed the client encrypts the secret key using all the recipients’ public RSA keys. The client subsystem is the only system that is capable of creating complete transactions by using public and private keys stored in a local Java keystore. By requesting transactions pertaining to both a user’s public keys and file IDs of interest, it is possible for a client to maintain updated versions of files as well as their history. Currently the implementation makes use only of RSA keys, since the original plan was to extend the SecuRES system with the capability for users to identify parties owning public keys to enable functionality similar to that of Dropbox when sharing files with other users. The integration with such a certificate management system has not been implemented.
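The enveloping step can be sketched with the standard JCE providers. The exact cipher transformation strings below are assumptions for illustration, not taken from the thesis; a production implementation would prefer an authenticated block mode with an IV over ECB.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.SecretKeySpec;

// Sketch of digital enveloping as described above: the file is encrypted
// with a symmetric Blowfish key, and that key is then encrypted with a
// recipient's public RSA key. Transformation strings are illustrative.
public class Envelope {
    // Returns { ciphertext, RSA-encrypted Blowfish key }.
    public static byte[][] seal(byte[] plaintext, java.security.PublicKey recipient)
            throws Exception {
        SecretKey fileKey = KeyGenerator.getInstance("Blowfish").generateKey();
        Cipher bf = Cipher.getInstance("Blowfish/ECB/PKCS5Padding"); // illustrative mode
        bf.init(Cipher.ENCRYPT_MODE, fileKey);
        byte[] ciphertext = bf.doFinal(plaintext);
        Cipher rsa = Cipher.getInstance("RSA/ECB/PKCS1Padding");
        rsa.init(Cipher.ENCRYPT_MODE, recipient);
        byte[] encKey = rsa.doFinal(fileKey.getEncoded());
        return new byte[][] { ciphertext, encKey };
    }

    public static byte[] open(byte[][] envelope, java.security.PrivateKey priv)
            throws Exception {
        Cipher rsa = Cipher.getInstance("RSA/ECB/PKCS1Padding");
        rsa.init(Cipher.DECRYPT_MODE, priv);
        SecretKey fileKey = new SecretKeySpec(rsa.doFinal(envelope[1]), "Blowfish");
        Cipher bf = Cipher.getInstance("Blowfish/ECB/PKCS5Padding");
        bf.init(Cipher.DECRYPT_MODE, fileKey);
        return bf.doFinal(envelope[0]);
    }
}
```

For multiple recipients, the same Blowfish key would simply be RSA-encrypted once per recipient public key, one `enc_key` entry per transaction output.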

4.5.5 Mining The mining functionality of the system supplies the same functionality as Bitcoin miners, only adapted to the SecuRES environment by calling utility functions to validate and hash transactions correctly. This means that this subsystem confirms transactions and aggregates them into new blocks that are appended to the blockchain.

4.5.6 Utility There are many utility and helper classes responsible for hashing, encrypting and performing other tasks that need to be available in several different subsystems. One such utility class performs the functionality of constructing Merkle trees and recalculating Merkle roots based on input data and Merkle paths. This functionality is used both in the blockchain and in the storage subsystems, to verify block membership for transactions and for slice auditing.

4.5.7 Peer The peer subsystem is responsible for managing connections to other peers in the SecuRES network and exposes a Peer-class for other subsystems to communicate directly with that particular peer. Through a Peer-Handler it is possible to broadcast messages to all peers whose connec- tions are currently active. This part of the system is equivalent to the routing functionality present in all Bitcoin nodes.

4.5.8 Storage The storage subsystem handles slices and their crumbs that arrive to the node and schedule audits of slices that should be performed regularly. Slice auditing ensures that the integrity of slices remain intact on storage nodes in the network and that sufficient slice redundancy is maintained.

4.5.9 Integration The integration layer of the application consists of DAOs and entities created from the database using JPA. The entities are translated to DTOs, messages and other data container objects before they leave the integration layer.


All access to the underlying data store passes through this layer, and this layer is not dependent on other layers in the application, adhering to the design principles of high cohesion and low coupling. The data store itself is a relational database, in this case MySQL. Appendix D shows the resulting logical database model for SecuRES.

4.5.10 Not Implemented The system itself is not complete and has not been optimized in terms of performance. All major functionalities are in place, but they need to be integrated with each other and in some cases also further developed. Most of the code has been commented as dictated by the code conventions in section 3.2.3 with additional comments about what remains to be done. Overall, since the scope of the project is fairly large, the authors felt forced to prioritize what should be fully implemented and what had to be postponed to future work due to the short timespan of the project. What needs most work at the moment is the client subsystem and its integration into the rest of the system, since this was the last subsystem to be implemented and as such has not been designed as extensively as the other parts of the system. Nor has the mining and validation functionality been tested and integrated completely.

4.6 Prototype

The prototype does not make use of all the subsystems in the implementation, due to time constraints on the project. The prototype makes use of two types of nodes: full nodes and client nodes. Client nodes currently implement the same functionality as full nodes plus the client aspects. In the future that will not be the case, as the idea behind the client is that it should be lightweight. Not all parts of the implementation have been integrated in the prototype. This means that no mining processes are active on the nodes, nor are transactions validated properly before propagation. Furthermore, there is no active slice auditing, and RSA keys are used as addresses instead of ECC keys. What is working, however, is peer discovery and bootstrapping, which means that nodes in the network connect to a root node that informs them of other active nodes. The nodes then initiate handshaking procedures and the peer-to-peer network comes to life. The client nodes can share files by creating genesis transactions set out to an arbitrary number of recipients based on known public keys. The files being shared are encrypted, sliced up and finally crumbled and distributed to storage nodes in the network. Once a file has been uploaded to the network, the corresponding transaction is propagated through the network, ultimately reaching other clients. Currently all nodes receive all transactions, since the client nodes also act as full nodes. When a client receives a transaction, the user is prompted to request all slices extracted from the encrypted slice descriptions. Upon approval, the client requests slices with those IDs from the network, registering the IDs with the global expectation handler for the system. When slices arrive, the client immediately decrypts the crumbs constituting the slices and reconstructs the file into a local copy.


A user that runs a client node is presented with a basic GUI that enables the user to see all currently known transactions and shared files available locally, manage private and public keys in a key store, and download file slices belonging to a transaction, as well as share new files through genesis transactions.

4.6.1 Demo For demonstration of the prototype there are four running nodes: two client nodes and two full server nodes without the client GUI. The network is simulated on one computer using a different localhost port for each node. First, a root server is brought online, followed by the rest of the nodes, which communicate with this node in order to discover other nodes in the peer-to-peer network. Once the nodes have performed all handshaking procedures, each client is presented with a GUI allowing them to share files with each other. The key store for each client has been pre-populated with a set of public and private keys for use when sharing files. Client 1 shares a file with Client 2 by creating a genesis transaction for some randomly selected file. This file is sliced up and distributed to the network, followed by the corresponding transaction. Once the transaction reaches Client 2, the user at that node selects the transaction and chooses to download the slices belonging to it. In that process the file metadata in the transaction is decrypted on the client. When the slices arrive for that transaction, the decrypted file metadata is used to locally reconstruct the file. When the file has been downloaded, it is compared to the original file, verifying that the file has indeed been shared in its entirety from Client 1 to Client 2, thereby completing the demonstration. As can be seen in appendix A, a demo scenario was also planned for ownership transfer transactions and for further adding to an existing chain of related transactions, but due to time constraints these were not deemed sufficiently implemented to properly demonstrate such functionality.


Chapter 5

Solution Evaluation

Previous chapters have discussed the process and reasoning behind designing a file-sharing solution to fulfil the goals mentioned in section 1.4 as well as implementing a prototype of said solution. This chapter will evaluate that prototype and design in terms of performance, security and threats as well as requirements fulfilment.

5.1 Requirements Fulfilment

This section discusses whether or not the solution achieved the requirements specified in section 4.1. The evaluation will cover both the implementation and the protocol. The enumeration directly corresponds to that of section 4.1.

5.1.1 Functional Requirements Fulfilment

1. The SecuRES protocol allows for multiple recipients by adding more outputs to a transaction, which is what was required.

2. Branching is possible to implement, since the protocol allows for branch identifiers in file descriptions. However, the current implementation of a client node has not taken this into account.

3. In the protocol, permissions are explicitly set for each recipient of a file. However, the verification process for this has not been fully implemented.

4. The protocol allows for different types of file sharing even though the implementation does not fully implement all transaction types yet.

5. It is possible to upload files to the network and a client can request the slices thereof.

6. Slice auditing has been designed into the protocol; however, the current implementation has not activated this subsystem.

7. The protocol is capable of specifying a redundancy setting in the transaction; however, the implementation does not respect this setting in its current state.


5.1.2 Security Requirements Fulfilment

1. Confidentiality is ensured in the SecuRES system. All files are encrypted and distributed across the network without reference to a particular transaction. The encryption is implemented using secret-key and public-key cryptography in such a way that only the recipients are capable of accessing the data. The confidential file information in transactions is encrypted using a form of digital enveloping to ensure end-to-end encryption amongst the recipients of a file.

2. Since all transactions are signed using the creator's private key, everyone else can verify that this person was the originator of the transaction and that the data has not been corrupted in transit.

3. Since all valid transactions are ultimately mined into the blockchain, non-repudiation is guaranteed, as everyone in the network has access to and can validate the entire blockchain.

4. Trust in a third party is not necessary for using the SecuRES system, as it is a decentralized consensus system similar to Bitcoin.
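The digital enveloping described in point 1 can be illustrated with the standard Java cryptography APIs available on the prototype's platform. This is a minimal sketch, not the prototype's actual code: the cipher choices (AES-256 for the file, while the prototype itself uses Blowfish; plain RSA for wrapping the key) and all class and variable names are illustrative assumptions.

```java
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

public class EnvelopeDemo {
    public static void main(String[] args) throws Exception {
        // Recipient's key pair; in SecuRES the public key would come from a transaction output.
        KeyPairGenerator rsaGen = KeyPairGenerator.getInstance("RSA");
        rsaGen.initialize(2048);
        KeyPair recipient = rsaGen.generateKeyPair();

        // 1. Generate a fresh secret key and encrypt the file contents with it.
        KeyGenerator aesGen = KeyGenerator.getInstance("AES");
        aesGen.init(256);
        SecretKey fileKey = aesGen.generateKey();
        Cipher aes = Cipher.getInstance("AES/ECB/PKCS5Padding"); // demo only; use an authenticated mode in practice
        aes.init(Cipher.ENCRYPT_MODE, fileKey);
        byte[] ciphertext = aes.doFinal("confidential file contents".getBytes("UTF-8"));

        // 2. Wrap (encrypt) the secret key with the recipient's public key;
        //    one wrapped copy per recipient would go into the transaction.
        Cipher rsa = Cipher.getInstance("RSA");
        rsa.init(Cipher.WRAP_MODE, recipient.getPublic());
        byte[] wrappedKey = rsa.wrap(fileKey);

        // 3. The recipient unwraps the key with their private key and decrypts the file.
        rsa.init(Cipher.UNWRAP_MODE, recipient.getPrivate());
        SecretKey recovered = (SecretKey) rsa.unwrap(wrappedKey, "AES", Cipher.SECRET_KEY);
        aes.init(Cipher.DECRYPT_MODE, recovered);
        System.out.println(new String(aes.doFinal(ciphertext), "UTF-8"));
    }
}
```

Only holders of a listed private key can unwrap the file key, which is what makes the envelope end-to-end: intermediate nodes see only the wrapped key and the ciphertext.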

5.2 Security

Because of the lifetime of information in the blockchain and the inability to reliably remove resources from the storage network, it is important that the encryption algorithms used are suitably robust and can withstand attack for long enough that the information is useless by the time it is decrypted. As anybody can access encrypted versions of file slices, even without knowing which file they belong to, there is the possibility of attackers amassing many slices in the hope of being able to decrypt them with superior computing power or cracking algorithms in the future. For this reason it is not possible for users to reliably recall shared slices and update the encryption algorithm, which means that the initial encryption method must be adequately future-proof.

The SecuRES prototype currently uses the Blowfish algorithm with a 256-bit key for encrypting files which, having 2^256 possible keys, would take a fictional supercomputer checking a billion billion (10^18) keys a second

2^256 / (2 × 10^18) ≈ 5.8 × 10^58 seconds ≈ 1.8 × 10^51 years

on average to crack using the simplest brute-force method. This is many orders of magnitude longer than the currently estimated 13.82 × 10^9 year age of the universe [25]. This only guarantees adequate protection from brute-force attacks, and more research should be done in order to properly secure the data before production, due to the possibly long lifetime of files in the system. It should be noted that Blowfish was used for convenience and should almost certainly not be used in a proper release, due in part to its inability to efficiently encrypt larger files [19].

With this in mind, the SecuRES solution proposal should be able to fulfil the original design goals put forward. The system is able to handle point A to point B file-sharing operations by making files accessible only to the public keys which have been specified.
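The brute-force estimate above can be reproduced with exact integer arithmetic; the sketch below only restates the calculation in the text (half the 2^256 keyspace searched at 10^18 keys per second) and carries no further assumptions.

```java
import java.math.BigInteger;

public class BruteForceEstimate {
    public static void main(String[] args) {
        // On average half the 2^256 keyspace is searched, i.e. 2^255 trials,
        // at the hypothetical rate of 10^18 keys per second.
        BigInteger trials = BigInteger.ONE.shiftLeft(255);
        BigInteger perSecond = BigInteger.TEN.pow(18);
        BigInteger seconds = trials.divide(perSecond);
        BigInteger years = seconds.divide(BigInteger.valueOf(31_557_600L)); // seconds per Julian year

        // Print the orders of magnitude (number of digits minus one).
        System.out.println(seconds.toString().length() - 1); // 58, i.e. ~5.8 * 10^58 seconds
        System.out.println(years.toString().length() - 1);   // 51, i.e. ~1.8 * 10^51 years
    }
}
```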


It is also able to ensure non-repudiation of transactions through the blockchain in combination with the authenticity granted by the public keys used. As for file data integrity, the decentralized storage network makes it possible to verify that a file has not been deleted or illegally modified, through the use of file audits in the shape of Merkle challenges. By maintaining redundancy for each slice in the network, the impact of individual malicious storage nodes should be minimal. The network itself should be capable of automated recovery when a malicious node tampers with a slice, by uploading the slice to another, most likely honest, node thanks to the information available in the network.

In comparison, with existing file storage solutions there is a risk of third party service providers tampering with the files they provide storage for; the deterrent for these providers is then the monetary loss and possible legal action, depending on local legislation. As SecuRES storage attempts to maintain redundancy for each part of a file on unaffiliated nodes, while keeping these parts disconnected from the public blockchain, the risk taken is diffused. It can be noted that the actual storage of crumbs can be handled on whichever medium is desired, as long as the challenge audits can be passed in time.

This leaves the goal of non-reliance on trusted third parties, which holds true for all operations. One possible addition to the SecuRES system is using public certificates, issued by Certificate Authorities, to verify who owns which public key. This would, however, incur third-party trust, but it is not vital to the system itself; rather, it is a bonus service facilitating the use of SecuRES.
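The file audits mentioned above rely on Merkle challenges. The sketch below shows the general idea for a file split into four slices: a storage node proves possession of one slice by returning the sibling hashes needed to rebuild a known Merkle root. The tree shape, slice contents and class names are illustrative assumptions, not the SecuRES audit format.

```java
import java.security.MessageDigest;

public class MerkleAudit {
    static byte[] sha256(byte[]... parts) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        for (byte[] p : parts) md.update(p);
        return md.digest();
    }

    // Root of a four-leaf tree; leaves are hashes of the stored slices.
    static byte[] root(byte[][] leaves) throws Exception {
        byte[] left = sha256(leaves[0], leaves[1]);
        byte[] right = sha256(leaves[2], leaves[3]);
        return sha256(left, right);
    }

    public static void main(String[] args) throws Exception {
        byte[][] slices = { "s0".getBytes(), "s1".getBytes(), "s2".getBytes(), "s3".getBytes() };
        byte[][] leaves = new byte[4][];
        for (int i = 0; i < 4; i++) leaves[i] = sha256(slices[i]);
        byte[] knownRoot = root(leaves); // the auditor knows only this root

        // Challenge: prove possession of slice 2. The storage node hashes the
        // slice it holds and supplies the Merkle path (sibling and uncle hashes).
        byte[] leaf = sha256(slices[2]);
        byte[] sibling = leaves[3];
        byte[] uncle = sha256(leaves[0], leaves[1]);
        byte[] rebuilt = sha256(uncle, sha256(leaf, sibling));

        // An honest node reproduces the known root; a tampered slice would not.
        System.out.println(MessageDigest.isEqual(rebuilt, knownRoot));
    }
}
```

Because the auditor only needs the root and the short path, the audit does not require transferring the slice itself.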

5.2.1 Potential Security Threats

This section addresses and assesses the severity of some security issues related to the SecuRES design which are deemed either more likely to threaten the proposed system solution or which have potential to do damage to the network. Suggestions to minimize risk are made where possible.

Sybil attacks

It would be possible for malicious parties to set up nodes in the network, pretending to be honest while conducting unsavoury activities. If malicious parties set up and connect many nodes to the SecuRES network, the chances of clients and legitimate servers connecting to compromised ones increase. Connecting to a single malicious server should not pose a big problem, as all information from a node is validated and compared with the rest of the network. However, when a large part of the connected nodes are compromised they can act to isolate honest nodes and control what data they receive. In this way it is possible to incapacitate single nodes by surrounding them. Also, if the majority of the nodes in the network have been compromised, it is possible for them to decide on new consensus rules for validation and thus dictate what data should circulate in the network and how it should be verified. The honest nodes will not have the capacity to override the malicious nodes, and clients will connect to them and download false data. As long as the clients themselves have not been compromised they will most likely invalidate data supplied from dishonest nodes, and what happens instead is that the network is severely incapacitated.

Steps that can be taken in order to minimize the risk of malicious nodes entering the network:

• Each node will store a connection pool of significant size, chosen randomly and with a limited amount of connections per /16 subnet as Bitcoin does [4].

• Nodes can bootstrap using several different seeds in case one is compromised and only responds with malicious addresses.

• It should also be possible to maintain a node reputation service. If many nodes agree that some node is not honest, its reputation should be affected. Thus, if a malicious node enters the network, its reputation should clearly state that there is a risk that it is not trustworthy. However, an attacker can circumvent this by acting as an honest node and then turning once its reputation is high. As soon as that happens the reputation will, on the other hand, plummet. Reputation should only be a help in preventing malicious nodes from gaining a foothold in the network and not act as an ultimate deciding factor.
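The first mitigation above, a per-/16-subnet cap on the connection pool, can be sketched as follows. The `ConnectionPool` class, the cap value and the addresses are hypothetical; the point is only that an attacker controlling many nodes in one subnet cannot fill a node's pool.

```java
import java.util.HashMap;
import java.util.Map;

public class ConnectionPool {
    private final int maxPerSubnet;
    private final Map<String, Integer> perSubnet = new HashMap<>();

    ConnectionPool(int maxPerSubnet) { this.maxPerSubnet = maxPerSubnet; }

    // The /16 group of an IPv4 address is its first two octets.
    private static String subnet16(String ip) {
        String[] o = ip.split("\\.");
        return o[0] + "." + o[1];
    }

    // Accept a peer only while its /16 subnet is below the cap, so one
    // subnet full of Sybil nodes cannot monopolize the connection pool.
    boolean tryConnect(String ip) {
        String key = subnet16(ip);
        int n = perSubnet.getOrDefault(key, 0);
        if (n >= maxPerSubnet) return false;
        perSubnet.put(key, n + 1);
        return true;
    }

    public static void main(String[] args) {
        ConnectionPool pool = new ConnectionPool(2);
        System.out.println(pool.tryConnect("10.0.1.1"));    // accepted
        System.out.println(pool.tryConnect("10.0.2.2"));    // accepted, same /16
        System.out.println(pool.tryConnect("10.0.3.3"));    // rejected, 10.0/16 is full
        System.out.println(pool.tryConnect("192.168.0.1")); // accepted, different /16
    }
}
```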

Denial of Service attacks

Nodes may be rendered unable to handle new connections by malicious parties continuously sending messages which elicit some kind of action, at a rate higher than can be handled. Individual nodes may be kept busy quite easily by this kind of attack. Because of the distributed and decentralized nature of the network, however, it would be difficult to take down the network as a whole, and as no single node is vital to the overall stability of the network, any node which finds itself under attack may simply shut down and bootstrap again from a new address. Many users would also have help from their internet service providers, which may be able to detect and cut off abnormal traffic.
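One conventional way for a node to shed this kind of load is per-peer rate limiting, for instance with a token bucket that allows a short burst and then drops excess messages. This is a generic sketch, not something the prototype implements; the capacity and refill rate shown are arbitrary.

```java
public class TokenBucket {
    private double tokens;
    private final double capacity, refillPerSecond;
    private long lastNanos;

    TokenBucket(double capacity, double refillPerSecond) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.tokens = capacity;
        this.lastNanos = System.nanoTime();
    }

    // Each incoming message costs one token; when the bucket is empty
    // the peer's messages are dropped instead of processed.
    synchronized boolean allow() {
        long now = System.nanoTime();
        tokens = Math.min(capacity, tokens + (now - lastNanos) / 1e9 * refillPerSecond);
        lastNanos = now;
        if (tokens < 1) return false;
        tokens -= 1;
        return true;
    }

    public static void main(String[] args) {
        TokenBucket perPeer = new TokenBucket(3, 1); // burst of 3, then 1 msg/s sustained
        int accepted = 0;
        for (int i = 0; i < 10; i++) if (perPeer.allow()) accepted++;
        System.out.println(accepted); // only the initial burst gets through
    }
}
```

A node would keep one bucket per connected peer, so a flooding peer throttles only itself.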

Packet sniffing

By using packet sniffing software to monitor users' internet traffic, malicious parties may be able to connect transactions and public keys to people. This, however, requires the attacker to already have access to that network.

51% attacks

As the public ledger system is maintained by peers and the common consensus is decided by the network, an attacker attaining a majority share of the computing power of the network would be able to significantly disrupt the system. This would allow them to prevent confirmation of file-transactions and the mining of new blocks in the main chain. They may also be able to transfer ownership of a file multiple times when they should only be able to share it once. This would, however, require a significant investment by the attacker for relatively small personal gain, as any value of having duplicate ownership of a file, such as a housing contract, would likely be invalidated by third parties, since both transactions are plainly visible in the file-chain.


Restoring deleted private keys

Even if files have been deleted from one's hard drive, another user with access to the drive could restore deleted files from memory which has not been overwritten. This means that malicious users may be able to recover private keys from old drives, which is a problem, as permissions for files are mined into the blockchain linked to a corresponding public key, meaning that the user will have access to everything connected to this key. To minimize the risk of this happening, users who are getting rid of a drive may do the following:

• Encrypt and password protect the keys.

• Use shredding software to overwrite the memory in which the keys were stored.

• Physically dispose of the drive if documents are particularly sensitive.
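The second measure, shredding, can be approximated in Java by overwriting a key file in place before deleting it. This is an illustrative sketch only; on SSDs and journaling file systems in-place overwrites are best-effort, which is why physical disposal remains the surest option for sensitive material.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.RandomAccessFile;
import java.security.SecureRandom;

public class ShredKeyFile {
    // Overwrite the file's bytes several times before deleting it, so a
    // casual undelete cannot recover the private key from the old blocks.
    static void shred(File f, int passes) throws Exception {
        SecureRandom rnd = new SecureRandom();
        try (RandomAccessFile raf = new RandomAccessFile(f, "rws")) {
            byte[] junk = new byte[(int) raf.length()];
            for (int i = 0; i < passes; i++) {
                rnd.nextBytes(junk);
                raf.seek(0);
                raf.write(junk);
            }
        }
        if (!f.delete()) throw new IllegalStateException("delete failed");
    }

    public static void main(String[] args) throws Exception {
        File key = File.createTempFile("privkey", ".pem"); // stand-in for a real key file
        try (FileWriter w = new FileWriter(key)) { w.write("-----BEGIN RSA PRIVATE KEY-----"); }
        shred(key, 3);
        System.out.println(key.exists()); // false: the file is gone and its bytes overwritten
    }
}
```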

Sharing to malicious party

There is a possibility that a user on the network accidentally shares a file with a malicious party, just like someone could pay bitcoins to the wrong address. It is possible for the malicious party to pose as the valid addressee and trick the user into transferring the file to the wrong address by running some form of scam. If this malicious party is given admin rights for the file, he or she has the capacity to block access for all other users and gain sole control of the file. For valuable files there is thus a risk of loss.

Should the user discover the mistake before the file transaction is committed to the blockchain, no harm is done, as it is possible to revoke the transaction. However, if the transaction is verified and mined into the blockchain, there is little to be done unless the user issues a new transaction revoking all access rights for the malicious party before they have time to react.

The only way to make sure this never happens is to verify that the recipient addresses are valid. A possible solution is to ask a third party service, such as a Certificate Authority or IDMS, to verify that the identity behind the public key really is the one you want to share the data with. However, this requires trusting a third party service provider. Another solution is to ask the network whether or not it considers the recipient to be honest. The network could check if the user has had interactions with this address before and whether those turned out fine. The system cannot verify who is behind some public address, but it can answer questions regarding suspicious behaviour: if the malicious party has misbehaved in other situations, that should be reflected in the public ledger.

Targeted attacks on slices

Because storage nodes will respond to anybody who requests information about slice locations, it could be possible for malicious parties to make targeted attacks on all those nodes which claim that they are able to return a certain slice. If those nodes are brought down, it would be nigh impossible for a legitimate user to download a complete copy of the file. However, since there is no publicly available connection between slices and files or users, it would be difficult for attackers to target a file or user specifically, and as such there would not be much to gain from such an attack other than trying to disrupt use of the system.


One possible way of reducing this risk would be for storage nodes to only respond to queries which provide proof of ownership of the slice that is being asked for. This does, however, open one up to replay attacks, and means that proof-of-storage audits can only be performed by file owners.
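A common way to close the replay hole just mentioned is to have the storage node issue a fresh nonce for each query, which the requester must sign together with the slice identifier. This is a sketch of that idea, not part of the SecuRES protocol; the slice identifier and all names are hypothetical.

```java
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.SecureRandom;
import java.security.Signature;

public class SliceQuery {
    public static void main(String[] args) throws Exception {
        KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
        gen.initialize(2048);
        KeyPair owner = gen.generateKeyPair();
        String sliceId = "slice-42"; // hypothetical slice identifier

        // The storage node issues a one-time nonce for this query...
        byte[] nonce = new byte[16];
        new SecureRandom().nextBytes(nonce);

        // ...the owner signs (sliceId || nonce) with the key named in the transaction...
        Signature signer = Signature.getInstance("SHA256withRSA");
        signer.initSign(owner.getPrivate());
        signer.update(sliceId.getBytes("UTF-8"));
        signer.update(nonce);
        byte[] sig = signer.sign();

        // ...and the node verifies before revealing the slice location.
        Signature verifier = Signature.getInstance("SHA256withRSA");
        verifier.initVerify(owner.getPublic());
        verifier.update(sliceId.getBytes("UTF-8"));
        verifier.update(nonce);
        System.out.println(verifier.verify(sig)); // honest query passes

        // A captured signature cannot be replayed: the next query gets a fresh nonce.
        byte[] nonce2 = new byte[16];
        new SecureRandom().nextBytes(nonce2);
        verifier.initVerify(owner.getPublic());
        verifier.update(sliceId.getBytes("UTF-8"));
        verifier.update(nonce2);
        System.out.println(verifier.verify(sig)); // replay fails
    }
}
```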

5.3 Performance and Scalability

As was noted under delimitations, there was no time for proper performance testing and optimization of the prototype, and as such any direct performance evaluation based on it would be non-representative. The design proposed should behave quite similarly to the Bitcoin network, with the added load of handling files, their encryption and redundancy checks. In the current implementation these additional operations would, however, mostly affect the client generating a new genesis transaction and the storage nodes. In terms of scaling, each SecuRES prototype node can choose a desired upper threshold for how many active connections should be handled at a time, rather than handling everybody like a centralized solution would.

The problem of blockchain growth mentioned in section 1.4.4 is shared by most existing public ledger-based systems. Data on the main Bitcoin blockchain as of June 2015, provided by Blockchain.info [8], shows that over the course of the year since June 2014 the chain almost doubled in storage size, from about 18 GB to just over 34.5 GB, which, compared with the average cost of hard drive storage per GB in 2014 [35], translates to about 0.5 USD in increased storage costs per node, adjusted for inflation. This is in excess of the 0.33 USD decrease in cost for the same amount of hard drive storage space from 2013 to 2014 [35], and does not factor in increased network and performance costs.

Comparing the Bitcoin protocol to that of the proposed SecuRES system suggests similarly sized blocks for the same number of transactions, though the latter may be able to handle blockchain growth better due to its natural connection to large amounts of storage space. In this manner the scalability is not unsustainable, but it could require increased incentives for full-node users to take on the added cost of storage.

Chapter 6

Conclusions

From the findings presented in this report it would seem that public ledger technologies can indeed be used to create feasible decentralized resource-sharing solutions. However, there are some concerns compared to substitute solutions, apart from the benefits, which will be discussed in the following sections. This report has shown that by using the SecuRES protocol it is possible to share files using a decentralized consensus mechanism such as a blockchain, distributed across a peer-to-peer network. The claims are further supported by the prototype that has been developed, proving that encrypted point-to-point file sharing over the peer-to-peer SecuRES network is possible on a small scale.

6.1 Evaluation of SecuRES

SecuRES is an as yet not fully tested system with a flexible protocol, which uses symmetric and asymmetric encryption to possibly ensure end-to-end confidentiality and integrity, similar to secure email.

By using a decentralized consensus mechanism in the shape of a blockchain, independently validated by nodes in a peer-to-peer network, it is possible to ensure non-repudiation and build trust. This mitigates the need to trust third parties, such as Dropbox, to supply a service. However, for this to work in the long run, an incentive mechanism needs to be introduced into the SecuRES network, both for mining and for storing slices. Without an incentive, the security and availability of the system fail. Possible solutions to this issue have been proposed by others; for instance, it is possible to adopt a solution similar to that of Storj (section 2.2).

The potential for a platform such as SecuRES goes beyond just sharing files. Through the use of the TRANSFER transaction it is possible to transfer value similarly to Bitcoin, since this type of transaction does not allow double spending.

It should also be possible to use the platform as a collaborative tool by fully implementing branching and a modification history, as in Git, where it is possible to move back and forth between file revisions. This concept should be possible using the SecuRES protocol; however, much remains to be done on the client side to implement it. It should be investigated whether or not tools like Git can be directly integrated into the client to avoid reinventing the wheel.

One drawback of the SecuRES file sharing solution is the fact that whenever a file is shared in the network it is difficult to “un-share” it. A user has no control over what happens to a slice once it reaches the network. As suggested by Storj (section 2.2), there is the possibility of a “soft delete” by removing the incentive for storing a slice, in which case it will probably disappear from the network over time. However, the trace of that file will still be available in transactions pertaining to it. It is also possible for recipients of the file, other than the originator, to provide incentive to storage nodes to continue storing the slice.
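The double-spending protection of the TRANSFER transaction mentioned above amounts to validating nodes refusing a second spend of the same transaction output. A minimal sketch of that consensus rule, with hypothetical class and identifier names:

```java
import java.util.HashSet;
import java.util.Set;

public class TransferValidator {
    // Outputs that have already been consumed by an accepted TRANSFER transaction.
    private final Set<String> spentOutputs = new HashSet<>();

    // A TRANSFER is valid only if the output it spends has not been consumed
    // before; accepting it marks the output as spent, so every honest node
    // rejects any later transfer of the same ownership.
    boolean acceptTransfer(String outputId) {
        if (spentOutputs.contains(outputId)) return false;
        spentOutputs.add(outputId);
        return true;
    }

    public static void main(String[] args) {
        TransferValidator node = new TransferValidator();
        System.out.println(node.acceptTransfer("tx1:0")); // first transfer: accepted
        System.out.println(node.acceptTransfer("tx1:0")); // double spend: rejected
    }
}
```

In the real system the spent-output set is derived from the blockchain itself, so all nodes converge on the same answer.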

Another issue with a file sharing platform using a public ledger is that if a client node wishes to perform a complete verification of the blockchain to ensure its validity, it has to download substantial amounts of data. This means that even if the client is only sharing one small file, it would be necessary to download a lot of data for the verification process, data that is really of no interest to the client. The solution, as has been suggested for Bitcoin, is SPV nodes; however, that means trusting other nodes in the network, which compromises the claim that no trust in a third party is necessary, however small that trust need be.

6.2 Evaluation of Methodology

Though the implementation phase of the project was to follow the iterative and agile Scrum methodology, much of the work ended up being sequential, like that of the waterfall methodology. This was likely caused by insufficiently specified minimum requirements, few delimitations, overly broad modularization and insufficient deadlines, which led to tasks growing larger than what one can possibly claim to finish within the confines of a one-week-long sprint.

Instead of developing the system incrementally, with a finished product by the end of each sprint, it was developed on a broad front. Since a rigorous design of the system had been performed, an unconscious attempt was made to develop the entire system in one go. This was not reflected upon until the very end of the project. It led to tasks being drawn out through many iterations without the possibility to properly gauge progress. Because of this, many of the benefits of the intended project methodology became more of an administrative burden than a help.

The result was that the final implementation can only be considered half finished. Not all the functionality in the system has been tested properly and fully integrated; there was simply no time to test everything. In the beginning of the project there was talk of performance testing the prototype and evaluating it in a live setting. However, since there was no suitably finished prototype until the very end, there was no time for these kinds of representative tests either. If those tests had been performed there would have been more objective data in the results section to analyse, which would have given more credence to the solution. As it stands now, the solution is more of a theoretical proof of concept, even though the prototype is working.
For future projects the most important lesson would be to clearly specify and modularize minimum requirements and handle these first, so as to avoid feature creep and the temptation of trying to perfect parts which may not have the highest priority in early iterations. Focus should also be on developing the product incrementally rather than attempting to implement everything from the beginning.


6.3 Future work

Future work in this area could include investigating the possibility of connecting SecuRES to a digital currency, which could be used to add incentive for miners, implement transaction fees and add automated payment of legitimate storage nodes which pass slice audits. Inspiration should probably be drawn from Storj. Currently the only real incentive for using the SecuRES system is the capabilities and potential of the system itself; there is no financial gain.

In order to push for widespread adoption of the system it would be pertinent to perform proper load and scalability testing to ensure that the system can handle that level of operation. Part of this should also be to explore the possibility of pruning, dividing and otherwise reducing the impact of a growing blockchain. The impact of the file storage solution on the network should also be analysed further to determine the effects on available bandwidth and storage space. One possible aspect to explore is whether it is feasible to implement the concept of branching fully, or if that will make files explode in size on the network as all possible modifications of a file are uploaded. Overall, the possibility of compressing slices and other data handled by the protocol should be investigated; at the moment slices are uploaded to the network without any form of compression.

The SecuRES protocol itself should also be analysed to determine the impact of the modifications compared to Bitcoin. For instance, what will the average size of a SecuRES transaction be compared to Bitcoin's average size of 250 bytes [1, p. 160], and what will the resulting impact be on bandwidth and storage space? It should perhaps also be analysed to what extent files will be available on the network once they have been uploaded, and how feasible it is to have those files remain available over time.

Because of time constraints the implementation was done using familiar tools to speed up development.
If the work is to be developed further, it is recommended to investigate the use of Erlang for parts of the implementation such as message passing and synchronisation of blockchain data across nodes. The reason for this is that Erlang scales nicely in distributed systems and is capable of handling distributed databases very well. It also handles concurrency better than Java, with smaller overhead and the capability of managing thousands of threads [21]. Finally, if one were to continue developing the prototype, the Blowfish algorithm for file encryption should be replaced with an algorithm better suited to the application, like Twofish. It would also be prudent to increase the RSA key sizes used from the current 2048 bits, to lengthen the time that they will remain secure.


Bibliography

[1] Andreas M. Antonopoulos. Mastering Bitcoin. O'Reilly Media, Inc., 2013. isbn: 978-1-4493-7403-7. url: http://chimera.labs.oreilly.com/books/1234000001802/index.html.

[2] Apache Software Foundation. Maven – Welcome to Apache Maven. url: https://maven.apache.org/ (visited on 06/17/2015).

[3] Bitcoin Community. Protocol documentation - Bitcoin. url: https://en.bitcoin.it/wiki/Protocol_specification (visited on 04/09/2015).

[4] Bitcoin Community. Weaknesses - Bitcoin. Bitcoin wiki. Feb. 2015. url: https://en.bitcoin.it/wiki/Weaknesses#Sybil_attack (visited on 04/20/2015).

[5] Bitcoin.it. Secp256k1 - Bitcoin Wiki. url: https://en.bitcoin.it/wiki/Secp256k1 (visited on 07/11/2015).

[6] bitcoinj. bitcoinj. url: https://bitcoinj.github.io/ (visited on 06/23/2015).

[7] Blockchain Luxembourg S.A.R.L. Bitcoin Blockchain Size (Swedish page). url: https://blockchain.info/sv/charts/blocks-size (visited on 06/23/2015).

[8] Blockchain.info. Bitcoin Blockchain Size. url: https://blockchain.info/charts/blocks-size?showDataPoints=true&timespan=1year&daysAverageString=7&scale=0&address= (visited on 06/05/2015).

[9] Lars-Christer Böiers. Diskret matematik. Lund: Studentlitteratur, 2003. isbn: 91-44-03102-5, 978-91-44-03102-6.

[10] Jerry Brito and Andrea Castillo. Bitcoin: A Primer for Policymakers. Mercatus Center at George Mason University, Dec. 19, 2013. 48 pp.

[11] J. Callas et al. RFC 4880 - OpenPGP Message Format. url: http://tools.ietf.org/html/rfc4880 (visited on 04/24/2015).

[12] Scott Chacon and Ben Straub. Git - Basic Branching and Merging. url: http://git-scm.com/book/en/v2/Git-Branching-Basic-Branching-and-Merging (visited on 04/28/2015).

[13] Scott Chacon and Ben Straub. Git - Git Basics. url: http://git-scm.com/book/en/v2/Getting-Started-Git-Basics (visited on 04/24/2015).

[14] Scott Chacon and Ben Straub. Git - Packfiles. url: http://www.git-scm.com/book/en/v2/Git-Internals-Packfiles (visited on 04/27/2015).


[15] Change Vision. UML and Modeling Tools | Astah.net. url: http://astah.net/ (visited on 06/17/2015).

[16] Thomas M. Connolly and Carolyn E. Begg. Database systems: a practical approach to design, implementation, and management. 4th ed. International computer science series. Harlow, Essex, England; New York: Addison-Wesley, 2005. 1374 pp. isbn: 0-321-21025-5.

[17] Dropbox. Dropbox for Business security | A Dropbox whitepaper. url: https://www.dropbox.com/static/business/resources/Security_Whitepaper.pdf (visited on 04/27/2015).

[18] Eclipse Foundation. EclipseLink. url: http://www.eclipse.org/eclipselink/ (visited on 06/17/2015).

[19] GnuPG. GnuPG Frequently Asked Questions. url: https://www.gnupg.org/faq/gnupg-faq.html#define_fish (visited on 06/23/2015).

[20] Google Developers. Developer Guide | Protocol Buffers | Google Developers. url: https://developers.google.com/protocol-buffers/docs/overview (visited on 06/17/2015).

[21] Kristof Kovacs. Integrating Java and Erlang. url: http://www.theserverside.com/news/1363829/Integrating-Java-and-Erlang (visited on 04/10/2015).

[22] Craig Larman. Applying UML and patterns: an introduction to object-oriented analysis and design and iterative development. 3rd ed. Upper Saddle River, N.J: Prentice Hall PTR, 2005. 702 pp. isbn: 0-13-148906-2.

[23] LaTeX – A document preparation system. url: http://www.latex-project.org/ (visited on 06/17/2015).

[24] Jay McGregor. The Top 5 Most Brutal Cyber Attacks Of 2014 So Far. Forbes. url: http://www.forbes.com/sites/jaymcgregor/2014/07/28/the-top-5-most-brutal-cyber-attacks-of-2014-so-far/ (visited on 07/20/2015).

[25] NASA. WMAP - Age of the Universe. url: http://map.gsfc.nasa.gov/universe/uni_age.html (visited on 06/23/2015).

[26] OpenPGP Alliance. OpenPGP.org - The OpenPGP Alliance Home Page. url: http://www.openpgp.org/ (visited on 04/28/2015).

[27] Oracle. How to Write Doc Comments for the Javadoc Tool. url: http://www.oracle.com/technetwork/articles/java/index-137868.html (visited on 06/16/2015).

[28] Oracle. Java Persistence API. url: http://www.oracle.com/technetwork/java/javaee/tech/persistence-jsp-140049.html (visited on 06/17/2015).

[29] Oracle. MySQL :: MySQL Workbench. url: https://www.mysql.com/products/workbench/ (visited on 06/17/2015).

[30] Oracle. MySQL :: The world's most popular open source database. url: https://www.mysql.com/ (visited on 06/17/2015).


[31] Oracle. Welcome to NetBeans. url: https://netbeans.org/ (visited on 06/17/2015).

[32] Redmine. Overview - Redmine. url: http://www.redmine.org/ (visited on 06/17/2015).

[33] Matthew Sparkes. “The coming digital anarchy”. In: The Telegraph (June 2014). url: http://www.telegraph.co.uk/technology/news/10881213/The-coming-digital-anarchy.html (visited on 02/23/2015).

[34] William Stallings and Lawrie Brown. Computer security: principles and practice. 2nd ed. Boston: Pearson, 2012. 788 pp. isbn: 978-0-13-277506-9, 0-13-277506-9.

[35] Statistic Brain. Average Cost of Hard Drive Storage. Nov. 11, 2014. url: http://www.statisticbrain.com/average-cost-of-hard-drive-storage/ (visited on 06/05/2015).

[36] Storj.io. DriveShare - Powered by Storj. url: http://driveshare.org/ (visited on 04/22/2015).

[37] Storj.io. MetaDisk - Powered by Storj. url: http://metadisk.org/ (visited on 04/22/2015).

[38] Storj.io. Storj - The Future of Cloud Storage. url: http://storj.io/ (visited on 04/15/2015).

[39] Dena Bain Taylor. A Brief Guide To Writing A Literature Review. In: Writing in the Health Sciences: a comprehensive guide. url: http://hswriting.library.utoronto.ca/index.php/hswriting/article/view/3092/1239 (visited on 06/18/2015).

[40] Shawn Wilkinson. Storj: A Peer-to-Peer Cloud Storage Network. In collab. with Tome Boshevski, Josh Brandoff and Vitalik Buterin. url: http://storj.io/storj.pdf (visited on 04/17/2015).


Appendix A

Planned demo scenarios

Scenario 1. Collaborative sharing

1.1 Sending new file

Preconditions:

• Two or more SecuRES servers running, bootstrapped with knowledge of each other.

• PC running the SecuRES client.

• Dummy file exists on PC #1.

• Two RSA public keys.

Actions:

1. User #1 accesses the SecuRES client.
2. User #1 uploads the file and chooses the following transaction properties:
   • Sender: User #1's private key signature (can be auto-generated by the client) and public key
     – Permissions: Full sharing (double spending allowed)
   • Recipient #1: User #2's public key
     – Permissions: Full sharing (double spending allowed)
   • Recipient #2: User #3's public key
     – Permissions: No sharing
3. Transaction properties are checked and validated.
4. The file is encrypted using an auto-generated secret key.
5. The file is uploaded to an open storage location.
6. For each recipient and the sender, the secret key and storage URI are encrypted using their public key and added to the transaction details.
7. The client connects to server #1 and sends the transaction details.
   1. Server initiates a genesis transaction from the provided details.

2. Transaction details are checked and validated.


   3. Server creates a file-chain from the transaction.
   4. The transaction is added to the pool of transactions being mined.
   5. Server broadcasts the transaction to known peers.
8. The client is presented with the private key if one was auto-generated.
9. A view on the client tracks the confirmation state of the transaction.

Postconditions: (when transaction has been mined into at least one block)
Users #1, #2 and #3 can all access the resource through any client-server combination. Users #1 and #2 can share the file-chain as many times as they want. User #3 cannot share the resource further in the same file-chain.

1.2 Receiving file

Preconditions:
• Two or more SecuRES servers running, bootstrapped with knowledge of at least one server from scenario 1.1.
• PC running SecuRES client.
• RSA key-pair of any user from scenario 1.1.

Actions:
1. User accesses SecuRES client and provides public key.
2. Client connects to server and inquires about transactions pertaining to the provided public key:
   1. Server does a linear search through the mainchain and retrieves metadata for transactions to which the public key is recipient.
3. Client requests access to file-chain pointed to by transaction from server to get encrypted storage location URI and encrypted secret key for file.
4. Secret key and URI are decrypted using private key.
5. File is retrieved from URI.
6. Retrieved file is decrypted using secret key.

Postconditions: Same as at start.

1.3 Share existing file-chain

Preconditions:
• Two or more SecuRES servers running, bootstrapped with knowledge of each other.
• PC running SecuRES client.
• User #1’s RSA key-pair.
• User #2’s RSA public key.

Actions:
1. User #1 accesses SecuRES client.
2. User #1 chooses the following transaction properties:
   • File-chain: Identifier of file-chain to be shared
   • Sender: User #1’s private key signature
   • Recipient #1: User #2’s public key
     – Permissions: Full sharing
3. Transaction properties are checked and validated.
4. Client supplies user #1’s public key to server and gets encrypted storage location URI and encrypted secret key for file.
5. URI and secret key are decrypted using private key.
6. For each recipient the secret key and storage URI are encrypted using their public key and added to transaction details.
7. Client connects to server #1 and sends transaction details:
   1. Server initiates transaction from provided details.
   2. Transaction details are checked and validated.
   3. Transaction is added to pool of transactions being mined.
   4. Server broadcasts transaction to known peers.
8. A view on the client tracks the confirmation state of the transaction.

Postconditions: (when transaction has been mined into at least one block)
Users #1 and #2 can access and share the resource through any client-server combination.

Scenario 2. No double-spending ownership transmission

2.1 Sending new file

Preconditions:
• Two or more SecuRES servers running, bootstrapped with knowledge of each other.
• PC running SecuRES client.
• Dummy file exists on PC #1.
• User #2’s RSA public key.

Actions:
1. User #1 accesses SecuRES client.
2. User #1 uploads the file and chooses the following transaction properties:
   • Sender: User #1’s private key signature (can be auto-generated by client) and public key
   • Recipient #1: User #2’s public key
     – Permissions: Single share (no double spending allowed)
     – Encryption: None
3. Transaction properties are checked and validated.
4. File is encrypted using a generated secret key and uploaded to open storage location.
5. Storage URI and secret key encrypted with public key are added to transaction details.
6. Client connects to server #1 and sends transaction details:
   1. Server initiates transaction from provided details.
   2. Transaction details are checked and validated.
   3. Transaction is added to pool of transactions being mined.
   4. Server broadcasts transaction to known peers.
7. Client is presented with private key if one was auto-generated.
8. A view on the client tracks the confirmation state of the transaction.

Postconditions: (when transaction has been mined into at least one block)
Any user can access the resource through any client-server combination. Everybody can see that user #2 was given ownership of the resource by user #1.


2.2 Receiving file

Preconditions:
• Two or more SecuRES servers running, bootstrapped with knowledge of at least one server from scenario 1.1.
• PC running SecuRES client.
• RSA key-pair of any user from scenario 1.1.

Actions:
1. User accesses SecuRES client and provides public key.
2. Client connects to server and inquires about transactions pertaining to the provided public key:
   1. Server does a linear search through the mainchain and retrieves metadata for transactions to which the public key is recipient.
3. Client requests access to file-chain pointed to by transaction from server to get encrypted storage location URI and encrypted secret key for file.
4. Secret key and URI are decrypted using private key.
5. File is retrieved from URI.
6. Retrieved file is decrypted using secret key.

Postconditions: Same as at start.

2.3 Share existing file-chain

Preconditions:
• Two or more SecuRES servers running, bootstrapped with knowledge of each other.
• PC running SecuRES client.
• User #1’s RSA key-pair.
• User #2’s RSA public key.

Actions:
1. User #1 accesses SecuRES client.
2. User #1 chooses the following transaction properties:
   • File-chain: Identifier of file-chain to be shared
   • Sender: User #1’s private key signature
   • Recipient #1: User #2’s public key
3. Transaction properties are checked and validated.
4. Client connects to server #1 and sends transaction details:
   1. Server initiates transaction from provided details.
   2. Transaction details are checked and validated.
   3. Transaction is added to pool of transactions being mined.
   4. Server broadcasts transaction to known peers.
5. Client is presented with private key if one was auto-generated.
6. A view on the client tracks the confirmation state of the transaction.

Postconditions: (when transaction has been mined into at least one block)
Any user can access the resource through any client-server combination. Everybody can see that user #2 was given ownership of the resource by user #1.


A.1 Bitcoin

A.1.1 Fundamental Concepts

address: A bitcoin address is what transactions are sent to. It can be represented as a string of letters and is the equivalent of an account number. It is the hash of a public key.

bitcoin: Name of the currency unit, network and software.

block: Consists of a header and a body. The header contains a timestamp of creation and a fingerprint identifying the preceding block in the blockchain. The body is a grouping of transactions that were verified during the mining of this block. The hash of the header is a proof that the mining work has actually been performed. Valid blocks are appended to the main blockchain by general network consensus.

blockchain: List of blocks that each link back to their predecessor, all the way back to the static genesis block.

confirmation: Once a transaction has been included in a block it has one confirmation. When other blocks are mined to continue this chain, the transaction gains one extra confirmation for each successor block.

blockheight: The number of blocks from the genesis block to a particular block.

blockdepth: The number of blocks from the latest block in the main blockchain to a particular block.

difficulty: Network-wide setting for how much work is necessary to produce a proof of work. This is regulated by the difficulty target, which specifies what value a new block header needs to hash below.

hash: Digital fingerprint of some binary input.

genesis block: The first block in the blockchain.

miner: A node in the network that works to find proofs of work for new blocks through a process of repeated hashing.

network: The peer-to-peer network that propagates all transactions and blocks to the peers.

Proof-of-Work: Data that requires significant computation to find. For bitcoin this means finding a numeric solution, via the SHA-256 algorithm, that meets a network-wide difficulty target.

public key: The public key of a keypair. Such a key is hashed to provide a bitcoin address.

private key: The private key of a keypair. Such a key is used to sign transactions to prove that the transaction originated from the owner of this key. This key is to be kept secret at all costs; otherwise others can claim ownership of funds belonging to this key.

transaction: A transfer of bitcoins from one address to another. Transactions are collected by miners and included into blocks to finally be logged in the blockchain.


Appendix B

Bitcoin Appendices

B.1 Transaction Verification

This is taken directly from [1, p. 178]:

1. The transaction’s syntax and data structure must be correct.
2. Neither lists of inputs or outputs are empty.
3. The transaction size in bytes is less than MAX_BLOCK_SIZE.
4. Each output value, as well as the total, must be within the allowed range of values (less than 21m coins, more than 0).
5. None of the inputs have hash=0, N = −1 (coinbase transactions should not be relayed).
6. nLockTime is less than or equal to INT_MAX.
7. The transaction size in bytes is greater than or equal to 100.
8. The number of signature operations contained in the transaction is less than the signature operation limit.
9. The unlocking script (scriptSig) can only push numbers on the stack, and the locking script (scriptPubkey) must match isStandard forms (this rejects non-standard transactions).
10. A matching transaction in the pool, or in a block in the main branch, must exist.


11. For each input, if the referenced output exists in any other transaction in the pool, the transaction must be rejected.

12. For each input, look in the main branch and the transaction pool to find the referenced output transaction. If the output transaction is missing for any input, this will be an orphan transaction. Add to the orphan transactions pool, if a matching transaction is not already in the pool.

13. For each input, if the referenced output transaction is a coinbase output, it must have at least COINBASE_MATURITY (100) confirmations.

14. For each input, the referenced output must exist and cannot already be spent.

15. Using the referenced output transactions to get input values, check that each input value, as well as the sum, are in the allowed range of values (less than 21 million coins, more than 0).

16. Reject if the sum of input values is less than sum of output values.

17. Reject if transaction fee would be too low to get into an empty block.

18. The unlocking scripts for each input must validate against the corresponding output locking scripts.
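As an illustration, the size and value rules above (2, 3, 4 and 7) can be sketched in Python. The dictionary layout and the constant values (Bitcoin's 1 MB block size limit, amounts expressed in satoshi) are assumptions made for the example, not part of the listed rules:

```python
MAX_BLOCK_SIZE = 1_000_000            # assumed limit, matching Bitcoin's
MAX_COINS = 21_000_000 * 100_000_000  # 21m coins expressed in satoshi

def basic_checks(tx: dict, raw_size: int) -> bool:
    """Sketch of rules 2-4 and 7; `tx` is a hypothetical dict with
    'inputs' and 'outputs' (output values in satoshi)."""
    if not tx['inputs'] or not tx['outputs']:
        return False                      # rule 2: no empty lists
    if not (100 <= raw_size < MAX_BLOCK_SIZE):
        return False                      # rules 3 and 7: size bounds
    total = 0
    for value in tx['outputs']:
        if not (0 < value < MAX_COINS):
            return False                  # rule 4: each output in range
        total += value
    return total < MAX_COINS              # rule 4: the sum in range
```

The remaining rules (script validation, coinbase maturity, double-spend checks) require chain state and are not sketched here.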

Appendix C

SecuRES Protocol

This section presents the proposed peer-to-peer communication protocol for the SecuRES resource sharing system. Due to the similarities between the systems, this protocol follows the one used by Bitcoin [3] wherever it was deemed suitable. This was done to make it more easily understandable for people with prior experience of public ledger technologies, and to keep the block-mining process as similar as possible so that crypto-currency networks could potentially be connected to SecuRES for purposes such as transaction fees or mining rewards. The section first defines the standards used in the protocol, followed by the common data structures and finally the messages that constitute the protocol.

C.1 Standards and Concepts

Hashes Hashes are usually computed twice using SHA-256, so called double hashes.

Merkle Trees Binary trees of hashes. The nodes in the tree are computed using double SHA-256 hashes.
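A minimal Python sketch of these two primitives follows. The rule for levels with an odd number of nodes (duplicating the last node) is an assumption borrowed from Bitcoin, since the appendix does not specify it:

```python
import hashlib

def double_sha256(data: bytes) -> bytes:
    """Double SHA-256, the hash used throughout the protocol."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def merkle_root(leaves) -> bytes:
    """Compute a Merkle root over raw leaves using double SHA-256.
    An odd node count at any level duplicates the last node (assumed)."""
    level = [double_sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = [double_sha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]
```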

Signatures Elliptic Curve Digital Signature Algorithm (ECDSA) is used to sign transactions. More specifically the secp256k1 curve is used.

Transaction Verification This involves integrity checking as well as permissions checking to ensure that this transac- tion doesn’t violate the permissions given to the user in the previous transaction.

File Slice When a file is uploaded to the network it is automatically split into smaller parts, slices, that are distributed across the network in an encrypted form.


File Crumb

When a client uploads a file to the network and splits the file into slices, each slice is further divided into crumbs which are encrypted using a secret key known only to the recipients of the file. These crumbs are used to build a Merkle Tree for each slice used when verifying the health of these slices across the network.

Address

Addresses will be ECDSA public keys. This differs from Bitcoin where the hash of these keys are used instead.

Node Mempool

Contains transactions that have not yet been mined into the main chain; a node’s mempool contents can be requested with the unconftxs message.

Relay set

Data that should be relayed to other nodes.

Permissions

Permissions can be set on a per-file and per-user basis. Once a permission has been set for a particular file and user it cannot be revoked for the current state of the file. Whenever a file is modified it is possible to revoke permissions by encrypting the modified slices and distributing those to only the relevant recipients, using the splitting mechanism described further down. Permissions are verified by checking the inputs of the transaction and asserting that those transactions grant the current user permission to achieve what is attempted.

Branching

Each transaction will contain a timestamp and a branchid to which the transaction should belong. Branching is handled completely on the client side when one wishes to visualize the history of the file and its transactions in the network. If the transaction contains slice descriptions, this means that the file has been modified and the client should consider this a modification to this particular branch. Branch ids are never validated by the network; as mentioned, they only act as a help for the clients. Branch ids are arbitrary and set by the clients whenever they wish for modifications to belong to a certain branch. If the client issues transactions to a new branch, slice descriptions of all slices that describe the state of the file that is branched upon should be present. Compare this to the checkout command in Git.
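As a sketch of this client-side behaviour, a file-chain's transactions could be grouped into branches as below. The field names mirror the structures later in this appendix, but the dictionary representation is an assumption for the example:

```python
from collections import defaultdict

def group_by_branch(transactions):
    """Client-side view: group a file-chain's transactions by branch_id
    and order each branch by timestamp. The network never validates
    branch ids, so this is purely a visualization aid."""
    branches = defaultdict(list)
    for tx in transactions:
        branches[tx['branch_id']].append(tx)
    for txs in branches.values():
        txs.sort(key=lambda tx: tx['timestamp'])
    return dict(branches)
```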


Splitting

This is applied whenever permissions should be revoked. It entails encrypting the slices of the file that should not be visible to certain users with a new encryption key, and then encrypting that key with the public keys of those users that should indeed have access. The split transaction needs to contain slice descriptions of all slices that make up the version of the file that is split upon. This means that slices from previous transactions can be reused by pointing to the same id. However, if the slices and their hashes have changed they will need to be encrypted using the new key. When splitting, a new file id is generated by hashing the splitting transaction.

Modifying a file

This will result in a transaction that does not have to have any outputs. The modification of the file will be reflected in the encrypted slice description part of the file description. The fact that this part of the transaction is encrypted means that only parties sharing the file are able to see this information. Before the transaction is created the file is sliced up, each slice is crumbified, and each crumb is encrypted. These crumbs are then uploaded to the network, identified by a slice id (the hash of the slice). This means that there is no connection between a transaction and its file other than in the encrypted part of the transaction, which is not public knowledge. This helps prevent targeted attacks on files.
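The slicing step can be sketched as follows, using the double-hash convention from Section C.1 for slice ids. The concrete slice and crumb sizes are illustrative parameters, and the per-crumb encryption is omitted for brevity:

```python
import hashlib

def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def crumbify(data: bytes, slice_len: int, crumb_len: int):
    """Split a file into slices and each slice into crumbs,
    returning a list of (slice_id, crumbs) pairs. The slice id is
    the hash of the (here unencrypted) slice."""
    out = []
    for i in range(0, len(data), slice_len):
        sl = data[i:i + slice_len]
        slice_id = double_sha256(sl)
        crumbs = [sl[j:j + crumb_len] for j in range(0, len(sl), crumb_len)]
        out.append((slice_id, crumbs))
    return out
```

In the real protocol each crumb would be encrypted with the file's secret key before upload, and the crumbs of each slice would feed the slice's Merkle tree.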

File management in the network

A file is uploaded to the network, without reference to the corresponding transaction, in the form of slices. Depending on the redundancy setting in the transaction, a client uploads each slice to that number of storage nodes in the network. Each storage node then agrees to monitor the network to ensure that the redundancy is maintained, by issuing regular slice verification requests to other nodes that maintain a particular slice. This supervision is also performed by the clients themselves.

Possible transactions

For each transaction the inputs provide proof of the permissions that the current transaction requires.

Combined: Any combination of the types below.

Modification: This type of transaction will only contain encrypted slice descriptions for those slices of the file that have been modified. Thus no recipients are specified.

Permissions: Transactions like this aim to grant permissions to other already existing recipients. The file description part of the transaction will not contain any file data. The user granting permissions can never grant permissions to him- or herself, and can also not grant permissions that they do not possess. It is not possible to revoke permissions using this type of transaction.

Split: If permissions should be revoked then this type of transaction is required. The split takes a snapshot of the current file state that should belong to this split and encrypts modified slices with a new encryption key that will be made available only to the recipients of this transaction. This way, slices that are not modified retain their ids and can be fetched and decrypted using previous encryption keys, together with the new slices decrypted using new encryption keys. Optionally, all slices can be re-encrypted using the new encryption key. This type of transaction will create a new file id for future reference.

Join: Collects slices encrypted using different encryption keys, re-encrypts them using one common key, and assigns a previous file id (matching the key chosen). For instance, if one wishes to revert back to a previous permission state before a split, this is what should be used.

Transfer: This type of transaction is used to prevent double spending. This is set in the genesis transaction and entails transferring ownership to the recipient in each transaction. One input and one output.

Sharing: Whenever a file is shared with a new party, the encryption keys for the slices of the file are encrypted using the recipient’s public key. The permissions for the user are also set. Of course, multiple new recipients can receive the file in the same transaction.

C.2 Common Data Structures

All integers will be encoded in little endian except for IP and port number which will be encoded in big endian.

Message Structure

Field size   Description   Data type   Comment

Header:
  4    version    uint32_t   Protocol version
  4    network    uint32_t   Value that indicates which network this packet originated from
  12   command    char[12]   String identifying the packet content, NULL padded
  4    length     uint32_t   Length of payload in number of bytes
  4    checksum   uint32_t   First 4 bytes of the double hash of the payload
Data:
  ?    payload    uchar[]    Data

Table C.1: Message structure
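A sketch of serializing this header in Python, applying the little-endian rule stated above and the double-hash checksum. The example command name and field values are arbitrary:

```python
import hashlib
import struct

def build_message(version: int, network: int, command: str,
                  payload: bytes) -> bytes:
    """Serialize a message per Table C.1: four little-endian integers,
    a NULL-padded 12-byte command, then the payload. The checksum is
    the first 4 bytes of the double SHA-256 hash of the payload."""
    checksum = hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]
    # '12s' null-pads the command string to 12 bytes.
    header = struct.pack('<II12sI4s', version, network,
                         command.encode('ascii'), len(payload), checksum)
    return header + payload
```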

network value

Network   Magic value   Sent as
secures   0x000000      000000

Table C.2: Network values

command

Value          Description
addrs          Contains known nodes
alert          General notification
block          Block
filteradd      Add data to Bloom filter at node
filterclear    Clears Bloom filter
filterload     Sends Bloom filter to node
getaddrs       Request known nodes
getblocklist   Request block hashes
getdata        Request data
getheaders     Request headers
getstatus      Request node status
headers        Response with headers
inv            Inventory message
notfound       Requested data not found
ping           Check whether node is alive
pong           Response that node is alive
putslice       Puts a slice on a node
reject         Reject message
slice          Contains a slice
slicestat      Status of a slice
status         Response message to getstatus
singletx       Transaction
slicesuperv    Request to node to supervise a slice in the network
txs            Array of transactions
unconftxs      Request transactions that have not been confirmed
verack         Acknowledge version
version        Implementation version
verifyslice    Request to verify a slice

Table C.3: Command values

inventory vectors

Used to notify other nodes about transactions and blocks that the current node knows about. Contains object type and the hash of the object.


Field size   Description   Data type   Comments
  4          type          uint32_t    Object type for this inventory
  32         hash          char[32]    Hash of the object

Table C.4: Inventory vector

object types

Value   Name                 Description
  0     ERROR                Any data with this number may be ignored
  1     MSG_TX               Hash of a main transaction
  2     MSG_BLOCK            Hash of block header
  3     MSG_FILTERED_BLOCK   Hash of main block header. Identical to MSG_MBLOCK. When used in a getdata message it indicates that the reply should be a filtblock message. This only works if a Bloom filter has been set on the responding node.
  4     MSG_SLICE            Hash of a slice

Table C.5: Object types

variable length integer

Used to save space. Precedes every array of data whose length may vary. The integer itself gives the length of the data.

Value           Storage length   Format
< 0xfd          1                uint8_t
<= 0xffff       3                0xfd followed by the length as uint16_t
<= 0xffffffff   5                0xfe followed by the length as uint32_t
-               9                0xff followed by the length as uint64_t

Table C.6: Variable length integer

variable length string

Field size   Description   Data type   Comments
  ?          length        var_int     Length of the string
  ?          string        char[]      The string itself (can be empty)

Table C.7: Variable length string
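The variable length integer encoding of Table C.6, with the little-endian integer rule from the start of this section, can be sketched as:

```python
import struct

def encode_varint(n: int) -> bytes:
    """Encode per Table C.6: one byte below 0xfd, otherwise a marker
    byte followed by a little-endian integer of the matching width."""
    if n < 0xfd:
        return struct.pack('<B', n)
    if n <= 0xffff:
        return b'\xfd' + struct.pack('<H', n)
    if n <= 0xffffffff:
        return b'\xfe' + struct.pack('<I', n)
    return b'\xff' + struct.pack('<Q', n)

def decode_varint(buf: bytes):
    """Return (value, number of bytes consumed)."""
    first = buf[0]
    if first < 0xfd:
        return first, 1
    if first == 0xfd:
        return struct.unpack_from('<H', buf, 1)[0], 3
    if first == 0xfe:
        return struct.unpack_from('<I', buf, 1)[0], 5
    return struct.unpack_from('<Q', buf, 1)[0], 9
```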

Node id

A node is identified by its IP address.


File Structures

file id

The hash of the genesis transaction regarding a particular file. Null if this is the first transaction for a file.

file desc

Field size   Description     Data type     Comments
  32         file id         file_id       File id
  32         current state   char[32]      Current hash of file
  1+         redundancy      var_int       The number of redundant copies of slices that should be put into the network. 0 indicates industry standard (currently 3). 8 is the chosen maximum at the moment.
  32         branch_id       char[32]      The id of some branch that this transaction relates to.
  1+         slice_count     var_int       Number of slices that the file was split into. If the transaction does not involve a modified file this will be 0.
  ?          slicedescs      slicedesc[]   Slice descriptions. The sequence indicates the file structure. This will be encrypted using the same key that encrypted the file, to prevent the network from determining which slices belong to which transaction.

Table C.8: File description

slicedesc

Message that represents a file slice, which is what a file is divided into when the client uploads the file to the network.

Field size   Description         Data type   Comments
  32         slice_id            char[32]    The slice id being described.
  32         slice_merkle_root   char[32]    Hash representing the Merkle root for the crumbs that this slice consists of.
  4          crumb_len           uint32_t    Size of a crumb
  4          slice_len           uint32_t    Size of the file slice.

Table C.9: Slice description

slice id

The hash of the slice in question.

crumb

Message that represents a crumb, which is what a file slice is subdivided into. The client divides each file slice into crumbs which are then used to calculate the Merkle root for the slice. Each crumb is accompanied by an index that points to the position in the Merkle tree where this crumb was located. Depending on the index it is possible to deduce whether the data belongs to an actual crumb or a hash inside the Merkle tree. Indices less than n indicate a node inside the Merkle tree and indices >= n indicate an actual file crumb.


Field size   Description    Data type   Comments
  4          version        uint32_t    Protocol version that was used to generate this message
  32         slice_parent   char[32]    The slice that this crumb belongs to
  4          crumb_index    uint32_t    Index for the crumb in the slice Merkle tree
  4          crumb_len      uint32_t    Crumb length
  ?          crumb_data     char[]      The crumb itself

Table C.10: Crumb

challenge

The challenge itself consists of one crumb index. The index points to a bottom node in the slice Merkle tree. The bottom nodes are file crumbs that are hashed together to build the rest of the Merkle tree. The peer that receives a challenge responds with the file crumb that the index points to, as well as its neighbour and the other nodes in the Merkle tree that are necessary to recalculate the Merkle root.

permissions

Bitfield describing the particular permissions for a recipient in a transaction. Permissions:
- Read
- Write
- Share
- Modify permissions
- Branch

Reading and writing are set implicitly by sharing the file with someone. If those privileges should be revoked then a new genesis transaction needs to be issued. This would then be a special kind of genesis transaction with a pointer back to the previous file id that was split from. Whether or not someone is allowed to share a file can be matched against the latest confirmed permissions set for a particular user and what he or she wishes to achieve. No sharing implies no modify permissions. If a user does not have permission to modify permissions, it is not possible for that person to perform a split, nor is it possible to set any other permissions than 0 in the transaction outputs. Branching can also easily be checked against the current permission set for a user. If branching is not allowed, all branching transactions should be rejected.
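Verifying a challenge response could look like the sketch below. The convention that the index's low bit selects left/right concatenation at each level is an assumption, since the appendix does not fix the proof layout:

```python
import hashlib

def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def verify_challenge(crumb: bytes, index: int, siblings,
                     expected_root: bytes) -> bool:
    """Recompute the slice's Merkle root from a challenged crumb and
    the sibling hashes supplied by the peer. `index` is the crumb's
    position among the leaves; its bits pick the left/right order."""
    node = double_sha256(crumb)
    for sibling in siblings:
        if index % 2 == 0:
            node = double_sha256(node + sibling)
        else:
            node = double_sha256(sibling + node)
        index //= 2
    return node == expected_root
```

A storage node that passes such challenges proves it still holds the challenged crumb, which is what the slice supervision mechanism relies on.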

Field size   Description   Data type   Comments
  1          permissions   uchar       Bitfield for permissions; each bit represents one of the permissions listed above.

Table C.11: Permissions
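A sketch of working with this bitfield. The bit positions are assumptions, as the appendix lists the permissions without fixing an order; the implicit read/write grant follows the rule stated above:

```python
# Assumed bit order; the protocol only specifies the set of permissions.
READ, WRITE, SHARE, MODIFY_PERMS, BRANCH = (1 << i for i in range(5))

def has_permission(bitfield: int, flag: int) -> bool:
    return bitfield & flag == flag

def grant_share(bitfield: int) -> int:
    """Sharing implicitly grants read and write as well."""
    return bitfield | SHARE | READ | WRITE
```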

network address

Address of a node on the network, the time when the information was logged, and some additional information.

Field size   Description   Data type   Comments
  4          version       uint32_t    Version of the protocol that was used to generate this message
  8          timestamp     uint64      The time when this information was logged. UNIX epoch.
  1          services      var_int     Same services listed in version
  16         IPv6/4        char[16]    IPv6 address. Network byte order. 16 byte IPv4-mapped IPv6 address (12 bytes 00 00 00 00 00 00 00 00 00 00 FF FF, followed by the 4 bytes of the IPv4 address).
  2          port          uint16_t    Port number, network byte order

Table C.12: Network address

services

Value        Name           Description
0b00000001   NODE_NETWORK   This node can be asked for full blocks instead of just headers.
0b00000010   STORAGE        Specifies that this node responds to storage messages

Table C.13: Services

Transaction structures

TxType

The type of the transaction, indicating how it should be handled.

Value   Description   Comments
  0     NORMAL        Normal transaction with double spending allowed.
  1     TRANSFER      No double spending allowed. Ownership of a file is transferred. This is set in the genesis transaction, and following transactions must be of this type unless they are NORMAL transactions set only with read permission for the recipients.
  2     SPLIT         A splitting transaction, which means that a new file id is generated but the transaction still points back to a previous transaction.
  3     JOIN          A joining transaction collecting slices encrypted using different encryption keys under one previous encryption key and accompanying file id.

Table C.14: Transaction types

TxIn

Field size   Description        Data type   Comments
  36         previous_output    outpoint    The previous output transaction reference
  1+         script length      var_int     The length of the signature script
  ?          signature script   uchar[]     Computational script for confirming transaction authorization

Table C.15: Transaction input

OutPoint


Field size   Description   Data type   Comments
  32         hash          char[32]    The hash of the referenced transaction.
  4          index         uint32_t    The index of the specific output in the transaction. The first output is 0, etc.

Table C.16: Outpoint

TxOut

Field size   Description        Data type     Comments
  1+         pk_script length   var_int       Length of the pk_script
  ?          pk_script          uchar[]       Usually contains the public key as a Bitcoin script setting up conditions to claim this output.
  1          permissions        permissions   Bitfield for permissions
  1+         enc_key length     var_int       Length of the encrypted encryption key
  ?          enc_key            uchar[]       The encryption key used for encrypting file and URIs, encrypted using the recipient’s public key.

Table C.17: Transaction output

Block Header

Block headers are sent in a headers packet in response to a getheaders message.

Field size   Description     Data type   Comments
  4          version         uint32_t    Protocol version used to generate this block header
  32         prev_block      char[32]    The hash value of the previous block this particular block references
  32         merkle_root     char[32]    The reference to a Merkle tree collection which is a hash of all transactions related to this block
  8          creation_time   uint64_t    A timestamp recording when this block was created
  4          difficulty      uint32_t    The calculated difficulty target being used for this block
  4          nonce           uint32_t    The nonce used to generate this block, to allow variations of the header and compute different hashes
  1          txn_count       var_int     Number of transaction entries; this value is always 0

Table C.18: Block header
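A toy sketch of serializing and mining such a header follows. The mapping from the difficulty field to a numeric target is not specified in this appendix, so the target is passed in directly; this is an illustration, not the thesis's implementation:

```python
import hashlib
import struct

def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def serialize_header(version, prev_block, merkle_root,
                     creation_time, difficulty, nonce) -> bytes:
    """Field layout per Table C.18; the trailing byte is the
    txn_count var_int, which is always 0 in a header."""
    return struct.pack('<I32s32sQII', version, prev_block, merkle_root,
                       creation_time, difficulty, nonce) + b'\x00'

def mine(version, prev_block, merkle_root, creation_time,
         difficulty, target: int) -> int:
    """Try nonces until the double hash of the header, read as an
    integer, falls below `target` (a toy proof-of-work search)."""
    nonce = 0
    while True:
        h = double_sha256(serialize_header(version, prev_block,
                                           merkle_root, creation_time,
                                           difficulty, nonce))
        if int.from_bytes(h, 'big') < target:
            return nonce
        nonce += 1
```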

Criteria

Criteria that must be met. For instance, criteria are used when selecting which transactions should be delivered back to the requesting node if this cannot be specified using Bloom filters.

Field size   Description        Data type       Comments
  4          type               criteria_type   Criteria type
  1+         payload_len        var_int         Length of the criteria payload
  ?          criteria_payload   char[]          The payload of the criteria

Table C.19: Criteria


Criteria type

Specifies a type of criteria.

Value   Description       Comment
  0     FILE_TX_HISTORY   Used when a client requests the entire transaction history for a file. The payload will be a file id.

Table C.20: Criteria types

C.3 Messages

The payload described in the general message structure can consist of any of the following messages, which can in turn consist of other types of structures.

version

This message is sent whenever a node creates an outgoing connection, whereupon the remote node responds with a verack message if it was accepted, as well as a version message of its own.

Field size   Description    Data type   Comments
  1          services       uint8_t     Bitfield containing features that should be enabled for this connection
  8          timestamp      int64_t     Standard UNIX timestamp in seconds
  26         addr_recv      net_addr    Network address of receiving node
  26         addr_from      net_addr    Network address of emitting node
  8          nonce          uint64_t    Random nonce generated every time a packet is sent from a node, to detect connections to self
  ?          user_agent     var_str     User agent (0x00 if string is 0 bytes long)
  4          start_height   int32_t     The last main block received by the emitting node
  1          relay          bool        Whether the remote peer should announce relayed transactions or not

Table C.21: Version message

verack

This message is sent in reply to a version message and indicates that the version message was accepted. No payload.

addrs

This message is sent as a response to getaddrs and provides information about nodes that are known to the responding node.

Field size   Description   Data type    Comments
  1+         count         var_int      Number of address entries (max: 1000)
  30x?       addr_list     net_addr[]   Addresses of other known nodes on the network

Table C.22: Addrs message

inv

This message is used by a node to advertise its knowledge of one or more objects, such as transactions or blocks. It is sent in reply to a getblocklist, or may be pushed to another node unsolicited.

Field size   Description   Data type    Comments
  1+         count         var_int      Number of inventory entries
  36x?       inventory     inv_vect[]   Inventory vectors

Table C.23: Inv message

getdata

Sent as a response message to inv to retrieve objects that the node currently does not know of. Used to fetch transactions, but only ones that are in the memory pool or relay set. If criteria are set then they must also be met. If no inventory is specified for MSG_TX then all transactions matching the criteria are supplied. Criteria can be empty only if inventory is not; if inventory is empty then criteria must be set. Criteria can be used to fetch the entire transaction history for a file.

Field size   Description   Data type    Comments
  1+         count         var_int      Number of inventory items requested. Possibly zero
  36*        inventory     inv_vect[]   Inventory vectors
  ?          criterias     criteria[]   Criteria that the data must meet

Table C.24: Getdata message

notfound

Response message if data requested by getdata can not be relayed. Same payload as inv.

getblocklist

A node that receives this message responds by returning an inv packet containing a list of main blocks following the supplied hash locator object. Some supplied blocks may be invalid if they belong to an invalid branch.

Field Size | Description | Data type | Comments
1+ | hash count | var_int | Number of block locator hash entries
32+ | block locator hashes | char[32] | Block locator object; newest back to genesis block (dense to start, but then sparse)
32 | hash_stop | char[32] | Hash of the last desired block; set to zero to get as many blocks as possible (500)

Table C.25: getblocklist message payload

getheaders

This message is sent when one node wishes to retrieve headers from another. The response is a headers packet containing block headers following the last known hash among the supplied block locator objects. This is used by thin clients to download the blockchain without the (in this case irrelevant) transaction data. Some blocks provided might belong to an invalid branch. Same payload as getblocklist; the only difference is that the maximum number of hashes is 2000 instead of 500, since headers are more lightweight than blocks.

singletx

A message that describes a file transaction. Sent in reply to getdata.
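The "dense to start, but then sparse" block locator used by getblocklist and getheaders can be built by taking the most recent blocks one by one and then doubling the step back to the genesis block. A sketch of the height-selection logic follows; the dense window of 10 blocks and the function name are illustrative assumptions, as the appendix does not fix them.

```python
def block_locator_heights(tip_height: int) -> list[int]:
    """Select block heights for a locator: dense for the newest
    blocks, then exponentially sparse back to the genesis block."""
    heights, step, h = [], 1, tip_height
    while h > 0:
        heights.append(h)
        if len(heights) >= 10:  # after the dense window, double the step
            step *= 2
        h -= step
    heights.append(0)  # always terminate with the genesis block
    return heights
```

Because the step doubles, the locator stays logarithmic in chain length, so even a long chain yields a short list of hashes for the responding node to match against.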

Field Size | Description | Data type | Comments
4 | version | uint32_t | Protocol version used for this transaction
4 | type | tx_type | The type of this transaction
1 | tx_in count | var_int | The number of inputs
41+ | tx_in | tx_in | Single transaction input for one file
1 | tx_out count | var_int | Number of recipients
9 | tx_out | tx_out[] | A list of 1 or more transaction outputs for this file
8 | creation_time | uint64_t | Creation time for this transaction
4 | lock_time | uint32_t | The block number or timestamp at which this transaction is locked: 0 = not locked; < 500000000 = block number at which this transaction is locked; >= 500000000 = UNIX timestamp at which this transaction is locked. If all TxIn inputs have final (0xffffffff) sequence numbers then lock_time is irrelevant; otherwise, the transaction may not be added to a block until after lock_time.
? | file_desc | file_desc | Description of the file that is being shared

Table C.26: singletx message payload

block

Whenever a node issues a getdata message requesting block data, the receiving node responds with this message.

Field Size | Description | Data type | Comments
? | block_header | block_header | A block header, but with a non-zero transaction count indicating the number of transactions in this block
? | txs | singletx[] | Transactions

Table C.27: block message payload
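The lock_time rules in Table C.26 reduce to a small predicate. The following sketch expresses them directly; the function and parameter names are illustrative and not taken from the prototype.

```python
LOCKTIME_THRESHOLD = 500_000_000  # below: block number, at or above: UNIX time
FINAL_SEQUENCE = 0xFFFFFFFF


def may_enter_block(lock_time: int, input_sequences: list[int],
                    block_height: int, block_time: int) -> bool:
    """Apply the lock_time rules from the singletx payload (Table C.26)."""
    if lock_time == 0:
        return True  # 0 = not locked
    if all(seq == FINAL_SEQUENCE for seq in input_sequences):
        return True  # all inputs final: lock_time is irrelevant
    if lock_time < LOCKTIME_THRESHOLD:
        return block_height > lock_time  # interpreted as a block number
    return block_time > lock_time        # interpreted as a UNIX timestamp
```

A miner would run this check before admitting the transaction into a candidate block; a still-locked transaction simply stays in the memory pool.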


filtblock After a Bloom filter has been set, nodes don’t merely stop announcing non-matching trans- actions, they can also serve filtered blocks. A filtered block is defined by this type of message.

Field Size | Description | Data type | Comments
32 | prev_block | char[32] | The hash value of the previous block that this particular block references
32 | merkle_root | char[32] | The reference to a Merkle tree collection which is a hash of all transactions related to this block
8 | timestamp | uint64_t | A timestamp recording when this block was created
4 | bits | uint32_t | The calculated difficulty target being used for this block
4 | nonce | uint32_t | The nonce used to generate this block, allowing variations of the header to compute different hashes
4 | total_transactions | uint32_t | Number of transactions in the block (including unmatched ones)
? | hashes | uint256[] | Hashes in depth-first order (including standard varint size prefix)
? | flags | byte[] | Flag bits, packed per 8 in a byte, least significant bit first (including standard varint size prefix)

Table C.28: filtblock message payload

getstatus

Sent to a node to query its status, for instance whether it has space to handle another file slice. This message has no payload.

status

Response to a getstatus message indicating the current status of the responding node. Could be used for load balancing in the future.

Field Size | Description | Data type | Comments
1+ | space | var_int | Number of bytes still available for storage. If large enough, the exact number need not be specified.

Table C.29: status message payload

putslice

Stores a slice of a file at the receiving node.

Field Size | Description | Data type | Comments
? | slice | slice | The slice itself

Table C.30: putslice message payload

slice

Message representing a slice of a file, sent back in response to a getslices message. Thin clients will use this to reconstruct a file, and full nodes will use it when distributing slices across the network.


Field Size | Description | Data type | Comments
32 | slice_id | slice_id | Id of the slice
1 | redundancy | var_int | Required redundancy for this slice
1+ | crumb_len | var_int | Crumb length
? | crumbs | crumb[] | The crumbs that constitute this slice

Table C.31: slice message payload

slicesuperv

Message sent by nodes that upload a slice to the network, requesting that other nodes take part in supervising the status of that slice in the network. A node should have a limit on how many slices it monitors, for example 1000.

Field Size | Description | Data type | Comments
32 | slice_id | slice_id | Id of the slice that is to be supervised
4 | redundancy | uint32_t | Required redundancy for this slice throughout the network
? | storage_locations | net_addr | Current locations of the slice

Table C.32: slicesuperv message payload

verifyslice

Message sent to request the status of a particular slice. The status is requested by issuing challenges that consist of crumb indices from the slice itself. The response serves those crumbs as well as data from the slice Merkle tree, allowing the requesting client to verify that the slice is intact by recalculating the Merkle root. The verifying node will also verify that there is enough redundancy so that the slice is not lost. If there is not enough redundancy, the node will spread the slice to other nodes.

Field Size | Description | Data type | Comments
32 | slice_id | char[32] | Hash identifying the slice in question
4 | challenge_count | uint32_t | Number of challenges that follow
8+ | challenges | challenge[] | The challenges themselves

Table C.33: verifyslice message payload

slicestat

Response sent from a node receiving a verifyslice or putslice message. This message contains 2 or more actual crumbs as well as the necessary hashes from the slice Merkle tree for the requesting node to be able to recalculate the Merkle root and thereby verify those crumbs in the slice. This mechanism is used to avoid a complete download of all crumbs each time a slice is to be verified.
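The challenge/response mechanism behind verifyslice and slicestat is a standard Merkle branch check: hash the challenged crumb, combine it with the supplied sibling hashes up the tree, and compare the result against the known Merkle root. SHA-256 and the left/right ordering by crumb index are assumptions here; the appendix does not fix the hash function or ordering.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def verify_crumb(crumb: bytes, index: int,
                 siblings: list[bytes], root: bytes) -> bool:
    """Recompute the slice Merkle root from one crumb plus the sibling
    hashes along its branch, and compare with the expected root."""
    node = sha256(crumb)
    for sibling in siblings:
        if index % 2 == 0:            # even index: node is the left child
            node = sha256(node + sibling)
        else:                         # odd index: node is the right child
            node = sha256(sibling + node)
        index //= 2                   # move one level up the tree
    return node == root
```

With a slice of two crumbs a and b, the root is sha256(sha256(a) + sha256(b)), and verify_crumb(a, 0, [sha256(b)], root) succeeds without transferring b itself, which is exactly what lets slicestat avoid a complete download.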


Field Size | Description | Data type | Comments
4 | crumb_count | uint32_t | Number of crumbs, including possible hashes from inside the slice Merkle tree
? | crumbs | crumb | Contains the crumbs or Merkle tree hashes

Table C.34: slicestat message payload

txs

Message containing an array of transactions, sent in response to a gettxs message. Returns all the transactions pertaining to a certain file. Best used with a Bloom filter to prevent the node from sending transactions that the requesting node already knows about.

Field Size | Description | Data type | Comments
1+ | count | var_int | Number of transactions
? | txs | singletx[] | The transactions

Table C.35: txs message payload

headers

This message contains block headers in response to a getheaders packet.

Field Size | Description | Data type | Comments
? | count | var_int | Number of block headers
81x? | headers | block_header[] | Block headers

Table C.36: headers message payload

getaddrs

Used for peer discovery. Sent to nodes to get information about their known active peers. No payload.

unconftxs

A request to a node for information about main transactions that the node has verified but not yet confirmed. The node responds with an inv message containing transaction hashes, possibly only those matching a Bloom filter if one is loaded. Unconfirmed transactions have been verified but not yet mined into any block. No payload.

ping

Sent to verify that the connection to a node is still valid. If it fails, the address is removed as a current peer.


Field Size | Description | Data type | Comments
8 | nonce | uint64_t | Random nonce

Table C.37: ping message payload

pong

Response to a ping message.

Field Size | Description | Data type | Comments
8 | nonce | uint64_t | Nonce from the ping message

Table C.38: pong message payload

reject

Sent as a response whenever a message is rejected for whatever reason.

Field Size | Description | Data type | Comments
1+ | message | command | Type of message rejected
1 | reason | reason | Code relating to the rejected message

Table C.39: reject message payload

Reason

Value | Name | Description
0x01 | REJECT_MALFORMED |
0x10 | REJECT_INVALID |
0x11 | REJECT_OBSOLETE |
0x12 | REJECT_DUPLICATE |
0x40 | REJECT_NONSTANDARD |

Table C.40: reject reason codes

filterload This message is related to Bloom filtering and supplies a Bloom filter which results in the receiving node only supplying broadcast transactions matching the filter.

Field Size | Description | Data type | Comments
? | filter | uint8_t[] | The filter itself is simply a bit field of arbitrary byte-aligned size. The maximum size is 36,000 bytes.
4 | nHashFuncs | uint32_t | The number of hash functions to use in this filter. The maximum value allowed in this field is 50.
4 | seed | uint32_t | A random value to add to the seed value in the hash function used by the Bloom filter.
1 | nFlags | uint8_t | A set of flags that control how matched items are added to the filter.

Table C.41: filterload message payload

filteradd

Adds data to a Bloom filter that was previously supplied via filterload. The data field must be smaller than or equal to 520 bytes in size (the maximum size of any potentially matched object). This command is useful if a new key or script is added to a client's wallet while it has open connections to the network; it avoids the need to recalculate and send an entirely new filter to every peer (though doing so is usually advisable to maintain anonymity).

Field Size | Description | Data type | Comments
? | data | uint8_t[] | The data element to add to the current filter

Table C.42: filteradd message payload
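The filterload fields map naturally onto a small Bloom filter implementation, with filteradd corresponding to the add operation. The indexing scheme below is an illustrative SHA-256-based construction seeded per hash function; the actual hash function used by SecuRES (or Bitcoin's MurmurHash3-based variant) may differ.

```python
import hashlib


class BloomFilter:
    """Bit-field Bloom filter following the filterload payload
    (filter bytes, nHashFuncs, seed); nFlags is omitted here."""

    def __init__(self, size_bytes: int, n_hash_funcs: int, seed: int):
        self.bits = bytearray(size_bytes)  # max 36,000 bytes per the spec
        self.n_hash_funcs = n_hash_funcs   # max 50 per the spec
        self.seed = seed

    def _positions(self, item: bytes):
        """One bit position per hash function, derived from the seed."""
        n_bits = len(self.bits) * 8
        for i in range(self.n_hash_funcs):
            digest = hashlib.sha256(self.seed.to_bytes(4, "little")
                                    + i.to_bytes(4, "little")
                                    + item).digest()
            yield int.from_bytes(digest[:4], "little") % n_bits

    def add(self, item: bytes) -> None:
        """The filteradd operation: set every indexed bit."""
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)  # least significant bit first

    def matches(self, item: bytes) -> bool:
        """True if all indexed bits are set (with possible false positives)."""
        return all(self.bits[pos // 8] >> (pos % 8) & 1
                   for pos in self._positions(item))
```

A peer holding such a filter relays only transactions for which matches() returns true, which is why false positives are harmless (extra traffic) while false negatives cannot occur.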

filterclear

This message clears the currently set Bloom filter. No payload.

alert

An alert message is sent as a general notification message throughout the peer-to-peer network. Alert format:

Field Size | Description | Data type | Comments
? | payload | uchar[] | Serialized alert payload
? | signature | uchar[] | An ECDSA signature of the message

Table C.43: alert message format

The payload is serialized into a uchar[] to ensure that versions using incompatible alert formats can still relay alerts among one another. The current alert payload format is:

Field Size | Description | Data type | Comments
8 | RelayUntil | int64_t | The timestamp beyond which nodes should stop relaying this alert
8 | Expiration | int64_t | The timestamp beyond which this alert is no longer in effect and should be ignored
4 | ID | int32_t | A unique ID number for this alert
4 | Cancel | int32_t | All alerts with an ID number less than or equal to this number should be cancelled: deleted and not accepted in the future
? | cancel_set | set | All alert IDs contained in this set should be cancelled as above
4 | MinVer | int32_t | This alert only applies to versions greater than or equal to this version. Other versions should still relay it.
4 | MaxVer | int32_t | This alert only applies to versions less than or equal to this version. Other versions should still relay it.
? | setSubVer | set | If this set contains any elements, then only nodes that have their subVer contained in this set are affected by the alert. Other versions should still relay it.
4 | Priority | int32_t | Relative priority compared to other alerts
? | Comment | string | A comment on the alert that is not displayed
? | StatusBar | string | The alert message that is displayed to the user
? | Reserved | string | Reserved

Table C.44: alert payload format
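The version-gating and cancellation fields in Table C.44 combine into two simple checks: whether an alert applies to the local node, and which earlier alerts a new alert cancels. In this sketch an alert is represented as a dict keyed by the table's field names; that representation is illustrative, not the prototype's.

```python
def alert_applies(alert: dict, node_version: int, node_sub_ver: str) -> bool:
    """Decide whether an alert affects this node. Nodes outside the
    version range should still relay the alert, just not act on it."""
    if not (alert["MinVer"] <= node_version <= alert["MaxVer"]):
        return False
    if alert["setSubVer"] and node_sub_ver not in alert["setSubVer"]:
        return False
    return True


def surviving_alerts(alerts: list[dict], new_alert: dict) -> list[dict]:
    """Apply the Cancel and cancel_set rules: drop alerts whose ID is
    at most new_alert['Cancel'] or listed in new_alert['cancel_set']."""
    return [a for a in alerts
            if a["ID"] > new_alert["Cancel"]
            and a["ID"] not in new_alert["cancel_set"]]
```

Relaying and applying are deliberately decoupled: a node that the alert does not target still forwards it, so the notification reaches the affected versions.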


Appendix D

Logic Database Model

[Entity-relationship diagram of the logic database model. The model comprises the entities Blockchain, Block, OrphanBlock, Tx, UnconfirmedTransaction, OrphanTransaction, TxIn, TxOut, TxType, FileDesc, File, Slice, SliceHolder, SliceOnClient, FileOnClient, Crumb, Peer and Client. Two notes annotate the diagram: all plain version fields are meant for optimistic locking using JPA, and a transaction without a corresponding TxIn is a genesis transaction.]

Appendix E

Deployment Diagram
