DEGREE PROJECT IN INFORMATION AND COMMUNICATION TECHNOLOGY, SECOND CYCLE, 30 CREDITS STOCKHOLM, SWEDEN 2018

Private, secure, and censorship resistant document sharing for individuals and groups based on distributed ledger technology

JENS RÖWEKAMP

KTH ROYAL INSTITUTE OF TECHNOLOGY SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE TRITA EECS-EX-2018:519

Abstract

The scandal around Facebook and Cambridge Analytica in 2017 showed drastically that new concepts for sharing and storing information need to be developed in order to minimize the huge potential for abuse that results from centralized information stored at trusted third parties. This thesis analysed to what degree current document exchange systems (e.g. Dropbox) comply with the information security services confidentiality, integrity, privacy, anonymity, authenticity of authors, non-repudiation, and accountability. The result is that all analysed systems lack support for privacy and anonymity, mainly due to their centralized design, missing (meta)data encryption, and the regulations of the jurisdictions in which they operate. Based on that analysis, a decentralized concept for document sharing in a peer-to-peer fashion was developed. It utilises client-side encryption, the separation of data and metadata, metadata masking through Tor hidden services, and distributed ledger technology for directory service provision. The concept was proven through the prototype implementation of a document exchange software called docShare, and its information security services were compared with the previously analysed exchange technologies. The analysis showed that docShare provides better information security services but still leaks identity information in the form of IP addresses when interacting with the distributed ledger Ethereum, mainly because Ethereum doesn’t support traffic anonymization through Tor.

Referat

Privat, säker och censurresistent document sharing för individer och grupper baserad på DLT

Skandalen kring Facebook och Cambridge Analytica i 2017 visade drastiskt att nya koncept för hur information delas och sparas behöver utvecklas för att reducera den stora missbrukspotentialen som är ett resultat av att information sparas centralt hos betrodda tredje partier. Denna avhandling analyserar till vilken utsträckning nuvarande document exchange system(s) (till exempel Dropbox) följer säkerhetsservices såsom förtrolighet, integritet, anonymitet, autorernas autenticitet, påvislighet och räkenskap. Undersökningen visar att alla analyserade system saknar stöd för integritet och anonymitet, mest på grund av den centraliserade designen, saknande informationskryptering och de juridiska reglerna som gäller för dem. Baserad på denna undersökning utvecklades ett koncept för peer-to-peer document sharing som innebär att information krypteras, att information och metainformation separeras, metainformation skyddas genom TOR hidden services samt att DLT används för katalogtjänster. Detta koncept bevisades genom prototypisk implementation av en dokumentutbytningssoftware som kallas för docShare vars information security services jämfördes med andra analyserade utbytetekniker. Analysen visade att docShare har en bättre information security service tillhandhållande, men den läcker fortfarande identitetinformationer i form av IP adresser när den interagerar med den distribuerande ledger Ethereum, främst för att Ethereum inte stödjer traffic anonymisering genom Tor.

Acknowledgments

I am deeply thankful to

• my family, friends, fellow students, and professors for their support during my studies,

• Pia Ströhle and Anja Pflugfelder for translating my abstract to Swedish, and

• George Lucas for creating the Star Wars universe.

Menlo Park, CA, 27th August 2018 Jens Röwekamp

Contents

1 Introduction
  1.1 Problem statement
  1.2 Research methodology
  1.3 Goals
    1.3.1 Objectives
    1.3.2 Deliverables
  1.4 Purpose
  1.5 Scope, limitations, and assumptions
  1.6 Ethical and sustainable aspects
  1.7 Outline

2 Technical background
  2.1 Information security services
    2.1.1 Data integrity
    2.1.2 Data confidentiality
    2.1.3 Data security, privacy, and anonymity
    2.1.4 Non-repudiation
    2.1.5 Accountability
    2.1.6 Authentication
    2.1.7 Access Control
    2.1.8 Availability
  2.2 Cryptography
    2.2.1 Symmetric cryptography
    2.2.2 Hashing
    2.2.3 Asymmetric cryptography
    2.2.4 Public key management
    2.2.5 Digital envelopes
    2.2.6 Zero-knowledge proof
  2.3 Distributed ledger
    2.3.1 Bitcoin
    2.3.2 Ethereum
    2.3.3 Tendermint
    2.3.4 Corda
  2.4 The Onion Router
    2.4.1 Traffic anonymization
    2.4.2 Location-hidden-services

3 Security assessment of related work
  3.1 Document exchange use case analysis and sensitivity categorization
  3.2 Centralized file-sharing services
    3.2.1 Box
    3.2.2 Dropbox
    3.2.3 Nextcloud
    3.2.4 Citrix RightSignature
  3.3 Decentralized data storage and file-sharing services
    3.3.1 Sia
    3.3.2 Storj
    3.3.3 SecuRES
  3.4 Miscellaneous
    3.4.1 Secure E-Mail through OpenPGP
  3.5 Summary
  3.6 Conclusion

4 Prototype specification
  4.1 Architecture
    4.1.1 Identity management
    4.1.2 Anonymization network
    4.1.3 docShare client
  4.2 Protocols
    4.2.1 User registration in the identity management system (IMS)
    4.2.2 Legal identity verification of an entity in the IMS
    4.2.3 Data exchange with compliance to private
    4.2.4 Data exchange with compliance to anonymous
    4.2.5 Data exchange with compliance to tracking
    4.2.6 Data exchange with compliance to business

5 Prototype implementation
  5.1 Limitations to the specification
  5.2 Public identity management service
  5.3 Key value storage for tracking
  5.4 Anonymization network
  5.5 docShare client
    5.5.1 SQLite3 metadata
    5.5.2 docShare library
    5.5.3 Services
    5.5.4 Terminal scripts

6 Prototype evaluation
  6.1 Evaluation environment
  6.2 Public identity management
  6.3 Partner management
  6.4 Document exchange with regards to secure and private
  6.5 Document exchange with regards to anonymous
  6.6 Document exchange with regards to tracking
  6.7 Document exchange with regards to business
  6.8 Summary
  6.9 Theoretical connection of the concept

7 Conclusion
  7.1 Recommendations for future work

Bibliography

Appendices
  A Declaration of independence
  B Installation and usage guide
  C Digital Content

List of Figures

2.1 Structure of a hash-linked list.
2.2 Structure of a Merkle tree.
2.3 Structure of the Bitcoin blockchain.
2.4 High level structure of a replicated state machine.
2.5 Traffic anonymization in Tor.
2.6 Simplified version of the Tor rendezvous protocol.

4.1 Prototype architecture overview.

5.1 Screenshot of docShare’s public identity management system’s web-frontend.
5.2 Screenshot of docShare’s key value store’s web-frontend.
5.3 Message format specification used in docShare communications.
5.4 Digital envelope format for documents of sensitivity secure and private.
5.5 Digital envelope format for documents of sensitivity tracking.

6.1 Environment used to evaluate docShare’s implementation.

List of Tables

2.1 Specification of combinations of data security, privacy, and anonymity.

3.1 Summary of information security services based on the use case analysis.
3.2 Categorization of documents based on their need of information security services.
3.3 Compliance of analysed document-sharing systems with defined document sensitivity levels in section 3.1.
3.4 Overview of the analysed systems with regards to the information security services and features provided.

5.1 List of docShare’s message protocol formats.
5.2 Message handling by comDaemon.

6.1 Summary of docShare’s implementation evaluation regarding complied with information security services.
6.2 Summary of docShare’s implementation evaluation regarding complied with document’s levels of sensitivity.

List of Abbreviations

BYOE bring-your-own-encryption

CA certificate authority

Dapp distributed application

DHT distributed hash table

DLP data loss prevention

DSS document sharing system

EVM Ethereum Virtual Machine

IMS identity management system

PKCS Public-Key Cryptography Standard

PKI public key infrastructure

SSO single-sign-on

Tor The Onion Router

URL uniform resource locator

UTXO Unspent Transaction Output

Chapter 1

Introduction

The Facebook scandal around Cambridge Analytica in 2017 showed drastically that centralized information stored at trusted third parties has a huge potential for abuse. In Facebook’s case, private information of over 50 million users was illegally shared without consent and used by Cambridge Analytica to build voter profiles that were used to influence the U.S. elections in 2016 and the Brexit referendum. [1] The Snowden leaks from 2013 disclosed the federal surveillance projects PRISM, Tempora, and XKeyScore and how British and U.S. intelligence agencies use them to collect metadata and content data on a massive scale at internet backbones around the globe. [2] Given that the U.S. military kills on the basis of metadata, [3] the protection of metadata at third-party services is more significant than one might naturally anticipate. This thesis analyses a sub-problem of this general problem space by examining how (sensitive) documents are shared through the services of trusted third parties like Dropbox [4].

1.1 Problem statement

The exchange and signing of a contract between two parties is an example of secure document exchange. Depending on the sensitivity of the document to be exchanged, different precautions need to be considered and different technologies are used. The parties could for example meet in person, use a courier for the delivery, send the contract via fax or (secure) e-mail, or use a document sharing service like Dropbox or Citrix RightSignature [5]. Concepts from information security like data integrity, confidentiality, privacy, anonymity, non-repudiation, authenticity of authors, and accountability are essential to consider when choosing the right exchange technology. All of the above-mentioned technologies offer different degrees of information security when exchanging a document. Depending on the sensitivity of the document to be exchanged, different information security services are required. Especially when using the services of a trusted third party, these information security services are hard to impossible to verify, as a client relies on the correct server-side implementation and security provision, which she can neither influence nor control.

Third-party services can be further undermined by law. The USA Patriot Act [6], for example, allowed U.S. federal agencies to access files hosted by any U.S. company for terror prevention purposes. Its successor, the Freedom Act [7], is currently used to legally access data stored at cloud services hosted in the United States. Dropbox, for example, states in its transparency reports [8] that it was forced in 1,619 cases in 2016 to provide user content through search warrants. The number of national security letters received and complied with, however, had to be kept secret. The true number of cases might therefore be much higher.

Recent developments in distributed ledger technology allow the development of distributed systems that don’t rely on the services of a trusted third party or communication partner and could be utilized for secure document exchange.

1.2 Research methodology

A constructive research methodology [9, 10] was chosen. From the assumption of lacking information security services in current document sharing systems two research questions emerged:

1. To what degree do current document sharing systems (DSS) meet the information security requirements of their users?

2. How could the information security of document exchange be improved using distributed ledger technology?

To answer the first research question a literature review will be conducted to define information security parameters for document exchange systems. Afterwards commonly used DSS will be analysed with regards to their compliance to examined information security parameters. To answer the second research question a feasibility study will be conducted to build a decentralized DSS that can operate without trusting any third-party. After the feasibility is proven through prototype implementation, the theoretical connection and the research contribution of the solution will be shown. Finally, the scope of applicability of the solution will be examined.

1.3 Goals

This thesis aims at analysing existing DSS with regards to the information security services and features they provide, and at designing and implementing a private, secure, and censorship resistant DSS for individuals and groups on the basis of distributed ledger technology. The DSS should guarantee different levels of information security based on the sensitivity of the documents to be exchanged. At the end of this thesis two goals will be achieved. First, relevant document exchange systems will be analysed and compared with regards to their supported information security services, and second, a prototype of a decentralized document sharing system addressing the limitations identified during that analysis will be built.

1.3.1 Objectives

To reach these goals, the following objectives need to be met.

1. different information security services for the exchange of documents need to be identified. Then documents need to be categorised into levels of sensitivity based on their need of security services.

2. existing document sharing systems need to be analysed with regards to their compliance to information security services and key concepts need to be extracted.

3. a concept for a decentralized DSS which supports the exchange of documents of different levels of sensitivity between individuals and groups needs to be created.

4. a rudimentary prototype that proves the concept for each defined level of sensitivity needs to be created.

5. the prototype needs to be analysed with respect to its compliance to the documents’ levels of sensitivity as defined in 1.

1.3.2 Deliverables

This project has the following deliverables:

Report: This report as analysis of existing DSS and as feasibility evaluation of the specified and prototyped decentralized document sharing system.

Prototype: A rudimentary proof-of-concept of a decentralized DSS to share documents of different levels of sensitivity between groups and individuals.

1.4 Purpose

The purpose of this thesis is twofold. First, uncensored information exchange is needed to maintain free speech, formation of opinion, and democracy. Considering recent efforts in the European Parliament to introduce upload filters that remove illegally uploaded copyrighted material from content platforms like YouTube or Wikipedia, [11] a reality is close in which content platform operators need to fear huge penalties if they share copyrighted material. As a consequence, they will most likely implement nontransparent self-learning algorithms to decide which material to publish and which to filter. These algorithms will filter material which they can’t identify with 100% certainty as non-copyrighted, and will therefore also remove legitimate material. But the larger danger lies in the huge abuse potential inherent in nontransparent censorship infrastructure. Imagine a scenario where a content provider uses its nontransparent censorship infrastructure for its own agenda, for example to influence the next elections.

Second, as advancements in machine learning, storage, and computational power allow data and metadata to be collected and analysed on a large scale, profiling of this data for marketing, terror prevention, and other purposes has become reality. The problem is that people aren’t aware of whether they are tracked and how their information is used, and consequently could change their behaviour to avoid unpleasant situations. Consider an imaginary scenario where an internet service provider tracks what kind of food you ordered through the internet and shares that information with your health insurance company, which will increase or decrease your premiums depending on the healthiness of the food ordered. The author believes that, through the concept of a decentralized, private, secure, and censorship resistant information exchange system, alternatives can be built to ease the illustrated situations and to help maintain free speech, formation of opinion, democracy, and unaltered behaviour.

1.5 Scope, limitations, and assumptions

The scope of this thesis is the design and evaluation of information security services of (distributed) document sharing systems. Therefore, performance considerations are secondary. Further, due to limitations in time and resources, this thesis underlies following assumptions and limitations:

• To ensure anonymity a TCP anonymization software called The Onion Router (Tor) (cf. section 2.4) is used in design and implementation. It is assumed that Tor, if used as documented in its best practices, sufficiently anonymizes its users.

• It is assumed that more than 50% of the participating nodes used in the distributed ledger are behaving as intended and aren’t malicious.

• It is assumed that the cryptographic algorithms used (SHA-256, RSA and AES) and the underlying platform on which these algorithms are executed are secure.

• It is assumed that every user will act as a miner to maintain the distributed ledger. Therefore, incentive considerations of the distributed ledger used won’t be handled in detail.

• The evaluated implementation will be seen as proof-of-concept, there will be no formal verification of the algorithms used.


1.6 Ethical and sustainable aspects

Regarding ethics and sustainability there are two important aspects to consider. First, there is an ethical dilemma: on the one hand, users need tools to exchange legal information like contracts or trade secrets securely and privately. On the other hand, the same tools could be used to exchange illegal information like child abuse material securely and privately. From a technological point of view there is no difference in the data being transmitted. All content would be transferred as an encrypted bitstream to guarantee confidentiality. Therefore, it is hard to impossible to check whether illegal material is transmitted without sacrificing the security services provided. It must further be considered that the definitions of legal and illegal material differ between jurisdictions, and even between persons. If content filtering were technologically possible without sacrificing security services, another question would arise: who would be the party to decide which contents are allowed to be shared and which are forbidden? As a consequence of this dilemma, the author decided not to publish the source code of the prototype implementation in an online repository, but to hand it out to interested researchers on personal request.

The second aspect is of a more technical nature and impacts sustainability. Many distributed ledger technologies like Bitcoin (cf. section 2.3.1) provide a publicly available append-only log that grows indefinitely over time. In this log a series of transactions is saved to maintain a global state of the system. Therefore, the whole log is needed to verify and add new entries. As the solution discussed later will be based on distributed ledger technology, the question arises whether it is sustainable to first download a huge transaction log (e.g. 100GB) only to share a small file (e.g. 4MB). Further, a consensus algorithm is needed to agree on such a shared transaction log. Currently proof of work is one of the most commonly used consensus algorithms in distributed ledgers. Unfortunately, proof of work uses a lot of computational resources. According to [12, 13] Bitcoin mining has an energy footprint similar to that of the Republic of Ireland. Therefore, alternative consensus algorithms should be considered to reduce the energy footprint of the solution to be developed.

1.7 Outline

This thesis is structured as follows:

Chapter 2 covers the technical background needed to understand this thesis. It further discusses and defines relevant information security services for exchanging documents. In chapter 3 documents are categorised into levels of sensitivity based on their need of information security services and other features provided. The categorization will be performed on the basis of practical use-cases. Afterwards, relevant DSS and concepts are introduced and analysed with regards to their information security services and features provided.


In chapter 4 the architecture of the decentralized DSS for exchanging documents of different levels of sensitivity between individuals and groups is sketched. Chapter 5 describes the prototype implementation as proof-of-concept. In chapter 6 the prototype is evaluated with regards to its information security services. It further shows the theoretical connection of the concept. Finally, chapter 7 discusses the research contribution, draws a conclusion, and finishes with recommendations for future work.

Chapter 2

Technical background

In this chapter relevant technical background information is covered to understand this thesis. It further provides relevant definitions used within this thesis.

2.1 Information security services

Information security is the discipline of guaranteeing certain security properties in information systems by providing equivalent security services. These security properties and services are explained below. [14, p.153,274ff]

2.1.1 Data integrity

Data integrity is the “property that data has not been changed, destroyed, or lost in an unauthorized or accidental manner.” [14, p.95] Data integrity services can’t protect data from being changed but ensure that changes to data are detectable. Among other techniques, hashes can be used to guarantee data integrity. If the hash of a transferred file equals the hash of its original, it was transferred with integrity.
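The hash comparison described above can be sketched in a few lines of Python. This is a minimal illustration, assuming SHA-256 as the hash function and assuming that the sender's digest reaches the recipient over a channel the adversary cannot modify; the function names are hypothetical and not taken from docShare.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


def transferred_with_integrity(original_digest: str, received: bytes) -> bool:
    """Compare the digest of the received bytes with the sender's digest."""
    return sha256_hex(received) == original_digest


original = b"contract v1: party A pays party B 100 EUR"
digest = sha256_hex(original)  # assumed to be delivered unmodified

# An unchanged copy matches the digest, an altered copy does not.
print(transferred_with_integrity(digest, original))
print(transferred_with_integrity(digest, original + b" (edited)"))
```

Note that the check only detects a change; it neither prevents it nor tells the recipient what was changed.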

2.1.2 Data confidentiality

Data confidentiality is the “property that data is not disclosed to system entities unless they have been authorized to know the data.” [14, p.94] Data confidentiality services provide data confidentiality by using encryption. An example for a protocol which provides data confidentiality is SSL [15].

Depending on an application’s use case, data confidentiality can be applicable only to actual user data, or also to its metadata.


2.1.3 Data security, privacy, and anonymity

Data security is the property to “protect data from disclosure, alteration, destruction, or loss that either is accidental or is intentional but unauthorized.” [14, p.97] It can be achieved by combining data integrity and data confidentiality services. Data security only protects data from illegal parties. Both legal parties (like trusted third parties that operate an infrastructure service used) and partners have access to the data transferred.

Privacy describes the right of an entity to choose and enforce how much personal information is shared with its environment. [14, p.232] In the context of this thesis data is shared with regards to privacy if only partners that have been chosen explicitly have access to the data.

Anonymity is the property that an identity is unknown or concealed to every party and can’t be deanonymized by analysing metadata. [14, p.18]

Table 2.1 summarizes the definitions and specifies combinations of data security, privacy, and anonymity.

                                              Parties
Specification                   Shared data   Partners   Legal   Illegal
Security                        Ids
                                Metadata
                                Content
Security & privacy              Ids
                                Metadata
                                Content
Security, privacy & anonymity   Ids
                                Metadata
                                Content

Table 2.1: Specification of combinations of data security, privacy, and anonymity. Adapted from [16]. Ids: Data identifying a user (e.g. name, telephone nr, IP address, e-mail address). Metadata: Some property of the user (e.g. gender, age, location, business partners). Content: Data to be exchanged (e.g. contracts, files). Partners: Entities explicitly chosen to exchange information with. Legal: Entities not explicitly chosen but necessary to exchange the information (e.g. mail relays). Illegal: Entities not explicitly chosen and not necessary to exchange the information (e.g. government agencies).

2.1.4 Non-repudiation

Non-repudiation “provides protection against false denial of involvement in a [communications] association.” [14, p.200] A non-repudiation service provides the recipient of a message with evidence that proves its origin and provides the sender of the message with evidence that proves it was received as addressed. Technical and legal aspects of non-repudiation need to be distinguished. Technical non-repudiation only assures that a digital signature was created with the corresponding private key. Legal non-repudiation refers to the possession and control of the private key used. [14, p.200ff]

2.1.5 Accountability

Accountability is the “property of a system or system resource that ensures that the actions of a system entity may be traced uniquely to that entity, which can then be held responsible for its actions.” [14, p.13] Accountability is for example important to trace authorized changes in a document back to the entity which altered the document. An audit service can be used to achieve accountability. It records actions of system entities and their resulting system events. [14, p.26]

2.1.6 Authentication

Authentication is the “process of verifying a claim that a system entity or system resource has a certain attribute or value.” [14, p.25] Authentication is widely used in computer systems to verify the identity of a user. Its process consists of two steps. In the identification step the claimed value (e.g. a user identifier) is presented to the authentication service. In the verification step authentication information belonging to the value (e.g. a password) is presented or generated. It acts as evidence to prove the binding between the attribute and that for which it is claimed. [14, p.25ff]

Authenticity is the “property of being genuine and able to be verified and be trusted.” [14, p.28] In the context of document exchange an entity must be able to verify the authenticity of another entity and its messages. When an identity is being registered in a system, the system is responsible for proving the identity’s authenticity and its eligibility.
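The two authentication steps can be illustrated with a password-based sketch. It is a simplified example under stated assumptions: the in-memory user store, the function names, and the PBKDF2 parameters are hypothetical choices for illustration and do not describe how docShare authenticates its users.

```python
import hashlib
import hmac
import os

# Hypothetical in-memory user store: identifier -> (salt, derived key).
_users = {}


def _derive(password: str, salt: bytes) -> bytes:
    """Derive a key from the password so the password itself is never stored."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


def register(user_id: str, password: str) -> None:
    salt = os.urandom(16)
    _users[user_id] = (salt, _derive(password, salt))


def authenticate(user_id: str, password: str) -> bool:
    # Identification step: the claimed identifier is presented and looked up.
    record = _users.get(user_id)
    if record is None:
        return False
    # Verification step: the presented password must reproduce the stored key.
    salt, stored = record
    return hmac.compare_digest(stored, _derive(password, salt))


register("alice", "correct horse battery staple")
print(authenticate("alice", "correct horse battery staple"))
print(authenticate("alice", "wrong password"))
```

The constant-time comparison via `hmac.compare_digest` is a design choice that avoids leaking how many leading bytes of the derived key matched.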

2.1.7 Access Control

An access control service protects system resources against unauthorized access. The system’s access policy defines which resources are accessible by which entities. It can for example be implemented as an access control list or access control matrix. [14, p.11ff]

Authorization defines the “process for granting approval to a system entity to access a system resource.” [14, p.29] Authorization depends on the form of access control used. It could for example be achieved through sharing of a secret (e.g. a decryption key).
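An access control list can be pictured as a mapping from resources to the entities allowed to use them. The following sketch assumes a read-only policy and made-up resource and entity names; it only illustrates the idea of an access policy and is not part of the prototype.

```python
# Hypothetical access control list: resource -> entities allowed to read it.
acl = {
    "contract.pdf": {"alice", "bob"},
    "minutes.txt": {"alice"},
}


def is_authorized(entity: str, resource: str) -> bool:
    """Grant access only if the policy lists the entity for the resource."""
    return entity in acl.get(resource, set())


print(is_authorized("bob", "contract.pdf"))
print(is_authorized("bob", "minutes.txt"))
```

Resources that do not appear in the list are denied by default, which keeps the policy fail-closed.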

2.1.8 Availability

Availability is the “property of a system resource being accessible, or usable or operational upon demand, by an authorized system entity, according to performance specifications for the system.” [14, p.30] Performance specifications are usually specified by quantitative metrics. An availability service protects a system against denial-of-service attacks to guarantee its availability. It therefore relies on proper resource management and access control. [14, p.30ff]

2.2 Cryptography

Cryptography is a mathematical science that deals with transforming data to achieve data integrity, data confidentiality or data authenticity. [14, p.90ff] In this section main cryptographic concepts are described to provide different information security services.

2.2.1 Symmetric cryptography

In symmetric cryptography the same (secret-)key is used to perform both of two counterpart cryptographic operations. Usually symmetric cryptography is used to encrypt plaintext into cyphertext and to decrypt cyphertext back into plaintext. Symmetric cryptography ensures data confidentiality if data is encrypted before it is sent. The disadvantage compared to asymmetric cryptography lies in the costly distribution of the (secret-)key if done securely. The security of the data depends on every entity owning a copy of the key. [14, p.296ff]

An example algorithm for symmetric cryptography is AES [17].
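The defining property, one shared key for both counterpart operations, can be shown with a deliberately simple toy cipher. This sketch is not AES: it swaps in a SHA-256 based keystream that is XORed with the data, purely so that the example runs with the Python standard library. It offers no authenticity protection and should not be used to protect real data; all names are hypothetical.

```python
import hashlib
import os


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from key and nonce (toy construction)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def crypt(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR data with the keystream; the same call encrypts and decrypts."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, nonce, len(data))))


key = os.urandom(32)    # the secret key that both parties must hold
nonce = os.urandom(16)  # fresh per message, may be sent in the clear

cyphertext = crypt(key, nonce, b"meet at noon")
plaintext = crypt(key, nonce, cyphertext)
print(plaintext)
```

Anyone holding `key` can decrypt, which is why the secure distribution of the key dominates the cost of symmetric schemes.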

2.2.2 Hashing

A hash function H maps an arbitrary, variable-length bit string, s, into a fixed-length string, h=H(s), called hash. A secure hash function, which can for example be used for fingerprinting, has two security properties:

• One-way function: Given H and h it is computationally infeasible to find s.

• Weakly collision-free: Given H and an input s it is computationally infeasible to find a different input s’ such that H(s) = H(s’), or

• Strongly collision-free: Given H it is computationally infeasible to find any pair of inputs s and s’ such that H(s) = H(s’). [14, p.140ff]

An example of a secure hash function is SHA-256 [18]. There are interesting data structures that can be built upon secure hash functions to ensure data integrity.
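The mapping from variable-length input to a fixed-length hash can be observed directly with SHA-256 from Python's standard library. The inputs below are arbitrary examples chosen for illustration.

```python
import hashlib

# Inputs of very different lengths: empty, one byte, one million bytes.
inputs = [b"", b"a", b"a" * 1_000_000]

# SHA-256 always yields a 256-bit (32-byte) digest, whatever the input length.
digest_lengths = [len(hashlib.sha256(s).digest()) for s in inputs]
print(digest_lengths)

# Two inputs differing in a single character produce different digests.
h1 = hashlib.sha256(b"docShare").hexdigest()
h2 = hashlib.sha256(b"docshare").hexdigest()
print(h1 != h2)
```

The one-way and collision-free properties cannot be demonstrated by running code; they are assumptions about the computational effort an adversary would need.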

Hash-linked list

A hash-linked list, also called blockchain, is a data structure similar to a linked list that uses hash pointers instead of pointers to ensure data integrity. A hash pointer points to the previous block of the list and includes its cryptographic hash (cf. figure 2.1). To verify the integrity of the whole blockchain a user only has to remember the hash pointer pointing to the head of the list. He then successively calculates the hash of the block the pointer is pointing to and compares both hashes. If the hashes match he repeats this action with the following hash pointer until he reaches the genesis block. Then the integrity of the whole list is verified.

If an adversary tampers with the data stored in one block of the linked list, the hash pointer of the following block won’t match the hash of the block it is pointing to. Therefore, an adversary would have to tamper with the hash pointer of the following block, which would result in changing its block hash. As a result, an adversary would be forced to alter all hash pointers up to the one pointing to the head of the list to legitimate his changes. As the user remembers this hash it can’t be changed by the adversary and therefore his change will be detected. [19]

Figure 2.1: Structure of a hash-linked list.

Merkle tree

A Merkle tree is a binary tree that uses hash pointers instead of pointers (cf. figure 2.2). Similar to hash-linked lists only the root hash pointer needs to be remembered to guarantee data integrity. Merkle trees have two other valuable features. Membership of a data object can be proven in O(log n) time and space by showing only the items on the path from the leaf data object up to the root hash pointer (indicated in red in figure 2.2). If all hashes match up, the shown data object is a member of the Merkle tree. If the data objects in the leaves are ordered (e.g. alphabetically) a non-membership proof of any data object is also possible. To this end, the paths of the data objects immediately before and after the missing one have to be shown. [19]
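Both hash-pointer structures described above can be sketched together in Python. This is an illustrative sketch under several assumptions that are not taken from the thesis: SHA-256 is used as hash function, a block is modelled as a pair of previous hash and data, an all-zero value marks the genesis block, and an odd number of Merkle nodes is handled by pairing the last node with itself. All function names are hypothetical.

```python
import hashlib


def h(data: bytes) -> bytes:
    """SHA-256 digest used for every hash pointer in this sketch."""
    return hashlib.sha256(data).digest()


GENESIS = b"\x00" * 32  # assumed marker: the first block has no predecessor


def build_chain(items):
    """Return (blocks, head_hash); each block is (previous_hash, data)."""
    prev, blocks = GENESIS, []
    for data in items:
        blocks.append((prev, data))
        prev = h(prev + data)  # hash pointer to the block just appended
    return blocks, prev


def verify_chain(blocks, head_hash):
    """Walk from the head back to the genesis block, checking each pointer."""
    expected = head_hash
    for prev, data in reversed(blocks):
        if h(prev + data) != expected:
            return False
        expected = prev
    return expected == GENESIS


def merkle_levels(leaves):
    """Return all tree levels, from the leaf hashes up to the root hash."""
    level = [h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:  # odd count: pair the last node with itself
            level = level + [level[-1]]
            levels[-1] = level
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def membership_proof(levels, index):
    """Collect the sibling hashes on the path from leaf `index` to the root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify_membership(leaf, proof, root):
    """Recompute the path to the root from the leaf and its sibling hashes."""
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


blocks, head = build_chain([b"tx1", b"tx2", b"tx3"])
blocks[1] = (blocks[1][0], b"forged")  # tamper with the middle block
print(verify_chain(blocks, head))      # the remembered head hash exposes it

docs = [b"doc%d" % i for i in range(8)]
levels = merkle_levels(docs)
root = levels[-1][0]
proof = membership_proof(levels, 5)
print(len(proof), verify_membership(docs[5], proof, root))
```

For eight leaves the membership proof consists of three sibling hashes, which matches the O(log n) bound. The non-membership proof for ordered leaves is not covered by this sketch.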

2.2.3 Asymmetric cryptography Asymmetric cryptography, also called public-key cryptography, uses a key pair to perform two counterpart cryptographic operations. One key is used for the first operation and the other for its counterpart. Usual applications are encryption, digital signatures, and key agreement. The two keys are distinguished by their accessibility: public and private. The public key is meant to be exchanged with a third party so that the third party can perform one operation (e.g. verify a signature). In contrast, the private key is meant to be kept private and used for the counterpart operation (e.g. signing a document). Usually every party has its own key pair and publishes its public key to assure a mutual provision of services. [14, p.21ff]

Examples of asymmetric cryptography are RSA [20] and Elliptic Curve Cryptography [21, 22].

Figure 2.2: Structure of a Merkle tree.

Encryption If asymmetric cryptography is used for encryption, the public key is used by a third party to encrypt content for the entity owning the corresponding private key. Compared to symmetric encryption, asymmetric encryption uses more resources [23]. Therefore, asymmetric encryption is usually used to securely communicate a secret key for symmetric encryption if larger amounts of data need to be transferred (cf. section 2.2.5).

Digital signatures A digital signature is a value that can be attached to a data object to enable a recipient to verify the data's integrity and authenticity. To create it, the sender hashes the data object and encrypts the hash with his private key; the encrypted hash is the digital signature. He then sends both the data object and the signature to the recipient. The recipient uses the sender's public key to decrypt the hash from the digital signature. Afterwards he hashes the data object himself and compares both hashes. If the hashes match, the recipient has confirmed the object's integrity and authenticity. [14, p.104ff]
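The hash-then-sign procedure can be sketched with textbook RSA. The sketch below is a toy under loud assumptions: the key is built from two small, well-known Mersenne primes, no padding scheme is used, and the names digest, sign, and verify are chosen for this illustration only. It must not be used to protect real data.

```python
import hashlib

# Toy RSA key (assumption): two Mersenne primes, far too small and too
# structured for real-world use.
P, Q = 2**127 - 1, 2**89 - 1
N = P * Q
E = 65537                                  # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))          # private exponent (Python 3.8+)


def digest(data: bytes) -> int:
    """Hash the data object and map the hash into the range of the modulus."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(data: bytes) -> int:
    """Sender: 'encrypt' the hash with the private key."""
    return pow(digest(data), D, N)


def verify(data: bytes, signature: int) -> bool:
    """Recipient: 'decrypt' the signature with the public key and compare."""
    return pow(signature, E, N) == digest(data)


document = b"contract, version 1"
signature = sign(document)
```

Here verify(document, signature) succeeds, while any modification of the document changes its hash and makes the verification fail.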

Key agreement Key agreement algorithms generate a shared secret key between multiple parties based on one's own private key and the other parties' public keys. Only the public keys need to be transferred; the calculated secret key is never sent through any communication channel. [14, p.102ff]

An example algorithm is Diffie-Hellman-Merkle [24].
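A minimal two-party sketch of such a key agreement is shown below. The modulus (a small Mersenne prime), the base 3, and the function names keypair and shared_secret are assumptions chosen for readability; real deployments use standardized, much larger groups or elliptic curves.

```python
import secrets

P = 2**127 - 1    # toy prime modulus (assumption, too small for real use)
G = 3             # public base (assumption)


def keypair():
    """Return (private key, public key) for one party."""
    private = secrets.randbelow(P - 2) + 1
    return private, pow(G, private, P)


def shared_secret(own_private: int, other_public: int) -> int:
    """Combine the own private key with the other party's public key."""
    return pow(other_public, own_private, P)


alice_private, alice_public = keypair()
bob_private, bob_public = keypair()

# Only alice_public and bob_public cross the communication channel.
alice_secret = shared_secret(alice_private, bob_public)
bob_secret = shared_secret(bob_private, alice_public)
```

Both parties arrive at the same value because (G^a)^b and (G^b)^a are equal modulo P.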

2.2.4 Public key management According to B. Schneier [25, p.169-187], proper key management is the hardest part of cryptography. What use are the best cryptographic algorithms if you can't secure your private keys or can't obtain the public keys of the entities you want to communicate with? Public key management systems try to fill this gap. Here are some of their key responsibilities.

• Identities need to be bound to their public keys. This is usually done in (public key) certificates.

• Identities of owners of certificates need to be verifiable so that certificates can be trusted.

• Certificates need to be able to be updated/replaced by their owner.

• Certificates need to be able to be revoked if their private key is compromised.

• Certificates need to be exchanged without the possibility of tampering.

In the following, three different public key management systems are described.

Public key infrastructure In public key infrastructures (PKIs) someone trustworthy signs the public key of another entity to verify that entity's identity. Centralized certificate authorities (CAs) take this role of trusted third parties. If a certificate is verified by a CA, another entity needs to trust the CA that the certificate belongs to the right identity. This has the benefit that an entity only has to verify and manage the root certificates of a couple of CAs and does not have to verify the identity of every entity it communicates with. Furthermore, CAs take care of certificate update, replacement, and revocation. PKIs are hierarchical key management schemes in which CAs are able to certify other CAs in a tree structure. CAs can also provide a list of public keys and identities of every entity they certified. This makes it easy to look up and obtain public key certificates of identities someone hasn't communicated with before.


On the downside, everyone has to trust the CAs not to behave maliciously. They are a single point of failure. There are documented cases [26, 27] where a CA undermined the integrity of its clients by issuing or revoking wrong certificates. CAs can further be censored by law if they operate in jurisdictions of certain countries.

The X.509 certificate infrastructure [28] is an example of a PKI. It is the most widely used one on the internet (e.g. in TLS/SSL, which is the basis for HTTPS).

Web of trust The web of trust was first introduced by PGP [29] and defines a distributed key management approach. Instead of trusting whole CAs, each entity can define its own set of trusted entities and rules. Trusted entities, also called introducers, are similar to CAs in PKIs. They verify and sign certificates of identities they know. Based on the introducers' signatures appended to the public key certificate, a third party can decide whether she wants to trust the identity of the certificate or not. If she knows and trusts a couple of the appended introducers, she will probably trust the certificate. Otherwise, if she doesn't know any of the introducers, she will probably not trust the certificate. Therefore, it is essential that introducers are careful when verifying identities. The web of trust model is resistant to denial of service and censorship as there is no central authority. On the downside, there is no central repository of public keys to look up identities. As a result, certificate updates or revocations propagate slowly through the system, as every entity needs to be informed by an introducer.

Distributed ledger based A different approach to designing public key management can be achieved through distributed ledger technology (cf. section 2.3). Here all public key certificates can be stored in one single distributed ledger. This makes it easy to look up identities, and every entity can update and revoke its own identities. Only the verification of identities needs to be taken care of. This could be done in the distributed ledger itself or locally in a fashion similar to the web of trust.

Examples of distributed ledger based public key management systems are Emercoin [30], the decentralized public key infrastructure [31], or the BIX Certification Infrastructure [32, 33].

2.2.5 Digital envelopes A digital envelope is a data format in which content data intended for one or more recipients is encrypted with a one-time key. This one-time key is itself encrypted in a format that only the recipients can decrypt and is appended to the message. This ensures that nobody except the intended recipients can decrypt the content data. [14, p.103ff]


In Public-Key Cryptography Standard (PKCS) #7 [34, p.18ff] the content data is encrypted with a randomly generated symmetric key, and this encryption key is asymmetrically encrypted for every recipient with the recipient's public key.
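The envelope structure (one content key, wrapped once per recipient) can be sketched as follows. This is a toy under explicit assumptions: the symmetric cipher is a SHA-256 based XOR keystream, the asymmetric part is unpadded textbook RSA over small Mersenne primes, and all names (toy_rsa_keypair, keystream_xor, seal, open_envelope) are hypothetical. It illustrates the data flow only and is neither the PKCS #7 format nor secure.

```python
import hashlib
import secrets


def toy_rsa_keypair(p: int, q: int, e: int = 65537):
    """Return ((n, e), (n, d)) for the given primes (toy sizes, no padding)."""
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))
    return (n, e), (n, d)


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR with a SHA-256 counter keystream."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def seal(content: bytes, recipient_public_keys: dict):
    """Encrypt content once, then wrap the one-time key for every recipient."""
    content_key = secrets.token_bytes(16)            # one-time symmetric key
    ciphertext = keystream_xor(content_key, content)
    key_as_int = int.from_bytes(content_key, "big")
    wrapped = {name: pow(key_as_int, e, n)
               for name, (n, e) in recipient_public_keys.items()}
    return ciphertext, wrapped


def open_envelope(ciphertext: bytes, wrapped_key: int, private_key) -> bytes:
    """Unwrap the one-time key with the private key and decrypt the content."""
    n, d = private_key
    content_key = pow(wrapped_key, d, n).to_bytes(16, "big")
    return keystream_xor(content_key, ciphertext)


alice_public, alice_private = toy_rsa_keypair(2**127 - 1, 2**89 - 1)
bob_public, bob_private = toy_rsa_keypair(2**107 - 1, 2**61 - 1)
ciphertext, wrapped = seal(b"quarterly report",
                           {"alice": alice_public, "bob": bob_public})
```

The content is encrypted only once regardless of the number of recipients; each additional recipient costs one more wrapped key.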

2.2.6 Zero-knowledge proof Zero-knowledge proofs [35] are used to prove possession of some information (e.g. a secret) to another entity without revealing any of that information except that one is possessing it. [14, p.342] Zero-knowledge proofs usually take the form of interactive protocols and involve two types of interaction partners: one prover and one or multiple verifiers. The verifier asks the prover a series of questions. If the prover knows the secret, she can answer the questions correctly. If she does not, she has some chance of answering correctly, e.g. 50% for every question. With each question the chances of answering correctly without knowing the secret drop rapidly. In the 50% example the chances of answering all questions correctly are $1/2^n$ after n questions. Consequently, a verifier just has to define for himself his acceptable probability of being cheated and calculate the number of questions to ask, e.g. 20 for a probability lower than 0.0001%.

The questions need to be in a form that the verifier won't get any information about the secret itself, but only about the possession of the secret. Therefore, the information whose possession the prover wants to prove needs to be a solution to a mathematically hard problem. Every time a new question is asked, the prover transforms the hard problem into another hard problem that is isomorphic to the original. She then commits to the solution of the isomorphic problem and transfers the isomorphic problem and the commitment to the verifier. Afterwards, the verifier asks the prover either to show that both problems are isomorphic or to reveal the solution in the commitment. Not all mathematically hard problems can be used for zero-knowledge proofs, but many can. These include the discrete log of a given value [36] or the Hamiltonian cycle for a large graph [37]. M. Blum and Goldreich et al. documented [37, 38] that any NP statement has a zero-knowledge proof if it is translated into an instance of a mathematically hard problem that can be used for zero-knowledge proofs.
Another feature of interactive zero-knowledge proofs is that the verifier can't convince anybody else of the outcome of the proof with a recording of the zero-knowledge protocol. The verifier would need to be trusted: he and the prover could have collaborated, or the prover could have tampered with the recording. If any third party needs to be convinced, non-interactive zero-knowledge proofs need to be used. Here the prover generates n isomorphic problems and commits to their solutions. The first n bits of the hash of the concatenated commitments are used as basis for the decision to either prove the isomorphism or reveal the solution of the commitment. [25, p.101-111]
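The question-and-answer pattern with a 50% cheating chance per round can be sketched for the discrete-log case. The sketch assumes a simple commit/challenge/response protocol with a one-bit challenge; the toy group parameters and the names commit, respond, check, prove, and rounds_needed are assumptions for this illustration and do not reproduce the exact protocol of [36].

```python
import math
import secrets

P = 2**127 - 1    # toy prime modulus (assumption)
G = 3             # public base (assumption)


def commit():
    """Prover: pick a fresh random r and commit to it with G^r."""
    r = secrets.randbelow(P - 1)
    return r, pow(G, r, P)


def respond(r: int, x: int, challenge_bit: int) -> int:
    """Prover: answer r (bit 0) or r + x (bit 1), reduced modulo P - 1."""
    return (r + challenge_bit * x) % (P - 1)


def check(y: int, commitment: int, challenge_bit: int, answer: int) -> bool:
    """Verifier: G^answer must equal commitment * y^bit."""
    return pow(G, answer, P) == (commitment * pow(y, challenge_bit, P)) % P


def prove(x: int, y: int, rounds: int) -> bool:
    """Run the interactive protocol for the claim 'I know x with G^x = y'."""
    for _ in range(rounds):
        r, commitment = commit()
        challenge_bit = secrets.randbits(1)        # the verifier's question
        if not check(y, commitment, challenge_bit, respond(r, x, challenge_bit)):
            return False
    return True


def rounds_needed(max_cheat_probability: float) -> int:
    """Smallest n with (1/2)^n at or below the accepted cheating probability."""
    return math.ceil(math.log2(1 / max_cheat_probability))


secret_x = 42
public_y = pow(G, secret_x, P)
```

A prover who does not know the secret can prepare a correct answer for only one of the two possible challenge bits per round, which yields the 50% chance per question described above.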

Practical applications of zero-knowledge proofs in computer science are authentication [39] or enforcing honest behaviour while maintaining privacy in distributed ledgers through zk-SNARKs [40].


2.3 Distributed ledger

In a ledger all business activities (e.g. asset transfers) are recorded as transactions. To reproduce the current state of a system, all transactions need to be executed in the order they were recorded. But when multiple parties interact with each other and keep their own ledgers and states, problems and incidents caused by diverging ledgers (e.g. through fault or fraud) can appear. A distributed ledger is a ledger maintained by a group of entities that do not fully trust each other. Distributed ledgers are systems that provide useful and trustworthy services like maintaining a shared state, mediating exchanges, and providing a secure computing engine. Many distributed ledgers can further execute arbitrary tasks, typically called smart contracts. Distributed ledgers offer an integrity-focused solution to Byzantine fault tolerant [41] atomic broadcast. Because of their Byzantine fault tolerance, distributed ledgers can act as trusted and dependable third parties. They can operate without the need for central administration or central data storage, which makes them resistant to censorship. [42, p.14] [43]

Usually distributed ledgers consist of the following components:

• a peer-to-peer protocol to distribute transactions between nodes,

• a consensus algorithm to order transactions and maintain a coherent replicated state,

• a business logic to execute valid transactions,

• a tamper free data structure for storing executed transactions,

• an authentication mechanism to distinguish its users and rights, and

• an economic incentive to participate in the system.

A simple form of a distributed ledger is the replicated state machine used by Tendermint, described in section 2.3.3 and depicted in figure 2.4. Usually blockchains (cf. section 2.2.2) are used to achieve immutable and verifiable append-only logs, but reliable time in combination with multi-signatures could also be used. [44] Applications based on distributed ledgers can be either permissioned (private) or unpermissioned (public) and can be used to achieve technical non-repudiation, availability, and accountability.

2.3.1 Bitcoin The first known application of an unpermissioned distributed ledger is the digital currency Bitcoin [45]. In Bitcoin the shared data is the transaction log of every bitcoin consumed and generated, from which the current balance of every account can be deduced. Bitcoin is the first known application that used a blockchain as replicated tamper-free data structure to record transactions. Each block of the blockchain is divided into header and transaction data (cf. figure 2.3). The header stores relevant metadata, like the hash pointer pointing to the previous block, the Merkle root hash of the Merkle tree of transactions, and a nonce for the mining puzzle. The transactions are stored in a Merkle tree appended to the header. Only the header is used to calculate the hash of the block. Data integrity is ensured as the header contains the Merkle root hash of the included transactions.

Figure 2.3: Structure of the Bitcoin blockchain. Metadata is stored in the header of each block and transactions are stored in a Merkle tree. Only the header is used to calculate the hash of the block. Data integrity is guaranteed as the header contains the Merkle root hash of included transactions.

Bitcoin uses a consensus algorithm called proof of work, which combines cryptography and economics, to add new blocks to the blockchain. A node that wants to add a new block has to find a nonce (an integer of its choice) so that the resulting hash of the block is smaller than a given threshold. Once a node has found an appropriate nonce, it can announce the new block to all other nodes. Every node that receives the new block has to verify the validity of its transactions against its local copy of the blockchain and check that the calculated hash is below the given threshold. Bitcoins consumed in transactions further need to be checked to be signed by their former owners. If all conditions are met, the node can add the proposed block to its local blockchain. As proposed blocks are propagated asynchronously in the network, race conditions can occur if multiple nodes propose new blocks simultaneously. In this case the blockchain is forked. To overcome this problem, Bitcoin always treats the longest valid branch as the effective one. Therefore, the number of future blocks mined on top of multiple branches decides which branch is legitimate and which isn't. In practice a mined block counts as confirmed if six or more blocks are mined on top of it. As solving a mining puzzle is computationally expensive, it is assumed that it is uneconomic or infeasible for a single entity to catch up that number of blocks. This holds especially as the threshold, which is tuned so that a new block is found every ten minutes on average, is recalculated every 2016 blocks (≈ two weeks) based on the hashing power of the entire Bitcoin mining network.

Bitcoin further introduced incentives in the form of block rewards and transaction fees to encourage honest behaviour. Every block has a special transaction called coinbase transaction, which generates a certain amount of bitcoin¹ that the miner can send to an address of his choosing. Every other transaction consumes bitcoins from one or multiple input addresses, creates new bitcoins of up to the same amount, and assigns them to the defined output addresses. The difference between the amount of consumed coins and created coins is the transaction fee a miner can pocket. These economic incentives (and exchanges into fiat money) guarantee that enough miners participate in the mining process and therefore guarantee Bitcoin's stability and success. [46, 47]
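The mining puzzle can be illustrated with a minimal sketch. It assumes a text header made of the previous block hash, the Merkle root, and the nonce, a single SHA-256 invocation, and a very low difficulty so that it finishes instantly; real Bitcoin hashes a binary header with double SHA-256 against a far smaller threshold. The names block_hash, mine, and TARGET are hypothetical.

```python
import hashlib

# Threshold (assumption): the hash must start with 12 zero bits.
TARGET = 1 << (256 - 12)


def block_hash(prev_hash: str, merkle_root: str, nonce: int) -> int:
    """Hash only the header fields, interpreted as a 256-bit integer."""
    header = f"{prev_hash}|{merkle_root}|{nonce}".encode()
    return int.from_bytes(hashlib.sha256(header).digest(), "big")


def mine(prev_hash: str, merkle_root: str, target: int) -> int:
    """Try nonces until the header hash falls below the threshold."""
    nonce = 0
    while block_hash(prev_hash, merkle_root, nonce) >= target:
        nonce += 1
    return nonce


def is_valid(prev_hash: str, merkle_root: str, nonce: int, target: int) -> bool:
    """Receiving nodes need a single hash computation to check the work."""
    return block_hash(prev_hash, merkle_root, nonce) < target


prev_hash = "00" * 32          # placeholder values for the illustration
merkle_root = "ab" * 32
nonce = mine(prev_hash, merkle_root, TARGET)
```

Finding the nonce takes about 2^12 attempts on average at this difficulty, whereas verifying it takes one; lowering the threshold makes mining harder without making verification more expensive.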

2.3.2 Ethereum Ethereum was introduced by V. Buterin [48] to create an unpermissioned distributed ledger with a built-in Turing-complete programming language for building distributed applications (Dapps). It adapts some concepts of the Bitcoin system and addresses several important limitations. Like Bitcoin, Ethereum uses a blockchain for storing executed transactions. It uses an adaptation of the proof-of-work consensus algorithm called Greedy Heaviest Observed Subtree [49] to address issues resulting from Ethereum's fast confirmation times. In addition to externally owned accounts, which are controlled by private keys, Ethereum introduces a new form of accounts named contract accounts. Contract accounts, also called smart contracts or Dapps, are a form of autonomous agents that are controlled by their internal contract code. Contract accounts have direct control over their key/value store, which keeps track of persistent variables, and over their balance of ether, the internal currency of Ethereum.

The communication in Ethereum is done by messages and transactions. Transactions are signed data packets that store a message sent by an externally owned account. Messages are data objects that are sent between contract accounts and only exist in the Ethereum execution environment. Messages and transactions contain, among other fields, their recipient, the amount of ether to transfer alongside the message, an optional data field, and a STARTGAS value. Contract code is invoked when a contract account receives a message or transaction. Transaction fees are deducted on the basis of the transaction length and the instructions completed by the contract code. As contract code can contain infinite loops, the transaction fees can't be calculated in advance. Therefore, the sender specifies in the STARTGAS parameter how much gas² he is willing to spend at most for his transaction. If the provided maximal transaction fee isn't enough, all state changes are revoked and the complete transaction fee is sent to the miner. If the maximal transaction fee is sufficient, only the actual transaction fee is deducted, and the rest is refunded to the sender.

Ethereum contracts are written in a low-level stack-based bytecode language, which can be compiled from higher-level languages like Serpent, and are executed in the Ethereum Virtual Machine (EVM). Unlike blocks in the Bitcoin blockchain, Ethereum blocks contain a copy of both the transactions and the most recent system state. Additional metadata like the block number and the mining difficulty is also included. As a result, light nodes don't need to store the entire blockchain history to reproduce the latest state. For efficiency, the states are stored in a Patricia tree structure [50], which uses pointers to values of previous states. Furthermore, contract accounts have access to the blockchain header, which can act as a valuable source of randomness in their applications.

Ethereum suffers from the same scalability issues as Bitcoin. Every transaction needs to be processed and verified by every node in the network and needs to be recorded in the blockchain. This results in an unbounded growth of the blockchain over time, which endangers Ethereum's decentralization. At some critical point in the future the blockchain will be so large, e.g. 100 TB, that it is infeasible for commodity household nodes to keep a local copy. At this point only large organizations with their own data centres will be able to verify the integrity of every transaction. Light nodes are then forced to trust some supposedly honest full node(s) without an immediate possibility to discover whether they were cheated. Currently, Ethereum is experimenting with switching from its proof-of-work algorithm to a less resource-intensive and more censorship-resistant proof-of-stake consensus algorithm called Casper [51, 52].

1 The amount is currently 12.5 BTC but is halved every 210,000 blocks (≈ 4 years) until the maximum of 21 million bitcoins is reached.
2 Gas is an internal value for the transaction costs, calculated from the current gas price: gas ∗ gas price = price in ether.
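The STARTGAS fee rule can be sketched as a small settlement function. The function name settle_transaction, its parameters, and the plain integer units are assumptions for this illustration; the sketch models only the charging and refund rule described above, not the EVM.

```python
def settle_transaction(balance: int, start_gas: int, gas_needed: int,
                       gas_price: int):
    """Return (new sender balance, miner fee, state changes kept?).

    start_gas  -- maximum gas the sender is willing to spend (STARTGAS)
    gas_needed -- gas the contract code would consume if run to completion
    """
    max_fee = start_gas * gas_price
    if balance < max_fee:
        raise ValueError("sender cannot cover the maximal transaction fee")

    if gas_needed > start_gas:
        # Out of gas: state changes are revoked, the complete fee is paid.
        return balance - max_fee, max_fee, False

    # Enough gas: only the actual fee is deducted, the rest stays with
    # the sender (equivalent to charging max_fee and refunding the rest).
    actual_fee = gas_needed * gas_price
    return balance - actual_fee, actual_fee, True


success = settle_transaction(balance=100_000, start_gas=50,
                             gas_needed=30, gas_price=10)
out_of_gas = settle_transaction(balance=100_000, start_gas=50,
                                gas_needed=80, gas_price=10)
```

In the first call the sender pays for 30 units of gas and the state changes are kept; in the second call the full 50 units are paid although nothing is changed.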

2.3.3 Tendermint Tendermint [42] is the name of a software platform consisting of a Byzantine fault tolerant consensus protocol, its implementation, an interface to build arbitrary applications on top of the consensus, and tools for deployment and management. Tendermint uses its own consensus algorithm, achieving thousands of transactions per second with dozens of nodes distributed around the globe and latencies of about one second. Internally, Tendermint abstracts applications into a replicated state machine and uses a blockchain to store the transaction log (cf. figure 2.4). In Tendermint, nodes that “mine” new blocks in the blockchain are called validators. The set of validators is fixed in size and its members know each other. Validators are responsible for maintaining a full copy of the replicated state, proposing new blocks, and voting on them. A round-robin algorithm decides which node proposes the next block. Once a block is proposed, a vote about its validity is performed in two phases before it is committed. In the first phase each node broadcasts its opinion about the validity of the proposed block. In the second phase every node broadcasts its opinion on whether to commit the proposed block, based on the validity information previously received from every other validator.


Figure 2.4: High level structure of the replicated state machine implemented by Tendermint. The transaction log and resulting state is replicated across multiple nodes (diamonds). Source [42, p.7].

If a positive confirmation of more than 2/3 of all validators is received, the proposed block is added to the blockchain. Otherwise a new round with the next proposer is started. Local timeouts are used to deal with offline validators and faulty network connections. The timeout resets once a valid block is committed or a new round is started. Tendermint introduces governance algorithms to allow changes of the protocol or validator set. They are essentially based on proposal and voting. In contrast to proof-of-work consensus, there is no cost for an entity to operate multiple validators. Therefore, it is important that each validator's identity is proven during registration. Unfortunately, Tendermint doesn't specify a mechanism to do this and leaves it to the application developer (e.g. through external channels). Furthermore, Tendermint has neither an internal currency nor incentives for behaving honestly. Disincentives for fraudulent behaviour based on exclusion are left for future work.
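The round-robin proposer selection and the 2/3 threshold can be sketched as follows. The sketch collapses the two voting phases into a single tally per round and ignores timeouts; the names has_supermajority and first_committing_round are hypothetical and not part of Tendermint's interface.

```python
def has_supermajority(votes_for: int, total_validators: int) -> bool:
    """True if strictly more than 2/3 of all validators voted in favour."""
    return 3 * votes_for > 2 * total_validators


def first_committing_round(validators, votes_per_round):
    """Return (round number, proposer) of the first round that commits.

    votes_per_round holds, for each round, a mapping from validator to its
    vote (True = in favour); offline validators are simply missing.
    """
    for round_number, votes in enumerate(votes_per_round):
        proposer = validators[round_number % len(validators)]   # round robin
        votes_for = sum(1 for v in validators if votes.get(v, False))
        if has_supermajority(votes_for, len(validators)):
            return round_number, proposer
    return None


validators = ["A", "B", "C", "D"]
rounds = [
    {"A": True, "B": True, "C": False},              # D offline: 2 of 4
    {"A": True, "B": True, "C": True, "D": False},   # 3 of 4
]
result = first_committing_round(validators, rounds)
```

With four validators, two votes are not enough in round 0, so the next proposer B takes over and its block is committed with three votes in round 1.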


2.3.4 Corda Corda [53, 54, 55, 56] is a permissioned distributed ledger developed by R3CEV LLC with the purpose of recording and enforcing business agreements among registered financial institutions. It uses an Unspent Transaction Output (UTXO) model similar to Bitcoin, in which states are the atomic unit of information. Each state is labelled after the transaction which created it. In contrast to all other distributed ledgers described here, data is shared on a need-to-know basis. Therefore, consensus is reached between the parties to a deal and not between all participants. There is no single point that records every transaction. Thus, there is only minimal support for rollbacks. Corda doesn't use a blockchain to order transactions. Instead it uses pluggable notaries that aren't tied to any particular consensus algorithm. The role of miners is abstracted away as transactions aren't ordered into blocks. Notaries and timestamp services provide timestamping and transaction ordering functionality. Each transaction includes a window of timestamps in which it is asserted to have occurred and a signature of the notary who checked that none of its inputs had been consumed. Corda uses the X.509 certificate infrastructure [28] for connecting public keys to identities. A network map service publishes the IP addresses through which every node can be reached, alongside their identity certificates and provided services. Sybil attacks [57] are avoided as each participant needs to be authorized and identified before it can join the network. A point-to-point messaging network ensures that transactions get delivered to the right parties. Corda has neither an internal currency nor transaction fees, as its permissioned nature and limited use case are incentive enough for all participants. If smart contracts rely on outside facts, a trusted oracle needs to be queried that provides deterministic data to all participants. To increase security, dedicated secure signing devices³ that store the private key and sign transactions are supported on the client side.

2.4 The Onion Router

Tor [58] is an anonymization software that hides the source IP addresses of TCP connections by relaying them through various middleman computers. It further offers perfect forward secrecy, congestion control, decentralized directory servers, integrity checking, configurable exit policies, and location-hidden services via rendezvous points. Location-hidden services, usually called hidden services, are services that are anonymously hosted in the Tor network and can't be censored by law, as their location is protected by several middleman computers as well. Between 1.5 and 2 million clients use the Tor network daily to anonymize their network traffic, and on average 60,000 hidden services are hosted in the Tor network [59]. Through Tor, anonymity and service availability can be achieved, the latter via decentralized directory servers and firewall-piercing hidden services.

3 e.g. TREZOR - https://trezor.io/ - last accessed 05.10.2017


2.4.1 Traffic anonymization Tor anonymizes TCP traffic by relaying it through usually three middleman computers. These middleman computers run a copy of the Tor software and are usually called Tor nodes. Tor nodes can be configured to relay traffic only within the Tor network or also to the rest of the internet. Figure 2.5 depicts how the traffic anonymization works.

Figure 2.5: Traffic anonymization in the Tor network. Adapted from [60].

The Tor client first downloads a list of active Tor nodes, including their network configuration, from a directory server. Then a random path of three or more Tor nodes is chosen to anonymize the traffic. A circuit of encrypted connections through the relays of the network is established incrementally. Asymmetric cryptography is used here to exchange symmetric link keys. For each hop along the circuit a new link key is negotiated. No individual relay ever knows the complete path a data packet has taken. Relays just know their predecessor and successor in the circuit. Once a circuit has been established, data can be exchanged anonymously. Any application which uses TCP streams and supports the SOCKS protocol [61] can be anonymized. To reach a good balance between efficiency and anonymity, Tor circuits are changed every 10 minutes.
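The layered encryption along a circuit can be sketched as follows. The sketch assumes that the link keys have already been negotiated, uses a SHA-256 based XOR keystream as a stand-in for Tor's real ciphers, and encodes the next hop as a text prefix; the names keystream_xor, wrap, and peel are hypothetical and do not reflect Tor's cell format.

```python
import hashlib
import secrets


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR with a SHA-256 counter keystream."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def wrap(path, link_keys, destination: str, payload: bytes) -> bytes:
    """Client: add one encryption layer per relay, innermost layer first."""
    next_hops = path[1:] + [destination]
    cell = payload
    for relay, next_hop in reversed(list(zip(path, next_hops))):
        cell = keystream_xor(link_keys[relay], next_hop.encode() + b"|" + cell)
    return cell


def peel(link_key: bytes, cell: bytes):
    """Relay: remove its own layer and learn only the next hop."""
    plain = keystream_xor(link_key, cell)
    next_hop, _, inner = plain.partition(b"|")
    return next_hop.decode(), inner


path = ["entry", "relay", "exit"]
link_keys = {name: secrets.token_bytes(16) for name in path}
cell = wrap(path, link_keys, "www.kth.se", b"GET / HTTP/1.1")

hop1, cell1 = peel(link_keys["entry"], cell)     # entry learns: "relay"
hop2, cell2 = peel(link_keys["relay"], cell1)    # relay learns: "exit"
hop3, payload = peel(link_keys["exit"], cell2)   # exit learns the destination
```

Each relay can remove exactly one layer, so the entry node sees the client but not the destination, and the exit node sees the destination but not the client.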

2.4.2 Location-hidden-services In addition to offering TCP traffic anonymization from client to server, Tor also offers server location anonymization via hidden services. For this it uses a rendezvous protocol, which has recently been upgraded to version 3 [62]. Compared to version 2, it relies on stronger cryptographic algorithms, offers additional client authorization, and prevents dishonest hidden service directory nodes from leaking hidden service names (cf. [63, 64]). Figure 2.6 shows a simplification of how the rendezvous protocol works.

Figure 2.6: Simplified version of the Tor rendezvous protocol. Legend: IP1-3 = introduction points, RP = rendezvous point, TI = temporary index, cookie = one-time secret. Adapted from [65].

Initially, just as in traffic anonymization, the Tor client and hidden service download a list of Tor nodes from a directory server. The directory server further offers a mutually agreed random value from the Tor directory authority nodes which is also needed. This random value changes periodically, i.e. every 24 hours, and is used to prevent DoS attacks on hidden service directory nodes.

Each hidden service uses multiple asymmetric keypairs.

• A master (hidden service) identity keypair is used for long term identification. Its public key is part of the .onion address that uniquely identifies a hidden service.

• A temporary blinded signing keypair is derived from the master identity keypair and the downloaded random value. It changes every time a new random value is announced. Its public key is used as index in the hidden service directory and is therefore depicted as “temporary index” in figure 2.6.


• A descriptor signing keypair is used to sign the hidden service descriptors uploaded to the hidden service directory. It is signed by the temporary blinded signing keypair. Its public part is included in the unencrypted section of the hidden service descriptor.

• For every introduction point two keypairs, one for authentication and one for encryption, are created. Their public keys are included in the encrypted part of the hidden service descriptor and labelled as “IP communication keys” in figure 2.6.

In the beginning, a hidden service establishes permanent Tor circuits (of usually three hops) to multiple (usually three) Tor nodes, which will act as introduction points, and negotiates their communication keys to identify itself as hidden service. Afterwards it builds its hidden service descriptor. The descriptor is divided into an unencrypted and a double-encrypted section. Important parts of the unencrypted section are a copy of the descriptor signing key and its signature over the whole descriptor to ensure data integrity. As the descriptor signing key is signed by the blinded signing key, its authenticity can also be confirmed. Important parts of the double-encrypted section include information about the chosen introduction points and communication keys. The first layer of encryption provides confidentiality against everyone who doesn't know the public identity key of the hidden service. The second layer of encryption protects against entities that do not possess valid client credentials and is only useful if client authorization is enabled. Once the hidden service descriptor is built, it is uploaded to the responsible hidden service directory nodes, which are arranged in a distributed hash ring. The responsible hidden service directory nodes are derived, among other inputs, from the temporary blinded signing key and therefore change over time. Usually two nodes are responsible for hosting a hidden service descriptor to ensure availability.

If a client wants to contact a hidden service, it has to know its public identity key or rather its .onion address. In conjunction with the publicly available random value of the Tor directory authorities, it can derive the hidden service's public temporary blinded signing key. From this it determines the responsible hidden service directories, from which the hidden service descriptor can be downloaded. The temporary blinded signing key is used to validate the descriptor's integrity and authenticity. Then the descriptor is decrypted using the public identity key and the optionally provided client credentials. Once the introduction point information is acquired, the client connects to a random Tor node, which will from then on act as rendezvous point. The encrypted rendezvous point contact information, along with an authentication cookie, is sent to one of the introduction points, which forwards it to the hidden service. The hidden service decrypts the information and connects to the rendezvous point. Afterwards it authenticates itself to the client using the acquired authentication cookie. At this point client and hidden service can communicate anonymously, privately, and securely.
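How service and client independently arrive at the same directory nodes can be sketched as follows. The sketch is a deliberate simplification: the blinded signing key is replaced by a plain SHA-256 hash of identity key and random value, and the hash ring lookup is reduced to "the next two nodes on the ring". This is not Tor's actual key blinding or directory selection, and the names blinded_index and responsible_directories are hypothetical.

```python
import hashlib
from bisect import bisect_right


def blinded_index(identity_public_key: bytes, shared_random: bytes) -> bytes:
    """Stand-in (assumption) for the public temporary blinded signing key."""
    return hashlib.sha256(b"blind" + identity_public_key + shared_random).digest()


def responsible_directories(index: bytes, directory_ids, replicas: int = 2):
    """Pick the `replicas` directory nodes that follow `index` on the ring."""
    ring = sorted(directory_ids, key=lambda d: hashlib.sha256(d).digest())
    positions = [hashlib.sha256(d).digest() for d in ring]
    start = bisect_right(positions, index)
    return [ring[(start + i) % len(ring)] for i in range(replicas)]


directories = [f"hsdir{i}".encode() for i in range(8)]
identity_key = b"public identity key from the .onion address"

# The hidden service computes where to upload its descriptor ...
service_index = blinded_index(identity_key, b"shared random value, day 1")
upload_to = responsible_directories(service_index, directories)

# ... and the client, knowing only the .onion address and the public
# random value, computes where to download it.
client_index = blinded_index(identity_key, b"shared random value, day 1")
download_from = responsible_directories(client_index, directories)

next_day_index = blinded_index(identity_key, b"shared random value, day 2")
```

Because the index depends on the periodically changing random value, the responsible directory nodes cannot be predicted far in advance, while a client that knows the .onion address can always recompute them.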

Chapter 3

Security assessment of related work

In this chapter, basic features and information security services required for document sharing are determined based on the analysis of three use cases. Afterwards documents are categorised into levels of sensitivity based on the information security services they need. Then existing document sharing systems are analysed with regard to the information security services and features they provide. In a final step existing solutions are compared regarding their compliance with the defined sensitivity levels. Observed designs and limitations will form the basis for the design and implementation of the document sharing system described in chapters 4 and 5.

3.1 Document exchange use case analysis and sensitivity categorization

Based on the information security services identified in section 2.1 (security, privacy, anonymity, non-repudiation, legal non-repudiation, sender authenticity, recipient authenticity, and accountability), 2^8 = 256 individual combinations are theoretically possible to form document sensitivity categories. As some information security services, like anonymity and sender authenticity, are mutually exclusive and some, like privacy and security, rely on each other, the number of possible combinations can be reduced to 84 (see footnote 1). As these are still too many combinations to check for compliance in existing DSS, use cases adapted from the post office analogue are used to extract five document sensitivity categories. An alternative approach, limiting the number of considered use cases directly on the basis of the end-users’ need for information security services, was dismissed as no statistics could be found and end-users don’t have sufficient information security awareness to make a reasonable statement [66].

1 cf. “sensitivity-categorization-calculation.py” in appendix C.

The simplest use case is the exchange of documents or information with the sole intent that they will be consumed by their recipient(s). This includes sending a postcard to a friend, sharing holiday photos with the family in a sealed envelope, or tipping off a federal agency. In this case data security is fundamental. Therefore, data integrity and data confidentiality need to be met. Both rely on some form of authorization, identity management and identification. Depending on the sensitivity of the information to be exchanged, privacy and sender anonymity could also be required. In addition, the authenticity of sender and recipient needs to be verifiable. If anonymity is required, only the authenticity of the recipient needs to be verifiable, as sender authenticity directly conflicts with sender anonymity. It is optional that the exchanged information is buffered at an intermediary until it is fetched by the receiver.

The second use case extends the above-mentioned exchange of information with explicit sending and receipt confirmations. This is for example important to prove that a contract cancellation or a valuable good reached its destination before a certain deadline, and is equivalent to parcel tracking in the post office analogue. In terms of information security services this behaviour is called non-repudiation and implies that the receiving party can’t be trusted to send a receipt in good faith.

In the last use case a document or information is exchanged with the intent of being altered by the recipient and being sent back to the sender. Examples include the exchange and signing of a legally binding contract or collaborative work on a document. Therefore, the information security services of the former use case need to be extended with accountability. Here, especially legal aspects of non-repudiation need to be considered.
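The reduction of the 2^8 theoretical combinations at the start of this section can be illustrated with a short enumeration. The sketch below is not the script from the appendix. It encodes only the two example rules named in the text, read here as "privacy requires security" and "anonymity excludes sender authenticity", and both the rule set and all identifiers are assumptions made for illustration.

```python
from itertools import product

SERVICES = (
    "security", "privacy", "anonymity", "non_repudiation",
    "legal_non_repudiation", "sender_authenticity",
    "recipient_authenticity", "accountability",
)


def is_consistent(combo: dict) -> bool:
    """Apply the two example rules (assumed reading of the text)."""
    if combo["privacy"] and not combo["security"]:
        return False  # privacy relies on security
    if combo["anonymity"] and combo["sender_authenticity"]:
        return False  # mutually exclusive services
    return True


def consistent_combinations() -> list:
    """Enumerate all 2^8 on/off assignments and keep the consistent ones."""
    result = []
    for bits in product((False, True), repeat=len(SERVICES)):
        combo = dict(zip(SERVICES, bits))
        if is_consistent(combo):
            result.append(combo)
    return result
```

With only these two rules, 144 of the 256 combinations remain (3 of 4 assignments survive for each constrained pair, times 16 for the four unconstrained services). The figure of 84 stated in the text therefore relies on further dependency rules that are not reproduced in this sketch.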

Table 3.1 summarizes for each use case the needed information security services.

Use case one two three Security    Privacy () () () Anonymity () ()  Non-repudiation    Legal non-repudiation    Authenticity of sender / ()/  /  /  recipient Accountability   

Table 3.1: Summary of information security services based on the use case analysis. Notation: : required, (): requirement depends on the sensitivity of the document, and : not required.

Based on the use case analysis, documents will be categorised into five categories as depicted in table 3.2. The first three categories, secure, private, and anonymous, reflect the first use case and only differ in the sensitivity of the data to be transmitted. They are further set in italics to differentiate them from their respective information security services. Categories four and five, tracking and business, reflect use cases two and three, each with security and privacy required.

Document category / level of sensitivity secure private anonymous tracking business Security      Privacy      Anonymity      Non-repudiation      Legal non-repudiation      Authenticity of sender / /  /  /  /  /  recipient Accountability     

Table 3.2: Categorization of documents into levels of sensitivity based on their need of information security services. Defined are following document categories: secure, private, anonymous, tracking and business.
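A compliance check against such categories amounts to a subset test: a system supports a category if it provides every service the category requires. The following Python sketch shows this idea. The requirement sets are an illustrative reading of the use case analysis (authenticity of sender and recipient is left out for brevity) and are assumptions, not a transcription of table 3.2; the function name supported_categories is hypothetical.

```python
# Assumed requirement sets per document category (illustrative only).
CATEGORY_REQUIREMENTS = {
    "secure": {"security"},
    "private": {"security", "privacy"},
    "anonymous": {"security", "privacy", "anonymity"},
    "tracking": {"security", "privacy", "non-repudiation"},
    "business": {"security", "privacy", "non-repudiation",
                 "legal non-repudiation", "accountability"},
}


def supported_categories(provided_services) -> list:
    """Return all categories whose required services are fully provided."""
    provided = set(provided_services)
    return [name for name, required in CATEGORY_REQUIREMENTS.items()
            if required <= provided]
```

For example, a hypothetical system that offers only security would support the category secure and nothing else, while adding privacy and non-repudiation would additionally unlock private and tracking.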

In the following, different centralized and non-centralized document exchange systems are analysed with regard to the information security services they provide and their compliance with the defined document categories. Implemented key concepts and algorithms are highlighted to form the basis for the design of the document exchange system to be developed.

As many of the analysed systems are closed source software the analysis can’t be based on the source code of the software but will be based on publicly available information like product catalogues and industry certifications. If no information about a given information security property can be found it is assumed to be not supported.

3.2 Centralized file-sharing services

In centralized file-sharing services files are copied via the network to a centralized third party that stores the files for its customers and takes care of data availability, redundancy, and synchronization. Centralized file hosters usually maintain their own identity management system based on their customers’ e-mail addresses. In business segments self-hosted IMS like Microsoft Active Directory are also supported. Files can usually be shared directly between users of these directories or with externals via hyperlinks.

In their “Cloud Adoption and Risk Report 2016 Q4” [67] Skyhigh analysed the cloud usage data for more than 30 million users worldwide at companies across all industries. According to their analysis over 20,000 cloud services were used in 2016 and an average company uploads between 9.8 TB and 24.5 TB per month into the cloud. Skyhigh further states that 18.1% of all documents uploaded to a centralized file-sharing or collaboration service contain sensitive information. 43.1% of all uploaded files are shared (mostly in the same organization), and 9.3% of files shared externally contain sensitive information. Information security services vary drastically by cloud service. Only 42.1% of all services state that the customer owns all data uploaded and only 16.6% delete data immediately after account termination. Even worse, only 8.7% commit not to share customer data with third parties like advertising companies, only 8.6% encrypt data at rest, and only 0.8% allow customers to manage their own encryption keys; hence there is large room for improvement.

The following analysis will focus on two of the most popular file-sharing services for businesses and consumers [67, p.24ff], Box and Dropbox, one open source file-sharing service, Nextcloud, and one file exchange service specialized in electronic signatures, Citrix RightSignature.

3.2.1 Box

Box [68] is a secure centralized file-sharing platform for businesses. It is developed and maintained by Box, Inc. and offers services for file-sharing, collaboration, and fine-grained access control. Box ensures data confidentiality by encrypting data in transit with TLS and data at rest with AES-256. It further offers a bring-your-own-encryption (BYOE) solution named KeySafe [69, 70] where enterprises can manage their own encryption keys. In this case files are encrypted with an additional customer key and Box, Inc. has no possibility to access the files. Box further supports data integrity, accountability, and technical non-repudiation by using version control and maintaining an append-only log for file access and decryption. [71]

Box’s centralized IMS and support for companywide third-party IMS, like LDAP or ADFS2, guarantee authenticity of sender and recipient. Users are identified by their e-mail addresses. No further identity verification process is mentioned. Therefore, legal non-repudiation isn’t achieved. Box uses N+2 node fault tolerance to prevent data from being lost and promises an availability of 99.9%. [72] Other provided features include document watermarking, two-factor authentication, support for single-sign-on (SSO), in-region storage and data loss prevention (DLP) for mobile devices. [73]

Box doesn’t offer privacy or anonymity as the platform needs access to ids and metadata to operate. As Box is a closed-source ecosystem, a customer has no possibility to verify whether all features are correctly implemented. Consequently, the customer needs to trust Box, Inc. to host its data responsibly. To maintain trust in its services Box complies with accepted standards including ISO 27001, ISO 27018, SOC 1 to 3, FedRAMP and FIPS 140-2. As Box is a centralized solution it is not censorship resistant.

3.2.2 Dropbox

Dropbox [4] is a centralized file-sharing platform for consumers and businesses. It offers services for file-sharing, synchronization, collaboration, and fine-grained access control. Similar to Box, it encrypts data in transit with TLS and data at rest with AES-256. To minimize network overhead data is split into blocks and only changed blocks are synchronized. Metadata and block data are decoupled and stored at different places to increase security. Hashes and redundancy ensure data integrity. Dropbox doesn’t allow clients to manage their own encryption keys. As Dropbox Inc. is a U.S. company it can be forced by law to decrypt files for government agencies (cf. [8]). Consequently, security, privacy and anonymity are not achieved.

Certificate pinning is used to verify the identity of Dropbox’s servers. In addition, Dropbox uses logging and version control to ensure accountability and technical non-repudiation. It further supports its own identity management system and third-party directories to ensure authenticity. Users are identified by their e-mail addresses. Independent third-party audits, vulnerability reward programs, and compliance with accepted standards (cf. footnote 2) are used to maintain trust in Dropbox services. Additional services provided include perfect forward secrecy for HTTPS sessions, transparency reports for government data requests, in-region storage for businesses in Europe, two-factor authentication, local network synchronization, disaster recovery plans and practices, support for single-sign-on through a third-party identity provider, data loss prevention, remote wipe support for stolen mobile devices, and development APIs. [74]
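The idea of synchronizing only changed blocks can be sketched in a few lines of Python. This is a generic illustration of block-level change detection through hashes, not Dropbox's implementation: the block size, the use of SHA-256, and the function names are assumptions.

```python
import hashlib
from typing import List


def block_hashes(data: bytes, block_size: int) -> List[str]:
    """Split data into fixed-size blocks and hash each block."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def changed_blocks(old: bytes, new: bytes, block_size: int) -> List[int]:
    """Return the indices of blocks in `new` that must be uploaded again."""
    old_hashes = block_hashes(old, block_size)
    new_hashes = block_hashes(new, block_size)
    return [index for index, digest in enumerate(new_hashes)
            if index >= len(old_hashes) or old_hashes[index] != digest]
```

With a block size of four bytes, changing one byte in the second block of a twelve-byte file yields the index list [1], so only that block would have to be transferred. The same hashes can serve as integrity checks for the stored blocks.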

2 i.e. ISO 27001, ISO 27017, ISO 27018, SOC 1 to 3, ...

3.2.3 Nextcloud

Nextcloud [75] is an open-source and secure centralized file-sharing platform that can be hosted on one’s own premises. It offers services for collaboration and for file and calendar sharing and synchronization. Nextcloud supports internal as well as external storage services like local hard drives, AWS buckets, Dropbox, or . It offers transport security through TLS and optional server-side AES-256 encryption for data at rest.

In its next version Nextcloud will also support end-to-end encryption where the client is in charge of its own encryption keys. Here, digital enveloping will be used for the realization. [76] Once a client activates end-to-end encryption an asymmetric key pair will be created. Nextcloud’s server will act as PKI root authority and issue a public key certificate based on the public key the client uploads. The private key will be symmetrically encrypted with a 12-word mnemonic and uploaded to the server. This 12-word mnemonic needs to be remembered by the client to add new devices to its account. This is for convenience and seems to be less error prone than remembering the whole private key. End-to-end encrypted files will be encrypted with a random AES-256 password. This password will be stored in a metadata file which will be encrypted with the public keys of the persons allowed to access the file.

Next to end-to-end encryption, Nextcloud further supports various security features, including single-sign-on, two-factor authentication and third-party active directory and authentication support. Bug bounty programs, external code audits and reviews, and compliance with industry standards like ISO 27001 clause 14 round off Nextcloud’s security offering. [77]

Nextcloud’s open source code base allows it to be easily extended. In its various third-party extensions can be found. These can add, amongst others, support for decentralized storage providers, like Sia (cf. section 3.3.1), or integrate other projects like Draw.io.

As Nextcloud needs access to metadata and user ids to operate it doesn’t offer privacy or anonymity. Unfortunately, no information could be found about non-repudiation and accountability. Authenticity of authors is supported through Nextcloud’s access control mechanisms that identify a user based on her e-mail address.

3.2.4 Citrix RightSignature

Citrix RightSignature [5] is a centralized document-sharing platform to collect legally binding electronic signatures by adhering to the U.S. E-Sign Act and the Uniform Electronic Transactions Act. Citrix uses TLS for transport security and relies on Amazon’s datacentre redundancy and physical access control to protect data from loss and from being accessed in an unauthorized manner. There is no additional encryption of data at rest. [78] Complex hash algorithms are used to guarantee data integrity of both signed documents and the complete audit log of interactions. The complete audit log stores who interacted when and how with the shared document and therefore guarantees accountability.

Citrix RightSignature uses its own proprietary identity authentication system. Instead of relying on certificates or username password combinations, parties are identified and authenticated by multiple factors. These include the e-mail address used to open the document (a unique document link is sent to every party via e-mail), a biometric signature analysis (based on unique characteristics related to the speed and timing of a person’s signature given) and the IP address captured. In addition, unique identifiers of a signing party like its face through a webcam, its phone number through a challenge response protocol, or its driver’s license or passport number can be used to verify an identity. All these factors are added to the complete audit log to ensure the authenticity of authors and non-repudiation. [79, 80]

Citrix RightSignature doesn’t meet the requirements for security, privacy, and anonymity as, amongst other things, data is stored unencrypted in a third-party datacentre and Citrix is a U.S. company and therefore bound to U.S. jurisdiction. In addition, Citrix RightSignature is a closed-source ecosystem and uses a proprietary identity authentication system. It further doesn’t state any compliance with accepted standards. In conclusion, any party needs to trust Citrix RightSignature not to behave maliciously and to have implemented its services correctly.
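How hashes can protect an audit log of interactions may be illustrated with a hash chain, in which every entry commits to its predecessor. The sketch below is a generic construction and not the vendor's proprietary mechanism; the entry fields, the use of SHA-256 over canonical JSON, and the function names are assumptions.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64
BODY_FIELDS = ("actor", "action", "timestamp", "prev_hash")


def _digest(body: dict) -> str:
    """Hash the canonical JSON form of an entry body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_entry(log: list, actor: str, action: str, timestamp: str) -> None:
    """Append an entry that commits to the hash of the previous entry."""
    prev_hash = log[-1]["entry_hash"] if log else GENESIS_HASH
    body = {"actor": actor, "action": action,
            "timestamp": timestamp, "prev_hash": prev_hash}
    log.append({**body, "entry_hash": _digest(body)})


def verify_log(log: list) -> bool:
    """Check that no entry was altered, removed, or reordered."""
    prev_hash = GENESIS_HASH
    for entry in log:
        body = {field: entry[field] for field in BODY_FIELDS}
        if entry["prev_hash"] != prev_hash or _digest(body) != entry["entry_hash"]:
            return False
        prev_hash = entry["entry_hash"]
    return True
```

Changing any stored field of an earlier entry invalidates its hash and thereby every later link, so tampering is detectable by anyone who holds the latest entry hash. The construction does not prevent the log operator from rewriting the whole chain, which is why the operator still has to be trusted.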

3.3 Decentralized data storage and file-sharing services

Instead of renting storage from one single provider, users of decentralized data storage services rent storage from each other. Due to their decentralized design, decentralized storage solutions don’t rely on a single trusted third party to be operational and as a result are more censorship resistant than centralized solutions. On the other hand, different mechanisms need to be implemented to guarantee data security, privacy, reliability, and availability when storing files at multiple untrusted third parties. Especially incentives to participate in the system, ways to deal with failing or inaccessible nodes, and the shift from server-side security to client-side security provision need to be considered.

In the following, two decentralized data storage services and one decentralized file-sharing service will be analysed with regard to their design, features and information security services provided.

3.3.1 Sia

Sia [81] is a software developed by Nebulous Inc. for decentralized data storage. Currently it is only useful for archival purposes, but in a future release file-sharing functionality between Sia users will also be implemented [82]. Sia uses a distributed ledger similar to Bitcoin to govern payments for provided storage in its internal currency Siacoin. Contrary to Bitcoin, transactions in Sia can’t execute scripts. Instead they can contain storage contracts, storage proofs and up to 64 KB of arbitrary data.

Storage contracts are agreements between users that hold the amount of data to store, price, and duration. They further include a deposit of both client and host. Storage contracts are updated in revisions. Only the contract with the highest revision number that is signed by every contracting party is valid. This has the benefit that only the first and latest revision need to be committed to the blockchain. All other revisions can be negotiated between the contracting parties off the blockchain as all funds are in escrow. In addition, the Merkle root hash of the stored data is updated in the storage contract with every revision.

The Merkle root hash is one key ingredient for the proof of storage a host has to provide. Before data is uploaded it is split into chunks of 4 MB and each chunk is individually encrypted. The Merkle root hash is built using consecutive segments of 64 bytes as leaves. To prove storage, a random segment is selected based on the block prior to the termination of the contract and the total amount of data stored. The host has to commit that segment and its membership proof to the blockchain as proof of storage. Settled siacoins will be transferred from the client to the host only if the proof of storage was successful. Otherwise deposited siacoins will be sent to contractually agreed penalty addresses.

The 64 KB arbitrary data field of Sia transactions is used by hosts to advertise their storage conditions and prices. Developers can further use it to build applications on top of the Sia blockchain. [83, 84, 85]
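The membership proof at the heart of such a proof of storage can be sketched as follows. The code builds a Merkle tree over 64-byte segments, creates a proof for one segment, and verifies it against the root. It is a simplified illustration and not Sia's implementation: the hash function (SHA-256), the leaf and node prefixes, the padding of odd levels by duplicating the last node, and all function names are assumptions.

```python
import hashlib

SEGMENT_SIZE = 64  # bytes per leaf, as stated in the text


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def segments(data: bytes) -> list:
    return [data[i:i + SEGMENT_SIZE] for i in range(0, len(data), SEGMENT_SIZE)]


def build_tree(leaves: list) -> list:
    """Return all tree levels, from the leaf hashes up to the root."""
    if not leaves:
        raise ValueError("at least one segment is required")
    level = [h(b"\x00" + leaf) for leaf in leaves]
    levels = []
    while True:
        if len(level) > 1 and len(level) % 2 == 1:
            level = level + [level[-1]]  # pad odd levels (simplification)
        levels.append(level)
        if len(level) == 1:
            return levels
        level = [h(b"\x01" + level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]


def prove(levels: list, index: int) -> list:
    """Collect the sibling hashes on the path from leaf `index` to the root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))  # (hash, sibling is left)
        index //= 2
    return proof


def verify(root: bytes, segment: bytes, proof: list) -> bool:
    """Recompute the root from a segment and its proof."""
    node = h(b"\x00" + segment)
    for sibling, sibling_is_left in proof:
        node = h(b"\x01" + sibling + node) if sibling_is_left else h(b"\x01" + node + sibling)
    return node == root
```

A verifier only needs the root stored in the contract, the challenged segment, and a number of sibling hashes that grows logarithmically with the number of segments. A host that no longer holds the data cannot produce the challenged segment and therefore cannot pass verification.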

Regarding the defined information security services, Sia offers data integrity by using Merkle root hashes and data confidentiality by using client-side encryption, and therefore data security. Privacy isn’t achieved. Even though users communicate only with their public Siacoin addresses and therefore achieve pseudonymity on blockchain transactions, no anonymity or proxy network is used to hide their IP address when interacting with hosts or miners. To achieve availability, Reed-Solomon error correction codes [86] are used. By default data is uploaded to 30 different hosts and only 10 are required to restore the data. Sia’s code base is open source, which makes the service verifiable. Escrow and storage proofs guarantee that hosts will get paid and clients get refunded if they were cheated. Therefore, no trust between participating parties is required.

As Sia is currently only a data storage service and not a file-sharing service, non-repudiation, authenticity of authors, accountability and collaboration are not applicable.
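The effect of the default 10-of-30 erasure coding can be quantified with a small calculation. The sketch below computes the storage overhead and the probability that enough hosts are reachable to restore the data. It assumes that hosts fail independently with the same uptime, which is a modelling assumption and not a property claimed by Sia; the function names are hypothetical.

```python
from math import comb


def storage_overhead(total_hosts: int, required_hosts: int) -> float:
    """Factor by which the stored data grows through erasure coding."""
    return total_hosts / required_hosts


def availability(total_hosts: int, required_hosts: int, host_uptime: float) -> float:
    """Probability that at least `required_hosts` of the hosts are online."""
    return sum(
        comb(total_hosts, k) * host_uptime ** k * (1 - host_uptime) ** (total_hosts - k)
        for k in range(required_hosts, total_hosts + 1)
    )
```

For 30 hosts with 10 required, the overhead is a factor of three and up to 20 hosts may be offline at the same time. Under the independence assumption, even hosts that are online only half of the time give a restore probability of roughly 97.9%.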

3.3.2 Storj

Storj [87] is a secure decentralized data storage service developed by Storj Labs Inc. that doesn’t offer any file-sharing services. Similar to Sia, data is encrypted client-side before it is uploaded. Data is sharded into chunks of fixed size to preserve metadata privacy. Consequently, a host can’t infer from the chunk size what type of information was uploaded. Chunks of files are uploaded to multiple hosts (three by default) to achieve availability in case hosts are inaccessible.

Proofs of retrievability are used to guarantee that a host is still storing a file. Merkle trees and Merkle proofs are used in the implementation. Storj supports complete and partial audits of chunks. Both use a challenge response protocol where a pre-defined salt is sent to the host, who uses this salt to generate a membership proof. To validate a proof of retrievability a client only has to remember the set of salts belonging to a file, its Merkle root hash and the depth of the Merkle tree.

Storj doesn’t rely on distributed ledger technology to store and communicate metadata. Instead distributed hash tables (DHTs) are used. Storj uses and extends the Kademlia protocol [88] for efficient message routing between its users. A user creates an ECDSA key pair equivalent to Bitcoin wallets so that its node id equals its wallet address. These Bitcoin addresses can be used for instant payments for storage contracts, file downloads and proofs of retrievability. However, in its reference implementation Storj will use its own cryptocurrency Storjcoin. For the duration of a contract money is held in escrow in a micropayment channel between client and host.

Proofs of retrievability in Storj are only sent between the contracting parties without non-repudiation. Thus, they can’t be publicly verified. Therefore, penalties for malicious behaviour can’t be enforced as no party can prove the wrong behaviour. It also complicates the development of reliable reputation systems as an important reliability metric isn’t publicly available. Without the knowledge about who is trustworthy and who isn’t, there is no certainty that a host storing important data won’t be inaccessible in the near future. Therefore, a client needs to regularly check whether all chunks are still retrievable from all hosts through proofs of retrievability. In case a host is inaccessible the client needs to initiate a re-upload of the related chunks to restore redundancy. Therefore, it is suggested that every client permanently runs a so-called “bridge service” that takes care of the contract negotiation, audit insurance and verification, payments, and file monitoring. [89]
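The salt-based challenge response audit can be sketched in simplified form. The client prepares a set of secret salts together with the hashes of the expected answers before uploading a chunk, and later reveals one salt per audit. The sketch deliberately omits the Merkle tree that would let the client keep only a root hash, so it is an assumption-laden illustration rather than Storj's protocol; the function names and the use of SHA-256 are hypothetical.

```python
import hashlib
import os


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def prepare_audits(chunk: bytes, count: int):
    """Client side: create one-time salts and the hashed expected responses."""
    salts = [os.urandom(16) for _ in range(count)]
    expected = [h(h(salt + chunk)) for salt in salts]
    return salts, expected


def host_respond(stored_chunk: bytes, salt: bytes) -> bytes:
    """Host side: answering requires the complete chunk and the revealed salt."""
    return h(salt + stored_chunk)


def verify_response(response: bytes, expected: bytes) -> bool:
    """Client side: compare the hashed response with the prepared value."""
    return h(response) == expected
```

Each salt can be used for one audit only, because a host could cache its answer once the salt is known. The number of prepared salts therefore bounds the number of audits that can be performed over the lifetime of a contract.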

Regarding information security services, Storj provides data security by achieving data confidentiality through client-side encryption and data integrity through hashing in Merkle trees. Privacy and anonymity aren’t achieved as clients’ IP addresses aren’t anonymized through proxy or anonymization networks. Storj’s open source code base and its decentralized design mean that participants don’t need to trust any third party. As Storj is only a data storage service and not a file-sharing service, non-repudiation, authenticity of authors, accountability and collaboration are not applicable.

3.3.3 SecuRES

“SecuRES: Secure Resource Sharing System” [90] is the title of the bachelor’s thesis of D. Svensson and P. Leung. The authors investigated in 2015 to what extent public ledger technology can be used to create a decentralized digital resource sharing system. They especially focussed on non-repudiation, data confidentiality and data integrity, and complemented their work with a prototype implementation.

In their prototype they combined concepts of the Bitcoin blockchain and Storj’s decentralized data storage. Metadata is stored in transactions in the blockchain and files are sharded into chunks, encrypted and uploaded to storage nodes. Transactions contain, amongst others, the sender, recipients, file creation time, file description, and the file decryption key encrypted with the recipient’s public key. In SecuRES’s case both client and storage nodes are responsible for monitoring the file state by using proofs of retrievability. Due to their limited time and resources the authors weren’t able to specify any IMS. Consequently, users are recognized by their wallet address only. They further had no time to think about incentives to participate in the system. Nonetheless, their developed prototype concept shows that decentralized file-sharing based on public ledger technology is possible.

Even though the prototype isn’t publicly available, the authors achieved non-repudiation by using the blockchain as metadata storage, data integrity by using file hashing in Merkle trees, and data confidentiality by using client-side encryption and digital enveloping of encryption keys. Privacy and anonymity weren’t achieved as not all metadata in the blockchain is encrypted. Everyone can reproduce which user was interacting with whom by looking at the transaction input and output addresses, even though wallets are pseudonyms. In addition, IP addresses weren’t anonymized. Therefore, storage nodes can infer from the IP address used for uploading and downloading a chunk who interacted with each other. The authenticity of authors is given as each transaction needed to be signed by its sender, even though no identification mechanism during the registration is specified. The accountability property is achieved through the linkage of transactions of the same file in the blockchain. However, the availability of the system is unknown as important parts such as incentives aren’t implemented and prototype tests weren’t performed.

3.4 Miscellaneous

This section discusses alternatives for exchanging documents that are neither purely centralized nor purely decentralized file-sharing services.

3.4.1 Secure E-Mail through OpenPGP

OpenPGP [91] is a security software that provides information security services for messages and data files, key management services, and certificate services. Applied to e-mail, OpenPGP offers data security by providing data confidentiality through encryption and data integrity through digital signatures. A prerequisite of using OpenPGP is the successful exchange of public keys and the associated e-mail addresses. Otherwise digital envelopes can’t be encrypted, and digital signatures can’t be validated. Public keys can be acquired from various key servers. Unfortunately, these usually don’t provide any identification. Therefore, identities behind acquired keys still need to be personally verified through an external channel.

As OpenPGP is only able to encrypt the content of e-mails, their metadata and ids are still available to the mail relays responsible for the exchange. Therefore, privacy and anonymity can’t be achieved. A recently discovered attack called efail [92] exploits the fact that OpenPGP encrypts only the content and is able to undermine its confidentiality on certain client implementations. Furthermore, OpenPGP doesn’t implement any mechanisms to force a recipient to confirm the receipt of a message. Consequently, non-repudiation and accountability aren’t feasible.

3.5 Summary

Table 3.3 below summarizes the results of the preceding security assessment and shows that none of the analysed document exchange systems supports the defined sensitivity levels private, anonymous, tracking, and business.


Document category / level of sensitivity secure private anonymous tracking business Box      Dropbox      Nextcloud      RightSignature      Sia n.a. n.a. n.a. n.a. n.a. Stroj n.a. n.a. n.a. n.a. n.a. SecuRes      Secure E-Mail     

Table 3.3: Compliance of analysed document-sharing systems with defined document sensitivity levels in section 3.1. Following notation is used: : supported, : not supported, and n.a.: not applicable.

Table 3.4, on the next page, gives a detailed overview about provided information security services on each analysed system. It further states additional features like censorship resistance or the need to trust a third-party.

3.6 Conclusion

To conclude, the initial assumption of lacking information security services in current document exchange systems was confirmed, which shows the demand for practical alternatives. Based on the analysis of three use cases, five sensitivity categories of documents with different needs for information security services have been extracted. Afterwards four centralized document sharing systems, two decentralized data storage services, one decentralized data sharing service, and document exchange through secure e-mail were analysed with regard to their compliance with the defined sensitivity categories. None of the analysed DSS was able to support all five levels of sensitivity. Indeed, only three of the eight analysed systems were able to support the most basic document category secure. The more advanced categories private, anonymous, tracking, and business weren’t supported by any DSS. The analysed systems mainly lack support for privacy and anonymity, on which the more sensitive categories are built. The main limitations are the leakage of information through non-anonymized network connections and centralized metadata storage.

Table 3.4: Overview of the analysed systems with regards to the information security services and features provided. The table compares the centralized systems Box, Dropbox, Nextcloud, and RightSignature, the decentralized systems Sia, Storj, and SecuRES, and Secure E-Mail with respect to: security, privacy, anonymity, non-repudiation, legal non-repudiation, authenticity of sender / recipient, accountability, information buffering, censorship resistance, verifiable source code, third-party audit, and no third-party trust. Each entry is marked as supported, not supported, unk. (unknown), or n.a. (not applicable).


The analysis further showed that it is difficult to precisely define and verify information security services with only little information about the document exchange systems to be analysed, especially when it comes to specifying legal parties that are accepted to have access to the data.

Another result of the analysis is that legal non-repudiation and identity verification are hard to implement. Only Citrix RightSignature is able to provide legal non-repudiation through its proprietary identification process. All other services achieved either only technical non-repudiation or none. The problem with technical non-repudiation, though, is that an entity can always claim that it lost its identification credentials in a public place and someone else used them in its name (cf. [25, p.111]).

Centralized DSS are characterized by a central trusted arbitrator between the exchanging parties. This arbitrator is mainly responsible for identity management, authentication, redundant file buffering in case an exchanging party is offline, security provision through encryption and hashing, and non-repudiation through a log of actions. The main limitations result from the centralized structure of such systems. Clients rely on the correct server-side security provision and implementation and have no possibility of direct control, even though independent third-party audits and certifications try to fill this gap. Dropbox’s transparency reports [8] further show that centralized services are forced, by the jurisdictions in which they operate, to decrypt data for federal agencies. Central arbitrators are a single point of failure that can be used for censorship.

Decentralized systems try to avoid this single point of failure by distributing responsibilities evenly across the system. To this end, new mechanisms for decentralized identity management, authentication, security provision, non-repudiation and file buffering are implemented. They usually rely on distributed ledger technology, public key encryption and digital enveloping. Unfortunately, none of the analysed decentralized systems on the market currently offers services for document exchange, only for decentralized storage. Discovered shortcomings of distributed DSS are the identity validation of transaction partners with regard to legal non-repudiation, the redundant buffering of information at untrusted and unreliable intermediaries, and incentives to participate in the system to provide a consistent service.

Chapter 4

Prototype specification

In this chapter, the prototype specification of a document sharing system for individuals and groups, called docShare, is sketched. It should support the exchange of documents of the sensitivity categories defined in section 3.1: secure, private, anonymous, tracking, and business. docShare’s implementation is covered in chapter 5 and its evaluation in chapter 6.

Based on the analysis of existing document exchange systems in chapter 3, three overall requirements for docShare can be extracted. First, docShare needs to be designed as a distributed system to avoid censorship through single points of failure and to strengthen the overall robustness of the system. Second, to preserve privacy, IP addresses and other identifying attributes need to be masked when interacting with entities that were not explicitly chosen as exchange partners. Third, the source code and design specifications of docShare need to be publicly available so that everyone can verify its functionality. However, due to the dual-use property of private and anonymous document exchange technology (cf. section 1.6), the source code of docShare won’t be publicly available in an online repository. Instead, interested readers are asked to obtain a copy of the source code attached to appendix C either from KTH or from the author directly.

4.1 Architecture

The architecture of docShare combines two public decentralized services to achieve privacy and anonymity for the exchange of documents. A decentralized anonymization network, i.e. Tor, will be used for the anonymous routing of packets and for providing an infrastructure for location hidden services operated by the clients. A distributed ledger will be used for decentralized identity management and will map these hidden services to each user. As depicted in figure 4.1, parties will communicate directly with each other when they are online. There will be no document buffering at an intermediary, even though the design could easily be extended with this feature. The IMS is depicted as a smart contract within the Ethereum ecosystem. This is only an example.


Figure 4.1: Overview of docShare’s architecture and functionality.


Any distributed ledger based IMS could be used in the implementation.

4.1.1 Identity management

The decentralized identity management will be realized in a public distributed ledger, similar to the decentralized public key infrastructure [31]. This has the benefit that every user can create an identity whenever she likes and is directly in charge of her identity entry and its management. Each entry in the IMS describes a receiving service of the respective entity in the anonymization network, i.e. an .onion address, and includes the entity’s public key for symmetric key exchange and signature verification. Both attributes are bound to a unique user identifier, which could simply be an integer. Any identifier can be claimed by any entity during registration on a first-come, first-served basis. The IMS should also allow optional fields that can be defined by the client herself. These include her name, telephone number, expiration date of the account, time periods when she will be available to receive documents, or her willingness to accept anonymous documents.

Using a public distributed ledger as identity management system implies that the legal identity of an entity can’t be verified during registration. As a result, a client needs to verify the legal identity behind an identifier before she can be sure with whom she is communicating. This could be done through an external channel, or through a video chat in which both parties show each other their face, passport, and a proof of recency, e.g. the daily newspaper. Once the legal identity is verified, the binding between identifier and identity can be saved locally on each client’s computer to avoid future legal identity verifications for that entity. Only when documents of sensitivity business are exchanged, where legal non-repudiation is required, does the legal identity verification need to be performed again before the document exchange. Otherwise a transaction partner could claim that someone else used his authentication credentials.

Decentralized identity management systems based on public distributed ledger technology can be realized in two variants. They can be built either as an independent distributed ledger or on top of an existing distributed ledger, e.g. as a Dapp in Ethereum. The realization as an independent distributed ledger has the benefit that all aspects can be tailored to the purpose of the decentralized DSS. Therefore, there won’t be much overhead with regard to resource consumption. Furthermore, the blockchain growth would be proportional to the usage of the system. On the other hand, incentives to participate in the system and the mining of blocks need to be taken care of. Existing distributed ledgers already have an active community to secure their network. On the downside, in this case the user might be required to download a huge transaction log to get started. Furthermore, the transaction log would not grow linearly with the usage of the DSS, as many other applications are using the same distributed ledger. With regard to docShare’s concept and information security services, only the distributed ledger’s properties of a decentralized, Byzantine fault tolerant, append-only log of transactions are relevant. As the prototype implementation is only seen as a proof of concept, the decision about which of the two variants to use will be justified by the ease of implementation and the systems available on the market.

4.1.2 Anonymization network

The anonymization network has two responsibilities in docShare. Its first responsibility is to mask the network traffic between the docShare clients and the distributed IMS during the registration and update of entries to preserve anonymity. If the IMS is built upon an existing distributed ledger, the fetching of transaction updates can be done without using the anonymization network, as users can’t be correlated to docShare based on their blockchain download pattern: all updates need to be fetched, as for every other user of the ledger, to verify the latest state. If a dedicated ledger is used for the IMS, the anonymization network should be used for fetching new entries, as a correlation is possible in this case.

Its second responsibility is the provision of the infrastructure for location hidden services operated by each docShare client. Using hidden services allows the client to communicate anonymously with other entities. Through hidden services, documents can be exchanged anonymously, entities can carefully prove their identity to each other, and private channels outside the anonymization network can be negotiated for faster data transfer.

Tor will be used in the realization as it offers both traffic anonymization and location hidden services, but any other anonymization network that fulfils both requirements would be sufficient. Location hidden services have another useful feature: they are operational behind NAT gateways. Therefore, configuration changes in the network infrastructure aren’t necessary.

4.1.3 docShare client

The docShare client is the main software that needs to be installed to transfer documents via docShare. It offers interfaces to interact with the anonymization network and the distributed IMS. Consequently, it depends on installed instances of Tor and the distributed IMS client. The docShare client is responsible for the management and verification of the legal identities behind the entries in the IMS, and for the exchange of documents of the sensitivity categories secure, private, anonymous, tracking, and business.

In the realization, the docShare client provides hidden services through which anonymous communication between the entities can be established. Access control mechanisms, based on filter lists of allowed and blocked entities, regulate from whom documents and messages are received, in order to avoid denial of service attempts. The docShare client is further responsible for the client-side encryption of the documents to be exchanged. Digital enveloping will be used to support the exchange of files within groups.

Each docShare client maintains its own database of verified legal identities behind the entities in the IMS. Entities can be verified through external channels, i.e. by personally exchanging the entity’s identifier in the IMS, or via the docShare client itself. In the latter case, both entities establish a video or message-only conference and, within this conference, convince each other of their identities. The docShare client is further responsible for the registration and management of the client’s own identity in the distributed IMS. Details about the communication protocols used are covered in the next section.

4.2 Protocols

The following communication protocols are used to maintain docShare’s functionality. The communication partners are Alice, the first client, Bob, the second client, Carol, the third client, and the IMS.

4.2.1 User registration in the IMS

To register a new entity in the IMS, the following protocol should be used between Alice and the IMS. If a dedicated IMS is used, the download of new transactions from the IMS is performed through the anonymization network.

1. Alice creates an asymmetric key pair that can be used for encryption and digital signatures.

2. Alice configures her anonymization network and IMS client software and caches her hidden service identifier.

3. Alice downloads all updates from the IMS and reproduces its latest state.

4. Alice creates an IMS entry with a random identifier that hasn’t occurred before and signs it with her private key. This entry must comprise the random identifier, Alice’s public key, and her hidden service identifier. Optional fields, like Alice’s name and time periods of availability, can be added.

5. Alice transfers the signed entry via the anonymization network to the IMS.

6. The IMS receives Alice’s signed entry, validates its signature, and checks that the claimed identifier wasn’t used before. If both requirements are met, her entry is added to the next transaction. Otherwise, Alice’s request is dropped.

7. Alice waits for and downloads the next six updates from the IMS and checks whether her entry is included in the latest state. If not, steps 3 to 7 are repeated.

In this protocol steps 1 and 2 could be executed in parallel.
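
The first-come, first-served claim of identifiers in steps 3 to 7 can be illustrated with a small sketch. It is not taken from the docShare source code: the class InMemoryIMS, the function register_with_retry, and the 64-bit identifier range are hypothetical names and assumptions, the ledger is replaced by a Python dictionary, and signature validation is left out.

```python
import secrets


class InMemoryIMS:
    """Toy stand-in for the distributed IMS (assumption: a plain dict)."""

    def __init__(self):
        self.entries = {}

    def register(self, identifier, public_key, onion_address, **optional):
        # Step 6: the request is dropped if the identifier was used before.
        # Signature validation is omitted in this sketch.
        if identifier in self.entries:
            return False
        self.entries[identifier] = {
            "public_key": public_key,
            "onion": onion_address,
            **optional,
        }
        return True


def register_with_retry(ims, public_key, onion_address, max_attempts=5):
    """Steps 3 to 7: pick an unused random identifier, retry on collision."""
    for _ in range(max_attempts):
        identifier = secrets.randbelow(2**64)  # identifier range is assumed
        if identifier in ims.entries:
            continue  # already taken in the latest known state
        if ims.register(identifier, public_key, onion_address):
            return identifier
    raise RuntimeError("could not claim an identifier")
```

A second registration with an already claimed identifier returns False and leaves the first entry untouched, which mirrors the dropped request in step 6.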

4.2.2 Legal identity verification of an entity in the IMS

The legal identity verification of an entity in the IMS can be done in various ways and mainly depends on how well both entities know and trust each other. The identity verification process needs to protect both entities from falsely identifying the other party and from leaking their own identity information in case the opposing party is malicious. Two basic cases can be differentiated. In the first case, Alice and Bob know each other and have an external channel through which they can communicate to verify their identities. In the second case, Alice and Bob might know each other but have no external channel from the beginning.

It should be noted that neither form of identification can be used for legal non-repudiation. For legal non-repudiation the identification also needs to be stored in a tamper-proof data structure by both entities. More information can be found in section 4.2.6.

Verification through an external channel

The following protocol should be used by Alice and Bob to verify each other’s legal identity with the help of an external channel. This external channel could be a personal meeting between the two parties or a phone call in which both parties can identify each other by voice.

1. Alice and Bob communicate their IMS identifiers and a one-time secret (per person) to each other through the external channel.

2. Alice and Bob both whitelist each other’s identifier in their docShare client to allow message chatting with the other party.

3. Alice initiates a chat with Bob through Bob’s hidden service, encrypted with the public key listed in Bob’s IMS entry.

4. Alice and Bob verify their IMS identifiers by telling each other the other party’s one-time secret.

5. If the verification was successful, Alice marks Bob’s identifier as verified on her computer and Bob marks Alice’s identifier as verified on his computer.
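
The one-time secret check in steps 4 and 5 could look roughly like the following sketch. It only models one side of the exchange (for example Bob checking the secret he handed to Alice). The names new_one_time_secret and VerificationState are hypothetical, and the choice to consume a secret on the first attempt, whether it succeeds or not, is an assumption of this sketch rather than something the protocol prescribes.

```python
import hmac
import secrets


def new_one_time_secret(num_bytes=16):
    """Create a random secret to hand over through the external channel."""
    return secrets.token_hex(num_bytes)


class VerificationState:
    """Tracks the secrets handed out and the partners verified so far."""

    def __init__(self):
        self.expected = {}     # partner identifier -> secret handed out
        self.verified = set()  # identifiers marked as verified (step 5)

    def expect(self, partner_id, secret):
        self.expected[partner_id] = secret

    def check(self, partner_id, presented):
        # The secret is removed on the first attempt (assumption), so it
        # cannot be replayed or guessed repeatedly.
        expected = self.expected.pop(partner_id, None)
        if expected is None:
            return False
        # Constant-time comparison avoids leaking how many characters match.
        if hmac.compare_digest(expected.encode(), presented.encode()):
            self.verified.add(partner_id)
            return True
        return False
```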

Verification without external channel

Without an external possibility to verify each other’s identity, Alice and Bob have to verify each other through docShare. This raises the problem that a malicious entity could trick Alice into revealing her identity without revealing his own. In this case he would learn that Alice actively uses docShare, and he would learn her hidden service identifier. Therefore, the author encourages users to be extremely cautious when verifying their own identity with this method. The following protocol should be used by Alice and Bob.

1. Alice finds Bob’s identifier in the IMS, whitelists him for message chatting, and sends Bob an encrypted request through Bob’s hidden service stating that she would like to identify herself to Bob.

2. Bob, who doesn’t know Alice yet, can either accept or decline her offer based on his own trust preferences and Alice’s entry in the IMS. If Bob accepts, he initiates an encrypted chat with Alice through Alice’s hidden service. Otherwise the protocol terminates.

3. During the chat Alice convinces Bob to a certain degree of being Alice, and Bob convinces Alice of being Bob. If both partners are confident enough, they initiate a video conference to remove the last doubts about their identities. It is recommended that each party shows the other a proof of recency, e.g. today’s newspaper, to assure the other party that it is not viewing a recording.

4. If the verification was successful, Alice marks Bob’s identifier as verified on her computer and Bob marks Alice’s identifier as verified on his computer.

4.2.3 Data exchange with compliance to private

The following protocol describes how Alice can privately share a document with Bob and Carol. It assumes that Alice successfully verified her identity to Bob and Carol and vice versa.

1. Alice digitally envelopes the digitally signed document to share, so that only Bob and Carol can decrypt it and verify its integrity and authenticity.

2. Alice creates a random uniform resource locator (URL) for the digitally enveloped document and publishes it on her hidden service or, if available, on her public private service.

3. Alice sends the URL to Bob and Carol through their hidden services.

4. Bob and Carol fetch the enveloped document from the URL, decrypt it, and verify its integrity and authenticity.

It should be noted that documents exchanged with regards to private also meet the requirements for secure.

4.2.4 Data exchange with compliance to anonymous

The following protocol describes how Alice can anonymously share a document with Bob. It assumes that Bob allows the anonymous receipt of documents through his hidden service and that Alice successfully verified Bob’s identity, e.g. through a trusted third party.

1. Alice hashes the document to share and digitally envelopes the hash and the document, so that only Bob can decrypt them.

2. Alice uploads the digitally enveloped document to Bob’s hidden service.

3. Bob decrypts the digitally enveloped document and verifies its integrity by using the included hash.
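
The integrity part of this protocol (the hash in step 1 and its verification in step 3) can be sketched as follows. The digital enveloping itself is deliberately left out, so the functions below only show what would happen inside the envelope. SHA-256 and the layout "digest followed by document" are assumptions of this sketch; the specification doesn’t name a hash function or a layout, and pack_with_hash and unpack_and_verify are hypothetical names.

```python
import hashlib
import hmac

DIGEST_SIZE = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def pack_with_hash(document: bytes) -> bytes:
    """Step 1 (without encryption): prepend the document's hash."""
    return hashlib.sha256(document).digest() + document


def unpack_and_verify(payload: bytes) -> bytes:
    """Step 3 (after decryption): recompute and compare the hash."""
    if len(payload) < DIGEST_SIZE:
        raise ValueError("payload too short to contain a hash")
    digest, document = payload[:DIGEST_SIZE], payload[DIGEST_SIZE:]
    if not hmac.compare_digest(digest, hashlib.sha256(document).digest()):
        raise ValueError("document hash mismatch")
    return document
```

Note that a bare hash only detects modification of the document. It says nothing about the author, which matches the intent of the anonymous category.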

4.2.5 Data exchange with compliance to tracking

There are three principal ways to achieve non-repudiation, which is the main requirement for tracking. Non-repudiation can be achieved through a trusted arbitrator who takes an active role in the document exchange, through direct document exchange via oblivious transfer [25, p.166ff], and through optimistic document exchange, where an arbitrator is only involved if one party is dishonest.

Non-repudiation through a trusted active arbitrator

The ISO/IEC 13888 standards [93, 94, 95] provide a straightforward solution to non-repudiation. They involve a trusted third party which receives the expectations of both communication parties, the document to exchange, and its receipt. If both document and receipt are received by the third party and the expectations are met, they are forwarded to the sender and receiver. There are several drawbacks to this solution. First, data collected at the third party could deanonymize the communication partners and therefore violate privacy and anonymity. Second, the third party is always involved in the exchange, even if both parties are honest. Third, involving a third party can lead to performance bottlenecks if its computational resources aren’t sufficient. Fourth, trusted third parties are an easy point at which to introduce censorship into a system. Therefore, trusted third parties won’t be used in docShare to ensure non-repudiation.

Non-repudiation through oblivious transfer

Another way to exchange documents for receipts was shown by Even et al. [96]. The authors describe a protocol for certified mail that relies on oblivious transfer to keep both parties honest and assumes that both parties have approximately the same computational resources. In oblivious transfer a sender sends one out of two recognizable messages to a recipient but doesn’t know which message is received. The document exchange protocol works as follows.

1. Alice encrypts the document d with a one-time key kd and sends the encrypted document kd(d) to Bob.

2. Alice creates n key pairs kpAn(k1,k2) where k1 is chosen randomly and k2 is the XOR of kd and k1.

3. Alice creates a dummy document dd, copies it 2n-1 times, encrypts each one with a key of the n key pairs, and sends them to Bob.

4. Bob generates n key pairs kpBn(k1,k2) and n unique receipts, each with a left half and a right half.

5. Bob encrypts the receipts with the n key pairs, where k1 is used for the left half and k2 for the right half, and sends them to Alice.

6. Alice and Bob send each other one of the two keys of each of the n key pairs through oblivious transfer, decrypt the halves they can, and make sure that they are valid.

7. Alice and Bob send each other the first bits of all 2n keys and verify that the n first bits of the already known keys are equal.

8. Step 7 is repeated for the second bits, third bits, etc. until all keys have been transferred.

9. Alice and Bob decrypt the remaining halves of the received messages. Alice has n receipts from Bob, and Bob can XOR any key pair to get the decryption key for d.

10. Alice and Bob exchange the private keys used during the oblivious transfer and verify that the other party did not cheat.
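
The key pair construction of step 2 and the key recovery of step 9 rest on a simple XOR identity: since k2 = kd XOR k1, it follows that k1 XOR k2 = kd for every pair. The following sketch demonstrates only this property, not the oblivious transfer itself. The function names are hypothetical.

```python
import secrets


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equally long byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def make_key_pairs(kd: bytes, n: int):
    """Step 2: k1 is random, k2 is the XOR of kd and k1."""
    pairs = []
    for _ in range(n):
        k1 = secrets.token_bytes(len(kd))
        pairs.append((k1, xor_bytes(kd, k1)))
    return pairs


def recover_key(pair) -> bytes:
    """Step 9: XOR both halves of any pair to obtain kd."""
    k1, k2 = pair
    return xor_bytes(k1, k2)
```

A single half of a pair is a uniformly random value and reveals nothing about kd, which is why each party only learns the document key once both halves of at least one pair are known.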

Alice could cheat in step 2 and use a different key kx instead of kd to generate k2. Bob would still be able to decrypt the dummy document, but would have no possibility, until step 9, to detect that Alice has cheated. Therefore, his receipt is only one part of the complete receipt. The other part is Alice’s proof that each of the key pairs she sent to Bob yields kd. Unfortunately, Even et al. don’t describe what such a proof should look like. In case of a conflict, an arbitrator would probably demand that Bob show him all of Alice’s key pairs to validate that Alice sent the correct keys. This requires Bob to store the received key pairs indefinitely. If Bob loses the received key pairs, accidentally or on purpose, there is no way to prove whether Alice’s receipt is valid or not. Nonetheless, as Alice still has a (valid) receipt from Bob, Bob can’t deny that a transaction happened between them. Due to the uncertainty about how Alice can prove that each sent key pair yields kd, oblivious transfer won’t be used in docShare to ensure non-repudiation.

Non-repudiation through optimistic protocols

Optimistic protocols for non-repudiation only require a trusted arbitrator in case one of the communication parties is dishonest. In the usual case that both communication partners are honest, the communication takes place directly between the communication partners. This has two benefits compared to non-repudiation through a trusted active arbitrator. First, in the usual honest case no additional metadata that could violate privacy or anonymity is stored at any third party. This leads to performance benefits and avoids a single point of failure. Second, in the case of a dishonest party a distributed ledger can be used as trusted third party. Only one of the communication partners needs to actively interact with the distributed ledger. The other just needs to fetch updates. Therefore, privacy requirements can be met easily.

To address the privacy requirements of tracking, docShare will use an adapted version of the optimistic non-repudiation protocol described in [97] and [98]. A distributed ledger will be used as trusted arbitrator to harden docShare against censorship. Similar to the IMS, the trusted arbitrator can be realized as an independent distributed ledger or built upon an existing distributed ledger. Its actual realization in the prototype will be justified by the ease of implementation and the systems available on the market.


The adapted optimistic protocol for non-repudiation works like this:

1. Alice creates a one-time key kd and uses it to encrypt the document d.

2. Alice and Bob negotiate a temporary transport key kt to encrypt their message transfer.

3. Alice sends a signed message, encrypted with kt, to Bob. It includes kd(d), its hash h(kd(d)), d’s description, and a deadline td until which Alice expects Bob’s receipt. The message is equivalent to the following question: “Bob, would you like to receive this document with the following description and commit to sign its receipt until deadline td?”

4. Bob decrypts the message, verifies Alice’s signature, and reads d’s description. If he doesn’t want to receive the document, the protocol terminates. Otherwise Bob sends a signed message, encrypted with kt, back to Alice. It includes “yes”, d’s description, h(kd(d)), td, a fallback reference rf, a fallback symmetric encryption key kf, and a fallback deadline tf. The message is equivalent to the following statement: “Alice, yes, I would like to receive the described document and commit to sign its receipt until deadline td. If I don’t sign the receipt until td, I acknowledge the receipt of d if you upload the document’s decryption key kd, encrypted with kf, under the reference rf in the distributed ledger until tf.”

5. Alice decrypts the message and verifies Bob’s signature and the message’s validity. This includes checking that the proper document is cited and that tf lies reasonably far in the future. If all requirements are met, Alice sends a signed answer, encrypted with kt, to Bob. It includes d’s decryption key kd and h(kd(d)).

6. Bob decrypts the message, verifies Alice’s signature, and uses kd to decrypt the document. Afterwards he sends a signed receipt, encrypted with kt, back to Alice. It states the word “receipt” and contains a copy of the message received in step 5.

7. If Alice hasn’t received a receipt from Bob by td, she uploads the document’s decryption key kd, encrypted with kf, under the reference rf to the distributed ledger.
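
Alice’s decision in step 7 depends only on whether a receipt has arrived and on the two deadlines td and tf. The following sketch models that decision. It makes several assumptions that are not part of the specification: the ledger is a plain dictionary, times are simple numbers, the deadlines are treated as inclusive, and the decryption key arrives already encrypted with kf as an opaque byte string. Action, next_action, and run_fallback are hypothetical names.

```python
from enum import Enum


class Action(Enum):
    DONE = "receipt received, nothing to do"
    WAIT = "deadline td not reached yet"
    UPLOAD_FALLBACK = "upload kf(kd) under rf"
    TOO_LATE = "fallback deadline tf has passed"


def next_action(now, receipt_received, td, tf):
    """Decide what Alice should do at time `now`."""
    if receipt_received:
        return Action.DONE
    if now <= td:
        return Action.WAIT
    if now <= tf:
        return Action.UPLOAD_FALLBACK
    return Action.TOO_LATE


def run_fallback(ledger, rf, encrypted_kd, now, receipt_received, td, tf):
    """Perform step 7 against a dict that stands in for the ledger."""
    action = next_action(now, receipt_received, td, tf)
    if action is Action.UPLOAD_FALLBACK:
        # setdefault keeps the first value, mimicking an append-only log.
        ledger.setdefault(rf, encrypted_kd)
    return action
```

The TOO_LATE case shows why Alice checks in step 5 that tf lies reasonably far in the future: after tf the upload no longer counts as an acknowledged receipt.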

4.2.6 Data exchange with compliance to business

Business has two additional requirements compared to tracking, namely legal non-repudiation and accountability. Accountability can be achieved by versioning the changes to every document. Documents could reference the hash of their predecessor in their header and build a hash-linked list, reflecting the changes made in every iteration. To achieve legal non-repudiation, both communication partners need to legally identify each other before each transfer, map the identification to each other’s public keys, and store that identification process in a tamper-proof data structure in case a dispute arises. As an additional requirement, the combination of a private key and any identification recording mustn’t be exploitable to generate new valid identification recordings. Otherwise a malicious party could publish its private key and claim that another party, which already obtained a valid identification recording from a former exchange, identified itself as the malicious party. As legal non-repudiation depends on the legal jurisdiction both exchange partners agree to operate in, it needs to obey that jurisdiction’s guidelines and data privacy regulations. This makes it a very interesting and challenging problem to solve. Unfortunately, the author hasn’t found a solution to the problem yet. Therefore, contrary to the initial intention, data exchange with compliance to business won’t be supported in docShare.
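
Although business isn’t supported, the hash-linked versioning idea mentioned for accountability is easy to illustrate. The sketch below is not part of docShare. It assumes SHA-256, a dictionary per version, and an all-zero hash for the first version. The names GENESIS, make_version, and verify_chain are hypothetical.

```python
import hashlib

GENESIS = "0" * 64  # placeholder predecessor hash for the first version


def make_version(content: bytes, previous=None):
    """Create a version whose header references its predecessor's hash."""
    prev_hash = previous["hash"] if previous else GENESIS
    digest = hashlib.sha256(prev_hash.encode() + content).hexdigest()
    return {"prev_hash": prev_hash, "content": content, "hash": digest}


def verify_chain(versions) -> bool:
    """Check that every version links to and matches its predecessor."""
    expected_prev = GENESIS
    for version in versions:
        if version["prev_hash"] != expected_prev:
            return False
        recomputed = hashlib.sha256(
            version["prev_hash"].encode() + version["content"]
        ).hexdigest()
        if recomputed != version["hash"]:
            return False
        expected_prev = version["hash"]
    return True
```

Changing the content of any earlier version invalidates its stored hash, so later modifications of the history become detectable.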

Chapter 5

Prototype implementation

In this chapter, a reference implementation of docShare according to its design specification in chapter 4 is described. The accompanying source code and the virtual machines of the test environment can be found in appendix C. The installation and usage guide in appendix B has more information about the usage of the prototype and about the software environment needed to reproduce this research.

5.1 Limitations to the specification

Due to its purpose as a rudimentary proof of concept, not all features mentioned in the specification are implemented. Only those features are implemented that are necessary to validate the initial assumption that the exchange of documents of different levels of sensitivity, as described in section 3.1, is possible in a decentralized and censorship resistant fashion using distributed ledger technology.

Therefore, the implementation is subject to the following limitations:

• Only one mode of legal identity verification, through an external channel, is supported. Legal identity verification without external channel using chat and video conference won’t be implemented. (cf. section 4.2.2)

• Only one channel of document sharing, through the anonymization network, is supported. Faster private document sharing through a public peer-to-peer connection won’t be implemented. (cf. section 4.2.3)

5.2 Public identity management service

The public identity management service is realized as an Ethereum smart contract. The contract serves as a key-value store to which users can upload their identity information. An identity consists of the Tor hidden service address used to operate the anonymous docShare endpoints, the public RSA key used for digital enveloping during document exchange, and optional fields a user can define for herself. Every time a user creates a new identity, a unique identity identifier in the form of an integer is assigned by the smart contract. A web-frontend was created to interact with the smart contract. As shown in figure 5.1, users can use the frontend to look up identity information, update their own identity information, and permanently deactivate their identity in case the Ethereum wallet used to create the identity has been compromised.

Figure 5.1: Screenshot of docShare’s public identity management system’s web- frontend.

5.3 Key value storage for tracking

The key value store to store encrypted decryption keys in case a sender didn’t receive a receipt for documents of sensitivity tracking (cf. section 4.2.5) is also realized as Ethereum smart contract. The contract has procedures to store arbitrary string tuples in Ethereum’s blockchain and to query these strings using the first string as key.


A web-frontend (cf. figure 5.2) was built to access the information stored in the key-value store. It also shows the block timestamp of a key-value tuple’s creation.

Figure 5.2: Screenshot of docShare’s key value store’s web-frontend.

5.4 Anonymization network

Tor is used to mask docShare’s connection metadata and to provide the infrastructure for location hidden services. To make use of Tor’s location hidden services, Tor needs to be configured to forward docShare’s service endpoints, which listen on ports 2342, 8888, and 9999, into the Tor network. The Tor software is further responsible for the end-to-end encryption of the communication channels of the provided hidden services by following Tor’s rendezvous protocol.

5.5 docShare client

The docShare client is the main program a user interacts with to exchange documents of different levels of sensitivity. It’s written in Python 3 and consists of four main components:

• a SQLite3 database for local metadata storage,

• three network daemons that form the Tor endpoints for network communication,

• 13 terminal scripts to share documents through the Tor endpoints, to visualize information, and to manage the identity verification, and


• the docShare library that bundles common configurations, cryptographic functions, and interfaces to interact with the Ethereum smart contracts and local database.

5.5.1 SQLite3 metadata database

docShare’s SQLite3 metadata storage is responsible for storing information regarding partner verification, shared documents, and received documents. It consists of six tables: “partners”, “shared”, “sharedConfirmation”, “sharedMapping”, “received”, and “receivedConfirmation”.

The “partners” table stores additional information to an identity of the public identity management system. Entries are linked by their unique id. A user can add a name to the partner, the information if he is verified for document exchange, and the authentication token exchanged through the legal identity verification process. The “shared” and “sharedMapping” tables hold information regarding documents exchanged with a unique partner or groups of partners, like the document locator. If documents of sensitivity tracking are exchanged additional receive metadata, like the actual receipt or the fallback deadline, is stored in the “sharedConfirmation” table. Table “received” stores general metadata about received share offerings, i.e. the sender, the document locator and if the document was downloaded. If a document of sensitivity tracking is received additional metadata, like the fallback reference, is stored in the “receivedConfirmation” table.

5.5.2 docShare library

The docShare library bundles common configurations, cryptographic functions, and interfaces to interact with the Ethereum smart contracts and the local database. It is referenced by the docShare network services and terminal scripts to achieve reusable and maintainable code. It further defines the message format for the network communication and the format for digital enveloping.

Message format specification

As illustrated in figure 5.3, messages in docShare have a defined format. They consist of the sender identifier, the communication protocol used, the actual payload data to transmit, a timestamp, and the sender’s digital signature over the first four fields.

sender (8 bytes) | protocol (2 bytes) | data (1024 bytes) | time (8 bytes) | signature (384 bytes)

Total message length: 1426 bytes

Figure 5.3: Message format specification used in docShare communications.


The timestamp is used to prevent replay attacks and the signature to guarantee data integrity and authenticity. Each message has a fixed length of 1426 bytes to avoid side channel attacks that analyse differences in the size of the transmitted packets. docShare defines nine message protocols using this format. Their details can be found in table 5.1.

id | name | purpose | data field content
--- | --- | --- | ---
0 | partner verification request | authenticate yourself to your partner for document exchange | the partner’s one-time secret exchanged through the external channel during verification
1 | resource locator request | request a list of all resource locators your partner shares with you | none
2 | resource locator offer | transmit up to 4 resource locators shared with this partner | up to 4 resource locators
3 | confirmed resource locator offer | transmit up to 4 resource locators shared with this partner that require a receive confirmation | up to 4 resource locators
4 | resource locator summary | inform the partner about the number of sent/received shared resource locator offers | number of transmitted/received shared resource locators
5 | accept resource locator offer | accept to confirm the receipt of a shared resource that requires confirmation | hashsum of the encrypted archive, expiry time of the offer, fallback reference, fallback encryption key, fallback deadline
6 | resource locator decryption | share the decryption key of a shared resource that requires confirmation | hashsum of the encrypted archive, expiry time of the offer, decryption key
7 | resource receive confirmation | acknowledge the receipt of a shared resource | “acknowledge receipt of file with hashsum: [hashsum of encrypted archive] from your offer deadlined: [expiry time of the offer]”
666 | error notification | inform the recipient that something went wrong | error message

Table 5.1: List of docShare’s message protocol formats.
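The fixed-size layout of figure 5.3 can be illustrated with a minimal Python sketch. Only the field order and widths are taken from the figure; the big-endian byte order, the integer encoding of the sender and time fields, the zero padding of the data field, and the function names are assumptions made for illustration. The signature is treated as an opaque 384-byte value, which matches the size of a 3072-bit RSA signature.

```python
import struct
import time

# Field widths follow figure 5.3: 8 + 2 + 1024 + 8 + 384 = 1426 bytes.
# Big-endian order and the integer encodings are assumptions of this sketch.
MESSAGE_LAYOUT = struct.Struct(">QH1024sQ384s")


def pack_message(sender, protocol, data, signature, timestamp=None):
    """Serialize one message into exactly 1426 bytes."""
    if len(data) > 1024:
        raise ValueError("data field is limited to 1024 bytes")
    if len(signature) != 384:
        raise ValueError("signature must be exactly 384 bytes")
    if timestamp is None:
        timestamp = int(time.time())
    # struct pads the data field with zero bytes up to 1024 bytes.
    return MESSAGE_LAYOUT.pack(sender, protocol, data, timestamp, signature)


def unpack_message(raw):
    """Split a 1426-byte message into its five fields."""
    sender, protocol, data, timestamp, signature = MESSAGE_LAYOUT.unpack(raw)
    # Stripping the padding assumes that real payloads never end in zero bytes.
    return sender, protocol, data.rstrip(b"\x00"), timestamp, signature
```

Because every message is padded to the same size, an observer of the packet length learns nothing about which of the protocols of table 5.1 is being spoken.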


Digital envelope formats

docShare uses two different digital envelope formats to exchange documents of sensitivity secure, private, and tracking.

The digital envelope format for sensitivities secure and private is depicted in figure 5.4. The content data and its digital signature are encrypted with a randomly generated AES-256 key, which is asymmetrically encrypted for every recipient with the recipient’s public key.

[Figure 5.4 shows nested archives containing the file and its signature.]

Figure 5.4: Digital envelope format for documents of sensitivity secure and private.

To masquerade the number of recipients, fake decryption keys are randomly generated, asymmetrically encrypted with randomly generated fake public keys, and added to the archive until the number of keys stored reaches a multiple of 20.
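The padding rule can be sketched as follows. This is a simplified illustration and not docShare's code: the decoys here are plain random bytes standing in for fake keys encrypted with fake public keys, the slot size of 384 bytes assumes one RSA-3072 ciphertext per recipient, and all names are hypothetical.

```python
import os

KEY_SLOT_MULTIPLE = 20    # from the text: pad to a multiple of 20
ENCRYPTED_KEY_SIZE = 384  # assumption: one RSA-3072 ciphertext per slot


def decoys_needed(recipient_count):
    """Number of fake key slots required to reach the next multiple of 20."""
    if recipient_count < 1:
        raise ValueError("an envelope needs at least one recipient")
    return (-recipient_count) % KEY_SLOT_MULTIPLE


def pad_key_slots(encrypted_keys):
    """Return the real key slots followed by random stand-in decoys."""
    decoys = [os.urandom(ENCRYPTED_KEY_SIZE)
              for _ in range(decoys_needed(len(encrypted_keys)))]
    return list(encrypted_keys) + decoys
```

With three recipients, 17 decoys are added; with 21 recipients, 19 decoys bring the total to 40. An observer therefore only learns an upper bound on the number of recipients, rounded up to the next multiple of 20.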

The digital envelope format for documents of sensitivity tracking (cf. figure 5.5) extends the digital envelope format of sensitivities secure and private with an additional nested encrypted archive. Similar to the format mentioned above, the first AES encrypted archive can be decrypted with the asymmetrically encrypted decryption key provided for each recipient. However, instead of getting access to the content data, recipients get access to the content data’s file descriptor and its digital signature.


[Figure 5.5 shows nested archives containing the file descriptor, the file, and their signatures.]

Figure 5.5: Digital envelope format for documents of sensitivity tracking.

The file descriptor holds information about the encrypted content data, the hash of the nested encrypted archive it is referring to, a timestamp stating when the offer expires, and information needed to proceed with the optimistic protocol for tracking. This could be in the form of a legally binding contract. The key to decrypt the content data isn’t part of the envelope format and needs to be received through the optimistic document exchange protocol.

5.5.3 Services

docShare uses three background daemons to handle the exchange of documents of different sensitivities and to perform the identity verification. These services use TCP sockets that are bound to localhost and are forwarded through Tor. Therefore, no additional network channel encryption is implemented, as Tor provides end-to-end encryption for location hidden services.

comDaemon

The communication daemon (comDaemon) takes care of all protocol communication in docShare and handles messages of the format described in section 5.5.2. Once a message arrives, it is checked whether it contains a valid signature and whether its timestamp is not older than 60 seconds. The first check guarantees the message’s integrity and authenticity, and the second reduces the probability of successful replay attacks. If both checks succeed, the message is handled depending on its protocol field. Otherwise the TCP connection is terminated. The comDaemon accepts messages of protocols partner verification request, resource locator request, resource locator offer, confirmed resource locator offer, accept resource locator offer, and resource receive confirmation. Any message except those of protocol partner verification request also requires that the sender of the message is a verified partner. If the sender isn’t, the connection is terminated. Table 5.2 describes how the comDaemon handles the different protocols of received messages once they pass all checks.
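The acceptance checks described above can be condensed into a single predicate. The following Python sketch is an illustration under stated assumptions: the signature verification is represented by a boolean computed elsewhere, the function and constant names are hypothetical, and rejecting protocol ids that comDaemon does not accept is an assumption, as the text does not state how such messages are treated.

```python
import time

MAX_MESSAGE_AGE_SECONDS = 60  # from the text

# Protocol ids from table 5.1 that the comDaemon accepts.
ACCEPTED_PROTOCOLS = {0, 1, 2, 3, 5, 7}
PARTNER_VERIFICATION_REQUEST = 0


def should_handle(protocol, timestamp, signature_ok, sender_is_partner, now=None):
    """Return True if the message may be dispatched, False to drop the connection."""
    if now is None:
        now = time.time()
    if not signature_ok:
        return False  # integrity or authenticity check failed
    if now - timestamp > MAX_MESSAGE_AGE_SECONDS:
        return False  # too old, possibly a replayed message
    if protocol not in ACCEPTED_PROTOCOLS:
        return False  # assumption: other protocol ids are rejected
    if protocol != PARTNER_VERIFICATION_REQUEST and not sender_is_partner:
        return False  # only verified partners may use the other protocols
    return True
```

Only a partner verification request passes without a verified sender, which is what allows two users to bootstrap their partnership in the first place.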

| message protocol | action taken by comDaemon |
|---|---|
| partner verification request | Check if the provided one-time secret matches the one assigned to the sender in the database. If so, verify the sender as partner and reply with a partner verification request. Otherwise close the connection. |
| resource locator request | Send all resource locators of shared documents assigned to this partner via resource locator offer and conf. resource locator offer messages to the partner’s comDaemon. Afterwards reply with the number of shared resources via resource locator summary message. |
| resource locator offer | Save the contained resource locators in the database and reply with a resource locator summary message. |
| conf. resource locator offer | Save the contained resource locators in the database and reply with a resource locator summary message. |
| accept resource locator offer | Check if hashsum, expiry date and provided fallback data are valid. If so, save the received message, update the fallback data in the database, and reply with the decryption key via resource locator decryption message. Otherwise reply with an error message. |
| resource receive confirmation | Check if the receipt states a valid hashsum and deadline. If so, save the received confirmation, update the database that a confirmation was received and close the connection. Otherwise reply with an error message. |

Table 5.2: Message handling by comDaemon.

shareDaemon

The share daemon (shareDaemon) is responsible for providing access to shared documents of sensitivity secure, private, and tracking based on their resource locator. It is realized as an HTTP service providing access via HTTP GET requests. Only partners that know the correct 255-character resource locator, drawn from an alphabet of 66 characters, can download the encrypted document.
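A locator of this shape, and the size of the space an attacker would have to search, can be sketched in a few lines of Python. The concrete alphabet is an assumption: the 66 URL-safe "unreserved" characters (letters, digits, and "-._~") happen to match the stated alphabet size, but the text does not name the characters used. The function names are hypothetical.

```python
import math
import secrets
import string

# Assumption: the 66 symbols are the URL-safe "unreserved" characters.
ALPHABET = string.ascii_letters + string.digits + "-._~"
LOCATOR_LENGTH = 255


def new_resource_locator():
    """Draw a 255-character locator from a cryptographically secure source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(LOCATOR_LENGTH))


def exhaustive_search_years(ms_per_attempt=1):
    """Years needed to try every locator at the given time per attempt."""
    combinations = len(ALPHABET) ** LOCATOR_LENGTH  # 66 ** 255
    ms_per_year = 365 * 24 * 60 * 60 * 1000
    return combinations * ms_per_attempt // ms_per_year
```

At one attempt per millisecond, `math.log10(exhaustive_search_years())` lies between 453 and 454, so enumerating all locators is far out of reach.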


Assuming a future TCP connection establishment time of 1 ms¹ in the Tor network, it would take 66^255 ms (around 3 × 10^453 years) to iterate through all possible combinations of resource locators. It is therefore deemed unrealistic that an attacker can use this technique to figure out the number of documents a particular person is sharing through docShare. Even if an attacker is lucky enough to download a shared document, he can’t access its content as it is encrypted. In addition, he won’t be able to learn either the intended recipients or the actual number of recipients, as they are masqueraded through the digital envelope formats. Therefore, this rudimentary proof of concept abstains from an additional layer of authentication in shareDaemon’s implementation.

anonDaemon

The anonymous receive daemon (anonDaemon) is used to anonymously receive documents from any docShare user. It is implemented as an HTTP service that receives arbitrary UTF-8 encoded data² via HTTP POST request and stores it on disk. To prevent two documents with the same name from overwriting each other, the name of an uploaded document is prefixed with the current timestamp. Due to the high abuse potential of operating an anonymous receive daemon, it is deactivated by default and needs to be explicitly activated in docShare’s configuration.
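The timestamp prefix used by the anonDaemon can be illustrated with a small helper. The separator, the second-level resolution, and the function name are assumptions of this sketch; dropping directory components from the submitted name is an extra precaution that is not mentioned in the text.

```python
import os
import time


def stored_name(upload_name, now=None):
    """Prefix the upload's base name with a timestamp to avoid overwrites."""
    if now is None:
        now = time.time()
    # Keep only the last path component (precaution, not from the text).
    base = os.path.basename(upload_name.replace("\\", "/")) or "unnamed"
    return f"{int(now)}_{base}"
```

Two uploads with the same name that arrive within the same second would still collide at this resolution, so a finer-grained timestamp or an additional counter would be needed to rule that out.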

5.5.4 Terminal scripts

The last components of the docShare client are the terminal scripts used to interact with the local docShare instance and partner docShare hidden service endpoints. Their main purpose is the sharing of documents, partner management, system initialization, and information visualization. As a proof of concept doesn’t require a graphical user interface, terminal scripts were chosen as the form of implementation to reduce the development time. Each script is briefly described below, stating its purpose, command line arguments, and main execution steps.

add_confirmed_share.py

This script is used to share a document of sensitivity tracking with one or multiple partners.

It requires the file to share, the file’s description, the days of validity until the sharing offer expires, and the recipients’ ids or names as command line arguments.

When the script is called it checks the command line arguments for validity, creates the encrypted archive of the digital envelope format for tracking, moves the archive to the folder where the shareDaemon can provide it, updates the metadata database tables “shared”, “sharedMapping” and “sharedConfirmation” accordingly, and sends messages of protocol conf. resource locator offer to each recipient’s comDaemon.

¹ Currently the average round trip time of TCP connections in the internet is around 90 ms.
² To reduce the development time only UTF-8 encoded data is supported. This is sufficient for a rudimentary proof of concept.

add_partner.py

This script is used to add a user from the IMS as potential partner and to initiate the verification process of the exchanged one-time-secrets.

It requires the user id in the IMS of the potential partner, a local identification string (e.g. name), the own one-time-secret exchanged with the potential partner, and the one-time-secret received from the potential partner as command line arguments.

When the script is called it checks the user id for validity, updates the respective fields in the metadata database table “partners”, and sends a message of protocol partner verification request to the potential partner’s comDaemon. If it receives a partner verification message as reply, it checks if the stored and received one-time-secrets match and changes the status from potential partner to partner.

add_share.py

This script is used to share a document of sensitivity secure or private with one or multiple partners.

It requires the file to share, the file’s description, and the recipients’ ids or names as command line arguments.

When the script is called it checks the command line arguments for validity, creates the encrypted archive of the digital envelope format for secure and private, moves the archive to the folder where shareDaemon can provide it, updates the metadata database tables “shared” and “sharedMapping”, and sends messages of protocol resource locator offer to each recipient’s comDaemon.

delete_partner.py

This script is used to delete a (potential) partner from the metadata database.

It requires the id and the name of the partner to delete as command line arguments.

When the script is called it checks the command line arguments for validity and deletes the corresponding entry from the “partners” table.

delete_share.py

This script is used to delete a shared document from docShare.

It requires the id of the share to delete and its description as command line arguments.

When the script is called it checks the command line arguments for validity, obtains the corresponding resource locator from the database, deletes the respective archive from shareDaemon’s folder, and deletes the share’s entries from the “shared”, “sharedMapping” and “sharedConfirmation” tables.

download_received.py

This script is used to download a document of sensitivity secure, private, or tracking from a partner.

It requires the document’s share id in the database as command line argument.

When the script is called it checks the command line argument for validity and constructs the document’s resource locator from the partner’s .onion address in the IMS and the locator information stored in the “received” table. Then the share is downloaded to a temporary directory. Afterwards the script iterates through the archive’s encrypted decryption keys and tries to decrypt each of them with its private key. Once a decryption key is decrypted, the encrypted archive is decrypted with it. Depending on the digital envelope format detected, the script checks either the file’s validity or the file descriptor’s validity using the digital signature provided.

If the archive is of digital envelope format secure or private, all temporary files are deleted, and the file and its signature are moved to docShare’s receive directory. Afterwards the status of the share is updated in the “received” table.

If the archive is of digital envelope format tracking, the encrypted archive’s hashsum and the offer’s deadline and description are extracted from the file descriptor. Using the hashsum the archive is checked for integrity. Afterwards the file’s description is displayed to the user, who has to explicitly agree to confirm the receipt of the file. If he does, a fallback reference and a fallback encryption key are generated, and the fallback deadline is set to one hour in the future. The corresponding entries are updated in the “receivedConfirmation” table. Then a message of protocol accept resource locator offer is sent to the partner. The received reply is stored in the “receivedConfirmation” table and checked for integrity and proper protocol. Then the decryption key is extracted and the archive decrypted. Finally, the content file’s validity is checked using the digital signature provided, the corresponding entry in the “received” table is updated, and a resource receive confirmation is sent to the partner’s comDaemon.
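Three of the steps above lend themselves to a short illustration: the trial decryption over all key slots, the integrity check of the nested archive, and the calculation of the fallback deadline. The Python sketch below is not docShare's code. The `try_decrypt` callable is a hypothetical stand-in for the RSA decryption with the recipient's private key, SHA-256 with a hex-encoded digest is an assumed hash format, and all function names are invented for this example.

```python
import hashlib
import hmac
import time

FALLBACK_WINDOW_SECONDS = 60 * 60  # one hour, as stated in the text


def find_decryption_key(encrypted_keys, try_decrypt):
    """Try every key slot until one decrypts with the caller's private key.

    try_decrypt is a hypothetical callable that returns the plaintext key,
    or None if the slot was not encrypted for this recipient.
    """
    for slot in encrypted_keys:
        key = try_decrypt(slot)
        if key is not None:
            return key
    return None


def archive_is_intact(archive_bytes, expected_hex_digest):
    """Compare the archive's hash with the one from the file descriptor."""
    # Assumption: SHA-256 with a hex-encoded digest.
    actual = hashlib.sha256(archive_bytes).hexdigest()
    return hmac.compare_digest(actual, expected_hex_digest.lower())


def fallback_deadline(now=None):
    """Return the fallback deadline as a Unix timestamp one hour ahead."""
    if now is None:
        now = time.time()
    return int(now) + FALLBACK_WINDOW_SECONDS
```

Because decoy slots simply fail to decrypt, the recipient needs no index telling it which slot is its own, and the envelope does not have to reveal that information either.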

initialize.py

This script is used to initialize docShare’s SQLite3 metadata database and to create the RSA key pair of 3072-bit length.

No command line arguments are required.

share_file_anonymously.py

This script is used to share a document of sensitivity anonymous with one or multiple recipients.

It requires the file to share and the recipients’ ids as command line arguments.

When the script is called it checks the command line arguments for validity and tries to send the file to the recipients’ anonDaemons via HTTP POST request.

show_partner.py

This script is used to show all partner information from the “partners” table.

It has no command line arguments.

show_received.py

This script is used to show relevant information related to received share offerings and their download status.

It has an optional “verbose” argument to display more information.

show_shares.py

This script is used to show relevant information related to documents shared with partners and their status.

It has an optional “verbose” argument to display more information.

update_received.py

This script is used to manually synchronize all share offerings of documents that partners shared with you.

It has no command line arguments.

When the script is called it sends a resource locator request message to every partner.

upload_key_to_ledger_from_accepted_but_unconfirmed_shares.py

This script is used to determine the documents of sensitivity tracking that require a manual upload of the encrypted decryption key to the fallback reference. This is the case when the optimistic protocol failed, that is, an accept resource locator offer message was received but no resource receive confirmation.

It has no additional command line arguments.

When this script is called it determines the documents of sensitivity tracking that require a manual upload of the encrypted decryption key, decrypts it, and uploads it to the respective fallback reference. Afterwards the respective entries in the “sharedConfirmation” table are updated.
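The selection step of this script can be pictured as a single database query. In the sketch below only the table name “sharedConfirmation” comes from the text; the column names, the demo schema, and the function names are hypothetical and chosen purely for illustration.

```python
import sqlite3


def create_demo_table(conn):
    """Create a stand-in for the "sharedConfirmation" table (hypothetical columns)."""
    conn.execute(
        "CREATE TABLE sharedConfirmation ("
        " share_id INTEGER, partner_id INTEGER,"
        " accept_received INTEGER, confirmation_received INTEGER,"
        " key_uploaded INTEGER)"
    )


def shares_needing_fallback(conn):
    """Return (share_id, partner_id) pairs that were accepted but never confirmed."""
    cursor = conn.execute(
        "SELECT share_id, partner_id FROM sharedConfirmation"
        " WHERE accept_received = 1"
        " AND confirmation_received = 0"
        " AND key_uploaded = 0"
        " ORDER BY share_id, partner_id"
    )
    return cursor.fetchall()
```

For each returned pair, the script would then upload the key to the fallback reference and mark the row as handled, so that a second run does not upload the same key again.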

Chapter 6

Prototype evaluation

In this chapter the prototype implementation of chapter 5 is analysed with regard to its compliance with the defined document sensitivity categories of section 3.1 and to its functionality as outlined in chapter 4. Each of docShare’s components that interacts via the network is analysed with regard to its information security services to validate the global compliance of the system. Once the validation is complete the theoretical connection of the concept is extracted.

6.1 Evaluation environment

docShare was tested and evaluated in a virtual environment of three 17.10 virtual machines that is attached to appendix C and sketched in figure 6.1. The environment was set up according to the installation and usage guide of appendix B using its own private Ethereum test network. Each of the three virtual machines is running an instance of the Go Ethereum client Geth, the anonymization software Tor, docShare, and the web-frontend to access the Ethereum smart contracts for IMS and the simple key value store.

6.2 Public identity management

In its web-frontend docShare uses the JavaScript library web3.js to interact with the Ethereum smart contract for the public identity management service. As web3.js and the Ethereum smart contract interact through Geth’s RPC API bound to localhost no information security evaluation is needed at this interface as no information is leaving the host. Therefore, only the connection between the Ethereum nodes at the submission of transactions stating the change or creation of an identity needs to be analysed. Here Ethereum only provides confidentiality, integrity, and sender authenticity by using hashes and asymmetric cryptography. Privacy isn’t achieved as Geth doesn’t have interfaces to route traffic through Tor’s SOCKS proxy. As a result, IP addresses are shared with the receiving Ethereum nodes.

[Figure 6.1 shows the three hosts Alice, Bob, and Carol connected via the internet.]

Figure 6.1: Environment used to evaluate docShare’s implementation.

Geth’s usage of UDP for node discovery and its block update process wouldn’t be critical if docShare’s smart contracts were part of Ethereum’s global distributed ledger, as an adversary wouldn’t be able to distinguish which smart contracts are used by the client. In the evaluated test network, however, valuable non-masqueraded metadata in form of IP addresses also undermines the users’ privacy here. From a functionality standpoint, creating new identities, updating existing identities, and permanently deactivating identities were successfully tested using the web-frontend provided and by verifying the changes on the two other nodes’ web-frontends.

6.3 Partner management

The partner management and verification communicate solely through Tor hidden services. Therefore, confidentiality and privacy are achieved as Tor’s rendezvous protocol sufficiently masquerades TCP metadata and encrypts the communication channels. Data integrity and authenticity are met through docShare’s defined message format using digital signatures over the transferred content data. The functionality of the partner management was successfully tested by simulating the exchange of one-time-tokens and user ids between Alice, Bob, and Carol through external channels and using the script add_partner.py. Also, the deletion of a partner was successfully tested using the script delete_partner.py. The handling of wrong command line arguments and wrong one-time-tokens was tested as well.

6.4 Document exchange with regards to secure and private

The document exchange for sensitivities secure and private uses the same scripts. Therefore, they only need to be analysed once. Similar to partner management, the sharing of document offers through comDaemon provides confidentiality and privacy through Tor’s rendezvous protocol. Similarly, data integrity and authenticity are met through docShare’s message format. The actual fetching of documents through the shareDaemon and the client script download_received.py is also secured through Tor (security and privacy wise). Integrity and authenticity are achieved through docShare’s digital envelope format with digital signatures over the content data. Asymmetric cryptography is not only used for encryption but also to masquerade the actual recipient information.

The functionality of document exchange with regards to secure and private was successfully tested by sharing a test file between Alice and Bob, and another test file between Bob, Alice, and Carol, and comparing their SHA-256 hashes once they were downloaded by all parties. Another test was performed in which Alice temporarily deleted Carol as partner and Carol tried to share a file with Alice; this sharing failed as expected. Also, the deletion of a file via delete_share.py was successfully tested by observing the removal of the share in the file system. The manual synchronization via update_received.py was successfully tested as well. First, host Bob shut down its docShare daemons. Then Carol shared a file with Bob but couldn’t submit the resource locator offer. Afterwards Bob’s docShare daemons were reactivated and update_received.py was executed.
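The hash comparison used in these tests can be reproduced with Python's standard library. The helper below is a generic sketch and not part of docShare; the function name and the chunk size are arbitrary choices.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex-encoded SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

A transfer is considered correct when the digest of the file on the sender's host equals the digest of the file that arrives in the recipient's receive directory.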

6.5 Document exchange with regards to anonymous

The anonDaemon is implemented as a Tor hidden service and doesn’t require partner verification. Confidentiality, privacy, and sender anonymity are achieved through Tor’s rendezvous protocol and built-in channel encryption. No additional services for data integrity are implemented. Recipient authenticity can be achieved if sender and recipient verified each other beforehand through an external channel. This service has been successfully tested by activating the anonDaemon in Alice’s docShare configuration and sharing a UTF-8 encoded file from Bob using share_file_anonymously.py. For verification the original shared file’s SHA-256 hash and the received file’s SHA-256 hash were compared.


6.6 Document exchange with regards to tracking

Tracking uses the same combination of hidden service, docShare messaging format, and digital envelope as document exchange with regards to secure and private. Therefore, confidentiality, integrity, privacy, and authenticity are met for the case that the optimistic protocol exits without the need to store the encrypted decryption key in the key value store smart contract. In case the optimistic protocol fails and the upload of fallback data is required, docShare suffers from the same privacy issues as public identity management when interacting with Geth and Ethereum to update identities. The implemented functionality was tested successfully. Bob is able to share a file with Carol and Alice using add_confirmed_share.py, and Alice can retrieve the file via download_received.py. Once the process is done, Bob has Alice’s accept resource locator offer and resource receive confirmation in his local database. A further test covered the case in which Carol retrieves the file via a modified version of download_received.py that doesn’t send a resource receive confirmation. In this case Bob was able to store Carol’s accept resource locator offer message in his database and upload the decrypted encryption key to the fallback reference using upload_key_to_ledger_from_accepted_but_unconfirmed_shares.py.

To avoid a mapping between the Ethereum wallet used to register oneself in the IMS and the wallet used to upload fallback keys to the key value smart contract, it is advised to use a second independent wallet for uploading fallback keys. docShare supports this in its configuration.

6.7 Document exchange with regards to business

As stated in section 4.2.6, data exchange with compliance to business wasn’t implemented in docShare. Therefore, its implementation can’t be evaluated and is simply not supported.

6.8 Summary

To summarize, docShare currently only supports the document exchange of sensitivity secure and partially supports sensitivities private, anonymous, and tracking. A detailed summary is given in tables 6.1 and 6.2. docShare’s main limitation is the missing metadata masking of IP addresses when interacting with the Ethereum smart contracts. Ethereum currently has no support for SOCKS proxies to update entries anonymously. Furthermore, Ethereum’s node discovery protocol is based on UDP and therefore also leaks identity information in form of IP addresses. Anonymity isn’t achieved as data integrity services aren’t implemented to verify that a file was uploaded correctly.

[Tables 6.1 and 6.2, cell values omitted. Table 6.1 rates public identity management, partner management, and document exchange with compliance to secure, private, anonymous, tracking, and business against the information security services confidentiality, integrity, privacy, anonymity, non-repudiation, legal non-repudiation, authenticity of sender / recipient, and accountability. Table 6.2 rates the same components against the sensitivity levels secure, private, anonymous, tracking, and business. Each entry is marked as complied, not complied, partially complied, or n.a. (not applicable).]

Table 6.1: Summary of docShare’s implementation evaluation regarding complied with information security services.

Table 6.2: Summary of docShare’s implementation evaluation regarding complied with document’s levels of sensitivity.


Nonetheless, the prototype implementation shows that a decentralized DSS on the basis of distributed ledger technology that complies with the defined document sensitivities secure, private, anonymous, and tracking is possible with slight modifications in the implementation and a change of the distributed ledger used for the IMS and the key value store, as all implemented functionality was tested successfully.

6.9 Theoretical connection of the concept

To align with the constructive research methodology, the theoretical connection of docShare is shown in this section. docShare differentiates itself from all other analysed DSS by separating the storage of content data and metadata. Metadata is handled and stored at the client. It is used for the local identity management of trusted partners and the management of shared documents including their encryption. Content data could be stored at any location as its locator information and decryption management is transferred directly via private peer-to-peer connections utilizing Tor hidden services. Identity management and name system services are designed as decentralized systems using distributed ledger technology to combine a central repository to look up entries with the benefits of censorship resistant and denial of service resistant distributed systems. Using this design, it is still possible to attack a single host’s Tor hidden service endpoints via denial of service attacks or to attack single distributed ledger nodes, but this won’t impact the global state of the system. Relying on Tor hidden services furthermore makes users mostly independent from their network operator and its attempts at censoring services (e.g. through DNS filters).

Chapter 7

Conclusion

To analyse to what degree current document sharing systems meet the information security requirements of their users, a literature review was conducted to extract and define relevant information security services for document exchange; namely: data integrity, data confidentiality, security, privacy, anonymity, authenticity of sender and recipient, (legal) non-repudiation, and accountability. Based on the extracted information security services and practical use cases derived from the post office analogy, five combinations of information security services were defined as document sensitivity categories. These document sensitivity categories secure, private, anonymous, tracking, and business form the basis for the information security compliance analysis of commonly used document exchange systems. Four centralized file-sharing services, including the market leaders Box and Dropbox, three decentralized data storage and file-sharing services, and the exchange of digital assets through secure e-mail were analysed based on their publicly available information, with the result that none of the systems complies with any of the higher document sensitivity categories private, anonymous, tracking or business. Existing document systems mainly lack the support to masquerade metadata information needed to share documents. Therefore (trusted) third parties are able to get information about who shared what documents with whom. Also the leakage of identifiers in form of IP addresses is a huge concern. The analysis further pointed out that legal jurisdictions, in which the services operate, can undermine the service’s confidentiality by forcing the operator to decrypt data and hand it out to federal agencies. As a result, a new concept to share documents using distributed ledger technology and anonymization software was developed to exchange documents without the need for centralized metadata storage and a trusted third party.

In this concept the client is directly responsible for its metadata management. This includes metadata related to shared document locators, their encryption keys, and its partner management. The concept uses a twofold identity management system. Global identities, their accessibility endpoint as Tor hidden service, and public RSA key are available in a decentralized directory in form of a distributed ledger. The partner management, however, including the real name partner verification, is outsourced to the client. Therefore, only a client holds all information with whom to interact. Different protocols from literature were analysed for applicability and ease of implementation to support the higher sensitivity categories tracking and business. In the end an optimistic protocol was chosen to implement technical non-repudiation. Only in case the optimistic protocol fails is a distributed ledger needed to resolve a conflict. An interesting problem was found regarding legal non-repudiation: how to preserve the related information in a data structure that can only be used once but is still legally binding. Unfortunately, it couldn’t be solved in the scope of this thesis. Therefore, no working concept to support sensitivity tracking could be provided. Based on the developed concept a prototype named docShare was implemented as a rudimentary proof of concept. It uses a private Ethereum network for the global identity management and the key value store in case the optimistic protocol of sensitivity tracking fails. Its functionality was proven through test cases and its compliance with the defined sensitivity services was analysed. It was found that the distributed ledger used, Ethereum, can’t be anonymized to masquerade its metadata when creating or updating entries. Therefore, metadata in form of IP addresses can leak while interacting with the Ethereum smart contracts. As a result, docShare only supports sensitivity secure, and partially sensitivities private, anonymous, and tracking. With that key limitation identified, a future prototype can be built on the basis of the developed concept and a different distributed ledger that supports metadata anonymization, completely complying with the defined document sensitivities secure, private, anonymous, and tracking.

7.1 Recommendations for future work

There are a couple of distributed ledger related topics that are a good starting point for future work. Obviously, implementing a newer version of docShare using a distributed ledger that supports metadata anonymization through Tor’s SOCKS proxy should be considered. Another starting point is more sustainability related: analysing different consensus algorithms that rely on fewer computational resources than proof of work consensus. Also, the indefinite growth of current distributed ledgers over time should be analysed as it eventually will impact their decentralization and censorship resistance. In a distant future the distributed ledger’s records will be so large (e.g. 10 TB) that only bigger companies can afford to store all transactions. At this point ordinary household computers won’t be able to store a local copy of the transaction log and will have to rely on the bigger companies to validate all transactions. Legal aspects of maintaining a distributed ledger are also a great field for research. People already abuse the structure of the blockchain by uploading virus fragments and child abuse material directly into the blockchain [99]. Depending on the jurisdiction a miner is operating from, this could lead to serious accusations and imprisonment. Therefore, either technical or legal solutions to this problem need to be found.


Besides distributed ledger related research one could investigate how to support legal non-repudiation in distributed document exchange and how to save it in an unforgeable and non-reusable format. It also might be worth investigating other areas besides document sharing where the concept of metadata and content data separation is applicable and useful, and how important IP address masquerading really is for end-users. From the prototype implementation side, the switch from Tor version 2 to Tor version 3 to prohibit denial of service attempts on Tor hidden services through double encrypted location hidden service identifiers, and the storage of encrypted content data at untrusted intermediaries could also be worth evaluating. One could also investigate how to improve data availability by uploading digitally enveloped documents to untrusted intermediaries.

Bibliography

[1] N. Confessore, “Cambridge analytica and facebook: The scandal and the fallout so far,” New York Times, 2018. [Online]. Available: https://www.nytimes.com/2018/04/04/us/politics/cambridge-analytica-scandal-fallout.html (Accessed 2018-06-02).

[2] E. Macaskill and G. Dance, “Nsa files: Decoded - what the revelations mean for you,” The Guardian, 2013. [Online]. Available: https://www.theguardian.com/world/interactive/2013/nov/01/snowden-nsa-files-surveillance-revelations-decoded (Accessed 2018-06-02).

[3] J. Naughton, “Death by drone strike, dished out by algorithm,” The Guardian, 2016. [Online]. Available: https://www.theguardian.com/commentisfree/2016/feb/21/death-from-above-nia-csa-skynet-algorithm-drones-pakistan (Accessed 2018-06-02).

[4] Dropbox, Inc. (2017) Dropbox - homepage. [Online]. Available: https://www.dropbox.com/ (Accessed 2017-10-11).

[5] Citrix. (2017) Rightsignature homepage. [Online]. Available: https://rightsignature.com/ (Accessed 2017-09-05).

[6] Congress, US, “h.r.3162 - 107th congress (2001-2002): uniting and strengthening america by providing appropriate tools required to intercept and obstruct terrorism (usa patriot act) act of 2001,” Washington, DC, 2001.

[7] ——, “h.r.2048 - 114th congress (2015-2016): Uniting and strengthening america by fulfilling rights and ensuring effective discipline over monitoring (usa freedom act) act of 2015,” Washington, DC, 2015.

[8] Dropbox, Inc., “Transparency reports.” [Online]. Available: https://www.dropbox.com/transparency/reports (Accessed 2017-10-25).

[9] E. Kasanen, K. Lukka, and A. Siitonen, “The constructive approach in management accounting research,” Journal of management accounting research, vol. 5, p. 243, 1993.

[10] G. Crnkovic, “Constructive research and info-computational knowledge generation,” Model-Based Reasoning in Science and Technology, vol. 314, pp. 359–380, 2010.

[11] European Commission, “Proposal for a directive of the european parliament and of the council on copyright in the digital single market,” 2016. [Online]. Available: https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:52016PC0593 (Accessed 2018-07-01).

[12] K. J. O’Dwyer and D. Malone, “Bitcoin mining and its energy footprint,” 2014.

[13] A. de Vries, “Bitcoin’s growing energy problem,” Joule, vol. 2, no. 5, pp. 801–805, May 2018. doi: 10.1016/j.joule.2018.04.016. [Online]. Available: http://dx.doi.org/10.1016/j.joule.2018.04.016

[14] R. Shirey, “Internet Security Glossary, Version 2,” Internet Requests for Comments, RFC 4949, August 2007. [Online]. Available: https://tools.ietf.org/html/rfc4949 (Accessed 2017-09-07).

[15] A. Freier, P. Karlton, and P. Kocher, “The Secure Sockets Layer (SSL) Protocol Version 3.0,” Internet Requests for Comments, RFC 6101, August 2011. [Online]. Available: https://tools.ietf.org/html/rfc6101 (Accessed 2017-09-12).

[16] S. Muftic, N. bin Abdullah, and I. Kounelis, “Business information exchange system with security, privacy, and anonymity,” Journal of Electrical and Computer Engineering, vol. 2016, 2016.

[17] NIST, “197: Advanced encryption standard (aes),” Federal information processing standards publication, November 2001. [Online]. Available: http://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.197.pdf (Accessed 2017-09-14).

[18] ——, “180-4: Secure hash standard,” Federal information processing standards publication, August 2015. [Online]. Available: http://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.180-4.pdf (Accessed 2017-09-14).

[19] E. W. Felten. (2015) Hash pointers and data structures. Bitcoin and Cryptocurrency Technologies Lecture at Coursera.org. [Online]. Available: https://www.coursera.org/learn/cryptocurrency/lecture/EYEAo/hash-pointers-and-data-structures (Accessed 2017-09-21).

[20] R. Rivest, A. Shamir, and L. Adleman, “Cryptographic communications system and method,” Sep. 20 1983, US Patent 4,405,829.

[21] N. Koblitz, “Elliptic curve cryptosystems,” Mathematics of computation, vol. 48, no. 177, pp. 203–209, 1987.

[22] V. S. Miller, “Use of elliptic curves in cryptography,” in Conference on the Theory and Application of Cryptographic Techniques. Springer, 1985, pp. 417–426.

[23] P. Mahajan and A. Sachdeva, “A study of encryption algorithms aes, des and rsa for security,” Global Journal of Computer Science and Technology, 2013.

[24] W. Diffie and M. Hellman, “New directions in cryptography,” IEEE transactions on Information Theory, vol. 22, no. 6, pp. 644–654, 1976.

[25] B. Schneier, Applied cryptography: protocols, algorithms, and source code in C, 20th anniversary ed. John Wiley & Sons, 2017. ISBN 978-1-119-09672-6

[26] R. Gill. (2016) Trust in the era of hackable certificate authorities. Akamai. [Online]. Available: https://enterprise-access.akamai.com/blog/trust-in-the-era-of-hackable-certificate-authorities/ (Accessed 2017-09-16).

[27] L. Zeltser. (2015) How digital certificates are used and misused. [Online]. Available: https://zeltser.com/how-digital-certificates-are-used-and-misused/ (Accessed 2017-09-16).

[28] R. Housley, W. Ford, W. Polk, and D. Solo, “Internet X.509 Public Key Infrastructure - Certificate and CRL Profile,” Internet Requests for Comments, RFC 2459, January 1999. [Online]. Available: http://www.ietf.org/rfc/rfc2459.txt (Accessed 2017-09-16).

[29] P. R. Zimmermann, The Official PGP User’s Guide. Cambridge, MA, USA: MIT Press, 1995. ISBN 0-262-74017-6

[30] emercoin.com. (2017) emercoin website. [Online]. Available: https://emercoin.com/ (Accessed 2017-09-18).

[31] C. Allen, A. Brock, V. Buterin, J. Callas, D. Darje, C. Lundkvist, P. Kravchenko, J. Nelson, D. Reed, M. Sabadello, G. Slepak, N. Thorp, and H. T. Wood, “Decentralized public key infrastructure,” 2015. [Online]. Available: https://github.com/WebOfTrustInfo/rebooting-the-web-of-trust/blob/master/final-documents/dpki.pdf (Accessed 2017-09-18).

[32] S. Muftic, “Bix certificates: Cryptographic tokens for anonymous transactions based on certificates public ledger,” Ledger, vol. 1, pp. 19–37, 2016.

[33] ——, “Blockchain identity management system based on public identities ledger,” Apr. 25 2017, US Patent 9,635,000.

[34] B. Kaliski, “PKCS #7: Cryptographic Message Syntax - Version 1.5,” Internet Requests for Comments, RFC 2315, March 1998. [Online]. Available: https://tools.ietf.org/html/rfc2315 (Accessed 2017-09-18).

[35] S. Goldwasser, S. Micali, and C. Rackoff, “The knowledge complexity of interactive proof systems,” SIAM Journal on computing, vol. 18, no. 1, pp. 186–208, 1989.

[36] D. Chaum, J.-H. Evertse, and J. Van De Graaf, “An improved protocol for demonstrating possession of discrete logarithms and some generalizations,” in Workshop on the Theory and Application of Cryptographic Techniques. Springer, 1987, pp. 127–141.

[37] M. Blum, “How to prove a theorem so no one else can claim it,” in Proceedings of the International Congress of Mathematicians, vol. 1, 1986, p. 2.

[38] O. Goldreich, S. Micali, and A. Wigderson, “Proofs that yield nothing but their validity or all languages in np have zero-knowledge proof systems,” Journal of the ACM (JACM), vol. 38, no. 3, pp. 690–728, 1991.

[39] U. Feige, A. Fiat, and A. Shamir, “Zero-knowledge proofs of identity,” Journal of cryptology, vol. 1, no. 2, pp. 77–94, 1988.

[40] E. Ben-Sasson, A. Chiesa, E. Tromer, and M. Virza, “Succinct non-interactive zero knowledge for a von neumann architecture.” in USENIX Security Symposium, 2014, pp. 781–796.

[41] L. Lamport, R. Shostak, and M. Pease, “The byzantine generals problem,” ACM Transactions on Programming Languages and Systems (TOPLAS), vol. 4, no. 3, pp. 382–401, 1982.

[42] E. Buchman, “Tendermint: Byzantine fault tolerance in the age of blockchains,” Guelph, Ontario, Canada, 2016.

[43] C. Cachin and M. Vukolić, “Blockchains consensus protocols in the wild,” arXiv preprint arXiv:1707.01873, 2017.

[44] S. Muftic. (2017, June) Blockchain and smart contracts. GDG Meetup - Presentation. [Online]. Available: https://youtu.be/w7p6B-SS1PA?t=1h1m41s (Accessed 2017-09-20).

[45] S. Nakamoto, “Bitcoin: A peer-to-peer electronic cash system,” 2008.

[46] A. Narayan. (2015) How bitcoin achieves decentralization. Bitcoin and Cryptocurrency Technologies Lecture at Coursera.org. [Online]. Available: https://www.coursera.org/learn/cryptocurrency/home/week/2 (Accessed 2017-09-27).

[47] J. Bonneau. (2015) Mechanics of bitcoin. Bitcoin and Cryptocurrency Technologies Lecture at Coursera.org. [Online]. Available: https://www.coursera.org/learn/cryptocurrency/home/week/3 (Accessed 2017-09-27).

[48] V. Buterin et al., “A next-generation smart contract and decentralized application platform,” white paper, 2014.

[49] Y. Sompolinsky and A. Zohar, “Secure high-rate transaction processing in bitcoin,” in International Conference on Financial Cryptography and Data Security. Springer, 2015, pp. 507–527.

[50] G. H. Gonnet, R. A. Baeza-Yates, and T. Snider, “New indices for text: Pat trees and pat arrays.” Information Retrieval: Data Structures & Algorithms, vol. 66, p. 82, 1992.

[51] V. Zamfir, “Introducing casper “the friendly ghost”,” Ethereum Blog, 2015. [Online]. Available: https://blog.ethereum.org/2015/08/01/introducing-casper-friendly-ghost (Accessed 2017-09-29).

[52] V. Buterin, “Casper version 1 implementation guide,” Ethereum Github repository, 2017. [Online]. Available: https://github.com/ethereum/research/wiki/Casper-Version-1-Implementation-Guide (Accessed 2017-09-29).

[53] R. G. Brown, J. Carlyle, I. Grigg, and M. Hearn, “Corda: An introduction,” R3 CEV, August, 2016. [Online]. Available: https://www.researchgate.net/profile/Ian_Grigg/publication/308636477_Corda_An_Introduction/links/57e994ed08aed0a291304412.pdf (Accessed 2017-07-20).

[54] M. Hearn, “Corda–a distributed ledger,” Corda Technical White Paper, 2016. [Online]. Available: https://block.academy/researches/corda-technical-whitepaper.pdf (Accessed 2017-07-20).

[55] R. G. Brown. (2017) The corda way of thinking. [Online]. Available: https://gendal.me/2017/02/21/the-corda-way-of-thinking/ (Accessed 2017-07-20).

[56] ——. (2016) Introducing r3 corda: A distributed ledger designed for financial services. [Online]. Available: https://gendal.me/2016/04/05/introducing-r3-corda-a-distributed-ledger-designed-for-financial-services/ (Accessed 2017-07-20).

[57] J. R. Douceur, “The sybil attack,” in International Workshop on Peer-to-Peer Systems. Springer, 2002, pp. 251–260.

[58] R. Dingledine, N. Mathewson, and P. Syverson, “Tor: The second-generation onion router,” DTIC Document, Tech. Rep., 2004.

[59] Tor Project, “Tor Metrics,” 2017. [Online]. Available: https://metrics.torproject.org/ (Accessed 2017-05-14).

[60] The Tor project. (unknown) Tor: Overview. [Online]. Available: https://www.torproject.org/about/overview.html.en (Accessed 2017-10-05).

[61] M. Leech, M. Ganis, Y. Lee, R. Kuris, D. Koblas, and J. Jones, “SOCKS Protocol Version 5,” Internet Requests for Comments, RFC 1928, March 1996. [Online]. Available: http://www.ietf.org/rfc/rfc1928.txt (Accessed 2017-10-05).

[62] N. Mathewson, “Tor Rendezvous Specification - Version 3,” Tor Gitweb, Tech. Rep., September 2017. [Online]. Available: https://gitweb.torproject.org/torspec.git/plain/rend-spec-v3.txt (Accessed 2017-10-05).

[63] A. Biryukov, I. Pustogarov, and R.-P. Weinmann, “Trawling for tor hidden services: Detection, measurement, deanonymization,” in Security and Privacy (SP), 2013 IEEE Symposium on. IEEE, 2013, pp. 80–94.

[64] G. Noubir and A. Sanatinia, “Honey onions: Exposing snooping tor hsdir relays,” DEF CON, vol. 24, 2016.

[65] The Tor project. (unknown) Tor: Hidden service protocol. [Online]. Available: https://www.torproject.org/docs/hidden-services.html.en (Accessed 2017-10-05).

[66] S. Talib, N. L. Clarke, and S. M. Furnell, “An analysis of information security awareness within home and work environments,” in Availability, Reliability, and Security, 2010. ARES’10 International Conference on. IEEE, 2010, pp. 196–203.

[67] Skyhigh Networks, “Cloud adoption & risk report 2016 q4,” Campbell, CA 95008, USA, Tech. Rep., 2017. [Online]. Available: https://www.skyhighnetworks.com/cloud-report/ (Accessed 2017-10-11).

[68] Box, Inc. (2017) Box - homepage. [Online]. Available: https://www.box.com/ (Accessed 2017-10-10).

[69] ——, “Box keysafe.” [Online]. Available: https://cloud.app.box.com/v/KeySafeDatasheet (Accessed 2017-10-10).

[70] T. Shields and H. Shey, “Quick take: Use “customer-managed keys” to regain control of your data,” February 2015. [Online]. Available: https://www.box.com/security/forrester-encryption-key-management (Accessed 2017-10-10).

[71] Box, Inc., “Box: Redefining content security,” Security White Paper. [Online]. Available: https://cloud.app.box.com/v/RedefiningContentSecurity (Accessed 2017-10-10).

[72] Box Inc., “Comprehensive security at all levels.” [Online]. Available: https://www.box.com/static/download/Security_Overview_2-1.pdf (Accessed 2017-10-10).

[73] Box, Inc., “Box: Securing business information in the cloud.” [Online]. Available: https://cloud.app.box.com/v/SecurityeBook (Accessed 2017-10-10).

[74] Dropbox, Inc., “Dropbox business security,” whitepaper. [Online]. Available: https://cfl.dropboxstatic.com/static/business/resources/dfb_security_whitepaper-vfllunodj.pdf (Accessed 2017-10-11).

[75] Nextcloud GmbH. (2017) Nextcloud - homepage. [Online]. Available: https://nextcloud.com/ (Accessed 2017-10-13).

[76] ——, “End-to-end encryption design,” no. September 20, 2017. [Online]. Available: https://nextcloud.com/endtoend/ (Accessed 2017-10-13).

[77] ——. (2017) Nextcloud - homepage - security and authentication. [Online]. Available: https://nextcloud.com/secure/ (Accessed 2017-10-13).

[78] Citrix. (2017) Rightsignature homepage - electronic signature security. [Online]. Available: https://rightsignature.com/security (Accessed 2017-10-17).

[79] ——. (2017) Rightsignature homepage - legality of electronic signatures. [Online]. Available: https://rightsignature.com/legality (Accessed 2017-10-17).

[80] ——. (2017) Rightsignature homepage - are electronic signatures legally binding? [Online]. Available: https://rightsignature.com/legality/are-electronic-signatures-legally-binding (Accessed 2017-10-17).

[81] Nebulous Inc. (2017) Sia - homepage. [Online]. Available: https://sia.tech/ (Accessed 2017-10-19).

[82] Z. Herbert, D. Vorick, and L. Champine. (2017) Trello - sia public roadmap. [Online]. Available: https://trello.com/b/Io1dDyuI/sia-public-roadmap (Accessed 2017-10-19).

[83] D. Vorick and L. Champine, “Sia: Simple decentralized storage,” 2014.

[84] dinkel. (2017) Sia wiki - contracts. [Online]. Available: https://siawiki.tech/renter/contracts (Accessed 2017-10-19).

[85] Taek. (2015) Sia forum - how sia works. [Online]. Available: https://forum.sia.tech/topic/108/how-sia-works (Accessed 2017-10-19).

[86] I. S. Reed and G. Solomon, “Polynomial codes over certain finite fields,” Journal of the society for industrial and applied mathematics, vol. 8, no. 2, pp. 300–304, 1960.

[87] Storj Labs Inc. (2017) Storj - homepage. [Online]. Available: https://storj.io/ (Accessed 2017-10-21).

[88] P. Maymounkov and D. Mazieres, “Kademlia: A peer-to-peer information system based on the xor metric,” in International Workshop on Peer-to-Peer Systems. Springer, 2002, pp. 53–65.

[89] S. Wilkinson, T. Boshevski, J. Brandoff, and V. Buterin, “Storj a peer-to-peer cloud storage network,” 2016. [Online]. Available: https://storj.io/storj.pdf (Accessed 2017-09-08).

[90] D. Svensson and P. Leund, “Secures: Secure resource sharing system,” Bachelor’s Thesis, KTH Royal Institute of Technology, Brinellvägen 8, 114 28 Stockholm, Sweden, 2015.

[91] J. Callas, L. Donnerhacke, H. Finney, D. Shaw, and R. Thayer, “OpenPGP Message Format,” Internet Requests for Comments, RFC 4880, November 2007. [Online]. Available: https://tools.ietf.org/html/rfc4880 (Accessed 2017-09-08).

[92] D. Poddebniak, C. Dresen, J. Müller, F. Ising, S. Schinzel, S. Friedberger, J. Somorovsky, and J. Schwenk, “Efail: Breaking s/mime and openpgp email encryption using exfiltration channels (draft 0.9.1).” [Online]. Available: https://efail.de/efail-attack-paper.pdf (Accessed 2019-05-27).

[93] “Information technology – Security techniques – Non-repudiation – Part 1: General,” International Organization for Standardization, Geneva, CH, Standard, Jul. 2009.

[94] “Information technology – Security techniques – Non-repudiation – Part 2: Mechanisms using symmetric techniques,” International Organization for Standardization, Geneva, CH, Standard, Dec. 2010.

[95] “Information technology – Security techniques – Non-repudiation – Part 3: Mechanisms using asymmetric techniques,” International Organization for Standardization, Geneva, CH, Standard, Dec. 2009.

[96] S. Even, O. Goldreich, and A. Lempel, “A randomized protocol for signing contracts,” Communications of the ACM, vol. 28, no. 6, pp. 637–647, 1985.

[97] N. Asokan, M. Schunter, and M. Waidner, “Optimistic protocols for fair exchange,” in Proceedings of the 4th ACM conference on Computer and communications security. ACM, 1997, pp. 7–17.

[98] B. Pfitzmann, M. Schunter, and M. Waidner, “Provably secure certified mail,” 2000.

[99] R. Matzutt, J. Hiller, M. Henze, J. H. Ziegeldorf, D. Müllmann, O. Hohlfeld, and K. Wehrle, “A quantitative analysis of the impact of arbitrary blockchain content on bitcoin,” in Proceedings of the 22nd International Conference on Financial Cryptography and Data Security (FC). Springer, 2018. [Online]. Available: https://fc18.ifca.ai/preproceedings/6.pdf (Accessed 2018-06-05).

Appendix A

Declaration of independence

I hereby certify that I have written this thesis independently and have only used the specified sources and resources indicated in the bibliography.

Menlo Park, CA, 27th August 2018

...... Jens Röwekamp

Appendix B

Installation and usage guide

Installation

This section describes the installation of a docShare instance on a fresh 64-bit Ubuntu 17.10 Desktop installation.1

First, extract the contents of the Code.tar archive from Appendix C to the Desktop.

Second, install the dependencies Geth, Tor and the required Python3 libraries:

Listing B.1: Software dependency installation.

    sudo apt-get install software-properties-common
    sudo add-apt-repository -y ppa:ethereum/ethereum
    sudo apt-get update
    sudo apt-get install ethereum tor python3-pip git
    cd ~/Desktop/docShare/bin/
    pip3 install -r requirements.txt

Third, configure the Tor daemon in /etc/tor/torrc to forward ports 2342, 8888, and 9999 to the Tor network as Tor hidden services.

Listing B.2: Tor hidden service configuration for docShare.

    HiddenServiceDir /var/lib/tor/docShare/
    HiddenServicePort 2342 127.0.0.1:2342
    HiddenServicePort 8888 127.0.0.1:8888
    HiddenServicePort 9999 127.0.0.1:9999

Afterwards restart the Tor daemon to apply the new configuration.
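Before restarting Tor it can help to check that all three docShare ports are actually mapped. The following is a minimal sketch, not part of the docShare code base: the function names parse_hidden_services and missing_ports are hypothetical, and the parser only understands the two torrc options used in Listing B.2.

```python
def parse_hidden_services(torrc_text):
    """Map each HiddenServiceDir to its list of (virtual_port, target) pairs."""
    services = {}
    current_dir = None
    for raw_line in torrc_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        parts = line.split()
        keyword, args = parts[0].lower(), parts[1:]
        if keyword == "hiddenservicedir" and args:
            # Every following HiddenServicePort line belongs to this directory.
            current_dir = args[0]
            services[current_dir] = []
        elif keyword == "hiddenserviceport" and args and current_dir is not None:
            target = args[1] if len(args) > 1 else None
            services[current_dir].append((int(args[0]), target))
    return services


def missing_ports(torrc_text, service_dir, required_ports):
    """Return the required ports that the given hidden service does not expose."""
    mappings = parse_hidden_services(torrc_text).get(service_dir, [])
    exposed = {port for port, _ in mappings}
    return sorted(set(required_ports) - exposed)


# Configuration text as given in Listing B.2.
EXAMPLE_TORRC = """\
HiddenServiceDir /var/lib/tor/docShare/
HiddenServicePort 2342 127.0.0.1:2342
HiddenServicePort 8888 127.0.0.1:8888
HiddenServicePort 9999 127.0.0.1:9999
"""

if __name__ == "__main__":
    print(missing_ports(EXAMPLE_TORRC, "/var/lib/tor/docShare/", [2342, 8888, 9999]))
```

In practice the text would be read from /etc/tor/torrc instead of the embedded example; an empty result means that every required port is forwarded.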

Fourth, initialize the Ethereum testnet and determine your enode id for the testnet mashup.

1 Used installation medium: ubuntu-17.10.1-desktop-amd64.iso

Listing B.3: Ethereum testnet initialization.

    cd ~/Desktop/Ethereum\ Testnet/
    ./start.sh init
    ./connect.sh
    admin.nodeInfo
    exit
    ./stop.sh

Afterwards, configure the Ethereum testnet by setting the obtained enode information and IP information in ~/Desktop/Ethereum Testnet/start.sh for every node you want to communicate with. Now you can start the Ethereum testnet again, create your wallet information, and mine Ether.

Listing B.4: Ethereum testnet account initialization.

    cd ~/Desktop/Ethereum\ Testnet/
    ./start.sh
    ./connect.sh
    admin.peers
    personal.newAccount()
    miner.start()
    exit

Fifth, initialize the smart contracts in the Ethereum testnet using the web service at remix.ethereum.org. Open Firefox, browse to http://remix.ethereum.org, and use the open icon to add a local file. Select ~/Desktop/Smart Contracts/IMS.sol and open it from the browser. Afterwards switch to the run tab on the right and select web3 provider as environment. Confirm http://localhost:8545 to interact with the local Ethereum testnet. Before you can deploy the smart contract you have to unlock the account. To do so, use connect.sh to connect to Geth and type personal.unlockAccount(eth.accounts[0]). Now you can hit the deploy button in the remix IDE to deploy the contract. Save the contract address and repeat the procedure for ~/Desktop/Smart Contracts/KV.sol to deploy the smart contract for the key value store. Once both smart contracts are published you have to adapt the scripts ~/Desktop/docShare/bin/docShare.py, ~/Desktop/Contract Webfrontends/js/kv.js, and ~/Desktop/Contract Webfrontends/js/app.js to use the right contract addresses.
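Copying contract addresses by hand into three scripts is error-prone, so a quick syntax check before editing them may be useful. The sketch below only assumes that an Ethereum address is written as 0x followed by 40 hexadecimal characters; the helper names and the placeholder addresses are hypothetical, and the check does not verify that a contract is actually deployed at the address.

```python
import re

# An Ethereum address is 20 bytes, written as "0x" plus 40 hexadecimal characters.
ADDRESS_PATTERN = re.compile(r"0x[0-9a-fA-F]{40}")


def is_plausible_address(value):
    """Return True if the value has the syntactic shape of an Ethereum address."""
    return ADDRESS_PATTERN.fullmatch(value.strip()) is not None


def malformed_addresses(addresses):
    """Return the names of all entries whose address is syntactically malformed."""
    return sorted(
        name for name, address in addresses.items()
        if not is_plausible_address(address)
    )


if __name__ == "__main__":
    # Placeholder values; replace them with the addresses saved from the remix IDE.
    saved = {
        "IMS": "0x" + "00" * 20,
        "KV": "0x1234",  # deliberately too short
    }
    print(malformed_addresses(saved))
```

A truncated or mistyped address is reported by name, so it can be corrected before docShare.py, kv.js, and app.js are adapted.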

Sixth, initialize the docShare database and create your asymmetric key pair.

Listing B.5: docShare initialization.

    cd ~/Desktop/docShare/bin
    ./initialize.py

Seventh, start the web-frontend to create a new identity in the identity management system.

Listing B.6: Web-frontend start and identity registration in IMS.

    cd ~/Desktop/Contract\ Webfrontends/
    ./start.sh
    cd ~/Desktop/Ethereum\ Testnet/
    ./connect.sh
    personal.unlockAccount(eth.accounts[0])
    exit

Afterwards, navigate in Firefox to http://localhost:8080 and add your contact details for the IMS. Please ensure that the hidden service address equals the hidden service address in /var/lib/tor/docShare/hostname and that the RSA public key equals the public key in ~/Desktop/docShare/public.pem.
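The comparison of the registered contact details with the local files can be sketched as follows. This is an illustrative helper, not part of docShare: the function names are hypothetical, and the comparison assumes that differences in letter case (onion address), surrounding whitespace, and line endings are irrelevant.

```python
def normalize_onion(address):
    """Lower-case an onion address and strip surrounding whitespace."""
    return address.strip().lower()


def normalize_pem(pem_text):
    """Join the non-empty lines of a PEM block, ignoring indentation and line endings."""
    lines = (line.strip() for line in pem_text.strip().splitlines())
    return "\n".join(line for line in lines if line)


def registration_matches(entered_onion, entered_pem, hostname_text, public_pem_text):
    """Compare the values entered in the web-frontend with the local file contents."""
    return {
        "onion": normalize_onion(entered_onion) == normalize_onion(hostname_text),
        "public_key": normalize_pem(entered_pem) == normalize_pem(public_pem_text),
    }


if __name__ == "__main__":
    # Placeholder values; real ones come from the hostname and public.pem files.
    pem = "-----BEGIN PUBLIC KEY-----\nAAAA\n-----END PUBLIC KEY-----\n"
    print(registration_matches("example.onion", pem, "example.onion\n", pem))
```

The two file arguments would be filled with the contents of /var/lib/tor/docShare/hostname and ~/Desktop/docShare/public.pem; reading the hostname file typically requires elevated privileges.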

Finally, feel free to set your assigned IMS ID as ownId in ~/Desktop/docShare/lib/docShare.py.

Now you can start docShare via:

Listing B.7: docShare start.

    cd ~/Desktop/docShare/
    ./start.sh

Usage

User registration

Listing B.8: Actions on docShare1

    cd ~/Desktop/docShare/bin
    ./add_partner.py 2 docShare2 blub blab
    ./show_partner.py

Listing B.9: Actions on docShare2

    cd ~/Desktop/docShare/bin
    ./add_partner.py 0 docShare1 blab blub
    ./show_partner.py

Document sharing with regards to private

Listing B.10: Actions on docShare1

    cd ~/Desktop/docShare/bin
    ./add_share.py test_msg.py test_msg.py docShare2
    ./show_shares.py

Listing B.11: Actions on docShare2

    cd ~/Desktop/docShare/bin
    ./show_received.py
    ./download_received.py 1

Document sharing with regards to tracking

Listing B.12: Actions on docShare1

    cd ~/Desktop/docShare/bin
    ./add_confirmed_share.py initialize.py "A Python script that initializes docShare's database and keys" 1 docShare2
    ./show_shares.py

Listing B.13: Actions on docShare2

    cd ~/Desktop/docShare/bin
    ./show_received.py
    ./download_received.py 2

Appendix C

Digital Content

data medium here

SHA-256: aeb41948850f541aec9686c861ee5c3204950901cd0d579caf652f09c293007c hashsums.txt
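The checksum above can be verified with a few lines of Python. This is a minimal sketch using only the standard library; the function names are hypothetical, and it assumes that hashsums.txt has been copied from the data medium into the current working directory.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Return True if the file's digest equals the expected hex string."""
    return sha256_of_file(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    expected = "aeb41948850f541aec9686c861ee5c3204950901cd0d579caf652f09c293007c"
    # Assumes hashsums.txt is located in the current working directory.
    print(matches_expected("hashsums.txt", expected))
```

Reading the file in chunks keeps memory use constant, so the same helper can be applied to larger files such as the Code.tar archive.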
