2019 38th Symposium on Reliable Distributed Systems (SRDS)

Making Federated Networks More Distributed

Alex Auvolat
Univ. Rennes, Inria, CNRS, IRISA

Abstract—Federated networks such as Mastodon or Matrix have seen rising usage thanks to their ability to provide users with good privacy and independence from large service providers, while retaining the familiar model of server-backed websites or mobile apps with its advantages of speed, availability and ease of use. However, such systems are fragile since each individual server of the federation is a single point of failure for its users. We argue that new secure distributed algorithms could be conceived and applied without changing the server-backed nature of the system, and that such a configuration would provide systemic resilience and better independence of end users from their service providers, without sacrificing privacy, availability, efficiency or ease of use.

Index Terms—distributed system, online social networks, federation of servers, privacy, zero-knowledge architecture, CRDT

This work has been partially supported by the French ANR projects DESCARTES n. ANR-16-CE40-0023, and PAMELA n. ANR-16-CE23-0016. I wish to thank my advisors, François Taïani and David Bromberg, for their feedback and support.

I. INTRODUCTION

Mainstream online applications rely on massive centralized platforms that have demonstrated several important modes of failure such as disregard for user privacy, vendor lock-in, abuse of authority and conflicts of interest. Acting upon the observation that centralized platforms owned by large corporations cannot be trusted with users' personal data, the academic community as well as the Free and Open Source Software movement have produced many alternative systems. Federated networks [1], [2] are such systems, and they have gained usage thanks to the familiar model of access they provide to users through simple and efficient websites or mobile apps.

However, we argue that federated networks are inherently fragile, since they rely strongly on each individual server of the federation to carry out critical tasks such as identity management, access control and data management. The diversity of providers involved increases the probability of crash failure, data leak and Byzantine behaviour by a provider. Moreover, since many such servers are run by volunteers, the probability of failure of the social support structure (lack of funding, lack of members' involvement) is also high.

To correct these issues, authority and trust must be removed from the individual servers, allowing for redundancy without coordination. To realize this vision, we suggest an architecture where:

• Centralized user directories at each server of the federation, which are used to identify users and authenticate their actions, are replaced by a distributed scheme based on public key cryptography where users hold their identity themselves in the form of a private signing key.
• Authoritative copies of objects (such as a user's set of messages) held by a single server are replaced by a distributed data model where any node may produce updates, as long as they are signed by an authorized user, and may do so without coordination with other nodes thanks to eventually consistent data types such as CRDTs [3].

In this manner, users may store their data on several servers to ensure redundancy. When any issue arises with a specific server, they may easily switch to another server node. The process of selecting server nodes and switching from one to another could be completely automated and invisible to the user.

In this paper, we propose a preliminary exploration of this architecture and of its impacts on privacy and usability.

II. A DECENTRALIZED DATA MODEL

To make the architecture we propose possible, we must define a new distributed data model that allows for eventual consistency and gives authority to the users. We propose a data model where data is stored in objects that have a defined access control policy as well as CRDT semantics for conflict-free merges, enabling eventual consistency without coordination. Applications may combine many objects with different parameters in order to provide familiar functionalities such as chat rooms, news feeds, discussion forums or private messages. In order to protect user privacy, only encrypted data is stored in these objects and applications have a key distribution mechanism so that only authorized clients and no server may obtain the encryption keys [4], [5].

The CRDTs that have been studied to date do not incorporate a permission model: all nodes that participate in the system are allowed to update the data. In order to implement a permission model, user identity must be verified. This can be achieved in a decentralized fashion using public key cryptography where end users' devices hold the private keys. Thus all updates to a shared object are signed by the key of a user, and the validity of updates is proved by such a signature, and not by the authority of a server.

The simplest example of such an object is a set of messages where only messages signed by an authorized user may be stored and no message may be deleted. Conflict resolution consists simply in exchanging missing messages from one side
to the other. All applications can be implemented over such a simple abstraction [6], [7]; however, the ever-growing nature of the data makes its use impractical.

To allow for the data to shrink, updates must be allowed to supersede previous events. However, allowing any user to produce an update that erases previous events opens the door to abusive deletion of information, which must be prevented. In order to safely implement any specification of update and delete rights without a trusted authoritative server, a set of users could be designated as verifiers whose validation is required before any data is permanently erased. A collective verification scheme has already been proposed by Frientegrity [8] in the context of an untrusted central server. We propose to revisit this scheme in a distributed setting where events do not form a linear stream ordered by a central server but form a DAG of events produced in a decentralized fashion, such as in the Matrix network [2]. In such a scheme, each event of the DAG would contain a reference to the whole state of the object at that point, for instance in the form of a Merkle tree [9]. The validity of such a state would be proved by a small number of predecessors in the event DAG, under the hypothesis that no more than a certain number of trusted validators are Byzantine¹. Thus nodes of the network could safely delete older events and states and reclaim unused space.

III. IMPACTS ON PRIVACY

With standard end-to-end encryption [4], [5], neither the server nor an unauthorized eavesdropper may read the content of messages. However, metadata such as the timing of messages and the identity of participants in a conversation can still be leaked. Strong metadata privacy requires costly approaches such as Vuvuzela [10].

Unfortunately, our proposed scheme for CRDTs with verified states is at odds with metadata privacy, since updates must be signed by a valid key from a set of authorized users, thus revealing participant identity and activity to the servers. However, we propose several means to improve data protection:

• An honest server node will communicate object data to another node only when authorized by a user that has read rights to that object. Thus metadata may leak only if one user accesses the object through a malicious or compromised server.
• The client generates different key pairs for every object with which the user interacts, so that analysis of the data at rest reveals limited information on the social graph.
• If required, clients can connect to servers through an anonymity network, making the link between a pseudonymous identity and the physical person of the user untraceable.
• Our data model is generic and not limited to the federation of servers model. It could be used in more restrictive network topologies such as friend-to-friend networks.

A detailed analysis of the properties of our system under a defined threat model remains to be done. While our primary aim is not to provide stronger privacy than existing solutions such as end-to-end encrypted Matrix, an interesting avenue of research consists in finding appropriate cryptographic techniques for ensuring CRDT safety in a sufficiently general model while revealing as little as possible about the identity of participants.

IV. USABILITY CONSIDERATIONS

Being based on a client/server model, our system could be implemented as a traditional website where all encryption is done by the JavaScript client code and keys are kept in the browser's local storage. With the help of servers for storing, serving and disseminating information, the performance loss compared to a regular centralized or federated service could be minimal. Lightweight devices such as smartphones could also connect to the service without needing to store or exchange large amounts of data. Thanks to compact proofs of object state validity, clients need not trust the servers in any fashion and can verify the provided data at a limited cost.

Identity and key management remains an issue for non-technical users; however, we believe that good user interface design can help with this task. Mechanisms and user interfaces that ease key management are an actively researched area in the context of cryptocurrencies, with solutions such as social key recovery or hardware wallets.

V. PERSPECTIVES AND FUTURE WORK

Future work will consist in formalizing our system and providing a detailed technical analysis of the privacy and safety properties with respect to the goals we have set. In this regard, precise threat models and attack scenarios will have to be defined and analyzed. A first prototype could then be implemented and evaluated from an efficiency point of view. In addition to micro-benchmarks, direct comparisons with systems such as Secure Scuttlebutt (SSB) [6] could be made, for instance by crawling the SSB network and replaying the whole set of events in the different systems that are evaluated.

REFERENCES

[1] https://joinmastodon.org/.
[2] https://matrix.org/.
[3] M. Shapiro, N. Preguiça, C. Baquero, and M. Zawirski, “Conflict-Free Replicated Data Types,” in SSS, 2011.
[4] S. Taheri-Boshrooyeh, A. Kupcu, and O. Ozkasap, “Security and Privacy of Distributed Online Social Networks,” in ICDCS Workshops, 2015.
[5] A. De Salve, P. Mori, and L. Ricci, “A survey on privacy in decentralized online social networks,” Computer Science Review, 2018.
[6] https://www.scuttlebutt.nz/.
[7] https://github.com/orbitdb/orbit-db.
[8] A. J. Feldman, A. Blankstein, M. J. Freedman, and E. W. Felten, “Social Networking with Frientegrity: Privacy and Integrity with an Untrusted Provider,” in USENIX Security, 2012.
[9] A. Auvolat and F. Taïani, “Merkle search trees: Efficient state-based CRDTs in open networks,” in SRDS, 2019.
[10] J. Van Den Hooff, D. Lazar, M. Zaharia, and N. Zeldovich, “Vuvuzela: Scalable private messaging resistant to traffic analysis,” in SOSP, 2015.

¹ In the case of a permissioned network with a small number of known participants, tolerance to one or a few Byzantine nodes is already much stronger than zero Byzantine tolerance.

2575-8462/19/$31.00 ©2019 IEEE. DOI 10.1109/SRDS47363.2019.00058
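
As an illustration of the simplest object described in Section II (a set of messages that accepts only messages signed by an authorized user, never deletes, and resolves conflicts by exchanging missing messages), the following minimal Python sketch may help. It is not the paper's implementation: the names `Message`, `SignedMessageSet`, `Verifier`, `demo_sign` and `demo_verify` are hypothetical, and HMAC is used purely as a dependency-free stand-in for a real public-key signature scheme, which is what the proposed architecture would actually require.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Callable, FrozenSet, Set

# Hypothetical verifier signature: (author, payload, signature) -> bool.
Verifier = Callable[[str, bytes, bytes], bool]


@dataclass(frozen=True)
class Message:
    author: str       # identifier of the signing user
    payload: bytes    # message content (ciphertext in the paper's model)
    signature: bytes  # signature over the payload by the author's key


class SignedMessageSet:
    """Grow-only set replica: messages are added, never deleted."""

    def __init__(self, authorized: FrozenSet[str], verify: Verifier) -> None:
        self.authorized = authorized
        self.verify = verify
        self.messages: Set[Message] = set()

    def add(self, msg: Message) -> bool:
        """Store msg only if its author is authorized and its signature is valid."""
        valid = msg.author in self.authorized and self.verify(
            msg.author, msg.payload, msg.signature
        )
        if valid:
            self.messages.add(msg)
        return valid

    def merge(self, other: "SignedMessageSet") -> None:
        """Pull the messages missing locally, re-verifying each one."""
        for msg in other.messages - self.messages:
            self.add(msg)


# Demo only: HMAC is symmetric and merely stands in for real signatures.
DEMO_KEYS = {"alice": b"alice-key", "bob": b"bob-key", "mallory": b"mallory-key"}


def demo_sign(author: str, payload: bytes) -> bytes:
    return hmac.new(DEMO_KEYS[author], payload, hashlib.sha256).digest()


def demo_verify(author: str, payload: bytes, signature: bytes) -> bool:
    key = DEMO_KEYS.get(author)
    return key is not None and hmac.compare_digest(
        signature, demo_sign(author, payload)
    )


def signed(author: str, payload: bytes) -> Message:
    return Message(author, payload, demo_sign(author, payload))


members = frozenset({"alice", "bob"})
replica_a = SignedMessageSet(members, demo_verify)
replica_b = SignedMessageSet(members, demo_verify)

replica_a.add(signed("alice", b"hello"))
replica_b.add(signed("bob", b"hi there"))
replica_b.add(signed("mallory", b"spam"))       # rejected: not authorized
replica_b.add(Message("alice", b"fake", b"x"))  # rejected: bad signature

# Exchange missing messages in both directions; the replicas converge.
replica_a.merge(replica_b)
replica_b.merge(replica_a)
```

Because `merge` is a set union filtered by a deterministic validity check, it is commutative, associative and idempotent, so replicas can synchronize in any order without coordination. Each replica re-verifies what it receives, so it does not need to trust the server it pulled from.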

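
Section III suggests that the client generate a different key pair for every object the user interacts with. The paper does not say how these keys are produced. One possible approach, shown below strictly as an assumption, is to derive a per-object seed from a single client-side master secret with HMAC-SHA256, so that only one secret needs to be backed up. The function name `derive_object_seed`, the `object-key:` label and the object identifiers are hypothetical.

```python
import hashlib
import hmac
import secrets


def derive_object_seed(master_secret: bytes, object_id: str) -> bytes:
    """Derive a 32-byte seed that is specific to one object.

    Assumption: HMAC-SHA256 is used as a pseudorandom function keyed by the
    master secret. Seeds for different objects cannot be linked to each other
    by someone who does not know the master secret.
    """
    label = b"object-key:" + object_id.encode("utf-8")
    return hmac.new(master_secret, label, hashlib.sha256).digest()


# The master secret stays on the user's device.
master = secrets.token_bytes(32)

chat_seed = derive_object_seed(master, "chat-room-42")
feed_seed = derive_object_seed(master, "news-feed-7")
```

Each 32-byte seed could then be fed to the key generation routine of a signature scheme (Ed25519, for example, uses 32-byte private seeds). Drawing independent random keys per object would serve the privacy goal equally well; derivation only changes how many secrets the user has to keep safe.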