Cachet: a Decentralized Architecture for Privacy Preserving Social Networking with Caching
Total Page:16
File Type:pdf, Size:1020Kb
Cachet: A Decentralized Architecture for Privacy Preserving Social Networking with Caching Shirin Nilizadeh Sonia Jahid Prateek Mittal Indiana University University of Illinois at University of California, Bloomington Urbana-Champaign Berkeley [email protected] [email protected] [email protected] Nikita Borisov Apu Kapadia University of Illinois at Indiana University Urbana-Champaign Bloomington [email protected] [email protected] ABSTRACT and (b) use of social contacts for object caching results in Online social networks (OSNs) such as Facebook and significant performance improvements. Google+ have transformed the way our society communi- cates. However, this success has come at the cost of user Categories and Subject Descriptors privacy; in today's OSNs, users are not in control of their C.2.4 [Computer-Communication Networks]: Dis- own data, and depend on OSN operators to enforce access tributed Systems|Distributed Applications; K.6.m control policies. A multitude of privacy breaches has spurred [Management of Computing and Information research into privacy-preserving alternatives for social net- Systems]: Miscellaneous|Security working, exploring a number of techniques for storing, dis- seminating, and controlling access to data in a decentral- ized fashion. In this paper, we argue that a combination General Terms of techniques is necessary to efficiently support the complex Algorithms, Security functionality requirements of OSNs. We propose Cachet, an architecture that provides strong Keywords security and privacy guarantees while preserving the main functionality of online social networks. In particular, Cachet privacy, peer-to-peer systems, social networking, caching protects the confidentiality, integrity and availability of user content, as well as the privacy of user relationships. Cachet 1. INTRODUCTION uses a distributed pool of nodes to store user data and en- In the last decade, online social networks (OSNs) such sure availability. Storage nodes in Cachet are untrusted; we as Facebook, Google+, and Twitter have revolutionized the leverage cryptographic techniques such as attribute-based way our society communicates and have become the de facto encryption to protect the confidentiality of data. For ef- mechanism for information sharing between users. Their ficient dissemination and retrieval of data, Cachet uses a user bases exceed hundreds of millions of users and their hybrid structured-unstructured overlay paradigm in which adoption is still growing at a rapid pace.1 a conventional distributed hash table is augmented with so- However, the success of OSNs has come at the cost of user cial links between users. Social contacts in our system act privacy. Users are not in control of their data and depend as caches to store recent updates in the social network, and on the OSN operator to protect their sensitive information. help reduce the cryptographic as well as the communication Users' expectations of privacy are often at odds with the op- overhead in the network. erator's business incentives, and, in fact, several providers We built a prototype implementation of Cachet in the have been caught selling user data [53]. Moreover, the pri- FreePastry simulator. To demonstrate the functionality of vacy policies of OSNs are often hard to understand, and existing OSNs we implemented the `newsfeed' application. constant changes therein further magnify this problem [48]. Our evaluation demonstrates that (a) decentralized architec- Additionally, existing centralized architectures present a sin- tures for privacy preserving social networking are feasible, gle point of failure in the system. Any vulnerability in these systems (or even accidental leaks) can be exploited by a ma- licious adversary to obtain unencrypted sensitive user data. This lack of user privacy in deployed OSNs has spurred re- search into the design of mechanisms for privacy-preserving Permission to make digital or hard copies of all or part of this work for social networking. Some work has focused on using cryptog- personal or classroom use is granted without fee provided that copies are raphy to protect the contents stored by a centralized OSN not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to provider [7, 17, 30]; our view, however, is that this does not republish, to post on servers or to redistribute to lists, requires prior specific sufficiently protect users' privacy as it allows the provider permission and/or a fee. to learn user relationships and patterns of interactions by CoNEXT’12, December 10–13, 2012, Nice, France. 1 Copyright 2012 ACM 978-1-4503-1775-7/12/12 ...$15.00. http://newsroom.fb.com/ 337 means of traffic analysis. Decentralized architectures can cryptographic capabilities used by the storage nodes to au- address this issue, yet present a new series of challenges, as thenticate update requests and attribute-based encryption in addition to confidentiality and integrity protection that used to provide flexible and fine-grained access policies. To cryptography can provide, it is necessary to ensure the avail- reduce computational overhead cached containers are stored ability of and efficient access to data that is necessary to in decrypted form and are shared with other contacts upon support common OSN functionality, such as a `newsfeed'. verifying that they satisfy the corresponding access policy; Previous work on decentralized OSNs [3, 7, 13, 18, 22, 40] as such, containers must be decrypted only when fetched has explored several design decisions: how nodes are orga- directly from the DHT. Storage nodes are trusted only to nized (in a structured distributed hash table (DHT) or with provide availability of the data with replication used to de- links between social contacts), where content is stored (by fend against node failures or intentional misbehavior. the owner, social contacts, or in a DHT), how it is dissem- We develop a prototype implementation of Cachet in the inated (push or pull), and how access control is enforced FreePastry simulator [51]. To demonstrate the functional- (cryptographically or with online authentication). We ar- ity of existing OSNs, we also build and evaluate the news- gue that to efficiently support the complex functionality of feed application. Our results show the importance of us- an OSN, a combination of methods must be used. For ex- ing social caching, which reduces the latency of displaying a ample, replicating data at random DHT nodes ensures its newsfeed from hundreds of seconds in the base architecture availability even when users are offline; however, assembling to less than 10. Our architecture thus demonstrates how a a newsfeed requires thousands of DHT lookups and, as we careful combination of several distributed systems and cryp- saw in our preliminary work [32], can take hundreds of sec- tographic techniques can be used to provide a compelling onds to generate. Likewise, encrypting stored content can privacy-preserving alternative to centralized OSNs. provide strong confidentiality guarantees, yet we found that we need online authentication of updates and annotations 2. REQUIREMENTS AND PROPERTIES to ensure the availability of data. Furthermore, attribute- Functional Model. At a high level OSN functionality con- based encryption schemes that provide highly flexible poli- sists primarily of users sharing some form of content with cies are computationally expensive and contribute signifi- their contacts, who then view, comment on, and annotate cantly to the above performance overhead. it. To support this generic functionality, we define a con- In our preliminary design [32], we showed how the confi- tainer object that consists of a main content object and a dentiality and integrity of data can be protected by a crypto- list of annotations. The main content can take many types, graphic mechanism so that they can be stored in untrusted such as a status, a web URL, a photo, or a collection of nodes of a DHT. However, the design suffers from perfor- container objects (e.g., a photo album, or a `wall'). Annota- mance issues that arise due to the fetching and decryption tions take the form of references to other container objects. of hundreds of small objects belonging to friends, which is A key application common in OSNs is a `newsfeed', which required for viewing their walls or for viewing one's own aggregates and displays recent updates from a user's social newsfeed. We therefore propose Cachet, a decentralized ar- contacts; implementing such a newsfeed efficiently is a key chitecture for social networks that provides strong security challenge in a decentralized OSN. and privacy guarantees while efficiently supporting the cen- The container also becomes the unit of access control, with tral functionality of OSNs. Central to Cachet is a hybrid a potentially different set of permissions associated with a structured-unstructured overlay in which a conventional dis- container, internally referenced objects, and each individual tributed hash table is augmented with social links between annotation. Access policies can be defined over social con- users. We use the distributed hash table as a base storage tacts and their attributes, as well as social network distance layer, but add a gossip-based social caching algorithm that (e.g., `friend-of-friend'). Usable access controls in OSNs re- dramatically increases performance. New updates are im- main an area of active research [19,26,38], so a highly flexi- mediately propagated to online social contacts. When an ble and fine-grained permissions architecture is necessary to offline user comes back online a presence protocol is used to support future developments in this field. locate online contacts and query them directly for updates. Security Requirements. The primary security require- Additionally, these contacts are used to retrieve cached up- ments are confidentiality and integrity of user data, stored dates from mutual contacts who are offline as well as speed in distributed and untrusted storage nodes, and availability the discovery of other online contacts. The DHT is then of the correct and latest version of the data.