Stadium: A Distributed Metadata-Private Messaging System

Nirvan Tyagi (Cornell University), Yossi Gilad (Boston University, MIT CSAIL), Derek Leung (MIT CSAIL), Matei Zaharia (Stanford University), Nickolai Zeldovich (MIT CSAIL)

ABSTRACT
Private communication over the Internet remains a challenging problem. Even if messages are encrypted, it is hard to deliver them without revealing metadata about which pairs of users are communicating. Scalable anonymity systems, such as Tor, are susceptible to traffic analysis attacks that leak metadata. In contrast, the largest-scale systems with metadata privacy require passing all messages through a small number of providers, requiring a high operational cost for each provider and limiting their deployability in practice.
This paper presents Stadium, a point-to-point messaging system that provides metadata and data privacy while scaling its work efficiently across hundreds of low-cost providers operated by different organizations. Much like Vuvuzela, the current largest-scale metadata-private system, Stadium achieves its provable guarantees through differential privacy and the addition of noisy cover traffic. The key challenge in Stadium is limiting the information revealed from the many observable traffic links of a highly distributed system, without requiring an overwhelming amount of noise. To solve this challenge, Stadium introduces techniques for distributed noise generation and differentially private routing as well as a verifiable parallel mixnet design where the servers collaboratively check that others follow the protocol. We show that Stadium can scale to support 4× more users than Vuvuzela using servers that cost an order of magnitude less to operate than Vuvuzela nodes.

CCS CONCEPTS
• Security and privacy → Pseudonymity, anonymity and untraceability; Privacy-preserving protocols; Distributed systems security;

KEYWORDS
anonymous communication, differential privacy, mixnet, verifiable shuffle

1 INTRODUCTION
The continued prominence of anonymous whistleblowing and private communication in world affairs means that these issues, and the systems that enable them, have become an integral part of society. As a result, there is substantial interest in systems that offer strong privacy guarantees—often against a global adversary with the ability to monitor and inject network traffic, such as an ISP or nation-state.
Services such as WhatsApp, Signal, and Telegram have deployed end-to-end encryption to popular reception from the public. Even so, encryption is only the first step in protecting users' communications. Although encryption hides communication content, it does not hide metadata because adversaries can still learn who is communicating with whom and at what times. Metadata reveals a great deal of information; in fact, NSA officials have stated that "if you have enough metadata, you don't really need content" [39].
In recent years, researchers have made significant progress in designing systems with provable protection against metadata leakage [1, 12, 14, 15, 45]. Despite the variation in solutions, these past systems share a similar deployment strategy and trust scheme that limits their scalability (with a few notable exceptions [1, 32], see §11). Specifically, in most of these systems, a small set of providers carries out the anonymity protocol, and users must trust that at least one provider in the set is honest (i.e. "the anytrust assumption").
This strategy limits the number of users these systems can support in two main ways. First, because each individual provider is responsible for ensuring global privacy, each provider's workload increases proportionally with the total number of users of the system (e.g., mixing messages sent by all users). For example, Vuvuzela [45] reported exchanging 68,000 messages per second with 3 providers at a bandwidth cost of 1.3 Gbps per provider, which would cost each provider around $1,000 per month [36]. Second, users must trust one of the system's few participating providers. As the number of users grows, it becomes hard to find a small set of organizations for which the anytrust assumption holds for all users. Relaxing this assumption by adding more providers quickly degrades performance by raising costs that are at least linear in the number of providers [14, 45]. In summary, these systems are restricted by a combination of a small number of providers and a large operating cost per provider. We refer to these systems as scaling vertically, in that providers must run more infrastructure to support more users.
In contrast, the Tor network [18] is an example of an anonymity system that scales horizontally. Tor consists of thousands of commodity servers run by volunteer providers. Instead of trusting any single provider, users trust that some fraction of participating providers are honest. The total load of all users on the system is distributed evenly across the providers. Thus, the system scales to support more users with the addition of new providers. However, Tor is vulnerable to traffic analysis and injection attacks that can deanonymize users and leak metadata [2, 7, 21, 30, 46].
This paper presents Stadium, a point-to-point messaging system that uses a horizontally scaling deployment strategy and provides provable privacy guarantees. We show that Stadium scales to support up to 4× more users than previous systems with provable guarantees, using servers whose operating costs are an order of magnitude smaller than the equivalent Vuvuzela deployment. For example, at a latency of 2 minutes, we find Stadium can support more than 20 million simultaneous users with 100 providers, while Vuvuzela [45], the current largest-scale metadata-private system, is limited to 5 million. Additionally, Stadium's capacity scales near-linearly with the number of participating providers, i.e., adding a provider additively increases the total number of users that can be supported.
Like Vuvuzela, Stadium uses differential privacy to bound the metadata leakage of the system over time. Informally, differential privacy can be thought of as "plausible deniability." By observing the system, an adversary may gain some small amount of statistical information in learning the communication patterns of users (e.g. is Alice talking to Bob?). Differential privacy bounds the information the adversary can gain from these observations: the adversary's observations are almost equally likely to be caused by Alice talking to Charlie, or Alice not talking to anyone. To provide differential privacy, Stadium adds noise to the adversary's observations in the form of decoy message traffic.
Stadium mixes user messages and noise messages together using a mixnet architecture, which takes as input a set of messages and outputs a random permutation of those messages [8]. Furthermore, Stadium requires its mixnet to scale horizontally. Messages must be distributed among providers and mixed in parallel so that each provider only handles a fraction of messages. Secure and efficient parallel mixing is challenging due to the observable links between distributed mixing servers; advanced traffic analysis techniques can be used to trace messages [6].
We provide a differential privacy analysis of Stadium's mixnet routing to bound the information leaked through traffic analysis. To meet the requirements of the differential privacy model, we augment Stadium's parallel mixnet with a number of novel verifiable processing techniques, allowing providers to verify that other providers are correctly relaying and mixing messages. Providers are organized into small groups called mixchains, and messages are distributed among mixchains and mixed in parallel. The mixing and verification workload of a provider is confined to the fraction of messages that flow through its mixchain. As long as there exists a single honest provider in each mixchain, Stadium achieves global verification and mixing.
Stadium's scalability does come with one significant caveat. In order to achieve comparable privacy to previous systems, Stadium requires both more noise and a higher computation cost due to verifiable processing, which often translates into a higher latency (several minutes vs. 30–60 seconds). Nonetheless, Stadium's total bandwidth cost, which constitutes the dominant monetary cost for providers, is within a factor of 1 to 3 of Vuvuzela, and Stadium can divide this cost across hundreds of small providers. Therefore, we believe that Stadium is a compelling new design point for metadata-private messaging for two reasons: it can be deployed incrementally using smaller providers, and it does not require a single centralized anytrust set as in previous systems.
To summarize, our contributions are:
• The design for a horizontally scaling metadata-private messaging system that can distribute work efficiently across hundreds of servers.
• A novel verifiable parallel mixnet design that allows participating providers to efficiently verify the correctness of the mix, including techniques such as hybrid verifiable shuffling, verifiable distribution, and intra-mixchain verification.
• A privacy analysis of Stadium, including techniques for designing distributed, differentially private systems, such as the Poisson noise mechanism, routing network analysis, and collaborative noise generation across mutually untrusted servers.
• An implementation and experimental evaluation of Stadium over a 100-server deployment on Amazon EC2.

2 GOALS
In this section we present Stadium's key design goals: providing private point-to-point messaging in the face of powerful adversaries while scaling to tens of millions of users.

Figure 1: Stadium overview. Users send messages for a communication round. Stadium's servers work together to shuffle the messages, verifying each other for correctness. Shuffled messages are exchanged at dead-drops and reversed through the system to the recipient user.

2.1 Threat Model
Stadium assumes an adversary that controls some fraction of its providers' servers. Adversary-controlled servers and users may deviate in any way from Stadium's protocol. We allow the adversary to monitor, block, delay, or inject traffic on any network link at all communication rounds. Servers are arranged in mixchains, where each chain consists of servers that belong to different providers which are unlikely to collude; we assume that at least one server in every chain follows the protocol.
In terms of availability, Stadium is not resistant to wide-scale denial of service (DoS) attacks. This is unavoidable given our assumption that adversaries can block traffic, which allows them to disconnect servers and users from the network. However, Stadium guarantees that any DoS attack will not risk its users' privacy.
Cryptographic assumptions and PKI. Stadium relies on standard cryptographic assumptions. We assume secure public and symmetric key encryption, key-exchange mechanisms, signature schemes, and hash functions under the random oracle model [4]. We further assume a public key infrastructure, i.e., Stadium servers' public keys are known to its users, and that two communicating users hold a shared key. One recent scalable system, Alpenhorn [34], provides a service that allows two users to privately coordinate a shared secret.

2.2 Privacy
Stadium has a similar privacy goal to Vuvuzela [45]. It aims to prevent adversaries from distinguishing between communication patterns of its users, even if the users exchange many messages. We use the following definition from the differential privacy literature to analyze Stadium's privacy guarantees [20]:

Definition 2.1. A randomized mechanism M is (ϵ, δ)-differentially private if for any two adjacent inputs D and D′ and for all sets of outputs S, Pr[M(D) ∈ S] ≤ e^ϵ · Pr[M(D′) ∈ S] + δ.

The inputs to the Stadium mechanism (D and D′) are the set of users' communication actions in each round. We consider two inputs adjacent if they differ only by one user's, say Alice's, actions. One of the inputs represents Alice's real actions in a particular round (e.g., Alice sends a message to Bob), while adjacent inputs represent hypothetical "cover stories" (e.g., Alice is not talking to anyone). Real or not, the differential privacy definition requires that all these stories appear almost equally plausible. In the above definition, ϵ and δ bound the information that an adversary might learn about Alice's communication in each round. We design Stadium to provide differential privacy such that ϵ and δ are small enough to keep all other cover stories plausible even after Alice sends hundreds of thousands of messages.

2.3 Scalability
Our scalability goal is to support tens of millions of simultaneous users on Stadium, which is comparable to the number of Tor users. The scalability goal necessitates that the user message load is distributed across servers and represents a departure from previous metadata-private systems.

3 COMMUNICATION OVERVIEW
Communication through Stadium takes place in rounds. Every fixed interval, a new round begins to process a set of messages Stadium accumulated from its users. To provide differential privacy of user communication patterns, Stadium servers generate noise messages (i.e., cover traffic) which are also input to the system when the round begins. All messages are shuffled by Stadium's distributed servers to unlink them from their sender. Stadium borrows a dead-drop communication strategy from Vuvuzela [45] amenable to provable differential privacy. Dead-drops are virtual locations, hosted on the system's servers, that are associated with the anonymous conversations taking place. Message destination dead-drops are revealed after the messages are shuffled. When exactly two messages reach the same dead-drop in a communication round, the server hosting that dead-drop exchanges their contents. Finally, messages are sent back through Stadium, where servers invert the shuffle and return messages such that two messages exchanged at a dead-drop reach their intended recipients. To ensure that observing a user's communication line does not leak whether they communicate, users send exactly one message every round, and thus, receive exactly one message every round. If the user does not communicate, a dummy message is sent which will route back to the sender. An overview of Stadium's design is illustrated in Figure 1. In order to start communication, Alice and Bob use a separate dialing protocol [34], assumed as a precursor to our system (§2.1), which allows them to coordinate a dead-drop for every round as well as a symmetric key to encrypt the content of their communication.
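The dead-drop exchange rule above is mechanically simple: group the shuffled messages by their revealed dead-drop id and swap payloads only when exactly two messages meet. The following Go sketch is our own illustration (the type and function names are not Stadium's); it assumes messages have already exited the mixnet with their dead-drop ids revealed.

package main

import "fmt"

// A message that has exited the mixnet with its dead-drop id revealed.
type Message struct {
    DeadDrop uint64 // dead-drop id coordinated via the dialing protocol
    Payload  []byte // encrypted content; servers never see the plaintext
}

// exchange swaps the payloads of messages that meet at the same dead-drop.
// Exactly two messages at a dead-drop are exchanged; a lone message (e.g. a
// user's dummy) is left unchanged, so it routes back to its sender.
func exchange(msgs []Message) []Message {
    byDrop := make(map[uint64][]int)
    for i, m := range msgs {
        byDrop[m.DeadDrop] = append(byDrop[m.DeadDrop], i)
    }
    out := make([]Message, len(msgs))
    copy(out, msgs)
    for _, idxs := range byDrop {
        if len(idxs) == 2 { // only well-formed two-party meetings are swapped
            i, j := idxs[0], idxs[1]
            out[i].Payload, out[j].Payload = msgs[j].Payload, msgs[i].Payload
        }
    }
    return out // index order preserved, so the return path can invert the shuffle
}

func main() {
    msgs := []Message{
        {DeadDrop: 7, Payload: []byte("ciphertext from Alice")},
        {DeadDrop: 7, Payload: []byte("ciphertext from Bob")},
        {DeadDrop: 9, Payload: []byte("dummy, routes back to sender")},
    }
    for _, m := range exchange(msgs) {
        fmt.Printf("drop %d -> %s\n", m.DeadDrop, m.Payload)
    }
}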

4 PARALLEL MIXING CHALLENGES
A key challenge to Stadium's design is in mixing messages securely and efficiently. Traditional sequential mixnets, where each server mixes all messages, fail to perform at Stadium's targeted scale. Instead, Stadium requires a parallel mixing scheme, where each server processes only a fraction of the input messages, yet the messages are still mixed globally. To achieve this, most previous parallel mixing schemes [17, 19, 26, 38] use the same general pattern. Each server accepts a small fraction of messages, mixes those messages, then splits the messages among other servers. This process is repeated for some number of cycles. Although at the beginning, messages are clearly partitioned by their starting server, they are mixed globally over many cycles of splitting and mixing.
The primary difference between previous parallel mixing schemes is in the chosen mixing topology. The mixing topology specifies the links between servers at each depth, i.e., where servers collect messages from and where they distribute messages to after shuffling. Different mixing topologies trade off between depth (number of mixing servers a message passes through), switch size (number of incoming and outgoing connections for each server), and, as we discuss next, quality of mix.
Resistance to traffic analysis. It has been shown that most previous parallel mixing schemes are susceptible to advanced traffic analysis attacks [6]. These attacks use uneven distributions between servers to probabilistically model likely paths for messages, depicted in Figure 2. Even if distributions are forced to be uniform, e.g. by padding with dummy messages, uneven distributions can be inferred if the adversary knows the input-output relation of some messages. Essentially, these attacks are able to trace back messages with some significant probability, even after many cycles of mixing.

Figure 2: Adversary gains probabilistic information about Alice's message path by observing an uneven distribution of messages out of a server.

There are a few exceptions resistant to traffic analysis; notably, the iterated butterfly topology [16] and the square topology [28] both have theoretical results for achieving near-random permutations. However, these results come at the expense of relatively large depth, which is not suitable to Stadium's low-latency messaging application. Instead, Stadium opts for a shallow, but vulnerable, topology. Information leakage in routing is identified and quantified with a differential privacy analysis (§7, §8). Note that noise messages in Stadium serve two purposes: (1) protecting dead-drop access patterns, and (2) protecting routing traffic patterns.
Deciding how to inject noise messages into the system poses another challenge. If each server, in turn, adds and shuffles in the total required amount of noise messages (as in Vuvuzela [45]), the total processing cost of the system grows quadratically in the depth of the network; each server must process the messages added by all servers before it. Although Stadium aims for a shallow topology, the network is still too deep to accommodate the quadratic processing costs (a depth of roughly 10–25, compared to Vuvuzela's depth of 3). Instead, Stadium servers collaborate to inject noise messages across the system in one large step prior to mixing. With this strategy, servers are not mixing in their own noise messages, leaving the opportunity for malicious servers to discard noise messages before they are mixed with user messages. To ensure that noise stays in the system and the differential privacy guarantees are valid, Stadium uses several cryptographic techniques for verifiable processing of messages (§6), allowing honest servers to verify the actions of others. To optimize the relatively expensive verification workload put on honest servers, Stadium introduces a hybrid verifiable shuffling scheme to bootstrap verifiable processing of large messages onto verifiable processing of small authentication keys (§5.1).
In summary, Stadium reasons about the information leakage of parallel mixing through a differential privacy analysis. Verifiable processing is used to ensure noise messages are properly mixed with user messages, and thus, the differential privacy guarantees are upheld.
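To make the one-step noise injection concrete, the sketch below (our own illustration; the actual mechanism is Algorithm 4 and the analysis in §7) draws independent Poisson counts of single-access and double-access noise for every route through the system, so that every (input chain, output chain) combination is covered before mixing begins. The sampler, the chain indexing, and all names are assumptions made for illustration; λ1 and λ2 come from the privacy analysis.

package main

import (
    "fmt"
    "math"
    "math/rand"
)

// samplePoisson draws from Pois(lambda) using Knuth's method; adequate for
// the moderate per-route lambdas sketched here (requires lambda > 0).
func samplePoisson(lambda float64, rng *rand.Rand) int {
    limit := math.Exp(-lambda)
    k, p := 0, 1.0
    for p > limit {
        k++
        p *= rng.Float64()
    }
    return k - 1
}

// NoisePlan records how many noise messages to generate for each route.
type NoisePlan struct {
    Single map[[2]int]int // (input chain, output chain) -> # single-access noise
    Double map[[3]int]int // (input chain i, input chain j, output chain) -> # pairs
}

// planNoise mirrors the structure of Algorithm 4: every route gets an
// independently sampled Poisson count, drawn before mixing begins, so the
// differential privacy analysis can account for noise on every observable link.
func planNoise(chains int, lambda1, lambda2 float64, rng *rand.Rand) NoisePlan {
    plan := NoisePlan{Single: map[[2]int]int{}, Double: map[[3]int]int{}}
    for i := 0; i < chains; i++ {
        for x := 0; x < chains; x++ {
            plan.Single[[2]int{i, x}] = samplePoisson(lambda1, rng)
        }
    }
    for i := 0; i < chains; i++ {
        for j := i + 1; j < chains; j++ {
            for x := 0; x < chains; x++ {
                plan.Double[[3]int{i, j, x}] = samplePoisson(lambda2, rng)
            }
        }
    }
    return plan
}

func main() {
    rng := rand.New(rand.NewSource(1))
    plan := planNoise(3, 2.0, 1.0, rng)
    fmt.Println("single-access noise for (0,1):", plan.Single[[2]int{0, 1}])
    fmt.Println("double-access noise for (0,1,2):", plan.Double[[3]int{0, 1, 2}])
}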

5 DESIGN
This section presents the Stadium protocol in a chronological view of a single communication round. Figure 3 breaks down a round into 7 processing phases.
Collecting messages. In phases 1-2, messages are created and submitted to the system. The message encapsulation protocol is described in the next section (§5.1). Users each submit one message, possibly a dummy message if not communicating; servers submit a randomized number of noise messages. Stadium servers collect messages and use verifiable processing to ensure only "proper" messages are accepted. The verifiable processing of message input (§6.1) verifies submitted messages were created by the submitter (i.e., preventing replay/malleability attacks) and verifies all server-generated noise messages are included.
Mixing messages. Phases 3-5 make up Stadium's parallel mixnet. Stadium employs a 2-layer mixing topology consisting of an input mixing layer (phase 3), all-to-all distribution (phase 4), and an output mixing layer (phase 5). Since Stadium assumes a fraction of its servers to be adversary-controlled, the input and output mixing layers are not composed of single servers, but rather of small groups of servers organized into mixchains. Verifiable processing of the mixnet (mixing §6.2 and distribution §6.3) allows an honest server to verify messages that pass through its mixchain are uncorrupted and properly mixed. Thus, proper execution of the mixnet relies on the assumption that every mixchain contains at least one honest server; we revisit the plausibility of this assumption in §10.3.
A message's path through the mixnet consists of an input mixchain and an output mixchain. In the input chain, messages are mixed with all other messages that entered the system through the same chain. Messages exiting the input chain are distributed among output chains, where they are again mixed—this time with messages arriving across all input chains.
Exchanging messages. The messages of an output chain are exchanged (phase 6) at the dead-drops hosted by that same output chain as described in §3. Note that this means two communicating users must route their messages to the same output chain in order for them to be exchanged - a restriction imposed by our privacy analysis. Messages then travel back through the mixnet to their recipient users, each mixchain reversing the permutation applied in the forward direction (phase 7). The last two phases (6-7) are not verified; messages output from the mixnet are already anonymized. Adversaries can deny service to users by not exchanging or corrupting messages, but the user's intended communication pattern remains hidden.

5.1 Message Encapsulation
Users set the path of their message through Stadium by designating an input chain, an output chain, and a dead-drop. Each mixchain has a public key, jointly established by the servers included in that chain (see Algorithm 2). Users encapsulate their message using the public keys of their selected path, using Algorithm 3. Encapsulated messages contain both the underlying message content that Alice wishes to exchange, as well as message routing metadata (e.g., the output chain selection and dead-drop id). Routing metadata needs to be properly protected for two main reasons. (1) Revealing user message routing metadata can expose user communication patterns. (2) Routing noise messages along their intended paths is necessary for the differential privacy guarantees to hold. Stadium uses verifiable processing to protect the integrity of routing metadata and uses threshold decryption to ensure routing metadata is revealed only when the mixing protocol is honestly followed.
Hybrid verifiable shuffling. Providing efficient verifiable processing for routing metadata remains a challenge. In particular, zero-knowledge shuffle proofs work on group elements (i.e., public-key encryption). Zero-knowledge shuffle proofs allow mixing servers to prove the output messages are a permutation of the input messages without revealing any knowledge about the permutation itself (preventing message corruption). Onion encrypting the metadata over multiple chains requires working in a large group size, making computation inefficient. This holds true even for the 2-layer onion encryption needed for Stadium's shallow mixing topology.
Instead, Stadium handles messages in two pieces, authentication and content. The authentication piece is small (32 bytes) and is verifiably processed. It provides authentication information required to verify the integrity of the content piece.
Hybrid verifiable shuffling bootstraps verifiable shuffling of large messages onto zero-knowledge shuffle proofs of small authentication keys (extended from [33]). To form the content piece, we append a message authentication code (Encrypt-then-MAC) to the message content and then onion encrypt with the mixchain servers' public keys. To form the authentication piece, the authentication key used to produce the MAC is encrypted with the mixchain's public key. The authentication pieces are permuted by each server verifiably using zero-knowledge proofs. At the same time, each server applies the same permutation to the content pieces, stripping a layer of the hybrid encryption at each step. At the end of the mixchain, the authentication key is revealed and used to verify the correct permutations were applied to the content by checking the MAC. The message encapsulation is described in Algorithm 3 and the shuffling protocol is described in Algorithm 6. Although Stadium only uses the technique for 2 mixing layers, it extends efficiently to more.
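A minimal sketch of the content-piece construction just described, under our own simplifications: Encrypt-then-MAC is instantiated with AES-CTR and HMAC-SHA256 (the paper does not fix these primitives here), the per-server onion layers are omitted, and the authentication key is passed around directly rather than encrypted to the mixchain's public key as the real authentication piece would be.

package main

import (
    "crypto/aes"
    "crypto/cipher"
    "crypto/hmac"
    "crypto/rand"
    "crypto/sha256"
    "fmt"
)

// sealContent returns IV || AES-CTR(msgKey, payload) || HMAC-SHA256(authKey, IV || ct).
// In Stadium the authKey would travel separately in the small authentication
// piece, encrypted to the mixchain; here both keys are simply parameters.
func sealContent(payload, msgKey, authKey []byte) ([]byte, error) {
    block, err := aes.NewCipher(msgKey)
    if err != nil {
        return nil, err
    }
    iv := make([]byte, aes.BlockSize)
    if _, err := rand.Read(iv); err != nil {
        return nil, err
    }
    ct := make([]byte, len(payload))
    cipher.NewCTR(block, iv).XORKeyStream(ct, payload)
    body := append(iv, ct...)
    mac := hmac.New(sha256.New, authKey)
    mac.Write(body)
    return append(body, mac.Sum(nil)...), nil
}

// openContent checks the MAC (as the last mixchain server does once the
// authentication key is revealed) and only then decrypts.
func openContent(sealed, msgKey, authKey []byte) ([]byte, error) {
    macStart := len(sealed) - sha256.Size
    body, tag := sealed[:macStart], sealed[macStart:]
    mac := hmac.New(sha256.New, authKey)
    mac.Write(body)
    if !hmac.Equal(tag, mac.Sum(nil)) {
        return nil, fmt.Errorf("MAC check failed: content was corrupted in transit")
    }
    block, err := aes.NewCipher(msgKey)
    if err != nil {
        return nil, err
    }
    pt := make([]byte, len(body)-aes.BlockSize)
    cipher.NewCTR(block, body[:aes.BlockSize]).XORKeyStream(pt, body[aes.BlockSize:])
    return pt, nil
}

func main() {
    msgKey := make([]byte, 32)
    authKey := make([]byte, 32)
    rand.Read(msgKey)
    rand.Read(authKey)
    sealed, _ := sealContent([]byte("hello Bob"), msgKey, authKey)
    pt, err := openContent(sealed, msgKey, authKey)
    fmt.Println(string(pt), err)
}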

Algorithm 1: Stadium Communication Protocol
1. Noise generation: Servers generate noise messages. (Algorithm 4, §7)
2. Message input: Servers and users input messages to system. (Algorithm 5, §6.1)
3. Mixing I: Messages are mixed within input chain. (Algorithm 6, §6.2)
4. Distribution: Messages are distributed across output chains. (Algorithm 7, §6.3)
5. Mixing II: Messages are mixed within output chain with messages from all input chains. (Algorithm 6, §6.2)
6. Dead-drop exchange: Messages are exchanged at dead-drops of output chain. (Algorithm 8)
7. Message return: Messages are reversed through mixnet to users, denoted by dashed arrows. (Algorithm 8)

Figure 3: Stadium's communication protocol through 7 processing phases.
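Read as a skeleton, the seven phases above compose into a per-round driver along the following lines; the function names and types are placeholders standing in for the referenced algorithms, not Stadium's implementation.

package main

// Batch stands in for the (authentication piece, content piece) pairs a
// mixchain processes in one round; the real types are richer.
type Batch struct{ msgs [][]byte }

// runRound strings the seven phases of Algorithm 1 together; each phase
// function is a placeholder for the referenced algorithm.
func runRound() {
    noise := generateNoise()                // 1. Noise generation (Algorithm 4)
    batches := collectInput(noise)          // 2. Message input (Algorithm 5)
    mixed := mixInputChains(batches)        // 3. Mixing I (Algorithm 6)
    routed := distribute(mixed)             // 4. Distribution (Algorithm 7)
    remixed := mixOutputChains(routed)      // 5. Mixing II (Algorithm 6)
    swapped := exchangeAtDeadDrops(remixed) // 6. Dead-drop exchange (Algorithm 8)
    returnToUsers(swapped)                  // 7. Message return (Algorithm 8)
}

func generateNoise() Batch                  { return Batch{} }
func collectInput(noise Batch) []Batch      { return []Batch{noise} }
func mixInputChains(b []Batch) []Batch      { return b }
func distribute(b []Batch) []Batch          { return b }
func mixOutputChains(b []Batch) []Batch     { return b }
func exchangeAtDeadDrops(b []Batch) []Batch { return b }
func returnToUsers(b []Batch)               {}

func main() { runRound() }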

Figure 3: Stadium’s communication protocol through 7 processing phases. described in Algorithm 6. Although Stadium only uses the Figure 3): (1) Message input – tracking messages that en- technique for 2 mixing layers, it extends efficiently to more. ter the system, (2) Mixing – tracking messages as they are In Stadium, the purpose of hybrid verifiable shuffling is passed through a mixchain, and (3) Distribution – tracking to verifiably hold the routing metadata (both output chain messages as they are passed between mixchains. For each selection and dead-drop id) within the content piece. Even component, we list a set of properties that it aims to verify. though the integrity of the underlying communication mes- We argue that the component successfully verifies the in- sage is also protected through hybrid verifiable shuffling, this tended properties, and if a breach is detected, it is handled is a non-goal as it is still susceptible to corruption during safely without exposure of user communication metadata. message exchange and return. Complete security arguments can be found in our technical report [44]. Intra-mixchain verification. The message processing and 6 VERIFIABLE PROCESSING verification workloads of a server are restricted to the scope Verifiable processing serves two main purposes in Stadium. of its mixchain (with the exception of a few inexpensive The first is to prevent adversaries from directly learning operations in the verifiable distribution phase). Thus, the user communication metadata, e.g. through revealing output workload of a server scales with the fraction of messages chain selection or dead-drop id before properly mixing. The that flow through its mixchain. Designing mixchains toop- second is to prevent adversaries from partially learning user erate as self-verifying entities is a crucial component to Sta- communication metadata through system-wide traffic anal- dium’s horizontal scalability. For mixchains to self-verify ysis. This is accomplished by ensuring noise messages are themselves, each mixchain must contain at least one honest sent along their intended routes, upholding the guarantees server. We evaluate the plausibility of such a configuration given by the differential privacy analysis. in §10.3. In this section, we describe the three components that make up the verifiable processing pipeline (phases 2-5 in Stadium: A Distributed Metadata-Private Messaging System SOSP ’17, October 28, 2017, Shanghai, China
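To make the per-mixchain anytrust requirement concrete, consider a deployment in which an adversary controls a fraction f of all servers and each length-ℓ mixchain is formed by sampling servers independently at random (an independence assumption we make only for this illustration; the paper's own deployment analysis is in §10.3). The chain is entirely malicious with probability f^ℓ, so

Pr[at least one honest server in a mixchain] = 1 − f^ℓ,  e.g., f = 0.2 and ℓ = 5 give 1 − 0.2^5 ≈ 0.9997.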

Algorithm 2: Key Generation
1. Communication partners share symmetric key: Alice and Bob hold shared symmetric key k_AB ←$ KGen(1^n).
2. Servers' public keys: Each server s_i generates a public, private key pair (pk_{s_i}, sk_{s_i}) ←$ KGen(1^n) of the form (g^x, x) for ElGamal encryption.
3. Mixchains' public keys: Each mixchain mc_i = {s_1, ..., s_ℓ} has a public, private key pair derived from the member servers' key pairs; the private key is held in shares for threshold decryption: (pk_{mc_i}, sk_{mc_i}) ← (∏_{s ∈ mc_i} pk_s, ∑_{s ∈ mc_i} sk_s).

Algorithm 3: Message Encapsulation
(auth_input, c_input) = Encapsulate(i, x, d, m)
1. Select message path: A message path is designated by input chain i, output chain x, and dead-drop id d. For Alice talking to Bob, m = Enc(k_AB, plaintext).
2. Encrypt message for output chain: Encrypt the dead-drop id and message, (auth_output, c_output) = HybridShuffleEnc(mc_x, d ∥ m).
3. Encrypt message for input chain: Append output chain content and auth with output chain selection and encrypt, (auth_input, c_input) = HybridShuffleEnc(mc_i, x ∥ auth_output ∥ c_output).
(auth, c) = HybridShuffleEnc(mc, m)
1. Encrypt authentication piece: Generate random group element a ← Gen(). Hash the group element (with round nonce) to acquire the authentication key, k_auth = H(a ∥ nonce_round). Encrypt the group element with the mixchain's public key, auth ← Enc(pk_mc, a).
2a. Add MAC to message: Generate symmetric key k_m ← KGen(1^n). Hybrid encrypt the symmetric key with the mixchain's public key, c ← Enc(k_m, m) ∥ Enc(pk_mc, k_m). Add MAC to the encrypted message, c ← c ∥ MAC(c).
2b. Onion encrypt content piece: For every server s ∈ mc.reverse(): generate symmetric key k_s ← KGen(1^n), and onion encrypt c ← Enc(pk_s, k_s) ∥ Enc(k_s, c). (For communicating partners, k_s ← KGen(1^n, k_AB).)

Algorithm 4: Noise Generation
1. Generate single access noise: ∀ combinations of input chain i and output chain x, sample A^i_x ∼ Pois(λ_1). Generate A^i_x messages, sampling a random dead-drop id d_r for each one: Encapsulate(i, x, d_r, 0).
2. Generate double access noise: ∀ combinations of input chain pairs i, j and output chain x, sample B^{i,j}_x ∼ Pois(λ_2). Generate B^{i,j}_x pairs of messages, sampling a random dead-drop id d_r for each pair: Encapsulate(i, x, d_r, 0), Encapsulate(j, x, d_r, 0).

Algorithm 5: Message Input
For submitting messages to mixchain i:
1. Encapsulate message: (auth_input, c_input) = Encapsulate(i, x, d, m).
2. Create proof of knowledge: proof_know = NIZK_know(auth_input, a), where a is the group element used to create the authentication key in Algorithm 3.
3. Submit to input chain: Send (auth_input, c_input) to the first server in mc_i. Send proof_know to every server s ∈ mc_i.
(For servers submitting noise messages): Group all noise messages submitted to mc_i, N^(i) = (m^(1) ∥ ... ∥ m^(A^i_* + B^{i,*}_*)). Send a hash of the auths to every server s ∈ mc_i: H(N^(i)_auth).
For input chains:
1. Collect candidate batch: The first server in the chain collects the batch and orders it with noise from servers 1 ... m first, then users: B = [N^(1) ∥ ... ∥ N^(m) ∥ m^(u_1) ∥ ... ∥ m^(u_n)].
2a. Verify proofs of knowledge: Every server in the mixchain verifies the received proofs, ∀ auth^(i) ∈ B_auth: Vf_know(auth^(i), proof^(i)_know). Remove messages with duplicate proofs. If a user proof fails, remove the message. If a noise proof fails, halt.
2b. Verify noise messages included: Every server in the mixchain verifies with the received hashes, ∀ N^(i)_auth ∈ B_auth: check H(N^(i)_auth) against the received hash.

Algorithm 6: Hybrid Verifiable Shuffle
π(m^(1), ..., m^(n)) = HybridShuffle((auth, c)^(1), ..., (auth, c)^(n))
1. Permute messages: Each server, in turn, permutes the message batch B and passes the permuted batch to the next server in the mixchain.
1a. Permute authentication pieces with proof: Re-randomize every message using ElGamal malleability, multiplying by an encryption of 1, (g^r, pk_mc^r): B^r_auth ← B_auth. Sample a random permutation π and permute B^r_auth. Pass B′ = π(B^r_auth) to the next server. Send a proof of permutation to every server in the mixchain, proof_shuf = NIZK_shuf(B_auth, B′_auth, π).
1b. Permute content pieces: For each c = (c[0], c[1]) ∈ B_c, denoting parsing the concatenation from Algorithm 3, peel off an onion layer to create c^r ∈ B^r_c: k_s ← Dec(sk_s, c[0]), c^r ← Dec(k_s, c[1]). Pass B′ = π(B^r_c) to the next server.
2. Verify authentication shuffle proofs: Each server in the mixchain verifies the permutation of every other server s ∈ mc: Vf_shuf(B^(s)_auth, B′^(s)_auth, proof^(s)_shuf).
3. Threshold decrypt authentication keys: After all servers mix and all shuffle proofs verify, servers verifiably generate shares and threshold decrypt authentication keys, [a^(i)] = VfThreshDec(B′_auth), [k^(i)_auth] ← [H(a^(i))].
4. Verify content with authentication keys: Check the MAC for every c^(i) ∈ B′_c using k^(i)_auth.
4a. Trace message back if MAC fails: Each server s, in reverse order, reveals the relation for the failed message in π_s and provides verifiable decryption of the k_s used for onion encryption. If all proofs succeed, remove the message. Else, identify the erring server and halt.
4b. Threshold decrypt content: Once all MACs verify or failed MACs are removed, servers verifiably threshold decrypt the content, [k^(i)_m] = VfThreshDec(B′_c[0]), [m^(i)] ← Dec(k_m, B′_c[1]).

Algorithm 7: Verifiable Distribution
For input chains:
1. Group messages by output chain selection: ∀ output chain i ∈ 1 ... m, each server in the input chain groups messages destined for mc_i deterministically to form b^(i)_out. The last server in the input chain sends b^(i)_out to the first server of mc_i.
2. Send hashes to output chains: Each server in the input chain sends H(b^(i)_auth) to every server in output chain mc_i, ∀ output chain i ∈ 1 ... m.
For output chains:
1. Collect candidate batch: The first server in the chain collects b^(i)_in from each input mixchain and orders, B = [b^(1)_in ∥ ... ∥ b^(m)_in].
2a. Verify messages from input chains included: Every server in the mixchain verifies with the received hashes, ∀ b^(i)_auth ∈ B_auth: check H(b^(i)_auth) against the received hashes.

Algorithm 8: Message Exchange and Return
1. Dead-drop exchange: Given an array of messages [m^(i)] and a corresponding array of dead-drops [d^(i)], exchange the array indices of messages m^(i), m^(j) if d^(i) = d^(j).
2. Return messages through mixnet: Reverse mixing: in reverse order, mixchain servers apply the inverse permutation π^{-1} and randomize with k_s from Algorithm 6, [m^(i)] ← [Enc(k_s, m^(π(i)))]. Reverse distribution by sending messages back to input mixchains based on the indices of the message batch. Return messages to users similarly.

Figure 4: Stadium’s communication round subprotocols. SOSP ’17, October 28, 2017, Shanghai, China Tyagi et al.
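Two operations that recur in Algorithms 2 and 6 — multiplying servers' ElGamal public keys into a mixchain key, and re-randomizing a ciphertext by multiplying in a fresh encryption of 1 before permuting — can be sketched as follows. This is our own toy illustration over a small prime-order subgroup; Stadium's actual group, libraries, and threshold-decryption machinery are not shown.

package main

import (
    "crypto/rand"
    "fmt"
    "math/big"
)

// Toy group parameters: p is a safe prime, g generates the order-q subgroup.
var (
    p = big.NewInt(23) // p = 2q+1 with q = 11 (toy size, illustration only)
    q = big.NewInt(11)
    g = big.NewInt(4) // 4 has order 11 mod 23
)

type Ciphertext struct{ C1, C2 *big.Int }

// keyGen returns an ElGamal key pair (g^x, x), as in Algorithm 2 step 2.
func keyGen() (pk, sk *big.Int) {
    sk, _ = rand.Int(rand.Reader, q)
    return new(big.Int).Exp(g, sk, p), sk
}

// mixchainKey multiplies the servers' public keys, so the matching secret key
// is the (shared, never reconstructed) sum of the servers' secret keys.
func mixchainKey(pks []*big.Int) *big.Int {
    acc := big.NewInt(1)
    for _, pk := range pks {
        acc.Mul(acc, pk).Mod(acc, p)
    }
    return acc
}

// encrypt produces a standard ElGamal ciphertext (g^r, m * pk^r).
func encrypt(pk, m *big.Int) Ciphertext {
    r, _ := rand.Int(rand.Reader, q)
    return Ciphertext{
        C1: new(big.Int).Exp(g, r, p),
        C2: new(big.Int).Mod(new(big.Int).Mul(m, new(big.Int).Exp(pk, r, p)), p),
    }
}

// rerandomize multiplies componentwise by a fresh encryption of 1, exactly the
// malleability used in Algorithm 6 step 1a before permuting the batch.
func rerandomize(pk *big.Int, ct Ciphertext) Ciphertext {
    one := encrypt(pk, big.NewInt(1))
    return Ciphertext{
        C1: new(big.Int).Mod(new(big.Int).Mul(ct.C1, one.C1), p),
        C2: new(big.Int).Mod(new(big.Int).Mul(ct.C2, one.C2), p),
    }
}

func main() {
    pk1, _ := keyGen()
    pk2, _ := keyGen()
    pkChain := mixchainKey([]*big.Int{pk1, pk2})
    ct := encrypt(pkChain, big.NewInt(9)) // 9 lies in the subgroup (toy message)
    fmt.Println("before:", ct.C1, ct.C2)
    fmt.Println("after: ", rerandomize(pkChain, ct))
}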

6.1 Message Input
Two types of messages are included in a message batch to enter a mixchain, noise and user messages. The properties that are verified in the message input protocol are as follows:
• The submitting party knows the routing metadata of the submitted message (output chain selection and dead-drop id).

length ℓ mixchain, each server generates one zero-knowledge proof and verifies ℓ − 1 proofs.
The authentication piece is permuted and verified with a zero-knowledge shuffle proof [3]. The content piece is permuted with the same permutation and verified at the end of the mixchain using authentication keys.
Verifying hybrid content. If the zero-knowledge shuffle proofs of the authentication