Talek: Private Group Messaging with Hidden Access Patterns

Talek: Private Group Messaging with Hidden Access Patterns Raymond Cheng William Scott Elisaweta Masserova University of Washington University of Washington Carnegie Mellon University [email protected] [email protected] [email protected] Irene Zhang Vipul Goyal Thomas Anderson Microsoft Research Carnegie Mellon University University of Washington [email protected] [email protected] [email protected] Arvind Krishnamurthy Bryan Parno University of Washington Carnegie Mellon University [email protected] [email protected] Abstract 1 Introduction Talek is a private group messaging system that sends messages Messaging applications depend on cloud servers to exchange data, through potentially untrustworthy servers, while hiding both data giving server operators full visibility into the communication pat- content and the communication patterns among its users. Talek terns between users. Even if the communication contents are en- explores a new point in the design space of private messaging; it crypted, network metadata can be used to infer which users share guarantees access sequence indistinguishability, which is among messages, when traffic is sent, where data is sent, and how muchis the strongest guarantees in the space, while assuming an anytrust transferred. This can allow the servers and/or network providers threat model, which is only slightly weaker than the strongest to infer the contents of the communication [53]. When remote threat model currently found in related work. Our results suggest hacking, insider threats, and government requests are common, that this is a pragmatic point in the design space, since it supports protecting the privacy of communications requires that we guaran- strong privacy and good performance: we demonstrate a 3-server tee security against a stronger threat model. For some users, e.g., Talek cluster that achieves throughput of 9,433 messages/second for journalists and activists, protecting metadata is critical to their job 32,000 active users with 1.7-second end-to-end latency. To achieve function and safety [68, 69]. its security goals without coordination between clients, Talek relies As we describe in §10, a wide variety of systems explore ways on information-theoretic private information retrieval. To achieve of protecting the privacy of such metadata. We can classify this good performance and minimize server-side storage, Talek intro- prior work into two groups based on the privacy guarantees of- duces new techniques and optimizations that may be of independent fered and the threat model each system defends against. The first interest, e.g., a novel use of blocked cuckoo hashing and support for group of work [8–10, 19, 46, 47, 50, 72] offers strong security guar- private notifications. The latter provide a private, efficient mech- antees against very strong threat models (e.g., assuming that only anism for users to learn, without polling, which logs have new the clients themselves are trusted). Unfortunately, this typically messages. imposes prohibitive computational or network costs. The second CCS Concepts group [2, 4, 28, 29, 31, 58–60, 63, 76, 79, 87, 89, 90] offers weaker security guarantees (such as k-anonymity [85], plausible deniabil- • Security and privacy ! Pseudonymity, anonymity and un- ity [52] or differential privacy [39, 40]) and often much weaker traceability; Privacy-preserving protocols. threat models too; e.g., a fraction of the servers must be honest. Keywords However, in exchange for weakening the guarantees and threat privacy, anonymity, messaging model, these systems often achieve impressive performance results. In this work, we explore an intriguing middle ground: we define arXiv:2001.08250v3 [cs.CR] 16 Dec 2020 ACM Reference Format: access sequence indistinguishability, a notion similar to (but slightly Raymond Cheng, William Scott, Elisaweta Masserova, Irene Zhang, Vipul stronger than) the security guarantees from systems in the first Goyal, Thomas Anderson, Arvind Krishnamurthy, and Bryan Parno. 2020. Talek: Private Group Messaging with Hidden Access Patterns. In Annual group, and combine it with an “anytrust” threat model [91], which Computer Security Applications Conference (ACSAC 2020), December 7–11, is slightly weaker than the threat models of the first group, but 2020, Austin, USA. ACM, New York, NY, USA, 16 pages. https://doi.org/10. stronger than those of the second group. Intuitively, the anytrust 1145/3427228.3427231 threat model assumes that different organisations, e.g., Mozilla, EFF, and WikiLeaks, each provide servers and at least one (unknown) Permission to make digital or hard copies of part or all of this work for personal or organisation can be trusted. The hope is that we can promise the classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation strong guarantees of the first group and the strong performance of on the first page. Copyrights for third-party components of this work must be honored. the second, while defending against a strong threat model. For all other uses, contact the owner/author(s). To explore this design point, we construct Talek, a private com- ACSAC 2020, December 7–11, 2020, Austin, USA munication system targeting small groups of trusted users com- © 2020 Copyright held by the owner/author(s). ACM ISBN 978-1-4503-8858-0/20/12. municating amongst themselves (e.g., friends chatting via IRC or https://doi.org/10.1145/3427228.3427231 ACSAC 2020, December 7–11, 2020, Austin, USA R. Cheng, W. Scott, E. Masserova, I. Zhang, V. Goyal, T. Anderson, A. Krishnamurthy, andB.Parno text messaging). To support such applications, Talek offers the ab- and writer anonymity for other logs are preserved. Applications straction of a private log with a single writer and multiple readers. that require broadcasts to many untrusted users (e.g., a public blog) Clients store and retrieve asynchronous messages on untrusted are better served by anonymous broadcast systems [28, 29, 31, 90]. servers without revealing any communication metadata. If the We have implemented two versions of Talek, one entirely in Go group of friends and at least one server are uncompromised, Talek and one that offloads PIR operations to a GPU; our code is publicly prevents an adversary from learning anything about their commu- available. We evaluated the system on a 3-server deployment using nication patterns. Combined with standard message encryption, Amazon EC2. To provide a realistic group messaging workload, we Talek conceals both the contents and metadata of clients’ application replay the Ubuntu IRC message logs from 2016 [3]. Overall, we find usage without sacrificing cloud reliability and availability. that this design point is surprisingly practical. Even with 32,000 Similar to prior systems, to hide communication patterns, Talek clients actively reading and writing messages according to a fixed clients issue fix-sized, random-looking network requests at a rate schedule every second, we show that clients use 148MB per day which is independent of application-level requests. Hence, application- to achieve an average end-to-end message latency of 1.7 seconds, level requests must occasionally be delayed, and “dummy” network measured from the time a sender enters a message to the time requests must be issued when no application requests are ready. the recipient sees it. Under this workload, our server supports a As with any privacy system, careful application-specific tuning is peak throughput of 9,433 messages per second, orders of magnitude necessary to trade off between the amount of cover traffic sentand better performance than systems with similar security goals. the latency of real application requests. In summary, we make the following contributions. Unlike prior systems with Talek’s strong guarantees, to achieve (1) Talek, a system that explores an important design point within good performance, Talek leverages the anytrust threat model, which private group messaging with strong guarantees, strong threat allows us to use information theoretic private information retrieval model, and high performance. (IT-PIR) [27, 35, 45]. IT-PIR requires an anytrust assumption, but (2) A novel use of blocked cuckoo hashing for IT-PIR. in exchange it avoids the use of heavyweight crypto operations (3) Private notifications which privately encode the set ofnew required for other flavors of PIR [57]. Abstractly, PIR allows a client messages, helping clients prioritize reads. to retrieve the 8-th record from a “database” of = items held col- (4) Two open-source implementations of Talek exploring the trade- lectively by ; servers, without the servers learning which record offs between CPU and GPU-based computation. was retrieved. However, PIR alone (of any flavor) is not enough to 2 Background: PIR support efficient group messaging. In particular, it does not explain how messages are privately written to the servers, how readers find Talek uses the privacy guarantees of PIR in the context of a group messages sent to them, nor how to structure the PIR database to messaging protocol. PIR allows a single client to retrieve a block facilitate maximum efficiency. from a set of storage replicas without revealing to any server the In Talek, within a group of clients reading and writing to a mes- blocks of interest to the client. There exist two major categories of sage log, a shared secret determines a pseudorandom, deterministic PIR techniques,

Load more