XRD: Scalable Messaging System with Cryptographic Privacy
Total Page:16
File Type:pdf, Size:1020Kb
XRD: Scalable Messaging System with Cryptographic Privacy Albert Kwon David Lu Srinivas Devadas MIT MIT PRIMES MIT Abstract As such, many recent messaging systems have been tar- geting scalability as well as formal security guarantees. Sys- Even as end-to-end encrypted communication becomes tems like Stadium [52] and Karaoke [37], for instance, use more popular, private messaging remains a challenging prob- differential privacy [22] to bound the information leakage lem due to metadata leakages, such as who is communicat- on the metadata. Though this has allowed the systems to ing with whom. Most existing systems that hide commu- scale to more users with better performance, both systems nication metadata either (1) do not scale easily, (2) incur leak a small bounded amount of metadata for each message, significant overheads, or (3) provide weaker guarantees than and thus have a notion of “privacy budget”. A user in these cryptographic privacy, such as differential privacy or heuris- systems then spends a small amount of privacy budget ev- tic privacy. This paper presents XRD (short for Crossroads), ery time she sends a sensitive message, and eventually is not a metadata private messaging system that provides crypto- guaranteed strong privacy. Users with high volumes of com- graphic privacy, while scaling easily to support more users munication could quickly exhaust this budget, and there is no by adding more servers. At a high level, XRD uses multiple clear mechanism to increase the privacy budget once it runs mix networks in parallel with several techniques, including out. Scalable systems that provide stronger cryptographic a novel technique we call aggregate hybrid shuffle. As a re- privacy like Atom [34] or Pung [5], on the other hand, do sult, XRD can support 2 million users with 228 seconds of not have such a privacy budget. However, they rely heav- latency with 100 servers. This is 13:3× and 4× faster than ily on expensive cryptographic primitives such as public key Atom and Pung, respectively, which are prior scalable mes- encryption and private information retrieval [3]. As a result, saging systems with cryptographic privacy. they suffer from high latency, in the order of ten minutes or 1 Introduction longer for a few million users, which impedes their adoption. This paper presents a point-to-point metadata private mes- Many Internet users today have turned to end-to-end en- saging system called XRD that aims to marry the best aspects crypted communication like TLS [18] and Signal [40], to of prior systems. Similar to several recent works [5, 34, 52], protect the content of their communication in the face of XRD scales with the number of servers. At the same time, widespread surveillance. While these techniques are starting the system cryptographically hides all communication meta- to see wide adoption, they unfortunately do not protect the data from an adversary who controls the entire network, a metadata of communication, such as the timing, the size, and constant fraction of the servers, and any number of users. the identities of the end-points. In scenarios where the meta- Consequently, it can support virtually unlimited amount of data are sensitive (e.g., a government officer talking with a communication without leaking privacy against such an ad- journalist for whistleblowing), encryption alone is not suffi- versary. Moreover, XRD only uses cryptographic primitives cient to protect users’ privacy. that are significantly faster than the ones used by prior works, Given its importance, there is a rich history of works and can thus provide lower latency and higher throughput that aim to hide the communication metadata, starting with than prior systems with cryptographic security. mix networks (mix-nets) [10] and dining-cryptographers net- A XRD deployment consists of many servers. These works (DC-Nets) [11] in the 80s. Both works provide for- servers are organized into many small chains, each of which mal privacy guarantees against global adversaries, which acts as a local mix-net. Before any communication, each user has inspired many systems with strong security guaran- creates a mailbox that is uniquely associated with her, akin tees [14, 55, 35, 53]. However, mix-nets and DC-nets re- to an e-mail address. In order for two users Alice and Bob to quire the users’ messages to be processed by either central- have a conversation in the system, they first pick a number ized servers or every user in the system, making them dif- of chains using a specific algorithm that XRD provides. The ficult to scale to millions of users. Systems that build on algorithm guarantees that every pair of users intersects at one them typically inherit the scalability limitation as well, with of the chains. Then, Alice and Bob send messages addressed overheads increasing (often superlinearly) with the number to their own mailboxes to all chosen chains, except to the of users or servers [14, 55, 35, 53]. For private communi- chain where their choices of chains align, where they send cation systems, however, supporting a large user base is im- their messages for each other. Once all users submit their perative to providing strong security; as aptly stated by prior messages, each chain shuffles and decrypts the messages, works, “anonymity loves company” [20, 49]. Intuitively, the and forwards the shuffled messages to the appropriate mail- adversary’s goal of learning information about a user natu- boxes. Intuitively, XRD protects the communication meta- rally becomes harder as the number of users increases. data because (1) every pair of users is guaranteed to meet at a chain which makes it equally likely for any pair of users to 2 Related work be communicating, and (2) the mix-net chains hide whether In this section, we discuss related work by categorizing a user sent a message to another user or herself. the prior systems primarily by their privacy properties, and One of the main challenges of XRD is addressing active at- also discuss the scalability and performance of each system. tacks by malicious servers, where they tamper with some of Systems with cryptographic privacy. Mix-nets [10] and the users’ messages. This challenge is not new to our system, DC-Nets [11] are the earliest examples of works that provide and several prior works have employed expensive crypto- cryptographic (or even information theoretic) privacy guar- graphic primitives like verifiable shuffle [14, 55, 35, 52, 34] antees against global adversaries. Unfortunately, they have or incurred significant bandwidth overheads [34] to prevent two major issues. First, they are weak against active attack- such attacks. In XRD, we instead propose a new technique ers: adversaries can deanonymize users in mix-nets by tam- called aggregate hybrid shuffle that can verify the correctness pering with messages, and can anonymously deny service in of shuffling more efficiently than traditional techniques. DC-Nets. Second, they do not scale to large numbers of users XRD has two significant drawbacks compared to priorp sys- because all messages must be processed by either a small tems. First, with N servers, each user must send O( N) number of servers or every user in the system. Many systems messages in order to ensure that every pairp of users inter- that improved on the security of these systems against active sects. Second, because each user sends Op( N) messages, attacks [14, 55, 35, 53] suffer from similar scalability bottle- the workload of each XRD server is O(M= N) for M users, necks. Riposte [13], a system that uses “private information rather than O(M=N) like many prior scalable messaging sys- storage” to provide anonymous broadcast, also requires all tems [34, 52, 37]. Thus, prior systems could outperform servers to handle a number of messages proportional to the XRD in deployment scenarios with large numbers of servers number of users, and thus faces similar scalability issues. and users, since the cost of adding a single user is higher and A recent system Atom [34] targets both scalability and adding servers is not as beneficial in XRD. strong anonymity. Specifically, Atom can scale horizon- Nevertheless, our evaluation suggests that XRD outper- tally, allowing it to scale to larger numbers of users simply forms prior systems with cryptographic guarantees if there by adding more servers to the network. At the same time, are less than a few thousand servers in the network. XRD it provides sender anonymity [46] (i.e., no one, including can handle 2 million users (comparable to the number of the recipients, learns who sent which message) against an daily Tor users [1]) in 228 seconds with 100 servers. For adversary that can compromise any fraction of the servers Atom [34] and Pung [5, 4], two prior scalable messaging and users. However, Atom employs expensive cryptography, systems with cryptographic privacy, it would take over 50 and requires the message to be routed through hundreds of minutes and 15 minutes, respectively. (These systems, how- servers in series. Thus, Atom incurs high latency, in the or- ever, can defend against stronger adversaries, as we detail in der of tens of minutes for a few million users. x2 and x7.) Moreover, the performance gap grows with more Pung [5, 4] is a system that aims to provide metadata pri- users, and we estimate that Atom and Pung require at least vate messaging between honest users with cryptographic pri- 1,000 servers in the network to achieve comparable latency vacy. This is a weaker notion of privacy than that of Atom, with 2 million or more users. While promising, we find that as the recipients (who are assumed to be honest) learn the XRD is not as fast as systems with weaker security guaran- senders of the messages.