PIR-Tor: Scalable Anonymous Communication Using Private Information Retrieval ∗
Total Page:16
File Type:pdf, Size:1020Kb
PIR-Tor: Scalable Anonymous Communication Using Private Information Retrieval ∗ Prateek Mittal1 Femi Olumofin2 Carmela Troncoso3 Nikita Borisov1 Ian Goldberg2 1University of Illinois at Urbana-Champaign 2University of Waterloo 3K.U.Leuven/IBBT 1308 West Main Street 200 University Ave W Kasteelpark Arenberg 10 Urbana, IL, USA Waterloo, ON, Canada 3001 Leuven Belgium fmittal2,[email protected] ffgolumof,[email protected] [email protected] Abstract consists of about 2 000 relays and currently serves hun- dreds of thousands of users a day [45]. Tor is widely used Existing anonymous communication systems like Tor do by whistleblowers, journalists, businesses, law enforce- not scale well as they require all users to maintain up-to- ment and government organizations, and regular citizens date information about all available Tor relays in the sys- concerned about their privacy [46]. tem. Current proposals for scaling anonymous commu- Tor requires each user to maintain up-to-date infor- nication advocate a peer-to-peer (P2P) approach. While mation about all available relays in the network (global the P2P paradigm scales to millions of nodes, it pro- view). As the number of relays and clients increases, vides new opportunities to compromise anonymity. In the cost of maintaining this global view becomes pro- this paper, we step away from the P2P paradigm and ad- hibitively expensive. In fact, McLachlan et al. [22] vocate a client-server approach to scalable anonymity. showed that in the near future the Tor network could be We propose PIR-Tor, an architecture for the Tor net- spending more bandwidth for maintaining a global view work in which users obtain information about only a few of the system than for anonymous communication itself. onion routers using private information retrieval tech- Existing approaches to improving Tor’s scalability ad- niques. Obtaining information about only a few onion vocate a peer-to-peer approach. While the peer-to-peer routers is the key to the scalability of our approach, while paradigm scales to millions of relays, it also provides the use of private retrieval information techniques helps new opportunities for attack. The complexity of the de- preserve client anonymity. The security of our architec- signs makes it difficult for the authors to provide rigorous ture depends on the security of PIR schemes which are proofs of security. The result is that the security commu- well understood and relatively easy to analyze, as op- nity has been very successful at breaking the state-of-art posed to peer-to-peer designs that require analyzing ex- peer-to-peer anonymity designs [4, 6, 7, 23, 47, 48]. tremely complex and dynamic systems. In particular, we In this paper, we step away from the peer-to-peer demonstrate that reasonable parameters of our architec- paradigm and propose PIR-Tor, a scalable client-server ture provide equivalent security to that of the Tor net- approach to anonymous communication. The key obser- work. Moreover, our experimental results show that the vation motivating our architecture is that clients require overhead of PIR-Tor is manageable even when the Tor information about only a few relays (3 in the current network scales by two orders of magnitude. Tor network) to build a circuit for anonymous commu- nication. Currently, clients download the entire database 1 Introduction of relays to protect their anonymity from compromised directory servers. In our proposal, on the other hand, As more of our daily activities shift online, the issue of clients use private information retrieval (PIR) techniques user privacy comes to the forefront. Anonymous com- to download information about only a few relays. PIR munication is a privacy enhancing technology that en- prevents untrusted directory servers from learning any ables a user to communicate with a recipient without re- information about the clients’ choices of relays, and thus vealing her identity (IP address) to the recipient or a third mitigates route fingerprinting attacks [6, 7]. party (for example, Internet routers). Tor [10] is a de- We consider two architectures for PIR-Tor, based on ployed network for anonymous communication, which the use of computational PIR and information-theoretic PIR, and evaluate their performance and security. We ∗An extended version of this paper is available [26]. find that for the creation of a single circuit, the archi- tecture based on computational PIR provides an order plications of clients not having the global system view, of magnitude improvement over a full download of all and show that reasonable parameters of PIR-Tor provide descriptors, while the information-theoretic architecture equivalent security to Tor. provides two orders of magnitude improvement over a full download. However, in the scenario where clients 2.1 Distributed hash table based architec- wish to build multiple circuits, several PIR queries must be performed and the communication overhead of the tures computational PIR architecture quickly approaches that Distributed hash tables (DHTs), also known as struc- of a full download. In this case, we propose to perform tured peer-to-peer topologies, assign neighbor relation- only a few PIR queries and reuse their results for cre- ships using a pseudorandom but deterministic mathemat- ating multiple circuits, and discuss the security implica- ical formula based on IP addresses or public keys of tions of the same. On the other hand, for the information- nodes. theoretic architecture, we find that even with multiple cir- Salsa [29] is built on top of a DHT, and uses a spe- cuits, the communication overhead is at least an order of cially designed secure lookup operation to select random magnitude smaller than a full download. It is therefore relays in the network. The secure lookups use redundant feasible for clients to perform a PIR query for each de- checks to mitigate attacks that try to bias the result of the sired circuit. In particular, we show that, subject to cer- lookup. However, Mittal and Borisov [23] showed that tain constraints, this results in security equivalent to the Salsa is vulnerable to information leak attacks: as the at- current Tor network. With our improvements, the Tor tackers can observe a large fraction of the lookups in the network can easily sustain a 10-fold increase in both re- system, a node’s selection of relays is no longer anony- lays and clients. PIR-Tor also enables a scenario where mous and this observation can be used to compromise all clients convert to middle-only relays, improving the user anonymity [6,7]. Salsa is also vulnerable to a selec- security and the performance of the Tor network [9]. tive denial-of-service attack, where nodes break circuits The remainder of this paper is organized as follows. that they cannot compromise [4, 47]. We discuss related work in Section 2. We present a brief Panchenko et al. proposed NISAN [35] in which overview of Tor and private information retrieval in Sec- information-leak attacks are mitigated by a secure iter- tion 3. In Section 4, we give an overview of our system ative lookup operation with built-in anonymity. The se- architecture, and present the full protocol in Section 5. cure lookup operation uses redundancy to mitigate active We discuss the traffic analysis implications of our archi- attacks, but hides the identity of the lookup destination tecture in Section 6. Sections 7 and 8 contain our perfor- from the intermediate nodes by downloading the entire mance evaluation for the computational and information- routing table of the intermediate nodes and processing theoretic PIR proposals respectively. We discuss the the lookup operation locally. However, Wang et al. [48] ramifications of our design in Section 9, and finally con- were able to drastically reduce the lookup anonymity by clude in Section 10. taking into account the structure of the topology and the deterministic nature of the paths traversed by the lookup 2 Related Work mechanism. Torsk, introduced by McLachlan et al. [22], uses secret In contrast to our client-server approach, prior work buddy nodes to mitigate information leak attacks. Instead mostly advocates a peer-to-peer approach for scalable of performing a lookup operation themselves, nodes can anonymous communication. We can categorize existing instruct their secret buddy nodes to perform the lookup work on peer-to-peer anonymity into architectures that on their behalf. Thus, even if the lookup process is not are based on random walks on unstructured or structured anonymous, the adversary will not be able to link the topologies, and architectures that use a lookup operation node with the lookup destination (since the relationship in a distributed hash table. between a node and its buddy is a secret). However, the Besides these peer-to-peer approaches Mittal et aforementioned work of Wang et al. [48] also showed al. [25] briefly considered the idea of using PIR queries some vulnerabilities in the mechanism for obtaining se- to scale anonymous communication. However, their de- cret buddy nodes. scription was not complete, and their evaluation was very preliminary. In this paper, we build upon their work and 2.2 Random walk based architectures present a complete system architecture based on PIR. In contrast to prior work, we also consider the use of In MorphMix [38] the scalability problem in Tor is al- information-theoretic PIR, and show that it outperforms leviated by organizing relays in an unstructured peer-to- computational PIR based Tor architecture in many scal- peer overlay, where each relay has knowledge of only a ing scenarios. We also provide an analysis of the im- few other relays in the system. For building circuits, an initiator performs a random walk by first selecting a ran- of clients, and carries terabytes of traffic per day [45]. dom neighbor and building an onion routing circuit to The network is comprised of approximately 2 000 relays it.