<<

TorPolice: Towards Enforcing Service-Defined Access Policies for Anonymous Communication in the Network

Zhuotao Liu∗, Yushan Liu†, Philipp Winter†, Prateek Mittal†, Yih-Chun Hu∗ ∗University of Illinois at Urbana-Champaign, †Princeton University ∗{zliu48, yihchun}@illinois.edu, †{yushan, pwinter, pmittal}@princeton.edu

Abstract—Tor is the most widely used network, of circuit creation requests from is a heavy burden currently serving millions of users each day. However, there on Tor relays, causing significant performance degradation is no access control in place for all these users, leaving the for legitimate Tor users (e.g., frequent Tor circuit failures). network vulnerable to abuse and attacks. For example, criminals frequently use exit relays as stepping stones for attacks, Other types of botnet abuse include paralyzing Tor relays via causing service providers to serve CAPTCHAs to exit relay IP relay flooding attacks [4, 5] and performing large-scale traffic addresses or blacklisting them altogether, which leads to severe analysis via throughput or congestion fingerprinting [23, 25]. usability issues for legitimate Tor users. To address this problem, Contributions. In this paper, we present TorPolice, the first we propose TorPolice, the first -preserving access control privacy-preserving access control framework for the Tor net- framework for Tor. TorPolice enables abuse-plagued service providers such as Yelp to enforce access rules to police and work. Leveraging cryptographically computed network capa- throttle malicious requests coming from Tor while still providing bilities, TorPolice enables service providers to define access service to legitimate Tor users. Further, TorPolice equips Tor policies for Tor connections, allowing them to throttle Tor- with global access control for relays, enhancing Tor’s resilience emitted abuse while still serving legitimate Tor users. Thus, to botnet abuse. We show that TorPolice preserves the privacy TorPolice offers a more viable alternative to abuse-plagued of Tor users, implement a prototype of TorPolice, and perform extensive evaluations to validate our design goals. service providers than simply blocking all Tor connections. Further, TorPolice improves the Tor network’s resilience to I.INTRODUCTION various botnet abuses by enabling global access control for Tor relays. Crucially, TorPolice achieves these benefits while Counting almost two million daily users, the Tor network still retaining Tor’s anonymity guarantees. is among the most popular digital privacy tools. As of May TorPolice’s design introduces a set of fully distributed and 2017, the network consists of over 7,000 volunteer-run relays, partially trusted access authorities (AAs) to manage and carrying over 100 Gbps of traffic [2]. Tor clients build a path certify capabilities. To request capabilities from AAs, Tor (also known as Tor circuit) consisting of three relays (guard, clients must first obtain anonymous capability seeds which middle and exit) to reach service providers such as Yelp are types of resources that are costly to scale. Both service or WikiLeaks. Tor is used by law enforcement, intelligence providers and the Tor network provide differentiated service agencies, political dissidents, journalists, whistle-blowers, and to Tor clients that possess valid capabilities so to enforce ordinary citizens to enhance their online privacy [3]. self-defined access rules. The AAs generate capabilities using Today’s Tor network does not implement any access control blind signatures [6] to break the linkability between capability mechanism, meaning that anyone with a Tor client can use requesting and capability spending. We conduct a rigorous the network without limitation. While the lack of access security analysis to prove that TorPolice does not weaken control fosters network growth, it has also caused various privacy guarantees offered by the current Tor network. problems, most importantly botnet abuse [15]. In practice, We implement a prototype of TorPolice to demonstrate botnets use Tor to attack third-party services, spam comment its practicality and evaluate the prototype extensively on our sections on websites, scrape content, and scan services for testbed, in the Shadow simulator [17], and via simulations. Our vulnerabilities [27]. In response, many service providers and results show that TorPolice can effectively enforce service- content delivery networks (CDNs) have started to treat Tor selected access policies and mitigate large-scale botnet abuses users as “second-class” Web citizens [19], by either forcing against Tor at the cost of negligible overhead. Tor users to solve numerous CAPTCHAs or blocking Tor exit relay IP addresses altogether. II.PROBLEM FORMULATION Another type of botnet-related abuse of Tor arises from A. Tor Background command and control (C&C) servers run as Tor onion services Tor clients anonymously connect to service providers (e.g., (used be to known as hidden services) [12, 15]. In the WikiLeaks) by building three-hop circuits consisting of a past, such events caused a rapid spike in the number of guard, middle, and exit relay. Tor’s use of layered Tor clients [15]. Besides the reputational issue of Tor being ensures that each relay only knows the identities of its direct associated with botnet infrastructure, the massive number neighbors (i.e., the previous and next hop in the circuit). 978-1-5090-6501-1/17/$31.00 c 2017 IEEE Clients randomly select these relays, weighted by the relays’ bandwidth and their positions in the circuit. A list of all AA1 AA2 TTP AA Tor relays—the network consensus—is published hourly by … Fully-distributed AAs a set of nine globally-distributed directory authorities that 1. Capability 5. Site-Specific are run by volunteers trusted by . While the Seed 2. Pre-capabilities directory authorities and guard relays learn a Tor client’s Capability Verification i.e., Tor TorPolice-enhanced network identity ( its IP address), they cannot observe the Client Service Provider client’s online activity. Exit relays, however, can monitor the 3. Site-Specific 4. Site-Specific Tor Network client’s activity, but do not know its identity. Tor’s anonymity Capability Computation capability Spending Tor Network stems from unlinking network identity from activity. 4. Relay-Specific Further, Tor allows servers to host their service anony- Tor Capability Spending Vanilla Client Service Providers mously over Tor onion services (OS). Once an OS is set up, it 3. Relay-Specific 5. Relay-Specific creates circuits to at least three relays serving as its introduc- Capability Computation Capability Verification tion points (IPs). Then, the OS publishes its descriptor—which Fig. 1: The architecture of TorPolice. A Tor client (step 1) contains the IPs—to a distributed hash table that consists of anonymously sends its capability seed to an randomly selected a subset of all Tor relays. To connect to the OS, a Tor client AA to request pre-capabilities (step 2), based on which the client first fetches the OS’s descriptor using its onion address, and computes site(relay)-specific capabilities (step 3). The client then then builds two circuits: one to an IP and another one to a spends capabilities on either service access or Tor circuit creation (step 4). The capability recipients validate capabilities before randomly-selected relay called the rendezvous point (RP). The allowing services (step 5). client instructs the IP to send the identity of the RP to the OS, which then creates a circuit to the RP to be able to finally or a trusted third party. Since Tor clients are free to choose communicate with the client. any AA to request capabilities using their capability seeds, no single AA has a global view on all Tor clients. Further, B. Design Goals each AA is only partially trusted and a service provider can TorPolice adds access control to the anonymous communi- blacklist any misbehaving or compromised AA. cation in Tor, benefiting both service providers and the Tor Incrementally Deployable. TorPolice must be incremen- network. Different from prior capability based schemes [20, tally deployable. Up-to-date Tor clients, relays, and service 21, 26, 31], TorPolice’s design needs to address a unique providers can benefit from a partially-deployed TorPolice im- combination of the following three challenges: (i) preserving mediately while outdated entities can continue their operations. Tor’s anonymity guarantees, (ii) avoiding central points of Elided Design Goals. Various attacks seek to break Tor’s control, and (iii) being incrementally deployable. unlinkability. For instance, an AS-level adversary may de- Service-defined Access Policies. Project Honey Pot lists anonymize a Tor user’s activities if the adversary nearly 70% of all Tor exit relays as comment spammers [27], can monitor both ingress and egress traffic [30]. TorPolice is causing many service providers and CDNs to block and filter not designed to mitigate attacks on unlinkability. Instead, we traffic originating from the Tor network. To reduce this tension preserve the unlinkability currently offered by Tor. between Tor users and service providers, TorPolice must allow service providers to define and enforce access rules for Tor C. Adversary Model and Assumptions connections, allowing them to effectively throttle Tor-emitted We consider a Byzantine adversary that deviates from our abuse while still serving legitimate Tor users. protocol and abuses Tor in arbitrary ways. The adversary can Mitigate Botnet Abuse Against Tor. Being a service provider use Tor to abuse third-party services, e.g., by scraping content, itself, the Tor network is also subject to botnet abuse, such comments, and scanning for vulnerabilities. The C&C servers hosted as onion services, and (D)DoS attacks adversary may also abuse the Tor network directly, e.g., by against (select) relays. TorPolice allows the Tor network to using Tor OSes as C&C servers, performing traffic analysis, control the network usage of Tor clients, making it possible or launching (D)DoS attacks against Tor relays. The adversary to throttle the abuse. In contrast to local rate limiting by each may control many bots, and hence a significant amount of relay, TorPolice’s access control mechanism is global, meaning resources. The bots can act passively (e.g., monitor Tor traffic) that an adversary cannot circumvent our defense by simply or actively (e.g., spoof and manipulate packets). We assume connecting to all Tor relays. that the AAs are well-connected to the Internet backbone so Preserving Tor User Privacy. TorPolice must not degrade that volumetric DDoS attacks against the whole set of AAs can Tor’s anonymity guarantees. While we add a new layer of be mitigated. Tor’s existing directory authorities are subject functionality to Tor (access control), this layer—like Tor to the same assumption. In practice, one way to assure this itself—unlinks a client’s identity from its activity. assumption is relying on DDoS prevention vendors [21]. Fully Distributed and Partially Trusted Authorities. In accordance with Tor’s design philosophy of distributing trust, III.DESIGN OVERVIEW TorPolice relies on a set of fully distributed and partially- In a nutshell, TorPolice is a generic access control frame- trusted access authorities (AAs) to manage capabilities. An work based on capabilities. TorPolice enables both service AA is operated either by the Tor Project, a service provider, providers and the Tor network itself to enforce access control on Tor clients to mitigate various types of botnet abuse caused further discuss how TorPolice can incorporate more types of by the lack of access control. To this end, we consider two seeds in § IV-D. One key challenge of using anonymous types of capabilities: site-specific capabilities for accessing capability seeds is to ensure that clients do not have to solve TorPolice-enhanced service providers through Tor, and relay- endless challenges while browsing the web and meanwhile specific capabilities for creating TorPolice-enhanced Tor cir- ensure their activities are unlinkable. TorPolice proposes a cuits. Both types of capabilities are signed by a set of fully- capability renewal protocol to address this challenge (§ V-A). distributed Access Authorities (AAs) that are deployed either Although CAPTCHAs are readily deployable using publicly by the Tor Project, service providers, or trusted third parties. available libraries like ’s reCAPTCHA, TorPolice needs To request capabilities from a particular AA, a Tor client is additional components to support computational puzzles. At a required to possess a capability seed—basically a costly-to- very high level, TorPolice’s puzzle system design is similar scale resource—accepted by the AA. Each AA accepts only to Portcullis [26]. However, TorPolice’s puzzle system does a single type of capability seed. Since Tor clients are free make a great improvement over Portcullis: it can explicitly to choose their AAs, no single AA has a global view on all bound the percentage of CPU cycles that any client can spend Tor clients. TorPolice employs blind signatures [6] to unlink on solving puzzles. Thus, the puzzle system can bring all the requesting and spending of capabilities. When requesting bots down to the percentage that normal clients prefer to capabilities from an AA, Tor clients express what kind of use for puzzle computation, which significantly reduces the capability they request because the issuing process for two computation disparity between the normal clients and bots. capability types differs. An AA maintains separate signing For more details, please refer to our technical report [22]. keys and rate limiters for two capability types. B. Per-seed Rate Limiting Figure 1 illustrates the capability requesting and spending process. While both capability types have in common step one Each AA accepts only one type of capability seed. The and two, the subsequent steps differ. A site-specific capability rate at which a seed can request pre-capabilities is limited. In can only be spent at the service provider specified in the particular, an AA publishes two rate limiters: one determines capability to request service while a relay-specific capability the maximum rate at which a capability seed can request is spent at a specific Tor relay to build a TorPolice-enhanced pre-capabilities used for accessing TorPolice-enhanced service circuit through the relay. A Tor client can use both capability providers and the other one determines the maximum rate at types simultaneously by visiting a TorPolice-enhanced site via which a seed can request pre-capabilities used for TorPolice- a TorPolice-enhanced Tor circuit. In Figure 1, we intentionally enhanced circuit creation. Based on these per-seed rate limiters separate our two capability use cases for clear presentation. published by all AAs, both service providers and Tor can con- figure rules to fulfill their access policies. This paper presents IV. THE ACCESS AUTHORITIES two concrete examples. In § V-C, we elaborate on a design TorPolice relies on fully distributed and partially trusted ac- that enables a site to bound an adversary’s achievable service cess authorities (AAs) to manage capabilities. We assume AAs request rate through Tor using self-defined parameters. In to be honest-but-curious, meaning that they follow protocol, § VI-A, we present a design that allows Tor to prevent botnets but seek to derive additional information about Tor clients. from creating numerous Tor circuits to conduct various abuses. An AA can be deployed by the Tor Project, service providers To improve readability, detailed settings of these rate limiters (e.g., large CDNs like Cloudflare), or third parties. Each AA will be discussed when presenting these access policies. is a conceptually centralized entity, but it can distribute its C. Key Management operations among multiple servers to achieve high availability. Each AA maintains two pairs of keys for signing pre- A. Capability Seeds capabilities, and each of them is dedicated for one capability type. Each AA must publish the public key of both key pairs, AAs expect valid capability seeds from Tor clients to issue for instance, via the Tor network consensus, to ensure other pre-capabilities, which are the basis for deriving spendable entities (e.g., Tor clients, relays, and service providers) can capabilities. For flexibility, we intentionally keep the definition verify the AA’s signatures. An AA can periodically renew of capability seeds broad: any resource that is readily available its keys, but at any time only two key pairs from the AA to Tor users, but costly to scale, can be adopted as capability are valid. After receiving signed pre-capabilities from an AA, seeds. Reasonable choices include proof-of-work schemes Tor clients must verify that the AA uses proper keys before (e.g., solutions to CAPTCHAs or computational puzzles) and using the pre-capabilities for accessing service providers or anonymous monetary resources. TorPolice does not assume Tor. This prevents a malicious AA from using more keys that capability seeds can distinguish bots from humans. Rather, simultaneously to partition the anonymity set. Finally, each botnets can still obtain more capability seeds than legitimate AA is associated with a long-term fingerprint to uniquely Tor users. Instead, TorPolice employs capability seeds as identity the AA, similar to the fingerprint of a Tor relay. a form of anonymous identities that enable both service providers and the Tor network to control access by each client. D. Extending the Access Authorities In this paper, we elaborate on two types of capability seeds Besides Tor, content delivery networks (e.g., Cloudflare or (i.e., solutions to CAPTCHAs and computational puzzles) and Akamai) also have direct incentives to deploy and control their own set of access authorities to mitigate Tor-emitted abuses a universally agreed timestamp to indicate the freshness of while serving anonymous connections. In fact, Cloudflare is the information and F is fingerprint of the selected AA. All working on an independent implementation of a system whose information is blinded [6] by the client to avoid information design goals are similar to our AAs [8], although they focus leakage to the selected AA. on addressing the usability issues for Tor users when visiting The set of information is designed to prevent abuse. In par- Cloudflare-powered websites. ticular, S is used to make the capability site-specific to prevent Finally, semi-trusted third parties such as social network capability double-spending at different sites. The nonce n is operators (e.g., Google, Twitter, and ), may also run added to ensure the uniqueness of each pre-capability, which access authorities (shown as TTP AA in Figure 1) based on pre- in turn ensures the uniqueness of each capability. The Ts agreed terms. To prevent account information leakage to Tor indicates the freshness of pre-capabilities so that expired ones and service providers, Tor users only authenticate themselves are nullified automatically. The client is required to use Tor’s to the social network operators. Service providers or Tor only daily generated fresh random number [13] as Ts such that at learn a single bit of information: whether a Tor client has a any time all valid capabilities have the exact same timestamp. valid account (i.e., capability seed) or not. This design eliminates the possibility of information leakage cased by timestamp abuse. F is added to allow other entities V. TORPOLICE-ENHANCED SITE ACCESS (i.e., clients, Tor relays and service providers) to use correct We now elaborate on the capability design for accessing public keys to verify signatures. TorPolice-enhanced service providers such as websites. To Computation. Upon validation of the client’s pre-capability mitigate the tension between service providers and Tor users, request, the AA computes pre-capabilities using the blinded our key observation is that service providers should not treat information provided by the client. Pre-capabilities computed all connections from one Tor exit relay equally since each by an AA Ai are denoted by PAi . Then we have exit relay is shared by many Tor users. Instead, accountability should be enforced at the granularity of Tor clients so that b b PAi = {S | n | Ts | FAi } | SA , (1) each service provider can throttle malicious Tor clients without i where F is A ’s fingerprint, Sb is A ’s blind signature blocking legitimate Tor users. To this end, TorPolice designs Ai i Ai i b site-specific capabilities that allows a service provider to en- over the set of blinded information {S | n | Ts | FAi } , and force self-selected access rules on anonymous Tor connections. | represents concatenation throughout the paper. Pre-capability Renewal. One key design challenge for pre- A. Pre-capability Design capabilities is to ensure that Tor clients do not have to repeat- Before visiting a TorPolice-enhanced site, a Tor client must edly solve challenges when browsing the web. A strawman first request pre-capabilities from an AA. The client is free to design is that an AA can issue many (i.e., a few hundred) pre- choose any AA based on what capability seed the client prefers capabilities for each solved challenge. However, this strawman to give. To request a pre-capability, the client (i) provides design has at least two shortcomings: (i) it breaks the site- a valid capability seed to its selected AA and (ii) provides specific pre-capability design since the client may not be blinded information for the AA to compute pre-capabilities. able to forecast the sites that it is going to visit so as to The client can hide its network identity from the AA, for provide these blinded information immediately after solving instance, by using Tor. challenges; (ii) the design makes it easier for automated bots Capability Seed Validation. Depending on the accepted to accumulate pre-capabilities, weakening the entire system. type of capability seed, an AA performs corresponding seed To combat these problems, we propose a pre-capability verification. For instance, if an AA accepts proof-of-work renewal protocol. In particular, when a client first presents schemes, it needs to verify that solutions to the presented its challenge solution (i.e., capability seed) to an AA, the AA challenge are correct. Further, an AA needs to ensure that the issues the client an unforgeable pseudonym I = {r | φ} where pre-capability request rates by any capability seed does not r is a random 128-bit nonce and φ is the AA’s signature over exceed the two rate limiters discussed in § IV-B. Since each r. Later on, the client presents I as a proof of validation AA maintains separate rate limiters and signing keys for two when requesting new pre-capabilities from the AA, allowing pre-capability types (i.e., either for TorPolice-enhanced service the client to bypass future challenges. Not only does the site- access or for TorPolice-enhanced Tor circuit creation), Tor specific pre-capability design hold with this design, but also clients must specify the pre-capability type in their requests the AA can account each pre-capability request on a specific (in this section, it is for accessing service providers). In § V-C, solved challenge (i.e., capability seed) to enforce the per- we will explain how a site defines its access policies based on seed rate limiting described in § IV-B. Each pseudonym has a those pre-capability release rate limiters published by all AAs. validation period determined by the AA. Clients with expired Information Required to Compute Pre-capabilities. To pseudonyms are required to solve new challenges to obtain request pre-capabilities, the client provides its selected AA new pseudonyms that are unlinkable to previous ones. the following set of information {S, n, Ts, F}, where S is the Impact of the Pseudonym on Anonymity. Different from domain name of site that the client is going to visit, n is prior pseudonym-based anonymous blacklisting systems [7], a 128-bit cryptographic nonce generated by the client, Ts is in which a user interacts with a service provider using a persistent pseudonym, the pseudonym in our pre-capability requires a new site-specific capability for subsequent service renewal protocol is transient and never presented to service requests. Recall that the pre-capability request rate by each providers and Tor relays. The pseudonym in our protocol is client is limited by the AAs through the per-seed rate limiting only linked with a specific challenge solution served as an design in § IV-B. Thus, together with these rate limiters, it is anonymous capability seed. Since a Tor client anonymously possible for the site to design access policies so as to bound sends its pseudonym to an AA using Tor, the AA cannot a strategic adversary’s service request rate using self-selected link the pseudonym with the client. Further, since all site- parameters, as detailed below. related information sent to the AA is blinded, the AA cannot Policy Definition. Assume the following set of access link the pseudonym with any site access as well. Thus, using authorities {A0, A1, ..., An} are deployed, and each authority pseudonym in our protocol has no impact on user anonymity. accepts one type of capability seed. In this context, the site defines its access policy as {w , w , ..., w } where w is the B. Site-Specific Capability Design 0 1 n i number of service requests allowed by one valid site-specific

After obtaining PAi , the Tor client unblinds the signature capability issued by the access authority Ai. using its secret blind factor to produce the unblinded version We now formulate {w0, w1, ..., wn} mathematically. We de- of the pre-capability, which is the capability spendable at a note the set of capability seeds by {s0, s1, ..., sn} and authority specific site. In particular, C = | n | T | F | S . The S s Ai Ai Aj accepts seed sj. Let cj denote the cost of obtaining a capability C contains a set of unblinded information that allows capability seed sj. We denote the cost of obtaining a network the site to perform capability verification when the client S identity (i.e., IP address) by λ. Let rj denote the maximum rate presents C to access the site, as detailed in § V-C. at which a seed sj can request pre-capabilities (for accessing Employing blind signature is the key to ensure that TorPo- service providers) from authority Aj. Assume that for any lice preserves Tor’s privacy guarantee. First, signatures from client connecting to the site directly without using Tor, the the AAs prevent unauthorized entities from issuing capabili- site allows a maximum service request rate Oe before either ties. Second, using blind signature avoids disclosing any site- blocking the client or forcing the client to solve challenges. related information to the AAs since the blinded information Then to bound a strategic adversary’s service request rate sent to the AAs is unlinkable with the “plain” information by using Tor, the site derives {w0, w1, ..., wn} to ensure that produced by the client. Such unlinkability further ensures the the following condition is satisfied for any set of parameters unlinkability between the client and its capability spending Pn [α0, α1, ..., αn] where αi ∈ [0, 1] and i=0 αi = 1. even if the AAs could collude with the site, which preserves n Tor users’ privacy. We provide a formal security proof in § VII. X αi · λ · ri · wi ≤  · Oe, (2) c C. Site-Specific Capability Spending i=0 i Capability Validation. Tor clients spend site-specific capa- where  is a site-defined parameter. bilities at TorPolice-enhanced sites to request services. Upon Policy Correctness. The parameters [α0, α1, ..., αn] repre- receiving capabilities, a TorPolice-enhanced site first validates sent the adversary’s strategy of purchasing various types of them before subsequent processing. A site-specific capability capability seeds. Thus, if formula (2) holds for any strategy, is valid if (i) it encloses an authentic signature from an AA; the site can guarantee that the maximum Tor-emitted service (ii) it encloses a domain name that is consistent with the site; request rate achieved by an adversary when spending λ on (iii) the capability is not expired (i.e., Ts is the fresh random purchasing capability seeds is no greater than  · Oe. Thus, number released by Tor); and (iv) the capability is not nullified if an adversary that spends a certain amount of resources on by the site. If any of these conditions does not hold, the site obtaining network identities can access the site with rate O rejects this capability to deny access. If a CDN provider (e.g., without using Tor, then the maximum rate that the adversary Cloudflare) processes capabilities on behalf of its powered can request service from the site by using Tor is no greater sites, the second rule is passed as long as the enclosed domain than ·O, given that the adversary spends the same amount of is owned by one of the CDN provider’s customers. In the resources on acquiring capability seeds. Equivalently, in order fourth rule, whether a capability is nullified or not is decided to achieve the same service request rate, the adversary has to by the site’s access policies, as detailed below. spend 1/ times as many resources when launching attacks Site-Defined Access Policies. Once a site-specific capability is through Tor as it spends when launching attacks natively validated, the site accepts the Tor client’s service request. Since without using Tor. To ensure that formula (2) holds for any the major form of Tor abuse is that automated bots use Tor to attacker strategy, we choose conduct various malicious activities against the site [27] (e.g., ci · Oe content scraping, vulnerability scanning, comment spamming wi ≤  · , ∀i ∈ [0, n] (3) and so forth), the site needs to further control the number λ · ri of service requests (e.g., HTTP requests) allowed by each Policy Enforcement. If wi = 1, then each capability is capability. We clarify that each site can have its own definition usable for exactly one service request. The site can enforce this of service requests. Once a Tor client’s service request count by suppressing service requests with duplicate capabilities, for exceeds a threshold, the site nullifies the current capability and example, through the use of a Bloom filter. If wi > 1, then statistically more than one service request should be allowed accepted capability, the site allows only one service request. for each capability. To enforce this, the site stops accepting a VI.TORPOLICE-ENHANCED TOR ACCESS capability with probability 1/wi, and then adds the capability to the duplicate suppressor. However, multiple service requests In this section, we detail the capability design for creating carrying the same capability can trivially be linked by the TorPolice-enhanced Tor circuits. The current Tor network site. We discuss how to address this issue through system suffers from a variety of botnet abuses such as large scale parameterization below. Finally, if wi < 1, then each capability C&C abuse [12, 15], relay flooding attacks [4, 5] and traffic is accepted with probability wi, and exactly one service request analysis [23, 25]. These abuses lead to various bad results, is allowed for each accepted capability. including poor system performance for legitimate Tor users, Policy Parameterization. We now discuss the parameteri- de-anonymization threats and bad reputation for Tor. The root zation of wi. First, to compute wi, the site does not need cause of these attacks is that botnets can create an arbitrary to exactly know ci. Instead, the site simply assigns specific number of Tor circuits without any limitation. Enforcing local weights to all types of capability seeds based on its policies. rate limiting for circuit creation at each relay is unlikely to stop Further, with an ideal parameterization, wi should be exactly these attacks since a strategic botnet can instruct each bot to one since (i) no capability is spendable on more than one enumerate all relays to circumvent the local rate limiting. service request to ensure unlinkability and (ii) no additional With TorPolice, Tor can globally control circuit creations by capabilities are required for a single service request to avoid any client. In particular, when TorPolice is activated, clients extra overhead. However, it is difficult to reach the ideal are required to possess valid capabilities in order to create parameterization since ri is chosen by the authority Ai that TorPolice-enhanced circuits (to be incrementally deployable, is unaware of the site’s configurations  and Oe. In addition, circuit creation requests without valid capabilities are de- configurations may vary among different sites so that an ideal prioritized in case of congestion). Then, by controlling the rate parameterization for one site could be undesirable for others. at which a client can obtain capabilities, the Tor network can To address the above problem, TorPolice sets ri such control the client’s circuit creation rate, as described below. that (with high probability) a Tor client can obtain enough capabilities so that it is feasible for the client to present A. Relay-Specific Capability Design a unique capability for each TCP connection to the site. To create a three-hop TorPolice-enhanced circuit, a Tor This ensures that the client can achieve the highest level of client U needs to obtain three capabilities, each of them being unlinkability offered by Tor, i.e., service providers only see specific to a relay on the circuit. The design of relay-specific TCP connections from Tor exit relays. We clarify that it is the capabilities is similar to that of site-specific capabilities, except client who determines how to spend its capabilities across TCP for the following. (i) During pre-capability requesting, the connections (as described below). The above parameterization client specifies the proper pre-capability type, i.e., it is for Tor- is adopted only to ensure that spending a unique capability enhanced circuit creations. Further, to request a pre-capability for each TCP connection is a feasible strategy for the client. specific to a relay R, the client encloses the fingerprint of relay A reasonable setting of ri can be estimated based on the R (rather than any site domain) in the blinded information sent live Tor measurement in [18], which finds that during a 10- to its selected AA. (ii) Relay-specific capabilities are spendable minute interval, each Tor client opens about 24 web streams. at TorPolice-enhanced relays (not at any sites) for creating Tor- In practice, the authority Ai should enforce ri over a longer enhanced circuits through the relays. The relays first validate period of time (e.g., few hours) to accommodate usage burst. received capabilities (based on a set of rules similar to those Note that when an AA Ak is deployed by the site itself, sys- defined in § V-C) before extending circuits. tem parameterization for Ak is easier since the site determines We clarify that to request pre-capabilities, Tor clients do the rate limiters for issuing pre-capabilities. not have to use TorPolice-enhanced circuits to reach the AAs. Capability Spending by Tor Clients. Given ri, some sites Thus, there is no deadlock for bootstrapping TorPolice. An- may end up with rules wi > 1, i.e., one capability is allowed other alternative is pre-installing few relay-specific capabilities for multiple service requests. In this case, the site needs to on Tor clients so that using TorPolice-enhanced circuits to send a response to indicate whether a capability is nullified bootstrap the system is viable. or not. Tor clients are free to determine their capability Policy Definition. Relay-specific capabilities enable Tor to spending strategies. For instance, a Tor client can send wi enforce access rules for its relays. In this paper, we propose to service requests using the same capability within a single TCP use capabilities to control the circuit creation rate by any Tor connection (due to HTTP keep-alive), which still ensures the client so as to mitigate those aforementioned botnet abuses highest level of unlinkability. Or the client may choose to against Tor. In particular, assume the following set of AAs spend one capability across multiple TCP connections to allow {A0, A1, ..., An} are deployed and authority Ai accepts a type trans-TCP linkability. We note that if a Tor client uses the of capability seed si. In this context, Tor defines its access default setting of Tor Browser, it already allows trans-TCP policies as {q0, q1, ..., qn}, where qi is the maximum rate at linkability since the Tor Browser uses session cookies. For which a capability seed si can request pre-capabilities (for a site that has wi less than 1, it can enforce such policies creating Tor-enhanced circuits) from authority Ai. Then in or- by accepting one capability with probability wi and for each der to bound a Tor client’s circuit creation rate, {q0, q1, ..., qn} should satisfy the following condition for any attacker strategy linkable with the pre-capability P (based on the above worst- Pn [α0, α1, ..., αn] where αi ∈ [0, 1] and i=0 αi = 1. case assumption), designing the algorithm K is equivalent to 0 n designing another algorithm K that enables the AAs and W X αi · λ · qi ≤ T , (4) to link the capability C with the pre-capability P. 3 · c i=0 i In TorPolice’s design, P is the blinded message signed by the AA A˜ (i.e., the blind-signer), and C is the unblinded where λ is the cost of getting one network identity, c is i version of P produced by the client using a secret factor the cost for obtaining one capability seed s and T is the U i unknown to the blind-signer. Thus, the problem of designing maximum circuit creation rate allowed for a client, which is K0 to link P with C is the same as designing another algorithm a parameter controlled by Tor. We note that the constant 3 K00 that allows a blind-signer to link the blinded message it appears in above formula since a standard Tor circuit contains signs to the unblinded message without knowing the secret 3 relays and each of them consumes a relay-specific capability. factor, which is impossible in a blind signature [6]. This To ensure the correctness of formula (4) for any attacker 3·ci·T contradiction proves that the hypothetical algorithm K does strategy, we choose qi ≤ , ∀i ∈ [0, n]. λ not exist, indicating both AAs and W gain only negligible Parameterization. To compute q , the Tor Project assigns i advantages of linking the site access V with client U. certain weight to each type of capability seed. Further, Tor can properly configure T based on the live Tor measure- Similarly, we can prove the following lemma. ments in [18]. In particular, during an 10-minute interval, Lemma 2. Consider any client ∈ . By colluding with PrivCount [18] estimates that a Tor client opens about 4 Tor U NT each other, both the AAs and Tor relays gain only negligible circuits. Thus, the maximum rate T at which one Tor client can advantage over random guessing when trying to link a specific create circuits should be around 4 per 10 minutes. In practice, relay access (i.e., TorPolice-enhanced circuit creation) with U. each AA should enforce qi over a longer period of time (e.g., few hours) to accommodate usage bursts and relay churn. Please refer to our technical report [22] for the capability B. Information Leakage Analysis exchange protocol designed for Tor onion services. Given the above two lemmas, we now analyze the impact of TorPolice on Tor user anonymity. We measure the possible VII.SECURITY ANALYSIS information leakage to an arbitrary service provider W based In this section, we perform a formal security analysis for the on degree of anonymity [9, 28]. Our analysis uses information- impact of TorPolice on Tor users’ anonymity. Let NT denote theoretic entropy [29] as the measure of information contained the set of Tor clients that request pre-capabilities from the in a probability distribution. Recall that NT denotes the set of AAs, and subsequently present capabilities to access service TorPolice-upgraded Tor clients. Given an arbitrary capability- providers or Tor relays. We first present two useful lemmas. enhanced site access (i.e., an access supported by a valid p A. Lemmas capability), W believes that with probability i, the access originates from client i in NT . Thus, W maintains a probability Lemma 1. Consider any client U ∈ NT . By colluding with distribution I for all anonymous accesses. Then, the entropy each other, both the AAs and a service provider W gain only (i.e., the information contained in the distribution I) is defined negligible advantage over random guessing when trying to link P as H = − pi · log (pi). W i∈NT 2 a specific Tor-emitted site access with the client U. Based on the unlinkability proven in Lemma 1, we have 1 pi = , where NT is the size of the anonymity set T . Thus, Proof. We first specify the notations used in the proof. Let NT N the entropy after introducing TorPolice is H = log N . V denote a Tor-emitted site access to W initiated by the Tor W 2 T client U. Note that the definition of a site access is decided by Next, we analyze the entropy before introducing TorPolice. W. Let C denote the service-specific capability that U sends to Let N denote the entire set of anonymous Tor clients. Notice W to support the site access V. Let P denote the pre-capability that the current Tor network protects W from linking a (native) used by U to compute C. site access with a specific Tor client. Thus, given the entire Since the client U can use Tor to connect to the AAs anonymous client set N, the maximum entropy HM is HM = when requesting the pre-capability P, in the ideal case, U log2 N, where N is the size of the anonymity set N. Then, is unlinkable with P. However, to ensure that our lemma still based on the definition in [9, 28], the degree of anonymity d H −H log NT holds in the worst case when Tor’s unlinkability is broken by after introducing TorPolice is d = 1 − M W = 2 . HM log2 N adversaries, we assume the AA A˜ that issues P can link P Anonymous Set Analysis. Given the above anonymity degree, with the client U. Thus, the service provider W and other AAs information leakage is affected by the size of the anonymous can have such linkability as well by colluding with A˜. client set before and after TorPolice is introduced. Thus, once Next, we prove the lemma by contradiction. Assume that all Tor clients are upgraded to support TorPolice, there is the AAs and W can design an algorithm K that enables the no information leakage at all. Thus, eventually, TorPolice AAs and W to link the site access V with the client U. Since completely preserves the privacy guarantee offered by the the site access V is linkable with the capability C (as C is Tor network. To mitigate the one-time privacy issue during presented to the site to support the access V) and the client U is the early deployment phase of TorPolice, the Tor Project can TABLE I: The computational time for capability-related crypto- The CAA Client (CapJS) Site graphic operations. Operation Language Mean (µs) Median (µs) Std. Dev. (µs) Access-Control-Allow-Headers: X-Capability C 232.0 232.0 0.1 b AJAX GET X-Capability: S n s I|{ | |T |F} Generation Python 253.7 253.6 0.3 Access-Control-Allow-Headers: Validate the pseudonym X-Capability JavaScript 27,320.0 27,240.0 245.5

b b AJAX Response data: S n s { | |T |F} |SA C 25.6 25.6 0.0

Unblind the pre-capability Verification Python 32.0 32.0 0.1

HTTP(s) Request X-Capability: S n s JavaScript 355.5 354.3 5.3 | |T |F|SA

Validate the capability C 3.5 3.5 0.0 Response Blinding Python 46.3 46.3 0.1 JavaScript 18.1 18.1 0.3 Fig. 2: Site access by a client with CapJS installed. C 2.4 2.4 0.0 require mandatory client upgrades from a certain time point Unblinding Python 7.0 7.0 0.0 to “force” all active clients to serve as TorPolice initiators. JavaScript 64.8 64.7 6.8

VIII.IMPLEMENTATION proxy (OP) on a Tor client further sends a valid relay- For better readability, we cover the high-level implemen- specific capability to each hop. Each relay first verifies the tation of TorPolice in this section and defer details to our received capability before processing the onionskin carried in technical report [22]. the remaining . To validate our implementation, we test the modified Tor source code in Shadow [17], a safe Capability Implementation. We implement capability-related development environment to run real Tor source code in a computation using C, Python and JavaScript to consider private Tor network. Via log analysis, our test experiments various usage scenarios. For instance, the capability design can show that our implementation properly embeds relay-specific be directly built into the Tor software written in C, or it can be capabilities into the workflow of Tor circuit creation. implemented as a plugin for the Tor browser, which executes capability-related computation in JavaScript. Websites may IX.EVALUATION compute capabilities using any language. Thus, we use Python A. Capability Computation Overhead as an example due to its popularity in web applications. We use This section benchmarks the overhead of capability-related the RSA algorithm to perform capability-related cryptographic computation. All results are obtained using a single 3.30 GHz operations such as blind signing. Intel i3-3120 core. We perform 10, 000 runs to learn the mean, AA Implementation. For an AA accepts CAPTCHAs as ca- median, and standard deviation of the computation times pability seeds (referred to as CAA), we implement it as a web for a single capability generation, verification, information server that deploys Google’s reCAPTCHA service. For an AA blinding and unblinding. Results shown in Table I are obtained accepts computational puzzles (referred to as PAA), it accepts when the RSA key length is 1024. The overall computational puzzle solutions over HTTP requests. These AA servers define overhead is small. For instance, in C, it takes ∼230 µs to a customized HTTP header (X-Capability) to carry TorPolice- compute a pre-capability and ∼25 µs to verify a capability. A related cryptographic tokens such as pseudonyms and pre- blinding and an unblinding operation can be finished in ∼3 µs capabilities. To make the implementation transparent to clients, and ∼2 µs, respectively, in C. The implementations in C and the AA servers add X-Capability in the Access-Control- Python have comparable performance. Although it is more Allow-Headers HTTP header option. Although the PAA can expensive to perform signing and verifying in JavaScript, the be accessed using native HTTP libraries, the CAA needs to overhead of blinding and unblinding operations (performed by be accessed using browsers. Thus, we implement a Firefox Tor clients) in JavaScript is comparable with other languages. add-on (referred to as CapJS) to execute TorPolice-related The AAs, relays and service providers can adopt more efficient cryptographic operations in browsers. We discuss the detailed languages (e.g., C) to perform signing and verifying. design of CapJS in our technical report [22]. TorPolice-enhanced Site Access. To use our capability B. Enforcing Site-Defined Policies scheme, the deployment required at websites is lightweight: In this section, we show that TorPolice enables a site to a site only needs to add X-Capability in its Access-Control- enforce site-defined access policies. Allow-Headers HTTP header option to allow CapJS to Access Policies. For evaluation purpose, we assume that the pass site-specific capabilities in the header. Upon receiving site assigns equal weights to both types of capability seeds, i.e., capabilities, the site verifies them using the rules defined in c0 (for CAPTCHAs) and c1 (for puzzles) in Equation (3) are 0 0 §V-C to fulfill its access policies. Figure 2 depicts a site access the same. However, the actual costs, denoted by c0 and c1, can by a client with CapJS installed on its browser. be different from c0 and c1. Further, base on the measurements 0 TorPolice-enhanced Tor Circuit Creation. We modify the in [10, 24], we assume c0 is close to λ (the cost for obtaining Tor software source code to directly integrate our capability one network identity). design into Tor circuit creation. In particular, besides these We evaluate three strategies that a site may use in its access original cells sent for circuit creation, the modified onion policies. The first strategy (referred to as basic strategy) is a 70 Without TorPolice Consensus on Sep 1st 2013; Without TorPolice Basic Strategy Consensus on May 1st 2017; Without TorPolice Rate Limiting Strategy 60 With TorPolice Consensus on Sep 1st 2013; With TorPolice WFQ Strategy 60 Consensus on May 1st, 2017; With TorPolice 50 ε 50 0.9ε 40 40 0.8ε 30 0.7ε 30 ε 0.6 20 20 0.5ε 10 10 0.4ε Circuit creation failure rate (%) Circuit creation failure rate (%)

Normalized service request rate r 0 0 E0 2E0 3E0 4E0 0 5 10 15 20 25 30 10 100 1000 Adversary’s cost for capability seeds Date The number of bots (thousand) Fig. 3: With TorPolice, a site can bound an Fig. 4: Circuit creation failure rates when Fig. 5: Without TorPolice, an adversary adversary’s service request rate by Θ(). Tor faced a multi-million node botnet C&C can paralyze Tor by cell flooding attacks. E0 is the adversary’s cost at the point of abuse. TorPolice can reduce the average TorPolice can effectively mitigate this vul- diminishing returns. circuit creation failure rate by ∼74%. nerability. that the site accepts all Tor-emitted service requests with the collective Tor-emitted service request rate exceeds rmax. valid capabilities. In the second strategy (referred to as rate General Results. Our further analysis in [22] proves that for limiting strategy), the site enforces a maximum service request any k, ra ≤  if k ≤ 1 and ra ≤ k· if k ≥ 1. Thus, regardless rate rmax for all valid requests. In the third strategy, besides of the actual cost of obtaining capability seeds, the adversary’s rate limiting, the site further performs weighted fair queuing service request rate is bounded by Θ(). This result holds no (WFQ) to serve requests: rather than serving all valid requests matter which strategy the site adopts and how many types of in one FIFO queue, requests with capabilities obtained using capability seeds are accepted. CAPTCHA solutions and puzzle solutions are served in two separate FIFO queues weighted equally. The third strategy C. Mitigating Botnet Abuse Against Tor WFQ strategy (referred to as ) prevents one type of seed from We now perform Tor-scale evaluations to demonstrate (i) overwhelming the other one. TorPolice effectively mitigates large-scale botnet C&C abuse Policy Enforcement. We study an adversary’s service request against Tor by reducing circuit failure ratios by ∼74% rate through Tor when it invests a certain amount of money on (§ IX-C1) and (ii) TorPolice significantly increases Tor’s 0 0 acquiring capability seeds. Define k = c0/c1. We first present resilience against cell flooding attacks (§ IX-C2). evaluation results for k = 0.5 in Figure 3 and then extend our Tor-scale Simulator. We aim to show that TorPolice can discussion to arbitrary k. For any amount of investment, the mitigate the harm that a multi-million botnet can do to Tor. adversary’s service request rate through Tor (denoted by ra) While we have a TorPolice prototype that runs on Shadow [17] is normalized to the service request rate obtained when the (§ VIII), we would run into scalability issues with simulating adversary connects to the site directly without using Tor. millions of Tor clients. Further, Shadow is unable to help 0 0 Since c0