Understanding Usage with Privacy-Preserving Measurement

Akshaya Mani•∗ T Wilson-Brown‡∗ Rob Jansen| Aaron Johnson| Micah Sherr• •Georgetown University ‡UNSW Canberra Cyber, University of New South Wales am3227,micah.sherr @georgetown.edu [email protected] { } |U.S. Naval Research Laboratory rob.g.jansen, aaron.m.johnson @nrl.navy.mil { }

ABSTRACT store data about clients, traffic patterns, sites visited, The Tor anonymity network is difficult to measure because, etc.—information that may otherwise not be collected if not done carefully, measurements could risk the privacy for privacy reasons. Even if the entities collecting this (and potentially the safety) of the network’s users. Recent information are honest, the exposure of the data (e.g., work has proposed the use of differential privacy and secure through machine compromise or compulsion from gov- aggregation techniques to safely measure Tor, and prelimi- ernment authorities) could pose serious privacy risks. nary proof-of-concept prototype tools have been developed Additionally, the dissemination of even highly aggre- in order to demonstrate the utility of these techniques. In this gated measurement information risks users’ privacy since work, we significantly enhance two such tools—PrivCount adversaries could potentially combine this data with and Private Set-Union Cardinality—in order to support the background knowledge (e.g., when a tweet was posted, safe exploration of new types of Tor usage behavior that have ISP logs showing user accesses to the Tor network, etc.) never before been measured. Using the enhanced tools, we to deanonymize users [16]. conduct a detailed measurement study of Tor covering three In addition to user privacy and safety, monitoring major aspects of Tor usage: how many users connect to Tor communication is antithetical to the mission of the Tor and from where do they connect, with which destinations do network. For this reason, the Tor Metrics Portal [43], users most frequently communicate, and how many onion a data repository that archives information about the services exist and how are they used. Our findings include anonymity network’s size, makeup, and capacity, pro- that Tor has ∼8 million daily users (a factor of four more vides only a limited number of statistics, many of which than previously believed) while Tor user IPs turn over al- are based on indirect measurements and assumptions of most twice in a 4 day period. We also find that ∼40% of the Tor client behavior. We demonstrate in this paper that sites accessed over Tor have a torproject.org domain name, some of Tor’s estimates do not correlate well with our ∼10% of the sites have an amazon.com domain name, and own direct observations of user behavior. ∼80% of the sites have a domain name that is included in Other efforts to measure Tor, such as those that record the Alexa top 1 million sites list. Finally, we find that ∼90% packets at Tor’s ingress and egress points, can pose se- of lookups for onion addresses are invalid, and more than rious ethical and legal issues: they rely on unsafe [41] 90% of attempted connections to onion services fail. (and potentially unlawful [40]) techniques that would be difficult to repeat. As a result, such studies have arXiv:1809.08481v1 [cs.CR] 22 Sep 2018 1. INTRODUCTION been met with criticism by the privacy community [40]. Toward safe Tor measurement The Tor network has been in operation since 2003 [18], . There has been an enabling millions of daily users [43] to anonymously ac- exciting recent trend in the literature that proposes dif- cess the Internet. Despite its relative longevity, grow- ferentially private [21] statistics collection techniques ing popularity, and prominence as a privacy-enhancing for anonymity networks such as Tor. Systems such as communication tool, surprisingly little is known about PrivEx [23], PrivCount [27], HisTor [36], and PSC [24] who uses Tor and how they use it. enable useful statistics collection while providing for- mal guarantees about the privacy risks they impose Challenges in Measuring Tor. The primary chal- on the network’s users. However, to date, researchers lenge in conducting measurements on the Tor network have (rightfully) focused on developing these privacy- is that, if not done with extreme care, they can im- preserving measurement techniques rather than on per- pose significant risk to the network’s users. The col- forming exhaustive measurements of deployed anonymity lection of statistical information implies the need to networks such as Tor. ∗Equally credited authors.

1 Our Contributions. This paper presents the largest ethical validity of our study, and describe our experi- and most comprehensive measurement study of the Tor ences with institutional ethics boards and an indepen- network to date, performed using newly proposed differ- dent safety review board. entially private statistics collection techniques. To con- duct our study, we modify Tor and significantly enhance 2. BACKGROUND and improve PrivCount [27] and PSC [24] (described in We begin by presenting a brief overview of Tor and more detail in the following section) to support a larger reviewing the privacy definitions and privacy-preserving variety of measurements, and contribute our improve- measurement tools we use in our study. ments to the respective open source projects. We run 16 Tor relays that contribute approximately 3.5% of Tor’s 2.1 Tor bandwidth capacity, and deploy our measurement sys- Tor provides anonymous TCP connections through tems across these relays in order to safely collect mea- source-routed paths that originate at the Tor client (i.e., surements of the live Tor network. We extend previ- the user) and traverse through (usually) three relays. ous statistical methodology to enable unique-counting Traffic enters the Tor network through guard relays. on Tor, and extend privacy definitions (action bounds) Middle relays carry traffic from guard relays to exit to cover new types of user activity. We also describe relays, where the traffic exits the anonymity network. the challenges and tactics for selecting appropriate pri- These anonymous paths through Tor relays are called vacy parameters to ensure Tor users’ safety, as well as circuits. The unit of transport in Tor circuits is the cell: methods for inferring whole-network statistics given lo- Tor cells carry encrypted routing information and 498 cal observations at our relays. bytes of data [17]. The encryption of application-level Using our improved measurement tools and extended headers and payloads makes it difficult for an adversary techniques, we conduct a detailed measurement study to perform traffic analysis on circuits and learn the end- of Tor covering three major aspects of Tor usage: who points or content of communication. connects to Tor and from where do they connect; how Tor has a three-tiered communication architecture. is Tor used and with what services and destinations do A single circuit may carry multiple streams. A Tor Tor users communicate; and how many onion services stream is a logical connection between the client and exist and how are they used. Findings from our client- a single destination (e.g., a webserver), and is roughly based measurements suggest that Tor has ∼8 million analogous to a standard TCP connection. A request to daily users, which is a factor of four more than previ- retrieve a webpage may produce several streams (e.g., ously believed. We are the first to measure client churn to example.com and ads.example.com) that are mul- in Tor, and we find that Tor user IPs turn over almost tiplexed over a single circuit. Finally, Tor maintains twice in a 4 day period. We also find that ∼40% of the longstanding TLS-encrypted connections between re- sites accessed over Tor have a torproject.org domain lays that are adjacent on some Tor circuit. Communica- name, ∼10% of the sites have an amazon.com domain tion belonging to different circuits that share a common name, and ∼80% of the sites have a domain name that hop between two relays are multiplexed through a sin- is included in the Alexa top 1 million sites list. Fi- gle connection. Similarly, a user’s Tor client maintains nally, we find that ∼90% of onion address lookups fail a single connection to each of its guards, through which because the address is missing or the request is mal- multiple circuits (and streams) may be formed. formed, and more than 90% of attempted connections Tor clients periodically download “directory updates” to onion services fail because the server never completes that describe the consensus view of the network. Clients its side of the connection protocol. Our findings both re- send all user data through one guard by default, but inforce existing beliefs about the Tor network (e.g., that directory updates are obtained through three guards the vast majority of its use is for web browsing) and by default. Tor clients avoiding censorship may connect elucidate many aspects of the Tor network that have to the network through bridges [37], which are guards previously not been explored. In some instances, our whose identities are not public and are disseminated by measurements—which are based on direct observations Tor to users requesting them. of behavior on Tor—suggest that existing heuristically- driven estimates of Tor’s usage (including the number Onion services. Tor allows services to hide their loca- of Tor users) are highly inaccurate. tions through the use of onion services [42]. As a special Limiting the risk to Tor’s users was paramount in (but common) case, when the onion service corresponds both the design and implementation of all of our mea- to a hidden , we refer to it as an onionsite. surements, and a significant contribution of this work To operate an onion service, the operator selects six is the methodology for choosing system parameters to relays to serve as introduction points. It then (1) con- guarantee adequate levels of protection. We discuss the structs an onion service descriptor that contains its precautions we took to ensure user safety, argue for the public key and the identities of the chosen introduc- tion points, and (2) forms Tor circuits to each of the

2 introduction points. The descriptor is then stored on trolled manner such that the result of the query is nearly six or eight relays [9] (depending on Tor version) us- the same for adjacent databases. More formally, a (δ, )- ing a distributed hash table (DHT) [32]. These relays differentially private mechanism [20] is an algorithm M are called onion service directories (for historical rea- such that for all adjacent databases D1 and D2 and all sons, they are also called “hidden service directories” S Range( ), or “HSDirs”). The DHT is indexed by the descriptor ⊆ M Pr[ (D ) S] exp() Pr[ (D ) S] + δ ID, which is derived from the public key. The address M 1 ∈ ≤ × M 2 ∈ of the service is a domain with an .onion suffix, which where δ and  are parameters that can be used to trade is derived from the public key. off between privacy and accuracy. When a user wishes to access the onion service, it queries the DHT to obtain the service’s onion descrip- 2.3 PrivCount tor, which includes the identities of its introduction points. It also chooses a relay to serve as a rendezvous PrivCount [27] is a distributed measurement system point (RP), and constructs a circuit to this RP. Using that provides (, δ)-differentially private statistics about the public key in the onion descriptor, the user encrypts the Tor network. its choice of the RP and sends it via a Tor circuit to one A PrivCount deployment consists of three compo- of the onion service’s introduction points. This choice is nents: a tally server (TS), at least one data collector then forwarded to the onion service, using the existing (DC), and at least one share keeper (SK). In an ex- circuit between the introduction point and the service. ecution of PrivCount, the TS, DCs, and SKs act col- Finally, the onion service constructs a circuit to the RP, lectively to report (noisy) counts of the events that through which it can communicate with the user. were observed during a collection period. PrivCount Importantly, Tor conceals the network locations of supports both single number queries (e.g., “how many both the user and the onion service. The communica- clients connected during the collection period?”) and tions between the user and the introduction point, the multiple-counter queries (e.g., “how many Tor connec- tions went to Google, Amazon, Facebook during the user and the RP, the onion service and the introduction { } point, and the onion service and the RP are all carried collection period?”) in which the “bins” (i.e., counters) over Tor circuits. are independent. PrivCount is shown to provide (, δ)- differential privacy if (1) at least one SK is honest. 2.2 Differential Privacy 2.4 Private Set-union Cardinality (PSC) Differential privacy [21] is a privacy definition that PrivCount provides a safe method of counting ob- provides strong guarantees about how much is revealed served events across a set of Tor relays. A limitation of by answering database queries. Differential privacy is PrivCount (and one shared by other privacy-preserving defined in terms of pairs of adjacent databases and guar- measurement systems [23, 36]) is that it cannot deter- antees that query responses cannot be used to distin- mine the count of distinct values among the DCs. For guish such pairs. Typically, databases that differ only example, while PrivCount can answer queries such as in a single user’s data are considerd adjacent, and then “how many client connections were observed?”, it can- the definition ensures that query responses don’t reveal not answer “how many unique clients connected to Tor?” much about any user. To answer questions about the count of distinct val- Differential privacy has previously been proposed as a ues, we use the private set-union cardinality (PSC) pro- means to safely study the Tor network [23,24,27,36]. In tocol introduced by Fenske et al. [24]. In PSC, each DC adjacency existing work, as in this study, is defined by k maintains a set of items k. PSC computes the cardi- network activity rather than by user. For example, in I S nality of the union of those itemsets; i.e., k k +noise. its study of visits to censored sites, PrivEx defines ad- PSC achieves (, δ)-differential privacy, and| doesI | not ex- jacency as differing by at most six exit connections per pose any element of any itemset to an adversary. hour [23]. Under this definition of adjacency, differential The participants in PSC include one or more DCs privacy’s guarantees apply to connections rather than- and one or more computation parties (CPs), the latter users. We also adopt this model and consider privacy of which perform a series of noise additions, verifiable protections over particular user activities (for example, shuffles, re-randomizations, and aggregation to compute visiting onion sites) conducted over a fixed length of the final cardinality (plus noise) of the union. time rather than on a per-user basis. The latter is dif- ficult to achieve, since conceptually, a single user (e.g., a botnet) could constitute a significant fraction of the 3. METHODOLOGY activity on the network. Conducting safe measurements on Tor requires care- Differential privacy is typically achieved by adding ful experimentation design, configuration, and deploy- noise to query results. This noise is added in a con- ment. In what follows, we describe the methodology we

3 employ to collect and analyze measurements of the Tor thetical individual whose activity constitutes the ma- network. jority of network activity, which would necessarily yield inaccurate results. Fortunately, reasonable Tor activity 3.1 Deployment by an individual should for most actions constitute a We use several Tor relays and both PrivCount and small fraction of the total network activity, and so we PSC to conduct Tor measurements. We first enhanced can provide privacy while providing accurate network the PrivCount version of Tor1 so that the PrivCount measurements. events that Tor emits include additional information Our publishing mechanisms formally apply (, δ) dif- about connections, circuits, and streams as necessary ferential privacy to the space of inputs that includes to answer our research questions. We also add new all possible network traces in a given time period. In- events that report onion service directory usage infor- puts are considered “adjacent” if they differ only in the mation [9,10]. We ran 16 Tor relays with our enhanced activity of a single user within a given length of time, version of Tor (6 exit relays and 11 non-exit relays). and all differences in that activity stay within a set of We then enhance PrivCount to support the new in- action bounds. We use 24 hours as the length of time formation it receives from Tor, and to support several defining adjacency, and our action bounds are shown in new counter types as necessary for our study. Most Table1. One way to understand the privacy guarantee significantly, we add support for counting set member- that results is that, for any two reasonable sequences of ship using PrivCount histograms which we use to mea- network actions that a user could perform in a 24-hour sure domain distribution (§4), geopolitical distribution period, including nothing at all, they most likely can- in (§5), and onion site distribution in (§6). We also sig- not be distinguished based on the statistics published nificantly extend PSC: we slightly modify the original by our mechanisms. PSC design to include a TS to coordinate the actions of We derive the action bounds by considering reason- the DCs and CPs, we engineer PSC to collect the Priv- able activities that a Tor user might perform over 24 Count events emitted by our relays, and we add sup- hours. In this analysis, we consider protecting the pri- port for measuring the number of unique domains (§4), vacy of both Tor clients and onion services. We take into clients (§5), and onion sites (§6). All of our enhance- account how certain type of user actions would trans- ments have been merged into the PrivCount and PSC late into observable actions on the Tor network, such as open-source projects. creating new circuits or sending data cells, based on our We set up a PrivCount deployment containing 1 TS, understanding of the Tor protocol and software, taking 3 SKs, and 16 DCs (one for each Tor relay), and we set into account features such as caching and how streams up a PSC deployment containing 1 TS, 3 CPs, and 16 are assigned to circuits. We consider several types of DCs. We use PrivCount and PSC to repeatedly mea- common Tor activities in this analysis: web browsing sure Tor in 24 hour periods, where each period focuses with Tor Browser, chat with the P2P onion on measuring a small set of statistics. During some service [6], and running a Web server as an onionsite. measurement periods, we use only the subset of the DCs We compute the amount of network actions that re- and relays that are in a position to observe the events sult from reasonable amounts of these activities and use of interest in order to reduce operator overhead and the maximum to define the action bound. The activity reduce the likelihood of failure. Apart from the cases that provides the maximum value defining each bound where server(s) was temporarily unavailable, the num- is shown in the final column of Table1. ber of CPs/ SKs we use is greater than or equal to the For example, we choose to protect connecting to 20 number of relay operators in all our measurements. To domains through an exit circuit, which would protect maintain our privacy guarantees, PrivCount and PSC browsing two new for each of 10 hours per measurements are never conducted in parallel, and we day, as the other activities (i.e., chat and onionsites) always enforce at least 24 hours of delay between any would not create domain connections. This analysis sequential measurement of distinct statistics. also allows for additional page loads within the same site, as they would be assigned to the same circuit and 3.2 Privacy thus would not be measured as a new domain connec- We protect the privacy of individual Tor users using tion. Note that certain observable actions apply to all the methodology developed with PrivCount [27]. This Tor activities, such as creating a TCP connection to a approach applies differential privacy to protect a certain guard, and thus have no defining activity. − amount of network activity within a certain length of We use privacy parameters  = 0.3 and δ = 10 11. time. Limiting the application of differential privacy to The value of  is the same one used by Tor to protect bounded amounts of activity is required for producing its onion-service statistics [30]. We choose δ such that accurate results, as otherwise we must protect a hypo- if there are n Tor users, then δ/n is still small, which ensures that each user is simultaneously protected [22]. 1https://github.com/privcount/tor

4 Table 1: Action bounds for measurements. entire network can be a challenge. Unlike with standard counts, we need to know if the same items in our sample Daily Defining Action are seen elsewhere or not. For example, when counting bound activity unique domains it could be that the domains we count Connect to domain 20 domains Web are each visited very frequently and thus are observed Send or receive 400 MB Web exit data by all relays, which would indicate that our sample Connect to Tor 1 day: 4 IPs count is the same as the network-wide count. In several N/A from new IP address 2+ days: 3 IPs cases, we can handle this using other information we Create TCP 12 connections N/A have about the frequency distribution of the observed connection to Tor Create circuit items. For example, there is evidence that domains are 651 circuits Chat through entry guard visited following a power-law distribution [13, 33]. We Send or receive 407 MB Web also make some additional measurements to help deter- entry data Upload descriptor 450 uploads Onionsite mine likely parameters for these distributions. We can Upload descriptor of then construct confidence intervals for the network-wide 3 addresses Onionsite new onion address unique counts by considering the probability of our lo- Fetch descriptor 30 fetches Onionsite cal counts given different possible values for the overall Create rendezvous 180 connections Chat connection count. We use Monte-Carlo simulations to determine Send or receive Web or these probabilities for more complicated distributions. 400 MB rendezvous data onionsite Note that in some cases we cannot identify a likely fre- quency distribution. In these cases, with an observed

−6 −5 value of x and a fraction p of all observations, we sim- For example, with n = 10 then δ/n = 10 . ply present the range of likely network-wide values to 3.3 Statistical Analysis be [x, x/p], where the lower end of the range covers the possibility that each item is frequently and the upper Our measurement techniques do not give us a com- end covers the possibility of infrequent observations per plete and exact view of the total amount of activity item. in the Tor network. Because we make measurements from a small subset relays, we only observe a sam- 4. EXIT MEASUREMENTS ple of Tor’s overall activity. In addition, our privacy- preserving techniques intentionally include some ran- In this section, we present results and analyses of exit dom errors in the measurement. We use statistical tech- measurements taken from relays in a position to observe niques to overcome these limitations. Tor’s egress traffic. The count measurements (i.e., those made using Priv- Count) include noise generated according to a normal 4.1 Overview distribution with mean zero and known variance. Thus Recall from §2.1 that Tor clients build circuits through for those measurements we compute confidence inter- a sequence of Tor relays over which they multiplex mul- vals (CIs) that include the true value with 95% prob- tiple streams. Each stream is associated with a TCP ability. Moreover, for many values we wish to infer a connection between an exit relay and a user-specified network-wide total from our sample. We do so by divid- destination and is used to transfer data between that ing the reported values (and their CIs) by the fraction of TCP connection and the client. For each stream, the the observations that our measuring relays make. For client specifies a destination hostname or IP address example, we measure 32 million streams using relays which the exit relay requires in order to create a TCP that comprise 1.5% of the exit weight and including connection on the client’s behalf. added noise with a standard deviation of 3.1 million. To better understand which web domains Tor users We infer (3.2e7 6.2e6)/0.015 = 2.1e9 4.1e8 streams frequently visit, we instrument PrivCount and PSC to in the entire network,± where the range± provides a 95% count the initial streams that are created on circuits confidence interval. (i.e., the first stream for each circuit) given that a host- The unique-count measurements (i.e., those made us- name was provided in the stream by the client and ing PSC) include noise generated according to a bino- the destination port requested by the client is a web mial distribution with known parameters. In addition, port (either 80 or 443). We focus on initial streams hash-table collisions (which can occur within PSC’s in- because the Tor Browser uses a new circuit for each ternal data storage [24]) can cause the measured value unique domain shown in the browser address bar. The to be smaller than the true value. We adjust for these initial stream on a circuit will therefore most directly errors by computing 95% confidence intervals using an indicate the user’s intended destination, whereas sub- exact algorithm based on dynamic programming. sequent streams that are created when loading a page Extrapolating unique counts from our sample to the to fetch embedded resources (e.g., images, scripts, etc.)

5 Initial Hostname Total Subsequent IPv6 Hostname Web provided in these streams. Initial IPv4 Other 109 108 108 × × × 4.3 Exit Domains 2.5 1.50 1.50 1.25 1.25 2.0 We now describe the results from our measurements 1.00 1.00 of the number of domains observed in initial streams 1.5 0.75 0.75 that also provide a hostname and a web port. To ease 1.0 0.50 0.50 presentation, we refer to the measured values as primary 0.5 0.25 0.25 domains or domains in the remainder of this section.

Number of0 Streams. (240 hrs) Number of0 Streams. (2400 hrs) Number of0 Streams. (2400 hrs) Across our measurements, we construct sets of domain 0 1 0 1 0 1 (a) (b) (c) names and increment a counter for a set whenever we Figure 1: The number of streams of various types over 24 hours observe a primary domain that matches a domain name in Tor inferred from our exit relay observations, including 95% in that set. confidence intervals. In each sub-figure, the left bar shows the category total and the right stacked bar shows the breakdown of Alexa Top Sites. We use the Alexa top 1 millions sites that category into more specific subcategories. list [2] to help us understand which sites are visited by Tor users. We conducted two related PrivCount mea- surements. In the Alexa rank measurement, we sorted provide less useful indications of user intent. Second, the sites by rank and split them into six sets of increas- we focus on streams that provide hostnames rather than ing size: set i = 0 contains the first 101 sites and set IPv4 or IPv6 addresses because hostnames can be mapped i > 0 contains the first 10i+1 sites excluding those in to web domains much more easily than IP addresses. Fi- set i 1. We used a separate set for torproject.org since − nally, we are interested in web domains and so we do not early measurements revealed a significant number of ac- measure domains on streams that request connection to cesses to that domain. The Alexa rank measurement destination ports not traditionally associated with web was conducted between 2018-01-31 and 2018-02-01, dur- content. ing which our combined mean exit weight was 2.2%. In the Alexa siblings measurement, we created a set 4.2 Exit Stream Analysis for each of the top 10 sites in the Alexa list. For each We first measure and analyze the breakdown of such site we stripped the top level domain to produce streams that provide a hostname and web port, com- a site basename (e.g., google), and then added all en- pared to other types of streams. These measurements tries from the top 1 million sites list that contained the provide useful Tor usage information and also provide basename into the corresponding set. We also used dis- context about the overall fraction of Tor exit traffic that tinct sets for (rank 342, the default search our domain measurements will cover. engine in Tor Browser) and torproject (rank 10,244, de- Our PrivCount stream measurements were conducted veloper of Tor Browser). As a result of this process, the between 2018-01-04 and 2018-01-05 during which the google set (rank 1) was the largest (212 sites, includ- mean combined exit weight of our measurement relays ing the rank 7 site google.co.in), while the reddit (rank (taken over the consensuses that were active during 8) and qq (rank 9) sets were the smallest among top 10 the measurement) was 1.5% of the total available exit sites containing 3 sites each (duckduckgo and torproject weight in Tor. We infer the number of streams of var- contained 1 site each). The Alexa siblings measurement ious types over a 24 hour period in the entire Tor net- was conducted between 2018-02-01 and 2018-02-02, dur- work using the observations from our relays and our ing which our combined mean exit weight was 2.1%. fractional weight. The inferred values are shown in Fig- The results from the Alexa rank and Alexa siblings ure1 with 95% confidence intervals. Figure 1a shows measurements are shown in Figure2. We highlight that there are about 2 billion exit streams created in three observations from these results. First, we ob- Tor in 24 hours, and only about 5% of those streams served torproject.org in 40.1% (CI: [39.9; 40.3]%) and represent a circuit’s first stream. Figure 1b shows that 39.0% (CI: [38.8; 39.2]%) of primary domains. Sur- an insignificant number of these initial streams include prised by this result, we conducted additional measure- an IPv4 or IPv6 address: our measured values for these ments and observed the domain onionoo.torproject.org stream subcategories were negative (due to the added in 43.4% (CI: [43.1; 43.7]%) of primary domains. Onionoo noise) and therefore the most likely value of the counters is a web service that provides other applications with is zero. Figure 1c similarly shows that an insignificant access to Tor network status information. We contacted number of initial streams containing a hostname target the Onionoo maintainers from , but we a non-web port (i.e., a port other than 80 or 443). Our were unable to identify a clear reason for the high num- results confirm that almost all initial streams provide ber of accesses to Onionoo from Tor. hostnames and target a web port; the domain measure- Second, we observed sites in the amazon siblings set ments that we describe next will focus on the hostnames for 9.7% (CI: [9.5; 9.9]%) of primary domains, and sites

6 50 47.8 Sites in Rank Set 40.1 44.141.5 37.2 All Sites torproject.org

26.6 40.4 26.1 25 21.7 Alexa Top 1 Million Sites torproject.org 8.4 6.2 7.0 Log Scale 5.1 4.3 Alexa Rank 0 7.7 101

(0,10] other 7.9 (10,100] (100,1k] (1k,10k] (10k,100k](100k,1m] 50 48.1 Sites in Siblings Set 39.0 5.0 torproject.org 2.8 25 2.4 Linear Scale Primary Domain Count (%) Primary Domain Count (%) 9.7 1.1 0.7 0.5 0.4 0.40.4 0.50.4 0.5 2.4 0.3 0.2 0.2 0.2 0.1 0.30.2 0.1 0.1 0.3 0.0 0.2 0.1 0.4 Alexa Siblings 0.0 0.0 0.0 0 0.0 0.0 0 1.1 0.0

.org .net .br .cn .de .fr .in .ir .it .jp .pl .ru .uk qq (9) other .com other google (1) baidu (4) yahooreddit (6) (8) torproject youtubefacebook (2) (3)wikipedia (5) amazonduckduckgo (10) Top Level Domains (*.tld)

Figure 2: Results of PrivCount measurements of the frequency Figure 3: Results of PrivCount measurements of the frequency of membership of primary domains in subsets of the Alexa top 1 of membership of primary domains in top-level domain subsets million sites list. The top plot shows subsets containing domains constructed from all sites and from those sites present in the that are sorted by Alexa rank, and the bottom plot shows subsets Alexa top 1 million sites list. containing domains of each of the top 10 sites as well as their siblings.

which all top-level domain (TLD) names that are con- tained in more than 104 entries in the Alexa top 1 mil- in the google siblings set for 2.4% (CI: [2.2; 2.6]%) pri- lion sites list also appeared in the observed primary do- mary domains. To further explain the unexpectedly mains. The measured TLDs include three main TLDs high amazon result, we conducted an additional mea- (.com, .org, and .net) as well as 11 country-specific surement during which we observed www.amazon.com TLDs. We conducted two related PrivCount TLD mea- in 8.6% (CI: [8.3; 8.9]%) of primary domains. We con- surements, one in which we count the TLDs of all pri- tacted Amazon employees but again were unable to find mary domains using wildcards (conducted between 2018- the reason for the relatively high number of accesses 02-02 and 2018-02-03 with 2.4% exit weight), and one from Tor (here, due to lack of response). in which we count the TLDs of only those primary do- Third, we observed that domains in the Alexa top mains that appear in the Alexa list (conducted between sites list accounted for about 80% of all primary do- 2018-01-30 and 2018-01-31 with 2.3% exit weight). We mains accessed from Tor, and that the smaller but more measured torproject.org using a separate counter dur- highly ranked domain sets have roughly equal frequency ing the Alexa measurement, but our implementation as the increasingly larger but more lowly ranked sets. of wildcard matching restricted us from doing so when We conclude from these results that the Alexa top sites measuring all sites. list provides a reasonable representation of destinations The TLD measurement results are shown in Figure3. visited by Tor users. The Alexa top sites list is often Unsurprisingly, the three main TLDs make up the ma- used as a destination model in Tor research, and most jority of the primary domains accessed by Tor users, significantly in Tor website fingerprinting research [28]. though we note that .org would account for a signifi- Our results provide confirmation that using the top sites cantly smaller fraction of domains without torproject.org. list as part of Tor research is appropriate. We found that the Russian TLD .ru accounted for 2.8% Alexa Categories. We conducted a PrivCount mea- (CI: [2.6; 3.0]%) and 2.4% (CI: [2.2; 2.6]%) of country- surement of primary domains by category (e.g., news, specific TLDs, the largest observed by our relays. We science, sports, etc.) using Alexa category lists [3] which also found that ∼11% of .com, ∼3% of .org, and ∼4% are limited to 50 sites per category. The measurement of .net sites accessed in primary domains are not in was conducted between 2018-01-29 and 2018-01-30, dur- the Alexa top sites list. Only 7.9% (CI: [7.7; 8.1]%) of ing which our mean combined exit weight was 2.1%. TLDs from all primary domains did not match one of Additional insights from the measurement beyond those the TLDs that we measured, which increased to 26.1% already presented are limited: the category containing (CI: [25.9; 26.3]%) of TLDs for primary domains in the amazon.com accounted for 7.6% (CI: [7.4; 7.8]%) of pri- Alexa top 1 million sites list. mary domains while 90.6% (CI: [90.3; 90.9]%) category Second-Level Domains. We conducted two mea- (torproject.org was not in any of the categories). surements of the number of unique second-level domain Top-Level Domains. We measured the frequency with (SLD) names that we observed in Tor primary domains.

7 Table 2: Locally observed unique second level domain statistics We first measure the number of client connections measured using PSC. using our PrivCount deployment. Over the 24 hour measurement period beginning on 2018-04-07, the mean Statistic Count 95% CI probability of selecting our relays in the entry position SLDs 471,228 [470,357; 472,099] was 0.0144. Since clients typically choose their guards Alexa SLDs 35,660 [34,789; 37,393] according to guard weights, we can project our local PrivCount measurements to infer Tor-wide results by During both these measurements, we use only 5 out of dividing our observed counts by 0.0144. We summarize our 6 exit relays (1.24% mean exit weight) in order to our results in Table4. reduce operator overhead. We summarize our results in The amount of daily data being transferred across Table2. In the first measurement, we measured the Tor is 517 TiB (CI: [504; 530] TiB), which represents number of unique SLD names among those Tor pri- the sum of client uploads and downloads. We note that mary domains which contain a TLD in the public suf- this includes Tor cell overheads, so the actual amount fix list [5]. From a measurement conducted between of client payload data transferred would be 2-3% less. 2018-03-31 and 2018-04-01, we find that 471,228 (CI: We find that there are about 148 million (CI: [143; [470,357; 472,099]) unique SLDs were accessed through 153] million) client connections, through which 1,286 our exits. We also measured the number of unique million client circuits (CI: [1,246; 1,326] million) are Alexa top 1 million SLDs (i.e. the SLDs of Alexa top 1 multiplexed, per day, across the Tor network. This is million sites) among those Tor primary domains. From significantly higher than the 80.6 million client connec- a measurement conducted between 2018-03-24 and 2018- tions reported by Jansen and Johnson [27] two years 03-25, we infer that 35,660 (CI: [34,789; 37,393]) of such ago. We suspect that the discrepancy is due to ongoing unique SLDs were accessed through our exits. From distributed denial-of-service (DDoS) attacks that began these results, we find that the unique count of accessed affecting the Tor network back in December 2017 [25]. SLDs is more than ten times that of the unique count Tor Clients. We next consider the number of unique of Alexa top one million sites. From our measurements, clients access the Tor network. Here, we use PSC to we conclude that a long tail exists in the distribution of safely capture the approximate number of clients ob- sites accessed over Tor. served by our guard relays. Unlike existing work that To extrapolate the unique second-level domain mea- also attempts to quantify the number of Tor users [15, surements for the entire Tor network, we need to know 27, 35, 38], we do not store (even temporarily) IP ad- the distribution of SLDs as seen by the exits. From dresses since PSC uses oblivious counters. Similar to previous research studies, we know that domains are previous studies, we assume that there is a one-to-one visited following a power-law distribution [13, 33]. But mapping between client IPs and unique Tor clients, al- we need additional measurements to determine the ex- though this may be violated, for example, by mobile ponent of this power-law distribution. Therefore, we users with changing IP addresses or by clients behind perform a series of simulations of clients visiting ran- Network Address Translation (NAT). In addition, we dom destination sites (based on power-law distribution count bridges as clients, as their identities are private. with random exponents) and construct confidence inter- We use PSC to collect data from our relays that have vals for the network-wide unique SLD counts. We use a non-zero guard probability, for a 24 hour period be- the locally observed unique SLDs count as a self-check. ginning on 2018-04-14 during which our guards had a This method appears to work well for the unique combined weight of 1.19%. Measurement results are re- Alexa top 1 million SLDs. The results of 100 simula- ported in Table5. We observe over 313 thousand unique tions reveals an inferred network-wide unique count of client IPs using our guards (this experiment used two 513,342 (CI: [512,760; 514,693]) accesses to the Alexa CPs due to the temporary unavailability of one). This is top 1 million list. That is, slightly more than half of the a surprisingly high number, given that Tor Metrics [43] Alexa top sites are accessed over Tor over 24 hours. Un- (using a very different and unvalidated estimation tech- fortunately, the inability to closely fit SLD accesses to nique) reports 2.15 million clients per day in April 2018 a distribution prevented us from using this approach to for the entire Tor network. We would expect a typical extrapolate the number of network-wide SLD accesses. client to connect to three guards (clients currently use one guard for data but two additional guards for direc- 5. CLIENT MEASUREMENTS tory updates [8]), and so we would expect to see closer to 0.0119 3 2.15e6 = 76, 755 unique clients IPs. To We conducted a number of measurements to better × × understand who uses Tor and to determine how many the extent that each unique IP observed in a 24-hour users access the network. period represents a new user, this suggests that Tor is underestimating its total number of users by a factor 5.1 Tor Connections and Clients of 4, and the total number of daily users is closer to

8 313, 213/0.0119/3 8, 773, 473. Table 3: Network-wide promiscuous clients and client IPs To better understand≈ the network-wide number of unique client IPs in Tor, we perform additional PSC Guards per Promiscuous Network-wide client clients 95% CI client IPs 95% CI measurements using sets of relays of different sizes. We can then compare how the count grows with the mea- 3 [15,856; 21,522] [10,851,783; 11,240,709] 4 [15,129; 21,056] [8,195,072; 8,493,863] suring set to its predicted growth under different client 5 [14,428; 20,451] [6,605,713; 6,849,612] models to identify the most likely model. In particular we would like to validate a model of how many guards each client contacts, which we expect to typically be Table 4: Network-wide client usage statistics, inferred from Priv- 3, but could be less or more for several reasons, includ- Count measurements. ing guard/exit conflicts, multiple clients behind a single Statistic Count 95% CI NAT IP, Tor bridges serving many clients, and Data (TiB) 517 ± [504; 530] instances [12]. We made two 24-hour measurements, Connections (×106) 148 ± [143; 153] one starting on 2018-05-12 using DCs with 0.42% of Circuits (×106) 1,286 ± [1,246; 1,326] the guard weight and the other starting on 2018-05-13 using a disjoint set of DCs with 0.88% of the guard weight. We counted 148,174 (CI: [148, 161] thousand) seem unlikely to make up this population, as this range and 269,795 (CI: [269, 315] thousand) unique client IPs is much larger than the 1640 bridges reported by Tor during these measurements, respectively. Observe that Metrics at the time of our measurements, and in April the latter number is significantly smaller than we would 2016 Matic et al. [37] only discovered 50% more bridges expect if client contacted only one guard, in which case than reported by Tor Metrics. Table3 also shows con- we would predict 148, 174 (0.88/0.42) 310, 460. This fidence intervals for the number of network-wide client × ≈ indicates that indeed client IPs typically connect to IPs, calculated for each g as the union of CIs over all p in multiple guards. the range shown. For what we believe is the most likely To identify the number of guards that typical client number of guards per selective client (g = 3), our mea- IPs connect to, we consider a range of guards per client surements indicate about 11 million unique client IPs and analyze the resulting consistency with our measure- network-wide, which suggests a true value over 5 times ments. Let g denote the number of guards each client the number of users estimated by Tor Metrics. Even for connects to in the model. We use simulation to extrap- g = 5, the range suggests over 3 times as many daily olate from our two measurements two separate confi- users as Tor currently estimates. dence intervals for the network-wide unique client IP Client churn. Tor clients may go offline, and we expect disjoint counts. We discover that these CIs are unless that the Tor network experiences a high degree of client g is in the range [27, 34]. This is a much higher value churn. Using PSC, we recorded the count of unique of g than should be the case for typical clients, and we client IPs over a four-day period beginning on 2018- believe that this indicates that this is a poor model of 05-15. We observed 672,303 (CI: [671,781; 1,118,147]) how clients connect to guards. It does imply, however, unique client IPs. Comparing this result with our one- that there are some clients IPs that are connected to day measurement (see Table5), we can conclude that many more guards than is typical. the client churn rate is 119,697 (CI: [119,581; 247,268]) Thus we consider a refinement on this model in which client IPs per day. We can infer similar network-wide all a set of “promiscuous” clients connect to guards in behavior i.e.client IPs turn over almost twice over a a 24-hour period, and the remaining “selective” clients four-day period. connect to g guards only. The promiscuous clients cap- ture the likely behavior of Tor bridges, tor2web clients, 5.2 Client Composition and Diversity and clients behind a NAT IP. We then determine a range for the number p of promiscuous clients by consider- We additionally explore the geopolitical distribution ing values of g that are likely given our understanding of Tor clients. We resolve each client IP address to of the Tor protocol: g 3, 4, 5 (all clients should its host country using the MaxMind GeoLite2 coun- connect to 3 guards for directory∈ { updates,} some clients try database, and use PrivCount to count the num- may connect to 1 or 2 more due to guard churn). For ber of client connections, bytes transferred, and cir- each of these values of g, there were some values for cuits created, broken down by country. For most of the number p of promiscuous clients that was consis- the world’s 250 countries, we were unable to determine tent with our two client-IP measurements, that is, that useful counts since the added noise (to achieve differen- yielded a non-empty intersection of the two CIs. Table3 tial privacy) overwhelm the actual count. We show the shows these ranges, which indicates about 14–22 thou- countries with significant inferred counts in Figure4. sand promiscuous clients. Observe that bridges alone The United States (US), Russia (RU), and Germany (DE) have both the greatest client connection counts

9 107 1014 108 3.5 × × × 1.4 3.5 3.0 1.2 3.0

2.5 1.0 2.5

2.0 0.8 2.0

1.5 Bytes 0.6 1.5 Circuits

Connections 1.0 0.4 1.0

0.5 0.2 0.5

0.0 0.0 0.0 US RU DE UA FR VE NA NZ BV CA US RU DE UA GB FR CA SC MX IM BR SK ES AR SE US FR RU DE PL AE CA ES VG PR NI BM NL SS Figure 4: Tor per country client usage statistics inferred from PrivCount measurements - the counts of client connections (left), amount of client data transferred (center), and the counts of client circuits (right). Error bars show 95 percent confidence intervals.

Table 5: Locally observed unique client statistics measured using effect of noise, we average the country counts between PSC. two consecutive one-day measurements, beginning on 2018-05-09. We find that clients from 203 (CI: [141; Statistic Count 95% CI 250]) different countries access Tor. Comparing our re- IPs 313,213 [313,039; 376,343] sults to earlier studies from McCoy et al. [38] (2008) and Countries 203 [141; 250] ASes 11,882 [11,708; 12,053] Chaabane et al. [15] (from 2010) that report clients from approximately 125 countries, we can conclude that the IPs (4-day) 672,303 [671,781; 1,118,147] Churn 119,697/day [119,581; 247,268]/day locations from which Tor is accessed has significantly diversified in the past decade. We were unable to extrapolate the unique country and data transfer amounts (see Figure4, left and cen- count for the entire Tor network since the frequency ter). Assuming that the distribution of client connec- distribution of the countries, as seen by the guards, is tion counts reflects the distribution of client usage, we difficult to determine. However, we can still roughly can surmise that these three countries use Tor the most. estimate the range of network-wide values to be [141, Interestingly, our results disagree with those from the 250], where the lower bound is the least possible ob- Tor Metrics Portal, which ranks the United Arab Emi- served count and the upper bound is the total number rates (UAE) as contributing the second largest number of countries that exists worldwide. of Tor users; in contrast, we did not find the UAE to Network Diversity. We also measure the network di- be among the most significant contributors. Although versity of Tor users by mapping each client IP address we cannot fully explain this discrepancy, we are some- to its autonomous system (AS) using the IPv4 and IPv6 what skeptical of the Tor Metric Portal’s results: the datasets (dated 26th November 2017) from CAIDA [7]. value reported by the Tor Metrics Portal implies that Using PSC, we first measure the unique AS count of nearly 4% of the entire population of the UAE accesses the client IPs, for a 24 hour period beginning on 2018- Tor on a daily basis, which seems unlikely. However, al- 04-18. We observe clients from about 11,882 ASes (CI: though the UAE has neither a significant client connec- [11,708; 12,053])—approximately 20% of the number of tion count nor client data transferred, it ranks sixth in defined ASes. the number of client circuits (see Figure4, right; marked To extrapolate the unique AS count for the entire as “AE”). One intriguing possibility is that the majority Tor network, we must first identify the frequency dis- of the Tor clients from the UAE are partially blocked tribution of the ASes, as seen by the guards. This is from using Tor: while they are able to construct direc- challenging. Therefore, we present the range of most- tory circuits (recall that Tor’s user statistics are esti- likely network-wide values as [11,708; 59,597], where the mated from the number of directory requests), they are lower bound is the least possible observed count and the prevented from establishing regular Tor circuits, caus- upper bound is the total number of ASes. ing the Tor software to repeatedly perform directory To determine whether there are “hotspot” ASes, we fetches. We leave the exploration of this strange Tor also ran PrivCount and recorded counts for each of client behavior in the UAE as a future research direc- CAIDA’s 1000 highest ranked ASes [4], where an AS’s tion. rank is based on the number of ASes in its customer To determine whether Tor has a global userbase, we cone. Over the 24 hour measurement period begin- use PSC to count the number of unique countries from ning on 2018-05-01, we found that none of the Top 1000 which clients originate. Since the actual count can only ASes had results that were statistically significant (i.e., be at most 250 (i.e., the total number of countries clearly distinguishable from noise). Although, we did worldwide), invariably the differentially private noise not find any individual AS that contributes a large frac- overwhelms the actual count. Therefore to reduce the

10 Table 6: Network-wide number of unique onion addresses in- Table 7: Network-wide onion service descriptor statistics in- ferred from our HSDir PSC measurements. ferred from our HSDir PrivCount measurements.

Statistic Count 95% CI Statistic Count 95% CI Addresses published 70,826 [65,738; 76,350] Fetched 134 million [117; 150] million Addresses fetched 74,900 [34,363; 696,255] Succeeded 12.2 million [10.6; 13.7] million Failed 121 million [103; 140] million Fail rate 1,400 failed/s [1,192; 1,620] failed/s tion of Tor users, we found that the ASes outside the Public 56.8% [36.9; 83.6]% top 1000 have about 53% of the client connections, 52% Unknown 47.6% [28.8; 72.7]% of the client data, and 62% of the client circuits.

6. ONION-SERVICE MEASUREMENTS 2018-04-30 during which our mean combined HSDir fetch weight was 0.534%, we observed 2,401 (CI: [1,101; 3,718]) We conducted a number of measurements to deter- unique v2 onion addresses in descriptors fetched from mine how many Tor onion services exist, which kinds of our HSDirs. We extrapolate these results based on HS- services are popular, and how these services are used. Dir replication to infer that the total number of unique onion service addresses fetched for the entire Tor net- 6.1 Unique Onion Addresses work is 74,900 (CI: [34,363; 696,255]). By comparing Recall from §2.1 that Tor onion services store their the number of unique onion addresses published and descriptors at onion service directories (HSDirs) in a fetched, we estimate that between 45% and 100% of ac- DHT. To measure the number of unique onion services tive onion services are used by clients. Note that the in the Tor network, we instrument our HSDirs to report Tor Metrics Portal [43] does not estimate the number the onion service address for each version 2 (v2) [9] de- of unique onion addresses fetched by clients, and its es- scriptor that is published to and fetched from the DHT. timate of the number of unique v2 onion addresses pub- (We don’t measure version 3 (v3) onion service descrip- lished occasionally varies significantly (e.g., it increased tors because the onion address is obscured using key by 40 thousand during our measurement). blinding [10].) We use the PSC deployment described in §3.1 to safely capture the approximate number of v2 6.2 Onion Service Directory Usage onion services observed by our HSDirs. Unlike existing While the number of unique onion addresses (§6.1) work that also attempts to quantify the number of onion informs our understanding of how many onion services services [26,30,31,39], we avoid the need to store (even exist, the number of descriptor actions informs our un- temporarily) onion addresses, since PSC uses oblivious derstanding of usage. Therefore, we use the PrivCount counters. The results of our onion service address mea- deployment described in §3.1 to safely measure the num- surements are summarized in Table6. ber of v2 onion service descriptors that are fetched from Unique Onion Service Addresses Published. From our HSDirs. We also count the number of requests for a PSC measurement conducted between 2018-04-23 and descriptors that are not stored in the DHT, typically be- 2018-04-24 during which our mean combined HSDir pub- cause the service is inactive. Our onion service descrip- lish weight was 2.75%, we observed 3,900 (CI: [3,769; tor measurement results are summarized in Table7. 4,045]) unique v2 onion addresses in descriptors pub- Onion Service Descriptors Fetched. From a Priv- lished to our HSDirs. We extrapolate these results Count measurement conducted between 2018-05-20 and based on HSDir replication to infer that the total num- 2018-05-21 during which our mean combined HSDir fetch ber of unique onion service addresses published for the weight was 0.465%, we infer that there were a total entire Tor network is 70,826 (CI: [65,738; 76,350]). Ac- of 134 million (CI: [117; 150] million) v2 descriptors cordingly, our relays observed at least 4.93% of Tor’s fetched in the entire Tor network. Surprisingly, 90.9% unique onion addresses. Using a different extrapolation (CI: [87.8; 93.2]%) of these fetches failed, i.e., the cor- method [26], the Tor Metrics Portal [43] estimates a to- responding descriptor was not present in the HSDir’s tal of 79 thousand unique v2 onion services published cache or the request was malformed. Our results in- in the Tor network at the time of our measurement. dicate that there are at least 103 million failures per The Tor metrics estimate does not include a confidence day, or about 1,192 failures per second. To be sure we interval which we expect to be significant since every did not observe a network anomaly, we conducted this reporting relay adds noise (due to the lack of secure ag- measurement multiple times and observed consistent re- gregation of Tor metrics reports). Therefore, we expect sults. We speculate that this large failure rate may be that our confidence intervals would overlap. due to botnets or onion site scanners with outdated Unique Onion Service Addresses Fetched. From onion address lists. Tor’s current statistics reporting a PSC measurement conducted between 2018-04-29 and infrastructure is unable to collect HSDir lookup fail-

11 Table 8: Network-wide rendezvous statistics inferred from our of the observed rendezvous circuits failed because the RP PrivCount measurements. connection to the RP was closed before the service com- pleted the rendezvous protocol, and 84.9% (CI: [77.0; Statistic Count 95% CI 93.5]%) of the observed rendezvous circuits failed be- Total Circuits 366 million [351; 380] million cause the circuit expired (timed-out) before the service Succeeded 8.08% [3.47; 13.1]% Failed: conn. closed 4.37% [0.0; 9.23]% completed the rendezvous protocol. Failed: circuit expired 84.9% [77.0; 93.5]% Rendezvous Circuit Data. From the same PrivCount Cell payload 20.1 TiB [15.2; 24.9] TiB measurement as above, we infer a total of 20.1 TiB (CI: Cell payload / second 2.04 Gbit/s [1.55; 2.53] Gbit/s [15.2; 24.9] TiB) of cell payload data on rendezvous cir- Cell payload / circuit 730 KiB/circ. [341; 2,070] KiB/circ. cuits in the entire Tor network. Our inference corre- sponds to a mean of 2.04 Gbit/s (CI: [1.55; 2.53] Gbit/s) of payload data. By combining our circuit and cell ob- ures [29], but we have demonstrated the feasibility of servations, we calculate that the mean amount of data using the PrivCount protocol to collect this statistic in per active Tor rendezvous circuit is 730 KiB (CI:[341; a privacy-preserving manner. 2070] KiB). From the measurement described above, we infer that there were a total of 12.2 million (CI: [10.6; 13.7] mil- lion) v2 descriptors successfully fetched in the entire 7. RELATED WORK Tor network. For every successful descriptor fetch, we Several efforts have attempted to improve our under- checked if the address was available in the latest standing of the live Tor network. Most notably, the onion site search index [1]. We found that 56.8% (CI: Tor Project has allowed researchers to access aggregate [36.9; 83.6]%) of descriptors fetched were present in the and longitudinal data about the Tor network through ahmia list at the time of our measurement, while 47.6% its Tor Metrics Portal [43] since 2010, with directory (CI: [28.8; 72.7]%) were not. Our results indicate that data spanning back to May 2004 [34]. a majority of successful onion service accesses are visits Tor Metrics Portal. The Metrics Portal indirectly es- to onion sites with publicly available addresses. timates the number of Tor users by counting the num- ber of requests to the subset of Tor directory mirrors 6.3 Rendezvous Point Usage that participate in statistics collection, and then ex- Recall from §2.1 that in order for a client to commu- trapolating to the entire network by dividing by the nicate with an onion service, both parties build circuits fraction of participating directory mirrors. This tech- to a relay called a rendezvous point (RP), and that this nique was originally proposed by Loesing et al. [35]. In rendezvous circuit carries all end-to-end encrypted ap- contrast with the Tor Metrics Portal, our measurements plication payloads. While we cannot measure onion ser- are based on direct observations of connecting users and vice usage by counting streams (onion service streams do not depend on heuristics about how often clients ac- are unobservable by our RPs because data cells are end- cess directory mirrors. (Indeed, as described in §5, our to-end encrypted), we can infer rendezvous circuit ac- direct measurements do not align well with the Metric tivity by counting the number of cells on the circuit. Portal’s indirect estimates.) We instrument our RPs to report the number of cells Towards safe Tor measurements. McCoy et al. per- sent on v2 [9] and v3 [10] rendezvous circuits, and use formed one of the first studies that attempted to dis- the PrivCount deployment described in §3.1 to safely cern how users used Tor [38]. There, the authors per- measure the approximate number of onion service ren- formed packet capture at a small subset of Tor ingress dezvous circuits and cells relayed at our RPs. Table8 and egress points to determine the distributions of user summarizes the results of our RP measurements. locations and exit traffic by port (i.e., by application). Rendezvous Circuit Activity. From a PrivCount Chaabane et al. used similar methods to examine Bit- measurement conducted between 2018-05-22 and 2018- Torrent behavior on Tor [15]. Approaches such as these 05-23 during which our mean combined rendezvous weight that collect unobfuscated and potentially sensitive net- was 0.88%, we infer that there were a total of 366 mil- work traces have been criticized on both ethical and lion (CI: [351; 380] million) rendezvous circuits in the legal grounds [40, 41]. entire Tor network. Note that since a successful ren- There has been an increasing effort in developing tech- dezvous involves a client and service circuit, each such niques to safely measure the Tor network. Elahi et rendezvous is counted as 2 circuits at the RP. Surpris- al. [23] were the first to propose the use of differential ingly, only 8.08% (CI: [3.47; 13.1]%) of the circuits that privacy to provide privacy-preserving aggregate data we observed succeeded and were active, i.e., they were about Tor. Specifically, they propose two schemes known used to transfer at least one cell containing an applica- collectively as PrivEx. PrivCount [27], which we use tion payload. We found that 4.37% (CI: [0.0; 9.23]%) in our measurements, extends the secret-sharing vari-

12 ant of PrivEx by supporting repeatable measurement methods: we use open source tools [24,27] for statistics phases. HisTor [36] also proposes the use of differen- collection and we submitted our research plan (prior to tial privacy to safely measure statistics on anonymity its implementation) to several review bodies. We pre- networks, and adds integrity protections by bounding sented it to the Tor Research Safety Board [11], which the influence of malicious data collectors (DCs). How- concluded that our plan “provides a plausible strategy ever, compared to PrivCount, HisTor is limited in the for safely measuring trends on the Tor network.”2 (Al- types of queries that it supports and lacks PrivCount’s though some of the authors serve on the TRSB, we mature implementation. The private set-union cardi- were not involved in the TRSB’s deliberations of our nality (PSC) protocol of Fenske et al. [24] also provides research plan.) Our measurement study was approved differentially private measurements. We review differ- by the ethics board at the University of New South ential privacy in §2.2. Sections §2.3 and §2.4 describe Wales and certified as being exempt as non-human sub- PrivCount and PSC in more detail. jects research by the Institutional Review Board (IRB) Understanding Onion Services. Goulet et al. [26] at Georgetown University Additionally, aspects of this describe the benefits and privacy risks of statistics col- study were discussed with members of Georgetown’s Of- lection for onion services. They conclude with sugges- fice of General Counsel, who raised no concerns. tions for privacy-preserving statistics collection, includ- ing the use of differential privacy. Their proposal partly 9. CONCLUSION AND DISCUSSION inspired our work, in which we use differential privacy This paper presents the most comprehensive privacy- and other techniques [24] to conduct measurements of preserving measurement study of the live Tor network onion sites. Biryukov et al. demonstrate how the design to date. Our findings confirm that Tor is used predomi- of onion services allows an attacker to gauge the popu- nantly for web browsing and that Tor users tend to visit larity of an onion service and potentially deanonymize the same popular (i.e., top Alexa) sites as do ordinary it [14]. However, their envisioned attacks are not suit- (and non-anonymous) Internet users. We observe a sur- able for conducting privacy-preserving measurements. prisingly large number of Tor connections and unique Recently, Jansen et al. show how to use PrivCount to client IP addresses, suggesting that the heuristically- safely measure the Tor network to discover the popular- derived estimates from the Tor Metrics Portal are sig- ity of onion services [28]. Their work focused on traffic nificantly underreporting Tor’s userbase. We find that analysis attacks and the popularity study of a single so- the network’s clients are highly distributed, connecting cial networking onion service. Finally, Owen and Sav- from more than 200 countries and nearly 12,000 ASes. age perform empirical measurements of Tor’s onion ser- These clients construct more than 1.2 billion anony- vices [39]. We apply similar techniques—operating Tor mous circuits per day, carrying approximately 517 TiB relays and observing HSDir lookups—but also protect of data (6.1GiB/s). Unexpectedly, our measurements user privacy by using differentially private techniques. show that more than 90% of onion service descriptor fetches fail, suggesting the presence of botnets on Tor 8. ETHICAL AND SAFETY or aggressive onion site crawlers with outdated address CONSIDERATIONS lists. We report 20 TiB of data transferred daily over Statistics collection in the setting of an anonymity circuits to onion services, representing roughly 3.9% of network inherently carries risk since the exposure of in- all Tor traffic. formation could degrade the privacy of the network’s Our study does not attempt to distinguish between users and potentially subject them to harm. Guiding human-driven and automated activity on the Tor net- our study were the four criteria for ethical network re- work. An intriguing open problem is the construction of search established by the Menlo Report [19]: we up- techniques to identify when measurements are heavily hold the principle of respect for persons through the influenced by automated activities (for example, when careful and principled application of data protections, swarms of infected hosts belonging to a botnet connect including the use of differentially private techniques and to Tor). Indeed, the enormously high failure rate of the avoidance of collecting personally identifiable infor- onion service descriptor fetches strongly suggests that mation (e.g., IP addresses). Following the principle of automated behavior likely does occur on Tor. Our hope beneficience, we balance the (low) risk of harm with the is that measurement studies such as this will help in- potential benefits of the research—namely, an increased form developers and security researchers of unexpected understanding of the Tor network that could inform re- automated activities—a necessary prerequisite to un- search on improving the network’s security and perfor- derstanding and potentially defending against malicious mance. Our techniques achieve the Menlo Report’s no- automated processes. tion of justice, since our statistics do not target a specific 2The full text of the TRSB’s findings is available subpopulation of Tor’s users. Finally, we achieve respect at ://security.cs.georgetown.edu/measurement-study/ for law and public interest through transparency in our trsb-feedback.txt.

13 Acknowledements https://cbsboston.files.wordpress.com/2013/12/ This work has been partially supported by the Office of kimeldoharvard.pdf. Naval Research, the National Science Foundation un- [17] and Nick Mathewson. Tor der grant number CNS-1527401, and the Department Protocol Specification. of Homeland Security Science and Technology Direc- https://spec.torproject.org/tor-spec. torate, Homeland Security Advanced Research Projects [18] Roger Dingledine, Nick Mathewson, and Paul Agency, Cyber Security Division under agreement num- Syverson. Tor: The Second-Generation Onion ber FTCY1500057. The views expressed in this work Router. In USENIX Security, 2004. are strictly those of the authors and do not necessarily [19] David Dittrich, Erin Kenneally, et al. The Menlo reflect the official policy or position of any employer or Report: Ethical Principles Guiding Information funding agency. and Communication Technology Research. Technical report, US Department of Homeland 10. REFERENCES Security, August 2012. [1] Ahmia. https://ahmia.fi. [20] Cynthia Dwork, Krishnaram Kenthapadi, Frank [2] Alexa Top 1 Million Global Sites. McSherry, Ilya Mironov, and Moni Naor. Our https://www.alexa.com/topsites. Data available Data, Ourselves: Privacy via Distributed Noise at http: Generation. In Advances in Cryptology //s3.amazonaws.com/alexa-static/top-1m.csv.zip. (EUROCRYPT), 2006. Accessed 2017-12-21. [21] Cynthia Dwork, Frank McSherry, Kobbi Nissim, [3] Alexa Top Sites by Category. and Adam Smith. Calibrating Noise to Sensitivity https://www.alexa.com/topsites/categories. in Private Data Analysis. In Theory of Accessed 2018-01-11. Conference, 2006. [4] CAIDA AS Rank Service. [22] Cynthia Dwork and Aaron Roth. The Algorithmic http://as-rank.caida.org/api/v1/. Foundations of Differential Privacy. Foundations [5] Public Suffix List. https://publicsuffix.org. and Trends® in Theoretical Computer Science, [6] Ricochet. https://ricochet.im/. 9(3–4):211–407, 2014. [7] Routeviews Prefix-to-AS mappings (pfx2as) for [23] Tariq Elahi, George Danezis, and . IPv4 and IPv6. http://data.caida.org/datasets/ PrivEx: Private Collection of Traffic Statistics for routing/routeviews-prefix2as/. Anonymous Communication Networks. In ACM [8] Tor directory protocol, version 3. Conference on Computer and Communications https://gitweb.torproject.org/torspec.git/plain/ Security (CCS), 2014. dir-spec.txt. [24] Ellis Fenske, Akshaya Mani, Aaron Johnson, and [9] Tor Rendezvous Specification (version 2). Micah Sherr. Distributed Measurement with https://gitweb.torproject.org/torspec.git/plain/ Private Set-Union Cardinality. In ACM rend-spec-v2.txt. Conference on Computer and Communications [10] Tor Rendezvous Specification (version 3). Security (CCS), 2017. https://gitweb.torproject.org/torspec.git/plain/ [25] David Goulet. tor-project listserv post, available rend-spec-v3.txt. at https://lists.torproject.org/pipermail/tor- [11] Tor Research Safety Board. project/2017-December/001604.html, 2017. https://research.torproject.org/safetyboard.html. [26] David Goulet, Aaron Johnson, George [12] Tor2web: Browse the tor onion services. Kadianakis, and Karsten Loesing. Hidden-service https://www.tor2web.org/. Statistics Reported by Relays. Technical Report [13] Lada A Adamic and Bernardo A Huberman. 2015-04-001, The Tor Project, Inc., April 2015. Zipf’s Law and the Internet. Glottometrics, [27] Rob Jansen and Aaron Johnson. Safely Measuring 3(1):143–150, 2002. Tor. In ACM Conference on Computer and [14] Alex Biryukov, Ivan Pustogarov, and Ralf-Philipp Communications Security (CCS), 2016. Weinmann. Trawling for Tor Hidden Services: [28] Rob Jansen, Marc Juarez, Rafael Galvez, Tariq Detection, Measurement, Deanonymization. In Elahi, and Claudia Diaz. Inside Job: Applying IEEE Symposium on Security and Privacy Traffic Analysis to Measure Tor from Within. In (Oakland), 2013. Network and Distributed System Security [15] A. Chaabane, P. Manils, and M.A. Kaafar. Symposium (NDSS), 2018. Digging into Anonymous Traffic: A Deep Analysis [29] George Kadianakis. Some Statistics about Onions. of the Tor Anonymizing Network. In IEEE https://blog.torproject.org/some-statistics-about- Network and System Security, 2010. onions. [16] Thomas M. Dalton. Affidavit of Special Agent Thomas M. Dalton. Available at

14 [30] George Kadianakis, David Goulet, Karsten Differentially Private and Robust Statistics Loesing, and Aaron Johnson. Better Hidden Collection for Tor. In Network and Distributed Service Stats from Tor Relays. System Security Symposium (NDSS), 2017. https://gitweb.torproject.org/torspec.git/tree/ [37] Srdjan Matic, Carmela Troncoso, and Juan proposals/238-hs-relay-stats.txt, 2014. Caballero. Dissecting Tor Bridges: a Security [31] George Kadianakis and Karsten Loesing. Evaluation of Their Private and Public Extrapolating Network Totals from Infrastructures. In Network and Distributed Hidden-Service Statistics. Technical Report Systems Security Symposium, pages 1–15, 2017. 2015-01-001, The Tor Project, January 2015. [38] Damon McCoy, Kevin Bauer, Dirk Grunwald, [32] David Karger, Eric Lehman, Tom Leighton, Rina Tadayoshi Kohno, and Douglas Sicker. Shining Panigrahy, Matthew Levine, and Daniel Lewin. Light in Dark Places: Understanding the Tor Consistent Hashing and Random Trees: Network. In Privacy Enhancing Technologies Distributed Caching Protocols for Relieving Hot Symposium (PETS), 2008. Spots on the World Wide Web. In ACM [39] Gareth Owen and Nick Savage. Empirical Symposium on Theory of Computing (STOC), Analysis of Tor Hidden Services. IET Information 1997. Security, 10(3):113–118, 2016. [33] Serge A Krashakov, Anton B Teslyuk, and Lev N [40] Chris Soghoian. Researchers Could Face Legal Shchur. On the Universality of Rank Distributions Risks for Network Snooping. of Website Popularity. Computer Networks, http://www.cnet.com/news/researchers-could- 50(11):1769–1780, 2006. face-legal-risks-for-network-snooping/, July 2008. [34] Karsten Loesing. 10 Years of Collecting Tor [41] Christopher Soghoian. Enforced Community Directory Data. Tor Blog post. Available at Standards for Research on Users of the Tor https://blog.torproject.org/10-years-collecting- Anonymity Network. In Workshop on Ethics in tor-directory-data. Research (WECSR), 2011. [35] Karsten Loesing, Steven J. Murdoch, and Roger [42] Tor: Onion Service Protocol. https:// Dingledine. A Case Study on Measuring www.torproject.org/docs/onion-services.html.en. Statistical Data in the Tor Anonymity Network. [43] Tor Project, Inc. Tor Metrics Portal. In Financial Cryptograpy and Data Security, 2010. https://metrics.torproject.org/. [36] Akshaya Mani and Micah Sherr. Histor:

15