Measuring and Monitoring the Tor Network
Aaron Johnson
August 19th, 2018
Encryption and Surveillance Workshop

References and Acknowledgements

Understanding Tor Usage with Privacy-Preserving Measurement. Akshaya Mani (Georgetown University), T Wilson-Brown (UNSW Canberra Cyber, University of New South Wales), Rob Jansen (U.S. Naval Research Laboratory), Aaron Johnson (U.S. Naval Research Laboratory), Micah Sherr (Georgetown University). To appear in the 2018 Internet Measurement Conference.

Tunable Transparency: Secure Computation in the Tor Network. Ryan Wails (U.S. Naval Research Laboratory), Aaron Johnson (U.S. Naval Research Laboratory), Daniel Starin (George Mason University, Vencore Labs), Arkady Yerukhimovich (MIT Lincoln Laboratory), S. Dov Gordon (George Mason University). In preparation (draft available).

Background: Tor

Tor Background

[Diagram: Users and Destinations]
Tor is a popular system for anonymous, censorship-resistant Internet communication.

Tor Background: Onion Routing

[Diagram: Users, Relays, Destinations; labels: Circuit, Stream]

[Diagram: Users, Relays, Onion Services (e.g. nytimes3xbfgragh.onion); labels: Circuit, Stream]

Tor Background: Who Uses Tor
• Over 2,000,000 daily users
• Over 6,000 relays in over 75 countries
• 100 Gbps aggregate traffic

Tor Measurement and Monitoring
Do network privacy and transparency conflict?

Problem: Privacy & Transparency

Tor Measurement and Monitoring

Privacy risks of measuring Tor
§ Deanonymizing individual connections
§ Storing sensitive data at relays risks leaks from compromise
§ Revealing “interesting” users (e.g. from censored locations)
§ Revealing private onion services

Problems without some transparency
§ Level of anonymity unknown
§ Network subject to silent attack and abuse
§ Network can be covertly used for attack and abuse
§ Network management and improvement difficult

Tor Measurement

Some current Tor measurements (https://metrics.torproject.org)

Data                      How measured          Privacy techniques
Relay bandwidth capacity  Self, BW Authorities  Test measurements
Relay used bandwidth      Per relay             Report every 4 hrs
Total daily users         Per relay             Inferred from consensus downloads
Users per country         Per relay             Report every 24 hrs, round, opt-in
# onion services          Per relay             Differential privacy, round
Exit traffic per port     Per relay             Report every 24 hrs, opt-in

[Annotations added over successive builds of this slide: Inaccurate, Unsafe, Incomplete]

Secure Aggregation

[Diagram: n Data Collectors (DCs) / Relays with inputs x1, x2, x3; m Data Aggregators (DAs). Output is a noisy aggregate, hiding the inputs xi.]

Data Collection:
1. DCs store data obliviously during measurement period.
2. DCs secret-share inputs to DAs at end of measurement period.
3. DAs run protocol to aggregate and add differentially-private noise.

Developed two systems:
§ PrivCount: Computes sums
§ PSC: Computes private set-union cardinality
§ Tolerate m-1 malicious DAs
§ Transitioning PrivCount into Tor: Proposal 288

Tor Measurement Study
§ Performed Tor measurements
  § Exit, entry, and onion-service statistics
  § 24-hour measurements
  § January – May 2018
§ Ran 16 Tor relays
  § 1.5% total exit, 1.2% guard, 2.8% onion lookup
  § Canada, France, US
§ Used PrivCount and PSC
  § 3 Data Aggregators (DAs)
  § 3 DA operators
  § Located in US and Australia
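
The secret-share-then-aggregate flow used with the 3 DAs can be illustrated with additive secret sharing. This is a minimal sketch, not PrivCount's actual protocol: the modulus `Q`, the function names, and the choice to split rounded Gaussian noise across DAs are assumptions made for illustration only.

```python
import random
import secrets

Q = 2**61 - 1  # assumed modulus; must exceed any true sum plus noise


def share(value, m):
    """Split value into m additive shares modulo Q.

    Any m-1 shares are uniformly random, so the input stays hidden
    as long as at least one DA keeps its share private.
    """
    shares = [secrets.randbelow(Q) for _ in range(m - 1)]
    shares.append((value - sum(shares)) % Q)
    return shares


def aggregate(dc_inputs, m, sigma, rng):
    """Return the noisy sum of dc_inputs as reconstructed from m DAs."""
    per_da = [0] * m
    for x in dc_inputs:                      # each DC sends one share to each DA
        for j, s in enumerate(share(x, m)):
            per_da[j] = (per_da[j] + s) % Q
    # Each DA adds its own noise before publishing. Draws with standard
    # deviation sigma/sqrt(m) sum to roughly standard deviation sigma.
    published = [(t + round(rng.gauss(0, sigma / m**0.5))) % Q for t in per_da]
    total = sum(published) % Q
    # Decode values above Q/2 as negative numbers.
    return total - Q if total > Q // 2 else total


counts = [120, 45, 300]  # hypothetical per-relay counts
# sigma=0 disables the noise, so this prints the exact sum, 465.
print(aggregate(counts, m=3, sigma=0.0, rng=random.Random(0)))
```

Rounded Gaussian noise from a seeded generator is used here only to keep the sketch short and repeatable; a deployment would need a carefully chosen noise distribution and a secure randomness source.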

Tor Measurement Study: Exit Statistics

[Figure: Tor Web connections to popular domains (Alexa top 1M). Two bar charts of primary domain count (%). Panel "Sites in Rank Set": Alexa rank buckets (0,10], (10,100], (100,1k], (1k,10k], (10k,100k], (100k,1m], plus torproject.org and other. Panel "Sites in Siblings Set": Alexa siblings google (1), youtube (2), facebook (3), baidu (4), wikipedia (5), yahoo (6), reddit (8), qq (9), amazon (10), duckduckgo, torproject, other.]

Tor Measurement Study: Entry and Onion Services

§ Daily client activity (95% CI inferred network-wide)
  § Unique client IPs: 6.61 – 11.2 million
  § “Promiscuous” clients: 14,400 – 21,500
§ Daily onion-service activity (95% CI inferred network-wide)
  § 1,350 – 1,740 lookups/second
  § 1,192 – 1,620 failed lookups/second (~93% failure rate)
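
One simple way a network-wide interval could be inferred from a local measurement is to widen the noisy count by the known noise and divide by the fraction of activity the measuring relays observe. The sketch below assumes Gaussian noise with known standard deviation and ignores sampling error; the function name and all input numbers are hypothetical, and the study's actual inference may differ.

```python
def network_wide_ci(noisy_count, sigma, fraction, z=1.96):
    """Scale a noisy local count to a network-wide interval.

    noisy_count: published local count (true count plus Gaussian noise)
    sigma:       standard deviation of the added noise (assumed known)
    fraction:    share of network activity seen by the measuring relays
    z:           normal quantile; 1.96 corresponds to a 95% interval
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    low = max(0.0, noisy_count - z * sigma) / fraction
    high = (noisy_count + z * sigma) / fraction
    return low, high


# Hypothetical inputs: 1,000 locally observed events, noise sigma of 50,
# and relays that see 2.8% of the relevant activity.
low, high = network_wide_ci(1000, 50, 0.028)
print(f"{low:,.0f} to {high:,.0f}")
```

As a check on the figures above, dividing the upper endpoints (1,620 / 1,740) gives about 0.93, consistent with the ~93% failure rate.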

Secure Multiparty Computation

Flexible transparency with MPC
§ Robust statistics to limit effect of malicious
§ Improved client-size estimation
§ Measure abuse of and with Tor
  § Botnets on onion services
  § Denial-of-service attacks
  § Hacking attempts (e.g. vulnerability scanning)
  § Site scraping

[Diagram: n Data Collectors (DCs) / Relays with inputs x1, x2, x3; m Computation Parties (CPs). Output is some function f(x1,x2,x3), hiding the inputs xi.]

Data Collection:
1. DCs store data obliviously during measurement period.
2. DCs secret-share inputs to CPs at end of measurement period.
3. CPs run protocol to compute some function f on the inputs.

Tor MPC design
§ TinyOT (Burra et al. 2015) for offline/online Boolean-circuit evaluation.
§ Secure against malicious, dishonest majority.
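
The offline/online split for Boolean circuits can be illustrated with XOR-shared bits and a preprocessed AND triple. This is a semi-honest sketch only: TinyOT-style protocols also authenticate shares and generate triples with oblivious transfer, both omitted here. The `dealer_triple` helper is a hypothetical trusted stand-in for the offline phase, and all function names are illustrative.

```python
import secrets
from functools import reduce
from operator import xor


def share_bit(bit, m):
    """XOR-share one bit among m parties."""
    shares = [secrets.randbits(1) for _ in range(m - 1)]
    shares.append(bit ^ reduce(xor, shares, 0))
    return shares


def open_bit(shares):
    """Reconstruct a bit by XORing every party's share."""
    return reduce(xor, shares, 0)


def xor_gate(xs, ys):
    """XOR needs no communication: each party XORs its own two shares."""
    return [x ^ y for x, y in zip(xs, ys)]


def dealer_triple(m):
    """Offline phase stand-in: shares of random a, b and of c = a AND b."""
    a, b = secrets.randbits(1), secrets.randbits(1)
    return share_bit(a, m), share_bit(b, m), share_bit(a & b, m)


def and_gate(xs, ys, triple):
    """Online AND gate that consumes one preprocessed triple."""
    a_sh, b_sh, c_sh = triple
    d = open_bit(xor_gate(xs, a_sh))  # d = x XOR a; a is random, so d leaks nothing
    e = open_bit(xor_gate(ys, b_sh))  # e = y XOR b
    zs = [c ^ (d & b) ^ (e & a) for a, b, c in zip(a_sh, b_sh, c_sh)]
    zs[0] ^= d & e                    # public term is added by one party only
    return zs


m = 5  # number of Computation Parties
x_sh, y_sh = share_bit(1, m), share_bit(1, m)
print(open_bit(and_gate(x_sh, y_sh, dealer_triple(m))))  # 1 AND 1
```

Each AND gate uses up one triple, which is why the expensive work can be done offline before the inputs are known, leaving only two bit openings per AND gate for the online phase.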

TinyOT performance estimates
§ 7,000 Data Collectors
§ 5 Computation Parties
§ 40-bit statistical security

                          Median        Count Distinct
Offline communication     12.7 GB       31.43 GB
Offline time (1Gbps BW)   1.69 minutes  4.19 minutes
Offline throughput        852/day       344/day
Online time (200ms RTT)   5 minutes     2 seconds

32-bit median values, count-distinct error 5.8% (LogLog)
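
The offline time and throughput rows appear to follow from the communication volume and the 1 Gbps bandwidth. The sketch below reproduces that arithmetic, assuming 1 GB = 8 gigabits and an offline phase limited only by bandwidth; the function name is illustrative.

```python
def offline_estimates(comm_gb, bandwidth_gbps=1.0):
    """Return (minutes per computation, computations per day)."""
    seconds = comm_gb * 8 / bandwidth_gbps  # GB to gigabits, then divide by Gbps
    minutes = seconds / 60
    per_day = 24 * 60 / minutes
    return minutes, per_day


for name, gb in [("Median", 12.7), ("Count Distinct", 31.43)]:
    minutes, per_day = offline_estimates(gb)
    print(f"{name}: {minutes:.2f} minutes, {per_day:.0f}/day")
```

Both offline times match the table. The count-distinct throughput also matches (344/day), while the unrounded median throughput comes out near 850/day; the table's 852/day corresponds to dividing 1,440 minutes by the rounded 1.69.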

Conclusions

§ Tor is developing privacy-focused mechanisms for measurement and monitoring.
§ Flexible transparency mechanisms raise new issues
  § If Tor can reveal information, will it become obligated to do so?
  § Where should the line between transparency and privacy be drawn?
  § What governance mechanisms can handle making these decisions?
§ Other systems may face similar measurement questions
  § Privacy-enhanced cryptocurrencies (Zcash, Monero)
  § Privacy-enhanced cloud services