
Measuring and Monitoring the Network
Aaron Johnson

August 19th, 2018
Encryption and Surveillance Workshop

References and Acknowledgements

Understanding Tor Usage with Privacy-Preserving Measurement. Akshaya Mani (Georgetown University), T Wilson-Brown (UNSW Canberra Cyber, University of New South Wales), Rob Jansen (U.S. Naval Research Laboratory), Aaron Johnson (U.S. Naval Research Laboratory), Micah Sherr (Georgetown University). To appear in the 2018 Internet Measurement Conference.

Tunable Transparency: Secure Computation in the Tor Network. Ryan Wails (U.S. Naval Research Laboratory), Aaron Johnson (U.S. Naval Research Laboratory), Daniel Starin (George Mason University, Vencore Labs), Arkady Yerukhimovich (MIT Lincoln Laboratory), S. Dov Gordon (George Mason University). In preparation (draft available).

Background: Tor

Tor Background

[Figure: Users and Destinations]

Tor is a popular system for anonymous, censorship-resistant Internet communication.

Tor Background: Onion Routing

[Figure: Users, Relays, Destinations; labels: Circuit, Stream]

Tor Background: Onion Routing

[Figure: Users, Relays, Onion Services (e.g. nytimes3xbfgragh.onion); labels: Circuit, Stream]

Tor Background: Who Uses Tor

• Over 2,000,000 daily users
• Over 6,000 relays in over 75 countries
• 100 Gbps aggregate traffic

Tor Measurement and Monitoring

Do network privacy and transparency conflict?

Problem: Privacy & Transparency

Tor Measurement and Monitoring

Privacy risks of measuring Tor
§ Deanonymizing individual connections
§ Storing sensitive data at relays risks leaks from compromise
§ Revealing “interesting” users (e.g. from censored locations)
§ Revealing private onion services

Tor Measurement and Monitoring

Problems without some transparency
§ Level of anonymity unknown
§ Network subject to silent attack and abuse
§ Network can be covertly used for attack and abuse
§ Network management and improvement difficult

Tor Measurement

https://metrics.torproject.org

Some current Tor measurements

Data                      How measured          Privacy techniques
Relay bandwidth capacity  Self, BW Authorities  Test measurements
Relay used bandwidth      Per relay             Report every 4 hrs
Total daily users         Per relay             Inferred from consensus downloads
Users per country         Per relay             Report every 24 hrs, round, opt-in
# onion services          Per relay             Differential privacy, round
Exit traffic per port     Per relay             Report every 24 hrs, opt-in
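The "Differential privacy, round" technique listed for onion-service counts can be illustrated with a minimal sketch. This is not Tor's actual code: the function names, the parameter values (epsilon, sensitivity, bin size), and the order of operations (add noise, then round) are all assumptions chosen for illustration.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_rounded_count(true_count, epsilon, sensitivity, bin_size, rng):
    """Add Laplace noise to a count, then round to a multiple of bin_size.

    Hypothetical helper: rounding after the noise is post-processing,
    so it does not weaken the differential-privacy guarantee.
    """
    noisy = true_count + laplace_noise(sensitivity / epsilon, rng)
    binned = bin_size * round(noisy / bin_size)
    return max(binned, 0)  # a published count should not be negative

# Hypothetical parameters; a real deployment would use a secure RNG.
rng = random.Random(2018)
published = noisy_rounded_count(true_count=1003, epsilon=0.3,
                                sensitivity=8, bin_size=8, rng=rng)
print(published)
```

A smaller epsilon means more noise and stronger privacy; a larger bin size hides small changes in the count at the cost of resolution.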

Annotations on the measurements above:
§ Inaccurate
§ Unsafe
§ Incomplete

Secure Aggregation

[Figure: n Data Collectors (DCs) / Relays with inputs x1, x2, x3; m Data Aggregators (DAs). Output is noisy aggregate, hiding the inputs xi.]

Data Collection:
1. DCs store data obliviously during measurement period.
2. DCs secret-share inputs to DAs at end of measurement period.
3. DAs run protocol to aggregate and add differentially-private noise.

Developed two systems:
§ PrivCount: Computes sums
§ PSC: Computes private set-union cardinality
§ Tolerate m-1 malicious DAs
§ Transitioning PrivCount into Tor: Proposal 288

Tor Measurement Study

§ Performed Tor measurements
  § Exit, entries, and onion-service statistics
  § 24-hour measurements
  § January – May 2018
§ Ran 16 Tor relays
  § 1.5% total exit, 1.2% guard, 2.8% onion lookup
  § Canada, France, US
§ Used PrivCount and PSC
  § 3 Data Aggregators (DAs)
  § 3 DA operators
  § Located in US and Australia
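The secure-aggregation steps (DCs secret-share their inputs to the DAs, DAs aggregate and add noise) can be sketched with additive secret sharing, here with m = 3 DAs as in the study. This is a simplified illustration, not the PrivCount implementation: the modulus, the function names, and the use of rounded Gaussian noise added by each DA are assumptions.

```python
import random

Q = 2**61 - 1  # hypothetical modulus, large enough to hold any sum

def share(value, m, rng):
    """Split value into m additive shares that sum to value mod Q."""
    shares = [rng.randrange(Q) for _ in range(m - 1)]
    shares.append((value - sum(shares)) % Q)
    return shares

def aggregate(dc_inputs, m, noise_sigma, rng):
    """Return the noisy sum of the DC inputs as seen after the DAs combine."""
    da_totals = [0] * m
    # Each DC sends one share to each DA; any m-1 shares look random.
    for x in dc_inputs:
        for j, s in enumerate(share(x, m, rng)):
            da_totals[j] = (da_totals[j] + s) % Q
    # Assumption: each DA adds its own rounded Gaussian noise term.
    for j in range(m):
        noise = round(rng.gauss(0, noise_sigma))
        da_totals[j] = (da_totals[j] + noise) % Q
    # Combining the DA totals reveals only the noisy sum.
    total = sum(da_totals) % Q
    return total - Q if total > Q // 2 else total  # map back to signed

# random.Random is used for repeatability; a real system needs a CSPRNG.
rng = random.Random(1)
print(aggregate([10, 20, 30], m=3, noise_sigma=0.0, rng=rng))
print(aggregate([10, 20, 30], m=3, noise_sigma=5.0, rng=rng))
```

Because an individual input can only be reconstructed from all m shares, the inputs stay hidden as long as at least one DA is honest, which matches the slide's "tolerate m-1 malicious DAs".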

Tor Measurement Study: Exit Statistics

[Figure: Tor Web connections to popular domains (Alexa top 1M). Two bar charts of Primary Domain Count (%): "Sites in Rank Set" by Alexa Rank bin ((0,10], (10,100], (100,1k], (1k,10k], (10k,100k], (100k,1m], other) and "Sites in Siblings Set" by Alexa Siblings; torproject.org is shown separately in each chart.]

Tor Measurement Study: Entry and Onion Services

§ Daily client activity (95% CI inferred network-wide)
  § Unique client IPs: 6.61 – 11.2 million
  § “Promiscuous” clients: 14,400 – 21,500

§ Daily onion-service activity (95% CI inferred network-wide)
  § 1,350 – 1,740 lookups/second
  § 1,192 – 1,620 failed lookups/second
  § ~93% failure rate
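One simple way to read "95% CI inferred network-wide" for a rate such as lookups/second is to scale the locally observed noisy value by the fraction of the network the measuring relays cover. The sketch below assumes Gaussian measurement noise with known standard deviation and ignores sampling error; the function name and the input numbers are hypothetical, and this scaling is only plausible for quantities that grow linearly with coverage (not for unique-client counts).

```python
def network_wide_ci(observed, noise_sigma, fraction, z=1.96):
    """Scale a noisy local observation to a network-wide 95% interval.

    observed:    noisy value seen by the measuring relays
    noise_sigma: standard deviation of the added noise (assumed Gaussian)
    fraction:    share of the relevant network weight that was observed
    """
    low = (observed - z * noise_sigma) / fraction
    high = (observed + z * noise_sigma) / fraction
    return max(low, 0.0), high

# Hypothetical inputs: 43 lookups/s seen at 2.8% of onion-lookup weight.
low, high = network_wide_ci(observed=43.0, noise_sigma=2.0, fraction=0.028)
print(round(low), round(high))
```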

Secure Multiparty Computation

Flexible transparency with MPC
§ Robust statistics to limit effect of malicious
§ Improved client-size estimation
§ Measure abuse of and with Tor
  § Botnets on onion services
  § Denial-of-service attacks
  § Hacking attempts (e.g. vulnerability scanning)
  § Site scraping

Secure Multiparty Computation

[Figure: n Data Collectors (DCs) / Relays with inputs x1, x2, x3; m Computation Parties (CPs). Output is some function f(x1,x2,x3), hiding the inputs xi.]

Data Collection:
1. DCs store data obliviously during measurement period.
2. DCs secret-share inputs to CPs at end of measurement period.
3. CPs run protocol to compute some function f on the inputs.

Tor MPC design
§ TinyOT (Burra et al. 2015) for offline/online Boolean-circuit evaluation.
§ Secure against malicious, dishonest majority.

Secure Multiparty Computation

TinyOT performance estimates
§ 7,000 Data Collectors
§ 5 Computation Parties
§ 40-bit statistical security

                          Median        Count Distinct
Offline communication     12.7 GB       31.43 GB
Offline time (1Gbps BW)   1.69 minutes  4.19 minutes
Offline throughput        852/day       344/day
Online time (200ms RTT)   5 minutes     2 seconds

32-bit median values, count-distinct error 5.8% (LogLog)
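The offline time and throughput rows appear to follow from the communication volume and the 1 Gbps link. The sketch below checks that arithmetic under two assumptions that are not stated on the slide: 1 GB means 10^9 bytes, and throughput is the number of offline phases that fit in 24 hours, computed from the rounded time. The function names are hypothetical.

```python
def offline_minutes(comm_gb, bandwidth_gbps):
    """Minutes needed to transfer comm_gb gigabytes at bandwidth_gbps."""
    seconds = comm_gb * 8 / bandwidth_gbps  # gigabytes -> gigabits -> seconds
    return round(seconds / 60, 2)

def runs_per_day(minutes_per_run):
    """Number of back-to-back offline phases that fit in one day."""
    return round(24 * 60 / minutes_per_run)

for name, comm_gb in [("Median", 12.7), ("Count Distinct", 31.43)]:
    minutes = offline_minutes(comm_gb, bandwidth_gbps=1.0)
    print(name, minutes, "minutes,", runs_per_day(minutes), "per day")
```

Under these assumptions the offline phase is bandwidth-bound, while the online times in the table are instead tied to the 200 ms round-trip time.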

Conclusions

§ Tor is developing privacy-focused mechanisms for measurement and monitoring.
§ Flexible transparency mechanisms raise new issues
  § If Tor can reveal information, will it become obligated to do so?
  § Where should the line between transparency and privacy be drawn?
  § What governance mechanisms can handle making these decisions?
§ Other systems may face similar measurement questions
  § Privacy-enhanced cryptocurrencies (Zcash, Monero)
  § Privacy-enhanced cloud services
