Website Fingerprinting at Internet Scale
Andriy Panchenko¹, Fabian Lanze¹, Andreas Zinnen², Martin Henze³, Jan Pennekamp¹, Klaus Wehrle³, Thomas Engel¹
¹ Interdisciplinary Centre for Security, Reliability and Trust (SnT), Luxembourg
² RheinMain University of Applied Sciences, Germany
³ RWTH Aachen University, Germany

Background
Why people use Tor...
  Privacy has become a general concern
  Access to the Internet is censored in many countries

[Figure: Client, onion routers (OR), Server; a "?" labeled "Website Fingerprinting"]
Tor: The Onion Router
  Most popular low-latency anonymization network
  Many users rely on Tor to access unfiltered information

[Figure: Client, Entry OR, Middle OR, Exit OR, Server; a "?" labeled "Website Fingerprinting"]
What is website fingerprinting?
  Identify website accessed without breaking cryptography
  Attacker is a passive observer
  Features based on packet size, direction, ordering, timing

Website Fingerprinting - state of the art
Widely discussed and hot topic in anonymity research
State-of-the-art approach: Wang et al. (USENIX Security ’14)
  k-Nearest Neighbor approach
  manually selected features (e.g., bursts, unique lengths)
  about 4,000 features
  recognition rates > 90%

2 scenarios for evaluation
  Closed world: user visits only a fixed number of websites
  Open world: monitor set of sites (user may visit unknown sites)

Our method
Idea
  Don’t try to guess which characteristics may be relevant
  Use a representation that implicitly covers all characteristics

Our feature set:
$$(\underbrace{N_{in}, N_{out}, S_{in}, S_{out}}_{\text{basic properties}},\ \underbrace{C_1, \dots, C_n}_{\text{cumulative features}})$$
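The feature set above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' released scripts: the function name `cumul_features` is hypothetical, positive packet sizes are assumed to mean incoming and negative sizes outgoing, and the cumulative curve is assumed to be sampled along the packet-number axis, as in the accompanying plot. The default n = 100 follows from the 104 features reported in the comparison (4 basic properties plus 100 cumulative features).

```python
from typing import List, Sequence


def cumul_features(trace: Sequence[int], n: int = 100) -> List[float]:
    """Return [N_in, N_out, S_in, S_out, C_1, ..., C_n] for one trace.

    `trace` holds signed packet sizes in bytes. ASSUMPTION: positive
    values are incoming packets, negative values are outgoing packets.
    """
    if not trace:
        raise ValueError("trace must contain at least one packet")
    if n < 1:
        raise ValueError("n must be at least 1")

    # Basic properties: packet counts and byte totals per direction.
    incoming = [p for p in trace if p > 0]
    outgoing = [-p for p in trace if p < 0]
    basic = [
        float(len(incoming)),
        float(len(outgoing)),
        float(sum(incoming)),
        float(sum(outgoing)),
    ]

    # Cumulative sum of signed packet sizes, starting at 0 before packet 1.
    cumulative = [0.0]
    for size in trace:
        cumulative.append(cumulative[-1] + size)

    # Sample the piecewise-linear curve at n equidistant positions so that
    # traces of any length yield the same number of features.
    # ASSUMPTION: positions are spaced along the packet-number axis.
    last = len(cumulative) - 1
    sampled = []
    for i in range(1, n + 1):
        x = last * i / n
        left = min(int(x), last - 1)
        frac = x - left
        step = cumulative[left + 1] - cumulative[left]
        sampled.append(cumulative[left] + frac * step)

    return basic + sampled
```

Because the sampling positions scale with the trace length, a 4-packet trace and a 4,000-packet trace both map to vectors of length 4 + n, which is what allows them to be fed to the same classifier.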
[Figure: cumulative sum of packet sizes plotted against packet number for two traces, C(T1) and C(T2), with the sampled points C_i marked for each trace]

Example
[Figure: feature value [kByte] plotted against feature index for about.com and google.de]

Fixed number of distinctive characteristics from traces with varying lengths
Fingerprints can be visualized
Used as input for a Support Vector Machine

Layers of data representation
[Figure: the same data at three layers: Tor cells (Cell 1 to Cell 5), TLS records (Record 1, Record 2), TCP packets (Packet 1 to Packet 3)]

Information source for feature extraction: Cell vs. TLS vs. TCP
Practically negligible effect on the classification accuracy

Comparison with state of the art – classification
Closed world
Accuracy [%] for 100 most popular websites

                             90 instances   40 instances
k-NN (3736 features)         90.84          89.19
Our method (104 features)    91.38          92.03

Open world
Foreground: 100 blocked websites, background: 9,000 popular websites

              TPR [%]   FPR [%]
k-NN          90.59     2.24
Our method    96.92     1.98

Comparison of computational performance
[Figure: average processing time [h] (logarithmic axis) plotted against background set size (0 to 50,000) for k-NN, CUMUL, and CUMUL (parallelized)]

Computation time for 100 random monitored pages in open world

Website fingerprinting in reality
Critique
  Data sets used are not representative!
    too small, only popular websites / index pages
  Simplified assumptions, wrong metrics for evaluation

RND-WWW: How do people access the world wide web?
  Twitter
  Alexa-one-click
  Googling the trends
  Googling at random
  Censored in China
  ⇒ > 120,000 web pages

Tor-Exit: Which pages do users actually access over Tor?
  Monitor a Tor Exit node
  ⇒ 211,148 web pages

Webpage fingerprinting at Internet scale
Question: Does the attack scale under realistic assumptions?
Which metric to evaluate?
  Accuracy: fraction of correct classifications
  True Positive Rate / Recall: fraction of monitored pages detected
  False Positive Rate: fraction of unmonitored pages that raise a false alarm
  Problem: misleading interpretation ⇒ base rate fallacy
  Precision: probability that the classifier is correct given it has detected a monitored page
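The base rate fallacy can be made concrete with a short calculation. The sketch below is illustrative only: the function names are hypothetical, the TPR and FPR in the usage note are the open-world values for "our method" from the comparison table in these slides, and the base rates (the fraction of page loads that target a monitored page) are assumed values, not measurements.

```python
def rates(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Compute the evaluation metrics from raw confusion counts."""
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "recall": tp / (tp + fn),        # true positive rate
        "fpr": fp / (fp + tn),           # false positive rate
        "precision": tp / (tp + fp),
    }


def precision_from_rates(tpr: float, fpr: float, base_rate: float) -> float:
    """Precision implied by TPR and FPR at a given base rate (Bayes' rule).

    `base_rate` is the prior probability that a page load belongs to the
    monitored set; it is a HYPOTHETICAL input chosen by the caller.
    """
    true_alarms = tpr * base_rate
    false_alarms = fpr * (1.0 - base_rate)
    return true_alarms / (true_alarms + false_alarms)


if __name__ == "__main__":
    for base_rate in (0.5, 0.01, 0.001):  # assumed priors
        p = precision_from_rates(tpr=0.9692, fpr=0.0198, base_rate=base_rate)
        print(f"base rate {base_rate}: precision {p:.4f}")
```

With a TPR of 96.92% and an FPR of 1.98%, precision stays below 5% once only one page load in a thousand targets a monitored page, even though the same rates give a precision of about 98% when monitored and unmonitored loads are equally likely. This is why TPR and FPR alone can be misleading.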
Focus of evaluation
  Precision and recall for increasing background set sizes
  Random subset as foreground

[Figure: Results for RND-WWW. Two panels showing the fraction of foreground pages [%] against recall and against precision (0 to 1), one curve per background set size b = 1000, 5000, 9000, 20000, 50000, 111884]
[Figure: Results for Tor-Exit. Two panels showing the fraction of foreground pages [%] against recall and against precision (0 to 1), one curve per background set size b = 1000, 5000, 9000, 20000, 50000, 111884, 211148]

Answer: No.
Question: Is it at least possible for certain pages?
[Figure: Minimum number of mistakenly confused pages. Fraction of foreground pages [%] plotted against the number of webpage confusions (0 to 400) for b = 20,000, b = 50,000, and b = 100,000]
No single page without a confusingly similar page in a realistic universe.

How about fingerprinting websites? (1/2)
A website is a collection of web pages served under the same domain
Is it possible to fingerprint a website when only a subset of its pages is available for training?
Experiment: 20 websites
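The definition above, a website as the set of pages served under one domain, can be sketched as a relabeling step: every page trace gets the label of its website rather than of the individual page. This is only an illustration under assumptions, not the authors' procedure. The helper names are hypothetical, the URLs are made-up examples, and using the hostname with a leading "www." removed is an assumed approximation of "same domain".

```python
from collections import defaultdict
from typing import Dict, Iterable, List
from urllib.parse import urlsplit


def website_of(url: str) -> str:
    """Map a page URL to a website label.

    ASSUMPTION: the website is identified by the URL's hostname, with a
    leading "www." stripped. Other subdomains are kept as-is.
    """
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def group_pages_by_website(urls: Iterable[str]) -> Dict[str, List[str]]:
    """Group page URLs so that each website collects its own pages."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for url in urls:
        groups[website_of(url)].append(url)
    return dict(groups)


if __name__ == "__main__":
    pages = [  # hypothetical example URLs
        "http://www.example.org/",
        "http://www.example.org/news/article-1",
        "http://example.net/item?id=7",
    ]
    for site, members in group_pages_by_website(pages).items():
        print(site, len(members))
```

Training on traces of several pages per website label, instead of only the index page, corresponds to the "different pages" setting of the experiment.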
[Figure: confusion matrices for 20 websites (ALJAZEERA, AMAZON, BBC, CNN, EBAY, FACEBOOK, IMDB, KICKASS, LOVESHACK, RAKUTEN, REDDIT, RT, SPIEGEL, STACKOVERFLOW, TMZ, TORPROJECT, TWITTER, WIKIPEDIA, XHAMSTER, XNXX): (a) only index pages, (b) different pages]

How about fingerprinting websites? (2/2)
Transition of results from closed-world to the realistic open-world setting is typically not trivial
Website fingerprinting scales better than webpage fingerprinting

[Figure: two panels showing precision and recall (0 to 1) against background set size (0 to 120,000)]

Summary
Our classifier with 104 features outperforms state of the art
Alarming results under simplified assumptions can’t be generalized
Webpage fingerprinting does not scale for appropriate universe sizes for any webpage
Website fingerprinting is not only more realistic but also significantly more effective
Conclusions drawn need to be reconsidered
Scripts and RND-WWW dataset: http://lorre.uni.lu/~andriy/zwiebelfreunde/

We are hiring!
Our lab within the Interdisciplinary Centre for Security, Reliability and Trust (Uni Luxembourg) is looking for PhD candidates and PostDocs in the area of anonymity and privacy
More information: http://secan-lab.uni.lu/jobs