Towards an Evaluation of a Recommended Tor Browser Configuration in Light of Website Fingerprinting Attacks
Total Page:16
File Type:pdf, Size:1020Kb
Towards an Evaluation of a Recommended Tor Browser Configuration in Light of Website Fingerprinting Attacks by Fayzah, Alshammari Thesis submitted In partial fulfillment of the requirements For the M.Sc. degree in Computer Science School of Electrical Engineering and Computer Science Faculty of Engineering University of Ottawa © Fayzah Alshammari, Ottawa, Canada, 2017 Abstract Website Fingerprinting (WF) attacks have become an area of concern for advocates of web Privacy Enhancing Technology (PET)s as they may allow a passive, local, eaves- dropper to eventually identify the accessed web page, endangering the protection offered by those PETs. Recent studies have demonstrated the effectiveness of those attacks through a number of experiments. However, some researchers in academia and Tor com- munity demonstrated that the assumptions of WF attacks studies greatly simplify the problem and don't reflect the evaluation of this vulnerability in practical scenarios. That leads to suspicion in the Tor community and among Tor Browser users about the efficacy of those attacks in real-world scenarios. In this thesis, we survey the literature of WF showing the research assumptions that have been made in the WF attacks against Tor. We then assess their practicality in real-world settings by evaluating their compliance to Tor Browser threat model, design requirements and to the Tor Project recommen- dations. Interestingly, we found one of the research assumptions related to the active content configuration in Tor Browser to be a reasonable assumption in all settings. Dis- abling or enabling the active content are both reasonable given the fact that the enabled configuration is the default of the Tor Browser, and the disabled one is the configuration recommended by Tor Project for users who require the highest possible security and anonymity. However, given the current published WF attacks, disabling the active con- tent is advantageous for the attacker as it makes the classification task easier by reducing the level of a web page randomness. To evaluate Tor Browser security in our proposed more realistic threat model, we collect a sample of censored dynamic web pages with Tor Browser in the default setting, which enables active content such as Javascript, and in the recommended setting by the Tor Project which disables the active content. We use Panchenko Support Vector Machine (SVM) classifier to study the identifiability of this ii sample of web pages. For pages that are very dynamic, we achieve a recognition rate of 42% when JavaScript is disabled, compared to 35% when turned on. Our results show that the recommended "more secure" setting for Tor Browser is actually more vulnerable to WF attacks than the default and non-recommended setting. iii Acknowledgements Profound thanks to Allah for his favor, grace, mercies and unrelenting love. To King Abdullah bin Abdulaziz Alsaud, may Allah have mercy on him, for his ex- ceptional role in promoting woman education and rights in Saudi Arabia. To King Saud University for sponsoring me during my studies. To my supervisor, the incomparable Professor Carlisle Adams. I'm lucky and honored to have this opportunity to work closely under his supervision and with him. I'm grateful for every single moment I spent in his office discussing problems, solutions, struggles, and ideas. I owe a lot of my professional, academic and personal development to Carlisle Adams. His patience, endless support, encouragement, and guidance have helped me to cross a lot of boundaries. This thesis wouldn't have seen the light without his unlimited support and encouragement. I would also like to mention the great opportunities I have had as one of his graduate students to work with industry and government software engineers, designers, and team leaders to solve real world problems. Such as our work with Canada Border Service Agency CBSA, Trend Micro, and the chance to volunteer in security related projects, such as the IBM Watson for Cyber Security project. That gave me the fascinating opportunity to witness and participate in the sharing and development of ideas which evolved into real world implementations, and experienced real world problems decomposed back into their constituent theoretical elements. All of this would not have been possible without Carlisle's support. To the awesome people in Carlisle Adam's research group; David Bissessar, Maryam Hezaveh, Ali Noman, Xiaomei Zhang, Mike Wakim, Alain Tambay and Dr.David Knox; iv I'm grateful for the wonderful and challenging times we went through together while solving problems and achieving milestones. To Michael Mann from Wireshark, for taking the time to code for Tor Dissector, and leaving useful comments and suggestions for integrating Tor Dissector with Wireshark plug-ins. To Nick Mathewson from the Tor project for answering my questions about the baseline code for Tor. To Ian Goldberg's research group at the University of Waterloo, for having me in a workshop discussing website fingerprinting attacks. To Marc Juarez for gracefully sharing their datasets and code with us. To all my friends, particularly Kenniy Olorunnimbe, Riyas Valiya, Dela De Yongester; thank you so much for always being there when I needed you. To my brother "Rfytzy" Saad, who came with me to Canada and showed me how a person can play the role of a whole family; I love you and I will be always grateful for you. To my family, thank you for always loving me unconditionally. To the "Anonymous" who doesn't want to be named; thank you so much for everything! v Dedication In loving memory of Hammed Rahal Alshammari and Fayez Hammed Alshammari. You are gone, but never forgotten. Father and Brother, I will always love you. vi Contents Acronyms xi I INTRODUCTION 1 1 Introduction 2 1.1 Thesis Hypothesis . .4 1.2 Thesis Contribution . .5 1.3 Thesis Outline . .6 II BACKGROUND AND LITERATURE REVIEW 8 2 Private Web Browsing over Tor 9 2.1 Web Browsing . 10 2.1.1 Web Page Loading . 10 2.1.2 Web Pages Types . 12 2.2 Web Browsing Privacy Concerns . 13 2.3 Tor....................................... 14 2.3.1 Tor Architecture . 14 2.3.2 Tor Protocol . 16 vi 2.3.3 Tor Dissector . 19 2.4 Web Browsing over Tor . 20 2.4.1 Tor Browser . 21 2.4.2 The Active Content Setting in Tor Browser . 21 2.5 Summary . 22 3 Website Fingerprinting 24 3.1 Website Fingerprinting Threat Model . 25 3.2 Website Fingerprinting Procedure . 26 3.3 Website Fingerprinting Survey . 27 3.3.1 Website Fingerprinting Attacks . 29 3.3.2 Website Fingerprinting Defenses . 33 3.3.3 Practicality of Website Fingerprinting Attacks . 35 3.4 Summary . 45 III ADDRESSING THE PROBLEM 47 4 Experimental Design 48 4.1 Prior Work Datasets . 50 4.2 Our Web Pages List . 52 4.2.1 Web Pages Selection . 53 4.2.2 Web Pages Cleaning . 54 4.2.3 Web Pages Categorization . 55 4.3 Data Collection . 56 4.3.1 Software and Libraries . 57 4.3.2 Data Generator Script . 60 vii 4.4 Traffic Pre-Processing . 62 4.5 Features Extraction . 62 4.6 Classification . 63 4.7 Summary . 64 5 Evaluation and Experimental Results 65 5.1 Evaluation Metric . 65 5.2 Evaluation Results . 66 5.3 Comparison with the existing works . 68 5.4 Limitations . 68 5.5 Summary . 69 6 Conclusions 71 6.1 Future Work . 73 A CensoredURLs List 75 viii List of Tables 4.1 Summary of the datasets used in the WF attacks against Tor. The "X" indicates not available and the "X" indicates available while the "?" indi- cates the authors didn't specify . 51 4.2 CensoredDynaimcURLs list . 57 5.1 The Accuracy results for CensoredDynaimcURLs (%) . 67 ix List of Figures 2.1 Web Page Request Flow Example . 11 2.2 Tor Network Architecture . 15 2.3 Tor Network Architecture [68] . 18 2.4 Tor Dissector . 19 3.1 Website Fingerprinting Attacks Scenario . 26 3.2 Website Fingerprinting Procedure . 27 3.3 Website Fingerprinting Survey . 29 4.1 Traffic Generator Components . 58 x Acronyms HMM Hidden Markov Model HTML Hyper Text Markup Language HTTP Hyper Text Transfer Protocol HTTP Obfuscation HTTPOS IP Internet Protocol ISP Internet Service Provider JAP JonDonym Anonymous Proxy K-NN k Nearest Neighbor ONI OpenNet Initiative OP Onion Proxy OR Onion Router OSAD Optimal String Alignment Distance PET Privacy Enhancing Technology PII Personally Identifiable Information xi RBF Radial Basis Function RWB Reporters Without Borders SSL Secure Socket Layer SVM Support Vector Machine TCP Transport Control Protocol TLS Transport Layer Security Tor The Onion Router UNESCO United Nations Educational, Scientific and Cultural Organization URL Uniform Resource Locator VOIP Voice-over-IP WF Website Fingerprinting xii Part I INTRODUCTION 1 Chapter 1 Introduction \My computer was arrested before I was." Syrian Activist [51] The Internet, specifically, the web, has become a major platform for freedom of opinion and expression [21] as it enables individuals who have access to it to seek, receive and impart opinions, ideas and information of all forms, instantly and inexpensively across the globe [21]. It was built fundamentally without any regard for protecting the privacy of its users. The absence of personal privacy protections built into the Internet and the constantly increasing threat of unlawful data tracking, censorship, and surveillance of web users' lives, force individuals around the world to seek and rely on privacy and anonymity technologies that can be used with the web to compensate for its lack of security and safety. Anonymity technologies have appeared as a potential good candidate to enhance people's human rights over the Internet so they could seek, receive and impart information without risk of disclosure, tracking and surveillance as mentioned in the United Nations Educational, Scientific and Cultural Organization (UNESCO) report [21] on the promotion and protection of the right to freedom of opinion and expression.