Traffic Analysis of Anonymity Systems
Ryan Craven, Clemson University


Clemson University TigerPrints
All Theses, 5-2010

Traffic Analysis of Anonymity Systems
Ryan Craven, Clemson University, crm810@gmail.com

Follow this and additional works at: https://tigerprints.clemson.edu/all_theses
Part of the Computer Engineering Commons

Recommended Citation
Craven, Ryan, "Traffic Analysis of Anonymity Systems" (2010). All Theses. 837.
https://tigerprints.clemson.edu/all_theses/837

This Thesis is brought to you for free and open access by the Theses at TigerPrints. It has been accepted for inclusion in All Theses by an authorized administrator of TigerPrints.

Traffic Analysis of Anonymity Systems

A Thesis Presented to the Graduate School of Clemson University
In Partial Fulfillment of the Requirements for the Degree
Master of Science, Electrical Engineering

by Ryan Michael Craven
May 2010

Accepted by:
Dr. Richard R. Brooks, Committee Chair
Dr. Timothy Burg
Dr. Christopher Griffin

Abstract

This research applies statistical methods in pattern recognition to test the privacy capabilities of Tor, a very popular anonymity tool used on the Internet. Using a recently developed algorithm known as Causal State Splitting and Reconstruction (CSSR), we can create hidden Markov models of network processes proxied through Tor. In contrast to other techniques, our CSSR extensions create a minimum entropy model without any prior knowledge of the underlying state structure. The inter-packet time delays of the network process, which Tor preserves, can be symbolized into ranges and used to construct the models. After the training models are constructed, detection is performed using confidence intervals. New test data can be fed through a model to determine the intervals and estimate how well the data matches the model.
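As a rough illustration of the symbolization step mentioned above, the following Python sketch turns packet arrival times into inter-packet delays and maps each delay to a symbol according to the range it falls in. The function names, the range boundaries, the alphabet, and the sample timestamps are all hypothetical values chosen for illustration; they are not taken from the thesis, whose actual ranges are derived from captured data.

```python
from bisect import bisect_right


def inter_packet_delays(timestamps_ms):
    """Return the gaps between consecutive packet arrival times (milliseconds)."""
    return [later - earlier
            for earlier, later in zip(timestamps_ms, timestamps_ms[1:])]


def symbolize(delays_ms, boundaries_ms, alphabet="ABCDEFGHIJ"):
    """Map each delay to the symbol of the range it falls in.

    boundaries_ms must be sorted. Symbol i covers the half-open range
    [boundaries_ms[i-1], boundaries_ms[i]); the first symbol covers
    everything below the first boundary and the last symbol everything
    at or above the last boundary.
    """
    if len(boundaries_ms) + 1 > len(alphabet):
        raise ValueError("not enough symbols for the number of ranges")
    return "".join(alphabet[bisect_right(boundaries_ms, d)] for d in delays_ms)


# Hypothetical capture: packet arrival times in milliseconds.
timestamps_ms = [0, 50, 350, 400, 1400]
# Hypothetical range edges: A < 100 ms, 100 ms <= B < 500 ms, C >= 500 ms.
boundaries_ms = [100, 500]

symbols = symbolize(inter_packet_delays(timestamps_ms), boundaries_ms)
print(symbols)  # ABAC
```

A symbol string of this kind is the sort of input a model-construction algorithm such as CSSR would consume. Integer milliseconds are used here only to keep the range comparisons exact.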
If a match is found, the state sequence, or path, can be used to uniquely describe the data with respect to the model. It is by comparing these paths that Tor users can be identified. Packet data from any two computers using the Tor network can be matched to a model, and their state sequences can be compared to give a statistical likelihood that the two systems are actually communicating with each other over Tor. We perform experiments on a private Tor network to validate this. Results showed that communicating systems could be identified with 95% accuracy in our test scenario. This attack differs from previous maximum likelihood-based approaches in that it can be performed between just two computers using Tor. The adversary does not need to be a global observer. The attack can also be performed in real time, provided that a matching model has already been constructed.

Dedication

This thesis is dedicated to my family, especially my wife Heather. I am forever grateful for the unwavering love and support they have given me throughout my life.

Acknowledgments

Most importantly, I would like to thank my advisor, Dr. Richard R. Brooks. The creation of this document and the underlying research would not have been possible without your capable guidance and direction. Our frequent discussions this past year and a half have not just helped me complete this work, but have imparted a deeper understanding of the challenging issues that are faced in security and privacy.

I would like to thank Dr. Timothy Burg and Dr. Christopher Griffin for serving on my committee. I would also like to thank Dr. Burg for helping me on my path to graduate school. It was during my time working on the EE senior project and our creative inquiry undergraduate research that I made the decision to apply for entry into the Master’s program.

During my work on this thesis, I have also had the pleasure of working with some very intelligent and helpful students in our research group.
My interactions with them have made me a better researcher and their contributions have been invaluable. In particular, a large degree of appreciation is owed to Jason Schwier, Hari Bhanu, and Chen Lu.

I would also like to acknowledge my employer, the Space and Naval Warfare Systems Center in Charleston, SC. Their flexibility and support for my continued education have set them apart from other organizations.

My time in graduate school would not have been possible without the monetary assistance of the Holcombe Department of Electrical and Computer Engineering. I received a teaching assistantship and the Mr. & Mrs. Alan Griffith Stanford fellowship for all four semesters of my Master’s program. I was humbled by and will forever appreciate their support.

Finally, I would like to acknowledge that this material is based upon work supported by, or in part by, the Air Force Office of Scientific Research contract/grant number FA9550-09-1-0173.

Table of Contents

Title Page
Abstract
Dedication
Acknowledgments
List of Tables
List of Figures
1 Introduction
  1.1 Traffic Analysis
  1.2 Anonymity Systems
  1.3 Research Questions
  1.4 Organization
2 Background
  2.1 Tor
  2.2 Attacks on Tor
  2.3 Pattern Recognition Tools
  2.4 Summary
3 Model Construction
  3.1 Process Overview
  3.2 Data Collection
  3.3 Symbolization
  3.4 CSSR
  3.5 Proof-of-Concept
  3.6 Model Confidence
  3.7 Pruning Experiment
  3.8 Summary
4 Detection
  4.1 Confidence Intervals
  4.2 Protocol Detection
  4.3 Viterbi Path
  4.4 Path Matching Experiment
  4.5 Flow Correlation
  4.6 Summary
5 An Illustrative Example
  5.1 Experimental Setup
  5.2 Data Capture
  5.3 Symbolization
  5.4 Training Model Construction
  5.5 Pruning
  5.6 Detection Results
  5.7 Comparison to Flow Correlation
  5.8 Summary
6 Conclusions
  6.1 Concluding Summary
  6.2 Recommendations for Further Research
Appendices
  A How to Configure a Private Tor Network
Bibliography

List of Tables

3.1 Deltas are used to ignore constant latencies
3.2 Deltas do not handle variable latencies
3.3 Example lookup table
3.4 Symbol-to-delay translation table for pruning experiment
4.1 Results for path matching experiment
5.1 Symbol-to-delay translation tables for ping-pong experiment
5.2 Ranges for all ten symbols of ping pong experiment
5.3 Detection rates for system pairs using original model
5.4 Detection rates for system pairs using reconstructed model
5.5 Detection rates for system pairs using reconstructed model pruned at β = 0.0005
5.6 Detection rates for system pairs using reconstructed model pruned at β = 0.013
5.7 Rejection rates for .10 – .13
5.8 Rejection rates for .11 – .12
5.9 True positive rates for .10 – .13
5.10 True positive rates for .11 – .12
5.11 Method comparison of detection rates for .10 – .13
5.12 Method comparison of detection rates for .11 – .12

List of Figures

1.1 Packet captured from an encrypted SSH session
2.1 Preparation for data transfer
2.2 Onion packet entering Tor circuit
2.3 Outer layer of onion packet is peeled away and next layer is decrypted
2.4 Remaining onion packet going to second Tor relay in circuit
2.5 Exit relay forwards original packet to destination
2.6 Example of a hidden Markov model created using CSSR
3.1 Flowchart summarizing the model construction process