
Traffic Analysis of Anonymity Systems

A Thesis Presented to the Graduate School of Clemson University

In Partial Fulfillment of the Requirements for the Degree
Master of Science
Electrical Engineering

by
Ryan Michael Craven
May 2010

Accepted by:
Dr. Richard R. Brooks, Committee Chair
Dr. Timothy Burg
Dr. Christopher Griffin

Abstract

This research applies statistical methods in pattern recognition to test the capabilities of a very popular anonymity tool used on the Internet known as Tor.

Using a recently developed algorithm known as Causal State Splitting and Reconstruction (CSSR), we can create hidden Markov models of network processes proxied through Tor. In contrast to other techniques, our CSSR extensions create a minimum entropy model without any prior knowledge of the underlying state structure. The inter-packet time delays of the network process, preserved by Tor, can be symbolized into ranges and used to construct the models.

After the construction of training models, detection is performed using Confidence Intervals.

New test data can be fed through a model to determine the intervals and estimate how well the data matches the model. If a match is found, the state sequence, or path, can be used to uniquely describe the data with respect to the model. It is by comparing these paths that Tor users can be identified.

Packet data from any two computers using the Tor network can be matched to a model, and their state sequences can be compared to give a statistical likelihood that the two systems are actually communicating with each other over Tor. We performed experiments on a private Tor network to validate this. Results showed that communicating systems could be identified with 95% accuracy in our test scenario.

This attack differs from previous maximum likelihood-based approaches in that it can be performed between just two computers using Tor. The adversary does not need to be a global observer. The attack can also be performed in real time, provided that a matching model has already been constructed.

Dedication

This thesis is dedicated to my family, especially my wife Heather. I am forever grateful for the unwavering love and support they have given me throughout my life.

Acknowledgments

Most importantly, I would like to thank my advisor, Dr. Richard R. Brooks. The creation of this document and the underlying research would not have been possible without your capable guidance and direction. Our frequent discussions this past year and a half have not just helped me complete this work, but have imparted a deeper understanding of the challenging issues that are faced in security and privacy.

I would like to thank Dr. Timothy Burg and Dr. Christopher Griffin for serving on my committee. I would also like to thank Dr. Burg for helping me on my path to graduate school. It was during my time working on the EE senior project and our creative inquiry undergraduate research that I made the decision to apply for entry into the Master's program.

During my work on this thesis, I have also had the pleasure of working with some very intelligent and helpful students in our research group. My interactions with them have made me a better researcher and their contributions have been invaluable. In particular, a large degree of appreciation is owed to Jason Schwier, Hari Bhanu, and Chen Lu.

I would also like to acknowledge my employer, the Space and Naval Warfare Systems Center in Charleston, SC. Their flexibility and support for my continued education have set them apart from other organizations.

My time in graduate school would not have been possible without the monetary assistance of the Holcombe Department of Electrical and Computer Engineering. I received a teaching assistantship and the Mr. & Mrs. Alan Griffith Stanford fellowship for all four semesters of my Master's program. I was humbled by and will forever appreciate their support.

Finally, I would like to acknowledge that this material is based upon work supported by, or in part by, the Air Force Office of Scientific Research contract/grant number FA9550-09-1-0173.

Table of Contents

Title Page ...... i

Abstract ...... ii

Dedication ...... iii

Acknowledgments ...... iv

List of Tables ...... vii

List of Figures ...... viii

1 Introduction ...... 1
  1.1 Traffic Analysis ...... 2
  1.2 Anonymity Systems ...... 4
  1.3 Research Questions ...... 5
  1.4 Organization ...... 6

2 Background ...... 8
  2.1 Tor ...... 8
  2.2 Attacks on Tor ...... 13
  2.3 Pattern Recognition Tools ...... 15
  2.4 Summary ...... 18

3 Model Construction ...... 20
  3.1 Process Overview ...... 20
  3.2 Data Collection ...... 21
  3.3 Symbolization ...... 22
  3.4 CSSR ...... 25
  3.5 Proof-of-Concept ...... 28
  3.6 Model Confidence ...... 30
  3.7 Pruning Experiment ...... 32
  3.8 Summary ...... 37

4 Detection ...... 38
  4.1 Confidence Intervals ...... 38
  4.2 Protocol Detection ...... 40
  4.3 Viterbi Path ...... 42
  4.4 Path Matching Experiment ...... 44
  4.5 Flow Correlation ...... 45
  4.6 Summary ...... 47

5 An Illustrative Example ...... 48
  5.1 Experimental Setup ...... 48
  5.2 Data Capture ...... 50
  5.3 Symbolization ...... 51
  5.4 Training Model Construction ...... 52
  5.5 Pruning ...... 53
  5.6 Detection Results ...... 54
  5.7 Comparison to Flow Correlation ...... 61
  5.8 Summary ...... 63

6 Conclusions ...... 64
  6.1 Concluding Summary ...... 64
  6.2 Recommendations for Further Research ...... 65

Appendices ...... 67
  A How to Configure a Private Tor Network ...... 68

Bibliography ...... 73

List of Tables

3.1 Deltas are used to ignore constant latencies ...... 22
3.2 Deltas do not handle variable latencies ...... 23
3.3 Example lookup table ...... 23
3.4 Symbol-to-delay translation table for pruning experiment ...... 32

4.1 Results for path matching experiment ...... 45

5.1 Symbol-to-delay translation tables for ping-pong experiment ...... 50
5.2 Ranges for all ten symbols of ping pong experiment ...... 52
5.3 Detection rates for system pairs using original model ...... 57
5.4 Detection rates for system pairs using reconstructed model ...... 58
5.5 Detection rates for system pairs using reconstructed model pruned at β = 0.0005 ...... 58
5.6 Detection rates for system pairs using reconstructed model pruned at β = 0.013 ...... 59
5.7 Rejection rates for .10 – .13 ...... 60
5.8 Rejection rates for .11 – .12 ...... 60
5.9 True positive rates for .10 – .13 ...... 62
5.10 True positive rates for .11 – .12 ...... 62
5.11 Method comparison of detection rates for .10 – .13 ...... 63
5.12 Method comparison of detection rates for .11 – .12 ...... 63

List of Figures

1.1 Packet captured from an encrypted SSH session ...... 3

2.1 Preparation for data transfer ...... 10
2.2 Onion packet entering Tor circuit ...... 11
2.3 Outer layer of onion packet is peeled away and next layer is decrypted ...... 11
2.4 Remaining onion packet going to second Tor relay in circuit ...... 12
2.5 Exit relay forwards original packet to destination ...... 12
2.6 Example of a hidden Markov model created using CSSR ...... 15

3.1 Flowchart summarizing the model construction process ...... 21
3.2 The symbolization process ...... 24
3.3 Probability distribution of inter-packet time delays during an SSH file transfer ...... 24
3.4 A model that contains a transient state ...... 27
3.5 Example of determinism property ...... 27
3.6 Original model used by the server ...... 28
3.7 Model reconstructed from captured data ...... 29
3.8 Model of a process with a very rare event ...... 31
3.9 Original five state model for the pruning experiment ...... 33
3.10 Model that is reconstructed from first 200,000 packets of captured data ...... 34
3.11 Plot of model confidence results as more data is captured ...... 35
3.12 Model that is reconstructed from all million packets of captured data ...... 36
3.13 Result after pruning low-probability states and transitions ...... 37

4.1 A model used to demonstrate the calculation of confidence intervals ...... 39
4.2 Flowchart describing the protocol detection process ...... 40
4.3 Five state model used for path matching experiment ...... 44
4.4 Diagram of path matching experiment showing Tor network ...... 44

5.1 Layout of the ping-pong experiment ...... 49
5.2 Models used in the ping-pong experiment ...... 50
5.3 Plot of PDF of data with selected ranges and symbols overlaid ...... 51
5.4 Combined 5 state and 2 state models, generated from artificial training data ...... 53
5.5 A histogram of the asymptotic state probabilities ...... 54
5.6 The training model pruned with a threshold of β = 0.0005 ...... 55
5.7 The training model pruned with a threshold of β = 0.013 ...... 56

1 Network diagram of the private Tor network testbed ...... 68

Chapter 1

Introduction

Signals intelligence, commonly referred to as SIGINT, is a widely encompassing intelligence gathering field involving the analysis of transmitted signals for access to private information [1]. Traditionally only a military discipline, the importance of the field is well understood within the defense community [2]. More recently however, with the proliferation of large communication networks used by the public (such as the Internet, cellular networks, or wireless networks for mobile devices), SIGINT and SIGINT countermeasures have become a more general concern.

Though examples are visible throughout military history [3, 4], the importance of SIGINT has only been fully recognized since World War II. During the war, mathematicians and other analysts collaborated to break enemy codes, providing valuable information to field commanders [5]. Cryptanalysis has since become a very important component of the SIGINT field.

But what happened when codes could not be decrypted? Out of necessity, analysts quickly found that other information could be extracted from the electronic signals. In an early example, they found that radio operators could be uniquely identified by the way they typed Morse code [4].

And during World War II, even though the Americans were unable to decipher Japanese naval code during the Guadalcanal campaign, they were able to successfully rely on other information derived from their signals. Volume, source, and patterns in the signal traffic provided valuable intelligence to naval commanders [5].

The process of extracting high level information from communications even when the actual message data cannot be read is known as traffic analysis. The importance of traffic analysis has grown and expanded along with the SIGINT field. Increasing reliance on more open communication networks has only accelerated the growth of the field's importance. In particular, the Internet provides a variety of opportunities for traffic analysis.

1.1 Traffic Analysis

When sensitive information is transmitted through the Internet, it is typically encrypted in order to prevent others from being able to view it. This process obscures the sensitive data residing within the packet, leaving only ciphertext visible to a potential adversary. Assuming that the encryption is strong, meaning that the keys cannot be guessed in any practical amount of time, no one should be able to view the protected data as it traverses the Internet.

But even if the data is successfully hidden, will the sender be able to maintain a full expectation of privacy? Unfortunately, it is not that easy. For successful communication, important routing information must be left unencrypted. The packet headers, timing, and size leave a great deal of high-level information available to possible attackers, even for encrypted protocols like Secure Shell (SSH).

Figure 1.1 shows the contents of a single packet captured during an encrypted SSH session using the popular packet capture tool Wireshark. Even though the sensitive data has been encrypted, a variety of information is still readily available to a potential adversary. Information that can be deduced from this example includes:

• Who is talking (the source)

• To whom they are talking (the destination)

• What application-layer protocols are in use (SSH, HTTPS, etc.)

• The amount of data contained within each packet

• Packet timings

Each of these pieces of information, especially when combined with similar information from other packets in a flow or stream, can be very revealing about the data within an encrypted message.

Figure 1.1: Packet captured from an encrypted SSH session

1.1.1 Important Traffic Analysis Attacks

What follows are some examples of traffic analysis attacks on encrypted protocols. Researchers used the attacks to infer information that would jeopardize the security of the communications.

Song et al. created profiles of users’ typing characteristics and used them to guess which keys were being typed in an interactive SSH session [6]. Since SSH immediately transmits each key press in its own separate packet across the network, they were able to measure the delays between packets and match that to pre-determined delays for certain key pair combinations. Prediction of the key sequences was done using hidden Markov models. The attack was used to drastically enhance the probability of breaking a password that was entered over SSH, while simultaneously reducing the time required to do so.

Hintz found that unique signatures could be created for various HTTPS websites based on the resources they load (e.g. HTML, images, CSS, and JavaScript content). If a catalog of SSL-enabled websites could be accessed and profiled beforehand, the timing signatures could be used to detect which websites were being accessed through an encrypting web proxy [7]. A more robust variant of this attack was described by Bissias et al. [8] and was further refined by Liberatore and Levine [9]. Most recently, Herrmann et al. integrated new fingerprinting techniques and generalized the attack to a broader variety of systems [10].

An attack by Wright et al. exploited an encoding scheme used in Voice over IP (VoIP) to identify the language being spoken in encrypted conversations [11]. The attack was shown to be successful when a variable bit-rate encoder was used and was tested on conversations between over two thousand native speakers of 21 different languages. They continued to explore how the privacy of VoIP users could be further compromised by using hidden Markov models to predict when certain phrases were used during the call [12].

On a more closely related note, Wright et al. also explored the topic of protocol identification in encrypted tunnels. They devised a new traffic analysis technique to classify encrypted network traffic based on which protocol, or set of protocols, was in use within a connection stream [13]. We use a similar approach, which also uses hidden Markov models, to label network traffic by its underlying protocol.

For further reading on traffic analysis attacks and anonymity on the Internet, see [4, 14].

1.2 Anonymity Systems

In response to such successful traffic analysis attacks, many tools and systems have been developed with the goal of enhancing security and privacy over the Internet. These tools go far beyond simply encrypting the transmitted data. Chaum first proposed the idea of creating a system that would mix traffic with other connections in order to provide a level of anonymity [15]. Today's systems employ a wide range of techniques, such as:

• Proxying connections through any number of systems around the world

• Mixing and reordering packet flows

• Controlling packet flow rates with timers and other batching strategies

• Padding packet sizes to a fixed length

• Sending fake cover traffic

Proxying the connections is done by most if not all anonymity systems in order to hide the source from the destination. Different systems employ various combinations of the other techniques in order to make it difficult, or in the best case impossible, to trace and analyze the traffic. In particular, techniques that modify the timing of packet flows have the greatest effect on a system. Based on the use of these timing techniques, there are two classes of systems: high-latency and low-latency.

High-latency systems like Mixmaster [16] and Mixminion [17] are much better at hindering attacks based on timings. The reordering, mixing, and batching strategies they perform limit what can be gained by performing traffic analysis on the packet delays [18]. While very effective from an anonymity perspective, these systems are not as widely used due to the impractical time delays they impose. When accessing web sites, or typing keys into an SSH session, users do not want to deal with a very large time delay. Responses from the other system are desired as quickly as possible, which the timing strategies prohibit.

Low-latency systems like Tor [19], JonDonym (also known as JAP or JonDo) [20], and the Invisible Internet Project (I2P) [21] do not disrupt the timing of the packets as they propagate through the network. This approach is better suited for commonly used protocols like HTTP for web browsing and interactive protocols like SSH. We use Tor for our experiments since it is one of the more popular low-latency anonymity systems in use today. Chapter 2 will describe how Tor works in further detail.

1.3 Research Questions

Recent advancements with pattern recognition tools [22, 23, 24] provide the basis for this research. We ask: can we apply these tools to a data sequence generated from a network capture and achieve a model of the underlying process? In particular, can this be done when the process is streamed over Tor? Since Tor does not make any specific efforts to reorder the packets in a flow or introduce extra latencies, we hypothesize that this should be possible.

These tools [24] will allow us to test the confidence of the models we generate. We can also match new sequences of data to those models (or reject them all if none are a statistically significant match) [22, 24]. The application of these processes should give us an ability similar to that of Wright et al. [13] where we can classify a sequence of traffic, about which we know nothing, based on its underlying protocol. We propose that our methods will be an improvement because we can use our tools to show when we have collected enough training data and also to determine how well a new set of data matches up against a library of models. The latter can be achieved through the use of models with confidence intervals rather than the maximum likelihood approach.

We would like to take the detection a step further and see if it is possible to use our models to follow the path taken through the Tor network. Previous work done in this area uses temporal or frequency domain-based correlation to match input flows to output flows [25, 26]. We ask: if tracing the model paths is successful, how will it compare to these previous techniques?

Finally, can we develop and apply a well-defined process to construct models, match proto- cols, and perform path detection? If so, what degree of traffic analysis capabilities will this give us on Tor connections?

1.4 Organization

Previous sections of Chapter 1 outlined the motivation behind this work. The rapidly growing field of traffic analysis was surveyed and we demonstrated its importance as a field of study. The effectiveness of traffic analysis furthered the need for additional privacy, which led to the creation of a variety of anonymity systems. Also, we identified and described the primary questions we seek to answer through this research.

Chapter 2 explains the relevant background behind the tools used in this research. Since we use Tor as the anonymity system in our experiments, we take a deeper look at how it works and how it is able to provide that additional level of privacy. The limitations of Tor, some inherent and some by design, are discussed with the goal of creating a clearer understanding of the capabilities of Tor. Tor's popularity and widespread appeal have attracted a great deal of attention from researchers interested in security and privacy. We explore their work and discuss what attacks have been previously performed against Tor. We also provide some background on the pattern recognition tools described in Section 1.3.

Chapter 3 describes the process we use to construct models from the data we capture and how we test our confidence in them. We begin by looking at how to convert network traffic into data sequences, keeping what information we feel provides the best balance of reliability and potential for exploitation, the inter-packet time delays. The numerical data values must be symbolized for use by the model construction algorithm, and tools are described for how to approach the problem.

Once the data has been symbolized, a model of the underlying process can be constructed. A proof-of-concept is demonstrated for this process. Finally, the constructed models are tested for how well they statistically represent the original process. We check to see if enough data was used in construction, and describe a method for pruning transitions out of the model that do not have enough data to make them statistically significant. An example of this is provided as well.

Chapter 4 focuses on how we can perform protocol and path detection using the models we generate. We show how to compute confidence intervals on the transitions of our models by running data through them. An example of this process is shown, where confidence intervals are generated for a model and are used for detection. We also show how the models can be used to perform path detection through Tor and compare the approach to that of Zhu et al. [25, 26].

Chapter 5 describes our results from using these pattern recognition tools to perform traffic analysis on a live Tor network. Our experiment utilizes the full range of processes from data capture to model construction to path matching. This chapter describes from start to finish what is needed to carry out our traffic analysis attack.

Chapter 6 summarizes the work, highlighting the conclusions we were able to draw from our experiments. We also propose some interesting questions that would be good candidates for future exploration.

Chapter 2

Background

Tor [19] is one of the most popular anonymity systems in use today. As of April 2010, there were almost 1800 relay servers on the public Tor network [27], and that number continues to grow. Tor's stability and deployment breadth can be attributed to its practical design, a product of the large amount of privacy and usability research done by its designers [28, 29, 30, 31, 18, 19, 32, 33, 34].

Until more scalable peer-to-peer solutions like MorphMix [35], Tarzan [36], and Torsk [37] develop from their fledgling experimental states into more stable, well-understood platforms, Tor is the anonymity system with perhaps the best long-term viability. It is for this reason that we have selected it for our traffic analysis research.

2.1 Tor

Tor is a low-latency anonymity system that operates as an overlay network on the Internet. An overlay network is a smaller sub-network that has been built over a larger, and usually pre-existing, network. Individual systems in the overlaid network communicate through virtual links that are physically transmitted by the underlying network, but are encapsulated so that they stay logically separated from regular traffic. Examples of this type of network would include a Virtual Private Network (VPN) or a peer-to-peer network, with the Internet being the underlying network.

Tor primarily consists of computers running any of three types of services: relay, directory server, or client. Relays, also sometimes called nodes, routers, or onion routers, are the backbone of the network. Relays transfer data from clients to other relays, between two relays, or retrieve external resources for a client. By default, relays listen on TCP port 9001 for incoming requests. While they are active, they publish their status to a list of predefined directory servers.

Directory servers catalog the information they have about various relays and vote amongst each other on which ones to list as running, valid, stable, etc. After the servers vote and all agree on a list, they come to a consensus. The consensus is published on a TCP port (9030 by default) where it can be downloaded by clients.

The client service sets up a listener for TCP streams that the user wishes to route through the network. The listener is a SOCKS [38, 39] proxy that listens on port 9050 by default. Before packets arrive to be proxied, the client initializes some circuits with the relays listed in the downloaded consensus. To create a circuit, the client chooses a relay to be the first node in the new circuit and sends it a creation request. Once created, the client uses the same circuit to extend out to a second and then third node. This way, the client chooses and knows each node in the path its traffic will take. Also, this gives the client the opportunity to set up symmetric key pairs with each node for performing encryption. Building the circuit is analogous to extending a spyglass from the source to the destination, with the end of each ring being a node of the circuit.

The flowchart in Figure 2.1 summarizes the process a Tor client follows to prepare for a data transfer. Once the first piece of data for a new TCP stream arrives at the proxy, one of the previously created circuits is selected for the transfer. If there are no circuits available, a new one is created. Before any data is transferred, the destination of the packet is sent through the circuit to the last node so that it can start the connection. This node is called the exit node since it is where all of the packets will exit the Tor overlay and continue on to their final destination. The exit node sends confirmation to the client once it has connected to the destination, allowing the data transfer to begin.

As each data packet is ready to be sent, it is split or padded into cells of 512 bytes, which are then iteratively encrypted using the key of each successive relay of the circuit. In other words, the original packet is encrypted with the key for the last node, which is wrapped in another packet and encrypted with the key for the second node, which is encapsulated again and encrypted with the key for the first node. The result is similar in nature to an onion. As each relay in the circuit receives the onion packet, it will decrypt and peel away a layer, forwarding on what remains.
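To make the layering concrete, here is a minimal Python sketch of the wrap-and-peel idea. It uses the Fernet cipher from the third-party cryptography package purely for illustration; real Tor cells are fixed 512-byte units encrypted with AES in counter mode, so this shows only the onion-layering concept, not Tor's actual wire format.

# Conceptual sketch of onion wrapping and peeling (illustration only;
# real Tor uses fixed 512-byte cells and AES-CTR, not Fernet).
from cryptography.fernet import Fernet

# One symmetric key per relay in the circuit: entry, middle, exit.
keys = [Fernet.generate_key() for _ in range(3)]

def wrap(payload, keys):
    # Encrypt with the exit relay's key first and the entry relay's key
    # last, so the entry relay peels the outermost layer.
    for key in reversed(keys):
        payload = Fernet(key).encrypt(payload)
    return payload

def peel(onion, key):
    # Each relay decrypts exactly one layer and forwards what remains.
    return Fernet(key).decrypt(onion)

onion = wrap(b"GET / HTTP/1.1", keys)
for key in keys:                  # relays peel in circuit order
    onion = peel(onion, key)
print(onion)                      # b'GET / HTTP/1.1' leaves the exit relay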

9 Figure 2.1: Preparation for data transfer

Figure 2.2 shows the onion packet being sent from the source, which we will call Alice, to the first node in the circuit. The onion packet contains four layers at this point, and only the headers of the outermost layer are visible. This is represented in the figure by the hash symbols where the headers of the inner layers would be, implying that they are encrypted within the payload of this specific packet and cannot be read by anyone sniffing the line.

Once the packet is received by the first node, the old headers (or in the terms of our onion analogy, the outermost layer) are stripped away. The payload is decrypted using the key that was previously established with Alice, revealing the next hop of the circuit. This process is illustrated by Figure 2.3. The relay then forwards the remainder of the onion packet to the next hop, which is shown in Figure 2.4. Notice that this relay cannot determine that the packet originated from Alice. It can only see the packet coming from the first node. Even after it goes through the same unwrapping and decrypting process, it will also not be able to see the destination, Bob. It will only see another onion packet that is to be relayed to the third node in the circuit.

After the third node unwraps the last layer of the onion packet, it has the original packet from Alice. The packet is sent "as is" to Bob through the previously established connection (shown in Figure 2.5).

Figure 2.2: Onion packet entering Tor circuit

Figure 2.3: Outer layer of onion packet is peeled away and next layer is decrypted

Figure 2.4: Remaining onion packet going to second Tor relay in circuit

Figure 2.5: Exit relay forwards original packet to destination

That is, if the packet was originally in cleartext, it is sent in cleartext. To achieve encryption for this step of the circuit, Alice must have used a secure protocol like SSH or HTTPS. A common misconception is that Tor always provides end-to-end encryption, but this is not the case.

This misconception has allowed people to use Tor for capturing usernames and passwords [40].

When the destination, Bob, replies to Alice's request, the same process is followed in reverse order. The response packets are successively wrapped in relay cells at each node in the circuit. Once again, the cells are iteratively encrypted by each relay using the key set up for that circuit. When Alice gets the response packet, her Tor client unwraps and decrypts each layer to return the original packet from Bob up through the SOCKS proxy and back to the application she is using.

Obviously, there are many more details to the process, such as encryption schemes, integrity checking, congestion handling, and path selection. For more in-depth information on those topics, see the Tor specification [19]. Also, Murdoch and Watson provide another good overview and more up-to-date information about Tor's path selection techniques. An alternative technique is suggested, but it is found that Tor's current method of weighting nodes by bandwidth is much better for performance, while at the same time providing better security against botnets running Tor relays [41].

2.2 Attacks on Tor

We are not the first ones to focus our research on Tor. For the same reasons we outlined earlier, many other researchers have examined the abilities of Tor to resist various forms of traffic analysis. Some of these works led to successful attacks on anonymity and stimulated further discussion on how to handle those situations.

Returning to the path selection analysis by Murdoch and Watson [41], a large number of compromised computers ("bots") controlled by an adversary (a "bot herder") could be used to run a very extensive and diverse set of relays. With such a diverse set of relays, they could expect to capture a broad sampling of the traffic going through Tor. Other work by Murdoch suggests an adversary may only need 1 out of every 2000 packets to perform a successful timing attack [42]. This could become a serious threat to privacy, and the best path selection strategy to prevent this was found to be the one currently in place. Because bots are typically on low-bandwidth connections, consistently choosing paths with higher available bandwidth would help users avoid bot relays, and reduce the incentive for bot herders to run them.

At one time, however, the path selection strategy was found to be worse for anonymity. Bauer et al. found that the Tor directory servers did not do anything to verify the bandwidth claims reported by a relay [43]. By modifying the relay code to falsely report a large amount of bandwidth, an attacker could attract a large number of connections and gain a higher probability of seeing each circuit. The path selection technique was kept, but mitigations were made to keep low-bandwidth relays from abusing it, such as integrating reputation into the directory server consensus.

Murdoch and Danezis performed traffic analysis on the Tor network using a timing-based congestion attack [44]. The attack required an adversary to create a single hop circuit with a relay in the global Tor network and flood it with data. Then whenever the timing slowed, assuming it was due to the relay and not the link, one could guess that the relay was processing data in a circuit. If enough relays were observed using this technique, slowdowns in the flows would line up and show the attacker which three relays were being used in a circuit. This attack was devised and orchestrated very early in Tor's history, when there were only 13 nodes on the entire network. Newer research suggests, however, that with a much larger Tor network and heavier traffic the attack is no longer possible [45].

Hopper et al. compiled catalogs of round trip times and used them with a malicious web server to limit the degree of Tor's anonymity [46]. To collect round trip times for comparison, the attacker must connect to servers on various subnets around the Internet and compile a latency map to those subnets. Then, if a user could be directed to the malicious web server, the attacker could measure how long the packets took to go round trip to the client. The client's round trip time could be compared to the round trip times on the latency map, and a reasonable estimate could be made of 2 to 4 subnets in which the client likely resides.

Zhu et al. used temporal and frequency domain-based correlation of a network’s entrance and exit times to attack the anonymity of mix networks like Tor [25, 26]. Using just the captured packet times, streams from the source flowing into the mix or network of mixes are matched to streams flowing out by a maximum likelihood method. This attack requires the global observation of all entrances and exits from the network.

2.3 Pattern Recognition Tools

For our attack on Tor, we use various techniques described by Schwier [24]. In his work, Schwier further develops an algorithm to generate hidden Markov models, or HMMs, from a time series and provides new tools for effective pattern detection using the HMMs and a test sequence. We now provide some background on those tools. Our notation should be consistent with Schwier's.

2.3.1 Zero-Knowledge CSSR

The Causal State Splitting and Reconstruction, or CSSR, algorithm can be used to generate HMMs from discrete sequences of data without having to make any prior assumptions about the model structure. The algorithm was first proposed by Shalizi and Crutchfield [47, 48] and was further refined in their later works [49, 50]. The algorithm determines the underlying state structure, called the causal states, and the transition probabilities of the process representing the input data with the least amount of uncertainty. See Figure 2.6 for an example result model.

Figure 2.6: Example of a hidden Markov model created using CSSR

The original algorithm by Shalizi and Crutchfield required only the data sequence from which to construct the model and a maximum string length parameter, L. The first step of the algorithm involves splitting the data sequence into smaller subsequences of incrementally increasing lengths, and this parameter tells it when to stop.

Their implementation made two important assumptions: that one knew the best value of L to use and that there was more than enough data in the input sequence. However, these assumptions do not always hold when dealing with realistic situations like data captured from an unknown process. This is where the work of Schwier et al. [24, 23] applies.

To resolve the first assumption, an algorithm was developed to find the best value of L to use. CSSR is run with incrementally increasing values of L, starting at L = 2. After CSSR generates the model for a given value of L, the original data is traced through the model and the sequence of states it takes is recorded. The state sequences for each pair of successive values of L are then compared. When two are found that match, the state structure is assumed to have stabilized, and the smaller of the two values is said to be the best value of L to use.
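In outline, the selection loop looks like the Python sketch below, where run_cssr and state_sequence are hypothetical stand-ins for the CSSR implementation and the state-tracing step described above:

# Sketch of the L-selection loop. run_cssr and state_sequence are
# hypothetical helpers, not actual functions from the CSSR toolchain.
def find_best_L(data, max_L=10):
    prev_states = None
    for L in range(2, max_L + 1):
        model = run_cssr(data, L)               # build an HMM for this L
        states = state_sequence(model, data)    # trace the data through it
        if states == prev_states:
            return L - 1                        # structure has stabilized
        prev_states = states
    raise RuntimeError("state structure never stabilized")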

Using CSSR and the process for determining L, an HMM of a process can be created from a sequence of observations without any knowledge of parameters a priori.

2.3.2 Model Confidence

The second assumption Shalizi made was that there was always more than enough data in the input sequence. This is how he reasoned that the generated model would be the one that optimally represented the underlying process. By always using enough data, CSSR sees all possible transition events of the underlying process and properly represents them all in the model it creates.

However, we will only ever have finite amounts of data, which may or may not include all possible events the process can take. This is especially true in our application. We need a way to be statistically confident, to within a given probability, that certain unexpected events will not occur.

Schwier uses the z-test to calculate a metric of statistical confidence. The z-test reports how much data is needed to have confidently captured enough low-probability events. It also finds how much confidence there is in the generated model in cases where not enough data was used.

In those cases when the test finds that an insufficient amount of data was used, it is because low-probability events were visible but there were not enough of them to be confident in the conditions under which they occur. In these cases, CSSR includes states and transitions in the generated models without being confident in the underlying conditional probabilities for those events. Schwier also provides a method to prune these unsubstantiated states and transitions from the model by thresholding the asymptotic state probabilities. More information about this process is provided in Section 3.6.1.

16 2.3.3 Confidence Intervals

Traditionally, new incoming data is matched to a model using the Forward-Backward procedure [51]. The forward part of the procedure uses maximum likelihood to find the best match out of a set of models, and the backward part tunes the model parameters to the data.

The use of confidence intervals provides an alternative method for matching data to a model [22]. Confidence intervals are calculated for each transition probability in an HMM. This is done by running the test data through the model and counting the frequency that a transition is taken out of the total number of times the model had the opportunity to take it.
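For a single transition this reduces to a binomial proportion and an interval around it. A minimal Python sketch using the normal approximation (the exact interval construction in [22] may differ in detail):

import math

def transition_interval(taken, opportunities, z=1.96):
    # Estimate a transition probability and a ~95% confidence interval
    # from how often the transition was taken when it was available.
    p = taken / opportunities
    half = z * math.sqrt(p * (1 - p) / opportunities)
    return max(0.0, p - half), min(1.0, p + half)

print(transition_interval(45, 100))      # wide interval with little data
print(transition_interval(450, 1000))    # tighter as more data arrives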

The confidence interval method has a few advantages over the maximum likelihood approach for our application. Most notably, we are not forced to choose a winner out of a set of models when using confidence intervals. While maximum likelihood simply selects the model with the least amount of difference from the data, confidence intervals provide a metric for describing how similar each model is to the data. If none of them are within a specified threshold, called the rejection rate, we can reject them all and say that the data does not match any of the protocols we have modeled.

Another more intuitively appealing aspect of the confidence interval approach is how in- creasing amounts of data are handled. When the first few samples have been used, the intervals are very wide, having a high degree of variance. As more data samples are used for detection, the intervals tighten up around the transition probability due to lower variance.

Also, the number of multiplications is constant for any amount of data samples. This differs from maximum likelihood, where the number of multiplications is linear with respect to the amount of data samples. This presents a problem because each multiplication is always by a probability, a floating point number between 0 and 1. This makes the result get smaller and smaller as it asymptotically approaches zero. One way to mitigate this is to normalize all the values after each step by scaling them. However, this is a poor solution for computer implementations because the additional multiplication operations cause more floating point error to accumulate in the result [52].

The results of Schwier et al. found that the desired false negative rate could be calculated when using confidence intervals, but the false positive rate could not. Receiver operating characteristic (ROC) curves were used to find the best parameters in each situation. In general, both the true positive and false positive rates would be higher for detection with confidence intervals, but there would be a lower false negative rate. Maximum likelihood would have a lower false positive rate [22, 24]. In this sense, confidence intervals are better suited for identification, while maximum likelihood is better for classification.

2.3.4 Window Sizes

The final problem examined by Schwier was how to determine the best window size to use when matching data. Typically, our data sequences will be split into much smaller subsequences, or windows, when being used for detection. This is done to allow us to tell which model best matches the data at various points in time during the sequence. For instance, protocol A may be running for the first few minutes of a test capture and then the system might switch to protocol B for the next few minutes, and then on to protocol C. If we run the entire test set through the detection routine, we will just see a match to one of them (or possibly none of them if confidence intervals are used).

By windowing the data, we get a better sense of which process is running at different times.

Optimally windowing a data set is a very complicated task requiring the user to make trade-offs. If the window size is too small, there may not be enough data to make a statistically confident decision about which process is the best match. The confidence intervals will be too large with too much variance, enabling multiple models to have high acceptance rates.

If the window size is too large, it will take longer to recognize a change in the underlying process. Once the change occurs, newer data will have to fill up more than half of the window to statistically overpower the old data in the window. It is only after this point that the new underlying process could be recognized.

The methods by Schwier [24] determine a good window size to use for comparing data to a pair of models. The resulting size should be no greater than the minimum necessary to tell the difference between sequences from the two models. Any larger and we would waste time in spotting changes between the two behaviors. The minimum necessary value is found by analyzing binomial distributions of two transitions to determine the likelihood of making a decision error.

2.4 Summary

In this chapter we described Tor and presented our reasoning for using it in our experiments. The net effect of using Tor is that the websites and other services a user accesses on the Internet do not learn who the user is. The level of privacy provided extends so that no single point in a circuit knows both the source and the destination. Our reason for using Tor was its popularity and the large amount of active research surrounding it. This research has yielded viable attacks on the anonymity guarantee that Tor tries to provide, and the results of those attacks were described.

We also introduced the pattern recognition tools used in this research and highlighted some details that are relevant to this work. These tools provide the capability to generate models of a process by observing a sequence of its outputs. The CSSR algorithm has been proven to find the set of states that best represents the underlying process when given enough data.

We described a tool that can be used to test whether we have used enough data, and another one to prune the models we generate of any transitions that are not statistically significant. We also examined a way to detect whether a new sequence of data was a statistical match to a model through the use of confidence intervals.

Chapter 3

Model Construction

In order to run our attack on Tor, we need to create training models to use in our matching process. For any connections that we try to match up, we must first have a valid hidden Markov model that represents the protocol in use. To do this, we must collect training data from a Tor connection where we know the protocol being used. Then, a specific process is followed to create a good model from the data. This chapter describes that process in detail and discusses our results from using it.

3.1 Process Overview

We follow the process described by the flowchart in Figure 3.1 to create the models required by our attack. First, data is collected from a connection where we know the protocol that is in use. This will be our training data. The model that is created from it can be used to detect that same protocol in future connections we test.

After symbolizing the data we captured, the CSSR algorithm introduced in Section 2.3.1 is used to create a model. We start with L = 2 and increase it as needed. To determine if an increase is needed, we first apply the z-test described in Section 2.3.2 to make sure we used enough data. Once it is established that enough data was used, the value for L is incremented if the state structure of the model changed. Sometimes enough data (according to the z-test) cannot be captured. We also discuss the reason for this and what can be done in order to still make use of the model.

20 Figure 3.1: Flowchart summarizing the model construction process

3.2 Data Collection

For our experiments, all data collection is done on processes sent through Tor. Our experiments use data captured in our private Tor network testbed. The network is made up of a collection of systems all running Tor and our test protocols. For more information on how the network and Tor were configured, see Appendix A.

The tshark [53] program is used to capture packets within the network. It is a command line version of Wireshark and is usually included in the Wireshark package. tshark runs with less overhead than its graphical counterpart. This allows more of the system’s memory to be used to capture larger data sets. tshark was chosen over other command line capture tools, like tcpdump [54], because it still provides access to the very versatile filtering rules in the full Wireshark.

For filtering, tshark is configured to only capture packets with a data length greater than zero. This will throw out plain ACK packets, as we are only interested in the ones carrying data from the protocol. These packets will make up our observations of the network process. In addition, we may filter for a certain port or IP address depending on what type of traffic we want to capture.

For output, tshark is configured to record only the time that each packet was received, one timestamp per line. The output value is a Unix timestamp (the number of seconds since midnight on 1 January 1970) [55]. All other data about the packets (e.g. sequence numbers or lengths) is ignored.

Below is an example of a tshark command to record packet times for all non-empty packets going to a Tor SOCKS proxy:

tshark -i eth0 -n -l -o column.format:'"timestamps","%t"' \
    -R "tcp.srcport==9050 && tcp.len > 0"

3.3 Symbolization

Before any collected data can be used by our pattern recognition tools, it must be symbolized. To do this, the difference between each successive packet time is computed and that value is used by a lookup table to determine the appropriate symbol it should be given.

The time difference, ∆t, is calculated by subtracting the receive time of the previous packet from the time of the current packet. In other words, ∆tᵢ = tᵢ − tᵢ₋₁, where tᵢ is the receive time of packet i. Obviously, the first packet in our data sequence will not have a value, so we just start with i = 2. This step is done so that network latencies between the source and the destination are filtered out. The data sequence we capture will look the same at either end of the connection.
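As a small illustration, the delta computation amounts to a one-line transformation over the captured timestamps (a Python sketch, assuming the tshark output has already been read into a list of floats):

# Compute inter-packet time deltas from a list of Unix timestamps.
# The first packet has no delta, so the output is one element shorter.
def deltas(times):
    return [round(times[i] - times[i - 1], 4)    # 10^-4 s precision
            for i in range(1, len(times))]

times = [1269407285.323450, 1269407285.789346, 1269407285.916296]
print(deltas(times))    # two deltas, near 0.466 and 0.127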

Time each packet is sent, tᵢ    Delta at sender, ∆tᵢ    Time each packet is received, tᵢ    Delta at receiver, ∆tᵢ
1269407285.323450               N/A                     1269407285.490769                   N/A
1269407285.789346               0.466                   1269407285.956495                   0.466
1269407285.916296               0.127                   1269407286.083583                   0.127
1269407286.483149               0.567                   1269407286.650283                   0.567
1269407286.909037               0.426                   1269407287.076253                   0.426
1269407287.034984               0.126                   1269407287.202075                   0.126

Table 3.1: Deltas are used to ignore constant latencies

22 Using deltas will allow us to ignore any constant latencies in the network connection. Notice in Table 3.1 that there is a 167ms delay between the send time and receive times, but this does not affect the delta values; they are the same for each packet. This technique works well for constant latencies, but if the latency suddenly increases, it could cause a time sample to be incorrectly symbolized. The extra delay could cause the sample to be grouped into a higher interval. Table 3.2 demonstrates this point. Extra latency after the fourth packet was sent causes it to be received later and all of that extra latency carries over to the delta value.

Time each packet is sent, tᵢ    Delta at sender, ∆tᵢ    Time each packet is received, tᵢ    Delta at receiver, ∆tᵢ
1269407285.323450               N/A                     1269407285.490769                   N/A
1269407285.789346               0.466                   1269407285.956495                   0.466
1269407285.916296               0.127                   1269407286.083583                   0.127
1269407286.483149               0.567                   1269407287.241283                   1.158
1269407286.909037               0.426                   1269407287.427253                   0.186
1269407287.034984               0.126                   1269407287.553253                   0.126

Table 3.2: Deltas do not handle variable latencies

When working offline, a script is used to parse the output from tshark and calculate time deltas to 10⁻⁴ s precision. If an incoming data sequence is being used for real-time detection, we let that detection program calculate the time delta to the same precision.

Using the calculated values of ∆t, we symbolize the data by grouping it into ranges and assigning anything in that range a unique symbol such as A or B. To do this, we must have a lookup table such as Table 3.3. Figure 3.2 demonstrates the symbolization process on some sample data.

Symbol    Range
A         {0.000, 0.300}
B         {0.301, 0.500}
C         {0.501, 1.000}

Table 3.3: Example lookup table
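The lookup itself is a simple range scan. A minimal Python sketch using the ranges of Table 3.3 (boundary values inclusive, as in the table):

# Map each time delta to a symbol using the ranges from Table 3.3.
RANGES = [('A', 0.000, 0.300), ('B', 0.301, 0.500), ('C', 0.501, 1.000)]

def symbolize(delta):
    for symbol, low, high in RANGES:
        if low <= delta <= high:
            return symbol
    raise ValueError(f"delta {delta} falls outside all ranges")

print(''.join(symbolize(d) for d in [0.466, 0.127, 0.567]))    # 'BAC'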

23 Figure 3.2: The symbolization process

If we are generating a model from new training data, we will have to create a lookup table for our new data. We derived our intervals by splitting the packets into distinct timing clusters. We did this by plotting a histogram of the full data sequence and looking for local minima in the discrete curve. In our process, we create the symbolization ranges by hand using this plot as a guide. This step could be automated if a completely autonomous model construction process is required.
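One plausible automation, sketched below in Python with NumPy, bins the deltas and reports interior local minima of the histogram as candidate range boundaries; this mirrors our manual procedure but is not the exact method we used.

import numpy as np

def candidate_boundaries(deltas, bins=100):
    # Histogram the deltas, then flag interior bins whose count is lower
    # than both neighbors; these are candidate cluster boundaries.
    counts, edges = np.histogram(deltas, bins=bins)
    minima = [i for i in range(1, len(counts) - 1)
              if counts[i] < counts[i - 1] and counts[i] < counts[i + 1]]
    # Return the delta value at the center of each local-minimum bin.
    return [(edges[i] + edges[i + 1]) / 2 for i in minima]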

Figure 3.3 shows a histogram of SSH file transfer data captured over Tor. We split the packet times into groups at the dotted lines. Again, these ranges were selected by hand. The x-axis values of each of these lines make up the interval boundaries for the new lookup table.

Figure 3.3: Probability distribution of inter-packet time delays during an SSH file transfer

24 3.4 CSSR

The input data is a discrete sequence of observations χ that are observed from some unknown process, where χ = [χ₁, χ₂, ..., χₙ]. In this case, n is the total number of observations in the data sequence χ. The individual observations can be any symbol out of a finite alphabet A. To illustrate this, assume an unknown process is generating a sequence of outputs. If we observe ten of these outputs, our data sequence for CSSR might be

χ = [ ABABAABBAB ]

and the alphabet would be A = [A, B]. In practice the sequences will be much longer, containing thousands or even millions of symbols.

CSSR works by splitting the sequence up into smaller subsequences and looking at the probability of seeing a certain symbol after each different type of subsequence. The splitting is done for incrementally increasing subsequence lengths, from zero up to a preset limit of L. That is, we perform the split process for {i : 0 ≤ i ≤ L}. In our unknown process example, if we let L = 2, we will split for i = 0, i = 1, and then i = 2. The sets of split subsequences would be

W₁ = [ A, B, A, B, A, A, B, B, A, B ] for i = 1

W₂ = [ AB, BA, AB, BA, AA, AB, BB, BA, AB ] for i = 2

We do not show the case of i = 0 above, because it is just a null set. Also note that there will be n + 1 − i subsequences in each subsequence set Wᵢ. After the first step of i = 0, the model will have just one state. Additional states may be added during the rest of the split steps 1 ≤ i ≤ L. A new state is added when the conditional probability of a subsequence and its following symbol does not match that of any other state. These conditional probabilities are calculated by counting the number of times a certain symbol follows each subsequence in Wᵢ.

For example, at i = 1, the conditional probabilities for the set W₁ would look like

Pr(A | A) = 1/5    Pr(B | A) = 4/5
Pr(A | B) = 3/4    Pr(B | B) = 1/4

At i = 2, the probabilities for set W₂ will be

Pr(A | AA) = 0/1    Pr(B | AA) = 1/1
Pr(A | BA) = 1/3    Pr(B | BA) = 2/3
Pr(A | AB) = 2/3    Pr(B | AB) = 1/3
Pr(A | BB) = 1/1    Pr(B | BB) = 0/1

After determining the conditional probabilities for a step, the χ² statistical test is used to compare each probability to the ones from the previous step. If the test determines that a subsequence's probabilities do not match those of any existing state, a new state is created and the subsequence is added to the new state's history. For the full pseudocode, see [24].
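The counting step above is easy to reproduce. A short Python sketch that recovers the i = 1 probabilities for the ten-symbol example (the χ² splitting logic itself is omitted; see [24]):

from collections import Counter, defaultdict

def conditional_counts(data, i):
    # For every length-i subsequence, count which symbol follows it.
    counts = defaultdict(Counter)
    for j in range(len(data) - i):
        counts[data[j:j + i]][data[j + i]] += 1
    return counts

data = "ABABAABBAB"
for history, c in sorted(conditional_counts(data, 1).items()):
    total = sum(c.values())
    for symbol in "AB":
        print(f"Pr({symbol} | {history}) = {c[symbol]}/{total}")
# Prints Pr(A|A) = 1/5, Pr(B|A) = 4/5, Pr(A|B) = 3/4, Pr(B|B) = 1/4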

Models created by our implementation of CSSR have two important properties that will allow us to use them for detection purposes. Most notably, all models will:

• Have transient states removed

• Be deterministic

During construction, the algorithm removes any transient states that may arise. A transient state is a state that cannot be reached again once the model has transitioned out of it. For example, in the model in Figure 3.4, state S0 is a transient state and will be removed. We perform this step because a model cannot be positive recurrent if it contains any transient states. Positive recurrence is an important property that will allow us to compute asymptotic state probabilities for a model. Schwier explains the conditions for this property and how to calculate the asymptotic, or steady state, probabilities [24].
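As an illustration, once the symbol-labeled transitions are summed into a plain state-to-state matrix, the asymptotic probabilities of such a model can be found by simple power iteration; this is a generic Python sketch, not necessarily the computation used in [24]:

def asymptotic_probabilities(P, tol=1e-12):
    # P[i][j] is the total probability of moving from state i to state j,
    # with symbol labels summed out. For an irreducible, aperiodic chain,
    # power iteration converges to the stationary distribution.
    n = len(P)
    pi = [1.0 / n] * n
    while True:
        new = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new

# Two-state example; the stationary distribution is [2/3, 1/3].
print(asymptotic_probabilities([[0.8, 0.2], [0.4, 0.6]]))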

Deterministic models do not have states where the same symbol can cause more than one possible outbound transition. That is, in any state of a deterministic model, there will be a unique transition for each possible symbol. Figure 3.5 demonstrates this property. The model in 3.5a is non-deterministic because at state S0, an A can cause either a transition to S1 or back to S0. The model in 3.5b is deterministic. This property is important because it means that any sequence of transitions has only one possible sequence of states.

26 Figure 3.4: A model that contains a transient state

Figure 3.5: Example of determinism property: (a) non-deterministic; (b) deterministic

In this work, the sequence of transitions is called the data and the sequence of states is known as the path taken by that data through the model.
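Determinism makes path recovery a simple table lookup. A Python sketch with a hypothetical two-state transition table:

# With a deterministic model, a symbol sequence fixes the state path.
# trans[state][symbol] -> next state (hypothetical example model).
trans = {
    'S0': {'A': 'S1', 'B': 'S0'},
    'S1': {'A': 'S0', 'B': 'S1'},
}

def trace_path(start, symbols):
    path, state = [start], start
    for symbol in symbols:
        state = trans[state][symbol]
        path.append(state)
    return path

print(trace_path('S0', "ABBA"))    # ['S0', 'S1', 'S1', 'S1', 'S0']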

3.5 Proof-of-Concept

Our first experiment was to test the data collection, symbolization, and CSSR steps. This proof-of-concept test was designed to be very simple, but still require each of the steps to work properly to achieve the desired result.

Using the private Tor network, two systems were configured with simple client and server processes. One computer was designated as the server and the other was the client. The client would connect to the server and just listen. The server, once the client had connected, would send data packets to the client based on a preloaded model. The model used by the server in this experiment is the one depicted in Figure 3.6.

Figure 3.6: Original model used by the server

The server starts the process by randomly selecting a state in its model to be the start state. To send each packet, a transition is taken from the current state and the corresponding symbol is sent in a packet to the client. If there is more than one possible transition out of a state, the transition is chosen randomly, weighted on the probability of each transition. Most importantly, the time between sending each packet depends on the symbol of the transition. Each symbol is assigned a specified time delay in milliseconds and the server waits that amount of time before sending the packet to the client.

For example, if the server generates an A from its model, a packet is sent after a delay of 0.1 seconds. If the next symbol generated is also an A, then another packet is sent after the same time delay. If the next symbol is a B, however, then a delay of 0.9 seconds is waited before sending the next packet.
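A Python sketch of the server's send loop under these rules; the model structure and the symbol-to-delay table here are illustrative stand-ins, not the exact ones used in the experiment:

import random
import time

# Illustrative model: state -> list of (symbol, probability, next state).
MODEL = {'S0': [('A', 0.9, 'S0'), ('B', 0.1, 'S1')],
         'S1': [('A', 1.0, 'S0')]}
DELAYS = {'A': 0.1, 'B': 0.9}    # seconds to wait before each send

def serve(conn, n_symbols):
    state = random.choice(list(MODEL))    # random start state
    for _ in range(n_symbols):
        edges = MODEL[state]
        symbol, _, state = random.choices(
            edges, weights=[p for _, p, _ in edges])[0]
        time.sleep(DELAYS[symbol])        # the delay encodes the symbol
        conn.sendall(symbol.encode())     # one data packet per symbol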

This technique relates the inter-packet delays to transitions of the HMM. In other words, the transitions of the model will be our observations of the underlying process. This is the behavior we expect in real protocols: packet timings are related to the processing required by specific tasks in the process.

The client and server processes are set to generate 10,000 symbols of data. All of the packet times are captured and symbolized using the processes described in previous sections of this chapter.

CSSR is used to construct a model from the symbolized data. With L = 2, CSSR yields the model shown in Figure 3.7.

Figure 3.7: Model reconstructed from captured data

We can see that the reconstructed model has the same state structure and almost the same transition probabilities as the original model. If the process is re-run for 100,000 symbols, we find that the probabilities do match the original model.

This illustrates a point made earlier during the discussion of CSSR. If an insufficient amount of data samples are used, the algorithm only creates a model of the data, not of the underlying process. To get a model of the underlying process, there must be enough data samples available to fully describe that process. This is what accounts for the difference in the transition probabilities: seeing 0.91 instead of 0.90.

3.6 Model Confidence

The results of our proof-of-concept experiment demonstrate that we need a way of determining whether we used enough data, and, in cases where we did not, how much more data we would need to construct a statistically confident model.

It makes sense that we can never be completely sure that the model we get perfectly represents the underlying process. Every model we build is constructed from a finite amount of data. We would have to use an infinite amount of data to get a model which perfectly represents the underlying process. This is because as the first time at which a given event occurs approaches infinity, the probability of seeing it in a finite capture approaches zero. If we know that there is an extremely small probability that our captured data contains all possible events, we need to capture more data until we achieve an acceptable probability that we saw everything.

Fortunately, Schwier provides a method to calculate the amount of data samples necessary to achieve a specific level of confidence in a model [24]. It is calculated from the smallest asymptotic state probability in a model, or in other words, the least likely event to occur. The result can be tuned by the parameter κ, which is usually set to 0.05 times the smallest asymptotic state probability.

κ is the probability of seeing a joint event. Schwier defines a joint event as an occurrence where the system takes a transition that has not yet been observed. The higher κ is, the less data will be needed. The trade-off is that there is more risk in choosing a higher value of κ because it assumes that the system is less likely to take an unknown transition.

As an example, imagine that a process is represented by the model in Figure 3.8. The dotted transition occurs very rarely, say once a millennium. If we observe this process for any period of time less than a millennium, there is a chance that we may never see this transition. In fact, the probability of not seeing it increases as our observation time decreases.

If we happen to observe the process at the right time and see the dotted transition, CSSR will add it to our model. But should we have added it? The z-test may tell us that we need data for fifty millennia before we confidently understand the conditions in which the dotted transition occurs. For all practical purposes, we do not have that kind of time. What we are left with is an event about which we are unsure. It may happen under the same conditions every millennium, may happen under different conditions, or it may have just been noise and never happen again.

Figure 3.8: Model of a process with a very rare event

In practice, we find that the best solution is to ignore the event and note that rarely occurring events have not been included in the model. We record the probability at which we start ignoring these rare events. This provides us with a model that works most of the time and a measurement that tells us how often we should expect it to not be able to fully represent the process.

3.6.1 Pruning Rare Events

To handle rare events and maintain model confidence, we use a threshold on asymptotic state probabilities to prune them out. After a model has been constructed with CSSR, it may have transitions that are taken very rarely and states that are visited very rarely. By setting a threshold on the asymptotic state probabilities, we accomplish the goal from the end of the previous section; rare events are trimmed from the model. The value of the probability threshold is how often we should expect the process to deviate from the model.

A state’s asymptotic state probability is a measure of how often that state is visited as the data length goes to infinity. More specifically, it is the probability that the model will be in that state during an infinite period of time. In his work, Schwier explains how to calculate this value for each state in a model [24].

Any state with an asymptotic probability below the threshold is removed from the model. Any transitions going to or leaving from that state are also removed. Finally, any state or set of states that becomes unreachable due to a removal is also deleted. This leaves the model with only the states and transitions for which we have enough data. When we are unable to collect enough data to be confident in the full model, we leave out the parts where we would need more data to achieve confidence.
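A minimal sketch of this pruning step, assuming the model is held as a simple adjacency dictionary with one entry per state, is given below; the depth-first sweep implements the final reachability cleanup.

# Prune states below the asymptotic-probability threshold, then delete
# anything the removals have left unreachable.
def prune(model, asym_prob, threshold):
    keep = {s for s in model if asym_prob[s] >= threshold}
    trimmed = {s: {sym: t for sym, t in model[s].items() if t in keep}
               for s in keep}
    start = max(keep, key=lambda s: asym_prob[s])   # most-visited state
    reachable, stack = set(), [start]
    while stack:
        s = stack.pop()
        if s not in reachable:
            reachable.add(s)
            stack.extend(trimmed[s].values())
    return {s: t for s, t in trimmed.items() if s in reachable}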

3.7 Pruning Experiment

We then test this process over Tor. In a Tor connection, there are times when a circuit fails or changes, or a relay gets too busy and delays a packet. There is some extra variable latency that affects roughly one out of every 200 packets. These glitches cause the packet to arrive later than it should have and because of that, it is incorrectly symbolized.

Typically, noise in the data can be mitigated by collecting more and more data until the underlying process overwhelms the noise. This works well for the simple model in Section 3.5. But for more complicated systems, glitches at different times can cause many more errors in the model because there is a larger state space.

For this experiment, we use the same client and server method described in Section 3.5. All we change is the server’s model and the symbol-to-delay translation table. The model now used by the server is the model in Figure 3.9. The translation table is Table 3.4.

Symbol   Delay (in seconds)
A        0.2
B        0.4
C        0.1
Y        0.3
Z        0.5

Table 3.4: Symbol-to-delay translation table for pruning experiment

Figure 3.9: Original five state model for the pruning experiment

The process is configured to generate 200,000 data packets. After completion of the process, deltas of the captured packet times are calculated and symbolized. Visual inspection of the data reveals that added delays cause various samples to be symbolized incorrectly. For example, the following subsequence is found in the data:

W = [ A C Y B A Z B C Z Z C Y ]

However, since we know the model, we know that the subsequence should really be:

W = [ A C Y B C Z B C Z A C Y ]

But CSSR does not know this. CSSR can only make a model from the data it has been given. The model it constructs will have extra states and transitions to handle these glitches. The result is a much bigger model than expected, quite different from the original.

Figure 3.10 shows the model that CSSR constructs for these first 200,000 packets.

When the confidence test is run on the model, we find that it requires 20,624,750 data samples. This means we need to capture more data and rebuild. Because the amount of required data is so large, it has to be generated in lengths of 200,000 packets at a time. After each set of 200,000, we rebuild the model and run the confidence test again.

Figure 3.10: Model that is reconstructed from first 200,000 packets of captured data

Oddly enough, the required amount of data keeps increasing with each set. This is because different incorrect symbolizations at different points in the data sequence cause even more states and transitions to be added to the model. All of these new events have very low probability, which results in a lower minimum asymptotic state probability for each new set. This lower probability causes the confidence test to increase the amount of data required.

Even after capturing a million packets, the required amount from the confidence test continues to increase. The smallest asymptotic state probability and corresponding result from the confidence test are plotted against the number of packets captured in Figure 3.11. The steady increase suggests we will not easily capture enough data to rebuild the model confidently.

Figure 3.11: Plot of model confidence results as more data is captured

As discussed in Section 3.6.1, the alternative is to prune the low-probability states and transitions from the model. The generated model using all million packets is shown in Figure 3.12.

The model contains 79 states and 274 transitions. Analysis of the asymptotic state probabilities shows a large gap between 71 of the states and the other eight. The 71 have probabilities below 0.06%, while the other eight have probabilities above 8.2%. That is a break of over two orders of magnitude.

The difference between the states generated by the actual underlying process and the states generated by glitches and noise is obvious. This separation makes a good pruning threshold. Following the pruning process, the model in Figure 3.13 results with a threshold of 0.01 (or 1%). By manually tracing the model, we find that it is essentially the same as the original model of Figure 3.9.

Figure 3.12: Model that is reconstructed from all million packets of captured data

Figure 3.13: Result after pruning low-probability states and transitions

3.8 Summary

In this chapter we described the model construction part of our attack and specified a process to build HMMs from data samples of a network protocol. These models are to be built beforehand from labeled training data in order to provide a library of models for use in protocol detection. A proof-of-concept experiment on our private Tor network showed that a model could successfully be reconstructed from inter-packet timings. Additionally, we discussed how to handle problems with variable latency that would disrupt model construction. Pruning the reconstructed model showed that the underlying original model was still captured; it had merely been obscured by noise.

Chapter 4

Detection

Once a model has been constructed from a set of training data, it can immediately be used for detection. Determining confidence interval bounds on each transition in the model will allow us to know how likely it is that the data can be explained by a model. This method will detect the protocol in a Tor connection. To detect the path, we go one step further once we have two captures that match the same model under the confidence intervals. We trace the state sequence for each of the two captures and say they are the same connection if the state sequences match.

4.1 Confidence Intervals

For the reasons described in Section 2.3.3, confidence intervals provide a better alternative to maximum likelihood when determining whether data matches a model. To compute confidence intervals, the data to be matched is traced through the model and the interval bounds are estimated using frequency counts for each transition.

We track two counters for each transition, ci,j and ci. ci is the number of times that we have the opportunity to take the transition, and ci,j is the number of times we actually take it. In terms of the model in Figure 4.1, c0 is the number of times that we are in state S0 while tracing our data. c0,1 is the number of times we take the A transition to S1 and c0,2 is the number of times we take the B transition to S2.

Figure 4.1: A model used to demonstrate the calculation of confidence intervals

These counters are used to calculate three values for each transition: an estimate of the transition probability, and the upper and lower bounds of the confidence interval. The estimated transition probability is simply Equation 4.1.

\hat{p}_{i,j} = \frac{c_{i,j}}{c_i} \qquad (4.1)

For Figure 4.1, we would expect to take A about half of the time and B about half of the time. This means that c0,1 and c0,2 would each be about half of c0. The confidence interval bounds are calculated by Equation 4.2.

p_{i,j} \pm Z_{\alpha/2} \sqrt{\frac{p_{i,j}(1 - p_{i,j})}{c_{i,j}}} \qquad (4.2)

Notice that p is used here and not p̂. This means that the intervals are always calculated using the transition probability in the original model, not the estimated one. Also, Z is taken from the Normal distribution. The α is typically 0.05, meaning that we desire 95% confidence intervals. This is another way of saying that the probability that our decision boundaries lead to an error is 5%.

After all of the data in the test set has been traced through, if the estimated transition probability is within the confidence interval, we have found a match. This means that the data is consistent with the model since the value estimated from the data fit within the bound on the model’s parameter.
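Putting Equations 4.1 and 4.2 together, the per-transition test can be sketched as follows; the counts are assumed to have been tallied while tracing the window through the model.

# A minimal sketch of the per-transition confidence-interval test.
from math import sqrt
from scipy.stats import norm

def transition_matches(p_model, c_i, c_ij, alpha=0.05):
    p_hat = c_ij / c_i                               # Equation 4.1
    z = norm.ppf(1 - alpha / 2)                      # Z for the interval
    half = z * sqrt(p_model * (1 - p_model) / c_ij)  # Equation 4.2
    return p_model - half <= p_hat <= p_model + half

# A transition with model probability 0.5, taken 47 times out of 100
# opportunities, falls inside the 95% interval:
print(transition_matches(0.5, 100, 47))   # True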

4.2 Protocol Detection

The flowchart in Figure 4.2 describes the process used for protocol detection. As input, the process requires a library of models and a test sequence of data. Three parameters are also provided for input that can be used to tune the results: window size, confidence interval percentage, and rejection rate.

Figure 4.2: Flowchart describing the protocol detection process

We will sometimes refer to the library of models as a bootstrap tree. It is essentially just a data structure that has functions for efficiently determining which models contain the symbols in our test data. This structure and the models within it must be compiled prior to performing the detection process.
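Ignoring the tree structure itself, the lookup can be sketched as follows, under the assumption that each library entry records the symbol alphabet its model accepts.

# A minimal sketch of the library lookup; the real bootstrap tree
# indexes models by symbol for efficiency, which is omitted here.
from dataclasses import dataclass

@dataclass
class LibraryModel:
    name: str
    symbols: frozenset   # the alphabet the model's transitions accept

def candidate_models(library, window):
    """Return the models whose alphabet covers every symbol in the
    window; only these proceed to the confidence-interval step."""
    seen = set(window)
    return [m for m in library if seen <= m.symbols]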

The test data sequence, however, can all be captured and the detection performed offline at a later time, or it can be done online in real-time as data samples are received. The flowchart works for both modes. In offline mode, the data file is read line by line as input to the process. In online mode, a packet capture program like tshark can pipe the latest packet time to the STDIN of the detector program. It is even possible to SSH to another system and have it run the tshark capture program there, returning all of the STDOUT data back to your terminal, which can then be piped to the detector program. Many options exist for getting the test sequence into the detector, regardless of the mode in which it is used.

The first step that is taken after reading in a new data sample is to compute the delta and symbolize it. A separate variable holds the value of the last data sample (the time of the last packet). This is subtracted from the incoming packet's time, leaving the delta. If it is the first packet seen, the previous-time variable is populated and the rest of the detection steps are skipped; we must have at least two packets before we get our first symbol of data. To get the symbol, the delta value is translated by looking up the appropriate range in a table. We only work with the symbols, so the packet time and the delta can be discarded after symbolization.

The next step is window management. The window is a sliding window with the latest w data samples. The window size parameter w is one of the three user-specified tuning parameters mentioned earlier. When a new packet comes in, it is pushed on to the end of the data structure for the window. If the size of the window data is greater than the specified window size w, then the oldest element is popped off the front of the queue.
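These front-end steps (delta computation, range-table symbolization, and the sliding window) can be sketched together; the symbol ranges shown are placeholders rather than a table from our experiments.

# A minimal sketch of the detector front end.
from collections import deque

RANGES = [(0.00, 0.15, 'C'), (0.15, 0.25, 'A'), (0.25, 10.0, 'B')]  # placeholder

def symbolize(delta):
    for low, high, symbol in RANGES:
        if low <= delta < high:
            return symbol

def windows(packet_times, w):
    window, prev = deque(maxlen=w), None   # the deque pops the oldest itself
    for t in packet_times:
        if prev is not None:               # two packets make one symbol
            window.append(symbolize(t - prev))
        prev = t
        if len(window) == w:
            yield list(window)             # a full window for the CI step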

Once it is full, the window is used to calculate the confidence intervals. The bootstrap tree returns all of its models that will accept the symbols in the window, and a model with confidence intervals on the transitions is created for each of them.

Out of all the transitions in each model, a certain percentage will have an estimated probability that falls within the confidence interval. This percentage tells us how likely it is that the model represents the test data. If this percentage is above the user-defined rejection rate, we can say that it is a match. Note that there may be more than one matching model at each window. At this point, a tie-breaker can be used to select just one: whichever has the highest percentage of matching transitions.

The user receives the name of the model(s) that matched the data and their match percent- ages as output. Note that this whole process is just for one window. As each sample arrives, the window slides forward and the whole process is repeated. The process ends when there are no more incoming data samples or when it is stopped by the user.

4.3 Viterbi Path

While performing protocol detection, the window of data is traced through the model to calculate confidence interval bounds. As an additional step, we can store the list of states that are visited as the data is traced. This sequence of states is known as the Viterbi path.

More formally, the Viterbi path is the most likely path that a sequence of data takes through a model [51, 24]. In our models, even though the definition is the same, calculating the Viterbi path is trivial. This is because there is no probability involved in tracing the path. There will always be only a single path through the model for any given data sequence and start state. As mentioned before, this is due to the fact that all of the models created by CSSR are deterministic.

Given any sequence of data and a model, the path can be determined by Algorithm 1. The algorithm does not need to know the starting state of the model. It first searches the model for all possible start states by looking for an outbound transition matching the first symbol of data. From there, the data is traced from each possible start state until a point arises where there is no possible transition for a symbol, or until the data has been fully traced through the model.

Algorithm 1 Get the Viterbi path for a given data sequence
  for each state in the model do
    if state has transition == data[0] then
      push state onto vector V
    end if
  end for
  if V.length == 0 then
    return false
  end if
  for each state S in V do
    set state of model to S
    initialize empty vector path
    for each symbol p in data do
      if state does not have transition == p then
        break
      end if
      advance the model
      push state id onto vector path
    end for
    if path.length == data.length then
      return path
    end if
  end for
  return false

Algorithm 2 adds window management of two data sets to the functionality of Algorithm 1 to generate detection rates. The algorithm takes two data sets and a window size as input, and returns the number of windows that either matched, got rejected, or were accepted by the model but did not match up. A “match” should occur if the two data sets are from the same instance of the same process. An “accepted, but no match” should occur if the two data sets are from the same process, but not the same instance of it. A “rejection” should occur if the data sets are from different processes.

Algorithm 2 Perform matching using Viterbi path
Require: data1.length == data2.length
  initialize two queues, w1 and w2
  add data1[ 0 : window size − 2 ] to w1 and data2[ 0 : window size − 2 ] to w2
  for i = window size − 1 to data1.length do
    append data1[i] to w1 and data2[i] to w2
    if queue size > window size then
      pop off the front (oldest element)
    end if
    using Algorithm 1, path1 = viterbi(w1) and path2 = viterbi(w2)
    if (path1 or path2) == false then
      Output: rejected
    else if path1 == path2 then
      Output: matched
    else
      Output: accepted, but no match
    end if
  end for

4.3.1 Model Path vs. Tor Path

Sometimes, the use of the word “path” can be somewhat confusing. This is because there is a path through the model and a path through the Tor network. The path through the model, the Viterbi path, is the unique sequence of states visited as a result of following the transition symbols. This only applies to one of our hidden Markov models and is completely separate from Tor.

The path through the Tor network is the sequence of onion routers, or relays, that proxy a stream through Tor between the sender and the receiver. There are by default three relays in a circuit which define the path taken through the network.

The connection between the two is that it is possible to determine the Tor path by matching up the Viterbi paths taken by data captured at various nodes in the network. If we take on the role of the global observer over a Tor network, we can capture traffic from all of the relays. At this point, it will be possible to determine which three relays are involved in a user's Tor circuit.

4.4 Path Matching Experiment

To test the path matching process, we devise another experiment using the client and server test programs. The client connects to the server, which sends packets based on the five state model in Figure 4.3. The process runs among two systems that are connected to the Tor network, as is shown in Figure 4.4.

Figure 4.3: Five state model used for path matching experiment

Figure 4.4: Diagram of path matching experiment showing Tor network

Instead of capturing training data and building a model from that, we will just use the original model for matching with new test data. If we had captured new training data, our model construction process would be exactly the same as that in Section 3.7; the same process is used to generate that model.

To get the test sets, packet captures are run on both systems, A and B. This yields two data sets of the same instance of the same process, but from different perspectives. The paths for these two sets should match up since they are the same instance of the same process. Results of matching are shown in Table 4.1.

         Detection Rate   Rejection Rate
w = 10   97.23%           2.77%
w = 20   94.66%           5.34%
w = 30   92.12%           7.88%
w = 40   89.79%           10.21%
w = 50   87.51%           12.49%

Table 4.1: Results for path matching experiment

4.5 Flow Correlation

An alternative approach to detection is to use the flow correlation method [25, 26]. Basically, a distance metric is calculated between a raw time series and all of the other data sets. The one with the smallest distance value is declared the most likely match. Note that this method requires a comparison of all systems on the network, not just any two. The necessary steps for computation are described in Algorithm 3.

There are two correlation measures [26] which can be used: a time-spectrum approach and a frequency-spectrum approach based on a matched filter. We select the frequency-spectrum approach. After converting the time data into pattern vectors, we calculate the discrete fast Fourier transform [56] of the pattern vectors. Then, using Equation 4.3, we compute the distance measure, which records how much difference there is between the data sets.

Algorithm 3 Perform matching using flow correlation with windowing
Require: all other data sets are the same length as data1
  for i = 0 to (data1.length − window size) do
    for each data set j other than data1 do
      partition data1 and j into subsequences of time length T
      for each subsequence, k do
        store k.length into the pattern vector
      end for
      perform the fast Fourier transform on each pattern vector, Xf = fft(X) and Yf = fft(Y)
      calculate the distance measure using Equation 4.3
    end for
    select the match for this window as the argmin of the distance measures
  end for

d_{x,y} = \frac{\langle Y_f, Y_f \rangle}{\sqrt{\langle X_f, Y_f \rangle}} \qquad (4.3)
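Under the assumption that Equation 4.3 is evaluated on the FFTs of the pattern vectors, the distance calculation can be sketched as follows.

# A minimal sketch of the frequency-spectrum distance of Equation 4.3;
# x and y are pattern vectors (per-bin packet counts) of equal length.
# Assumes the cross term is positive, as for strongly correlated flows.
import numpy as np

def flow_distance(x, y):
    Xf, Yf = np.fft.fft(x), np.fft.fft(y)
    inner = lambda a, b: np.vdot(a, b).real   # <a, b> on the spectra
    return inner(Yf, Yf) / np.sqrt(inner(Xf, Yf))

# The node whose data set minimizes the distance is declared the match:
# best = min(others, key=lambda y: flow_distance(x, y))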

4.5.1 Advantages of Flow Correlation

There are distinct differences between the two approaches of Viterbi path matching and flow correlation. The flow correlation method is a direct comparison of the time series, whereas our path matching method first symbolizes the data and matches it to a training model. The inputs our method requires up front, a training model and a window size, are not needed by flow correlation.

It is an advantage for flow correlation that it does not require a model to be constructed from training data prior to matching. It directly compares the time sequences without first translating the problem to a model representation of an underlying protocol. This could be an advantage in the sense that it is easier to implement and there are fewer factors with which to contend.

One such factor is the window size. With flow correlation, the entire data set can be directly compared. The implementation in Algorithm 3 only involves windowing so that it will be easy to compare the results with values from path matching. Without windowing, all the data available is used to do the correlation, which increases the accuracy of the measure. In our matching, we cannot use all of the data because it becomes more and more likely that a glitch will disrupt the path.

4.5.2 Disadvantages of Flow Correlation

Flow correlation also has its disadvantages. The most notable is that it requires the adversary to be a global observer. He must be capturing data on every Tor node in the network and correlating them all together to find the other end with the lowest distance value. It is very unlikely that this would happen given the size of the Tor network today.

Even if the scope of the correlation is reduced, it is more difficult to run the attack online in real-time. Each node in the correlation set needs to have an active packet capture feeding the timing data back to a central server which can perform the correlations amongst all of the data sets. With Viterbi path matching, by contrast, an adversary need only watch two nodes using the network to get an idea of the likelihood that they are communicating.

Windowing can also be a disadvantage depending on the usage conditions. If only short time periods are available for test data, the correlation will be less accurate at the smaller data amounts. With windowing, path matching is already set to handle these small amounts of test data.

4.6 Summary

In this chapter, we described the protocol and path detection processes that make use of the models generated by the work in the previous chapter. Our implementation of these processes was detailed and an experiment was conducted with them to make sure they worked correctly.

Our detection procedures are unique from more traditional approaches because they do not rely on maximum likelihood to select the best match out of a set. We only need test data and a single model to determine a statistical measure of how well that model represents the data. For path matching in the Tor attack, we only need two systems both using Tor to tell if they are communicating together or with another party.

Chapter 5

An Illustrative Example

This chapter presents an illustrative example covering all of the techniques described in Chapters 3 and 4. We start with a network process and go through all of the necessary steps to be able to perform detection. We show results for the detection and compare them with results from the flow correlation technique.

5.1 Experimental Setup

5.1.1 Layout

This experiment uses the same private Tor network used by previous experiments, except this time, four systems are involved. We designate two pairs of systems to communicate on the Tor network at the same time. Each pair communicates between themselves and runs the exact same process.

The diagram in Figure 5.1 shows the layout of the systems used in the experiment. The system 192.168.69.10 will communicate with 192.168.69.13, and 192.168.69.11 will communicate with 192.168.69.12. Each of their connections goes through Tor, and we do not record which relays are being used. We will need to use path detection in order to determine which two computers are talking.

Figure 5.1: Layout of the ping-pong experiment

5.1.2 The Ping-Pong Process

The collective process that each pair is running is referred to as the “ping-pong” process. In this process, the client end is no longer just a passive listener. Each party runs both a client and a server thread that can communicate together inside the computer. One of the two systems is designated as the master, which starts the communications, while the other, the slave, waits for the master to start the process.

The master begins at state zero and takes a transition in its model. The model used by the master is shown in Figure 5.2a. It then looks up the associated delay, in milliseconds, for the symbol it generated and sends the packet after that amount of time has passed. The symbol-to-delay translation table used by the master is Table 5.1a.

Once the slave receives the packet from the master, it also starts at state zero in its model and takes a random transition. The model used by the slave in this experiment is shown in Figure 5.2b. The slave has its own symbol-to-delay translation table to convert symbols from its model into time delays. The table used by the slave is Table 5.1b.

The process continues back and forth until the required amount of packets have been generated. The delays of the packets leaving the master have two components: the delay added by the master and the delay added by the slave. When we symbolize, we have to take this into account and break the range out into ten symbols. There is a symbol for each of the five symbols in the master's model combined with either of the two symbols from the slave's model.

(a) 5-state model used by the master (b) 2-state model used by the slave

Figure 5.2: Models used in the ping-pong experiment

(a) Delays used by master

Symbol   Delay (in milliseconds)
A        200
B        400
C        100
Y        300
Z        500

(b) Delays used by slave

Symbol   Delay (in milliseconds)
A        10
B        50

Table 5.1: Symbol-to-delay translation tables for ping-pong experiment

5.2 Data Capture

Once again, data is captured using tshark. In order to see all of the data passing through the network stack at each end, the exit node functionality has to be disabled for any systems in the experiment. This is because Tor is smart enough to know when a destination is running a Tor relay. It will deliberately choose that relay as the last node in the circuit (the exit node), so that the connection to the final destination port occurs locally and does not need to go across the network. This is good for privacy, because it reduces the chance of an adversary spying on the last hop of the connection, which is usually unencrypted. But it is bad for us, because tshark would have to watch all traffic entering the Tor relay rather than a separate port in order to see the traffic.

On the other end of the connection, we use a small program called 3proxy [57] to create a local TCP port forward. Any data sent to the port is automatically forwarded along to a Tor SOCKS proxy for entrance to the Tor network. This is a very useful tool for programs that do not have SOCKS support, such as our small client and server test programs.

5.3 Symbolization

To symbolize, the numerical derivative of the CDF (the PDF) is plotted to determine the ranges. We know that we are looking for ten symbols. Figure 5.3 shows the plot with ranges overlaid by the dotted lines.

Figure 5.3: Plot of PDF of data with selected ranges and symbols overlaid

The combined symbolization table contains ranges to convert delays into one of ten symbols. There is a symbol to represent each of the ten possible combinations of symbols from the five state and two state models. Table 5.2 shows the symbolization table we derive from the plot analysis.

Symbol   Range (in seconds)
C+A'     {0.0000, 0.1500}
C+B'     {0.1501, 0.2100}
A+A'     {0.2101, 0.2500}
A+B'     {0.2501, 0.3100}
Y+A'     {0.3101, 0.3500}
Y+B'     {0.3501, 0.4100}
B+A'     {0.4101, 0.4500}
B+B'     {0.4501, 0.5100}
Z+A'     {0.5101, 0.5500}
Z+B'     {0.5501, 10.0000}

Table 5.2: Ranges for all ten symbols of the ping-pong experiment

5.4 Training Model Construction

As data is captured in increments of 200,000 packets, CSSR is used to construct a model at each step. By running the confidence test at each successive interval, we find that we have the same problem described in Section 3.7. As the data used for model construction increases, the smallest asymptotic state probability decreases and the amount of data required by the model confidence test increases. At a million packets captured, the confidence test tells us we need over 288 million packets. We cannot capture enough data because of the glitches.

Using the same approach from Section 3.7, we stop capturing after a million packets and just keep the model constructed with that amount of data. We can later prune out states and transitions that have a very low probability of occurring. The model reconstructed from a million packets is too large to include as a figure. It has 425 states and 1,523 transitions—over five times as large as the model shown in Figure 3.12.

For comparison purposes, we also use artificial data to generate the combined model. Artificial data is generated from the 5 state and 2 state machines and is concatenated together to get data for the combined model. In other words, an A from the 5 state model and a B from the 2 state model yield an A+B' symbol for that sample in our concatenated data set.
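A minimal sketch of that concatenation is shown below; generate() is a hypothetical per-model symbol generator, and the model layout {state: [(symbol, next state, probability), ...]} is an assumed placeholder.

import random

def generate(model):
    # Hypothetical generator: walk the model forever, yielding the
    # symbol of each randomly chosen transition.
    state = random.choice(list(model))
    while True:
        trans = model[state]
        symbol, state, _ = random.choices(trans,
                                          weights=[p for *_, p in trans])[0]
        yield symbol

def combined_sequence(master_model, slave_model, n):
    master, slave = generate(master_model), generate(slave_model)
    # An A from the master and a B' from the slave become "A+B'".
    return [f"{next(master)}+{next(slave)}" for _ in range(n)]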

CSSR with L = 3 is used to construct the model shown in Figure 5.4. By manually comparing it to the models from Figure 5.2, we find that it is the correctly combined model.

Figure 5.4: A combination of the 5 state and 2 state models, generated from artificial training data

5.5 Pruning

To select a good value for pruning, a histogram is made of the asymptotic state probabilities of the reconstructed model. The histogram is shown in Figure 5.5. From looking at the histogram, it is obvious that a large majority of the probabilities are very small. In fact, 378 of the 425 states have an asymptotic state probability below 0.02%, while the other 47 states have a probability above 0.3%. We will threshold in this gap.

Figure 5.5: A histogram of the asymptotic state probabilities (x-axis: asymptotic state probability; y-axis: number of states with that probability)

We choose β = 0.0005 as the threshold value and prune the model of any states with asymptotic state probabilities below this value. The resulting model is shown in Figure 5.6. The pruned model is considerably reduced in size from the original model, but still contains 99.3% of the information from that model. This is because the original model would use the majority of its states less than 0.7% of the time.

If we run the model confidence test on the pruned model, we find that only 27,990 symbols are needed to make a confident model. Since we have a million symbols, there is more than enough data to be confident in the remaining states and transitions after pruning.

It is possible to prune the model too aggressively. If this is done, it eliminates some of the important states and takes away paths for valid data. For instance, at β = 0.013 the model, shown in Figure 5.7, now only contains the most common events in the original process. It is missing transitions for all of the symbol patterns that can occur in the original model. This will cause an excessively high rate of rejections when performing detection, which is confirmed in the next section.

5.6 Detection Results

After constructing the models, confidence interval and Viterbi path detection was performed against the systems using Tor. Recall from Section 5.1.1 that .10 and .13 are communicating together and .11 and .12 are communicating. The first three octets of the IP addresses (192.168.69) are omitted to save space in the tables.

Figure 5.6: The training model pruned with a threshold of β = 0.0005

Figure 5.7: The training model pruned with a threshold of β = 0.013

5.6.1 Detection Rates

Detection rates are compiled for path matching using various models. Specifically, we present results for the following models:

• Table 5.3: Original, combined model from Figure 5.4

• Table 5.4: Reconstructed model, no pruning

• Table 5.5: Reconstructed model pruned at β = 0.0005 (shown in Figure 5.6)

• Table 5.6: Reconstructed model pruned at β = 0.013 (shown in Figure 5.7)

(a) Window size = 10
        .10      .11      .12
.11     0.02%
.12     0.02%    93.78%
.13     95.24%   0.02%    0.02%

(b) Window size = 25
        .10      .11      .12
.11     0.00%
.12     0.00%    86.98%
.13     90.62%   0.00%    0.00%

(c) Window size = 50
        .10      .11      .12
.11     0.00%
.12     0.00%    72.51%
.13     78.44%   0.00%    0.00%

Table 5.3: Detection rates for system pairs using original model

The rates in each table are the percentage of the windows that have matching paths when both windows are accepted by the model. In other words, they represent the ability of an adversary to correlate Tor connections. The higher the rate, the more often it is that the process works as expected. This means that each window is accepted by the model (that it does not contain any events which the model cannot handle), and that the Viterbi paths of the two windows match exactly.

In the results for each model, larger window sizes have lower true positive rates. This is because a larger window allows a glitch to disrupt the path matching for a longer period of time. New data must come in to slide the window along and eventually flush out the glitch. This takes longer when the window size is large, and so the detection rates are lower.

(a) Window size = 10
        .10      .11      .12
.11     0.02%
.12     0.02%    95.88%
.13     96.66%   0.02%    0.02%

(b) Window size = 25
        .10      .11      .12
.11     0.00%
.12     0.00%    90.36%
.13     92.20%   0.00%    0.00%

(c) Window size = 50
        .10      .11      .12
.11     0.00%
.12     0.00%    82.08%
.13     85.46%   0.00%    0.00%

Table 5.4: Detection rates for system pairs using reconstructed model

(a) Window size = 10
        .10      .11      .12
.11     0.02%
.12     0.02%    94.40%
.13     95.99%   0.02%    0.02%

(b) Window size = 25
        .10      .11      .12
.11     0.00%
.12     0.00%    86.98%
.13     90.62%   0.00%    0.00%

(c) Window size = 50
        .10      .11      .12
.11     0.00%
.12     0.00%    75.99%
.13     82.62%   0.00%    0.00%

Table 5.5: Detection rates for system pairs using reconstructed model pruned at β = 0.0005

(a) Window size = 10
        .10      .11      .12
.11     0.01%
.12     0.01%    3.79%
.13     3.85%    0.00%    0.00%

(b) Window size = 25
        .10      .11      .12
.11     0.00%
.12     0.00%    0.02%
.13     0.02%    0.00%    0.00%

(c) Window size = 50
        .10      .11      .12
.11     0.00%
.12     0.00%    0.00%
.13     0.00%    0.00%    0.00%

Table 5.6: Detection rates for system pairs using reconstructed model pruned at β = 0.013

Out of the window sizes shown, w = 10 provides the best result, i.e., the closest to the ideal point of a receiver operating characteristic curve. This value gives the largest number of true positives while limiting false positives.

We can see from the values in Table 5.4 that the reconstructed model which is not pruned is the best for matching the data. Using this model in practice, however, is extremely difficult as the numerous states and transitions cause a large increase in the processing time for the detector.

The model that is pruned at β = 0.0005 provides a good compromise between higher detection rates and lower processing time. Its results are slightly worse than the reconstructed model, but are better than using the original model.

The over-pruned model does not fare well. Its detection rates approach 4% in the best of the test cases. This is expected, because paths that are taken fairly often have been stripped from the model. As confirmed by the results, the model is rejecting the vast majority of the windows.

5.6.2 Rejection Rates

There are three possible decisions the detector can make for each pair of windows:

1. That they are both accepted by the model and their Viterbi paths exactly match

2. That they are both accepted by the model, but their Viterbi paths do not match

3. That one or more of the windows are rejected by the model

59 A rejection means that there is an event in the window that cannot be explained by the state structure and transitions of the model. It also means that confidence intervals cannot be calculated from the window.

Ideally, a rejection only happens if the two processes from which the data windows came are different. If the processes are the same, like in this experiment, a rejection should not occur.

However, in practice extra latency results in glitches that cause a symbol to be misclassified. Because of this, it is important that we keep rejection rates in mind when selecting our model and window size.

Tables 5.7 and 5.8 show the rejection rates for the various combinations of window sizes and available models. Table 5.7 shows the case of matching the .10 data to data from .13 and Table 5.8 shows the case of .11 and .12.

         Original Model   Recon. Model   Pruned at β = 0.0005   Pruned at β = 0.013
w = 10    3.83%            1.48%          3.06%                  96.15%
w = 25    7.23%            3.63%          7.23%                  99.98%
w = 50   17.87%            7.07%         13.48%                 100.00%

Table 5.7: Rejection rates for .10 – .13

         Original Model   Recon. Model   Pruned at β = 0.0005   Pruned at β = 0.013
w = 10    5.00%            1.94%          4.31%                  96.21%
w = 25   10.14%            4.85%         10.14%                  99.98%
w = 50   22.89%            9.48%         18.96%                 100.00%

Table 5.8: Rejection rates for .11 – .12

Again, the best results are obtained with the reconstructed model and a window size of 10. The reconstructed model still rejects some windows because new glitches appear during collection of the test data that were not seen while collecting the training data used to build the model. Since they are not present in the training data, the model cannot account for them and rejects those windows.

These figures allow us to see that for the model pruned at β = 0.013, all of the false negatives are caused by rejections, rather than mismatching Viterbi paths. This is interesting, because there may be a way to still use this model if the rejection rate could be improved. We discuss an idea for this in Section 6.2.

5.7 Comparison to Flow Correlation

We now compare the results of the flow correlation method from Section 4.5 with our Viterbi path matching method for this experiment.

There are some fundamental differences in the two methods. First, flow correlation is a maximum likelihood approach. Because of this, it chooses the combination of data sets that is most likely and classifies them together. Even if none of the sets are anything like the others, it will still choose the best relationship. This means that we get a result for every window and that there are no rejections like with the confidence interval method.

Second, flow correlation does not detect protocols; it merely matches up the timings after they have been converted into pattern vectors. Third, there is an additional parameter, the period T. T controls the bin size for the packet counter when the time vectors are converted to pattern vectors. If T = 2, then the elements of the pattern vector contain the number of packets with times t < 2, 2 ≤ t < 4, 4 ≤ t < 6, etc. The selection of T can have a large impact on the results.
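That conversion can be sketched directly, matching the binning just described.

# A minimal sketch of converting packet times into a pattern vector
# with period T.
import numpy as np

def pattern_vector(times, T):
    times = np.asarray(times, dtype=float)
    n_bins = int(np.ceil(times.max() / T))
    edges = np.arange(n_bins + 1) * T             # 0, T, 2T, ...
    counts, _ = np.histogram(times, bins=edges)
    return counts

print(pattern_vector([0.4, 1.9, 2.2, 5.1], T=2))  # -> [2 1 1]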

These differences require that we present the results in a different manner. We still window the data to maintain consistency across the two methods, but there are no detection or rejection rates for each combination. Instead, we take data from one node and compare it to all of the other nodes. The value recorded is the true positive rate, which indicates that the flow correlation algorithm correctly chose the other node.

Table 5.9 shows the true positive rates for .10 and .13, while Table 5.10 shows the rates for .11 and .12. Any other match or tied value is considered an incorrect classification. The tables show how the rate varies with respect to window size, w, and period size (bin size), T .

The best rate in the tables comes when the window size is largest and the bin size is smallest. The larger window size means using more data for the classification. This result is consistent with those of Zhu et al., who found that detection rates improved asymptotically to 100% as more and more data was used for classification [25, 26].

         T = 0.5 s   T = 1 s   T = 2 s
w = 10   84.33%      58.62%    67.44%
w = 25   99.51%      86.37%    75.86%
w = 50   99.94%      96.97%    78.97%

Table 5.9: True positive rates for .10 – .13

         T = 0.5 s   T = 1 s   T = 2 s
w = 10   86.78%      67.20%    67.53%
w = 25   99.64%      90.32%    84.66%
w = 50   99.98%      98.08%    86.84%

Table 5.10: True positive rates for .11 – .12

The larger window size actually works better for flow correlation because it does not rely on matching the data to a model first. This is both a good thing and a bad thing. The negative aspect is that we lose the ability to detect protocols, but the positive aspect is that we handle glitches better. In the Viterbi path analysis, any incorrect symbol can disrupt the path match, making it better to keep the window sizes small in order to see the bad symbols as little as possible. With flow correlation, however, it is better to see as much data at once as possible so that other values can overwhelm the problematic ones in the distance calculations.

Given the choice, we actually prefer the smaller window sizes. This is because we get more granularity within the data sets and can detect a change in behavior more quickly. For example, if the data sets from computer A and B are matching, but all of a sudden computer A starts talking to computer D instead of B, we will notice the change more quickly with a smaller window size. This is because data from the first behavior clears out from the window more quickly since the window is smaller and has less data in it. Once a majority of the data in the window is from the new behavior, we will begin making the correct classification again.

The bin size T must be selected to balance latency tolerance against resolution. If there is a great deal of variable latency in the connection, higher values of T work better because those extra latencies do not move packets into a new bin. Lower values of T work best in this scenario because they give the most resolution; the local network environment does not cause a great deal of glitches relative to how many there would be on a connection over the public Tor network.

The latency-adapting behavior of T is most noticeable at the smallest window size, where any variable delay is emphasized because there are fewer samples to regulate the averages. This is why the rate is actually slightly better for T = 2 than for T = 1 in both scenarios with a window size of w = 10.

The final tables, Table 5.11 and 5.12, show a side-by-side comparison of the detection rates for both methods. For the Viterbi path column, the results are when the pruned model (β = 0.0005) was used. And for the flow correlation column, the results are when T = 1 second. These parameter choices represent the middle ground for each method to provide a better basis of comparison.

         Viterbi Path   Flow Correlation
w = 10   95.99%         58.62%
w = 25   90.62%         86.37%
w = 50   82.62%         96.97%

Table 5.11: Method comparison of detection rates for .10 – .13

         Viterbi Path   Flow Correlation
w = 10   94.40%         67.20%
w = 25   86.98%         90.32%
w = 50   75.99%         98.08%

Table 5.12: Method comparison of detection rates for .11 – .12

5.8 Summary

The experiment showed that the processes outlined in Chapters 3 and 4 can be used to successfully model a protocol and use it to match Tor users. Model pruning, as long as the value of β was not too aggressive, worked very well for the detection process. Its smaller size allowed less computation time while maintaining some extra paths for the more common glitches. This resulted in a smaller rejection rate than the original model and more accurate detection results.

Chapter 6

Conclusions

6.1 Concluding Summary

In this work, we showed how recently developed pattern recognition tools could be applied to traffic analysis of anonymity systems. We examined their effectiveness on Tor specifically, due to its popularity and low-latency design. We showed how to use the tools to attack the anonymity and match the network streams of two or more systems.

A private Tor network was implemented on a standalone network for data collection. Custom client and server programs using this network provided much of the experimental basis for this work.

It was found that the hidden Markov model of an underlying network process could be reconstructed from observations of its inter-packet timings. Measures were used to validate the models and to check that enough data was used for their construction. In cases when not enough data was available, the models could still be used after a technique was applied to prune them of any low-probability states and transitions.

The pruning technique worked well and was validated by the results from detection. Using a pruned model in the detection step achieved results and runtimes between those of the original model and the unpruned reconstructed model. This was a good compromise: better results than the original model, at a lower runtime than the unpruned reconstruction.

By calculating confidence intervals and comparing the paths taken by test data through one of these models, we gained a statistical measure of how likely it was that two systems were talking through Tor. If two systems were both using Tor, but tunneling different protocols, the confidence intervals showed a rejection. If the two systems were both using Tor and the same protocol, the confidence intervals accepted the data. As an additional step, we found that if the Viterbi paths matched, the two systems were using the same instance of the same protocol.

This type of analysis could allow an adversary to defeat the anonymity of Tor without being a global observer. If an adversary is able to observe two systems, these tools can be used to determine if that pair of systems are communicating through the anonymity network.

6.2 Recommendations for Further Research

This work inspired many interesting questions that could not be answered due to time constraints. We will list and quickly discuss them here as topics for future research.

The most interesting, and probably most important, follow-up question is: how well does the attack work on the real Tor network? Will the increased amount of variable delay overwhelm the pruning process and reduce detection rates? To answer these questions, the five-state to two-state ping-pong experiment should be re-run over a wide-area Tor network to see if similar results can be achieved. Possible options for a larger Tor network could be the public, production network or a private one on a distributed test environment like PlanetLab [58].

Also, most of the experiments done for this work involved artificially created protocols that followed the state machines we had given them. More research could be done with real-world protocols to see if they can be modeled and detected using the same process. Good candidates would include multiplayer games or Voice over IP (VoIP). It is important to note, though, that the protocols must be able to be sent over TCP because Tor will only transport TCP streams. Work done by Wang et al. suggests that this should be possible with VoIP [59].

In the arms race that is the field of traffic analysis, every new attack brings up the question “what can be done to defend against it?” We propose that same question here. Using techniques with Tor that disrupt packet timing and ordering should work very well against this attack because they will make building the models difficult. Another alternative that has been proposed is to use a distributed VPN through a half-open Tor circuit. These half-open Tor circuits, called hidden services, may be able to be combined with OnionCat [60] to produce a system robust at deterring this attack.

Another interesting idea would be to replace obviously mistranslated symbols with a wild card symbol to improve the handling of glitches. Say that Tor introduces a delay of one second into a packet and that packet gets incorrectly symbolized into some other symbol. If we knew that it was wrong, we could instead swap it out for our wild card symbol. The wild card symbol would not stop a path from matching; it would just take on whatever value is needed and allow the path matching to continue. The problem could become very complex and computationally expensive if there are many transitions the wild card symbol can take. It would be a challenge to efficiently enumerate and handle all of the possible path combinations until the right one is found.

Appendices

Appendix A How to Configure a Private Tor Network

A small, standalone test network was created to conduct experiments on Tor. This was done so as to not disturb any nodes that were functioning as part of the live global Tor network operating on the Internet. This appendix describes the configuration and layout of our Tor testbed network.

A.1 Network Layout

The network consists of sixteen total systems. All systems are running Tor version 0.2.1.19, which was the latest stable version as of August 2009 [61]. There are two Dell Optiplex GX260 desktops running CentOS 5, two Dell Precision 370 mini-towers running Fedora Core 6, and twelve WebDT 166 thin clients running the lightweight Debian-based Damn Small Linux [62]. The network is a combination of various types of systems on different platforms, much like on the Internet.

Figure 1: Network diagram of the private Tor network testbed

Figure 1 shows the layout of the network. The smaller icons on the left symbolize the thin clients. Each thin client runs Tor with the client and relay services enabled. Three of the larger Dell systems run Tor with the client, relay, and directory server services enabled, and the last system runs Tor with just the client service enabled. Finally, all systems are connected via a network switch.

A.2 Tor Configuration

Tor determines all of its configuration settings from a file called the torrc file. Settings listed in this file tell Tor what name to use, what services to run, what types of logging to perform, what policies to enforce, and, for our standalone environment, which directory servers to use. The specification of these is important because the relays must know where to upload their descriptors and the clients must know where they can find a list of relays to use. By default, Tor uses the directory servers that have been hard-coded into it. By providing a list of our own directory servers, we are telling Tor to use one of ours and to not use any of the preloaded ones.

1  ## TOR Configuration File
2
3
4  ## Section 1: Basic Settings ##########################################
5
6  ## Send all messages of level 'notice' or higher to /var/log/tor/notices.log
7  #Log notice file /var/log/tor/notices.log
8  ## Send every possible message to /var/log/tor/debug.log
9  Log debug file /var/log/tor/debug.log
10
11 ## If set to 1, Tor will scrub any personally identifying information
12 ## from the log files
13 SafeLogging 0
14
15 ## Start the process in the background
16 RunAsDaemon 1
17
18 ## The directory for keeping all the keys/etc.
19 DataDirectory /var/lib/tor
20
21
22 ## Section 2: Client Settings #########################################
23
24 ## Define on which port the SOCKS client will listen
25 #SocksPort 0 # don't function as a client
26 SocksPort 9050 # what port to open for local application connections
27
28 ## Define on which IP address the client will listen
29 #SocksListenAddress 127.0.0.1 # accept connections only from localhost
30 SocksListenAddress 0.0.0.0 # accept connections from anyone
31
32
33 ## Section 3: Relay Settings ##########################################
34
35 ## A unique name for the server
36 Nickname tigertor10
37
38 ## The IP or FQDN of the server
39 Address 192.168.69.10
40
41 ## Contact info to be published in the directory
42 ContactInfo Nobody
43
44 ## Port to advertise for Tor connections
45 ORPort 9001
46
47 ## Publish server descriptors of certain versions
48 PublishServerDescriptor v2,v3
49
50 ## A comma-separated list of exit policies
51 #ExitPolicy accept *:* # any exiting ports/protocols allowed
52 ExitPolicy reject *:* # no exiting ports/protocols allowed
53
54
55
56 ## Section 4: Authoritative Directory Settings ########################
57
58 ## What port to advertise for incoming directory connections
59 DirPort 9030
60
61 ## Become an authoritative directory
62 AuthoritativeDirectory 1
63
64 ## Specify which versions to publish
65 V2AuthoritativeDirectory 1
66 V3AuthoritativeDirectory 1
67
68 ## To make routers show up as "named" in the directory
69 NamingAuthoritativeDirectory 1
70
71 # Display this HTML page at the root of the directory server's port
72 DirPortFrontPage /var/lib/tor/webpage.html
73
74 ## Entrance policy for this directory server
75 DirPolicy accept *:* # any ports/protocols allowed
76
77
78 ## Section 5: Settings for running Tor on a private network ###########
79
80 ## List of authoritative routers
81 DirServer tigertor10 v3ident=V3IDENTKEY 192.168.69.10:9030 FINGERPRINT
82 DirServer tigertor11 v3ident=V3IDENTKEY 192.168.69.11:9030 FINGERPRINT
83 DirServer tigertor12 v3ident=V3IDENTKEY 192.168.69.12:9030 FINGERPRINT
84
85 ## Makes running on a private Tor network possible by speeding up the
86 ## voting process and disabling some security restrictions
87 TestingTorNetwork 1

Listing 1: Sample torrc File

Listing 1 is the torrc configuration file for one of our systems running a directory server. The settings in the file are grouped into sections by the service to which they apply (client, relay, dirserver, etc.). The listing shows an example of the configuration that must be applied for Tor to run in a standalone environment, the most important elements of which are:

• Enabling and configuring the V3 authoritative directory server (lines 56–75)

• Configuring Tor to use the private directory servers (lines 81–83)

• Enabling the TestingTorNetwork setting (line 87)

The TestingTorNetwork setting is actually an umbrella setting that configures a multitude of individual settings. Some of the changes include shortening the timing intervals for the V3 directory voting processes, allowing Tor to accept non-publicly-routable IP addresses, and making reachability information available more quickly from the directories. Tor will not be able to operate on our private network without these settings. The TestingTorNetwork option is available in Tor versions 0.2.1.2-alpha and later [63].
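As an illustration, the proposal defining this option [63] describes it as a shortcut for a set of individual overrides. The following is a rough, non-authoritative sketch of a few of them; exact option names and values may differ between Tor versions:

    ## Approximate overrides implied by TestingTorNetwork (see [63])
    DirAllowPrivateAddresses 1     # accept relay descriptors with private IPs
    EnforceDistinctSubnets 0       # allow circuits whose hops share a subnet
    AssumeReachable 1              # skip relay self-reachability testing
    V3AuthVotingInterval 5 minutes # vote far more often than the public network
    V3AuthVoteDelay 20 seconds
    V3AuthDistDelay 20 seconds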

A.3 Directory Server Keys

In Listing 1, when the private directory servers are enumerated, there are two fields that must be filled with information from the user’s Tor instance.

The field for “FINGERPRINT” can be determined by inspecting the output of the command tor --list-fingerprint. Each relay on the network must have a unique fingerprint, so it is important not to copy the contents of Tor's data directory from one system to another when installing the systems.
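For example, once the relay from Listing 1 has been started at least once (so that its keys exist), its fingerprint could be read as follows; the output line shown here is hypothetical, and the exact format varies by Tor version:

    $ tor --list-fingerprint
    ...
    tigertor10 0123456789ABCDEF0123456789ABCDEF01234567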

The field for “V3IDENTKEY” can be found once the identity and signing keys have been generated for the authority. To generate them, change to Tor's keys directory and run the command to generate a new certificate:

    cd /var/lib/tor/keys
    tor-gencert --create-identity-key

The authority certificate file generated by the command will contain the value for the field near the top of the file, on a line beginning with “fingerprint”. The character string after the word “fingerprint” is what should replace “V3IDENTKEY” in the torrc file. Care should be taken not to confuse this fingerprint with the relay fingerprint described above.
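For example, assuming the default key directory from Listing 1, the value could be pulled out of the certificate as follows; the file name is the one written by tor-gencert, and the hexadecimal value shown is hypothetical:

    $ grep fingerprint /var/lib/tor/keys/authority_certificate
    fingerprint 89ABCDEF0123456789ABCDEF0123456789ABCDEF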

The keys generated by the command expire once a year. They can be renewed by simply running the tor-gencert command from the keys directory again, this time without the --create-identity-key flag. The identity fingerprints should remain the same.
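In practice, renewal amounts to the following sketch, run from the keys directory so that tor-gencert finds the existing identity key:

    cd /var/lib/tor/keys
    tor-gencert    # no --create-identity-key flag: reuse the existing identity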

It is important to note that one of the key files generated by the command, authority_identity_key, must be kept secret. If it is leaked or stolen, an adversary could create a mimic of the directory server that our relays would trust. This is less of a risk on a private experimental network, but it is still recommended that the file be moved to a more secure location. If this is done, it must be moved back before the keys are renewed; otherwise, a new identity key and fingerprint will be generated.
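As a sketch, with /root/authority-keys as a hypothetical more secure location:

    # after initial generation, stash the long-term secret elsewhere
    mv /var/lib/tor/keys/authority_identity_key /root/authority-keys/

    # before the yearly renewal, restore it so the fingerprint is preserved
    mv /root/authority-keys/authority_identity_key /var/lib/tor/keys/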

A.4 System Clocks

Another requirement when constructing a private Tor network is keeping the system clocks synchronized. We use the network time protocol [64] to keep all test systems' clocks closely aligned. This is needed because a Tor client will not build a circuit with a relay if the system clocks are off by “more than a few hours” [65].
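For example, a one-shot synchronization on each Linux test system could be performed along these lines; the pool server shown is an arbitrary choice, and running the ntpd daemon continuously is the more robust option:

    # step the system clock once from a public NTP pool
    ntpdate pool.ntp.org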

A.5 Summary

Detailed information has been provided about the configuration of the private Tor network used in our experiments. Once Tor has been installed on a system, the configuration file listed in Section A.2 can be used to replicate one of our directory servers. Additional directory servers can be created from the same file by simply modifying lines 36 and 39. The file can also be adjusted to create Tor nodes that are just relays by removing the authoritative directory section; a minimal example of such a file is sketched below.
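For instance, a minimal relay-only torrc for an additional node might look like the following sketch, where the nickname and address are placeholders to be set per node and the DirServer lines are copied unchanged from Listing 1:

    Nickname tigertor20            # placeholder: unique per node
    Address 192.168.69.20          # placeholder: this node's IP
    ORPort 9001
    SocksPort 9050
    ExitPolicy reject *:*
    DirServer tigertor10 v3ident=V3IDENTKEY 192.168.69.10:9030 FINGERPRINT
    DirServer tigertor11 v3ident=V3IDENTKEY 192.168.69.11:9030 FINGERPRINT
    DirServer tigertor12 v3ident=V3IDENTKEY 192.168.69.12:9030 FINGERPRINT
    TestingTorNetwork 1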

Bibliography

[1] Joint Doctrine Division, J-7, Joint Staff, Joint Publication 1-02, DoD Dictionary of Military and Associated Terms. United States Department of Defense, October 2009. [Online]. Available: http://www.dtic.mil/doctrine/new_pubs/jp1_02.pdf

[2] T. L. Burns, “The origins of the National Security Agency, 1940–1952,” in United States Cryptologic History, ser. V. Fort Meade, MD: Center for Cryptologic History, NSA, 1990, vol. 1.

[3] C. H. Sterling, Ed., Military Communications: From Ancient Times to the 21st Century. Santa Barbara, CA: ABC-CLIO, Inc., 2008.

[4] G. Danezis and R. Clayton, Digital Privacy: Theory, Technologies, and Practices. Boca Raton, FL: Auerbach Publications, 2008, ch. 5, pp. 95–116.

[5] R. Lewin, “A signal-intelligence war,” Journal of Contemporary History, vol. 16, no. 3, pp. 501–512, July 1981.

[6] D. Song, D. Wagner, and X. Tian, “Timing analysis of keystrokes and timing attacks on SSH,” in Proceedings of the 10th USENIX Security Symposium, Washington, DC, August 2001, pp. 337–352.

[7] A. Hintz, “Fingerprinting websites using traffic analysis,” in Proceedings of Privacy Enhancing Technologies workshop (PET 2002), R. Dingledine and P. Syverson, Eds. San Francisco, CA: Springer-Verlag, April 2002.

[8] G. D. Bissias, M. Liberatore, D. Jensen, and B. N. Levine, “Privacy vulnerabilities in encrypted HTTP streams,” in Proceedings of Privacy Enhancing Technologies workshop (PET 2005), May 2005, pp. 1–11.

[9] M. Liberatore and B. N. Levine, “Inferring the source of encrypted HTTP connections,” in CCS ’06: Proceedings of the 13th ACM conference on Computer and Communications Security. New York, NY: ACM, October 2006, pp. 255–263.

[10] D. Herrmann, R. Wendolsky, and H. Federrath, “Website fingerprinting: attacking popular privacy enhancing technologies with the multinomial naïve-Bayes classifier,” in CCSW ’09: Proceedings of the 2009 ACM workshop on Cloud computing security. New York, NY: ACM, November 2009, pp. 31–42.

[11] C. V. Wright, L. Ballard, F. Monrose, and G. M. Masson, “Language identification of encrypted VoIP traffic: Alejandra y Roberto or Alice and Bob?” in Proceedings of the 16th USENIX Security Symposium, Boston, MA, August 2007, pp. 43–54.

[12] C. V. Wright, L. Ballard, S. E. Coull, F. Monrose, and G. M. Masson, “Spot me if you can: Uncovering spoken phrases in encrypted VoIP conversations,” in 2008 IEEE Symposium on Security and Privacy, May 2008, pp. 35–49.

[13] C. V. Wright, F. Monrose, and G. M. Masson, “On inferring application protocol behaviors in encrypted network traffic,” Journal of Machine Learning Research, vol. 7, pp. 2745–2769, 2006.

[14] R. Dingledine, M. J. Freedman, and D. Molnar, “The Free Haven anonymity bibliography,” March 2010. [Online]. Available: http://www.freehaven.net/anonbib/full/date.html

[15] D. Chaum, “Untraceable electronic mail, return addresses, and digital pseudonyms,” Communications of the ACM, vol. 24, no. 2, pp. 84–90, February 1981.

[16] U. Möller, L. Cottrell, P. Palfrader, and L. Sassaman, “Mixmaster protocol — version 2,” IETF Internet Draft, July 2003. [Online]. Available: http://www.abditum.com/mixmaster-spec.txt

[17] G. Danezis, R. Dingledine, and N. Mathewson, “Mixminion: Design of a type III anonymous remailer protocol,” in Proceedings of the 2003 IEEE Symposium on Security and Privacy. IEEE CS, May 2003, pp. 2–15.

[18] N. Mathewson and R. Dingledine, “Practical traffic analysis: Extending and resisting statistical disclosure,” in Proceedings of Privacy Enhancing Technologies workshop (PET 2004), D. Martin and A. Serjantov, Eds., vol. 3424, May 2004, pp. 17–34.

[19] R. Dingledine, N. Mathewson, and P. Syverson, “Tor: The second-generation onion router,” in Proceedings of the 13th USENIX Security Symposium, San Diego, CA, August 2004, pp. 303–320.

[20] O. Berthold, H. Federrath, and S. Köpsell, “Web MIXes: A system for anonymous and unobservable internet access,” in Proceedings of Designing Privacy Enhancing Technologies: Workshop on Design Issues in Anonymity and Unobservability, H. Federrath, Ed. Springer-Verlag, LNCS 2009, July 2000, pp. 115–129.

[21] jrandom, “I2P: A scalable framework for anonymous communication,” 2010. [Online]. Available: http://www.i2p2.de/techintro.html

[22] R. R. Brooks, J. M. Schwier, and C. Griffin, “Behavior detection using confidence intervals of hidden Markov models,” IEEE Trans. Syst., Man, Cybern. B, vol. 39, no. 6, pp. 1484–1492, December 2009.

[23] J. M. Schwier, R. R. Brooks, C. Griffin, and S. Bukkapatnam, “Zero knowledge hidden Markov model inference,” Pattern Recognition Letters, vol. 30, no. 14, pp. 1273–1280, 2009.

[24] J. M. Schwier, “Pattern recognition for command and control data systems,” Ph.D. dissertation, Clemson University, Clemson, SC, August 2009. [Online]. Available: http://etd.lib.clemson.edu/documents/1252424695/

[25] Y. Zhu, X. Fu, R. Bettati, and W. Zhao, “Anonymity analysis of mix networks against flow-correlation attacks,” in Proceedings of the 2005 IEEE Global Telecommunications Conference (GLOBECOM ’05), vol. 3, St. Louis, MO, 2005, pp. 1801–1805.

[26] Y. Zhu, X. Fu, B. Graham, R. Bettati, and W. Zhao, “Correlation-based traffic analysis attacks on anonymity networks,” IEEE Trans. Parallel Distrib. Syst., vol. PP, no. 99, August 2009.

[27] “TorStatus – Tor network status,” March 2010. [Online]. Available: http://torstatus.cyberphunk.org

[28] M. G. Reed, P. F. Syverson, and D. M. Goldschlag, “Anonymous connections and onion routing,” IEEE J. Sel. Areas Commun., vol. 16, no. 4, pp. 482–494, May 1998.

[29] R. Dingledine and P. Syverson, “Reliable MIX cascade networks through reputation,” in Proceedings of Financial Cryptography (FC ’02), M. Blaze, Ed. Springer-Verlag, March 2002.

[30] A. Serjantov, R. Dingledine, and P. Syverson, “From a trickle to a flood: Active attacks on several mix types,” in Proceedings of Information Hiding Workshop (IH 2002), F. Petitcolas, Ed. Springer-Verlag, October 2002.

[31] A. Acquisti, R. Dingledine, and P. Syverson, “On the economics of anonymity,” in Proceedings of Financial Cryptography (FC ’03), R. N. Wright, Ed. Springer-Verlag, January 2003.

[32] R. Dingledine and N. Mathewson, “Anonymity loves company: Usability and the network effect,” in Proceedings of the Fifth Workshop on the Economics of Information Security (WEIS 2006), R. Anderson, Ed., Cambridge, UK, June 2006.

[33] R. Dingledine, N. Mathewson, and P. Syverson, “Deploying low-latency anonymity: Design challenges and social factors,” IEEE Security & Privacy, vol. 5, no. 5, pp. 83–87, October 2007.

[34] R. Dingledine, “Current events in Tor development,” presented at the 24th Chaos Communication Congress (24C3), Berlin, Germany, December 2007.

[35] M. Rennhard and B. Plattner, “Introducing MorphMix: Peer-to-peer based anonymous internet usage with collusion detection,” in Proceedings of the Workshop on Privacy in the Electronic Society (WPES 2002), Washington, DC, November 2002, pp. 91–102.

[36] M. J. Freedman and R. Morris, “Tarzan: A peer-to-peer anonymizing network layer,” in Proceedings of the 9th ACM Conference on Computer and Communications Security (CCS 2002), Washington, DC, November 2002.

[37] J. McLachlan, A. Tran, N. Hopper, and Y. Kim, “Scalable onion routing with Torsk,” in Proceedings of CCS 2009, November 2009.

[38] D. Koblas and M. R. Koblas, “SOCKS,” in Proceedings of the 3rd USENIX Security Symposium, 1992, pp. 77–83. [Online]. Available: http://www.usenix.org/events/sec92/full_papers/koblas.pdf

[39] M. Leech, M. Ganis, Y. Lee, R. Kuris, D. Koblas, and L. Jones, “SOCKS protocol version 5,” RFC 1928, March 1996. [Online]. Available: http://www.ietf.org/rfc/rfc1928.txt

[40] D. McCoy, K. Bauer, D. Grunwald, T. Kohno, and D. Sicker, “Shining light in dark places: Understanding the Tor network,” in Proceedings of the 8th International Symposium on Privacy Enhancing Technologies (PETS 2008), N. Borisov and I. Goldberg, Eds. Leuven, Belgium: Springer, July 2008, pp. 63–76.

[41] S. J. Murdoch and R. N. M. Watson, “Metrics for security and performance in low-latency anonymity networks,” in Proceedings of the Eighth International Symposium on Privacy Enhancing Technologies (PETS 2008), N. Borisov and I. Goldberg, Eds. Leuven, Belgium: Springer, July 2008, pp. 115–132.

[42] S. J. Murdoch and P. Zieliński, “Sampled traffic analysis by internet-exchange-level adversaries,” in Proceedings of the 7th Workshop on Privacy Enhancing Technologies (PET 2007), N. Borisov and P. Golle, Eds. Ottawa, Canada: Springer, June 2007, pp. 167–183.

[43] K. Bauer, D. McCoy, D. Grunwald, T. Kohno, and D. Sicker, “Low-resource routing attacks against anonymous systems,” University of Colorado, Boulder, CO, Tech. Rep., 2007. [Online]. Available: http://www.cs.colorado.edu/department/publications/reports/docs/CU-CS-1025-07.pdf

[44] S. J. Murdoch and G. Danezis, “Low-cost traffic analysis of Tor,” in Proceedings of the 2005 IEEE Symposium on Security and Privacy. Oakland, CA: IEEE CS, May 2005, pp. 183–195.

[45] N. Evans, R. Dingledine, and C. Grothoff, “A practical congestion attack on Tor using long paths,” in Proceedings of the 18th USENIX Security Symposium, Montreal, Canada, August 2009, pp. 33–50.

[46] N. Hopper, E. Y. Vasserman, and E. Chan-Tin, “How much anonymity does network latency leak?” ACM Trans. Inf. Syst. Secur. (TISSEC), vol. 13, no. 2, pp. 1–28, 2010.

[47] C. R. Shalizi, “Causal architecture, complexity, and self-organization in time series and cellular automata,” Ph.D. dissertation, University of Wisconsin-Madison, May 2001.

[48] C. R. Shalizi and J. P. Crutchfield, “Computational mechanics: Pattern and prediction, structure and simplicity,” Journal of Statistical Physics, vol. 104, no. 3–4, pp. 817–879, 2001.

[49] C. R. Shalizi, K. L. Shalizi, and J. P. Crutchfield, “An algorithm for pattern discovery in time series,” arXiv:cs.LG/0210025, November 2002. [Online]. Available: http://arxiv.org/abs/cs.LG/0210025

[50] C. R. Shalizi and K. L. Shalizi, “Blind construction of optimal nonlinear recursive predictors for discrete sequences,” in Uncertainty in Artificial Intelligence: Proceedings of the Twentieth Conference (UAI 2004), M. Chickering and J. Y. Halpern, Eds. Arlington, VA: AUAI Press, 2004, pp. 504–511.

[51] L. R. Rabiner, “A tutorial on hidden Markov models and selected applications in speech recognition,” Proceedings of the IEEE, vol. 77, no. 2, pp. 257–286, February 1989.

[52] D. Goldberg, “What every computer scientist should know about floating-point arithmetic,” ACM Computing Surveys, vol. 23, no. 1, pp. 5–48, 1991.

[53] “TShark manual,” March 2010. [Online]. Available: http://www.wireshark.org/docs/man-pages/tshark.html

[54] “TCPDUMP/LIBPCAP public repository,” March 2010. [Online]. Available: http://www.tcpdump.org

[55] “IEEE standard for information technology - portable operating system interface (POSIX),” IEEE Std 1003.1, 2004 Edition. The Open Group Technical Standard. Base Specifications, Issue 6, 2004. [Online]. Available: http://www.unix.org/single_unix_specification/

[56] “Discrete Fourier transform,” March 2010. [Online]. Available: http://www.mathworks.com/access/helpdesk/help/techdoc/ref/fft.shtml

[57] “3proxy tiny free proxy server,” March 2010. [Online]. Available: http://www.3proxy.ru

[58] “PlanetLab,” March 2010. [Online]. Available: http://www.planet-lab.org/

[59] X. Wang, S. Chen, and S. Jajodia, “Tracking anonymous peer-to-peer VoIP calls on the internet,” in Proceedings of the ACM Conference on Computer and Communications Security, November 2005, pp. 81–91.

[60] “OnionCat,” March 2010. [Online]. Available: http://www.cypherpunk.at/onioncat/

[61] “Tor release notes,” March 2010. [Online]. Available: http://gitweb.torproject.org/tor.git?a=blob_plain;hb=maint-0.2.1;f=ReleaseNotes

[62] “DSL information,” March 2010. [Online]. Available: http://www.damnsmalllinux.org

[63] K. Loesing, “Simplify configuration of private Tor networks,” Feature request, March 2010. [Online]. Available: https://svn.torproject.org/svn/tor/trunk/doc/spec/proposals/135-private-tor-networks.txt

[64] D. L. Mills, “Internet time synchronization: the network time protocol,” IEEE Trans. Commun., vol. 39, no. 10, pp. 1482–1493, October 1991.

[65] “Tor FAQ,” March 2010. [Online]. Available: http://wiki.noreply.org/noreply/TheOnionRouter/TorFAQ
