Novel Cryptographic Primitives and Protocols for Censorship Resistance
Total Page:16
File Type:pdf, Size:1020Kb
Portland State University PDXScholar Dissertations and Theses Dissertations and Theses Summer 7-24-2015 Novel Cryptographic Primitives and Protocols for Censorship Resistance Kevin Patrick Dyer Portland State University Follow this and additional works at: https://pdxscholar.library.pdx.edu/open_access_etds Part of the Information Security Commons Let us know how access to this document benefits ou.y Recommended Citation Dyer, Kevin Patrick, "Novel Cryptographic Primitives and Protocols for Censorship Resistance" (2015). Dissertations and Theses. Paper 2489. https://doi.org/10.15760/etd.2486 This Dissertation is brought to you for free and open access. It has been accepted for inclusion in Dissertations and Theses by an authorized administrator of PDXScholar. Please contact us if we can make this document more accessible: [email protected]. Novel Cryptographic Primitives and Protocols for Censorship Resistance by Kevin Patrick Dyer A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science Dissertation Committee: Thomas Shrimpton, Chair Charles Wright Wu-chang Feng John Caughman IV Portland State University 2015 Abstract Internet users rely on the availability of websites and digital services to engage in political discussions, report on newsworthy events in real-time, watch videos, etc. However, sometimes those who control networks, such as governments, censor certain websites, block specific applications or throttle encrypted traffic. Understandably, when users are faced with egregious censorship, where certain websites or applications are banned, they seek reliable and efficient means to circumvent such blocks. This tension is evident in countries such as a Iran and China, where the Internet censorship infrastructure is pervasive and continues to increase in scope and effectiveness. An arms race is unfolding with two competing threads of research: (1) network operators’ ability to classify traffic and subsequently enforce policies and (2) network users’ ability to control how network operators classify their traffic. Our goal is to un- derstand and progress the state-of-the-art for both sides. First, we present novel traf- fic analysis attacks against encrypted communications. We show that state-of-the-art cryptographic protocols leak private information about users’ communications, such as the websites they visit, applications they use, or languages used for communica- tions. Then, we investigate means to mitigate these privacy-compromising attacks. Towards this, we present a toolkit of cryptographic primitives and protocols that simultaneously (1) achieve traditional notions of cryptographic security, and (2) en- able users to conceal information about their communications, such as the protocols used or websites visited. We demonstrate the utility of these primitives and proto- cols in a variety of real-world settings. As a primary use case, we show that these new primitives and protocols protect network communications and bypass policies of state-of-the-art hardware-based and software-based network monitoring devices. i Dedication To my brilliant and beautiful wife Susannah, who made this work possible, enjoyable and meaningful. ii Acknowledgements This work is a summary of published [47, 48, 36] and unpublished results. The work in Section 3 was done with Coull and published [36] in ACM SIGCOMM Com- puter Communication Review in 2014. The work in Section 4 was presented [47] at IEEE Security and Privacy in 2012, and done in collaboration with Coull, Ristenpart and Shrimpton. The work in Section 6 was presented [48] at ACM Conference on Computer and Communications Security in 2013, and was also in collaboration with Coull, Ristenpart and Shrimpton. Finally, the work in Section 7 was presented [49] at USENIX Security 2015 and done in collaboration with Coull and Shrimpton. This dissertation was possible because of the patience, guidance and funding pro- vided by my advisor Thomas Shrimpton. The quality, consistency, and vision for this work was immeasurably elevated by my coauthors Scott Coull and Thomas Risten- part. I am grateful for the patience of the engineers I collaborated with at Google, The Tor Project, and Lantern, for supporting software deployments of the ideas in this document. In addition, this work was made possible by a number of generous sponsors, including: Eric Schmidt, Fariborz Maseeh, The NLnet Foundation, The National Science Foundation, and Portland State University. What’s more, while pursuing the ideas in this document I had the pleasure of discussing it with, and getting feedback from many wonderful people, including: Will Landecker, Lucas Dixon, Roger Dingledine, George Kadianakis, David Fifield, Shan- jian Li, Trevor Johnston, Akshay Dua, Tariq Elahi, Tim Chevalier, Charles Wright, Justin Reidy, R. Seth Terashima, and Soeren Pirk. iii Table of Contents Abstract i Dedication ii Acknowledgments iii List of Tables vii List of Figures ix 1 Introduction 1 1.1 Censorship Circumvention . 1 2 Overview: Traffic Analysis of Encrypted Communications 5 2.1 Attacks on Encrypted Messaging Services . 5 2.2 Website Fingerprinting Attacks . 6 3 Traffic Analysis of Encrypted Message Services 11 3.1 iMessage Overview . 12 3.2 Analyzing Information Leakage . 14 3.2.1 Data and Methodology . 14 3.2.2 Operating System . 16 3.2.3 User Actions . 18 3.2.4 Message Attributes . 20 3.3 Beyond iMessage . 22 4 Website Fingerprinting Countermeasures 26 4.1 Experimental Methodology . 29 4.2 Traffic Classifiers . 31 4.2.1 Liberatore and Levine Classifier . 31 4.2.2 Herrmann et al. Classifier . 32 4.2.3 Panchenko et al. Classifier . 33 4.3 Countermeasures . 33 4.3.1 Type-1: SSH/TLS/IPSec-Motivated Countermeasures . 34 4.3.2 Type-2: Other Padding-based Countermeasures . 35 4.3.3 Type-3: Distribution-based Countermeasures . 36 4.3.4 Overhead . 38 4.4 Existing Countermeasures versus Existing Classifiers . 39 4.4.1 Comparing the Datasets . 39 4.4.2 Comparison of Classifiers . 41 4.4.3 Comparison of Countermeasures . 41 iv 4.5 Exploring Coarse Features . 44 4.5.1 Total Time . 45 4.5.2 Total Per-Direction Bandwidth . 46 4.6 Variable n-gram . 47 4.6.1 Combining Coarse Features: the VNG++ Classifier . 49 4.6.2 Discussion . 50 4.7 BuFLO: Buffered Fixed-Length Obfuscator . 51 4.7.1 BuFLO Description . 52 4.7.2 Experiments . 53 4.7.3 Observations about BuFLO . 54 4.8 Concluding Discussion . 54 5 Overview: Internet Censorship and Censorship Circumvention 57 5.1 Randomization . 58 5.2 Mimicry . 59 5.3 Tunneling . 60 6 Censorship Circumvention with Format-Transforming Encryption 62 6.1 Modern DPI Systems . 64 6.2 Format-Transforming Encryption . 68 6.2.1 FTE via Encrypt-then-Unrank . 69 6.2.2 FTE Record Layer . 72 6.3 Protocol Misclassification . 75 6.3.1 DPI-Extracted Regular Expressions . 77 6.3.2 Manually-Generated Regular Expressions . 79 6.3.3 Automatically-Generated Regular Expressions . 80 6.4 Performance . 83 6.5 Censorship Circumvention . 87 6.6 Concluding Thoughts . 91 7 Marionette: A Unified Framework for Censorship Circumvention 92 7.1 Models and actions . 95 7.2 Templates and Template Grammars . 99 7.3 Proxy Architecture . 101 7.4 Implementation . 103 7.5 Record Layer . 104 7.6 Plugins . 106 7.7 The Marionette DSL . 107 7.8 Case Studies . 108 7.8.1 Regex-Based DPI . 110 7.8.2 Protocol-Compliance . 110 7.8.3 Proxy Traversal . 113 7.8.4 Traffic Analysis Resistance . 116 v 7.9 Conclusion . 118 8 Concluding Thoughts 122 References 124 Appendices 143 Appendix A BuFLO Countermeasure 143 Appendix B Experimental Results 144 Appendix C Algorithms for Ranking and Unranking a Regular Lan- guage 146 Appendix D Marionette POP3 Format 147 Appendix E Marionette FTP Format 148 vi List of Tables 1 Summary of attack results for Apple iMessage. 12 2 Language statistics for Tatoeba dataset compared to Battestini et al. text messaging study. 16 3 Confusion matrix for message type classification. 19 4 Confusion matrix for language classification. 22 5 Summary of attacks evaluated in our work. The Classifier column in- dicates the classifier used: naïve Bayes (NB), multinomial naïve Bayes (MNB) or support vector machine (SVM.) The Features Consid- ered column indicates the features used by the classifier. The k = 128 and k = 775 columns indicate the classifier accuracy for a privacy set of size k. ................................. 28 6 Bandwidth overhead of evaluated countermeasures calculated on Lib- eratore and Levine (LL) and Herrmann et al. (H) datasets. 37 7 The lowest average accuracy for each countermeasure class against LL, H, and P classifiers using the Hermann dataset. Random guessing yields 50% (k = 2) or 0.7% (k = 128) accuracy. 38 8 Statistics illustrating the presence of degenerate or erroneous traces in the Liberatore and Levine and Hermann datasets. 41 9 Accuracies (%) of P, P-NB, and VNG++ classifiers at k = 128. 48 10 A summary of blacklist strategies employed by six countries. Each citation references an empirical study that confirms that the blacklist strategy was employed for an extended period of time within the coun- try. A “-" indicates there is no published evidence to support that the blacklist strategy has been deployed in the specific country. 58 11 A summary of proposed censorship-circumvention solutions. Strategy describes the underlying strategy used for the system. Low latency indicates if the system can be used for tasks such as web browsing. Used by... listed which systems the solution has actually been de- ployed in. 58 12 Summary of evaluated DPI systems. Type indicates the kind of DPI engine used. Multi-stage pipelines chain together several passes over packet contents. Classifier complexity is the number of DFA states used for regular expressions or total lines non-whitespace/non-comment C/C++ code. 65 13 Misclassification rates for the twelve DPI-Extracted FTE formats against the six classifiers in our evaluation testbed. 78 14 Misclassification rates for the manually-generated and automatically- generated FTE formats against all six classifiers.