World Libraries: Towards Efficiently Sharing Large Data Volumes in Open Untrusted Environments While Preserving Privacy
Dissertation approved by the Faculty of Mathematics, Computer Science and Natural Sciences of RWTH Aachen University for the award of the academic degree of Doctor of Natural Sciences

submitted by Diplom-Informatiker Stefan Richter from Berlin

Reviewers: Universitätsprofessor Dr. Peter Rossmanith, Universitätsprofessor Dr. Dr. h.c. Otto Spaniol

Date of the oral examination: 12 November 2009

This dissertation is available online on the website of the university library.

Summary

Public libraries have long served a function of inestimable value. In the age of digital media, however, traditional methods of disseminating information are increasingly losing importance. This thesis asks whether and how information technology can be used to find functional alternatives that meet the needs of society, needs which that very information technology has changed. To this end, we introduce the new concept of world libraries: massively distributed systems that are suited to the efficient dissemination of large data volumes in open, insecure environments while protecting the privacy of their users. We first distinguish our definition from similar concepts and motivate its utility with philosophical, economic, and legal arguments. We then examine existing anonymous file sharing systems to determine how far they are suitable as implementations of world libraries. In doing so, we isolate reasons for the comparatively low success of this group of programs and identify principles and structures that can be applied in world libraries. One such structure of paramount importance is the so-called distributed hash table (DHT), whose security properties we therefore examine more closely, above all with regard to the protection of privacy. Using analytical models and simulation, we expose weaknesses in previously known systems and finally present the first practical DHT with scalable security properties against active attackers. The fourth chapter of the thesis is devoted to general bounds on anonymity. Anonymity techniques are important means of securing privacy. We investigate novel attacks on such techniques that are applicable under very general conditions, and find analytical lower bounds on the success probability of such attacks as a function of the number of adversarial observations. This limits the applicability of anonymity techniques in world libraries and other privacy enhancing technologies. Finally, we summarize our findings and their implications for the design of world libraries.

Abstract

Public libraries have served an invaluable function for a long time. Yet in an age of digital media, traditional methods of publishing and distributing information are becoming increasingly marginalized. This thesis poses the question of whether and how information technology can be employed to find functional alternatives that conform to the needs of society, needs that have been changed by information technology itself. To this end we introduce the concept of world libraries: massively distributed systems that allow for the efficient sharing of large data volumes in open, untrusted environments while preserving privacy. To begin with, we discuss how our definition relates to similar but different notions, then defend its utility with philosophical, economic, and legal arguments.
We go on to examine anonymous file sharing systems with regard to their applicability for world libraries. We identify reasons why these systems have failed to achieve popularity so far, as well as promising principles and structures for successful world library implementations. Of these, distributed hash tables (DHTs) turn out to be particularly important, which is why we study their security and privacy properties in detail. Using both analytical models and simulation, we show weaknesses in earlier approaches, leading us to the discovery of the first practical DHT that scalably defends against active adversaries. The fourth chapter is dedicated to general bounds on anonymity. Anonymity techniques are important tools for the preservation of privacy. We present novel attacks that endanger the anonymity provided by these tools under very general circumstances, giving analytical lower bounds on the success probability of such attacks as a function of the number of adversarial observations. These results limit the applicability of anonymity techniques in world library implementations and other privacy enhancing technologies. Finally, we summarize our findings and their impact on world library design.

Acknowledgements

I am deeply grateful for the opportunity to conceive, prepare, and finish this thesis. This feeling is not limited to a specific person or group of people, and were I to personally thank everyone who has wished me well or somehow contributed to the success of this enterprise, this thesis would easily double its volume. Still, the following people have played an outstanding part in this process, which is why I want to say thank you to them in particular: First of all, my adviser Prof. Dr. Peter Rossmanith for giving me the chance to find and pursue my own interests, even when they went rather far away from canonical theoretical computer science. My co-adviser Prof. Dr. Dr. h.c. Otto Spaniol for agreeing to report on such an unusual thesis.
My colleagues Joachim Kneis and Alexander Langer, who, although they had a hard time understanding what this was all about, freed me from the hard labor of teaching when it was most needed. My coauthors Daniel Mölle, Dogan Kesdogan, Peter Rossmanith, Andriy Panchenko, and Arne Rache, without whom most of the research that went into this thesis would not have been possible. Lexi Pimenidis for introducing me to the field of security research. Hossein Shafagh, who supported me greatly in the practical aspects of understanding the OFF System better. Bob Way, Philipp Claves, Spectral Morning, and the rest of the Owner-Free community, for showing me a fresh perspective and for hands-on help with OFF. Johanna Houben for encouraging me to finish this thesis when I was in doubt. My parents for patiently trusting that what I did was right for me, and my entire family for their ongoing support in more ways than I could thank them for. Finally, I would like to especially thank the following people (again) for proofreading and helpful comments on draft versions of this document: Daniel Mölle, Philipp Claves, Bertolt Meier, Bob Way, Diego Biurrun, Christoph Richter, my parents, Peter Rossmanith, and Verena Simon.

Contents

1 World Libraries
  1.1 Introduction and Overview
  1.2 Definition and Related Concepts
    1.2.1 Digital Libraries
    1.2.2 Tor and Other Low-Latency Anonymity Networks
    1.2.3 Censorship Resistant Publishing, Anonymous Storage, and Freenet
    1.2.4 Private Information Retrieval and Oblivious Transfer
    1.2.5 Anonymous File Sharing
  1.3 Motivation, Social Effects, and the Law
    1.3.1 Privacy
    1.3.2 Possible Objections
2 Existing Systems/Case Studies
  2.1 Ant Routing Networks
  2.2 Other Anonymous File Sharing Networks
  2.3 The Owner-Free File System
    2.3.1 Introduction to the OFF System
    2.3.2 Attacks on OFF
    2.3.3 Scaling OFF
    2.3.4 Conclusion
3 Privacy and Security Aspects of Distributed Hash Table Design
  3.1 Introduction
    3.1.1 DHT Model
    3.1.2 Attacker Model
  3.2 Related Work
  3.3 Active Attacks
    3.3.1 Redundancy through Independent Paths Does Not Scale
    3.3.2 Redesigning Redundancy: Aggregated Greedy Search
    3.3.3 Hiding the Search Value
    3.3.4 Bounds Checking in Finger Tables
  3.4 Passive Attacks
    3.4.1 Information Leakage
    3.4.2 Increasing Attacker Uncertainty
    3.4.3 Random Walks in Network Information Systems
  3.5 Further Observations
    3.5.1 Bootstrapping Process
    3.5.2 Arbitrary Positioning of Malicious Nodes
    3.5.3 Path Length in Aggregated Greedy Search
  3.6 Conclusion
4 Learning Unique Hitting Sets: Bounds on Anonymity
  4.1 Introduction
    4.1.1 Anonymity Techniques
    4.1.2 Related Work and Contributions
  4.2 The Uniqueness of Hitting Sets
  4.3 A Simple Learning Algorithm
  4.4 An Advanced Learning Algorithm
  4.5 Conclusion
5 Conclusion
  5.1 Summary and Contributions
  5.2 Requirements, Issues, Principles, and Ideas for World Libraries
  5.3 Outlook and Future Work

List of Figures

2.1 Number of uses found for blocks in the live OFF network
2.2 Interaction frequency per node histogram in the live OFF system
2.3 Interaction frequency per node histogram in the live OFF system when restricted to accepted block pushes
2.4 Block frequency histogram for PlanetLab OFF system on March 26, 2009 at 17:42:59: 379 nodes after storing and dispersing 684 MB of data
2.5 Block frequency histogram for PlanetLab OFF system on March 26, 2009 at 19:04:11: