Practical Private Information Retrieval
Total Page:16
File Type:pdf, Size:1020Kb
Practical Private Information Retrieval by Femi George Olumofin A thesis presented to the University of Waterloo in fulfillment of the thesis requirement for the degree of Doctor of Philosophy in Computer Science Waterloo, Ontario, Canada, 2011 c Femi George Olumofin 2011 I hereby declare that I am the sole author of this thesis. This is a true copy of the thesis, including any required final revisions, as accepted by my examiners. I understand that my thesis may be made electronically available to the public. ii Abstract In recent years, the subject of online privacy has been attracting much interest, espe- cially as more Internet users than ever are beginning to care about the privacy of their online activities. Privacy concerns are even prompting legislators in some countries to de- mand from service providers a more privacy-friendly Internet experience for their citizens. These are welcomed developments and in stark contrast to the practice of Internet censor- ship and surveillance that legislators in some nations have been known to promote. The development of Internet systems that are able to protect user privacy requires private in- formation retrieval (PIR) schemes that are practical, because no other efficient techniques exist for preserving the confidentiality of the retrieval requests and responses of a user from an Internet system holding unencrypted data. This thesis studies how PIR schemes can be made more relevant and practical for the development of systems that are protective of users' privacy. Private information retrieval schemes are cryptographic constructions for retrieving data from a database, without the database (or database administrator) being able to learn any information about the content of the query. PIR can be applied to preserve the confidentiality of queries to online data sources in many domains, such as online patents, real-time stock quotes, Internet domain names, location-based services, online behavioural profiling and advertising, search engines, and so on. Typically, the database consists of r blocks, and the user query is an index i between 1 and r. The client first encodes the index i into a PIR query, and then forwards it to the database. The database subsequently performs some computations linear in r, and returns the result to the client. The client finally decodes the database response to obtain the block at index i. The main parameters of interest are the number of bits communicated in the interaction between the client and the database, and the amount of computation | usually server computation. In this thesis, we study private information retrieval and obtain results that seek to make PIR more relevant in practice than all previous treatments of the subject in the literature, which have been mostly theoretical. We also show that PIR is the most computationally efficient known technique for providing access privacy under realistic computation powers and network bandwidths. Our result covers all currently known varieties of PIR schemes. We provide a more detailed summary of our contributions below: • Our first result addresses an existing question regarding the computational practi- cality of private information retrieval schemes. We show that, unlike previously ar- gued, recent lattice-based computational PIR schemes and multi-server information- theoretic PIR schemes are much more computationally efficient than a trivial transfer iii of the entire PIR database from the server to the client (i.e., trivial download). Our result shows the end-to-end response times of these schemes are one to three orders of magnitude (10{1000 times) smaller than the trivial download of the database for realistic computation powers and network bandwidths. This result extends and clar- ifies the well-known result of Sion and Carbunar on the computational practicality of PIR. • Our second result is a novel approach for preserving the privacy of sensitive constants in an SQL query, which improves substantially upon the earlier work. Specifically, we provide an expressive data access model of SQL atop of the existing rudimentary index- and keyword-based data access models of PIR. The expressive SQL-based model developed results in between 7 and 480 times improvement in query throughput than previous work. • We then provide a PIR-based approach for preserving access privacy over large databases. Unlike previously published access privacy approaches, we explore new ideas about privacy-preserving constraint-based query transformations, offline data classification, and privacy-preserving queries to index structures much smaller than the databases. This work addresses an important open problem about how real systems can systematically apply existing PIR schemes for querying large databases. • In terms of applications, we apply PIR to solve user privacy problem in the domains of patent database query and location-based services, user and database privacy problems in the domain of the online sales of digital goods, and a scalability problem for the Tor anonymous communication network. • We develop practical tools for most of our techniques, which can be useful for adding PIR support to existing and new Internet system designs. Supervisor: Ian Goldberg Examiners: Urs Hengartner, Doug Stinson, Alfred Menezes, and Ahmad-Reza Sadeghi iv Acknowledgements This thesis is a product of three years of work under the tutelage of my supervisor Ian Goldberg. He graciously accepted me as a transfer doctoral student from the University of Manitoba, even when I knew nothing about private information retrieval. Soon after my arrival to Waterloo, he exposed me to relevant literature in the areas of security, privacy, and applied cryptography, which eventually led to my interest in PIR. He listened to all my ideas and guided me in the pursuit of those that form the bulk of this thesis. An Oakland PIR paper [Gol07a] and its Percy++ open source implementation that Ian authored are the foundational work for this thesis. I count it a rare privilege to have worked with this brilliant, vast, highly motivated, kind, and skillful supervisor. Ian bought out my Teaching Assistantship duties to make more time available for my research, provided research assistantship support the moment my major scholarships got all used up, and gave many opportunities for professional development and networking via conference travels. For all these and many more, I am grateful. I would like to thank the other members of my thesis examining committee for their great feedback amongst others. Thanks to Urs Hengartner for our work on location pri- vacy, Doug Stinson for connecting me with my first community service opportunity, Alfred Menezes for attending and providing feedback to all my PhD Seminar talks, and Ahmad- Reza Sadeghi for coming all the way from Germany to sit as my external. I have also enjoyed collaborations from some of the brightest people from within and outside Waterloo. In particular, I am grateful to Peter Tysowski, Ryan Henry, Prateek Mittal, Carmela Troncoso, and Nikita Borisov for fruitful collaborations. I have enjoyed encouragement and/or great feedback on draft papers and talks from a number of indi- viduals that has made this thesis much better. They include current CrySP lab students and postdocs Mashael Alsabah, Andr´eCarrington, Tariq Elahi, Aleksander Essex, Kevin Henry, Ryan Henry, Atif Khan, Mehrdad Nojoumian, Sarah Pidcock, Rob Smits, Colleen Swanson, Jalaj Upadhyay, Andy Huang, Kevin Bauer, and Sherman Chow; former CrySP students Jeremy Clark, Greg Zaverucha, Qi Xie, Can Tang, Aniket Kate, Wanying Luo, Chris Alexander, Jiang Wu, Atefeh Mashatan, and Joel Reardon; friends and colleagues from the School of Computer Science John Akinyemi, Carol Fung, Jeff Pound, and Dave Loker; people from outside the University Jean-Charles Gr´egoire,and Ang`eleHamel, con- ference shepherd Meredith Patterson; and the anonymous reviewers of our papers. I am blessed to have an understanding wife, Favour, who has stuck with me all through my years of graduate school. Thank you for your support, care, patience, and faith in me. I thank my dad, Folorunso Olumofin, for always encouraging me to press forward in my academic pursuits. I am grateful for the prayers and support of my mom, Esther v Olumofin, and my mother-in-law, Rachel Sani. Words cannot express my appreciation to my brother, Olabode Olumofin, for his surprise support gestures and my other siblings for their goodwill. I am grateful to Steve and Beth Fleming and the entire body of Koinonia Christian Fellowship, for providing me with a safe place for worship, fellowship, service, and growth. I am also thankful for my spiritual dad, David Oyedepo, for being a blessing and an inspiration over the years. The funding for this research was provided by the Natural Sciences and Engineering Research Council of Canada (NSERC) through a NSERC Postgraduate Scholarship, the University of Waterloo through a President's Graduate Scholarship, the Mprime research network (formerly MITACS) through Research Assistantships, Research In Motion (RIM) through a RIM Graduate Scholarship, and The Tor Project, Inc. vi Dedication This dissertation is dedicated to Favour, Grace, Faith, and Joy. vii Table of Contents List of Tables xiv List of Figures xv List of Queries xvii 1 Introduction 1 1.1 Preamble . 1 1.2 Open Problems . 4 1.3 Contributions . 5 1.4 Thesis Organization . 10 2 Preliminaries 12 2.1 Motivation . 12 2.2 Applications . 13 2.3 Single-server PIR Schemes . 15 2.4 Multi-server PIR Schemes . 16 2.4.1 Chor et al. Scheme [CGKS95] . 16 2.4.2 Goldberg Scheme [Gol07a] . 17 2.5 Coprocessor-Assisted PIR Schemes . 18 2.6 Relationship to other Cryptographic Primitives . 19 2.7 Relation to Recent PhD Theses . 21 viii 3 Related Work 23 3.1 Multi-server and Single-server PIR . 23 3.2 Computational Practicality of PIR . 24 3.3 Data Access Models for PIR . 27 3.4 Large Database Access Privacy and Tradeoffs . 29 3.5 Application Areas for PIR .