Szekeres Washington 0250E 2
Total Page:16
File Type:pdf, Size:1020Kb
DESIGNING STORAGE AND PRIVACY-PRESERVING SYSTEMSFOR LARGE-SCALE CLOUD APPLICATIONS by Adriana SZEKERES A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy University of Washington 2020 Reading committe: Henry M. Levy, Co-Chair Arvind Krishnamurthy, Co-Chair Dan R. K. Ports Program Authorized to Offer Degree: Computer Science & Engineering ©Copyright 2020 Adriana SZEKERES University of Washington Abstract Designing Storage and Privacy-Preserving Systems for Large-Scale Cloud Applications Adriana Szekeres Chair of Supervisory Committee: Henry M. Levy, Arvind Krishnamurthy Computer Science & Engineering Today’s mobile devices sense, collect, and store huge amounts of personal information, which users share with family and friends through a wide range of cloud applications. Many of these applications are large-scale – they must support millions of users from all over the world. These applications must manage a massive amount of user-generated content, they must ensure that this content is always available, and they must protect users’ privacy. To deal with the complexity of managing a large amount of data and ensure its availability, large-scale cloud applications rely on transactional, fault-tolerant, distributed storage systems. Latest hardware advancements and new design trends, such as in-memory storage, kernel-bypass message processing, and geo-distribution, have significantly improved the transaction processing times. However, they have also exposed overheads in the protocols used to implement these systems, which hinder their performance; specifically, existing protocols require unnecessary cross-datacenter and cross-processor coordination. This thesis introduces Meerkat, a new replicated storage system that avoids all cross-core and cross-replica coordination for transactions that do not conflict. Moreover, replica recovery protocols that do not require persistent storage devices, are proposed; these protocols can be easily integrated with existing storage systems. Almost all applications offer their users a choice of privacy policies; unfortunately, they frequently violate these policies due to bugs or other reasons. Therefore, this thesis introduces SAFE, a privacy-preserving system that automatically enforces users’ privacy policies on untrusted mobile-cloud applications. Our results demonstrate that SAFE is a practical way for users to create a chain of trust from their mobile devices to the cloud. ACKNOWLEDGEMENTS I would have not been able to write this thesis without the invaluable help I received along the way from many people who guided me, provided lots of feedback and insights, and have been my role models in the pursuit of becoming a good researcher. First of all, I am extremely fortunate that I got to spend most of my PhD time having discussions and brainstorming sessions with Irene Zhang, Jialin Li, Naveen Sharma, and Dan Ports. We all started our programs at UW at the same time and, soon after, they became my collaborators, teachers, mentors, advisors, and close friends. Thank you for your invaluable help with my research, for your numerous insights that brought clarity to my often chaotic thoughts, for sharing your knowledge and wisdom, for your support, and for all the fun activities we shared – all of these kept me motivated and excited about research. I am extremely grateful to my two PhD advisors, Hank Levy and Arvind Krishna- murthy, who gave me guidance and enough space to discover the topics in systems re- search I am most passionate about. Thank you for your constant encouragement, pa- tience, feedback, and contagious optimism – this helped both to improve my work and balance my PhD life. A huge thank you to my amazing collaborators from whom I learned a lot and whithout whom this thesis would have not been possible: Irene Zhang, Jialin Li, Naveen Sharma, Dan Ports, Ellis Michael, Michael Whittaker, Arvind Krishnamurthy, Hank Levy, Katelin Bailey, Isaac Ackerman, Haichen Shen, and Franzi Roesner. I am espe- cially grateful to Irene and Dan, who significantly helped with navigating through the hard research problems and with the writing for the papers from which this thesis de- rives, to Naveen and Jialin, who helped me debug countless performance issues, to El- lis, who significantly contributed to our work on diskless recovery, and to Michael, who significantly contributed to our Meerkat work. A big thank you to many other faculty and students at UW who enriched my PhD ex- perience: Antoine Kaufmann, Pedro Fonseca, Danyang Zhuo, Yuchen Jin, Niel Lebeck, Helgi Sigurbjarnarson, Kaiyuan Zhang, Tom Anderson, Pete Hornyack, Yoshi Kohno, Eunsol Choi, Camille Cobb, Victoria Lin, Brandon Holt, Lilian de Greef, Deepali Aneja, Simon Peter, Anna Kornfeld Simpson, Helga Gudmundsdottir, Liang Luo, Pratyush v vi Patel, Lequn Chen, Elizabeth Clark, Karl Koscher, Amrita Mazumdar, Seungyeop Han, Qiao Zhang, Ed Lazowska, Jacob Nelson, Angli Liu , Raymond Cheng, Henry Schuh, Xi Wang, Emina Torlak, Arun Byravan, Jialin Li, Ashlie Martinez, Samantha Miller, Tianyi Cui, Ming Liu, and Kevin Zhao. I am especially grateful to Xi, always a friendly pres- ence, providing useful feedback and advice, and to Antoine, Arun, Kaiyuan, Yuchen, Danyang, Niel, Pedro, Anna, Liang, for the often long and cheerful discussions. I am extremely grateful to Erik van der Kouwe, Andy Tanenbaum, and Nicolae T, ˘apus, for sharing their knowlege and encouraging me to take the PhD path. I am really grateful to all UW staff that took care of the administrative issues, so that I could focus solely on my PhD, especially Lindsay Michimoto, Melody Kadenko, Dali Grubisa, Elise deGoede Dorough, and Lisa Merlin. I would like to thank the committee members Hank Levy, Arvind Krishnamurthy, Radha Poovendran, Dan Ports, Xi Wang, whose feedback substantially improved the clarity of this thesis. Finally, I would like to thank my family for their endless support and encouragement. To my mother, Antonica, my father, Adrian, and my sister, Anca. CONTENTS Acknowledgements v 1 Introduction 1 1.1 Storage systems for large-scale cloud applications . 2 1.1.1 Desired design . 3 1.1.2 New hardware advancements and design trends. 4 1.1.3 Challenges . 6 1.2 Privacy-preserving systems for untrusted large-scale cloud applications . 8 1.2.1 Challenges . 8 1.3 Contributions . 9 1.3.1 Fast and multicore scalable replicated transactions: Meerkat . 10 1.3.2 Fast recovery for in-memory replicated storage: Diskless Recovery . 10 1.3.3 Confidentiality system for untrusted cloud applications: SAFE . 11 1.4 Outline . 11 2 Background 13 2.1 Distributed storage systems. 13 2.1.1 Concurrency control . 14 2.1.2 Atomic commitment. 15 2.1.3 Replication. 16 2.1.4 Multicore support . 18 2.1.5 Over-coordination in existing solutions . 18 2.2 Privacy-preserving systems . 20 2.2.1 Information flow control . 20 2.3 Concluding remarks . 21 3 Meerkat 23 3.1 The Zero Coordination Principle . 25 3.2 Meerkat Approach . 27 3.3 Meerkat Overview. 28 3.3.1 System Model and Architecture . 28 ix x CONTENTS 3.3.2 Meerkat Data Structures . 29 3.4 Meerkat Transaction Protocol . 30 3.4.1 Protocol Overview . 30 3.4.2 Meerkat Transaction Processing . 31 3.4.3 Meerkat Failure Handling and Record Truncation . 36 3.4.4 Correctness . 51 3.5 Evaluation . 52 3.5.1 Prototype Implementation. 52 3.5.2 Experimental Setup . 54 3.5.3 Impact of ZCP on YCSB-T Benchmark . 56 3.5.4 Impact of ZCP on Retwis Benchmark. 57 3.5.5 ZCP and Contention . 58 3.6 Related Work . 59 3.7 Summary . 61 4 Diskless Recovery 63 4.1 Background and Related Work . 65 4.2 System Model . 67 4.3 Achieving Crash-Consistent Quorums . 69 4.3.1 Unstable Quorums: Intuition . 70 4.3.2 Crash-Consistent Quorums . 71 4.3.3 Communication Primitives in DCR . 71 4.3.4 Correctness . 74 4.4 Recoverable Shared Objects in DCR. 78 4.4.1 Multi-writer, Multi-reader Atomic Register. 78 4.4.2 Virtual Stable Storage . 79 4.5 Recoverable Replicated State Machines in DCR . 80 4.5.1 Viewstamped Replication . 81 4.5.2 Paxos Made Live . 84 4.5.3 JPaxos . 86 4.6 Evaluation . 88 4.7 Summary . 90 5 A practical privacy-preserving system 91 5.1 The SAFE Framework . 93 5.1.1 SAFE System Model and Concepts . 93 5.1.2 SAFE Policies . 96 CONTENTS xi 5.1.3 Trust and Threat Model . 97 5.2 Agate, a SAFE Mobile OS. 98 5.2.1 Agate Architecture . 98 5.2.2 Agate Policies . 99 5.2.3 Agate Syscall Interface . 99 5.2.4 Agate User Interface . 100 5.2.5 Agate SAFE Enforcement . 100 5.3 Magma, a SAFE Runtime System . 101 5.3.1 Magma Architecture . 101 5.3.2 Magma IFC Model . 102 5.3.3 Magma Flow Tracking . 103 5.3.4 Magma SAFE Enforcement . 104 5.4 Geode, a SAFE Storage Proxy. 105 5.4.1 Geode Interface and Guarantees . 105 5.4.2 Geode Architecture . 105 5.4.3 Geode SAFE Enforcement . 106 5.5 Evaluation . 107 5.5.1 Implementation . 107 5.5.2 Security Analysis . 108 5.5.3 User Experience . 110 5.5.4 Programmability and Porting Experience . 111 5.5.5 Performance . 112 5.5.6 Application Performance . 113 5.6 Related Work . 115 5.7 Summary . 116 6 Conclusions and Future Research Directions 119 References . 123 1 INTRODUCTION VER the last decades, cloud computing has emerged to allow businesses to easily O develop and deploy cloud applications to make their services available to users all over the world. Today’s cloud giants – Google, Microsoft, Amazon, Facebook, VMWare – provide the necessary hardware and software resources to manage a cloud, which nor- mally spans multiple datacenters, and expose it to businesses under the form of services, such as Software-as-a-Service (SaaS), Infrastructure-as-a-Service (IaaS) and Platform- as-a-Service (PaaS).