Consumer Distributed File Systems
Total Page:16
File Type:pdf, Size:1020Kb
Consumer Distributed File Systems by Daniel N. Peek A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Computer Science and Engineering) in The University of Michigan 2009 Doctoral Committee: Associate Professor Jason N. Flinn, Chair Professor Farnam Jahanian Associate Professor Brian D. Noble Assistant Professor Mark W. Newman c Daniel N. Peek 2009 All Rights Reserved For family and friends. ii ACKNOWLEDGEMENTS This work is the result of the efforts of a group of people. The foremost of these is my advisor, Professor Jason Flinn. I have had the privilege of being Jason’s fourth graduate student and the last who started at the University of Michigan when he did. I have been the beneficiary of his enormous patience, experience, insights, and indulgence. I couldn’t have asked for a better mentor. I would like to thank my committee members: Professors Brian Noble, Farnam Jahanian, and Mark Newman. Their advice and insight improved this work. I am also grateful to the members of my research group: Manish Anand, Ed Nightingale, Ya-Yunn Su, Kaushik Veeraraghavan, Mona Attariyan, Brett Higgins, Puspesh Kumar, and Michelle Noronha. They always provided advice, support, prob- lems, and ideas. I would also like to thank my officemates and housemates: Benjamin Wester, Arnab Nandi, David Watson, Evan Cooke, Jon Oberheide, Sushant Sinha, Kaushtubh Nyalkalkar, Amy Kao, Darshan Karwat, and Ali Besharatian, who made sure there was always something interesting going on. Thanks also to my internship mentor and collaborators: Doug Terry, Tom Rode- heffer, Ted Wobber, Rama Ramasubramanian, and Meg Walraed-Sullivan. I would like to acknowledge the dozens of program committee members and anony- mous reviewers who have endured my papers over the years. Their efforts make the software systems community possible. Finally, I would like to thank my family. They have been very supportive of this academic pursuit over the years it took to complete. iii TABLE OF CONTENTS DEDICATION .................................. ii ACKNOWLEDGEMENTS .......................... iii LIST OF FIGURES ............................... vii CHAPTERS 1 Introduction ............................... 1 1.1 Principles............................. 2 1.2 Distributed file systems for consumers . 3 1.3 Adding type-awareness . 4 1.4 Connecting CE devices . 5 1.5 Theroadahead ......................... 6 2 BackgroundandMotivation. 7 2.1 Integrating consumer electronics devices . 7 2.2 Ensembles ............................ 10 2.3 Persistentqueries ........................ 10 2.4 Sprockets............................. 11 2.5 TrapperKeeper.......................... 12 3 PersistentQueries ............................ 16 3.1 Designconsiderations . 16 3.2 Implementation ......................... 19 3.3 Evaluation ............................ 20 3.4 Examples............................. 22 3.5 Conclusion ............................ 24 4 Sprockets................................. 25 4.1 Designgoals ........................... 26 4.1.1 Safety........................... 26 4.1.2 Easeofimplementation . 27 4.1.3 Performance ....................... 28 4.2 Alternativedesigns .. .. 28 iv 4.2.1 Directprocedurecall . 29 4.2.2 Address space sandboxing . 29 4.2.3 Checkpointandrollback . 30 4.3 Sprocket design and implementation . 33 4.3.1 Adding instrumentation . 33 4.3.2 Sprocketinterface . 35 4.3.3 Handling buggy sprockets . 35 4.3.4 Support for multithreaded applications . 38 4.4 Sprocketuses........................... 39 4.4.1 Transducers ....................... 39 4.4.2 Application-specific resolution . 41 4.5 Evaluation ............................ 43 4.5.1 Methodology....................... 43 4.5.2 Radioheadtransducer . 44 4.5.3 Application-specific conflict resolution . 45 4.5.4 Optimizations ...................... 47 4.5.5 Whengoodsprocketsgobad . 49 4.6 Conclusion ............................ 50 5 TrapperKeeper.............................. 52 5.1 Designgoals ........................... 53 5.1.1 Minimize work per file type . 53 5.1.2 Nocodeperfiletype .................. 54 5.1.3 Isolation ......................... 55 5.2 Implementation ......................... 56 5.2.1 Trapper: Capturing application behavior . 56 5.2.2 Keeper: Running application parsers . 58 5.2.3 Grabber: Capturingtheinterface . 59 5.2.4 Discussion ........................ 62 5.3 Features ............................. 63 5.3.1 Metadataextraction. 64 5.3.2 Documentpreview. 65 5.4 Evaluation ............................ 66 5.4.1 Methodology....................... 66 5.4.2 Trapperperformance . 67 5.4.3 Executingfeatures. 68 5.4.4 DetectingstableGUIstates. 71 5.4.5 Experiences with TrapperKeeper . 71 5.4.6 Thelongtail....................... 73 5.5 Conclusion ............................ 75 6 Consumer Electronics Device Integration . 76 6.1 Making CE devices self-describing . 77 6.2 Supporting namespace diversity . 77 6.3 Reintegrating changes from CE devices . 78 v 6.4 PropagatingupdatestoCEdevices . 80 6.5 Evaluation ............................ 81 6.5.1 Case study: Integrating a digital camera . 81 6.5.2 Case study: Integrating an MP3 player . 82 7 Ensembles ................................ 84 7.1 Designconsiderations . 84 7.2 Implementation ......................... 86 7.2.1 Trackingmodifications . 87 7.2.2 Joininganensemble . 88 7.2.3 Ensembleoperation . 89 7.2.4 Leaving an ensemble . 90 7.3 Evaluation ............................ 91 7.3.1 Ensembleformation . 91 7.4 Conclusion ............................ 93 8 RelatedWork............................... 94 8.1 Persistent queries related work . 94 8.2 Sprocketsrelatedwork . 95 8.3 TrapperKeeperrelatedwork . 97 8.4 CEdeviceintegrationrelatedwork. 99 8.5 Ensemblesrelatedwork . 99 9 Conclusion ................................ 101 BIBLIOGRAPHY ................................ 103 vi LIST OF FIGURES Figure 3.1 Persistentqueryinterface. 18 3.2 Overheadofpersistentqueries . 21 3.3 Timetocreateapersistentquery . 22 4.1 Exampleofsprocketinterface . 36 4.2 Performance of the Radiohead transducer . ... 44 4.3 Application-specific conflict resolution . ...... 46 4.4 Optimizationperformance . 48 4.5 Resultsofbuggysprockets . 49 5.1 Tiers of increasing specificity . .. 53 5.2 TrapperKeeperoverview . 57 5.3 Accessibility API view of a window . 61 5.4 Trapperperformance ........................... 67 5.5 Keeperperformance............................ 68 5.6 Difference in GUI states for different files . ... 70 5.7 Fractionoffilesleftuncovered . 73 5.8 Coverage of most popular file name extensions . ... 74 7.1 Ensembleformationtime. 91 7.2 Time to reconcile updates in an ensemble . .. 92 vii CHAPTER 1 Introduction Consumers have long stored their data on local file systems on personal computers. However, these file systems are no longer sufficient to manage consumer data due to the emergence of consumer electronics (CE) devices such as cell phones and digital cameras and the rapid increase of multimedia data. CE devices are a problem for local file systems because they store user data outside the purview of the local file system on heterogeneous devices. As a result, intervention is required to synchronize the state of the CE devices and a local file system. For instance, pictures can be downloaded from a digital camera or iPod contents can be synchronized by using iTunes. However, these management tasks burden the user. They require the user to remember which data needs to be transferred and to learn to operate new application interfaces. Our insight into simplifying this situation is that although each instance of the stored data is physically distinct, conceptually, all instances of a stored data item are a single logical entity. Distributed file systems embody a similar realization for general- purpose computers, presenting an interface in which all computers can access all data as if they were all connected to a single large shared disk and maintaining this illusion by automatically propagating data. This eases data management issues by allowing users to operate on a single conceptual data item instead of the dispersed physical instances of that data. By creating a distributed file system capable of including both 1 general-purpose computers and CE devices as clients, we can eliminate the chore of manual synchronization. We have also experienced an immense expansion of multimedia data [2, 16]. Stud- ies [61] have shown that higher level management techniques are needed to help man- age the burgeoning data. An example of this new management is the ability to list files based not on their position within a hierarchy of file system directories, but based on other properties of files such as the quality of a JPEG image file. These properties, called metadata, are typically encoded into file data in a manner governed by their file type. This arrangement is opaque to current file systems, preventing them from ac- cessing and this metadata and consequently, from providing type-aware functionality such as metadata indexing, automatic conflict resolution, and generating document previews. Our response is to build a new mechanism that allows the file system to be ex- tended with code that provides type-aware features. We have also developed a tech- nique that can provide many type-aware functions and does not require the creation of code for each file type. These two problems, the lack of CE device support and type-awareness, are an obstacle to consumer storage use. This work demonstrates techniques to address these difficulties. Thus, our thesis is that the inclusion of CE devices