Representative, Reproducible, and Practical Benchmarking of File and Storage Systems
REPRESENTATIVE, REPRODUCIBLE, AND PRACTICAL BENCHMARKING OF FILE AND STORAGE SYSTEMS

by

Nitin Agrawal

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Computer Sciences) at the UNIVERSITY OF WISCONSIN–MADISON 2009

To my parents

ACKNOWLEDGMENTS

First and foremost I would like to thank my advisors, Andrea and Remzi; I couldn’t have asked for better. During our weekly meetings, more often than not, I went in with a laundry list of frenzied thoughts and messy graphs, and came out with a refined idea. Graciously, they made me feel as if I was the one responsible for this outcome; I wish I had been better prepared to get more out of our meetings.

Remzi showed me how someone could do serious research without taking oneself too seriously, though it will be a lasting challenge for me to come anywhere close to how effortlessly he does it. As an advisor, he genuinely cares for his students’ well-being. On many occasions my spirits got refreshed after getting an “are you OK?” email from him. Remzi not only provided a constant source of good advice ranging from conference talk tips to tennis racquet selection, but his ability to find “altitude” in my work when none existed transformed a series of naive ideas into a dissertation.

Most students would consider themselves lucky to have found one good advisor; I was privileged to have two. Andrea constantly raised the bar and challenged me to do better; the thesis is all the more thorough due to her persistence. I was always amazed at how she could understand in a few minutes the implications of my work far beyond my own understanding. Andrea’s meticulous advising was the driving force behind the completion and acceptance of much of my work.

I would like to thank my thesis committee members, Mike Swift, Karu Sankaralingam, and Mark Craven, for their useful suggestions and questions about my research during my defense talk.
I would especially like to thank Mike for his advice and interest in my work throughout, and for giving me an opportunity to teach his undergraduate operating systems class.

During my six years in Wisconsin, I have had a wonderful set of colleagues: Leo Arulraj, Lakshmi Bairavasundaram, John Bent, Nate Burnett, Tim Denehy, Haryadi Gunawi, Todd Jones, Shweta Krishnan, Andrew Krioukov, Joe Meehan, Florentina Popovici, Vijayan Prabhakaran, Abhishek Rajimwale, Meenali Rungta, Muthian Sivathanu, Sriram Subramanian, Swami Sundararaman, and Yupu Zhang; being the last of Remzi and Andrea’s pre-tenure hires, I perhaps overlapped with a larger-than-usual group of students, to my benefit. I will especially miss our late night chats in the hallway and lunch outings to the food carts. Muthian and Ina kindled my interest in the group and provided a patient ear to a curious (and clueless) first-year graduate student; I thank them for that. I also thank Leo for taking over some of the work on Compressions while I was busy writing my dissertation. My officemates John, Todd, Shweta, and Swami have all been awesome (even taking care of my plants during my absence!).

I have greatly benefited from my internships at Microsoft Research (Redmond and Silicon Valley) and IBM Almaden; my internship work at MSR Redmond in fact became the basis for my thesis itself. I thank my mentors and colleagues Bill Bolosky, John Douceur, and Jay Lorch at MSR Redmond, Ted Wobber at MSR Silicon Valley, and Ying Chen at IBM Almaden. I am grateful to John and Ted in particular for providing a terrific internship experience as mentors.

Gautham and Lakshmi, my roommates for five years, made my stay in Madison a most pleasant one. Their friendship and camaraderie provided comfort during times when I felt low, and a reason to celebrate even minor accomplishments. I will miss my discussions with Gau on all things not related to work. Lakshmi had the added burden of being both a friend and a colleague.
He mentored my research in more ways than he cares to remember. I thank Anoop for being part of our Madison family during the first two years. Having Akash as a roommate for the last year during a tough job market somehow made the process less stressful.

My stay in Madison wouldn’t have been the same without the company of wonderful friends: Ankit, Cindy, Jayashree, Neelam, Raman, Shweta, and Swami, to name a few. I constantly ignored Shweta and Neelam’s concern about my health and eating habits, but they continued to provide food and friendship during their years here. Evening coffee breaks with them became a daily stress buster, and our escapades during a summer internship in California will always be memorable. Cindy not only provided constant company in the department, but also pleasant musical interludes. Having Ankit live right next door made his apartment my “third place”; movie and dinner nights eventually gave way to more entertaining ones playing Wii. Mayur, and later on Priyank, played host to numerous east coast trips; knowing that they will be close to Princeton is a comforting thought. Deborshi and Siddhartha have been good friends throughout these years.

The company of my cousins in the US, Arvind, Ashish, Saurabh, and Shubha, provided me a feeling of home away from home. My sister Neha has always been the kind of loving sister who is happier about my accomplishments than I am myself. Talking to her on the phone has been my refuge from the monotony of work, and I have come to cherish the time that we get to spend together.

Finally, I would like to thank my parents for their unconditional love and support; this Ph.D. would not have been possible without them. They have always helped me see the good in everything and shaped my perspective for the better. Whenever I doubted my abilities, their encouragement put me right back on track. The pride in their eyes on seeing me graduate made it all seem worthwhile.
TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
ABSTRACT

1 Introduction
  1.1 Representative and Reproducible File-System Benchmarking State
    1.1.1 Characteristics of File-System Metadata
    1.1.2 Generating File-System Benchmarking State
  1.2 Practical Benchmarking for Large, Real Workloads
  1.3 Representative, Reproducible and Practical Benchmark Workloads
  1.4 Overview

2 Five Year Study of File System Metadata
  2.1 Methodology
    2.1.1 Data collection
    2.1.2 Data properties
    2.1.3 Data presentation
    2.1.4 Limitations
  2.2 Files
    2.2.1 File count per file system
    2.2.2 File size
    2.2.3 File age
    2.2.4 File-name extensions
    2.2.5 Unwritten files
  2.3 Directories
    2.3.1 Directory count per file system
    2.3.2 Directory size
    2.3.3 Special directories
    2.3.4 Namespace tree depth
    2.3.5 Namespace depth model
  2.4 Space Usage
    2.4.1 Capacity and usage
    2.4.2 Changes in usage
  2.5 Related Work
  2.6 Discussion
  2.7 Conclusion

3 Generating Realistic Impressions for File-System Benchmarking
  3.1 Extended Motivation
    3.1.1 Does File-System Structure Matter?
    3.1.2 Goals for Generating FS Images
    3.1.3 Existing Approaches
  3.2 The Impressions Framework
    3.2.1 Modes of Operation
    3.2.2 Basic Techniques
    3.2.3 Creating Valid Metadata
    3.2.4 Resolving Arbitrary Constraints
    3.2.5 Interpolation and Extrapolation
    3.2.6 File Content
    3.2.7 Disk Layout and Fragmentation
    3.2.8 Performance
  3.3 Case Study: Desktop Search
    3.3.1 Representative Images
    3.3.2 Reproducible Images
  3.4 Other Applications
    3.4.1 Generating Realistic Rules of Thumb
    3.4.2 Testing Hypothesis
  3.5 Related Work
    3.5.1 Tools for Generating File-System Images
    3.5.2 Tools and Techniques for Improving Benchmarking
    3.5.3 Models for File-System Metadata
  3.6 Conclusion

4 Practical Storage System Benchmarking with Compressions
  4.1 Background
    4.1.1 Storage Stack
    4.1.2 Storage Systems
  4.2 The Compressions Framework
    4.2.1 Design Goals
    4.2.2 Basic Architecture
    4.2.3 Block Classification and Data Squashing
    4.2.4 Metadata Remapping
    4.2.5 Synthetic Content Generation
  4.3 The Storage Model
    4.3.1 Model Expectations and Goals
    4.3.2 The Device Model
    4.3.3 Storage Stack Model
  4.4 Evaluation
    4.4.1 Experimental Platform
    4.4.2 Performance Evaluation
    4.4.3 Fidelity Evaluation
  4.5 Related Work
    4.5.1 Storage System Modeling and Emulation