Generating and Analyzing Synthetic Workloads using Iterative Distillation

A Thesis Presented to The Academic Faculty

by Zachary A. Kurmas

In Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy

College of Computing
Georgia Institute of Technology
May 2004

Approved by:

Dr. Umakishore Ramachandran, Adviser

Dr. Kimberly Keeton (Hewlett-Packard Laboratories)

Dr. Kenneth Mackenzie (Reservoir Labs, Inc.)

Dr. Yannis Smaragdakis

Dr. Panagiotis Manolios

Date Approved: 14 May 2004

To Candi, for her love, support, and many years of patience.

ACKNOWLEDGEMENTS

Those who praise “thinking outside the box” have never had to deal with a graduate student who chose a thesis topic outside every one of his department’s “boxes.” Many people have gone above and beyond the call of duty to provide me the resources and advice needed to finish this dissertation. First, I want to thank Kimberly Keeton, Ralph Becker-Szendy, John Wilkes, Alistair Veitch, and everybody else in the Storage Systems Department at HP Labs for granting me the privilege to join them for a summer, and for continuing to support my research after I had returned to Georgia Tech. Second, I want to thank Ann Chervenak, Ken Mackenzie, Kishore Ramachandran, and all the other faculty at Georgia Tech for providing the moral and financial support that allowed me to pursue my storage systems research. Third, I thank Chad Huneycutt, Josh Fryman, and all the other students at Georgia Tech who spent time editing my papers and critiquing my talks. Nobody was under any obligation to support my research, yet they agreed to work together and see that I was able to complete my dissertation.

I would also like to acknowledge the incredible support staff in the College of Computing at Georgia Tech. Barbara Binder, Jennifer Chisholm, Cathy Dunahoo, Barbara Durham, Deborah Mitchell, Linda Williams, and the rest of the administrative assistants do an excellent job helping students navigate Georgia Tech’s typical university bureaucracy. Neil Bright, Karen Carpenter, and the rest of the Computing and Networking Services staff do a wonderful job maintaining the College of Computing’s computing infrastructure and shielding students from the overhead of system administration. These two groups of people have saved me from countless hours of frustration and wasted time.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS

LIST OF TABLES

LIST OF FIGURES

SUMMARY

CHAPTER 1  INTRODUCTION
  1.1 Motivation
  1.2 Overview of the Distiller
  1.3 Contributions
  1.4 Organization

CHAPTER 2  BACKGROUND AND RELATED WORK
  2.1 Performance evaluation
  2.2 Workload characterization
  2.3 Synthetic workload generation
    2.3.1 Model the source
    2.3.2 Reproduce the pattern
    2.3.3 Reproduce behavior
  2.4 Attribute selection
  2.5 Our work

CHAPTER 3  WORKLOAD MODELS, ATTRIBUTES, AND PERFORMANCE METRICS
  3.1 Workload
    3.1.1 Our workload model
    3.1.2 Open vs. closed model
    3.1.3 Describing workload attributes
  3.2 Disk arrays and performance
  3.3 Attributes and attribute-values
    3.3.1 Distributions
    3.3.2 State transition matrix
    3.3.3 Jump distance
    3.3.4 Run count
    3.3.5 Burstiness
    3.3.6 Attribute groups
    3.3.7 Attribute size
  3.4 Evaluation metrics
    3.4.1 Evaluation criteria
    3.4.2 Error

CHAPTER 4  EXPERIMENTAL ENVIRONMENT
  4.1 Storage systems
  4.2 Workloads
  4.3 Software

CHAPTER 5  DESIGN AND IMPLEMENTATION
  5.1 Overview
  5.2 Initial attribute list
  5.3 Choosing an attribute group
    5.3.1 Single-parameter attribute groups
    5.3.2 Two-parameter attribute groups
    5.3.3 Search order
  5.4 Choosing an attribute
  5.5 Completion of Email example
  5.6 Limitations
    5.6.1 Randomness error
    5.6.2 Non-static target workloads
    5.6.3 Exploratory workloads provide estimates only
  5.7 Summary

CHAPTER 6  EVALUATION OF DISTILLER
  6.1 Artificial workloads
  6.2 Production workloads
  6.3 Summary

CHAPTER 7  OPERATIONAL SENSITIVITY
  7.1 Demerit figure
    7.1.1 Distiller design limitations
    7.1.2 Demerit figure limitations
    7.1.3 Heavy tails
    7.1.4 Lessons learned
  7.2 Internal Threshold
  7.3 Size / Accuracy tradeoff
    7.3.1 OpenMail
    7.3.2 OLTP
    7.3.3 Lessons from tradeoff study
  7.4 Summary

CHAPTER 8  SYNTHETIC WORKLOAD USEFULNESS
  8.1 Prefetch length
    8.1.1 OpenMail
    8.1.2 OLTP
    8.1.3 DSS
  8.2 Stripe unit size
    8.2.1 OpenMail
    8.2.2 OLTP
    8.2.3 DSS
  8.3 Summary

CHAPTER 9  CONCLUSIONS AND FUTURE WORK

APPENDIX A — GENERATION TECHNIQUES

APPENDIX B — DEMERIT FIGURE ALGORITHMS

REFERENCES

VITA

LIST OF TABLES

Table 1   Example workload
Table 2   Example jump distances and modified jump distances
Table 3   Joint distribution of operation type and request size
Table 4   State transition matrix for operation type
Table 5   Jump distance within state
Table 6   Groups of candidate attributes
Table 7   Comparison of FC-30 and FC-60
Table 8   Summary of workloads
Table 9   Summary of Pantheon configurations
Table 10  Example of the subtractive method for {request size}
Table 11  Example of the rotated {request size} workload
Table 12  Request size and operation type rotated together
Table 13  Request size and operation type rotated separately
Table 14  Workload parameters for target artificial workloads
Table 15  Results of distilling artificial workloads
Table 16  Summary of final synthetic workloads
Table 17  Incremental results of distilling production OpenMail workloads
Table 18  Incremental results of distilling production OLTP and DSS workloads
Table 19  Replay errors of production workloads
Table 20  Randomness errors of production workloads
Table 21  Effects of modifying the demerit figure
Table 22  Incremental results of distilling OM using different demerit figures
Table 23  Incremental results of distilling OLTP using different demerit figures
Table 24  Incremental results of distilling DSS using different demerit figures
Table 25  Effects of subtractive workload’s randomness error on {location} evaluation
Table 26  Difference between CDFs for rotated {location} workloads
Table 27  Attribute chosen given different rotate amounts
Table 28  Effects of choosing A5 instead of A1 when using MRT
Table 29  Comparison of test workloads illustrating differences between RMS and log area
Table 30  Comparison of {interarrival time} attributes chosen using log area and RMS demerit figures
Table 31  Comparison of {op. type, location} attributes chosen using RMS and log area
Table 32  Comparison of {op. type, location} attributes chosen for OLTP
Table 33  Counterintuitive log area results
Table 34  Effects of modifying internal threshold
Table 35  Incremental results of distilling OpenMail with different internal RMS thresholds
Table 36  Incremental results of distilling OpenMail with different internal log area thresholds
Table 37  Incremental results of distilling OLTP with different thresholds
Table 38  Size and demerit of {operation type, location} attributes when distilling OpenMail
Table 39  Size and demerit of {interarrival time} attributes when distilling OpenMail
Table 40  Incremental results of distilling OpenMail (All)
Table 41  RMS demerit and size for OpenMail attributes of varying precision
Table 42  Log area demerit and size for OpenMail attributes of varying precision
Table 43  MRT demerit and size for OpenMail attributes of varying precision
Table 44  Incremental results of distilling OLTP (All)
Table 45  RMS demerit and size for OLTP attributes of varying precision
Table 46  Log area demerit and size for OLTP attributes of varying precision
Table 47  MRT demerit and size for OLTP attributes of varying precision
Table 48  Sample transition matrix for operation type

LIST OF FIGURES

Figure 1   Distiller’s iterative loop
Figure 2   Jump distance
Figure 3   Run count
Figure 4   The path of a typical I/O from the application through the disk array
Figure 5   Distribution of request size for example workload
Figure 6   Histogram of request size for example workload
Figure 7   Conditional distribution of request size based on operation type
Figure 8   Conditional distribution of request size based on previous request size
Figure 9   Interarrival time run count
Figure 10  Jump distance within state
Figure 11  Modified jump distance within state
Figure 12  Run count within state
Figure 13  Interleaved runs
Figure 14  Calculation of RMS
Figure 15  Distiller’s iterative loop (detailed version)
Figure 16  Initial synthetic workload is inaccurate
Figure 17  Testing for potential key {location} and {request size} attributes
Figure 18  Conditional distribution of location closely matches rotated {location} workload
Figure 19  Evaluation of improved synthetic workload containing conditional distribution for {location}
Figure 20  Testing for potential key two-parameter attributes
Figure 21  Testing for potential key {operation type, location} attributes
Figure 22  Evaluation of final synthetic OpenMail workload
Figure 23  Final synthetic workload for OpenMail 2GBLU
Figure 24  Effects of choosing A5 instead of A1 when using MRT
Figure 25  Differences between CDFs affect RMS and log area demerit figures differently
Figure 26  Differences between CDFs affect RMS and log area demerit figures differently (focus on tail)
Figure 27  Comparison of {interarrival time} attributes chosen using log area and RMS demerit figures
Figure 28  Comparison of intermediate synthetic workloads using log area and RMS
Figure 29  Comparison of {op. type, location} attributes chosen using RMS and log area demerit figures
Figure 30  Comparison of {op. type, location} attributes chosen for OLTP
Figure 31  Comparison of {op. type, location} attributes chosen for OLTP (focus on tail)
Figure 32  Counterintuitive log area results
Figure 33  Counterintuitive log area results (focus on tail)
Figure 34  Effects of increasing bin width on OpenMail workload
Figure 35  OpenMail size / accuracy tradeoff
Figure 36  Example of OLTP tradeoff
Figure 37  Example of OLTP tradeoff (focus on tail)
Figure 38  OLTP size / accuracy tradeoff
Figure 39  Example of 4KB prefetch
Figure 40  Mean response time of OpenMail as prefetch length varies
Figure 41  99.5th percentile of OpenMail response time as prefetch length varies
Figure 42  Effects of precision and prefetch length on OpenMail’s mean response time
Figure 43  Effects of precision and prefetch length on OpenMail’s 99.5th percentile of response time
Figure 44  Mean response time of OLTP as prefetch length varies
Figure 45  99.5th percentile of OLTP response time as prefetch length varies
Figure 46  Effects of precision and prefetch length on OLTP’s mean response time
Figure 47  Effects of precision and prefetch length on OLTP’s 99.5th response time percentile
Figure 48  Mean response time of DSS as prefetch varies
Figure 49  99.5th percentile of DSS response time as prefetch varies
Figure 50  Example RAID 1/0 configuration with 6 disks and a 16KB stripe unit size
Figure 51  Mean response time of OpenMail as stripe unit size varies
Figure 52  99.5th percentile of OpenMail response time as stripe unit size varies
Figure 53  Effects of precision and stripe unit size on OpenMail’s mean response time
Figure 54  Effects of precision and stripe unit size on OpenMail’s 99.5th percentile of response time
Figure 55  Mean response time of OLTP as stripe unit size varies
Figure 56  99.5th percentile of OLTP response time as stripe unit size varies
Figure 57  Effects of precision and stripe unit size on OLTP’s mean response time
Figure 58  Effects of precision and stripe unit size on OLTP’s 99.5th percentile of response time
Figure 59  Mean response time of DSS as stripe unit size varies
Figure 60  99.5th percentile of DSS response time as stripe unit size varies
Figure 61  Effect of stripe unit size on DSS disks accessed
Figure 62  Greedy algorithm for choosing jump distances

SUMMARY

The exponential growth in computing capability and use has produced a high demand for large, high-performance storage systems. Unfortunately, advances in storage system research have been limited by (1) a lack of evaluation workloads, and (2) a limited understanding of the interactions between workloads and storage systems. We have developed a tool, the Distiller, that helps address both limitations.

Our thesis is as follows: Given a storage system and a workload for that system, one can automatically identify a set of workload characteristics that describes a set of synthetic workloads with the same performance as the workload they model. These representative synthetic workloads increase the number of available workloads with which storage systems can be evaluated. More importantly, the characteristics also identify those workload properties that affect disk array performance, thereby highlighting the interactions between workloads and storage systems.

This dissertation presents the design and evaluation of the Distiller. Specifically, our contributions are as follows. (1) We demonstrate that the Distiller finds synthetic workloads with at most 10% error for six out of the eight workloads we tested. (2) We also find that all of the potential error metrics we use to compare workload performance have limitations. Additionally, although the internal threshold that determines which attributes the Distiller chooses has a small effect on the accuracy of the final synthetic workloads, it has a large effect on the Distiller’s running time. Similarly, (3) we find that we can reduce the precision with which we measure attributes and only moderately reduce the resulting synthetic workload’s accuracy. Finally, (4) we show how to use the information contained in the chosen attributes to predict the performance effects of modifying the storage system’s prefetch length and stripe unit size.

CHAPTER 1

INTRODUCTION

We have developed a tool, the Distiller, that automatically searches for those workload characteristics (formally called attribute-values) that two block-level workloads must share in order to have the same behavior on a given storage system. This dissertation presents the design of the Distiller, evaluates its ability to find useful attribute-values, explores its configuration options, and discusses several ways that it can help overcome some of the current difficulties in storage systems research.

1.1 Motivation

The exponential growth in computing capability and use has produced a high demand for large, high-performance storage systems. The storage systems research community has done a commendable job developing new hardware and software to meet this demand. However, two related limitations have hindered their progress: (1) a lack of evaluation workloads and (2) a limited understanding of the interactions between workloads and storage systems. The Distiller will help researchers address both limitations.

The behavior of large enterprise storage systems depends heavily upon the choice of workload. Consequently, researchers must evaluate potential design decisions and self-configuration algorithms using workloads that represent how the storage system will be used in a production environment.

One approach for driving storage system studies is to use block-level traces from an actual storage system in production use. However, as described in [21], using traces of real storage system activity has a number of limitations:

1. Traces are difficult to obtain, often for non-technical reasons. System administrators are reluctant to permit tracing on production systems; and when they do collect traces, it is often time-consuming to anonymize them to protect users’ privacy.

2. Although any single trace file may not be huge, a set of trace files describing the activity of a system over a longer period of time (weeks or months) may occupy considerable space (tens of gigabytes), making them difficult to store on-line and share over the Internet.

3. It is difficult to isolate and modify specific workload characteristics of a trace (e.g., arrival rate, total accessed storage capacity, distribution of request locations). Consequently, traces do not support explorations of slightly larger, busier, burstier, or other hypothetical future workloads.

The alternative approach is to use synthetic workloads. A synthetic workload is randomly generated in a way that preserves the key characteristics of some realistic target workload. We will use the terms attribute and attribute-value to discuss a workload’s characteristics. An attribute is a metric for measuring a workload (e.g., mean request size, read percentage, or distribution of location values). An attribute-value is an attribute paired with the quantification of that attribute for a specific workload (e.g., a mean request size of 8KB, or a read percentage of 68%). Chapter 3 contains formal definitions of these terms.
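To make the distinction concrete, the short sketch below computes two attribute-values for a tiny hypothetical trace. The trace format and function names are invented for this illustration; they are not part of the Distiller.

```python
# Hypothetical illustration: an "attribute" is the measurement itself (e.g.,
# mean request size); an "attribute-value" pairs the attribute with its
# quantification for one specific workload.

# Each request: (operation type, request size in bytes)
trace = [("read", 8192), ("write", 4096), ("read", 8192), ("read", 16384)]

def mean_request_size(requests):
    """Attribute: mean request size, in bytes."""
    return sum(size for _, size in requests) / len(requests)

def read_percentage(requests):
    """Attribute: percentage of requests that are reads."""
    reads = sum(1 for op, _ in requests if op == "read")
    return 100.0 * reads / len(requests)

# Attribute-values for this particular workload:
print(mean_request_size(trace))   # 9216.0 bytes
print(read_percentage(trace))     # 75.0 percent
```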

Synthetic workloads do not suffer from the limitations of workload traces:

1. We can describe a set of synthetic workloads using only values of the target workload’s key attributes. These attribute-values do not contain user-specific information; therefore, companies should have few concerns about making such attribute-values publicly available.

2. We have found attribute-values that describe a reasonably accurate synthetic workload and, when optimized, are an order of magnitude smaller than the corresponding compressed workload trace.

3. By adjusting the summarized attribute-values, we may be able to modify the resulting synthetic workload so it approximates future workloads.

The challenge is that we do not know precisely which attribute-values a synthetic workload must share with the target workload on which it is based. If we do not choose the attributes carefully, the synthetic workload will not be representative of (i.e., will not behave like) the target workload. At a high level, we know that two workloads must have (among other things) the same degrees of spatial locality, temporal locality, and burstiness; however, we do not know how precisely to quantify and reproduce these properties.

The literature provides a number of attributes for quantifying locality, burstiness, and other important workload characteristics [21, 24, 25, 32, 33, 55, 56]; however, each of these attributes is targeted for, and tested on, a particular type of workload (e.g., database, file system). As a result, synthesizing a representative workload requires an impractically time-consuming and tedious process of searching the related work for a set of attributes that sufficiently describes the specific workload under study.

1.2 Overview of the Distiller

Our solution was to develop a tool, the Distiller, to automate this tedious process. The Distiller, when given a workload trace and a library of possible workload attributes, automatically determines which attribute-values the target workload must share with the synthetic workload in order to be representative. These attribute-values serve as a compact representation of the synthetic workload.

The Distiller takes as input a target workload trace and a library of attributes. It then either finds a subset of attributes from the library whose values specify a synthetic workload representative of the target, or determines that such a specification is not possible using the attributes in the library. Synthetic workloads must maintain the specified set of attribute-values.

[Figure 1: Distiller’s iterative loop]

At a high level, the Distiller iteratively builds a list of “key” attributes — attributes that noticeably influence the target workload’s behavior. During each iteration, the Distiller identifies one additional key attribute (from the library of candidate attributes), adds it to the list, then tests the representativeness of the resulting synthetic workload by replaying it on the storage system under test and comparing the resulting response time distribution to that of the target workload. This loop (shown in Figure 1) continues until either (1) the difference between the behavior of the synthetic and target workloads falls below some user-specified threshold, or (2) the Distiller determines that it cannot specify a more representative synthetic workload by adding a small number of attributes from the library.
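The sketch below outlines this loop in Python-style pseudocode. It is a minimal illustration of Figure 1 under our own naming assumptions; every callable (replay_and_measure, generate_synthetic, best_candidate, difference) is a hypothetical stand-in for a component described above and detailed in Chapter 5, not the Distiller's actual interface.

```python
def distill(target_trace, library, replay_and_measure, difference,
            generate_synthetic, best_candidate, threshold):
    """Sketch of the Distiller's iterative loop (Figure 1); every callable is a
    hypothetical stand-in for a component described in the text."""
    key_attribute_values = []                      # "key" attribute-values found so far
    target_behavior = replay_and_measure(target_trace)

    while True:
        synthetic = generate_synthetic(target_trace, key_attribute_values)
        behavior = replay_and_measure(synthetic)
        if difference(behavior, target_behavior) < threshold:
            return key_attribute_values            # synthetic workload is representative

        # Estimate which candidate attribute would help most; Chapter 5 describes
        # how attribute groups keep this step from becoming an exhaustive search.
        candidate = best_candidate(library, key_attribute_values, target_trace)
        if candidate is None:
            return None                            # library holds no further useful attribute
        key_attribute_values.append(candidate)
```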

One approach is to simply add and evaluate attributes in an arbitrary order until the synthetic workload meets the stopping conditions. However, this approach will not work because groups of attributes often have offsetting effects. As a result, a synthetic workload based on a single additional attribute may be less representative than a synthetic workload based on two or more additional attributes. Therefore, we need a method of evaluating individual attributes apart from their immediate effects when added to the list of key attributes.

Evaluating all possible sets of attributes would address this problem; however, this approach is too time consuming. Even if each evaluation required only milliseconds, a library of a few hundred attributes would make an exhaustive search intractable. Instead, we developed a method of using only two evaluations to determine whether an entire group of related attributes contains any “key” attributes. We partition attributes into fifteen groups (one group for each non-empty subset of {operation type, location, interarrival time, request size}), then estimate the benefit of adding the most useful attribute in each group (i.e., the attribute that leads to the most representative synthetic workload) to the current attribute list. If the best possible attribute in a given group has little or no effect on the behavior of the workload, then we assume that no attribute in the group is a key attribute and do not explore the attribute group further. Instead, we focus on only those groups which, according to our test, contain at least one key attribute. In addition, our technique of dividing attributes into groups allows us to evaluate an individual attribute with respect to its group instead of its effects when added to the key attribute list. Our method allows us to use divide-and-conquer and “best-first”-like techniques in our search for key attributes.

Chapter 5 contains a detailed definition of the attribute groups and the evaluations used to test attribute groups and individual attributes.
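For reference, the fifteen groups are simply the non-empty subsets of the four request parameters. The following sketch enumerates them; it illustrates only the partitioning, not how the Distiller tests each group.

```python
from itertools import combinations

PARAMETERS = ("operation type", "location", "interarrival time", "request size")

# One attribute group per non-empty subset of the four request parameters,
# giving 2**4 - 1 = 15 groups in total.
attribute_groups = [
    frozenset(subset)
    for k in range(1, len(PARAMETERS) + 1)
    for subset in combinations(PARAMETERS, k)
]

assert len(attribute_groups) == 15
# Example members: {"location"} is a single-parameter group;
# {"operation type", "location"} is a two-parameter group.
```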

1.3 Contributions

The primary contribution of this thesis is the Distiller and the resulting ability to efficiently generate representative synthetic workloads. We designed and implemented a novel technique of automating the tedious trial-and-error aspect of developing representative synthetic workloads. This automation makes synthetic workload generation practical. The Distiller evaluates attributes in an intelligent order and avoids exhaustively searching all possible combinations of attributes. Furthermore, if the Distiller is unable to produce a representative synthetic workload, it lists the attribute groups for which the library contains no useful attributes. By identifying the group to which a missing attribute belongs, the Distiller helps direct the development of new attributes, when necessary.

Our study of the Distiller provides four additional, related contributions:

1. An evaluation of the Distiller: We evaluate (1) the quality of the synthetic workloads the Distiller specifies, (2) the size of the synthetic workload’s compact representations, and (3) the number of evaluations the Distiller performs during its execution. Given the current library of attributes, the Distiller can specify a synthetic workload with at most 10% error for all but three workloads evaluated. In the worst case, the resulting synthetic workload has a 25% error. During its initial run, the Distiller specifies synthetic workloads that have compact representations that are typically 25% to 60% the size of the compressed original workload trace itself. It can then post-process the chosen attributes and further reduce the compact representation to between 1% and 10% of the compressed original workload trace. During each run, the Distiller generates and evaluates between 15 and 125 workloads.

2. A study of the Distiller’s configuration decisions: Before running the Distiller, the user must choose (1) which error metric to use when comparing workload behavior and (2) a threshold for exploring attribute groups and choosing attributes. We investigate the effects of these configuration decisions on the accuracy of the synthetic workloads the Distiller specifies, the size of their compact representations, and the number of evaluations the Distiller performs during its execution.

We found that the error metric has a very large effect on the attributes the Distiller chooses and the accuracy of the resulting synthetic workload. Overall, a metric called “hybrid” led to the most accurate synthetic workloads. The hybrid error metric combines the differences between the mean response times and the differences between the overall shape of the response time distribution graphs. Considering the overall distribution of response times allows the Distiller to estimate how well the chosen attributes capture different behaviors, such as cache or tail (i.e., long-latency I/O) behavior. However, many people will not consider a synthetic workload to be similar to the workload it models unless both workloads have similar mean response times. The accuracy threshold has a smaller effect on the accuracy of the resulting synthetic workload, but has a larger effect on the size of the attributes the Distiller chooses and the number of workloads it generates and evaluates.

3. A study of the tradeoff between size and accuracy: We study the tradeoff between the compactness of a synthetic workload’s representation and its accuracy. The Distiller specifies synthetic workloads that have compact representations that are typically 25% to 60% the size of the original workload trace itself. Fortunately, in many cases, reducing the precision of the chosen attributes produces a synthetic workload with similar accuracy, but with a much more compact representation. For example, reducing the size of a synthetic Email server workload’s compact representation by 70% had no effect on its accuracy. Reducing the size of a synthetic OLTP database workload’s compact representation by 5% actually doubled its accuracy.

4. An analysis of the usefulness of the synthetic workloads specified by the Distiller: We show that the synthetic workloads specified by the Distiller are accurate enough to allow evaluators to choose between several competing design decisions. Specifically, we show that the attributes chosen by the Distiller contain enough information to predict the effects of changing the storage system’s prefetch length and stripe unit size.

1.4 Organization

The remainder of this thesis is organized as follows: Chapter 2 discusses the background and related work. Chapter 3 defines our workload models, the attributes we use to characterize workloads, and the metrics we use to compare workload behavior. Chapter 4 discusses our hardware, software, and workload traces. Chapter 5 provides the details of the Distiller’s design and implementation. Chapters 6 through 8 present our evaluations; and Chapter 9 concludes. In addition, Appendix A discusses our techniques for generating synthetic workloads; and Appendix B provides the algorithms we use to implement the Distiller’s various demerit figures.

CHAPTER 2

BACKGROUND AND RELATED WORK

This chapter discusses the historical context of our work and how our approach complements existing workload characterization and synthetic workload generation techniques. Both of these areas have their roots in performance evaluation. Sections 2.1 through 2.3 discuss performance evaluation, workload characterization, and synthetic workload generation, respectively. Section 2.5 discusses our contributions to these areas; and Section 2.4 discusses related approaches.

2.1 Performance evaluation

Scientists have studied techniques for evaluating computer systems since the early 1960s, when “The need for standardized performance criteria [was] emphasized by the wide range of commercially available computers, the varied configurations each can assume, the greater variety of tasks and methods of implementing them, and the ever-present limitations upon time and money for the selection, programming and operation of the system [31].” In the early days of computing, scientists were primarily concerned with developing usable hardware. However, the increasing variety of technology available necessitated an understanding of the advantages and disadvantages of the different technologies. This understanding is important not only for the consumer, who must choose between competing products, but also for the scientists, who must choose between competing design decisions or research directions.

The advent of multi-programming and multi-processing computer systems complicated the process of performance evaluation. The primary performance metric of a batch processing system was throughput: how quickly the machine could complete a given job [34]. The introduction of multi-programming systems encouraged the development of interactive applications, such as airline reservation systems, for which the response time of individual requests was also important.

The increase in complexity of computers and applications during the 1950s and 1960s further complicated the field of performance evaluation. The diversity of computer applications combined with the diversity of hardware units such as channels, control units, storage, tape, and tape drives created a need for many different performance measurement techniques because “no single number can serve adequately as the measure of the performance capability of a piece of equipment, of a program module, or even a completed system [12].” The growing complexity of computer systems also increased the importance of choosing the test workload carefully. Instead of choosing a test workload based primarily on its CPU load, it became necessary to consider the effects of the workload on all hardware components. Furthermore, the increase in the number and diversity of hardware components created a need for the evaluation of the individual components themselves.

Ferrari’s performance evaluation books from the 1970s and 1980s [17, 20] discuss the important factors in evaluating a computer system, including:

• different modeling techniques, specifically simulation models, analytical models, and observing physical devices;

• different measurement techniques (hardware vs. software and continuous vs. sampling) and their effects on the workload and the measurements obtained;

• different techniques of generating loads, including traces, stochastic models, and deterministic models; and

• a classification system for workloads used to drive evaluation studies.

According to Ferrari, the workload used to drive an evaluation is only a model of the “real” workload that the system under test will serve in real life (unless the system is evaluated while in production use). He asserts that even complete workload traces are only models of the real workload.

Ferrari’s most important contribution to our research is his discussion of methods with which evaluators judge the quality of the test workloads they use to drive system evaluations. Like Calingaert [12], he notes that there is no single metric with which a workload model can be judged; however, he emphasizes that, in general, evaluators should judge test workloads using performance-oriented criteria. By “performance-oriented”, Ferrari means that we should base the measure of similarity between the test workload and the real workload on how much the results of the intended performance evaluation differ from the results of the same evaluation had it been run using the real workload. In contrast, “cosmetic” comparison criteria compare only the characteristics of the two workloads. Such comparisons judge how well the test workload maintains the desired similarities, but do not judge how well the chosen characteristics capture the performance-affecting aspects of the real workload.

2.2 Workload characterization

Workload characterization is a fundamental aspect of performance evaluation [19]. The performance of any system (storage system, mainframe computer, database, etc.) depends on its workload. Thus, we must present the results of any evaluation within the context of a workload. In order to understand that context, we must describe, or characterize, the workload.

Initially, researchers used workload characterization primarily to specify the class of workload they used to evaluate a system. These characterizations were simple: They either divided jobs into classes by function (e.g., compiler, scientific problem, business problem) or described jobs based on their utilization of resources such as processor, I/O channels and devices, core memory, and unit record devices (number of cards read, lines printed, etc.) [49].

The increasing complexity of computers necessitated that evaluators describe evaluation workloads more precisely with respect to their effects on each system component. For example, the introduction of virtual memory made a program’s page fault behavior a critical aspect of the performance of the system under test [11, 15, 18]. Furthermore, the growing complexity also increased the precision with which evaluators described the mix of jobs and their interactions [13, 14].

As computers became more complex, scientists also began to study individual subsystems, including the I/O subsystem. An I/O workload represents thousands or millions of requests from different applications, where each application may produce a different pattern of requests. As explained in Section 3.2, the patterns in the requests can have a very large effect on the I/O subsystem’s performance. Consequently, instead of characterizing workloads simply to classify them, researchers began to characterize workloads to help understand them and their effects on design decisions under consideration.

For example, Ousterhout, et al., characterized a 4.2 BSD file system workload with the goal of collecting “information that would be useful in designing a shared file system for a network of personal workstations [41].” They measured the distribution of file sizes, file lifetimes, throughput, and cache hit ratios, then demonstrated that a reasonably sized cache could eliminate enough of the network traffic to make a networked file system feasible. Ousterhout’s work is the first of many I/O trace studies [6, 7, 10, 22, 26, 39, 44, 45] designed to provide an intuitive understanding of a workload.

2.3 Synthetic workload generation

Synthetic workload generation is, in some sense, an extension of workload characterization. The purpose of a workload characterization is to describe a workload using properties that are of interest to the user; the purpose of synthetic workload generation is to then provide a workload with the desired properties.

Researchers often combine workload characterization and synthetic workload generation. When generating performance evaluation workloads for computer systems, the standard approach, as outlined in [19], is to characterize individual jobs, then use clustering or other statistical techniques to choose a representative sample of jobs. The synthetic workload is the list of chosen sample jobs [13, 14, 49].

In addition to general computer system workloads, researchers use the sampling approach to generate several other types of workloads, including memory and instruction traces (for workloads to drive processor evaluations) [5, 37, 59], and, to a limited extent, file system workloads [51]. The sampling technique, however, is not well-suited for synthesizing block-level I/O workloads. Clustering- or sampling-based synthesis techniques often assume that each component (job) consumes a fixed amount of resources (CPU time, I/O bandwidth, memory, etc.) each time it is issued. This assumption is not true in a storage context because the effects of locality, caching, and prefetching can make the resources consumed for an I/O request highly variable. Thus, to use clustering- or sampling-based techniques to synthesize I/O workloads, one must define components to be something other than a single I/O request. Section 2.3.2 provides examples of such clustering-based generation techniques for I/O workloads.

In response to the difficulties of using existing techniques, many researchers have developed other synthetic generation techniques. These techniques fall into three categories: (1) modeling the source, (2) reproducing patterns, and (3) reproducing behavior.

2.3.1 Model the source

Generation techniques that model the source attempt to behave like the user and/or application that generated the workload being modeled. For example, SynRGen generates file system workloads by using “micromodels” to simulate the I/O activity of different applications [16]. For example, the micromodel of a compiler reads several files in their entirety (e.g., the .c and .h files), then creates another file (the .o file) and writes to it. SynRGen also models different users by stochastically switching among different application models. For example, the model of a user in an edit/debug cycle iterates between an Emacs micromodel and a compiler micromodel.

Similarly, many benchmarks are designed to exhibit typical behavior for a given application. The TPC-C benchmark issues queries and other commands typical for an on-line transaction processing (OLTP) database [43, 38]. The TPC-H benchmark issues queries typical for a decision support database [42, 60]. The Standard Performance Evaluation Corporation (SPEC) has developed benchmarks that simulate the load for several different types of servers, including World Wide Web servers, Email servers, and file servers [50]. Although TPC and SPEC did not design these benchmarks specifically to provide representative I/O workloads, executing each benchmark produces an I/O load that can be traced and used as a workload model.

The challenge in synthesizing a workload by modeling user or application behavior is that modeling behavior is fundamentally the same problem as directly modeling the I/O workload (i.e., sequence of I/O requests), except instead of explicitly specifying the I/O workload characteristics that should be present in the synthetic workload, the user specifies the characteristics of the user behavior to be reproduced. In many cases, developing an accurate user- or application-level model is just as difficult as generating the synthetic block-level I/O workload.

2.3.2 Reproduce the pattern

The second group of workload generation techniques explicitly attempts to produce a workload with a given set of patterns or characteristics. Most synthetic I/O generation techniques, including ours, fall into this category [9, 21, 24, 25, 26, 32, 33, 35, 55, 56]. Although these generation techniques seek only to reproduce a specified pattern, the design and expected behavior of the disk array (as described in Section 3.2) often motivates the choice of patterns reproduced. The remainder of this section discusses these synthetic generation techniques.

Distribution sampling: The simplest generation technique is to choose the values for a request parameter independently at random from some distribution. In the past, scientists assumed that interarrival times followed a Poisson distribution, location values followed a uniform distribution, and request sizes followed a normal distribution. They constructed an implicit distribution based on only a mean value and standard deviation, then drew values from this distribution regardless of its similarity to the modeled workload’s actual distribution. Ganger demonstrated that the assumptions on which scientists based these implicit distributions were incorrect and that using the incorrect assumptions led to unrepresentative synthetic workloads [21].

Using the distribution in the modeled workload produces a more accurate synthetic workload, but rarely produces a representative synthetic workload because it does not measure and reproduce the correlations (relationships) that usually exist among request parameters [20]. For example, most workloads exhibit locality of reference, which is a correlation between location values. Similarly, many workloads are bursty, which is a correlation between interarrival times. Bodnarchuk and Bunt, however, successfully used this technique to generate a file system workload over NFS [9].
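A minimal sketch of such a generator is shown below, assuming the target trace is a list of records with op, location, size, and interarrival fields (names invented for the example). Because every value is drawn independently, correlations such as sequential runs and bursts are lost, which is exactly the weakness described above.

```python
import random

def sample_independently(target, n, seed=42):
    """Hypothetical sketch of distribution sampling: draw each request
    parameter independently from the empirical distribution observed in the
    target workload. Correlations between parameters (locality, burstiness,
    read/write patterns) are deliberately not preserved."""
    rng = random.Random(seed)
    ops = [r["op"] for r in target]
    locations = [r["location"] for r in target]
    sizes = [r["size"] for r in target]
    gaps = [r["interarrival"] for r in target if r["interarrival"] is not None]

    synthetic, now = [], 0.0
    for _ in range(n):
        now += rng.choice(gaps)
        synthetic.append({"op": rng.choice(ops),
                          "location": rng.choice(locations),
                          "size": rng.choice(sizes),
                          "arrival": now})
    return synthetic
```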

β-model: The β-model seeks to generate a synthetic workload that maintains the same level of “burstiness” as the modeled workload [56]. Wang, et al., based it on the “80/20” law for databases, which states that 80% of queries involve only 20% of the data. Given m I/Os and a parameter β (where .5 ≤ β ≤ 1), the β-model allocates βm I/Os to one randomly chosen half of the trace (first half or second half). It then allocates the remaining (1 − β)m I/Os to the other half and recursively repeats this process on each half of the trace.

The parameter β is based on the slope of the entropy plot. Specifically, for each of n aggregation levels, the β-model divides the trace into 2^n intervals by time and computes the entropy of those intervals. Entropy at level n is defined to be

E(n) = − Σ_{i=1}^{2^n} p_i log p_i

where p_i is the fraction of requests in interval i. The parameter β is related to the slope of the line generated by plotting E(n) against n. Specifically,

slope = −β log2 β − (1 − β) log2(1 − β)

Hong and Madhyastha modified this technique slightly by using the multifractal spectrum to estimate β [32].
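The recursive allocation itself is easy to sketch. The code below is an illustrative construction only: it takes β as a given and returns per-interval request counts, whereas the full technique also estimates β from the entropy plot.

```python
import random

def beta_model_counts(m, beta, levels, rng=None):
    """Hypothetical sketch of the beta-model construction: give beta*m of the
    m I/Os to one randomly chosen half of the trace and the rest to the other
    half, then recurse. Returns the I/O count of each of the 2**levels
    equal-length time intervals."""
    rng = rng or random.Random(0)
    if levels == 0:
        return [m]
    larger = int(round(beta * m))
    left = larger if rng.random() < 0.5 else m - larger
    return (beta_model_counts(left, beta, levels - 1, rng) +
            beta_model_counts(m - left, beta, levels - 1, rng))

# 1000 I/Os, beta = 0.8, three aggregation levels -> 8 bursty interval counts.
print(beta_model_counts(1000, 0.8, 3))
```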

PQRS: Wang, et al., also developed an attribute that “captures all the characteristics of real spatio-temporal traffic” [55]. In other words, this technique strives to not only reproduce the burstiness of the access pattern and the arrival pattern, but also to reproduce the correlations between them. The PQRS algorithm measures four parameters (p, q, r, and s) that are based on the joint entropy of the location and arrival time values. The corresponding generation technique then uses these values to recursively construct a joint distribution for location and arrival time. (The recursive construction is the two-dimensional equivalent of the β-model’s construction.)

Cluster-based trace synthesis: Hong, Madhyastha, and Zhang developed a generation technique that chooses several intervals of a real trace to represent the entire trace. The algorithm then generates a complete synthetic trace by concatenating copies of those intervals [33].

Their method generates a synthetic workload as follows: First, it divides the target workload into intervals of some constant length (Hong and Madhyastha experimented with intervals ranging from .1 seconds to 10 seconds) and characterizes the intervals by number of requests, aggregation ratio (the ratio of non-empty bins to empty bins), and entropy (using a similar technique to the β-model). The algorithm then clusters the intervals using an agglomerative hierarchical clustering technique and chooses one representative interval for each cluster. Finally, the algorithm produces a synthetic workload by concatenating copies of the cluster representatives.

An agglomerative hierarchical clustering technique begins by placing each item (e.g., interval) in its own cluster. It then iteratively agglomerates (i.e., merges) the most similar clusters until the desired number of clusters remains.
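A rough sketch of the approach appears below, assuming the trace has already been cut into fixed-length intervals and each interval reduced to a feature vector (request count, aggregation ratio, entropy). The SciPy clustering calls and the choice of the interval closest to the cluster mean as the representative are assumptions of this sketch, not necessarily the authors' exact procedure.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_based_synthesis(intervals, features, num_clusters):
    """Hypothetical sketch of cluster-based trace synthesis. 'intervals' are
    fixed-length pieces of the target trace; 'features' are their per-interval
    characterizations (request count, aggregation ratio, entropy)."""
    X = np.asarray(features, dtype=float)
    labels = fcluster(linkage(X, method="average"), t=num_clusters,
                      criterion="maxclust")

    # Representative interval = the member closest to its cluster's mean.
    representatives = {}
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)
        centroid = X[members].mean(axis=0)
        closest = members[np.argmin(np.linalg.norm(X[members] - centroid, axis=1))]
        representatives[c] = closest

    # Rebuild a full-length trace by substituting each interval with its
    # cluster's representative interval.
    return [intervals[representatives[c]] for c in labels]
```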

Gomez access pattern: This generation technique [24, 25] assumes that the trace of the I/O workload being modeled identifies the process that issued each request and that these processes generate I/Os in an ON/OFF pattern (i.e., that they produce several I/Os in a short amount of time, then are silent for some amount of time).

Within an ON period, Gomez’s algorithm chooses a location in one of three ways:

• sequential to the previous I/O,

• equal to the starting location for the current ON period, or

• spatially local to (i.e., within 500 sectors of) the starting location for the current ON period.

Gomez and Santonja base their method of choosing a particular location on measurements of the target workload. The algorithm also uses similar techniques to choose the location of the first request in an ON period; however, in this case the algorithm not only considers the process’s previous location, but also the location of the previous ON period’s first request.
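A small sketch of the within-ON-period location choice might look like the following; the branch probabilities, sector size, and 500-sector window are parameters that would be measured from the target workload, and the function name is ours.

```python
import random

def next_on_location(prev_end, on_start, probs, rng, sector=512, window=500):
    """Hypothetical sketch of the within-ON-period location choice described
    above. probs = (p_sequential, p_on_start, p_local) would be measured from
    the target workload."""
    p_seq, p_start, _p_local = probs
    r = rng.random()
    if r < p_seq:                        # sequential to the previous I/O
        return prev_end
    if r < p_seq + p_start:              # back to the ON period's starting location
        return on_start
    # spatially local: within +/- 500 sectors of the ON period's start
    return on_start + rng.randint(-window, window) * sector

rng = random.Random(1)
location = next_on_location(prev_end=1_000_000, on_start=983_040,
                            probs=(0.6, 0.1, 0.3), rng=rng)
```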

Gomez ON/OFF generator: This generator produces a self-similar arrival pattern by simulating and combining the arrival pattern of different processes [24]. It first analyzes the sources (processes) in the target workload. Gomez and Santonja call those sources that exhibit an ON/OFF pattern (alternating periods of much activity and no activity) “permanent sources.” They call those sources that are active only for a short time with no long periods of inactivity “vanishing sources.” To model a permanent source, the algorithm draws the length of the ON times and OFF times from heavy-tailed distributions. The model for the vanishing sources “was proposed by Cox based on an M/G/∞ queuing model where customers arrive according to a Poisson process and have service times drawn from a heavy-tailed distribution with infinite variance.” In all cases, the generator defines specific distributions based on corresponding mean values in the modeled workload (mean ON time, mean OFF time).
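As an illustration of a permanent source, the sketch below alternates ON and OFF periods whose lengths come from a heavy-tailed distribution scaled to the measured means. The use of a Pareto distribution, the shape parameter, and the evenly spaced arrivals within an ON period are assumptions of the sketch, not details taken from Gomez and Santonja.

```python
import random

def on_off_arrivals(duration, mean_on, mean_off, mean_gap, alpha=1.5, seed=0):
    """Hypothetical sketch of one 'permanent source': alternate ON and OFF
    periods with heavy-tailed (Pareto) lengths scaled to the measured means,
    emitting evenly spaced arrivals during ON periods."""
    rng = random.Random(seed)

    def heavy_tailed(mean):
        xm = mean * (alpha - 1) / alpha          # scale so the mean matches
        return xm * rng.paretovariate(alpha)     # infinite variance for alpha <= 2

    arrivals, now = [], 0.0
    while now < duration:
        on_end = now + heavy_tailed(mean_on)
        while now < min(on_end, duration):       # busy (ON) period
            arrivals.append(now)
            now += mean_gap
        now = on_end + heavy_tailed(mean_off)    # silent (OFF) period
    return arrivals

print(len(on_off_arrivals(duration=60.0, mean_on=0.5, mean_off=2.0, mean_gap=0.01)))
```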

2.3.3 Reproduce behavior

The third group of generation techniques does not attempt to generate a workload with specific characteristics. Instead, these techniques attempt to generate each request such that it causes the system to exhibit a specific behavior (e.g., a cache hit, or a specific response time). We are unaware of any generation techniques that actually attempt to reproduce behavior; however, we provide here a hypothetical description of such a technique.

The input to this generation technique would be a distribution (or possibly a list) of response times for individual I/O requests. Then, for each request, the generation technique would consider the current state of the disk array and choose the request’s parameter values to produce an I/O with the desired response time. For example, if the next I/O needed a very short response time, the generator would construct an I/O to be a cache hit. The generator would also adjust the interarrival time to avoid queuing delays. Similarly, to generate an I/O with a very long response time, the generator would construct the I/O to be a cache miss and/or choose a very short interarrival time to cause a queuing delay.

Developing such a technique will be very difficult because it requires a detailed model of the disk array that can very accurately predict the response time. In addition, it may not be possible to generate arbitrary lists of response times. For example, generating extremely long response times may require generating considerable queue lengths. It may not be possible to build up the queue while simultaneously generating many very low latency I/Os. Generating a queue requires planning several I/Os in advance. Such planning may not be computationally feasible.

Although a generator that accurately reproduces individual response times may not be practical, we can easily write a generator containing a cache simulator that specifically generates I/O locations that hit or miss in the cache as desired. If the generator wishes to generate a cache hit, it can have the simulator generate a location in the cache. This generator does not completely reproduce behavior, however, because it is unaware of other influences such as queue length. Varma and Jacobson used this approach in a study of de-staging algorithms for disk arrays [53].
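The sketch below shows what such a partial behavior-reproducing generator might look like: it keeps a small LRU model of the array's cache and picks a read location that will hit or miss as requested. The block granularity, cache size, and the way a "cold" block is chosen are assumptions of the sketch, and queueing effects are ignored.

```python
import random
from collections import OrderedDict

class HitMissGenerator:
    """Hypothetical sketch: generate I/O locations that hit or miss in a
    simulated LRU block cache, in the spirit of the generator described
    above. Queue length and other influences are not modeled."""

    def __init__(self, cache_blocks, total_blocks, block_size=8192, seed=0):
        self.cache = OrderedDict()          # block number -> True, in LRU order
        self.cache_blocks = cache_blocks
        self.total_blocks = total_blocks
        self.block_size = block_size
        self.rng = random.Random(seed)

    def _touch(self, block):
        self.cache.pop(block, None)
        self.cache[block] = True
        if len(self.cache) > self.cache_blocks:
            self.cache.popitem(last=False)  # evict the least recently used block

    def next_location(self, want_hit):
        if want_hit and self.cache:
            block = self.rng.choice(list(self.cache))       # a resident block
        else:
            block = self.rng.randrange(self.total_blocks)   # a likely-cold block
            while block in self.cache:
                block = self.rng.randrange(self.total_blocks)
        self._touch(block)
        return block * self.block_size

gen = HitMissGenerator(cache_blocks=1024, total_blocks=1 << 20)
locations = [gen.next_location(want_hit=(i % 4 != 0)) for i in range(100)]
```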

2.4 Attribute selection

The Distiller is fundamentally a modified best-first search technique. We will see in Chapter 5 how, during each step of its iterative technique, the Distiller chooses the attribute that it estimates will provide the best partial solution. Feature selection and principal component analysis are two other techniques that researchers use to select attributes.

Feature selection techniques choose a set of features (attributes) that can differentiate workloads by performance. The Artificial Intelligence field has defined several standard feature selection techniques [8]. The challenge is that these techniques require a large training set (i.e., many workloads) to explore. Currently, our set of available workloads is too small. Also, researchers use standard feature selection techniques to choose attributes that will distinguish the workloads in the example set. If the example set does not cover the entire space of workloads, the feature selection technique may disregard a performance-related attribute because no other workloads happen to have sufficiently different values for that attribute.

Principal component analysis (PCA) is another technique for defining a small set of attributes. Researchers have used principal component analysis to identify the attribute-values that describe a set of batch computational workloads [14]. Given a large set of workloads and a set of attribute-values that characterize those workloads, PCA computes new variables, called principal components, that best describe the set of workloads. These principal components are uncorrelated linear combinations of the original attribute-values. Choosing attributes in this manner presents several challenges. First, using PCA to identify performance-related attributes requires a set of workloads with similar performance; we do not currently have access to such a set. Second, the resulting principal components are linear combinations of attributes rather than a subset of the initial attribute list and, hence, may not have any intuitive meaning.

2.5 Our work

The Distiller’s major contribution is that it automatically incorporates a performance-oriented criterion into its technique for choosing the characteristics (i.e., attribute-values) on which it bases synthetic workloads. In contrast, researchers based the synthetic workload generation techniques in the literature on workload attributes they chose a priori using their domain expertise. These choices may be valid for some workloads and storage systems but not for others. Instead of presenting another synthetic workload generation technique, the Distiller leverages these existing techniques to automatically choose the ones that are most appropriate for the target workload and storage system under test. The set of current block-level analysis and corresponding generation techniques serves as the Distiller’s “library” of candidate attributes. We are not aware of any technique for automatically selecting attributes used to characterize and synthesize I/O workloads.

The Distiller also provides a similar contribution to the field of workload characterization. As with synthetic workload generation, researchers tend to choose attributes to study workloads a priori. These attributes are often useful for providing an intuitive “picture” of the workload; however, they may or may not be related to the workload behavior under study. Using the Distiller to select attributes can assure that the chosen attributes are relevant.

Finally, we expect the Distiller to contribute to the evaluation of storage systems by providing additional test workloads. It is currently difficult to obtain accurate test workloads. System administrators hesitate to make workload traces publicly available because they are concerned that the traces contain sensitive data that could be mined and used against the company. The attribute-values with which we specify synthetic workloads contain high-level information that should not be specific enough to compromise a business’s profitability. Thus, we expect that companies will be more willing to provide attribute-values that evaluators can use to construct synthetic workloads than they will be to provide complete workload traces. The challenge is determining which attributes to use — precisely the problem we designed the Distiller to solve.

CHAPTER 3

WORKLOAD MODELS, ATTRIBUTES, AND

PERFORMANCE METRICS

This chapter provides definitions and background to help the reader understand our workload models, analysis techniques, and performance comparison metrics. Section 3.1 describes our workload model and provides several terms we use when defining specific workload attributes. Section 3.2 helps the reader understand the issues involved in evaluating disk arrays and the motivation behind several of the attributes in the Distiller’s library by discussing the relationship among the disk array, a workload, and the performance of the workload on the disk array. Section 3.3 defines the attributes that we have implemented and placed in the Distiller’s library. Finally, Section 3.4 discusses our metrics for comparing workload behavior.

3.1 Workload

This section defines our workload model and provides definitions for terms we use to define attributes.

3.1.1 Our workload model

A block-level workload for a storage system is a sequence of individual I/O requests.

Each request has four parameters:

• Operation type: A request’s operation type is either “read” or “write.”

• Location: The location parameter is a logical value that identifies the location

of the data in the storage system. In general, an I/O’s location includes both

22 Table 1: Example workload I/O workload I/O Operation Location Request Arrival Interarrival number type size time time 1 Read 1024 8192 0 NA 2 Read 9216 8192 .001 .001 3 Read 17408 1024 .003 .002 4 Write 25600 8192 .004 .001 5 Write 18432 2048 .009 .005 6 Read 20480 4096 2.66 2.57 7 Write 19456 1024 2.69 .003 8 Write 51200 8192 7.87 5.18

a device number (which identifies either a physical disk logical partition of the

storage system) and a logical address on that device. We present this pair

implicitly using one value; however, it can also be presented explicitly using

two values. Because location is a logical value, locations x and x + 1 are not

necessarily physically adjacent; they may lie on separate tracks or separate

disks.

• Request size: The request size is the number of bytes requested.

• Arrival time: The time at which a request is issued is its arrival time. Some

workloads present the interarrival time instead of the arrival time. The inter-

arrival time is the time elapsed since the previous request’s arrival time. The

choice of whether to present arrival time or interarrival time is a matter of con-

venience when using an open workload model because each set of values can be

calculated directly from the other.

The source of the requests is not relevant to this workload model. We can obtain a list of requests from a trace of a production storage system or generate the list randomly using a synthetic workload generator. Table 1 presents a sample workload and its interarrival times.
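For illustration, the following minimal Python sketch (ours, not part of the Distiller; the field names are hypothetical) shows one way to represent such a request stream and to convert between arrival and interarrival times under this model.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Request:
    op: str          # "read" or "write"
    location: int    # logical byte address
    size: int        # request size in bytes
    arrival: float   # arrival time in seconds

def interarrival_times(trace: List[Request]) -> List[float]:
    """Interarrival time of request i is arrival[i] - arrival[i-1]; the first is undefined."""
    return [trace[i].arrival - trace[i - 1].arrival for i in range(1, len(trace))]

def arrivals_from_interarrivals(interarrivals: List[float], start: float = 0.0) -> List[float]:
    """Reconstruct absolute arrival times from interarrival times (open model)."""
    arrivals = [start]
    for dt in interarrivals:
        arrivals.append(arrivals[-1] + dt)
    return arrivals
```

Because the two representations are interconvertible in this way, an open-model workload can be stored using either arrival or interarrival times, as noted above.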

Table 2: Example jump distances and modified jump distances

I/O number   Operation type   Location   Request size   Interarrival time   Jump distance   Modified jump dist.
1            Read             1024       8192           NA                  NA              NA
2            Read             9216       8192           .001                0               8192
3            Read             17408      1024           .002                0               8192
4            Write            25600      8192           .001                7168            8192
5            Write            18432      2048           .005                -15360          -7168
6            Read             20480      4096           2.57                0               2048
7            Write            19456      1024           .003                -5120           -1024
8            Write            51200      8192           5.18                30720           31744

3.1.2 Open vs. closed model

Section 3.1.1 presents an open workload model. An open model specifies the issue time of each request relative to either the beginning of the trace (arrival time) or to the issue time of the previous request (interarrival time). For this thesis, we use only open workload models.

The other common workload model is the closed model, which includes the CPU time between I/O requests issued by the same thread. In a closed model, an I/O’s issue time may depend on the completion time of a previous I/O; therefore, this model specifies an I/O’s issue time relative to the completion time of the last synchronous

I/O issued by the current thread.

In general, closed models are more accurate than open models. Consider a set of I/O requests that a single thread issues synchronously with no processing time between. Each I/O’s issue time depends upon the previous I/O’s response time. The open model does not reflect this dependency.

3.1.3 Describing workload attributes

In this section, we define several common terms that we use to describe workload attributes.

• Jump distance: The jump distance between two I/Os is the distance in the

location address space from the end of one I/O to the beginning of the next.

Figure 2: Jump distance. (Two 2048-byte requests, at locations 0B and 6144B in an 8KB address space, have a jump distance of 4096B and a modified jump distance of 6144B.)

Figure 3: Run count. (Four requests in a 10KB address space, three of which are sequential and form a run with a run count of 3.)

For example, in Table 2, the jump distance between I/Os 3 and 4 is 25600 − (17408 + 1024) = 7168. The jump distance between I/Os 4 and 5 is 18432 − (25600 + 8192) = −15360. Figure 2 provides a visual example of jump distance. (A short code sketch following this list of terms shows how these quantities can be computed.)

• Modified jump distance: The modified jump distance between two I/Os

is the distance in the location address space from the beginning of one I/O

to the beginning of the next. For example, in Table 2, the modified jump

distance between I/Os 3 and 4 is 25600 − 17408 = 8192. Figure 2 provides

a visual example of modified jump distance. (The Distiller’s search technique

occasionally studies each sequence of I/O request parameter values independent

of the other request parameters. Using modified jump distances allows us to

study a workload’s location values apart from its request sizes.)

• Run count: A run is a sequence of I/Os for which the first byte of each I/O

immediately follows the last byte of the previous I/O (i.e., a jump distance of

0). For example, I/Os 1 - 3 in Table 2 form a run with a run count of 3. I/Os

5 - 6 form a run with a run count of 2. Figure 3 provides a visual example of

run count.

• Modified run count: A modified run is a sequence of I/Os for which the

modified jump distances are all equal. For example, I/Os 1 - 4 in Table 2 form

a modified run with a modified run count of 4 because their location values

are all 8192 bytes apart. (The Distiller’s search technique requires that we

occasionally study each sequence of I/O request parameter values independent

of the other request parameters. Using modified run counts allows us to study

a workload’s location values apart from its request sizes.)

• Burstiness: Researchers consider a workload’s arrival pattern to be bursty if

there are some periods of time containing many I/O requests and other periods

of time containing very few, if any, requests. There is currently no universal

definition of burstiness. Instead, there are several attributes that capture dif-

ferent aspects of burstiness. Examples include β from the β-model (defined in

Section 3.3) and the coefficient of variation of interarrival time.

• Footprint: The set of logical addresses accessed at least once by a workload is

its footprint. The footprint of the workload in Table 1 is

[1024, 24576) ∪ [25600, 33792) ∪ [51200, 59392)

This footprint has a size of 39936 bytes. Notice that not every location accessed

is the beginning location of an I/O. For example, the first I/O accesses location

2048.
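The sketch below (ours, for illustration only) computes jump distances, modified jump distances, run counts, and the footprint size for a trace represented as (location, size) pairs; it is not the Distiller's analyzer code.

```python
def jump_distances(reqs):
    """Distance from the end of one I/O to the start of the next."""
    return [reqs[i][0] - (reqs[i - 1][0] + reqs[i - 1][1]) for i in range(1, len(reqs))]

def modified_jump_distances(reqs):
    """Distance from the start of one I/O to the start of the next."""
    return [reqs[i][0] - reqs[i - 1][0] for i in range(1, len(reqs))]

def run_counts(reqs):
    """Lengths of maximal sequences of strictly sequential I/Os (jump distance 0)."""
    counts, current = [], 1
    for jd in jump_distances(reqs):
        if jd == 0:
            current += 1
        else:
            counts.append(current)
            current = 1
    counts.append(current)
    return counts

def footprint_size(reqs):
    """Total number of distinct bytes accessed (union of [location, location+size) intervals)."""
    intervals = sorted((loc, loc + size) for loc, size in reqs)
    total = 0
    cur_start, cur_end = intervals[0]
    for start, end in intervals[1:]:
        if start > cur_end:                 # disjoint interval: close the current one
            total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)     # overlapping or adjacent: extend
    return total + (cur_end - cur_start)
```

Applied to the workload of Table 1, run_counts reports runs of 3 and 2 for I/Os 1-3 and 5-6, and footprint_size returns 39936 bytes, matching the values given above.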

3.2 Disk arrays and performance

This section discusses the relationship among the disk array, a workload, and the performance of the workload on the disk array. Understanding the components and organization of the disk array will help the reader understand which workload patterns affect behavior and help motivate the attributes in the Distiller’s library. Understand- ing how disk arrays and workloads interact also provides insight into the challenges

in generating synthetic workloads and accurately predicting their performance.

This section describes the path of a typical I/O request from the application to the disks; Figure 4 illustrates this path. First, the application sends I/O requests to the operating system, where the file system and logical volume manager (LVM) determine which logical device will service each request. The device driver then assembles the request and sends it to the disk array through a Fibre Channel link. The disk array controller checks whether its cache contains the requested data. If so, it services the request from the cache; otherwise, it identifies which back-end disk or disks contain the necessary data and obtains the data by issuing requests to those disks.

Our representative high-end disk array is the HP FC-60 [29, 36]. Each of the FC-60's two controllers contains a 256 MB battery-backed (NVRAM) cache. The controllers communicate with six disk enclosures over back-end 40 MB/sec Ultra SCSI buses, and the array contains thirty 18GB Seagate ST118202LC disks spread evenly across the enclosures. Most modern high-end disk arrays have a similar design. (Section 4.1 details exactly which configuration we use.)

Figure 4: The path of an I/O request from the application, through the operating system (file system, logical volume manager, device driver) and a Fibre Channel link, to the disk array's controller, cache, back-end SCSI buses, disk enclosures, and disks.

Each I/O can potentially be queued at any step along this path. The I/Os that

must be queued somewhere tend to have the longest response times. A long queue can develop in the host if its buffers are full. A long queue can

also develop in the disk array controller if the array receives more requests in a short amount of time than the disks can serve, or if a request requires data from a disk

that happens to be busy.

Most writes in a write-back caching environment have very low latency. Each

disk array controller has a 256MB write-back cache. Consequently, the disk array

always places the data of a write request into the cache and writes it to disk at a later

time (a process called de-staging). The FC-60’s write-back cache uses non-volatile

RAM; therefore, the data is safe, and the request is complete, as soon as the disk

array’s cache receives it. (The exception is when the write buffer is full. In this case,

the disk array immediately de-stages enough data from the cache to make room for the new request. The new write request is not complete until the “foreground” de-

staging is complete.) Requests for data stored in the cache have a very low latency

because access to memory is at least two orders of magnitude faster than access to

disk. Notice, however, that because the array must eventually de-stage written data,

the writes may contribute to the queuing delays of future requests.

If the cache does not contain the requested data, the disk array controller must

determine which disk or disks contain the data and issue a request for it. Identifying

the physical location of data is not a trivial task. A disk array’s individual disks are

organized into logical units. Each logical unit (LU) implements a RAID redundancy

group and often appears to the user as a single storage device. The disk array stripes

data over the disks that comprise a logical unit to allow several disks to serve the

request in parallel. In addition, the disk array may mirror (i.e., copy) data onto two or more disks to provide fault tolerance.

The logical unit to which a system administrator assigns a set of data (e.g., a file

system, or a database table) has a large effect on performance. For example, two

database tables that are often accessed together should be assigned to different LUs

(to avoid contention), whereas two tables that are almost never accessed together

should share a single LU (so that the disks are better utilized). The problem of

configuring disk arrays for optimal utilization and throughput is open and currently under study [2, 3, 4].

The disk array controllers may make several optimizations when servicing requests.

First, they can carefully order requests to the disk drives to reduce bus contention and minimize seek and queuing time. Similarly, they can carefully schedule the de- staging of dirty cache data so that the data is written when the disk heads pass over the data’s physical location on their way to read other data. Also, the disk array can prefetch data — request the next several kilobytes of data in anticipation of the user requesting that data soon. These optimizations make assumptions about patterns in the workload (for example, that data tends to be requested sequentially). Workloads that exhibit the expected patterns tend to have lower request times than those that do not.

The individual disks themselves can perform many of the same optimizations as the disk array controllers. Many modern disk drives have caches, reorder logic, write- behind, and prefetch capabilities. The interactions of a workload and a single disk have been extensively studied and are fairly well understood [46, 48]. Unfortunately, placing many disks together in a single device complicates the interactions [40].

In order to specify an accurate synthetic workload, the attributes in the Distiller’s library must capture the patterns in the workload that stress each of the components along the I/O path. For example, the resulting synthetic workload must have the same number of cache hits as the target workload, must cause queues to build in the same way, and must cause the disk heads to spend similar amounts of time seeking and reading data. These factors dominate the behavior of modern disk arrays; however,

with the advent of new technology (such as MEMS-based storage devices [52]), other factors may play a prominent role in disk array behavior.

3.3 Attributes and attribute-values

An attribute is a metric used to measure a workload characteristic (e.g., mean request size, read percentage, or distribution of location value.) An attribute-value is an

attribute paired with the measurement itself for a specific workload (e.g., a mean request size of 8KB, or a read percentage of 68%). If one views an attribute as a function, f, then an attribute-value is the pair (f, f(x)) for some workload trace x.

An analyzer is an algorithm that computes an attribute’s value for a given workload.

Attributes describe a measurement of only the workload itself; they do not measure the response of the underlying storage system when subjected to the workload. For example, “mean response time” is not a valid attribute. Furthermore, attributes must be fully defined. “Locality” and “burstiness” are not valid attributes because there are many different ways to quantify locality and burstiness. In contrast, β from the

β-model is a valid attribute.

The remainder of this section defines the attributes that we have implemented and placed in the Distiller’s library, discusses how we organize those attributes within the

library, and explains how we define the size of an attribute-value.

3.3.1 Distributions

The Distiller’s library contains two types of empirical distributions: Joint distribu-

tions and conditional distributions.

Empirical distribution: We use the term empirical distribution to refer to the

actual, explicit distribution of values for some I/O request parameter. Figure 5 shows

the example workload’s distribution of request size.

We use histograms to present empirical distributions. A histogram partitions

a parameter’s range into subsets (called bins) and lists the number of I/Os that

Figure 5: Distribution of request size for example workload. (The example workload's request sizes of 8, 8, 1, 8, 2, 4, 1, and 8 KB yield counts of 2, 1, 0, 1, 0, 0, 0, and 4 for sizes 1KB through 8KB.)

Figure 6: Histogram of request size for example workload. (With 2KB-wide bins, the counts are 3, 1, 0, and 4 for the bins [1KB-2KB], [3KB-4KB], [5KB-6KB], and [7KB-8KB].)

correspond to each bin. Figure 6 shows a histogram of request size for the example

workload for which each bin is 2KB wide. Notice that the sum of all bin values is

equal to the number of I/Os measured.

The number of bins in a histogram determines its precision. If there are fewer bins

than the number of unique values in the parameter’s range, then the histogram does

not present the distribution exactly. Figure 6 shows that three I/Os have request sizes of at most 2KB; however, we cannot determine precisely how many I/Os have

1KB request sizes and how many have 2KB request sizes. For operation type and request size, having one bin for each potential value is not problematic; however, for interarrival time and location, we must often limit the number of bins to make the histogram smaller than the workload trace itself.
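A minimal illustrative sketch of such a histogram analyzer (ours, not the Distiller's implementation) appears below; it maps each value into a fixed-width bin and counts the I/Os per bin.

```python
import math

def histogram(values, bin_width, num_bins):
    """Count values per bin, where bin i covers (i*bin_width, (i+1)*bin_width].

    This matches Figure 6, where the bin labeled [1KB-2KB] holds both 1KB and
    2KB requests. Values beyond the last bin are clamped into it, so the bin
    counts always sum to len(values).
    """
    bins = [0] * num_bins
    for v in values:
        idx = min(math.ceil(v / bin_width) - 1, num_bins - 1)
        bins[idx] += 1
    return bins

sizes_kb = [8, 8, 1, 8, 2, 4, 1, 8]                    # request sizes of the example workload
print(histogram(sizes_kb, bin_width=2, num_bins=4))    # -> [3, 1, 0, 4], as in Figure 6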

Joint distribution: In addition to measuring the distribution of values for a single request parameter, we can also measure the empirical distribution of tuples

Table 3: Joint distribution of operation type and request size

            Request size
Op. type    1KB  2KB  3KB  4KB  5KB  6KB  7KB  8KB
Read        1    0    0    1    0    0    0    2
Write       1    1    0    0    0    0    0    2

Figure 7: Conditional distribution of request size based on operation type. (For a 13-I/O example, the distribution for reads has counts 2, 5, 0, 0 and the distribution for writes has counts 0, 0, 2, 4 for sizes 1KB through 4KB.)

of I/O request parameters, called joint distributions. Table 3 shows the example

workload’s joint distribution of (operation type, request) pairs.

Conditional distribution: Instead of measuring a request parameter’s distri- bution of values over all of a workload’s requests, we can measure the distribution of values for only those I/Os that meet a specified condition. We call the parameter being measured the dependent parameter; we call the parameter (or parameters) on which the condition is based the independent parameter. For example, we can cal- culate separate distributions of request size for read requests and write requests — each distribution of request size is conditioned upon the operation type. In this case

(shown in Figure 7), operation type is the independent parameter and request size is the dependent parameter. Similarly, we can partition the range of request sizes into states, then calculate separate distributions of request size based on the state of the previous request size. In this case, request size is both the dependent and independent

Figure 8: Conditional distribution of request size based on previous request size. (For a 13-I/O example with request sizes between 1KB and 4KB, separate request-size distributions are shown for requests whose previous request falls in state [1KB,2KB] and for requests whose previous request falls in state [3KB,4KB].)

parameter. Figure 8 shows an example with states [1KB, 2KB] and [3KB, 4KB].

We can define the states on which we base a conditional distribution in an arbitrary

manner. One obvious technique is to assign each value in the independent parameter’s range to a unique state. This technique works well for operation type because there

are only two states (read and write); however, it is rarely used to study interarrival

time or location because nearly every I/O may correspond to a unique state.

Unless we specify otherwise, we divide a parameter’s range into states according

to percentiles. For example, we partition the range of location values into four states

as follows:

• State 0: locations below the 25th percentile

• State 1: locations between the 25th and 50th percentiles

• State 2: locations between the 50th and 75th percentiles

• State 3: locations above the 75th percentile

This method produces a set of states for which an equal number of I/Os correspond

to each state. When applying this method to request size in the example workload,

we define two states as follows: [0, 7KB] and [8KB]. Each state contains exactly four of the eight I/Os. After testing several different techniques, we found this technique

to be most useful.
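A small sketch of this percentile-based partitioning (ours, for illustration; the helper names are hypothetical) is shown below.

```python
from bisect import bisect_left

def percentile_state_bounds(values, s):
    """Boundaries that split the observed values into s states of roughly equal
    population; state i holds values <= bounds[i] (the last state holds the rest)."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // s - 1] for i in range(1, s)]

# Request sizes of the example workload, split into two states:
sizes = [8192, 8192, 1024, 8192, 2048, 4096, 1024, 8192]
bounds = percentile_state_bounds(sizes, 2)             # -> [4096]
states = [bisect_left(bounds, v) for v in sizes]       # 0 for sizes <= 4096, else 1
```

For the example workload this yields the same four-and-four split as the states [0, 7KB] and [8KB] described above; the exact boundary value differs, but the partition of I/Os is identical.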

As with empirical distributions, we can measure joint distributions (i.e., the de-

pendent parameter can be a tuple of parameter values). Similarly, we can define the independent parameter to be a tuple of parameter values comprising values for different parameters of the same I/O request (e.g., (operation type, location)), values from successive I/O requests (e.g., (previous location value, current location value)), or both.

When specifying a conditional distribution that uses a tuple for the independent parameter, one must be mindful of the total number of distributions because it grows

multiplicatively with the number of states and dimension of the tuple. For example, using (location, request size) as an independent parameter, where location and request size each have 10 states, defines 10 · 10 = 100 distributions. Using the previous four interarrival times as an independent parameter with interarrival time divided into 10 states defines 10^4 = 10000 distributions.

We found that, in practice, many useful attributes fit a single template based on the conditional distribution. This template requires four parameters:

1. the independent parameter, i: the request parameter(s)1 on which the states are

based;

2. the dependent parameter, d: the request parameter(s)1 being measured;

3. the history h: the number of I/Os considered; and,

4. the number of states, s, into which the range of the independent parameter is

divided.

1The dependent and independent parameters can be tuples of I/O request parameters, or even other attributes (such as jump distance or run count).

This template specifies a set of s^h distributions based on the h most recent in-

dependent parameters. In most cases, h refers to the current I/O and the h − 1

previous I/Os; however, if the dependent and independent parameters are identical

(e.g., distributions of request size based on the previous request size), h refers to the number of previous I/Os. (It does not make sense to define a distribution of request

size based on the current request size.)

Henceforth, we will use CD(i, d, h, s) to specify a conditional distribution. For

example: CD(operation type, location, 1, 2) specifies a conditional distribution in

which operation type is the independent parameter (i), location is the dependent

parameter (d), the states are based on only the current I/O (h = 1), and operation

type has two states (s = 2) (read and write). Thus, this attribute presents separate

location distributions for read requests and write requests. CD(interarrival time, in-

terarrival time, 4, 10) specifies a conditional distribution that presents 10000 separate

distributions of interarrival time based on the interarrival times of the previous four

requests.
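A minimal sketch of this template (ours, not the Distiller's implementation; it reuses the percentile-style state boundaries illustrated earlier) is shown below for CD(operation type, request size, 1, 2).

```python
from bisect import bisect_left
from collections import defaultdict

def conditional_distribution(ind_values, dep_values, h, bounds):
    """Sketch of the CD(i, d, h, s) template: for each tuple of the states of the
    h most recent independent-parameter values, build a histogram (value -> count)
    of the dependent parameter.  bounds defines the s states (state k holds
    independent values <= bounds[k])."""
    dists = defaultdict(lambda: defaultdict(int))
    for n in range(h - 1, len(dep_values)):
        key = tuple(bisect_left(bounds, ind_values[n - j]) for j in range(h))
        dists[key][dep_values[n]] += 1
    return dists

# Separate request-size histograms for reads (state 0) and writes (state 1),
# using the example workload of Table 1.
ops   = [0, 0, 0, 1, 1, 0, 1, 1]                        # 0 = read, 1 = write
sizes = [8192, 8192, 1024, 8192, 2048, 4096, 1024, 8192]
dists = conditional_distribution(ops, sizes, h=1, bounds=[0])
# dists[(0,)] counts read request sizes and dists[(1,)] write request sizes;
# the counts match the Read and Write rows of Table 3.
```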

Our conditional distribution template also has two additional parameters for which we always use default values. First, we use only the aforementioned “percentile”

method to define the s states. Second, the conditional distributions have a default

number of bins based on the dependent parameter:

• The histogram for operation type always has two bins: read and write.

• The histogram for request size always has 128 bins: one for each legal request

size.

• The histogram for interarrival time always has 1024 bins. We use a log histogram

in which the bin widths grow exponentially as time increases.

• Finally, the histograms for location contain one bin for each unique location

Table 4: State transition matrix for operation type

History = 1:
                Current state
Prev. state     Read    Write
Read            .5      .5
Write           .333    .667

History = 2:
                 Current state
Prev. states     Read    Write
Read, Read       .5      .5
Read, Write      0       1
Write, Read      0       1
Write, Write     1       0

value. Because our workloads have address spaces that range from a few giga-

bytes to hundreds of gigabytes, histograms of locations tend to be quite large.

However, we found that using a histogram of location with a fixed number of

bins produces very inaccurate synthetic workloads. Chapter 7.3 explores the

tradeoff between the size of the location histograms and the accuracy of the re-

sulting synthetic workload. In addition, we are researching new, more compact

techniques for accurately representing a workload’s distribution of location.

3.3.2 State transition matrix

A state transition matrix takes a partition of an I/O parameter’s range into states and, for each pair of states (x, y), lists the probability that an I/O request with a value corresponding to state x is followed by a request with a value corresponding to state y. Table 4 shows the example workload’s transition matrix of operation type.

We see that the probability an I/O request is a read is 50% if the previous request was a read, and 33% if the previous request was a write.

We can define the states for a state transition matrix to be tuples of values and will specify the tuples using the same conventions we use to specify conditional dis- tributions: h specifies the number of I/Os considered, and s specifies the number of

regions into which we partition the independent parameter’s range. Table 4 shows the example workload’s transition matrix of operation type when h = 2.
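A small sketch of a transition-matrix analyzer (ours, illustrative only) follows; applied to the example workload's operation types, it reproduces the History = 1 values of Table 4.

```python
from collections import Counter, defaultdict

def transition_matrix(states, h=1):
    """Estimate P(next state | previous h states) from a sequence of states."""
    counts = defaultdict(Counter)
    for n in range(h, len(states)):
        counts[tuple(states[n - h:n])][states[n]] += 1
    return {prev: {s: c / sum(ctr.values()) for s, c in ctr.items()}
            for prev, ctr in counts.items()}

# Operation types of the example workload (Table 1):
ops = ["R", "R", "R", "W", "W", "R", "W", "W"]
print(transition_matrix(ops, h=1))
# approximately {('R',): {'R': .5, 'W': .5}, ('W',): {'W': .667, 'R': .333}} (cf. Table 4)
```

Passing h=2 to the same function yields the four-row History = 2 matrix of Table 4.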

Interarrival time with self-transition count: Given a division of the range of interarrival times into states, Interarrival time with self-transition count (IAT STC) measures the lengths of sequences of I/O requests whose interarrival times all fall within the same state. This attribute also measures the matrix of transitions between regions so that the synthetic workload can maintain the same transitions between sequences. Figure 9 provides an example.

Figure 9: Interarrival time run count. (Twelve example interarrival times are divided into the states [.001,.01), [.01,.1), and [.1,1), producing sequences of lengths 4, 3, 1, 3, and 1; the figure also shows the matrix of transitions counted between sequences only.)

Operation type with self-transition count: Operation type with self-transition count (Operation type STC) measures the lengths of sequences of I/O requests that are all reads and the lengths of sequences that are all writes, then presents separate distributions of sequence length for reads and writes. This attribute is similar to

Interarrival time STC, except there are only two states; therefore, the transitions are always from read to write and write to read.

3.3.3 Jump distance

The jump distance attribute presents the empirically observed distribution of jump distances between adjacent I/Os. Jump distance can also serve as either a dependent or independent parameter for conditional distributions.

We found that, in practice, a synthetic workload that maintained a given distribution of jump distance was inaccurate unless it also maintained the target workload's

Table 5: Jump distance within state

Jump distances:
Location   Size   State   JDWS   mJDWS
0KB        1KB    1       NA     NA
3KB        1KB    1       2      3
5KB        1KB    1       1      2
10KB       1KB    2       NA     NA
11KB       1KB    2       0      1
3KB        1KB    1       -3     -2
17KB       1KB    2       5      6
0KB        1KB    1       -4     -3
19KB       1KB    2       1      2
1KB        1KB    1       0      1

Transition matrix (h = 1):
           State 1   State 2
State 1    .4        .6
State 2    .75       .25

Transition matrix (h = 2):
              State 1   State 2
State (1,1)   .5        .5
State (1,2)   .67       .33
State (2,1)   0         1
State (2,2)   1         0

distribution of location. Therefore, when presenting a distribution of jump distance, we also present the distribution of location values.

Figure 10: Jump distance within state. (The figure plots the requests of Table 5 in a 20KB address space over time, labeling the jump distances within State 1 and State 2.)

Modified jump distance: The modified jump distance attribute presents the empirically observed distribution of modified jump distances between adjacent I/Os.

Modified jump distance can serve as either a dependent or independent parameter for conditional distributions. As with jump distance, when presenting a distribution of modified jump distance, we also present the distribution of location.

Jump distance within state: The jump distance within state (JDWS) attribute

presents the distribution of jump distances between the beginning of the current request and the end of the most recent request corresponding to the same state as the current request. Unless specified otherwise, we always base the states on location.

Table 5 and Figure 10 provide an example in which all request sizes are 1KB and the

states are [0, 9KB] and [10KB, 19KB].

We present the distribution of jump distance for each state, the matrix of tran-

sitions between states, and the distribution of location. When referring to a jump

distance within state attribute, we will give the history (h) and number of states (s).

The history specifies how many previous locations we used to specify the states for

the transition matrix. The amount of history affects the transition matrix only. A

jump distance within state attribute with parameters (h = 2, s = 4) will have four

distributions of jump distance; and, the transition matrix will have 4^2 = 16 states.
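The following sketch (ours, not the Distiller's analyzer) illustrates the core of the JDWS computation: for each request, it records the distance from the end of the most recent earlier request in the same location-based state.

```python
from bisect import bisect_left
from collections import defaultdict

def jdws(reqs, bounds):
    """Jump distance within state: for each (location, size) request, the distance
    from the end of the most recent earlier request whose location falls in the
    same state (state k holds locations <= bounds[k])."""
    last_end = {}                       # state -> end address of its last request
    per_state = defaultdict(list)
    for loc, size in reqs:
        st = bisect_left(bounds, loc)
        if st in last_end:
            per_state[st].append(loc - last_end[st])
        last_end[st] = loc + size
    return per_state

# The example of Table 5 / Figure 10: 1KB requests, states [0, 9KB] and above.
kb = 1024
reqs = [(0, kb), (3*kb, kb), (5*kb, kb), (10*kb, kb), (11*kb, kb),
        (3*kb, kb), (17*kb, kb), (0, kb), (19*kb, kb), (1*kb, kb)]
print(jdws(reqs, bounds=[9*kb]))        # per-state jump distances, as in Table 5
```

A full JDWS attribute would also record the state transition matrix and the distribution of location, as described above.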

Jump distance within state (RW): This attribute is similar to jump distance

within state, except read requests and writes requests are analyzed separately (i.e.,

the states are (operation type, location) pairs). As with jump distance within state,

s refers to the number of regions into which the address space is divided. Thus, a

jump distance within state (RW) attribute with s = 4 will have four location regions

but 4 ∗ 2 = 8 distributions of jump distance (four for reads and four for writes).

Because states are (operation type, location) pairs, the state transition matrix

implicitly defines a Markov model of operation type. When h > 1 then the states

for the transition matrix include the h most recent I/Os. Thus, when h = 3 and

s = 4, there are (4 ∗ 2)^3 = 256 states in the transition matrix. Should the user choose

not to generate operation type using the implicit Markov model, she can set the pol

(“preserve operation list”) flag. See Appendix A for more details.

Modified jump distance within state and Modified jump distance within

state (RW): These attributes are similar to jump distance within state and jump

distance within state (RW), except they use modified jump distances (i.e., calculate jump distance by subtracting the location values of two successive requests). Table 5 and Figure 11 provide an example. When presenting modified jump distance within state (mJDWS), we also present the appropriate distribution or distributions of location.

Figure 11: Modified jump distance within state. (The figure plots the requests of Table 5 in a 20KB address space over time, labeling the modified jump distances within State 1 and State 2.)

3.3.4 Run count

The run count attribute presents the empirically observed distribution of run counts

(i.e., number of successive I/Os with zero jump distances). Notice a workload may contain fewer runs than I/Os because each run comprises several I/Os. As with jump distance, run count can serve as either a dependent or independent parameter for conditional distributions.

To generate a workload with a given distribution of run count, the generation technique must (1) choose a location for the heads of the runs, and (2) choose the run count for the current run. Therefore, when presenting a run count attribute, we also include an attribute that describes locations at the head of the runs (distribution, conditional distribution, jump distance, etc.).

Figure 12: Run count within state. (The figure plots the sequence of requests described below in a 20KB address space over time; the requests within State 1 form one run and the requests within State 2 form another.)

Modified run count: The modified run count attribute presents the empirically observed distribution of modified run counts. As with run count, when presenting a distribution of modified run counts, we also include an attribute that describes locations at the head of the runs.

Run count within state: The run count attribute considers only runs of strictly sequential I/Os. The run count within state (RCWS) analyzer considers the previous location in the same state when determining the length of a run. For example, consider the following sequence of locations: 0KB, 1KB, 2KB, 10KB, 11KB, 3KB,

12KB, 4KB, 13KB, 14KB, 5KB. Assume each request size is 1KB, and define states to be [0, 9KB] and [10KB, 19KB]. Run count within state defines two runs: 0KB, 1KB,

2KB, 3KB, 4KB, 5KB; and 10KB, 11KB, 12KB, 13KB, 14KB. Figure 12 highlights these two different runs. Notice that for run count within state to work as intended, the runs must lie entirely within different states.

The presentation of run count within state includes a separate histogram for each state and the state transition matrix. We specify history and states using the same conventions we used for jump distance within state. As with run count, we also include an attribute that describes locations at the head of the runs.

Figure 13: Interleaved runs. (Twelve requests in a 12KB address space form three interleaved runs.)

Run count within state (RW): This attribute is like run count within state, except reads and writes are analyzed separately. Like jump distance within state

(RW), run count within state (RW) implicitly contains a Markov model of operation type. When presenting run count within state (RW), we also present the distribution of read locations and the distribution of write locations at the head of runs.

Modified run count within state and Modified run count within state

(RW): These attributes are like run count within state and run count within state

(RW), except they present modified run counts instead of run counts.

Interleaved runs: We define an interleaved run (IR) to be a set of I/Os that form a run when considered apart from the rest of the workload. An interleaved run is like a run from run count within state; however, the user does not explicitly define the states. When analyzing a synthetic workload, if the current I/O does not continue any current interleaved run, a new run is created. The analyzer assigns each

I/O to exactly one interleaved run. (In the case that an I/O can reasonably belong to more than one interleaved run, the analyzer assigns it to the most recently accessed interleaved run.)

Figure 13 shows several I/Os and the interleaved run to which they are assigned.

Notice also that I/O number 12 is assigned to interleaved run 1, even though I/O number 1 also has location 15 and was assigned to interleaved run 2.

For each interleaved run, this attribute reports the beginning location, number of

I/Os, starting I/O number, and ending I/O number. In the degenerate case, each

I/O forms a separate interleaved run; therefore, this attribute is most useful when studying workloads with a small number of large interleaved runs.

We can configure the attribute to consider interleaved runs with small gaps (jump distances between component I/Os that are greater than 0). We call this gap the

“maximum gap.”
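A greedy sketch of this analyzer (ours, for illustration; the run record fields are hypothetical) follows: each I/O extends the most recently accessed interleaved run that it continues within the maximum gap, and otherwise starts a new run.

```python
def interleaved_runs(reqs, max_gap=0):
    """Greedily assign each (location, size) request to an interleaved run."""
    runs = []   # each run records start location, current end, I/O count, first/last I/O numbers
    for i, (loc, size) in enumerate(reqs):
        candidates = [r for r in runs if 0 <= loc - r["end"] <= max_gap]
        if candidates:
            run = max(candidates, key=lambda r: r["last_io"])   # most recently accessed run
        else:
            run = {"start": loc, "end": loc, "ios": 0, "first_io": i, "last_io": i}
            runs.append(run)
        run["end"] = loc + size
        run["ios"] += 1
        run["last_io"] = i
    return runs
```

With max_gap=0 this degenerates to one run per I/O for non-sequential workloads, which is why, as noted above, the attribute is most useful when a workload contains a small number of large interleaved runs.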

3.3.5 Burstiness

The Distiller’s library includes two attributes that quantify burstiness: the β-model, and IAT clustering.

β-model: The β model quantifies the amount of burstiness in a workload's arrival pattern using β, a parameter that is related to the slope of the entropy plot [56]. (See

Section 2.3.2.) We found that many of our test workloads are self-similar only when

analyzed a few seconds at a time. Therefore, the implementation of the β-model in

the Distiller’s library allows the user to specify a window size. We then analyze each

window separately, as if it were a separate trace. We also allow the user to specify

the number of aggregation levels in the entropy plot by specifying the length of the

smallest interval. For example, specifying an interval length of .01s and a window size

of 5.12s creates 512 intervals per window and log2 512 = 9 possible aggregation levels. Finally, the generation technique requires an additional parameter called “sensitivity.”

This parameter specifies how many times the β-model’s generation technique will

recursively distribute the I/Os. Henceforth, when referring to the β model, we will also provide the three parameters used: window size, interval length, and sensitivity

(e.g., β-model(ws = 5.12, il = .01, s = 10^−5)).

IAT clustering: Hong and Madhyastha implemented a technique for choosing

representative clusters of interarrival times [32, 33]. (See Section 2.3.2.) When using

this attribute, the user must provide three parameters: the interval length (ws), the

bin size used to calculate the burstiness (bs), and the desired percentage of intervals that will be cluster representatives (c).

Distribution clustering: We apply Hong and Madhyastha's interarrival time clustering technique to the workload's distribution of location values. Specifically, we divide the workload's address space into intervals (we experimented with interval lengths from 8MB to 128MB), and characterize each interval using the number of requests, aggregation ratio (the ratio of non-empty sectors to empty sectors), and entropy (using a similar technique to the β-model). As with IAT clustering, we then group the intervals into clusters using an agglomerative hierarchical clustering technique and choose one representative interval for each cluster.

This attribute allows us to reduce the amount of data required to precisely specify the distribution of location. Unfortunately, we completed this attribute and added it to the Distiller's library during the later stages of this thesis work; therefore, it was available only while performing the experiments presented in Chapter 7.

3.3.6 Attribute groups

We organize our library of attributes by partitioning them into groups based on the

I/O request parameters considered when computing the attribute’s value. Formally,

we define an attribute group to be a set of attributes whose values are calculated

using the same set of I/O request parameters. For example, the {interarrival time}

attribute group contains those attributes that measure only the interarrival times of

different requests (distribution of interarrival time, β-model, etc.). The {operation

type, location} attribute group contains those attributes that measure the relationship

between requests' locations and operation types (modified JDWS, CD(operation type, location, 1, 2), etc.). All non-empty subsets of {operation type, location, request size, interarrival time} define fifteen attribute groups. By definition, each attribute is a member

Table 6: Groups of candidate attributes

Attribute group                    Attribute
Any                                Empirical distribution
                                   Implicit distribution
                                   Conditional distribution
{operation type}                   Operation type STC
{interarrival time}                β-model
                                   IAT clustering
                                   Interarrival time STC
{location}                         Modified jump distance
                                   Modified jump distance within state
                                   Modified run count
                                   Modified run count within state
                                   Interleaved runs
{location, request size}           Jump distance
                                   Jump distance within state
                                   Run count
                                   Run count within state
{operation type, location}         Modified jump distance within state (RW)
                                   Modified run count within state (RW)
{op. type, location, req. size}    Jump distance within state (RW)
                                   Run count within state (RW)

of exactly one attribute group.

Table 6 lists the attributes we have implemented and their attribute groups. Em-

pirical and conditional distributions apply to all attribute groups. Table 6 does not

explicitly list those attribute groups containing only empirical distributions and con- ditional distributions.

Attributes in the same group quantify the same qualitative workload properties.

For example, {interarrival time} attributes all quantify the workload’s burstiness

(or lack thereof). Attributes in the {location} attribute group all quantify locality

(spatial and/or temporal). We will see in Section 5.3 how the Distiller leverages this partitioning of attributes by qualitative workload properties.

3.3.7 Attribute size

An attribute’s size is the number of bytes needed to represent its value. For example,

we can represent a workload’s read percentage (i.e., distribution of operation type) using only a few bytes (the size of one floating point value). Similarly, the size of a histogram of request size is approximately 1024 bytes — eight bytes for each of

128 1KB bins. In general, however, the precise size of an attribute depends on its configuration and the method (e.g., text or binary) used to represent the data.

The size of most attributes depends on their configuration. For example, the size of a distribution depends on the number of bins in the histogram used to represent the distribution. The size of a state transition matrix depends on the number of states.

The size of the β-model depends on the number of windows analyzed.

The method used to represent the data also affects the attribute's size. For ex- ample, the precise size of a workload's read percentage or distribution of request size depends on the number of bits used to represent a floating-point value. Similarly, we could increase the size of these attributes by using ASCII characters instead of a binary format.

Some attributes can have many different reasonable representations. For example, when presenting a histogram, we can list the number of items in each bin, or we can

list the bin number and value for non-empty bins only. The latter representation is

useful for presenting distributions of location that have many empty bins. In this case, the workload also affects the size of the attribute because different workloads have different sets of location values.

Determining a minimal representation of a given attribute is not practical. There- fore, for this thesis, we approximate the size of an attribute by compressing its rep-

resentation using bzip2. When comparing the size of an attribute or set of attributes to the size of a workload trace, we also compress the workload trace using bzip2.
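As a concrete illustration (ours; the object serialized here is a toy example), the approximation can be computed with Python's bz2 module standing in for the bzip2 tool.

```python
import bz2
import json

def approx_size(attribute_value) -> int:
    """Approximate the size of an attribute-value (any JSON-serializable object,
    such as a histogram) as the length of its bzip2-compressed representation."""
    text = json.dumps(attribute_value, sort_keys=True)
    return len(bz2.compress(text.encode("utf-8")))

# e.g., a small request-size histogram:
histogram = {"1KB": 3, "4KB": 1, "8KB": 4}
print(approx_size(histogram))
```

The same compression is applied to the workload trace itself when comparing attribute sizes against trace sizes, as described above.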

We now describe the approximate size of the attributes in our library. Unless

specified otherwise: b is defined to be the number of bins used to present a distribution, s is the total number of states into which the independent parameter(s) is divided, h is the history (i.e., number of recent I/Os considered), and T is the number of I/Os in the workload trace.

• Empirical distribution: O(b).

• Conditional distribution: O(s^h b). However, the size could be much smaller

if many of the defined distributions do not contain any I/Os.

• State transition matrix: O(s^(2h)).

• Jump distance, modified jump distance: O(b).

• JDWS, modified JDWS: O(sb) + O(s^(2h)).

• JDWS (RW), modified JDWS (RW): O(sb) + O(s^(2h)).

• Run Count, modified run count: O(b).

• RCWS, modified RCWS: O(sb) + O(s^2).

• RCWS (RW), modified RCWS (RW): O(sb) + O(s^h).

• Operation type run count: O(b).

• Interarrival time run count: O(sb) + O(s2h).

• Interleaved runs: O(r), where r is the number of interleaved runs found.

• β-model: O(w), where w is the number of windows chosen (i.e., number of

separate intervals analyzed).

• IAT clustering: O(T ).

3.4 Evaluation metrics

This section discusses the metrics we use to compare the behavior of two workloads.

Our goal is for the synthetic and target workloads to have similar behavior. We can consider many different behaviors (e.g., response time, throughput, power consump- tion); and can compare some behaviors with many different metrics. The Distiller’s

design is independent of the specific behavior and similarity measure chosen.

For this thesis, we study only one behavior: response time — the metric of primary interest to most researchers. Throughput is also important; however, because we are

using an open workload model, the throughput of almost all synthetic workloads will be similar to that of the target workload. The most notable exception will be for

those workloads that nearly saturate the storage system.

Section 3.4.1 defines four metrics with which we quantify differences in workload performance. These comparison metrics are used both “externally” by the user to

evaluate the quality of the Distiller’s answers and “internally” by the Distiller to auto-

matically choose attributes. Section 3.4.2 discusses the various sources of differences

in workload performance.

3.4.1 Evaluation criteria

We consider four metrics for comparing workload performance: (1) mean response

time, (2) root-mean-square, (3) log area, and (4) a hybrid of log area and the mean

response time. For each of these metrics, we define a demerit figure. The demerit

figure is a normalized representation of the respective similarity metric. In each case, a

larger demerit figure indicates less similarity between the workloads evaluated. Their

prominence in the literature makes the mean response time and root-mean-square metrics de-facto standards [21, 24, 25, 32, 46, 47]. Despite their prominence, these metrics have limitations. The log area and hybrid metrics are our attempts to address

these limitations.

Figure 14: Calculation of RMS.

Mean response time: The mean response time (MRT) is the simplest response

time metric. However, two workloads with very different behaviors can have the same

mean response time. For example, one workload’s I/Os may all have 10ms response

times, while 99% of a second workload’s I/Os may have .1 ms response times and

1% of the I/Os have 1s response times. The user will notice a delay in the second workload but not in the first. We will present the difference in mean response times

(henceforth called the mean response time demerit figure) as the absolute value of the percent difference between the two response times.

100 ∗ (2 ∗ |response time1 − response time2|) / (response time1 + response time2)

RMS: Our primary metric is the root-mean-square (RMS) of the horizontal dis-

tance between the workloads’ cumulative distribution functions (CDFs) of response

time. RMS is similar to the average horizontal distance between the two CDFs;

however, when computing the RMS, we square the horizontal distances to empha-

size larger differences. Specifically, we take the horizontal difference between the two

CDFs at each percentile, sum the squares of those differences, then take the square

49 root of that sum. Figure 14 illustrates the values that we square and sum when com- puting RMS. Appendix B contains pseudo-code for an algorithm that computes the

RMS demerit of two distributions. The RMS demerit figure will be the RMS metric divided by the target workload’s mean response time.

The RMS metric is popular because it emphasizes those differences that the user is most likely to notice. Because a CDF’s top percentiles represent the number of high-latency I/Os, differences between the top percentiles of two CDFs will make the largest contribution to the RMS demerit. These high-latency I/Os are precisely the

I/Os that the user is most likely to notice.

In many cases, it is useful to measure error from the “user’s perspective”; however, this metric does not always reflect all differences in disk array behavior. For example, because the metric sums horizontal differences between CDFs, two workloads can have a small RMS demerit, but have very different cache behavior.
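The pseudo-code in Appendix B is authoritative; the sketch below is our own simplified rendering of the two demerit figures described so far, sampling each response-time CDF at whole percentiles.

```python
import math

def mrt_demerit(target_rts, synthetic_rts):
    """Mean response time demerit: absolute percent difference of the two means."""
    m1 = sum(target_rts) / len(target_rts)
    m2 = sum(synthetic_rts) / len(synthetic_rts)
    return 100 * 2 * abs(m1 - m2) / (m1 + m2)

def rms_demerit(target_rts, synthetic_rts, percentiles=100):
    """Square root of the summed squared horizontal distances between the two
    response-time CDFs, sampled at each percentile, normalized by the target
    workload's mean response time."""
    t, s = sorted(target_rts), sorted(synthetic_rts)
    total = 0.0
    for p in range(1, percentiles):
        q = p / percentiles
        t_q = t[int(q * (len(t) - 1))]      # target response time at percentile q
        s_q = s[int(q * (len(s) - 1))]      # synthetic response time at percentile q
        total += (t_q - s_q) ** 2
    return math.sqrt(total) / (sum(target_rts) / len(target_rts))
```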

Log area: To address the limitations of the RMS metric, we developed a similar metric called log area. Instead of measuring the absolute horizontal distance, it mea- sures the relative difference between distributions. This measurement is similar to the area between two CDFs when plotted on a log scale. The difference between 2ms and

4ms and the difference between 200 and 400 seconds affect the log area value equally.

As a result, differences between both low-latency and high-latency I/Os contribute equally. Appendix B contains pseudo-code for an algorithm that computes the log area demerit of two distributions. Section 7.1 examines the relative merits of the RMS and log area demerit figures.

The minimum log area value is 1.0. We will present the log area demerit figure as a percentage — specifically, 100 ∗ (log area − 1).

Hybrid: Some workloads can have small log area demerit, but a very large mean

response time demerit. Having similar mean response times can be considered a nec-

essary but not sufficient condition for declaring two workloads to be similar; therefore,

50 we define the hybrid demerit figure as the larger of the log area and mean response

time demerits. Section 7.1 examines the merits of the hybrid demerit figure.

3.4.2 Error

A synthetic workload that perfectly represents the target workload trace has a demerit of 0%. However, due to various experimental errors, it is difficult to achieve identi- cal performance. Ganger distinguishes between synthesis error, due to the different synthesis techniques, and randomness error, the error of a single synthesis technique using different random seeds [21]. When issuing I/O requests to a real storage system

(as opposed to a simulator) we may also experience replay error: the experimental error due to non-determinism in the disk array and host operating system.

We used the technique presented in [21] to compute replay and randomness error.

To compute replay error, we issue the workload to the storage system under test ten times. We then combine the CDF of response time for each replay into a single CDF representing the “average” CDF over all replays. Finally, we compute the demerit for each replay relative to the “average” CDF. The replay error is the mean of these demerits.
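The following sketch (our rendering of the procedure just described, not the code of [21]) approximates the "average" CDF by pooling the response times of all replays, which is equivalent to averaging the empirical CDFs when the replays contain equal numbers of I/Os.

```python
def replay_error(replay_rts, demerit):
    """replay_rts: one list of response times per replay of the same workload.
    demerit:    a function comparing two response-time samples (e.g., rms_demerit).
    Returns the mean demerit of each replay against the pooled 'average' CDF."""
    pooled = [rt for replay in replay_rts for rt in replay]
    scores = [demerit(pooled, replay) for replay in replay_rts]
    return sum(scores) / len(scores)
```

Randomness error can be computed with the same function by passing the response times of ten workloads generated from different random seeds instead of ten replays.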

A workload’s replay error represents the amount of “noise” in the measurement of a workload’s quality. Therefore, the replay error limits the degree of accuracy we can claim for a synthetic workload (without repeating evaluations for statistical confidence). For example, given only a single evaluation, we cannot claim a workload has only a 3% error if the measurement system has a 5% margin of error. Our experiments indicate that replay error is typically around 3%, but can be as high as

14% for bursty workloads.

Our technique for computing randomness error is similar, except instead of issu- ing the same workload to the storage system under test, we generate ten workloads using ten different random seeds. As with the replay error, a synthetic workload’s

51 randomness error can also serve as a lower bound on the measurement of its quality.

However, we must be careful not to use the randomness error of a single synthetic workload as the Distiller’s quality goal because one workload’s high randomness error may indicate a poorly designed synthetic workload instead of a fundamental limit in our ability to evaluate synthetic workload quality. On the other hand, it is reasonable to consider the randomness errors of many synthetic workloads when setting a qual- ity goal for the Distiller, assuming that most workloads considered are well designed.

Our experiments indicate that the randomness error for synthetic workloads tends to be around 4%; however, synthetic workloads specified using only one or two attributes can have randomness errors over 10%.

Given the observed replay and randomness errors, we initially set a quality goal of

10% error for the Distiller. Specifically, we configure the Distiller to terminate when it can specify a synthetic workload with a demerit of at most 10%. We will use this goal when evaluating the Distiller in Chapter 6.

The challenge with using the “percentage-based” demerit figures presented in Section 3.4.1 is that people's intuitive mapping of percent error to quality may not apply to our demerit figures. For example, unless told otherwise, most people will assume that a synthetic workload with a demerit of 20% or more is of marginal quality. However, it is not obvious whether a synthetic workload with an RMS or log area demerit of 20% is of good, bad, or marginal quality. The true measure of a synthetic workload's quality is whether a researcher can use it in place of a trace of a target workload for a storage system evaluation. Chapter 8 examines how a workload's quality, as measured by the MRT, RMS, log area, and hybrid demerit figures, relates to its usefulness.

CHAPTER 4

EXPERIMENTAL ENVIRONMENT

The Distiller finds the important performance-related attributes with respect to a specific workload and storage system. Sections 4.1 and 4.2 describe the storage sys- tems and workloads we use in Chapters 6 through 8 to evaluate the Distiller. In addition, Section 4.3 discusses the eight existing software packages the Distiller uses.

4.1 Storage systems

We generated the experiments in this dissertation using either the Pantheon disk array simulator [57], the HP FC-60 disk array [2, 29, 36], or the HP FC-30 disk array

[1, 28]. We used Pantheon to obtain most of our results because Pantheon allows us to study the largest workloads, is highly configurable, and is time efficient (we can run many simulations in parallel). Unfortunately, the developers have not yet rigorously validated Pantheon. The FC-30 and FC-60 allow us to evaluate the Distiller using real storage system hardware. However, these disk arrays can be used only in special situations: The FC-30 is very small and can handle only small or partial workloads.

The FC-60 is larger, but rarely available. The remainder of this section provides the details of each storage system.

Pantheon: Pantheon simulates disk arrays comprising several disks connected to one or more controllers by parallel SCSI buses [57]. The simulated controllers have large NVRAM caches. This general architecture is similar to both the FC-60 and the

FC-30. In addition, Pantheon provides many configuration parameters including:

• number and type of component disks,

• cache size,

Table 7: Comparison of FC-30 and FC-60

                      FC-30           FC-60
Total size            120GB           .5TB
Number of disks       30              30
Cache size            60MB            512MB
Write behavior        Write-back      Write-back
Controllers           2               2
Connection            Fibre Channel   Fibre Channel
Separate R/W cache    Yes             No

• prefetch length,

• high- and low- water marks (points at which the cache begins de-staging dirty

data),

• RAID stripe unit size, and

• speed of data connections.

Table 9 shows the Pantheon configuration we used to study each workload.

FC-60: The FC-60 disk array is populated with thirty 17 GB disks, spread uni- formly across six disk enclosures, for a total of 0.5 TB of storage. It is configured with five six-disk RAID5 LUs, each with a 16 KB stripe unit size. The 512MB disk array cache uses a write-back management policy backed with non-volatile RAM. The

FC-60 considers writes complete once it places the data in the cache, but commits the data to the disk media at a later time (a process called “de-staging”). Thus, from the perspective of the user, most writes appear as cache hits (i.e., almost “free”).

FC-30: The FC-30 contains thirty 4GB disks (for a total of 120GB of storage) and two disk controllers with a total of 60MB of NVRAM. As with the FC-60, this disk array uses a write-back cache management policy backed with non-volatile RAM.

The user can specify the percentage of the cache to be used for reads and the percentage to be used for writes. The read cache stores only read data; and the write cache stores only write data. We configured the FC-30 with a 40MB write cache and

a 20MB read cache. In contrast, the FC-60's cache is not divided into read and write caches. Table 7 compares the FC-30 with the FC-60.

4.2 Workloads

We obtained traces of four different workloads: an Email server (OpenMail), an on-line transaction database (OLTP), a decision support database (DSS), and a file server. The file server workload's characteristics are not static over time; therefore, it is not suitable for analysis by the Distiller. This section discusses the high-level characteristics of the three remaining production workloads. Table 8 lists the characteristics of the specific target workloads we used in this dissertation. To limit the running time and storage needs of the Distiller, our target workloads are shorter than the full workload trace. The data presented in Table 8 applies to the target workloads used.

The data presented below applies to the entire trace collected.

For the OpenMail and DSS workloads, we configured Pantheon to use as few resources as possible without overwhelming the simulated storage system. The traces of the OLTP workload include a description of the storage system on which the workload was collected. Therefore, we configured Pantheon to match that storage system as closely as possible. Table 9 lists the Pantheon configuration we used to study each workload.

OpenMail: This one-hour trace of the OpenMail e-mail server contains 1289000

I/Os for a mean request rate of 358 I/Os per second. This trace comprises 22 LUs.

Eight of the LUs have an address space of approximately 2.7GB; eleven have an address space of approximately 62GB. These LUs have between 50,000 and 80,000

I/Os each. The remaining three LUs have only a few hundred I/Os each.

The entire OpenMail workload is too large for the FC-30 and the FC-60. There- fore, to study this workload using these storage systems, we chose two sets of LUs of a more manageable size. The first partial workload, OM (2GBLU), comprises four

55 Table 8: Summary of workloads Workload parameters Read Arrival rate Length Workload I/Os Throughput pct. (I/O/sec) LUs (sec) OM (All) 649616 2.33 MB/s 28% 361 22 1800 OM (2GBLU) 78947 672 KB/s 90% 88 4 900 OM (62GBLU) 45442 285 KB/s 27% 50 3 900 OM (Sample) 19769 167 KB/s 29% 22 1 900 OLTP (All) 1098290 1.35 MB/s 57% 610 37 1800 OLTP (Log) 24402 141 KB/s 1% 27 1 900 OLTP (Data) 19808 44 KB/s 68% 22 1 900 DSS (All) 8373997 230 MB/s 83% 2326 80 3600 DSS (1LU) 44777 6.21 MB/s 100% 50 1 900 DSS (4LU) 332394 32.14 MB/s 100% 185 4 1800

Table 9: Summary of Pantheon configurations RAID Disk Cache Workload Disks Buses groups size Disk type size Bus rate OM (All) 180 4 45 9 GB Seagate Cheetah 1 GB 40 MB/s 10K rpm OLTP (All) 80 2 40 1 GB Wolverine III 256 MB 40 MB/s (HPC2490A) DSS (4LU) 16 4 4 9 GB Seagate Cheetah 256 MB 100 MB/s 10K rpm

of the eight 2GB LUs. (Using more than 4 LUs overwhelms the FC-30). The second

workload, OM (62GBLU), comprises three of the 62GB LUs. Although the address

space of each LU is 62GB, there are no requests between 12GB and 50GB, and each

LU only has one 2GB cluster of data between 50GB and 62GB. Therefore, we are able to map these three LUs onto the FC-30’s 60GB of available storage. To evaluate the

entire workload, OM (All), we used Pantheon to simulate a disk array with 180 9GB

Seagate disks arranged into 45 18GB LUs. The simulated disk array has a cache size

of 1GB. To conserve time and storage resources, we study only the first 1800 seconds of this trace. We use the workload OM (sample) in Chapter 5 for demonstration purposes.

The OpenMail workload has two different address regions: The lower portion of each LU’s address range appears to correspond to OpenMail’s index. These I/Os

are mostly writes and exhibit a high degree of temporal locality. The upper region

of each LU’s address range appears to correspond to message data. These I/Os are

mostly reads and exhibit a very low degree of spatial and temporal locality.

OLTP: This 1994 online transaction processing (OLTP) trace measures HP’s

Client/Server database application running a TPC-C-like workload at about 1150

transactions per minute on a 100-warehouse database. The entire workload comprises

4257935 I/Os over 7900 seconds with a throughput of 1.20 MB/s.

We configure Pantheon to simulate a disk array with 80 1GB Wolverine disks

arranged into 40 1GB LUs. To conserve time and storage resources, we study only

the first 1800 seconds of this workload when using Pantheon. The entire workload

is small enough to fit on the FC-30; however, the workload’s 538 I/Os per second

quickly overwhelm the array. Therefore, to study this workload on the FC-30, we

examined the LUs individually. Table 8 shows several representative LUs.

Decision Support: This decision support system (DSS) trace was collected on an

audited TPC-H system running the throughput test (multiple simultaneous queries)

on a 300 GB scale factor data set. This workload comprises 8374000 I/Os over 80

LUs. Most LUs are read-only; nearly all I/Os on these LUs are 128 KB in size. Several

LUs, however, contain mostly writes. Each of the simultaneous queries generates a series of sequential I/Os. Visual inspection of the read-only LUs shows that many independent sequential streams are interleaved together.

Repeatedly analyzing, synthesizing, and simulating the entire DSS workload re- quires considerable computational and storage resources. Therefore, we have defined two smaller workloads. For study with Pantheon, we have chosen the four busiest LUs

(4LU). The FC-30 cannot sustain the request rate of these four LUs, so we analyze only the busiest LU (1LU) when using the FC-30.

4.3 Software

The Distiller utilizes eight software packages and libraries. HP Labs developed some packages prior to this research; we developed others as part of this research.

1. midaemon: To collect the traces described in Section 4.2, we use the Measure-

ment Interface Daemon (midaemon) kernel measurement system, a commercial

product that is part of the standard HP-UX MeasureWare performance eval-

uation suite [30]. The midaemon collects the name, parameters, issue time,

and completion time of every system call — including all I/O calls. From this

data, we can extract the four parameters necessary for the workload traces

(arrival time, operation type, request size and request location) as well as the

information necessary to calculate the response time of individual I/O requests

(completion time and arrival time).

2. Lintel: Lintel is the C++ library on which many Storage Systems Department

tools are built [27]. Lintel includes many classes and modules for the imple-

mentation of workload analysis and storage management software. The tools

the Distiller uses utilize the following Lintel features:

• assertion routines that provide error checking and debugging information,

• classes to collect and present histogram data,

• classes to generate random numbers, and

• code to provide a Tcl-language interface to tools with complex run-time

configurations [23].

3. srtLite: The srtLite library contains C++ classes for reading and writing files

in SRT format. SRT is a platform-independent, binary format used to store

workload traces. It defines the byte-order and width of the fields used to describe

the trace. In addition to operation type, arrival time, request size, and location

(presented using a (deviceID, offset) pair), the SRT format also maintains:

• start time: the time at which the operating system issues the request to

the disk array;

• completion time: the time at which the request completes;

• logical volume: the logical volume that contains the data;

• queue length: the number of outstanding requests;

• PID: the process ID of the calling process;

• flags: flags that denote whether the request is synchronous or asyn-

chronous, and what type of file system information the request represents

(data, cylinder group, i-node, indirect i-node, fifo, etc.); and

• valid bits: bits indicating the validity of each field.

Not all traces contain data for each field. Our traces are guaranteed to contain

only the four parameters necessary for our workload model.

4. Rome: We store workload characteristics and storage system descriptions in

plain-text using a syntax called Rome [58]. Rome organizes data into objects

called atoms. Each atom is either a single datum (such as a double or a string)

or a list of other atoms. An atom paired with a name and, optionally, a type

is called an attribute-value. We use lists of attribute-values to build compound

objects such as histograms and transition matrices. The Rome library includes

generators and parsers in both C++ and perl.

5. Rubicon: Rubicon, our workload analysis tool, takes a workload trace and an

attribute list as input and produces a text file in Rome format containing the

attribute-values that characterize the workload [54].

6. GenerateSRT: Our workload generation tool, GenerateSRT, takes a workload

characterization produced by Rubicon as input and generates a file (in SRT

format) containing a synthetic workload matching that characterization. (See

appendix A for more details.) We developed GenerateSRT concurrently with

the Distiller.

7. Buttress: When studying real disk arrays, we use a tool called Buttress to issue

a workload to a storage device. While we are running Buttress, we also run the

midaemon trace collection tool to collect a trace of the synthetic workload for

off-line performance analysis.

8. Pantheon: When simulating a storage system, we use the Pantheon disk array

simulator. To use Pantheon, we pass the desired workload (in SRT format)

as a parameter to Pantheon. Pantheon provides as output a set of statistics,

including a distribution of response time latency.

Lintel, srtLite, Rome, and Rubicon are currently available on a limited basis to those in the academic community. Buttress and Pantheon contain trade secrets; however, HP Labs makes versions of these tools without the trade secrets available on the same limited basis. MeasureWare is a commercial product sold by HP.

CHAPTER 5

DESIGN AND IMPLEMENTATION

In this section, we present our iterative approach for determining the attributes that

are necessary for synthesizing a representative I/O workload. This approach is em-

bodied in a tool we call the Distiller. The Distiller takes as input a target workload trace and a library of attributes. It then automatically searches for a subset of the library that effectively describes the target workload’s performance-affecting prop- erties. We first provide an overview of the Distiller’s iterative approach; then, we discuss each step in detail.

5.1 Overview

Our approach to choosing attributes is to iteratively build a list of “key” attributes.

During each iteration, the Distiller identifies one additional key attribute, adds it to the list, then tests the representativeness of the synthetic workload specified by that intermediate attribute list (the list of key attributes chosen thus far during the

Distiller’s execution). This iterative loop (shown in Figure 15) continues until either

(1) the difference between the performance of the intermediate synthetic workload

(specified by the intermediate attribute list) and the target workload is below some user-specified threshold, or (2) the Distiller determines that it cannot specify a more representative synthetic workload by adding a small number of attributes from the library.

The Distiller evaluates how well an intermediate attribute list captures the target workload's performance-affecting properties by (1) generating an intermediate synthetic workload with the same values for the chosen attributes, (2) issuing both the intermediate synthetic workload and the target workload to the storage system under test, then (3) comparing the CDFs of response time for the workloads. If the two workloads have very different CDFs (as quantified using one of the demerit figures presented in Section 3.4.1), then we know that there is some important property that is not described by the intermediate attribute list. If the workloads have similar performance, then we argue that the chosen attributes capture the important performance-related properties.

Figure 15: Distiller's iterative loop (detailed version)
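To make the control flow concrete, the sketch below restates this loop in C++ with hypothetical types and placeholder helpers; generateSynthetic, demerit, and searchLibrary are illustrative stand-ins for the roles played by GenerateSRT, the demerit figures of Section 3.4.1, and the attribute search of Sections 5.3 and 5.4, not the Distiller's actual interfaces.

    // Schematic, self-contained sketch of the loop described above.  All types
    // and helpers here are hypothetical stand-ins, not the real tool interfaces.
    #include <optional>
    #include <vector>

    struct Workload {};                                 // a trace of I/O requests
    struct Attribute {};                                // e.g., a conditional distribution
    using AttributeList = std::vector<Attribute>;

    // Placeholder implementations so the sketch compiles; in the real tool these
    // steps are performed by GenerateSRT, Buttress/Pantheon, and the group search.
    Workload generateSynthetic(const Workload&, const AttributeList&) { return {}; }
    double demerit(const Workload&, const Workload&) { return 0.0; }
    std::optional<Attribute> searchLibrary(const Workload&, const AttributeList&) { return std::nullopt; }

    AttributeList distill(const Workload& target, double threshold) {
        AttributeList key;                              // starts as the empirical distributions
        while (true) {
            Workload synth = generateSynthetic(target, key);
            if (demerit(synth, target) <= threshold)
                break;                                  // stopping condition (1): close enough
            std::optional<Attribute> next = searchLibrary(target, key);
            if (!next)
                break;                                  // stopping condition (2): no helpful attribute left
            key.push_back(*next);                       // one new key attribute per iteration
        }
        return key;
    }

The two break statements correspond to the two stopping conditions listed above.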

Obtaining the distribution of response time for a workload takes several minutes.

To keep the Distiller’s running time reasonable, we limit the number of synthetic workloads the Distiller builds and evaluates by partitioning the library of attributes into groups and investigating only those groups that contain at least one “key” at- tribute. Section 5.3 defines attribute groups and explains how their use reduces the

Distiller’s running time; and Section 5.4 explains how we search an attribute group for key attributes.

Running Email example: To make the details of the Distiller’s algorithm more concrete, we will use a running example to illustrate the Distiller’s operation on a real production workload. Specifically, we will be using the workload OpenMail (sample) summarized in Table 8. For simplicity, this workload comprises only one LU. For this

example, we will use the RMS demerit figure with a threshold of 12%.

Figure 16: Initial synthetic workload is inaccurate (cumulative fraction of I/Os vs. latency in seconds, log scale; curves: initial synthetic workload, target workload)

5.2 Initial attribute list

The Distiller’s first step is to generate an initial synthetic workload that maintains the empirical distributions for the four I/O request parameters. The simplest method of generating a synthetic workload (given our workload model) is to choose a value for each I/O request parameter randomly from some distribution. We use explicit empirical distributions because implicit distributions (e.g., normal or Poisson) have been shown to be inaccurate [21].
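As a minimal illustration of this step, the sketch below draws each request parameter independently from the values observed in the target trace, which is one simple way to realize the empirical distributions; the Request record and function names are hypothetical, and the real generation is performed by GenerateSRT.

    // Minimal sketch: draw each request parameter independently from the target
    // trace's own values.  Request and its fields are hypothetical; the target
    // trace is assumed to be non-empty.
    #include <cstddef>
    #include <random>
    #include <vector>

    struct Request { double interarrival; long location; char op; int size; };

    std::vector<Request> initialSynthetic(const std::vector<Request>& target, unsigned seed) {
        std::mt19937 gen(seed);
        std::uniform_int_distribution<std::size_t> pick(0, target.size() - 1);
        std::vector<Request> synth(target.size());
        for (Request& r : synth) {
            r.interarrival = target[pick(gen)].interarrival;   // each parameter is sampled
            r.location     = target[pick(gen)].location;       // independently of the others,
            r.op           = target[pick(gen)].op;             // so only the four marginal
            r.size         = target[pick(gen)].size;           // distributions are preserved
        }
        return synth;
    }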

Running Email example: Figure 16 shows the response time distributions for the initial synthetic workload and the target OpenMail workload. The resulting RMS demerit is 65%. Note the log scale on the x-axis. The RMS demerit is larger than the threshold of 12%; therefore, the Distiller searches for additional key attributes.

5.3 Choosing an attribute group

In this section, we see how the use of attribute groups reduces the Distiller’s running

time. Evaluating an individual attribute requires generating an exploratory workload

and obtaining its distribution of response time. When using the FC-30 or FC-60, we must wait 15 minutes to 1 hour for Buttress to issue the entire trace to the disk array in real time. Pantheon’s 5 to 20 minute running time depends on the number of I/Os simulated. Our use of attribute groups reduces the Distiller’s running time by providing a means for estimating the potential benefit of all the attributes in a group using the performance information from only a few exploratory workloads.

We define an attribute group to be a set of attributes whose values are calculated using the same set of I/O request parameters. For example, the {interarrival time} attribute group contains those attributes that measure only the interarrival times of different requests (distribution of interarrival time, β-model, etc.). The {operation type, location} attribute group contains those attributes that measure the relationship between requests' locations and operation types (modified JDWS, CD(operation type, location, 1, 2), etc.). All non-empty subsets of {location, request size, operation type, interarrival time} define fifteen attribute groups. By definition, each attribute is a member of exactly one attribute group. Table 6 in Section 3.3.6 lists the attributes we have implemented and their attribute groups. (See Section 3.3 for detailed descriptions of the attributes.)
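The count of fifteen follows directly from the four request parameters; the short, purely illustrative program below enumerates the non-empty subsets with a bitmask.

    // Purely illustrative: enumerate the non-empty subsets of the four request
    // parameters with a bitmask; prints fifteen attribute groups.
    #include <iostream>
    #include <string>
    #include <vector>

    int main() {
        const std::vector<std::string> params = {"operation type", "location",
                                                 "interarrival time", "request size"};
        int count = 0;
        for (unsigned mask = 1; mask < (1u << params.size()); ++mask) {   // 1 .. 15
            std::string group;
            for (unsigned i = 0; i < params.size(); ++i)
                if (mask & (1u << i))
                    group += (group.empty() ? "" : ", ") + params[i];
            std::cout << "{" << group << "}\n";
            ++count;
        }
        std::cout << count << " attribute groups\n";                      // 15
    }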

To determine whether an attribute-group contains any key attributes, we compare the performance of two exploratory synthetic workloads: The first preserves almost none of the attributes in the group under test. The second approximately preserves every attribute in that group. The difference in performance of the two exploratory workloads is an estimate of the importance of the attributes in the group. Should there be little or no difference in performance, we assume that we need not evaluate individual attributes in that group. However, if performance of the two exploratory

workloads is dramatically different, we assume that the attribute group contains at least one key attribute.

Table 10: Example of the subtractive method for {request size}

  Target I/O workload                       Subtractive workload
  Time      Location  Op  Size              Time      Location  Op  Size
  0.050397  6805371   W   3072 (a)          0.050397  6805371   W   4096 (g)
  0.762780  7075992   R   8192 (b)          0.762780  7075992   R   3072 (a)
  0.789718  11463669  W   3072 (c)          0.789718  11463669  W   3072 (f)
  0.792745  7051243   R   1024 (d)          0.792745  7051243   R   8192 (b)
  0.793333  11460856  W   8192 (e)          0.793333  11460856  W   1024 (d)
  0.808625  11463669  W   3072 (f)          0.808625  11463669  W   2048 (h)
  0.808976  7049580   R   4096 (g)          0.808976  7049580   R   8192 (e)
  0.809001  7050244   R   2048 (h)          0.809001  7050244   R   3072 (c)

The exploratory workloads we use to evaluate attribute groups depend on the number of request parameters that define the attribute group under test. We call the

{operation type}, {location}, {interarrival time}, and {request size} attribute groups single-parameter attribute groups. We call those attribute groups defined using two parameters ({operation type, location}, {request size, interarrival time}, etc.) two- parameter attribute groups.

5.3.1 Single-parameter attribute groups

When evaluating single-parameter attribute groups, the first exploratory workload, called the subtractive workload, is identical to the target workload, except that we choose the values for the parameter under test independently at random from the empirical distribution of values in the target workload. Table 10 provides an example of the subtractive {request size} workload. The left half of the table shows the target workload; the right half shows the subtractive workload. By choosing the request size values randomly, we produce a synthetic workload that maintains almost no

{request size} attributes. (We say “almost” because the distribution of request sizes is a {request size} attribute.)
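The following sketch shows one simple way to build the subtractive {request size} workload, assuming a hypothetical Request record: shuffling the size column preserves the empirical distribution of request sizes exactly while destroying every other {request size} pattern. (Sampling the sizes with replacement, as the text describes, would work equally well.)

    // Sketch of the subtractive {request size} workload: everything is copied
    // from the target except the size column, which is shuffled (an exact draw
    // from the empirical distribution of sizes).  Request is hypothetical.
    #include <algorithm>
    #include <cstddef>
    #include <random>
    #include <vector>

    struct Request { double arrival; long location; char op; int size; };

    std::vector<Request> subtractiveRequestSize(std::vector<Request> trace, unsigned seed) {
        std::vector<int> sizes;
        sizes.reserve(trace.size());
        for (const Request& r : trace) sizes.push_back(r.size);

        std::mt19937 gen(seed);
        std::shuffle(sizes.begin(), sizes.end(), gen);         // destroy any ordering pattern

        for (std::size_t i = 0; i < trace.size(); ++i) trace[i].size = sizes[i];
        return trace;                                          // all other parameters untouched
    }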

The second exploratory workload, called the rotated workload, is identical to the

Table 11: Example of the rotated {request size} workload

  Target I/O workload                       Rotated workload
  Time      Location  Op  Size              Time      Location  Op  Size
  0.050397  6805371   W   3072 (a)          0.050397  6805371   W   3072 (f)
  0.762780  7075992   R   8192 (b)          0.762780  7075992   R   4096 (g)
  0.789718  11463669  W   3072 (c)          0.789718  11463669  W   2048 (h)
  0.792745  7051243   R   1024 (d)          0.792745  7051243   R   3072 (a)
  0.793333  11460856  W   8192 (e)          0.793333  11460856  W   8192 (b)
  0.808625  11463669  W   3072 (f)          0.808625  11463669  W   3072 (c)
  0.808976  7049580   R   4096 (g)          0.808976  7049580   R   1024 (d)
  0.809001  7050244   R   2048 (h)          0.809001  7050244   R   8192 (e)

target workload, except the values for the request parameter under test are “rotated”

relative to the rest of the trace. To rotate a list L of values, we shift the list so that

the order of the values is unchanged, but the value for request x is now the value for request (x + t) mod length(L) for some constant integer t. The right half of Table 11 shows the list of request sizes rotated by three requests. Currently, the Distiller is

configured to rotate the parameters by .5L. Section 7.1 discusses the effects of the

rotate amount on the Distiller’s execution.
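A small sketch of the rotation itself, following the definition above; the function name and types are illustrative.

    // Rotating one parameter's list of values by t positions: request x receives
    // the value originally held by request (x + t) mod length(L).
    #include <cstddef>
    #include <vector>

    template <typename T>
    std::vector<T> rotateColumn(const std::vector<T>& values, std::size_t t) {
        const std::size_t n = values.size();
        std::vector<T> rotated(n);
        for (std::size_t x = 0; x < n; ++x)
            rotated[x] = values[(x + t) % n];    // order preserved, alignment shifted
        return rotated;
    }
    // The Distiller's default rotation amount is half the trace length (t = 0.5L).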

The rotated {request size} workload approximately maintains all {request size}

attribute-values. The rotating of the last few values to the beginning of the trace will

change some attribute-values slightly. For example, when computing the transition

matrix for the workloads in Table 11, the rotated workload will have a transition from

2048 to 3072 (request values (h) and (a)), but the target workload will not.

Rotating the request sizes produces a synthetic workload that preserves every

{request size} attribute-value, but preserves very few attribute-values from multi-

parameter attribute groups involving request size ({request size, operation type},

{request size, location}, {request size, interarrival time}, etc.). Removing these

attribute-values from multi-parameter attributes is important; if we did not, then we

would be unable to determine whether any differences between the two exploratory

workloads were due to {request size} attributes, or other multi-parameter attributes

involving request size.

Table 12: Request size and operation type rotated together

  Target I/O workload                           Op. type and req. size rotated together
  Time      Location  Op     Size               Time      Location  Op     Size
  0.050397  6805371   W (a)  3072 (a)           0.050397  6805371   W (f)  3072 (f)
  0.762780  7075992   R (b)  8192 (b)           0.762780  7075992   R (g)  4096 (g)
  0.789718  11463669  W (c)  3072 (c)           0.789718  11463669  R (h)  2048 (h)
  0.792745  7051243   R (d)  1024 (d)           0.792745  7051243   W (a)  3072 (a)
  0.793333  11460856  W (e)  8192 (e)           0.793333  11460856  R (b)  8192 (b)
  0.808625  11463669  W (f)  3072 (f)           0.808625  11463669  W (c)  3072 (c)
  0.808976  7049580   R (g)  4096 (g)           0.808976  7049580   R (d)  1024 (d)
  0.809001  7050244   R (h)  2048 (h)           0.809001  7050244   W (e)  8192 (e)

Table 13: Request size and operation type rotated separately

  Target I/O workload                           Op. type and req. size rotated separately
  Time      Location  Op     Size               Time      Location  Op     Size
  0.050397  6805371   W (a)  3072 (a)           0.050397  6805371   W (e)  3072 (f)
  0.762780  7075992   R (b)  8192 (b)           0.762780  7075992   W (f)  4096 (g)
  0.789718  11463669  W (c)  3072 (c)           0.789718  11463669  R (g)  2048 (h)
  0.792745  7051243   R (d)  1024 (d)           0.792745  7051243   R (h)  3072 (a)
  0.793333  11460856  W (e)  8192 (e)           0.793333  11460856  W (a)  8192 (b)
  0.808625  11463669  W (f)  3072 (f)           0.808625  11463669  R (b)  3072 (c)
  0.808976  7049580   R (g)  4096 (g)           0.808976  7049580   W (c)  1024 (d)
  0.809001  7050244   R (h)  2048 (h)           0.809001  7050244   R (d)  8192 (e)

After constructing the subtractive and rotated exploratory workloads, the Distiller issues each to the storage system under test (which might be simulated), obtains the response time for each, then compares the two CDFs using one of the metrics discussed in Section 3.4.1. If the two workloads have a demerit above a user-specified threshold, then the Distiller searches the attribute group for a key attribute. Otherwise, the

Distiller assumes that the attribute group under test contains no key attributes and that the empirical distribution is sufficient.

5.3.2 Two-parameter attribute groups

The exploratory workloads we use to evaluate two-parameter attribute groups are both rotated workloads. The first exploratory workload rotates the parameters under test together; the second exploratory workload rotates the two parameters by different

amounts. For example, to determine whether there are any key {operation type, request size} attributes, the Distiller generates an exploratory workload by rotating both request size and operation type by .25L. The Distiller then generates an exploratory workload by rotating request size by .5L and rotating operation type by .25L. Table 12 provides an example of operation type and request size rotated together; and Table 13 shows request size and operation type rotated separately.

Rotating request size and operation type together preserves the {request size, operation type} attributes, but does not preserve the attributes of any other multi- parameter attribute group involving request size or operation type. Rotating request size and operation type by different amounts then removes the {request size, opera-

tion type} attributes. The only difference between the two exploratory workloads is

that the rotated together workload maintains {request size, operation type} attribute-

values and the rotated apart workload does not. Consequently, the difference in per-

formance between the two exploratory workloads estimates the effects of the {request

size, operation type} attributes.
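The sketch below, again with illustrative types, builds the two exploratory workloads for the {operation type, request size} group using the rotation amounts from the example above: both columns rotated by 0.25L for the "rotated together" workload, and by 0.25L and 0.5L respectively for the "rotated apart" workload.

    // Illustrative construction of the two exploratory workloads for the
    // {operation type, request size} group; std::rotate performs the same left
    // rotation as the earlier rotation sketch.
    #include <algorithm>
    #include <cstddef>
    #include <utility>
    #include <vector>

    template <typename T>
    std::vector<T> rotatedBy(std::vector<T> col, std::size_t t) {
        if (!col.empty())
            std::rotate(col.begin(),
                        col.begin() + static_cast<std::ptrdiff_t>(t % col.size()),
                        col.end());
        return col;
    }

    using OpAndSize = std::pair<std::vector<char>, std::vector<int>>;   // op column, size column

    OpAndSize rotatedTogether(const std::vector<char>& op, const std::vector<int>& size) {
        const std::size_t t = op.size() / 4;                            // both by 0.25L
        return { rotatedBy(op, t), rotatedBy(size, t) };                // pairing preserved
    }

    OpAndSize rotatedApart(const std::vector<char>& op, const std::vector<int>& size) {
        return { rotatedBy(op, op.size() / 4),                          // op by 0.25L
                 rotatedBy(size, size.size() / 2) };                    // size by 0.5L: pairing broken
    }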

As with single-parameter attributes, after constructing the two exploratory work-

loads, the Distiller obtains the response time for each, then compares the two CDFs.

If the two workloads have a demerit above a user-specified threshold, then the Distiller searches the attribute group for a key attribute. Otherwise, the Distiller assumes that the attribute group under test contains no key attributes.

5.3.3 Search order

The Distiller first tests each single-parameter attribute group for the presence of key attributes. We refer to these iterations as phase one of the distillation process. After examining the single-parameter attribute groups, the Distiller tests two-parameter attribute groups in phase two. Finally, if necessary, the Distiller will search for important three-parameter and four-parameter attributes. However, we have yet to

encounter any workloads for which it is useful to proceed beyond phase two.

Figure 17: Testing for potential key {location} and {request size} attributes (cumulative fraction of I/Os vs. latency in seconds; curves: distribution of location, rotated list of location, rotated list of request size, distribution of request size)

As an optimization, before testing an individual two-parameter attribute group

involving the parameter p, the Distiller simultaneously tests all three two-parameter

attribute groups involving p by comparing the rotated workload for p with the target

workload. If the rotated and target workloads have a low demerit, then the Distiller

assumes that there are no key attributes involving parameter p. The Distiller gener-

ated and evaluated the rotated workload during phase one, so this optimization does not add to the Distiller’s running time.

For example, before testing any of the {operation type, request size}, {operation type, location}, or {operation type, interarrival time} attribute groups, the Distiller compares the performance of the rotated {operation type} workload to the target workload. If the demerit for the two workloads is small, then the Distiller assumes that there are no key {operation type, request size}, {operation type, location}, or

{operation type, interarrival time} attributes.

Running Email example: The Distiller begins its exploration of attribute

groups by evaluating each single-parameter attribute group. Figure 17 shows two representative results for {request size} and {location}. The subtractive and rotated workloads for {request size} are similar, with an RMS demerit of only 8%; thus, no additional {request size} attributes (beyond the default empirical distribution) are necessary. Although the distributions for the subtractive and rotated {location} workloads look similar, the RMS demerit of 15% is above our threshold of 12%.

Therefore, the Distiller will search for a key {location} attribute. The evaluations of the {interarrival time} and {operation type} attribute groups (not shown) indicate they probably do not contain any key attributes.

5.4 Choosing an attribute

Once the Distiller has identified an attribute group that potentially contains a key attribute, it must identify a specific key attribute. To do this search, the Distiller

first orders all of the attributes in the attribute group according to the amount of

data needed to store the corresponding attribute-value. (See Section 3.3.7.) It then evaluates each attribute in the attribute group beginning with the “smallest.”

The Distiller uses two exploratory workloads to evaluate candidate attributes.

The first exploratory synthetic workload uses the candidate attribute and preserves the list of original values for parameters not associated with the attribute under test.

(In other words, we subtract all of the group’s attribute-values and add back only the values for the attribute under test.) The second exploratory workload is the rotated workload for the group under test (i.e., it is identical to the target workload, except the parameter or parameters under test are rotated). In the case of a two-parameter attribute group, the Distiller uses the “rotated together” workload.

The Distiller then obtains the performance of the two exploratory workloads and

compares their distributions of latency. (Recall that the Distiller generated and obtained the performance information for the rotated exploratory workload when selecting the attribute group.) If the demerit for the two workloads is below a user-specified threshold, then the candidate attribute captures most of the important properties described by attributes in the group under test and the Distiller adds it to its intermediate attribute list. If the two workloads have a high demerit, the attribute is not helpful and the Distiller proceeds to evaluate other candidate attributes.
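A compact sketch of this first-fit search follows; Attribute, Workload, exploratoryFor, and demerit are hypothetical placeholders for the real attribute library, Rubicon/GenerateSRT, and the demerit figures of Section 3.4.1.

    // Sketch of the first-fit search within one attribute group.
    #include <algorithm>
    #include <cstddef>
    #include <optional>
    #include <vector>

    struct Workload {};
    struct Attribute { std::size_t valueSize = 0; /* measured pattern, generator, ... */ };

    Workload exploratoryFor(const Attribute&) { return {}; }            // placeholder
    double demerit(const Workload&, const Workload&) { return 1.0; }    // placeholder

    std::optional<Attribute> chooseAttribute(std::vector<Attribute> group,
                                             const Workload& rotatedGroupWorkload,
                                             double threshold) {
        std::sort(group.begin(), group.end(),
                  [](const Attribute& a, const Attribute& b) { return a.valueSize < b.valueSize; });
        for (const Attribute& candidate : group)                        // smallest first
            if (demerit(exploratoryFor(candidate), rotatedGroupWorkload) <= threshold)
                return candidate;                                       // first fit
        return std::nullopt;                                            // no attribute in this group suffices
    }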

The user can configure the Distiller in several ways to optimize the search for a specific key attribute. First, setting the threshold to zero effectively changes the

Distiller’s first-fit search into a best-fit search. Also, instead of evaluating attributes in

“size order”, the user can specify the order in which the Distiller evaluates individual attributes. By specifying this order, the user can assure that the Distiller will evaluate the attributes she believes to be most useful first, thereby reducing the number of attributes evaluated (if she has chosen correctly).

If the Distiller evaluates every attribute in a group and finds none to be key attributes, then the library of attributes is insufficient. The user then has two options:

(1) manually add more attributes to the library and re-start the Distiller, or (2) continue with the best available attribute from the library. For this dissertation, we always choose option (2).

Finally, generating a synthetic workload that simultaneously maintains the values of two different attributes from the same attribute group requires specialized code.

In most cases, the generation techniques for maintaining individual attributes will interfere with each other when run simultaneously. To avoid this difficulty, the Dis- tiller chooses only one key attribute per attribute group. Should an attribute group contain two or more key attributes, the user must combine the patterns measured into a single attribute and generation technique.

Running Email example: Recall from our earlier example that the Distiller

identified the {location} attribute group as potentially containing a key attribute.

Figure 18: Conditional distribution of location closely matches rotated {location} workload (cumulative fraction of I/Os vs. latency in seconds)

While exploring this attribute group, the Distiller evaluates a conditional distribution of location — CD(location, location, 1, 100). Figure 18 shows that the resulting synthetic workload behaves very much like a workload with the rotated sequence of location values. Therefore, the Distiller adds the conditional distribution of locations to the intermediate attribute list. No other single-parameter attribute group contains any key attributes.
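For intuition about how a conditional distribution can drive generation, the sketch below produces a state sequence from a first-order transition matrix. It is deliberately simplified (a single step of history over a handful of states) and is not the actual CD(location, location, 1, 100) implementation; the per-state location distributions that the CD attributes of Table 14 attach to each state are omitted here.

    // Simplified illustration: each new state depends only on the current state.
    #include <cstddef>
    #include <random>
    #include <vector>

    std::vector<int> generateStates(const std::vector<std::vector<double>>& transition,
                                    int initialState, std::size_t count, unsigned seed) {
        std::mt19937 gen(seed);
        std::vector<int> states;
        states.reserve(count);
        int s = initialState;
        for (std::size_t i = 0; i < count; ++i) {
            states.push_back(s);
            std::discrete_distribution<int> next(transition[s].begin(), transition[s].end());
            s = next(gen);                        // next state chosen from the current row
        }
        return states;
    }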

5.5 Completion of Email example

After the Distiller identifies a new key attribute, it evaluates the intermediate syn- thetic workload specified by the improved intermediate attribute list. If the new intermediate workload is sufficiently representative, the iterative process concludes.

Otherwise, the Distiller continues its loop of evaluating attribute groups and candi- date attributes.

Figure 19: Evaluation of improved synthetic workload containing conditional distribution for {location} (cumulative fraction of I/Os vs. latency in seconds; curves: initial synthetic workload, improved synthetic workload, target workload)

Running Email example: Figure 19 shows the evaluation of the intermediate attribute list containing the conditional distribution of location. Because the RMS demerit (54%) is still well above the desired threshold, the Distiller continues.

After addressing each single-parameter attribute group, the Distiller addresses the two-parameter attribute groups. Recall that the Distiller begins this phase by comparing the single-parameter rotated workload for each I/O parameter p to the original workload trace to evaluate the possibility of there being any key {p, x} attributes (where x is any other I/O parameter).

Figure 20 illustrates this process. We see little difference in performance between the rotated request size workload and the target workload. Likewise, we see little difference in performance between the rotated interarrival time workload and target workload. However, the performance of the rotated workloads for operation type and location differ significantly (RMS demerits of 50% and 60%) from that of the target workload. Therefore, we conclude that there are some key {operation type, y}

attributes and some key {location, z} attributes.

Figure 20: Testing for potential key two-parameter attributes (cumulative fraction of I/Os vs. latency in seconds; curves: rotated lists of operation type, location, request size, and interarrival time, plus the target workload)

The Distiller next chooses specific two-parameter attribute groups to explore by

comparing the “rotated together” and the “rotated apart” exploratory workloads, as

described in Section 5.3. In the case of operation type, the Distiller evaluates the po-

tential contribution of the {operation type, location}, {operation type, request size},

and {operation type, interarrival time} attribute groups. The high RMS demerit

(56%) in Figure 21 indicates that there is probably a key {operation type, location}

attribute. Other experiments (not shown) indicate that there are probably not any

{operation type, request size} or {operation type, interarrival time} attributes.

When the Distiller has identified a two-parameter attribute group (in this case,

{operation type, location}), it evaluates the candidate attributes as described in

Section 5.4. For our example, the Distiller first evaluates a conditional distribution of location dependent on operation type. The resulting synthetic workload has similar

behavior to the workload in which the Distiller rotated operation type and location

together. Therefore, the Distiller adds this attribute to the intermediate attribute

list.

Figure 21: Testing for potential key {operation type, location} attributes (cumulative fraction of I/Os vs. latency in seconds; curves: op. type and loc. rotated separately, op. type and loc. rotated together)

Figure 22 shows the evaluation of the intermediate synthetic workload. Because the RMS demerit (8%) is below the desired threshold, the Distiller terminates. The

final attribute list includes the default distributions for operation type, interarrival time, and request size; and a conditional distribution of location based on operation type — CD(operation type, location, 1, 2).

5.6 Limitations

The Distiller currently has three main limitations: (1) the synthetic workloads have randomness errors, (2) the Distiller assumes that the target workload's characteristics are static over time, and (3) the evaluation techniques for attribute groups only allow us to estimate whether an attribute group contains a key attribute.

Figure 22: Evaluation of final synthetic OpenMail workload (cumulative fraction of I/Os vs. latency in seconds; curves: final synthetic workload, target workload)

5.6.1 Randomness error

The Distiller generates workloads randomly. The random seed used produces small differences between synthetic workloads with the same specification. If these differences have a large enough effect on performance, the attributes the Distiller chooses will depend on the seed used to initialize the random number generator when constructing exploratory workloads.

In general, we expect that attributes with more information (i.e., “larger” at-

tributes) will have smaller randomness errors because they allow less variation be-

tween synthetic workloads. Chapter 6 will show that the final synthetic workloads

have small randomness errors. Chapter 7 will show that simple synthetic workloads,

such as the exploratory workloads constructed during phase one, are more seriously affected by randomness error.

One technique for reducing the effects of randomness error is to generate several synthetic workloads using different random seeds, issue each workload to a storage

system, then compute the "mean" distribution of response time (using the technique discussed in Section 3.4.2). Time and space requirements make it impractical for us to explore this technique for this dissertation.

5.6.2 Non-static target workloads

The Distiller assumes that the high-level characteristics of the workloads studied do not change drastically over time. This assumed lack of time dependencies allows us to

choose any rotate amount when evaluating attribute groups. For example, if analyzed

individually, each ten minute segment of a one hour workload ideally has the same

attribute-values. Therefore, there should be no difference between rotating a set of

parameters ten minutes and rotating that same set of parameters twenty minutes.

In practice, very few workloads have characteristics that do not change at all

over time. We must divide those workloads with drastic changes into shorter traces

and analyze them separately. The Distiller can handle workloads that exhibit small changes over time; however, we will see in Chapter 7 how these changes contribute

to the final synthetic workload’s error.

5.6.3 Exploratory workloads provide estimates only

Finally, we emphasize that the techniques we use to evaluate an attribute group provide only an estimate of whether it contains a key attribute. It is possible that two

or more attributes in a group have offsetting effects on performance; as a result, the exploratory workloads used when evaluating attribute groups will have low demerit

figures in spite of the importance of the offsetting attributes. We believe that the chances of two or more attributes offsetting each other exactly are very low.

5.7 Summary

In this section, we provided the details of our iterative distillation technique. Specif- ically, we saw how the Distiller iteratively builds a list of key attributes. We also

saw how the Distiller reduces its running time by partitioning the attribute library into attribute groups and exploring only those groups that potentially contain key attributes. The Distiller determines whether an attribute group contains a key attribute by comparing two exploratory workloads: The first preserves almost none of the attributes in the group under test; the second approximately preserves every attribute in that group. The Distiller also uses two exploratory workloads to evaluate individual attributes. The first preserves all attribute-values in the attribute's group, and the second preserves only the value of the attribute under test (for the attribute's group).

CHAPTER 6

EVALUATION OF DISTILLER

In this chapter, we present the results of distilling fifteen target workloads to demonstrate that we have thoroughly tested the Distiller and that it produces representative synthetic workloads when given a sufficient library. In Section 6.1, we distill seven simple artificial workloads based upon the Distiller's attribute library. This controlled study stresses different aspects of the Distiller and demonstrates that it works correctly. In Section 6.2, we distill eight production workloads collected on real production enterprise storage systems. This real-world validation demonstrates that the

Distiller and the current attribute library can specify reasonably accurate synthetic versions of real, production workloads.

Section 6.2 also highlights some of the Distiller’s limitations. First, the Distiller cannot meet the 10% maximum demerit goal for all workloads. In the worst case, the resulting synthetic workload has a 25% demerit. In addition, the Distiller often can meet the 10% demerit goal for only the log area demerit figure. We also find that the final synthetic workloads have compact representations that are larger than expected: typically 25% to 60% the size of the original workload trace itself. Finally, we see that during each run, the Distiller generates and evaluates between 15 and 125 workloads.

6.1 Artificial workloads

We first verify the correct operation of the Distiller’s infrastructure using a set of artificial workloads we generated based on attributes in the Distiller’s library. Because the Distiller’s library contains all the attributes necessary to synthesize the artificial

workloads, any failure to produce a representative synthetic workload will indicate an error in the Distiller's design or implementation (as opposed to a limitation of the library).

Table 14 summarizes the artificial workloads we used to test the Distiller.

We designed these workloads to test our iterative technique's key components. W1 and W2 test the stopping condition. W3 and W4 test choosing and investigating single-parameter attribute groups. W4 demonstrates that the Distiller avoids becoming stuck on local maxima. W5 tests the first-fit attribute search. W6 tests choosing and investigating two-parameter attribute groups; and W7 demonstrates that attributes need not precisely match the generation technique to be useful.

Table 15 presents the results of applying the Distiller to each workload using the FC-60 disk array. For these experiments, we configured the Distiller to search for attributes using a first-fit criterion with an RMS demerit threshold of 12%. We

fixed the order in which the Distiller evaluated attributes in order to assure that the artificial workloads tested all the Distiller’s features.

We now briefly describe each workload and the features of the Distiller tested by that workload. (In the list that follows, the number corresponds to the workload ID shown in Tables 14 and 15.)

• W1 demonstrates that the Distiller will correctly terminate when the interme-

diate synthetic workload meets the stopping condition. It is a simple workload

specified by a uniform distribution of location, a 50/50 distribution of reads and

writes, and a constant request size and interarrival time. The initial attribute

list’s empirical distribution attributes completely describe this workload; there-

fore, the Distiller stops before even entering iteration 1.

• W2 also demonstrates that the Distiller will correctly terminate when the inter-

mediate synthetic workload meets the stopping condition. It is similar to W1,

Table 14: Workload parameters for target artificial workloads.

  W1  Location: uniform [0, 9GB]
      Operation type: 50% reads, 50% writes
      Interarrival time: constant 20ms
      Request size: constant 8KB

  W2  Location: uniform [0, 9GB]
      Operation type: 33% reads, 66% writes
      Interarrival time: Poisson, mean 20ms
      Request size: uniform [1KB, 128KB]

  W3  Location: CD(location, location, 1, 4)
        States: 0: U[2MB, 98MB]; 1: U[500MB, 596MB]; 2: U[1GB, 1.096GB]; 3: U[1.5GB, 1.596GB]
        Transition matrix rows: (.997 .001 .001 .001) (.001 .997 .001 .001) (.001 .001 .997 .001) (.001 .001 .001 .997)
      Operation type: CD(op, op, 1, 2)
        States: 0: Write; 1: Read
        Transition matrix rows: (.97 .03) (.03 .97)
      Interarrival time: CD(iat, iat, 1, 3)
        States: 0: U[.2ms, .9ms]; 1: U[1ms, 5ms]; 3: 250ms
        Transition matrix rows: (.80 .18 .02) (.18 .80 .02) (.40 .60 0)
      Request size: CD(size, size, 1, 4)
        States: 0: 1KB; 1: 16KB; 3: 64KB; 4: 128KB
        Transition matrix rows: (.985 .005 .005 .005) (.005 .985 .005 .005) (.005 .005 .985 .005) (.005 .005 .005 .985)

  W4  Location: CD(location, location, 1, 4)
        States and transition matrix as in W3
      Operation type: CD(op, op, 1, 2)
        States: 0: Write; 1: Read
        Transition matrix rows: (.97 .03) (.03 .97)
      Interarrival time: CD(iat, iat, 1, 3)
        States: 0: U[.2ms, 1ms]; 1: U[5ms, 10ms]; 2: 250ms
        Transition matrix rows: (.90 .08 .02) (.05 .92 .03) (.20 .80 0)
      Request size: CD(size, size, 1, 4)
        States and transition matrix as in W3

  W5  Location: CD(location, jump distance, 1, 4)
        Location range / jump distance / p(jump): [0GB, 2GB] 1K 98%; [2GB, 4GB] 8K 98%; [4GB, 6GB] 65K 98%; [6GB, 9GB] 128K 98%
      Operation type: CD(op, op, 1, 2)
        States: 0: Write; 1: Read
        Transition matrix rows: (.98 .02) (.02 .98)
      Interarrival time: CD(iat, iat, 1, 3)
        States: 0: U[.2ms, .9ms]; 1: U[1ms, 5ms]; 3: 250ms
        Transition matrix rows: (.80 .18 .02) (.18 .80 .02) (.40 .60 0)
      Request size: CD(size, size, 1, 4)
        States: 0: 1KB; 1: 16KB; 3: 64KB; 4: 128KB
        Transition matrix rows: (.985 .005 .005 .005) (.005 .985 .005 .005) (.005 .005 .985 .005) (.005 .005 .005 .985)

  W6  Location: CD(op, loc, 1, 2)
        R: 95% [0, 64MB], 5% [65MB, 10GB]; W: 5% [0, 64MB], 95% [65MB, 10GB]
      Operation type: CD(op, op, 1, 2)
        States: 0: Write; 1: Read
        Transition matrix rows: (.8 .2) (.2 .8)
      Interarrival time: CD(op, iat, 2, 2)
        W,W: .6ms; R,W: 100ms; W,R: 25ms; R,R: 6ms
      Request size: CD(op, size, 2, 2)
        W,W: 128KB; R,W: 65KB; W,R: 2KB; R,R: 16KB

  W7  Location: runs of length Uniform(0, 10)
        [0, 64MB]: 90% R, 10% W; [65MB, 10GB]: 10% R, 90% W
      Operation type: CD(op, op, 1, 2)
        States: 0: Write; 1: Read
        Transition matrix rows: (.65 .35) (.35 .65)
      Interarrival time: four interleaved Poisson processes with means .03ms, .04ms, .05ms, .035ms
      Request size: Constant(8KB)

Table 15: Results of distilling artificial workloads

  ID  Iter.  Attr. group  Attribute added                    Result
  W1  0                   Empirical distributions            3%
  W2  0                   Empirical distributions            6%
  W3  0                   Empirical distributions            60%
      1      {loc}        CD(loc, loc, 1, 100)               66%
      2      {op}         CD(op, op, 8, 2)                   42%
      3      {size}       CD(size, size, 1, 100)             9%
      4      {iat}        CD(iat, iat, 3, 4)                 5%
  W4  0                   Empirical distributions            15%
      1      {loc}        CD(loc, loc, 2, 10)                22%
      2      {size}       CD(size, size, 1, 100)             9%
  W5  0                   Empirical distributions            63%
      1      {loc}        CD(loc, jump dist., 1, 100)        23%
      2      {size}       CD(size, size, 1, 100)             13%
      3      {iat}        CD(iat, iat, 1, 100)               11%
  W6  0                   Empirical distributions            87%
      1      {op, size}   CD(op, size, 1, 2)                 54%
      2      {op, loc}    CD(op, loc, 1, 2)                  27%
      3      {op, iat}    CD((op, iat), (op, iat), 2, 8)     5%
  W7  0                   Empirical distributions            78%
      1      {loc}        CD(loc, jump dist., 1, 100)        74%
      2      {op}         CD(op, op, 8, 2)                   30%
      3      {op, loc}    CD((op, loc), jump dist, 1, 100)   7%

except interarrival times are distributed exponentially and the request sizes are

distributed uniformly between 1KB and 128KB. The initial attribute list’s em-

pirical distribution attributes also completely describe this workload; therefore

the Distiller stops before even entering iteration 1.

• W3 tests the Distiller’s ability to find single-parameter attributes. Each pa-

rameter is generated using a conditional distribution that produces dependencies

within the sequence of request parameters, but no inter-parameter dependen-

cies. In other words, the value of each parameter depends on only the val-

ues of that parameter for the previous four I/Os.1 The Distiller explores each

single-parameter attribute group and chooses a conditional distribution as a key

attribute. After exploring all four single-parameter attribute groups, the inter-

mediate synthetic workload’s demerit is below the desired threshold and the

Distiller terminates without investigating any two-parameter attribute groups.

• W4 shows that the Distiller does not fail when the addition of a key attribute

produces a temporary degradation in the accuracy of the intermediate synthetic

workload. When distilling a workload, the addition of a key attribute does not

necessarily produce a more representative intermediate synthetic workload. Two

(or more) key attributes may have offsetting effects. For example, a {location}

attribute may add sequentiality and speed up the resulting intermediate syn-

thetic workload, while an {interarrival time} attribute added in a later iteration

may cause burstiness that slows the intermediate synthetic workload. Conse-

quently, the improvement in the list of key attributes may not become apparent

until after the Distiller adds both attributes. The Distiller evaluates all one- and

two-parameter attribute groups before reacting to any missing key attributes;

therefore, any temporary degradations do not affect its execution.

Footnote 1: For demonstration purposes, we set the threshold to 7% so that the Distiller would not terminate after iteration 3.

• W5 tests the Distiller's ability to explore an attribute group (as opposed to

simply choosing the first attribute it tests). During iteration 1, the Distiller

tests and rejects three conditional distributions of location and, instead, chooses

a conditional distribution of jump distance.

• W6 tests the Distiller’s ability to choose and explore two-parameter attribute

groups. In this workload, read and write accesses are concentrated in different

areas of the LU’s address space and have different request sizes. In addition,

successive reads and successive writes have smaller interarrival times than a

read followed by a write, or a write followed by a read. The Distiller correctly

determines that there are no single-parameter key attributes and explores only

two-parameter attribute groups.

• W7 shows that the Distiller can find a useful set of attributes, even if no

attribute corresponds directly to the generation techniques. This workload’s run

counts vary uniformly from one to ten requests. However, for this experiment,

the library did not contain any run count attributes; instead, any run counts

must be described using distributions of modified jump distance. Conditional

distributions of jump distance can generate only exponential distributions of run

counts. The Distiller finds that a modified jump distance attribute sufficiently

specifies a representative synthetic workload. Thus, the Distiller shows us that,

for this workload, capturing runs is important, but that maintaining the exact

run length is not.

Distilling these artificial workloads highlights two of the Distiller’s strengths.

First, the Distiller properly chooses attributes that produce representative versions of the artificial target workloads. Second, the Distiller is able to identify key attributes, regardless of the techniques that we used to generate the target artificial workloads.

Table 16: Summary of final synthetic workloads

                External demerit
  Workload      MRT    RMS    LA      Size   Test wklds.  Storage system  Demerit figure
  OM (All)      17%    9%     9%      24%    27 / S       Pantheon        Log area
  OM (2GBLU)    22%    110%   6%      50%    86 / 105     FC-30           RMS
  OM (62GBLU)   11%    24%    16%     60%    103 / 123    FC-30           RMS
  OLTP (All)    23%    145%   24%     55%    51 / 51      Pantheon        Log area
  OLTP (Log)    3%     10%    3%      38%    15 / S       FC-30           RMS
  OLTP (Data)   1%     3%     6%      9%     13 / S       FC-30           RMS
  DSS (1LU)     6%     46%    5%      6%     35 / S       FC-30           Hybrid
  DSS (4LU)     2%     3%     2%      0.2%   13 / S       Pantheon        RMS

6.2 Production workloads

In this section, we evaluate how well our distillation technique (together with the

current library of attributes) can specify synthetic workloads that are representative

of actual production workloads. While collecting these results, we found that the

Distiller was often unable to find a representative synthetic workload when using

the RMS demerit figure “internally” to compare exploratory workloads. Therefore,

to evaluate the strengths of the Distiller’s techniques apart from the weakness of

the different demerit figures, we evaluate each workload using the internal demerit

figure that produces the most accurate final synthetic workload. Chapter 7 further

investigates the effects of using different internal demerit figures.

Based on our experience, we configured the Distiller to perform a first-fit search

with a 7.5% threshold. The Distiller evaluates each attribute group’s candidate at-

tributes in order of increasing size. We set the stopping condition to be a demerit

of 10% for the final synthetic workload. Section 7.2 examines the effects of changing

the thresholds.

Table 16 presents a summary of the results. For each workload, it shows

1. the final synthetic workload’s MRT, RMS, and log area demerits,

2. the size of the attributes chosen as a percentage of size of the target workload

trace,

87 3. the number of exploratory and intermediate synthetic workloads the Distiller

evaluated,

4. the storage system on which we executed the synthetic workloads, and

5. the demerit figure the Distiller used to evaluate attribute groups, individual

attributes, and whether the intermediate synthetic workload met the stopping

condition.

The first number in the “Test workloads” column is the number of workloads the

Distiller evaluated up to the point where it evaluated the final synthetic workload

shown in Table 16. For those workloads where the Distiller was unable to meet its

10% demerit goal for the final synthetic workload, the second number shows the total number of workloads the Distiller evaluated before it terminated.

Tables 17 and 18 show the attributes the Distiller chose at each step of its ex- ecution. The first section of Tables 17 and 18 lists the name of the workload, the demerit figure the Distiller used internally to compare exploratory workloads, and the threshold. The second section lists the attribute group the Distiller explored during each iteration and the demerit of that attribute group’s exploratory workloads. The third section lists the key attribute the Distiller chose and its demerit. The fourth section lists the demerit of the intermediate synthetic workload the Distiller produced at the end of each iteration. Notice that if the Distiller is unable to meet its 10% accuracy goal for the final synthetic workload, the most accurate synthetic workload may come from any iteration. In other words, the last several unsuccessful iterations may reduce the accuracy of the intermediate synthetic workload.

Accuracy: The Distiller was able to find synthetic workloads with at most a 10% log area demerit for all but the OpenMail (62GBLU) and OLTP (All) workloads. In both cases, the Distiller's inability to find a highly representative synthetic workload appears to result from a deficiency in the attribute library. For OpenMail (62GBLU),

Table 17: Incremental results of distilling production OpenMail workloads

  OM (All) (internal demerit figure: log area; threshold: 7.5%)
    Initial external demerits: MRT 46%, RMS 95%, LA 136%
    Iter. 1: {iat}, group error 20%; added β-model (ws = 5.12, il = .01, s = 10^-5), error 8%; external: MRT 43%, RMS 91%, LA 137%
    Iter. 2: {op. type, loc.}, group error 137%; added mJDWS (RW) (h = 2, s = 4), error 6%; external: MRT 17%, RMS 9%, LA 9%

  OM (2GBLU) (internal demerit figure: RMS; threshold: 7.5%)
    Initial external demerits: MRT 70%, RMS 154%, LA 47%
    Iter. 1: {iat}, 51%; Cluster (ws = 5.12, bs = .01, c = .3), 10%; external: MRT 44%, RMS 132%, LA 23%
    Iter. 2: {loc.}, 41%; CD(loc., loc., 1, 100), 17%; external: MRT 44%, RMS 126%, LA 29%
    Iter. 3: {request size}, 29%; TMRC (h = 1, s = 6), 11%; external: MRT 42%, RMS 123%, LA 30%
    Iter. 4: {op. type}, 11%; CD(op. type, op. type, 8, 2), 18%; external: MRT 42%, RMS 123%, LA 30%
    Iter. 5: {op. type, loc.}, 32%; mJDWS (RW) (h = 1, s = 100, pol), 8%; external: MRT 22%, RMS 110%, LA 7%
    Iter. 6: {op. type, req. size}, 16%; CD(op. type, req. size, 1, 2), 20%; external: MRT 33%, RMS 130%, LA 6%
    Iter. 7: {op. type, iat}, 9%; Cluster (ws = 10.24, bs = .0025, c = .3), 10%; external: MRT 25%, RMS 123%, LA 14%

  OM (62GBLU) (internal demerit figure: RMS; threshold: 7.5%)
    Initial external demerits: MRT 54%, RMS 40%, LA 40%
    Iter. 1: {loc.}, 60%; mJDWS (h = 2, s = 14), 15%; external: MRT 71%, RMS 50%, LA 63%
    Iter. 2: {iat}, 37%; Cluster (ws = 10.24, bs = .0025, c = .3), 4%; external: MRT 58%, RMS 45%, LA 51%
    Iter. 3: {req. size}, 13%; CD(request size, request size, 2, 10), 5%; external: MRT 45%, RMS 25%, LA 40%
    Iter. 4: {op. type}, 13%; Operation type STC, 12%; external: MRT 46%, RMS 26%, LA 41%
    Iter. 5: {op. type, req. size}, 113%; Joint dist. of op. type and req. size, 116%; external: MRT 44%, RMS 24%, LA 40%
    Iter. 6: {op. type, loc.}, 32%; RCWS (RW) (h = 2, s = 100), 8%; external: MRT 11%, RMS 28%, LA 16%

Table 18: Incremental results of distilling production OLTP and DSS workloads

  OLTP (All) (internal demerit figure: log area; threshold: 7.5%)
    Initial external demerits: MRT 40%, RMS 184%, LA 200%
    Iter. 1: {op. type}, 35%; Read percentage, 6%; external: MRT 41%, RMS 195%, LA 154%
    Iter. 2: {op. type, loc.}, 174%; CD((op. type, loc.), loc., 2, 100), 30%; external: MRT 23%, RMS 145%, LA 32%
    Iter. 3: {loc., iat}, 20%; CD(loc., iat, 3, 2), 21%; external: MRT 24%, RMS 155%, LA 24%

  OLTP (Log) (internal demerit figure: RMS; threshold: 7.5%)
    Initial external demerits: MRT 7%, RMS 34%, LA 10%
    Iter. 1: {loc.}, 16%; mJDWS (h = 1, s = 1), 7%; external: MRT 3%, RMS 10%, LA 3%

  OLTP (Data) (internal demerit figure: RMS; threshold: 7.5%)
    Initial external demerits: MRT 4%, RMS 10%, LA 21%
    Iter. 1: {iat}, 19%; CD(iat, iat, 2, 10), 8%; external: MRT 1%, RMS 3%, LA 6%

  DSS (1LU) (internal demerit figure: hybrid; threshold: 7.5%)
    Initial external demerits: MRT 0%, RMS 46%, LA 30%
    Iter. 1: {loc.}, 67%; Run count within state (h = 1, s = 25), 5%; external: MRT 19%, RMS 55%, LA 15%
    Iter. 2: {iat}, 31%; Cluster (ws = 5.12, bs = .01, c = .1), 1%; external: MRT 6%, RMS 47%, LA 5%

  DSS (4LU) (internal demerit figure: RMS; threshold: 7.5%)
    Initial external demerits: MRT 14%, RMS 24%, LA 17%
    Iter. 1: {loc.}, 17%; IR (mg = 2048), 1%; external: MRT 2%, RMS 3%, LA 2%

  (In each iteration row, the first percentage is the demerit between the attribute group's two exploratory workloads, the second is the demerit for the chosen attribute, and the external MRT/RMS/LA values describe the intermediate synthetic workload at the end of that iteration.)

Figure 23: Final synthetic workload for OpenMail 2GBLU (cumulative fraction of I/Os vs. latency in seconds; curves: synthetic workload chosen by Distiller, original OpenMail 2GBLU workload)

the most accurate {operation type, request size} attribute has an RMS demerit of

116%. For OLTP (All) the most accurate {operation type, location} and {operation type, interarrival time} attributes have demerits of 30% and 21% respectively.

Several synthetic workloads had RMS demerits over 10%. The large RMS demerit

figures for OpenMail (62GBLU) and OLTP (All) reflect the limitations of the attribute

library. For OpenMail (2GBLU), the workload’s heavy tail appears to be the primary

cause of the large error values. Figure 23 as well as the log area demerit of only 1%

indicate that the synthetic workload is not as inaccurate as the RMS demerit of 14%

suggests. Finally, we distill OLTP (All) and DSS (1LU) using the log area and hybrid

demerit figures respectively. As a result, the Distiller chooses attributes that minimize

these demerit figures. These attributes may not produce the lowest RMS demerit.

Chapter 7 explores how the choice of demerit figure and other configuration decisions

affect the quality of the Distiller’s answers.

Replay and randomness errors: To help put the error of each final synthetic

workload in perspective, Tables 19 and 20 present the replay and randomness errors of

Table 19: Replay errors of production workloads

                MRT               RMS               Log area
  Workload      Mean  70%   Max   Mean  70%   Max   Mean  70%   Max
  OM (2GBLU)    2%    2%    7%    14%   11%   44%   1%    1%    1%
  OM (62GBLU)   3%    5%    4%    6%    9%    15%   1%    1%    1%
  OLTP (Log)    0%    2%    1%    3%    4%    7%    0%    0%    0%
  OLTP (Data)   1%    1%    1%    1%    1%    2%    1%    1%    2%
  DSS (1LU)     1%    1%    3%    3%    5%    5%    1%    1%    3%

Table 20: Randomness errors of production workloads

                MRT               RMS               Log area
  Workload      Mean  70%   Max   Mean  70%   Max   Mean  70%   Max
  OM (All)      3%    4%    6%    1%    2%    2%    1%    2%    3%
  OM (2GBLU)    2%    3%    5%    8%    9%    17%   1%    1%    1%
  OM (62GBLU)   2%    2%    3%    5%    6%    8%    2%    2%    4%
  OLTP (All)    4%    5%    8%    49%   67%   92%   6%    8%    9%
  OLTP (Log)    2%    3%    4%    6%    9%    13%   0%    1%    1%
  OLTP (Data)   1%    1%    2%    3%    4%    5%    2%    1%    2%
  DSS (1LU)     11%   15%   22%   31%   40%   58%   8%    11%   15%

the workloads studied. (Note: Pantheon is deterministic; therefore, workloads studied

using Pantheon have no replay error.) For those workloads studied using FC-30, the

replay error is the minimum possible error that one can obtain without multiple trials

for each workload evaluated. (It is not possible to determine a synthetic workload has

less error than the inherent measurement error.) For all workloads, the randomness

error reflects the amount of error introduced by varying the random seed used when

generating the final synthetic workload.

Only OpenMail (2GBLU) has a large replay error. The otherwise low replay errors indicate that (1) replay error is not the primary cause of the high demerits in Table 16, and (2) it should be possible, given a sufficient attribute library, to obtain a synthetic workload with a demerit of at most 10%.

Only the two DSS workloads have large randomness errors for all three demerit

figures. We suspect that this large error has two causes. First, the Distiller specifies these synthetic workloads using only two and one attributes, respectively; such terse descriptions leave more room for variation than a more detailed compact representation.

Second, the DSS workload produces the heaviest overall load on the disk array. The

DSS workloads are read-only and access very few locations more than once. As a result, almost every request requires a physical disk access. The response times for operations that involve physical movement can vary by milliseconds depending on how far the disk components must move. Therefore, this workload is likely to have a larger randomness and replay errors than other workloads.

Although using multiple trials may reduce a synthetic workload's randomness error, we suspect that the better solution is to implement an attribute that allows for

less variation in the synthetic workload. However, such a solution must balance the

benefits of a reduced randomness error with the cost of a larger compact representa-

tion. We leave the study of the randomness error / attribute size tradeoff for future

work.

Compact representation size: We find that, in practice, the final synthetic

workloads have compact representations that are 25% to 60% the size of the target

workload trace. Our precise specification of the workload's footprint causes these large

compact representations. For example, the distributions of read and write location for

OLTP (All) represent 99% of the final synthetic workload’s compact representation.

In Section 7.3, we study the tradeoff between the precision (and size) of a synthetic

workload’s compact representation and its accuracy.

Running time: Generating and evaluating exploratory workloads dominates the

Distiller’s running time. Therefore, we specify the running time by counting these

workloads. In general, we find that the Distiller has a much shorter running time

when it can meet its accuracy goal. This reduced running time is expected because

the Distiller stops exploring attribute groups when it meets its goal.

More importantly, for the Distiller not to meet its accuracy goal, the library must

lack key attributes for one or more attribute groups. As a result, the Distiller will evaluate every attribute in the group while searching for the key attribute. It is these

long searches through the attribute library that cause the Distiller's long running

times. For example, no {interarrival time} attribute meets the 7.5% RMS thresh-

old for OpenMail (2GBLU). Consequently, the Distiller evaluates all 26 {interarrival

time} attributes. In contrast, most workloads for which the library contains a suf-

ficient {interarrival time} attribute require evaluations of at most five {interarrival

time} attributes.

6.3 Summary

In this chapter, we demonstrated that we can use our technique of iterative distilla-

tion to automatically find the attributes that specify representative synthetic work-

loads. Section 6.1 verified the correct operation of the Distiller’s infrastructure by

successfully choosing attributes for seven artificial target workloads. In each case,

the resulting final synthetic workload met the target of an RMS demerit of at most

12%. Section 6.2 demonstrated that we can use the Distiller and its library to specify

synthetic workloads representative of real production workloads. In particular, we

saw that the Distiller was able to specify a final synthetic workload with a log area

demerit of 10% in most cases, and 25% in the worst case.

We also saw several of the Distiller's limitations. In particular, (1) the Distiller was not able to meet its 10% accuracy goal for all workloads, nor was it able to meet its

10% accuracy goal for several different demerit figures (most notably RMS); (2) the

Distiller tends to choose attributes with large representations; and (3) the Distiller tends to have a long running time when the library is insufficient for the workload under test.

CHAPTER 7

OPERATIONAL SENSITIVITY

Before running the Distiller, the user must make two decisions: (1) which demerit figure (mean response time, RMS, log area, hybrid) to use when comparing exploratory

workloads, and (2) what threshold to set for choosing attributes and exploring at-

tribute groups. After running the Distiller, the user can explicitly trade some of the

final synthetic workload’s accuracy for a reduction in the size of its compact repre-

sentation. This chapter examines the consequences of these decisions.

In Section 7.1, we see that the hybrid demerit figure leads the Distiller to near-

optimal final synthetic workloads in most of our test cases, but that the log area’s limitations affect the distillation process less dramatically than the other demerit

figures. In Section 7.2, we see that raising the internal demerit threshold reduces the quality of the final synthetic workload, but also reduces the Distiller’s running time and the size of the final synthetic workload’s compact representation. Section 7.3 shows that we can run the Distiller in a “post-process mode” and reduce the final synthetic workload’s compact representation by over 70% while reducing the accuracy by less than 20%.

7.1 Demerit figure

This section discusses how the limitations of the Distiller interact with the limitations of the demerit figures and affect the distillation process. In Section 7.1.1, we see that when using the MRT or hybrid demerit figures, randomness error and the target workload's non-static workload characteristics can lead the Distiller to different answers based on its initial random seed. In Section 7.1.2, we see that the RMS demerit

Table 21: Effects of modifying the demerit figure

             Internal                External demerit                        Test
Workload     demerit figure    MRT    RMS    LA     Hybrid    Size        wklds.
OM (All)     MRT               22%    33%    22%    22%       24%         35 / 37
             RMS               20%    26%    15%    20%       25%         63 / 90
             Log Area          17%     9%     9%    17%       24%         27 / S
             Hybrid            19%    26%    10%    19%       38%         54 / 67
OLTP (All)   MRT               31%   136%   102%   102%       67%         37 / 37
             RMS               31%   185%    59%    59%       26%         18 / 96
             Log Area          23%   145%    24%    24%       55%         51 / 51
             Hybrid             9%    68%    22%    30%       67%         45 / 61
DSS (4LU)    MRT                5%     3%     3%     5%       0.2%        12 / S
             RMS                2%     3%     2%     2%       0.2%        12 / S
             Log Area           5%     3%     3%     5%       0.2%        12 / S
             Hybrid             5%     3%     3%     5%       0.2%        12 / S

figure’s emphasis on tail behavior causes it to ignore differences between attributes that have a noticeable effect on the accuracy of the final synthetic workload. From these observations, we conclude that, although it has its own limitations, log area is currently the best of the four demerit figures tested.

Table 21 shows the effects of varying the Distiller’s internal demerit figure. To collect these results, we ran the Distiller several times for each workload. Each run differed only in the demerit figure the Distiller used internally to select attribute groups and attributes. In each case, the threshold was 7.5%.

For each external demerit figure, we present the lowest demerit over all of the intermediate synthetic workloads (i.e., the synthetic workloads produced at the end of each iteration). Consequently, the external demerits in Table 21 are not necessarily all from the same intermediate synthetic workload. For example, when using RMS internally to distill OpenMail (All), the intermediate synthetic workload produced after iteration 5 has the lowest MRT demerit (20%), whereas the intermediate synthetic workload produced after iteration 4 has the lowest log area demerit (15%). Tables 22 through 24 show the attributes chosen at each iteration and the external demerits of each resulting intermediate synthetic workload.

Table 22: Incremental results of distilling OM using different demerit figures
(columns: iteration, attribute group, group error, attribute added, attribute error, external MRT/RMS/LA demerits)

OM (All)  49% 96% 140%   RMS 7.5%
  1  {loc.}                20%   mJDWS(s = 1, s = 100)                        8%    54% 100% 144%
  2  {iat}                 17%   β-model (ws = 5.12, il = .01, s = 10^-2)     3%    44%  97% 155%
  3  {op. type, loc.}      72%   mJDWS (RW) (h = 1, s = 100, pol)            14%    24%  26%  30%
  4  {op. type, iat}       11%   Cluster(ws = 10.24, bs = .01, c = .3)        9%    25%  26%  15%
  5  {loc., req. size}     30%   CD(loc., req. size, 1, 100)                 23%    20%  26%  17%
OM (All)  46% 95% 136%   Log area 7.5%
  1  {iat}                 20%   β-model (ws = 5.12, il = .01, s = 10^-5)     8%    43%  91% 137%
  2  {op. type, loc.}     137%   mJDWS (RW) (h = 2, s = 4)                    6%    17%   9%   9%
OM (All)  48% 96% 137%   Hybrid 7.5%
  1  {iat}                 20%   Cluster (ws = 5.12, bs = .01, c = .3)        1%    50%  92% 120%
  2  {loc.}                 9%   Dist. clustering (ws = 16MB, bs = 8192B, c = .2)   7%    47%  92% 120%
  3  {op. type, loc.}     136%   mJDWS (h = 1, s = 2, pol)                    5%    19%  26%  10%
  4  {op. type, req. size} 12%   CD(op. type, req. size, 1, 2)                3%    39%  37%  10%
  5  {op. type, iat}        9%   Cluster (ws = 2.56, bs = .01, c = .3)        7%    27%  32%  10%
OM (All)  47% 95% 136%   MRT 7.5%
  1  {loc.}                22%   Run count within state (h = 1, s = 25)       3%    27%  88% 126%
  2  {iat}                 10%   β-model (ws = 5.12, il = .01, s = 10^-2)     1%    24%  86% 141%
  3  {op. type, loc.}      78%   CD(op. type, loc., 1, 2)                     4%    27%  34%  27%
  4  {loc., req. size}     12%   CD(op. type, req. size, 1, 2)                4%    22%  34%  30%
  5  {op. type, iat}        9%   CD(op. type, iat, 1, 1)                      5%    22%  33%  22%

Table 23: Incremental results of distilling OLTP using different demerit figures
(columns: iteration, attribute group, group error, attribute added, attribute error, external MRT/RMS/LA demerits)

OLTP (All)  41% 194% 157%   RMS 7.5%
  1  {iat}                116%   β-model (ws = 5.12, il = .01, s = 10^-5)     4%    40% 185% 152%
  2  {op. type}            43%   Read percentage                             42%    40% 185% 173%
  3  {request size}        18%   CD (req. size, req. size, 4, 4)              7%    40% 192% 174%
  4  {loc.}                12%   CD(loc., loc., 1, 1)                        15%    40% 192% 174%
  5  {op. type, loc.}      95%   mJDWS (RW) (h = 1, s = 100)                 23%    61% 345%  61%
  6  {op. type, req. size} 61%   CD(op. type, req. size, 2, 50)              32%    31% 200%  70%
  7  {op. type, iat}       57%   CD(op. type, iat, 2, 1)                     31%    40% 240%  59%
OLTP (All)  40% 184% 200%   Log area 7.5%
  1  {op type}             35%   Read percentage                              6%    41% 195% 154%
  2  {op. type, loc.}     174%   CD((op. type, loc.), loc., 2, 100)          30%    23% 145%  32%
  3  {loc., iat}           20%   CD(loc., iat, 3, 2)                         21%    24% 155%  24%
OLTP (All)  39% 182% 197%   Hybrid 7.5%
  1  {op. type}            11%   Read percentage                             11%    48% 244% 165%
  2  {iat}                 10%   β-model (ws = 5.12, il = .01, s = 10^-5)     1%    40% 187% 176%
  3  {op. type, loc.}     174%   CD(op. type, loc., 1, 2)                    27%    30% 209%  22%
  4  {op. type, request size}  8%   Joint distribution of op. type and req. size   5%     9%  97%  47%
  5  {loc., iat}           20%   CD(loc., iat, 2, 10)                        20%    15%  68%  46%
OLTP (All)  41% 193% 163%   MRT 7.5%
  1  {iat}                 14%   β-model (ws = 5.12, il = .01, s = 10^-5)     0%    46% 232% 154%
  2  {op. type}            12%   Read percentage                              9%    40% 185% 172%
  3  {op. type, request size}  8%   CD(op. type, request size, h = 1, s = 2)  6%    31% 136% 145%
  4  {op. type, loc.}       8%   mJDWS (RW) (h = 2, s = 4)                    5%    49% 268% 102%

Table 24: Incremental results of distilling DSS using different demerit figures
(columns: iteration, attribute group, group error, attribute added, attribute error, external MRT/RMS/LA demerits)

DSS (4LU)  14% 24% 17%   RMS 7.5%
  1  {loc.}   17%   IR (mg = 2048)   1%    2%  3%  2%
DSS (4LU)  14% 24% 18%   LA 7.5%
  1  {loc.}   14%   IR (mg = 2048)   3%    5%  3%  3%
DSS (4LU)  14% 24% 18%   Hybrid 7.5%
  1  {loc.}   14%   IR (mg = 2048)   3%    5%  3%  3%
DSS (4LU)  14% 24% 18%   MRT 7.5%
  1  {loc.}   10%   IR (mg = 2048)   1%    5%  3%  3%

The “Size” column in Table 21 presents the size of the final synthetic workload

for the external demerit figure that matches the internal demerit figure under test. For

example, when using log area internally, the “Size” column refers to the intermediate

synthetic workload with the lowest log area demerit. (Although each external demerit

figure may correspond to a separate final synthetic workload and, hence, have a separate size, we find that presenting all four sizes clutters the table.)

The “Test workloads” column includes two figures. The first shows the number of exploratory workloads the Distiller generated and evaluated up through the iteration during which it generated the final synthetic workload (for which the internal and external demerit figures match). When the Distiller cannot meet its accuracy goal,

it chooses the intermediate synthetic workload with the lowest demerit to be the final

synthetic workload. In this case, the second figure is the total number of workloads the

Distiller generated and evaluated during its execution (including both exploratory and intermediate synthetic workloads). A second value of “S” indicates that the Distiller succeeded in meeting its 10% accuracy goal.

7.1.1 Distiller design limitations

In this section, we see that the Distiller's three main limitations (discussed in Section 5.6) significantly affect its execution when using the MRT or hybrid demerit

figures internally. In particular, we see how the subtractive workloads’ randomness

Table 25: Effects of subtractive workload's randomness error on {location} evaluation

            Internal demerit figure
Test     MRT      RMS      LA      Hybrid
1        7.6%     19%      2.8%    7.6%
2        12.7%    19.7%    3.0%    12.7%
3        17.2%    18.9%    2.7%    17.2%
4        7.5%     20.2%    2.9%    7.5%
5        21.8%    19.7%    3.1%    21.8%
6        12.9%    19.8%    3.1%    12.9%
7        18.4%    19.5%    3.0%    18.4%
8        14.1%    20.3%    3.1%    14.1%
9        10.0%    20.2%    3.1%    10.0%
10       12.1%    19.3%    2.8%    12.1%
11       18.8%    19.0%    2.9%    18.8%
12       6.6%     19.4%    2.8%    6.6%
13       19.8%    19.1%    3.0%    19.8%
14       9.2%     19.8%    2.9%    9.2%
15       21.3%    18.4%    2.8%    21.3%

errors affect the Distiller’s ability to estimate whether a given attribute group con- tains a key attribute, and how the target workloads’ non-static behaviors affect which attributes the Distiller chooses as key attributes. In some cases, these effects cause the

Distiller to produce final synthetic workloads with very different accuracy depending on its initial random seed. In contrast, the Distiller’s limitations have much smaller effects on its execution when using the RMS or log area demerit figures internally.

Randomness errors: Table 25 shows that, when using the MRT or hybrid demerit figures internally, the subtractive {location} workload’s randomness error affects whether the Distiller explores the {location} attribute group. Specifically,

Table 25 shows the results of evaluating the {location} attribute group fifteen times using different random seeds when generating the subtractive exploratory workload.

The MRT and hybrid demerits vary from 6.6% to 21.3%. In contrast, the RMS demerit varies from 18.4% to 20.3%; and the log area demerit varies from 2.7% to

3.1%. The MRT and hybrid differences will greatly affect the Distiller’s execution because it will explore the {location} attribute group during only 87% of its executions

(given a 7.5% threshold). In contrast, the Distiller will always explore the {location} attribute group when using the RMS demerit figure and never explore {location} when using log area.

One technique for reducing the effects of randomness error is to generate and

Table 26: Difference between CDFs for rotated {location} workloads

                        RMS                     Log Area                 MRT
Wkld.   PF       Min     Max    Stdev.    Min    Max    Stdev.    Min    Max    Stdev.
OM      0KB      91%     93%    .005      24%    27%    .007      28%    54%    .06
OLTP    0KB      149%    289%   .25       108%   147%   .09       30%    53%    .45
DSS     0KB      0%      2%     .003      0%     1%     .001      0%     1%     .03
OM      128KB    87%     89%    .005      110%   113%   .006      29%    53%    .06
OLTP    128KB    114%    393%   0.33      36%    124%   .11       24%    88%    .68
DSS     128KB    4.2%    877%   1.37      2%     41%    .07       0%     93%    4.85

execute several versions of each exploratory workload (using a different random seed

for each version), then compute the “average” CDF using the technique discussed in

Section 3.4.2. The main challenge in repeating evaluations for statistical confidence

is that the Distiller requires considerable CPU and storage resources to produce and

evaluate a synthetic workload. For larger test workloads, repeating each evaluation

is not practical.
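As a rough illustration of the averaging idea, the sketch below approximates an "average" CDF by averaging, at each quantile, the latencies observed across several runs; the actual technique of Section 3.4.2 may differ in its details, and the function name is illustrative only.

    # Sketch: approximate an average response-time CDF over several random seeds.
    import numpy as np

    def average_cdf(latency_runs, num_quantiles=1000):
        """latency_runs: list of 1-D arrays of response times, one per random seed."""
        quantiles = np.linspace(0.0, 1.0, num_quantiles)
        # Latency at each quantile for each run, then the mean across runs.
        per_run = [np.quantile(run, quantiles) for run in latency_runs]
        return quantiles, np.mean(per_run, axis=0)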

Rotate amount and non-static behaviors: When evaluating attributes, the

Distiller generates an exploratory workload by rotating one or more parameters. The

Distiller’s design assumes that the amount by which the Distiller rotates parameters

does not affect the result of attribute group evaluations; however, this assumption

does not hold for most workloads. Fortunately, the rotate amount causes only minor

differences in the RMS and log area demerits. The differences measured by MRT are

much larger.

Table 26 shows the minimum and maximum demerits that result from comparing

a rotated {location} workload with the target workload. Specifically, we generated

99 workloads by rotating location in 1% increments. For OpenMail and DSS, the

differing rotation amounts affect the RMS and log area demerits by at most 3%. For

OLTP, the differences are much larger: 140% for RMS and 39% for log area. In

contrast, the difference between the largest and smallest MRT (and hybrid) demerits

is between 20% and 25% for OpenMail and OLTP.

We also repeated this test using a slightly modified disk array. In particular, we

Table 27: Attribute chosen given different rotate amounts

Rotate            Attribute chosen
amount      RMS    MRT    LA    Hybrid
-25%        A3     A1     A4    A1
-15%        A3     A5     A4    A2
-10%        A3     A5     A4    A2
10%         A3     A5     A2    A2
15%         A3     A1     A4    A2
25%         A3     A1     A4    A3

Attr. ID    Description
A1          CD(op. type, location, 1, 2)
A2          mJDWS (RW) (h = 1, s = 1)
A3          mJDWS (RW) (h = 1, s = 100)
A4          mJDWS (RW) (h = 2, s = 4)
A5          mJDWS (RW) (h = 2, s = 10, pol)

set prefetching to 128KB. (Section 8.1 discusses prefetching in detail.) We see that this modification does not change the effects of rotate amount much for OpenMail or OLTP, but drastically changes the effects of rotate amount for DSS. In fact, the

differences in RMS and MRT demerits are so large that we currently are unable to

use the Distiller to study the DSS workload when prefetching is set to 128KB.

We now examine how the differing demerits affect the Distiller’s execution. When

evaluating {operation type, location} attributes, the Distiller rotates the request size

and interarrival time attributes backward 25% (thereby breaking any correlations

with request size and interarrival time without changing the workload’s locality). To test the sensitivity to the rotate amount, we generated workloads by rotating the request sizes and interarrival times -25%, -15%, -10%, 10%, 15%, and 25%. We then determined which attribute the Distiller would choose when using each rotate amount. Table 27 shows the results for OpenMail (All). With one exception, the rotate amount does not affect the choice of attribute when using RMS or log area.

When using log area, the Distiller chooses mJDWS (RW) (h = 2, s = 4) except when the rotate amount is 10%. In this case, it chooses mJDWS (RW) (h = 1, s = 1).

Thus, it chooses the same attribute with a different configuration. In contrast, when using the MRT demerit figure, the Distiller chose one of two completely different attributes (A1 or A5) depending on the rotate amount.
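For readers unfamiliar with the rotation operation, the sketch below shows the basic idea: one per-request parameter list is shifted circularly by a fraction of the trace length, which breaks its correlations with the other parameters while leaving its own values and ordering intact. The function and variable names are illustrative only, not the Distiller's code.

    # Sketch: circularly shift a per-request parameter list by a fraction of its length.
    def rotate(values, fraction):
        n = len(values)
        k = int(round(fraction * n)) % n
        return values[k:] + values[:k]

    # Hypothetical usage, mirroring the -25% rotation described in the text:
    #   sizes_rot = rotate(sizes, -0.25)   # break correlations with request size
    #   iats_rot  = rotate(iats, -0.25)    # break correlations with interarrival time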

To determine the effect of the rotate amount on the Distiller’s final answer, we


Figure 24: Effects of choosing A5 instead of A1 when using MRT

Table 28: Effects of choosing A5 instead of A1 when using MRT

Attribute   Attribute configuration             MRT    RMS    LA     Hybrid
A1          CD(op. type, location, 1, 2)        22%    33%    22%    22%
A5          mJDWS (RW) (h = 1, s = 100, pol)     5%    39%    66%    66%

Table 29: Comparison of test workloads illustrating differences between RMS and log area

                           Demerit figure
Workloads compared     MRT     RMS    LA     Hybrid
Test 1 vs. Test 2      1.1%    2%     36%    36%
Test 1 vs. Test 3      5.4%    60%    1%     5.4%
Test 2 vs. Test 3      4.3%    60%    33%    33%

generated a final synthetic workload using attribute A5 and compared it to the final

synthetic workload that the Distiller initially generated using attribute A1. Figure 24

and Table 28 show that the demerits for the workloads generated using A1 and A5 are

quite different. Hence, we see that the time-dependent behavior in OpenMail is large

enough to seriously affect the result of running the Distiller with the MRT demerit

figure.

7.1.2 Demerit figure limitations

This section shows how the log area demerit figure can overcome the limitations of the Distiller's library and the limitations of the RMS demerit figure. Specifically, we show how the log area demerit figure's equal emphasis of errors at all time scales and its tendency to choose attributes whose errors offset the errors in the intermediate synthetic workload allow it to produce final synthetic workloads with lower RMS demerits than the final synthetic workloads produced using the RMS demerit figure internally.

RMS and log area differences: To understand how RMS and log area affect the distillation process, we must first understand their main differences. The log area demerit figure measures relative differences whereas the RMS demerit figure measures absolute differences. As a result, RMS emphasizes differences that occur in the tails of response time distributions. Figures 25 and 26 show three hypothetical CDFs of response time. The spikes at x = .0001s represent cache hits. The slope from x = .0001s to x = .02s represents I/Os that miss in the cache and suffer seek time.

[Figure 25 plots the CDFs of response time for Test 1, Test 2, and Test 3 on a log-scale latency axis; Test 3 differs from Test 1 only when x > .01s. The marked horizontal difference at y = 0.4 (x1 = .00056s, x2 = .00011s) has an approximate RMS contribution of .00045 and an approximate LA contribution of 5.36.]

Figure 25: Differences between CDFs affect RMS and log area demerit figures differently

[Figure 26 shows the tails of the same three CDFs; Test 2 differs from Test 1 only when x < .01s. The marked horizontal difference near y = 0.98 (x1 = .09s, x3 = .02s) has an approximate RMS contribution of .07 and an approximate LA contribution of 4.5.]

Figure 26: Differences between CDFs affect RMS and log area demerit figures differently (focus on tail)

Table 30: Comparison of {interarrival time} attributes chosen using log area and RMS demerit figures

Internal                                                        Internal demerit of attribute
dmt. fig.   Attr.   Configuration                               MRT    RMS    LA
RMS         A6      β-model (ws = 5.12, bs = .01, s = 10^-5)    0%     3%     19%
LA          A7      β-model (ws = 5.12, bs = .01, s = 10^-2)    2%     4%     7%

The time required for the disk heads to move to the physical locations of the data dominates the latency of these I/Os. Finally, the slope from x = .2s to x = .25s

represents I/Os that are delayed in a queue. The queuing delay dominates the latency

of these I/Os. Table 29 provides the demerit figures for these workloads.

Workloads Test 1 and Test 2 are identical, except that workload Test 2 has a

cache hit rate of 40% while Test 1 has a cache hit rate of only 20%. We can see

in Figure 25 that the absolute horizontal difference when y = 0.4 is approximately

.00056s−.00011s = .00045s, or approximately 5.5% of Test 1’s .0082s mean response

time. The resulting RMS demerit for Test 2 is 2%. In contrast, the relative difference

is .00056s/.00011s = 510% contributing to a log area demerit of 36%. Thus, we see

how the RMS demerit figure is affected little by large differences in cache behavior.

Workloads Test 1 and Test 3 are identical, except workload Test 3 has 0.2% more seeks and 0.2% fewer queued I/Os than workload Test 1. Figure 26 shows that the absolute horizontal difference when y = .982 is approximately .09s − .02s = .07s, or

830% of Test 1’s .0082s mean response time. The resulting RMS demerit for Test 3 is 60%. In contrast, the relative difference is .09s/.02s = 450%, contributing to a log area demerit of only 1%. Thus, we see how the log area demerit figure is affected little by differences in tail behavior.
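The sketch below illustrates how the four demerit figures could be computed from two sets of measured response times, assuming the general formulations outlined in Section 3.4.2: horizontal distances between the CDFs at matched quantiles, taken in absolute terms (normalized by the target's mean response time) for RMS and on a logarithmic scale for log area, with the hybrid figure defined as the maximum of the MRT and log area demerits. The exact normalizations used by the Distiller may differ; this is only an illustration.

    # Sketch: the four demerit figures, computed from two latency samples.
    import numpy as np

    def demerits(target_lat, synth_lat, num_quantiles=1000):
        q = np.linspace(0.001, 0.999, num_quantiles)
        xt = np.quantile(target_lat, q)      # target CDF, read horizontally
        xs = np.quantile(synth_lat, q)       # synthetic CDF, read horizontally
        mrt = abs(np.mean(synth_lat) - np.mean(target_lat)) / np.mean(target_lat)
        rms = np.sqrt(np.mean((xs - xt) ** 2)) / np.mean(target_lat)   # absolute distances
        log_area = np.mean(np.abs(np.log10(xs) - np.log10(xt)))        # relative (log-scale) distances
        hybrid = max(mrt, log_area)          # hybrid takes the maximum of MRT and log area
        return {"MRT": mrt, "RMS": rms, "log area": log_area, "hybrid": hybrid}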

We now provide two examples of how the differences between RMS and log area affect the Distiller’s execution and how using log area internally can produce a final synthetic workload with a lower RMS demerit than running the Distiller using the

RMS demerit figure internally.


Figure 27: Comparison of {interarrival time} attributes chosen using log area and RMS demerit figures

“Internal” vs. “external” error: Log area’s equal emphasis of differences at

all time scales allows it to distinguish between two attributes that have similar RMS

values. Differences that do not affect the “internal” RMS demerit (when comparing

attributes to the rotated workload for the attribute group under test) can occasionally later affect the “external” RMS demerit (when evaluating an intermediate or final

synthetic workload). In these cases, using the log area demerit figure internally leads

to a final synthetic workload with a lower RMS demerit than the final synthetic

workload produced using RMS internally.

We can see one example of this phenomenon by comparing the {interarrival time} attributes the Distiller chose for OpenMail when using RMS and log area internally.

In particular, Figure 27 and Table 30 show the behavior of two synthetic arrival patterns. Both are generated using Wang’s β-model [56]; however, when using the

RMS demerit figure, the Distiller specifies a sensitivity of 10^-5 s (henceforth called

A6), and when using log area, it specifies a sensitivity of 10^-2 s (henceforth called A7).

[Figure 28 plots the CDFs of response time for the intermediate synthetic workloads I6 (RMS 97%, log area 155%) and I7 (RMS 91%, log area 137%) against T 2, the target OpenMail workload.]

Figure 28: Comparison of intermediate synthetic workloads using log area and RMS

(We discuss the parameters for the β-model in Section 3.3.) The Distiller compares these workloads to the rotated {interarrival time} workload (henceforth called T 1).

Attributes A6 and A7 have similar RMS demerits, but their log area demerits differ by a factor of about 2.5. Because A6 and A7 differ from T 1 only when x < .001s, the RMS demerits for each workload are small. The visually noticeable difference at y = 0.66 has little effect on A6’s RMS demerit. In contrast, the relative difference at y = .66 is 100%, contributing to a 19% log area demerit for A6.

Even though A6 and A7 have similar RMS demerits, they lead to intermediate synthetic workloads with noticeably different RMS demerits. After choosing a key

{interarrival time} attribute, the Distiller adds the chosen attribute to the intermediate attribute list and generates intermediate synthetic workloads (call them I6 and

I7). We can see in Figure 28 how attributes A6 and A7 influence the performance of I6 and I7 (especially when .7 < y < .85). The intermediate attribute lists are not yet complete (the Distiller has not yet investigated all attribute groups); therefore, I6 and I7 are not very accurate (i.e., not similar to T 2, the target OpenMail workload).

The large absolute differences between I6 and T 2 and between I7 and T 2 noticeably influence the RMS demerit — especially when .7 < y < .85. Furthermore, because the RMS demerit figure squares horizontal distances, the additional horizontal difference between I6 and T 2 (as compared to I7) causes the difference in RMS demerits for I6 and I7 to be six times larger than the difference in RMS demerits for A6 and

A7. Thus, we see how differences between attributes can have a small effect on RMS

“internally” (because the exploratory workloads compared are similar), but then have a much larger effect “externally” (because the intermediate synthetic workload and the target workload are not similar).

The difference between the internal and external RMS demerits explains why the

Distiller produces synthetic workloads with lower RMS demerits when using the log area demerit figure internally than when using the RMS demerit figure internally. In this case, the RMS demerit figure failed to reflect the differences between A6 and A7 internally, even though those differences had a significant effect on the external RMS demerit. In contrast, the log area demerit figure did reflect those differences, thereby reducing both the log area and RMS external demerits.

Offsetting errors: If the Distiller's library of attributes does not contain an accurate attribute for some attribute group, the Distiller adds the most accurate available attribute to the intermediate attribute list. These “most accurate” attributes, when chosen using the log area demerit figure, tend to have errors that offset the errors in the intermediate synthetic workload (caused by missing patterns described by attributes from attribute groups not yet explored). Consequently, when the Distiller adds them to the intermediate attribute list, the resulting synthetic workload is fairly accurate. Given the many ways in which CDFs of response time can differ, we initially expected this situation to be rare; however, it occurred when distilling both

OpenMail and OLTP. We now present two examples of offsetting errors.

Consider the choice of attribute for the {operation type, location} attribute group


Figure 29: Comparison of {op. type, location} attributes chosen using RMS and log area demerit figures

Table 31: Comparison of {op. type, location} attributes chosen using RMS and log area

Internal demerit figure   Attribute   Attribute configuration           MRT     RMS    LA
RMS                       A8          mJDWS (RW) (h = 1, s = 100)       -14%    14%    9%
LA                        A9          mJDWS (RW) (h = 2, s = 4)         -35%    26%    6%

when synthesizing the OpenMail workload. When using the RMS demerit figure

internally, the Distiller chooses mJDWS (RW) (h = 1, s = 100) (call it A8). When using the log area demerit figure, the Distiller chooses mJDWS (RW) (h = 2, s = 4)

(call it A9). The Distiller produces the exploratory workload (T 3) by rotating the operation type and location together.

Figure 29 shows that A9 matches T 3 very closely when x is less than .001s, but

that A8 matches T 3 closely when x is greater than .001s. Both attributes have similar

log area demerits; however, because RMS emphasizes differences for larger values of x,

A8 has a smaller RMS demerit.

Figure 29 also shows the target workload. We see that when x > .001s, A9 follows

Table 32: Comparison of {op. type, location} attributes chosen for OLTP

Demerit figure   Attribute   Attribute configuration              MRT    RMS     LA
LA               A10         CD((op. type, loc.), loc., 2, 100)   20%    261%    24%
RMS              A11         mJDWS (RW) (h = 1, s = 100)           8%     23%    76%

the target workload more closely than T 3. In other words, the errors in A9 offset the

errors in T 3. Consequently, adding A9 to the intermediate attribute list produces a very accurate intermediate synthetic workload.

A similar condition causes the synthetic OLTP workloads to be most accurate when we run the Distiller using the log area demerit figure internally. Figures 30 and 31 show the CDFs for the target OLTP workload, the rotated {operation type, location} workload, and the attributes the Distiller chose for the {operation type, location} attribute group using the RMS and log area demerit figures (A10 and A11).

Figure 30 shows the entire CDF whereas Figure 31 shows only the tail of the distribution. Because the RMS demerit figure emphasizes tail behavior, A11 has a lower

RMS demerit than A10. (Because we are evaluating individual attributes, we compare A11 and A10 to the rotated {operation type, location} workload. Table 32 lists the different demerits.) Because the log area demerit figure emphasizes differences at all time scales equally, A10 (with the smaller difference for the plateau between x = .00007s and x = .003s) has a lower log area demerit than A11.

When the Distiller adds A11 to the intermediate attribute list and compares the resulting intermediate synthetic workload to the target OLTP workload, the intermediate workload's tail behavior more closely matches the rotated {operation type, location} workload than the target OLTP workload; consequently, the intermediate synthetic workload has a very high RMS error. In contrast, when the Distiller adds

A10 to the intermediate attribute list, the CDF of the resulting synthetic workload has a tail that closely matches the target OLTP workload. Consequently, choosing attribute A10 produces lower log area and RMS demerits.


Figure 30: Comparison of {op. type, location} attributes chosen for OLTP


Figure 31: Comparison of {op. type, location} attributes chosen for OLTP (focus on tail)

Table 33: Counterintuitive log area results

Rotate amount   MRT    RMS    LA    Hybrid
25%             89%    13%    1%    89%
45%             30%    3%     2%    30%

Summary: We have seen two ways in which using the log area demerit figure internally can contribute to a final synthetic workload with a low RMS demerit.

First, the log area demerit figure may emphasize differences between attributes that affect the RMS demerit figure only when evaluating an intermediate synthetic workload. Second, the attribute chosen using the log area demerit figure internally may have deficiencies that offset the deficiencies in the intermediate synthetic workload specified by the attributes chosen thus far.

We suspect that offsetting errors are possible given any pair of demerit figures. In fact, we suspect that offsetting errors may be common when using the MRT demerit

figure because it represents only one data point (the mean response time); however, the effects of randomness error currently dominate the results of using the MRT and hybrid demerit figures. Therefore, we must first address the randomness error issues before studying how offsetting errors affect the Distiller when using the MRT and hybrid demerit figures.

7.1.3 Heavy tails

Response time distributions with extremely heavy tails (tails that stretch over several orders of magnitude) can produce very counterintuitive results. For example,

Figure 25 and Table 29 provide one example of how a very heavy tail can produce a counterintuitive demerit: The RMS demerit for graphs Test 1 and Test 3 is 60%, despite the visual similarity and the low MRT and log area demerits. Fortunately, none of our current workload/storage-system pairs produces such heavy tails. However, preliminary results show that we will face this problem when attempting to distill workloads using disk arrays with large prefetch lengths.


Figure 32: Counterintuitive log area results


Figure 33: Counterintuitive log area results (focus on tail)

Table 26 shows how the use of prefetching increased the differences in performance

between different rotated {location} workloads. When we examine individual rotated workloads for DSS using 512KB of prefetching, we see that workloads with low RMS or log area demerits can have very high MRT demerits. Very few people would consider a synthetic workload to be accurate if its mean response time differed from that of the target workload by 30%; however, we see in Figures 32 and 33 together with Table 33 that a workload with a 3% RMS demerit can have a 30% MRT demerit and a workload with a 1% log area demerit can have an 89% MRT demerit.

We are still searching for a good demerit figure with which to study the DSS workload when using prefetching. Because the MRT demerits are orders of magnitude larger than the log area demerits, the hybrid demerit figure degenerates into MRT. We do not expect the MRT demerit figure to accurately capture the long tail behavior.

We have considered a combination of MRT and RMS; however, experience has shown that it is very difficult to produce a heavy-tailed synthetic workload with a low RMS demerit.

7.1.4 Lessons learned

The primary limitation of the RMS demerit figure when used internally is that it considers absolute, as opposed to relative, differences between CDFs. By measuring absolute differences, the effects of a particular synthesis error on the RMS demerit depend on the time-scale of the I/Os in question. As we saw in Figures 27 and 28, the changes between the “internal” comparisons for individual attributes and “external” comparisons for intermediate synthetic workloads cause noticeable differences in how the synthesis errors of the attributes under test affect the RMS demerit and the

Distiller's execution. Note, however, that this limitation does not affect the soundness of using RMS externally to evaluate the quality of the Distiller's final synthetic workload.

MRT is currently not a good internal demerit figure because the randomness errors and non-static behavior of our target workloads affect the MRT demerit figure enough to noticeably alter the distillation process. We believe we can overcome the randomness error problem by repeating evaluations to gain statistical confidence. We will not be able to study other potential limitations of MRT until we first address the randomness error limitation.

The log area demerit figure captures the overall differences between workloads more accurately than RMS and MRT; however, CDFs with similar shapes can have

both a low log area demerit and a large MRT demerit (for example, OpenMail (All)

and OpenMail (2GBLU)). To address this limitation, we developed the hybrid demerit

figure. Our intent in taking the maximum of the log area and MRT demerit was to

eliminate the situations where (1) the log area demerit was low but the workloads

had very different mean response times, and (2) the MRT demerit was low but the

workloads had very different behavior. Using the hybrid demerit had little effect on

the OpenMail and DSS workloads, but had a large effect on the RMS and MRT

metrics for the OLTP workload. This indicates that the hybrid demerit figure is

potentially a better internal demerit figure than log area or MRT. We suspect that to

fully understand the benefits of using the hybrid demerit figure, we must first address

the underlying randomness error problems with MRT.

In summary, based on our observations, none of our demerit figures is clearly better

than the others in all cases; however, the hybrid demerit figure leverages the benefits

of the underlying log area demerit figure and leads the Distiller to near optimal final

synthetic workloads for OLTP (All), DSS (All), and OpenMail (All).

In general, we expect the hybrid demerit figure to benefit the distillation of those

workloads that are not prone to large randomness errors or non-static behaviors. We

expect the log area metric to benefit the distillation of all workloads, except those

whose response time distributions have tails so heavy that the MRT demerit can be

116 many times larger than the log area demerit. We expect the distillation of additional

workloads to show that neither RMS nor MRT are good internal demerit figures for

most workloads.

We do not expect that each disk array configuration will have its own optimal internal demerit figure. Instead, we expect that the choice of storage system will influence the choice of demerit figure primarily in the way it affects the target work- load’s distribution of response time. For example, we have found that replaying the

DSS workload with large prefetch lengths produces workloads with extremely heavy tails, thereby limiting the effectiveness of the log area demerit figure.

7.2 Internal Threshold

In this section, we show that, as expected, raising the internal threshold reduces the

Distiller’s running time and the size of the attributes it chooses at a cost of reducing the quality of the final synthetic workloads. (Obtaining a synthetic workload’s distri- bution of response time dominates the Distiller’s running time. Therefore, we present running time in terms of the number of workloads generated and evaluated.) Using a first-fit criterion with a threshold of x% directs the Distiller to investigate any at- tribute groups for which the subtractive method shows a difference of more than x%.

After choosing an attribute group, the Distiller will select the first attribute with a de- merit (relative to the attribute group) of at most x%. Using a threshold of 0% causes the Distiller to evaluate all attribute groups and all attributes until an intermediate synthetic workload meets the stopping condition (the “external” threshold).
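The following sketch summarizes this first-fit search at a high level; it is not the Distiller's implementation, and the four callables passed in (for the subtractive test, the candidate list, and the two demerit evaluations) stand in for the machinery described earlier.

    # Sketch: first-fit selection of attributes under an internal threshold.
    def distill(attribute_groups, threshold, external_threshold,
                subtractive_error, candidate_attributes, attribute_demerit,
                intermediate_demerit):
        chosen = []                                        # intermediate attribute list
        for group in attribute_groups:
            if subtractive_error(group, chosen) <= threshold:
                continue                                   # group appears unimportant; skip it
            best = None
            for attribute in candidate_attributes(group):  # ordered most compact first
                d = attribute_demerit(attribute, group)
                if best is None or d < best[0]:
                    best = (d, attribute)
                if d <= threshold:
                    break                                  # first fit: accept the first good attribute
            if best is not None:
                chosen.append(best[1])                     # otherwise keep the best attribute found
            if intermediate_demerit(chosen) <= external_threshold:
                break                                      # external stopping condition met
        return chosen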

Table 34 presents the results of running the Distiller using four different internal thresholds. We studied only the OpenMail and OLTP workloads. We did not perform the demerit threshold tests on the DSS workload because it is distilled using only thirteen simulations (one more than the minimum) and the final synthetic workloads have very low demerits. Tables 35 through 37 show the attributes chosen at each

Table 34: Effects of modifying internal threshold

             Internal                External demerit                        Test
Workload     demerit    Thresh.   MRT    RMS    LA     Hybrid    Size      wklds.
OM (All)     RMS        0%        6%     19%    14%    19%       40%       102 / 122
                        7.5%      25%    26%    15%    26%       24%       64 / 90
                        15%       19%    26%    27%    26%       25%       48 / 56
                        25%       10%    29%    37%    37%       23%       19 / 25
OM (All)     LA         0%        30%    47%    9%     30%       38%       93 / S
                        7.5%      17%    9%     9%     17%       24%       27 / S
                        15%       27%    35%    19%    35%       24%       22 / 22
                        25%       23%    33%    23%    23%       24%       20 / 20
OLTP (All)   LA         0%        11%    62%    25%    22%       67%       100 / 122
                        7.5%      23%    145%   24%    24%       55%       51 / 60
                        15%       22%    139%   24%    24%       55%       39 / 51
                        25%       30%    187%   22%    30%       36%       25 / 25

iteration and the external demerits of each resulting intermediate synthetic workload.

RMS threshold for OpenMail: Table 34 shows that changing the RMS thresh-

old has little effect on the RMS demerit of the final synthetic OpenMail workload.

Changing the threshold does not have a significant effect on the attributes chosen

until the threshold reaches 25%. Table 35 shows that the only significant improve-

ments in the intermediate synthetic workloads come when the Distiller chooses an

{operation type, location} attribute. Table 38 shows that the best {operation type,

location} attribute, mJDWS (RW) (h = 1, s = 100, pol), has an RMS demerit of

about 14%. In addition, all attributes with significantly smaller compact representa-

tions than mJDWS (RW) (h = 1, s = 100, pol) or mJDWS (RW) (h = 1, s = 100)

have RMS demerits of at least 16%. (The smallest RMS demerits in Table 38 are in

bold type.) Therefore, we must set the threshold to be greater than 16% in order for

the Distiller to choose a different {operation type, location} attribute.

(Table 38 shows the order in which the Distiller evaluated {operation type, lo-

cation} attributes when the RMS threshold was set to 0%. The order may differ

slightly for other Distiller executions. When estimating an attribute’s size, the Dis-

tiller compresses the attribute-value’s Rome-formatted text representation using bzip

Table 35: Incremental results of distilling OpenMail with different internal RMS thresholds
(columns: iteration, attribute group, group error, attribute added, attribute error, external MRT/RMS/LA demerits)

OM (All)  46% 95% 139%   RMS 0%
  1  {loc.}                20%   mJDWS(s = 1, s = 100)                        8%    54% 100% 144%
  2  {iat}                 17%   Cluster(ws = 10.24, bs = .0025, c = .3)      3%    53%  96% 123%
  3  {op. type}             4%   Operation type STC                           4%    47%  95% 122%
  4  {req. size}            3%   CD(req. size, req. size, 2, 10)              1%    51%  94% 123%
  5  {op. type, loc.}      72%   mJDWS (RW) (h = 1, s = 100, pol)            14%    25%  26%  14%
  6  {op. type, iat}       11%   Cluster(ws = 10.24, bs = .01, c = .3)        9%    19%  26%  14%
  7  {op. type, req. size}  4%   Joint dist. of op. type and req. size        4%     6%  26%  21%
OM (All)  49% 96% 140%   RMS 7.5%
  1  {loc.}                20%   mJDWS(s = 1, s = 100)                        8%    54% 100% 144%
  2  {iat}                 17%   β-model (ws = 5.12, il = .01, s = 10^-2)     3%    44%  97% 155%
  3  {op. type, loc.}      72%   mJDWS (RW) (h = 1, s = 100, pol)            14%    24%  26%  30%
  4  {op. type, iat}       11%   Cluster(ws = 10.24, bs = .01, c = .3)        9%    25%  26%  15%
  5  {loc., req. size}     30%   CD(loc., req. size, 1, 100)                 23%    20%  26%  17%
OM (All)  46% 96% 139%   RMS 15%
  1  {loc.}                20%   mJDWS(h = 1, s = 10)                         8%    56% 100% 144%
  2  {iat}                 17%   β-model (ws = 5.12, il = .01, s = 10^-2)     3%    53%  98% 156%
  3  {op. type, loc.}      72%   mJDWS (RW) (h = 1, s = 100)                 14%    25%  26%  28%
  4  {loc., req. size}     30%   CD(loc., req. size, 1, 100)                 23%    19%  27%  27%
OM (All)  44% 96% 140%   RMS 25%
  1  {op. type, loc.}      72%   Run count within state (h = 2, s = 4)       24%    10%  29%  38%
  2  {loc., req. size}     30%   CD(loc., request size, 1, 4)                23%    12%  29%  37%

Table 36: Incremental results of distilling OpenMail with different internal log area thresholds
(columns: iteration, attribute group, group error, attribute added, attribute error, external MRT/RMS/LA demerits)

OM (All)  47% 96% 138%   Log area 0%
  1  {iat}                 21%   Cluster(ws = 2.56, bs = .0025, c = .3)       0%    48%  92% 121%
  2  {req. size}            4%   CD(req. size, req. size, 3, 4)               1%    44%  91% 122%
  3  {location}             3%   JDWS(h = 1, s = 100)                         2%    43%  95% 125%
  4  {op. type}             1%   CD(op. type, op. type, 8, 2)                 0%    46%  93% 123%
  5  {op. type, loc.}     137%   mJDWS (RW) (h = 1, s = 1)                    5%    30%  47%   9%
OM (All)  46% 95% 136%   Log area 7.5%
  1  {iat}                 20%   β-model (ws = 5.12, il = .01, s = 10^-5)     8%    43%  91% 137%
  2  {op. type, loc.}     137%   mJDWS (RW) (h = 2, s = 4)                    6%    17%   9%   9%
OM (All)  47% 96% 138%   Log Area 15%
  1  {iat}                 20%   β-model (ws = 5.12, il = .01, s = 10^-3)    13%    52%  91% 142%
  2  {op. type, loc.}     137%   CD(op. type, location, 1, 2)                12%    27%  35%  19%
OM (All)  47% 96% 138%   Log Area 25%
  1  {op. type, loc.}     137%   CD(op. type, location, 1, 2)                12%    23%  33%  23%

Table 37: Incremental results of distilling OLTP with different thresholds
(columns: iteration, attribute group, group error, attribute added, attribute error, external MRT/RMS/LA demerits)

OLTP (All)  39% 182% 197%   Log area 0%
  1  {iat}                 25%   Cluster (ws = 5.12, bs = .01, c = .3)        1%    42% 204% 170%
  2  {loc.}                 5%   CD(loc., loc., h = 2, s = 1)                 5%    42% 204% 170%
  3  {op type}              2%   Read percentage                              2%    42% 204% 170%
  4  {req. size}            1%   CD(req. size, req. size, h = 3, s = 4)       0%    44% 217% 158%
  5  {op. type, loc.}     174%   CD((op. type, loc.), loc., 2, 100)          24%    35% 242%  25%
  6  {op. type, req. size}  4%   CD(op. type, req. size, h = 1, s = 2)        1%    11%  62%  29%
  7  {op. type, iat}        3%   Cluster(ws = 2.56, bs = .01, c = .3)         1%    12%  65%  22%
  8  {loc., iat}           20%   CD(loc., iat, h = 1, s = 100)               20%    15%  82%  22%
  9  {loc., req. size}      5%   CD(loc., req. size, 2, 10)                   8%    26% 174%  43%
OLTP (All)  40% 184% 200%   Log area 7.5%
  1  {op type}             35%   Read percentage                              6%    41% 195% 154%
  2  {op. type, loc.}     174%   CD((op. type, loc.), loc., 2, 100)          30%    23% 145%  32%
  3  {loc., iat}           20%   CD(loc., iat, 3, 2)                         21%    24% 155%  24%
OLTP (All)  44% 216% 155%   Log area 15%
  1  {iat}                 25%   IAT STC (.001)                               1%    47% 238% 164%
  2  {op. type, loc.}     174%   CD((op. type, loc.), loc., 2, 100)          24%    22% 139%  24%
  3  {loc., iat}           20%   CD(loc., iat, h = 1, s = 100)               20%    30% 207%  25%
OLTP (All)  47% 237% 171%   Log area 25%
  1  {iat}                 25%   IAT STC (.001)                               1%    40% 187% 180%
  2  {op. type, loc.}     174%   CD(op. type, loc., 1, 2)                    24%    30% 209%  22%

Rome library uses associative arrays. When iterating through an associative array,

perl makes no guarantees about the order in which keys are returned. This slight

difference in order can affect bzip’s compression ratio.)

The log area demerit of the final synthetic workload increases when we raise

the threshold from 7.5% to 15% because the Distiller explores the {operation type, interarrival time} attribute group only when we set the threshold below 11%. When

the threshold is 7.5%, the addition of the IAT clustering attribute in iteration 4 reduces the log area demerit from 28% to 15%. When the threshold is 15%, the

Distiller does not explore the {operation type, interarrival time} attribute group and the log area demerit remains at 28%.

Changing the threshold does, however, have a significant effect on the number of

122 Table 39: Size and demerit of {interarrival time} attributes when distilling OpenMail Internal Demerit figure Attribute Size MRT RMS LA Hybrid β-model(ws = 5.12, il = 5.12, s = .01) 2005 2% 3% 20% 20% β-model(ws = 5.12, il = 5.12, s = .001) 2007 1% 6% 13% 13% β-model(ws = 5.12, il = 5.12, s = .00001) 2010 2% 5% 9% 9% IAT STC (.001) 2171 6% 10% 15% 15% IAT STC (.001,.002) 2243 6% 11% 15% 15% IAT STC (.002) 2257 6% 11% 16% 16% CD(iat, iat, 3, 4) 23571 5% 9% 11% 11% CD(iat, iat, 4, 3) 27494 5% 8% 11% 11% CD(iat, iat, 2, 10) 38171 6% 10% 12% 12% CD(iat, iat, 1, 100) 33053 7% 13% 14% 14% IAT cluster(ws = 5.12, bs = .01, c = .3) 551909 1% 3% 1% 1% IAT cluster(ws = 5.12, bs = .0025, c = .3) 551909 1% 1% 1% 1% IAT cluster(ws = 2.56, bs = .0025, c = .3) 555561 1% 3% 0% 1% IAT cluster(ws = 2.56, bs = .01, c = .3) 556673 1% 2% 1% 1% IAT cluster(ws = 10.24, bs = .0025, c = .3) 556706 0% 1% 1% 1% IAT cluster(ws = 10.24, bs = .01, c = .3) 567487 1% 3% 3% 3%

exploratory workloads the Distiller generates and evaluates (which dominates the

running time), and on size of the final synthetic workloads’ compact representa-

tions. Table 35 shows that increasing the threshold causes the Distiller to explore fewer attribute groups and, hence, generate and evaluate fewer exploratory work-

loads. Furthermore, increasing the threshold reduces the number of attributes the

Distiller evaluates when exploring an attribute group. For example, when we set

the threshold to 0%, the Distiller evaluates every {interarrival time} attribute, even

though the first attribute it evaluates has a demerit of only 3%. (See Table 39.) The complete exploration of this attribute group alone adds 25 evaluations to the Dis- tiller’s running time. Similarly, Table 38 shows that raising the threshold from 7.5% to 15% prevents the Distiller from continuing to evaluate {operation type, location} attributes after it finds that mJDWS (RW) (h = 1, s = 100) has an RMS demerit of 14%. In contrast, with a 7.5% threshold, the Distiller evaluates every {operation type, location} attribute.

Changing the threshold affects the size of the final synthetic workloads’ compact

123 representations in two ways: First, when the Distiller does not explore an attribute group, it does not choose an attribute from that group whose value adds to the size of the compact representation. Second, because the Distiller evaluates the most compactly-represented attributes first, raising the thresholds causes the Distiller to choose more compactly represented attributes. For example, Table 39 shows the order in which OpenMail evaluates {interarrival time} attributes and highlights the attributes chosen given different internal thresholds. When we set the threshold to

0%, the Distiller chooses an IAT clustering {interarrival time} attribute. However,

when we raise the threshold, the Distiller instead chooses a much more compact β-

model. This change alone reduces the size of the final synthetic workload’s compact

representation from 40% to 25% of the target workload’s size.

Log area threshold for OpenMail: Increasing the log area threshold for Open-

Mail also has the expected effect of raising the log area demerit of the final synthetic

workload while reducing the size of the final synthetic workload’s compact representa-

tion and the Distiller’s running time. Offsetting errors and randomness errors prevent

the RMS and MRT demerit figures from strictly increasing as the log area threshold

increases.

Log area threshold for OLTP: Changing the threshold has less effect on the

OLTP workload than on the OpenMail workload. We still see reductions in the num-

ber of workloads the Distiller evaluates and the size of the final synthetic workload’s

compact representation; however, these changes are smaller because the attribute li-

brary does not describe the OLTP workload as well as it describes OpenMail. There-

fore, the Distiller must still evaluate more and larger attributes, even when using

higher thresholds. The only significant effect on the quality of the final synthetic

workload is that setting the threshold to 0% allows the Distiller to investigate the

{operation type, request size} group. Adding a conditional distribution of request size

based on operation type significantly reduces the RMS demerit of the final synthetic

124 workload.

Our experience shows that setting the threshold to 7.5% works well for producing a final synthetic workload with a 10% demerit — provided the attribute library is sufficient. When the attribute library is not sufficient, we believe the threshold should be set slightly higher than the demerit of the most accurate attribute in each group.

The challenge is that we do not know a priori the demerit of the best attribute in each group. Therefore, we suspect that the threshold will have to be chosen based on user experience. Thus far, our experience with 7.5% has been positive.

7.3 Size / Accuracy tradeoff

Table 16 in Chapter 6 shows that the Distiller can find reasonably accurate synthetic workloads. Unfortunately, the defining attributes of these synthetic workloads tend to have large representations. We lose the size benefit of using a synthetic workload when its compact representation is nearly as large as the workload trace itself. For- tunately, we can trade accuracy for conciseness of representation when generating synthetic workloads. In Section 7.2, we saw how we could implicitly trade compact representation size for accuracy by raising the Distiller’s threshold for investigating attribute groups and choosing key attributes. In this section, we show how we can explicitly trade size for accuracy by reducing the precision of the attributes the Dis- tiller uses to specify the final synthetic workload. Specifically, we show that we can reduce the size of the synthetic OpenMail workload’s compact representation by 70% without increasing either the log area or RMS demerits. As a result of offsetting errors, reducing the size of the synthetic OLTP workload by 5% actually doubles the accuracy. Further reductions of about 20% increase the RMS and log area demerits by about 20%.

We can measure and present many attributes with varying degrees of precision.

125 For example, the number of bins used to represent a distribution determines its pre-

cision — histograms with more bins more accurately represent the distribution. Sim- ilarly, a conditional distribution’s accuracy tends to increase as the number of states increases. To leverage this tradeoff, we run the Distiller in “post-process” mode, where it varies the precisions of the attributes that specify the final synthetic work- load. Sections 7.3.1 and 7.3.2 explore this tradeoff for the OpenMail and OLTP workloads respectively. We do not present results for the DSS workload because the

Distiller chose very accurate and compact attributes during its initial execution.

7.3.1 OpenMail

We first analyze the effects of post-processing the set of attributes the Distiller chose for OpenMail (using the log area demerit figure internally). For this workload, the

Distiller chose the β-model with parameters (ws = 5.12, bs = .01, s = 10−5) to describe the arrival pattern and mJDWS (RW) with parameters (h = 2, s = 4) to describe the access pattern. The Distiller also chose a transition matrix to describe the sequence of operation types (mJDWS (RW) implicitly generates operation type using its transition matrix) and the empirical distribution for request sizes (the default for request size). Table 40 provides the details of OpenMail’s distillation.

Among the chosen attributes, only mJDWS (which requires distributions of lo- cation and modified jump distance for each defined state) can be measured with significantly differing degrees of precision. The bins for the distribution of location can be as small as 1KB (one sector) or as large as the address space of the workload.

When the bins are 1KB, the histogram for location contains 162196 bins. The other chosen attributes use histograms with at most 128 bins. The potential space savings from changing these bin widths is negligible compared to the histograms for location.

Therefore, we focus on only the number of states and bin widths for the location

histograms.

Table 40: Incremental results of distilling OpenMail (All)
(Internal demerit figure: log area; threshold: 7.5%. Initial external demerits for OM (All): MRT 46%, RMS 95%, LA 136%.)

Iter.  Attribute group     Group error   Attribute added                              Attr. error   MRT    RMS    LA
1      {iat}               20%           β-model (ws = 5.12, il = .01, s = 10^-5)     8%            43%    91%    137%
2      {op. type, loc.}    137%          mJDWS (RW) (h = 2, s = 4)                    6%            17%    9%     9%

Table 41: RMS demerit and size for OpenMail attributes of varying precision
(Columns: mJDWS states as (history, number of states); each cell: demerit (size of compact representation as a fraction of the compressed trace).)

Bin width (sectors)   (1,1)         (2,4)         (1,10)       (2,10)       (1,100)
1                     90% (.24)     9% (.24)      10% (.24)    6% (.25)     29% (.26)
8                     110% (.21)    13% (.21)     7% (.21)     8% (.22)     30% (.24)
64                    108% (.14)    16% (.14)     7% (.14)     9% (.15)     36% (.17)
1024                  114% (.07)    21% (.07)     7% (.07)     9% (.08)     43% (.08)
10240                 194% (.02)    22% (.03)     14% (.03)    18% (.03)    48% (.05)
25600                 153% (.01)    40% (.01)     18% (.02)    27% (.02)    Err (.04)
102400                194% (.01)    201% (.01)    28% (.01)    28% (.01)    48% (.03)

Table 42: Log area demerit and size for OpenMail attributes of varying precision

Bin width (sectors)   (1,1)        (2,4)        (1,10)       (2,10)       (1,100)
1                     10% (.24)    10% (.24)    11% (.24)    11% (.25)    14% (.26)
8                     12% (.21)    12% (.21)    11% (.21)    12% (.22)    15% (.24)
64                    12% (.14)    11% (.14)    10% (.14)    10% (.15)    15% (.17)
1024                  12% (.07)    10% (.07)    10% (.07)    9% (.08)     16% (.08)
10240                 13% (.02)    10% (.03)    10% (.03)    11% (.03)    17% (.05)
25600                 13% (.01)    10% (.01)    11% (.02)    11% (.02)    Err (.04)
102400                13% (.01)    11% (.01)    11% (.01)    11% (.01)    17% (.03)

Table 43: MRT demerit and size for OpenMail attributes of varying precision

Bin width (sectors)   (1,1)        (2,4)        (1,10)       (2,10)       (1,100)
1                     50% (.24)    18% (.24)    17% (.24)    18% (.25)    23% (.26)
8                     49% (.21)    19% (.21)    15% (.21)    2% (.22)     18% (.24)
64                    49% (.14)    21% (.14)    22% (.14)    29% (.15)    42% (.17)
1024                  52% (.07)    31% (.07)    31% (.07)    35% (.08)    49% (.08)
10240                 74% (.02)    45% (.03)    50% (.03)    46% (.03)    60% (.05)
25600                 70% (.01)    61% (.01)    48% (.02)    56% (.02)    Err (.04)
102400                70% (.01)    66% (.01)    57% (.01)    50% (.01)    58% (.03)

Figure 34: Effects of increasing bin width on OpenMail workload. (Response-time CDFs, cumulative fraction of I/Os vs. latency in seconds, for the target OpenMail workload and synthetic workloads generated with 1 sector and 102400 sector bin sizes.)

Tables 41 through 43 show the external demerits of synthetic workloads produced using five different state definitions and seven different bin widths. The number in parentheses next to each demerit is the size of the synthetic workload’s compact representation relative to the size of the compressed target workload trace.1

Effects of increasing bin width: When looking down each column in Tables 41 through 43, we see that increasing the bin width from 1 to 102400

• causes the RMS demerit to increase up to a factor of 25,

• has little effect on the log area demerit, and

• causes the MRT demerit to increase up to a factor of 4.

We can explain these changes by examining (1) how increasing the bin width affects the synthetic workload's footprint and (2) how the resulting change in response time distribution affects the demerit figures.

¹Generating a sequence of locations that matches both a given distribution of location and a given distribution of jump distance appears to be NP-complete. Therefore, our generation techniques use an approximation algorithm (presented in Appendix A). Unfortunately, this approximation algorithm fails in the case where h = 1, s = 100, and the bin width is 25600 sectors.

Increasing the bin width tends to reduce the amount of locality in a workload. When a synthetic workload generator draws locations from a distribution based on a histogram with bins larger than one sector, it must make an assumption about the distribution of locations within each bin. Our random number generation techniques assume uniform distributions within bins. This assumption is not true in general for the OpenMail workload. Other than the first few hundred thousand sectors, most locations tend to be multiples of 8KB. As a result, drawing locations from a histogram with large bins produces a workload with a larger footprint and less locality than the target workload. For example, the requests for the target OpenMail workload use 162196 unique location values, whereas the synthetic workload generated using 25600 sector bins uses 489584 unique location values. (The number of location values approximates the footprint size.) The larger footprint and decreased locality tend to produce a slower workload.
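The sketch below is a simplified stand-in for our random number generation techniques; it shows why coarse bins inflate the footprint: locations are drawn uniformly within each bin, so a target workload that only touches 8KB multiples becomes a synthetic workload that touches many sectors the target never used. The histogram format matches the earlier sketch and is hypothetical.

    import random

    def draw_locations(histogram, bin_width, count, rng=random.Random(0)):
        """Draw locations from a binned histogram, assuming a uniform
        distribution of locations within each bin (the generator's assumption)."""
        bins = list(histogram.keys())
        weights = list(histogram.values())
        locations = []
        for _ in range(count):
            start = rng.choices(bins, weights)[0]                # pick a bin by its probability
            locations.append(start + rng.randrange(bin_width))   # uniform offset inside the bin
        return locations

    # Footprint proxy: the number of unique location values generated.
    synthetic = draw_locations({0: 0.5, 25600: 0.5}, bin_width=25600, count=1000)
    footprint = len(set(synthetic))   # far larger than a target that only touched 8KB multiples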

The CDFs of slower workloads tend to have flatter tails and lie to the right of the CDFs for faster workloads. Figure 34 provides an example by showing the CDFs of response time for two synthetic workloads. We generated both workloads using mJDWS (RW) (h = 1, s = 10); however, the faster workload uses a 1 sector bin width and the slower workload uses a 102400 sector bin width.

The log area demerit figure is more sensitive to changes in a CDF's shape than to slight changes in position. Figure 34 shows that the different synthetic workloads have similar shapes; therefore, as observed in Tables 41 through 43, increasing the bin width has a small effect on the log area demerit. In contrast, the RMS demerit figure is more sensitive to shifts in position, especially shifts in the tail; therefore, the RMS demerit figure increases as bin width increases. Finally, because (1) increasing the bin width tends to slow the resulting synthetic workload, and (2) the synthetic OpenMail workloads have larger mean response times than the target OpenMail workload, the MRT demerit also tends to increase as bin width increases.
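For reference, the curves being compared here (and the mean and 99.5th percentile statistics used in Chapter 8) are obtained directly from per-I/O response times. The demerit figures themselves are defined in Section 3.4.1 and are not reproduced here; the sketch below only shows the empirical CDF and nearest-rank percentile computations, which are standard and not specific to the Distiller.

    import math

    def empirical_cdf(response_times):
        """(latency, cumulative fraction of I/Os) points of a response-time CDF."""
        rts = sorted(response_times)
        n = len(rts)
        return [(rt, (i + 1) / n) for i, rt in enumerate(rts)]

    def percentile(response_times, p):
        """Nearest-rank p-th percentile, e.g. p = 99.5 for the tail metric in Chapter 8."""
        rts = sorted(response_times)
        idx = max(0, math.ceil(p / 100 * len(rts)) - 1)
        return rts[idx]

    # A uniformly slower workload shifts the whole CDF to the right; a workload with a
    # heavier tail flattens the top of the curve instead.
    fast = empirical_cdf([0.0001, 0.0002, 0.0003, 0.0004])
    slow = empirical_cdf([0.0002, 0.0004, 0.0006, 0.0100])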

Increasing the number of states: When looking across each row, we see that increasing the number of states into which we divide location does not necessarily produce a more accurate synthetic workload. Specifically, increasing the number of states from 10 to 100 increases the values of all four of our demerit figures. The primary motivation for the jump distance within state attribute is to measure jump distance from the perspective of individual disks. Therefore, a jump distance within state attribute should be most effective when we define the states to correspond roughly to the storage system's physical disks. The challenge is that RAID configurations and the consequent striping of data make the correspondence between states and physical disks approximate. The OpenMail workload comprises 22 LUs; each LU is mapped onto a RAID group consisting of four physical disks. Thus, of the state configurations tested, those that divide the address space into ten regions best match the physical configuration of the simulated disk array. Future work includes configuring an mJDWS attribute with states that correspond to individual LUs and evaluating whether it defines a more accurate synthetic workload than mJDWS configured using the percentile method.

For OpenMail, the state configurations that use ten location regions are also the configurations least affected by increasing the bin width. When using mJDWS (RW) (h = 1, s = 10) and the RMS demerit figure, we can reduce the synthetic workload's compact representation by about 70% without degrading the quality of the synthetic workload. When using the log area demerit figure, we can reduce the compact representation by 98%. Because the mean response time increases as bin width increases, reducing the size of the compact representation increases the MRT and hybrid demerits.

Overall tradeoff: Figure 35 combines the data presented in Tables 41 through 43; it illustrates the tradeoff between the size of the attributes and the accuracy of the resulting final synthetic workload. Each y value is the external demerit of the most accurate synthetic workload with a compact representation no larger than x% of the compressed target workload's size. Thus, if we stipulate that a workload's compact representation may be no larger than 10% of the size of the compressed original workload trace, then the best synthetic workload has an external log area demerit of 7%. The "best" workloads will produce points in the lower-left corner of the graph; these workloads will both be accurate and have small compact representations.

Figure 35: OpenMail size / accuracy tradeoff. (Demerit vs. size of compact representation as a fraction of the full trace, for the RMS, MRT, and log area demerit figures.)
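A curve like Figure 35 can be reproduced from Tables 41 through 43 with a simple scan. The sketch below assumes a hypothetical list-of-pairs data format; it computes, for each size budget, the lowest demerit among all configurations whose compact representation fits within that budget.

    def best_demerit_within_budget(configs, budget):
        """configs: list of (demerit, size) pairs, e.g. (0.09, 0.24) for 9% and .24.
        Return the lowest demerit whose compact representation is no larger than budget."""
        feasible = [d for d, size in configs if size <= budget]
        return min(feasible) if feasible else None

    # A few RMS entries from Table 41 (demerit and size both as fractions):
    rms = [(0.09, 0.24), (0.07, 0.21), (0.07, 0.14), (0.07, 0.07), (0.14, 0.03), (0.28, 0.01)]
    curve = [(b, best_demerit_within_budget(rms, b)) for b in (0.01, 0.05, 0.10, 0.25)]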

As seen in Tables 41 and 43, the RMS and MRT demerits improve steadily as size increases. Decreasing the bin width both raises the compact representation size and moves the tail of the resulting CDF closer to that of the target workload, thereby reducing the RMS demerit. Decreasing the bin width also reduces the size of the resulting synthetic workload's footprint, thereby decreasing the mean response time and the MRT demerit. Table 42 shows that the log area demerit figure remains steady at around 10%. The decrease in footprint size that results from decreasing the bin width has little effect on the shape of the resulting synthetic workloads' response time CDFs and, consequently, has little effect on the log area demerit figure.

7.3.2 OLTP

This section analyzes the effects of post-processing the set of attributes the Distiller chose for the OLTP workload. Table 44 lists the attributes the Distiller chose during each iteration. For the final synthetic workload, the arrival pattern, the sequence of request sizes, and the sequence of operation types are chosen independently at random from the observed distributions. The sequence of locations is generated from a conditional distribution that depends on the current operation type and both the operation type and location of the previous I/O: CD((op. type, location), location, 2, 100).

Tables 45 through 47 show the effects of modifying the state definition and bin width for the OLTP workload. When specifying state definitions, the first number (h) refers to the number of requests considered and the second number (s) refers to the number of regions into which we divide the address space. When h = 1, the attribute considers only the operation type of the current request. Therefore, unlike mJDWS (RW) used for OpenMail, the conditional distribution used for OLTP has only one configuration when h = 1. In addition, the limitations of our approximation algorithm for choosing a sequence of locations based on distributions of location and jump distance prevent us from studying bin widths larger than 24576 sectors for the OLTP workload.

Table 44: Incremental results of distilling OLTP (All)
(Internal demerit figure: log area; threshold: 7.5%. Initial external demerits for OLTP (All): MRT 40%, RMS 184%, LA 200%.)

Iter.  Attribute group     Group error   Attribute added                        Attr. error   MRT    RMS    LA
1      {op type}           35%           Read percentage                        6%            41%    195%   154%
2      {op. type, loc.}    174%          CD((op. type, loc.), loc., 2, 100)     30%           23%    145%   32%
3      {loc., iat}         20%           CD(loc., iat, 3, 2)                    21%           24%    155%   24%

Table 45: RMS demerit and size for OLTP attributes of varying precision
(Columns: conditional distribution states as (history, number of states); each cell: demerit (size).)

Bin width (sectors)   (1,1)         (2,50)        (2,100)
1                     170% (.37)    165% (.54)    170% (.55)
8                     78% (.26)     81% (.49)     88% (.49)
64                    97% (.09)     86% (.38)     97% (.40)
1024                  128% (.01)    81% (.14)     98% (.20)
24576                 156% (.004)   100% (.01)    109% (.03)

Table 46: Log area demerit and size for OLTP attributes of varying precision

Bin width (sectors)   (1,1)         (2,50)        (2,100)
1                     33% (.37)     31% (.54)     24% (.55)
8                     13% (.26)     18% (.49)     16% (.49)
64                    13% (.09)     13% (.38)     20% (.40)
1024                  13% (.01)     13% (.14)     21% (.20)
24576                 16% (.004)    13% (.01)     13% (.03)

Table 47: MRT demerit and size for OLTP attributes of varying precision

Bin width (sectors)   (1,1)         (2,50)        (2,100)
1                     25% (.37)     24% (.54)     25% (.55)
8                     13% (.26)     12% (.49)     15% (.49)
64                    10% (.09)     13% (.38)     10% (.40)
1024                  4% (.01)      12% (.14)     9% (.20)
24576                 0% (.004)     7% (.01)      6% (.03)

Like OpenMail, the effects of modifying the attributes for OLTP are small when using the log area demerit figure and larger when using the RMS and MRT demerit figures. The main difference between OLTP and OpenMail is that the synthetic OLTP workloads tend to be more accurate when using an 8 sector bin width than when using a 1 sector bin width. This increase in accuracy is the result of offsetting errors. Although the chosen {operation type, location} attribute is the best available, it specifies a workload with too much locality. Increasing the bin width offsets this error by reducing the amount of locality.

Figure 36 shows the performance of two synthetic OLTP workloads generated with the (h = 2, s = 100) state configuration. We can see that the synthetic workload specified by the Distiller (bin width 1) has too much locality because there are too many cache hits (i.e., I/Os with 3e-5 s response times). We can also see that the synthetic workload specified using an 8 sector bin size has fewer cache hits (because it has fewer 3e-5 s I/Os). Thus, the reduction in locality produced a synthetic workload whose performance is closer to the target than the performance of a synthetic workload generated using a 1 sector bin width.

Figure 37 shows the tails of the CDFs of three synthetic OLTP workloads generated with the (h = 2, s = 100) state configuration and varying bin widths. We can see that the tail behavior of the synthetic workload with an 8 sector bin width is much closer to that of the target OLTP workload than the workloads with 1 and 24576 sector bin widths. The same effects on locality that lower the plateau between x = 3e-5 s and x = .002 s also lower the plateau between x = .015 s and x = .4 s. This difference in tail behavior causes the RMS demerit figure to decrease when the bin width increases from 1 to 8.

As with the OpenMail workload, increasing the bin width also increases the mean response time. However, unlike the synthetic OpenMail workloads, the synthetic OLTP workloads are faster than the target workload (i.e., the synthetic workloads have lower mean response times than the target). As a result, the increase in mean response time produces a decrease in the MRT demerit (because the MRT demerit figure is an absolute value).

Figure 38 illustrates the tradeoff between size and accuracy for the OLTP workload. On the left, we can see that the OLTP workload improves from 100% RMS demerit to 80% RMS demerit when we reduce the bin width from 24576 to 1024.

Figure 36: Example of OLTP tradeoff. (Response-time CDFs, cumulative fraction of I/Os vs. latency in seconds, for the target OLTP workload and synthetic workloads with 1 sector and 8 sector bin widths.)

Figure 37: Example of OLTP tradeoff (focus on tail). (Tails of the response-time CDFs for the target OLTP workload and synthetic workloads with 1, 8, and 24576 sector bin widths.)

Figure 38: OLTP size / accuracy tradeoff. (Demerit vs. size of compact representation as a fraction of the full trace, for the RMS, log area, and MRT demerit figures.)

Furthermore, we see that the RMS demerit does not fall below 80%. As shown in Table 46, the log area demerits remain nearly constant. Finally, the MRT demerit is constant because the workload with the smallest compact representation has a 0% MRT demerit.

7.3.3 Lessons from tradeoff study

We see from Figures 35 and 38 that running the Distiller in post-process mode can identify a much more compactly represented synthetic workload with little or no reduction in accuracy. In some cases, it is even possible to reduce the precision of an attribute and obtain a more accurate synthetic workload.

Despite the possibility of improving both accuracy and size at the same time, we see two problems with placing all attributes, even those that differ only by precision, in the library as opposed to leaving the investigation of similarly-defined attributes for the post-process step. First, some attributes (such as jump distance within state) can have more than one "axis" along which we can investigate precision (e.g., bin width and state configuration). Varying precision along more than one axis may produce more attributes than the Distiller can evaluate in a timely manner.

Second, placing two attributes that differ only by precision in the library may lead the Distiller to choose attributes with offsetting errors instead of attributes that truly capture the target workload's performance-affecting behaviors. A synthetic workload specified using a 1 sector bin width should be more accurate than a synthetic workload specified using an 8 sector bin width. If the workload specified using an 8 sector bin size is more accurate, then the increased accuracy is the result of an offsetting error. (For example, the observed improvement in the synthetic OLTP workloads when we increased bin width from 1 to 8 was the result of an offsetting error.) Therefore, if the attribute with the 1 sector bin width does not produce an accurate synthetic workload, we want the Distiller to investigate different attributes, which may capture the performance-affecting patterns in the workload, as opposed to evaluating an attribute with an 8 sector bin width, which will be accurate only as the result of an offsetting error.

We suspect the best approach is to place only very precise attributes in the Distiller's library, then post-process the data to remove any unnecessary precision. The difficulty with this approach is that, for those attributes that divide a parameter's range into states, changing the precision also fundamentally modifies the patterns they measure.² For example, changing the number of states in a conditional distribution not only changes precision, but also changes the correlations between I/Os that are measured. (Recall that for jump distance within state, the state configuration that most closely matched the physical disk configuration produced the most accurate synthetic workload.) In these cases, we believe that the different state configurations need to be included directly in the library. However, we also recognize that there is much more research to be done on choosing effective state configurations for attributes containing conditional distributions.

²"State-based" attributes and the β-model are the only attributes in the Distiller's library for which changing precision changes the workload properties measured.

7.4 Summary

In this chapter, we found that the demerit figures presented in Section 3.4.1 have limitations, but log area's limitations affect the distillation process less dramatically than the other demerit figures. The improvement in final RMS demerits when using log area internally instead of RMS is caused by a combination of log area's emphasis on differences at all time scales and offsetting errors. Log area works better than MRT internally because the MRT demerit figure emphasizes the workloads' randomness errors and time-dependent behaviors. The hybrid demerit figure suffers the same limitations as MRT; but, for most of our test cases, this demerit figure leads the Distiller to near-optimal final synthetic workloads. In Section 7.2, we found that raising the internal demerit threshold produced less accurate final synthetic workloads, but reduced the Distiller's running time and the size of the attributes it chose.

Finally, in Section 7.3, we saw that we can trade large, precisely-specified, accurate synthetic workloads for less precisely-specified, much smaller, but only slightly less accurate synthetic workloads. Specifically, we found that we can reduce the size of the synthetic OpenMail workload’s compact representation by 70% without increasing either the log area or RMS demerits. As a result of offsetting errors, reducing the size of the synthetic OLTP workload by 5% actually doubles the accuracy. Further reductions of about 20% increase the RMS and log area demerits by about 20%.

CHAPTER 8

SYNTHETIC WORKLOAD USEFULNESS

In Chapters 6 and 7, we used the demerit figures discussed in Section 3.4.1 to evaluate the Distiller. These demerit figures allow the Distiller to run automatically in a reasonable amount of time; however, they are not necessarily the best measure of the overall quality of the attributes the Distiller chooses. Instead, we should base the quality of a set of attributes (and the resulting synthetic workload) on its usefulness for making design and configuration decisions.

In this chapter, we show that the Distiller specifies synthetic workloads that can predict the effects of modifying disk array configurations, even when using the moderately precise attributes evaluated in Section 7.3. Specifically, we will predict the effects of modifying the prefetch length and the RAID stripe unit size.

8.1 Prefetch length

Workloads that exhibit a high degree of spatial locality or sequentiality may benefit from prefetching. Upon receiving a request to read the data in sectors x_start through x_end, a disk array configured to prefetch i sectors of data will also read the data in sectors x_end + 1 through x_end + i and place it in the cache. Prefetching improves workload performance when the prefetched data is requested before being evicted from the cache; it degrades performance when requests for unused prefetch data delay other I/O requests. Figure 39 shows an example of prefetching when i = 4.

Figure 39: Example of 4KB prefetch. (The user requests 3072B of data at location 1024B; the disk array prefetches the next 4096B while the disk heads are nearby.)
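To make the prefetch definition concrete, the short sketch below (a hypothetical helper, not part of our tools) computes which sectors an array reads for a single request; the values in the comment reproduce the example of Figure 39, assuming 1KB sectors.

    def sectors_read(x_start, x_end, prefetch_len):
        """For a read of sectors x_start..x_end on an array that prefetches
        prefetch_len sectors, return the requested and prefetched sector ranges."""
        requested  = list(range(x_start, x_end + 1))
        prefetched = list(range(x_end + 1, x_end + 1 + prefetch_len))
        return requested, prefetched

    # Figure 39's example with 1KB sectors: a 3072B request at location 1024B covers
    # sectors 1..3, and a 4KB prefetch (i = 4) reads sectors 4..7 into the cache.
    req, pre = sectors_read(1, 3, 4)   # req = [1, 2, 3], pre = [4, 5, 6, 7]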

To show that the Distiller chooses attributes that contain enough information to study the effects of modifying prefetch length, we issued OpenMail (All), OLTP (All), and DSS (4LU) to a simulated disk array (described in Table 9 from Section 4.2) several times, varying the prefetch length from 0 to 2MB. We then issued the final synthetic workloads presented in Table 16 (from Section 6.2) to the simulated disk array using the same prefetch lengths and compared the mean and the 99.5th percentiles of response time.

Figures 40 through 49 show that the synthetic workloads accurately predict the general effect of prefetch length on the mean and 99.5th percentile of response time (with the exception of the DSS workload's 99.5th percentile). The remainder of this section discusses each workload separately. Specifically, we examine

1. how reducing the precision of the attributes affects their ability to predict the effects of changing prefetch length, and

2. how the observed trends relate to the values of the chosen attributes.

8.1.1 OpenMail

Figures 40 and 41 show that the original and synthetic OpenMail workloads have similar means and 99.5th percentiles of response time when issued to a disk array with a prefetch length of 64KB or less. We ran the Distiller using a disk array with no prefetching to obtain the results in Table 16 of Section 6.2. Using this original configuration, the difference between the mean response times of the original and synthetic workloads was 17%. This difference grows to only 28% as we increase prefetch length to 64KB. The difference between the 99.5th percentiles of response time grows from 8% to 38% as we increase the prefetch length from 0 to 64KB.

Figure 40: Mean response time of OpenMail as prefetch length varies. (Mean response time in milliseconds vs. prefetch length in KB for the original OpenMail workload and the final synthetic workload.)

Figure 41: 99.5th percentile of OpenMail response time as prefetch length varies. (99.5th percentile of response time in milliseconds vs. prefetch length in KB for the same workloads.)

We expect the difference between the mean and 99.5th percentiles of response time to grow as we increase the prefetch length because increasing the prefetch length increases the differences between the disk array under test and the disk array on which we distilled the synthetic workload. Fortunately, the synthetic workload need not predict the mean response time exactly to be useful. Knowing the general trend that increasing prefetch length increases response time enables us to choose the optimal prefetch length.

In Section 7.3, we saw how reducing the precision with which we measure attributes affected the external demerits of the resulting synthetic workload. We now examine how reducing the attributes' precision affects their ability to predict the effects of changing prefetch length.

Figures 42 and 43 show the effects of modifying prefetch length on several of the workloads presented in Tables 41 through 43 from Section 7.3.1. Each workload was based on some configuration of mJDWS (RW) and differs only in the bin width and number of states used. For this experiment, we compared

1. the attributes specified by the Distiller for the final synthetic workload (bin size = 1, h = 2, s = 4),

2. the most compact but least precise attributes tested (bin size = 102400, h = 2, s = 4),

3. the set of attributes with the lowest MRT demerit in Table 43 (bin size = 8, h = 2, s = 10), and

4. a moderately precise and moderately compact set of attributes (bin size = 1024, h = 2, s = 4).

In each case, the effect of modifying prefetch length is the same: Increasing prefetch length increases the mean and 99.5th percentile of response times. The main difference between the workloads is how rapidly the mean and 99.5th percentile of response time increase for large prefetch lengths.

Figure 42: Effects of precision and prefetch length on OpenMail's mean response time. (Mean response time in milliseconds vs. prefetch length in KB for the original OpenMail workload and the most compact, most accurate, moderately compact, and final synthetic workloads.)

Figure 43: Effects of precision and prefetch length on OpenMail's 99.5th percentile of response time. (99.5th percentile of response time in milliseconds vs. prefetch length in KB for the same workloads.)

The least precise set of attributes specifies a workload whose 99.5th percentile does not strictly increase as prefetch length increases. Our investigations found no obvious cause for the fluctuations. Therefore, we assume they are simply a result of the workload’s imprecise specification.

Interestingly, the moderately precise set of attributes specifies the synthetic workload that most closely tracks the increase in response times and 99.5th percentiles for large prefetch lengths. We have not yet determined why the moderately precise attributes most accurately predict the effects of modifying prefetch length; however, we suspect the accuracy is the result of offsetting errors similar to the offsetting errors in Section 7.1. To test this hypothesis, we plan to first find a set of attributes that very accurately captures OpenMail's behavior on disk arrays configured to use large prefetch lengths. We can then compare the attributes chosen for different prefetch lengths. Unfortunately, distilling workloads on disk arrays with large prefetch lengths requires attributes that are more precise than those currently present in our attribute library. In particular, we suspect we will need more precise {location} attributes.

In addition to allowing us to predict the effects of modifying prefetch length, the attribute-values themselves suggest that OpenMail (All) will not benefit from prefetching. The Distiller's first-fit algorithm chose mJDWS (RW) to describe the locality. When we examine the value of this attribute, we see that

1. more than 35% of the jumps are negative (i.e., the previous location in the same state has a larger location value),

2. more than 75% of the forward jumps are greater than 1MB, and

3. the average forward jump for each state is between 30GB and 50GB.

These figures indicate that there are very few I/Os in OpenMail (All) that potentially benefit from prefetching.
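Statistics like these can be read directly off a trace. The sketch below uses a hypothetical trace format (a list of locations in 1KB sectors) and a deliberately simplified state definition (equal-sized address regions rather than the percentile-based mJDWS (RW) states, and no read/write split); it computes, per state, the fraction of negative jumps, the fraction of forward jumps larger than 1MB, and the average forward jump.

    def jump_stats(trace, num_states, address_space):
        """trace: list of locations in KB (1KB sectors). States are equal-sized
        regions of the address space (a simplification of the mJDWS bookkeeping)."""
        last_loc = {}                                  # most recent location seen in each state
        jumps = {s: [] for s in range(num_states)}
        for loc in trace:
            state = min(num_states - 1, loc * num_states // address_space)
            if state in last_loc:
                jumps[state].append(loc - last_loc[state])
            last_loc[state] = loc
        stats = {}
        for state, js in jumps.items():
            forward = [j for j in js if j > 0]
            if js:
                stats[state] = {
                    "negative_fraction": sum(j < 0 for j in js) / len(js),
                    "forward_over_1MB": sum(j > 1024 for j in forward) / max(1, len(forward)),
                    "mean_forward_jump_KB": sum(forward) / max(1, len(forward)),
                }
        return stats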

Our ability to associate the attribute-values for OpenMail (All) with its optimal prefetch length suggests a case-based approach to determining a workload's optimal prefetch length. If the number of cases is small, we can simply study many workloads and find the appropriate relationships between each workload's observed attribute-values and its optimal prefetch length. The success of such an approach will depend on the number of different cases we must study. In particular, a general solution must handle workloads for which the Distiller chooses attributes that are not related to JDWS. The similarity of the attributes chosen for our different test workloads (see Tables 17 and 18 in Section 6.2) suggests that the number of cases is manageable.

8.1.2 OLTP

The original and synthetic OLTP workloads follow the same trend as prefetch length increases. Specifically, Figures 44 and 45 show that, when the prefetch length is 128KB or less, the mean response times of the original and synthetic OLTP workloads differ by about 21%, as do the 99.5th percentiles of response time. When we raise the prefetch length above 512KB, the original workload's mean and 99.5th percentile of response time increase rapidly, whereas the synthetic workload's mean and 99.5th percentile increase only a few percent. Thus far, we have been unable to identify the cause of the disparity between the original and synthetic OLTP workloads for large prefetch lengths.

Figure 44: Mean response time of OLTP as prefetch length varies. (Mean response time in milliseconds vs. prefetch length in KB for the original OLTP workload and the final synthetic workload.)

Figure 45: 99.5th percentile of OLTP response time as prefetch length varies. (99.5th percentile of response time in milliseconds vs. prefetch length in KB for the same workloads.)

As with OpenMail (All), we explored the effects of prefetch length on several synthetic OLTP workloads specified with differing levels of precision. Each synthetic workload was based on a conditional distribution of location based on operation type and differed only in the bin width and number of states used. We explored

1. the attributes specified by the Distiller for the final synthetic workload (bin size = 1, h = 2, s = 100),¹

2. the most compact but least precise attributes tested (bin size = 24576, h = 1, s = 1),² and

3. a moderately precise and moderately compact set of attributes (bin size = 8, h = 1, s = 1).

¹These were also the most precise and least compact attributes.
²This set of attributes also produced the synthetic workload with the lowest MRT demerit.

Figure 46: Effects of precision and prefetch length on OLTP's mean response time. (Mean response time in milliseconds vs. prefetch length in KB for the original OLTP workload and the most compact, moderately compact, and final synthetic workloads.)

Figure 47: Effects of precision and prefetch length on OLTP's 99.5th percentile of response time. (99.5th percentile of response time in milliseconds vs. prefetch length in KB for the same workloads.)

Figures 46 and 47 show that the synthetic OLTP workloads demonstrate the correct trend: Prefetch length affects the mean response time only when it reaches 512KB. However, none of the synthetic workloads accurately represents the degree to which prefetch lengths over 512KB affect mean response time. (For clarity, Figures 46 and 47 present only two representative bin sizes.) This discrepancy for large prefetch lengths is understandable given that

1. we ran the Distiller with a prefetch length of 0, and

2. the final OLTP synthetic workloads are less accurate than the final OpenMail synthetic workloads.

Fortunately, as with OpenMail (All), the synthetic OLTP workloads are useful because they indicate that the optimal prefetch length is approximately 0.

Given the current library of attributes, conditional distributions of location based on operation type most accurately synthesize OLTP (All), indicating that the workload contains little or no spatial locality. This observation is consistent with Figures 44 and 45, which show that the workload does not benefit from prefetching.

8.1.3 DSS

As with OpenMail (All) and OLTP (All), Figures 48 and 49 show that the synthetic DSS workload predicts the general effects of changing prefetch length. Also like the other workloads, the difference between the mean response times of the original and synthetic workloads increases as prefetch length increases (although the differences between the mean response times are much smaller while the differences between the 99.5th percentiles are much larger).

Figure 48: Mean response time of DSS as prefetch varies. (Mean response time in milliseconds vs. prefetch length in KB for the original DSS workload and the final synthetic workload.)

Figure 49: 99.5th percentile of DSS response time as prefetch varies. (99.5th percentile of response time in milliseconds vs. prefetch length in KB for the same workloads.)

Unlike the OpenMail and OLTP workloads, prefetching does benefit the DSS workload. The mean response time and percentile graphs have the same overall shape and indicate that the optimal prefetch length is somewhere between 256KB and 640KB. The synthetic DSS workload is so accurate, and its representation is so compact, that we did not perform any size / accuracy tradeoff experiments.

We now explain the characteristics of the DSS workload that cause the observed trends in response time. The DSS workload is a set of concurrent database queries. Each query produces several long runs of 128KB I/O requests. These interleaved runs represent a high degree of spatial locality. Consequently, we expect (as Figure 48 shows) increasing the prefetch length to improve performance. However, the causes of the increases in mean response time at 64KB and 512KB are not obvious.

The distribution of request size shows that almost every request is 128KB. When the prefetch length is less than 128KB, the prefetched data does not cover the entire next request. Consequently, each request requires a physical disk access. This physical access dominates the cost of servicing the request; therefore, the time spent prefetching does not significantly reduce the next request's response time. Furthermore, the prefetching delays future requests, producing an increase in mean response time as prefetch length increases from 0 to 64KB.

Mean response time also begins to increase when prefetch length reaches 512KB (4 I/Os). The cause of this trend is not clear because a prefetch length of 512KB will not fill the cache and cause it to evict unused prefetched data. One possible explanation is that "foreground" I/Os (I/Os issued by an application) become queued behind prefetch requests, thereby increasing mean response time. The simulated disk array does not prioritize I/Os; therefore, increasing the prefetch length will increase the number of prefetch I/Os queued before foreground I/Os. This explanation, however, does not explain why the synthetic DSS workload's mean response time does not increase.

As with OpenMail, our first step toward answering this question will be improving the library so that we can distill a very accurate synthetic workload using a disk array with a large prefetch length. Because we can successfully distill an artificial version of the DSS workload with constant interarrival times, we believe that the attribute library's limitation is the {interarrival time, location} group.

8.2 Stripe unit size

Pantheon simulates disk arrays that use a RAID 1/0 (i.e., striped mirror) configuration where half of the disks in each RAID group mirror the other half. The stripe unit size defines how much data is stored contiguously on one disk before moving to the next disk in the RAID group and affects (among other things) when several disks will simultaneously service a single I/O. If a request spans several stripe units, then the requested data will be located on several disks. This parallel access can improve performance by reducing the transfer time; however, it can also degrade performance by increasing the number of seeks (i.e., causing two different disks to seek to similar locations instead of allowing the second disk to seek to the location of a different request). Figure 50 shows an example RAID 1/0 configuration with six disks and a 16KB stripe unit size. Figures 51 through 60 demonstrate the ability of the synthetic workloads to predict the effects of varying the stripe unit size from 1KB to 2MB.

Figure 50: Example RAID 1/0 configuration with 6 disks and a 16KB stripe unit size. (Disks 1-3 are the primary disks and disks 4-6 mirror them; stripe units [0, 16KB), [16KB, 32KB), and [32KB, 48KB) map to disks 1, 2, and 3, the next three stripe units map to disks 1, 2, and 3 again, and so on.)
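The sketch below illustrates the striping rule just described under simplifying assumptions: it only models the round-robin placement of stripe units across the primary disks of a RAID 1/0 group, ignoring mirroring and the primary/mirror read alternation discussed in Section 8.2.3. It is an illustration of the layout idea, not Pantheon's layout code.

    def primary_disks_for_request(location_kb, size_kb, stripe_unit_kb, primary_disks):
        """Return the (0-indexed) primary disks that hold part of a request, assuming
        stripe units rotate round-robin across the primary disks."""
        first_su = location_kb // stripe_unit_kb
        last_su = (location_kb + size_kb - 1) // stripe_unit_kb
        return sorted({su % primary_disks for su in range(first_su, last_su + 1)})

    # With Figure 50's layout (3 primary disks, 16KB stripe unit), an 8KB request at
    # location 12KB spans two stripe units and therefore two disks; with a 64KB stripe
    # unit the same request stays on a single disk.
    print(primary_disks_for_request(12, 8, 16, 3))   # [0, 1]
    print(primary_disks_for_request(12, 8, 64, 3))   # [0]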

8.2.1 OpenMail

Figures 51 and 52 show that the synthetic OpenMail workload demonstrates the general trend. Both the mean and 99.5th percentile of response time decrease as stripe unit size increases, until the stripe unit size reaches about 16KB. We ran the Distiller using a disk array with a 16KB stripe unit size to obtain the results in Table 16 from Section 6.2. Using this original configuration, the difference between the mean response times of the original and synthetic workloads was 17%. This difference remains steady as we increase stripe unit size, but increases to 28% as we decrease the stripe unit size. The difference between the 99.5th percentiles of response time increases from 7% to 29% as we decrease stripe unit size from 16KB to 1KB and decreases to 3% as we increase stripe unit size.

Figures 53 and 54 show that even fairly inaccurate synthetic OpenMail workloads demonstrate the correct trends. However, more precisely defined synthetic workloads more closely track the target workload’s response time. The graph for the least precisely specified synthetic workload does not strictly decrease. As with prefetch length, we assume these fluctuations are simply a result of the workload’s imprecise specification.

This trend of decreasing response time as stripe unit size increases is consistent with the OpenMail workload's lack of spatial locality. The stripe unit size determines how much data the disk array groups onto one disk. If a request's size is less than the size of the stripe unit, only one disk serves the request (unless the request straddles a stripe unit boundary³); otherwise, more than one disk serves the request. Because the OpenMail workload has little spatial locality, we expect seek time to dominate the cost of serving requests. Using a large stripe unit size will cause all requests to be served by only one disk, leaving the second disk free to serve future requests. If more than one disk serves each request, then future requests will be queued while waiting for all disks to move their heads.

³For OpenMail (All), when the stripe unit size is 4KB, approximately 6.5% of requests 4KB or smaller straddle a stripe unit boundary. When the stripe unit size is 8KB, approximately 2% of requests 8KB or smaller straddle a stripe unit boundary.

Figure 51: Mean response time of OpenMail as stripe unit size varies. (Mean response time in milliseconds vs. stripe unit size in KB for the original OpenMail workload and the final synthetic workload.)

Figure 52: 99.5th percentile of OpenMail response time as stripe unit size varies. (99.5th percentile of response time in milliseconds vs. stripe unit size in KB for the same workloads.)

Figure 53: Effects of precision and stripe unit size on OpenMail's mean response time. (Mean response time in milliseconds vs. stripe unit size in KB for the original OpenMail workload and the most compact, moderately compact, most accurate, and final synthetic workloads.)

Figure 54: Effects of precision and stripe unit size on OpenMail's 99.5th percentile of response time. (99.5th percentile of response time in milliseconds vs. stripe unit size in KB for the same workloads.)

Notice also that the mean response time decreases rapidly at first, then levels off for larger stripe unit sizes. A large percentage of the workload’s requests are 1KB and 8KB. Thus, increases in stripe unit size beyond 8KB have little effect on mean response time.

8.2.2 OLTP

When distilling OLTP (All), we simulated a disk array similar to the one on which the OLTP workload trace was collected. This simulated disk array has two physical disks per LU; therefore, stripe unit size has no effect on data layout and no effect on performance. To present a more interesting result, we re-ran the experiment simulating a disk array with four disks per LU. We used four-disk LUs to generate all results regarding stripe unit size for OLTP (All).

Figures 55 and 56 show behavior similar to the OpenMail stripe unit size test. However, in this case, the performance levels off after the stripe unit size reaches 2KB. This is consistent with the fact that over 90% of the OLTP workload's requests are 2KB. (In addition, fewer than 1% of small I/Os straddle stripe unit boundaries.)

Figures 57 and 58 show that the user can trade a considerable amount of accuracy for compact representation size and still have a synthetic workload that exhibits the correct trend as stripe unit size increases. However, only precisely defined synthetic workloads exhibit the sharp drop in mean response time when the stripe unit size increases from 1KB to 2KB. In contrast, less precisely defined synthetic workloads exhibit a more gradual decrease in response time as stripe unit size increases.

Figure 55: Mean response time of OLTP as stripe unit size varies. (Mean response time in milliseconds vs. stripe unit size in KB for the original workload and the final synthetic workload.)

Figure 56: 99.5th percentile of OLTP response time as stripe unit size varies. (99.5th percentile of response time in milliseconds vs. stripe unit size in KB for the same workloads.)

Figure 57: Effects of precision and stripe unit size on OLTP's mean response time. (Mean response time in milliseconds vs. stripe unit size in KB for the original OLTP workload and the most compact, moderately compact, and final synthetic workloads.)

Figure 58: Effects of precision and stripe unit size on OLTP's 99.5th percentile of response time. (99.5th percentile of response time in milliseconds vs. stripe unit size in KB for the same workloads.)

8.2.3 DSS

As with prefetch length, we see in Figures 59 and 60 that the mean response times of the target and synthetic DSS workloads remain close at all stripe unit sizes. As before, there are no size / accuracy tradeoffs to examine for the DSS workload.

The observed changes in performance as stripe unit size increases make sense given Pantheon's RAID 1/0 data layout algorithm. Suppose an LU contains exactly four interleaved, synchronized streams. When the stripe unit size is less than 128KB, each request covers at least two stripe units; therefore, two disks serve each request. Furthermore, because there are multiple interleaved streams and zero prefetching, these two disks both suffer a seek for each I/O serviced. (The disk array alternates between the primary and mirror disks when reading data. It also bundles contiguous data from one I/O request into a single physical disk request. Therefore, two disks serve each request when the stripe unit size is less than 128KB.)

When the stripe unit size is 128KB, each request covers exactly one stripe unit. (Only 2% of the DSS workload's I/Os straddle 128KB stripe unit boundaries.) Therefore, only one disk serves each request. The mean response time increases because, as with smaller stripe unit sizes, each request suffers a seek, but there is no second disk to reduce the transfer time.

When the stripe unit size is 256KB or greater, it becomes possible for some I/Os not to suffer a seek. If two of the streams begin with accesses to stripe units on disks 1 and 3, and the other two streams begin with accesses to stripe units on disks 2 and 4, then the same disk will serve every pair of requests on a given stream. Figure 61 shows which disks serve which requests for a given stripe unit size.

8.3 Summary

A synthetic workload’s true quality is the degree to which it can be used place of

the target workload without affecting results of a given evaluation. In this chapter,

160 20 Final synthetic workload Original DSS workload

15

10

5 Mean response time (milliseconds)

0 1 10 100 1000 Stripe unit size (KB)

Figure 59: Mean response time of DSS as stripe unit size varies

50 Final synthetic workload 45 Original DSS workload

40

35

30

25

20

15

10

5

99.5th percentile of response time (milliseconds) 0 1 10 100 1000 Stripe unit size (KB)

Figure 60: 99.5th percentile of DSS response time as stripe unit size varies

161 loc: 0 loc: 1280 loc: 2048 loc: 3328 loc: 128 loc: 1408 loc: 2176 loc: 3456 loc: 256 loc: 1536 loc: 2304 loc: 3584 64KB su: 0,1 Su: 20,21 su: 32,33 su: 52,53 su: 2,3 su: 22,23 su: 34,35 su: 54,55 su: 4,5 su: 24,25 su: 36,37 su: 56,57 disk: 1,2 disk: 3,4 disk: 1,2 disk: 3,4 disk: 1,2 disk: 3,4 disk: 1,2 disk: 3,4 disk: 1,2 disk: 3,4 disk: 1,2 disk: 3,4

loc: 0 loc: 1280 loc: 2048 loc: 3328 loc: 128 loc: 1408 loc: 2176 loc: 3456 loc: 256 loc: 1536 loc: 2304 loc: 3584 128KB su: 0 su: 10 su: 16 su: 26 su: 1 su: 11 su: 17 su: 27 su: 2 su: 12 su: 18 su: 28 disk: 1 disk: 3 disk: 1 disk: 3 disk: 2 disk: 4 disk: 2 disk: 4 disk: 1 disk: 3 disk: 1 disk: 3 Stripe unit size 162

loc: 0 loc: 1280 loc: 2048 loc: 3328 loc: 128 loc: 1408 loc: 2176 loc: 3456 loc: 256 loc: 1536 loc: 2304 loc: 3584 256KB su: 0 su: 5 su: 8 su: 13 su: 0 su: 5 su: 8 su: 13 su: 1 su: 6 su: 9 su: 14 disk: 1 disk: 2 disk: 3 disk: 4 disk: 1 disk: 2 disk: 3 disk: 4 disk: 2 disk: 1 disk: 4 disk: 3

Time Stream 1 Stream 2 Stream 3 Stream 4 Stream 1 Stream 2 Stream 3 Stream 4 Stream 1 Stream 2 Stream 3 Stream 4

su: stripe unit

Figure 61: Effect of stripe unit size on DSS disks accessed we demonstrated that evaluators can use the synthetic workloads specified by the

Distiller in place of target workloads when evaluating the effects of modifying a disk array’s prefetch length and stripe unit size. They predict the optimal prefetch length

for OpenMail (All) and OLTP (All), and predict the optimal stripe unit size for

OpenMail (All), OLTP (All), and DSS (4LU). Furthermore, the less precisely specified

synthetic workloads studied in Chapter 7.3 also predict the optimal prefetch length

and stripe unit size.

CHAPTER 9

CONCLUSIONS AND FUTURE WORK

This dissertation presented the design and evaluation of the Distiller, our tool for automatically identifying which attribute-values two workloads must share in order to have similar performance on a given disk array. These attribute-values can serve as a description of a set of representative synthetic workloads, thereby increasing the limited set of available evaluation workloads. More importantly, the chosen attribute-values help highlight the important performance-affecting interactions between the disk array and the workload, thereby aiding the development of disk arrays and self-configuring storage systems.

The Distiller automatically chooses attributes using a novel, iterative approach: Beginning with a trace of a production workload and a library of candidate attributes partitioned into attribute groups, it quickly estimates whether a group contains any key performance-affecting attributes, then searches those groups that potentially contain a key attribute.

In Chapter 6, we demonstrated that the Distiller finds synthetic workloads with log area demerits less than 10% for six out of the eight workloads we tested. The cause of the high demerit in the remaining two cases appears to be primarily a limitation of the attribute library. In Chapter 7, we found that all of the potential demerit figures used to compare workload performance had limitations. The Distiller produced the most accurate synthetic workloads when using the hybrid demerit figure for its internal evaluations; however, the distillation process was least affected by the Distiller's limitations when using the log area demerit figure. We also found that the internal threshold used to determine which attribute groups to investigate and which attributes to add to the list of key attributes had little effect on the accuracy of the final synthetic workloads. However, in all cases, the threshold did have a large effect on the Distiller's running time and the size of the final synthetic workloads' compact representations. Section 7.3 found that, for OpenMail and OLTP, reducing the precision with which we measured attributes resulted in considerably smaller compact representations, but only moderately less accurate synthetic workloads. Finally, Chapter 8 showed that we can use the information contained in the chosen attributes to predict the performance effects of modifying the storage system's prefetch length and stripe unit size. In addition, we discussed how the values of the chosen attributes highlight the causes of the observed trends.

In summary, we found that the Distiller was useful both for finding the descriptions of representative synthetic workloads and for helping to highlight the interactions between a workload and a storage system.

In the future, we plan to address the limitations of the Distiller that our evaluations highlighted. First, we plan to investigate techniques for reducing the effects of random error, particularly when using the mean response time demerit figure. Second, we want to better understand the effects of the rotation amount on the Distiller’s choice of attribute groups and attributes and reduce the dependence of the Distiller on those effects. After addressing these limitations, we will re-evaluate our demerit figures.

Finally, we would like to evaluate how our use of an open workload model instead of a closed workload model affects our results, especially when distilling the DSS workload.

In the longer term, we want to improve the Distiller in several areas:

1. Attribute library: While evaluating the Distiller, we saw several examples where the accuracy of the final synthetic workload appeared to be the result of offsetting errors. Offsetting errors occur when the attribute library does not contain an attribute that meets the specified threshold and the Distiller chooses the best available attribute. Adding attributes that meet the threshold will help reduce the incidence of offsetting errors.

2. Library of target workload traces: Increasing the number of target workloads will help provide a broader perspective and add confidence to our results.

3. Automatic inclusion/exclusion of attributes: We believe that some attributes can be ruled out as key attributes by simply examining their values. For example, a run count attribute for which all the run counts are 1 is not a key attribute because the default distribution of location values will produce a workload with the same attribute-value.

4. Multiple trials for confidence: As our available resources grow, we will begin automatically generating and evaluating multiple copies of each workload, thereby providing improved statistical confidence and potentially reducing the effects of randomness error on the MRT demerit figure.

The cost of configuring and managing storage systems is quickly overtaking the cost of purchasing the hardware. Therefore, we look forward to contributing to research on self-managing storage by continuing our study of how attributes highlight the interactions between the workload and the storage system. In Chapter 8, we saw how the chosen attribute-values were consistent with the observed performance trends given the changes in prefetch length and stripe unit size. We would like to continue to study those interactions and produce a model that predicts optimal prefetch length and stripe unit size given the chosen attribute-values. Such a model will be instrumental in developing self-managing disk arrays.

Finally, we believe the Distiller’s techniques generalize to other areas of computer systems design. We look forward to investigating how the we can use the Distiller to identify the key attributes of other workloads and possibly use those attributes a basis for other autonomic systems.

APPENDIX A

GENERATION TECHNIQUES

The Distiller operates independently of the tool used to generate synthetic workloads. It requires only that the generation tool be able to produce a synthetic workload with a specified set of attribute-values. However, because synthetic workload generation is often a non-trivial problem, this section discusses our generation techniques.

Our synthetic workload generation tool, GenerateSRT, coordinates the operation of several individual generators. We implement one generator for each attribute in the Distiller's library. Each generator is responsible for producing the values for the request parameters measured by the attribute such that the synthetic workload exhibits the measured attribute-value. Two generators that produce values for the same I/O request parameter will interfere with each other; therefore, a workload may be specified using at most one attribute from each attribute group.

Allowing only one attribute per attribute group is not a limitation of the generation tool; instead, it is a consequence of the need for special algorithms to produce two or more attributes simultaneously. For example, the algorithm that produces both the desired distribution of location and the desired distribution of jump distance is considerably more complicated than the algorithms that produce these distributions individually. We make this new algorithm available to the Distiller by defining a new attribute that measures and generates both the distribution of location and the distribution of jump distance. We then incorporate that algorithm into a single generator.

Some generators corresponding to attributes in two-parameter attribute groups produce values for only one of the two parameters. For example, a generator that produces separate distributions of location for read requests and write requests (e.g., CD(op. type, location, 1, 2)) generates only location values. We can use any {operation type} generator to produce the operation types, then use this {operation type, location} generator to produce locations based on the current operation type.

Notice, however, that not every {operation type, location} attribute is compatible with (i.e., will not interfere with) every {location} attribute. For example, most {location} attributes will be incompatible with a joint distribution of operation type and location. The description of each attribute in the Distiller's library includes a list of the corresponding generator's dependencies and restrictions. The Distiller is able to use this information to select sets of attributes that will not interfere.

The remainder of this section discusses several of our non-trivial generation techniques.

A.1 Distributions

The generator corresponding to the conditional distribution attribute discussed in Section 3.3 is very straightforward: It draws the value(s) for the dependent parameter(s) from the conditional distribution specified by the value(s) of the independent parameter(s). For example, the generator corresponding to CD(operation type, location, 1, 2) draws the value for the current I/O's location from either the distribution of read locations or the distribution of write locations based on the operation type chosen for the current I/O. Because this generator chooses location based on operation type, GenerateSRT must be configured to run the generator producing operation types before CD(operation type, location, 1, 2).
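A minimal sketch of this kind of generator is shown below. It is illustrative only (GenerateSRT's actual implementation is not reproduced here), and the class name, data format, and empirical distributions are hypothetical: one set of observed location values per operation type, sampled after the operation type has been chosen.

    import random

    class ConditionalLocationGenerator:
        """Sketch of a CD(operation type, location, 1, 2)-style generator:
        one empirical location distribution per operation type."""
        def __init__(self, location_values_by_op, seed=0):
            # e.g. {"read": [8, 16, 4096], "write": [24, 1024]}
            self.location_values_by_op = location_values_by_op
            self.rng = random.Random(seed)

        def next_location(self, op_type):
            # The operation-type generator must already have run for this I/O.
            return self.rng.choice(self.location_values_by_op[op_type])

    # Usage: the op-type generator chooses "read" or "write" first, then this
    # generator fills in the location conditioned on that choice.
    gen = ConditionalLocationGenerator({"read": [8, 16, 4096], "write": [24, 1024]})
    loc = gen.next_location("read")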

When using several conditional distribution generators simultaneously, one must resolve any dependencies and avoid “loops”. Circular dependencies such as CD(operation type, location, 1, 2) and CD(location, operation type, 1, 100) will cause the workload generator to fail. The Distiller automatically identifies all dependencies and terminates if there is a circular dependency.
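The check itself amounts to a standard cycle test over the generator dependency graph; the small Python sketch below is an illustration only, not the Distiller's actual code.

def check_dependencies(deps):
    """Raise an error if the generator dependency graph contains a cycle.
    'deps' maps each parameter to the set of parameters its generator must
    see first, e.g., {"location": {"operation type"}}."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {p: WHITE for p in deps}

    def visit(p):
        color[p] = GRAY
        for d in deps.get(p, ()):
            if color.get(d, WHITE) == GRAY:
                raise ValueError("circular dependency between %r and %r" % (p, d))
            if color.get(d, WHITE) == WHITE and d in deps:
                visit(d)
        color[p] = BLACK

    for p in deps:
        if color[p] == WHITE:
            visit(p)

# For example, CD(operation type, location, ...) together with
# CD(location, operation type, ...) forms a cycle:
#
#   check_dependencies({"location": {"operation type"},
#                       "operation type": {"location"}})   # raises ValueError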

Table 48: Sample transition matrix for operation type

                 read    write
        read     .75     .25
        write    .20     .80

A.2 Transition matrices

When given a transition matrix, the Distiller chooses the next state from the proba-

bility distribution given by the row corresponding to the current state. For example,

given the transition matrix shown in Table 48, and given that the current state is

“write”, then the Distiller will choose the next state to be “read” with probability .2

and “write” with probability .8.

The generator corresponding to STC adds an additional step: Once it chooses a

state, it then draws a run count x from the run count distribution corresponding to

that state. The generator then remains in the chosen state for x I/Os before choosing

another state and run count.
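The following Python sketch shows both steps for the transition matrix of Table 48; the run-count distributions in the example are hypothetical, and the code is only an illustration of the technique.

import random

# Transition matrix from Table 48: row = current state, columns = next state.
TRANSITIONS = {"read":  {"read": 0.75, "write": 0.25},
               "write": {"read": 0.20, "write": 0.80}}

def next_state(current):
    """Draw the next state from the row corresponding to the current state."""
    row = TRANSITIONS[current]
    return random.choices(list(row), weights=list(row.values()))[0]

def stc_states(initial, run_counts, n):
    """STC-style generation: after choosing a state, remain in it for a run
    count drawn from that state's run-count distribution (represented here
    as a list of observed run lengths)."""
    states, state = [], initial
    while len(states) < n:
        run = random.choice(run_counts[state])
        states.extend([state] * run)
        state = next_state(state)
    return states[:n]

# Example: 1,000 operation types, with hypothetical run-count distributions.
ops = stc_states("read", {"read": [1, 2, 4], "write": [1, 3]}, 1000)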

A.3 Jump distance

The simplest technique for reproducing a distribution of jump distances is to choose

an initial location, then choose successive locations based on a randomly chosen jump

distance. There are two problems with using this technique: First, the result of the

randomly chosen jump may be an invalid location (e.g., a location that is out of range

for the storage device). Ignoring the random draws that lead to invalid locations will

skew the resulting distribution. Second, the resulting distribution of location values

is unlikely to match that of the target workload. (The distribution of location values need not be part of the attribute; however, in practice, we believe that any useful

jump distance attribute will also maintain the distribution of location.)

Our solution is to randomly draw the set of location values and the set of jump

distances from the specified distributions, then find an ordering of the locations and

jump distances such that

    location_i + jump distance_i = location_{i+1}

We suspect that an exact solution to this problem is NP-Complete (it appears to

reduce to the Hamiltonian Path problem). Furthermore, because the set of locations

and jump distances are drawn randomly from a distribution, it is possible that there

is no exact solution. Consequently, we generate an approximate solution using the

greedy algorithm in Figure 62.

This algorithm will maintain the desired distribution of location values (because

it chooses only location values on the list); however, it is not guaranteed to maintain

the distribution of jump distance. If it does not find any valid jump distances for

a particular location, it randomly chooses a location from the location distribution

(it does not backtrack). In addition, we specify a threshold within which a chosen

jump distance may vary. For example, if the current location is 16, the chosen jump

distance is 60, and the threshold is 4, we allow the generator to choose any location

from the distribution between 72 and 80. Consequently, in pathological cases, the

jump distance distribution could differ significantly from the desired distribution.

Fortunately, we have found that, in practice, the resulting distribution is close enough

to produce a representative synthetic workload.

// location is a hash table of locations chosen randomly from a
// given distribution.
// jump_distance is a linked list containing the
// set of jump distances to be used.

let jd_index := head_of_list_ptr(jump_distance);
let current_location := get_random_location(location);

while (! table_empty(location)) {
    let starting_index := jd_index;
    let proposed_location :=
        nearest_location(current_location + get_value(jump_distance, jd_index));

    // Search the list of jump distances for one whose nearest available
    // location lies within the threshold of the desired target.
    while (abs(proposed_location -
               (current_location + get_value(jump_distance, jd_index))) > threshold) {
        jd_index := get_next_index(jump_distance, jd_index);
        if (jd_index = starting_index) {
            // No usable jump distance remains; fall back to a random location.
            proposed_location := get_random_location(location);
            break;
        }
        proposed_location :=
            nearest_location(current_location + get_value(jump_distance, jd_index));
    }

    set_location(proposed_location);
    remove_from_list(jump_distance, jd_index);
    remove_from_table(location, proposed_location);
    let current_location := proposed_location + get_current_request_size();
}

Figure 62: Greedy algorithm for choosing jump distances

• Jump distance within state: To generate a workload with the specified "jump distance within state" attribute-value, we separately generate the sequence of locations for each state (using the jump distance generation technique shown in Figure 62). We then interleave these separate streams of locations according to the state transition matrix.

• Jump distance within state (RW): The JDWS (RW) generator is nearly identical to the JDWS generator. The only difference is that the states in the transition matrix specify both the location state and the operation type for the current I/O.

Because states are (operation type, location) pairs,

the state transition matrix implicitly defines a Markov model of operation type.

Should the user choose not to generate operation type using the implicit Markov

model, she can set the pol (“preserve operation list”) flag. The synthetic work-

load generator will then use a sequence of operation types described by a dif-

ferent attribute (e.g., operation type run count).

• Modified jump distance: The generators for modified jump distance, modi-

fied jump distance within state, and modified jump distance within state (RW)

are nearly identical to those for jump distance, jump distance within state, and jump
distance within state (RW). The only difference is that the code in Figure 62 does

not add the value of the current request size when determining the current

location.

A.4 Run count

To generate a workload with the specified run count, the Distiller chooses a location

for the head of the run using any user-specified generation technique (distribution,

conditional distribution, jump distance within state, etc.), draws a run length x from

the distribution of run lengths, then generates the next x − 1 location values such

that

    location_i = location_{i-1} + request size_{i-1}
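The following Python sketch illustrates this rule; the helpers choose_head and request_sizes stand in for whatever head-of-run generator and request-size generator the user has configured, and are not part of the actual tool.

import itertools
import random

def generate_run_locations(n, run_lengths, choose_head, request_sizes):
    """Generate n locations as a sequence of runs: the head of each run comes
    from choose_head(), the run length is drawn from run_lengths, and each
    subsequent location follows the previous one by its request size."""
    locations = []
    while len(locations) < n:
        run = random.choice(run_lengths)     # draw run length x
        loc = choose_head()                  # head of the run
        for _ in range(run):
            if len(locations) == n:
                break
            locations.append(loc)
            loc += next(request_sizes)       # location_i = location_{i-1} + size_{i-1}
    return locations

# Example (hypothetical values): 100 locations, run lengths drawn from
# {1, 2, 8}, fixed 8-block requests, run heads uniform over 10,000 blocks.
locs = generate_run_locations(100, [1, 2, 8],
                              lambda: random.randrange(10000),
                              itertools.repeat(8))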

• Run count within state: To generate a workload with the specified “run

count within state” attribute-value, we separately generate the sequence of lo-

cations for each state. We then interleave these separate streams of locations

according to the state transition matrix.

• Run count within state (RW): The run count within state (RW) generator

is nearly identical to the run count within state generator. The only difference

is that the states in the transition matrix specify both the location state and

the operation type for the current I/O.

• Modified run count: The modified run count generator does not consider the

current I/O’s request size when generating a run (nor do the modified run count

within state, and modified run count within state (RW) generators). Therefore,

the distance between I/Os in a run is either (1) a constant specified by the user,

or (2) drawn randomly from the distribution of modified jump distances.

• Operation type run count: This generator alternates between runs of read

requests and write requests. It draws the length of each run from the appropriate

distribution of run counts for reads or writes.

• Interleaved runs: For each interleaved run, the interleaved runs attribute

specifies

1. beginning location,

2. number of I/Os,

3. starting I/O number (relative to beginning of trace), and

4. ending I/O number

The generator distributes an interleaved run’s I/Os uniformly over the appro-

priate range of the trace. For example, if an interleaved run has 10 I/Os with

a starting I/O of 50 and ending I/O of 100, then approximately every fifth I/O

between 50 and 100 will be from this interleaved run. The uniform spacing is

rounded to the nearest integer, and ties are broken arbitrarily; a small sketch of this placement follows.
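The Python sketch below is an illustration of this uniform placement only, not the generator's actual code.

def interleaved_positions(num_ios, start_io, end_io):
    """Spread num_ios I/O numbers uniformly between start_io and end_io,
    rounding each position to the nearest integer."""
    step = (end_io - start_io) / num_ios
    return [start_io + round(i * step) for i in range(num_ios)]

# Example from the text: 10 I/Os between I/O numbers 50 and 100 land at
# roughly every fifth position.
print(interleaved_positions(10, 50, 100))   # [50, 55, 60, ..., 95]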

A.5 Burstiness

β-model: For each window, this attribute presents the number of I/Os in the win-

dow m, and a parameter β. The generator takes βm I/Os and allocates them to

one randomly chosen half of the window (first half or second half). It allocates the

remaining (1 − β)m I/Os to the other half of the window. It then repeats this process

recursively on each half of the window until either (1) each sub-window has at most

1 I/O, or (2) the length of the sub-window falls below the user-specified sensitivity

parameter. In the second case, the generator spreads the remaining I/Os uniformly

across the sub-window. See [56] for full details.
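A minimal Python sketch of this recursion appears below; the function returns the expected number of I/Os in each unit-length slot of the window, and the parameter names are ours rather than those of [56].

import random

def b_model(num_ios, beta, window_len, sensitivity):
    """Recursively allocate num_ios I/Os over a window of window_len time
    units: a fraction beta of the I/Os goes to one randomly chosen half and
    the rest to the other half, until a sub-window holds at most one I/O or
    is no longer than the sensitivity."""
    if num_ios <= 1 or window_len <= max(1, sensitivity):
        # Spread the remaining I/Os uniformly across this sub-window.
        return [num_ios / window_len] * window_len
    half = window_len // 2
    bursty = int(round(beta * num_ios))
    if random.random() < 0.5:               # randomly pick which half is bursty
        left, right = bursty, num_ios - bursty
    else:
        left, right = num_ios - bursty, bursty
    return (b_model(left, beta, half, sensitivity) +
            b_model(right, beta, window_len - half, sensitivity))

# Example: 1,000 I/Os over a 64-slot window with beta = 0.8.
slots = b_model(1000, 0.8, 64, 1)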

IAT clustering: The IAT clustering attribute divides the trace into intervals and selects a representative arrival pattern for each interval. The generation technique

simply concatenates copies of the chosen representative segments into a single arrival

pattern. See [33] for details.

APPENDIX B

DEMERIT FIGURE ALGORITHMS

This section contains the pseudo-code for the RMS and log area metrics. Each metric

compares two cumulative distribution functions (CDFs) of response time. A CDF of response time is an increasing function f : ℝ → ℝ where f(x) is defined to be the fraction of I/Os whose response time is at most x.

B.1 Root-mean-square

Given two CDFs of response time f1 and f2, the root-mean-square difference between f1 and f2 is calculated as follows:

    sum := 0
    for y = 0.0 to 1.0 step .001 begin
        difference := f1^(-1)(y) − f2^(-1)(y)
        sum += difference^2
    end
    return square_root(sum)

Our RMS demerit figure is the above RMS metric divided by the mean value for f1.
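The sketch below is a direct Python transcription of this calculation. It assumes each CDF is represented by a sorted list of observed response times, so that the inverse CDF is approximated by a quantile lookup; this representation is our simplification.

import math

def inverse_cdf(sorted_times, y):
    """Approximate f^(-1)(y) as the y-quantile of the observed response times."""
    idx = min(int(y * len(sorted_times)), len(sorted_times) - 1)
    return sorted_times[idx]

def rms_demerit(times1, times2, steps=1000):
    """RMS difference between two response-time CDFs, divided by the mean
    response time of the first workload."""
    t1, t2 = sorted(times1), sorted(times2)
    total = 0.0
    for i in range(steps + 1):
        y = i / steps
        diff = inverse_cdf(t1, y) - inverse_cdf(t2, y)
        total += diff * diff
    return math.sqrt(total) / (sum(times1) / len(times1))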

B.2 Log area

Given two CDFs of response time f1 and f2, the log area is calculated as follows:

    sum := 0
    for y = 0.0 to 1.0 step .001 begin
        if (f1^(-1)(y) > f2^(-1)(y)) begin
            difference := f1^(-1)(y) / f2^(-1)(y)
        end
        else begin
            difference := f2^(-1)(y) / f1^(-1)(y)
        end
        sum += difference^2
    end
    return square_root(sum)

We calculate our log area demerit figure by subtracting the above log area metric from 1.0.
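Under the same representation as above (a sorted list of response times, with a quantile lookup standing in for the inverse CDF), a corresponding Python sketch of the log area demerit figure is:

import math

def log_area_demerit(times1, times2, steps=1000):
    """Sketch of the log area demerit figure: at each quantile take the ratio
    of the larger to the smaller response time, accumulate the squared
    ratios, and subtract the resulting metric from 1.0 (as defined above)."""
    t1, t2 = sorted(times1), sorted(times2)

    def inv(ts, y):
        return ts[min(int(y * len(ts)), len(ts) - 1)]

    total = 0.0
    for i in range(steps + 1):
        y = i / steps
        a, b = inv(t1, y), inv(t2, y)
        ratio = a / b if a > b else b / a     # always >= 1
        total += ratio * ratio
    return 1.0 - math.sqrt(total)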

REFERENCES

[1] Alvarez, G. A., Borowsky, E., Go, S., Romer, T. H., Becker-Szendy, R., Golding, R., Merchant, A., Spasojevic, M., Veitch, A., and Wilkes, J., “Minerva: an automated resource provisioning tool for large-scale storage systems,” ACM Transactions on Computer Systems, November 2001.

[2] Anderson, E., Hobbs, M., Keeton, K., Spence, S., Uysal, M., and Veitch, A., “Hippodrome: Running circles around storage administration,” in Proceedings of the Conference on File and Storage Technologies, pp. 175–188, IEEE, January 2002.

[3] Anderson, E., Kallahalla, M., Spence, S., Swaminathan, R., and Wang, Q., “Ergastulum: an approach to solving the workload and device configuration problem,” tech. rep., Hewlett-Packard Laboratories, http://www.hpl.hp.com/SSP/papers/, 2001.

[4] Anderson, E., Swaminathan, R., Veitch, A., Alvarez, G. A., and Wilkes, J., “Selecting RAID levels for disk arrays,” in Proceedings of the Con- ference on File and Storage Technologies, January 2002.

[5] Anderson, J. M., Berc, L. M., Dean, J., Ghemawat, S., Henzinger, M. R., Leung, S.-T. A., Sites, R. L., Vandevoorde, M. T., Waldspurger, C. A., and Weihl, W. E., “Continuous profiling: Where have all the cycles gone?,” ACM Transactions on Computer Systems, vol. 15, pp. 357–390, November 1997.

[6] Baker, M. G., Hartman, J. H., Kupfer, M. D., Shirriff, K. W., and Ousterhout, J. K., “Measurements of a distributed file system,” in Proceed- ings of the Symposium on Operating Systems Principles (SOSP), pp. 198–212, ACM, 1991.

[7] Berry, M. R. and El-Ghazawi, T. A., “An experimental study of in- put/output characteristics of NASA earth and space sciences applications,” in Proceedings of IPPS ’96, The 10th International Parallel Processing Symposium, pp. 741–747, IEEE, 1996.

[8] Blum, A. L. and Langley, P., “Selection of relevant features and examples in machine learning,” Artificial Intelligence, no. 1-2, pp. 245–271, 1997.

[9] Bodnarchuk, R. R. and Bunt, R. B., “A synthetic workload model for a distributed system file server,” in Proceedings of SIGMETRICS, pp. 50–59, 1991.

[10] Boyd, W. T. and Recio, R. J., Workload Characterization: Methodology and Case Studies, ch. “I/O workload characteristics of modern server”, pp. 87–96. 1999.

[11] Bunt, R. B., Murphy, J. M., and Majumdar, S., “A measure of program locality and its application,” in Proceedings of SIGMETRICS, pp. 28–40, ACM, 1984.

[12] Calingaert, P., “System performance and evaluation: survey and appraisal,” Communications of the ACM, vol. 10, pp. 12–18, Jan 1967.

[13] Calzarossa, M. and Serazzi, G., “Workload characterization: A survey,” Proceedings of the IEEE, vol. 81, pp. 1136–1150, August 1993.

[14] Calzarossa, M. and Serazzi, G., “Construction and use of multiclass work- load models,” Performance Evaluation, vol. 19, pp. 341–352, 1994.

[15] Conte, T. M. and Hwu, W. W., “Benchmark characterization for experimen- tal system evaluation,” in Proceedings of the 1990 Hawaii International Confer- ence on System Sciences, vol. I, pp. 6–18, 1990.

[16] Ebling, M. R. and Satyanarayanan, M., “SynRGen: an extensible file reference generator,” in Proceedings of SIGMETRICS, pp. 108–117, ACM, 1994.

[17] Ferrari, D., Computer Systems Performance Evaluation. Prentice-Hall, Inc., 1978.

[18] Ferrari, D., “Characterization and reproduction of the referencing dynam- ics of programs,” Proceedings of the 8th International symposium on computer performance, Modeling, Measurement, and Evaluation, 1981.

[19] Ferrari, D., “On the foundations of artificial workload design,” in Proceedings of SIGMETRICS, pp. 8–14, 1984.

[20] Ferrari, D., Serazzi, G., and Zeigner, A., Measurement and Tuning of Computer Systems. Prentice-Hall, Inc., 1983.

[21] Ganger, G. R., “Generating representative synthetic workloads: An unsolved problem,” in Proceedings of the Computer Measurement Group Conference, pp. 1263–1269, December 1995.

[22] Gill, D. S., Zhou, S., and Sandhu, H. S., “A case study of file system workload in a large-scale distributed environment,” in Proceedings of SIGMETRICS, pp. 276–277, ACM, 1994.

[23] Golding, R., Staelin, C., Sullivan, T., and Wilkes, J., “‘Tcl cures 98.3% of all known simulation configuration problems’ claims astonished researcher!,” in Proc. of Tcl Workshop, May 1994.

[24] Gomez, M. E. and Santonja, V., “A new approach in the modeling and generation of synthetic disk workload,” in Proceedings of the 8th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, pp. 199–206, IEEE, 2000.

[25] Gomez, M. E. and Santonja, V., “A new approach in the analysis and mod- eling of disk access patterns,” in Performance Analysis of Systems and Software (ISPASS 2000), pp. 172–177, IEEE, April 2000.

[26] Gribble, S. D., Manku, G. S., Roselli, D., Brewer, E. A., Gibson, T. J., and Miller, E. L., “Self-similarity in file systems,” in Proceedings of SIGMETRICS, pp. 141–150, 1998.

[27] Hewlett-Packard Corporation, “Lintel.” http://www.hpl.hp.com/research/ssp/software/index.html.

[28] Hewlett-Packard Corporation, “Model 30/FC high availability disk array - user’s guide.” Pub No. A3661-90001, August 1998.

[29] Hewlett-Packard Corporation, December 2000.

[30] Hewlett-Packard Corporation and HP OpenView Integration Lab, HP OpenView Data Extraction and Reporting. Hewlett-Packard Company, Avail- able from http://managementsoftware.hp.com/library/papers/index.asp, ver- sion 1.02 ed., February 1999.

[31] Hilegas, J. R., Nester, A. C., Gosden, J. A., and Sisson, R. L., “Com- puter design: Generalized measures of computer system performance,” in Proc. of the 1962 ACM national conference on Digest of technical papers, sep 1962.

[32] Hong, B. and Madhyastha, T., “The relevance of long-range dependence in disk traffic and implications for trace synthesis,” tech. rep., University of California at Santa Cruz, 2002.

[33] Hong, B., Madhyastha, T., and Zhang, B., “Cluster-based input/output trace synthesis,” tech. rep., University of California at Santa Cruz, 2002.

[34] Joslin, E. O., “Applications benchmarks: the key to meaningful computer evaluations,” in Proc. of the ACM 20th National Conference, pp. 27–37, 1965.

[35] Kao, W. and Iyer, R. K., “A user-oriented synthetic workload generator,” in Proceedings of the 12th International Conference on Distributed Computing Systems, pp. 270–277, 1992.

[36] Keeton, K., Veitch, A., Obal, D., and Wilkes, J., “I/O characteriza- tion of commercial workloads,” in Proceedings of 3rd Workshop on Computer Architecture Support using Commercial Workloads (CAECW-01), January 2001.

[37] Laha, S., Patel, J., and Iyer, R., “Accurate low-cost methods for performance evaluation of cache memory systems,” IEEE Transactions on Computers, vol. 37, pp. 1325–1336, November 1988.

[38] Leutenegger, S. T. and Dias, D., “A modeling study of the TPC-C bench- mark,” in Proceedings of the 1993 ACM SIGMOD international conference on Management of data, pp. 22–31, ACM Press, 1993.

[39] Mahgoub, I., “Storage subsystem workload characterization in a real life per- sonal computer environment,” in Conference Record of Southcon 95, pp. 212–216, IEEE, March 1995. ISBN = 0-78032576-1.

[40] Merchant, A. and Alvarez, G. A., “Disk array models in Minerva,” Tech. Rep. HPL-2001-118, Hewlett-Packard Laboratories, 2001.

[41] Ousterhout, J. K., Da Costa, H., Harrison, D., Kunze, J. A., Kupfer, M., and Thompson, J. G., “A trace-driven analysis of the UNIX 4.2 BSD file system,” in Proceedings of the Symposium on Operating Systems Principles (SOSP), pp. 15–24, ACM, December 1985.

[42] Poess, M. and Floyd, C., “New TPC benchmarks for decision support and web commerce,” in ACM SIGMOD Record, vol. 29, pp. 64–71, December 2000.

[43] Raab, F., Kohler, W., and Shah, A., “Overview of the TPC benchmark C: The order-entry benchmark,” tech. rep., Transaction Processing Performance Council, http://www.tpc.org/tpcc/detail.asp, December 1991.

[44] Ramakrishnan, K. K., Biswas, P., and Karedla, R., “Analysis of file I/O traces in commercial computing environments,” Performance Evaluation Review, vol. 20, pp. 78–90, June 1992.

[45] Roselli, D., Lorch, J. R., and Anderson, T. E., “A comparison of file system workloads,” in Proceedings of the USENIX Annual Technical Conference, 2000.

[46] Ruemmler, C. and Wilkes, J., “An introduction to disk drive modeling,” IEEE Computer, vol. 27, pp. 17–29, March 1994.

[47] Sarvotham, S. and Keeton, K., “I/O workload characterization and synthe- sis using the multifractal wavelet model,” Tech. Rep. HPL-SSP-2002-11, Hewlett- Packard Labs Storage Systems Department, 2002.

[48] Shriver, E., Merchant, A., and Wilkes, J., “An analytic behavior model for disk drives with readahead caches and request reordering,” in Proceedings of SIGMETRICS, 1998.

[49] Sreenivasan, K. and Kleinman, A. J., “On the construction of a represen- tative synthetic workload,” Communications of the ACM, vol. 17, pp. 127–133, March 1974.

[50] Standard Performance Evaluation Corporation, “Current SPEC benchmarks.”

[51] Thekkath, C. A., Wilkes, J., and Lazowska, E. D., “Techniques for file system simulation,” Software—Practice and Experience, vol. 24, pp. 981–999, November 1994.

[52] Uysal, M., Merchant, A., and Alvarez, G., “Using MEMS-based storage in disk arrays,” in Conference on File and Storage Technology (FAST’03), pp. 89– 102, USENIX, mar 2003.

[53] Varma, A. and Jacobson, Q., “Destage algorithms for disk arrays with non- volatile caches,” IEEE Transactions on Computers, vol. 47, February 1998.

[54] Veitch, A. and Keeton, K., “The Rubicon workload characterization tool,” Tech. Rep. HPL-SSP-2003-13, HP Labs, Storage Systems Department, Available from http://www.hpl.hp.com/SSP/papers/, March 2003.

[55] Wang, M., Ailamaki, A., and Faloutsos, C., “Capturing the spatio- temporal behavior of real traffic data,” in Performance 2002, 2002.

[56] Wang, M., Madhyastha, T. M., Chan, N. H., Papadimitriou, S., and Faloutsos, C., “Data mining meets performance evaluation: Fast algorithms for modeling bursty traffic,” in Proceedings of the 16th International Conference on Data Engineering (ICDE02), 2002.

[57] Wilkes, J., “The Pantheon storage-system simulator,” Tech. Rep. HPL–SSP– 95–14, Storage Systems Program, Hewlett-Packard Laboratories, Palo Alto, CA, December 1995.

[58] Wilkes, J., “Traveling to Rome: QoS specifications for automated storage system management,” in Proceedings of the International Workshop on Quality of Service (IWQoS’2001), pp. 75–91, Springer-Verlag, June 2001.

[59] Wunderlich, R. E., Wenisch, T. F., Falsafi, B., and Hoe, J. C., “SMARTS: accelerating microarchitecture simulation via rigorous statistical sampling,” in Proc. of the 30th annual International Symposium on Computer Architecture (ISCA ’03), pp. 84–97, May 2003.

[60] Zhang, J., Sivasubramaniam, A., Franke, H., Gautam, N., Zhang, Y., and Nagar, S., “Synthesizing representative I/O workloads for TPC-H,” in Proc. of the 10th International Symposium on High Performance Computer Ar- chitecture (HPCA10), Feb 2004.

VITA

Zachary Kurmas was born in Flint, Michigan, on 18 March, 1975. He received a B.A. in Mathematics and a B.S. in Computer Science from Grand Valley State University in 1997. That fall, he began graduate studies at Georgia Tech.

During the summer of 2000, he had the opportunity to work with the Storage

System Department at HP Labs. Under the direction of Dr. Ralph Becker-Szendy and Dr. Kimberly Keeton, he began researching automatic methods of generating synthetic I/O workloads.

He continued his storage systems research at Georgia Tech, and in 2004, under the supervision of Dr. Kishore Ramachandran, he completed his Ph.D. dissertation entitled “Generating and Analyzing Synthetic Workloads using Iterative Distillation”.
