Scalpel: Optimizing Query Streams Using Semantic Prefetching

by

Ivan Thomas Bowman

A thesis presented to the University of Waterloo in fulfilment of the thesis requirement for the degree of

Doctor of Philosophy

in

Computer Science

Waterloo, Ontario, Canada, 2005

c Ivan Thomas Bowman 2005

AUTHOR’S DECLARATION FOR ELECTRONIC SUBMISSION OF A THESIS

I hereby declare that I am the sole author of this thesis. This is a true copy of the thesis, including any required final revisions, as accepted by my examiners.

I understand that my thesis may be made electronically available to the public.

iii

Abstract

Client applications submit streams of relational queries to database servers. For simple re- quests, inter-process communication costs account for a significant portion of user-perceived la- tency. This trend increases with faster processors, larger memory sizes, and improved database execution algorithms, and this trend is not significantly offset by improvements in communica- tion bandwidth. Caching and prefetching are well studied approaches to reducing user-perceived latency. Caching is useful in many applications, but it does not help if future requests rarely match pre- vious requests. Prefetching can help in this situation, but only if we are able to predict future re- quests. This prediction is complicated in the case of relational queries by the presence of request parameters: a prefetching algorithm must predict not only a query that will be executed in the fu- ture, but also the actual parameter values that will be supplied. We have found that, for many applications, the streams of submitted queries contain patterns that can be used to predict future requests. Further, there are correlations between results of ear- lier requests and actual parameter values used in future requests. We present the Scalpel system, a prototype implementation that detects these patterns of queries and optimizes request streams us- ing context-based predictions of future requests. Scalpel uses its predictions to provide a form of semantic prefetching, which involves com- bining a predicted series of requests into a single request that can be issued immediately. Scalpel’s semantic prefetching reduces not only the latency experienced by the application but also the to- tal cost of query evaluation. We describe how Scalpel learns to predict optimizable request pat- terns by observing the application’s request stream during a training phase. We also describe the types of query pattern rewrites that Scalpel’s cost-based optimizer considers. Finally, we present empirical results that show the costs and benefits of Scalpel’s optimizations. We have found that even when an application is well suited for its original configuration, it may behave poorly when moving to a new configuration such as a wireless network. The opti- mizations performed by Scalpel take the current configuration into account, allowing it to select strategies that give good performance in a wider range of configurations.

v

Acknowledgements

This document marks a milestone in a course of research that began in the fall of 2000, when I had a long conversation with Kenneth Salem about problematic client applications that had been plaguing my working days. I’d like to thank Ken for provoking me to formalize proposed solu- tions and for helping me find a thesis-sized research problem within an unruly cloud of ideas. I owe Ken a debt of gratitude for his guidance, encouragement, and understanding. His help has been instrumental in completing this work. I would also like to thank the other members of my examining committee—Kostas Konto- giannis, Jeffrey Naughton, Tamer Ozsu,¨ and David Toman—for their thoughtful comments, ques- tions, and suggestions. This final text has been improved by changes based on their comments on earlier drafts. Over the years, I have been fortunate to have the support of many people, and I would like to thank my family, friends, and colleagues for their help and best wishes. My co-workers at iAny- where Solutions have provided useful ideas and feedback, and helped to give me the flexibility to complete my research while working full-time. In particular, I would like to thank my man- ager, Glenn Paulley, for providing me with challenging problems, useful advice, interesting ref- erences, and unflagging support. I’d also like to thank the other members of the dev qp team for their encouragement and understanding during periods where I was working on multiple tasks: Mike Demko, Dan Farrar, Anil Goel, Anisoara Nica, and Matthew Young-Lai were unfailing in their support. My friends have provided a source of strength and stability in my life, and I would like to thank them for the encouragement, support, and humour they have provided over the years. The banter list provided welcome distraction, as did occasional events. My family has also contributed in uncountably many ways to the successful completion of this work. The Bowmans, Browns, Carters, Farleys, Richardsons, and Sharpes were often in my thoughts. My grandparents, Lloyd and Winnifred Richardson, provided me with model examples. My brother, Don, also provided me with advice and feedback on early versions of this research, and I thank him for that. In particular, I would like to thank my mom, Nancy, for inspiring me by her example, working hard to achieve her goals while still finding time to appreciate the world around her. Finally, it is with love and gratitude that I thank my wife Sherri for helping to motivate me to start this course of study, then for being supportive and tolerant through many late nights, early mornings, and working vacations.

vii

Dedication

For Dr. Lloyd T. Richardson

ix

Contents

1 Introduction 1 1.1 Motivation...... 3 1.2 UsingPrefetching...... 5 1.3 Application Request Patterns ...... 6 1.4 Thesis...... 8 1.5 Outline ...... 8

2 Preliminaries 11 2.1 Scalpel System Architecture ...... 11 2.2 Pseudo-Code Conventions ...... 12 2.3 ModelofRequestStreams ...... 13 2.4 Notation for Strings and Sequences ...... 15

3 Nested Request Patterns 17 3.1 ExampleofNesting...... 18 3.2 PatternDetector...... 19 3.2.1 ContextTree ...... 19 3.2.2 Tracking Parameter Correlations ...... 24 3.2.2.1 Overhead of Correlation Detection ...... 27 3.2.3 Client Predicate Selectivity ...... 29 3.2.4 Summary of Pattern Detection ...... 30 3.3 QueryRewriter ...... 32 3.3.1 NestedExecution...... 33 3.3.2 Query Rewrite Preliminaries ...... 34 3.3.2.1 Algebraic Notation ...... 36 3.3.2.2 Lateral Derived Tables ...... 37 3.3.2.3 Choosing The Best Correlation Value ...... 40 3.3.3 UnifiedExecution ...... 41 3.3.3.1 The Outer Join Strategy ...... 42

xi 3.3.3.2 The Outer Union Strategy ...... 46 3.3.4 Partitioned Execution ...... 50 3.3.4.1 The Client Hash Join Strategy ...... 50 3.3.4.2 The Client Merge Join Strategy ...... 52 3.3.5 RewritingaContextTree...... 55

3.3.5.1 Representing Run-Time Behaviour With ACTION Objects . . . 55

3.3.5.2 The REWRITE-TREE Procedure...... 58 3.3.6 SummaryofQueryRewriter ...... 60 3.4 Prefetcher ...... 62 3.4.1 Mistaken Predictions ...... 65 3.5 PatternOptimizer ...... 65 3.5.1 ValidExecutionPlans ...... 66 3.5.2 RankingPlans ...... 66 3.5.3 Exhaustive Enumeration ...... 68 3.6 Experiments...... 68 3.6.1 Effects of Client Predicate Selectivity ...... 70 3.6.2 QueryCosts...... 72 3.6.3 NumberofColumns ...... 73 3.6.4 ExecutionCosts ...... 74 3.6.5 ScalpelOverhead...... 76 3.7 Summary of Nested Request Patterns ...... 76

4 Cost Model 79

4.1 Estimating Per-Request Overhead U0 ...... 80 4.2 Estimating the Cost of Interpreting Results ...... 81 4.3 Estimating the Cost of Queries ...... 82 4.3.1 Estimating the Cost of a Lateral Derived Table ...... 83 4.3.2 Estimating the Cost of an Outer Union ...... 85 4.4 SummaryofCostModel ...... 86

xii 5 Batch Request Patterns 89 5.1 ExampleofBatchPattern...... 91 5.2 PatternDetector...... 94 5.2.1 ModelsofRequestStreams ...... 95 5.2.1.1 Order-k Models ...... 96 5.2.1.2 Choosing a Context Length ...... 99 5.2.1.3 Finite State Models ...... 101 5.2.1.4 Summary of Request Models ...... 101 5.2.2 SuffixTrieDetection ...... 102 5.2.2.1 Implicit Suffix Tries ...... 103 5.2.2.2 Estimating the Probability of a Future Request ...... 103 5.2.2.3 Tracking Parameter Correlations ...... 107 5.2.2.4 Summary of Suffix Trie Detection ...... 111 5.2.3 A Path Compressed Suffix Trie ...... 112 5.2.3.1 Estimating the Probability of a Future Request ...... 118 5.2.3.2 Tracking Correlations ...... 120 5.2.3.3 Summary of Suffix Tree Detection ...... 126 5.2.4 Summary of Pattern Detector ...... 126 5.3 PatternOptimizer ...... 127 5.3.1 CostBasedOptimization ...... 129 5.3.1.1 Feasible Prefetches ...... 132 5.3.1.2 Choosing the Best Feasible Prefetch ...... 132 5.3.2 Building a Finite-State Model ...... 135 5.3.3 RemovingRedundancy...... 136 5.3.4 Summary of the Pattern Optimizer ...... 138 5.4 QueryRewriter ...... 139 5.4.1 Alternative Prefetch Strategies ...... 140 5.4.2 Rewriting with Join and Union ...... 144 5.4.3 Representing Run-Time Behaviour With ACTION Objects ...... 144 5.4.4 SummaryofQueryRewriter ...... 148 5.5 Prefetcher ...... 150 5.5.1 SummaryofPrefetcher...... 154 5.6 Experiments...... 154 5.6.1 Effectiveness of Semantic Prefetching ...... 154 5.6.1.1 BatchLength ...... 156

xiii 5.6.1.2 Useful Prefetches ...... 157 5.6.1.3 BreakdownofCosts ...... 159 5.6.2 Scalpel Training Overhead ...... 160 5.7 Summary of Batch Request Patterns ...... 161

6 CombiningNestedandBatchRequestPatterns 163 6.1 Example Combining Nesting and Batches ...... 163 6.2 Combining Context/Suffix Trees ...... 165 6.3 Current Implementation ...... 167 6.4 Summary of Combining Nested and Batch Request Patterns ...... 168

7 Prefetch Consistency 169 7.1 Updates by the Same Transaction ...... 169 7.2 Updates by Other Transactions ...... 171 7.2.1 WeakIsolation ...... 171

8 Case Studies 175 8.1 Transaction Boundaries ...... 175 8.2 Case 1: dbunload ...... 176 8.2.1 Characterization of the Program ...... 176 8.2.1.1 A Java Version of dbunload ...... 177 8.2.1.2 NestedQueries...... 181 8.2.2 Evaluating Scalpel Using dbunload ...... 184 8.2.3 Batch Prefetching with dbunload ...... 184 8.2.4 Summary of dbunload ...... 186 8.3 Case2:SQL-LedgerCaseStudy ...... 187 8.3.1 Configuring the System for Measurement ...... 189 8.3.1.1 PerltoJDBCBridge...... 189 8.3.1.2 PrimaryKeys ...... 191 8.3.2 Parameters as Literals ...... 191 8.3.3 QueryVariants ...... 192 8.3.4 ComplexCombinedQueries ...... 192 8.3.5 NestedQueryPatterns ...... 193 8.3.6 PerformanceResults ...... 194 8.3.7 Preliminary Result for Batch Request Patterns ...... 196 8.3.8 SummaryofCaseStudies ...... 198

xiv 9 Related Work 199 9.1 Prefetching ...... 199 9.1.1 Prefetching Based on Physical Layout ...... 200 9.1.2 Prefetching Based on Request Patterns ...... 204 9.1.3 SemanticPrefetching...... 207 9.1.3.1 Static Analysis of Attributes ...... 207 9.1.3.2 Value Prediction for Speculative Execution ...... 208 9.1.3.3 Application Hints ...... 209 9.1.3.4 Type Reference Patterns ...... 210 9.1.4 SummaryofPrefetching ...... 212 9.2 Theoretical Underpinnings of Model-Based Prediction ...... 212 9.3 SuffixTries ...... 213 9.4 Processing Sequences of Queries ...... 214 9.4.1 Batch Request Patterns ...... 214 9.4.2 Nested Request Patterns ...... 215

10 Conclusions and Future Work 219 10.1 Contributions ...... 220 10.2FutureStudy...... 221

A Confidence Intervals 223

Bibliography 227

Author Index 247

Index 253

xv

Tables

3.1 Trainingoverhead...... 29 3.2 Algebraic notation for query rewrites...... 36 3.3 Types of ACTION objects...... 56 3.4 Available computers...... 69

3.5 Tested configurations and the per-request overhead U0 ...... 69 3.6 Summary of rewrite strategies...... 77

4.1 Estimated quantities...... 79

4.2 Tested configurations and the per-request overhead U0 ...... 81 4.3 Estimated overhead EST-INTERPRET(C,p,r) of interpreting combined result sets. 82

5.1 Run-time (µs) to execute k queries using different prefetch strategies...... 143 5.2 Mean time per iteration vs batch length (L)...... 157 5.3 Mean time (ms) per iteration vs proportion of useful prefetches (P )...... 159

8.1 Benefits of batch pattern optimization for dbunload...... 186 8.2 Initial table sizes for scale factors SF1 and SF10...... 188 8.3 Simulated user activities...... 190 8.4 Costs of 500 user activities...... 196 8.5 Preliminary results for batch request optimizations in SQL-Ledger...... 197 8.6 Distribution of batch prefetch lengths...... 198

xvii

Figures

1.1 A sample communication trace...... 1 1.2 The difference between fine-grained and coarse-grained communication...... 2 1.3 The SQL-Ledger GET-OPEN-INVOICES function...... 3

1.4 The join query Qopt combining Q1 and Q2...... 4 1.5 Example of a batch request pattern...... 4

1.6 Joining queries Q3 and Q4 with Q3;4...... 5

2.1 Scalpel system structure during training phase ...... 11 2.2 Scalpel system structure at run-time ...... 12 2.3 Samplepseudo-code...... 13 2.4 ExampleofaRequestTrace ...... 15

3.1 Scalpel components used for nested request patterns...... 17 3.2 Sample data and sample output for Figure 3.3 ...... 19 3.3 An example of a nested request pattern...... 20 3.4 Trace example for Figure 3.3...... 21 3.5 Query contexts of Figure 3.4...... 22 3.6 Pattern Detector methods to build context tree...... 23 3.7 Tracking query parameter correlations ...... 28 3.8 Monitoring counts to estimate selectivity of local predicates ...... 31 3.9 Context tree constructed from Figure 3.4...... 32 3.10 Fetch traces for alternate execution strategies of Figure3.4 ...... 34 3.11 Contexts trees corresponding to traces in Figure 3.4...... 35

3.12 Manually combined query joining Q1 and Q2 ...... 37

3.13 Combined query joining Q1 and Q2 (not legal SQL/2003) ...... 38

3.14 Combined query joining Q1 and Q2 using outer references in the ON condition . 38

3.15 Combined nested query joining Q1 and Q2 using LATERAL ...... 39 3.16 The CHOOSE-CORRELATIONS procedure...... 42 3.17 Combine procedure for outer join...... 43 3.18 Outer join queries generated by COMBINE-JOIN...... 44

xix 3.19 Combine procedure for outer union...... 46

3.20 Queries Q3 and Q2 combined using outer union...... 47

3.21 Queries Q1 and Q2 combined using client hash join...... 50 3.22 Combine procedure for client hash join...... 51 3.23 Combine Procedure for Client Merge Join ...... 53 3.24 Combined inner query for the client merge join strategy...... 54 3.25 Combine procedure to rewrite an entire context tree...... 59 3.26 Steps of REWRITE-TREE...... 60 3.27 Context tree after executing steps of Figure 3.26...... 61 3.28 Sketch of execution of alternate strategies...... 63 3.29 Estimating the cost of a plan...... 67 3.30 Run times with varying predicate selectivity ...... 71 3.31 Run times with varying query cost ...... 72 3.32 Run times (s) with varying number of columns ...... 73 3.33 Run time (s) with varying network configurations...... 75

4.1 Run times with varying number of requests to the server...... 80

5.1 Scalpel components used for batch request patterns...... 90 5.2 Application generating a query batch...... 92 5.3 An example trace containing query batches...... 93 5.4 Overview of the Pattern Detector...... 94 5.5 Order-k models for trace of Figure 5.3 ...... 98 5.6 Codetobuildasuffixtrie...... 104 5.7 Suffix trie after first 7 queries in trace...... 105 5.8 Suffix trie for trace of Figure 5.3 ...... 106 5.9 Code to monitor parameter correlations...... 108 5.10 Suffix trie with possible correlations after (a) `xabcd' and (b) `xabcdb'. . . 109 5.11 Atomic suffix trie and path-compressed suffix trie for ajbj, j = 3 ...... 112 5.12 Codetobuildsuffixtree...... 114 5.13 Steps of building a suffix tree for `aaabbb'...... 115 5.14 Steps to convert the implicit suffix tree into an explicit suffix tree...... 116 5.15 Suffix tree for trace of Figure 5.3 ...... 119 5.16 Steps of tracking correlations for a suffix tree...... 123 5.17 Code to monitor parameter correlations in linear space/time...... 124 5.18 Overview of the Pattern Optimizer...... 128

xx 5.19 Example for choosing prefetches...... 129 5.20 Choosing a list of queries to prefetch...... 133 5.21 Choosing the best prefetch query...... 134 5.22 Marking redundant nodes...... 137 5.23 Overview of the Query Rewriter ...... 139 5.24 Run-time to execute k sequential queries using different prefetch strategies. . . . 141 5.25 Code to evaluate alternative prefetching approaches...... 142

5.26 Qdb: queries Qd and Qb combined using a lateral derived table...... 144

5.27 Qdbe: queries Qd, Qb, and Qe combined using union and lateral derived tables. . 145

5.28 Result of Qdbe...... 145 5.29 Building ACTION objects...... 146 5.30 Generating a batch prefetch query...... 149 5.31 Overview of the role of the Prefetcher...... 150 5.32 Pseudo-code for Scalpel batch request Prefetcher...... 151 5.33 Pseudo-code for prefetching...... 153 5.34 Code for Generating Query Stream ...... 155 5.35 Mean time per iteration with varying batch length L...... 156 5.36 Mean time per iteration with varying P ...... 158 5.37 Breakdown of execution costs...... 160 5.38 Training overhead on dbunload case study...... 161

6.1 An example containing nesting and batches...... 164 6.2 Context tree for Figure 6.1...... 164 6.3 SuffixtrieforFigure6.1...... 165 6.4 Example trace generating context /kg/hi...... 166 6.5 Example trace generating context /kg/h...... 167

8.1 Pseudo-code for dbunload top-level requests (continued.) ...... 179 8.2 Pseudo-code for dbunload top-level requests...... 180 8.3 dbunload context tree identified by Scalpel...... 182 8.4 Pseudo-code for dbunload to output database users...... 183 8.5 Running time (s) of dbunload on different network configurations...... 185 8.6 SQL-Ledger system structure...... 187 8.7 Nested request patterns found in SQL-Ledger...... 193 8.8 SQL-Ledger elapsed database time (ms) for original and optimized requests. . . . 195

xxi

1 Introduction

Client applications submit streams of relational queries to database servers. For simple requests, inter-process communication costs account for a significant portion of user-perceived latency. This trend increases with faster processors, larger memory sizes, and improved database exe- cution algorithms, and this trend is not significantly offset by improvements in communication bandwidth.

Client A Client prepares request Q. B Request on wire. A G C Server interprets request Q. B F D Server executes Q. E Server prepares reply R. C D E F Reply on wire. Server G Client interprets reply R.

Figure 1.1: A sample communication trace.

Figure 1.1 shows a sample of a communication trace between a client program and a data- base server. Of the cost components shown, only D represents query processing costs, while the remainder consist of communication overhead. For simple requests, the time spent actually ex- ecuting a request (component D) can be a relatively small portion of execution time. We have found that for simple queries, such as fetching a single row from an in-memory table, the execu- tion time D represents less than 1.5% of the elapsed request time on a low-latency shared mem- ory configuration; this percentage drops significantly with higher latency connections. The problem of latency is becoming increasingly relevant due to a number of trends. We have seen exponential improvements in the number of requests that can be processed per second due to improvements in processor speed, memory size, disk throughput, network bandwidth, and query processing algorithms. During this same time period, the latency associated with these operations has not improved at anything near the same rate. Patterson [143] presents compelling evidence that latency has lagged bandwidth. Further, there is good reason to believe that this latency prob- lem will not be solved to a significant extent by hardware advances. In addition to these trends

1 2 INTRODUCTION that make latency increase relative to the other shrinking costs, new configurations such as wire- less networks and high-latency WANs are increasingly being used for data access. Applications that work well in existing configurations may be unacceptably slow in these now environments merely due to the structure of their request streams. The requests submitted by some database client applications to servers contain fine-grained access such as that shown in Figure 1.2(a). Requests that are individually cheap add up to signif- icant exposed latency and processing costs associated with the overhead of formatting and inter- preting each request. If these requests could be combined into a single request, we would elimi- nate the per-request costs, giving a coarse-grained access pattern as shown in Figure 1.2(b). This pattern not only reduces user-perceived latency, it can also reduce execution costs (components A, C, E and G), and it can even allow the server to choose a more efficient query execution plan. For example, the server might be able to exploit data sharing between multiple requests.

Q1 Q2 Q3 Q1; Q2; Q3

(a) Fine-Grained Access (b) Coarse-Grained Access

Figure 1.2: The difference between fine-grained and coarse-grained communication.

Fine-grained access can be converted to coarse grained access by prefetching anticipated re- sults before they are requested. While this runs the risk of wasting work in the case that the re- sults are not in fact needed, it can reduce the latency and overhead associated with many small requests. In a database setting, prefetching of future requests is not immediately possible. In general, requests are parameterized, and the values of the parameters may depend on the result of earlier requests. Our hypothesis is that existing client applications generate stylized patterns in their request streams. We can implement an automated tool to detect these stylized patterns and prefetch an- ticipated future requests. The goal of this system, which we call Scalpel, is to reduce an objective function defined on a cost model. This objective could be the time an application spends wait- ing for database requests (exposed latency), or it could be the total time that the database server spends processing requests. We will demonstrate techniques that can be used for this automated detection, optimizing, and rewriting system. We will evaluate the effectiveness of the techniques by considering their effect on both synthetic benchmarks and on case studies of real-world applications. 1.1 MOTIVATION 3

1.1 Motivation

As an illustration of the kind of optimization we hope to achieve, consider the client-side data- base application code shown in Figure 1.3.

1 function GET-OPEN-INVOICES(cust_id, currency) 2 open c1 cursor for Q1: 3 SELECT id, curr, transdate 4 FROM ar 5 WHERE customer_id = :cust_id 6 AND ar.curr = :currency 7 AND NOT ar.amount = ar.paid 8 ORDER BY id 9 while r1 fetch c1 do ← 10 if currency = defaultcurrency then 6 11 rate GET-EXCHANGE-RATE( r1.curr, r1.transdate ) ← 12 close c1 13 end 14 function GET-EXCHANGE-RATE(curr, transdate) 15 fetch row r2 from Q2: 16 SELECT exchangerate FROM exchangerate 17 WHERE curr=:curr AND transdate=:transdate 18 return r2.exchangerate 19 end

Figure 1.3: The SQL-Ledger GET-OPEN-INVOICES function.

The code is adapted from SQL-Ledger, a web-based double-entry accounting system written in Perl (we describe this application further in Chapter 8). The GET-OPEN-INVOICES function retrieves a list of all open invoices for a given customer that were recorded with a given monetary currency. If the given currency differs from the configured system default, then the exchange rate for each invoice is retrieved using the GET-EXCHANGE-RATE function. When this application

runs, it issues a series of small single-table queries (Q1,Q2,Q2,Q2,...) to the database server. It uses the results of these queries to perform a two-way, nested loops join on the client side.

The individual requests submitted by GET-EXCHANGE-RATE may be quite cheap (for exam- ple, consisting of a single index lookup of an in-memory table). In this case, the execution time of these inner queries is likely dominated by the fixed overhead costs associated with opening a cur- sor. This suggests that better performance could be achieved by combining several of these small requests together, reducing the fixed overhead of many requests. 4 INTRODUCTION

For example, if Scalpel recognizes the nested query pattern Q1,Q2,Q2,Q2,Q2,... gener- ated by the application in Figure 1.3, it can replace the entire pattern with a single, larger query similar to Qopt, which is shown in Figure 1.4. Query Qopt performs the join at the server and re- turns all of the data that would have been returned by Q1 and the Q2 queries.

SELECT id, curr, exchangerate, ... FROM ar LEFT JOIN exchangerate er ON ar.curr = er.curr AND ar.transdate=er.transdate WHERE customer id = :cust id AND ar.curr = :currency AND NOT ar.amount = ar.paid ORDER BY id

Figure 1.4: The join query Qopt combining Q1 and Q2.

20 function BATCH-EXAMPLE(cust_id, get_shipto) 21 fetch row r3 from Q3: 22 SELECT name, tax_id, ship_id 23 FROM customer c 24 WHERE id = :cust_id 25 if get_shipto then 26 fetch row r4 from Q4: 27 SELECT s.addr 28 FROM shipto s 29 WHERE s.ship_id=:c3.ship_id AND s.default = `Y' 30 31 fetch row r5 from Q5: 32 SELECT tax_rate FROM tax t WHERE t.tax_id=:c3.tax_id 33 return r5.t1 34 end

Figure 1.5: Example of a batch request pattern.

In addition to nested request patterns such as the one shown in Figure 1.3, we have found ex- amples of what we call batch request patterns. A batch is a sequence of related queries. Figure 1.5 shows example code that generates a batch request pattern; this code is a simplified and modified version of functions we found in the SQL-Ledger application. In the BATCH-EXAMPLE function, two or three small queries are submitted to the database to retrieve different types of information about a customer. Each of these is individually cheap, leading to a fine-grained access pattern. We could consider speculatively executing requests in order to avoid fine-grained access. For 1.2 USINGPREFETCHING 5

SELECT c.name, c.tax id, c.ship id, s.addr FROM customer c LEFT JOIN shipto s ON s.ship id = c.ship id AND s.default = `Y' WHERE c.id = :cust id

Figure 1.6: Joining queries Q3 and Q4 with Q3;4.

example, if we decide that query Q4 follows Q3 sufficiently often, we could submit a combined query Q3;4 such as the one shown in Figure 1.6 when we see OPEN(Q3). This combined query generates the results needed for Q3 and Q4. The per-request costs are only incurred once in this case, although there is the risk that we will have executed Q4 unnecessarily.

1.2 Using Prefetching

Scalpel’s rewrites are based on predictions of future actions that will be performed by the client application. When Scalpel sees Q1, it predicts that Q1 will be followed by a series of nested Q2 queries. Based on this prediction, it issues Qopt to the server rather than Q1. If the application then requests Q2 as expected, Scalpel does not pass Q2 to the server. Instead, it extracts the re- quired data from the result of Qopt and returns that to the application. If the application behaves unexpectedly, perhaps by issuing a different query Q3, then Scalpel can simply forward Q3 to the server for execution. In this case Scalpel has done some extra work, since Qopt is a larger and more complex query than Q1. However, Scalpel always returns correct results to the client. By re- placing Q1 with Qopt, Scalpel implements a kind of prefetching. We call it semantic prefetching because Scalpel must understand the queries Q1 and Q2 in order to generate an appropriate Qopt. There are two reasons to do semantic prefetching. First, it provides the query optimizer at the server with more scope for optimization. For example, the application shown in Figure 1.3 effectively joins two tables at the client site. However, the server’s optimizer will be unaware that the join is occurring. Scalpel’s rewrite makes the server aware of the join, allowing its optimizer to consider alternative join methods that it may implement. Second, by replacing many small queries with fewer larger queries, Scalpel can reduce the la- tency and overhead associated with the interconnection network and the layers of system inter- face and communications software at both ends of the connection. These costs can be quite sig- nificant. We measured the cost of fetching a single in-memory row from several commercial re- lational database management systems. Regardless of whether the client used local shared mem- ory or an inter-city WAN to communicate with the server, overhead was consistently over 98.5% of the total query time. For the application shown in Figure 1.3, this means that almost all of the time spent issuing Q2 queries to the server is overhead. 6 INTRODUCTION

It may seem that the application developer should be responsible for avoiding nesting of the form shown in Figure 1.3. For example, the two functions shown in Figure 1.3 could be replaced by a single function that opens Qopt (Figure 1.4). However, we have found that there is a place for both manual application tuning and automatic, run-time optimization of application request streams. Manual tuning can clearly improve application performance, but run-time optimization has some strengths that application tuning does not. First, run-time optimization can take advantage of information that is not known at application development time, or that varies from installa- tion to installation. For example, when the monetary currency of the report differs from the de- fault currency, the implementation of the GET-OPEN-INVOICES function shown in Figure 1.3 is much worse than a revised implementation based on the join query of Figure 1.4. However, if the report currency and the default currency are the same, the implementation of Figure 1.3 will per- form best. For which of these circumstances should the implementation be tuned? The program- mer may not know the answer to this question; worse, the answer may be different for different instances of the program. Other examples of run-time information that may have a significant im- pact on the performance of the application are program parameter values, data distributions, and system parameters such as network latency. A run-time optimizer can consider these factors in deciding how best to interact with the database server. A second argument in favor of run-time optimization is a software engineering argument. Per- formance is not the only issue to be considered when designing and implementing an application. For example, the SQL-Ledger application actually calls get exchangerate from eight loca- tions. Only one of these calls is shown in Figure 1.3. Rewriting get openinvoices to use the join query of Figure 1.4 breaks the encapsulation of the exchange rate computation that was present in the original implementation, resulting in duplication of the application’s exchange rate logic. This kind of duplication can lead to increased development cost and possible maintenance issues. Finally, there is the issue of the time and effort required to tune applications. While we do not expect Scalpel to eliminate the need for manual application tuning, any performance tuning that can be accomplished automatically can reduce the manual tuning effort. Scalpel may be partic- ularly beneficial for tuning automatically or semi-automatically generated application programs, for which there may be little or no opportunity for manual tuning.

1.3 Application Request Patterns

The example in Figure 1.3 shows one type of pattern that we have identified in database client code. We have found that there are several such types of patterns that we can consider optimiz- ing. We surveyed a small set of database application programs to identify the kinds of query pat- 1.3 APPLICATION REQUEST PATTERNS 7

terns they produce, and the prevalence of those patterns. Our sample included the following ap- plications.

SQL-Ledger: A web-based double-entry accounting system written in Perl. • Slaschode: A web forum written in Perl. • Compiere:` A Java-based ERP system. • TM: A Java-based time-management GUI. • dbunload: A C program that writes DDL to re-create the schema of a database. • In addition to these systems, we investigated a number of proprietary applications, ranging from on-line order processing systems to report generating systems. While non-disclosure agreements prevent us from giving details for those applications, the results of our analysis of those applica- tions were consistent with the results from the applications listed above. From our application sample, we identified three types of query patterns that are amenable to optimization: batches, nesting, and data structure correlations.

Batches A batch is a sequence of related queries. These individual queries could potentially be replaced with a single, larger query. For example, consider the BATCH-EXAMPLE function shown in Figure 1.5.

Nesting In batches, each query is opened, fetched, and closed independently. In the nesting pat- tern, one query is opened, and other queries are opened, executed, and closed for each row of the outer query. Figure 1.3 showed an example of an application that generates a nest- ing query pattern. The nesting pattern effectively implements a nested loops join in the ap- plication.

Data Structure Correlation In the nesting pattern, inner queries are executed while the outer query is open, expressing direct nesting in the application. In some cases, a client appli- cation opens an outer query, fetches the results into a data structure, then closes the outer query. Then, an inner query is executed for some or all of the values stored in the data struc- ture. The performance impact of the data structure correlation pattern is similar to that of the nesting pattern. However, the pattern may be more difficult to detect due to the indi- rect nesting.

Batches were common in all of the applications we considered. Nesting or data structure cor- relation also occurred in each of the applications, although less frequently than batches. Although 8 INTRODUCTION nesting and data structure correlation are less common than batches, the potential payoff for op- timizing these patterns is greater because they usually allow more database requests to be elimi- nated. For example, the SQL-Ledger application in Figure 1.3 issues Q2 once for each customer invoice. All of these queries can be replaced by a single query. Although both nesting and data structure correlation offer a large optimization payoff, nesting patterns are easier than data struc- ture correlations to detect and optimize.

1.4 Thesis

The main hypothesis that we examine is the following. Modern database applications contain sig- nificant amounts of fine-grained access in their request streams. These fine grained requests con- tain dependencies that prevent existing prefetching approaches from being used. We can exploit the relational data model to efficiently recognize some of these types of patterns. Further, we can use the query processing capabilities of the DBMS to effectively execute combined requests that generate the needed results for prefetched queries. This process of semantic prefetching al- lows not only significant reductions in total exposed latency, but also it offers savings in the CPU costs associated with formatting and interpreting requests. In this thesis, we present the Scalpel system, which detects patterns in the request streams of client applications and generates prefetching rules that reduce exposed latency. We use Scalpel to evaluate this main hypothesis. Our work provides the following contributions:

1. We identify and characterize two types of fine-grained request patterns that appear in streams of requests to DBMSs.

2. We demonstrate effective techniques to automatically find these patterns.

3. We define and evaluate alternative strategies for prefetching the results of anticipated re- quests.

We will show in the sequel that the problem of fine-grained patterns is common, and that our automated detection and rewriting approach are effective at reducing the costs associated with this type of pattern.

1.5 Outline

The remainder of this document is organized as follows. Chapter 2 covers preliminaries of nota- tion that will be used throughout the document. Chapter 3 describes how Scalpel detects and opti- mizes nested request patterns. Chapter 4 describes how Scalpel combines calibrated and observed 1.5 OUTLINE 9 timing information with server estimates to provide cost estimates for cost-based optimization. Chapter 5 describes how Scalpel recognizes and optimizes batch request patterns. Even though our prototype implements both nested and batch optimization, we have described them in isola- tion for simplicity. Chapter 6 gives details of how nesting and batch patterns could be combined in theory, and also describes the current (limited) combination used in the Scalpel prototype. Scalpel fetches values before the client application submits an FETCH, which leads to con- cerns that prefetching may introduce temporal anomalies. Chapter 7 addresses this issue by show- ing how Scalpel ensures prefetch consistency. Chapter 8 demonstrates the benefits of using Scalpel on real systems by presenting the results of two case studies. Finally, Chapter 9 positions our results within the body of related work, and Chapter 10 presents our conclusions and some topics for future study.

2 Preliminaries

2.1 Scalpel System Architecture

Scalpel operates in two phases. In the training phase (Figure 2.1), the Call Monitor monitors data- base interface calls made by the application (passing them unchanged to the DBMS-provided In- terface Library). The Pattern Detector analyzes the monitored requests to build a model of the re- quest patterns that have occurred so far. At the end of the training phase, the Pattern Optimizer uses this model to select an execution strategy for the observed requests. The Query Rewriter uses the selected strategy to generate a set of request rewrite rules, which are recorded for use at run- time. Scalpel’s rewrites are condition/action rules. The conditions (called contexts) identify sit- uations in which a query rewrite can be applied and the actions describe what activities Scalpel should perform at run-time when it observes a query being submitted in a particular context..

Client Application DB

Cost Model Open, Fetch, Close

Pattern Pattern Call Monitor Optimizer Detector DBMS DB Client Query Contexts + Library Rewriter Actions Operating System

Figure 2.1: Scalpel system structure during training phase. Scalpel components are shaded.

At run-time (Figure 2.2), the Call Monitor again monitors the client request stream and tracks the current context. Each time the client opens a query, Scalpel checks the current context against the rules in the rewrite database. If the context matches a rule condition, Scalpel applies the as- sociated action to rewrite the current query. For example, an alternate query could be submitted to fetch the original query’s results and also prefetch additional results; alternatively, the query could be answered directly from prefetched data without consulting the DBMS.

11 12 PRELIMINARIES

Client Application DB

Open, Fetch, Close

Prefetcher Call Monitor DBMS DB Client Library Contexts + Actions Operating System

Figure 2.2: Scalpel system structure at run-time. Scalpel components are shaded.

2.2 Pseudo-Code Conventions

We use the following conventions in the pseudo-code of this document, which are generally based on the conventions used by Cormen, Leiserson and Rivest [47]:

1. Block structure is indicated by indentation.

2. The symbol £ is used to indicate that the rest of the line is a comment.

3. Variables are local to a procedure unless explicitly described as a global variable.

4. Arrays are accessed using the array name followed by a subscript in square brackets.

5. Lists are represented using comma-separated elements enclosed in square brackets. For ex- ample, the expression [ a, b, c] is a list with three elements. If an element of the square-bracket expression is a list, the contents are expanded to generate a single (flat) list. Lists are accessed using square brackets. Negative indexes are used to index from the end of the list, so c[-1]=3 in the example below.

6. Tuples are represented with parentheses.

7. Strings are represented as double-quoted text. Concatenation is performed using the addi- tion symbol (+). When concatenating a list to a string, the list is comma-separated.

8. Compound data types are formed of fields, and are described the using structure key- word. A new object of a given type is created using the new keyword, which can option- ally specify initial values for fields in the order they are specified in the structure defini- 2.3 MODELOFREQUESTSTREAMS 13

tion. If a value is not supplied in the new call, it has the default value that is given in the structure definition.

9. Fields are accessed by specifying an object name, a period (.), then the field name. For ex- ample, the expression C.keys represents the ‘keys’ field of object C.

35 structure SAMPLE-STRUCTURE 36 flda=0 £ Field flda with default 0 37 fldb=1 £ Field fldb with default 1 38 end 39 procedure SAMPLE-PROCEDURE 40 £ This is a comment. 41 a 1 £ Set variable a to value 1 ← 42 o new SAMPLE-STRUCTURE £ Set variable o to refer to a new object ← 43 m NIL £ Set variable m to refer to no object ← 44 if a = 1 then £ A conditional 45 b [ a, 2 ] £ Set variable b to list [1,2] ← 46 c [ b, 3 ] £ Set variable c to list [1,2,3] ← 47 d “L:”+c £ Set variable d to string “L:1,2,3” ← 48 o.flda d £ Set field flda of o’s object to string “L:1,2,3” ← 49 m o £ Set variable m to refer to same object as o refers to ← 50 end Figure 2.3: Sample pseudo-code.

2.3 Model of Request Streams

When client applications submit requests to a DBMS, they do so using API calls to a client DB li- brary provided by the DBMS vendor. These calls are used to prepare SQL statements for later execution, to open cursors, fetch rows, and close cursors. The DB library implements these re- quests using inter-process communication with the DBMS. There is significant variation in the details of request streams processed by different DBMS products. Various implementations may be used even within individual products based on the settings of options. For example, prefetch- ing of rows varies between products; within a single product, the details are affected by the se- lected cursor type. Some products implement a lazy close where a close request is not sent to the DBMS immediately, instead being tacked on to the next request. Other products close a cur- sor when the last row is fetched. Some products optimize queries when they are opened, others when they are prepared. In order to make our results broadly applicable, we use a simplified representation of a re- quest stream. We assume that the client request stream consists of the following types of requests: 14 PRELIMINARIES

CONNECT: ACONNECT request connects a client application to the database server. Scalpel monitors connections during training and run-time. In the training phase, Scalpel initial- izes data structures to monitor the application’s requests in order to detect patterns. In the run-time phase, Scalpel loads stored condition/action pairs selected by an earlier training period.

DISCONNECT: ADISCONNECT request disconnects a client from the database server. Scalpel monitors DISCONNECT requests and releases any resources associated with the client’s connection.

EXECUTE: An EXECUTE request is used to modify data stored in the DBMS (for example, by inserting, updating, or deleting a row). Scalpel monitors EXECUTE requests due to their impact on the consistency of prefetched results.

OPEN: An OPEN request is used to send a query to the database server. It returns a cursor, which is used by the application to retrieve rows from the query result. The first parameter of an OPEN request is the query text. Queries may be parameterized. If so, the OPEN request also includes a value for each query parameter.

FETCH: AFETCH request takes a cursor as an input parameter and either returns a single row of the query result to the application or returns EOF to indicate the end of the result set.

CLOSE: ACLOSE request takes a cursor as an input parameter. It is used by the application to indicate that it is finished retrieving rows from the query result.

We assume that CONNECT and DISCONNECT requests are relatively infrequent and we typi- cally do not shown them in examples unless they are relevant to a point of interest. Figure 2.4(a) shows an example of a request stream that might be submitted by a client appli- cation. In this example, there are two different queries submitted: Q1 and Q2. Query Q2 is always submitted while cursor c1 is open over query Q1. Only one FETCH is performed on cursor c2, so it does not return EOF on its first invocation. On the second invocation (line 8-10), an empty re- sult set is returned. We also use a concise trace notation for displaying a request trace, and an example of this is shown in Figure 2.4(b) for the requests of Figure 2.4(a). In the concise nota- tion, we represent OPEN,FETCH and CLOSE requests with graphical symbols. Further, we show only the primary keys of the fetched results. In some cases, we are interested only in a sequence of requests all at the same nesting level.

In this case, we write a sequence such as Qa,Qb,Qc,.... In this sequence trace notation, each Qi represents OPEN(Qi), some number of FETCH calls, then CLOSE(Qi). Where the meaning is clear from the context, we use only the subscripts of the queries, for example writing abc to mean the sequence Qa,Qb,Qc. 2.4 NOTATION FOR STRINGS AND SEQUENCES 15

# Operation Result

1 OPEN(Q1,1) c1

2 FETCH(c1) (A, 1)

3 OPEN(Q2,1) c2

4 FETCH(c2) (f)

5 CLOSE(c2)

6 FETCH(c1) (B, 1) A 7 FETCH(c1) (D, 1) f OPEN 8 OPEN(Q2,1) c2 B p FETCH value p 9 FETCH(c2)EOF D FETCH NULL 10 CLOSE(c2) − FETCH at EOF 11 FETCH(c1)EOF ⊥ ⊥ CLOSE 12 CLOSE(c1) ⊥ (a) Example trace (full) (b) Example trace (concise) (c) Legend

Figure 2.4: Example of a request trace in both (a) full form and (b) concise form.

2.4 Notation for Strings and Sequences

We use a notation for strings and sequences based on the usage of Hopcroft and Ullman [93]. This section outlines that notation. A sequence is a string of atomic symbols juxtaposed. The meanings of the symbols vary be- tween problem domains, and could be, for example, characters, page requests, or musical notes. Regardless of the actual symbols used in a problem domain, we represent symbols with the let- ters a, b, c, and d, with or without subscripts, while we use letters such as w,x,y, and z to denote strings. For example, w = abca is a string. The length of a string w, denoted by w , is the num- | | ber of symbols composing the string. The empty string is represented by , and it is the string consisting of zero symbols. A prefix of a string w is any number of leading symbols from w, and a suffix is any number of trailing symbols. For example, the string abc has prefixes , a, ab, and abc and suffixes abc, bc, c, and . The concatenation of two strings x, y is the string of length x + y formed by using x as | | | | a prefix and y as a suffix. This operation is denoted by juxtaposition, so that xy represents the concatenation of x followed by y. 16 PRELIMINARIES

A finite set of symbols is called an alphabet, usually denoted by Σ. We use Σ∗ to represent the (infinite) set of all possible strings formed using symbols from Σ, and Σn to represent the (finite) set of all strings of length n over Σ. 3 Nested Request Patterns

One of the patterns we have observed in request streams is nesting. In a nesting pattern a client application submits a database request while processing the rows of another cursor. By recogniz- ing this pattern of requests, Scalpel can avoid the overhead of many fine-grained queries. In this chapter, we describe how Scalpel detects, optimizes, rewrites, and prefetches nested request patterns. Figure 3.1 shows Scalpel’s components that are used for nesting detection (in- cluding both those components used during training (Figure 2.1) and run-time (Figure 2.2). At present, we discuss nested request patterns in isolation; in Chapter 5 we describe how Scalpel de- tects batch request patterns, and in Chapter 6 we describe how Scalpel combines detection of nested and batch request patterns.

Monitor-Open Run-Open Pattern Detector Prefetcher Monitor-Fetch Run-Fetch (Section 3.2) Call Monitor (Section 3.4) Monitor-Close Run-Close

A Context Tree (3.2.1) with Correlations (3.2.2) and Selectivities (3.2.3) Cost Model (Chapter 4)

Context Tree Pattern Optimzer Query Rewriter with Rewritten (Section 3.5) (Section 3.3) Context Tree Queries and With Strategies Actions (N,J,U,H,M)

Figure 3.1: Scalpel components used for nested request patterns. Shaded components are de- scribed in this chapter.

During the training period, Scalpel’s Call Monitor component intercepts OPEN,FETCH, and CLOSE calls from the client application. The Call Monitor component calls the MONITOR-OPEN, MONITOR-FETCH, and MONITOR-CLOSE functions, which are implemented by the Pattern De- tector (Section 3.2). The Pattern Detector builds a context tree data structure to represent the struc-

17 18 NESTED REQUEST PATTERNS ture of nesting that has been observed, predicted correlations between input parameters and ear- lier values, and selectivity estimates. After the training period is over, the Pattern Optimizer uses this context tree to decide on an execution strategy for all of the nested request patterns found during the training period. The Pat- tern Optimizer annotates the context tree with a selected strategy for each of the queries observed in a nested pattern, then invokes the Query Rewriter. The Query Rewriter uses the strategies se- lected by the Pattern Optimizer to generate rewritten, combined queries that will be submitted at run-time in place of the original query. Further, the Query Rewriter adds ACTION objects as an- notations to the context tree. These ACTION objects are used at run-time to inform Scalpel of how each request should be answered. Finally, at the end of the training period, the context tree is stored persistently for use at run-time. At run-time, the Prefetcher loads the context tree from storage when it starts. As the appli- cation submits requests, the Call Monitor component calls the RUN-OPEN,RUN-FETCH, and RUN-CLOSE functions. These functions are defined by the Prefetcher component using the con- text tree structure. This chapter is organized as follows. Section 3.1 gives an example program that generates a nested request pattern. This example will be used throughout the chapter. Section 3.2 describes how Scalpel’s Pattern Detector observes a request stream to detect nested requests that are can- didates for rewriting. Section 3.3 describes execution alternatives that the Query Rewriter can generate for the candidates identified during the training period. Section 3.4 discusses how the Prefetcher implements the alternative strategies that are generated by the Query Rewriter. Sec- tion 3.5 describes how Scalpel’s Pattern Optimizer chooses between alternative valid execution strategies. Finally, Section 3.6 provides experimental results illustrating the strengths and weak- nesses of the various strategies.

3.1 Example of Nesting

Figure 1.3 showed an example of nesting that we found in the SQL-Ledger application. Nesting occurs in that example because database access is encapsulated in functions, and these functions are called while processing the rows returned from another request. While only two cursors are involved in the nesting of Figure 1.3, generally an arbitrary tree can appear. Figure 3.3 shows an artificially constructed example of nesting that demonstrates specific features of our approach. Figure 3.2 shows sample data and corresponding output for the example in Figure 3.3.

There are three distinct functions in Figure 3.3, each opening a separate query (Q1,Q2, and Q3). A call to the outer-most function F1() produces a two-dimensional chart. For each row that F1() fetches from Q1, it outputs a row to the chart. For most rows, F1() calls F2(), which outputs a single character enclosed in parentheses. Next, F1() outputs a colon (‘:’), then calls F3(). F3() 3.2 PATTERN DETECTOR 19

S(s1,s2,s3) T (t1,t2) V (v1, v2, v3) Output of F1(1) A 1 1 f 1 V 1 10 A(f): [V(g) W(-)] B 2 1 g 10 W 1 11 B# C 3 1 h 2 X 2 12 C(j): [Y! Z(l)] D 4 1 i 12 Y 3 13 D(-): [] E 5 2 j 3 Z 3 14 k 13 l 14 Figure 3.2: Sample data and sample output for Figure 3.3. The first three tables show the contents of tables S, T , and V . The last table shows the chart generated by the call F1(1).

outputs a string enclosed in square brackets (‘[’,‘]’) with an entry for each row returned from Q3. For each row returned from Q3, F3() outputs the attribute r3.v1 then calls F2() which again outputs a single character enclosed in parentheses. Function F2() is thus used in two different contexts. F2() is called from F1(), with parameter x supplied with the value of r1.s2; it is also called by F3() with x supplied with the value r3.v2. Figure 3.4 shows a trace of requests submitted by the program in Figure 3.3 using the sample data of Figure 3.2. The ‘Context’ column shows the ordered list of queries that are open before each of the application requests in the sample stream. The ‘Request’ column shows the request and actual parameters submitted by the sample program, and the ‘Result’ column shows the result of executing the request. The last two columns are discussed in Section 3.2.2.

3.2 Pattern Detector

The requests in Figure 3.4 are an example of a nesting pattern. If Scalpel can recognize this pat- tern, it can rewrite it to avoid the nesting. The Scalpel Pattern Detector monitors the stream of re- quests presented during the training period and constructs a model of this stream.

3.2.1 Context Tree

Notice that query Q2 is opened in two different ways. It is opened as a result of a function call F , F , and it is also opened by the call F , F , F . The behaviour of these two calls is 1 → 2 1 → 3 → 2 different: in the first case, the parameter x is correlated to attribute r1.s2, while in the second case, x is correlated to attribute r3.v2. If we are to prefetch results for Q2, we must distinguish between the two ways that it is invoked in order to correctly predict the parameter values that will be used. 20 NESTED REQUEST PATTERNS

51 function F1(w) 52 open c1 cursor for Q1: 53 SELECT s1, s2 FROM S WHERE s3=:w ORDER BY s2 54 while r1 fetch c1 do ← 55 PRINT( r1.s1 ) 56 if r1.s1 = `B' then 6 57 F2( r1.s2 ) 58 PRINT( `:' ) 59 F3( r1.s2 ) 60 else 61 PRINT( `#' ) 62 PRINT-NEWLINE( ) 63 close c1 64 end 65 function F2(x) 66 £ Query Q2 outputs at most one row 67 open c2 cursor for Q2: 68 SELECT t1 FROM T WHERE t2=:x 69 r2 fetch c2 ← 70 if r2 then 71 PRINT( `(', r2.s1, `)' ) 72 else 73 PRINT( `(-)' ) 74 close c2 75 end 76 function F3(y) 77 PRINT( `[' ) 78 open c3 cursor for Q3: 79 SELECT v1, v2 FROM V WHERE v3=:y 80 while r3 fetch c3 do ← 81 PRINT( r3.v1 ) 82 if r3.v1 = `Y' then 6 83 f2 F ( r3.v2 ) ← 2 84 else 85 PRINT( `!' ) 86 close c3 87 PRINT( `]' ) 88 end

Figure 3.3: An example of a nested request pattern. 3.2 PATTERN DETECTOR 21

 #  Context           Request       Result    Correlations
 1  C0 : /            OPEN(Q1,1)    c1        {⟨1|C,1⟩}
 2  C1 : /Q1          FETCH(c1)     (A, 1)
 3  C1 : /Q1          OPEN(Q2,1)    c2        {⟨1|C,1⟩, ⟨1|I,C1,1⟩, ⟨1|O,C1,2⟩}
 4  C2 : /Q1/Q2       FETCH(c2)     (f)
 5  C2 : /Q1/Q2       CLOSE(c2)
 6  C1 : /Q1          OPEN(Q3,1)    c3        {⟨1|C,1⟩, ⟨1|I,C1,1⟩, ⟨1|O,C1,2⟩}
 7  C3 : /Q1/Q3       FETCH(c3)     (V, 10)
 8  C3 : /Q1/Q3       OPEN(Q2,10)   c2        {⟨1|C,10⟩, ⟨1|O,C3,2⟩}
 9  C4 : /Q1/Q3/Q2    FETCH(c2)     (g)
10  C4 : /Q1/Q3/Q2    CLOSE(c2)
11  C3 : /Q1/Q3       FETCH(c3)     (W, 11)
12  C3 : /Q1/Q3       OPEN(Q2,11)   c2        {⟨1|O,C3,2⟩}
13  C4 : /Q1/Q3/Q2    FETCH(c2)     EOF
14  C4 : /Q1/Q3/Q2    CLOSE(c2)
15  C3 : /Q1/Q3       FETCH(c3)     EOF
16  C3 : /Q1/Q3       CLOSE(c3)
17  C1 : /Q1          FETCH(c1)     (B, 2)
18  C1 : /Q1          FETCH(c1)     (C, 3)
19  C1 : /Q1          OPEN(Q2,3)    c2        {⟨1|O,C1,2⟩}
20  C2 : /Q1/Q2       FETCH(c2)     (j)
21  C2 : /Q1/Q2       CLOSE(c2)
22  C1 : /Q1          OPEN(Q3,3)    c3        {⟨1|O,C1,2⟩}
23  C3 : /Q1/Q3       FETCH(c3)     (Y, 13)
24  C3 : /Q1/Q3       FETCH(c3)     (Z, 14)
25  C3 : /Q1/Q3       OPEN(Q2,14)   c2        {⟨1|O,C3,2⟩}
26  C4 : /Q1/Q3/Q2    FETCH(c2)     (l)
27  C4 : /Q1/Q3/Q2    CLOSE(c2)
28  C3 : /Q1/Q3       FETCH(c3)     EOF
29  C3 : /Q1/Q3       CLOSE(c3)
30  C1 : /Q1          FETCH(c1)     (D, 4)
31  C1 : /Q1          OPEN(Q2,4)    c2        {⟨1|O,C1,2⟩}
32  C2 : /Q1/Q2       FETCH(c2)     EOF
33  C2 : /Q1/Q2       CLOSE(c2)
34  C1 : /Q1          OPEN(Q3,4)    c3        {⟨1|O,C1,2⟩}
35  C3 : /Q1/Q3       FETCH(c3)     EOF
36  C3 : /Q1/Q3       CLOSE(c3)
37  C1 : /Q1          FETCH(c1)     EOF
38  C1 : /Q1          CLOSE(c1)

Figure 3.4: Trace example for Figure 3.3. The first column gives the number of the request. The second gives the context named Ci, which is the ordered list of open queries using path-separator notation. The third column gives the request being processed and its parameters. The fourth column gives the rows returned by FETCH, and the last column is described in Section 3.2.2.

Scalpel uses contexts to distinguish different uses of a query. A context is an abstraction of conditions within a request stream. By recording observations about application behaviour in specific contexts, we can distinguish between multiple uses of a query. All observations about a request stream are applied to the appropriate context within the stream, and at run-time, the context is used to decide what action to take when a request is submitted. Our definition of context should be efficient to compute, as we need to monitor the context at run-time. Further, contexts should be sufficiently detailed to distinguish between requests that will have distinct usage patterns. We choose to use query nesting to identify contexts. We identify a context by the ordered list of queries open when a request is submitted. We write contexts using path-separator notation to emphasize the hierarchical nature of the list. For example, on line 3 of Figure 3.4, Q2 is opened in context C1, which has the path /Q1; on line 8, it is opened in context C3, which has the path /Q1/Q3.

                 C0: /
                   |
                   Q1
                   |
                C1: /Q1
               /        \
             Q2          Q3
             /             \
      C2: /Q1/Q2       C3: /Q1/Q3
                           |
                           Q2
                           |
                     C4: /Q1/Q3/Q2

Figure 3.5: Query contexts of Figure 3.4.

Figure 3.5 shows the contexts corresponding to the program of Figure 3.3. The root context is an empty sentinel node. Each of the descendant nodes of the root corresponds to a context of open queries that was observed during the training period. Each node has an associated identifier (C0, C1, ..., C4 in Figure 3.5) as well as the list of open queries representing the context. An edge labelled Qa between nodes Cx and Cy corresponds to an OPEN(Qa) request that was observed in context Cx. Figure 3.6 shows how the Pattern Detector builds the context tree by monitoring requests. The treeroot variable points to the root of the tree being created, and currctxt represents the current context of the request stream so far. When the client application submits request OPEN(Q,parms), the Call Monitor calls MONITOR-OPEN(Q, parms).

 89 structure CONTEXT       £ Represents a condition within a request stream
 90   parent = NIL          £ The context this query is opened in
 91   path = []             £ The list of open queries
 92   lastinput = []        £ The most recent values of input parameters
 93   lastoutput = []       £ The most recent values of fetched columns
 94   scope = ∅             £ Possible sources of input values (Section 3.2.2)
 95   correlations = ∅      £ Set of CORRELATION objects that have always held (Section 3.2.2)
 96   counts = NIL          £ Information to estimate selectivities (Section 3.2.3)
 97   children = ∅          £ A map of child contexts indexed by query
 98   alt = 'N'             £ Alternative to use at run-time (Section 3.3)
 99   action = ∅            £ List of actions to perform at run-time (Section 3.4)
100 end
101
102 treeroot ← new CONTEXT  £ The root of the context tree
103 currctxt ← treeroot     £ The current context
104 procedure MONITOR-OPEN( Q, parms )
105   child ← find( currctxt.children, Q )
106   if child = NIL then
107     child ← new CONTEXT( currctxt, [currctxt.path, Q] )
108     add( currctxt.children, Q, child )
109     INIT-CORRELATIONS( child, parms )    £ (Section 3.2.2)
110     INIT-COUNTS( child )                 £ (Section 3.2.3)
111   else
112     VERIFY-CORRELATIONS( child, parms )  £ (Section 3.2.2)
113     UPDATE-COUNTS( child )               £ (Section 3.2.3)
114   child.lastinput ← parms
115   currctxt ← child
116 end
117
118 procedure MONITOR-FETCH( row )
119   currctxt.lastoutput ← row
120   currctxt.counts.numrows ← currctxt.counts.numrows + 1
121 end
122
123 procedure MONITOR-CLOSE()
124   currctxt ← currctxt.parent
125 end

Figure 3.6: Pattern Detector methods to build context tree.

This procedure searches for the query Q in the children of the current context (line 105). If it does not find it, then this is the first time the query has been observed in the context, so MONITOR-OPEN creates a new context and adds it to the children of the current context. If the child has previously been seen, MONITOR-OPEN sets its lastinput field to the current parameters and updates bookkeeping information. When the client application submits a FETCH request, the Call Monitor calls procedure MONITOR-FETCH(row). This procedure updates the lastoutput field of the current context. Finally, when the client application submits a CLOSE request, the Call Monitor calls procedure MONITOR-CLOSE, which moves the current context to its parent. The structure of the context tree identifies the nesting present in a request stream. However, Scalpel requires additional information if it is to be able to rewrite these patterns. Queries are often parameterized, and Scalpel must predict the actual values that will be submitted for future requests if they are to be prefetched. The INIT-CORRELATIONS and VERIFY-CORRELATIONS procedures are used to track possible correlations between input parameters and other known quantities.
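To make the training-time flow concrete, the following Python sketch mirrors the MONITOR-OPEN, MONITOR-FETCH, and MONITOR-CLOSE procedures of Figure 3.6. It is an illustrative simplification, not Scalpel's implementation: the class and method names are ours, and the correlation and count bookkeeping of Sections 3.2.2 and 3.2.3 is omitted.

    class Context:
        def __init__(self, parent=None, path=()):
            self.parent = parent
            self.path = path         # ordered list of open queries, e.g. ('Q1', 'Q3')
            self.children = {}       # child contexts indexed by query
            self.lastinput = []      # most recent OPEN parameters
            self.lastoutput = []     # most recently fetched row

    class CallMonitor:
        def __init__(self):
            self.root = Context()
            self.curr = self.root

        def monitor_open(self, query, parms):
            child = self.curr.children.get(query)
            if child is None:        # first OPEN of this query in this context
                child = Context(self.curr, self.curr.path + (query,))
                self.curr.children[query] = child
            child.lastinput = parms
            self.curr = child

        def monitor_fetch(self, row):
            self.curr.lastoutput = row

        def monitor_close(self):
            self.curr = self.curr.parent

    # Replaying the first three requests of Figure 3.4 builds contexts C1 and C2:
    m = CallMonitor()
    m.monitor_open('Q1', [1])
    m.monitor_fetch(('A', 1))
    m.monitor_open('Q2', [1])
    print('/' + '/'.join(m.curr.path))   # prints /Q1/Q2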

3.2.2 Tracking Parameter Correlations

When the predicate on line 56 of Figure 3.3 is satisfied, function F1() calls F2() (line 57), leading to nested queries. We would like to rewrite this pattern (for example, using a join) in order to eliminate the nesting. Since Q2 is parameterized, Scalpel needs to predict not only that Q2 will be executed, but also the values of the input parameters that will be supplied. Otherwise, Scalpel will not be able to derive an appropriate join query with which to replace Q1. This prediction problem is similar to that faced by hardware speculative execution [72, 181]. With speculative execution (discussed further in Chapter 9), processor instructions can be speculatively executed based on predictions of values that registers are likely to hold. To accomplish this prediction, Scalpel tries to identify nested queries whose input parameter values are correlated to values that are available to Scalpel before the query is submitted. Specifically, Scalpel considers input parameters that have the same value as an input or output parameter of the queries in the current context. In the sample program of Figure 3.3, the input parameter (x) of Q2 matches the second result column (s2) of Q1 when it is called from line 57. In principle, we could observe this correlation by inspecting the source code of the application. For a human, it is relatively easy to recognize the correlation in this simple example; further, the tools of program analysis offer hope of finding such correlations automatically. Data flow analysis can identify cases where one variable is guaranteed to have the same value as another. However, there are practical and theoretical limitations to a source-level analysis. Implementing a source-level analysis for complex object-oriented systems is relatively difficult [21], and

such an analysis would necessarily be conservative. Even with a very careful data-flow analysis implementation, decidability results tell us that there exist programs with a correlation that holds in all possible executions of the program yet cannot be detected. Further to this theoretical objection, systems can contain correlations that hold only in particular installations (depending, for example, on install-time parameters and user behaviour). A source-level analysis would likely not be able to detect these correlations. For these reasons, we do not use a source-level analysis of the client program to find correlations. Instead, we infer the presence of correlations from observations of the values in the request stream presented during the training period. For example, consider the request trace of Figure 3.4.

In each call OPEN(Q3,y), the value of y matches the value of the second column of the row returned by the most recent FETCH(Q1). By monitoring input and output parameter values, Scalpel learns to predict correlations between the input parameters of the inner query and the input and output parameters of the enclosing queries. In contrast to a source-level analysis, Scalpel finds all correlations that hold throughout the training period, including every correlation that will always hold in future executions (though, as we will see, it may also make some incorrect predictions). However, Scalpel does not detect correlations that are the result of functional dependencies. For example, we could have an input parameter x that is always supplied the value w + 1, where w is a prior output parameter. A source-level analysis might find such a correlation (with the limitations discussed previously). We could combine trace-based detection with source-level analysis; however, we have not observed such functional correlations in the systems we examined, so we expect the benefits of combination to be relatively slight. Scalpel uses an object of type CORRELATION to represent the fact that a parameter has always been equal to a particular correlation source. We write a correlation for parameter i as ⟨i|T,c,p⟩, where T represents a particular type of correlation and c and p identify the particular source. At present, we consider three types of CORRELATION objects (each identified by a single-character correlation type):

Input Parameter (I) The value of an input parameter to an outer OPEN request.

Output Parameter (O) The value of a column in the most recently fetched row of an enclosing context.

Constant (C) A constant value that is the same every time the query is opened in the given context.

Figure 3.4 shows the possible correlations for each input parameter of an OPEN request, as generated after the request is processed by Scalpel. For example, on line 3 we show the set {⟨1|C,1⟩, ⟨1|I,C1,1⟩, ⟨1|O,C1,2⟩}. The meaning of this is that Scalpel currently guesses that the formal parameter at index 1 of query Q2 in context C1 is always equal to all of the following values:

• The constant value 1 (⟨1|C,1⟩)
• The first input parameter of Q1 (⟨1|I,C1,1⟩)
• The second column c1.s2 from the most recently fetched row of Q1 (⟨1|O,C1,2⟩)

On line 19, Scalpel has reduced its guess to {⟨1|O,C1,2⟩} because the actual value of x on line 19 does not match the other two correlation sources. The Pattern Detector maintains a parameter context in the context tree data structure so that Scalpel can identify these parameter correlations. For each CONTEXT object C, Scalpel records the most recent input parameters observed in the lastinput field and the most recently fetched row in the lastoutput field. Together, the lastinput and lastoutput fields form a parameter context. We can find the parameter context at each point in the request stream of Figure 3.4 by considering the last fetched result (column 4) and the input parameters (column 3) for each cursor that is currently open. A CORRSOURCE object is used to identify each value in the parameter context by specifying the context from which the value is drawn, a type of I or O to indicate input or output parameters respectively, and a parameter field that gives the index of the parameter of interest. The Pattern Detector observes the values of input parameters to OPEN requests. Figure 3.7 shows the code used to track correlation values. When a new context C is first created, the INIT-SCOPE procedure initializes the scope field of C. This scope contains the set of all possible CORRSOURCE objects to which Scalpel considers the input parameters of C could be correlated. At present, Scalpel initializes the scope of C to include the scope of its parent context P in combination with the parameter context of P. This definition ties the correlation detection (scope) to the structural detection (nesting); in principle, the two do not need to be the same. After initializing the scope of the new context, INIT-CORRELATIONS builds the initial set of CORRELATION objects for each input parameter i, based on the CORRSOURCE objects in scope that have the same current value as the input parameter i (line 154). In addition to elements of the scope field, INIT-CORRELATIONS also adds a constant CORRELATION object of type 'C' initialized with the current value of the parameter. This constant CORRELATION will detect whether a parameter is always supplied with the same constant value within the context. The INIT-CORRELATIONS procedure initializes the correlations field of a context to reflect the set of all correlations that held on the first OPEN call. The VERIFY-CORRELATIONS procedure is called on subsequent OPEN calls to remove correlations that no longer hold. For each

stored CORRELATION object c, VERIFY-CORRELATIONS compares the current value of the input parameter parms[c.inparam] to the current value of the source predicted by c. If the supplied value of the input parameter does not match the current value of the CORRELATION object, c is removed from the correlations set. The parameter correlations in Figure 3.4 are the manifestations of actual parameter correlations in the application code of Figure 3.3. In general, correlations observed by Scalpel during training may be the result of actual variable correlations in the application. However, they may also be mere coincidence. If we use a sufficiently long training period, most such coincidences should be discovered and eliminated from consideration. However, there is no guarantee that correlations inferred by the Pattern Detector will actually hold at run-time. This may cause Scalpel to generate semantic prefetches that are not useful. Since Scalpel can recognize such prefetches at run-time, this may impact the system's performance, but it will not cause Scalpel to return incorrect query results to the application. The correlations set for a context records correlations for all input parameters (identified by the inparam value). When we are interested in the correlations in set C that apply to a particular input parameter i, we use the CORRFORPARM(C,i) function.

DEFINITION 3.1 (CORRFORPARM(C,i)) If C is a set of correlations for request a, and i identifies a parameter of a, then we define CORRFORPARM(C,i) as follows:

    CORRFORPARM(C, i) = { c ∈ C | c.inparam = i }    (3.1)

Function CORRFORPARM(C,i) identifies the subset of correlations within C that apply to parameter i of a.

We say that a query Q is fully predicted in context C if we have a non-empty correlation set for each input parameter of C2, the Q-child of C. That is, for each parameter i, CORRFORPARM(C2.correlations, i) ≠ ∅. At the conclusion of the training phase, the Pattern Detector passes a set of candidate context/query pairs to the Pattern Optimizer. A context/query pair (C,Q) is a rewrite candidate if Q is fully predicted in context C. Thus, the Pattern Detector reports (C,Q) as a candidate if Scalpel can predict the future values of all input parameters to Q based on correlations that have always held during the training period when Q is opened in context C. The FULLY-PREDICTED(C) procedure reports TRUE for all contexts that are fully predicted.
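The following small Python sketch illustrates Definition 3.1 and the fully-predicted test. The tuple representation of correlations and the function names are ours, chosen for brevity rather than fidelity to Scalpel's code.

    def corr_for_parm(corrs, i):
        # Definition 3.1: the correlations that predict input parameter i
        return {c for c in corrs if c[0] == i}

    def fully_predicted(corrs, num_parms):
        # True iff every input parameter has at least one surviving correlation
        return all(corr_for_parm(corrs, i) for i in range(1, num_parms + 1))

    # After line 19 of Figure 3.4, only <1|O,C1,2> survives for Q2's one parameter:
    corrs = {(1, 'O', 'C1', 2)}
    print(fully_predicted(corrs, 1))   # True: (C1, Q2) is a rewrite candidate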

3.2.2.1 Overhead of Correlation Detection

In the worst case, every input parameter may be correlated to each input parameter or output column of every outer query. If we have a nesting depth of D with C output columns and P input parameters per query, this gives D(C + P) possible correlation sources; each of the input parameters of the innermost query may be correlated to all of these. Overall, we have O(DP) input parameters in the entire nest, giving O(D²P(C + P)) correlations that we must consider.

126 structure CORRSOURCE
127   type = ?          £ The type of correlation: I for input, O for output, or C for constant
128   context = NIL     £ The source context or NIL
129   parameter = 0     £ The parameter number for I and O types
130 end
131 structure CORRELATION
132   inparam = ?       £ The input parameter that is being predicted
133   type = ?          £ The type of correlation: I for input, O for output, or C for constant
134   context = NIL     £ The source context or NIL
135   parameter = 0     £ The parameter number for I and O types
136   value = NIL       £ The constant value for C types
137 end
138
139 £ Initialize the scope of a new context to include the parameters of all enclosing contexts
140 procedure INIT-SCOPE(C)
141   P ← C.parent
142   scope ← [ P.scope ]
143   for i ← 1 to P.lastinput.length do
144     scope ← [ scope, new CORRSOURCE("I", P, i) ]
145   for i ← 1 to P.lastoutput.length do
146     scope ← [ scope, new CORRSOURCE("O", P, i) ]
147   C.scope ← scope
148 end
149
150 procedure INIT-CORRELATIONS(C, parms)
151   INIT-SCOPE(C)
152   scope ← C.scope
153   for i ← 1 to parms.length do
154     for s ∈ C.scope where CURR-VALUE(s) = parms[i] do
155       n ← new CORRELATION( i, s.type, s.context, s.parameter )
156       corrs ← corrs ∪ n
157     corrs ← corrs ∪ new CORRELATION( i, "C", parms[i] )
158   C.correlations ← corrs
159 end
160
161 procedure VERIFY-CORRELATIONS(C, parms)
162   corrs ← C.correlations
163   valid ← { c ∈ corrs | CURR-VALUE(c) = parms[ c.inparam ] }
164   C.correlations ← valid
165 end

Figure 3.7: Tracking query parameter correlations.

The correlation detection algorithm we have presented is polynomial; a slightly more complicated linear algorithm exists. However, we have found that even the polynomial approach has reasonable overhead for the actual systems we examined. Table 3.1 shows the open-time for a query with varying numbers of outer queries opened (Depth), varying numbers of columns for each query (Cols), and varying numbers of parameters for the innermost query (Parms). The time for the unmodified program (Original) is compared to the time during training (Training). Testing was performed with a local client (configuration LCL, defined in Section 3.6).

Depth  Cols  Parms  Original (ms)  Training (ms)
    1    10      1           2.64           3.15
   10    10      1           2.62           3.31
   10    10     10           2.88           3.34
   40   100    100          14.95     198,368.17

Table 3.1: Training overhead.

The overhead of the correlation detection algorithm is reasonable so long as the product D(C + P) is not too high. For the configuration we tested, results were good so long as this product did not exceed 1000. In the actual systems we examined, we did not find any cases where this product exceeded 100. However, the poor scalability of the polynomial algorithm indicates that a linear algorithm may be preferable in a practical implementation.
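As a back-of-the-envelope check, the product D(C + P) can be computed for each configuration of Table 3.1 (a sketch; the threshold of 1000 is the empirical guideline mentioned above):

    for depth, cols, parms in [(1, 10, 1), (10, 10, 1), (10, 10, 10), (40, 100, 100)]:
        product = depth * (cols + parms)
        print(depth, cols, parms, product, 'ok' if product <= 1000 else 'costly')

Only the last configuration (product 8000) exceeds the guideline, which is consistent with the large training time observed in that row of the table.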

3.2.3 Client Predicate Selectivity

If Scalpel's Pattern Detector produces a candidate context/query pair (C,Q), this means that whenever Q occurs within context C, the input parameters of Q can be predicted using values available in C. However, this does not imply that Q occurs every time C occurs. Predicates within the client application may dictate that Q occurs in some cases but not in others. For example, in Figure 1.3, an application predicate on line 10 determines whether the correlated inner query will occur inside the outer query. Similarly, predicates in Figure 3.3 on lines 56 and 82 control whether nested queries are submitted. The selectivity of client predicates is important to Scalpel because it affects the costs of the various semantic prefetching strategies that Scalpel's optimizer will consider. During the training phase, Scalpel estimates client predicate selectivities and then uses these estimates during cost-based optimization.

Scalpel counts the number of rows fetched from the outer query and the number of times that the inner query is opened. The ratio EST-P is computed as an estimate of the probability that the inner query will be opened for each outer row at run-time. For example, if 200 rows are fetched from the outer query and the inner query is opened 120 times, we will use the estimate EST-P = 0.6. At present, we do not consider the case where an inner query is opened more than once for a particular outer row (this is addressed in Chapter 6); therefore, we are assured that the ratio is a reasonable probability estimate (EST-P ∈ [0, 1]). There is a difference between the predicates of Figure 1.3 and Figure 3.3. The predicates on lines 56 and 82 of Figure 3.3 depend on values returned from the outer query. These predicates may have different values for different rows of Q1 and Q3 respectively. In contrast, the predicate currency ≠ defaultcurrency (line 10, Figure 1.3) depends only on an input parameter to the get_openinvoices function. This predicate will have the same result for all calls to

OPEN(Q2) corresponding to a single open of Q1. For some instances of Q1, the inner query will not be opened at all for any row. This situation could also occur if the predicate on line 82 of Figure 3.3 returned false for every row of Q3 associated with one particular call to F3(); for example, this can occur if a particular instance of the outer query returns no rows. Two of the execution strategies that we consider (described in Section 3.3) cost less to execute if the inner query is not opened at all for a single instance of the outer query. We use a second parameter, EST-P0, to estimate the probability that the inner query will be executed for a particular instance of the outer query. We maintain a count C.numprtopens for each context C; this counts the number of instances of the parent query for which the inner query is opened at least once. We compute the ratio of C.numprtopens to the number of instances of the parent query, and we use this ratio as the value of EST-P0. For example, if the outer query is opened 5 times and the inner query is submitted for only 2 of these instances, Scalpel will use the estimate EST-P0 = 0.4. Figure 3.8 shows the routines that are used to maintain counts in order to estimate the EST-P and EST-P0 selectivities. Section 3.5.2 describes how Scalpel uses the EST-P and EST-P0 estimates for each context/query pair in order to estimate the costs of various execution strategies.
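Both estimates reduce to simple ratios, as this sketch shows using the worked numbers from the text (the function names are illustrative; the counters they consume are those maintained by Figure 3.8):

    def est_p(inner_opens, outer_rows):
        # probability the inner query is opened for a given outer row
        return inner_opens / outer_rows

    def est_p0(parents_with_inner_open, parent_opens):
        # probability the inner query is opened at least once per outer instance
        return parents_with_inner_open / parent_opens

    print(est_p(120, 200))   # 0.6
    print(est_p0(2, 5))      # 0.4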

3.2.4 Summary of Pattern Detection

All of the information gathered during the training period is combined into the context tree data structure. Figure 3.9 shows the context tree for the program of Figure 3.3. As described in Section 3.2.1, this model shows the structural nesting of queries (via the children field). Further, Section 3.2.2 described how the correlations field is maintained by the Pattern Detector during the training period to predict the values that will be used in future executions of the associated queries. Finally, Section 3.2.3 described how the EST-P and EST-P0 fields are maintained to estimate how often a nested query will be executed. These estimates are used to choose an appropriate execution strategy.

166 structure COUNTS
167   lastprtopen = 1   £ The last open # of parent for which this context opened
168   numprtopens = 1   £ The number of opens of parent for which this context opened
169   numrows = 0       £ The number of rows fetched in this context
170   numopens = 1      £ The number of times this context was entered
171 end
172
173 procedure INIT-COUNTS(C)
174   P ← C.parent
175   C.counts ← new COUNTS( P.counts.numopens )
176 end
177
178 procedure UPDATE-COUNTS(C)
179   pcnt ← C.parent.counts
180   cnt ← C.counts
181   cnt.numopens ← cnt.numopens + 1
182   if pcnt.numopens ≠ cnt.lastprtopen then
183     cnt.lastprtopen ← pcnt.numopens
184     cnt.numprtopens ← cnt.numprtopens + 1
185 end
186
187 function EST-P(C)
188   pcnt ← C.parent.counts
189   cnt ← C.counts
190   return cnt.numopens / pcnt.numrows
191 end
192
193 function EST-P0(C)
194   pcnt ← C.parent.counts
195   cnt ← C.counts
196   return cnt.numprtopens / pcnt.numopens
197 end

Figure 3.8: Monitoring counts to estimate selectivity of local predicates.

Legend:  id  alt  path
         EST-P  EST-P0  correlations

                        C0  -  /
                            |
                            Q1
                            |
                   C1  N  /Q1
                   1  1  {⟨1|C,1⟩}
                  /              \
                Q2                Q3
                /                   \
   C2  N  /Q1/Q2               C3  N  /Q1/Q3
   3/4  1/1  {⟨1|O,C1,2⟩}      3/4  1/1  {⟨1|O,C1,2⟩}
                                    |
                                    Q2
                                    |
                           C4  N  /Q1/Q3/Q2
                           3/4  2/3  {⟨1|O,C3,2⟩}

Figure 3.9: Context tree constructed from Figure 3.4. Nodes show an identifier for each context (id), execution alternative (alt), query path (path), estimates EST-P and EST-P0, and the set of correlations for each input parameter (correlations).

The context tree data structure represents the culmination of all of the information gathered by the Pattern Detector. The context tree provides the link between the Pattern Detector and the Pattern Optimizer, which selects the execution method that will be used at run-time. The selected method is shown in Figure 3.9 using the alt field; in this example, all nodes are annotated with 'N' to indicate a nested strategy. After optimization, the context tree contains everything that is needed by the Prefetcher to execute the appropriate action at run-time when a query is submitted.

3.3 Query Rewriter

Given a context tree, Scalpel's optimizer selects an execution strategy for each of the nodes in the tree. Figure 3.10(a) shows the request trace of Figure 3.4 (submitted by the program in Figure 3.3). The trace is shown in the concise trace notation described in Section 2.3. This trace includes 10 cursor opens, returning a total of 12 rows (and 3 empty result sets). Each of the result sets returned in this sample trace is small, and the cost of each request is dominated by the inter-process communication cost. Further, the number of cursor opens depends on the number of rows returned by Q1 and Q3. While the number of opens is small with this tiny data set, it could grow significantly for larger examples. In order to provide better performance, Scalpel considers alternate execution strategies that submit fewer OPEN requests to the database server. There are three fundamental approaches to the execution of these nested queries: nested, partitioned, or unified execution:

Nested Execution The nested query can be executed in a nested fashion, as it was originally executed by the client program. Figure 3.10(a) is an example of nested execution.

Unified Execution The second approach, called unified by Fernández, Morishima and Suciu [67], combines the inner and outer queries into a single query. The result of this combined query encodes the rows of both the outer and inner queries. Figures 3.10(c) and (d) are examples of unified execution: in each, a single cursor is opened over a result encoding all values needed by the application.

Partitioned Execution Partitioned execution combines all of the executions of the inner query (within a particular outer query) into a single query. A rewritten version of the inner query is submitted once to the server, and its results are merged at the client with the results of the outer query. This effectively executes the nested query like a distributed join in which the inner table is moved to the outer table's location and joined there. Figures 3.10(e) and (f) are examples of partitioned execution. A rewritten query is submitted to the server at most once for each time that the immediately enclosing query is submitted. This contrasts with the nested execution strategy, where the number of OPEN requests is proportional to the number of rows returned from the outer query.

In the remainder of this section, we describe these execution strategies in more detail.

3.3.1 Nested Execution

Figure 3.10(a) illustrates the nested execution strategy for the application fragment of Figure 3.3. Although fixed per-request costs are associated with each invocation of the inner query, the nested execution strategy is appropriate when the selectivity of local predicates guarding the inner query is expected to be very low, i.e., when the inner query is often not executed at all. For example, this will be true for the get_openinvoices function in Figure 1.3 if the report currency is usually the same as the default currency.

[Fetch-trace panels omitted: the raw column data of the six traces is not reproducible in text.]

(a) Nested  (b) Hybrid  (c) Outer Join  (d) Outer Union  (e) Hash  (f) Merge

Figure 3.10: Fetch traces for alternate executions of Figure 3.4. The following strategies are shown: (a) nested execution, (b) a hybrid of nested execution and outer join, (c) outer join, (d) outer union, (e) client hash join, and (f) client merge join. Attributes in the combined queries that were not returned in the original strategy are shaded (these are redundant attributes). The context tree for each of these traces is shown in Figure 3.11.

3.3.2 Query Rewrite Preliminaries

If the nested execution strategy is used, Scalpel does not rewrite the application's queries. However, the partitioned and unified strategies do require query rewrites, and these rewrites share common requirements, which are described in this section. Section 3.3.2.2 describes a SQL construct (lateral derived tables) that is quite helpful in these rewrites. Section 3.3.2.3 discusses how input parameter markers are rewritten.

[Figure 3.11 diagrams: six context trees, each with the same shape as Figure 3.5 (nodes C0 through C4); only the alt annotations differ:
 (a) Nested Plan:                 C1 N, C2 N, C3 N, C4 N
 (b) Outer Join / Nested Hybrid:  C1 N, C2 J, C3 N, C4 J
 (c) Outer Join Plan:             C1 N, C2 J, C3 J, C4 J
 (d) Outer Union Plan:            C1 N, C2 U, C3 U, C4 U
 (e) Client Hash Join Plan:       C1 N, C2 H, C3 H, C4 H
 (f) Client Merge Join Plan:      C1 N, C2 M, C3 M, C4 M]

Figure 3.11: Context trees corresponding to traces in Figure 3.10. Only the id and alt fields are shown.

3.3.2.1 Algebraic Notation

The following notation is used to express rewrites and demonstrate their correctness. For a full definition of the constructs used, see the definitions used by Paulley [145] and Galindo-Legaria [75].

Symbol                  Definition
r[A]                    Tuple formed by projecting tuple r on attributes A
(r, s)                  Tuple formed by using the values of tuple r followed by s
πALL[A](R)              Project relation R onto attributes A; preserve duplicates
πDIST[A](R)             Project relation R onto attributes A; eliminate duplicates
σ[p](R)                 Select rows of R that satisfy predicate p
R ×_{x̄,M} E(x̄)          Apply operator with mapping x̄ = M
R LOJ_{x̄,M} E(x̄)        Outer apply operator with mapping x̄ = M
R ⊎ T                   Duplicate-preserving union of relations R and T
sch(R)                  Attributes of R
KEY[R]                  A key of R
NULLS(A)                A tuple formed with NULL for each attribute in A
ROWID(r)                A unique identifier for tuple r, possibly using virtual attributes

Table 3.2: Algebraic notation for query rewrites.

The projection operators πALL[A](R) and πDIST[A](R) take a relational argument R and a tuple A = (a1, a2, ..., an) of attributes. The elements ai can be simple attributes (ai ∈ sch(R)). We also permit ai to be scalar functions of these attributes and constant values; this extension allows us to better represent SQL queries, where the SELECT list allows not only attributes but scalar expressions. The result of projecting a tuple r on attributes A is given by r[A] = (a1(r), a2(r), ..., an(r)). The expression πALL[A](R) represents the projection of relation R on attributes A, and this is the result of performing the tuple-wise projection on each tuple in relation R.

The apply operator R ×_{x̄,M} E(x̄) takes a relational argument R, a parameterized relational expression E(x̄) with parameter tuple x̄ = (x̄1, x̄2, ..., x̄n), and a list of attributes M = (M1, M2, ..., Mn). As with the projection operators, we allow the elements of M to refer to attributes of R or scalar functions of those attributes. The result is formed by evaluating expression E for each row r ∈ R, using the substitution values x̄ = r[M]. Formally, we have the following:

    R ×_{x̄,M} E(x̄) = ⊎_{r∈R} ( {r} × E(x̄ = r[M]) )    (3.2)

Here, E(x̄ = s) represents the result of relational expression E evaluated under the parameter binding x̄i = si, i ∈ 1,...,n. The outer apply operator R LOJ_{x̄,M} E(x̄) is defined similarly, except that it retains rows r ∈ R for which the expression E(x̄ = r[M]) is empty. This construct is similar in effect to an outer join, and concatenates NULL values to tuple r in place of the attributes of E.

    R LOJ_{x̄,M} E(x̄) = ( R ×_{x̄,M} E(x̄) ) ⊎ ( ⊎_{r∈R | E(x̄=r[M])=∅} { (r, NULLS(sch(E))) } )    (3.3)

As with outer joins, we note that the result of the outer apply operator R LOJ_{x̄,M} E(x̄) includes every tuple of R at least once (more often if there are multiple joining tuples).
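To illustrate these semantics, the following Python sketch evaluates apply and outer apply over lists of dictionaries, with a parameterized "query" modelled as a function from parameter values to rows. The representation and names are ours and purely illustrative.

    def apply_op(R, E, M):
        # R x_{x,M} E(x): evaluate E once per row r, binding parameters to r[M]
        out = []
        for r in R:
            for e in E(*[r[m] for m in M]):
                out.append({**r, **e})
        return out

    def outer_apply_op(R, E, M, e_attrs):
        # outer apply: keep r, padded with NULLs (None), when E(r[M]) is empty
        out = []
        for r in R:
            rows = E(*[r[m] for m in M])
            if rows:
                out.extend({**r, **e} for e in rows)
            else:
                out.append({**r, **{a: None for a in e_attrs}})
        return out

    # Q2(x) returns the rows of T with t2 = x (cf. Figure 3.15):
    T = [{'t1': 'f', 't2': 1}, {'t1': 'h', 't2': 2}]
    Q2 = lambda x: [t for t in T if t['t2'] == x]
    R = [{'s1': 'A', 's2': 1}, {'s1': 'D', 's2': 4}]
    print(outer_apply_op(R, Q2, ['s2'], ['t1', 't2']))
    # row (A,1) pairs with t1='f'; row (D,4) survives with NULLs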

3.3.2.2 Lateral Derived Tables

Figure 3.12 illustrates one way to combine queries Q1 and Q2 from the example application of Figure 3.3. The query is expressed using a join in an unnested style, and is likely to execute efficiently. However, the algorithm for combining the two queries requires a flattening of the nesting relationship that is present in the client application. This is a non-trivial procedure that has been studied extensively for the case when the nesting is present within a single query.

SELECT S.s1, S.s2, T.t1
FROM S LEFT JOIN T ON T.t2 = S.s2
WHERE S.s3 = :w

Figure 3.12: Manually combined query joining Q1 and Q2

If we could combine the two queries into a single nested query, we could rely on rewrite optimizations implemented in the DBMS optimizer to unnest the query and generate an efficient access plan. For example, if we could write a query such as the one in Figure 3.13, we could easily express the application's query nesting, allowing the DBMS query optimizer to select the best execution strategy. Unfortunately, the query in Figure 3.13 is not legal according to SQL/2003. The reference to P.s2 within query I is an outer reference. The outer reference is not within the scope of P, and is therefore disallowed. Shanmugasundaram et al. [166] used a clever approach to combine queries, relying on the fact that the scope clause of a table reference includes all of the join conditions for joins containing the table reference. Their approach would result in the query in Figure 3.14. This approach works adequately for queries consisting of select-project-join (SPJ), where the correlation value is used only in predicates in the WHERE clause of the inner query.

SELECT P.s1, P.s2, I.t1
FROM ( SELECT s1, s2
       FROM S
       WHERE s3 = :w ) P,
     ( SELECT t1
       FROM T
       WHERE t2 = P.s2 ) I

Figure 3.13: Combined query joining Q1 and Q2 (not legal SQL/2003)

However, this approach does not work in cases where the correlations appear in more complex contexts, such as an ON condition of an outer join in the FROM clause or an expression in the SELECT list. The approach does not work at all for grouped, distinct, or unioned queries. Even in the cases where this approach does work, the resulting queries will not be executed efficiently unless the DBMS simplifies the generated ON condition.

WITH P AS ( SELECT s1, s2 FROM S WHERE s3 = :w ),
     I AS ( SELECT t2, t1 FROM T )
SELECT P.s1, P.s2, I.t1
FROM P LEFT JOIN I ON (I.t2 = P.s2)

Figure 3.14: Combined query joining Q1 and Q2 using outer references in the ON condition (Shanmugasundaram et al.'s approach [166])

Fortunately, SQL/99 [13] introduced a new construct, the lateral derived table, which makes it easy to express nesting in the FROM clause. Using the keyword LATERAL, we are able to write a query in the style of Figure 3.13 that is legal. The LATERAL keyword signals that the inner derived table contains outer references. The syntax:

FROM <table-reference-list>, LATERAL ( <query-expression> )

has the following semantics. Let TRL be the <table-reference-list> and let QE be the <query-expression>. The SQL within QE can contain references to attributes of TRL; these are called outer references. Let TRLR be the multiset of rows resulting from TRL, and let QE(r) represent the multiset resulting from evaluating QE with the attributes of r supplied as actual parameters to the corresponding outer references. The result of the FROM clause above is the following multiset:

    {| (r, q) | r ∈ TRLR, q ∈ QE(r) |}    (3.4)

This definition is equivalent to the algebraic apply operator TRLR ×_{x̄,M} QE(x̄) defined by Galindo-Legaria [75] and outlined in Section 3.3.2.1. Figure 3.15 shows how the illegal query of Figure 3.13 can be corrected using the LATERAL keyword.

SELECT P.s1, P.s2, I.t1
FROM ( SELECT s1, s2
       FROM S
       WHERE s3 = :w ) P,
     LATERAL
     ( SELECT t1
       FROM T
       WHERE t2 = P.s2 ) I

Figure 3.15: Combined nested query joining Q1 and Q2 using LATERAL

The lateral derived table construct allows Scalpel to generate a single SQL query that directly matches the application semantics that it infers from monitoring the request stream. By using lateral derived tables, Scalpel allows the DBMS optimizer to select the best execution strategy. Unfortunately, the LATERAL construct is not yet widely supported in commercial DBMSs. Of the three commercial systems we considered in our experiments, one supported the LATERAL construct directly, while the other two supported the same semantics using (distinct) vendor-specific syntax. The query in Figure 3.15 has a shortcoming: rows from the outer query will not be included in the result unless there is at least one matching row from the inner query. If these rows must be included, an equivalent of an outer join must be used to preserve the rows from the outer queries (the table references that precede the lateral derived table in the FROM clause). The SQL/2003 standard does not provide support for directly expressing a lateral derived table operating in an outer join with outer references to the preserved side of the outer join. It is possible to simulate this behaviour using the existing syntax by introducing an additional outer join, but the result does not directly express the desired behaviour, and the additional complexity may lead to higher optimization costs or poor execution plans. Two of the three commercial DBMSs we considered do support using LATERAL-style derived tables in outer joins.

Although there is existing commercial support for expressing nesting in the FROM clause while preserving rows in the face of an empty inner table expression, the syntax is product-dependent. We therefore define an extension to SQL/2003, LEFT OUTER LATERAL, which provides the required outer join semantics:

FROM <table-reference-list>, LEFT OUTER LATERAL ( <query-expression> )

The result of the FROM clause above is the following multiset:

    {| (r, q) | r ∈ TRLR, q ∈ QE(r) |} ⊎ {| (r, NULLS(sch(QE))) | r ∈ TRLR, QE(r) = ∅ |}    (3.5)

Again, this definition is equivalent to the algebraic outer apply operator TRLR LOJ_{x̄,M} QE(x̄) defined by Galindo-Legaria [75] and outlined in Section 3.3.2.1. The corresponding RIGHT OUTER LATERAL and FULL OUTER LATERAL are not meaningful because the right side of the construct cannot be evaluated without a row from the left side to provide outer bindings. We translate this syntax to vendor-specific language dialects as necessary. For systems that do not support a vendor-specific outer join variant of LATERAL derived tables, we can translate a LEFT OUTER LATERAL construct to standards-compliant SQL. We may be curious as to whether the LATERAL construct is really needed, or whether it is merely a syntactic convenience that simplifies the rewriting. Galindo-Legaria [75, 76] answered this question by showing that any query containing a LATERAL construct can be rewritten into an equivalent query with no LATERAL constructs. This shows that LATERAL merely provides a syntactic convenience, and does not extend the class of queries that can be posed. In principle, this result shows that the approach used by Shanmugasundaram et al. [166] could be extended to handle arbitrary queries. In practice, the rewriting defined by Galindo-Legaria [75] generates queries that are much less amenable to optimization. Contemporary query optimizers are not, in general, able to recover the original nested query from the flattened variant. For this reason, the LATERAL construct is more than syntactic sugar in practice: it allows Scalpel to provide a query to the DBMS optimizer in a form that allows cost-based rewrite decisions. Scalpel uses the LATERAL construct to directly encode the nesting detected in the application code, leaving all rewrite optimizations to the DBMS query optimizer.

3.3.2.3 Choosing The Best Correlation Value

The combined query in Figure 3.15 replaced the parameter x of query Q2 with the outer reference P.s2. This replacement was based on the predicted parameter correlations found by the Pattern Detector and shown in Figure 3.4. In the case that the correlations set contains only a single entry for a parameter i, there is only one choice for replacement. In other cases, however, multiple sources are predicted as possible parameter correlations. During the training period, each of these sources always had the same value as the actual parameter value supplied when opening the associated query. For example, on line 3 the formal parameter x of query Q2 is believed to be always equal to {1, w, c1.s2} (expressed with the correlations set {⟨1|C,1⟩, ⟨1|I,C1,1⟩, ⟨1|O,C1,2⟩}). During training, all of the candidate correlation sources had the same value each time the associated query was opened, so there is no reason to believe that any of them will provide better predictive ability at run-time. That leaves us free to choose any of these sources. Our choice of predictor does, however, have an impact on the cost of combined queries. If we choose an output attribute of the outer query, the combined query will have an outer reference. Alternatively, if we select a CORRELATION of a constant literal or input parameter, then the value is available when we open the combined query and does not require an outer reference in the combined query. In order to minimize the cost of combined queries, we select a CORRELATION object in the following priority order:

1. Constants (type ‘C’)

2. Input parameters (type ‘I’)

3. Output attributes (type 'O'), ordered from the highest ancestor down.

The CHOOSE-CORRELATIONS procedure is shown in Figure 3.16. This procedure is used to select the appropriate CORRELATION for each input parameter of context C. After making its selection for each parameter, CHOOSE-CORRELATIONS modifies the query associated with the context by replacing parameter markers. For constant correlations, the marker is replaced with the literal constant value. For input parameters, the marker is replaced with an updated marker that will select the value from the appropriate input parameter of an open cursor. For output parameters, CHOOSE-CORRELATIONS replaces the original parameter marker with an outer reference to an attribute of the derived table representing the predicted source query.

3.3.3 Unified Execution

Under the unified execution strategy, Scalpel uses the LATERAL construct to combine the inner and outer queries into a single query that returns all of the rows that would have been returned by the individual outer and inner queries. When the Prefetcher observes the application opening the outer query (in the correct context), it submits the combined query instead. The Scalpel system then uses the cursor opened over the combined query to respond to the application's requests to fetch rows from the original inner and outer queries that were combined.

198 procedure CHOOSE-CORRELATIONS( C )
199   correlations ← C.correlations
200   sql ← C.query
201   for i ← 1 to correlations.length do
202     corrs ← CORRFORPARM( correlations, i )
203     cs ← CHOOSE-BEST( corrs )              £ Select the best source by priority
204     sql ← REPLACE-PARAMETER( sql, i, cs )  £ Replace the parameter marker text
205   C.query ← sql
206 end

Figure 3.16: The CHOOSE-CORRELATIONS procedure.
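A compact way to express the priority order is to rank candidate sources and take the minimum, as in this hypothetical Python sketch (the dictionary representation, and the depth field used to prefer the highest ancestor among output attributes, are our assumptions):

    def choose_best(corrs):
        priority = {'C': 0, 'I': 1, 'O': 2}
        return min(corrs, key=lambda c: (priority[c['type']], c.get('depth', 0)))

    corrs = [{'type': 'O', 'context': 'C1', 'parameter': 2, 'depth': 1},
             {'type': 'I', 'context': 'C1', 'parameter': 1},
             {'type': 'C', 'value': 1}]
    print(choose_best(corrs))   # picks the constant: no outer reference needed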

Many unified strategies are possible. Scalpel's optimizer currently considers two representative unified strategies: one that combines the outer and inner queries using an outer join, and another that combines them using an outer union. The optimizer (described in Section 3.5) sets the alt field of nodes in the context tree to 'J' to select the outer join strategy and 'U' to select the outer union strategy. We describe these two strategies next.

3.3.3.1 The Outer Join Strategy

Scalpel forms a combined join query using a lateral derived table expression. The procedure for combining the texts of the outer and inner queries is shown in Figure 3.17. Given the texts of the outer and inner queries, this procedure will produce a derived table expression similar to the one shown in Figure 3.15, except that LEFT OUTER LATERAL is used in place of LATERAL to ensure that all rows of the original outer query are included in the result. In addition, the combination procedure adds an ORDER BY clause that matches the ORDER BY clause of the original outer query, if that query has one. Thus, Scalpel would add ORDER BY P.s2 to the combined query shown in Figure 3.15, since the original outer query Q1 (from the F1() function in Figure 3.3) was so ordered. The COMBINE-JOIN function operates on a single node C in the context tree at a time. First, it identifies join_children, the child contexts of C that are annotated with alternative 'J'. Second, it calls CHOOSE-CORRELATIONS to select the values that will be used to predict the input parameter values that will be submitted in the future. Recall that Section 3.2.2 described how the Pattern Detector finds a set of CORRELATION objects for each input parameter. Each of these CORRELATION objects represents a value that could be used to predict the values of the input parameter. The CHOOSE-CORRELATIONS procedure chooses one of these predictions for each input parameter and rewrites the query text based on the choice. Next, COMBINE-JOIN generates the combined query text by combining the outer and inner queries using the LEFT OUTER LATERAL keyword, adding an ORDER BY clause if needed.

207 function COMBINE-JOIN( C )
208   join_children ← [ child ∈ C.children | child.alt = 'J' ]
209   sql ← "SELECT P.*"
210   for i ← 1 to join_children.length do
211     sql += ", I"+i+".*"
212   sql += " FROM ("+C.query+") P"
213   for i ← 1 to join_children.length do
214     child ← join_children[i]
215     CHOOSE-CORRELATIONS( child )
216     sql += ", LEFT OUTER LATERAL ("+child.query+") I"+i
217   if C.query.orderby ≠ ∅ then
218     sql += " ORDER BY "+C.query.orderby
219   return sql
220 end

Figure 3.17: Combine procedure for outer join.
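A rough Python rendering of the text-combination step may make the string manipulation clearer. This sketch simplifies Figure 3.17: it assumes the inner query texts have already had their parameter markers replaced (so they contain outer references such as t2=P.s2), and the function name and signature are illustrative.

    def combine_join(outer_sql, inner_sqls, orderby=None):
        select = 'SELECT P.*' + ''.join(f', I{i}.*' for i in range(1, len(inner_sqls) + 1))
        sql = f'{select} FROM ({outer_sql}) P'
        for i, inner in enumerate(inner_sqls, start=1):
            sql += f', LEFT OUTER LATERAL ({inner}) I{i}'
        if orderby:
            sql += f' ORDER BY {orderby}'
        return sql

    print(combine_join('SELECT s1, s2 FROM S WHERE s3=:w',
                       ['SELECT t1 FROM T WHERE t2=P.s2'],
                       orderby='P.s2'))
    # produces the combined query of Figure 3.18(a), modulo whitespace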

The join strategy can significantly reduce per-query overhead as compared to the nested execution strategy. It also increases the scope of the server's optimizer by exposing join operations to it. For the example of Figure 3.3, Scalpel can combine Q2 with its parents Q1 and, separately, Q3. Figure 3.10(b) illustrates the execution trace that would result if these two combinations were performed while Q3 was executed in its original nested form (the associated context tree is shown in Figure 3.11(b)). As compared to Figure 3.3, in which all queries are nested, the number of query invocations is reduced from ten to four, and the server's optimizer is explicitly made aware that the results of Q1 are being joined to the results of Q2 on t2=P.s2 while the results of Q3 are also being joined to the results of Q2 on t2=P.v2. These joins would otherwise have been hidden in the application's code. Figure 3.18 shows the combined queries returned by COMBINE-JOIN for this example.

At present, Scalpel considers the join strategy only when it can determine that the inner query (or queries) will return at most one row. Scalpel's query analyzer implements a support routine, AT-MOST-ONE(Q), to identify queries that return at most one row. For example, if the t2 attribute is a candidate key for the T table, Scalpel can conclude that the query Q2 will return at most one row [145]. This function acts as an oracle that must be correct when it returns true, but is allowed to return false for queries that in fact return at most one row. These errors lead to missed cases where a join strategy would have been possible, but they do not pose a correctness problem. The at-most-one condition, together with the use of LEFT OUTER LATERAL, ensures that the combined query will return exactly as many rows as the original outer query. The ORDER BY clause ensures that these rows will be returned in the order in which they would have been returned by the original outer query. Thus, decoding the result of the combined join query is

(a) Combination of Q1 and Q2:

SELECT P.*, I1.*
FROM ( SELECT s1, s2
       FROM S
       WHERE s3 = :w ) P,
     LEFT OUTER LATERAL
     ( SELECT t1
       FROM T
       WHERE t2 = P.s2 ) I1
ORDER BY P.s2

(b) Combination of Q3 and Q2:

SELECT P.*, I1.*
FROM ( SELECT v1, v2
       FROM V
       WHERE v3 = :y ) P,
     LEFT OUTER LATERAL
     ( SELECT t1
       FROM T
       WHERE t2 = P.v2 ) I1

Figure 3.18: Outer join queries generated by (a) COMBINE-JOIN(C1) and (b) COMBINE-JOIN(C3) for the context tree in Figure 3.11(b).

straightforward. When the application performs a FETCH on the outer query, Scalpel consumes the next row from the combined query's cursor and extracts the values that correspond to the outer query's columns. When the application performs a FETCH on an inner query, Scalpel simply extracts the attributes required by that query from the current row of the combined query. Scalpel identifies the case where all attributes of the inner query are NULL because there is no matching inner row for the outer row, and returns an empty result set in this case. This case is distinguished from a case where all attributes happen to be NULL by requiring that at least one non-null attribute of the inner query is included in the combined result (for example, a key column). It is possible to use join-based strategies under less restrictive circumstances than those considered by Scalpel. However, doing so may introduce substantial additional redundancy into the results of the combined query, which will increase its cost. Figure 3.10(c) illustrates the result of executing all four queries from the example of Figure 3.3 as a single combined outer join query. The shaded parts of the join query result indicate those portions of the result that are redundant. These redundant data are computed by the combined join query, but they should not be returned to the application when it performs FETCH operations on its open cursors. Redundancy generated by the unrelated inner queries can generate excessive overhead due to the duplication of attributes and rows. This redundancy is not present to a great extent in this example since, although Q2 in C2 is unrelated to Q3, Q2 returns at most one row. In general, contexts C2 and C3 could both return many rows for a single row of C1, and the combined result set would encode the cross product of these two result sets. This led Shanmugasundaram et al. [166] to label this approach "redundant relations", and they stopped considering it after finding that the performance was poor. Not only does such redundancy add processing and data overhead, it also complicates the decoding procedure, since it must now determine which values from the result set are duplicates. In general, it is also possible that Scalpel would have to fetch backwards on the combined query's cursor in order to provide the correct return value for an application's FETCH. This would require either buffering or a scrollable cursor for the combined query. Where supported, scrollable cursors are often more expensive than forward-only cursors. Furthermore, fetching backward may reintroduce some of the per-request latency that the unified strategy is designed to avoid, since fetching backward may require that rows be re-fetched from the server. By restricting the join strategy to situations in which the inner queries satisfy the AT-MOST-ONE(Q) predicate, Scalpel avoids all of this complexity and cost.
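The decoding just described can be sketched as follows for a single inner query; the class, the fixed column slicing, and the in-memory row list are hypothetical simplifications of the Prefetcher's cursor handling.

    class JoinDecoder:
        def __init__(self, combined_rows, n_outer_cols):
            self.rows = iter(combined_rows)
            self.n = n_outer_cols
            self.current = None

        def fetch_outer(self):
            # exactly one combined row per outer row (AT-MOST-ONE guarantee)
            self.current = next(self.rows, None)
            return self.current[:self.n] if self.current else None

        def fetch_inner(self):
            inner = self.current[self.n:]
            # all-NULL inner attributes signal an empty inner result set
            return None if all(v is None for v in inner) else inner

    d = JoinDecoder([('A', 1, 'f'), ('D', 4, None)], n_outer_cols=2)
    print(d.fetch_outer(), d.fetch_inner())   # ('A', 1) ('f',)
    print(d.fetch_outer(), d.fetch_inner())   # ('D', 4) None: empty inner result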

LEMMA 3.2 (OUTER JOIN STRATEGY IS SOUND) The Outer Join strategy returns the correct multiset of rows to the outer and inner queries.

PROOF. Let Q1 be the outer query, and Q2(x̄) be an inner query with parameters x̄ that returns at most one row. Our correlation detection gives us a mapping M that gives the values of x̄ for a row r ∈ Q1. Then, we define a combined query Q_C as follows:

    Q_C = Q1 LOJ_{x̄,M} Q2(x̄)    (3.6)

The combined query is submitted to the database server, and the outer query is answered using the following projection:

    Q1′ = πALL[sch(Q1)]( Q_C )    (3.7)

Since Q2(x̄) returns at most one row, Q_C has exactly the same number of rows as Q1, corresponding to tuples of Q1 extended either with the single matching row from Q2(x̄) or with NULL values. Therefore, Q1′ = Q1 and the correct rows are returned for the outer query.

When an instance of the inner query Q2(x) is submitted while the outer query is positioned on a row r identified by ROWID(r) = r1, Scalpel first checks that x = r[M]; if not, the original query is submitted unmodified. If the predicted correlations match, then the inner query is answered with the following:

    Q2′ = πALL[sch(Q2)]( σ[ROWID(Q1) = r1 ∧ ¬NULL-SUPPLIED(Q2)]( Q_C ) )    (3.8)

The selection and projection operations used to define query Q2′ give the following:

    Q2′ = Q2(r[M])    (3.9)

Since we have checked that x = r[M], the returned result Q2′ = Q2(x), and the inner query returns the desired multiset of rows. The above argument shows the outer join strategy is sound for joining a single inner query; the extension to two or more inner queries is straightforward. □

3.3.3.2 The Outer Union Strategy

The outer union strategy is illustrated in Figure 3.10(d). Each query is represented by distinct columns in the result of the combined query. Each row corresponds to a tuple that would have been returned by one of the original queries, with NULL values supplied for the columns corresponding to the other queries.

221 function COMBINE-UNION( C )
222   union_children ← [ child ∈ C.children | child.alt = 'U' ]
223   parent_order_key ← [ C.orderby, C.keys ]
224   numcols ← COUNT-COLUMNS( C, union_children )
225   sql ← "SELECT "+[parent_order_key, "U.*"]
226   sql += " FROM ("+C.query+") P"
227
228   inner_order ← []
229   type ← 0
230   offset ← COUNT-COLUMNS( C )
231   sql += ", LATERAL ("
232   sql += "SELECT "+[type, "P.*", NULL-LIST(numcols - offset)]
233   sql += " FROM ( VALUES( 1 ) ) DT_OneRow"
234   for i ← 1 to union_children.length do
235     child ← union_children[i]
236     CHOOSE-CORRELATIONS( child )
237     type ← type + 1
238     sql += " UNION ALL SELECT "+[type, NULL-LIST(offset)]
239     offset ← offset + COUNT-COLUMNS( child )
240     sql += ["I"+i+".*", NULL-LIST( numcols - offset )]
241     sql += " FROM ("+child.query+") I"+i
242     inner_order ← [inner_order, child.orderby]
243   sql += ") U( type, "+[ "u"+i for i ← 1 to numcols ]+" )"
244   sql += " ORDER BY "+[parent_order_key, "U.type", inner_order]
245   return sql
246 end
247
248 function NULL-LIST(n)   £ Generate a list of n NULL values
249   nulls ← []
250   for i ← 1 to n do
251     nulls ← [ nulls, NULL ]
252   return nulls
253 end

Figure 3.19: Combine procedure for outer union.

(a) Combined query:

SELECT P.s2 AS orderkey1, P.s1 AS orderkey2, U.*
FROM ( SELECT s1, s2
       FROM S
       WHERE s3 = :w ) P,
     LATERAL
     ( SELECT 0, P.*, NULL
       FROM ( VALUES(1) ) DT_OneRow
       UNION ALL
       SELECT 1, NULL, NULL, I1.*
       FROM ( SELECT t1
              FROM T
              WHERE t2 = P.s2 ) I1 ) U(type, u1, u2, u3)
ORDER BY 1, 2, U.type

(b) Sample output:

s2  s1  type  u1  u2  u3
 1   A    0    A   1   −
 1   A    1    −   −   f
 2   B    0    B   2   −
 2   B    1    −   −   h
 3   C    0    C   3   −
 3   C    1    −   −   j
 4   D    0    D   4   −

Figure 3.20: Queries Q1 and Q2 combined using outer union.

The procedure for combining outer and inner queries using outer union is given in Figure 3.19. Figure 3.20 shows the result of applying the combination procedure to Q1 and the nested Q2 query of our running example. The first two columns of the combined query's result (orderkey1 and orderkey2) are the ordering column and a key of Q1, respectively. The third column is a type field, which is used to ensure that Scalpel can unambiguously determine which of the original queries a particular row of the outer union result is associated with. In the example of Figure 3.20, rows resulting from the original outer query are tagged with type 0, while those from the inner query have type 1. In the more general case, Scalpel assigns a distinct type field value to each of the inner queries. The ORDER BY clause is used to ensure that the resulting rows appear in the order in which they will be required by the application. The rows are ordered first by the ordering attributes of the outer query (orderkey1 in this case, an alias for s2), then by a candidate key of the outer query (orderkey2 here, an alias for s1), then by the type field. This ordering prefix ensures that all of the inner query tuples that correspond to a particular outer tuple are grouped together in the result. Any ordering attributes of the inner queries are then appended so that rows within each group are relatively ordered as specified by their original ORDER BY clauses. For some DBMSs, it may be possible to eliminate the type field by relying on the sort order of NULL [166].

The first branch of the generated UNION construct is responsible for generating the rows corresponding to the outer query. This is accomplished by selecting the attributes of the outer derived table as outer references from DT_OneRow, a specially constructed table that returns a single row. A VALUES clause is used to generate the required single row; for DBMSs that do not support the VALUES construct, a missing FROM clause can be used instead. The effect of the first branch of the UNION is to include a tuple in the union for each row of the outer query. This ensures that there is exactly one encoded outer row for each row returned from the outer query. For this strategy, an outer join is not needed because of the DT_OneRow construct.

The parent_order_key ordering attributes must be taken from the outer query derived table (P), not the union (U), because the attributes of the union are supplied with NULL values when encoding inner rows. Putting these ordering attributes in the select list of the combined query leads to data redundancy: the ordering attributes of an outer row are duplicated for all corresponding inner rows. Worse, with our current formulation the ordering attributes are duplicated in the rows encoding an outer result row if they were already selected as attributes. Some of the DBMS systems we tested support an extension to SQL/2003 that allows a result to be ordered by attributes that are not in the select list. This approach avoids the data redundancy associated with returning these duplicated values; Scalpel could exploit this vendor-specific capability. Alternatively, Scalpel could be extended to avoid duplicating ordering attributes in the encoded inner rows. These changes would provide better performance, especially when the ordering attributes are a significant contributor to the row size. At present, we have not implemented these extensions. Unlike the join strategy, the outer union strategy can be applied regardless of the number of rows returned by the inner query.
When the application performs a FETCH on either the outer query or the inner query, Scalpel obtains the next row from the combined query’s cursor and ex- tracts the values that correspond to the original query’s columns. A change in the value of the type field (for example, from 1 to 0) indicates that there are no more inner query tuples for the cur- rent row of the outer.

LEMMA 3.3 (OUTER UNION STRATEGY IS SOUND) The Outer Union strategy returns the correct multiset of rows to the outer and inner queries.

PROOF. Let Q0 be the outer query, and Q1,...Qn be the n inner queries. Let UNULLS(j, k) be defined as follows:

UNULLS(j, k) = (NULLS(sch(Qj)), NULLS(sch(Qj+1)),... NULLS(sch(Qk))) (3.10)

The UNULLS() construct allows us to represent supplying NULL for all of the attributes of a range of queries.

Let USCHEMA(i) be defined as follows:

USCHEMA(i)= i, UNULLS(0, i 1), sch(Q ), UNULLS(i + 1,n) (3.11) − i  3.3 QUERY REWRITER 49

The USCHEMA() construct defines the schema of branch i of the combined query. It contains a type field with value i, NULL values for all queries that precede Qi in the encoded result set, the attributes of Qi, and, finally, NULL values for all queries that follow Qi.

The combined query QC is formed as follows:

Q = Q × (0, x,¯ UNULLS(1,n) πALL[USCHEMA(i)] Q x¯[M ] (3.12) C 0 Ax,I¯ { } i i  i=1,...,n  ]  where I = sch(Q0) is used to represent the identity mapping r[I]= r.

When the original outer query Q0 is submitted, Scalpel instead returns the following:

0 ALL Q0 = π [sch(Q0)] σ[type = 0](Qc) (3.13)   Applying the projection and selection criteria simplifies this to:

Q0 = Q × (¯x) (3.14) 0 0 Ax,I¯ { }   = r (3.15) { } r∈]Q0 0 0 The result of Q0 contains exactly one row r corresponding to each row r in Q0. Thus, Q0 = Q0, and Scalpel returns the correct multiset of rows for the outer query.

When the outer cursor is positioned on a row r1 identified by r1[KEY[Q0]] = k1 and the application submits an inner query Qi(x), Scalpel first checks that the predicted correlations x = r1[Mi] hold; if not, the original query is submitted unmodified. If so, Scalpel answers the in- ner query with the following:

Q0 = πALL[sch(Q )] σ[type = i KEY[Q ]= k ](Q ) (3.16) i i ∧ 0 1 c   Applying the selection predicates simplifies the expression as follows:

Q0 = πALL[sch(Q )] r × πALL[USCHEMA(i)] Q x¯[M ] (3.17) i i { 1}Ax,I¯ i i  !  If we apply the projections and expand the apply operator, we obtain the following simplified form:

0 Qi = Qi r1[Mi] (3.18)

 0 Since we have already checked that x = r1[Mi], this gives Qi(x)= Qi, and the correct mul- tiset of rows is also returned for each inner query Qi.  50 NESTED REQUEST PATTERNS

3.3.4 Partitioned Execution

Under the unified execution strategies the combined query is issued when the application first opens the original outer query. In contrast, under a partitioned strategy, the rewritten, combined query is issued when the application first opens the original inner query of a query/context pair. There are many possible partitioned strategies, of which Scalpel’s optimizer currently considers two: the client hash join strategy and the client merge join strategy. The optimizer (described in Section 3.5) sets the alt field of nodes in the context tree to ‘H’ to select the client hash join strategy and ‘M’ to select the client merge join strategy. We describe these two strategies next.

3.3.4.1 The Client Hash Join Strategy

Under this strategy, the inner query is combined with the outer using a lateral derived table like the one shown in Figure 3.21. This gives a single statement that retrieves all of the desired rows from the inner query for all possible outer rows. The first time that the inner query is executed by the application, the combined query is submitted instead to the server. All result rows are fetched and stored in a hash table at the client using the parameters of the inner query from the result set (PD.s2 in Figure 3.21) as the hash key. When the application opens the inner query, Scalpel searches the hash table using the inner query parameter values as the lookup key to determine the tuples that should be returned to the application. When the outer query is closed, the hash table is discarded.

SELECT PD.*, I.* FROM ( SELECT DISTINCT s2 FROM ( SELECT s1, s2 FROM S WHERE s3=:w ) P ) PD, LATERAL ( SELECT t1 FROM T WHERE t2=PD.s2 ) I

Figure 3.21: Queries Q1 and Q2 combined using client hash join.

Figure 3.22 shows the procedure for combining queries under the client hash join strategy. This is similar to the procedure that is used under the unified outer join strategy (Figure 3.17). However, there are a few important differences. First, the partitioned approach rewrites a single CONTEXT by combining it with its parent. Second, only the attributes of the outer query that pro- vide parameter values to the inner query are included in the result for partitioned execution. In 3.3 QUERY REWRITER 51

Figure 3.21, only s2 is needed, not s1. Also, each distinct combination of inner query parame- ter values need only be included once in the outer table. If there are several rows with the same s2 value, they will all generate the same inner rows. Figure 3.21 shows how a DISTINCT key- word can be used to achieve avoid duplicating inner rows. Finally, LEFTOUTERLATERAL is not needed, since any correlation values that result in an empty inner query can be left out of the client hash table.

254 function COMBINE-HASH( C ) 255 P C.parent ← 256 corrs CHOOSE-CORRELATIONS C ← 257 sql = “SELECT PD.*, I.*” 258 sql += “FROM ( SELECT DISTINCT ”+corrs 259 sql += “ FROM (”+P.query+“)P )” 260 sql += “) PD, LATERAL (”+C.query+“) I” 261 if C.orderby = then 6 ∅ 262 sql += “ORDER BY ”+C.orderby 263 return sql 264 end Figure 3.22: Combine procedure for client hash join.

Figure 3.10(e) illustrates the the situation in which the client hash join strategy is used for

all four of the queries nested under Q1. While the nested strategy opens 10 cursors, the parti- tioned client hash join strategy only opens 5 cursors. Furthermore, the number of opened cursors in the partitioned execution strategy does not depend on the number of rows returned from outer queries. However, this strategy does require sufficient memory at the client to hold the hash ta- ble, and the CPU of the client machine may make the hash lookups slower than the original, nested strategy.

In the example of Figure 3.10(e), Scalpel combines inner query Q2 with outer query Q3. The combined query is executed at most once per instance of the parent query Q3. Scalpel does not need to execute the combined query if the application never opens Q2 under a particular in- stance of Q3. Thus, in the example, the combined query is executed twice, because the appli- cation opens Q3 three times, but in one of those cases it never opens the nested query Q2 be- cause Q3 returns no rows. In general, it would be possible to combine Q2 with both its parent Q3 and grand-parent Q1 so that the combined query would have to be opened (at most) once per in- stance of Q1. Whether this strategy is preferable to combining only with the immediate parent de- pends on costs and query selectivities. Although it would certainly be possible to consider these alternatives, at present Scalpel only considers the hash join combination of a query with its im- mediate parent from the context tree. 52 NESTED REQUEST PATTERNS

LEMMA 3.4 (CLIENT HASH JOIN STRATEGY IS SOUND) The client hash join strategy returns the correct multiset of rows for the outer and inner query.

PROOF. Let Q1 be an outer query and Q2(¯x) be an inner query with parameters x¯. The client hash join strategy does not modify the outer query. However, it computes the results for all invo- cations of the inner query with the following combined query QC :

Q = πDIST [M](Q ) × Q (¯x) (3.19) C 1 Ax,M¯ 2  The combined query QC returns the results of Q2(¯x) evaluated with each distinct set of outer bindings from Q1.

When an instance of the inner query Q2(x) is submitted while the outer cursor is positioned on a row r, Scalpel first checks that the predicted correlations x = r[M] hold; if not, the original query is submitted unmodified. If the predictions do hold, then Scalpel answers the instance of the inner query using the hash table that was filled from the combined query QC as follows:

0 ALL Q2 = π [sch(Q2)] σ[r[M]= x](QC ) (3.20) = πALL [sch(Q )] σ[r[M]= x] πDIST [M](Q ) × Q (¯x) (3.21) 2 1 Ax,M¯ 2      Since Scalpel has checked that x = r[M] for the current outer row r, we know that there is at DIST least one row r in Q1 such that x = r[M]. Applying the distinct projection π [M](Q1) and the selection criteria σ[r[M]= x] therefore gives the following simplification:

Q0 = πALL [sch(Q )] r × Q (¯x) (3.22) 2 2 { }Ax,M¯ 2 = Q2(r[M])  (3.23)

= Q2(x) (3.24)

0 The set Q2 that Scalpel returns for an invocation of Q2(x) contains exactly the desired mul- tiset of rows. 

3.3.4.2 The Client Merge Join Strategy

The client hash join strategy amounts to a distributed hash join executed at the client. Similarly, the client merge join strategy amounts to a distributed merge join implemented at the client. For this to work properly, Scalpel must ensure that the inner and outer tuples arrive at the client in the proper order for merging. The merge join approach consists of opening rewritten versions of both the outer query and the inner query. The outer query is rewritten so that the result has a known total ordering, and so 3.3 QUERY REWRITER 53 that it includes those attributes that we guess will be used as correlation parameters to the inner queries (based on our training period). The inner query is rewritten by combining it with the orig- inal outer query so that it returns matching inner rows for all of the rows of the outer query. This is similar to the rewriting that is done to the inner query under the client hash join strategy. How- ever, under the client merge join strategy, the rewritten inner query is ordered to match the known ordering that we imposed on the outer query results, as well as any order requirements specified in the original inner query.

265 function COMBINE-MERGE( C ) 266 P C.parent ← 267 corrs CHOOSE-CORRELATIONS C ← 268 parent_order [ P.orderby, P.keys ] ← 269 sql = “SELECT ”+parent_order+“, C.*” 270 sql += “FROM ( (”+P.query+“)P )” 271 sql += “, LATERAL (”+C.query+“) C” 272 sql += “ORDER BY ”+[parent_order, C.orderby] 273 £ The parent query will be modified to be ordered by parent_order (Figure 3.25 line 294) 274 return sql 275 end Figure 3.23: Combine Procedure for Client Merge Join

Figure 3.23 shows the procedure for producing the combined inner query, and Figure 3.24 shows the query that would result from combining the inner and outer queries from our running example. In this case, the rewritten inner query is ordered by P.Id, which is the sort order of the outer query. No additional ordering constraints are imposed by the original inner query. In Sec- tion 3.3.5, we show how the parent query is is modified to include the ordering attributes needed to totally order the results (Figure 3.25 line 294). The first time that the inner query is opened, Scalpel submits instead the combined query. In response to a FETCH on the inner query, Scalpel first checks the values of the sorting and key attributes of the current row of the outer query. It then advances the cursor of the combined inner query to the first row for which the corresponding attributes do not exceed the current values from the outer. If combined query’s sorting attributes match those of the current outer tuple, Scalpel returns the values of the inner query attributes. If they exceed those of the current outer tuple, this indicates the end of the application’s inner query’s result set. Scalpel closes the combined inner query’s cursor when the outer query’s cursor is closed. Figure 3.10(f) illustrates the the situation in which the client merge join strategy is used for all four of the queries nested under Q1. As was the case for the client hash join, Scalpel only considers the merge join combination of a query with its immediate parent from the context tree,

e.g., Q2 is combined with Q3 but not with Q1. Thus, the resulting pattern of query instances is 54 NESTED REQUEST PATTERNS

SELECT P.s2 o1, P.s1 k1, I.* FROM ( SELECT s1, s2 FROM S WHERE s3=:w ) P, LATERAL ( SELECT t1 FROM T WHERE t2=P.s2 ) I ORDER BY o1, k1

Figure 3.24: Combined inner query for the client merge join strategy. almost the same as that of the client hash join strategy, except for the ordering of the result of the combined inner queries. Unlike the hash join strategy, the merge join strategy does not require that the result set of the combined inner query be stored at the client. However, the merge join strategy does impose an additional sorting burden on the server. Scalpel’s optimizer uses its cost model to choose between these alternatives.

LEMMA 3.5 (CLIENT MERGE JOIN STRATEGY IS SOUND) The client merge join strategy returns the correct multiset of rows for both the outer and inner queries.

PROOF. Let Q1 be an outer query and Q2 an inner query that is executed using the client merge join strategy. The client merge join strategy modifies only the ordering specification of the outer query, by appending additional attributes to ensure a total ordering. Therefore, the multiset of rows returned for the outer query is correct.

A combined inner query QC is formed to return all inner rows as follows:

Q = Q × Q (¯x) (3.25) C 1 Ax,M¯ 2

When the outer cursor is positioned on row r identified by r[KEY[Q1]] = k1 and an instance of the inner query Q2(x) is submitted, Scalpel first checks that the predicted correlations r[M]= x hold. If so, then Scalpel answers the query with the following results from the combined query:

0 ALL Q2 = π [sch(Q2)] σ[KEY[Q1]= k1] QC (3.26)   = πALL [sch(Q )] σ[KEY[Q ]= k ] Q  × Q (¯x) (3.27) 2 1 1 1 Ax,M¯ 2   (3.28) 3.3 QUERY REWRITER 55

We know that exactly one row r exists in Q1 with r[KEY[Q1]] = k1. If we push the selection criteria down, we can simplify to the following:

Q0 = πALL [sch(Q )] r × Q (¯x) (3.29) 2 2 { }Ax,M¯ 2 = πALL [sch(Q )] r Q r[M] (3.30) 2 { }× 2 = Q2 r[M]   (3.31)

= Q2(x)  (3.32)

Therefore, Scalpel returns the desired multiset of rows for the outer and inner queries of the client merge join strategy. 

3.3.5 Rewriting a Context Tree

The preceding section described rewrite procedures that generate a rewritten query for each of the execution strategies we consider. The unified rewrite methods combine a context with its unified children while the partitioned rewrites combine a context with its parent. Each of these rewrite procedures operates on an individual CONTEXT node at a time. In this section, we describe how Scalpel rewrites an entire context tree by operating on one context at a time. First, Section 3.3.5.1 describes how Scalpel uses ACTION objects to represent steps the Prefetcher should follow to im- plement the prefetch strategies. Next, Section 3.3.5.2 describes how the REWRITE-TREE proce- dure (Figure 3.25) rewrites an entire context tree. Finally, Section 3.3.6 provides a summary of rewriting a context tree.

3.3.5.1 Representing Run-Time Behaviour With ACTION Objects

In addition to the rewritten query text, REWRITE-TREE generates information that is used at run- time to direct the Prefetcher component how to respond to requests from the client application. This information is recorded in ACTION objects. Each ACTION object contains an acttype field that indicates how the action should be applied. Further, the ACTION object contains a resultquery field that contains the query text that defines the result set returned when the ac- tion is used, and a submitquery that contains the rewritten combined query. For example, for context C2 in Figure 3.11(c) we would have an ACTION object of type SUBMIT-HASH. The resultquery field would be the text of query Q2, and the submitquery field would be the text of the combined query shown in Figure 3.21. The ACTION objects encapsulate the operations that will be performed on behalf of a single execution strategy for a context C. The C.resultquery field specifies the query text that de- fines the result set returned by the ACTION object. When the client submits resultquery, the 56 NESTED REQUEST PATTERNS

Prefetcher uses internal bookkeeping in the ACTION to find if the desired result is already avail- able. For example, for the client hash join strategy, the Prefetcher searches in a hash table of prefetched results. When the prefetched results are not available, the submitquery is passed on to the DBMS to retrieve the results for the current request and to prefetch additional results. For example, in the client hash join case the combined query is submitted and the results are used to fill a hash table.

S C Action Type Description ‘J’ P INTERPRET-JOIN Pass combined query to next action in actions. Re- turn only original attributes at the parent context. ‘J’ C DECODE-JOIN Read attributes from current parent row. If this row was null-supplied, return an empty result set. ‘U’ P INTERPRET-UNION Pass combined query to next action in actions. Re- turn rows to parent that have type attribute of 0. ‘U’ C DECODE-UNION Move forward in parent’s result set until a row with the appropriate type field is found; return all such rows. ‘M’ P INTERPRET-MERGE Pass ordered query to next action in actions. Return only original attributes. ‘M’ C SUBMIT-MERGE Submit combined, ordered query to DBMS. Return all rows from the combined result set that match the or- dering attributes of the current parent row. ‘H’ C SUBMIT-HASH Submit the combined query to the DBMS, and popu- late a hash table with the results. Satisfy OPEN requests from the hash table. ‘N’ C SUBMIT-NEST Submit the query to the DBMS.

Table 3.3: Types of ACTION objects. The first column is the execution strategy for which the type is used, and the second column is C if the action type applies to the context itself or P if it applies to the parent context. The third column gives the acttype value, and the last column describes how it is used.

The client hash join strategy generates an ACTION object only for the child context that is rewritten. In contrast, the client merge join needs to change not only the query that is submitted at the child context, but also the query for the parent. The parent query is altered to add order- ing attributes that ensure a total ordering permitting merging of result rows. For the client merge join, an ACTION of type SUBMIT-MERGE is associated with the child, while an ACTION of type 3.3 QUERY REWRITER 57

INTERPRET-MERGE is used for the parent context. The SUBMIT-MERGE indicates that the com- bined, ordered query should be submitted for the child, and result sets are interpreted by advanc- ing on the combined, ordered result set. The INTERPRET-MERGE specifies that a variant of the parent query should be submitted, rewritten to provide a total ordering. The result of this rewritten query is interpreted by returning a result set to the client containing only the original attributes.

The SUBMIT-HASH and SUBMIT-MERGE types of action submit a query directly to the DBMS. In contrast, actions of type INTERPRET-MERGE are associated with the parent context, which will also have an execution strategy of its own. For example, in Figure 3.11 the context C3 will have an action of type INTERPRET-MERGE corresponding to C4, and it will also have an ac- tion of type SUBMIT-MERGE resulting from its own annotation with strategy ‘M’. Actions are maintained in an ordered list actions for each context.

The actions list may have multiple ACTION objects with a type prefixed by INTERPRET- before the final ACTION object. Each of these INTERPRET- actions has the effect of interpret- ing the result set that answers the associated resultquery from a result set that answers the associated submitquery. For example, a subset of the columns may be returned (INTERPRET- MERGE, INTERPRET-JOIN) or a subset of the rows and columns (INTERPRET-UNION). The Prefetcher never directly submits a query for a INTERPRET- action. Instead, the submitquery is used when constructing the next action in the list. The submitquery of the preceding AC- TION object always matches the resultquery of this next object. At the end of the list is an ACTION object with a type prefixed with SUBMIT- orDECODE- (described in the sequel).

The partitioned strategies create a SUBMIT- action for the associated context, leading to a combined query being submitted for the child at run-time. In contrast, the unified strategies do not submit any query at the child context, instead submitting a single unified query at the parent context. This is represented in the context tree by associating DECODE-JOIN or DECODE-UNION as the last element in the actions list for the child objects and associating INTERPRET-JOIN or INTERPRET-UNION as a non-final element of the parent’s actions list. The REWRITE-TREE procedure calls APPEND-ACTION to add ACTION objects to the actions field of a context as it processes the tree. The APPEND-ACTION procedure also modifies the text of the query field of the context to reflect the submitquery of the most recently added action. In this way, the query field of the context is maintained with the currently needed query as the rewrite proceeds.

Table 3.3 summarizes the ACTION types supported by Scalpel and outlines how they are used at run-time. In the next section, we describe how the REWRITE-TREE procedure builds the AC- TION objects for the entire context tree. 58 NESTED REQUEST PATTERNS

3.3.5.2 The REWRITE-TREE Procedure

Figure 3.25 shows the REWRITE-TREE procedure. Scalpel calls REWRITE-TREE for the root of the context tree (treeroot) to generate the combined queries for all nodes in the context tree. First, REWRITE-TREE(C) recursively rewrites all child nodes (line 291). This recursive rewrit- ing will modify the query field of the child contexts, complete their actions lists, and pos- sibly add ACTION objects to the list C.actions. Next, if any child uses the client merge join strategy, REWRITE-TREE adds an ACTION object of type INTERPRET-MERGE to C.actions.

Next, if there are any ‘J’-annotated children, REWRITE-TREE calls COMBINE-JOIN(C) to generate a combined query nsql that retrieves the original result set required by C and the at- tributes needed for all ‘J’-annotated children. A call to APPEND-ACTION(C, INTERPRET- JOIN, nsql) adds a new ACTION object A that will be used to pass on the combined query nsql and interpret the results for the context C. Further, the C.query field is updated to con- tain the combined query text nsql.

Similarly, if there are any ‘U’-annotated children of C, then REWRITE-TREE calls COMBINE- UNION(C). In this case, however, the C.query field used in COMBINE-UNION(C) (line 226) will be the modified query that was returned by COMBINE-JOIN(C) (if there are ‘J’-annotated children). In this way, the query combine procedures build on the results of prior calls as the con- text tree is modified in place.

After combining any unified children of C,REWRITE-TREE modifies C to reflect the strat- egy C.alt that was selected for it. If C is annotated with a unified strategy, then REWRITE- TREE adds an action of type DECODE-JOIN or DECODE-UNION. No query will be submitted for C at run-time; instead, the results will be interpreted from the combined results of C’s par- ent context. If, on the contrary, C is annotated with a partitioned strategy, then REWRITE-TREE calls COMBINE-HASH or COMBINE-MERGE to generate a combined query nsql. These calls combine the current C.query with the parent query. If the unified rewrite procedures modified C.query, then the modified query is used when generating the combined partitioned query. This combined query will be submitted to the DBMS at run-time when no prefetched results are avail- able, and this fact is recorded by adding an ACTION object of type SUBMIT-HASH or SUBMIT- MERGE with the new combined query text. Finally, if the context C is annotated with strategy ‘N’, then an ACTION object of type SUBMIT-NEST is added to C.actions. This action will submit the current query stored in C.query. This could be the original query text, or the re- sult of a unified combined procedure.

Figure 3.21 shows a sample trace of REWRITE-TREE when processing the context tree shown in Figure 3.27(a), and Figure 3.27 shows the state of the context tree after steps in the sample trace. 3.3 QUERY REWRITER 59

276 structure ACTION 277 acttype = “” £ The type of action to perform 278 resultquery = NIL £ The query defining the result set 279 submitquery = NIL £ The combined query that will be submitted instead 280 ... £ Additional bookkeeping information is omitted 281 end 282 283 procedure APPEND-ACTION( C, type, submitquery ) 284 A new ACTION( type, C.query, submitquery ) ← 285 C.query submitquery £ Change the current query ← 286 C.actions [C.actions, A] £ Append the new action ← 287 end 288 289 procedure REWRITE-TREE( C ) 290 for child C.children do ∈ 291 REWRITE-TREE(child) 292 if child C.children child.alt = `M' ∃ { ∈ | } 293 £ If any child uses strategy ‘M’, submit a rewritten query that is totally ordered 294 nsql ADD-KEYS-TO-ORDER-BY( C.query ) ← 295 APPEND-ACTION( C, INTERPRET-MERGE, nsql ) 296 if child C.children child.alt = `J' ∃ { ∈ | } 297 nsql COMBINE-JOIN( C ) ← 298 APPEND-ACTION( C, INTERPRET-JOIN, nsql ) 299 if child C.children child.alt = `U' ∃ { ∈ | } 300 nsql COMBINE-UNION( C) ← 301 APPEND-ACTION( C, INTERPRET-UNION, nsql ) 302 case C.alt 303 when `J' then APPEND-ACTION( C, DECODE-JOIN, C.query ) 304 when `U' then APPEND-ACTION( C, DECODE-UNION, C.query ) 305 when `H' then 306 nsql COMBINE-HASH( C ) ← 307 APPEND-ACTION( C, SUBMIT-HASH, nsql ) 308 when `M' then 309 nsql COMBINE-MERGE( C ) ← 310 APPEND-ACTION( C, SUBMIT-MERGE, nsql ) 311 when `N' then APPEND-ACTION( C, SUBMIT-NEST, C.query ) 312 end Figure 3.25: Combine procedure to rewrite an entire context tree. 60 NESTED REQUEST PATTERNS

1 REWRITE-TREE( C1 )

2 REWRITE-TREE( C2 )

3 APPEND-ACTION( C2, DECODE-JOIN, Q2 )

4 REWRITE-TREE( C3 )

5 REWRITE-TREE( C4 )

6 APPEND-ACTION( C4, DECODE-JOIN, Q2 )

7 Q3b COMBINE-JOIN( C3 ) ← 8 APPEND-ACTION( C3, INTERPRET-JOIN, Q3b )

9 APPEND-ACTION( C3, DECODE-UNION, Q3b )

10 Q1b COMBINE-JOIN( C1 ) ← 11 APPEND-ACTION( C1, INTERPRET-JOIN, Q1b )

12 Q1c COMBINE-UNION( C1 ) ← 13 APPEND-ACTION( C1, INTERPRET-UNION, Q1c )

14 APPEND-ACTION( C1, SUBMIT-NEST, Q1c ) Figure 3.26: Steps of REWRITE-TREE.

3.3.6 Summary of Query Rewriter

When Scalpel detects nested request patterns in a client’s request stream, it can choose between alternative execution strategies, some of which prefetch the results for future inner queries based on predictions made using a context tree. Unified execution strategies provide a modified query that is submitted at the root of a nested pattern. The modified query combines the result set for the requested root query, and also en- codes the results of inner queries that are predicted to be executed in the future. The outer join unified strategy uses a LEFTOUTERLATERAL derived table construct to join the results re- quest for the parent query with the inner query results. As this strategy is only used when the in- ner queries return at most one row, the combined query returns exactly the same number of rows as the original root query. The outer union strategy is another unified strategy that combines the parent query with an outer union of the inner queries, augmented with a type attribute that represents the query associ- ated with each row. Again, the LATERAL derived construct is used to combine the outer and in- ner queries. In this case, a LEFTOUTERLATERAL is not needed as the outer union returns at least one row for each row of the outer query. In contrast to the unified execution strategies, the partitioned execution strategies operate with a single context. The client hash join strategy submits a combined query that returns all of the rows needed for the inner query associated with all rows of the outer query. These rows are added to a hash table, and individual OPEN requests for the inner query are satisfied from the hash table. 3.3 QUERY REWRITER 61

C1 N Q1 C1 N Q1 ∅ ∅

C2 J Q2 C3 U Q3 C2 J Q2 C3 U Q3 ∅ ∅ DECODE-JOIN ∅

C4 J Q2 C4 J Q2 ∅ ∅ (a)Afterstep1 (b)Afterstep3

C1 N Q1 C1 N Q1 ∅ ∅

C3 U Q3b C2 J Q2 C3 U Q3 C2 J Q2 DECODE-UNION DECODE-JOIN ∅ DECODE-JOIN INTERPRET-JOIN

C4 J Q2 C4 J Q2 DECODE-JOIN DECODE-JOIN (c)Afterstep6 (d)Afterstep9

C1 N Q1c

C1 N Q1b SUBMIT-NEST INTERPRET-JOIN INTERPRET-UNION INTERPRET-JOIN

C3 U Q3 C3 U Q3b C2 J Q2 C2 J Q2 DECODE-UNION DECODE-UNION DECODE-JOIN DECODE-JOIN INTERPRET-JOIN INTERPRET-JOIN

C4 J Q2 C4 J Q2 DECODE-JOIN DECODE-JOIN (e)Afterstep11 (f)Afterstep14

Figure 3.27: Context tree after executing steps of Figure 3.26. Nodes show the context, execution strategy, current query, and list of actions (C.actions) ordered from bottom to top. 62 NESTED REQUEST PATTERNS

The client merge join strategy is similar to the client hash join strategy, except that it uses or- dering on the server instead of a hash table on the client. The parent query is modified to guaran- tee a total ordering, and a combined version of the inner query is submitted matching the parent’s order. Individual OPEN requests for the inner query are satisfied by fetching forward on the com- bined result set. A comparison of the current value of ordering attributes from the parent row is used to determine where the prefetched result set for the inner query begins and ends within the combined result. Scalpel uses the REWRITE-TREE to prepare a context tree for execution. This procedure tra- verses the tree in depth-first order. For each context C in the tree, it fills in the field C.actions with a list of ACTION objects. These objects describe how the Prefetcher should respond to OPEN, FETCH, and CLOSE requests for the context. After rewriting the tree, the contexts and associated actions lists provide all of the information needed to execute the selected execution strate- gies at run time. The results of REWRITE-TREE are stored persistently in the context tree for use at run-time, and the Prefetcher is responsible for performing these specified actions.

3.4 Prefetcher

When the client application submits OPEN requests to the database server, Scalpel intercepts these requests and tracks the current context. If the context and query matches an edge in the op- timized tree, Scalpel executes an alternative query instead and responds to the application’s OPEN request by decoding the alternate result set. Figure 3.28 gives a simplified sketch of how the al- ternate strategies are executed by the Prefetcher. First, on line 319 Scalpel verifies that all correlation predictions made by the Pattern Detector (Section 3.2.2) hold for this OPEN request. If the correlations do not hold, Scalpel’s correlation prediction has failed, and the stored execution strategy cannot be used. If the predicted parameters match the actual values supplied, then the stored execution strat- egy is executed. The activities required by each ACTION objects in the actions list are per- formed in order until the last action, which always returns a result set. For a child context as- signed a unified strategy, the last action is prefixed with DECODE-, meaning that the result set is decoded from a combined query submitted by the context’s parent. For contexts assigned a par- titioned strategy, the last action in the list is prefixed with SUBMIT-, meaning that a combined query is submitted and used to prefetch the needed data. Each list may have one or more INTERPRET- actions before the final action (SUBMIT- or DECODE-). Any INTERPRET- actions control how the result set is interpreted for the client (for example, by removing columns or skipping rows from other branches of a union strategy). Ta- ble 3.3 summarizes the ACTION types supported by Scalpel and outlines how they are used at run-time. 3.4 PREFETCHER 63

313 currctxt treeroot £ The current context ← 314 function RUN-OPEN( Q, parms ) 315 C find( currctxt.children, Q ) ← 316 C.lastinput parms ← 317 currctxt C ← 318 319 if not CHECK-CORR-PREDICTIONS(action,parms) then 320 £ Correlation prediction failed: submit Q to DBMS unmodified 321 322 for A C.actions do ∈ 323 assert Q = A.resultquery 324 case A.acttype 325 when INTERPRET-JOIN then ... 326 when INTERPRET-UNION then ... 327 when INTERPRET-MERGE then ... 328 when DECODE-JOIN then 329 if NULL-SUPPLIED(...) then £ Return a cursor over an empty result set 330 else £ Return an interpreted cursor on parent’s current row 331 when DECODE-UNION then 332 £ Move parent’s cursor forward until its type field matches A.type. 333 if Not-Found then £ Return a cursor over an empty result set 334 else £ Return an interpreted cursor on the parent rows matching A.type. 335 when SUBMIT-HASH then 336 if A.cachedresults = NIL then 337 £ Submit Q to DBMS and load the A.cachedresults hash table with the results 338 rs find( A.cachedresults, parms ) ← 339 if rs = NIL then £ Return a cursor over an empty result set 340 else return rs 341 when SUBMIT-MERGE then 342 if A.cursor = NIL then 343 £ Submit Q to DBMS and set A.cursor to the result set 344 £ Move forward in combined cursor to first row matching parent key 345 if Not-Found then £ Return a cursor over an empty result set 346 else £ Return an interpreted result over combined rows matching the parent key 347 when SUBMIT-NEST then 348 £ Submit t.combinedqueryo DBMS 349 Q A.submitquery ← 350 £ This point is not reached; a result set is returned from within the above loop. 351 end Figure 3.28: Sketch of execution of alternate strategies. 64 NESTED REQUEST PATTERNS

As RUN-OPEN processes the C.actions list, it considers the value of A.acttype field. For types INTERPRET-JOIN, INTERPRET-UNION, and INTERPRET-MERGE, no query is submit- ted immediately (lines 325–327). Instead, these ACTION objects change the way a result set is interpreted, for example by restricting the attributes returned. The details for these types are omit- ted.

Action objects of type DECODE-JOIN and DECODE-UNION are associated with contexts that have been assigned strategy ‘J’ or ‘U’. In this case, no query is submitted. Instead, the result set for the OPEN request is generated by interpreting the combined result set that was opened for the parent context. It may be that the combined query contains no row corresponding to the query Q. In this case, an empty result set is returned. Otherwise, a result set is returned that generates its rows based on the cursor in the parent context. The partitioned strategies have a single action associated with each partitioned child (anno- tated with SUBMIT-HASH or SUBMIT-MERGE). These actions submit a combined query the first time that the partitioned child’s query is submitted for an instance of the outer query.

The SUBMIT-HASH action loads a hash table with the results of its combined query. The hash table is indexed by the parameter values that are predicted to be submitted on future opens of the inner query, and the entries in the hash table are buffered result sets. The hash table is loaded by finding or adding the appropriate result set for each row of the combined query, then adding the row to the buffered result set. The SUBMIT-HASH action satisfies OPEN requests using buffered result sets from the hash table (if found) or an empty result set (if no match is found).

The SUBMIT-MERGE action submits a combined query that is ordered to match the order- ing specification and key of the parent query. Recall that the parent query was modified to be to- tally ordered (line 294). The SUBMIT-MERGE action satisfies OPEN requests by moving forward in the combined result set until either a row is found with order/key attributes matching the par- ent query’s current order/key or until a row is found beyond these attributes. In the first case, the action returns an interpreted result set that returns rows from the combined results set with the current order/key values; in the second case, an empty result set is returned.

Scalpel uses an ACTION object of type SUBMIT-NEST to indicate that the optimizer has se- lected a nested execution strategy for a context. Every time an OPEN request is submitted for the context, Scalpel sends an OPEN request to the DBMS for the submitquery associated with the action object. This is either the original query or a combined query generated by unified chil- dren of the context.

The RUN-OPEN function returns a result set at run-time. Further, Scalpel’s Prefetcher im- plements RUN-FETCH and RUN-CLOSE (not shown). The RUN-FETCH function generates rows based on the selected strategy, and also maintains the currctxt.lastoutput field with the most recently fetched row, allowing the predicted correlations to be verified. The RUN-CLOSE 3.5 PATTERN OPTIMIZER 65 function updates the currctxt global variable to point to the parent context. Further, RUN- CLOSE releases all prefetched results for the inner queries nested as children of the close query.

3.4.1 Mistaken Predictions

All of the unified and partitioned rewriting strategies that Scalpel considers depend on the query attribute correlations that it learns during its training phase. If these learned correlations hold dur- ing the run-time phase, then Scalpel can use its rewritten queries to return the appropriate values in response to the application’s FETCH requests as we have already described. However, Scalpel must be prepared to cope with the possibility that its predictions will be wrong. At run-time, whenever the application opens an inner query for which Scalpel has deter- mined to use a unified or partitioned execution strategy, Scalpel first checks whether the expected attribute correlations actually hold (line 319). That is, it compares the inner query’s parameter values with the values from the current rows of the outer query (or queries) with which the in- ner query is expected to be correlated. If they all match, then the correlation has held as expected and Scalpel can proceed to answer the application’s subsequent FETCH requests using its rewrit- ten query. If any correlations do not hold, then Scalpel cannot use the rewritten query as planned. Instead, Scalpel submits the original, unmodified inner query to the server and uses the results of that query to provide values to the application. Since a single rewritten query normally takes the place of many instances of the original in- ner query, it may be the case that Scalpel issues both the rewritten inner query and one or more in- stances of the original query to the server. For example, under the partitioned strategies, Scalpel issues the rewritten, combined query the first time the application attempts to open the original in- ner query. If that first inner query instance is properly correlated, Scalpel will issue its rewritten query. However, the next instance of the inner query may not be properly correlated, and Scalpel will be unable to use the already-opened rewritten query to answer it as planned. In this case, it must issue the original query to obtain a correct answer for the application. These cases, when they occur, constitute failures of Scalpel’s semantic prefetching strategies. Such failures cause Scalpel to do extra work, since it may execute queries that are wholly or par- tially unnecessary. However, since Scalpel is always free to issue the application’s original, un- modified query, they do not lead to incorrect answers.

3.5 Pattern Optimizer

Section 3.3 described five alternative strategies that can be employed to execute a nested query pattern. After the training phase has built a context tree identifying rewrite candidates, an opti- 66 NESTED REQUEST PATTERNS mization step is used to determine which of these execution strategies will be used for each con- text in the tree.

3.5.1 Valid Execution Plans

Consider a node T that appears in the context tree (for example, node C2 in Figure 3.5). If the FULLY-PREDICTED procedure returns TRUE for T , then the parameters of the query Q associ- ated with T can be predicted based on the context of execution and we can consider executing Q using other strategies than the original (nested) strategy. Scalpel can execute Q using the nested (‘N’), outer join (‘J’), outer union (‘U’), client hash-join (‘H’) or client merge-join (‘M’) ex- ecution strategy. A plan is described by a context tree in which each context is annotated with ‘N’, ‘J’, ‘U’, ‘H’ or ‘M’. For example, Figure 3.11 shows six plans for the context tree exam- ple of Figure 3.3. These strategies correspond to the execution traces shown in Figure 3.10. With 5 possible annotations per node, there are up to 5n possible execution strategies for a tree with n contexts. Of these 5n strategies, some are not permitted. Section 3.3.3.1 described the at-most-one condition restriction for the outer join strategy. We rely on the AT-MOST-ONE(Q) support function to identify queries for which the join strategy is permitted. In addition, the root node of the context tree and the immediate children of the root node can only be annotated with the nested strategy as these contexts have no parent query with which to combine. A final compli- cation results from the way that Scalpel rewrites the child contexts annotated with the outer union strategy (‘U’). The encoded results for these children are ordered by increasing type values. The ordering matches the order that the original child queries were submitted by the client appli- cation. If the application were to submit the child queries in an alternate order, Scalpel would not be able to decode the prefetched results for all children. For this reason, Scalpel does not con- sider strategies where two contexts are annotated with ‘U’ if they were observed to be submitted in conflicting orders during the training period. The enumeration algorithm iterates through all of the plans that are permitted by the above rules and the optimizer selects the plan with the lowest estimated cost.

3.5.2 Ranking Plans

We rank execution plans using estimates of the response time (in seconds) experienced by the client application. Scalpel’s Cost Model is responsible for estimating cost parameters needed for optimization. The implementation of the Cost Model is described in Chapter 4. We describe our ranking algorithm based on the following support routines which are defined in Chapter 4.

EST-COST(Q) Estimate the total cost for query Q, including server, client, and communication cost elements based on observed costs and server estimates. 3.5 PATTERN OPTIMIZER 67

EST-ROWS(Q) Estimate the number of rows returned by Q.

EST-INTERPRET(T,numopens,numrows) Estimate the cost of processing the ACTION ob- jects in T.actions to interpret the results for context T . The estimate is based on numopens application calls to OPEN returning a total of numrows rows. For example, for the client hash join strategy, estimate the cost to add numrows rows to a hash table and look up a result set numopens times.

The Cost Model provides estimates of the cost of individual requests; in order to rank strate- gies, we use the routine EST-COST-TREE (Figure 3.29) to estimate the cost of an entire context tree based on the costs of the requests submitted by the tree.

352 function EST-COST-TREE(T, prtopens) 353 prtrows EST-ROWS( T.parent.query ) ← 354 qrows EST-ROWS( T.query ) ← 355 nopens EST-P(T) prtopens prtrows ← × × 356 partopens EST-P0(T) prtopens ← × 357 case T.alt 358 when `N' then cost nopens EST-COST( T.query ) ← × 359 when `J' or `U' then 360 £ No query is submitted at this node 361 £ The only cost is associated with interpreting the encoded results (line 364) 362 when `H' or `M' then 363 cost partopens EST-COST( T.query ) ← × 364 cost cost + EST-INTERPRET( T, nopens, partopens ) ← 365 for child T.children do ∈ 366 cost cost + EST-COST-TREE( child, nopens ) ← 367 return cost 368 end Figure 3.29: Estimating the cost of a plan.

Figure 3.29 gives an overview of how Scalpel estimates the cost of a context tree with associ- ated execution strategies. The EST-COST-TREE function estimates the cost of a tree that has ex- ecution strategies assigned to each node and a rewritten query generated by the REWRITE-TREE procedure. The EST-COST-TREE function is called with a context T and prtopens, the estimated number of times that the query associated the parent context is opened. The call EST-COST- TREE(treeroot,1) is used to estimate the cost of one execution of the entire context tree. If node T is annotated with a the nested strategy (‘N’), then the query Q associated with T is executed every time that a row is returned from the outer query and the local predicates pass. We 68 NESTED REQUEST PATTERNS use nopens to estimate this number, calculated by multiplying prtrows (the estimated num- ber of rows returned from the parent context’s query) by prtopens and EST-P (an estimate of the probability of executing the inner query for each outer row). The cost estimate for an ‘N’ node is nopens multiplied by the expected cost of executing T.query one time (returned by EST-COST). If node T is annotated with a unified strategy (‘U’ or ‘J’), then T.query is not actually ex- ecuted at run time. Instead, the rows for the node are decoded from a combined result set submit- ted by the parent context. The cost associated directly with the context T consists of the client’s cost to interpret the combined result rows, and the EST-INTERPRET function provides an esti- mate of these costs (line 364). The partitioned strategies do not execute their rewritten query every time the application sub- mits the inner query. Instead, they execute the rewritten query at most once for every time that the parent context’s query is opened. The variable partopens is initialized with the estimate (based on EST-P0(T )) of how often the rewritten query will be submitted. The cost of the parti- tioned strategies is estimated using the product of partopens and the estimated query cost. In addition, the cost of interpreting the results of the combined query is included in the estimate for these strategies (line 364). So far, the cost variable has been initialized to an estimate for the costs directly associated with node T . Children of T may also introduce costs by executing other queries. The EST-COST- TREE function recurses to account for the cost of child requests.

3.5.3 Exhaustive Enumeration

Exhaustive enumeration provides one way to choose an execution plan. With this approach, Scalpel exhaustively enumerate all plans for a context tree. For each enumerated plan that is valid according the above rules, REWRITE-TREE is be used to assign rewritten queries to each node in the tree. Finally, plans are ranked by estimating the cost to execute the tree one time, using cost estimation methods provided by the Cost Model applied to the rewritten requests associated with the rewritten context tree. The plan for the tree with the lowest estimated cost is stored persis- tently in the ‘Contexts+Rewrites’ store (Figure 2.1). These stored context and actions are used at run-time to perform the actions selected by the optimizer.

3.6 Experiments

The costs associated with execution strategies depend on a number of factors described in the ear- lier sections. This section presents experiments that give a sense of how these factors combine to affect system performance. Table 3.4 shows the computers used in the experiments, and Table 3.5 3.6 EXPERIMENTS 69

Computer Processor O/S A 1.8 GHz Pentium IV Windows XP B 2 2.2GHz Pentium XEON Windows 2003 Server × C 3GHz Pentium IV Windows XP D 733MHz Pentium III Windows 2000

Table 3.4: Available computers.

Name Client Server Communication Link U0 Overhead (ms) LCL C C Local shared memory 0.3 LAN1 B C 1Gbps LAN 1.1 LAN0.1 B C 100Mbps LAN 1.4 WiFi A C 11Mbps 802.11b WiFi 11.6 WAN A D 1Mbps Cable modem + WAN 468.9

Table 3.5: Tested configurations and the per-request overhead U0

shows the configurations of these computers. We ran our tests with three commercial DBMS prod- ucts. The license agreements prevent us from identifying them. As results for all three systems were consistent (although with different constants) we show results for only one DBMS product.

We tested a sample program with a single outer query Q0. We used a number of inner queries Q1,Q2,...,QF, with F set to 2 unless otherwise noted. All inner queries are executed for each row of the outer query that passes any local predicates. The outer query returns 2048/F rows from a sequential scan with a range predicate (this setup gives a constant number of inner query opens when varying fanout F ). Each inner query returns R rows for each invocation using an index range scan (R is set to 1 by default so that the outer join strategy can be compared). We used an additional outer join in each query that allowed us to vary the server cost of the query without affecting the number of rows returned.

All tests were run with JDK 1.5.0 and JDBC drivers provided by the DBMS vendors. The data- base instance was fully cached to minimize the variance in server costs. A prototype implementa- tion of Scalpel was used for the experiments; combined queries were automatically combined us- ing the LATERAL keyword for one of the DBMS systems, and vendor-specific equivalents for the other two systems. In our experiments, we vary the following factors: 70 NESTED REQUEST PATTERNS

Fanout (F ) The number of inner queries executed for each outer row.

Selectivity (P0) A predicate selectivity independent of values in the outer rows.

Selectivity (P1) A predicate selectivity dependent on values in the outer rows.

Inner Rows (R) The number of rows returned from the inner queries.

Inner Columns (L) The number of integer columns in each inner query.

Outer Cost (CO) The cost of the outer query.

Inner Cost (CI ) The cost of the inner query.

All results show the average of a number of repetitions, with the number of repetitions se- lected so that the standard error of the mean (σM ) is less than 5% of the mean for each mea- surement. The SQL statements were prepared once for each factor combination and prepare time is not included in the reported measurements. We use slightly different settings of the indepen- dent variable for each of the strategies to reduce overlap in the resulting charts.

3.6.1 Effects of Client Predicate Selectivity

Scalpel uses two parameters, EST-P and EST-P0, to model the selectivity of client predicates, as discussed in Section 3.2.3. In our experiments, we vary the selectivities of two predicates (called

P0 and P1) in the driver program. Predicate P0 is evaluated once per instance of the outer query. If it is false, the inner query is not executed at all. The selectivity of P0 corresponds to the es- timate EST-P0. The second predicate, P1, controls how many times the inner query is submit- ted relative to the number of rows fetched from the outer query. The estimate EST-P is equal to a combined predicate selectivity of P = P P . Figure 3.30 shows the run-time of the nested, 0 × 1 unified, and partitioned execution strategies with varying selectivity of the P0 and P1 client pred- icates.

In Figure 3.30(a), the selectivity of P1 is fixed at 1.0 and the selectivity of P0 is varied. In Figure 3.30(b), P0 is fixed at 1 and P1 is varied using the same values as P0. With a selectivity of P1 = 0, the inner query is never submitted and the behaviour is equivalent to the P0 = 0 case: the partitioned approaches are equivalent to nested (both executing only the outer query). We avoid this discontinuity by omitting the setting P1 = 0. For the original nested execution strategy, execution time is proportional to the product P = P P . The outer query is always executed one time, and the two inner queries are exe- 0 × 1 cuted an average of Q P times. Because the nested strategy depends on the product of P and | 0| 0 P1, it has similar behaviour in both Figure 3.30(a) and (b). For low values of P , the nested strat- egy is optimal. With small P values, the inner queries are only rarely executed, and no resources 3.6 EXPERIMENTS 71

N N

0.20 0.20

N 0.15 0.15

0.10 0.10 H N U UH N H H H H H U U U U U U U H H H U U U U U U H MM U M MM M M M M H M M M H M J J J J 0.05 J J J H M J J J 0.05 J J J J J J H M H M

0.00 0.00

0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0

(a) Varying selectivity P0 (P1 = 1) (b) Varying selectivity P1 (P0 = 1) Figure 3.30: Run times with varying predicate selectivity on configuration LCL.

are wasted fetching unneeded results. The run time of the nested strategy grows rapidly with in- creasing P , as the inner queries are executed for more and more outer rows.

For the unified execution strategy, the execution time is largely independent of the selectivi- ties of P0 and P1. Regardless of the results of these predicates, the unified query is executed and all rows are fetched by the client. While most of the costs of the unified strategy are indepen- dent of P , there is a slight linear dependence resulting from the cost of decoding the attributes for the inner queries. If an inner query is not opened, its attributes are not decoded from the com- bined result set, leading to slightly lower run-times.

The partitioned execution strategy fetches all possible rows from the rewritten inner query when an inner query is first opened (the P0 predicate evaluated to true). This gives the partitioned strategy a strong dependence on the P0 selectivity. The P1 selectivity also has a small effect: for the client hash join strategy, it changes the number of lookups performed in the hash table; for the client merge join strategy, it changes the number of rows that are interpreted from the combined result set. Figure 3.30(b) shows the relatively weak dependence of the partitioned strategies on the P1 selectivity. 72 NESTED REQUEST PATTERNS

M M 3.5 U H H J 5 3.0 M M U H H 2.5 4 J N

M M N 2.0 U H 3 H J N N 1.5 N M U 2 M N U H N J 1.0 U H N J J U N J 1 0.5 M U M U H J H U J 0.0 J 0

0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0

(a) Outer query cost factor (b) Inner query cost factor

Figure 3.31: Run times with varying query cost on configuration LCL with P0 = 1, P1 = 0.5. Outer and inner cost factors are not directly comparable.

3.6.2 Query Costs

The execution costs of different optimization strategies depends on the cost of the outer and inner queries that are combined. Figure 3.31 shows the execution times of nested, unified, and parti- tioned strategies for our sample program. Predicate selectivity was fixed at P0 = 1 and P1 = 0.5. The nested strategy increases linearly with the cost of the outer. The unified strategies also in- creases linearly with the cost of the outer, and the slope of increase is the same as the nested strat- egy for the cost of the outer query: both strategies execute the outer query once. In contrast, the partitioned strategies execute the outer query once for each of the child queries in addition to the original, for a total of three executions. This gives the partitioned strategies a higher dependence on the cost of the outer query. Figure 3.31(a) shows the run time for each the execution strate- gies when varying the cost of the outer query. When we consider instead the cost of the inner queries, we find that all of the prefetching strategies behave in a similar way. The unified and partitioned strategies both compute the result of these inner queries for all outer rows, while the nested strategy only evaluates the inner query when the P1 predicate passes (with selectivity 0.5 in this experiment). When P1 is less than 1, the prefetching strategies are cheaper than nested when the inner query cost is relatively low, but be- come more expensive with increasing cost of the inner query. Figure 3.31(b) shows the run time 3.6 EXPERIMENTS 73

6 6 U U 5 5

U 4 4

N U N 3 N 3 N N N N N 2 2 N U N U N U N N U N U 1 U H 1 H M H M H H M U H M M M U H M H M H M H H M U M HUM M H M H 0 0

0 10 20 30 40 50 0 10 20 30 40 50

(a) Fanout F = 2 (b) Fanout F = 8 Figure 3.32: Run time (s) with varying number of columns in inner queries on configuration LCL with P0 = 1, P1 = 1, R = 8. Note that the outer join (‘J’) strategy could not be compared as more than one row is returned from the inner queries.

for each the execution strategies when varying the cost of the two inner queries. With this config- uration, the nested strategy submits 1024 inner queries while the prefetching strategies compute the results for all 2048 predicted inner queries.

3.6.3 Number of Columns

If we vary the number of columns returned by the inner queries, the cost of executing the nested pattern increases. Figure 3.32 shows the run-time for different strategies with varying number of columns in the inner queries. Figure 3.32(a) uses a fanout of F = 2, while Figure 3.32(b) uses F = 8. There are several effects contributing to the increasing execution cost with an increasing num- ber of columns. First, the per-request overhead increases because more resources are needed to initialize and describe communication buffers for more columns. This increase has the most im- pact on the nested strategy because it submits the largest number of requests. Second, the server and client computation costs increase due to the higher number of columns that are formatted and interpreted for each row fetched. Third, the cost of sort operations grows as the size of the materi- alized rows grows. This factor affects the outer union and client merge join strategies as they both 74 NESTED REQUEST PATTERNS add ordering attributes, possibly leading to a sort being added to the execution plan used by the DBMS. In our tests, the original queries already had sort operations, so this factor did not have an effect. Finally, the client cost for hash join increases as the size of the rows stored in the hash ta- ble increase. More columns require more memory and more time spent copying the data into the hash table. The merge, hash, and nested strategies have similar behaviour with both fanout F = 2 and F = 8. In contrast, the union strategy has a higher slope with a configuration of 8 inner queries. The union strategy encodes a combined result set into a single unioned result using NULL values to represent attributes that are not appropriate for a given row. The total number of NULL values returned grows with the product of the query fanout F and the total number of columns in all the queries that are combined.

3.6.4 Execution Costs

Figure 3.33 shows the CPU costs and total time for each of the strategies. Results are shown for a configuration with P0 = 1, P1 = 1/8, and an inner cost factor CI = 8. One row was returned from each invocation of the two inner queries.

Client and server CPU costs were measured using O/S functions and DBMS-specific requests, respectively. The difference between elapsed execution time and the measured CPU costs is labelled latency. In cases where the elapsed time was less than the sum of the server costs, a negative latency is shown. The server costs for the WAN case are higher for two reasons: the server machine is slower, and packet compression was used.

There are several interesting observations that we can draw from Figure 3.33. First, the original nested execution strategy is not significantly slower than the optimal strategy (join) in the LCL configuration. It is reasonable for system developers to select a nested strategy for this case, especially considering the difficulty of estimating the selectivity of local predicates and the complexity of manually combining queries.

Second, we observe that the join and merge strategies are faster in the LAN1 and LAN0.1 configurations than in the LCL configuration. The LAN configurations allow for overlap between server processing and client processing (the machine D used in LCL is a uni-processor). The nested strategy, on the other hand, takes more than 1.5 times as long in the LAN1 configuration and more than 2 times as long in the LAN0.1 configuration when compared to the LCL configuration.

Third, the unified and partitioned execution strategies reduce not only latency but also the client and server CPU costs. This cost savings results from the smaller number of messages that need to be formatted, sent, and interpreted.

[Figure 3.33 charts: panels (a) Shared Memory (LCL), (b) 1Gbps (LAN1), (c) 100Mbps (LAN0.1), (d) 11Mbps (WiFi), and (e) 5Mbps (WAN) show, for each strategy J, H, M, U, and N, stacked bars of server CPU time, client CPU time, and latency (elapsed − (server + client)).]

Figure 3.33: Run time (s) with varying network configurations and P1 = 1/8, CI = 8. Note that the bar for 'N' is truncated in panel (e), with an actual height of 55.4s.

Even in the LCL configuration, all strategies but union use less server CPU time than the nested strategy, despite the selectivity P1 = 1/8.

Finally, while the savings for the LAN configurations are low in absolute terms (about 125ms), the savings are very significant for WiFi and WAN; for example, with the wireless configuration the joined strategy saves nearly 1.5s of elapsed time, and the savings grow to 100s with the WAN setup.

The original nested execution strategy is close to optimal for a local connection. Even if a LAN configuration is considered, the absolute benefits are modest for a single execution of the outer and inner queries (hundreds of milliseconds), so system developers might decide to use the simpler nested implementation. However, the server costs are 50% lower with the joined variant. Further, if other deployments with higher network latency are used in the future, the nested implementation will be unsatisfactory. Finally, the predicate selectivity P1 = 1/8 is quite low. If the value of P1 in particular configurations is higher, then the nested strategy will be far from optimal.

3.6.5 Scalpel Overhead

The Scalpel system monitors all database requests during training. At run-time, all OPEN requests are intercepted. If a query is not being rewritten, the original query is submitted to the DBMS and the result set is wrapped in a monitor object that can detect when the cursor is closed. This monitoring is needed to maintain the current context for triggering rewrites.

We measured the overhead Scalpel adds to opening a query. With a local configuration (LCL), the overhead is 27µs per query, or 7.0%. The absolute overhead is the same for all network setups, and it drops to 1.6% of the base cost in the LAN configuration due to the higher base cost. The overhead would be reduced if Scalpel were integrated with the vendor-provided JDBC driver (for example, eliminating the need to use wrapper objects to track the current context). However, the overhead seems acceptable as it is low in absolute terms and consists entirely of client CPU time. Even with current overhead levels, significant gains can be made without significantly impacting user interaction, due to the benefits of the rewrites performed by the Scalpel system.
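To make the mechanism concrete, the following is a minimal JDBC-style sketch of such a monitor object. The class and the ContextTracker callback are our own illustrative names, not Scalpel's actual implementation.

    import java.sql.ResultSet;
    import java.sql.SQLException;

    // Hypothetical callback through which the Call Monitor maintains the context.
    interface ContextTracker {
        void cursorOpened(Object cursor);
        void cursorClosed(Object cursor);
    }

    final class MonitoredResultSet {
        private final ResultSet inner;
        private final ContextTracker tracker;

        MonitoredResultSet(ResultSet inner, ContextTracker tracker) {
            this.inner = inner;
            this.tracker = tracker;
            tracker.cursorOpened(this);        // OPEN: a new context is entered
        }

        boolean next() throws SQLException {   // FETCH calls pass straight through
            return inner.next();
        }

        void close() throws SQLException {     // CLOSE: detected here so the
            try {                              // current context can be popped
                inner.close();
            } finally {
                tracker.cursorClosed(this);
            }
        }
    }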

3.7 Summary of Nested Request Patterns

We have found that client applications submit request streams that contain nested requests. This nesting appears when an outer query is opened and an inner query is submitted while the outer query is still open. Typically, the inner query has input parameters that are supplied with values fetched from the database. In this way, we can view a nested request pattern as a type of distributed join implemented in the client code.

While input parameters to inner requests are typically drawn from output columns of the enclosing query, we have also found that, in some cases, they are equal to an input parameter of an enclosing query. In some instances, the input parameter is always equal to a constant value. We monitor correlations between input parameters and these three types of sources. At the end of a training period, we predict future parameter values based on correlations that have always held so far.

From a performance standpoint, it would appear that a join in client application code is not a good idea. However, there are software engineering reasons that might make it difficult to avoid such a join without destroying desirable properties such as encapsulation. In other cases, we have found that the nested execution strategy is, in fact, optimal (at least in particular configurations). This optimality can arise as a result of local predicates in the client application that limit the number of inner requests to a fraction of the rows returned by the outer query. We use two parameters to estimate the effects of local predicates: EST-P0 estimates the probability that the inner query will be executed at all for an instance of the outer query, while EST-P estimates the proportion of rows of the outer for which the inner query is submitted.

After a training period, Scalpel identifies a context tree that represents the nested structure we observed. An optimization step then selects an execution strategy for each node in the tree. There are three broad classes of strategy that we consider: nested execution, unified execution, and partitioned execution. Nested execution corresponds to the original strategy, where an inner request is submitted up to once per row of the outer. If a node is executed with a unified strategy, then at run-time a single request is submitted to retrieve the results for both the parent node and the unified child node. In contrast, with partitioned execution a separate cursor is submitted for the parent node and the child node. However, unlike the nested execution strategy, the partitioned execution strategy opens at most one inner cursor per instance of the outer, instead of one for each row of the outer. Table 3.6 summarizes the strategies that we consider.

Strategy            Symbol   Adds ORDER BY   Probability
Nested Execution    N        No              EST-P(C)
Outer Union         U        Yes             1
Outer Join          J        No              1
Client Hash Join    H        No              EST-P0(C)
Client Merge Join   M        Yes             EST-P0(C)

Table 3.6: Summary of rewrite strategies. The first column gives the name of the strategy, and the second gives the symbol used to represent it. The third column indicates whether the strategy requires that we add an ORDER BY clause to the rewritten query, and the last column shows the quantity that determines how often results are retrieved for inner queries.

The unified and partitioned execution strategies that we consider encode the desired result set as the result of a combined query. We decode this result set to retrieve the desired results. The outer union strategy encodes a parent query and one or more inner queries using an outer union rewrite, where each query is associated with a separate union branch. An ORDER BY clause is added to ensure that rows for inner queries come after the associated outer row and in the required order. The outer union strategy computes the result of the inner queries for each outer row. It can be used when the inner queries return 0 or more rows (empty result sets are detected by a missing union branch). The additional NULL values returned by the outer union strategy contribute to a higher communication cost, which is partly why the outer union strategy is out-performed by the outer join strategy when the latter can be used.

The outer join strategy can only be used for inner queries that return at most one row. Inner queries are combined with their parent using outer joins. There is no need to add extra ORDER BY elements, and, as with the outer union strategy, the result of each inner query is computed for each outer row.

In contrast to the unified strategies, the partitioned strategies use a separate cursor for the parent query and each child query; as such, they incur one extra per-request overhead U0 per child. However, the rewritten query for a child is only submitted if the P0 predicate is true. We have observed some client applications where the inner query is either submitted for all rows of one instance of the outer, or for none. The partitioned strategies typically outperform the unified strategies in this case, as the rewritten, combined query is only submitted when the first inner request is submitted, meaning that the results for inner rows are only retrieved when they are actually needed.

In the client hash join strategy, a rewritten query is submitted when the inner query is first observed. This rewritten query encodes the results of the inner query evaluated for all rows of the outer query. These multiple results are stored in a hash table, which is used to satisfy future requests. The client hash join requires sufficient client memory to hold the combined results. Further, the approach may not reduce latency if the client CPU is not fast enough to make fetching results from the hash table faster than the per-request latency U0.

An alternative that avoids these concerns is the client merge join. In the client merge join strategy, ORDER BY elements are added to the parent query when it is submitted to ensure that the outer rows are totally ordered. When an inner query is submitted, a rewritten request is submitted to retrieve the results of the inner query evaluated for all outer rows. This rewritten request has ORDER BY elements that return the encoded result sets in the same (total) order as the adjusted parent query. In this way, inner queries are satisfied by fetching forward in the combined result set, effectively performing a distributed merge join on the client.

By using a training period, Scalpel identifies nested request patterns. Using cost-based optimization, the execution strategy estimated to be cheapest is selected. We have found that, even where the original nested execution is optimal in one configuration, it may perform very badly when moving to another configuration with different predicate selectivity or network latency. The context-based prefetches performed by Scalpel can significantly reduce the exposed latency of these applications; in many cases, the execution costs at the client and server are also reduced due to the smaller number of messages.

4 Cost Model

In order to make effective cost-based rewrite decisions, the Pattern Optimizer needs estimates of various cost parameters. The Cost Model is responsible for providing these estimates. In general, the cost for a query can be represented as a vector, with one component for each resource being used. For example, we could have components associated with client cost, communication cost, and server cost. With the cost vector, we associate a total latency caused by a request. This latency can be estimated using the sum of the estimated server, communication, and client costs. Clearly, other ranking functions could be used instead; for example, we could choose to rank based only on server execution costs, or we could attempt to create a more precise model of latency by estimating the amount of overlap between server, network, and client processing. In the current implementation, we concentrate only on ranking by total latency, expressed in seconds. For this reason, we do not show costs as vectors, but instead show the scalar latency.

Table 4.1 shows the quantities estimated by the Cost Model. The EST-COST(Q) and EST-ROWS(Q) functions are estimates for queries that have either been observed during a training period or are a combination of such queries formed by the Query Rewriter. The value U0 is an estimate of the per-request overhead. The EST-INTERPRET(C,p,r) function estimates the cost of interpreting the results of an encoded result set for context C.

Quantity                Description
EST-COST(Q)             Estimated cost for Q
EST-ROWS(Q) = |Q|       Estimated rows returned by Q
U0                      Overhead of a single request
EST-INTERPRET(C,p,r)    Cost of interpreting results for context C with p opens, r rows

Table 4.1: Estimated quantities.

Scalpel's Cost Model uses a combination of calibration, observed quantities, and support routines provided by the DBMS in order to provide the estimates in Table 4.1.


4.1 Estimating Per-Request Overhead U0

When we submit a request to a database server, the total latency is a result of several components: formatting and transmitting the request via inter-process communication, finding the appropriate execution plan for the request (either by optimizing or by finding a stored plan), initializing the execution data structures, and, finally, executing the selected plan. All but the last of these costs represent a fixed per-request overhead. This overhead is independent of the cost of executing the selected plan (although it may depend on factors such as the complexity of the query).

We can assess the impact of this per-request overhead using an experiment inspired by Bernstein, Pal and Shutt [19]. We create a table T1 containing 1024 rows, each with an integer primary key and 100 integer columns. We fetch all 1024 rows from the table by submitting N range queries, each of which retrieves 1024/N rows using a range predicate on the primary key.
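A JDBC-style sketch of this experiment follows; it is ours, not the harness used for these measurements, and it assumes (for illustration only) that T1's primary key column is named pk and that N divides 1024 evenly, as in the power-of-two runs reported below.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    final class OverheadExperiment {
        // Fetches all 1024 rows of T1 using n range queries; returns elapsed ms.
        static long timeNRequests(Connection conn, int n) throws SQLException {
            int rowsPerQuery = 1024 / n;   // n is a power of two, so this divides
            long start = System.nanoTime();
            try (PreparedStatement stmt = conn.prepareStatement(
                    "SELECT * FROM T1 WHERE pk >= ? AND pk < ?")) {
                for (int q = 0; q < n; q++) {
                    stmt.setInt(1, q * rowsPerQuery);        // range predicate on
                    stmt.setInt(2, (q + 1) * rowsPerQuery);  // the primary key
                    try (ResultSet rs = stmt.executeQuery()) {
                        while (rs.next()) { /* drain 1024/n rows */ }
                    }
                }
            }
            return (System.nanoTime() - start) / 1_000_000;
        }
    }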


Figure 4.1: Run times (ms) with varying number of requests to the server: (a) T(N) ≈ 0.217N + 18; (b) T(N) ≈ 0.329N + 12. Dashed lines show the results of linear regression; however, a linear regression is not well justified by these results.

Figure 4.1 shows the result of this experiment on the LCL configuration (described in Table 3.5). In principle, the server performs the same amount of useful work (fetching rows), while the per-request overhead is incurred N times. This would lead us to expect a linear relationship between run time and N, where the slope of the relationship is U0 and the intercept is the server cost to fetch the desired rows. Figure 4.1(a) shows the results of N requests with N = 2^i for i ∈ [0, 10]. Clearly the run time T(N) increases with N, but the relationship does not appear to be linear: compare the points to the linear regression shown with a dashed line. More is changing in this experiment than just the number of times the per-request costs are incurred. Figure 4.1(b) removes the top two and bottom two data points, giving a region that is more closely approximated by a linear model (with a coefficient of determination R² = 0.995, compared to R² = 0.965 for Figure 4.1(a)), but using the slope of a linear model to estimate U0 still does not appear to be well justified.

We take the results shown in Figure 4.1 as confirmation that there are per-request overheads, but the approach does not provide a good estimate of this cost. Instead, we time the execution of a query that we expect to have very low server execution cost: one that fetches a single row with a constant value.

SELECT T.x FROM ( VALUES (1) ) T(x)

For this simple query, the server execution costs (apart from per-request overhead) are negligible. When Scalpel is configured for a particular installation, it uses a calibration step to estimate U0: it executes the above request a number of times and uses the average execution time as the estimate of U0 for the configuration. Table 4.2 shows the calibrated value for each of the configurations that we have tested (the configuration details are given in Table 3.5).

Configuration   U0 (ms)
LCL             0.27
LAN1            1.1
LAN0.1          1.4
WiFi            11.6
WAN             468.9

Table 4.2: Tested configurations and the per-request overhead U0.
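A minimal JDBC-style sketch of this calibration step follows; the class name and the choice of iteration count are illustrative, not Scalpel's actual code.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    final class U0Calibrator {
        // Times the constant probe query repeatedly; returns the mean in ms.
        static double calibrateU0(Connection conn, int iterations)
                throws SQLException {
            final String probe = "SELECT T.x FROM ( VALUES (1) ) T(x)";
            long totalNanos = 0;
            try (PreparedStatement stmt = conn.prepareStatement(probe)) {
                for (int i = 0; i < iterations; i++) {
                    long start = System.nanoTime();
                    try (ResultSet rs = stmt.executeQuery()) {
                        while (rs.next()) { /* drain the single constant row */ }
                    }
                    totalNanos += System.nanoTime() - start;
                }
            }
            return totalNanos / (iterations * 1_000_000.0);
        }
    }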

4.2 Estimating the Cost of Interpreting Results

When we prefetch, we submit a combined query that returns an encoding of multiple result sets. At run-time, we interpret this encoded result set to return the desired result sets. This interpretation adds extra costs in the client process. We calibrate this cost by comparing the execution time when interpreting the multiple result sets to the run time if we just fetch the rows of the combined query without interpretation.

Type                T   Estimated Overhead (µs)
Nested Execution    N   18.1p + 0.07r
Outer Union         U   32.4p + 11.08r
Outer Join          J   27.0p + 0.00r
Client Hash Join    H   26.5p + 8.68r
Client Merge Join   M   22.7p + 4.63r

Table 4.3: Estimated overhead EST-INTERPRET(C,p,r) of interpreting combined result sets.

As with U0, Scalpel uses a calibration step when it is configured for a particular installation to estimate the overhead of each execution strategy. Table 4.3 shows the estimated linear combination EST-INTERPRET(C,p,r) for each type T of interpretation that could be used for C. The p parameter represents the number of encoded result sets, and r represents the number of rows. The values in Table 4.3 are the result of a linear regression based on 100 iterations of each type with a combination of opens and fetches.

The results in Table 4.3 represent the overhead of interpreting result sets from an encoded result set, except for the first row. The Nested Execution results represent the overhead of keeping track of query open and close requests when no prefetching is performed; as such, this row is an estimate of the Call Monitor overhead.

4.3 Estimating the Cost of Queries

In addition to estimates of U0 and EST-INTERPRET(C,p,r), the Pattern Optimizer needs to estimate the cost of queries, along with the number of rows they will return. Estimates are needed not only for queries that we have observed during the training period, but also for composite queries generated by the Query Rewriter.

For some DBMS products, we can define functions SRV-COST(Q) and SRV-ROWS(Q) that give the server's estimate of the cost and cardinality of a query. We could use these functions to implement EST-COST(Q) and EST-ROWS(Q) regardless of whether we have previously observed Q. In principle, this approach has several appealing qualities. Unfortunately, it also has significant limitations. First, the DBMS query optimizer's cost estimates typically don't include communication latency (as these costs are incurred equally by each of the plans the optimizer considers). This latency is an important cost that Scalpel seeks to minimize, and it is therefore important to have a reasonable estimate of this latency in the same units as the cost estimate. Second, the query optimizer typically does not have the values of host variables when generating a plan, and therefore can only give a generalized estimate for any parameter values. In contrast, Scalpel has observed actual parameter values from the application. If the training period is representative of run-time, these observed costs will be more accurate than the server's estimates. Finally, the estimates used by the DBMS are intended only to rank plans in relative order. These estimates often do not translate well to linear approximations of cost that can be combined with externally calibrated values such as latency.

We have found that, where possible, it is better to estimate the cost of individual queries by monitoring their execution costs and row counts. We loosely follow an approach suggested by Zhu [209] and Rahal, Zhu and Larson [147]. During the training period, the Cost Model observes, for each submitted query Q, the number of rows returned (AVG-ROWS(Q)) and the average running time (AVG-COST(Q)). For a query Q that has been observed during training, we use these averages as our estimates (Equation 4.1 and Equation 4.2).

EST-COST(Q) ≡ AVG-COST(Q)    (4.1)
EST-ROWS(Q) ≡ AVG-ROWS(Q)    (4.2)
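As an illustration, the training-time bookkeeping behind Equations 4.1 and 4.2 can be as simple as the following sketch; the class and method names are ours, not Scalpel's actual Cost Model.

    import java.util.HashMap;
    import java.util.Map;

    final class TrainingStats {
        private static final class Tally { long runs; long rows; double cost; }
        private final Map<String, Tally> byQuery = new HashMap<>();

        // Called once per observed execution of Q during the training period.
        void observe(String queryText, double elapsedSeconds, long rowsReturned) {
            Tally t = byQuery.computeIfAbsent(queryText, q -> new Tally());
            t.runs++;
            t.cost += elapsedSeconds;
            t.rows += rowsReturned;
        }

        double estCost(String queryText) {   // Equation 4.1: AVG-COST(Q)
            Tally t = byQuery.get(queryText);
            return t.cost / t.runs;
        }

        double estRows(String queryText) {   // Equation 4.2: AVG-ROWS(Q)
            Tally t = byQuery.get(queryText);
            return (double) t.rows / t.runs;
        }
    }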

In addition to estimates for queries observed during training, Scalpel needs estimates for queries that are generated by the Query Rewriter. We have never observed these combined queries being executed, so we cannot base estimates on past behaviour. Instead, we use an analytical model to estimate these cost attributes. This model estimates the parameters of a combined query using estimates for the components of the query. Further, if a SRV-COST function can be defined, Scalpel incorporates predictions made by the DBMS so that the Pattern Optimizer can detect opportunities where the DBMS chooses a strategy that performs better than the naïve approach.

The Query Rewriter combines queries using two basic constructs: lateral derived tables and outer unions. We represent these constructs using symbols as follows. Query Qi ⊗ Qj is the query formed by using an outer query Qi joined to Qj with a lateral derived table, as described in Section 3.3.2.2. If we are instead using a left outer lateral derived table, we write Qi ⊗̂ Qj. We use the notation Qi ⊎̂ Qj ⊎̂ Qk ⊎̂ … ⊎̂ Qm to represent the outer union of queries Qi, Qj, …, Qm as described in Section 3.3.3.2.

4.3.1 Estimating the Cost of a Lateral Derived Table

Given a combination Qi ⊗ Qj, we would like to produce an estimate EST-COST(Qi ⊗ Qj) and also EST-ROWS(Qi ⊗ Qj). Recall that, by the definition of the semantics of lateral derived tables, the combined result set contains each row of Qi concatenated with the result of Qj evaluated under the outer bindings supplied by that row.

A straightforward implementation technique for this operator is based on nested-loops joins. We use JNL-COST(Qi, Qj) and JNL-ROWS(Qi, Qj) to estimate the attributes of Qi ⊗ Qj using the nested-loops model:

JNL-COST(Qi ⊗ Qj) ≡ EST-COST(Qi) + EST-ROWS(Qi) × (EST-COST(Qj) − U0)    (4.3)
JNL-ROWS(Qi ⊗ Qj) ≡ EST-ROWS(Qi) × EST-ROWS(Qj)    (4.4)

Equation 4.3 uses a nested-loops model to estimate the cost of Qi ⊗ Qj based on estimates for the component queries Qi and Qj. Note that we subtract U0 for the inner query because it is not submitted from the client application, and therefore we assume that we save U0. The number of rows in the result is the product of the row counts of the outer and inner tables. To estimate the parameters for a left outer lateral derived table, we modify the above row count estimate using max to account for the outer join preserving all outer rows, as shown in Equation 4.6. We use the same estimate for the cost of an inner and outer lateral derived table (Equation 4.5). Although there is a difference in the communication costs for NULL-supplied values, we do not currently include estimates for this difference.

JNL-COST(Qi ⊗̂ Qj) ≡ JNL-COST(Qi ⊗ Qj)    (4.5)
JNL-ROWS(Qi ⊗̂ Qj) ≡ max(JNL-ROWS(Qi ⊗ Qj), EST-ROWS(Qi))    (4.6)

We assume that the nested loops cost estimate JNL-COST(Qi ⊗̂ Qj) provides an upper bound on the execution time. The nested loops strategy is available to the server, and we assume that the DBMS query optimizer does not choose a more expensive strategy by mistake. However, the DBMS may in fact be able to use a more efficient join strategy. We can use SRV-COST to find such cases, but the limitations we noted earlier make it difficult to incorporate the results of SRV-COST directly. Instead, we define a function SRV-SAVINGS (Equation 4.7) that estimates the relative savings the server predicts for executing the two queries using a better strategy than the naïve nested loops join.

SRV-SAVINGS(Qi ⊗ Qj) = SRV-COST(Qi ⊗ Qj) / (SRV-COST(Qi) + SRV-ROWS(Qi) × SRV-COST(Qj))    (4.7)

The value SRV-SAVINGS(Qi ⊗ Qj) is the ratio of the server's estimated cost for joining Qi and Qj to the cost of performing a nested loops join, when we use the server's cost estimates instead of Scalpel's predictions. The value of SRV-SAVINGS(Qi ⊗ Qj) is close to 1 when the server chooses a straightforward nested loops strategy (or one with a similar cost). If the server discovers a better strategy (for example, by finding a better join algorithm or by exploiting shared subexpressions as suggested by Sellis [160]), the ratio SRV-SAVINGS(Qi ⊗ Qj) will be less than 1.

Equation 4.7 gives the relative benefit estimated by the DBMS query optimizer for combining the outer and inner query into one request. This quantity will vary between 0 and 1, with 1 meaning there is no benefit to the server and values close to 0 meaning that there is substantial benefit. If we maintained separate cost components in a vector, we would apply the SRV-SAVINGS multiplier to the server cost component of EST-COST. As we represent cost as a scalar combining server, communication, and client costs, it is not possible to apply the reduction only to the server component. Instead, we apply the reduction to the entire estimated cost. We use a configuration parameter K with 0 ≤ K ≤ 1 to form a weighted average between Scalpel's unmodified prediction (JNL-COST(Qi ⊗ Qj)) and the full effect of the estimated savings (SRV-SAVINGS(Qi, Qj) × JNL-COST(Qi ⊗ Qj)). With a K value near 0, we pay little attention to the server's predicted savings; near 1, we apply most of the effect of the savings. Equation 4.10 shows how we define the weighted average. One complication is the cost of U0: the savings in server cost resulting from combining Qi and Qj into one query does not reduce the per-request overhead U0. Therefore, we remove this quantity before forming the weighted average.

J = JNL-COST(Qi ⊗ Qj) − U0    (4.8)
S = SRV-SAVINGS(Qi ⊗ Qj)    (4.9)
EST-COST(Qi ⊗ Qj) ≡ K·S·J + (1 − K)·J + U0    (4.10)

The estimate provided by EST-COST(Qi ⊗ Qj) is a blending of estimates from the DBMS query optimizer and an estimate Scalpel makes assuming a nested loops strategy. The K parameter controls how much credence is given to the server's estimates.
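The following sketch shows Equations 4.3 and 4.8-4.10 as straight-line code; the parameter names are illustrative, and srvSavings stands in for a SRV-SAVINGS value obtained from the DBMS.

    final class LateralCostEstimate {
        // Equation 4.3: nested-loops model; U0 is saved once per inner
        // invocation because the inner query is no longer client-submitted.
        static double jnlCost(double costOuter, double rowsOuter,
                              double costInner, double u0) {
            return costOuter + rowsOuter * (costInner - u0);
        }

        // Equations 4.8-4.10: blend the nested-loops estimate with the
        // server's predicted savings S, excluding the per-request overhead
        // U0 from the blended portion.
        static double estCost(double costOuter, double rowsOuter,
                              double costInner, double u0,
                              double srvSavings, double k) {
            double j = jnlCost(costOuter, rowsOuter, costInner, u0) - u0;
            return k * srvSavings * j + (1.0 - k) * j + u0;
        }
    }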

4.3.2 Estimating the Cost of an Outer Union

The query Qi ⊎̂ Qj ⊎̂ Qk ⊎̂ … ⊎̂ Qm represents an outer union. The result will have a type column and a distinct set of columns for each branch of the union. The number of columns in the result of each query affects the overall cost due to the introduction of NULL values. Section 3.6.3 provided results exploring the effects of increasing the number of columns involved in an outer union. The costs do increase with an increasing number of columns, but this increase is relatively insignificant compared to the other errors present in the cost estimation. Therefore, we elect to ignore this parameter, and we estimate the cost of Qi ⊎̂ Qj ⊎̂ Qk ⊎̂ … ⊎̂ Qm to be the same as the (inner) union Qi ⊎ Qj ⊎ Qk ⊎ … ⊎ Qm. We compute this pairwise as follows:

EST-COST(Qi ⊎ Qj) ≡ EST-COST(Qi) + EST-COST(Qj) − U0    (4.11)
EST-ROWS(Qi ⊎ Qj) ≡ EST-ROWS(Qi) + EST-ROWS(Qj)    (4.12)

As with lateral derived tables, we could consider using the estimates from the DBMS optimizer to reduce this cost estimate in cases where the DBMS is able to do something clever, such as exploiting common sub-expressions in the combined query. This approach would allow Scalpel to favour combining queries when multi-query optimizations make it favourable even though a naïve implementation would not be. At present, we do not include this complication, as the DBMS products we tested do not appear to exploit such opportunities.

4.4 Summary of Cost Model

The Pattern Optimizer needs to rank alternative execution strategies based on estimates of their cost. The Cost Model uses a combination of explicit calibration, measurements of queries during training, and DBMS-provided support routines to estimate the cost parameters needed by the Pattern Optimizer.

When estimating the cost of a query that was submitted by the client application, Scalpel uses measurements of its run time during the training period. In principle, we could use server support routines, but we have found that such an approach omits estimates for quantities, such as communication costs, that are important to Scalpel but not to the DBMS.

We cannot use prior behaviour to estimate the cost attributes of the queries generated by the Query Rewriter. Instead, we estimate the cost of these queries by considering the operations that we used to generate them, namely lateral derived tables and outer unions. An upper bound for the cost of a lateral derived table can be found using the traditional formula for nested loops joins (augmented to account for the reduction in U0).

This estimate will be too high in the case that the DBMS is able to choose a better join strategy. We use a function SRV-COST, if it is available, to define SRV-SAVINGS, an estimate of the relative savings achieved by combining the queries that were separate in the application into a single join. We use SRV-SAVINGS to reduce our nested-loops cost. This reduction applies to all elements of the cost, including the client, server, and communication costs; the reduction should only be applied to the server component, but we do not have access to this separate value, although we do avoid applying the reduction to U0. We use a configuration parameter K to control the weight we give to the estimates provided by the DBMS.

The Query Rewriter also combines queries using outer union operations. At present, Scalpel ignores the cost of supplying redundant NULL values in outer unions. This cost is non-zero, but our experiments show that it is relatively insignificant compared to both other cost attributes and the quite high errors associated with other estimates.

With this simplification, we estimate the cost of outer unions as we would for (inner) unions. While it is possible to define a SRV-SAVINGS adjustment for unions, we did not do so because we have found that the DBMS products we study do not exploit opportunities in this case.

The Cost Model is used by Scalpel's Pattern Optimizer to rank various strategies based on an estimate of the total latency associated with each strategy. While the estimates contain inaccuracies due to sampling error and approximations, this approach allows the Pattern Optimizer to select a strategy that we can expect to perform well relative to the rejected strategies.

5 Batch Request Patterns

Chapter 3 described how Scalpel can detect and optimize nested patterns within the requests submitted by an application to the database server. While nesting patterns offer substantial opportunities for improvement, they occur relatively infrequently in the database applications that we studied. In these applications, we found that there are sequences of queries that allow us to predict the most likely queries that will be submitted in the future. We call these sequences of queries batches. By identifying batches, Scalpel can prefetch future queries before they are submitted, reducing the accumulated per-request overhead. This reduction results in a lower response time for users, and it can also lead to lower execution costs.

Figure 1.5 (page 4) shows function BATCH-EXAMPLE, an example of a function that generates a batch of requests. After query Q3 is submitted, query Q4 is submitted if a local predicate is TRUE (line 25). If we could recognize this pattern and find that Q4 is submitted sufficiently often following Q3, then we could prefetch the results for Q4 when we recognize query Q3. When Q3 is submitted, we would instead send a modified query Q3;4 that generates the results required for both queries.

If we prefetch the results for Q4 and the application subsequently submits it, we save the latency associated with one database request, U0. This type of savings is smaller than what we achieved with the nesting rewrites, where we can save several or even hundreds of times the per-request latency. However, these batch patterns occur frequently in the client applications we investigated. Rewriting these batch patterns can also save a significant amount of exposed latency, and can even reduce server costs due to the smaller number of messages that are interpreted and formatted.

It may be the case that the local predicate evaluates to FALSE and Q5 is submitted instead of Q4; then, we have wasted the work of evaluating the results for Q4 in the combined query Q3;4. We need to predict the probability that Q4 will follow Q3 and weigh this against the cost of Q4, prefetching the results for Q4 only when the expected savings are greater than the expected increase in cost due to wasted prefetching.

In order to make decisions about which queries to prefetch, we need to predict the sequence of queries that can follow a given request, and we also need an estimate of both the cost of each request and the likelihood that it will be executed. Even this information is not enough to prefetch the results of future queries. As with nested request patterns, queries are parameterized; in general, the actual values used for parameters can depend on the results of earlier queries.


If we are to prefetch Q4 when Q3 is submitted, we need to be able to predict the actual values used for the formal parameters of the query. In this example, the ship_id column returned from Q3 is used as the actual parameter value when opening Q4. We do not know this value when Q3 is opened, so we cannot supply it as a parameter from the client when we submit the modified Q3;4. Instead, we use a join to combine queries Q3 and Q4 into one request, where the ship_id attribute of Q3 is used in place of the formal parameter of Q4. In this example, we could use a combined request such as the one shown in Figure 1.6.

Figure 5.1 shows the components of the Scalpel system that are used to detect, optimize, rewrite, and prefetch batch request patterns. Scalpel uses the same system structure to detect batch request patterns as was used for nested request patterns (Figure 3.1). For ease of exposition, we have presented nesting and batch patterns separately, as if they were implemented in isolation. In fact, the two approaches are combined in our prototype (as described in Chapter 6).

[Figure 5.1 diagram: the Call Monitor intercepts Monitor-Open/Fetch/Close calls and feeds the Pattern Detector (Section 5.2), which produces a suffix trie with correlations and selectivities; the Pattern Optimizer (Section 5.3), consulting the Cost Model (Chapter 4), produces a finite state model with prefetch choices; the Query Rewriter (Section 5.4) produces rewritten queries and a finite state model with actions; the Prefetcher (Section 5.5) handles Run-Open/Fetch/Close calls at run-time.]

Figure 5.1: Scalpel components used for batch request patterns. Shaded components are described in this chapter.

During the training phase (Figure 2.1), the Pattern Detector component (Section 5.2) identifies batch request patterns. These patterns are encoded in a suffix trie data structure, annotated with probability estimates and predicted correlations between query input parameters and previously observed values. The suffix trie structure corresponds to the context tree structure used for nested request patterns. After the training period has completed, the Pattern Optimizer (Section 5.3) uses the patterns detected by the Pattern Detector component to choose prefetches that will be executed at run-time. The Pattern Optimizer removes the significant redundancy that is present in the suffix trie data structure, producing a compact finite state model that contains a set of states and edges corresponding to predicted requests.

For each edge in the model, the Pattern Optimizer associates a list of anticipated future queries that should be prefetched. The Query Rewriter (Section 5.4) combines these lists of queries to build a single query that prefetches the results for the original query and all prefetched queries. These combined queries are stored persistently with the finite state model. At run-time, the Prefetcher (Section 5.5) tracks the current state within the model and executes the prefetches associated with edges in the tree (selected by the Pattern Optimizer) using the combined queries generated by the Query Rewriter.

In addition to the sections shown in Figure 5.1, Section 5.1 gives an example program that generates batches of queries; this example will be used throughout the chapter. Section 5.6 provides experimental results illustrating the strengths and weaknesses of the various strategies. Finally, Section 5.7 summarizes the results for batch request patterns.

In summary, batch patterns are common in the client applications that we studied. We can reduce the latency exposed to users of the application and reduce server costs by recognizing opportunities where we can prefetch the results of queries in anticipation of their execution. In order to prefetch effectively, we need to be able to predict a likely sequence of requests to follow the current request. We need to estimate the probability that each request in the sequence will be executed; combined with an estimate of the cost of the query and the per-request latency, this allows us to choose whether to prefetch a request. Finally, we need to predict the source of parameter values for all prefetched queries. We use these correlation predictions to generate a rewritten query that fetches not only the results for the immediately requested query, but also the results for anticipated future queries.
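The prefetch criterion just described amounts to an expected-value comparison. As a sketch in our own notation (the optimizer's actual cost-based rule is developed in Section 5.3), a predicted request Qf would be prefetched after Qp when

    p̂(Qf | Qp) × U0 > (1 − p̂(Qf | Qp)) × (EST-COST(Qf) − U0)

where the left side is the expected latency saved by a correct prefetch and the right side is the expected evaluation cost wasted when Qf is not subsequently submitted.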

5.1 Example of Batch Pattern

Figure 5.2 shows a subset of a database application that issues a series of small queries to a relational database server. This particular example is a simplified, artificially constructed application designed to show particular features of our approach. However, its features are a composite of some of those that we observed in a set of database applications that we studied (described in Chapter 8).

The GETCUSTOMER function takes a partially-filled customer structure (cust_info) as input, and retrieves additional customer information from the database. It first issues query Qa to retrieve the customer name and account number. If the application does not already have shipping information for the customer, it calls the GETDEFAULTSHIPTO function to obtain the customer's default shipping address. Finally, the application checks the customer's outstanding balance (Qc) and uses that information to determine the available credit.

In addition to the GETCUSTOMER function that retrieves information about customers, Figure 5.2 contains the GETVENDORORDER function that is used to generate a parts order for a vendor.

369 function GETCUSTOMER(cust_info)
370   fetch row r1 from a:
371     SELECT name, accno
372     FROM customer c
373     WHERE c.id = :cust_info.id
374   cust_info.name ← r1.name
375   if not cust_info.shipto then
376     cust_info.shipto ← GETDEFAULTSHIPTO( cust_info )
377   fetch row r3 from c:
378     SELECT SUM(amount - paid) AS balance
379     FROM ar a
380     WHERE a.accno = :r1.accno
381   cust_info.balance ← r3.balance
382 end
383
384 function GETDEFAULTSHIPTO(info)
385   fetch row r2 from b:
386     SELECT addr
387     FROM shipto s
388     WHERE s.shipid = :info.id AND s.default = 'Y'
389   return r2.addr
390 end
391
392 function GETVENDORORDER(vendor_info)
393   fetch row r4 from d:
394     SELECT name
395     FROM vendor v
396     WHERE v.id = :vendor_info.id
397   vendor_info.name ← r4.name
398   vendor_info.mailto ← GETDEFAULTSHIPTO( vendor_info )
399   ▷ Find parts supplied by the vendor that need re-stocking.
400   open c5 cursor for e:
401     SELECT partname, invlevel - onhand AS qty
402     FROM part p
403     WHERE p.vendor_id = :vendor_info.id AND p.onhand < p.invlevel
404     ORDER BY partname
405   while r5 ← fetch c5 do
406     ADDORDER( vendor_info, r5.partname, r5.qty )
407   close c5
408 end

Figure 5.2: Application generating a query batch.

#   Query  Input   Output                      Possible Correlations
1   Qx     (42)    (501)                       ⟨1|C,42⟩
2   Qa     (101)   ('Alice', 501)              ⟨1|C,101⟩
3   Qb     (101)   ('1500 Robie St.')          ⟨1|C,101⟩, ⟨1|I,-1,1⟩
4   Qc     (501)   ($400.00)                   ⟨1|C,501⟩, ⟨1|O,-2,2⟩, ⟨1|O,-3,2⟩
5   Qd     (201)   ('Mary')                    ⟨1|C,201⟩
6   Qb     (201)   ('1400 Barrington St.')     ⟨1|C,201⟩, ⟨1|I,-1,1⟩
7   Qe     (201)   {('Bell',3), ('Tire',6)}    ⟨1|C,201⟩, ⟨1|I,-1,1⟩, ⟨1|I,-2,2⟩
8   Qa     (121)   ('Bob', 537)                ⟨1|C,121⟩
9   Qc     (537)   ($0.00)                     ⟨1|C,537⟩, ⟨1|O,-1,1⟩
10  Qx     (43)    (31337)                     ⟨1|C,43⟩
11  Qa     (107)   ('Cindy', 523)              ⟨1|C,107⟩
12  Qb     (107)   ('1100 Sackville St.')      ⟨1|C,107⟩, ⟨1|I,-1,1⟩
13  Qc     (523)   ($800.00)                   ⟨1|C,523⟩, ⟨1|O,-2,2⟩
14  Qy     (189)   ('Elbereth')                ⟨1|C,189⟩
15  Qd     (255)   ('Ned')                     ⟨1|C,255⟩
16  Qb     (255)   ('1200 Weber St.')          ⟨1|C,255⟩, ⟨1|I,-1,1⟩
17  Qe     (255)   {('Pedal',7), ('Seat',3)}   ⟨1|C,255⟩, ⟨1|I,-1,1⟩, ⟨1|I,-2,1⟩
18  Qz     (42)    ('Xyzzy')                   ⟨1|C,42⟩, ⟨1|I,-17,1⟩

Figure 5.3: An example trace containing query batches. Each row in the table represents a complete query sequence including OPEN, FETCH, and CLOSE. The Query column gives the name of a query from Figure 5.2, Input gives the values of query parameters, and Output gives the row returned by the FETCH calls. Query Qe returns multiple rows, shown as a set; other queries are shown with a single tuple. The last column will be explained in Section 5.2.2.3.

Function GETVENDORORDER is passed a partially-filled vendor structure (vendor_info) as input, which it then fills in with additional information fetched from the database. First, it retrieves the vendor's name by submitting query Qd; next, it calls GETDEFAULTSHIPTO to find the default mailing address for the vendor. Finally, GETVENDORORDER builds a list of parts supplied by the vendor that need to be ordered because there are fewer on hand than the desired inventory level.

Figure 5.3 shows a sample trace generated by a program that contains the code in Figure 5.2, as well as other code that we have not shown. Each row of the trace table in Figure 5.3 represents a complete query subsequence (OPEN, FETCH, and CLOSE). This sample trace shows examples of calls to GETCUSTOMER (positions 2-5, 8-9, 11-13) and GETVENDORORDER (positions 5-7, 15-17). In addition, other queries may be submitted from portions of the application outside of the code shown in Figure 5.2; queries Qx, Qy, and Qz are examples of this at trace positions 1, 10, 14, and 18.

5.2 Pattern Detector

In the sample program of Figure 5.2, query Qb is issued after Qa if the predicate on line 375 is true. If this predicate is true sufficiently often, then it is more efficient to prefetch the results for Qb when Qa is submitted by the application. The Pattern Detector monitors the client application during the training phase to build a model that the Pattern Optimizer uses to make prefetching decisions. Figure 5.4 shows the structure of the Pattern Detector used for batch request patterns.

[Figure 5.4 diagram: the Call Monitor passes Monitor-Open/Fetch/Close calls to a model of the client (Section 5.2.1). The Pattern Detector (Section 5.2) maintains this model either as a suffix trie (Section 5.2.2), with n.children (5.2.2.1), n.count (5.2.2.2), and n.correlations (5.2.2.3), or as a path-compressed suffix trie (Section 5.2.3), with Children(n) (5.2.3.1), Count-Occurrences(n) (5.2.3.2), and Correlations(n) (5.2.3.3). Its output feeds the Pattern Optimizer (Section 5.3).]

Figure 5.4: Overview of the Pattern Detector.

In order to make effective prefetching decisions, we need to predict the probability of future requests based on the history of requests we have observed. Section 5.2.1 describes how we can construct a model of a request stream that will allow us to estimate the probability of future requests.

In particular, Section 5.2.1 describes the order-k model, which uses the k previous requests to predict the next request.

As we will see, it is difficult to choose an appropriate k value before training commences. Instead of choosing a particular k value, Scalpel builds a trie-based data structure that contains all order-k models that can be built for a trace. After training is over, the Pattern Optimizer can use this data structure to select a locally good k value for different portions of the trie. Section 5.2.2 describes this suffix trie data structure. Section 5.2.2.1 describes how Scalpel builds the structure, and Section 5.2.2.2 shows how Scalpel uses the structure to estimate the probability of future requests. In addition to probabilities, the suffix trie structure is used to maintain information about correlations that have always held during the training period. This correlation detection is described in Section 5.2.2.3.

The suffix trie described in Section 5.2.2 requires quadratic space and time. More efficient linear algorithms have been available for some time [80, 132, 180, 191]. In particular, Ukkonen [180] provides a novel approach that builds a space-efficient suffix trie on-line as requests are observed. In Section 5.2.3, we show how this algorithm can be extended for our purposes. Section 5.2.3.1 shows how this compressed trie can be used to provide counts for probability estimation. Section 5.2.3.2 demonstrates how the compressed trie data structure can be used to maintain the correlation information needed for the prefetching rewrites without disturbing the linear space/time bound.

Finally, Section 5.2.4 summarizes the Pattern Detector and describes how it passes information to the Pattern Optimizer (Section 5.3).

5.2.1 Models of Request Streams

We can consider the client application and its inputs to be a stochastic process with an unknown structure. If we use the sequence trace notation described in Section 2.3, then each trace of requests is a sequence of queries (ignoring for now the details of OPEN, FETCH, and CLOSE used for each query).

Each query in this trace is associated with a random variable Xi. We use Σ as the set of all possible queries, so Xi ∈ Σ. The stochastic process is characterized by the joint probability mass functions shown in Equation 5.1.

p(x1 x2 ... xn) = Pr{X1 = x1, X2 = x2, ..., Xn = xn},  n = 0, 1, 2, ...    (5.1)

Equation 5.1 defines an infinite set of joint probability mass functions, each associated with n, the length of the strings described by the function. The function p(x1 x2 ... xn) gives the probability that a request trace generated by the client application will begin with the specific sequence of requests x1 x2 ... xn.

The joint probability mass functions are related by the rules shown in Equation 5.2 and Equation 5.3.

p() = 1    (5.2)
∀σ ∈ Σ*:  p(σ) = ∑_{a∈Σ} p(σa)    (5.3)

For prefetching, we are particularly interested in the conditional probability p(a | x1 x2 ... xn) defined in Equation 5.4.

p(a | x1 x2 ... xn) = p(x1 x2 ... xn a) / p(x1 x2 ... xn)    (5.4)

This conditional probability gives the probability that the query a will be submitted after observing the query sequence x1 x2 ... xn. This conditional probability is exactly the quantity that we need in order to decide whether it is worthwhile to prefetch request a after observing the sequence x1 x2 ... xn.

The probability distribution of the stochastic process associated with the client application is not available to Scalpel directly; instead, Scalpel builds a model of the request source based on observations of the sequences of requests the application generates during the training period. This model approximates the conditional probability function based on the training. In order to explain how Scalpel models the client application, we begin with a restricted class of models, the class of order-k models.

5.2.1.1 Order-k Models

A stochastic process can have a special structure where each random variable Xn depends only on the k preceding variables, and is conditionally independent of all earlier variables. In this case, we say the stochastic process is order-k.

DEFINITION 5.1 (ORDER-k STOCHASTIC PROCESS)
A discrete stochastic process is said to be order-k if Equation 5.5 holds for all n = 1, 2, ... and for all xi ∈ Σ:

Pr{X_{n+1} = x_{n+1} | X_n = x_n, X_{n-1} = x_{n-1}, ..., X_1 = x_1}
  = Pr{X_{n+1} = x_{n+1} | X_n = x_n, X_{n-1} = x_{n-1}, ..., X_{n-k+1} = x_{n-k+1}}    (5.5)

For an order-k stochastic process, the probability that the next query is a depends only on the k previous requests. We can build a graph-based model of an order-k stochastic process as follows.

DEFINITION 5.2 (ORDER-k MODEL)
Let Σ be the set of all possible queries extended with #, an out-of-band value.

Let S = Σ^k be a set of states, and let δ be a transition function defined by δ(x1 x2 ... xk, a) = x2 ... xk a. That is, δ(x1 x2 ... xk, a) is the state associated with the string formed by the concatenation of the last k−1 queries of the previous state followed by a. Let s0 = #^k be the initial state, and let p̂(xi|s) be a conditional probability mass function giving the estimated probability of observing xi when in state s ∈ S. Then, M = ⟨S, Σ, δ, s0, p̂⟩ is an order-k model of request sequences.

An order-k state s ∈ S = Σ^k is a string of k queries. We say that the model is in state s after observing a trace t = rs that begins with any string r and ends with s. When processing a trace from the beginning, there are k−1 queries observed before we enter a 'real' state. These are handled by padding the trace on the left with k copies of the out-of-band value #.

Figure 5.5 shows the order-k contexts that we observe in the trace of Figure 5.3 (k between 0 and 5). Nodes in these graphs are labelled with strings of length k representing contexts that we have observed in the trace. Edges are annotated with the query that leads to a transition between contexts, and with a count of how many times the transition was observed. It is important to note that the nodes shown in Figure 5.5 are sparse in the sense that not all of the states of the order-k model are shown. Only those states that were actually observed are shown, while the full model contains |Σ|^k states.

Figure 5.5 suggests how we can define p̂(xi|s), the estimated probability that query xi will be submitted when the model is in state s. If we maintain a count COUNT-OCCURRENCES(σ, t) of how many times string σ occurred in trace t, then we can estimate p̂(xi|s) as shown in Equation 5.6:

p̂(xi|s) = COUNT-OCCURRENCES(s xi) / COUNT-OCCURRENCES(s)    (5.6)

That is, we estimate the probability that query xi will be submitted when the model is in state s based on the proportion of times in the training period that we observed xi submitted when the model was in state s.

The order-0 model has a single context, the empty string ε. This model does not consider any previous requests, and it estimates the probability that the next request will be xi based on the relative frequency of xi in the training period. For example, query 'a' was observed 3 times in the trace of length 18, giving p̂(a | ε) = 1/6. In contrast, query 'b' was observed 4 times, giving p̂(b | ε) = 2/9. There may be specialized situations where an order-0 model is helpful for prefetching. In general, however, we need to consider longer contexts in order to make more specific predictions.

In the order-1 model, context 'b' is observed 4 times, followed 2 times by a 'c' and 2 times by an 'e'. This gives p̂(c|b) = 0.5 and p̂(e|b) = 0.5. In contrast, the order-2 model has two distinct contexts 'ab' and 'db'.

[Figure 5.5 graphs: observed contexts and transition counts for the order-k models with k = 0 through k = 5.]

Figure 5.5: Order-k models for trace of Figure 5.3: xabcdbeacxabcydbez. Edges are labelled with the query subscript that causes the transition between contexts (nodes) and with a count of how many times that transition was observed in the sample trace. 5.2 PATTERN DETECTOR 99

Context 'ab' is observed 2 times and followed each time by 'c', and 'db' is also observed 2 times and followed each time by 'e'. This gives p̂(c|ab) = 1.0 and p̂(e|db) = 1.0. It appears that the order-2 model is better than the order-1 and order-0 models, at least for predicting the request that will follow Qb. This raises the question of how we should choose an appropriate context length k.
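As an illustration of the order-k machinery, the following sketch computes COUNT-OCCURRENCES and the estimate of Equation 5.6 by brute force over the single-character trace of Figure 5.5; Scalpel itself uses the suffix trie of Section 5.2.2, and the class here is ours.

    final class OrderKModel {
        // Definition 5.2: delta(x1...xk, a) = x2...xk a
        static String delta(String state, char a) {
            return state.substring(1) + a;
        }

        // Number of (possibly overlapping) occurrences of s in trace.
        static int countOccurrences(String trace, String s) {
            int count = 0;
            for (int i = 0; i + s.length() <= trace.length(); i++) {
                if (trace.startsWith(s, i)) count++;
            }
            return count;
        }

        // Equation 5.6: p-hat(x|s) = COUNT-OCCURRENCES(s x) / COUNT-OCCURRENCES(s)
        static double estimate(String trace, String s, char x) {
            int denom = countOccurrences(trace, s);
            return denom == 0 ? Double.NaN
                              : countOccurrences(trace, s + x) / (double) denom;
        }
    }
    // estimate("xabcdbeacxabcydbez", "b", 'c') returns 0.5, and with context
    // "ab" it returns 1.0, matching the order-1 and order-2 examples above.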

5.2.1.2 Choosing a Context Length

We might assume that larger contexts provide better predictions. With a sufficiently long training period, this is true (with 'sufficiently' depending on k, being at least O(|Σ|^k)). However, a larger k value means that individual contexts are observed less frequently in a trace of finite size. This low frequency is most obvious for those contexts that did not occur at all in the training period: the order-k model can provide no estimate for the probability of future actions in these missing contexts. For example, consider the sequence 'yab'. This sequence did not occur in the training period, so an order-3 model cannot make a prediction for the next request. An order-2 model, on the other hand, provides a prediction that the next request will be 'c'; in this case, that prediction turns out to be correct due to the structure of GETCUSTOMER. For this instance, an order-2 model is preferable to a model of order 3 or higher. This problem can also occur when predicting the probability of a query xi that has never been observed when the model was in state s, even if s has been observed during training.

predicting the probability of a query $x_i$ that has never been observed when the model was in state s, even if s has been observed during training.

The difficulty of providing a probability estimate for a situation that has not previously occurred has long been recognized as an issue of philosophical and practical importance. Hume [96] noted that there is no rational basis for using empirical observations of the past to make predictions of future conditions that do not match what we have measured, and Kant [100] expanded on this topic in his Critique of Pure Reason. In practice, we may need to assign a specific probability to events that have not previously been observed (even though there is no purely rational basis for making such an assignment). This problem affects fields such as data compression and information theory, where it is usually known as the zero frequency problem. Bell, Cleary and Witten [16] provide a description of the problem as it applies to the data compression field, and Witten and Bell [195] summarize solutions that have been applied for data compression. In arithmetic compression, a non-zero probability must be selected for every possible symbol; solutions to the zero-frequency problem in this field consist of choosing a specific probability estimate for each of the symbols that may occur next. In contrast to the compression field, the specific probability values are not important to Scalpel. When Scalpel has not observed a situation during the training period, it can simply choose not to prefetch in that case, without deciding on any particular probability estimate for the unobserved contexts.

The problem of low frequencies is most obvious when we have not observed a context at all during a trace; however, even when we do observe a context, it is likely that we observe it less frequently in a model with higher order k. For the contexts that we have observed only rarely, we are less confident in the inferences provided by the context. For example, consider a trace ending in `cdb'. In an order-2 context, we have observed `db' two times, and both times it was followed by `e'. This gives $\hat{p}(e \mid db) = 1$. In an order-3 context, we have observed `cdb' only one time, again followed by `e'. This result also gives $\hat{p}(e \mid cdb) = 1$. Both the order-2 and order-3 models have the same estimate for the probability of the next request being `e', but we might have more confidence in the order-2 prediction as it has been tested one more time. The difference for this short sample trace is slight, but we can imagine much longer cases where an order-$k_1$ model makes a prediction based on 1000 observations while an order-$k_2$ model has only one relevant observation to form a prediction. In a very real sense, we have less confidence in the prediction made by the order-$k_2$ model. We can formalize this doubt using a confidence interval for the predicted probability $\hat{p}(x_i \mid s)$.

DEFINITION 5.3 (CONFIDENCE INTERVAL FOR $p(x_i \mid s)$)
Let n be the count of times that state s was observed during training, and X be the count of how many times $x_i$ was observed when the trace was in state s. Let α be a pre-specified significance level, and let $\kappa = z_{\alpha/2}$ be the $100(1-\alpha/2)$-th percentile of the standard normal distribution. Let $\hat{p}(x_i \mid s) = X/n$. Let $\tilde{n} = n + \kappa^2$ and $\tilde{X} = X + \kappa^2/2$. Let $\tilde{p} = \tilde{X}/\tilde{n}$. We estimate $p(x_i \mid s)$ as $\hat{p}(x_i \mid s)$ and use the following confidence interval:

$$CI = \text{CONFIDENCE-INTERVAL}(X, n) = \tilde{p} \pm \kappa \sqrt{\frac{\tilde{p}(1-\tilde{p})}{\tilde{n}}} \qquad (5.7)$$

The confidence interval CI is based on a proposal by Agresti and Coull [6]. The interval is not centered about $\hat{p}$ (although it does contain $\hat{p}$). Instead, it is centered about a point $\tilde{p}$ that is closer to 0.5; the movement toward the center diminishes with increasing n. For example, with a 95% confidence interval, we have κ = 1.96. For the order-3 prediction of $\hat{p}(e \mid cdb)$, this gives an interval CI = [0.17, 1.04]. The order-2 prediction is slightly better at CI = [0.29, 1.05], and our hypothetical example with a longer trace giving 1000 observations is tightened to CI = [0.995, 1.001]. As these examples show, the interval CI may include values that are greater than 1 or less than 0. The Agresti-Coull definition that we use above is conservative in that it gives intervals that are wider than the Wilson intervals [194] they approximate, but we prefer this definition due to its simplicity. Appendix A provides more background on this choice of interval.

When making a decision about prefetching request `e', we must consider the confidence interval, the payoff if we guess correctly, and the penalty for guessing wrong. This topic is explored further in Section 5.3.

If k is too small, Scalpel may miss valuable special cases and parameter correlations. On the other hand, unnecessarily large values of k can lead to overly specific predictions. With longer contexts, each specific context will only be observed infrequently. Therefore, much longer training periods are needed to avoid missed predictions or excessively wide confidence intervals. Clearly there is some relationship between the number of states in the ‘true’ model of the application and the choice of k that will give the best predictions. However, the stochastic process associated with the client application is not available to Scalpel, and it can therefore not be used to select an appropriate k value.

For these reasons, Scalpel defers choosing context lengths until the end of its training period, at which point it needs to make effective prefetching decisions. During training, Scalpel builds a trie-based data structure to capture all of the relevant information about the training trace of n requests for all possible context lengths $0 \le k \le n$.
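To make Definition 5.3 concrete, the following Python sketch (our own illustration, not part of Scalpel) computes the Agresti-Coull interval of Equation 5.7 and reproduces the example intervals quoted above:

    import math
    from statistics import NormalDist

    def confidence_interval(X, n, alpha=0.05):
        # kappa = z_{alpha/2}, the 100(1 - alpha/2)-th percentile of the
        # standard normal distribution; 1.96 for a 95% interval
        kappa = NormalDist().inv_cdf(1 - alpha / 2)
        n_tilde = n + kappa ** 2           # adjusted trial count
        x_tilde = X + kappa ** 2 / 2       # adjusted success count
        p_tilde = x_tilde / n_tilde        # interval center, pulled toward 0.5
        half = kappa * math.sqrt(p_tilde * (1 - p_tilde) / n_tilde)
        return (p_tilde - half, p_tilde + half)

    confidence_interval(1, 1)        # order-3 `cdb': about [0.17, 1.04]
    confidence_interval(2, 2)        # order-2 `db':  about [0.29, 1.05]
    confidence_interval(1000, 1000)  # 1000 observations: about [0.995, 1.001]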

5.2.1.3 Finite State Models

The class of finite state models is an extension of the class of order-k models to a setting where the states in the model do not need to be identified with any particular string of queries.

DEFINITION 5.4 (FINITE STATE MODEL)
A finite-state model (FSM) M of requests is a five-tuple $M = \langle S, \Sigma, \delta, s_0, \hat{p} \rangle$, where S is a finite set of states, Σ is the set of all possible queries, $\delta : S \times \Sigma \mapsto S$ is a transition function, and $s_0$ is an initial state. The function $\hat{p} : S \times \Sigma \mapsto [0,1]$ is an estimated conditional probability mass function where $\hat{p}(x_i \mid s)$ estimates the probability of observing request $x_i \in \Sigma$ when the model is in state $s \in S$.

Scalpel's Pattern Optimizer generates a finite state model, with edges in the model annotated with prefetch actions that the Prefetcher should apply. The Prefetcher also uses a finite state model to track the current state at run-time and execute the selected prefetch activities.
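A minimal Python sketch of Definition 5.4 (our own illustration; the dictionary-based representation and method names are assumptions, not Scalpel's implementation):

    from dataclasses import dataclass, field

    @dataclass
    class FiniteStateModel:
        s0: str                                     # initial state s_0
        delta: dict = field(default_factory=dict)   # (state, query) -> next state
        counts: dict = field(default_factory=dict)  # (state, query) -> training count

        def p_hat(self, x, s):
            # estimate p(x | s) by normalizing the training counts for state s
            total = sum(c for (st, _), c in self.counts.items() if st == s)
            return self.counts.get((s, x), 0) / total if total else 0.0

        def step(self, s, x):
            # follow the transition function; fall back to s0 if undefined
            return self.delta.get((s, x), self.s0)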

5.2.1.4 Summary of Request Models

We consider a client application to be a stochastic process. At each position i in its request sequence, we have a random variable $X_i$. We are particularly interested in predicting the probability that the next request will be a particular query $x_i$, in order to assess whether it is worthwhile prefetching $x_i$. The stochastic process has an associated probability distribution $\Pr\{X_1 = x_1, X_2 = x_2, \ldots, X_{i-1} = x_{i-1}\}$ that assigns a probability to every sequence of requests. If we

knew this distribution, we could estimate the probability that the next request is $x_i$ based on the conditional probability mass function $p(x_i \mid x_1 x_2 \ldots x_{i-1})$.

The probability distribution Pr is not known to Scalpel, so we build a model during a training period. We use this model to estimate the conditional probability mass function as $\hat{p}(x_i \mid x_1 x_2 \ldots x_{i-1})$. Scalpel bases its model on a class called order-k models. These predict the probability of the next query based only on the previous k queries.

There does not appear to be a good reason to select any particular k value. Choosing a k that is too small risks missing important special cases, while large k values require long training periods in order to make useful predictions. Instead of selecting a single k value, Scalpel maintains information that constructs order-k models for all k values in parallel. At optimization time, an appropriate context length is selected for each prefetch decision. This choice is guided by a consideration of the expected benefits of the prefetch (if it is correct), the costs (if it is mistaken), and a confidence interval for the prediction that is constructed in a way that controls the number of mistakes we expect Scalpel to make. In order to construct all order-k models in parallel, Scalpel builds a suffix-trie data structure. This suffix trie encodes all of the statistical information that Scalpel needs to build the order-k models.

5.2.2 Suffix Trie Detection

A suffix trie for a string T of length m is a trie data structure [73] that contains the set of m words that are suffixes of T.

DEFINITION 5.5 (SUFFIX TRIE)
A suffix trie τ for string T of length m is a rooted, directed tree with exactly m leaves. Each edge in the tree is labelled with a character from T, and no two edges out of a node have the same label. Every node n is identified by the string corresponding to the edge labels on the path from the root to n. The string of each leaf $L_i$ is equal to the suffix $T[i..m]$ of string T.
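As a concrete (though quadratic) illustration of Definition 5.5, the Python sketch below inserts every suffix of a string into a trie; this is our own example, not the incremental algorithm of Figure 5.6, which builds the same structure one request at a time:

    class TrieNode:
        def __init__(self):
            self.children = {}   # character -> TrieNode
            self.count = 0       # occurrences of this context in T

    def build_suffix_trie(T):
        # insert every suffix T[i:], counting each context it passes through
        root = TrieNode()
        for i in range(len(T)):
            node = root
            for ch in T[i:]:
                node = node.children.setdefault(ch, TrieNode())
                node.count += 1
        return root

For example, build_suffix_trie("xabcdbe") creates a node for every context observed in the trace of Figure 5.7.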

Figure 5.6 provides an algorithm that builds a suffix trie data structure as queries are submitted. If we use the sequence trace notation described in Section 2.3, then each query is considered to be a character in the sequence T of requests contained in the trie. When the client application submits an OPEN(Q, parmvals) request, the Call Monitor component intercepts the call and passes the corresponding query Q and input parameter value list to MONITOR-OPEN. Similarly, when the application calls FETCH() to return a row from the query result, the Call Monitor invokes MONITOR-FETCH, passing it a copy of the returned row.

Scalpel records the input parameters in the OPEN call and output parameters returned by the FETCH call in order to perform correlation detection (as we did when recognizing nested request patterns). This correlation detection is described in Section 5.2.2.3.

Figure 5.7 shows the suffix trie built after each of the first few queries of Figure 5.3 has been submitted. Nodes in the trie represent the contexts that have been observed within the trace, and are labelled with unique identifiers. Edges are labelled with the subscripts of queries from the trace. Each node represents the context consisting of the queries labelled on the path from the root to that node. Thus, node 1 represents the context `a' while node 2 represents the context `ab'. The root node, labelled Λ, represents the single context of length k = 0. Each node has a suffix link, which is shown with a dashed link in Figure 5.7. Suffix links represent context generalization. For example, the suffix link for node 2, which represents the context `ab', points to node 3, which represents the more general context `b'. Figure 5.8 shows the suffix trie built for the entire trace of Figure 5.3. We have left the suffix links out of Figure 5.8 to reduce clutter. Figure 5.8 contains $ characters; the purpose of these is described next in Section 5.2.2.1.

5.2.2.1 Implicit Suffix Tries

The algorithm in Figure 5.6 builds an implicit suffix trie. The tree does not necessarily contain a leaf for every suffix of the trace as required by Definition 5.5. For example, in Figure 5.7 (e) we have observed a string of length 6, but there are only 5 leaves in the trie. The problem is that there is a suffix, `b', of the observed trace that is a prefix of another suffix (`bcdb'). If we wish to convert an implicit suffix trie into an explicit suffix trie, we do so by adding an out-of-band character to the end of the observed trace. This character (represented as $ in keeping with convention) does not appear elsewhere in the trace, so it avoids this prefix problem.

The implicit suffix trie captures all of the information observed during training. In principle, it would be possible to define all of our algorithms to work with this implicit structure. However, we require explicit suffix trees to implement pattern detection efficiently (discussed further in Section 5.2.3). For consistency, we use explicit suffix tries in the sequel. We add an out-of-band character at the end of a trace to ensure this property.

5.2.2.2 Estimating the Probability of a Future Request

If node n is identified by a string of length k, then we say that n is of order k. The order-k nodes of the suffix trie correspond to the nodes in the order-k models defined in Section 5.2.1.1. For example, node 6 of Figure 5.7 corresponds to context `b' while node 9 corresponds to `bc'. In the order-k models shown in Figure 5.5, edges are annotated with the number of times that the associated request was observed following the context of the node. These counts can be used to estimate the future probability of observing the request when in the context. We maintain similar information in the suffix trie by updating a count field on line 436. This count allows Scalpel to estimate the probability of future requests.

409 structure NODE
410     count = 0
411     parent = NIL
412     suffix = NIL
413     lastIn[i] = NIL                £ Last input param values
414     lastOut[j] = NIL               £ Last fetched row
415     correlations = ∅               £ Set of possible correlations
416 end
417
418 £ Record query Q, with input parameter values InVals, in suffix trie T
419 procedure MONITOR-OPEN( T, Q, InVals )
420     £ start with node of longest suffix in T
421     curr ← LONGESTSUFFIX( T )
422     lastnewchild ← NIL
423     while curr ≠ NIL do
424         n ← FINDCHILD( curr, Q )              £ get query Q child of curr
425         if n = NIL then
426             n ← NEW-CHILD( T, curr, Q )       £ make new Q child of curr in T
427             n.parent ← curr
428             n.correlations ← FIND-CORRELATIONS( n, InVals )
429             newchild ← n
430         else
431             VERIFY-CORRELATIONS( n, InVals )
432             newchild ← NIL
433         if lastnewchild ≠ NIL then lastnewchild.suffix ← n
434         lastnewchild ← newchild
435         n.lastIn ← InVals
436         n.count ← n.count + 1
437         prev ← n
438         curr ← curr.suffix
439     £ The suffix of the last new node is the root
440     if lastnewchild ≠ NIL then lastnewchild.suffix ← GETROOT(T)
441 end
442
443 procedure MONITOR-FETCH( T, fetchVals )       £ Record fetched values in trie T
444     curr ← LONGESTSUFFIX( T )
445     while curr ≠ NIL do
446         curr.lastOut ← fetchVals
447         curr ← curr.suffix
448 end

Figure 5.6: Code to build a suffix trie.

Figure 5.7: Suffix trie after the first 7 queries in the trace, shown in stages (a) `x', (b) `xa', (c) `xab', (d) `xabcd', (e) `xabcdb', and (f) `xabcdbe'. A diamond represents the longest suffix within the trie.

Figure 5.8: Suffix trie for trace of Figure 5.3. Suffix links are omitted.

LEMMA 5.6 (CONTEXT FREQUENCY IS MEASURED BY THE COUNT FIELD)
Let w be a string associated with a node in the implicit trie for string T of length m. The count field of the node associated with w gives the number of occurrences of word w in T.

PROOF. Let y be the prefix of w and a the last character of w, so that w = ya. For each instance of w in T, MONITOR-OPEN is invoked with a after processing αy for some string α of length l. The LONGESTSUFFIX function returns the node associated with αy, and the MONITOR-OPEN procedure visits each suffix α[1..l]y, α[2..l]y, ..., α[l..l]y, eventually reaching node y. Either it finds a pre-existing node ya = w and increments its count field, or it creates a new node and initializes count to 1. The only way to reach line 436 with node w is after finding w as the a-child of y when αy was the result of LONGESTSUFFIX for some α. Therefore, the count of w is updated exactly once for every occurrence of w in T, which is the desired property. □

Lemma 5.6 tells us that the count field of node w counts how many times w was observed in the trace T during the training period. We can estimate the probability of observing a request a after a sequence w by considering the ratio of the counts of wa and w.
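This count-ratio estimate is direct to compute over the TrieNode sketch shown earlier (again our own illustration, not one of the thesis figures):

    def estimate_probability(root, w, a):
        # estimate p(a | w) as count(wa) / count(w)
        def count(s):
            node = root
            for ch in s:
                node = node.children.get(ch)
                if node is None:
                    return 0      # context never observed in training
            return node.count
        n = count(w)
        return count(w + a) / n if n else None   # None: the zero-frequency case

For the trace `xabcdbe', estimate_probability(root, "d", "b") returns 1.0, matching the observation that `d' was always followed by `b'.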

5.2.2.3 Tracking Parameter Correlations

The suffix trie structure that we have described so far can be used to predict the likelihood of future requests. In addition to this probabilistic information, we must be able to predict the actual parameter values that would be used if the request were submitted. Scalpel therefore also maintains information about parameter correlations. We use an approach similar to the one we used for nested request patterns (Section 3.2.2). We record the values of input and output parameters in the suffix trie data structure, and look for correlations that have always held throughout the trace. In Section 3.2.2, we considered three possible predictors of future actual parameter values. We consider these same three predictors for batch query patterns; however, we consider previous queries that have already been closed instead of the currently open outer queries.

Constants (‘C’) If a query parameter is always supplied with the same value in every instance of a query within our training period, we may conclude that the parameter is a “variable that won’t”.1

Input Parameters (‘I’) A query parameter $p_i$ may always have the same value as a parameter $p_j$ of some previously submitted query. For example, in Figure 5.2, the vendor info.id parameter of query Qe on line 404 is always equal to the vendor info.id parameter of query Qd on line 396. At run-time, after observing Qd, we can predict the value that will be used for the parameter of Qe.

Output Parameters (‘O’) An input parameter may instead be correlated to a value returned by a previous query. For example, the r1.accno parameter of Qc (line 380) is always equal to the second column of the preceding Qa (line 373). At run-time, we can use the results of the previous query to predict the value used in a future request.

Recall that for nested request patterns, we maintained a scope field for each node in the context tree. This scope listed a set of correlation source objects that were possible predictors of input parameter values.

1 From Osborn’s Law: “Variables won’t, constants aren’t”, Don Osborn

449 structure CORRELATION
450     inparam = ?               £ The input parameter predicted by this object
451     type = ?                  £ The type of correlation: C-constant, I-input, or O-output
452     value = NIL               £ For type C, the constant value
453     prevcnt = NIL             £ For type I or O, the distance to the source query
454     param = NIL               £ For type I or O, the parameter number
455 end
456
457 £ Find possible correlations for node n and input parameter values InVals
458 function FIND-CORRELATIONS( n, InVals )
459     for i ← 1 to InVals.length do
460         prevcnt ← 0
461         curr ← n.parent
462         corrs ← { new CORRELATION(i, C, InVals[i]) }
463         while curr ≠ NIL do
464             prevcnt ← prevcnt − 1
465             for j ← 1 to curr.lastIn.length do
466                 if InVals[i] = curr.lastIn[j] then
467                     corrs ← corrs ∪ { new CORRELATION(i, I, prevcnt, j) }
468             for j ← 1 to curr.lastOut.length do
469                 if InVals[i] = curr.lastOut[j] then
470                     corrs ← corrs ∪ { new CORRELATION(i, O, prevcnt, j) }
471             curr ← curr.parent
472     return corrs
473 end
474 £ Verify correlations for node n and input parameter values InVals
475 procedure VERIFY-CORRELATIONS( n, InVals )
476     for c ∈ n.correlations do
477         if InVals[ c.inparam ] ≠ CURR-VALUE( c ) then
478             n.correlations ← n.correlations \ { c }
479 end
480 £ Find the current value of a correlation source c relative to node n
481 function CURR-VALUE( n, c )
482     if c.type = C then return c.value
483     else
484         for prevcnt ← c.prevcnt to 0 do
485             n ← n.parent
486         if c.type = I then return n.lastIn[ c.param ]
487         else return n.lastOut[ c.param ]
488 end

Figure 5.9: Code to monitor parameter correlations.

Figure 5.10: Suffix trie with possible correlations after (a) `xabcd' and (b) `xabcdb'.

For batch request patterns, the scope for a node in the suffix trie contains all of the input and output parameters of queries that are ancestors of the node in the trie. In this way, we allow a node in the trie to represent correlations to any query on a path from the node to the root. For batch request patterns, we don't explicitly maintain the scope field (in order to allow a linear-space implementation, described in Section 5.2.3). The input parameter values for the most recent instance of a node's query are recorded in the lastIn field, and the values returned by the most recent FETCH are recorded in the lastOut field.

The correlations field of a node records possible sources of values that could be used to predict the value of each input parameter. This field is maintained by the FIND-CORRELATIONS and VERIFY-CORRELATIONS procedures. Scalpel uses an object of type CORRELATION (line 449) to represent an observed correlation between an input parameter (identified by index in inparam) and a source of values that may predict actual parameters at run-time. This structure is similar to the structure of the same name used for nested request patterns (Figure 3.7). In Chapter 6, we describe how the two types of correlations are maintained in parallel; for now, we present results as if CORRELATION objects are used only for batch request patterns. For nested request patterns, we recorded the source of an input or output correlation in the context field using a reference to the appropriate context that was the source of values. For batch request patterns, we instead use an integer value in the prevcnt field. This alternate representation is needed to allow for an efficient implementation of the suffix trie correlation detection (described in Section 5.2.3.2).

For objects of type ‘I’ and ‘O’, the prevcnt field gives the distance to the preceding query that contains the value of interest. For example, the immediately preceding query has prevcnt of −1, while the query preceding that is recorded as −2. The param field of the CORRELATION object records the parameter number that is predicted to be the source of future actual values. For CORRELATION objects of type ‘C’, the value member records the associated constant value; the prevcnt and param fields are unused.

The last column in Figure 5.3 shows the CORRELATION objects that Scalpel finds for each request in the trace. For example, in position 3 request Qb has input parameters (101). The value 101 matches the first input parameter of the immediately preceding query, Qa. Scalpel therefore finds that the first parameter of Qb could be correlated to the constant value 101, recorded as ⟨1|C, 101⟩. In this representation, the first number (1) is the inparam index of the input parameter being predicted, the ‘C’ indicates a correlation to a constant, and 101 is the constant value.

Alternatively, Scalpel finds that the first parameter of Qb could also be correlated to input parameter 1 of the immediately preceding query, and this is recorded as ⟨1|I,−1, 1⟩. The first number (1) again indicates the inparam index, the ‘I’ indicates this is an input parameter correlation, the −1 indicates the source is the immediately preceding query, and the last value indicates it is the first input parameter of the preceding query we are interested in.

The correlations between the input parameters of a request and previously observed values depend on the context in which the request is submitted. In the context tree structure used to detect nested request patterns (Section 3.2.2), each node in the tree represented a context. When a new node was created, we initialized the correlations field of the node with a set of correlations that held at that time. Every subsequent time the context was matched during the training period, we re-checked each of the correlations stored in the correlations set, removing any that no longer held. For batch request patterns, each context is represented by a node in the suffix trie structure. The correlations field of node n is initialized by FIND-CORRELATIONS with a set of CORRELATION objects, and this set is re-checked by VERIFY-CORRELATIONS every time the context associated with the node is subsequently observed. In this way, the correlations field of a node is maintained with the set of correlations that have always held when observing the query leading into n.

Figure 5.10 shows the set of CORRELATION objects maintained for each node after processing (a) `xabcd' and (b) `xabcdb'. In Figure 5.10 (a), each request has only been seen one time. As each node is added, MONITOR-OPEN calls FIND-CORRELATIONS (line 428). The correlations set for each newly created node is based on the parameter values for the current request and preceding request values. Because of this, the correlations set is the same for each of the nodes created in response to a single request, except that shorter contexts eliminate some of the correlations that are longer than the context. For example, contexts `xab' and `ab' both have their set initialized to {⟨1|C, 101⟩, ⟨1|I,−1, 1⟩}, while the set for context `b' is initialized to {⟨1|C, 101⟩} because there is no preceding query in that context.

The correlation set {⟨1|C, 101⟩} stored for context `b' represents the fact that the first parameter of Qb has always been equal to the value 101 every time it has been observed so far in the trace. This gives us a prediction that the parameter will always have this value in the future. In this case, the prediction turns out to be false, and Scalpel detects this after observing the last request in `xabcdb'. The MONITOR-OPEN function finds that node `b' already exists in the trie, and calls VERIFY-CORRELATIONS (line 431). The result of CURR-VALUE(⟨1|C, 101⟩) does not match the current parameter value of 201, so the correlation source is removed as a possible predictor.
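One way to picture these ⟨inparam | type, ...⟩ annotations is as small records that are re-checked on every observation of their context; the Python sketch below is our own illustration of the three predictor types, not Scalpel's implementation:

    from dataclasses import dataclass

    @dataclass
    class Correlation:
        inparam: int          # 1-based index of the predicted input parameter
        type: str             # 'C' (constant), 'I' (input), or 'O' (output)
        value: object = None  # for 'C': the constant value
        prevcnt: int = 0      # for 'I'/'O': negative distance to the source query
        param: int = 0        # for 'I'/'O': 1-based parameter/column number

    def still_holds(corr, invals, history):
        # history holds (invals, outvals) pairs for preceding requests,
        # most recent last, so history[corr.prevcnt] is the source query
        observed = invals[corr.inparam - 1]
        if corr.type == 'C':
            return observed == corr.value
        src_in, src_out = history[corr.prevcnt]
        src = src_in if corr.type == 'I' else src_out
        return observed == src[corr.param - 1]

For example, the guess ⟨1|C, 101⟩ is Correlation(1, 'C', value=101), and still_holds returns False as soon as the first parameter of Qb arrives as 201 instead of 101.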

5.2.2.4 Summary of Suffix Trie Detection

At the end of the training period, the suffix trie structure contains a node for every context (of any length k) that was observed during the trace. The contexts contain a count field that is used to estimate the likelihood of future requests, and they contain a correlations field that records correlations that have always held between input parameters and a known value every time that the context was observed. Scalpel uses this information to select appropriate rewrite rules that will be used at run-time.

LEMMA 5.7 (SUFFIX TRIE BUILDS $O(n^2)$ NODES)
A suffix trie τ for string T of length n contains at most $1 + n(n+1)/2$ nodes.

PROOF. When processing character i, we have the longest suffix as T[1..i] of length i. Every iteration k (line 423) moves curr to a suffix node T[k..i]. There can be at most i such suffixes before we reach the root, so at most i nodes are added when processing character i of string T. Summing (including the root node), we get:

$$|\tau| \le 1 + \sum_{i=1}^{n} i = 1 + \frac{n(n+1)}{2} \qquad (5.8)$$

Further, this worst-case bound is achievable. Consider the string $a_1 a_2 \ldots a_m$ where each character is novel. The FINDCHILD call (line 424) will never find an existing child, so i nodes are added on each step. □

We can implement FINDCHILD in amortized O(1) time by either using an array of size $|\Sigma|$ or a dictionary such as a hash table (using the RAM model). In addition to the $O(n^2)$ time and space needed to build the nodes of the suffix trie, the correlation detection requires additional space and time. In the worst case, every input parameter is equal to every previous input or output parameter. This work increases the overall time bound to $O(n^3 m^2)$, where m is the number of parameters per query.

5.2.3 A Path Compressed Suffix Trie

For traces of significant size, the asymptotic complexity of MONITOR-OPEN is likely to be impractical. The asymptotic $O(n^2)$ bound can be achieved even with small alphabets $|\Sigma| \ll n$. For example, consider the string $a^j b^j$, which gives the implicit suffix trie shown in Figure 5.11(a).

Figure 5.11: Atomic suffix trie (a) and path-compressed suffix trie (b) for $a^j b^j$, j = 3.

Figure 5.11(a) provides a suggestion for reducing the overall size of the trie. In Figure 5.11(a), a number of nodes have only one child, and this was also true in Figure 5.8. We can combine these branchless nodes, giving a path compressed suffix trie as shown in Figure 5.11(b). Path compression is a general approach for reducing the size of a trie. It was first introduced by Morrison [136] as the Patricia tree data structure, and this term is often used in the literature to describe this type of structure. In the case of suffix tries, the term suffix tree is used to indicate a suffix trie which is encoded with path compression. It would perhaps be better to follow Nilsson and Tikkanen [141] and use the term path compressed suffix trie. This term is verbally distinguishable from suffix trie (according to an editor's note, Fredkin apparently used "trie" as a derivation of reTRIEval; this would lead to ‘tree’ and ‘trie’ being homonyms). Further, ‘path compressed suffix trie’ emphasizes both the underlying trie-based nature of the structure and its compact encoding. However, we accept current practice and refer to a suffix tree when we mean a path compressed suffix trie, and we pronounce trie as ‘try’ to avoid verbal ambiguity.

DEFINITION 5.8 (SUFFIX TREE)
A suffix tree $\mathcal{T}$ for string T of length m is a rooted, directed tree with exactly m leaves $L_i$ labelled with $i = 1 \ldots m$. Every internal node (excluding the root) has at least two children. Each edge in the tree is labelled with a nonempty substring of T, and no two edges out of a node have the same first character. Every node n is identified by the string corresponding to the edge labels on the path from the root to n. Each leaf $L_i$ is identified by the suffix $T[i..m]$ of string T.

As with the suffix trie, we run into difficulty if a suffix of T is a prefix of another suffix of T. We avoid this difficulty using the out-of-band character $, and define an implicit suffix tree built from a prefix T[1..i] to be a suffix tree that may not have the desired number of leaves. An (implicit) suffix tree $\mathcal{T}_i$ for a prefix T[1..i] can be created from the (implicit) suffix trie $\tau_i$ by combining all branch-free nodes into a single edge.

Path compression for a trie with m leaves gives a structure with at most 2m + 1 nodes, which is an asymptotic improvement over the original definition. We could build a suffix tree by first building the associated suffix trie structure then merging paths of branch-free nodes into a single edge. This approach would not, however, improve our asymptotic bound. Instead, we implement an O(n) algorithm introduced by Ukkonen [180] (using a slightly different presentation that more closely matches our existing MONITOR-OPEN procedure). Figure 5.12 shows the implementation of MONITOR-OPEN-COMP, which builds a suffix tree for a trace while monitoring correlations that have always held, all in O(n) time and space. Figure 5.13 shows a trace of the steps taken by MONITOR-OPEN-COMP when processing `aaabbb'.

In order to meet the O(n) requirements, we need to be careful about our representation and implementation. First, although the definition of a suffix tree suggests that edges are labelled by strings, we cannot store distinct copies of these strings in O(n) space: there are n(n + 1)/2 characters in the n suffixes of an n-character string. Instead, we use an object of type STRPTR that represents a substring of T by storing the index first of its first character and last of the last character (exclusive).

We use an object of type NODEPTR to refer to virtual nodes within the suffix tree. These virtual nodes are positions where the suffix trie has a branch-free node. The NODEPTR object consists of field s, an explicit node (one that is actually present in the suffix tree), and a string path represented by a STRPTR object. This string specifies a path to follow from node s to the implicit node, which may either be an explicit node or a position ‘within an edge’. For example, in Figure 5.13 step 2:1, we have a NODEPTR of (0, `a') that refers to a virtual node. There are multiple NODEPTR values that refer to the same node: in Figure 5.13 step 6:1, NODEPTR values

489 structure STRPTR                      £ Represent a substring of T.trace
490     first = 0                         £ The index of first character
491     last = 0                          £ The index of last character (not inclusive)
492 end
493 structure NODEPTR
494     s = NIL                           £ The Node starting the path
495     path = new STRPTR(0,0)            £ The string on the path leading out of s
496 end
497 structure TREE
498     trace = ε                         £ The string of queries so far
499     tracker = new CORRTRACKER         £ S-sized sliding dictionary for correlation detection
500     correlations = [ ]                £ A list of sets of CORRELATION objects
501     root = new NODE                   £ The root of the tree
502 end
503 £ Record query Q, with input parameter values InVals, in suffix tree T
504 procedure MONITOR-OPEN-COMP( T, Q, InVals )
505     £ Start with node of longest suffix in T that has been observed at least twice
506     curr ← LONGESTSUFFIX2( T )
507     lastnewchild ← NIL; end_point ← FALSE
508     £ Add request Q to the trace
509     T.trace ← T.trace + Q
510     £ Append the set of correlations that currently hold to list correlations
511     corrs ← FIND-CORRELATIONS-COMP( T, InVals )
512     T.correlations ← T.correlations + corrs
513     while curr ≠ NIL and not end_point do
514         n ← FINDCHILD( T, curr, Q )           £ get query Q child of curr
515         if n = NIL then
516             n ← NEW-CHILD( T, curr, Q )       £ make new Q child of curr in T
517             newchild ← n.s
518         else
519             end_point ← VERIFY-CORRELATIONS-COMP( n, InVals )
520             newchild ← NIL
521         if lastnewchild ≠ NIL then lastnewchild.s.suffix ← n.s
522         lastnewchild ← newchild
523         curr ← GETSUFFIX( curr )
524     £ The suffix of the last new node is the root
525     if lastnewchild ≠ NIL and lastnewchild.s ≠ T.root then
526         lastnewchild.suffix ← T.root
527 end

Figure 5.12: Code to build a suffix tree.

Step  curr     n (517)  n (520)
1:1   (0,)     NIL      (0, a)
2:1   (0,)     (0, a)   NIL
3:1   (0, a)   (0, aa)  NIL
4:1   (0, aa)  NIL      (2, b)
4:2   (0, a)   NIL      (4, b)
4:3   (0,)     NIL      (0, b)
5:1   (0,)     (0, b)   NIL
6:1   (0, b)   (0, bb)  NIL

Figure 5.13: Steps of building a suffix tree for `aaabbb'. The first column shows the current step i:j, where i is the number of the call to MONITOR-OPEN-COMP and j is the number of the iteration within one call. The next column shows the value of the curr variable on line 514. The next two columns show the value of the n variable after creating a new child (line 517) or finding an existing one (line 520). The last column of the original figure, which shows the structure of the tree at line 522, is omitted here; nodes are labelled in order of creation, with 0 used for the root node.

Step  curr     n (517)  n (520)
7:1   (0, bb)  NIL      (8, $)
7:2   (0, b)   NIL      (10, $)
7:3   (0,)     NIL      (10, $)

Figure 5.14: Steps to convert the implicit suffix tree into an explicit suffix tree. Columns have the same meaning as in Figure 5.13 (again with the tree-structure column omitted). In the explicit suffix tree, there is a distinct leaf node for each suffix of the trace, and we can count the occurrences of any substring by the number of leaf nodes in the sub-tree below the associated node.

of (0, `aa'), (4, `a'), and (2,) all refer to the same explicit node. We say that a NODEPTR n = (s, x) is in canonical form if there are no explicit nodes on the path from s following x.

The insight Ukkonen used in his presentation is that we can exploit the NODEPTR representation of edges to have all of the leaf nodes in the implicit suffix tree $\mathcal{T}_i$ automatically extended as we add new characters. When creating a new leaf node, the algorithm labels the node with a STRPTR object with a last value of ∞. The interpretation of this is that the string extends to the right, including all new characters processed so far. When we add a new character to T.trace, we extend all of the edges to these leaf nodes with the new character.

This ∞ trick takes care of adding virtual nodes at the end of every branch-free edge to a leaf node, and this is fundamental to the asymptotic improvement in running time. The original suffix trie algorithm started updating at LONGESTSUFFIX and added new nodes by following suffix links. For the first part of this update step, existing leaf nodes were extended to form new leaf nodes, until eventually the suffix link traversal reached an internal node. With the ∞ trick used by Ukkonen, all of the leaf nodes that are to be updated are implicitly updated as soon as we add a new character. Instead of starting the loop at LONGESTSUFFIX, we will instead start at the first internal node that was encountered in the original traversal. This internal node is the first node that is not implicitly updated by the ∞ trick, and it is the first place that a new explicit node might need to be created.

We use LONGESTSUFFIX2 to identify this internal node that represents the new update point. In the suffix tree structure, the internal node could be explicit or virtual. As we show in the following lemma, if we define LONGESTSUFFIX2($\mathcal{T}_i$) to be the node in tree $\mathcal{T}_i$ that is the longest context that has been observed at least two times in the trace T[1..i], then we identify the appropriate update position, which is the first location in the suffix-link traversal where a new explicit node might be created when generating suffix tree $\mathcal{T}_{i+1}$.

LEMMA 5.9 (LONGESTSUFFIX2 IDENTIFIES THE UPDATE POINT)
If x is an explicit node in implicit suffix tree $\mathcal{T}_{i+1}$, then x has been observed at least twice in $\mathcal{T}_i$.

PROOF. Since x is explicit in $\mathcal{T}_{i+1}$ for T[1..i+1], it is branching. Therefore, there must be y = xa and z = xb for $a \neq b$ in T[1..i+1]. Hence, x must occur at least twice in T[1..i]. □

By Lemma 5.9, LONGESTSUFFIX2 identifies the most specific position where we might need to create an explicit node. The FINDCHILD function checks if there is already a Q-labelled edge from curr. If curr is an explicit node, this is checked by looking in a dictionary of child nodes. Alternatively, if curr is a virtual node, then curr has a Q child n if Q matches the next request on the branch-free edge from curr. If Q does match, then the child n may either be another virtual node, or an explicit node. If FINDCHILD returns NIL, then a new child is created by a call to NEW-CHILD (line 516).
We create a new leaf node child of curr starting with Q. In the case that curr is already an explicit node, NEW-CHILD adds a new leaf node as a direct child of curr, and returns (curr, Q) as the new (virtual) child node. In step 1:1 of Figure 5.13, a new leaf node is created as a child of the root (node 0), and the newchild variable is set to (0, a). Alternatively, if the NODEPTR curr is a virtual node, then we must first split the edge that curr is on, creating a new internal node. This is performed in each of the iterations 4:1, 4:2, and 4:3 when request `b' is observed. NEW-CHILD returns a NODEPTR object in canonical form. The implementation of NEW-CHILD is relatively straightforward, and is omitted here.

When a new child node is created, the suffix field is NIL. We never set this field for leaf nodes (nor do we attempt to follow it). The suffix field of newly created internal nodes is set on the next loop iteration (or after the loop has terminated).

In Figure 5.6 (line 438), curr is moved to its suffix using the suffix field. In Figure 5.12 (line 523), the GETSUFFIX function is used instead. If curr has the value (s, ax), then GETSUFFIX(curr) returns (s, x) if s is the root node; otherwise, (s.suffix, ax).
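A minimal sketch of this suffix step (our own illustration; Scalpel's version would also re-canonicalize the result, since the returned pair may name an explicit node or cross one):

    def get_suffix(node_ptr, root):
        # node_ptr is a (s, path) pair standing for the context s's string + path
        s, path = node_ptr
        if s is root:
            # dropping the first character of path drops the first
            # character of the represented context
            return (root, path[1:])
        return (s.suffix, path)   # otherwise follow the suffix link of s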

5.2.3.1 Estimating the Probability of a Future Request

A final structural item we must account for is the count fields. In the original MONITOR-OPEN algorithm (Figure 5.6), we updated the count field for each context on the suffix path from the longest suffix to the root. That update gives the desired frequency property that we need to predict future queries (Lemma 5.6). In the path compressed suffix trie, we cannot perform such updates in linear time. Instead, we defer calculating the count for a node until after we have fully processed string T and, further, added the out-of-band character $.

LEMMA 5.10 (FREQUENCY IS MEASURED BY NUMBER OF LEAVES)
Let w = xy be a sub-string of T with canonical NODEPTR (x, y) in the explicit suffix tree $\mathcal{T}$ for T$. Let c be the child node of x found by following y (if y = ε, then c = x). The number of occurrences of w in T is given by the number of leaf nodes in the sub-tree rooted at node c.

PROOF. Let $k_i$, $i = 1 \ldots l$, be the first character position in T of each of the l occurrences of w. Let z be the string on the path from (x, y) to c. Because the edge $x \xrightarrow{yz} c$ is non-branching, we know that xy has always been followed by z in T. Further, by the properties of suffix trees, the suffix $T[k_i..m]$ is a leaf in $\mathcal{T}$ with a label that starts with xy. Since the label starts with xy, xyz must also be a prefix. Therefore, the leaf $T[k_i..m]$ is in the sub-tree rooted at c = xyz. Finally, if any leaf is in the sub-tree rooted at c, the path to the leaf starts with w = xy, so it must be one of the $k_i$ occurrences accounted for. Therefore, the number of leaf nodes in the sub-tree rooted at c is exactly the number of occurrences of string w. □
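In code, Lemma 5.10 reduces frequency estimation to a leaf count below a node; a short recursive helper (our own sketch, over nodes with a children dictionary):

    def count_occurrences(node):
        # number of occurrences of the context ending at this node, measured
        # as the number of leaves in its sub-tree (Lemma 5.10)
        if not node.children:    # a leaf: one suffix, hence one occurrence
            return 1
        return sum(count_occurrences(c) for c in node.children.values())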

Figure 5.15: Suffix tree for trace of Figure 5.3. Suffix links are omitted to avoid clutter.

By using the approach of Ukkonen, we have shown an algorithm, MONITOR-OPEN-COMP, that builds a suffix tree for string T in $O(|T|)$ time and space. Figure 5.15 shows the suffix tree that is built for the trace of Figure 5.3. In addition to the structural changes that give linear complexity, we have adapted our correlation detection to run in linear space and time.

5.2.3.2 Tracking Correlations

In the original MONITOR-OPEN algorithm (Figure 5.6), the FIND-CORRELATIONS procedure for a node n considers all of the parent nodes of n when finding initial correlations; this leads to $O(m^2)$ time and space complexity in the trace length m. Further, when request k + 1 is observed, the k generalizations of the longest suffix are adjusted with VERIFY-CORRELATIONS-COMP. We cannot support these operations directly in linear time and space.

The problem of space complexity is inherent in our definition of the correlations we are willing to consider. In the suffix trie, we considered that each of the parameters of request k + 1 can be correlated to any of the parameters of any of the k previous requests. Instead of allowing correlations of unbounded length, in the suffix tree representation we restrict the sources we consider to the S previous requests, where S is a configuration parameter used to indicate the scope length that we consider. With this change, the size of the correlations sets for each input parameter is constant with respect to the trace length m. There is still a quadratic dependence on r, the number of parameters per query: in the worst case, each input parameter can be correlated to all of the parameters of prior queries. This leads to $O(Sr^2)$ space needed for correlation sources for each of the nodes.

We use an object named tracker (line 499) to maintain a dictionary over a sliding window of the S previous requests. This dictionary maps a value to a set of (type, prevcnt, param) triples that identify prior input and output parameter values equal to the given value.

While the introduction of the scope length S makes the number of correlation sources for each node constant with respect to trace size, there is another source of quadratic space complexity. In the original implementation of MONITOR-OPEN, we stored correlations for each of the $O(n^2)$ contexts encountered during the trace. The path compression reduced the number of physical nodes in the tree to Θ(n), but it is not clear that path compression is helpful in reducing the recorded correlation sources. In fact, the very branch-free paths compressed in the suffix tree may be exactly the sequences that we would like to prefetch, provided that we can predict the actual parameter values that will be used.

The algorithm of Ukkonen represents a suffix trie compactly by storing edge labels as indexes into the string of requests. Many edges use the same character for their label, which gives the asymptotic space savings. We use a similar approach for storing correlation information. We store a single sequence T.correlations for the suffix tree T. Each element in the sequence is a set of CORRELATION objects. For each string s that appears in trace T, we define an index in this sequence that holds the CORRELATION objects for string s; we call this position the correlation home for string s.

DEFINITION 5.11 (CORRELATION HOME)
The correlation home for a string s in trace T is the index of the last character of the first occurrence of s in T. For example, in the trace shown in Figure 5.16, the correlation home for string `b' is 3, while the correlation home for string `db' is 6.
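With 1-based trace positions, the correlation home is simple to compute; a one-line helper (our own illustration, assuming s does occur in the trace):

    def corr_home(trace, s):
        # index (1-based) of the last character of the first occurrence of s
        return trace.index(s) + len(s)   # str.index is 0-based

    corr_home("xabcdbe", "b")    # 3, as in Definition 5.11
    corr_home("xabcdbe", "db")   # 6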

More than one string may have the same correlation home. For example, in Figure 5.16 the strings `b', `ab', and `xab' all have the same correlation home (position 3). We would like to store the CORRELATION objects for all of these strings in one set. However, each of these strings may have a distinct set of CORRELATION objects. If we store these distinct sets separately, we haven't saved any space. Fortunately, we are able to exploit a trick that relies on a subset relationship between the stored sets.

LEMMA 5.12

Let $s_1$ and $s_2$ be two sub-strings of trace T where $s_1$ and $s_2$ have the same correlation home position. Without loss of generality, let $s_1$ be the shorter of these two strings. Then CORRELATIONS($s_1$) ⊆ CORRELATIONS($s_2$), where CORRELATIONS(s) is the set of CORRELATION objects that are associated with s.

PROOF. Since $s_1$ and $s_2$ have the same correlation home position, we know they both end at the same character position in T. This tells us that $s_1$ must be a suffix of $s_2$. Therefore, every time that we have observed string $s_2$, we have necessarily observed its suffix $s_1$. Hence, there cannot be a correlation that has always held after observing $s_1$ but has not held at least once when observing $s_2$. Thus, there cannot be a CORRELATION object $c \in$ CORRELATIONS($s_1$) where $c \notin$ CORRELATIONS($s_2$). This gives the stated subset relationship. □

The subset relationship described in Lemma 5.12 tells us that we can store a single set C at each correlation home position m. If this set C contains all of the CORRELATION objects for the longest string s associated with position m, then it necessarily contains all of the CORRELATION objects for all strings $s_i$ associated with position m. It remains only to define the function CORRELATIONS(T, $s_i$) that identifies the subset of C that applies to each string $s_i$.

We accomplish this by augmenting the CORRELATION object with a minorder field. The CORRELATIONS(T, s) function interprets the set of stored CORRELATION objects, returning only those whose minorder field is less than the length |s|. When we wish to remove a CORRELATION object c from the set associated with a string s, we do so by increasing the c.minorder value to |s|. In this way, we remove c from the results of CORRELATIONS(T, s) without affecting the stored set for any longer strings that have the same correlation home. The minorder trick allows us to represent the correlations for multiple strings in a single location.
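The shared-set scheme can be sketched in a few lines (our own illustration following the prose above, which raises minorder to |s|; the pseudocode of Figure 5.17 uses STRINGORDER(n) + 1 instead):

    def correlations(home_sets, home, order):
        # stored correlations at a home position, restricted to a context
        # of length `order' (STRINGORDER in the pseudocode)
        return {c for c in home_sets[home] if order > c.minorder}

    def remove_for_context(c, order):
        # drop c for contexts of length <= order while keeping it for
        # longer contexts that share the same correlation home
        c.minorder = max(c.minorder, order)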

Figure 5.16 shows the implicit suffix tree after processing (a) `xabcd', (b) `xabcdb', and (c) `xabcdbe' (the first two correspond to Figure 5.10(a) and (b) respectively). The edges in these trees are represented by STRPTR objects (indexes shown), and this allows sharing between edges. For example, the `a' at position 2 is used in edge `xabcd' and `abcd'. Figure 5.16 (e) shows the contents of the T.correlations list of sets after processing `xabcdbe'. The elements of these sets are CORRELATION objects extended with the minorder field.

As each request in the trace is processed, the correlations list is extended (line 512) with the correlations found by FIND-CORRELATIONS-COMP. The elements in the set appended at step i correspond to the result of FIND-CORRELATIONS in Figure 5.6 (line 428) on the first iteration, with curr equal to the longest suffix in the existing tree. This generates all correlations that hold when considering the entire list of preceding queries as the context. The corresponding set of CORRELATION objects applies in the most specific context, but they do not all apply for some generalizations. For example, Figure 5.10 showed that the correlation ⟨1|O,−2, 2⟩ is included in the set for character `c' when it is observed in context `xabc' and `abc', but not when it is executed in `bc'. In this last context, there is only one preceding query to consider, so we cannot include correlations to earlier requests. The minorder field of this correlation is initialized to 2. The CORRELATIONS function therefore excludes this correlation when considering context `bc' because |bc| = 2.

The position 4 in the correlations list represents the request `c', and in the path-compressed suffix trie it is used for multiple edges. In Figure 5.16(a) it is used for the 4 longest edges to represent information about request `c' when observed in the contexts `xab', `ab', `b', and ε. The set of CORRELATION objects shown in position 4 is correct for the first context `xab', but it contains a superset of the desired elements when we consider the generalization contexts `ab', `b', and ε. We encode this subset property with the associated minorder field. The minorder field gives the minimum context length for which the correlation has always held. We use a minorder value of 2 in the CORRELATION object ⟨1|O,−2, 2|2⟩. The value 2 indicates that the correlation at position 4 has held every time we have observed request `c' when the trace is in context T[4−2..4−1] (that is, when the trace is in context `ab').

When we create a new CORRELATION object for an input or output type correlation, we initialize the minorder field to the value of −prevcnt. In addition to setting the minorder for new objects, we must also be able to remove a CORRELATION object associated with a particular edge without affecting the more specific edges sharing the same list position. For example, after processing the second `b' request, we recognize that the first parameter is not always equal to the constant 101. We would like to remove the CORRELATION object ⟨1|C, 101|0⟩ from context ε, while retaining it for context `a'. We accomplish this removal operation by incrementing the minorder field, giving ⟨1|C, 101|1⟩, which is shown in Figure 5.16.

Figure 5.16: Steps of tracking correlations for a suffix tree. Figures (a), (b), and (c) show the tree structure after processing `xabcd', `xabcdb', and `xabcdbe' respectively. Edges are labelled with the STRPTR indexes and the associated sub-string. Figure (d) shows the trace list of characters processed in the trace, (e) shows the correlations list of sets of CORRELATION objects that are stored in each position, and (f) gives the numerical index of each trace position.

As we check correlations with the VERIFY-CORRELATIONS-COMP procedure, we increase the minorder value to effectively ‘remove’ a correlation that no longer holds in a particular context.

The FIND-CORRELATIONS-COMP procedure (Figure 5.17) generates the set of CORRELATION objects that hold for an observed request. For each input parameter i, it adds a new CORRELATION object of type C (line 531) to represent the possibility that the request always has the same value for the parameter. Next, it calls CURRSOURCES to find all of the correlation sources within the S-length window of previous requests that match the current input value. The CURRSOURCES method is implemented as a map from a value to a set of CORRELATION objects. By using techniques such as hashing, it can be implemented in amortized O(Sr) time, proportional to the number of matching sources. Each triple (type, prevcnt, param) is used to generate a new CORRELATION object. The minorder of the new object is initialized to −prevcnt (recall that prevcnt is negative). The CORRELATION objects generated consist of all predictions that can be made considering the current constant value of each input parameter and a window of S prior requests. This set is assigned to the n.correlations field.

528 £ Find possible correlations for node n and input parameter values InVals
529 procedure FIND-CORRELATIONS-COMP( T, InVals )
530     for i ← 1 to InVals.length do
531         corrs ← { new CORRELATION(i, C, InVals[i], 0) }
532         for (type, prevcnt, param) ∈ CURRSOURCES( T.tracker, InVals[i] ) do
533             nc ← new CORRELATION(i, type, prevcnt, param, -prevcnt)
534             corrs ← corrs ∪ { nc }
535     ADDINPUT( T.tracker, InVals )
536     return corrs
537 end
538
539 £ Record fetched tuple fetchVals in suffix trie T
540 procedure MONITOR-FETCH( T, fetchVals )
541     ADDOUTPUT( T.tracker, fetchVals )
542 end
543
544 £ Identify a ‘home’ location for reference pair n = (s,x), used to store all correlations for n
545 function CORR-HOME( T, n )
546     £ home identifies the last character of the first occurrence of n = sx in T.trace
547     return home
548 end
549
550 £ Return the set of correlations that have always held at node n
551 function CORRELATIONS( T, n )
552     homecorrs ← T.correlations[ CORR-HOME(T, n) ]
553     corrs ← { c ∈ homecorrs | STRINGORDER(n) > c.minorder }
554     return corrs
555 end
556
557 £ Verify correlations for node n and input parameter values InVals
558 function VERIFY-CORRELATIONS-COMP( T, n, InVals )
559     changed ← FALSE
560     corrs ← CORRELATIONS( T, n )
561     for c ∈ corrs do
562         if InVals[ c.inparam ] ≠ CURR-VALUE( c ) then
563             c.minorder ← STRINGORDER(n) + 1
564             changed ← TRUE
565     return not changed
566 end

Figure 5.17: Code to monitor parameter correlations in linear space/time.

Finally, the FIND-CORRELATIONS-COMP procedure adds the current InVals to the dictionary with a call to ADDINPUT (line 535). These will be considered as possible correlation sources for future requests. Likewise, after a successful fetch, MONITOR-FETCH calls ADDOUTPUT (line 541) to add the most recent fetched values to the dictionary. The ADDINPUT and ADDOUTPUT methods remove values associated with a request more than S in the past. In this way, the size of the dictionary does not exceed O(S) requests or O(Sr) values. Overall, the amortized time complexity does not exceed O(Sr).
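A minimal sketch of such a sliding-window tracker (our own illustration; the thesis names CORRTRACKER, ADDINPUT, ADDOUTPUT, and CURRSOURCES but does not list their bodies):

    from collections import deque

    class CorrTracker:
        # dictionary over the S most recent requests, mapping an observed
        # value to (type, prevcnt, param) triples relative to the next request
        def __init__(self, scope):
            self.scope = scope
            self.window = deque()           # (invals, outvals) per request

        def add_input(self, invals):
            self.window.append((list(invals), []))
            if len(self.window) > self.scope:
                self.window.popleft()       # expire values outside the scope

        def add_output(self, outvals):
            ins, _ = self.window[-1]        # assumes OPEN preceded this FETCH
            self.window[-1] = (ins, list(outvals))

        def curr_sources(self, value):
            triples = set()
            for age, (ins, outs) in enumerate(reversed(self.window), start=1):
                for j, v in enumerate(ins, start=1):
                    if v == value:
                        triples.add(('I', -age, j))
                for j, v in enumerate(outs, start=1):
                    if v == value:
                        triples.add(('O', -age, j))
            return triples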

The result of FIND-CORRELATIONS-COMP is a set of CORRELATION objects representing the equality relationships that hold for the current OPEN request. For request k, MONITOR-OPEN-COMP adds this set at position k in list T.correlations (line 512). Each node n (virtual or explicit) that is created at step k corresponds to a string T[i..k] that has not appeared earlier in the trace. Therefore, the correlation home for each newly created node n is k. The set of CORRELATION objects is the same for all newly created nodes, except possibly where the distance |prevcnt| of a CORRELATION exceeds the length of the string associated with n. We represent the length of this string with STRINGORDER(n), and use the minorder field of the CORRELATION to exclude the correlation from the set for nodes with a context that is too short.

FIND-CORRELATIONS-COMP works with MONITOR-OPEN-COMP to find the initial set of correlations for each newly created node n (virtual or explicit). The other facet of correlation detection is the process of removing correlations that no longer hold. This is implemented in the VERIFY-CORRELATIONS-COMP procedure.

First, we need a way to associate a node in the suffix tree (either explicit or virtual) with its correlation home position. Recall that each node n can be expressed in canonical form as n = (s, x) where s is an explicit node and x is a string (possibly empty) leading out of s. The correlation home for n is the last position of the first occurrence of string sx in the trace. We use function CORR-HOME(T, n) to find this position. We can implement CORR-HOME efficiently in O(1) time by using the STRPTR representation of x (we omit the details). The CORRELATIONS(n) function returns the set of CORRELATION objects for node n based on its correlation home position (given by CORR-HOME(n)).

Consider a NODEPTR n with value (s, xa) found by FINDCHILD (line 514). We have observed sxa before, and a previous call to MONITOR-OPEN-COMP has initialized the correlations for node n as CORRELATIONS(n). It may be that some of these correlations no longer hold. The VERIFY-CORRELATIONS-COMP function considers the set CORRELATIONS(n) of CORRELATION objects. For each CORRELATION object c, we compare the current value of c to the associated input value of the current query. If the values do not match, this represents a case where a correlation that previously held in context n no longer holds; hence a guessed correlation has been shown to be false. In this case, we increase the c.minorder field to be |sx| + 1. This increase effectively removes c from the set of possible correlations associated with n.

If any of the CORRELATION objects is removed from a set of possible correlations by increasing the minorder field, then we set changed to TRUE. If we reach a point in the iteration of MONITOR-OPEN-COMP where FINDCHILD returns an existing child n and no correlation sets change values (line 519), then we can halt the iteration. Any suffix of curr will also have a Q child, so the original algorithm by Ukkonen [180] terminated as soon as child n is found. If we find that all of the guessed correlation sources for an existing child n still hold, then there will be no changes in generalization nodes, which contain a subset of the guesses for n.

We have modified the original algorithm [180] to continue processing generalizations of curr until no further changes are made by VERIFY-CORRELATIONS-COMP. We should be suspicious that this modification has affected our asymptotic complexity. Fortunately, we can see that the additional cost of the extra iterations cannot exceed $O(mSr^2)$ for a trace of length m with scope length S and maximum parameters r per request. We continue iterating beyond the original algorithm every time that we remove at least one CORRELATION from node n. Since we initially create n with at most $Sr^2$ of these objects, we will introduce at most $O(mSr^2)$ extra work.

5.2.3.3 Summary of Suffix Tree Detection

Together, the above alterations allow us to build a suffix tree (or path compressed suffix trie) for a length m trace of requests with at most r parameters in O(mSr2) time and space. We convert the implicit suffix tree into an explicit suffix tree by appending $ (without affecting our asymptotic bound). The frequency of each context within the tree can be assessed by counting the number of leaf nodes in the sub-tree rooted at the appropriate node. For each edge n a na, we maintain −→ a set of CORRELATION objects that represent correlations that have always held when request a was submitted while the trace was in context n. This set is accessed with CORRELATIONS(T,na). Thus, the path compressed suffix trie (or suffix tree) can be used to satisfy all of the requirements of the atomic suffix trie, all in time and space that is linear in the number of requests.

5.2.4 Summary of Pattern Detector

In order to make semantic prefetching choices, we must predict the probability of future requests and the parameter values that will be used if they are submitted. We have decided to limit our choices to finite state models, which predict future behaviour based on a finite set of states. A particular subset of finite state models is the order-k model, which makes these predictions by defining contexts of the k preceding requests and using these to condition probability predic- tions and predictions of parameter correlations. This approach runs into difficulty when selecting an appropriate value of k. There is no a-priori information that we can use to give a strict bound 5.3 PATTERN OPTIMIZER 127 on k. Choosing a low value of k will miss special cases that have unique probability or corre- lation behaviour, while a high value of k reduces the generalization performed during training. We introduced confidence intervals of the probability estimates given by the order-k model in or- der to help with selecting an appropriate k value. The suffix trie data structure provides a mechanism that lets us simultaneously maintain order- k models for all k = 1 ...n for a trace of length n + 1. After we have observed a trace, we can make a post-hoc decision of the best k value. Because we have the model for all k values, we can select a variable-order order model that uses different k values at different positions in the tree, depending on the estimated query costs and correlation results. The suffix trie structure gives us all that we need to make effective cost-based semantic prefetching decisions. Unfortunately, the asymptotic complexity is O(m2) for a trace of length m. It is possible to use the structure in practical implementations, for example by choosing an up- per limit for k, say 100. Fortunately, we can avoid these ad-hoc limits and achieve better perfor- mance by using a path-compressed suffix trie, which we can build in linear space and time. In a path-compressed suffix trie, branch-free paths are compressed into string-labelled edges. Strings are represented by pointers into the single string that represents all requests observed so far. We define a configuration parameter S that limits the scope length of possible sources of cor- relation values that we consider. A dictionary structure is used to map an input parameter value v to the set of CORRELATION objects that represent parameters within scope S that match v, and we extend the suffix tree algorithm to initialize and maintain the set of correlations that have al- ways held for a request Q in context c. The suffix trie or suffix tree provides all the information that we need to make prefetching decisions, and we describe that process in Section 5.3. After optimization, the Query Rewriter is used to generate an alternate query that will fetch the results of the submitted query and also prefetch the results specified by the optimizer.

5.3 Pattern Optimizer

Section 5.2 described how Scalpel builds a trie-based data structure during the training period. The suffix trie summarizes frequency information and correlations that were observed during the training period, and the Pattern Optimizer uses this information to make prefetching decisions. Figure 5.18 gives an overview of the implementation of the Pattern Optimizer. The Pattern Optimizer uses a cost-based decision process to decide what queries to prefetch. Section 5.3.1 shows how we determine if a query is likely to be submitted sufficiently often to be worth prefetching. The Pattern Detector supplies a function COUNT-OCCURRENCES(n) that re- ports how many times a node n was observed during the training period. The Pattern Optimizer uses this frequency information to derive estimates (with confidence intervals) of the probability 128 BATCH REQUEST PATTERNS

Pattern Detector Cost Model (Section 5.2) (Chapter 4)

b c a Children(n) b c Est-Cost(a) Count-Occurrences(n) a Est-Cost(a;b) Correlations(n)

Pattern Cost-Based Optimizer Optimization (Section 5.3) Suffix Trie + List (Section 5.3.1) of Queries to Prefetch [b,c] b [c] c [] a FSM + Query Lists [] b [c] c [] c a [] a b a [c] c Building a Finite [b,c] b b [] State Model a a (Section 5.3.2) a b c b [c] [] a c a b c [] [b,c] b c b c [] a a c c c [c] b Removing [] c b Redundancy (Section 5.3.3) Redundancy-Removed FSM with Query Lists

Query Rewriter (Section 5.4)

Figure 5.18: Overview of the Pattern Optimizer. 5.3 PATTERN OPTIMIZER 129 of future requests. The Pattern Optimizer combines these probability estimates with a cost esti- mate function EST-COST provided by the Cost Model in order to decide what queries to prefetch when a demand fetch is sent to the DBMS. Scalpel is not able to prefetch a query if it cannot pre- dict what values will be used for its parameters. The suffix trie built by the Pattern Detector pro- vides a CORRELATIONS function that gives the predict correlation values (if any) for each para- meter. The selected queries are stored as a list for each node in the suffix trie. The suffix trie does not directly provide all of the details needed at run-time. For example, it is not clear what the Prefetcher should do when a request is seen that has never been seen at a particular node in the suffix trie. Section 5.3.2 describes how the Pattern Optimizer uses the suffix trie to build a finite state model. This finite state model retains the lists of queries that were selected for each node in the suffix trie. The finite model generated directly from the suffix trie contains significant redundancy. Sec- tion 5.3.3 shows how this finite state model can be simplified by removing this redundancy, gen- erating a redundancy-removed finite state machine annotated with lists of queries to prefetch. Finally, Section 5.3.4 summarizes the operation of the Pattern Optimizer. After the Pattern Optimizer has built its simplified FSM, the Query Rewriter component (Section 5.4) modifies the FSM by generating combined queries that implement the selected prefetches.

5.3.1 Cost Based Optimization

na1a2a4 ... a4 na1a2 a5 a2 na1a2a5 ...... n a1 na1 a3 na1a3a6 ... a6 na1a3 a7 na1a3a7 ...

Figure 5.19: Example for choosing prefetches.

Figure 5.19 shows a fragment of a suffix trie that we might find after the training period. Node

n represents a context associated with string n = n1n2 ...nk of requests, and the edge from n to na represents the fact that we have observed query a in the past when the trace was in context n. In an atomic suffix trie, n is a physical node; in a path compressed suffix trie, n = (s,x) may be a physical node (x = ) or virtual node. We are interested in using cost estimates to decide what queries Scalpel should prefetch when

the application submits request a1 after submitting the query sequence n. We could choose to 130 BATCH REQUEST PATTERNS

submit a1 to the server unmodified. Alternatively, we could elect to prefetch a request we antici- pate might be submitted in the future; for example, a2. In this case, we would submit a modified query to the server that would fetch the results of a1 and the predicted results for a2. If the appli- cation then submitted a2 with parameter values that match our predictions, Scalpel would use the results of the combined query to answer a2 and would have saved the overhead of having to is- sue a2 to the server as a separate query. Clearly, the costs of these two alternatives will depend on the execution costs of the various queries involved, and on the likelihood that a2 will actually oc- cur following request a1 in context n. We represent this likelihood with the conditional proba- bility mass function p(a na ), which gives the conditional probability that query a will occur 2| 1 2 next, given that we have previously observed the sequence na1 of requests. We can use the EST-COST function (defined in Chapter 4) to estimate the total execution cost associated with a1 and a2 when they are executed unoptimized. This expected cost is given by Equation 5.9:

EST-COST(a )+ p(a na ) EST-COST(a ) (5.9) 1 2| 1 × 2

If Scalpel does prefetch a2, the total cost will be the cost of executing the larger combined query which prefetches results for a2 in addition to a1. In Section 5.4, we will describe how we form this combined queries using strategies similar to the outer union and outer join strategies we used for nested patterns of requests. The combined query is implemented using an outer join strategy if a1 returns at most one row; otherwise, an outer union approach is used. For now, we will de- note the cost of this combined query by EST-COST(a1; a2), ignoring the details of which of these approaches is used.

With these estimates of costs, we define PREFETCH-BENEFIT(n, a1, a2) as an estimate the savings achieved by prefetching query a2 when query a1 is submitted after sequence n.

PREFETCH-BENEFIT(n, a , a ) = EST-COST(a )+ p(a na ) EST-COST(a ) 1 2 1 2| 1 × 2 (5.10) EST-COST(a ; a ) − 1 2 With the estimate of prefetching benefit given in Equation 5.10, it is worthwhile to prefetch a2 when a1 is submitted in context n if the following holds:

EST-COST(a1; a2) EST-COST(a1) p(a2 na1) > − = Pmin(n, a1, a2) (5.11) | EST-COST(a2)

The value Pmin(n, a1, a2) gives the minimum probability for which it is worthwhile to prefetch query a2 when a1 is submitted in context n. If we think the probability of observing a2 is greater than Pmin(n, a1, a2), then we expect that it would be cheaper to prefetch a2.

The value of Pmin is defined by the expected benefit when prefetched results are useful and the cost when the results of a2 are not useful. In fact, we can provide a lower bound on Pmin by 5.3 PATTERN OPTIMIZER 131

considering the cost of a2 relative to the per-request overhead U0. Recall that the EST-COST func- tion (Chapter 4) provides Scalpel’s estimate of the latency associated with a request; the estimate is derived using observations of costs during the training period and a cost model based on ei- ther a nested loops join or union. Further, the estimate is combined with SRV-SAVINGS, an esti- mate of the benefit the server can achieve if requests are submitted together instead of as separate queries. The outer join strategy is used only if the outer query returns at most one row ( a 1), | 1|≤ otherwise the outer union strategy is used. Therefore, we can bound the cost of the combined query as follows:

EST-COST(a ; a ) EST-COST(a )+ EST-COST(a ) U (5.12) 1 2 ≤ 1 2 − 0 Unless the database server is able to exploit sharing to execute the combined query more effi- ciently than the two separate queries, the bound in Equation 5.12 is tight. Equation 5.12 provides an upper bound for Pmin:

U0 Pmin 1 = Pmin0 (5.13) ≤ − EST-COST(a2) The two sides of Equation 5.13 are equal unless the server is able to exploit sharing (correspond- ing to SRV-SAVINGS < 1). If p(a na ) > P , it is worth prefetching; otherwise, it might 2| 1 min0 still be worth prefetching if SRV-SAVINGS < 1, in which case Pmin < Pmin0. In order to assess whether it is worthwhile prefetching a query, we need an estimate of p(a na ) to compare to P . Equation 5.6 showed how we can compute pˆ(x s), an estimate 2| 1 min i| of the probability of request x after the string s of requests. The definition of pˆ(x s) in Equa- 1 i| tion 5.6 is based on COUNT-OCCURRENCES(σ, t), a count of how many times substring σ occurs in string t.The COUNT-OCCURRENCES function can easily be defined for a suffix trie data struc- ture based on the count field (as shown by Lemma 5.6). The COUNT-OCCURRENCES function can also be defined for a path-compressed suffix trie based on the number of leaves below the as- sociated node (shown in Lemma 5.10). As discussed in Section 5.2.1.1, the point estimate pˆ(x s) does not tell the whole story in i| that it does not reflect how confident we are in the estimate. Equation 5.7 provides us with a con-

fidence interval CI = [p0,p1] that expresses the amount of confidence we place in the estimate pˆ(x s). i| We use the confidence interval to provide a prefetching decision as follows. If the low-point

of the confidence interval (p0) is greater than Pmin, then we will prefetch as the estimated value pˆ(a na ) is significantly greater than P (at the α level of confidence). If the high-point of the 2| 1 min confidence interval (p ) is less than P , we will not prefetch as the estimate pˆ(a na ) is sig- 1 min 2| 1 nificantly lower than Pmin (at the α level). In the remainder of the cases, Pmin is inside the confi- dence interval for pˆ(a na ). In this case, we do not have enough information to decide whether 2| 1 132 BATCH REQUEST PATTERNS to prefetch or not. No matter how long the training period, there will be some contexts where we have not made enough observations to make a definite decisions: this is a consequence of grow- ing the context length on each request. In these cases, we have a context n that is too specific to provide a definite choice; however, there may be a shorter context n0 that has been observed suf- ficiently often to be definite. We can use the suffix links in the suffix trie data structure to explore information about more general contexts that are suffixes of n.

5.3.1.1 Feasible Prefetches

Before Scalpel decides whether it is beneficial to prefetch query a2 after observing a1 in context n, it must first decide whether such a prefetch is feasible. It is feasible to prefetch a2 if we can make a prediction of the actual values that would be used for each parameter of a2 if it were sub- mitted. The Pattern Detector (Section 5.2) described how we maintain sets of correlations that have always held. In the atomic suffix trie data structure, these are stored in a correlations

field of the node associated with na1a2. In the compressed suffix trie, the set is encoded in the list element T.correlations[i] where i is the correlation home for string na1a2, and the encoded set is interpreted based on the minorder field of the CORRELATION objects and the length na . The CORRELATIONS(x) function was defined to find the set of CORRELATION | 1| objects for the node identified with string x (Figure 5.17). We extend CORRELATIONS(x) to (atomic) suffix tries by returning the correlations field of the associated node..

The CORRELATIONS(na1) set contains correlations that apply to all of the parameters of query a1. We use the function CORRFORPARM(C, i) to find the subset of correlations in set C that apply to parameter i, following the same approach we used for nested request patterns (Def- inition 3.1, page 27). Armed with these two functions, we can define what it means for a request to be feasible as follows.

DEFINITION 5.13 (FEASIBLE PREFETCH)

Request a2 is a feasible prefetch when query a1 is submitted in context n if there is a pre- dicted correlation for each of the r parameters of a2. Let C = CORRELATIONS(na1a2). Then FEASIBLE(na1, a2) is defined as follows:

FEASIBLE(na , a ) CORRFORPARM(C, i) = i [1, r] (5.14) 1 2 ⇐⇒ 6 ∅ ∀ ∈

5.3.1.2 Choosing the Best Feasible Prefetch

There may be more than one feasible, beneficial prefetch from a given context. Thus, Scalpel must decide which, if any, of these prefetches to make. When query a1 is submitted, Scalpel can choose to prefetch ‘wide’ by prefetching several possible subsequent queries in the hope that one will actually be requested after a1. For example, Scalpel could prefetch the results for both a2 5.3 PATTERN OPTIMIZER 133

and a3 in Figure 5.19. Alternatively, Scalpel could prefetch ‘deep’ by prefetching a chain of can- didates, each of which is predicted to follow it’s predecessor. For example, Scalpel could prefetch both a2 and a4. Finally, we could also consider some combination of these two. For example, we could prefetch query a2, a3, and a6. At present, the Scalpel prototype considers only prefetch- ing ‘deep’; that is, chains of queries, each of which is predicted to be the best prefetch candidate.

567 £ Choose queries to prefetch when a1 submitted after n 568 procedure CHOOSE-PREFETCH( n, a1 ) 569 x  £ The list of queries to prefetch ← 570 while x < M do | | 571 £ Choose the best feasible prefetch candidate 572 best BEST-PREFETCH( n, a , x ) ← 1 573 if best = NIL then break 574 x x + best £ Append to list of queries to prefetch ← 575 £ Annotate the node na1 with the queries to prefetch when a1 is submitted after sequence n 576 SET-PREFETCH( na1, x ) 577 end Figure 5.20: Choosing a list of queries to prefetch.

Figure 5.20 shows the CHOOSE-PREFETCH procedure that is used to choose the list of queries that should be prefetched when a1 is submitted after the query sequence n. Procedure CHOOSE- PREFETCH uses a loop to find a chain of prefetch queries, which are stored for use at run-time. We use a configuration parameter M to limit the maximum number of queries that Scalpel will prefetch.

At each step of the loop, CHOOSE-PREFETCH uses the BEST-PREFETCH function (Fig- ure 5.21) to find the best feasible candidate that should be prefetched after the query sequence

na1x has been observed. The BEST-PREFETCH function considers each query b that was ob- served following na1x during the training period. The CHILDREN(na1x) function is used to find these queries, abstracting the details of atomic or path compressed suffix trie. For each child b, BEST-PREFETCH uses the SHOULD-PREFETCH function to decide whether b is both feasible and beneficial. Once a best prefetch candidate best is found, the loop proceeds by assuming that best is executed next. In this way, a chain of prefetch queries is selected, conditionally assum- ing that each selected query is actually used. We must also consider the conditional probabil- ity that the entire chain is followed. SHOULD-PREFETCH accomplishes this by using N =

COUNT-OCCURRENCES(na1) (the count of the originating context na1) as the denominator when forming the estimate pˆ and its confidence interval (line 598). The numerator uses the

full context length X = COUNT-OCCURRENCES(na1xb). This choice of denominator accounts 134 BATCH REQUEST PATTERNS

578 £ Find the best request b to prefetch if a follows sequence na1x 579 function BEST-PREFETCH( n, a1, x ) 580 best_b NIL £ Best prefetch candidate ← 581 best_benefit 0 £ Expected benefit for prefetching best_b ← 582 for b CHILDREN( na x ) do ∈ 1 583 benefit SHOULD-PREFETCH( n, a , x, b ) ← 1 584 if benefit > best_benefit then 585 best_b b ← 586 best_benefit benefit ← 587 return best_b 588 end 589 590 £ Return the expected benefit of prefetching a2 when a1 is submitted after sequence n 591 function SHOULD-PREFETCH( n, a1, x, b ) 592 £ We should not attempt to prefetch b unless it is feasible after na1x 593 if not FEASIBLE(na1x, b) then 594 £ It is not feasible to prefetch b; use benefit of −∞ 595 return −∞ 596 N COUNT-OCCURRENCES(na ) ← 1 597 X COUNT-OCCURRENCES(na xb) ← 1 598 CI CONFIDENCE-INTERVAL( X, N ) £ Equation 5.7 ← 599 pmin P (n, a x, b) £ Equation 5.11 ← min 1 600 if CI.high < pmin then 601 £ Prefetching is significantly worse (at the α level) 602 return −∞ 603 else if Ci.low > pmin then 604 £ Prefetching is significantly better (at the α level) 605 savings PREFETCH-BENEFIT(n, a x, b) £ Equation 5.10 ← 1 606 return savings 607 else 608 £ Not enough information to decide; consider generalization 609 if GETSUFFIX(n) = NIL then 6 610 return SHOULD-PREFETCH( GETSUFFIX(n), a1, x, b ) 611 else 612 return −∞ 613 end Figure 5.21: Choosing the best prefetch query. 5.3 PATTERN OPTIMIZER 135

for the conditional probability that all links along the chain na1x are followed. If SHOULD- PREFETCH finds that there is insufficient evidence to form a firm decision about prefetching, it returns the result for the generalization given by GETSUFFIX(n) (if it exists). When a context is reached where there are no feasible prefetch candidates that are beneficial, the loop in CHOOSE- PREFETCH terminates. The CHOOSE-PREFETCH procedure identifies a list of queries to prefetch for each edge a n 1 na in the tree. This prefetch list represents additional queries that should be prefetched −→ 1 when we observe a query a1 being submitted at run-time after observing the string of queries n. The SET-PREFETCH function is used to associate this list with node na1. The Query Rewriter uses these lists to generate combined queries that fetch the results for a1 and all queries in the prefetch list. First, however, we define how the Pattern Optimizer uses the suffix trie data struc- ture to build a finite state model.

5.3.2 Building a Finite-State Model

After selecting queries to prefetch, the result of pattern detection consists of a suffix trie (possibly path compressed), where each node is annotated with a list of queries to prefetch. This structure can be used as a form of finite state model during run-time. In this model, the nodes of the trie form the states and edges are used to provide the transition function. While the suffix trie does give a kind of finite state model, a transition function defined us- ing only the edges observed during a finite training period cannot be considered to be complete. We will certainly encounter novel requests in some context. This will occur due to the low num- ber of observations for long contexts, particularly those associated with leaf nodes. When we en- counter a novel request in a context, there is no edge to follow. We have no prefetching decisions available, and no next state to move to. We could implement a heuristic that adds an implicit transition to the root when we encounter a novel request in a context. This approach is sound, but it ignores the fact that, while the cur- rent request is novel in the specific context of the tree, it may have been observed in a more general context that considers fewer of the preceding queries as a conditioning context. Instead, we exploit the suffix links. When we encounter a query a that is novel at node n, we consider GETSUFFIX(n) (if it exists). If there is a query a that was not observed at all during training, we will not have an edge for it anywhere in the trie. Therefore, we use a transition to the root of the trie. Our transition function δτ is defined for suffix trie T as follows:

xa a CHILDREN(x) (C ) ∈ 1 δτ (x, a)=  δτ (y, a) C1 y = GETSUFFIX(x) y = NIL (C2) (5.15)  ¬ ∧ ∧ 6  s C C (C ) 0 ¬ 1 ∧¬ 2 3   136 BATCH REQUEST PATTERNS

In the first case (C ), there is an edge x a xa recorded in the trie, and δ uses a transition 1 −→ τ based on that edge. In the second case (C2), query a is novel at node x; we use the transition defined by GETSUFFIX(x). In the final case (C3), the query a has not been seen at all during the training phase. In this case, we use a transition to s0, the root node of the trie. The δτ definition in Equation 5.15 allows us to define a finite state model Mτ for suffix trie τ as follows.

DEFINITION 5.14 (FINITE STATE MODELFOR SUFFIX TRIE τ ) Let Σ be the set of all possible requests. Let S = x Σ∗ x NODES(τ) be a set of states τ { ∈ | ∈ } corresponding to the nodes of τ, and let s =  S be an initial state corresponding to the root 0 ∈ τ node of τ. Let δ be defined as in Equation 5.15, and let pˆ(a x) be a conditional probability mass τ | function given by the counts of τ. With these settings, then M = S , Σ,δ,s , pˆ is the finite τ h τ 0 i state model for suffix trie τ.

With δ defined as shown in Equation 5.15, all of the transitions into node s S s are τ ∈ \ 0 caused by the same query. We use function GET-INCOMING(q) to identify this query. At run- time, we enter state q when we observe a call to OPEN(a) for query a = GET-INCOMING(q). Definition 5.14 gives a model that can be used at run time. At each point in the trace, the cur- rent state q of model Mτ is the longest element of τ that matches the current sequence of requests. 0 When processing request a, Mτ moves to a new state by following q = δτ (q, a). This new state q0 has a list of queries that should be prefetched when query a is submitted to the DBMS. While the definition of Mτ is sound, Mτ may be substantially larger than an equivalent model due to re- dundancy introduced while building the suffix trie. In the next section, we describe how this re- dundancy arises and how we can remove it.

5.3.3 Removing Redundancy

The finite state machine Mτ that is build directly from a trie contains significant redundancy re- sulting from having multiple context lengths that predict the same future behavior. To understand this issue, consider the following contexts from Figure 5.15 (page 119): `b', `ab', and `xab'. This is a sequence of successively less general contexts, each of which ends with `b'. Suppose that, using the procedures described in Section 5.3.1.2, Scalpel has decided not to prefetch from context `b', but to prefetch query Qc in context `ab'. This may happen because Scalpel ob- serves that the conditional probability of `c' after the sequence `ab' is higher than the condi- tional probability of `c' after `b' alone. This is an example of Scalpel avoiding prefetching de- cisions based on contexts that are too short. As we consider successively more specific contexts, such as `xab', there are several pos- sibilities. The procedures from Section 5.3.1.2 may also decide that it is worthwhile to prefetch

Qc in context `xab'. Such a prefetching decision is redundant, because whenever the system 5.3 PATTERN OPTIMIZER 137 is in context `xab', it is also in the more general context `ab' for which the same prefetch- ing decision has been made. Second, Scalpel may have found that context `xab' was not ob- served enough times to make a definite prefetching decision. In that case, CHOOSE-PREFETCH selects to prefetch Qc in keeping with the generalized context `ab'; again, the prefetch is re- dundant. The final possibility is that prefetching `c' is definitely rejected for `xab' in favour of either prefetching a more beneficial query or not prefetching at all. In this case, `xab' repre- sents a special case of `ab', and the prefetching recommendation at `xab' is not redundant.

614 £ Determine whether context n is redundant 615 function IS-REDUNDANT( n ) 616 if GETSUFFIX(n)= NIL then 617 £ The root node has no suffix, and it is not redundant. 618 return FALSE 619 if GET-PREFETCH( n ) = GET-PREFETCH( GETSUFFIX(n)) then 6 620 return FALSE 621 else 622 for child CHILDREN(n) do ∈ 623 if IS-REDUNDANT( child ) then return FALSE ¬ 624 end Figure 5.22: Marking redundant nodes.

Scalpel uses the IS-REDUNDANT(n) function shown in Figure 5.22 to determine whether or not a given context node n is redundant. Node n is redundant if it has the same prefetch set as its suffix, and, further, all of its child nodes are redundant. This definition of redundancy lets us define a finite state machine M for suffix trie τ that avoids redundant nodes.

DEFINITION 5.15 (REDUNDANCY-REMOVED FINITE STATE MODELFOR TRIE τ ) Let Σ be the set of all possible possible requests. Let Q be a set of states corresponding to the non-redundant nodes of τ, defined as follows:

S = x Σ∗ x τ IS-REDUNDANT(x) (5.16) { ∈ | ∈ ∧¬ } Let s =  S be an initial state corresponding to the root node of τ. Let δ be a modified defini- 0 ∈ tion of transition function modified from the definition of δ in Definition 5.14 as follows:

xa a CHILDREN(x) IS-REDUNDANT(xa) (C ) ∈ ∧¬ 1 δ(x, a)=  δ(y, a) C1 y = GETSUFFIX(x) y = NIL (C2) (5.17)  ¬ ∧ ∧ 6  s C C (C ) 0 ¬ 1 ∧¬ 2 3  Then we say that M = S, Σ,δ,s , pˆ is the redundancy-removed finite state model for suffix h 0 i trie τ, where pˆ is a conditional probability mass function derived from τ. 138 BATCH REQUEST PATTERNS

The redundancy-reduced finite state machine M has the same behaviour as Mτ , although it has fewer nodes. In particular, model Mτ contains states corresponding to the leaf nodes of trie τ, even though these are never useful.

5.3.4 Summary of the Pattern Optimizer

After a training period that observes a trace T of requests, the Pattern Detector generates a suffix trie τ that encodes a summary of the information gathered during the trace. The correlations that were found to always hold during the training period are used to find what prefetches are feasible in that we can predict actual values that would be used if the predicted request were submitted by the application. Further, frequency information in the trie is used to generate confidence intervals (at a pre-configured level α) for the probability of executing a request a when in a context n based on the ratio of COUNT-OCCURRENCES(na) to COUNT-OCCURRENCES(n). These confidence intervals are used to determine whether it is worthwhile to prefetch a re- quest a. By comparing the estimated cost of a to the per-request overhead U0, the confidence in- terval is used to find if prefetching a is a) definitely beneficial; b) definitely not beneficial; or, c) indeterminate. The last case occurs in a trie built even for a very long training period, as it is a consequence of the low frequency components associated with the longest contexts in the trie. When it is indeterminate whether it is prudent to prefetch a in context n, we use suffix links in the trie to use the prefetching decision in a generalization of n where we have enough observa- tions to make a definite decision. The Pattern Optimizer finds a prefetch list for every context in the suffix trie, whether it is an explicit node or a virtual node. Therefore, the Pattern Optimizer uses O(n2) space in n the num- ber of requests observed during the training period. Further, as currently stated the algorithm could take up to O(n3) running time due to the traversal of suffix links; however, an O(n2) im- plementation can be achieved with a little care to re-use previous results. The redundancy in the atomic suffix trie suggests that a linear algorithm can be developed. However, such an approach would also need a way to control the size of the finite state machine generated by the Pattern Op- timizer. At present, the Pattern Optimizer could select up to O(n2) states with distinct prefetch- ing choices. Implementing a linear complexity Pattern Optimizer is an important topic for future study. At present, we have found that the current optimization time is not excessive for the sys- tems we have tested.

After using CHOOSE-PREFETCH to identify a list of queries to prefetch for the contexts in the suffix trie, the Pattern Optimizer generates a redundancy-removed finite state model M for the trie based on τ. Each node n in this trie is annotated with a list of queries that should be prefetched when the machine transitions into the node. The Query Rewriter (Section 5.4) com- 5.4 QUERY REWRITER 139

bines this list of queries with the query on the edge entering n in order to fetch the originally re- quest results and all prefetched results.

5.4 Query Rewriter

The result of the Pattern Optimizer is a finite state model M. Each state n in the model repre- sents a context that the model might be in at run-time, and each state has an associated list of queries that should be prefetched. The Query Rewriter constructs combined queries that can be used to execute these prefetches, and it also builds a list of ACTION objects that describe how the Prefetcher should execute the prefetches. Figure 5.23 shows an overview of the input the Query Rewriter takes from the Pattern Optimizer and the output that it saves persistently for use at run- time.

a

[b,c] b Redundancy- Pattern Optimizer a (Section 5.3) a Removed FSM c c with Query Lists b [c] [] c b

Query Rewriter (Section 5.4)

[Batch-Join, a Batch-Union] [b,c] b a a Persistent Store FSM with Action Lists c c [Batch-Join] and Rewritten Queries b [c] [] c b

Figure 5.23: Overview of the Query Rewriter

Every state n except the initial state s0 has the same label a on any edges entering the state, and we represent this with GET-INCOMING(n). Further, the Pattern Optimizer has asso- ciated a list of queries to prefetch with each state n; we represent this with GET-PREFETCH(n). Finally, each state n has an associated set of correlations that are predicted to hold, given by CORRELATIONS(n). 140 BATCH REQUEST PATTERNS

The Query Rewriter uses this information to generate (for each state n) a combined query Q0 that encodes the results for GET-INCOMING(n) and also for the list GET-PREFETCH(n) of queries that are to be prefetched. The combined query is built using rewrite rules and the corre- lation predictions stored in CORRELATIONS(n). Further, the Query Rewriter adds ACTION ob- jects as annotations to each state in the model. These ACTION objects are similar to those used for nested request patterns (Section 3.3.5). These ACTION objects are used at run-time to inform Scalpel of how each request should be answered. Finally, at the end of the training period, the fi- nite state model with its ACTION annotations is stored persistently for use at run-time.

5.4.1 Alternative Prefetch Strategies

There are several mechanisms that we could consider for prefetching the results of future queries. We evaluated the following approaches:

Batch requests Several DBMS products allow a request OPEN(Q) to consist of a batch request

such as Q = q1; q2; . . . ; qk.The OPEN(Q) request returns a list of cursors, and the database- access API (such as JDBC) provides a mechanism to move sequentially through the list of cursors.

Stored procedure As well as batches of queries, several DBMS products support stored proce- dures. These allow procedural code to be executed by the DBMS. As with batch requests, multiple cursors can be returned. While a batch request is specified in an OPEN call, a stored procedure is created persistently in the database schema.

Join As with the nested patterns of Chapter 3, we can combine queries using an outer join.

Union A join approach uses separate columns for each of the original queries. If the original re- sult sets are union-compatible then we can use a UNION to prefetch the desired results. Even if the results are not union-compatible, we could use a construct similar to the outer- union used for nested request patterns described in Section 3.3.3.2.

We evaluated the above approaches to prefetching by using each approach to fetch a list of queries TQ(i) for i = 1 ...k. Each query TQ(i) fetches the single row from table T with pri- mary key equal to i as follows:

SELECT x FROM T WHERE pk = :i

In addition to the above 4 prefetch approaches, we considered a sequential strategy that exe- cutes the k queries one at a time without prefetching. The sequential strategy corresponds to an unoptimized sequence. We also include an IN-list query that exploits the special structure of our query list to fetch all of the k rows with a single scan of T driven by an IN-list as follows: 5.4 QUERY REWRITER 141

6

B S 5 B

B 4 B S B 3 B

B S P 2 B P P P B P P 1 P B P J J J P J J P J J I I I I JI JI JI I I I 0

0 10 20 30 40 50

Figure 5.24: Run-time (ms) to execute k sequential queries using a sequential (S), batch request (B), stored procedure (P), join (J), or in-list (I) strategy. Each point is the average of 1000 itera- tions. Note: the union (U) strategy is not shown as it is very similar to the join approach (J). All results are for configuration LCL.

SELECT x FROM T WHERE pk IN (1,2,...,:k)

The IN-list query is only possible because of the way we generate our test queries TQ(i), and we do not consider generating these types of prefetches. We include the IN-list query only because it essentially provides a lower bound to the cost of getting our k result rows. Figure 5.25 shows the code we used to perform timings, Figure 5.24 shows the measured re- sults for the alternatives and Table 5.1 summarizes the results of a linear regression for each al- ternative. All results were measured on configuration LCL (described in Section 3.6). The batch request (B) and stored procedure (P) approaches eliminate the per-request com- munication overhead. Further, their ability to use procedural code would make it convenient to generate a prefetch request that uses parameters from earlier queries as input parameters to later queries. Unfortunately, these approaches do not eliminate the costs associated with crossing the procedural/relational implementation boundary. The run-times for these approaches improve on the sequential times (and the improvement increases if we move to configurations with higher communication latency). However, the run-times are not very close to the lower bound we mea- sured with the IN-list query. 142 BATCH REQUEST PATTERNS

625 function TQ(i) return “SELECT x FROM T WHERE pk=”+i end 626 function BATCHQUERY(k) 627 sql “BEGIN ” ← 628 for i 1 to k do sql sql + TQ(i) + “; ” ← ← 629 sql sql + “ END” ← 630 return sql 631 end 632 633 £ Open a cursor over Q and fetch all of the rows (timing code not shown) 634 function FALL(Q) 635 open c cursor for Q ; do r fetch c until r = NIL; close c ← 636 end 637 function TIMESEQUENTIAL(k) 638 for i 1 to k do FAll( TQ(i) ) ← 639 end 640 function TIMEBATCHREQUEST(k) 641 FAll( BatchQuery(k) ) 642 end 643 function TIMESTOREDPROCEDURE(k) 644 execute ( “CREATE PROCEDURE P() AS ”+ BatchQuery(k) ) 645 FAll( “CALL P()” ) 646 execute ( “DROP PROCEDURE P”) 647 end 648 function TIMEJOIN(k) 649 sql “SELECT * FROM ( ”+ TQ(i) +“) DT1” ← 650 for i 2 to k do sql sql + “, (”+ TQ(i) +“) DT”+i ← ← 651 FAll( sql ) 652 end 653 function TIMEUNION(k) 654 sql TQ(1) ← 655 for i 2 to k do sql sql + “UNION ALL ”+ TQ(i) ← ← 656 FAll( sql ) 657 end 658 function TIMEIN(k) 659 sql “SELECT x FROM T WHERE pk IN ( 1 ” ← 660 for i 2 to k do sql sql + “, ”+ i ← ← 661 sql sql + “) ” ← 662 FAll( sql ) 663 end Figure 5.25: Code to evaluate alternative prefetching approaches. 5.4 QUERY REWRITER 143

Prefetch Approach Time (µs) Sample Sequential (S) 139.8 + 343.0k TQ(1); TQ(2);...; TQ(k) Batch Request (B) 296.4 + 102.4k BEGIN TQ(1); TQ(2); ... END Stored Procedure (P) 150.4 + 38.9k CALLP() Join (J) 113.7 + 12.2k SELECT * FROM TQ(1),TQ(2),... Union (U) 119.2 + 11.7k TQ(1) UNION ALL TQ(2) ... FAll( SELECT * FROMT IN-List (I) 167.4 + 4.4k WHERE pk IN ( 1, 2, ..., k )

Table 5.1: Run-time (µs) to execute k queries using different prefetch strategies. The last column gives a sample of the calls during the test.

In contrast, the join (J) and union (U) approaches encode all of the requests as a single query. In this way, all but one instance of the per-request costs are eliminated, including both the com- munication costs and the costs of crossing the relational barrier. This reduction is substantial, and these approaches are relatively close to the lower bound provided by the IN-list query. The addi- tional cost for the join and union is caused by the cost of beginning an index scan on a quantifier. The IN-list query initializes the search strategy for a single table, while the other two must initial- ize k quantifiers. Further, conceivably a DBMS could implement rewrite optimizations that trans- form the query generated by the join or union approach into a form that is as fast as the IN-list query. This would give a SRV-SAVINGS < 1, representing the fact that the server finds a more ef- ficient strategy for the combined query than for the sum of the individual requests. The DBMS products we tested did not benefit from such a transformation, but at least it is possible with such a combination. It seems less likely that such a transformation would be used for the batch (B) or stored procedure (P) approaches.

The batch request (B) and stored procedure (P) approaches seem be convenient targets of rewrites because the procedural capabilities of these approaches allows simple handling of the correlations between earlier parameter values and later input parameters. However, these meth- ods are not supported by all DBMS products. Further, these methods do not (directly) allow for rewrite optimizations that can reduce the total running time to the IN-list lower bound. Finally, the savings provided by these methods is not very close to that achieved by the union and join methods. For these reasons, we consider only the union and join prefetch strategies. With these strategies, we must implement rewrites that supply values for the input parameters of prefetched queries. 144 BATCH REQUEST PATTERNS

5.4.2 Rewriting with Join and Union

When considering join-based rewrites for nested patterns (Section 3.3.2.2) we introduced the lat- eral derived table construct (and the outer-join variant thereof) for prefetching nested queries using joins. We can also use this construct when generating prefetch queries for sequences of queries. Figure 5.26 shows the result of combining query Qd and Qb of Figure 5.2 using a lat- eral derived table.

SELECT DT0.name, DT1.addr FROM ( SELECT name FROM vendor v WHERE v.id = :vendor info.id ) DT0 LEFT OUTER LATERAL ( SELECT addr FROM shipto s WHERE s.shipid = :vendor info.id ) DT1

Figure 5.26: Qdb: queries Qd and Qb combined using a lateral derived table.

The lateral derived table construct implements a form of join. If a query may return more than one row, then the join introduces data redundancy. To avoid this, we use a union-based rewriting similar to the outer union strategy we used for nested queries (Section 3.3.3.2). Query Qe of Fig- ure 5.2 can return more than one row, so we use a union-based rewrite for it. Figure 5.27 shows the union-based query we use to combine queries Qd, Qb, and Qe.

Query Qdbe (Figure 5.27) uses a union with two branches. The first branch, associated with type 1, returns the values of all at-most-one-row queries in the prefetch list (in this case, Q and − d Qb). As in the outer-union strategy used for nested patterns, we use a VALUES clause to generate a derived table DT OneRow that returns a single row.

The second branch of the union represents query Qe, which is at position 2 in the prefetch list. Therefore, it has a type value of 2. This branch contains a derived table DT2 based on query Qe. The branch can return 0 or more rows. Figure 5.28 shows the result of the combined query Qdbe when invoked with a vendor info.id value of 201 (corresponding to line 4 of Figure 5.3).

5.4.3 Representing Run-Time Behaviour With ACTION Objects

Scalpel uses ACTION objects for batch request patterns in a way that is similar to their use for nested request patterns (Section 3.3.5.1). For each state in the finite state model, the Query Rewriter associates a list of ACTION objects. Figure 5.29 shows how Scalpel generates these lists. The ACTION objects have an acttype field that represents what action should be performed at run-time. There are only two types of action used for batch request patterns: BATCH-JOIN, 5.4 QUERY REWRITER 145

SELECT DT UNION.* FROM ( SELECT name FROM vendor v WHERE v.id = :vendor info.id ) DT0 LEFT OUTER LATERAL ( SELECT addr FROM shipto s WHERE s.shipid = :vendor info.id ) DT1, LATERAL ( SELECT -1, DT0.name, DT1.addr, NULL, NULL FROM ( VALUES( 1 ) ) DT OneRow UNION ALL SELECT 2, NULL, NULL, DT2.* FROM ( SELECT partname, invlevel - onhand AS qty FROM part p WHERE p.vendor id = :vendor info.id AND p.onhand < p.invlevel ) DT2 ) DT UNION( type, c1, c2, c3, c4) ORDER BY DT UNION.type, DT UNION.c3

Figure 5.27: Qdbe: queries Qd, Qb, and Qe combined using union and lateral derived tables.

typec1 c2 c3 c4

-1 ‘Mary’ ‘1400 Barrington St.’ NULL NULL 2 NULL NULL ‘Bell’ 3 2 NULL NULL ‘Tire’ 6

Figure 5.28: Result of Qdbe for vendor info.id=201 (line 4 of Figure 5.3)

which is used to decode the result set for a query that returns at most one row, and BATCH-UNION, used to decode the results of queries that might return more than one row. The GENERATE- PREFETCH-ACTIONS function (Figure 5.29) is called for each state n in the finite-state model. It generates an ACTION object for the query a that is used to enter state n (line 685). This ACTION object will be used to respond to the original OPEN request, and this is accomplished by modify- ing the associated submitquery to be a combined query that returns results for all queries. To begin with, the submitquery associated with this action is initialized with a; it will be changed later after calling GENERATE-PREFETCH-QUERY (line 694).

In addition to the ACTION object for query a,GENERATE-PREFETCH-ACTIONS builds an ACTION object for each query q in the prefetch list associated with node n. The submitquery 146 BATCH REQUEST PATTERNS

664 structure ACTION 665 acttype = “” £ The type of action to perform 666 resultquery = NIL £ The query defining the result set 667 submitquery = NIL £ The combined query that will be submitted instead 668 ... £ Additional bookkeeping information is omitted 669 end 670 671 £ Adda new ACTION object to state n 672 procedure APPEND-ACTION( n, resultquery, submitquery) 673 if AT-MOST-ONE(resultquery) then acttype BATCH-JOIN ← 674 else acttype BATCH-UNION ← 675 A new ACTION( acttype, resultquery, submitquery) ← 676 n.actions [n.actions, A] £ Append the new action ← 677 end 678 679 £ Generate a list of ACTION objects for state n 680 function GENERATE-PREFETCH-ACTIONS( n ) 681 a GET-INCOMING(n) ← 682 prefetch GET-PREFETCH(n) ← 683 684 £ Add an action for the query a observed in current OPEN call 685 APPEND-ACTION( n, a, a ) 686 for i 1...prefetch.length do ← 687 £ Get correlations predicted to hold for prefetch query i 688 corrs CORRELATIONS( n, i ) ← 689 q prefetch[i] ← 690 submitquery REPLACE-PARAMETERS( q, corrs ) ← 691 APPEND-ACTION( n, q, submitquery ) 692 693 £ Generate the combined query, and assign it to the action for a 694 n.actions.submitquery GENERATE-PREFETCH-QUERY( n ) ← 695 end

Figure 5.29: Building ACTION objects. 5.4 QUERY REWRITER 147

of these ACTION objects is initialized with a version of q that has been modified to replace all parameter references with an appropriate value source based on the correlations predicted to hold for the parameter. The modified submitquery is generated by REPLACE-PARAMETERS (line 690). This procedure (not shown) chooses a correlation source for each input parameter t of each query q using the set of CORRELATION objects given by CORRELATIONS(n, i). For a COR- RELATION object c in CORRELATIONS(n, i), we have the following possibilities:

Constant If c.type = ‘C’, then c represents a constant. We replace parameter t with the literal value.

Input If c.type = ‘I’, then c represents the fact that parameter t has always been equal to the input parameter of a preceding query opened c.prevcnt requests earlier. If i + c.prevcnt 0, ≤ then the value of the input parameter is available when the OPEN(q) request is submit- ted; we replace parameter t with a parameter filled with the value of the earlier input pa- rameter. Otherwise, the input parameter is from a query in the prefetch list prefetch. The value of this input parameter is not available when the OPEN(a) request is submit- ted, hence we cannot use it directly. However, any such input parameter is in turn corre- lated to a source that is available (the elements of the prefetch list are feasible, as checked by the Pattern Optimizer). This correlation implies there is another correlation source avail- able for parameter t that we can use.

Output If c.type = ‘O’, then c represents the fact that parameter t has always been equal to the output parameter of a preceding query opened c.prevcnt requests earlier. As with in- put correlations, we may have i + c.prevcnt < 0, which means the the value of the prior output parameter has been fetched before the OPEN(a) request is submitted; in this case, we replace parameter t with a parameter filled with the value of the earlier out- put parameter. Alternatively, for the output case we may have the input parameter t of prefetched query prefetch[i] being correlated to the output column p of a prior request prefetch[j]. In that case, we replace the parameter with the text `DT'+j+`.c'+p. This replacement text is an outer reference to a derived table representing the prior query.

If there are multiple elements for a single parameter in CORRELATIONS(n, i), we can choose any of the CORRELATION objects (except for the restriction on type ‘I’ objects noted above); all of the CORRELATION objects had the same current value on every execution of the con- text. We choose a CORRELATION from the set according to the following ordering. First, any constant (type ‘C’). These appear as constant literals in the combined query, requiring no addi- tional work at run-time. Next, we consider any input or output source (type ‘I’ or ‘O’) that has a prevcnt that draws a value available when OPEN(a) is submitted. These will be treated as in- put parameters to the query, and Scalpel will supply the input parameter with a value that it has 148 BATCH REQUEST PATTERNS previously observed. Finally, we use any output source (type ‘O’) that is an output parameter of query prefetch[j] for some 0 j < i. In this case, the replacement text is an outer refer- ≤ ence to a previous prefetch query’s derived table, and this is effectively a join condition.

The GENERATE-PREFETCH-ACTIONS procedure generates a list of ACTION objects for each state n. It then calls the GENERATE-PREFETCH-QUERY function (line 694) to generate the com- bined query that will be used at run time. This combined query is assigned to the submitquery field of the first ACTION object in the list for state n. Figure 5.30 shows the GENERATE- PREFETCH-QUERY procedure that builds these combined queries. The procedure generates a combined query that encodes the results of all the submitquery values associated with the ACTION objects for state n.

5.4.4 Summary of Query Rewriter

After training, Scalpel identifies a list of queries that should be prefetched when we see an OPEN(Q) request in a particular context. We considered four approaches to prefetching based on batch requests, stored procedures, joins, and unions. We found that the procedural approaches are useful for reducing communication latency, but they do not perform as well as the approaches that use a single query, reducing the costs associated with crossing the procedural/relational bound- ary. As a consequence, we considered only the union and join based approaches.

The GENERATE-PREFETCH-ACTIONS procedure is called for each state n, and builds a list of ACTION objects. Each ACTION object is initialized with a resultquery set to the original query text and submitquery initialized with a rewritten version of the query text that replaces parameters with appropriate references based on predicted correlations. The acttype field of each ACTION object indicates how the query for the action is encoded in the combined result set, with BATCH-JOIN used for join-encoded queries that return at most one row, andBATCH-UNION used for union-encoded queries that may return multiple rows.

After generating the list of ACTION objects for state n,GENERATE-PREFETCH-ACTIONS calls GENERATE-PREFETCH-QUERY to produce a combined query that encodes the results for all of the ACTION submitquery fields. The GENERATE-PREFETCH-QUERY procedure com- bines the queries together using joins (left outer lateral derived tables) for queries that return at most one row and outer unions for queries that may return more than one row. The procedure re- turns the combined query text. The details of the run-time procedure are described next in Sec- tion 5.5. 5.5 PREFETCHER 149

696 £ Generate a combined query that encodes the submitquery results for all ACTION objects 697 function GENERATE-PREFETCH-QUERY( n ) 698 actions n.actions ← 699 any_union FALSE ← 700 select_onerow [] ← 701 from NIL ← 702 703 £ Generate derived tables for the join-based queries 704 for i 0...actions.length do ← 705 if actions[i].acttype = BATCH-JOIN then 706 if from = NIL then from “FROM ( ” ← 707 else from from + “ LEFT LATERAL ( ” ← 708 from from + actions[i].submitquery + “ ) DT”+ i ← 709 select_onerow select_onerow + (“DT”+ i + “.*”) ← 710 else 711 any_union TRUE ← 712 713 if not any_union then 714 £ There are no union-based actions; generate the join based query 715 sql “SELECT ” + select_onerow ← 716 sql sql + from ← 717 else 718 £ Generate a union-based query, with a branch of type -1 for the join-based actions 719 if from = NIL then from “FROM ( ” ← 720 else from from + “ LEFT LATERAL ( ” ← 721 from from + “ SELECT -1, ”+ select_onerow + AddNulls(...) ← 722 for i 0...actions.length do ← 723 if actions[i].acttype = BATCH-UNION then continue 6 724 from from + ` UNION ALL ' ← 725 from from + `SELECT '+i+` AS type ' + AddNulls(...) ← 726 from from + `, DT'+i+`.*' + AddNulls(...) ← 727 from from + `FROM ('+actions[i].submitquery+') DT'+i ← 728 from from+`) DT_UNION(type,c1,c2,...)' ← 729 sql `SELECT DT_UNION.* '+from+` ORDER BY DT_UNION.type' ← 730 for i 0...actions.length do ← 731 if actions[i].acttype = BATCH-UNION then 732 sql sql+AdjustOrderBy( actions[i].submitquery, ...) ← 733 return sql 734 end Figure 5.30: Generating a batch prefetch query. 150 BATCH REQUEST PATTERNS

Run-Open Run-Fetch Run-Close Prefetcher Call Monitor (Section 5.5)

[Batch-Join, a Batch-Union] [b,c] b a a c c [Batch-Join] [c] FSM with Action Lists b and Rewritten Queries [] c b

Query Rewriter (Section 5.4) Persistent Store

Figure 5.31: Overview of the role of the Prefetcher.

5.5 Prefetcher

Figure 5.31 gives an overview of how the Prefetcher integrates with the rest of Scalpel’s batch request processing. After a training period has been completed, the Pattern Optimizer has con- structed a finite-state model M = S, Σ,δ,s ,p . Further, the Query Rewriter has annotated each h 0 i state n in this model with a list n.actions that describe how queries should be processed at run-time. The first element of this list has a submitquery value that gives the query text that should be submitted to the DBMS when request OPEN(a) is submitted. The Query Rewriter stores this model persistently, and the Prefetcher loads the model when the application first uses Scalpel. The Call Monitor component monitors calls the application makes to OPEN,FETCH, and CLOSE. For each of these, the Call Monitor calls the appropriate RUN- method implemented by the Prefetcher. Scalpel maintains run-time state, R, that allows it to respond to application requests. The run- time state includes the request model M that was generated by the Pattern Optimizer and Query Rewriter. In addition, R.qcurr records the current model state maintained by tracking requests in the current sequence. Field R.tracker isaVALUETRACKER object that maintains parame- ter values from the previous S requests, where S is the configured scope length parameter. Finally, R maintains information about results that have already been prefetched: pfCrsr is a cursor over the combined, rewritten query; pfActions is the list of ACTION objects that were associated 5.5 PREFETCHER 151

735 structure R 736 M=request model £ The request model built after training 737 qcurr = M.s0 £ The current state 738 pfCrsr = NIL £ A cursor over the rewritten query 739 pfRow= NIL £ The first row of pfCrsr containing all join-encoded results 740 pfActions=[] £ List of ACTION currently in use 741 pfOffset=0 £ Current offset in pfActions 742 tracker = new VALUETRACKER £ Maintain parameter values for S preceding requests 743 end 744 £ Process an OPEN(a,InVals) request using R 745 function RUN-OPEN( R, a, InVals ) 746 ADDINPUT( R.tracker, InVals ) £ Add InVals to tracker 747 £ The next state is given by following the a edge from R.qcurr 748 nextstate R.M.δ( R.qcurr, a ) ← 749 750 £ Try to find an ACTION object in the list of already prefetched results matching a 751 action NIL ← 752 for i R.pfOffset...R.pfActions.length do ← 753 if q = R.pfActions[i].resultquery then 754 action R.pfActions[i] ← 755 R.pfOffset i+1 ← 756 if action = NIL CHECK-CORR-PREDICTIONS( action, InVals ) then 6 ∧ 757 £ If a was prefetched with correctly predicted parameters, use prefetch 758 crsr USE-PREFETCH( R, a, action ) ← 759 else if nextstate.actions = 6 ∅ 760 £ Submit combined query that encodes a and all prefetched queries 761 R.pfActions nextstate.actions ← 762 action R.pfActions[0] ← 763 R.pfOffset 1 ← 764 crsr SUBMIT-PREFETCH( R, a, action, InVals ) ← 765 else 766 £ Submit the original query to the DBMS unmodified 767 CLEAR-PREFETCH(R) 768 crsr SUBMIT-UNMODIFIED( R, a, action, InVals ) ← 769 R.qcurr nextstate ← 770 return crsr 771 end Figure 5.32: Pseudo-code for Scalpel batch request Prefetcher. 152 BATCH REQUEST PATTERNS with the most recent fetch sent to the DBMS, and pfOffset is the index within pfActions in- dicating the ACTION object that is expected to match the next OPEN request. Figure 5.32 shows how Scalpel uses R to maintain the current state qcurr, submit combined queries to the DBMS, and decode prefetched results. The RUN-OPEN procedure is called by the Call Monitor component of Scalpel when the client application submits OPEN(a,InVals). First, RUN-OPEN adds the input parameters InVals to the VALUETRACKER object stored in R.tracker (line 746). The VALUETRACKER is used to verify that predicted correlations actu- ally hold and that prefetched results match what is requested. Next, RUN-OPEN uses R.M.δ to find the next state that the model will move into after query a (line 748). The nextstate ob- ject holds bookkeeping information that may be used to execute this request. Next, RUN-OPEN looks in the list R.pfActions to see if there is an ACTION object in the list that matches query a. Even if such an ACTION object is found, the parameter values used to prefetch the associated results might not match the actual values supplied in the call OPEN(a, InVals). This would represent a failure of the training period in that a correlation that always held during training is not true at some point during the run-time, but we give correct results in this case by detecting the difference and avoiding the use of inappropriate prefetched values. The CHECK-CORR-PREDICTIONS function (not shown) compares actual parameter values InVals to the predicted correlation sources identified in the ACTION object. If an appropriate ACTION object is found for a and the predicted parameter values match the actual values (InVals), then the needed results have already been prefetched and they are en- coded in R.pfCrsr. 
The USE-PREFETCH function (line 758) returns a result set for a that is implemented by decoding the results in pfCrsr using the rules described in the associated ACTION object. Function USE-PREFETCH is shown in Figure 5.33. If no ACTION object in R.pfActions matches the current request a, then no prefetched results are available. Function RUN-OPEN will submit a demand fetch to the DBMS. If the nextstate object has a non-empty list of ACTION objects, the SUBMIT-PREFETCH function is called (line 764) to submit a combined query and decode the results for a. Function SUBMIT-PREFETCH is shown in Figure 5.33. If, on the other hand, no prefetch actions have been specified in nextstate, the SUBMIT-UNMODIFIED function is called (line 768) to submit request a to the DBMS without modification. After a cursor is obtained, either by decoding prefetched results, submitting and decoding a combined request, or submitting a to the DBMS unmodified, RUN-OPEN changes the current state to nextstate (line 769). Figure 5.33 outlines how Scalpel implements the SUBMIT-PREFETCH and USE-PREFETCH functions. The SUBMIT-PREFETCH function opens a cursor R.pfCrsr over the rewritten, combined query. Then, it fetches the first row from this cursor, storing the result in R.pfRow. This will either be the only row (if all prefetched queries are at-most-one-row), or it will be the first

772 function SUBMIT-PREFETCH( R, a, action, InVals )
773   £ Submit the rewritten query
774   open R.pfCrsr cursor for action.submitquery
775   R.pfRow ← fetch R.pfCrsr      £ Fetch values for join-encoded queries
776   if R.pfRow = NIL
777     £ Query a returned no rows: we cannot use prefetch; return an empty cursor for a
778     CLEAR-PREFETCH()
779     return EMPTYCURSOR()
780   else
781     return USE-PREFETCH( R, a, action )
782 end
783
784 function USE-PREFETCH( R, a, action )
785   if action.acttype = BATCH-JOIN then
786     £ If the results for a were null-supplied, return an empty cursor
787     if NULL-SUPPLIED( R.pfRow, action ) then return EMPTYCURSOR()
788     else return JOINCURSOR( R.pfRow, action )
789   else
790     £ Decode an outer-union encoded result set
791     return UNIONCURSOR( R.pfCrsr, action )
792 end

Figure 5.33: Pseudo-code for prefetching.

branch of DT_UNION with a type field of 1. It may be the case that the query a returns no rows for the given input values. In this case (line 778), the prefetched results of the remaining queries are not available: they are eliminated by the join with the empty result of a. In this case, SUBMIT-PREFETCH returns an empty cursor for a and clears the prefetched data structures by a call to CLEAR-PREFETCH (not shown). Otherwise, if the first fetch was successful, SUBMIT-PREFETCH calls USE-PREFETCH to interpret the results for a using the associated ACTION object.

Function USE-PREFETCH is called when we have a request a whose result set was encoded by a previous demand fetch. USE-PREFETCH decodes this result according to the rules in the ACTION object. For an acttype of BATCH-JOIN, the result is encoded in the field R.pfRow. Either zero or one row is encoded for a, and this is determined using the NULL-SUPPLIED function, which uses non-nullable fields identified in the ACTION object to determine if the result was generated by the left outer lateral construct (and hence the result of a is empty). If so, an empty result set is returned; otherwise, a single-row result set is returned based on the appropriate attributes of R.pfRow.

Alternatively, if a has an ACTION object of type BATCH-UNION, then the results for a are encoded as a branch of an outer union in R.pfCrsr. The USE-PREFETCH function returns a

UNIONCURSOR object that returns the rows of R.pfCrsr that match the type field specified in the ACTION object. When the client application submits a FETCH(crsr) request, Scalpel's Call Monitor component calls RUN-FETCH (not shown). If the cursor is the result of a prefetched result set (either a JOINCURSOR or a UNIONCURSOR), the appropriate row is decoded from the prefetched results. Otherwise, the fetch is submitted unmodified. In either case, the returned row is added to the VALUETRACKER using the ADDOUTPUT function. This allows R.tracker to be used to verify that prefetched queries guessed input parameters correctly.
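The RUN-FETCH logic is simple enough that we summarize it here with a small Java sketch rather than pseudo-code. The Cursor, JoinCursor, UnionCursor, and ValueTracker types below are hypothetical stand-ins for the prototype's internal classes, not its actual API:

    // A minimal sketch of RUN-FETCH, assuming hypothetical Cursor types.
    // JoinCursor/UnionCursor decode rows from results already held at the
    // client; any other cursor forwards the fetch to the DBMS.
    interface Cursor {
        Object[] fetchRow();   // returns null when no more rows are available
    }

    final class ValueTracker {
        void addOutput(Object[] row) {
            // Record output values so later OPENs can verify predicted correlations.
        }
    }

    final class Prefetcher {
        private final ValueTracker tracker = new ValueTracker();

        // Called by the Call Monitor for every application FETCH(crsr).
        Object[] runFetch(Cursor crsr) {
            // For JoinCursor/UnionCursor, fetchRow() decodes the next row from
            // the prefetched, combined result; for a server cursor it submits
            // the fetch to the DBMS unmodified. Either way, the returned row
            // is recorded in the tracker.
            Object[] row = crsr.fetchRow();
            if (row != null) {
                tracker.addOutput(row);
            }
            return row;
        }
    }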

5.5.1 Summary of Prefetcher

At run-time, structure R is used to maintain state that permits prefetching. This state includes M, the finite-state model generated during the training period. The model consists of a set of states and transitions between them, and the transitions are optionally annotated with an ACTION object that describes a prefetch that Scalpel should perform when the associated request is submitted from the originating state. When a prefetch request is submitted, the first row of the combined result is used to satisfy prefetched at-most-one-row queries, and the remaining queries are satisfied by interpreting the branches of the union. When a request Q is submitted, it is compared to the list of prefetched results until a match is found or the end of the list is reached. If it is not found, a demand fetch will be sent to the DBMS, along with any configured prefetch requests. If Q is found in the list R.pfActions, then the results for the query are available. The results are either interpreted from R.pfRow (if Q is at-most-one-row) or interpreted as the rows of the combined result with a type field matching the query offset in R.pfActions.

5.6 Experiments

We performed a variety of experiments that were designed to answer two general questions. First, how effective is semantic prefetching at reducing the execution time of query batches when they occur? Second, how significant is the training overhead that is introduced by the Scalpel system? The first question is addressed in Section 5.6.1. The second is addressed in Section 5.6.2.

5.6.1 Effectiveness of Semantic Prefetching

To study the effectiveness of Scalpel's prefetching, we consider a scenario in which Scalpel, during its training phase, has identified a rewrite rule that predicts that a query Q0 will be followed by a batch of queries Q1, Q2, ..., QL. Each query is a simple single-row selection from a table T, and each predicted query Qi is correlated with Qi−1, its immediate predecessor in the query batch.

We wrote a simple driver program, shown in Figure 5.34, to generate a run-time query stream

for Scalpel. The program generates the initial query, Q0, followed by a prefix of the remainder of the batch, and then repeats this N times. The length of the prefix is controlled by a selectivity parameter P. When P = 1, each run-time batch is exactly as predicted in Scalpel's rewrite rule. When P < 1, some of the queries predicted in Scalpel's rewrite rule are not generated at run-time. The selectivity parameter simulates the effect of predicates in application code which may cause conditional execution of queries in a batch.

793 £ N is the number of batches to generate.
794 £ L is the batch length.
795 £ P controls predicate selectivity.
796 procedure GenQueryBatches( N, L, P )
797   for iteration ← 1 to N do
798     generate query Q0
799     for i ← 1 to PL do
800       generate query Qi
801 end

Figure 5.34: Code for Generating Query Stream

For each experiment, we choose values for N, L, and P and execute the resulting query stream twice, once without Scalpel and once with it. In the former case, which we call unoptimized, each query is passed directly to the database server for execution. In the latter, optimized case, queries are passed through Scalpel, which applies its rewrite rule to prefetch the results of each query batch.

We studied five different system configurations with varying network latency. These configurations are described in Table 3.5. The configurations used a total of four different client and server computers, the properties of which are described in Table 3.4.

The driver program is implemented in Java using JDBC. All results are reported using Java 2 Standard Edition, version 1.5.0. A prototype version of Scalpel was used for the experiments. We experimented with three different commercial database systems behind Scalpel; license restrictions prevent us from identifying them. In each case, the table T used by the queries was populated with 4096 rows and was fully cached. The results for the three database systems were consistent, although with different constants, so we present the results for only one system. We use N = 1024 for all configurations except WAN, where only 256 iterations were used due to the very high latencies involved.
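For illustration, the core of such a driver might look as follows in Java/JDBC. The table definition (a table T whose next column carries the parameter of the following query) and the JDBC URL are invented for this sketch; the thesis text does not specify the actual schema:

    import java.sql.*;

    // A minimal sketch of the experiment driver, assuming a table T(pk, next, val)
    // in which each row's "next" column supplies the parameter of the following
    // query in the batch. The schema and JDBC URL are illustrative assumptions.
    public class BatchDriver {
        public static void main(String[] args) throws SQLException {
            int n = 1024;          // number of batches (N)
            int l = 16;            // batch length (L)
            double p = 1.0;        // selectivity (P)
            try (Connection conn = DriverManager.getConnection("jdbc:dbms://server/db");
                 PreparedStatement stmt = conn.prepareStatement(
                         "SELECT pk, next, val FROM T WHERE pk = ?")) {
                for (int iter = 0; iter < n; iter++) {
                    int key = iter % 4096;             // parameter of Q0
                    for (int i = 0; i <= p * l; i++) { // Q0 plus floor(P*L) further queries
                        stmt.setInt(1, key);
                        try (ResultSet rs = stmt.executeQuery()) {
                            if (!rs.next()) break;
                            key = rs.getInt("next");   // Qi's parameter correlates with Qi-1
                        }
                    }
                }
            }
        }
    }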

5.6.1.1 Batch Length

The benefit of prefetching depends on the number of queries that are successfully prefetched and on the latency of the communication. In this experiment, we fixed P = 1, varied the batch length L, and measured execution time on each of the five network configurations. Since P = 1, this experiment represents the ideal case in which Scalpel has accurately predicted the occurrence of a query batch that does not involve any application predicates.

[Figure 5.35 plots mean time (ms) per iteration against batch length L (0 to 30) for the unoptimized (U) and optimized (O) strategies in two panels: (a) LAN1 and (b) WiFi.]

Figure 5.35: Mean time (ms) per iteration for unoptimized (U) and optimized (O) strategies with varying batch length L, P = 1.

Figure 5.35 shows the results for two of the five system configurations. As the number of successful prefetches increases, the relative benefit increases. One of the database systems that we considered returned an error for a batch length of 45 due to the complexity of the generated prefetch query (which contains a join between 46 quantifiers). Prefetching provides savings even in the low-latency LCL configuration, although much higher gains are available as the interconnect latency grows (Section 3.6 summarizes the LCL configuration and the other configurations we use). Table 5.2 summarizes the results of a linear regression for each of the five configurations. The slope of the prefetch strategy remains relatively constant in the different configurations, while the original strategy incurs the per-request overhead for each additional query.

Setup     Unoptimized         Optimized
LCL       0.37 + 0.43L        0.26 + 0.32L
LAN1      0.75 + 0.76L        0.50 + 0.33L
LAN0.1    1.06 + 1.08L        1.28 + 0.35L
WiFi      10.35 + 9.16L       9.11 + 0.39L
WAN       653.27 + 456.58L    485.03 + 0.21L

Table 5.2: Mean time (ms) per iteration vs batch length (L).
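To make the fitted models concrete, consider L = 16. In the WiFi configuration, the unoptimized model predicts roughly 10.35 + 9.16 × 16 ≈ 157 ms per iteration, versus 9.11 + 0.39 × 16 ≈ 15 ms when optimized, an order-of-magnitude reduction. On the low-latency LCL link, the same calculation gives about 7.3 ms versus 5.4 ms, a more modest but still useful saving.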

5.6.1.2 Useful Prefetches

For the next experiment, we fixed L = 16 and varied the selectivity in the range 0 ≤ P ≤ 1. When P is small, only a small portion of the query batch prefetched by Scalpel is actually used by the application. Such overly aggressive prefetching can occur for several reasons. First, Scalpel may be unaware that some of the batch queries are conditional, i.e., there may be prediction error. Alternatively, Scalpel may be aware that part of the batch is conditional and yet decide, on a cost basis, that prefetching is still worthwhile.

Figure 5.36 shows the results of our measurements for four configurations. All points are the average of 1024 iterations. As expected, the unoptimized strategy has a strong dependence on the proportion P of queries that are actually submitted, while the optimized strategy has only a weak dependence on P. The prefetching strategy has a weak dependence on P because the costs associated with tracking the context as queries are opened, detecting whether prefetched results are valid, and interpreting the prefetched results are small, but non-zero.

In general, as P increases, there is a threshold above which the optimized (prefetching) strategy becomes worthwhile. A comparison of the configurations in Figure 5.36 shows that this threshold depends on system parameters: in particular, it depends on network latency. In the low-latency LCL configuration, prefetching does not begin to pay off until P > 0.75. In the higher-latency LAN0.1 configuration, prefetching pays off once P > 0.4. Even more aggressive prefetching is called for in higher-latency configurations such as WiFi and WAN. These data demonstrate one of the advantages of Scalpel's run-time optimization strategy, since the amount of network latency may not be known at development time, or may vary among instances of the application program. Table 5.3 summarizes the results of a linear regression for the experiment of Figure 5.36 with all configurations except WAN. For that configuration, the variations caused by sampling error exceed the effect of increasing P, and the slope derived from such an analysis is misleading.

[Figure 5.36 plots mean time per iteration against P (0.0 to 1.0) for the unoptimized (U) and optimized (O) strategies in four panels: (a) LCL, (b) LAN0.1, (c) WiFi, and (d) WAN.]

Figure 5.36: Mean time per iteration for unoptimized (U) and optimized (O) strategies with L = 16 and varying P. All times in milliseconds except (d) in seconds.

However, we can see in Figure 5.36(d) that the optimized slope is clearly much lower than the slope in the unoptimized case.

Setup     Unoptimized        Optimized
LCL       0.42 + 6.98P       5.02 + 0.19P
LAN1      0.75 + 12.10P      5.26 + 0.29P
LAN0.1    1.07 + 17.18P      6.43 + 0.26P
WiFi      9.28 + 147.28P     15.00 + 0.16P
WAN       354.1 + 8227.9P    omitted

Table 5.3: Mean time (ms) per iteration vs proportion of useful prefetches (P).
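Setting the two fitted models equal provides a rough estimate of the break-even selectivity. For the LCL configuration, solving 0.42 + 6.98P = 5.02 + 0.19P gives P ≈ 0.68, in reasonable agreement with the observed threshold of roughly 0.75; the estimates differ because the linear fit smooths over the sampling variation near the crossover point.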

5.6.1.3 Breakdown of Costs

Prefetching can act to reduce latency, but it also affects the processing costs at both the client and the server. Prefetching can increase server costs by generating wasted work in cases where not all of the prefetched results are used by the application. Scalpel also introduces additional costs at the client for monitoring and rewriting the query request stream. On the other hand, Scalpel can reduce processing overhead at both the client and the server by reducing the number of OPEN requests that are submitted.

Figure 5.37 shows a breakdown of total query execution times for the case L = 16 and P = 0.75. In the LCL configuration, total elapsed time is reduced primarily by a reduction in client CPU costs (1.8 ms to 0.6 ms). Latency is relatively unaffected, but server processing costs increase (from 3.7 ms to 4.5 ms). This additional server cost reflects the work needed for the 4 prefetched queries that the application never uses (and that the original strategy never submits), and it also reflects the higher cost of executing the combined query compared to the original single-table queries.

In the other network configurations, the situation with server costs is reversed: the optimized strategies are slightly cheaper than the original. The additional costs that affect the LCL configuration are offset by the savings in interpreting and formatting messages. The unoptimized strategy uses 104 communication packets with 3.4 KB, while the optimized strategy uses only 8 packets with 1.4 KB. This reduction was not as significant over the efficient shared-memory link used in the LCL configuration, but in the TCP/IP configurations it more than offsets the additional costs associated with the 4 wasted prefetched results. These setups also enjoy client-side savings similar to those of the LCL setup, and also exhibit a more marked improvement in latency.

[Figure 5.37 shows stacked bars of server, client, and latency costs for the unoptimized (U) and optimized (O) strategies in each of the LCL, LAN1, LAN0.1, WiFi, and WAN configurations.]

Figure 5.37: Breakdown of execution costs for unoptimized (U) and optimized (O) strategies, with P = 0.75 and L = 16. Times in ms except WAN in s.

5.6.2 Scalpel Training Overhead

We assessed the overhead of Scalpel's training algorithm using the dbunload case study (described in more detail in Section 8.2). Figure 5.38(a) shows the elapsed time for the unoptimized system (Scalpel not attached) and for Scalpel in training mode for 9 databases (labelled A–I) in the LAN1 configuration, with µ showing the average of all 9 executions. On average, training increases run-time by 0.5 seconds, or 35%. Figure 5.38(b) shows the difference in execution time between the unoptimized and training configurations for client CPU, server CPU, and latency. In most of the tests, increases in client CPU cost accounted for most of the increase (on average, 0.28 s). However, server costs also increased (average of 0.13 s), as did the request latency (0.09 s). The increase in client costs is expected due to the additional bookkeeping and optimization steps performed by Scalpel during the training period. Scalpel also issues additional requests to the DBMS to retrieve catalog information (in order to determine if a query will return at most one row). These additional requests are responsible for part of the increase in the server and latency components. It is interesting to note that in two instances (C and I), the server costs were lower during the training period. This is due to variance in the server costs and not any action of Scalpel to reduce server costs during training.


[Figure 5.38 shows, for databases A–I: (a) elapsed time (s) for the unoptimized and training configurations, and (b) the increase (s) broken down into client, server, and latency components.]

Figure 5.38: Training overhead on dbunload case study.

5.7 Summary of Batch Request Patterns

Application programs that we have examined do not generate request streams that are uniformly random. Instead, some sequences of requests are more likely than others. By identifying common sequences, we can choose to prefetch requests that are likely to be submitted, reducing the costs associated with per-request overhead. The Pattern Detector monitors a client application during a training period, building a trie-based data structure. This trie maintains frequency information that can be used to predict the likelihood of future requests. The trie also maintains sets of correlations that have held each time that a request is observed in a context. The Pattern Optimizer uses the trie generated during training to select a list of queries to prefetch. It uses the Query Rewriter to generate a combined query that returns the results for the submitted query and any prefetched queries. The combined query is generated using lateral derived tables and unions. The trie generated during training may contain significant redundancy as a result of including every context encountered during training in the trie. The Pattern Optimizer generates M, a redundancy-removed finite state model for trace T that selects states based on the shortest contexts that provide distinct prefetch choices. The model M is annotated with

Action objects that contain the combined query, the list of prefetched queries, and the predicted correlation source for each input parameter of each prefetched request.

At run-time, Scalpel uses model M to track the current state and submit prefetches. As OPEN requests are intercepted by the Call Monitor, the Prefetcher uses the δ transition function defined by M to move to the next state. If prefetched results are available, the Prefetcher checks that the actual parameter values match the predicted values. If so, the prefetched results are decoded and per-request overhead is eliminated. If a prefetched result is not available, a demand fetch must be submitted. In this case, the Action object is consulted for a combined query that should be submitted instead of the current request.

Section 5.6 provided experimental results that demonstrated the effectiveness of rewriting batch request patterns. Figure 5.24 shows the high overhead associated with sequentially executing requests: over 97% of the cost of each of these requests is due to per-request overhead. Prefetching can save some, but not all, of this overhead. While prefetching is useful in many cases, we must be careful to consider the probability that requests will actually be submitted, as shown in Section 5.6.

By using a training period, Scalpel identifies opportunities where prefetching is beneficial. Exploiting these opportunities significantly reduces the costs associated with per-request overhead.

6 Combining Nested and Batch Request Patterns

Chapter 3 described how Scalpel identifies nested patterns of requests during a training period. In these nested patterns, inner queries have parameters that are predicted to be equal to parameters of outer queries (or perhaps a constant). The inner queries are executed up to once per row of their parent query, although local predicates can prevent this. Based on observations during the training period, the Pattern Optimizer uses the Cost Model to select an execution strategy for each pattern of nested queries. The Query Rewriter generates combined queries that are issued instead of the original to retrieve encoded result sets that are predicted to be needed.

Similarly, Chapter 5 described how Scalpel builds a trie-based data structure during a training period to predict batches of requests, which occur when a prefix of requests can be used to predict a likely future sequence (with each request in the sequence representing an OPEN, FETCH∗, CLOSE sequence). The predicted future queries have actual parameter values that are predicted to be equal to input or output parameters of the preceding queries (or possibly a constant). The future requests are not submitted every time the prefix is observed; instead, Scalpel estimates this probability based on the relative frequency observed during a training period. After the training period is complete, the Pattern Optimizer uses the Cost Model to select a list of queries to prefetch for each prefix of requests. Finally, the Pattern Optimizer generates a redundancy-removed finite state model that contains a set of states with associated queries generated by the Query Rewriter. These alternate queries are issued instead of the original to retrieve encoded result sets that are predicted to be needed in the near future.

The optimizations described in Chapter 3 and Chapter 5 are complementary, but were developed independently. In this chapter, we give a sketch of how we can simultaneously detect nested and sequential patterns of execution. Treating these two types of pattern in a combined way allows us to extend the set of prefetching opportunities that we can exploit.

6.1 Example Combining Nesting and Batches

Figure 6.1 provides an example of a program that generates both nested request patterns and batches of predictable sequences of requests. This example has been artificially constructed to demonstrate particular features of the combined approach. Figure 6.2 shows the context tree that is created by Scalpel after observing traces generated by F4.


802 function F4(p1)
803   open c1 cursor for g:
804     SELECT g1, g2 FROM G WHERE g3=:p1
805   while r1 ← fetch c1 do
806     if r1.g1 ≠ 0 then
807       fetch row r2 from h:
808         SELECT h1 FROM H WHERE h2=:p1
809       fetch row r3 from i:
810         SELECT i1 FROM I WHERE i2=:r2.h1
811       fetch row r4 from j:
812         SELECT j1 FROM J WHERE j2=:r3.i1 AND j3=:r1.g2
813     else
814       fetch row r5 from i:
815         SELECT i1 FROM I WHERE i2=:r1.g1
816   close c1
817   fetch row r6 from k:
818     SELECT k1 FROM K WHERE k2=:p1
819 end

Figure 6.1: An example containing nesting and batches.
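For concreteness, the request pattern of Figure 6.1 would look roughly as follows in a Java/JDBC client. This is an illustrative transliteration only; the fetchOne helper and the connection handling are not part of the original example:

    import java.sql.*;

    // A rough JDBC transliteration of F4; error handling is simplified.
    class F4Example {
        static void f4(Connection conn, int p1) throws SQLException {
            try (PreparedStatement g = conn.prepareStatement(
                    "SELECT g1, g2 FROM G WHERE g3 = ?")) {
                g.setInt(1, p1);
                try (ResultSet r1 = g.executeQuery()) {
                    while (r1.next()) {
                        if (r1.getInt("g1") != 0) {
                            int h1 = fetchOne(conn, "SELECT h1 FROM H WHERE h2 = ?", p1);
                            int i1 = fetchOne(conn, "SELECT i1 FROM I WHERE i2 = ?", h1);
                            fetchOne(conn, "SELECT j1 FROM J WHERE j2 = ? AND j3 = ?",
                                     i1, r1.getInt("g2"));
                        } else {
                            fetchOne(conn, "SELECT i1 FROM I WHERE i2 = ?", r1.getInt("g1"));
                        }
                    }
                }
            }
            fetchOne(conn, "SELECT k1 FROM K WHERE k2 = ?", p1);
        }

        // Helper: run a single-row query and return its first column (0 if no row).
        static int fetchOne(Connection conn, String sql, int... parms) throws SQLException {
            try (PreparedStatement stmt = conn.prepareStatement(sql)) {
                for (int i = 0; i < parms.length; i++) stmt.setInt(i + 1, parms[i]);
                try (ResultSet rs = stmt.executeQuery()) {
                    return rs.next() ? rs.getInt(1) : 0;
                }
            }
        }
    }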

[Figure 6.2 shows the context tree: the root context C0 (/) has edges Qg and Qk leading to contexts /Qg and /Qk; from /Qg, edges Qh, Qi, and Qj lead to contexts C2 (/Qg/Qh), C3 (/Qg/Qi), and C4 (/Qg/Qj).]

Figure 6.2: Context tree for Figure 6.1.

By itself, the nest-based optimizer might consider prefetching Qh, but it is unable to prefetch Qi or Qj. The query Qi is used in two distinct ways. The first (line 810) uses an input parameter that depends on the preceding query Qh. The nesting prefetcher cannot generate a rewritten query that provides the desired result. The second use of Qi (line 815) is an example that could be handled by the nesting prefetcher. However, the context tree does not provide a way to distinguish this case from the previous case. The correlations for this second use are not monitored separately, leading the nesting prefetcher to miss the correlation in this second case to the output parameter r1.g1.

The context tree of Figure 6.2 identifies one opportunity to prefetch a nested pattern. If we consider sequences of patterns, we can find more opportunities. Figure 6.3 shows two suffix tries that we would find after observing the sequence of requests. Figure 6.3(a) shows the trie considering only queries that are immediate children of the root context /, while Figure 6.3(b) shows

the trie for the queries that are immediate children of context /Qg.

[Figure 6.3 shows two suffix tries over the query symbols g, h, i, j, k with end markers ($): (a) the trie for the root context /, (b) the trie for context /Qg.]

Figure 6.3: Suffix trie for Figure 6.1; (a) shows the result at the root context /, (b) shows the result in context /Qg.

The suffix trie data structure allows us to identify additional prefetch opportunities. It identifies that query Qk always follows Qg, with parameters that can be predicted once we observe Qg. Further, the suffix trie is able to distinguish the two different uses of Qi. When Qh is submitted, it is always followed by Qi then Qj. However, it is not able to prefetch Qj because it depends on a parameter (r1.g2) that is drawn from an outer query, a source not considered by the batch prefetcher that we described. The nesting and batch prefetchers are complementary, and we can achieve more savings if we combine the Pattern Detector for the two types of patterns. The combined Pattern Detector could

provide improved structural patterns (distinguishing the two uses of Qi), and also could find more feasible prefetch candidates by considering a broader set of possible correlation sources.

6.2 Combining Context/Suffix Trees

For nested patterns, we represented contexts in the tree as a sequence of queries separated by /.

Context /Qg/Qh is used to represent query Qh opened while Qg is currently open. For batch patterns, we used sequences of characters to represent a context, where we show only the character subscript of each query. In that notation, context hij represents query Qj being submitted after Qh then Qi. We can combine these two notations as follows. For every query Qa that is currently open, the context contains a substring a/. We use sequences of characters to represent requests preceding each of the open queries. With this combination, the context /kg/hi means we have processed a sequence such as the one shown in Figure 6.4. Note that at step 6, the context would be /kg/h/ as Qh is still open.

 1  ck ← OPEN(Qk)
 2  FETCH(ck)
 3  CLOSE(ck)
 4  cg ← OPEN(Qg)
 5  ch ← OPEN(Qh)
 6  FETCH(ch)
 7  CLOSE(ch)
 8  ci ← OPEN(Qi)
 9  FETCH(ci)
10  CLOSE(ci)

Figure 6.4: Example trace generating context /kg/hi.

The trace in Figure 6.4 is consistent with the context /kg/hi. We can also consider generalizations that use fewer preceding queries to define the context. With this approach, the trace of Figure 6.4 is associated with the following contexts: {/kg/hi, /kg/i, /g/hi, /g/i}. These are generalizations of the form given by the suffix tries we used for batch request patterns. The difference is that the generalization is occurring at two levels: the outermost level, and the level nested inside an open query g.

We can combine nesting and batch detection by using a generalization of the context tree data structure. We build multiple suffix tries, each associated with a nesting level. The original context tree moved to a single child node when a query was opened. With this proposal, we would move to a set of children, with a different child associated with each of the nodes on the update path of the suffix trie from the longest suffix to the root.

We would maintain a set of active contexts. These are contexts that match the current request sequence. They may be nodes in the same suffix trie, for example /g/hi and /g/i, or they may be nodes from separate tries rooted at different parent nodes, such as /kg/h and /g/h. When we observe a request OPEN(Qa), we would push the current set C1 of active contexts onto a stack and form a new set C2. The new set C2 = { wa/ | w ∈ C1 } is formed by appending the characters a/ to each element w in set C1. When we observe the associated CLOSE() request, we would pop the set of active contexts, returning to C1. Then, we would follow the approach of Figure 5.6 to generate all following nodes based on a walk along the frontier. Figure 6.5 shows the set of active contexts after processing a sequence of requests that might be generated by the code in Figure 6.1.

 N  Request           Active Contexts After Request
 1  cg ← OPEN(Qg)     {/g/}
 2  ch ← OPEN(Qh)     {/g/h/}
 3  CLOSE(ch)         {/g/h, /g/}
 4  ci ← OPEN(Qi)     {/g/hi/, /g/i/}
 5  CLOSE(ci)         {/g/hi, /g/i, /g/}
 6  cj ← OPEN(Qj)     {/g/hij/, /g/ij/, /g/j/}
 7  CLOSE(cj)         {/g/hij, /g/ij, /g/j, /g/}
 8  CLOSE(cg)         {/g, /}
 9  ck ← OPEN(Qk)     {/gk/, /k/}
10  CLOSE(ck)         {/gk, /k, /}
11  cg ← OPEN(Qg)     {/gkg/, /kg/, /g/}
12  ci ← OPEN(Qi)     {/gkg/i/, /kg/i/, /g/i/}
13  CLOSE(ci)         {/gkg/i, /gkg/, /kg/i, /kg/, /g/i, /g/}
14  CLOSE(cg)         {/gkg, /kg, /g, /}

Figure 6.5: Example trace generating context /kg/h.
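The push/append/pop discipline behind Figure 6.5 can be expressed compactly. The following Java fragment is a simplified sketch that tracks only the context strings; the suffix-trie updates and the frontier walk of Figure 5.6 performed on CLOSE are omitted:

    import java.util.*;

    // A simplified sketch of active-context maintenance for the combined scheme.
    // Contexts are plain strings such as "/g/hi".
    class ActiveContexts {
        private Set<String> active = new HashSet<>(Set.of("/"));
        private final Deque<Set<String>> stack = new ArrayDeque<>();

        void onOpen(char a) {
            stack.push(active);
            Set<String> next = new HashSet<>();
            for (String w : active) next.add(w + a + "/");  // C2 = { wa/ | w in C1 }
            active = next;
        }

        void onClose() {
            // Completed contexts such as "/g/hi" would be recorded in the trie
            // here, along with the generalizations produced by the frontier walk.
            active = stack.pop();
        }

        Set<String> current() { return active; }
    }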

In addition to the structure of the combined context tree/suffix trie, we must consider how to detect correlations in this combined scheme. In the nesting approach, we maintained a scope for each context. This scope provided the set of correlation sources that we considered might predict parameters of the nested queries. The batch detection used a dictionary over a sliding window of S previous requests. In the combined scheme, we could implement a dictionary that maintains the values of S previous requests at each level of nesting.
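As a sketch of this combined correlation tracking, the following Java fragment maintains a sliding window of S preceding requests at each nesting level; the class and its interface are invented for the illustration:

    import java.util.*;

    // A sketch of a per-nesting-level sliding window of parameter values,
    // assuming a window of S preceding requests at each level. Candidate
    // correlation sources for a new query are the windows of its own level
    // and of every enclosing level.
    class LevelValueTracker {
        private final int scope;                       // S, the configured scope length
        private final List<Deque<Object[]>> levels = new ArrayList<>();

        LevelValueTracker(int scope) { this.scope = scope; pushLevel(); }

        void pushLevel() { levels.add(new ArrayDeque<>()); }     // on OPEN
        void popLevel()  { levels.remove(levels.size() - 1); }   // on CLOSE

        void record(Object[] values) {   // input/output values of a request
            Deque<Object[]> window = levels.get(levels.size() - 1);
            window.addLast(values);
            if (window.size() > scope) window.removeFirst();
        }

        // All values visible as correlation sources at the current nesting depth.
        List<Object[]> candidateSources() {
            List<Object[]> result = new ArrayList<>();
            for (Deque<Object[]> window : levels) result.addAll(window);
            return result;
        }
    }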

6.3 Current Implementation

The sketch in Section 6.2 described one way that we could integrate detection of nested and batch request patterns. The integration is quite tight, allowing correlation detection to work across nesting and sequence boundaries. Further, the information maintained during the training period permits rewrites that simultaneously combine nests and batches. However, the time and space requirements of the sketched approach are rather alarming. It may be possible to achieve much of the benefit of the complete integration within a reasonable space limit if we can exploit the significant redundancy present in the suffix tries. We leave this as an area of future consideration.

At present, we have implemented a relatively modest approach to integrating nesting and batch request pattern detection. Each of these is implemented separately, with its own correlation detection. The batch Pattern Detector operates only on queries opened at the top level of nesting. The Pattern Detector, Pattern Optimizer, Query Rewriter, and Prefetcher operate in parallel for each type of pattern. If the nested request optimizations choose an alternate strategy for a query, that query is not considered by the batch request optimizer. The results that we presented in Chapter 3 used only the nested request components; likewise, results in Chapter 5 used only batch request components. The results that we present for case studies in Chapter 8 use the combined approach described here.

Even this loose integration of nesting and batch detection is reasonably effective. The request patterns used for dbunload are moderately complex (Section 8.2), but it does not appear that dbunload would benefit much further from a more comprehensive integration of nesting and batch optimization. The SQL-Ledger system has an even simpler structure within its nested requests, with at most two distinct queries opened inside an outer query. The full power of integrating batch and nesting detection would not help with predicting what queries will be executed next. However, the SQL-Ledger case study does contain correlations between an inner query and a preceding query. At present, Scalpel's loose integration does not detect this correlation, and it is unable to prefetch the associated query.

6.4 Summary of Combining Nested and Batch Request Patterns

There are application request patterns where a pattern detector can make more effective decisions if it considers both nested and batch patterns. The combination can provide better structural predictions, for example by distinguishing special cases by considering both preceding queries and nesting. Further, this approach extends the set of possible correlations that we can consider. We can consider a parameter of request Qa to be correlated to requests that preceded Qa at the same nesting level, to any enclosing request, and to any request preceding enclosing requests.

At present, our prototype implementation uses a loose integration of the optimizations for nested request patterns and batch request patterns. When the two types of optimizations conflict, the actions selected by the nested request optimizer are used in preference. Our experience with case studies suggests that the loose integration achieves most of the benefit that a full integration would give for the practical systems we examined. It seems that shared correlation prediction would be the biggest improvement for these systems.

7 Prefetch Consistency

The Scalpel system prefetches results before the client application submits the associated request. This can lead to a data consistency problem, where prefetched data do not contain updates that would have been observed if the data were not prefetched. There are two possibilities for these updates: either they are performed by the current transaction (described in Section 7.1), or they are performed by another transaction (Section 7.2).

7.1 Updates by the Same Transaction

When a transaction performs an update, it expects the results of that update to be reflected in the results of future queries. If Scalpel has prefetched the results of anticipated requests, these prefetched results will not reflect the modifications. In principle, it is possible that Scalpel could alter the update statement sent to the server so that it returns enough information to modify Scalpel's prefetched results to properly reflect the changes. Alternatively, enough information could be returned to identify prefetched results that are now stale, and Scalpel could merely send demand fetches for these stale results instead. The altered update statement could perform a join between the updated rows and the combined queries already prefetched by Scalpel. The updated row set is identified by the update statement and any modifications performed by triggers executed in response to the update. This approach expends significant effort in implementation complexity and more expensive updates in order to precisely determine which prefetched results are still valid.

Another approach could be based on a static analysis of the prefetch queries used by Scalpel and the update statements submitted by the same transaction. In some cases, the updates cannot possibly affect prefetched results. For example, an update could apply to tables not referenced by the prefetch queries. If, further, no triggers fired by the update could affect these results, then the update statement is not significant to the prefetched queries. If we statically analyze an update statement and find it could not possibly affect our prefetched results, we could retain them; otherwise, we could discard them and answer future queries that would have been satisfied by prefetched results with results from a demand fetch. In this case, the work of prefetching the discarded results is wasted, but no consistency problems are introduced.

At present, Scalpel implements a simple policy whereby it assumes that any update by a transaction is significant to prefetched results. Scalpel monitors all EXECUTE requests during training

and at run-time. During training, Scalpel recognizes when an update could invalidate prefetched data, and it chooses to avoid the prefetch in that case (as the work might be wasted). If an update is encountered at run-time that was not anticipated based on the training period, the prefetched results are not used to answer further OPEN requests. While this represents wasted work, it does not affect consistency.

This simple policy of our current prototype is safe, if possibly sub-optimal, for OPEN requests submitted after an update from the same transaction. However, there is another possible source of data inconsistency. If there is a cursor already open over prefetched results, we should be concerned that the rows fetched from the prefetch cursor match the semantics that would have been given without prefetching. SQL/99 [13] defines cursor sensitivity as follows. If a change to SQL-data would have caused different results to be returned by a cursor had the cursor been opened after the change, then the change is said to be significant to the cursor. If the effects of a change are observed by a cursor, the change is said to be a visible change. SQL/99 defines the following sensitivity values.

SENSITIVE If a cursor is declared as SENSITIVE, then all significant changes are visible.

INSENSITIVE If a cursor is declared as INSENSITIVE, then no significant changes are visible.

ASENSITIVE If a cursor is declared as ASENSITIVE, then the visibility of significant changes is implementation dependent.

The default sensitivity is ASENSITIVE, which allows implementations to provide the most efficient sensitivity. In principle, Scalpel could use SENSITIVE cursors for its prefetched results. With the exception of the client hash join strategy, this would ensure that significant changes are visible, and that cursors that are open and decoding prefetched data would give SENSITIVE semantics. However, typical implementations of SENSITIVE cursors are inefficient, requiring a demand fetch to the server for each row. For now, Scalpel uses only ASENSITIVE sensitivity. If a cursor is opened with either SENSITIVE or INSENSITIVE sensitivity, Scalpel submits it to the server for processing (INSENSITIVE cursors could be supported by using INSENSITIVE for Scalpel's prefetch cursors, but that could require an INSENSITIVE cursor over the larger combined query results, something which we avoid in the current prototype).

In summary, Scalpel guarantees prefetch consistency with respect to monitored updates done by the same transaction. Prefetched results are not used for future OPEN requests after an EXECUTE, as the EXECUTE might have been significant to the prefetched results. Scalpel only uses prefetching for ASENSITIVE cursor types, which allows it to deliver results matching the requested semantics using prefetched data.
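The run-time side of the conservative policy amounts to little more than discarding the prefetch state when an update is seen. The following Java sketch (with invented names) shows that policy, along with the point where the static table-overlap analysis discussed above could be plugged in:

    import java.util.*;

    // A sketch of the conservative consistency policy: an EXECUTE seen at
    // run-time invalidates the prefetched results. The table-overlap test
    // marks where a static-analysis refinement could retain the results;
    // the class and the trigger analysis it presumes are hypothetical.
    class PrefetchInvalidator {
        private final Set<String> tablesInPrefetchQueries;
        private boolean prefetchValid = true;

        PrefetchInvalidator(Set<String> tablesInPrefetchQueries) {
            this.tablesInPrefetchQueries = tablesInPrefetchQueries;
        }

        // Called by the Call Monitor for every EXECUTE request. The caller
        // supplies the tables written by the update (including any tables
        // affected by triggers fired in response to it).
        void onExecute(Set<String> tablesWrittenByUpdate) {
            // A purely conservative policy would unconditionally set
            // prefetchValid = false; the disjointness test keeps prefetched
            // results only when the update cannot possibly affect them.
            if (!Collections.disjoint(tablesWrittenByUpdate, tablesInPrefetchQueries)) {
                prefetchValid = false;   // later OPENs fall back to demand fetches
            }
        }

        boolean canUsePrefetch() { return prefetchValid; }
    }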

7.2 Updates by Other Transactions

Section 7.1 discussed prefetch consistency only with respect to modifications by the same transaction; in addition, we must consider updates that may be performed concurrently by other transactions. While Scalpel can detect changes from the monitored connection, it cannot observe changes made by other connections connected to the same DBMS. If prefetched results are not used across transaction boundaries, then prefetch consistency is provided by the ACID properties of the DBMS (at least in theory). If prefetched results are used across transaction boundaries, the ACID properties do not hold. We could consider approaches similar to those that we considered in Section 7.1; for example, we could send additional requests at the beginning of a new transaction to determine which prefetched results are (possibly) invalid, then either update them or invalidate them. For example, we could use WITH HOLD cursors [13] for Scalpel's prefetched results, which would allow us to hold the cursor open across transaction boundaries (for a single connection). Alternatively, we could follow an approach similar to that proposed by Guo et al. [87], allowing the client application to specify whether it is willing to use stale data prefetched from previous transactions.

At present, Scalpel does not use prefetched results across transaction boundaries. We have found applications that would benefit from this type of prefetching (for example the SQL-Ledger system, described in Section 8.3). The prefetching is useful when a high-level user operation is accomplished by more than one sub-transaction. However, the benefit in these cases results from batch request patterns, not from nested request patterns. Hence, we would save at most one instance of the per-request overhead U0. It is possible a nested request pattern could be implemented across transaction boundaries (using a WITH HOLD specification for the outer cursor). However, we have not found instances of this in the systems we investigated. Therefore, it appears we do not lose much in practice by restricting ourselves to prefetching within transaction boundaries.

7.2.1 Weak Isolation

The ACID properties of DBMSs suggest that we need only prove consistency for a serial execution of individual transactions, as we have done in Section 7.1. However, full ACID properties are not always guaranteed in practice. In some cases, client applications request a weak isolation level that does not provide serializable semantics. Gray et al. [84] defined a number of degrees of consistency that are less than serializable. By lowering consistency, applications can improve concurrent performance; presumably, semantic issues are controlled by additional application logic. The SQL standard [13] defined four levels of isolation, with the intention that these would provide an implementation-independent definition of degrees of consistency, without referring to a particular implementation technique, such as locking. Berenson et al. [18] pointed out that the definitions used in the SQL standard [13] are ambiguous; further, they fail to capture essential elements of the original isolation definitions used by Gray et al. [84]. Berenson et al. proposed a variant of the standard SQL definitions, but noted that these alternate definitions were essentially a disguised form of locking. Adya, Liskov and O'Neil [4, 5] provided a more generalized definition of isolation levels that can be used for optimistic protocols as well as for locking.

The anomalies related to relaxed concurrency are typically exacerbated by prefetching. Not only can the original anomalies occur (such as dirty reads, non-repeatable reads, and phantoms), but also new temporal anomalies can occur. A read request r1 may observe the effects of an update from another transaction, while a later request r2 that is satisfied from prefetched data may not observe the update. This situation only occurs when a demand fetch is sent to the DBMS after Scalpel has prefetched some results but before all of the prefetched results have been consumed. One way to avoid introducing new temporal anomalies would be to detect this situation: when any demand fetch is sent to the DBMS, Scalpel could discard any prefetched results.

While this approach has the theoretical benefit of avoiding the introduction of new temporal anomalies, it does so at the expense of limiting prefetch opportunities and wasting work. It seems likely that application developers who select relaxed isolation levels for efficiency would not prefer this approach. Instead, a better approach would be to allow application developers to control the recency of the data they receive. One approach is proposed by Guo et al. [87], whereby application developers explicitly encode currency and consistency requirements.

Another approach is that of the snapshot isolation level. With snapshot isolation, defined by Berenson et al., each transaction reads data that was committed as of the transaction start (additionally, the transaction's own updates are reflected). Scalpel works very well in this environment, as shown in Section 7.1. Scalpel does not introduce any new temporal anomalies in this situation, and the anomalies permitted by snapshot isolation are not modified by Scalpel.

At present, Scalpel makes no attempt to avoid introducing new temporal anomalies at relaxed isolation levels. If a transaction executes with serializable isolation, Scalpel produces consistent and current data (with the cautions described in Section 7.1 used for reflecting the transaction's own updates).
If the weaker snapshot isolation is used, Scalpel does not introduce new anomalies. Otherwise, Scalpel introduces temporal anomalies that are not present in the original system, and which are not described by the original isolation level selected by the transaction. An application developer must be cautious when using Scalpel in this environment; it may be that the introduced anomalies are acceptable in the quest for the best possible performance, but it may be the case that they must be controlled either through application logic or through the use of serializable isolation. Finally, we note that the issues related to prefetching data before they are used are similar to the concerns of caching systems such as the MTCache described by Guo et al. [87]. In fact, prefetching and caching form very good complements; developments in caching semantics are useful in describing the effects of prefetching, and prefetching can be used by a cache to choose a victim to evict. It is an interesting topic for future study to consider how prefetching and caching can be integrated in a way that provides efficient results (avoiding the dangers of replication described by Gray et al. [83]), while providing semantics that match the application developer's needs.

8 Case Studies

We applied Scalpel to real-world systems to evaluate its effectiveness and how its features interact with practical concerns. In Section 1.3, we described a set of client applications that we investigated to assess the possible benefits of Scalpel. We manually inspected these systems, and found opportunities for optimizations based on nested request patterns and also batch request patterns. We found batch opportunities in all of the systems we considered. While we did not find nested request patterns in all of these systems, they did appear in several of the systems we considered. In this chapter, we present detailed results for two of these systems as case studies of using Scalpel in practical systems. Section 8.1 describes an issue that we found in both case studies: the problem of how transaction boundaries are handled during training and at run-time. Section 8.2 describes our investigation of the dbunload program. This program contains significant nesting that is optimized effectively using Scalpel's nesting rewrites. Batch request patterns also offer an improvement, although their role is limited in many configurations by the overwhelming number of requests associated with the nested patterns. Section 8.3 describes our work with the SQL-Ledger system. SQL-Ledger is a web-based double-entry accounting system. The SQL-Ledger system also provides several opportunities for optimizing nested request patterns.

8.1 Transaction Boundaries

One issue that we faced in both case studies is how Scalpel should deal with transaction boundaries. The dbunload program uses a single transaction to generate the text to re-create a schema. When we are training Scalpel, we would like to use a long trace that is typical of the usage patterns that we expect at run-time. In order to build a sufficiently long and representative trace with dbunload, we need a way to use several transactions in a single training period. We accomplish this with Scalpel by using special out-of-band queries. When a transaction ends, we ensure that all open queries are closed. At present, Scalpel does not support WITH HOLD cursors, which can be held open across transaction boundaries. Scalpel also inserts an out-of-band character into the suffix trie (or path-compressed suffix trie) data structure, then moves the current update point of the trie to the root. This out-of-band character allows multiple sequences to be combined into one suffix trie. Gusfield [88] provides more details on the approach.
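The effect of transaction boundaries on the trie-building loop is small. The following Java sketch illustrates it; the SuffixTrie interface and the numeric symbol encoding are invented for the illustration:

    // A sketch of handling transaction boundaries during training. The
    // SuffixTrie type and its methods are invented for the illustration;
    // the terminator is an out-of-band symbol distinct from every query symbol.
    class TrainingMonitor {
        interface SuffixTrie {
            void extend(int symbol);   // append a symbol at the current update point
            void resetToRoot();        // restart matching at the root
        }

        private static final int TERMINATOR = -1;  // out-of-band symbol
        private final SuffixTrie trie;

        TrainingMonitor(SuffixTrie trie) { this.trie = trie; }

        void onQuery(int querySymbol) { trie.extend(querySymbol); }

        void onTransactionEnd() {
            // Close any open queries (omitted), then insert the out-of-band
            // symbol so that no pattern can span the transaction boundary,
            // and move the update point back to the root of the trie.
            trie.extend(TERMINATOR);
            trie.resetToRoot();
        }
    }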


The out-of-band queries allow Scalpel to build a single suffix trie that summarizes multiple traces, and this allows dbunload to be executed in several different configurations in order to provide a realistic training period. A related concern is that Scalpel might try to prefetch results in one transaction and then use them in another. As discussed in Chapter 7, such an approach leads to serialization anomalies. In particular, we found that the SQL-Ledger system tends to use relatively short transactions. A single user activity might lead to more than one transaction being issued, and Scalpel's batch request optimization could reduce latency by prefetching across transaction boundaries. At present, we do not permit such prefetches. The out-of-band queries in the suffix trie are used to prevent rewrites that cross transaction boundaries.

8.2 Case 1: dbunload

The first case we considered is the dbunload program. dbunload is provided with Adaptive Server Anywhere (ASA), a full-featured DBMS produced by iAnywhere Solutions, a Sybase company. The dbunload program generates a text-based ‘unload’ of a database, suitable for re-creating both the schema and instance data. When unloading data, the majority of the time is spent unloading the instance data: this involves fetching row data from the DBMS, formatting it as text, and outputting it. In this usage pattern, the cost of outputting the database schema is usually a negligible portion of the elapsed time. The dbunload utility is also used extensively to unload only the schema for a database. For example, customers can use this approach to maintain revision control of the database schema. A variation on this mode outputs the schema for only one table or a small set of tables. Scalpel cannot improve the performance of the (coarse-grained) operations used to unload instance data, so we focused on schema unloading operations.

8.2.1 Characterization of the Program

The dbunload program works with database instances generated by several different versions of ASA, including variations on different upgrade paths that may have been applied to upgrade older database versions. These different versions have varying features, and some common features are implemented differently in different versions. For example, two database versions store column constraints in slightly different formats. The dbunload program issues version detection queries to determine the version level of the database instance to be unloaded. The results of the version detection are stored in global configuration variables.

The dbunload program is over 11 years old, and is implemented in C code. All queries are submitted using a cursor abstraction layer implemented in ESQL. This abstraction layer provides a simple OPEN, FETCH, CLOSE style interface. The OPEN interface takes as parameters a unique cursor name and a SQL string. Any query input parameters are encoded in the SQL text using string substitution. All fields returned from the cursor interface are returned as text strings.

The query for each cursor is constructed dynamically using string formatting commands. Query parameters are included in the query text using string substitution. In addition to parameter substitution, queries are constructed based on dynamic configuration parameters that depend on the results of the version detection queries and the invocation parameters of dbunload. An appropriate variant of each query is generated by textually substituting the appropriate fragments into the query to be submitted. For example, if dbunload is used for a database instance without support for named constraints, the value NULL is textually substituted in place of the field name that stores the constraint name. In order to handle some complex versioning issues, global parameters are used in some locations to generate subselects and additional joins in order to retrieve necessary information. In general, textual substitution is used to generate arbitrary query constructs.

In addition to substitution based on global configuration parameters, some queries are submitted by a function that is called in multiple contexts, each requiring a different query variant. For example, the code to unload a constraint for a column, table, foreign key, primary key, or declared unique constraint is shared in the GetConstraints() function. This function is called from four separate call sites within dbunload, and each call site generates a different variant of the query.

The code of dbunload is arranged into functions and modules following good C coding style. There are 46 functions that open cursors, with 35 opening only one, 7 opening 2, 1 opening 3, and 2 functions opening 5 cursors. In total, there are 63 different cursor open call sites within dbunload. The presence of predicates within these functions means that these cursors are not opened every time there is an opportunity to do so. Functions that open cursors use a loop over the results if more than one row can be returned. The body of these loops generates the text needed to re-create the associated schema object. In some cases, the schema object has sub-structure nested within it. For example, a table schema object generates the text for its columns, and in these cases, dbunload typically calls other functions that open cursors and output the text representation of the nested objects.
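To illustrate this style of construction, the following Java fragment builds a simplified constraint query by string substitution. The query text, variable names, and class are invented for the sketch and do not reproduce dbunload's actual queries:

    // A simplified illustration of dbunload-style query construction by string
    // substitution; the query text and names are invented for the sketch.
    class QueryBuilder {
        static String constraintQuery(boolean namedConstraints, String tableName) {
            // Databases without named-constraint support get the literal NULL
            // in place of the column that stores the constraint name.
            String nameField = namedConstraints ? "constraint_name" : "NULL";
            return "SELECT " + nameField + ", check_defn"
                 + " FROM SYSCONSTRAINT"
                 + " WHERE table_name = '" + tableName + "'";
        }
    }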

8.2.1.1 A Java Version of dbunload

The Scalpel prototype that we have implemented is written in Java with a JDBC interface. In order to assess the benefits of using our proposed optimizations, we implemented a Java version of the dbunload tool by transliterating the source from C/ESQL to Java/JDBC. The primary changes to the source code were the following:

1. Use Java string concatenation (+) for string literals instead of C’s style (adjacent strings).

2. Use Java string comparisons instead of strcmp, etc.

3. Use exception handling instead of C-style return code checking.

4. Replace uses of the C preprocessor by either a) manually substituting one variant of a macro, or b) using Java final static boolean fields.

5. Rename cursor names that were ambiguous.

6. Replace uses of the ESQL cursor interface with calls to a Java cursor abstraction layer based on JDBC.

The Java implementation of dbunload is about 4400 lines of code, and it submits the same queries as the original C version. Due to the different implementation, performance results cannot be directly applied to improvements that could be made to the C version, but they should be considered indicative of what is possible.

Figures 8.1 and 8.2 show simplified pseudo-code for the top-level requests issued by the dbunload tool. In this simplified pseudo-code, we use the notation Q(name[,parms...]) to represent a query named name being opened with an optional list of parameters. In cases where the query returns a single row or a simple list of strings, we use it notationally as a function call (for example, on line 820). In other cases, the query may return more than one row and the dbunload program processes the results using a loop. We show this as Q(name[,parms...]) {...} (for example, line 858).

The main-line for the program is Do-Unload (line 888). First, it calls Get-DB-Version (line 822) to determine characteristics of the database instance being considered. This version information reflects whether the database contains various catalog tables and columns that were added over the years in different versions of ASA. The results are stored in global variables (V1...V27). Next, the main-line calls Load-Exclude-Tables (line 851) to load a list of system-defined schema objects that should not be output by dbunload. The names and types of these excluded schema objects are stored in a table (EXCLUDEDOBJECT) that was added in a version of ASA. The Load-Exclude-Tables procedure first issues a query to see if this database instance has the EXCLUDEDOBJECT table; if so, it issues a second query to see if there are any rows in the table. If both of these conditions are true, then the procedure loads in-memory arrays of objects to be excluded using two distinct queries (C_EXCLUDE and C_EXCLUDEB) parameterized with the type of object to be stored in the appropriate array.

In combination, Get-DB-Version and Load-Exclude-Tables submit up to 34 OPEN requests using 8 distinct queries. The presence of local predicates means that not all of these queries are always executed. However, the predicates evaluate to false only for older versions of databases. If we consider a customer that does a regular schema unload of a (current) database instance, then all of these queries will be executed in sequential order. If Scalpel could predict this case, it could avoid the costs associated with fine-grained access.

After initializing the version global variables and loading arrays of schema objects, the Do-Unload mainline issues queries to retrieve the name of the DBO owner for the database (a

820 procedure Table-Exists(own,tbl) return Q( C_TABLEEXISTS,own,tbl ) end
821 procedure Col-Exists(tbl,col) return Q( C_COLEXISTS,tbl,col ) end
822 procedure Get-DB-Version()
823   V1 ← Table-Exists( "SYS", "SYSGROUP" )
824   V2 ← Col-Exists( "SYSCOLUMN", "scale" )
825   V3 ← Col-Exists( "SYSCOLUMN", "column_type" )
826   V4 ← Col-Exists( "SYSTABLEPERM", "referenceauth" )
827   V5 ← Table-Exists( "SYS", "SYSINFO" )
828   V6 ← Table-Exists( "SYS", "SYSATTRIBUTE" )
829   V7 ← Table-Exists( "SYS", "SYSPROCEDURE" )
830   V8 ← Table-Exists( "SYS", "SYSSYNC" )
831   V9 ← Col-Exists( "SYSINDEX", "hash_limit" )
832   V10 ← Table-Exists( "SYS", "SYSREMOTETYPE" )
833   V11 ← Table-Exists( "SYS", "SYSREMOTEOPTIONTYPE" )
834   V12 ← Table-Exists( "dbo", "ml_property" )
835   if V10 then V13 ← PubExists()
836   V14 ← Col-Exists( "SYSCOLUMN", "check" )
837   V15 ← Q( C_GETDBPROPERTY, "NamedConstraints" )
838   V16 ← Table-Exists( "SYS", "SYSUSERTYPE" )
839   V17 ← TabCheck( encrypted_pwd )
840   V18 ← Col-Exists( "SYSCOLPERM", "privilege_type" )
841   V19 ← Col-Exists( "SYSARTICLE", "query" )
842   V20 ← Table-Exists( "SYS", "SYSJAVACLASS" )
843   if V20 then V21 ← Q( C_JAVAEXISTS )
844   V22 ← Table-Exists( "rs_systabgroup", "rs_lastcommit" )
845   if V19 then V23 ← Q( C_GETDBPROPERTY, "FileVersion" )
846   V24 ← Table-Exists( "SYS", "SYSCAPABILITY" )
847   V25 ← Table-Exists( "SYS", "SYSEVENT" )
848   V26 ← Table-Exists( "SYS", "SYSWEBSERVICE" )
849   V27 ← Col-Exists( "SYSTABLE", "source" )
850 end
851 procedure Load-Exclude-Tables()
852   if Q( C_USEEXCOBJTBLA ) and Q( C_USEEXCOBJTBLB ) then
853     Q( C_EXCLUDE,'P' ) ; Q( C_EXCLUDE,'V' )
854     Q( C_EXCLUDEB,'U','E' ) ; Q( C_EXCLUDE,'E' )
855     Q( C_EXCLUDE,'T' )
856 end
857 procedure Do-Users()
858   Q( C_DBSPACES ) {...} ; Q( C_USERS,V10,V1 ) {...}
859 end
860 procedure Do-Logins() Q( C_SYSLOGIN ) {...} end
861 procedure Do-Servers()
862   Q( C_SERVERS ) {...} ; Q( C_CAPABILITIES ) {...} ; Q( C_EXTLOGINS ) {...}
863 end

Figure 8.1: Pseudo-code for dbunload top-level requests (continued in Figure 8.2).

864 procedure Do-User-Classes() Q( C_USERCLASSES ) {...} end
865 procedure Do-User-Tables() Q( C_TABLES,V10,V24,V9,V6 ) {...} end
866 procedure Do-User-Views() Q( C_VIEWS,V1,V27 ) {...} end
867 procedure Do-Events() Q( C_EVENTS,V27 ) {...} end
868 procedure Do-Services() Q( C_SERVICES ) {...} end
869 procedure Do-Unload-Publications()
870   if V8 then Q( C_PUBLICATIONSA ) {...} else Q( C_PUBLICATIONSB ) {...}
871 end
872 procedure Do-Mobilink() Q( C_MOBILINK ) {...} end
873 procedure Do-Unload-Remote-Options()
874   Q( C_REMOTE_OPTIONS ) {...}
875 end
876 procedure Do-Remote()
877   Q( C_REMOTEMSGT ) {...} ; Q( C_REMOTEPUB ) {...}
878   Q( C_REMOTESUBUID ) {...} ; Do-Unload-Publications()
879   Q( C_REMOTEE ) {...}
880   if V11 then Do-Unload-Remote-Options()
881 end
882 procedure Do-DBSync()
883   ldt ← Col-Exists( "SYSSYNC", "last_download_time" )
884   Q( C_REMOTEF,ldt ) {...}
885 end
886 procedure Do-Options() Q( C_OPTIONS ) {...} end
887
888 procedure Do-Unload()
889   Get-DB-Version()
890   Load-Exclude-Tables()
891   G.dbo-name ← Q( C_DBO_NAME1 )
892   if not G.dba-name ← Q( C_DBO_NAME2 ) then G.dba-name ← Q( C_DBO_NAME3 )
893   G.charset ← Q( C_GETDBPROPERTY,"charset" )
894   if G.dbinfo ← Q( C_GETDBINFO ) then
895     Do-Users() ; Do-Logins()
896     if V24 then Do-Servers()
897     if V21 and not G.remove-java then Do-User-Classes()
898     Do-User-Tables()
899     if G.unload-schema then Do-User-Views()
900     if V25 and G.unload-schema then Do-Events()
901     if V26 and G.unload-schema then Do-Services()
902     if V12 then Do-Mobilink()
903     if V10 and G.unload-schema then
904       if not G.dbsync-db then Do-Remote()
905       if V8 then Do-DBSync()
906     if G.unload-schema then Do-Options()
907 end

Figure 8.2: Pseudo-code for dbunload top-level requests.

system-defined user that owns system-created schema objects) and a canonical version of the active DBA user id. These retrieved values are stored in a global object, G. Retrieving the user ids takes either 2 or 3 OPEN requests. Next, Do-Unload retrieves options used to create the database instance using the C_DBINFO query. This call fails if the SYSINFO table is not present (it was not created in very early versions of ASA that are no longer supported).

At this point, dbunload has issued up to 39 OPEN requests using 13 distinct queries to gather configuration information that will be used to control what gets output. Next, Do-Unload calls 13 procedures (Do-Users to Do-Options) to output statements to re-create schema objects. Predicates based on global configuration values (G) and version information retrieved by earlier queries (V1...V27) control which of these routines are actually invoked. Within each output routine, one or more queries are opened and their result sets are processed in a loop. For example, Do-Users opens two cursors and processes their rows. Within the body of each loop, text is output to generate the appropriate statements. Further, additional (nested) queries may be submitted to gather more information about related objects. These nested queries are discussed in Section 8.2.1.2; the sketch below illustrates their general shape.
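To make the shape of these patterns concrete, the following sketch shows, in the style of our Java/JDBC reimplementation, how an output routine opens a nested cursor for each row of its outer result set. The query text and column names here are hypothetical stand-ins rather than the actual dbunload queries; the point is that each inner open costs a full client-server round trip, which is what Scalpel's rewrites avoid.

import java.sql.*;

// Illustrative only: hypothetical SQL, but the control flow matches
// dbunload's output routines (outer cursor, one inner OPEN per row).
public class NestedPatternSketch {
    static void doUsers(Connection conn) throws SQLException {
        String outerSql =
            "SELECT user_id, user_name FROM SYS.SYSUSERPERM ORDER BY user_id";
        String innerSql =
            "SELECT G.group_id FROM SYS.SYSGROUP G WHERE G.group_member = ?";
        try (Statement outer = conn.createStatement();
             ResultSet r1 = outer.executeQuery(outerSql);
             PreparedStatement inner = conn.prepareStatement(innerSql)) {
            while (r1.next()) {
                // One OPEN request per outer row; the parameter value is
                // copied from the outer row, which is the correlation that
                // Scalpel learns to exploit.
                inner.setInt(1, r1.getInt("user_id"));
                try (ResultSet r2 = inner.executeQuery()) {
                    while (r2.next()) {
                        System.out.println("GRANT GROUP ... " + r2.getInt(1));
                    }
                }
            }
        }
    }
}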

In total, dbunload issues up to 61 top-level OPEN requests with 35 distinct queries. The sequence of requests is predictable, although predicates within dbunload prevent some possible queries from being submitted, either because they are not appropriate for the version of the database instance (predicates depending on results of earlier version-detection queries) or because of input parameters that prevent the associated schema objects from being unloaded. The sequence of top-level queries is fixed, and the actual parameter values used to open queries can be predicted based on the position in the sequence and the results of earlier requests. Scalpel can rewrite these queries using predictions based on observed traces. While such a rewriting is beneficial, it has a fixed benefit regardless of the size of the database instance being unloaded. The number of sequential requests that can be prefetched does not depend on the number of rows of any of the tables, but this is not the case when we look at the nested requests that are opened within the body of the processing loops.

8.2.1.2 Nested Queries

The code in Figures 8.1 and 8.2 omitted nested queries that are submitted from within the processing loop of an outer query. Figure 8.3 shows the context tree of nested requests discovered by Scalpel. Five of the main output routines in dbunload open nested queries to generate text for nested objects. These nested queries are typically executed once for each row of the outer query, although local predicates may prevent them from being executed in some cases. If Scalpel can predict nested queries that will be executed, it can save a number of OPEN requests that varies with the size of the outer result sets. For large schemas, this type of prediction will have a much greater

[Tree diagram. The top-level contexts with children are USERS, TABLES, VIEWS, EVENTS, PUBLICATIONSA, and PUBLICATIONSB. Nested contexts include GROUPS, TYPES, TRIGGERSA, TRIGGERSB, SAVEDSOURCE, PROCEDURES, PROCPERMS, MESSAGES, LONGDEFN, CONSTRAINTS, INDEXES, FKEYS, FKEYCOLUMN, COLUMNSA, COLUMNSB, ISTABLEEMPTY, TABLEPERMSA, TABLEPERMSB, COLPERM, TABLEEXISTS, SCHEDULES, and, under each of PUBLICATIONSA and PUBLICATIONSB, PUBLICATION and ARTICLE.]

Figure 8.3: dbunload context tree identified by Scalpel. Child-less top-level contexts are omitted.

908 function Do-Users()
909   if V10 then
910     remoteauthfield ← 'UP.remotedbauth'
911   else
912     remoteauthfield ← "N"
913   if V1 then
914     remarksfield ← ', UP.remarks'
915     usergroupfield ← ', UP.user_group'
916   else
917     remarksfield ← ''
918     usergroupfield ← ''
919   open C_USERS cursor for USERS:
920     SELECT user_id, user_name, password, resourceauth, dbaauth,
921            scheduleauth, $remoteauthfield $remarksfield $usergroupfield
922     FROM SYS.SYSUSERPERM UP
923     ORDER BY user_id
924   while r1 ← fetch C_USERS do
925     if not Special-User( r1.user_name ) then
926       £ Write statements to create the user
927       Do-User-Perms( r1.user_id, r1.user_name, ... )
928       if V7 then
929         if not G.exclude-procedures then
930           Do-Procedures( r1.user_id, r1.user_name, ... )
931         if not G.exclude-triggers then
932           Do-Triggers( r1.user_id, r1.user_name, ... )
933       if V16 then
934         Do-User-Messages( r1.user_id, r1.user_name, ... )
935         Do-User-Types( r1.user_id, r1.user_name, ... )
936   close C_USERS
937 end
938 function Do-User-Perms(user_id, user_name, ...)
939   £ Write statements to grant authorities to user (RESOURCE, DBA, etc.)
940   if V1 then
941     open C_GROUPS cursor for GROUPS:
942       SELECT GID.user_name
943       FROM SYS.SYSUSERPERM GID, SYS.SYSGROUP G, SYS.SYSUSERPERM U
944       WHERE GID.user_id = G.group_id
945         AND U.user_id = G.group_member
946         AND G.group_member = $user_id AND ...
947     while r2 ← fetch C_GROUPS do
948       £ Write statements to grant membership in the current group.
949     close C_GROUPS
950 end

Figure 8.4: Pseudo-code for dbunload to output database users. Identifiers starting with $ represent text substitution within a query.

impact than the batch prediction. For example, Doppelhammer et al. [57] described an SAP R/3 installation with 10,055 tables. Such a setup would take a significant length of time to unload, especially in a high-latency configuration. Figure 8.4 shows an abstraction of a portion of the Do-Users procedure. This procedure outputs SQL statements to re-create all of the users in a database, along with the procedures, triggers, messages, and user-defined data types defined by each user. The code supports a number of customizations, shown in Figure 8.4 as if statements using global variables: both version variables (V1...V27) and global parameters (G) are used. For example, the G.exclude-procedures predicate (line 929) controls whether procedures will be output at all, based on user-configuration options. This predicate is similar in effect to the P0 predicate used in Section 3.6 because it does not depend on values from an outer query. The check on line 925 is used to see if a user is a 'special' userid created by ASA (these do not need to be re-created). In contrast to the G.exclude-procedures predicate, this predicate depends on outer row values and it is therefore similar to the P1 predicate used in Section 3.6. The Do-Users procedure represents only a small portion of the overall dbunload program. Figure 8.3 shows the queries of dbunload that are involved in a nesting relationship (an additional 32 queries are opened at the top level and are not displayed). The dbunload program can be run in either a complete mode where everything is output, or a selective mode where specific types of objects are excluded or specific named objects are included or excluded.

8.2.2 Evaluating Scalpel Using dbunload

In order to evaluate the effectiveness of Scalpel, we performed a series of measurements of dbunload when using Scalpel on a variety of network configurations. We used 31 customer databases to evaluate dbunload performance. Figure 8.5 shows the run-time with and without Scalpel's optimizations for these 31 databases on a variety of network configurations. These databases are labelled with A, B, ..., Z, α, β, θ, λ, µ in order of increasing cost on the LCL configuration. In these tests, all of Scalpel's optimizations were permitted for nested request patterns, and we show the time to unload the entire schema. The results show that Scalpel provides significant savings, even in low-latency configurations such as LCL (local shared memory).

8.2.3 Batch Prefetching with dbunload

The results in Figure 8.5 show the improvement Scalpel provides when optimizing only nested request patterns. We also considered the benefits provided only by batch prefetching. We ran dbunload in selective mode, outputting the schema for a single table in order to establish how much

[Figure 8.5 consists of eight bar charts, omitted here: (a) Local Shared Memory (Original Strategy), (b) Local Shared Memory (Optimized), (c) 1Gbps TCP/IP (Original Strategy), (d) 1Gbps TCP/IP (Optimized), (e) 100Mbps TCP/IP (Original Strategy), (f) 100Mbps TCP/IP (Optimized), (g) 11Mbps WiFi TCP/IP (Original Strategy), (h) 11Mbps WiFi TCP/IP (Optimized). Each bar decomposes the running time for one database into Client CPU (s), Server CPU (s), and Latency (s).]

Figure 8.5: Running time (s) of dbunload on different network configurations. Original times are shown on the left in increasing order by cost, and times with nested request pattern optimizations are shown on the right in the opposite order.

of an improvement is possible for the portions of the application that Scalpel's batch optimizations might apply to.

Metric               U      O      ∆      ∆%
#OPEN calls          39     14     25     64%
Elapsed (ms)         997    336    661    66%
Client cost (ms)     109    78     31     29%
Server cost (ms)     47     63     -16    -33%
Packets to DBMS      190    69     121    63%
Packets to client    202    73     129    64%

Table 8.1: Benefits of batch pattern optimization for dbunload. Columns show unoptimized results (U), optimized results (O), absolute difference (∆ = U − O), and relative difference (∆% = ∆/U).

At the end of the training period, Scalpel stored information for 15 distinct queries and 46 contexts with prefetch size ranging from 0 to 19, average 4.7. Of these, 18 had no prefetch queries associated. Of the 28 contexts with associated prefetches, the average number of prefetched queries was 8.1. For a typical execution of dbunload, we observed 43 queries and built an atomic suffix trie with 818 nodes; the path-compressed suffix trie contained only 50 nodes. Table 8.1 shows the benefits of using the optimizations we have described with dbunload run in selective mode on a WiFi configuration. The table shows the unoptimized results (U) for the system run without Scalpel attached, the optimized results (O), the difference ∆ = U − O, and the percent difference ∆% = ∆/U.
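The gap between 818 atomic nodes and 50 path-compressed nodes can be illustrated with a small sketch. The following minimal Java trie is our own illustration, not Scalpel's implementation: it inserts every suffix of a trace of query identifiers symbol by symbol, terminating the trace with an out-of-band marker; a path-compressed variant would merge each chain of single-child nodes into one edge, which is where the large reduction comes from.

import java.util.*;

// Minimal atomic suffix trie over a trace of query identifiers.
public class SuffixTrieSketch {
    static final class Node {
        final Map<String, Node> children = new HashMap<>();
    }

    final Node root = new Node();
    int nodeCount = 1;   // count the root

    void insertAllSuffixes(List<String> trace) {
        List<String> t = new ArrayList<>(trace);
        t.add("$");  // out-of-band end marker, one per trace
        for (int i = 0; i < t.size(); i++) {
            Node cur = root;
            for (int j = i; j < t.size(); j++) {
                Node next = cur.children.get(t.get(j));
                if (next == null) {
                    next = new Node();
                    cur.children.put(t.get(j), next);
                    nodeCount++;
                }
                cur = next;
            }
        }
    }

    public static void main(String[] args) {
        SuffixTrieSketch trie = new SuffixTrieSketch();
        trie.insertAllSuffixes(Arrays.asList("Q1", "Q2", "Q3", "Q2", "Q3"));
        System.out.println("atomic suffix trie nodes: " + trie.nodeCount);
    }
}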

Scalpel’s optimizations are able to eliminate 25 of the OPEN requests that are sent to the DBMS, which includes eliminating the per-request overhead for each of these. As a consequence, total elapsed time was reduced by 66%, and the client processing cost was also reduced by 29%. Interestingly, the server processing costs increased in the prefetching case. As all of the prefetches in this test were useful, this extra cost must be related to optimizing and executing the complex combined query.

8.2.4 Summary of dbunload

The results for dbunload are promising. When using all prefetching algorithms, Scalpel provides significant savings, even in low-latency configurations. For example, the run-time for database µ

is reduced from 45s to 25s in the LCL configuration, and from 40s to 17s in the LAN0.1 configuration. Scalpel acts to extend the set of systems for which dbunload provides excellent results. Without Scalpel's optimizations, λ runs in 220s on a WiFi configuration; the optimizations allow it to run in 24s, which is faster than the 46s for λ in the LCL configuration. While dbunload provides many opportunities for optimizing nested request patterns, it also contains sequences of requests that can be predicted using Scalpel's batch optimizer. When run in selective mode to unload the schema for a single table, Scalpel provides an improvement of 661ms, or 66%, in running time. This absolute improvement is present when running in the full mode, but it is overshadowed in relative terms by the nested request patterns.

8.3 Case 2: SQL-Ledger Case Study

We investigated SQL-Ledger as a second case study. SQL-Ledger is a web-based double-entry accounting system written in Perl. The system is configured with a DBMS storing persistent accounting entries, a web server that executes Perl scripts to implement business logic, and a web browser that presents the user interface. Figure 8.6 gives an overview of the structure of the SQL-Ledger system. A web browser presents a user interface, and a web server executes the business logic using the common gateway interface (CGI). The SQL-Ledger business logic communicates with a DBMS using the Perl DBI library and a vendor-specific DBD module. We use 'V' as a placeholder in Figure 8.6 to represent a specific DBMS implementation.

[Diagram: a Web Browser communicates over HTTP with an Apache 2.0 web server running SQL-Ledger 2.2.0 via CGI; the business logic uses DBI and DBD::'V' to speak the 'V' protocol to the DBMS. The browser and web server run on machine B; the DBMS runs on machine D.]

Figure 8.6: SQL-Ledger system structure. ‘V’ is used as a placeholder for a specific DBMS vendor implementation.

In general, the web browser, business logic, and DBMS can be placed on separate machines. We simulated a web user and the business logic on machine B, and used a commercial DBMS

table         Rows (SF1)   Rows (SF10)   Description
customer      10           100           Customer name, address, and shipping information
parts         10           100           Part name, price, inventory, and description
makemodel     30           300           Name of a specific make and model of a part
partsgroup    2            20            Name of logical group of parts
customertax   20           200           Types of tax charged for each customer
partstax      20           200           Types of tax charged for each part
ar            100          10000         Invoices
invoice       1486         145020        Invoice line items
acc_trans     4558         445060        Ledger entries for line items, tax, and total

Table 8.2: Initial table sizes for scale factors SF1 and SF10.

on machine D communicating using a 100Mbps LAN (configuration LAN0.1). The LAN0.1 and other configurations are described in Section 3.6. The web browser presents a menu of links to activities that a user can perform, such as adding a transaction. Each of these activities may require multiple steps. For example, when adding an invoice, a separate step is used for each invoice line. For each step, the user presses an Update button that submits a partially completed form to the web server. The business logic scripts parse the partially completed form, issue database requests to retrieve additional information, and format a new form to be returned to the user. The last step of a user activity consists of using a Post button to apply the requested changes to the database server. Some of the business logic scripts that are executed during a user activity issue nested database requests that might benefit from our proposed optimizations. We populated the database with synthetic data and simulated the activities of a user working with the system. We focused only on the accounts-receivable activities. Table 8.2 shows the tables that we populated with synthetic data and how many rows we used when generating the data. We used two scale factors, SF1 and SF10, to simulate the system being used with different configurations. Scale factor SF1 represents SQL-Ledger being used in a small company, while SF10 represents the needs of a medium-sized company. For each invoice in the initial population, we generated an average of 14.5 line items selected using a uniform distribution of [10, 20). The line item records are stored in table invoice, and each line item requires 3 records in table acc_trans due to the double-entry accounting requirements. In SF10, we use 100 times the

initial invoices of SF1 to represent a company that not only has 10 times the customers but also has been in business for 10 times as long. We simulated a user performing accounts receivable activities with this synthetic data using a remote browser emulator (RBE) implemented in Perl. Table 8.3 shows the activities that we simulated based on the description in the SQL-Ledger manual [170]. For each activity, we show the proportion of simulated sessions that perform the activity. We also show the steps that may be performed by our emulated user during the activity and the average number of times the step is performed during a simulated activity (Freq). For each step, we show the number of database requests submitted for SF1 and SF10 in both the original setup and with Scalpel's optimizations. At SF1, the response times are quite reasonable. All individual step times are sub-second except for two report activities (Tax Collected and A/R Aging). Both of these reports execute in under 2 seconds. When moving to SF10, many operations remain quite quick; however, some operations get significantly slower when moving to the higher scale factor. This slow-down may be due to nested queries of the type optimized by Scalpel (leading to the higher rate of requests shown in Table 8.3). We used the Scalpel prototype to assess the benefits of our proposed optimizations in this system.
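As a check on the population arithmetic, note that the counts in Table 8.2 are consistent with three acc_trans rows per line item plus one additional row per invoice (3 × 1486 + 100 = 4558 at SF1, and 3 × 145020 + 10000 = 445060 at SF10). A minimal sketch of the generation arithmetic, with hypothetical structure of our own, follows.

import java.util.*;

// Illustrative population arithmetic for SF1. Line items per invoice are
// drawn uniformly from [10, 20); each line item yields 3 ledger rows, and
// each invoice (by the arithmetic above) one additional ledger row.
public class PopulationSketch {
    public static void main(String[] args) {
        Random rng = new Random(42);
        int invoices = 100;                       // 'ar' rows at SF1
        int lineItems = 0;
        for (int i = 0; i < invoices; i++)
            lineItems += 10 + rng.nextInt(10);    // uniform over [10, 20)
        int ledgerRows = 3 * lineItems + invoices;
        System.out.println("invoice rows:   " + lineItems);   // about 1,450
        System.out.println("acc_trans rows: " + ledgerRows);  // about 4,450
    }
}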

8.3.1 Configuring the System for Measurement

When we configured the SQL-Ledger system for analysis, we noted some interesting details about the system and how it interacted with Scalpel's optimizations. This section describes our measurement setup and items that we discovered.

8.3.1.1 Perl to JDBC Bridge

The SQL-Ledger system is implemented in Perl using DBI, but our Scalpel prototype is implemented in Java supporting JDBC. In order to use our prototype with this case study, we used a DBI-JDBC bridge (DBD::JDBC 0.64). Database requests from the Perl business logic are sent via TCP/IP to a Java server process that executes appropriate JDBC calls. This is an atypical configuration for SQL-Ledger, and it introduces additional latency. The elapsed time for simulated user sessions was up to 12 times slower for the measured activities when compared to a direct DBI connection. This latency is higher for user activities with many small requests of the type optimized by Scalpel. In order to provide a fair comparison, we stored a trace of JDBC requests made by the Java server and replayed this trace either directly to the vendor's JDBC driver or using the Scalpel prototype. We used the following sequence of measurements:

1. Populate the SQL-Ledger database at SF1 or SF10.

                                                     # Database Requests
                                                     Original          Optimized
Activity / Step                               Freq   SF1     SF10      SF1     SF10
A. Add Sales Invoice (35% of sessions)               206.6   1429.9    102.9   118.5
  0. Navigate to start screen                 1      19.0    19.0      19.0    19.0
  1. Choose Customer, Invoice #, and Order #  1      7.1     8.0       7.1     8.0
  2. Choose account, currency, exchange rate  1      1.0     1.0       1.0     1.0
  3. Add one line item                        10     138.0   1351.4    34.3    40.0
  4. Select a part from a list                7.5    0.0     0.0       0.0     0.0
  5. Print a packing list                     1/3    0.4     0.4       0.4     0.4
  6. Print invoice                            1/3    0.3     0.4       0.3     0.4
  7. Post new invoice                         1      40.8    49.7      40.8    49.7
B. Add Cash Receipt (35% of sessions)                31.3    46.4      28.0    28.7
  0. Navigate to start screen                 1      4.0     4.0       4.0     4.0
  1. Choose customer                          1      11.8    26.2      8.5     8.5
  2. Choose currency and exchange rate        1/2    0.5     0.5       0.5     0.5
  3. Apply money to outstanding invoices      1      1.0     1.0       1.0     1.0
  4. Post the new receipt                     1      14.0    14.7      14.0    14.7
C. Transaction History (10% of sessions)             58.3    85.0      49.3    71.5
  0. Navigate to start screen                 1      2.0     2.0       2.0     2.0
  1. Submit search criteria                   1      1.0     1.0       1.0     1.0
  2. Navigate to a random invoice             3      55.3    82.0      46.3    68.5
D. Tax Collected (10% of sessions)                   3.2     3.1       3.2     3.1
  0. Navigate to start screen                 1      2.0     2.0       2.0     2.0
  1. Submit search criteria                   1      1.2     1.1       1.2     1.1
E. A/R Aging (10% of sessions)                       9.2     63.4      4.0     4.0
  0. Navigate to start screen                 1      2.0     2.0       2.0     2.0
  1. Submit search criteria                   1      7.2     61.4      2.0     2.0

Table 8.3: Simulated user activities.

2. Make a backup copy of the database.

3. Record a trace of JDBC operations performed while simulating 500 user activities in the proportions and step frequency given in Table 8.3.

4. Restore the database from backup.

5. Replay the stored trace using the vendor’s JDBC driver. Optionally configure Scalpel to intercept requests for training or run-time optimization.

Steps (4) and (5) above can be repeated to test a variety of configurations. Replaying a stored trace provides flexibility in measuring different configurations. However, it requires that results be returned in the same order as they were seen when the trace was recorded. If the order does not match, then the parameter values used from the trace file will not match what the client application would have submitted. We avoid this complication by considering only the client hash join (H) and outer join (J) strategies for rewriting nested request patterns. These two strategies do not affect the order of rows from outer queries, so they can be used with replayed workloads.
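As a rough picture of why the client hash join strategy preserves outer row order, consider the following sketch. It is our own illustration with hypothetical SQL and column names, not Scalpel's code: the combined query fetches, in one round trip, the inner rows that any outer row could request; the client partitions them by the correlation value, and each later open of the inner query becomes a local hash lookup, leaving the outer result untouched.

import java.sql.*;
import java.util.*;

// Illustrative client hash join (H) decode. Inner rows are fetched once,
// partitioned by correlation value, and subsequent inner opens are served
// from memory without a server round trip.
public class ClientHashJoinSketch {
    private final Map<Integer, List<String>> innerByKey = new HashMap<>();

    void prefetch(Connection conn) throws SQLException {
        // Hypothetical combined query: inner rows for all outer rows at once.
        String combined = "SELECT G.group_member AS corr_key, GID.user_name " +
                          "FROM SYS.SYSGROUP G, SYS.SYSUSERPERM GID " +
                          "WHERE GID.user_id = G.group_id";
        try (Statement s = conn.createStatement();
             ResultSet rs = s.executeQuery(combined)) {
            while (rs.next()) {
                innerByKey.computeIfAbsent(rs.getInt("corr_key"),
                                           k -> new ArrayList<>())
                          .add(rs.getString("user_name"));
            }
        }
    }

    // Replaces an OPEN of the nested query with a local lookup.
    List<String> openInner(int correlationValue) {
        return innerByKey.getOrDefault(correlationValue,
                                       Collections.emptyList());
    }
}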

8.3.1.2 Primary Keys

Scalpel relies on finding a candidate key for queries to be optimized. Some of the SQL-Ledger tables did not have an explicit key defined. We defined primary keys for the ar, customer, defaults, exchangerate, and makemodel tables based on attributes that appeared to be a candidate key. Key creation is an important step to provide the most opportunities for effective rewrites.

8.3.2 Parameters as Literals

The SQL-Ledger system generates most of its queries with parameter values embedded as literals within the query text, although a few queries do contain parameter markers in the query text with actual parameter values passed explicitly to the open request. We configured Scalpel to scan all queries that do not contain parameter markers. Any constant literals within the query are replaced with parameter markers to generate a query template with separate actual parameter values. This extra scanning adds to the overhead of Scalpel as all submitted queries must be scanned before determining whether the query requires special action. When detecting the possible source for a parameter, we extended the algorithm of Figure 3.7 to also consider parameters that have the same value for every execution. These parameters result from Scalpel's inability to distinguish between substitution parameters in a dynamically generated query and constant literals that are the same for every invocation.
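A minimal sketch of this kind of literal-to-parameter scan is shown below. It is illustrative only: the regular expression is far simpler than what a real SQL scanner needs (it ignores comments, escaped quotes, and literals inside identifiers), but it shows how a query text can be canonicalized into a template plus a list of actual parameter values.

import java.util.*;
import java.util.regex.*;

// Illustrative literal scanner: replaces string and numeric literals with
// parameter markers, yielding a template and the extracted actual values.
public class LiteralScannerSketch {
    private static final Pattern LITERAL =
        Pattern.compile("'[^']*'|\\b\\d+(?:\\.\\d+)?\\b");

    public static void main(String[] args) {
        String sql = "SELECT name FROM parts WHERE id = 42 AND unit = 'ea'";
        List<String> params = new ArrayList<>();
        StringBuffer template = new StringBuffer();
        Matcher m = LITERAL.matcher(sql);
        while (m.find()) {
            params.add(m.group());               // remember the actual value
            m.appendReplacement(template, "?");  // leave a marker behind
        }
        m.appendTail(template);
        System.out.println(template); // SELECT name FROM parts WHERE id = ? AND unit = ?
        System.out.println(params);   // [42, 'ea']
    }
}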

8.3.3 Query Variants

The SQL-Ledger system generates queries dynamically based on input from the user. For example, the WHERE clause of a query may be built to include predicates for only the search fields supplied by a user. Further, the ORDER BY clause may be built to match user-specified sorting requirements. In some cases, additional joins and subselect expressions are included based on configuration parameters or user input. The number of distinct queries generated dynamically can be fairly high. For example, the retrieve_item function called during the Add Sales Invoice activity generates a query based on 4 boolean conditionals, leading to up to 16 variants being presented at run time. These query variants appear to Scalpel as distinct queries. Variants do not pose a problem for correctness, but they require a longer training period in order to learn patterns for all of the distinct variants that can be submitted. If Scalpel could recognize that variants result from the same dynamically generated query, this training time could be reduced, as could the storage requirements for the context forest.
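The variant explosion can be seen in miniature in a sketch like the following, which mimics, with hypothetical flags and SQL fragments, how a retrieve_item-style routine assembles its query; four independent booleans yield up to 2^4 = 16 distinct query texts, each of which Scalpel currently treats as a separate query.

// Illustrative only: hypothetical flags and SQL, but the construction
// mirrors how SQL-Ledger builds query text from user input.
public class QueryVariantsSketch {
    static String retrieveItem(boolean byPartNumber, boolean byDescription,
                               boolean inStockOnly, boolean sortByName) {
        StringBuilder sql = new StringBuilder(
            "SELECT id, partnumber, description FROM parts WHERE 1 = 1");
        if (byPartNumber)  sql.append(" AND partnumber LIKE ?");
        if (byDescription) sql.append(" AND description LIKE ?");
        if (inStockOnly)   sql.append(" AND onhand > 0");
        if (sortByName)    sql.append(" ORDER BY description");
        return sql.toString();
    }

    public static void main(String[] args) {
        // Two of the 16 possible variants:
        System.out.println(retrieveItem(true, false, true, false));
        System.out.println(retrieveItem(false, true, false, true));
    }
}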

8.3.4 Complex Combined Queries

The optimizations we have described generate more complex queries from simpler original queries. This additional complexity raises the possibility that the DBMS query optimizer will choose a plan that is worse than the original strategy while providing cost estimates to the contrary. We found this to be the case for the A/R Aging activity. In this activity, a report is generated with one line for each customer with outstanding amounts owing. An outer query is used to iterate over these customers, and an inner query is submitted for each row. This inner query contains a 4-branch UNION; each branch contains a join and a subselect. Scalpel uses the client hash join (H) strategy to optimize this nested query. We found that the optimized strategy took over 12s on the SF10 database while the original (nested) strategy took only a little over 8s. The problem was that the DBMS was selecting a sub-optimal plan that was worse than the original nested strategy. The SQL-Ledger system does not explicitly create database statistics for the tables it uses. After creating statistics for all pertinent tables, we found that the optimized strategy executed in a little over 2s. This problem raises two interesting points. First, the optimization decisions made by Scalpel rely on estimates from the DBMS. If these estimates are in error, then poor optimization choices will result in longer execution times. To some extent, such situations can be improved by combining information that Scalpel obtains during the training period with the DBMS estimates. Second, the complex queries generated by Scalpel's rewrites may not be handled well with existing DBMS technology. This appears to be especially true when using the LATERAL keyword or variants thereof. This problem could be avoided by using a product-specific interface to represent the

combined queries instead of expressing the rewritten query in SQL. Such an interface could simply combine query plans for the outer and inner query using a nested loops join in the DBMS. This approach would eliminate the communication costs associated with the nested strategy in the client, but it would not permit any improvements beyond communication costs. The approach of directly specifying the execution plan follows a suggestion of Chaudhuri and Weikum [37]: it reduces the uncertainty in execution costs at the expense of missing out on possible opportunities to choose a better strategy.
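For concreteness, the following sketch contrasts a nested pattern with a LATERAL-style combined rewrite. The table and column names are hypothetical, and the exact LATERAL syntax (and whether it is supported at all) varies by product, which is itself one source of the plan-quality risk discussed above.

// Hypothetical illustration of a unified rewrite. The original pattern
// opens INNER once per row of OUTER; COMBINED asks the DBMS to perform
// the correlation in a single statement.
public class LateralRewriteSketch {
    static final String OUTER =
        "SELECT c.id, c.name FROM customer c WHERE c.amount_owing > 0";
    static final String INNER =   // opened once per outer row
        "SELECT SUM(a.amount) FROM acc_trans a WHERE a.trans_id = ?";
    static final String COMBINED =
        "SELECT c.id, c.name, d.total " +
        "FROM customer c, " +
        "     LATERAL (SELECT SUM(a.amount) AS total " +
        "              FROM acc_trans a " +
        "              WHERE a.trans_id = c.id) d " +
        "WHERE c.amount_owing > 0";
}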

8.3.5 Nested Query Patterns

After configuring the system as described, we used Scalpel's training mode on an SF1 database to identify nested request patterns that present opportunities for optimization. We configured Scalpel to consider only nested request patterns, ignoring for now the optimizations that can be achieved by recognizing batch request patterns.

[Diagram of context sub-trees, one per activity, annotated with estimated selectivities (EST-P, EST-P0) and the selected strategy (alt). A-Add Sales Invoice: /Q1 ('N') with children /Q1/Q2 ('H') and /Q1/Q3 ('H'). B-Add Cash Receipt: /Q4 ('N') with child /Q4/Q5 ('H'). C-Transaction History: /Q6 ('N') with child /Q6/Q7 ('H'), and /Q8 ('N') with child /Q8/Q5 ('N'). D-A/R Aging: /Q9 ('N') with child /Q9/Q10 ('H'). All listed selectivities are 1.]

Figure 8.7: Nested request patterns found in SQL-Ledger. Note that /Q8/Q5 is not a feasible child of /Q8 because Scalpel did not predict the source of all correlation values.

Figure 8.7 shows the non-trivial context sub-trees detected by the Pattern Detector. Due to

query variants, the full context tree is actually somewhat larger. There were three variants of Q1 discovered, and two variants of Q9. These resulted in additional sub-trees; we have collapsed these for a simpler presentation.

Figure 8.7 also shows the measured predicate selectivity and the execution strategy selected by Scalpel. Only Q5 can be used in a join strategy as the other queries could return more than one row.

Note that query Q5 appears in two contexts. Query Q5 is submitted by the function get_exchangerate (described in Figure 1.3). The get_exchangerate function is called both from get_openinvoices in the Cash Receipt activity (giving context /Q4/Q5) and from the function create_links in the Transaction History activity (giving context /Q8/Q5). In the Cash Receipt activity, Scalpel discovers the correlation between the currency and transaction date parameters and attributes of Q4, the outer query. In the Transaction History activity, Scalpel discovers that the transaction date is correlated to an attribute of Q8. However, the currency attribute does not match any attribute of Q8. Because of this, /Q8/Q5 is not a feasible prefetch candidate. Although the value of currency cannot be predicted by looking at attributes of containing queries, the value could be predicted in two ways. First, the value of currency is constant within a particular execution of Q8. Scalpel could recognize that the parameter is loop invariant. Partitioned strategies could exploit a loop-invariant attribute correlation because they have the parameter values for the first submission of the inner query when the rewritten combined query is submitted. Unified strategies could not use this correlation source. Second, the currency attribute could be predicted by considering queries that sequentially precede Q8. At present, the correlation detection of nested request patterns and batch request patterns is not integrated (as described in Chapter 6). This combination offers promise for extending the set of possible candidates that can be detected.

8.3.6 Performance Results

Table 8.4 summarizes the performance results for SF1 and SF10 databases when executing a stored trace of 500 user activities using the optimizations selected for the nested query patterns shown in Figure 8.7. The tests were performed using configuration LAN0.1 (Section 3.6). For the tests with nested request patterns, we used a single SF1 trace for both training and timing, and we used a separate SF10 trace for timing with the model built by training on the SF1 trace. For SF1, the savings in latency are relatively modest. However, significant savings are achieved in the number of queries submitted (a reduction of nearly half). This saving in queries has a corresponding decrease in the associated network costs. The training mode did not introduce a significant amount of latency, although it did increase client processing costs due to the requirements of finding nesting patterns and identifying parameter correlations. The network and server costs are slightly higher during training due to the need to retrieve catalog information from the DBMS.

[Figure 8.8 consists of stacked bar charts, omitted here: one chart per activity (A-Add Invoice, B-Cash Receipt, C-Transaction History, D-Tax Collected, E-A/R Aging), shown as Original and Optimized pairs, at (a) scale factor SF1 and (b) scale factor SF10. Each bar decomposes the elapsed database time of one step into Update Time, Outer Query Time, Nested Query Time, and Remaining Time.]

Figure 8.8: SQL-Ledger elapsed database time (ms) for original and optimized requests. Each bar represents a different step (Table 8.3).

                      SF1                                 SF10
                      Original   Optimized   Training     Original    Optimized
Total time (s)        171.5      97.5        172.4        1,439.2     629.8
Server CPU (s)        29.3       23.9        38.6         578.8       283.7
Client CPU (s)        48.6       51.9        60.1         373.9       319.6
Queries Submitted     39,909     20,572      39,909       285,913     24,982
  Top-Level           16,130     16,130      16,130       18,791      18,791
  Nested              23,779     4,442       23,779       267,122     6,191
Update Requests       5,263      5,263       5,263        6,818       6,818
Network Packets       163,223    86,969      179,181      1,200,287   160,583
  Client → Server     83,195     44,900      91,383       610,739     93,466
  Server → Client     80,028     42,069      87,798       589,548     67,117
Network Bytes (MB)    57.2       40.7        61.9         457.6       199.8
  Client → Server     22.2       15.9        23.5         137.2       22.1
  Server → Client     35.1       24.8        38.4         320.5       177.7

Table 8.4: Costs of 500 user activities. Optimized times include only nested request optimizations. Scalpel was trained and timed on a single SF1 trace, and timed on a separate SF10 trace using the model built from the SF1 training period.

When we consider the SF10 database, the optimizations have a much more dramatic effect. The elapsed time is reduced by over 800s. Further, this saving in latency is accompanied by significant reductions in server and client processing costs, along with lower communication costs. Figure 8.8 shows the elapsed time for each of the steps performed by our simulated user. The contributions of updates, top-level queries, and nested queries are shown separately. The remaining time consists of Scalpel costs to decode the combined result (for example, looking up values in a hash table) as well as overhead in our trace replay process. All simulated activities except D-Tax Collected contain some nested requests. However, only two activities benefit to a great extent from our rewrites: A-Add Invoice and E-A/R Aging. These two activities improve significantly in the SF1 database and dramatically in the SF10 database.

8.3.7 Preliminary Result for Batch Request Patterns

The DBMS (B) we used with SQL-Ledger does not support NULL values in the SELECT list as we currently generate them for outer union rewrites. The other two DBMS products that we tested

were not such strict disciplinarians, but they are not supported by the SQL-Ledger system. DBMS B does support our optimizations if an appropriate CAST is used to provide a data type for each NULL value. As we have not extended Scalpel to track column data types, we present preliminary results showing only the outer-join based rewrite. As this rewrite is only available for at-most-one queries, this restricts the class of prefetches Scalpel considers.
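The workaround mentioned above amounts to giving each padding NULL an explicit type. A hedged sketch, with hypothetical columns and types, of the difference in the generated SELECT lists:

// Hypothetical outer-union SELECT lists. DBMS B rejects the bare NULLs in
// the first form; the CASTs in the second form supply the column types it
// requires, but emitting them would require Scalpel to track result types.
public class OuterUnionPaddingSketch {
    static final String REJECTED =
        "SELECT 1 AS qid, user_id, user_name, NULL, NULL FROM ..." +
        " UNION ALL " +
        "SELECT 2 AS qid, NULL, NULL, group_id, group_name FROM ...";
    static final String ACCEPTED =
        "SELECT 1 AS qid, user_id, user_name," +
        " CAST(NULL AS INT), CAST(NULL AS VARCHAR(128)) FROM ..." +
        " UNION ALL " +
        "SELECT 2 AS qid, CAST(NULL AS INT), CAST(NULL AS VARCHAR(128))," +
        " group_id, group_name FROM ...";
}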

                                           Optimized
                           Unoptimized     (Join Only)    (Union+Join)
Queries submitted to DBMS  38,525          37,233         n/a
Peak # nodes               n/a             23,525         23,525
# FSM states               n/a             27             38
States with any prefetch   n/a             7              28
Queries prefetched         0               1,416          n/a
  Useful                   0               1,292          n/a
  Wasted                   0               124            n/a
Total latency              80.3            69.0           n/a

Table 8.5: Preliminary results for batch request optimizations in SQL-Ledger.

Table 8.5 shows preliminary results when using Scalpel's batch request optimizations in the LAN1 configuration. The last column shows results including both outer union and outer join rewrites; only training results are shown as the outer union strategies cannot be executed. The middle two columns give results with no optimizations and with only the outer join optimizations, respectively. The second row shows the peak number of nodes in the suffix trie. An atomic suffix trie was used; however, it does not show the O(n²) model size that is possible (this could be up to O(10⁹) for this many requests). The modest growth is due to the moderately short transactions used by SQL-Ledger. Each transaction is a separate trace, terminated with an out-of-band character ($) in the trie. For this reason, the atomic suffix trie behaves reasonably in this setting. After removing redundancy, we find a smaller FSM, with 27 states when using outer join only (J) and 40 states with outer join and outer union (UJ). Of these, only 7 (J) and 28 (UJ) states have an associated list of queries to prefetch: the remainder of the states are used to track context. Table 8.6 shows the distribution of batch length for the optimization with only outer join (J) and with both outer union and outer join (UJ). The J strategy is only able to prefetch 2 requests at a time before encountering a query that might return more than one row. The outer union

Batch Length   Join Only   Union+Join
0              20          11
1              0           0
2              7           15
3              0           4
4              0           8

Table 8.6: Distribution of batch prefetch lengths.

approach does not have this restriction, and when we include it as well, we find prefetch lengths up to 4 queries. At run-time, the join-only approach prefetched results for 1,416 queries. Of these prefetches, 1,292 were ultimately used, and the other 124 were wasted effort. Overall, the prefetching using only outer joins saved about 14% of the latency associated with this test. If we consider outer union rewrites as well, it is likely that we would save even more of this latency. Further, the savings would also increase in higher-latency configurations, such as WiFi, WAN, or even LAN0.1.

8.3.8 Summary of Case Studies

In summary, our examination of dbunload and the SQL-Ledger system demonstrates that opportunities for our proposed optimizations do appear moderately often in at least some systems. While opportunities for nested request rewrites are relatively rare, they can have substantial benefit when they are optimized. There are significantly more opportunities for our batch-based rewrites, and these also reduce overall latency. Scalpel can identify these optimization opportunities using a training period that does not excessively degrade system performance, and it is able to automatically rewrite submitted requests to take advantage of the selected optimizations. Some caution must be used when applying the Scalpel system in order to get the most benefit. For example, in the SQL-Ledger study we needed to add primary keys and ensure that database statistics were created before the full benefit of the optimizations was achieved.

9 Related Work

In this chapter, we position our work within the field of related research. There has been a significant amount of work on prefetching techniques for many different problem domains. Section 9.1 provides an overview of related work in this area, with particular reference to approaches similar to those we propose. In particular, prefetching techniques based on a probabilistic model of past requests are closest to the results we have presented. Section 9.2 discusses the theoretical basis for this type of prediction. Once we have predicted a likely sequence of future requests, we would like to process the sequence efficiently. We have described how Scalpel generates combined queries. For nested request patterns, the combined query encodes the result of a join that we have detected in the client's request stream; for batch request patterns, the combined query encodes the results for a predicted sequence of queries. Section 9.4 discusses other work related to efficiently processing a known sequence of queries.

9.1 Prefetching

The idea of prefetching has been applied in many different contexts in the field of computing. Prefetching has long been supported for device I/O in operating systems [66, 152]. Smith [172] provided an early bibliography of work related to prefetching and caching. Somewhat more recently, prefetching has been used for prefetching data from main memory into a processor cache. Smith [174] provided an early survey of this field, and Vanderwiel and Lilja [182] provide a survey of recent results. An interesting use of prefetching is the idea of hoarding, where files are copied to a mobile device in anticipation of their being needed while disconnected from the high-speed network. Several approaches to hoarding have been proposed based on predictive techniques related to those we use [120, 121, 176, 177, 208]. The problems associated with fine-grained access are well known, and there are a number of practical solutions that can improve performance by avoiding fine-grained access. In fact, the problems of latency are becoming relatively more important; as Patterson [143] notes, prefetching is one way to address the pervasive problem that improvements in latency lag significantly behind improvements in bandwidth. It appears that this gap is likely to widen over time due to fundamental physical limitations in our system implementations.


A prefetching mechanism must identify future requests that are likely to be submitted. In Section 9.1.1, we describe related work on prefetching that uses the physical layout of data to identify the data to be prefetched. As we describe, this approach is simple to implement and exploits the large and growing disparity between random and sequential access. However, it does not help if the client access pattern does not match the physical layout. Section 9.1.1 also describes various approaches that are used to re-order a client's access to better match the physical layout. When a client's reference pattern is not sequential and re-ordering is not feasible, a prefetching system may choose to use observations of the client's previous reference behaviour to predict future reference patterns. Section 9.1.2 describes related work that monitors the sequence of selected data items to predict likely future items. This type of prediction scheme does not exploit efficient sequential access, so it is not as useful for prefetching blocks from a disk. Instead, the approach is particularly useful when the granularity of the predicted item is large (for example, entire files) or when the data stored can be efficiently accessed in random order (for example, when fetching data from another workstation's memory). When using patterns of fetched data items, a prefetching approach needs space that is proportional to the number of data items. Further, this approach does not offer suggestions when novel sequences of data items are referenced. Section 9.1.3 describes semantic prefetching. With semantic prefetching, the server understands something about the data that is stored. For example, it might know that a fetched object contains references to other objects. The server can use this understanding to make prefetching decisions that use the intension instead of the extension. In this way, models may be smaller, and prefetching decisions can be made when novel data items are referenced, provided that the data types are known. Finally, Section 9.1.4 summarizes related prefetching approaches that have been previously considered.

9.1.1 Prefetching Based on Physical Layout

Computer systems have long supported sequential prefetching for I/O, particularly from stream-oriented devices. Feiertag and Organick [66] describe the I/O support in the MULTICS system, while Ritchie and Thompson [152] describe how the Unix system was designed (influenced by MULTICS) to default to sequential access. This default sequential access was used on the premise that many files would be processed sequentially. With this access pattern, it is a very good idea for the operating system and I/O components to prefetch data sequentially beyond the last value requested by the client. If the client is following a sequential access pattern, then the prefetched data will be used, reducing exposed latency. The most appealing aspect of sequential prefetching is its simplicity and low cost. It is simple for the server to implement, and it does not require complicated analysis or bookkeeping. Further, if an application is scanning an entire file,

it is very likely to do so in sequential order, so the prefetching is likely to produce many hits. Finally, the most compelling advantage of sequential prefetching is that it does not cost very much to read a few more pages sequentially ahead, so the cost of satisfying the prefetch request is low. Gray [82] suggests that current trends indicate that sequential access will soon be 500 times faster than random access. Even if we have a low hit rate for prefetched data, the increase in cost due to prefetching will not be substantial, and the savings can be considerable. Smith [173] investigated using sequential prefetching to improve the performance of database applications using an IMS database. He found that sequential prefetching was very effective in some cases, although it was important to distinguish sequential scans from random scans. This detection was used to choose a variable number of blocks to fetch. Smith [175] also used sequential prefetching when proposing a disk cache; however, in that case, Smith used only a single block of read-ahead. The actual reference behaviour of database systems has been examined by several researchers [36, 55, 94, 99, 101, 187, 207]. Many database workloads do exhibit sequential access to data pages. For example, Hsu, Smith and Young [94] found that the TPC/D benchmark and several real-world workloads exhibit access that is significantly aided by sequential prefetching. However, clearly not all workloads are sequential. Kearns et al. [102] found that almost any reference behaviour may be found in the workload of a database system; however, they found that reference behaviour can be predicted and exploited. Kotz and Ellis [109, 110] considered prefetching techniques within a single file when the consumer is a multi-processor system. For example, the client could be executing a scientific or database workload. They considered 4 strategies, each based on recognizing when a portion of the reference behaviour is sequential. In this way, their proposed system can exploit sequential prefetching when it is useful, and avoid the wasted work in regions of the reference trace that do not exhibit sufficient locality of reference to make sequential prefetching worthwhile. While sequential prefetching is beneficial when the client is reading sequentially (for example, processing an entire data set), there are also cases where a client demonstrates significant spatial locality without using a sequential pattern. Prefetching can be achieved in this case by fetching data a page at a time, returning more data than requested from a region spatially near the requested item. This approach shares the simplicity and low-cost benefits of sequential prefetching, although it is also only helpful if the client application does exhibit locality within the prefetched page size. The page server architecture of OODBMSs is an example of this type of heuristic used to prefetch objects into a client's cache. When an application accesses an object that is not resident on the client, a demand fetch is sent to the server for the appropriate page. The returned page contains the requested object along with the other objects grouped nearby. Again, the incremental cost to the server for fetching and transmitting the entire page instead of only one object is minimal (it may even be cheaper to send the entire page rather than worrying about interpreting the contents and copying out the single object).
Here, the hope is that the application will be able to use some of the other objects from the page, saving further demand fetches to the server. Liskov et al. [129] extended the idea of page servers to fetch an arbitrary-sized set of objects in the neighborhood of the requested object in the THOR project. Run-time monitoring was used to decide the size of the neighborhood that would be prefetched. By using a variable-sized window, THOR was able to achieve the benefits of large, multi-page sequential prefetch, and also to reduce to no prefetching in situations where the hit rate was too low to benefit the application. Page servers are effective if the client application has an access pattern that exhibits locality within a page: in that case, other objects in the page are also likely to be used. Further, in many cases it is more efficient to ship entire pages than individual objects. However, the application reference pattern may be such that only one or a small number of the objects on a page are needed. In this case, significant resources are wasted. Voruganti, Özsu and Unrau [184, 185] propose a hybrid architecture, where pages or objects can be fetched from the server. This approach achieves the benefits of a page server when the access pattern makes that effective, and avoids the downside of a page server when the client's access pattern does not match the page layout. Shao et al. [169] presented Clotho, which can be viewed as similar to the hybrid architecture of Voruganti, Özsu and Unrau [184]. Clotho is a system that allows the layout of data on disk to differ from the organization in memory. This flexibility allows different levels in the memory hierarchy to provide efficient prefetching. The approach of prefetching based on physical layout is simple to implement, and the cost to the server for implementing the prefetching is low since prefetched data items are 'nearby' to demand-fetched items. However, this approach only produces good hit rates if the access pattern mirrors the physical layout of the data. An application may exhibit logical locality of reference that does not correspond to a physical locality exploitable by a prefetching system. In a file system, this can occur if a file that is accessed in logically sequential order becomes physically fragmented. In other cases, an algorithm may visit pages in an order that is logically relevant, such as following disk pointers, but not physically sequential. In such cases, we may be able to re-order the data so that the physical layout is useful for prefetching the access patterns submitted by the client. Akyürek and Salem [8, 9] suggested that we can convert locality of logical reference patterns into physical locality by rearranging disk blocks for efficient access. Akyürek and Salem used various first-order statistics to select an appropriate placement of disk blocks. Yin and Flanagan [202] use a similar approach to re-order the pages needed to start up a program. When a program loads, it may read in a number of program and configuration files. These files are typically read in an order defined by the program control flow, which typically does not

match the physical layout of files on disk. Yin and Flanagan proposed five algorithms to choose a re-ordering of disk blocks based on observed request sequences.

Rearranging data items is also possible for page server OODBMS products. This tuning is difficult to achieve in general, and it is impossible for cases where two applications with different access patterns access the same data. There is no single optimal arrangement in this case. The problem is similar to the selection of clustered indexes. There are no perfect solutions, but on-line reorganization can help to dynamically adjust the physical layout of data in order to improve prefetch performance. If the hit rate for prefetched data is low, system performance is likely to be worse than a system without prefetching. Even though the incremental cost to the server is not high, useless prefetches still waste network bandwidth and pollute the client's cache.

It is not possible to organize one copy of data to match distinct access patterns. However, the abstraction provided by DBMS products offers another possibility. We can have multiple copies of all or part of a data set, and these can be organized in different ways. A materialized view is one example of such an arrangement, and we can view index-only retrieval as another such approach.

Gerlhof and Kemper [78] discussed practical details of implementing prefetching in a page-based OODBMS. They used a cost model that measured the expected speedup due to a prefetching strategy, and considered the cost of any administrative overhead and additional bandwidth required for prefetching. They found that performance can be substantially improved using prefetching, but the effectiveness is very dependent on the accuracy of the prediction of future requests, either through user hints or a predictor module. In order to improve the effectiveness, they implemented prefetch support relations [79]. These are relations stored in the server that contain the precomputed set of pages that are needed for a given operation with a particular set of parameter values.

In summary, prefetching based on physical layout is an obvious improvement. If data are frequently accessed sequentially, it is a good idea to fetch extra data items before they are requested. With this approach, the data store can use a simple heuristic that indicates what items to prefetch. The implementation is simple, and there is no need for complicated and memory-intensive bookkeeping procedures. The most important benefit, however, is the low cost of fetching additional data items that are physically near the requested item. Fetching such nearby items can often be done without head movement of a disk, while a seek would be required if the items were demand fetched. This cost difference is an important reason for the popularity of sequential prefetching. Access patterns that are not physically sequential are often converted into sequential patterns through rearrangement of the data or the fetch order so that this cost difference can be exploited.

9.1.2 Prefetching Based on Request Patterns

In order to avoid the problems associated with prefetching based on physical layout, Palmer and Zdonik [142] proposed Fido, a cache that learns to fetch. The Fido system is based on training an estimating prophet to recognize patterns of object references in order to predict future references. The patterns were recognized using nearest-neighbour associative memory. The approach of Palmer and Zdonik trained the associative memory offline on access traces, and used the associative memory to predict future requests based on access patterns which approximately match a pattern in the training trace. Grimsrud, Archibald and Nelson [86] proposed using order-1 predictors to prefetch from a disk. A large table (called the adaptive table) contains a tuple for each cluster on the disk. The tuple gives a prediction of the next cluster to be accessed, along with a weight that is used to express the confidence the system has in the prediction. These next-cluster predictions can be used for prefetching; in conjunction, a disk rearrangement algorithm could re-order the disk to give efficient sequential access matching the logical order of access. Krishnan and Vitter [116, 117, 183] used an approach like that of Palmer and Zdonik [142] in order to predict future object references based on observed references. Their work was based on a Lempel-Ziv compression algorithm, which was used to detect patterns in a reference trace that matched a pattern observed during training. Krishnan and Vitter [117] showed that this Lempel-Ziv compression technique provides the theoretically best expected performance under arbitrarily complex workloads. In practice, the Lempel-Ziv technique has some limitations; Curewitz, Krishnan and Vitter [48] explored practical issues related to using compression for prefetching, proposing a PPM-style compression algorithm as a practical method for prediction. Griffioen and Appleton [85] build a probability graph where the nodes are files and arcs represent a file being accessed after the previous one. This probability graph generates first-order predictions based on file accesses. This allows their system to prefetch a file that is predicted to be used in the near future. Lei and Duchamp [126] used a similar approach to Griffioen and Appleton [85]. They monitored the files used by each program to build an access tree that encapsulates the program's file reference behaviour. Several access trees are stored for each program. When the current access tree matches a previous tree (above a pre-set threshold), the stored tree is used to prefetch all of the files that are anticipated to be needed for the program (up to a pre-set limit, PREFETCH CAPACITY, with default 15). The approach of Lei and Duchamp is interesting in that they are able to better exploit recency effects to give more benefit to more recent access trees. Probabilistic approaches such as the compression-based predictors tend to treat all reference patterns equally, without an inherent mechanism to age stale models.
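To make the simplest of these schemes concrete, a sketch of an order-1 predictor in the spirit of the adaptive table is shown below. This sketch is ours rather than a reconstruction of any one paper's data structure: each item maps to a single predicted successor together with a confidence weight, which correct predictions strengthen and mispredictions weaken until the successor is replaced.

import java.util.*;

// Minimal order-1 predictor: one predicted successor per item, guarded by
// a confidence weight. Prediction quality degrades gracefully under noise.
public class Order1PredictorSketch {
    private static final class Entry { String next; int weight; }
    private final Map<String, Entry> table = new HashMap<>();
    private String last;

    // Record one access and return a prefetch candidate (or null).
    String observe(String item) {
        if (last != null) {
            Entry e = table.computeIfAbsent(last, k -> new Entry());
            if (item.equals(e.next)) {
                e.weight++;                 // correct: strengthen
            } else if (--e.weight < 0) {    // wrong: decay, then replace
                e.next = item;
                e.weight = 1;
            }
        }
        last = item;
        Entry p = table.get(item);
        return (p != null && p.weight > 0) ? p.next : null;
    }

    public static void main(String[] args) {
        Order1PredictorSketch pred = new Order1PredictorSketch();
        for (String s : new String[] { "A", "B", "A", "B", "A" })
            System.out.println("after " + s + ": predict " + pred.observe(s));
    }
}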

Acharya, Franklin and Zdonik [3] used a simple prefetching heuristic to perform prefetch- ing from a broadcast disk. Their approach was based on the estimated probability of a page be- ing accessed in the future and an estimate of the time that would elapse before the page is again available Their probability estimates were based on zero-order estimates, namely the relative fre- quency of access for pages in the cache. A number of other researchers have followed the approach of Curewitz, Krishnan and Vit- ter [48] in using a PPM-style predictor. Bartels et al. [14] implemented a PPM-style predictor in a Unix kernel to prefetch from a network of memory servers. Bartels et al. [14] found that the choice of the order m of the PPM model is important. They showed the unintuitive result (dis- cussed in Section 5.2.1.2) that a longer context order may make worse predictions. Madhyastha and Reed [130, 131] used neural networks and a hidden Markov model (HMM) to predict future client behaviour. The class of hidden Markov models is strictly stronger than ei- ther the order-k or even variable order Markov models that we employ (although order-k mod- els can approximate hidden Markov models arbitrarily well with increasing k). There is a con- cern that the models generated by Madhyastha and Reed may grow quite rapidly; while they noted at most a linear growth in practice, they also employed pruning techniques to prevent ex- cessive model growth. Madhyastha and Reed state that typically a file is accessed with only one type of pattern; in this case, they train a HMM with one state per disk block. Alternatively, if mul- tiple access patterns are used for a single file, Madhyastha and Reed suggest building a compos- ite HMM that combines the results of separate HMMs built for the distinct access patterns. White and Skadron [192] used a prefetching approach based on a target cache technique used for indirect branch prediction in processor implementations. For each open file for a process, the operating system maintains a fixed-size structure. This structure maintains the addresses of the last n fetches submitted by the application, using this value to access the target cache. This can be seen to be an approximation of an order-m Markov model, but the implementation is designed to work well with a small amount of memory and time needed at each step. Deshpande [56] also found that (as discussed in Section 5.2.1.2) it is difficult to choose a spe- cific k. Instead, they simultaneously build separate order-k models for each possible k. Then, they prune this global model using a support threshold Φ. If a state has not been observed at least Φ times, it is pruned. Further, they use confidence intervals to detect states that do not provide a sig- nificant difference in predicted probability; these states are also pruned. The confidence intervals are built based on a configured confidence level α and the Wald approximation (Appendix A). Drapeau, Roncancio and Guerrero [58] proposed using data mining techniques to find associ- ation rules for prefetching in WWW meta-searchers. An off-line process identifies frequent item sets to build a prediction model using earlier requests to predict future requests. Predicting future object references based on patterns of object references can give very pre- cise answers as to which objects should be prefetched if a pattern is correctly matched. 
However, 206 RELATED WORK all approaches that predict future object references based on past patterns of object references in- troduce a number of practical problems. First, the storage requirement is proportional to the num- ber of objects referenced. If we limit the amount of storage available, then we reduce the effec- tiveness of the prediction. If we maintain this pattern information on the client machine, then the time and space problems may not be acceptable. Second, patterns based on object references do not generalize to semantically similar patterns that operate over different objects. Wang et al. [186] explored how idle workstations can be used to make prefetching based on a PPM-style compressor effective. The idle workstations offload the work of maintaining the PPM model, allowing a higher order model to be used. Kroeger and Long [118, 119] also used a PPM model to predict future accesses. However, the prediction is done at the level of files, not data blocks. In this way, the size of the model is significantly smaller. For example, their approach could recognize that certain files are accessed after a specific Makefile is opened by program make. Yeh, Long and Brandt [201] also considered file-based prefetching. They considered the ben- efit of augmenting an order-1 model with the program and user that caused a file to be opened. In this way, they were able to achieve a 20% improvement over an unaugmented order-1 model. Fur- ther, they considered prefetching more than one file at a time (referred to as ‘deep’ prefetching in Section 5.3.1.2). In their configuration, they found that prefetching 2 files was most effective. The prediction quality of predictors based on training is a concern. Amer et al. [10] suggest re- ducing the number of prefetches in order to increase the number of useful prefetches. Brandt [23] proposed a way to use a prediction based on a combination of multiple predictors. Brandt in- cluded a null predictor so that prefetches would not be issued if they are not useful. Kraiss and Weikum [114] used a continuous-time Markov-Chain model to assess which doc- uments to prefetch from a large off-line media library. They found that the precision afforded by a Markov-Chain model was significantly better than simpler first-order statistics. However, they found that the models grew to a few megabytes, with predictions taking on the order of millisec- onds. Therefore, they cautioned about using such an approach directly for prefetching disk or memory pages. In summary, existing work on prefetching based on request patterns considers patterns that have occurred in the individual items fetched. A request model is generated based on these indi- vidual items and used to make prefetching decisions. Since these prefetching approaches build a model on the individual data items previously fetched, they need storage proportional to the num- ber of distinct objects in order to provide comprehensive predictions (although skew in the fre- quency of object access may reduce this space substantially). For example, consider a disk-based linked list where each page has a link to the next page. If the client traverses this list, the predic- tor needs storage proportional to the number of nodes in order to make the best possible predic- 9.1 PREFETCHING 207

tions. If new nodes are inserted, the prefetching will fail until the model is updated with the new relationships.

9.1.3 Semantic Prefetching

The prefetching approaches based on on physical layout and request patterns does not consider the semantic meaning of the objects which the application is requesting. In fact, these objects contain a great deal of meta-information which could be used to predict future requests. In the disk-based linked-list example we described in the previous section, the server can predict the next node to be accessed if it understands the semantics of the data items. Further, this prediction does not rely on an expensive training period and large model that becomes easily obsolete. Section 9.1.3.1 describes approaches based on static analysis of object attributes and relation- ships. These approaches use heuristics that provide prefetching hints without necessarily consid- ering past application history. Section 9.1.3.3 describes how application hints have been used to implement prefetching. These approaches rely on deriving application-specific hints that guide prefetching and caching decisions. Section 9.1.3.2 describes proposed hardware-based value predictors that are used for speculative execution in explicitly multi-threaded processors. Sec- tion 9.1.3.4 outlines approaches based on models of request behaviour, where the model is built using the pattern of object types accessed instead of the pattern of individual objects.

9.1.3.1 Static Analysis of Attributes

Ammar [11] considered prefetching in a teletext system. The system presented a hierarchical tree of pages to users. Ammar proposed that the client terminal could decode the links in a page, esti- mate the probability of each link being taken, and use these to prefetch pages before the user re- quested them. In this setup, the client terminal was not able to prefetch future pages until a prior demand fetch had completed. The proposal of Ammar [11] anticipated prefetching approaches that have been proposed and implemented for the world-wide web (WWW). Davison [50] pro- vides a summary of prediction approaches used for web request patterns. Keller, Graefe and Maier [103] implemented an assembly operator which loaded the at- tributes of demand-fetched objects in a breadth-first search fashion. While this approach might be more expensive to implement than the approach based on the physical layout, Keller, Graefe and Maier described new execution techniques used to reduce the cost of assembling the objects. In order to prevent massive blowup due to breadth-first search expansion, the assembly opera- tor relied on developers specifying predicates using UDFs that encode which objects should be assembled. 208 RELATED WORK

Day [51] suggests using objects as the unit of transfer from the server to the client. In Day’s approach, a breadth-first traversal of object attributes is used to find a prefetch group2, limited by a constant number of objects. He also considered alternative strategies based on clustering (a sim- ulation of paging approaches), depth-first search, and application hints. Further, Day’s approach prefers to send objects already in the server cache and likely not in the client cache. Later, THOR was changed [129] to use a sub-segment prefetching approach that sends a group of objects lo- cated on the same disk segment as the requested object. This approach allows the client to spec- ify the size of the prefetch group. This approach was chosen because the object-graph approach [51] generated small prefetch groups, particularly near the edges of the object graph. Kossmann, Haas and Ursu [108] observed that a common style of access for client applica- tions is the ‘fetch then manipulate’ pattern. Applications fetch objects using a declarative query, then manipulate their attributes and methods. A na¨ıve implementation would require a demand fetch for objects returned by the query. Instead, the authors proposed allowing the query to re- turn attributes that were not requested by the application. The benefit of returning extra attributes is that they can eliminate demand fetches. However, this approach may cost more to execute, for example if the results are sorted then there is additional data to be included in the sort. The au- thors propose that this cost and benefit should be included during the join enumeration in order to find the most effective cache loading plan. The approach suggested could provide one level of prefetch, but did not consider fetching further objects or attributes. The idea of prefetching state reachable from a demand-fetched object is more likely to find relevant items to prefetch than an approach based solely on physical organization. However, this comes at the price of increasing the cost of prefetching, since the objects referenced in the breadth-first search are likely not ‘close’ to the requested object. Further, the approaches in this section did not consider how likely it is that various attributes would be used or relationships fol- lowed. This could lead to a significant amount of wasted prefetching.

9.1.3.2 Value Prediction for Speculative Execution

Several modern processors are able to speculatively execute instruction in anticipation of their be- ing needed. For example, instructions may be fetched and decoded in anticipation of a particular branch being taken. In some cases, this speculative execution is not directly possible because val- ues produced by the executing program are not yet available. For example, a load instruction may refer to a memory location that is not yet computed. To increase the amount of instruction-level

2 Day suggests pre-sending would be a better term, since the server determines the data to send. How- ever, he uses the more traditional prefetching term. 9.1 PREFETCHING 209

parallelism (ILP), a number of researchers [72, 181] have investigated techniques that use hard- ware predictors that give a predicted value for each needed quantity. For example, a quantity may be predicted to be equal to the last value that was observed. This prediction can provide good ac- curacy when there is locality of reference. In some cases, the needed quantity is predicted to be the previous value combined a fixed delta. This prediction applies when an application uses strided access to data. The value prediction used for hardware speculative execution is necessar- ily simple because the prediction is performed in hardware for a large number of quantities.

9.1.3.3 Application Hints

One way to achieve accurate prefetching is to use hints from the client application. These hints may be either a guess of future pages that might be accessed (leading to speculative prefetch- ing) or an accurate list of pages that will be used (leading to informed prefetching). Patterson et al. [144] described the TIP system, an implementation of informed prefetching and caching that uses hints from client applications to decide how to prefetch and cache. The TIP system use a cost-based model to dynamically allocate buffers between the competing needs of caching and prefetching. The approach of manually inserted hints may make for effective prefetching decisions, but it does require a substantial burden from the system developers. Mowry [137] proposed using compiler modifications to automatically insert prefetch hints. Using the SUIF research compiler, Mowry was able to achieve improvements of up to a factor of 2 by automatically inserting prefetch instructions. Cao et al. [29] noted that, even with informed prefetching, the decision of what prefetches to form is non-trivial. They allow applications to provide prefetching and caching hints, then em- ploy an integrated caching and prefetching strategy which they show to be near optimal. Chang and Gibson [35] used idle cycles when a workstation is waiting for an I/O to complete in order to speculatively execute past an I/O stall. They used this speculative execution to predict future reference behaviour. By providing these predictions to the TIP system [144], they achieved speculative prefetching. Specific algorithms can also be tuned to provide hints of future references. Chen, Gibbons and Mowry [39] showed how prefetch instruction can be inserted to speed index scanning rou- tines. Chen et al. [40] extended this by giving a fractal data layout that improves the latency from disk to memory and from memory to processor cache, while Chen et al. [38] described how to im- prove hash join algorithms by prefetching data from main memory into a processor cache. This reduces the significant amount of time that the CPU spends in cache stalls. A recurring theme in database and operating systems research has been the idea of design- ing algorithms and data structures that exploit sequential access. One of the strengths of the re- 210 RELATED WORK lational model proposed by Codd [45] is that it permits an efficient access method to be selected based on the current query, rather than based on a single common usage pattern. Chamberlin et al. [34] described how System R used a cost-based model to select an access method for each relation, exploiting sequential access where costs warrant. More generally, Graefe [81] provides an overview of several algorithms in the database field that are designed to maximize sequential access to disks. In part, these algorithms are faster due to the inherently cheaper sequential ac- cess; in part, this speedup is due to the benefit they achieve from effective prefetching.

9.1.3.4 Type Reference Patterns

As noted above, models based on sequences of object references do not generalize to semantically identical patterns over different objects. The heuristic approaches described in Section 9.1.3.1 consider the data items being fetched, and use this to drive heuristics predicting future requests. This approach, however, does not adapt to application-specific request patterns that don’t match the selected heuristic. The prefetch hints described in Section 9.1.3.3 allow this type of adapta- tion, but at the cost of application complexity. Another solution is based on a model of client ref- erence behaviour when considering the object types being fetched instead of considering individ- ual object identifiers. This approach learns patterns at the intension level rather than the exten- sion level. Typically, such models are smaller and more generalizable than corresponding models based on object identifiers. Knafla[106, 107] considered the probability of an application navigating each of its relation- ships to other objects using a Markov-Chain model. This statistical model gives estimates of the probability of accessing subsequent pages. If the probability of accessing a page is high enough, it is prefetched to the client. Bernstein, Pal and Shutt [19] suggested that the context in which an object was fetched is an important factor to consider when deciding which related objects should be prefetched. For ex- ample, consider a query that returns a list of objects a1, a2,...,ak. If we next see a request to fetch the x attribute of object a1, we may reasonably assume that we will soon fetch the x at- tribute for a2 and the rest of the ai in the list. This is much more likely than if a1 and a2 were fetched in different contexts, for example from different queries. The solution proposed by Bern- stein, Pal and Shutt attempts to discover operations that should be applied to multiple objects, and uses this as the basis for prefetching. Han, Moon and Whang [89, 90] note that the approach of Bernstein, Pal and Shutt [19] may perform poorly if an application uses a depth-first traversal scheme. In that case, objects prefetched near the top of the access tree may be evicted before traversal returns to use them. Han, Moon and Whang propose PrefetchGuide, an extension to the context-based prefetching of Bernstein, Pal and Shutt. The PrefetchGuide data structure also considers patterns in the types of 9.1 PREFETCHING 211

objects referenced, tracking collections of objects returned by queries and by attribute references. In this way, it is able to detect iterative and recursive patterns. The iterative patterns are similar to the nested request patterns that we optimize (Chapter 3). The PrefetchGuide structure is built every time that a top-level query is used, and discarded when that query is closed. PrefetchGuide prefetches individual objects as it observes the client application submitting prefetches.

Bowman and Salem [22] described a semantic prefetching approach based on a recognition of nested request patterns of the type we describe in Chapter 3. As described in Chapter 3, this approach recognizes nested patterns that are equivalent to a distributed join. In contrast to the work of Han, Moon and Whang, this approach replaces all of the fetches for a nested pattern with a single join query, exploiting the relational processing power of the server DBMS.

Yao and An [198–200] propose a system called SQL-Relay, which consider sequences of queries in an OLTP or OLAP environment. They replace all literal constants to form a query tem- plate, and represent a client application by a probabilistic model they call user access patterns. This model contains states and edges annotated with queries and probabilities. States are identi- fied with query templates, thereby forming an order-1 model. Further, the user access graph pre- dicts values that will be used for parameters; these may be predicted to be a constant, input para- meters of previous requests, or result attributes of previous requests. Yao and An [198, 199] show how user access patterns can be used to guide prefetching. They considered three types of query rewrites for prefetching: sequential, which prefetches v when u is submitted; union, which gen- erates a union query u v to prefetch v when u is submitted; and, probe-remainder, which sub- ] mits a modified u0 that retrieves results for u and some of the results needed for the predicted v query. For the probe-remainder approach, and additional ‘remainder’ query v0 is required to re- trieve the rest of the results for v. Both the union and probe-remainder approaches are only used when queries u and v fetch from the same relations. Further, a query v that depends on result at- tributes of u cannot be prefetched until the results for u are returned (recall that Scalpel accom- plishes this with a join-based rewriting, as described in Section 5.4.2). Yao, An and Huang [200] also show how to use data mining techniques including n-gram modeling and sequence align- ment to identify database user sessions boundaries.

Bilgin et al. [20] also considered prefetching the results of anticipated future requests. Bil- gin et al. described a procedure that, if provided with a probabilistic model of a client’s data ac- cess graph, chooses a set of read-ahead queries that minimize the expected running time of the application. A dynamic programming algorithm is used to select the optimal strategy. The work Bilgin et al. [20] present so far considers only the optimization problem, relying on external mod- ules to generate the combined queries and decode the results from the prefetched queries. In their work, they also considered prefetching ‘deep’ as Scalpel does. 212 RELATED WORK

9.1.4 Summary of Prefetching

Prefetching mechanisms are widely used today to limit the latency associated with demand fetches. There are several approaches to predicting the future requirements of applications, rang- ing from simple patterns (such as sequential fetching) to more complicated approaches that mon- itor application behaviour to train a predictive oracle. Alternatively, heuristics have been used to generate statistical models of application behaviour in order to predict future behaviour. The existing prefetching mechanisms can work effectively for particular workloads. How- ever, they do not exploit an important observation: we may be able to encode fetches using a re- lational operation such as join. If so, then we can exploit this realization to predict the future ac- cess pattern of the application, or even rewrite the requests using relational equivalences to use a more efficient join-based strategy.

9.2 Theoretical Underpinnings of Model-Based Prediction

Prefetching is based on a prediction of future items that are likely to be needed. This prediction is often based on heuristics (such as sequential prefetching), but it has also been implemented based on probabilistic models of the client based on experience. This modeling is an example of machine learning, characterized by Laird and Saul [122] as discrete sequence prediction. Shannon [168] provided seminal results on probabilistic characterizations of sequences. His work was revolutionary to the broad field of information theory [15, 33, 49, 60, 61, 63, 69, 97, 122, 123, 127, 128, 133, 134, 148–151, 159, 188–190, 193, 210–214]. This field is concerned with the theoretical modeling of sequences of symbols. As such, it provides a theoretical underpinning for predictions based on a prior sequence, such as that used by Scalpel for batch request patterns. Data compression has been used as the basis for several prefetching schemes. The reason for this is due to the fact that a good data compressor will form a good predictor of future symbols. Research on practical compression schemes therefore provides an excellent basis for prefetching algorithms that are able to operate efficiently in space and time and produce good predictions. This approach leverages the large body of excellent work on model based compression [17, 25– 28, 41–43, 46, 71, 135, 138–140, 150, 178, 188, 189, 193, 195, 196, 204, 214]. In particular, the PPM algorithm introduced by Cleary and Witten [43] is the closest to our work. Moffat [135] provided a careful implementation of PPM giving an efficient implementa- tion that produced quite good compression. The PPM algorithm relies on a bounded length suf- fix trie, choosing to use prediction contexts that are long enough to match special cases while avoiding the zero-frequency problem. Cleary, Teahan and Witten [41, 42] extended the PPM al- gorithm to PPM*, which stores unbounded length contexts. Larsson [124] showed how the algo- rithm of Ukkonen [180] can be extended so that a sliding window of text is used. Bunton [28] 9.3 SUFFIX TRIES 213

showed how path compressed suffix tries can be extended to maintain counts used for probabil- ity estimation: unfortunately, this introduced a worst case O(n2) time component. The models designed for information theory and data compression are concerned with pro- viding precise estimates of the probability distribution of the next symbol given a preceding se- quence. The precision of this estimate is measured by entropy, and even small errors lead to inef- ficiencies in the compression or coding results. In contrast, Scalpel does not need a precise prob- ability estimate; instead, it merely needs to know if prefetching a future query is significantly cheaper (at the α level). Further, data compression techniques for the zero-frequency problem form an important area of research, for example resulting in several variants of the PPM algo- rithm that implement different definitions of escape probabilities to handle novel characters in a state. This issue does not affect Scalpel to the same extent as it can choose merely to avoid prefetching in such a situation. The field of language inference also deals with forming models of strings [1, 2, 7, 12, 31, 32, 53, 59, 62, 74, 91, 92, 95, 125, 138–140, 146, 153–157, 171, 179, 203–206]. In general, the idea of language inference is to build a model of a language from a set of examples. We are particularly interested in the recognition of stochastic grammars, which give predictions of likely future sym- bols. This recognition is commonly accomplished by building a representation of the sequence provided during training, then using state merging to combine states where the behaviour is suf- ficiently ‘similar’. Carrasco and Oncina [30–32] described an algorithm call ALERGIA (and variants thereof) based on a building a prefix tree structure from a training sequence, then using state merging based on similarity measures. A configuration parameter, α, is used to control how different two nodes must be to avoid merging. In the worst case, the algorithm runs in O(n3) time. Young-Lai and Tompa [203, 206] observed that the ALERGIA algorithm has particular dif- ficulty with nodes that have been observed few times. They noted that the merging criteria used in ALERGIA is not well founded, and added another configuration parameter β to control the type-II error during merging. Our confidence-level approach is related to the setting of α and β above. When Scalpel finds there is not sufficient information to make a prefetching decision, however, it is able to consider a more generalized context or (safely) decide to make no prefetching decision. The consequence of type-II errors are therefore lessened in Scalpel’s environment.

9.3 Suffix Tries

The suffix trie data structure is a trie [73] built for all of the suffixes of a string. The suffix trie for a string contains significant redundancy. Path compression is a general approach for reducing the size of a trie. It was first introduced by Morrison [136] as the Patricia tree data structure, and this 214 RELATED WORK term is often used in the literature to describe path compressed tries in general. Path compression can be used for suffix tries, and the resulting structure is called a suffix tree (although we feel that path compressed suffix trie is a more reasonable name). It is perhaps surprising that a suffix tree can be built for a string in linear time; McCreight [132] attributes this discovery to Weiner, and McCreight improves on the algorithm presented by Weiner, saving about 25% in space. The algorithms of Weiner and McCreight are very important in giving efficient construction algorithm for an important structure that is pervasively useful in the string processing field Gusfield [88]. However, these algorithms were considered to be overly complex, and they were not widely used. Many typical uses of suffix tries consisted of using atomic suffix trees, or asymptotically more expensive construction algorithms. Ukkonen [180] presented a novel implementation of a linear algorithm to build a suffix tree that is significantly simpler than the two previous approaches. The key observation used by Ukko- nen is the use of open edges. We refer to this as the trick in Section 5.2.3. This trick is funda- ∞ mental to the simplification provided by Ukkonen. In fact, Giegerich and Kurtz [80] conjecture that if Weiner had seen this trick, he would have implemented the simpler algorithm in 1973. Bunton [28], demonstrated how path compressed suffix tries can be used to predict the prob- ability of future characters in the PPM* algorithm, and de Rooij [54] presented similar results. In both cases, the maintenance of count fields on-line during construction of the suffix tree leads to a worst-case O(n2) time complexity. We avoid this problem in our approach by using end-of-trace markers ($). In this way, the number of leaves below a node provides the needed count.

9.4 Processing Sequences of Queries

If we know a sequence of queries that are to likely to be submitted, we can consider a variety of ways to efficiently produce the needed results. For batch request patterns, Scalpel generates a rewritten query using outer join and outer union constructs. Section 9.4.1 describes related work that can produce efficient results for this type of sequence of requests. Scalpel also considers nested patterns of requests. When Scalpel detects what it believes to be a distributed join implemented in the client application, it generates a combined query using outer joins, outer unions, client merge join, and client hash join strategies. Other researchers have considered the problem of how to efficiently process a sequence of queries that involves nesting.

9.4.1 Batch Request Patterns

Sellis [160] presented early results that considered optimizing a known sequence of queries to- gether, instead of one query at a time. In this way, the optimizer is able to choose locally sub- optimal access plans that, through sharing, give a globally optimal plan. Sellis and Ghosh [161] 9.4 PROCESSINGSEQUENCESOFQUERIES 215 showed that this optimization problem is, in general, NP-hard. However, they suggested heuris- tics that in general give results that significantly improve on the naive approach. The idea of multi-query optimization has been hampered to an extent because in current systems, the DBMS does not have a good idea of future requests that will be executed. The query sequence detec- tion of Scalpel therefore provides a nice complement to multi-query optimization. In some cases, multi-query optimization can be used because of the special structure of re- quests generated by the client. Kraft et al. [111–113] found that some OLAP tools intentionally generate a sequence of queries to answer a single user request. In part, this sequence is designed to limit the complexity of individual requests to avoid overloading the DBMS optimizer. Kraft et al. suggest coarse grained optimization, which uses heuristic rewrites to optimize this (known) sequence of queries.

9.4.2 Nested Request Patterns

A number of researchers have studied how to effectively execute queries that contain various forms of nesting [52, 64, 65, 77, 98, 104, 105, 162]. The approaches developed in that work are ef- fective at choosing efficient evaluation plans for the correlated combined queries that we gener- ate. However, the techniques are not directly applicable to the problem we consider because the nesting appears in the application, not the queries. Florescu et al. [70, 197] translate web-pages defined in a declarative language including queries. In this approach, the authors are able to detect the source of binding parameters when a nested query is invoked. A single nested query may be used in more than one context; for some contexts, simplifications can be used to improve the performance of the nested query. A nested query can be simplified if the analysis can detect that a tuple from the outer query will appear in the results of the inner query. The authors term this rewrite query simplification under precondi- tions. For example, a nested query may include a 1 : 1 join that retrieves attributes already avail- able from a prior query. In this case, we can eliminate the join, using a single-table query to re- trieve the new values. If there are multiple contexts of execution for the inner query, we may be able to perform the optimization in only some of these contexts. Further, the authors propose modifying queries in order to reuse their results in future queries. They identify a conservative approach which does no extra work but does include additional attributes not present in the origi- nal query, and an optimistic approach which additionally performs outer joins which load results needed by subsequent operations. The primary focus of this work was to avoid re-evaluating com- mon expressions in order to improve the performance of generating dynamic web content. The issues of fine-grained access is not specifically addressed. Shanmugasundaram et al. [166] and Fern´andez, Morishima and Suciu [67] studied efficient mechanisms to generate nested XML results from relational data sources, and this was studied 216 RELATED WORK further by Krishnamurthy [115] This work is similar to our work on combining nested queries, although it differs in that the structure of the nesting is known from the XML query (we infer this structure during a training period). Further, the combined result set is used to encode an XML result, while we decode the combined result set to generate the original nested relational results. Shanmugasundaram et al. [163–167] considered efficient ways to translate XML queries into queries over a relational data model. In addition to stored procedure and correlated CLOB ap- proaches (which do not appear to apply well to our problem), they considered outer union and outer join strategies. They called the outer join strategies redundant relations due to the process- ing and data redundancy introduced when unrelated children are combined with an outer query, and they did not consider it further after initial results found it to perform poorly. Fern´andez, Tan and Suciu [68] introduced SilkRoute, a system that maps relational data to XML views. Initially, SilkRoute used a scheme similar to our client merge join. Individual queries were ordered and merged at the client. Fern´andez, Morishima and Suciu [67] extended this ap- proach to also consider the outer union approach suggested by Shanmugasundaram et al. [165]. 
They also rehabilitated the outer join approach, using it when the inner query returns at most one row (therefore not introducing redundancy). They proposed optimization based on a view-tree structure to decided on which strategy would be used for each portion of the nested sequence. Our work extends the view tree reduction of Fern´andez, Morishima and Suciu, a heuristic that com- bines all at-most-one-row queries with their outer query using joins. In our work, we choose the queries to join together on the basis of a cost model that accounts for the effects of local predi- cates that can appear in client application. The combined queries generated by Shanmugasundaram et al. and Fern´andez, Morishima and Suciu [67] moved correlated predicates from inner queries to the ON-condition of an outer join (an example is shown if Figure 3.14). In contrast, we use a LATERAL derived table construct. The LATERAL derived table is more general, allowing the inner query to use correlations in any loca- tion. Other researchers have considered how query results can be fetched before the query is actu- ally requested. Sapia [158] proposes the PROMISE system, which uses a model of user behav- iour to predict future OLAP queries that might be submitted. Prediction is made based on a pre- diction profile that abstracts details of queries to detect higher level patterns. The prediction pro- file can either be defined by domain experts, or possibly from an analysis of query logs (although this process is not described). The predictions are used to aid in caching decisions and to prefetch data into a cache. The research on non-first-normal-form (NF 2) query languages in general [98] and XML in particular is pertinent when we consider retrieving the encoded, nested result set that is used by our approach. Instead of encoding the nested result in an outer union or outer join, the result 9.4 PROCESSINGSEQUENCESOFQUERIES 217 could be directly expressed in a nested relational language. However, this research does not di- rectly help our particular problem due to the presence of local predicates. If these predicates are highly selective, then it is better to submit nested queries than a single decorrelated query.

10 Conclusions and Future Work

Latency is increasingly becoming a significant factor in database applications. Communication latency lags significantly behind advances in individual process components, increasing the rel- ative importance of latency. Further, adaptation of existing database applications to new high- latency environments such as wireless access and WANs leads to higher absolute latency. Latency is increasingly becoming an important factor for database applications, and this prob- lem is exacerbated by fine grained access. We have studied a number of applications, and we present results for two of these in Chapter 8. All of the applications we studied had some se- quences of related queries, which we call batch request patterns. For typical configurations, many of these queries are quite cheap with respect to the per-request latency U0, even for low-latency local shared memory configurations. In addition to the prevalent batch request patterns that we observed, we also found that there are examples of nested request patterns in existing database applications. These nested patterns do not occur as frequently, but they can account for signifi- cant latency due to the number of inner requests that are submitted. While it is possible in some cases to rewrite applications manually to avoid generating these fine-grained access patterns, such rewrites are complicated by a number of factors. One of these is the fact that a nested approach is in fact optimal for some configurations of the application pro- gram (for example, see Figure 3.30). In the cases where such optimal configurations are expected to occur in the majority of client deployments, it is prudent to choose the nested implementation, which is in any case easier to implement. Further, good software engineering practices may ar- gue against the removal of nesting, as such removal might lead to the destruction of important properties such as code encapsulation. Instead of manual tuning, we have presented a system, Scalpel, which automatically detects fine-grained nesting and batch request patterns. Scalpel uses a deployment-time training period that monitors a request stream to automatically detect predictable patterns of queries. Further, Scalpel maintains correlation information that is needed to predict the values that will be used for future requests. After the training period, a cost-based optimizer is used to choose a prefetch strategy. Scalpel leverages the query processing capabilities of the DBMS to generate a prefetch request that fetches results for an original query and also for a list of predicted future queries. The relational process- ing power is able to prefetch the results for requests that depend on the results of previous re-

219 220 CONCLUSIONSANDFUTUREWORK quests, a capability that is not available to prefetching systems that do not consider request se- mantics.

10.1 Contributions

We have presented techniques for recognizing nested request patterns. Even without prefetch- ing, this recognition may prove useful to application developers, enabling them to identify ar- eas for improving their application. Nested request patterns are recognized not only by the nested structure of requests, but also by correlations between input parameters of the inner query and at- tributes of the outer query. We have developed four approaches to efficiently combine the nested request patterns that we detect, based on the following: outer joins, outer unions, client hash joins, and client merge joins. Our combined queries are implemented using the LATERAL construct (and its obvious exten- sion LEFTOUTERLATERAL). In this way, the combined queries we generate closely match the nesting structure that was implicitly present in the client application. In contrast with other ap- proaches that are based on clever tricks such as moving correlated predicates into the ON condi- tion of an outer join, our approach is general in that it supports correlation anywhere within the in- ner query. While our approach does generate complex nested queries, this complexity merely ex- poses the original complexity that was previously hidden in the application code. In some cases, the DBMS may be able to select a more efficient execution strategy for this request. We showed how suffix tries can be used to find batch request patterns. Techniques such as suffix tries have been used for prefetching in the past, but we extend these with the ability to track correlations between input parameters and attributes of preceding requests. We extended our suffix trie detection to path compressed suffix tries. In this way, we provided a linear time algorithm that maintains a set of probabilistic predictions of future requests, com- bined with a set of predicates that have always been true in the past. This algorithm is useful for efficiently detecting batch request patterns. It may also prove useful in other situations where we wish to efficiently learn a set of predicates that have always been true given a variable-length con- ditioning context. We presented two techniques that can be used to prefetch the results of anticipated future queries, based on outer joins and outer unions respectively. These techniques generate combined queries that are efficiently executed by the DBMS and easily decoded to provide the original re- sult sets. Finally, we have presented experimental and case study evidence that the proposed techniques are practical and useful for existing database applications. 10.2 FUTURESTUDY 221

10.2 Future Study

The work that we have presented has identified a number of areas that deserve further study. Our original study of applications also identified data structure request patterns. In this pat- tern, an ‘outer’ query is opened and its results are stored in an application data structure such as a linked list; then, the ‘outer’ query is closed and an ‘inner’ request is submitted for the rows in the data structure. The order of execution of the inner query does not necessarily match the origi- nal order the rows were fetched in. These patterns are something of a combination of nesting and batch request patterns. It remains an open question whether these patterns can be efficiently de- tected. When found, they can be executed using techniques similar to those used for nested re- quest patterns. Another topic for future reflection is our choice of using an explicit training period. In some respects, it would be better to have a dynamic implementation that continually adapts to chang- ing configuration parameters. We chose to explicitly separate the training period for simplicity of presentation, but we might consider combining the training and run-time phase. In such a com- bined situation, we would be much more concerned about training costs. In particular, suffix tries used for batch pattern detection would need to be implemented efficiently in a smaller amount of memory. The idea of integrating Scalpel into a system that provides semantic caching is appealing. Scalpel could be used not only for prefetching anticipated requests, but also for providing sug- gestions to a cache manager of the expected utility of each cached item. Such an integration would also provide a stronger basis for the issues of data consistency that are faced by Scalpel. At present, Scalpel provides results that are correct and consistent provided that a full ACID DBMS is used (or at least snapshot isolation). However, this consistency is achieved at the expense of fore- going prefetching opportunities where it is not provably safe. For example, Scalpel does not cur- rently prefetch across transaction boundaries, although we have identified applications where that would be useful. Finally, one topic that should be studied more thoroughly in the future is how a database ap- plication should be divided between a client process and DBMS. The relational model provides only slight guidance with this: the limitations of SQL pose certain limits on what can be exe- cuted in the server process. There is nothing, however, limiting what operations are performed in the client application. As we have seen, current client applications implement the equivalent of distributed joins and unions of results. It is clear that a client application can implement all of the relational operations, merely using a DBMS as a table store. We can consider automated tools that detect specific logical operations that the application performs, such as our recogni- tion of nested and batch request patterns. Alternatively, we could consider ‘cutting up’ an appli- cation so that portions with fine-grained access to the data stored in the DBMS are executed in 222 CONCLUSIONSANDFUTUREWORK the server process, while the remained execute in the client process. In fact, this concept of cut- ting up an application was the source of the name of our system: Scalpel. A Confidence Intervals

It is perhaps surprising to non-statisticians that there is active study on the topic of forming a con- fidence interval for a binomial parameter p. The approach described in most introductory text- books is based on the asymptotic normality given by the central limit theorem. Let X be the num- ber of times a test is true out of n trials. Then, pˆ = X/n is an estimate of p. The interval CIS de- fined in Equation A.1 is a 100(1 α)% confidence interval for p, where z is the (1 c)th quan- − c − tile of the standard normal distribution.

CI =p ˆ z pˆ(1 pˆ) (A.1) S ± α/2 − p This definition of a confidence interval is one of the oldest; for example, Agresti and Coull [6] attribute its use to Laplace in 1812. The interval CIS is called the Wald confidence interval for p because it is based on inverting the Wald hypothesis test for p. The interval CIS is the set of values p0 having P value exceeding significance level α in testing the null hypothesis H0 : p = p against the alternate hypothesis H : p = p using the following test statistic: 0 a 6 0 pˆ p z = − 0 (A.2) (ˆp(1 pˆ)/n − p The points p0 in the interval are those for which we cannot reject the null hypothesis (at the α level of significance).

The confidence interval CIS is simple to compute and easy to motivate, which is why it is tra- ditionally used in introductory texts. However, as noted by Brown, Cai and DasGupta [24], the standard interval tends to generate an interval that provides coverage probability rather lower than the nominal significance level; this is partly due to the approximation of the central limit theo- rem being somewhat weak with low n, but it is also a result of the discrete nature of the measured quantities. Better introductory books do provide some guidance that the Wald interval should only be used in some circumstances. For example, some suggest only using it if n > 30, others also caution that np and n(1 p) should not be close to zero. In fact, Brown, Cai and DasGupta [24] − show that none of these cautions adequately captures the erratic coverage properties of the stan- dard interval. Even with large n and p relatively far from the end-points, the standard confidence interval can give results that fall seriously short of the stated significance level.

223 224 CONFIDENCE INTERVALS

When guidelines for using the standard interval are not met, advanced textbooks refer the reader to an interval defined by Clopper and Pearson [44]. This interval is typically referred to as the ‘exact’ interval, and it has often been considered the gold standard of confidence intervals for binomial parameters. The Clopper-Pearson interval has endpoints that are solutions to the follow- ing:

n n α pk(1 p )n−k = (A.3) k 0 − 0 2 kX=X   X n α pk(1 p )n−k = (A.4) k 0 − 0 2 Xk=0   (except that the lower bound is 0 when X = 0 and the upper bound is 1 when X = n). This inter- val is formed by inverting the equal-tailed binomial hypothesis test of H0. The Clopper-Pearson interval is guaranteed to have coverage of at least 1 α, which is why it is often considered to − be better than the standard interval. This assertion that the Clopper-Pearson interval is better than the standard is based on the assumption that it is better to be conservative, having an interval definition that never falls below the nominal coverage of the parameter α. However, this ‘exact’ interval achieves this guarantee by consistently generating intervals that are wider than necessary, giving a coverage larger than the nominal level. If the definition of a gold standard is that it always give at least the nominal coverage, we could as easily choose the unit interval [0, 1]. This complaint has led to recent calls for using a confidence interval that generally provide coverage close to the nominal significance level. Agresti and Coull [6] suggest using an interval that they believe was first proposed by Wilson [194]. This interval is based on the score test for parameter p instead of the Wald test; the score test uses the log likelihood at the null hypothesis level of the parameter p rather than the maximum likelihood estimate pˆ used by the Wald tests.

X + z2/2 z√n z2 CIW = 2 2 pˆ(1 pˆ + (A.5) n + z ± n + z r − 4n

The Wilson interval is shown in Equation A.5, where we let z = zα/2. It corrects two prob- lems with the standard interval. First, the standard interval is centered about the ‘wrong’ point, pˆ. We can see that the Wilson interval is centered about the following weighted average:

n 1 z2 pˆ + (A.6) n + z2 2 n + z2     The mid-point falls between pˆ and 1/2, with the movement toward 1/2 diminishing as n in- creases. This choice is a better mid-point for the confidence interval due to the skewed nature of 225 the binomial distribution. The second problem that the Wilson interval corrects is that the stan- dard interval is in fact too wide in general. The low coverage is caused by the wrong center. For these reasons, the Wilson interval appears certainly superior to the standard Wald interval. It can be considered an improvement on the Clopper-Pearson interval if we are willing to accept some cases where coverage falls below the nominal level. The Wilson interval does have shortcomings of its own. First, it is relatively complex to mo- tivate and present. More importantly, there is a small region [0, r) where the coverage of the Wil- son interval drops seriously below the nominal level. A solution to both of these problems is pro- vided by Agresti and Coull [6]. Agresti and Coull make the following suggestion. Let n˜ = n + z2 and X˜ = X + z2/2. Let p˜ = X/˜ n˜. Equation A.7 shows the confidence interval suggested by Agresti and Coull.

p˜(1 p˜) CIAC =p ˜ z − (A.7) ± r n In the case that we are choosing a 95% confidence interval, z = z = 1.96 2. This gives α/2 ≈ X˜ = X + 2 and n˜ = n + 4, leading Agresti and Coull to call it the “add two successes and two failures” approach. The Agresti-Coull interval has the familiar form of the standard Wald in- terval, making it simpler to present. Further, it improves on the Wilson interval in that it avoids the region of seriously low coverage experienced by the Wilson interval. However, this improve- ment comes at the cost of being slightly conservative, with intervals that are slightly wider than the Wilson interval. We choose to use the Agresti-Coull interval due to its simplicity and the im- provements in regions of p near 0 and 1. In summary, the Clopper-Pearson confidence interval has traditionally been considered to be the gold standard (usually called the ‘exact’ interval) because it always provides intervals with coverage that is at least the nominal level, and usually more. This interval has not been tradi- tionally popular because of the difficulty of computing the solutions to the endpoint equations. Instead, the Wald interval is typically suggested by statistics text books (with varying levels of guidance as to when it is appropriate). The Wald interval is based on a hypothesis test that uses the normal approximation with the sample standard deviation. This approximation has disturbing tendencies to produce erratic intervals that can fall alarmingly below the nominal coverage level, even with reasonably large n and p relatively far from 0 and 1. The problem is that the Wald in- terval is centered about the ‘wrong’ point, namely the maximum likelihood estimate pˆ = X/n. The Wilson interval corrects this centering problem by using the score hypothesis test instead of the Wald test. The Wilson interval is superior in all ways to the Wald interval, and it is superior to the Clopper Pearson test if we believe it is better to be generally close to the nominal cover- age while occasionally falling below this nominal level. We use the Agresti-Coull interval, which is centered about the same point as the Wilson interval, but using a width that is generally slightly 226 CONFIDENCE INTERVALS wider. This definition avoids small regions near 0 and 1 that cause the Wilson interval to fall be- low the nominal level. Further, this definition allows for a simple presentation and motivation. Bibliography

[1] Naoki Abe and Manfred K. Warmuth. On the computational complexity of approximating distributions by probabilistic automata. In Proceedings of the third annual workshop on Computational learning theory, pages 52–66. Morgan Kaufmann Publishers Inc., 1990.

[2] Naoki Abe and Manfred K. Warmuth. On the computational complexity of approximating distributions by probabilistic automata. Machine Learning, 9(2-3):205–260, 1992.

[3] Swarup Acharya, Michael J. Franklin, and Stanley B. Zdonik. Prefetching from broadcast disks. In ICDE ’96: Proceedings of the Twelfth International Conference on Data Engi- neering, pages 276–285. IEEE Computer Society, 1996.

[4] Atul Adya. Weak Consistency: a Generalized Theory and Optimistic Implementations for Distributed Transactions. PhD thesis, Massachusetts Institute of Technology, March 1999.

[5] Atul Adya, Barbara Liskov, and Patrick O’Neil. Generalized isolation level definitions. In Proceedings of the IEEE International Conference on Data Engineering, March 2000.

[6] Alan Agresti and Brent A. Coull. Approximate is better than exact for interval estimation of binomial proportions. The American Statistician, 52(2):119–126, May 1998.

[7] Ahonen Ahonen. Generating Grammars for Structured Documents Using Grammatical Inference Methods. PhD thesis, University of Helsinki, Finland, 1996.

[8] Sedat Aky¨urek and Kenneth Salem. Adaptive block rearrangement. ACM Transaction on Computing Systems, 13(2):89–121, 1995.

[9] Sedat Aky¨urek and Kenneth Salem. Adaptive block rearrangement under Unix. Software– Practice and Experience, 27(1):1–23, 1997.

[10] Ahmed Amer, Darrell D.E. Long, Jehan-Franc¸ois Pˆaris, and Randal C. Burns. File ac- cess prediction with adjustable accuracy. In Proceedings of the 2002 International Perfor- mance, Computing and Communication Conference (IPCCC). IEEE, 2002.

[11] Mostafa H. Ammar. Response time in a teletext system–an individual user’s perspective. IEEE Transactions on Communications, COM-35(11), November 1987.

[12] Dana Angluin and Miklós Csürös. Learning Markov chains with variable memory length from noisy output. In Proceedings of the tenth annual conference on Computational learning theory, pages 298–308. ACM Press, 1997.

[13] ANSI. Information Systems Database Language SQL, September 1999. ISO/IEC 9075-1:1999.

[14] Gretta E. Bartels, Anna R. Karlin, Darrell Anderson, Jeffrey S. Chase, Henry Levy, and Geoffrey Voelker. Potentials and limitations of fault-based Markov prefetching for virtual memory pages. In SIGMETRICS '99: Proceedings of the 1999 ACM SIGMETRICS international conference on Measurement and modeling of computer systems, pages 206–207. ACM Press, 1999.

[15] Ron Begleiter, Ran El-Yaniv, and Golan Yona. On prediction using variable order Markov models. Journal of Artificial Intelligence Research, 22:385–421, 2004.

[16] Timothy C. Bell, John G. Cleary, and Ian H. Witten. Text Compression. Prentice Hall, 1990.

[17] Timothy C. Bell, Ian H. Witten, and John G. Cleary. Modeling for text compression. ACM Computing Surveys, 21(4), December 1989.

[18] Hal Berenson, Philip A. Bernstein, Jim N. Gray, Jim Melton, Patrick O'Neil, and Elizabeth J. O'Neil. A critique of ANSI SQL isolation levels. In Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data, pages 1–10. ACM Press, May 1995.

[19] Philip A. Bernstein, Shankar Pal, and David Shutt. Context-based prefetch for implementing objects on relations. In Malcolm P. Atkinson, Maria E. Orlowska, Patrick Valduriez, Stanley B. Zdonik, and Michael L. Brodie, editors, VLDB'99, Proceedings of 25th International Conference on Very Large Data Bases, September 7-10, 1999, Edinburgh, Scotland, UK, pages 327–338. Morgan Kaufmann, 1999.

[20] A. Soydan Bilgin, Rada Y. Chirkova, Timo J. Salo, and Munindar P. Singh. Deriving efficient SQL sequences via read-aheads. In Data Warehousing and Knowledge Discovery: 6th International Conference, DaWaK, volume 3181/2004, pages 299–308. Springer-Verlag, September 2004.

[21] Ivan T. Bowman. Architecture recovery for object-oriented systems. Master's thesis, University of Waterloo, 1999.

[22] Ivan T. Bowman and Kenneth Salem. Optimization of query streams using semantic prefetching. In Proc. ACM SIGMOD International Conference on Management of Data (SIGMOD’04), pages 179–190, 2004.

[23] Karl S. Brandt. Using multiple experts to perform file prediction. Master's thesis, University of California Santa Cruz, June 2004.

[24] Lawrence D. Brown, T. Tony Cai, and Anirban DasGupta. Interval estimation for a binomial proportion. Statistical Science, 16(2):101–133, 2001.

[25] Suzanne Bunton. A characterization of the dynamic Markov compression FSM with finite conditioning contexts. In Data Compression Conference (DCC '95), March 1995.

[26] Suzanne Bunton. On-Line Stochastic Processes in Data Compression. PhD thesis, University of Washington, 1996.

[27] Suzanne Bunton. An executable taxonomy of on-line modeling algorithms. Technical Report UW-CSE-97-02-05, University of Washington, 1997.

[28] Suzanne Bunton. Semantically motivated improvements for PPM variants. The Computer Journal, 40(2/3), 1997.

[29] Pei Cao, Edward W. Felten, Anna R. Karlin, and Kai Li. A study of integrated prefetching and caching strategies. In SIGMETRICS ’95/PERFORMANCE ’95: Proceedings of the 1995 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems, pages 188–197. ACM Press, 1995.

[30] Rafael C. Carrasco. Incremental construction and maintenance of minimal finite-state automata. Computational Linguistics, 28(2):207–216, 2002.

[31] Rafael C. Carrasco and Jose Oncina. Learning stochastic regular grammars by means of a state merging method. In International Conference on Grammar Inference, 1994.

[32] Rafael C. Carrasco and Jose Oncina. Learning deterministic regular grammars from stochastic samples in polynomial time. RAIRO (Theoretical Informatics and Applications), 33(1):1–20, 1999.

[33] Nicolo Cesa-Bianchi and Gabor Lugosi. On sequential prediction of individual sequences relative to a set of experts. In COLT '98: Proceedings of the eleventh annual conference on Computational learning theory, pages 1–11. ACM Press, 1998.

[34] Donald D. Chamberlin, Morton M. Astrahan, Mike W. Blasgen, Jim N. Gray, W. Frank King III, Bruce G. Lindsay, Raymond A. Lorie, James W. Mehl, Thomas G. Price, Gianfranco R. Putzolu, Patricia G. Selinger, Mario Schkolnick, Donald R. Slutz, Irving L. Traiger, Bradford W. Wade, and Robert A. Yost. A history and evaluation of System R. CACM, 24(10):632–646, 1981.

[35] Fay Chang and Garth A. Gibson. Automatic I/O hint generation through speculative execution. In Third Symposium on Operating Systems Design and Implementation, February 1999.

[36] Surajit Chaudhuri, Prasanna Ganesan, and Vivek Narasayya. Primitives for workload summarization and implications for SQL. In Proceedings of the 29th VLDB Conference, 2003.

[37] Surajit Chaudhuri and Gerhard Weikum. Rethinking database system architecture: Towards a self-tuning RISC-style database system. In Proceedings of the 26th International Conf. on Very Large Databases, pages 1–10, Cairo, Egypt, September 2000.

[38] Shimin Chen, Anastassia Ailamaki, Phillip B. Gibbons, and Todd C. Mowry. Improving hash join performance through prefetching. In Proceedings of the IEEE International Conference on Data Engineering, March 2004.

[39] Shimin Chen, Phillip B. Gibbons, and Todd C. Mowry. Improving index performance through prefetching. In Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data, May 2001.

[40] Shimin Chen, Phillip B. Gibbons, Todd C. Mowry, and Gary Valentin. Fractal prefetching B+-trees: Optimizing both cache and disk performance. In SIGMOD '02: Proceedings of the 2002 ACM SIGMOD international conference on Management of data, pages 157–168. ACM Press, 2002.

[41] John G. Cleary and W. J. Teahan. Unbounded length contexts for PPM. The Computer Journal, 40(2/3), 1997.

[42] John G. Cleary, W. J. Teahan, and Ian H. Witten. Unbounded length contexts for PPM. In Data Compression Conference, pages 52–61, 1995.

[43] John G. Cleary and Ian H. Witten. Data compression using adaptive coding and partial string matching. IEEE Transactions on Communications, COM-32(4), April 1984.

[44] C.J. Clopper and E.S. Pearson. The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika, 26(4):404–413, December 1934.

[45] E.F. Codd. A relational model of data for large shared data banks. Communications of the ACM, 13(6):377–387, 1970.

[46] Gordon V. Cormack and R. Nigel Horspool. Data compression using dynamic Markov modelling. Computer Journal, 30(6):541–550, 1987.

[47] Thomas H. Cormen, Charles E. Leiserson, and Ronald L. Rivest. Introduction to Algorithms. MIT Press, 1990.

[48] Kenneth M. Curewitz, P. Krishnan, and Jeffrey Scott Vitter. Practical prefetching via data compression. In Peter Buneman and Sushil Jajodia, editors, Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, Washington, D.C., May 26-28, 1993, pages 257–266. ACM Press, 1993.

[49] A.S. Davis. Markov chains as random input automata. The American Mathematical Monthly, 68:264–267, March 1961.

[50] Brian D. Davison. Learning web request patterns. In A. Poulovassilis and M. Levene, editors, Web Dynamics: Adapting to Change in Content, Size, Topology and Use, pages 435–460. Springer, 2004.

[51] Mark Stuart Day. Client Cache Management in a Distributed Object Database. PhD thesis, MIT, 1995.

[52] Umeshwar Dayal. Of nests and trees: A unified approach to processing queries that contain nested subqueries, aggregates, and quantifiers. In Peter M. Stocker, William Kent, and Peter Hammersley, editors, VLDB'87, Proceedings of 13th International Conference on Very Large Data Bases, September 1-4, 1987, Brighton, England, pages 197–208. Morgan Kaufmann, 1987.

[53] Colin de la Higuera and Franck Thollard. Identification in the limit with probability one of stochastic deterministic finite automata. In ICGI-2000 : 5th International Colloquium on Grammatical Inference, Lisbon, September 2000.

[54] Steven de Rooij. Methods of statistical data compression. Master’s thesis, University of Amsterdam / Institute for Logic, Language, and Computation, September 2003.

[55] Peter J. Denning. The working set model for program behavior. In SOSP '67: Proceedings of the first ACM symposium on Operating System Principles, pages 15.1–15.12. ACM Press, 1967.

[56] Mukund Deshpande and George Karypis. Selective Markov models for predicting web-page accesses. ACM Trans. Inter. Tech., 4(2):163–184, 2004.

[57] Jochen Doppelhammer, Thomas Höppler, Alfons Kemper, and Donald Kossmann. Database performance in the real world: TPC-D and SAP R/3. In Proc. ACM SIGMOD Conference, pages 123–134, 1997.

[58] Stephane Drapeau, Claudia Roncancio, and Edgard Benitez Guerrero. Generating association rules for prefetching. In ICDCS Workshop of Knowledge Discovery and Data Mining in the World-Wide Web, pages F15–F22, 2000.

[59] Pierre Dupont, L. Miclet, and E. Vidal. What is the search space of the regular inference? In Proceedings of the Second International Colloquium on Grammatical Inference and Applications, pages 25–37. Springer-Verlag, 1994.

[60] Yariv Ephraim and Neri Merhav. Hidden Markov processes. IEEE Transactions on Information Theory, 48(6):1518–1569, June 2002.

[61] Eleazar Eskin. Sparse Sequence Modeling with Applications to Computational Biology and Intrusion Detection. PhD thesis, Columbia University, 2002.

[62] Yann Esposito, Aurelien Lemay, Francois Denis, and Pierre Dupont. Learning probabilistic residual finite state automata. In Pieter W. Adriaans, Henning Fernau, and Menno van Zaanen, editors, Grammatical Inference: Algorithms and Applications, 6th International Colloquium: ICGI 2002, Amsterdam, The Netherlands, September 23-25, 2002, Proceedings, volume 2484 of Lecture Notes in Computer Science. Springer, 2002.

[63] Meir Feder, Neri Merhav, and Michael Gutman. Universal prediction of individual sequences. IEEE Transactions on Information Theory, 38:1258–1270, July 1992.

[64] Leonidas Fegaras. Query unnesting in object-oriented databases. In Laura M. Haas and Ashutosh Tiwary, editors, SIGMOD 1998, Proceedings ACM SIGMOD International Conference on Management of Data, June 2-4, 1998, Seattle, Washington, USA, pages 49–60. ACM Press, 1998.

[65] Leonidas Fegaras and David Maier. Optimizing object queries using an effective calculus. ACM Transactions on Database Systems, 25(4):457–516, 2000.

[66] R. J. Feiertag and E. I. Organick. The Multics input/output system. In SOSP '71: Proceedings of the third ACM symposium on Operating systems principles, pages 35–41. ACM Press, 1971.

[67] Mary F. Fernández, Atsuyuki Morishima, and Dan Suciu. Efficient evaluation of XML middle-ware queries. In Proc. ACM SIGMOD Conference, 2001.

[68] Mary F. Fernández, Wang-Chiew Tan, and Dan Suciu. SilkRoute: trading between relations and XML. Computer Networks, 33(1–6):723–745, 2000.

[69] Lorenzo Finesso. Consistent Estimation of the Order for Markov and Hidden Markov Chains. PhD thesis, University of Maryland, 1991.

[70] Daniela Florescu, Alon Y. Levy, Dan Suciu, and Khaled Yagoub. Optimization of run-time management of data intensive web-sites. In Malcolm P. Atkinson, Maria E. Orlowska, Patrick Valduriez, Stanley B. Zdonik, and Michael L. Brodie, editors, VLDB'99, Proceedings of 25th International Conference on Very Large Data Bases, September 7-10, 1999, Edinburgh, Scotland, UK, pages 627–638. Morgan Kaufmann, 1999.

[71] Pasi Franti and Timo Hatakka. Context model automata for text compression. The Computer Journal, 41(7):474–485, 1998.

[72] Freddy Gabbay and Avi Mendelson. Using value prediction to increase the power of speculative execution hardware. ACM Transactions on Computer Systems, 16(3):234–270, 1998.

[73] Edward Fredkin. Trie memory. Communications of the ACM, 3(9):490–499, 1960.

[74] Yoav Freund, Michael Kearns, Dana Ron, Ronitt Rubinfeld, Robert E. Schapire, and Linda Sellie. Efficient learning of typical finite automata from random walks. In Proceedings of the twenty-fifth annual ACM symposium on Theory of computing, pages 315–324. ACM Press, 1993.

[75] César Galindo-Legaria. Parameterized queries and nesting equivalencies. Technical Report MSR-TR-2000-31, Microsoft Corporation, April 2000.

[76] César Galindo-Legaria and Milind Joshi. Orthogonal optimization of subqueries and aggregation. In SIGMOD '01: Proceedings of the 2001 ACM SIGMOD international conference on Management of data, pages 571–581, New York, NY, USA, 2001. ACM Press.

[77] Richard A. Ganski and Harry K. T. Wong. Optimization of nested SQL queries revisited. In Proceedings of ACM SIGMOD, 1987.

[78] Carsten Andreas Gerlhof and Alfons Kemper. A multi-threaded architecture for prefetching in object bases. In Matthias Jarke, Janis A. Bubenko Jr., and Keith G. Jeffery, editors, Advances in Database Technology - EDBT'94. 4th International Conference on Extending Database Technology, Cambridge, United Kingdom, March 28-31, 1994, Proceedings, volume 779 of Lecture Notes in Computer Science, pages 351–364. Springer, 1994.

[79] Carsten Andreas Gerlhof and Alfons Kemper. Prefetch support relations in object bases. In Malcolm P. Atkinson, David Maier, and Véronique Benzaken, editors, Persistent Object Systems, Proceedings of the Sixth International Workshop on Persistent Object Systems, Tarascon, Provence, France, 5-9 September 1994, Workshops in Computing, pages 115–126. Springer and British Computer Society, 1994.

[80] Robert Giegerich and Stefan Kurtz. From Ukkonen to McCreight and Weiner: A unifying view of linear-time suffix tree construction. Algorithmica, 19(3):331–353, 1997.

[81] Goetz Graefe. Query evaluation techniques in large databases. ACM Computing Surveys, 25(2), June 1993.

[82] Jim N. Gray. Interview: A conversation with Jim Gray. Queue, 1(4):8–17, 2003.

[83] Jim N. Gray, Pat Helland, Patrick O'Neil, and Dennis Shasha. The dangers of replication and a solution. In H. V. Jagadish and Inderpal Singh Mumick, editors, Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, pages 173–182. ACM Press, June 1996.

[84] Jim N. Gray, Raymond A. Lorie, Gianfranco R. Putzolu, and Irving L. Traiger. Granularity of locks and degrees of consistency in a shared data base, pages 175–193. Morgan Kaufmann Publishers Inc., 1998. Reprinted from Modeling in Data Base Management Systems. Amsterdam: Elsevier North-Holland, 1976.

[85] James Griffioen and Randy Appleton. Reducing file system latency using a predictive approach. In Proceedings of the USENIX 1994 Technical Conference, pages 197–208. USENIX Association, January 1994.

[86] Knut Stener Grimsrud, James K. Archibald, and Brent E. Nelson. Multiple prefetch adaptive disk caching. IEEE Transactions on Knowledge and Data Engineering, 5(1):88–103, 1993.

[87] Hongfei Guo, Per-Åke Larson, Raghu Ramakrishnan, and Jonathan Goldstein. Relaxed currency and consistency: How to say "good enough" in SQL. In SIGMOD '04: Proceedings of the 2004 ACM SIGMOD international conference on Management of data, pages 815–826. ACM Press, 2004.

[88] Dan Gusfield. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press, 1997.

[89] Wook-Shin Han, Yang-Sae Moon, and Kyu-Young Whang. PrefetchGuide: capturing navigational access patterns for prefetching in client/server object-oriented/object-relational DBMSs. Information Sciences, 152(1):47–61, 2003.

[90] Wook-Shin Han, Yang-Sae Moon, Kyu-Young Whang, and Il-Yeol Song. Prefetching based on type-level access pattern in object-relational DBMSs. In Proceedings of the 17th International Conference on Data Engineering, April 2-6, 2001, Heidelberg, Germany. IEEE Computer Society, 2001.

[91] Philip Hingston. Inference of regular languages using model simplicity. In Proceedings of the 24th Australasian conference on Computer science, pages 69–76. IEEE Computer Society, 2001.

[92] Philip Hingston. Using finite state automata for sequence mining. In Proceedings of the twenty-fifth Australasian conference on Computer science, pages 105–110. Australian Computer Society, Inc., 2002.

[93] John E. Hopcroft and Jeffrey D. Ullman. Introduction to Automata Theory, Languages and Computation. Addison-Wesley, 1979.

[94] Windsor W. Hsu, Alan Jay Smith, and Honesty C. Young. I/O reference behavior of production database workloads and the TPC benchmarks—an analysis at the logical level. ACM Transactions on Database Systems, 26(1):96–143, 2001.

[95] Jianying Hu, William Turin, and Michael K. Brown. Language modeling with stochastic automata. In 1996 International Conference on Speech and Language Processing, October 1996.

[96] David Hume. An Enquiry Concerning Human Understanding. 1748.

[97] Philippe Jacquet, Wojciech Szpankowski, and Izydor Apostol. A universal predictor based on pattern matching. IEEE Transactions on Information Theory, 48(6), June 2002.

[98] G. Jaeschke and Hans-Jörg Schek. Remarks on the algebra of non first normal form relations. In Proceedings of the ACM Symposium on Principles of Database Systems, March 29-31, 1982, Los Angeles, California, pages 124–138. ACM, 1982.

[99] Wei Jin, Xiaobai Sun, and Jeffrey S. Chase. FastSlim: Prefetch-safe trace reduction for I/O system simulation. ACM Transactions on Modeling and Computer Simulation, 11(2):125–160, 2001.

[100] Immanuel Kant. The Critique of Pure Reason. 1781.

[101] John P. Kearns and Samuel DeFazio. Diversity in database reference behavior. In SIGMETRICS '89: Proceedings of the 1989 ACM SIGMETRICS international conference on Measurement and modeling of computer systems, pages 11–19. ACM Press, 1989.

[102] Michael Kearns, Yishay Mansour, Dana Ron, Ronitt Rubinfeld, Robert E. Schapire, and Linda Sellie. On the learnability of discrete distributions. In Proceedings of the twenty-sixth annual ACM symposium on Theory of computing, pages 273–282. ACM Press, 1994.

[103] Tom Keller, Goetz Graefe, and David Maier. Efficient assembly of complex objects. In James Clifford and Roger King, editors, Proceedings of the 1991 ACM SIGMOD International Conference on Management of Data, Denver, Colorado, May 29-31, 1991, pages 148–157. ACM Press, 1991.

[104] Werner Kiessling. On semantic reefs and efficient processing of correlation queries with aggregates. In A. Pirotte and Y. Vassiliou, editors, Proceedings of the Eleventh International Conference on Very Large Databases, pages 241–249, Stockholm, Sweden, August 1985.

[105] Won Kim. On optimizing an SQL-like nested query. ACM Transactions on Database Systems, 7(3), 1982.

[106] Nils Knafla. Analysing object relationships to predict page access for prefetching. In Ronald Morrison, Mick J. Jordan, and Malcolm P. Atkinson, editors, Advances in Persistent Object Systems, Proceedings of the 8th International Workshop on Persistent Object Systems (POS8) and Proceedings of the 3rd International Workshop on Persistence and Java (PJW3), Tiburon, California, 1998, pages 160–170. Morgan-Kaufmann, 1998.

[107] Nils Knafla. Prefetching Techniques for Client/Server, Object-Oriented Database Systems. PhD thesis, Division of Informatics, University of Edinburgh, 1999.

[108] Donald Kossmann, Laura M. Haas, and Ioana Ursu. Loading a cache with query results. In Malcolm P. Atkinson, Maria E. Orlowska, Patrick Valduriez, Stanley B. Zdonik, and Michael L. Brodie, editors, VLDB’99, Proceedings of 25th International Conference on Very Large Data Bases, September 7-10, 1999, Edinburgh, Scotland, UK, pages 351–362. Morgan Kaufmann, 1999.

[109] David F. Kotz. Prefetching and Caching Techniques in File Systems for MIMD Multiprocessors. PhD thesis, Duke University, 1991.

[110] David F. Kotz and Carla Schlatter Ellis. Practical prefetching techniques for multiprocessor file systems. Distributed and Parallel Databases, 1(1):33–51, 1993.

[111] Tobias Kraft. Rewrite-Strategien für generierte Anfragesequenzen im Online Analytical Processing. Diploma thesis, Universität Stuttgart, 2002.

[112] Tobias Kraft and Holger Schwarz. Chicago: A test and evaluation environment for coarse-grained optimization. In Proceedings of the 30th VLDB Conference, pages 1345–1348, August 2004.

[113] Tobias Kraft, Holger Schwarz, Ralf Rantzau, and Bernhard Mitschang. Coarse-grained optimization–techniques for rewriting SQL statement sequences. In Johann Christoph Freytag, Peter C. Lockemann, Serge Abiteboul, Michael J. Carey, Patricia G. Selinger, and Andreas Heuer, editors, Proceedings of 29th International Conference on Very Large Data Bases (VLDB 2003). Morgan Kaufmann, September 2003.

[114] Achim Kraiss and Gerhard Weikum. Integrated document caching and prefetching in storage hierarchies based on Markov-chain predictions. VLDB Journal, 7(3):141–162, 1998.

[115] Rajasekar Krishnamurthy. XML-to-SQL Query Translation. PhD thesis, University of Wisconsin–Madison, 2004.

[116] P. Krishnan. Online Prediction Algorithms for Databases and Operating Systems. PhD thesis, Brown University, 1995.

[117] P. Krishnan and Jeffrey Scott Vitter. Optimal prediction for prefetching in the worst case. SIAM Journal on Computing, 27(6):1617–1636, 1998. An extended abstract appears in Proceedings of the 5th Annual ACM-SIAM Symposium on Discrete Algorithms, Arlington, Virginia, January 1994, pages 392–401.

[118] Thomas M. Kroeger. Predicting file system actions from reference patterns. Master's thesis, University of California Santa Cruz, December 1996.

[119] Thomas M. Kroeger and Darrell D.E. Long. Design and implementation of a predictive file prefetching algorithm. In Proceedings of the 2001 USENIX Annual Technical Conference, June 2001.

[120] Geoffrey Houston Kuenning. Seer–Predictive File Hoarding for Disconnected Mobile Operations. PhD thesis, University of California, Los Angeles, 1997.

[121] Geoffrey Houston Kuenning, Wilkie Ma, Peter Reiher, and Gerald J. Popek. Simplifying automated hoarding methods. In MSWiM '02: Proceedings of the 5th ACM international workshop on Modeling analysis and simulation of wireless and mobile systems, pages 15–21. ACM Press, 2002.

[122] Philip Laird and Ronald Saul. Discrete sequence prediction and its applications. Machine Learning, 15(1):43–68, 1994.

[123] Glen G. Langdon, Jr. A note on the Ziv-Lempel model for compressing individual sequences. IEEE Transactions on Information Theory, IT-29, March 1983.

[124] N. Jesper Larsson. Extended application of suffix trees to data compression. In J. A. Storer and M. Cohn, editors, Proceedings of the Data Compression Conference, pages 190–199, Snowbird, UT, 1996. IEEE Computer Society Press.

[125] Eric Lehman. Approximation Algorithms for Grammar-Based Data Compression. PhD thesis, Massachusetts Institute of Technology, February 2002.

[126] Hui Lei and Dan Duchamp. An analytical approach to file prefetching. In USENIX Con- ference Proceedings, January 1997.

[127] Abraham Lempel and Jacob Ziv. On the complexity of finite sequences. IEEE Transactions on Information Theory, IT-22(1):75–81, 1976.

[128] Ronny Lempel and Shlomo Moran. Optimizing result prefetching in web search engines with segmented indices. ACM Trans. Inter. Tech., 4(1):31–59, 2004.

[129] Barbara Liskov, Atul Adya, Miguel Castro, and Quinton Zondervan. Safe and efficient sharing of persistent objects in Thor. In H. V. Jagadish and Inderpal Singh Mumick, editors, Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, Montreal, Quebec, Canada, June 4-6, 1996, pages 318–329. ACM Press, 1996.

[130] Tara M. Madhyastha. Automatic Classification of Input/Output Access Patterns. PhD thesis, University of Illinois at Urbana-Champaign, 1997.

[131] Tara M. Madhyastha and Daniel A. Reed. Input/output access pattern classification using hidden Markov models. In IOPADS '97: Proceedings of the fifth workshop on I/O in parallel and distributed systems, pages 57–67. ACM Press, 1997.

[132] Edward M. McCreight. A space-economical suffix tree construction algorithm. Journal of the ACM (JACM), 23(2):262–272, 1976.

[133] Neri Merhav and Meir Feder. Universal sequential learning and decision from individual data sequences. In Proceedings of the fifth annual workshop on Computational learning theory, pages 413–427. ACM Press, 1992.

[134] Neri Merhav and Meir Feder. Universal prediction. IEEE Transactions on Information Theory, 44(6):2124–2147, October 1998.

[135] Alistair Moffat. Implementing the PPM data compression scheme. IEEE Transactions on Communications, 38(11):1917–1921, November 1990.

[136] Donald R. Morrison. PATRICIA–practical algorithm to retrieve information coded in alphanumeric. Journal of the ACM (JACM), 15(4):514–534, 1968.

[137] Todd C. Mowry. Tolerating Latency Through Software-Controlled Data Prefetching. PhD thesis, Stanford University, March 1994.

[138] Craig G. Nevill-Manning. Inferring Sequential Structure. PhD thesis, University of Waikato, 1996.

[139] Craig G. Nevill-Manning and Ian H. Witten. Compression and explanation using hierarchical grammars. The Computer Journal, 40(2/3), 1997.

[140] Craig G. Nevill-Manning and Ian H. Witten. Inferring lexical and grammatical structure from sequences. In Compression and Complexity of Sequences, pages 265–274, June 1997.

[141] Stefan Nilsson and Matti Tikkanen. Implementing a dynamic compressed trie. In 2nd Workshop on Algorithm Engineering (WAE ’98), 1998.

[142] Mark Palmer and Stanley B. Zdonik. Fido: A cache that learns to fetch. In Guy M. Lohman, Amílcar Sernadas, and Rafael Camps, editors, 17th International Conference on Very Large Data Bases, September 3-6, 1991, Barcelona, Catalonia, Spain, Proceedings, pages 255–264. Morgan Kaufmann, 1991.

[143] David A. Patterson. Latency lags bandwidth. Communications of the ACM, 47(10):71–75, 2004.

[144] R. Hugo Patterson, Garth A. Gibson, Eka Ginting, Daniel Stodolsky, and Jim Zelenka. Informed prefetching and caching. In SOSP '95: Proceedings of the fifteenth ACM symposium on Operating systems principles, pages 79–95. ACM Press, 1995.

[145] Glenn Norman Paulley. Exploiting Functional Dependence in Query Optimization. PhD thesis, University of Waterloo, 2000.

[146] Leonard Pitt and Manfred K. Warmuth. The minimum consistent DFA problem cannot be approximated within any polynomial. In Proceedings of the twenty-first annual ACM symposium on Theory of computing, pages 421–432. ACM Press, 1989.

[147] Amira Rahal, Qiang Zhu, and Per-Åke Larson. Evolutionary techniques for updating query cost models in a dynamic multidatabase environment. The VLDB Journal, 13(2):162–176, 2004.

[148] Jorma Rissanen. A universal data compression system. IEEE Transactions on Information Theory, IT-29(5), September 1983.

[149] Jorma Rissanen. Universal coding, information, prediction, and estimation. IEEE Transactions on Information Theory, IT-30(4):629–636, July 1984.

[150] Jorma Rissanen. Complexity of strings in the class of Markov sources. IEEE Transactions on Information Theory, 32(4):526–532, 1986.

[151] Jorma Rissanen and Glen G. Langdon, Jr. Universal modeling and coding. IEEE Transactions on Information Theory, IT-27(1), January 1981.

[152] Dennis M. Ritchie and Ken Thompson. The UNIX time-sharing system. Communications of the ACM, 17(7):365–375, 1974.

[153] Ronald L. Rivest and Robert E. Schapire. Inference of finite automata using homing sequences. In Proceedings of the twenty-first annual ACM symposium on Theory of computing, pages 411–420. ACM Press, 1989.

[154] Dana Ron. Automata Learning and its Applications. PhD thesis, Hebrew University, 1995.

[155] Dana Ron, Yoram Singer, and Naftali Tishby. Learning probabilistic automata with variable memory length. In Proceedings of the seventh annual conference on Computational learning theory, pages 35–46. ACM Press, 1994.

[156] Dana Ron, Yoram Singer, and Naftali Tishby. On the learnability and usage of acyclic probabilistic finite automata. In Proceedings of the eighth annual conference on Computational learning theory, pages 31–40. ACM Press, 1995.

[157] Dana Ron, Yoram Singer, and Naftali Tishby. The power of amnesia: Learning probabilistic automata with variable memory length. Machine Learning, 25(2–3), 1996.

[158] Carsten Sapia. PROMISE: Predicting query behaviour to enable predictive caching for OLAP systems. In Proceedings of the Second International Conference on Data Warehousing and Knowledge Discovery (DAWAK 2000), September 2000.

[159] Stefan Schrödl and Stefan Edelkamp. Inferring flow of control in program synthesis by example. Technical Report 121, University of Freiburg, 1999.

[160] Timos K. Sellis. Multiple-query optimization. TODS, 13(1):23–52, 1988.

[161] Timos K. Sellis and Subrata Ghosh. On the multiple-query optimization problem. IEEE Transactions on Knowledge and Data Engineering, 2(2), 1990.

[162] Praveen Seshadri, Hamid Pirahesh, and T. Y. C. Leung. Complex query decorrelation. In Proceedings of the 12th International Conference on Data Engineering, pages 450–459, Washington - Brussels - Tokyo, February 1996. IEEE Computer Society.

[163] Jayavel Shanmugasundaram. Bridging Relational Technology and XML. PhD thesis, University of Wisconsin–Madison, 2001.

[164] Jayavel Shanmugasundaram. Querying XML views of relational data. In Proceedings of the 27th VLDB Conference, 2001.

[165] Jayavel Shanmugasundaram, Eugene J. Shekita, Rimon Barr, Michael J. Carey, Bruce G. Lindsay, Hamid Pirahesh, and Berthold Reinwald. Efficiently publishing relational data as XML documents. In Amr El Abbadi, Michael L. Brodie, Sharma Chakravarthy, Umeshwar Dayal, Nabil Kamel, Gunter Schlageter, and Kyu-Young Whang, editors, VLDB 2000, Proceedings of 26th International Conference on Very Large Data Bases, September 10-14, 2000, Cairo, Egypt, pages 65–76. Morgan Kaufmann, 2000.

[166] Jayavel Shanmugasundaram, Eugene J. Shekita, Rimon Barr, Michael J. Carey, Bruce G. Lindsay, Hamid Pirahesh, and Berthold Reinwald. Efficiently publishing relational data as XML documents. VLDB Journal, 10(2–3):133–154, 2001.

[167] Jayavel Shanmugasundaram, Kristin Tufte, Chun Zhang, Gang He, David J. DeWitt, and Jeffrey F. Naughton. Relational databases for querying XML documents: Limitations and opportunities. In Malcolm P. Atkinson, Maria E. Orlowska, Patrick Valduriez, Stanley B. Zdonik, and Michael L. Brodie, editors, VLDB'99, Proceedings of 25th International Conference on Very Large Data Bases, September 7-10, 1999, Edinburgh, Scotland, UK, pages 302–314. Morgan Kaufmann, 1999.

[168] Claude E. Shannon. A mathematical theory of communication. The Bell System Technical Journal, 27:379–423, 623–656, 1948.

[169] Minglong Shao, Jiri Schindler, Steven W. Schlosser, Anastassia Ailamaki, and Gregory R. Ganger. Clotho: Decoupling memory page layout from storage organization. In Proceedings of the 30th International Conference on Very Large Data Bases, August 2004.

[170] Dieter Simader. SQL Ledger Accounting: User Guide and Reference Manual for Version 2.2. DWS Systems Inc., March 2004.

[171] Yoram Singer. Adaptive mixtures of probabilistic transducers. Neural Computation, 9(9):1711–1733, 1997.

[172] Alan Jay Smith. Bibliography on paging and related topics. SIGOPS Operating Systems Review, 12(4):39–56, 1978.

[173] Alan Jay Smith. Sequentiality and prefetching in database systems. ACM Transactions on Database Systems, 3(3):223–247, 1978.

[174] Alan Jay Smith. Cache memories. ACM Computing Surveys, 14(3):473–530, 1982.

[175] Alan Jay Smith. Disk cache–miss ratio analysis and design considerations. ACM Trans. Comput. Syst., 3(3):161–203, 1985.

[176] Carl Downing Tait. A File System for Mobile Computing. PhD thesis, Columbia University, 1993.

[177] Carl Downing Tait, Hui Lei, Swarup Acharya, and Henry Chang. Intelligent file hoarding for mobile computers. In MobiCom '95: Proceedings of the 1st annual international conference on Mobile computing and networking, pages 119–125. ACM Press, 1995.

[178] W. J. Teahan. Modelling English Text. PhD thesis, University of Waikato, May 1998.

[179] Franck Thollard, Pierre Dupont, and Colin de la Higuera. Probabilistic DFA inference using Kullback-Leibler divergence and minimality. In Proceedings of the Seventeenth International Conference on Machine Learning, pages 975–982. Morgan Kauffman, 2000.

[180] Esko Ukkonen. On-line construction of suffix trees. Algorithmica, 14:249–260, 1995.

[181] Theo Ungerer, Borut Robič, and Jurij Šilc. A survey of processors with explicit multithreading. ACM Computing Surveys, 35(1):29–63, 2003.

[182] Steven P. Vanderwiel and David J. Lilja. Data prefetch mechanisms. ACM Computing Surveys, 32(2):174–199, June 2000.

[183] Jeffrey Scott Vitter. Optimal prefetching via data compression. Journal of the ACM, 43(5):771–793, September 1996.

[184] Kaladhar Voruganti, M. Tamer Özsu, and Ronald C. Unrau. An adaptive hybrid server architecture for client caching ODBMSs. In Malcolm P. Atkinson, Maria E. Orlowska, Patrick Valduriez, Stanley B. Zdonik, and Michael L. Brodie, editors, Proc. Int'l Conf. on VLDB, pages 150–161, Edinburgh, Scotland, UK, 7–10 September 1999. Morgan Kaufmann.

[185] Kaladhar Voruganti, M. Tamer Özsu, and Ronald C. Unrau. An adaptive data-shipping architecture for client caching data management systems. Distributed and Parallel Databases, 15(2):137–177, 2004.

[186] Jasmine Y.Q. Wang, Joon Suan Ong, Yvonne Coady, and Michael J. Feeley. Using idle workstations to implement predictive prefetching. In HPDC '00: Proceedings of the Ninth IEEE International Symposium on High Performance Distributed Computing (HPDC'00), page 87. IEEE Computer Society, 2000. See also University of British Columbia Technical Report TR-00-06.

[187] Mengzhi Wang, Anastassia Ailamaki, and Christos Faloutsos. Capturing the spatio-temporal behavior of real traffic data. Performance Evaluation, 49:147–163, 2002.

[188] Marcelo J. Weinberger, Abraham Lempel, and Jacob Ziv. A sequential algorithm for the universal coding of finite memory sources. IEEE Transactions on Information Theory, 38(3):1002–1014, May 1992.

[189] Marcelo J. Weinberger and Gadiel Seroussi. Sequential prediction and ranking in universal context modeling and data compression. Technical Report HPL-94-111 (R.1), HP Computer System Laboratory, January 1997.

[190] Marcelo J. Weinberger, Jorma Rissanen, and Meir Feder. A universal finite memory source. IEEE Transactions on Information Theory, 41(3):643–652, 1995.

[191] P. Weiner. Linear pattern matching algorithms. In Proceedings of the 14th IEEE Annual Symposium on Switching and Automata Theory, pages 1–11, 1973.

[192] Brian S. White and Kevin Skadron. Path-based target prediction for file system prefetching. Technical Report CS-2000-06, Department of Computer Science, University of Virginia, February 2000.

[193] F.M.J. Willems, Y.M. Shtarkov, and T.J. Tjalkens. The context tree weighting method: Basic properties. IEEE Transactions on Information Theory, 41(3):653–664, 1995.

[194] Edwin B. Wilson. Probable inference, the law of succession, and statistical inference. Journal of the American Statistical Association, 22(158):209–212, 1927.

[195] Ian H. Witten and Timothy C. Bell. The zero-frequency problem: estimating the probabilities of novel events in adaptive text compression. IEEE Transactions on Information Theory, 37(4):1085–1094, July 1991.

[196] Aaron D. Wyner and Jacob Ziv. Some asymptotic properties of the entropy of a stationary ergodic data source with applications to data compression. IEEE Transactions on Information Theory, IT-35(6), November 1989.

[197] Khaled Yagoub, Daniela Florescu, Valérie Issarny, and Patrick Valduriez. Caching strategies for data-intensive web sites. In Amr El Abbadi, Michael L. Brodie, Sharma Chakravarthy, Umeshwar Dayal, Nabil Kamel, Gunter Schlageter, and Kyu-Young Whang, editors, VLDB 2000, Proceedings of 26th International Conference on Very Large Data Bases, September 10-14, 2000, Cairo, Egypt, pages 188–199. Morgan Kaufmann, 2000.

[198] Qingsong Yao and Aijun An. Using user access patterns for semantic query caching. In 14th International Conference on Database and Expert Systems Applications, Prague, Czech Republic, September 2003.

[199] Qingsong Yao and Aijun An. Characterizing database user's access patterns. In 15th International Conference on Database and Expert Systems Applications (DEXA 2004), pages 528–538, 2004.

[200] Qingsong Yao, Aijun An, and Xiangji Huang. Finding and analyzing database user sessions. In The 10th International Conference on Database Systems for Advanced Applications (DASFAA 2005), April 2005.

[201] Tsozen Yeh, Darrell D.E. Long, and Scott A. Brandt. Increasing predictive accuracy by prefetching multiple program and user specific files. In Proceedings of the 16th Annual International Symposium on High Performance Computing Systems and Applications (HPCS'02), 2002.

[202] Nianlong Yin and J. Kelly Flanagan. Reducing application load time by rearranging disk data. Master’s thesis, Brigham Young University, 1998.

[203] Matthew Young-Lai. Application of a stochastic grammatical inference method to text structure. Master's thesis, University of Waterloo, 1996. See also Technical Report CS-96-36.

[204] Matthew Young-Lai. Adding state merging to the DMC data compression algorithm. Information Processing Letters, 70(5):223–228, June 1999.

[205] Matthew Young-Lai. Text Structure Recognition using a Region Algebra. PhD thesis, University of Waterloo, 2001.

[206] Matthew Young-Lai and Frank Wm Tompa. Stochastic grammatical inference of text database structure. Machine Learning, 40(2):111–137, 2000.

[207] Philip S. Yu, Ming-Syan Chen, Hans-Ulrich Heiss, and Sukho Lee. On workload characterization of relational database environments. IEEE Transactions on Software Engineering, 18(4), April 1992.

[208] Jinsuo Zhang, Abdelsalam (Sumi) Helal, and Joachim Hammer. UbiData: Ubiquitous mobile file service. In SAC '03: Proceedings of the 2003 ACM symposium on Applied computing, pages 893–900. ACM Press, 2003.

[209] Qiang Zhu. Estimating Local Cost Parameters for Global Query Optimization in a Multidatabase System. PhD thesis, University of Waterloo, 1995.

[210] Jacob Ziv. Coding of sources with unknown statistics–Part I: Probability of encoding error. IEEE Transactions on Information Theory, IT-18(3):384–389, 1972.

[211] Jacob Ziv. Coding of sources with unknown statistics–Part II: Distortion relative to a fidelity criterion. IEEE Transactions on Information Theory, IT-18(3):389–394, 1972.

[212] Jacob Ziv. An efficient universal prediction algorithm for unknown sources with limited training data. IEEE Transactions on Information Theory, 48(6), June 2002.

[213] Jacob Ziv and Abraham Lempel. A universal algorithm for sequential data compression. IEEE Transactions on Information Theory, IT-23(3):337–343, 1977.

[214] Jacob Ziv and Abraham Lempel. Compression of individual sequences via variable-rate coding. IEEE Transactions on Information Theory, 24(5):530–536, September 1978.


Colophon

This document was prepared with the LaTeX 2ε document preparation system, based on TeX version 3.141592 in the MiKTeX 2.4 distribution. Illustrations were created with METAPOST, the PSTricks package v0.2l, Visio 10.0, the statistics package R 1.9.0, Excel 9.0, and the dot graph layout program from AT&T Bell Laboratories.

The typographic style of this document is based on the thesis.cls file originally developed by Glenn Paulley and modified slightly for this document. Dave Mason and Glenn Paulley implemented the code.sty used to format code samples.

Sample code was composed in XML and translated to LaTeX 2ε or the Lua programming language (for testing) using XSL. The document dependencies were managed with make, with the awk scripting language used for additional help.

The bibliography for this document was prepared using BibTeX in combination with a citation database system developed by the author using Lua and Sybase Adaptive Server Anywhere 9.0.2. This system automatically generates the Author Index. All references cited in this document were stored in electronic form and read on-line. The ACM digital library and the University of Waterloo library electronic journal collections thereby allowed several trees to be saved.

The text for this document was composed using Watcom vi 11.0 using the 'crufty' colour scheme.
