Power Hints for Query Optimization
Total Page:16
File Type:pdf, Size:1020Kb
Power Hints for Query Optimization Nicolas Bruno, Surajit Chaudhuri, Ravi Ramamurthy Microsoft Research, USA {nicolasb,surajitc,ravirama}@microsoft.com Abstract— Commercial database systems expose query hints database administrators need to “tweak” and correct the wrong to address situations in which the optimizer chooses a poor choices that led the optimizer to pick a poor execution plan. plan for a given query. However, current query hints are not For that purpose, a common mechanism found in commer- flexible enough to deal with a variety of non-trivial scenarios. In this paper, we introduce a hinting framework that enables cial database systems is called query hinting. Essentially, query the specification of rich constraints to influence the optimizer hints instruct the optimizer to constrain its search space to to pick better plans. We show that while our framework unifies a certain subset of execution plans (see Section VIII for an previous approaches, it goes considerably beyond existing hinting overview of existing query hints in commercial products). The mechanisms, and can be implemented efficiently with moderate following example clarifies the concept and usage of query changes to current optimizers. hints. Consider the query below: I. INTRODUCTION SELECT R.a, S.b, T.c FROM R, S, T Relational database management systems (DBMSs) let users WHERE R.x = S.y AND S.y = T.z AND R.d < 10 AND S.e > 15 AND T.f < 20 specify queries using high level declarative languages such as SQL. The query optimizer of the DBMS is then responsible for and suppose that the optimizer returns the query plan in finding an efficient execution plan to evaluate the input query. Figure 1(a). Furthermore, suppose that the cardinality of For that purpose, cost-based optimizers search a large space of σR.d<10 is severely underestimated, and thus the index-nested- alternative execution plans, and choose the one that is expected loop join receives many more tuples than what was anticipated. to be evaluated in the least amount of time. In doing so, query In this case, the query might perform too many index seeks optimizers rely on a cost model that estimates the resources on S and result in a bad plan overall. In this situation, DBAs that are needed for each alternative plan in consideration. can optimize the query using a special hint OPTION(HASH Optimizer cost models are usually very complex, and de- JOIN)1 that is simply concatenated to the query string. In this pend on cardinality estimates, plan properties, and specialized special mode, the optimizer is forced to choose only execution cost formulas for each operator in an execution plan. It is well plans that use hash-based join alternatives. The result of this known (e.g., see [1]) that currently deployed cost models can optimization might produce the execution plan of Figure 1(b). be inherently inaccurate due to several factors: Note that this alternative plan uses a different join order, which might make a DBA wonder whether the original join order - Cardinality estimates, especially those on intermediate was indeed more efficient. To answer this question, the DBA query expressions, are inferred based on statistical infor- would want to execute the plan in Figure 1(b) and also an mation on base tables making simplifying assumptions alternative plan that shares the same join order with the plan on correlation and uniformity. in Figure 1(a) (but does not use an index-nested-loop join - Calibration constants used for costing (e.g., the cost of a between R and S). This new plan, shown in Figure 1(c) can single disk I/O) might not be fully accurate or consistent be obtained by using the expanded hint OPTION(HASH JOIN, with respect to the underlying hardware. FORCE ORDER), which only considers plans whose join order - Cost formulas cannot possibly capture every detail on the is compatible with the order of tables in the SQL query string. complex algorithms used in execution plans. After actually executing all these alternatives, the DBA finally - Runtime parameters affect execution costs of queries chooses the most appropriate plan and uses the corresponding (e.g., the amount of buffer pool memory affects the cost hint in a production environment. We make the following of an index-nested-loops join operator, which would find observations based on this example. different numbers of tuples already cached in memory). - Finally, even if we could determine the precise cost of Hint granularity/expressivity: Current hints are usually a single query in isolation, concurrent query executions not flexible enough for many scenarios. In most systems, can have a drastic impact in the true cost of a query, due forcing the first join in Figure 1(a) to be hash-based would to factors like contention of indexes, excessive locking, also result in either a fixed join order (e.g., by using local join and hotspots, among others. hints), or the hash-join algorithm being used for all joins in the query block (e.g., by using the HASH JOIN hint discussed As a consequence, query optimizers do not always produce optimal plans, and sometimes, for complex queries they might 1We use Microsoft SQL Server syntax for the sample hints, but note that even return plans that are decisively poor. In such cases, the same concepts are also applicable to other DBMSs with minor variations. INL Join Hash Join Hash Join INL Join Index Seek(T) Index Scan(R) Hash Join Hash Join Index Scan(T) Index Scan(R) Index Seek(S) Index Scan(T) Index Scan(S) Index Scan(R) Index Scan(S) (a) Original Plan. (b) Using OPTION(HASH JOIN). (c) Using OPTION(HASH JOIN, FORCE ORDER). Fig. 1. Using query hints to fix bad plans. above). Hints are also not flexible enough to express certain an architecture that enables interactive query tuning sessions constraints of the structure of the execution plan, such as when using Phints. In Section VI we discuss algorithmic extensions trying to force an early pre-aggregation below a join. In many and open problems for our techniques. Finally, in Section VII cases, the granularity at which hints constrain the search space we report an experimental evaluation of our approach. is either too coarse, which effectively prevents experienced II. SEARCH SPACE MODEL DBAs from fine-tuning a poorly optimized query, or instead requires a fully specified execution plan [2], which might be In this paper we model hints as constraints over the search too inflexible and tedious for all but the simplest queries. There space of execution plans considered by the query optimizer. is also no unifying principle behind available hints (which are We next describe an abstract representation of such search a heterogeneous bag of guidelines that DBAs can apply). space, which is implicitly used in one form or another by all Interactivity: The process of tuning a poorly performing optimizers. The idea is to compactly represent the very large query is usually interactive. DBAs might try some hint, set of possible execution plans by sharing as much structure observe the resulting plan picked by the optimizer, optionally as possible. Specifically, one can model the space of plans execute it in scenarios that resemble the production environ- of a given query by using a directed acyclic graph, which ment, and keep modifying the hints to obtain a plan that is we call PlanSpace.APlanSpace graph contains two kinds better than what the optimizer originally produced without of nodes, denoted OP and OR nodes. Each OP node in the hints. We believe that this trial and error process is inherent PlanSpace corresponds to a specific operator in an execution to the tuning of a poorly optimized query. Hints enable DBAs plan (e.g., hash-join), and all of its children (if any) are OR to enhance the search strategy of the query optimizer by using nodes. In turn, each OR node in the PlanSpace corresponds to knowledge that is not captured by its cost model. a set of equivalent alternatives that can be used interchangeably In this paper, we argue DBAs require a general and con- to form execution plans (e.g. RST), and every one of sistent framework for constraining the search space of current its children (there is at least one) are OP nodes. We illustrate optimizers, which goes beyond what is currently offered in this representation with the query below: 2 commercial DBMSs . In this paper, we address hints that SELECT * FROM R, S influence the plan choice of the optimizer. Our solution should WHERE R.x = S.y satisfy the following requirements: ORDER BY R.x - Expressivity: Simple constraints should be easy to spec- Figure 2 shows a fragment of the PlanSpace corresponding ify, but the framework should allow specifying non-trivial to the query above. In the figure, OR nodes are represented by constraints to ensure DBAs have the necessary flexibility. black circles, and OP nodes are represented with boxes that - Efficiency: Obtaining execution plans for given sets of query hints should be efficient (especially for interactive tuning sessions). 1 - Unification: The solution should not degrade to yet an- Sort(R.x) other collection of heterogenous tools, but instead should 2 be based on some unifying methodology. Merge Join Merge Join Hash Join Hash Join The rest of the paper is structured as follows. First, in Section II we present a conceptual model to represent the search space of a query optimizer. Then, in Section III we introduce Phints (short for Power Hints), a small language 3 4 used to specify rich constraints over the optimizer’s search space of plans. Then, in Section IV we present efficient Sort(R.x) Sort(S.y) algorithms to obtain the plan that satisfies Phints constraints and has the least estimated cost.