Automated Selection of Materialized Views and Indexes for SQL Databases

Sanjay Agrawal, Microsoft Research
Surajit Chaudhuri, Microsoft Research
Vivek Narasayya, Microsoft Research

Abstract

Automatically selecting an appropriate set of materialized views and indexes for SQL databases is a non-trivial task. A judicious choice must be cost-driven and influenced by the workload experienced by the system. Although there has been work in materialized view selection in the context of multidimensional (OLAP) databases, no past work has looked at the problem of building an industry-strength tool for automated selection of materialized views and indexes for SQL workloads. In this paper, we present an end-to-end solution to the problem of selecting materialized views and indexes. We describe results of extensive experimental evaluation that demonstrate the effectiveness of our techniques. Our solution is implemented as part of a tuning wizard that ships with Microsoft SQL Server 2000.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.
Proceedings of the 26th International Conference on Very Large Databases, Cairo, Egypt, 2000

1. Introduction

In addition to indexes, today’s commercial SQL database systems also support creation and use of materialized views. The presence of the right materialized views can significantly improve performance, particularly for decision support applications. However, to realize this potential, a judicious selection of materialized views is crucial.

Conceptually, both indexes and materialized views are physical structures that can significantly accelerate performance. An effective physical database design tool must therefore take into account the interaction between indexes and materialized views by considering them together to optimize the physical design for the workload on the system. Ignoring this interaction can significantly compromise the quality of recommendations. Despite a large number of recent papers in this area, most of the prior work considers the problems of index selection and materialized view selection in isolation.

Although indexes and materialized views are similar, a materialized view is much richer in structure than an index since a materialized view may be defined over multiple tables, and can have selections and GROUP BY over multiple columns. In fact, an index can logically be considered as a special case of a single-table, projection only materialized view. This richness of structure of materialized views makes the problem of selecting materialized views significantly more complex than that of index selection. We therefore need innovative techniques for dealing with the large space of potentially interesting materialized views that are possible for a given set of SQL queries and updates over a large schema. Previous papers on materialized view selection typically ignore this problem. Rather, they focus only on the “search” problem of picking an attractive set of materialized views from a given set. Thus, they implicitly assume that the given set is the set of all potentially interesting materialized views for the workload. Such an approach is simply not scalable in the context of SQL workloads. Finally, to be an effective solution, it is important to ensure that the solution to this problem is robust and takes into account the complexities of full SQL as a query language, as well as pragmatic issues such as the fact that in today’s commercial database systems, it is often the case that the language of materialized views is a restricted subset of the language of queries. For example, a materialized view may not be allowed to contain nested sub-queries.

In this paper, we present an architecture and novel algorithms for addressing each of the above problems. Our work leverages previous work we did in building an index selection tool for Microsoft SQL Server [4,5], but requires several significant innovations. We establish that in order to pick a physical design consisting of indexes and materialized views, it is critical to search over the combined space of indexes and materialized views (Section 5). We quantify the impact on quality of not enumerating this space together, particularly in the presence of storage constraints or updates. Second, we present a principled way to identify a much smaller set of candidate materialized views such that searching over the reduced space of candidate materialized views preserves most of the gains of searching the entire space of possible materialized views, at a fraction of the enumeration cost (Section 4). We introduce two key techniques that form the basis of a scalable approach for candidate materialized view selection. First, we show how to identify interesting sets of tables such that we need to consider materialized views only over such sets of tables. Next, we present a view merging technique that identifies candidate materialized views that while not optimal for any single query, can be beneficial to multiple queries in the workload. The techniques presented in this paper are designed to be robust for handling the generality of SQL as well as other pragmatic issues arising in index and materialized view selection. These techniques have enabled us to build an industry-strength physical database design tool that can determine an appropriate set of indexes, materialized views (and indexes on materialized views) for a given database and workload consisting of SQL queries and updates. This tool is now part of Microsoft SQL Server 2000’s upcoming release. The extensive experimental results in this paper (Section 6) demonstrate the value of our proposed techniques. This work was done as part of the AutoAdmin [1] research project at Microsoft, which explores novel techniques to make databases self-tuning.

2. Architecture for Index and Materialized View Selection

An architectural overview of our approach to index and materialized view selection is shown in Figure 1. We assume that we are given a representative workload for which we need to recommend indexes and materialized views. One way to obtain such a workload is to use the logging capability of modern database systems to capture a trace of queries and updates faced by the system. Alternatively, customer or organization specific benchmarks may be used. As in our previous work on index selection [4], the key components of the architecture are: syntactic structure selection, candidate selection, configuration enumeration, and configuration simulation and cost estimation.

[Figure 1. Architecture of Index and Materialized View Selection Tool. Components shown: Workload; Syntactic structure selection; Candidate Index Selection; Candidate Materialized View Selection; Configuration Enumeration; Configuration Simulation and Cost Estimation Module; Microsoft SQL Server; Final Recommendation.]

Given a workload, the first step is to identify syntactically relevant indexes, materialized views and indexes on materialized views that can potentially be used to answer the query. For example, consider a query

Q: SELECT Sum(Sales) FROM Sales_Data WHERE City = 'Seattle'

For the query Q, the following materialized views (among others) are syntactically relevant:

v1: SELECT Sum(Sales) FROM Sales_Data WHERE City = 'Seattle'
v2: SELECT City, Sum(Sales) FROM Sales_Data GROUP BY City
v3: SELECT City, Product, Sum(Sales) FROM Sales_Data GROUP BY City, Product

Optionally, we can consider additional indexes on the columns of the materialized view. Like indexes on base tables, indexes on materialized views can be single-column or multi-column, clustered or non-clustered, with the restriction that a given materialized view can have at [...]

As mentioned in the introduction, searching the space of all syntactically relevant indexes and materialized views for a workload is infeasible in practice, particularly when the workload is large or complex. Therefore, it is crucial to eliminate spurious indexes and materialized views from consideration early, thereby focusing the search on a smaller, and interesting subset. The candidate selection module is responsible for identifying a set of traditional indexes, materialized views and indexes on materialized views for the given workload that are worthy of further exploration. Efficient selection of candidate materialized views is a key contribution of our work. For the purposes of this paper, we assume that candidate indexes have already been picked. For details on how candidate indexes may be chosen for a workload, we refer the reader to [4].

Once we have chosen a set of candidate indexes and candidate materialized views, we need to search among these structures to determine the ideal physical design, henceforth called a configuration. In our context, a configuration will consist of a set of traditional indexes, materialized views and indexes on materialized views. In this paper we will not discuss issues related to selection of indexes on materialized views due to lack of space. Despite the remarkable pruning achieved by the candidate selection module, searching through this space in a naïve fashion by enumerating all subsets of structures is infeasible. We adopt the same [...]
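The query Q and the syntactically relevant views v1, v2 and v3 from Section 2 can be tried out with a small sketch. It is only an illustration under stated assumptions: the rows in Sales_Data are made up, the helper names build_demo, answers and view_sizes are hypothetical, and because SQLite has no materialized views, each view is emulated by a table created with CREATE TABLE ... AS SELECT. It does not reproduce how the tool described in the paper or Microsoft SQL Server matches views to queries.

```python
import sqlite3


def build_demo():
    """Create Sales_Data plus tables standing in for materialized views v1-v3."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Sales_Data (City TEXT, Product TEXT, Sales INTEGER)"
    )
    # Made-up sample rows, used only for illustration.
    rows = [
        ("Seattle", "Widget", 10),
        ("Seattle", "Gadget", 5),
        ("Seattle", "Widget", 7),
        ("Portland", "Widget", 3),
    ]
    conn.executemany("INSERT INTO Sales_Data VALUES (?, ?, ?)", rows)
    # Assumption: CREATE TABLE AS emulates a materialized view in SQLite.
    conn.execute(
        "CREATE TABLE v1 AS SELECT SUM(Sales) AS Total "
        "FROM Sales_Data WHERE City = 'Seattle'"
    )
    conn.execute(
        "CREATE TABLE v2 AS SELECT City, SUM(Sales) AS Total "
        "FROM Sales_Data GROUP BY City"
    )
    conn.execute(
        "CREATE TABLE v3 AS SELECT City, Product, SUM(Sales) AS Total "
        "FROM Sales_Data GROUP BY City, Product"
    )
    return conn


def answers(conn):
    """Answer Q from the base table and from each emulated view."""
    queries = {
        "base": "SELECT SUM(Sales) FROM Sales_Data WHERE City = 'Seattle'",
        # v1 already stores the answer to Q.
        "v1": "SELECT Total FROM v1",
        # v2 needs only a selection on City.
        "v2": "SELECT Total FROM v2 WHERE City = 'Seattle'",
        # v3 needs a selection plus a further aggregation over Product.
        "v3": "SELECT SUM(Total) FROM v3 WHERE City = 'Seattle'",
    }
    return {name: conn.execute(sql).fetchone()[0] for name, sql in queries.items()}


def view_sizes(conn):
    """Row count of each emulated view, a rough proxy for its storage cost."""
    return {
        view: conn.execute(f"SELECT COUNT(*) FROM {view}").fetchone()[0]
        for view in ("v1", "v2", "v3")
    }


if __name__ == "__main__":
    demo = build_demo()
    print(answers(demo))
    print(view_sizes(demo))
```

On this sample data all three views return the same answer as the base table, while the row counts grow from v1 to v3. This hints at the trade-off the candidate selection step has to weigh: v1 is the smallest but serves only Q, whereas v3 is larger but could also serve queries that group or filter by Product.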