
Materialized Views and Data Warehouses

Nick Roussopoulos
Department of Computer Science and Institute of Advanced Computer Studies
University of Maryland
[email protected]

The copyright of this paper belongs to the paper's authors. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage.
Proceedings of the 4th KRDB Workshop, Athens, Greece, 30-August-1997 (F. Baader, M.A. Jeusfeld, W. Nutt, eds.)
http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-8/

Abstract

A data warehouse is a redundant collection of data replicated from several possibly distributed and loosely coupled source databases, organized to answer OLAP queries. Relational views are used both as a specification technique and as an execution plan for the derivation of the warehouse data. In this position paper, we summarize the versatility of relational views and their potential.

1 Views

The importance of the "algebraic closedness" of the relational model has not been recognized enough in its 27 years of existence. Although a lot of energy has been consumed on dogmatizing on the "relational purity", on its interface simplicity, on its mathematical foundation, etc., there has not been a single paper with a central focus on the importance of relational views, their versatility, and their yet-to-be exploited potential.

What is a relational view? Is it a program? Is it data? Is it an index? Is it an OLAP aggregate? It is all these. And a lot more. Below I summarize the most important uses, techniques, and benefits pertaining to views. Note that the cited work here is not meant to be exhaustive but representative and easily accessible from my short-term memory.

2 The Multifaceted Form of Views

Relational views have several forms:

pure program: an unmaterialized view is a program specification, "the intension", that generates data. Query modification [Sto75] and compiled queries [ABC+76] were the first techniques exploiting views; their basic difference is that the first is used as a macro that does not optimize until run-time while the second stores optimized execution plans. Such a view form is a pure program with no extensional attachments. Each time the view program is invoked, it generates (materializes) the data at a cost that is roughly the same for each invocation.

derived data: a materialized view is "the extension" of the pure program form and has the characteristics of data like any other relational data. Thus, it can be further queried to build views-on-views or collectively grouped [Pap94] to build super-views. The derivation operations are attached to materialized views. These procedural attachments along with some "delta" relational algebra are used to perform incremental updates on the extension.

pure data: when materialized views are converted to snapshots, the derivation procedure is detached and the views become pure data that is not maintainable; pure data is at the opposite end of the spectrum from pure program.

pure index: view indexes [Rou82b] and ViewCaches [Rou91] illustrate this flavor of views. Their extension has only pointers to the underlying data which are dereferenced when the values are needed. Like all indexing schemes, the importance of indexes lies in their organization, which facilitates easy manipulation of pointers and efficient single-pass dereferencing, and thus avoids thrashing.

hybrid data & index: a partially materialized view [BR96] stores some attributes as data while the rest are referenced through pointers. This form combines data and indexes. B-trees, Join indexes [Val87], star-indexes [Sys96] and most of the other indexing schemes belong to this category, with appropriate schema mapping for translating pointers to record field values. Note that in this form, the data values are drawn directly from the underlying relations and no transformation to these values is required [footnote 1].

Footnote 1: This is how the indexed form is almost exclusively used, although there is no intrinsic reason for not applying a transformation function, other than the identity one, to the underlying values before indexing them, e.g., calibrating the values before they are entered in a B-tree.

OLAP aggregate/indexing: a data cube [GBLP96] is a set of materialized or indexed views [GHRU96, RKR97]. They correspond to projections of the multi-dimensional space data to lesser dimensionality subspaces and store aggregate values in it. In this form, the data values are aggregated from a collection of underlying relation values. Summary tables and Star Schemas [Sys96] belong in this form; the latter belongs here as much as in the previous category.

Each of these forms is used by some component of a relational system. Having a unified view of all forms of relational views is important in recognizing commonalities, re-using implementation techniques, and discovering potential uses not yet exploited.

3 Discovery and Re-use of Views

RDBMSs do nothing else but generate or access materialized views 24 hours a day, whether these are predefined views, results of compiled queries, ad hoc queries, or even materialized view fragments [RCK+95], i.e., temporary results generated during the execution of a larger query. Unfortunately, commercial RDBMSs discard these views immediately after they are delivered to the user or to a subsequent execution phase. The cost for generating the views is for one-time-use only instead of being amortized over multiple and/or shared accesses [Rou91].

Caching query intermediate results for speeding up intra- and inter-query processing has been studied widely [Fin82, LY85, Rou91, Sel87, Jhi88, DR92, AL80, Rou82b, Sel88, RK86b, BALT86, HS93, Han87a, Han87b, JMRS93]. The goals of these studies range from improving query optimization and processing to supporting rules in active databases, to query processing in client-server and distributed/replicated database architectures, to handling time queries, to obtaining efficient update dissemination, to avoiding expensive computations of external predicates, etc. All these techniques have one common underlying theme: the re-use of views to save cost.

Amortization and re-use of views is possible only if the views can be discovered by the query optimizer [footnote 2], which decides to plug in those views which reduce the cost of the query. The benefits are multiplied in a multi-user environment with a lot of shared access to views. Despite this, only the ADMS prototype has extended the query optimizer and its cost model [CR94b] to include in its plan selection materialized views, ViewCaches, and incremental access methods, as well as a tailored buffer manager designed to support these access methods [CR93]. However, both IBM and Microsoft plan to incorporate similar constructs in future releases of DB2 and SQL Server.

Footnote 2: The user cannot be aware of views generated by the system and other users.

The most common technique for discovering views in any of their forms is subsumption [Rou82a, Fin82, LY85, Rou91, BJNS94]. Subsumption in its most general form is an undecidable problem, but for the most common queries it can be reduced to an NP-complete problem. For simple conjunctive query views, it further reduces to polynomial-time and very efficient algorithms [Rou82a, CR94b].

In a data warehouse where query execution and I/O are magnified, the mandate for re-use cannot be ignored. Furthermore, in an OLAP environment, unlike OLTP, updates come in bulk rather than a few-at-a-time, making incremental update techniques more effectively amortized [RKR97]. Therefore, query optimizers based on materialized view fragments are a necessity. At this point, data warehouses rely solely on users' memory for re-using precomputed summary tables. This severely limits their performance potential.

4 Processing of Views

Now let's examine view processing for all the view forms except for the pure data snapshots, which are not maintainable. View processing involves view scanning, incremental update, or both applied simultaneously. Scanning and incremental update of views imply special locks, locking protocols [RES93], authorization [RB85], and consistency protocols for asynchronous updates from multiple sources [ZGMHW95]. I will concentrate here on performance issues.

View scanning in the pure program view form is typically the same as re-execution of the query that created the view. There is no performance benefit for unmaterialized views other than predicting re-execution cost more accurately after the first time. The performance is bad but predictable. Scanning a materialized view has a cost that depends on the ratio of the useful tuples in it to answer a given query, called density of [...]

[...]dating is not a factor any more. Therefore, minimal dereferencing is a good target optimization. Partially materialized views [BR96] which materialize only the subset of the attributes useful for the incremental update, outer-joins instead of joins, or other appropriate attribute caching techniques [Sta89] are best suited. On the other hand, fully materialized views are cumbersome and generate a lot of unnecessary I/O and data movement for just updating views that are to be used in the future.
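The hybrid data & index form and its incremental update can be illustrated with a small sketch. This is not the ADMS implementation or the design of [BR96]; the class names `Relation` and `PartialView`, the row-id "pointers", and the sample `sales` data are all hypothetical, chosen only to show a view that caches some attributes as data, reaches the rest by dereferencing, and is maintained from per-tuple deltas rather than by re-execution.

```python
from dataclasses import dataclass, field


@dataclass
class Relation:
    """Hypothetical base relation: row id -> tuple (attribute -> value)."""
    rows: dict = field(default_factory=dict)
    next_id: int = 0

    def insert(self, row):
        rid = self.next_id
        self.rows[rid] = row
        self.next_id += 1
        return rid

    def delete(self, rid):
        return self.rows.pop(rid)


class PartialView:
    """Assumed partially materialized selection view.

    Only `cached_attrs` are stored as data; any other attribute is
    reached by dereferencing the stored row id into the base relation.
    """

    def __init__(self, base, predicate, cached_attrs):
        self.base = base
        self.predicate = predicate
        self.cached_attrs = list(cached_attrs)
        self.entries = {}  # row id (pointer) -> cached attribute values
        for rid, row in base.rows.items():  # initial materialization
            self.apply_insert(rid, row)

    def apply_insert(self, rid, row):
        """Incremental update for one inserted base tuple (a delta)."""
        if self.predicate(row):
            self.entries[rid] = {a: row[a] for a in self.cached_attrs}

    def apply_delete(self, rid):
        """Incremental update for one deleted base tuple (a delta)."""
        self.entries.pop(rid, None)

    def scan(self, attrs):
        """Return the requested attributes, dereferencing only if needed."""
        need_deref = any(a not in self.cached_attrs for a in attrs)
        out = []
        for rid in sorted(self.entries):  # visit pointers in row-id order
            src = self.base.rows[rid] if need_deref else self.entries[rid]
            out.append(tuple(src[a] for a in attrs))
        return out


sales = Relation()
r0 = sales.insert({"region": "east", "amount": 120, "note": "a"})
r1 = sales.insert({"region": "west", "amount": 80, "note": "b"})
big = PartialView(sales, lambda r: r["amount"] >= 100, ["region", "amount"])

# Apply deltas instead of recomputing the view from scratch.
r2 = sales.insert({"region": "west", "amount": 300, "note": "c"})
big.apply_insert(r2, sales.rows[r2])
sales.delete(r0)
big.apply_delete(r0)

cached_only = big.scan(["region", "amount"])  # answered from cached data
with_deref = big.scan(["note"])               # requires dereferencing
```

In this sketch a scan that touches only cached attributes never reads the base relation, while a scan over an uncached attribute follows each pointer once, in row-id order. Which attributes to cache is the design decision; the sketch simply takes it as a parameter.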