Materialized Views and Data Warehouses

Nick Roussop oulos

Department of Computer Science

and

Institute of Advanced Computer Studies

University of Maryland

[email protected]

2 The Multifaceted Form of Views

Relational views have several forms:

Abstract

A is a redundant collection  pureprogram: an unmaterialized is a pro-

gram sp eci cation, \the intension", that gener-

of data replicated from several p ossibly dis-

ates data. Query mo di cation [Sto75] and com-

tributed and lo osely coupled source ,

+

organized to answer OLAP queries. Rela- piled queries [ABC 76]were the rst techniques

tional views are used b oth as a sp eci cation exploiting views{ their basic di erence is that the

technique and as an execution plan for the rst is used as a macro that do es not optimize

until run-time while the second stores optimized

derivation of the warehouse data. In this p o-

execution plans. Such a view form is a pure pro-

sition pap er, we summarize the versatilityof

gram with no extensional attachments. Each time

relational views and their p otential.

the view program is invoked, it generates materi-

alizes the data at a cost that is roughly the same

1 Views

for eachinvo cation.

The imp ortance of the \algebraic closedness" of the

relational mo del has not b een recognized enough in

 derived data: a materialized view is \the exten-

its 27 years of existence. Although a lot of energy

sion" of the pure program form and has the char-

has b een consumed on dogmatizing on the \relational

acteristics of data likeany other relational data.

purity", on its interface simplicity, on its mathemati-

Thus, it can b e further queried to build views-

cal foundation, etc., there has not b een a single pap er

on-views or collectively group ed [Pap94] to build

with a central fo cus on the imp ortance of relational

super-views. The derivation op erations are at-

views, their versatility, and their yet-to-b e exploited

tached to materialized views. These pro cedural

p otential.

attachments along with some \delta" relational

What is a relational view? Is it a program? Is it

algebra are used to p erform incremental up dates

data? Is it an index? Is it an OLAP aggregate? It is

on the extension.

all these. And a lot more. Below I summarize the most

imp ortant uses, techniques, and b ene ts p ertaining to

 pure data: when materialized views are converted

views. Note that the cited work here is not meantto

to snapshots, the derivation pro cedure is detached

b e exhaustive but representative and easily accessible

and the views b ecome pure data that is not main-

from my short-term memory.

tainable pure data is at the opp osite end of the

sp ectrum from pure program.

The copyright of this paper belongs to the paper's authors. Per-

mission to copy without fee al l or part of this material is granted

 pure index: view indexes [Rou82b] and View-

provided that the copies are not made or distributed for direct

commercial advantage.

Caches [Rou91] illustrate this avor of views.

Pro ceedings of the 4th KRDB Workshop

Their extension has only p ointers to the underly-

Athens, Greece, 30-August-1997

ing data which are dereferenced when the values

F. Baader, M.A. Jeusfeld, W. Nutt, eds.

are needed. Like all indexing schemes, the imp or-

tance of indexes lies in their organization, which http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-8/

Page 1

facilitates easy manipulation of p ointers and ef- JMRS93]. The goals of these studies range from im-

cient single-pass dereferencing, and thus avoids proving and pro cessing to supp ort-

thrashing. ing rules in active databases, to query pro cessing in

client-server and distributed/replicated ar-

 hybrid data & index: a partially materialized view

chitectures, to handling time queries, to obtaining ef-

[BR96] stores some attributes as data while the

cient up date dissemination, to avoiding exp ensive

rest are referenced through p ointers. This form

computations of external predicates, etc. All these

combines data and indexes. B-trees, Join indexes

techniques have one common underlying theme: the

[Val87], star-indexes [Sys96] and most of the other

re-use of views to save cost.

indexing schemes b elong to this category, with ap-

Amortization and re-use of views can only b e p os-

2

propriate schema mapping for translating p oint-

sible if they can b e discovered by the query optimizer

ers to record eld values. Note that in this form,

which decides to plug-in those views which reduce the

the data values are drawn directly from the un-

cost of the query. The b ene ts are multiplied in a

derlying relations and no transformation to these

multi-user environment with a lot of shared access

1

values is required .

to views. Despite this, only the ADMS prototyp e

has extended the query optimizer and its cost mo del

 OLAP aggregate/indexing: a data cub e [GBLP96]

[CR94b] to include in its plan selection materialized

is a set of materialized or indexed views [GHRU96,

views, ViewCaches, and incremental access metho ds

RKR97]. They corresp ond to pro jections of the

and a tailored bu er manager , as well as a tailored

multi-dimensional space data to lesser dimension-

bu er manager designed to supp ort these access meth-

ality subspaces and store aggregate values in it.

o ds [CR93]. However, b oth IBM and Microsoft plan to

In this form, the data values are aggregated from

incorp orate similar constructs in their DB2 and Sequel

a collection of underlying values. Sum-

Server future releases.

mary tables and Star Schemas [Sys96] b elong in

The most common technique for discovering views

this form the latter b elongs here as muchasin

in any of its forms is subsumption [Rou82a, Fin82,

the previous category.

LY85, Rou91, BJNS94 ]. Subsumption in its most gen-

eral form is an undecidable problem, but for the most

Each of these forms is used by some comp onentof

common queries can b e reduced to an NP-complete

a relational system. Having a uni ed view of all forms

problem. For simple conjunctive query views, it fur-

of relational views is imp ortant in recognizing com-

ther reduces to p olynomial-time and very ecient al-

monalities, re-using implementation techniques, and

gorithms [Rou82a, CR94b].

discovering p otential uses not yet exploited.

In a data warehouse where query execution and I/O

are magni ed, the mandate for re-use cannot b e ig-

3 Discovery and Re-use of Views

nored. Furthermore, in an OLAP environment, unlike

OLTP, up dates come in bulk rather than a few-at-

RDBMSs do nothing else but generate or access mate-

a-time, making incremental up date techniques more

rialized views 24 hours a day whether these are prede-

e ectively amortized [RKR97]. Therefore, query op-

ned views, results of compiled queries, ad ho c queries,

+

timizers based on materialized view fragments are a

or even materialized view fragments [RCK 95], i.e.,

necessity.At this p oint, data warehouses rely solely

temp orary results generated during the execution of

on users' memory for re-using precomputed summary

a larger query. Unfortunately, commercial RDBMSs

tables. This severely limits their p erformance p oten-

discard these views immediately after they are deliv-

tial.

ered to the user or to a subsequent execution phase.

The cost for generating the views is for one-time-use

4 Pro cessing of Views

only instead of b eing amortized over multiple and/or

shared accesses [Rou91].

Now let's examine view pro cessing for all the view

Caching query intermediate results for sp eeding

forms except for the pure data snapshots which are

up intra- and inter-query pro cessing has b een stud-

not maintainable. View pro cessing involves view scan-

ied widely [Fin82,LY85, Rou91, Sel87, Jhi88, DR92,

ning, incremental up date, or b oth applied simultane-

AL80, Rou82b, Rou91, Sel88, Jhi88, RK86b, BALT86,

ously. Scanning and incremental up date of views im-

Fin82,LY85, DR92, HS93, RK86b, Han87a, Han87b,

ply sp ecial lo cks, lo cking proto cols [RES93], autho-

1

rization [RB85], and consistency proto cols for asyn-

This is how the indexed form is almost exclusively used

chronous up dates from multiple sources [ZGMHW95]. although there is no intrinsic reason for not applying a transfor-

mation function, other than the identity one, to the underlying

2

values b efore indexing them- e.g., calibrate the values b efore the user cannot b e aware of views generated by the system

entered in a B-tree. and other users.

Page 2

I will concentrate here on p erformance issues. dating is not a factor any more. Therefore, minimal

dereferencing is a go o d target optimization. Partially

View scanning in the pure program view form is typ-

materialized views [BR96] which materialize only the

ically the same as re-execution of the query that cre-

subset of the attributes useful for the incremental up-

ated the view. There is no p erformance b ene t for un-

date, outer-joins instead of joins, or other appropriate

materialized views other than predicting re-execution

attribute caching techniques [Sta89] are b est suited.

cost more accurately after the rst time. The p erfor-

On the other hand, fully materialized views are cum-

mance is bad but predictable. Scanning a materialized

b ersome and generate a lot of unnecessary I/O and

view has a cost that dep ends on the ratio of the useful

data movement for just up dating views that are to b e

tuples in it to answer a given query, called density of

used in the future.

the view. For a 100 ratio, scanning a materialized

It should b e mentioned here that the issue of self-

view is optimal b ecause it has all the data for answer-

maintenance [GJM96] of views is imp ortant. However,

ing the query compacted in a tight storage space. If the

the additional information necessary for the incremen-

densityislow, the noise can b e more than the amount

tal up date and its storage organization must b e well

of useful data. For the index view form, scanning cost

designed since this a ects p erformance. For example,

can range from near optimal, when the p ointers are

the storage organization of the deltas mayhavean

aligned and p oint to a tight space, to very high, when

equivalent adverse e ects to thrashing if their tuples

p ointer dereferencing causes thrashing similar to un-

are scattered in an unclustered space.

clustering indexes in RDBMSs or in OODBMSs. For

this reason, in the index form, it is very imp ortant that

the p ointers b e well organized [RK87, AR90] and use a

5 Data Mining within Views

tailored bu er manager [CR93] whichavoids thrashing

Views which are materialized or partially materialized

caused by the multi-dimensionality of the view. View-

contain valuable information in them suchasvalue dis-

Cache uses a form of puzzle-shap ed packed R-trees

tributions and other statistical information that are

[RL85] and tailored replacement strategies. Cu-

much more accurate than the clumsy ones maintained

b etrees [RKR97] utilize multi-dimensional compressed

by the DBMS statistics utility which is run once in

and packed R-trees [RL85]. Again, for p erformance,

a while. Having accurate value distributions avoids

the organization is the only thing that matters.

those unrealistic assumptions and guestimates based

Incremental up date techniques for views are ma-

on the uniformity assumption. Since RDBMSs do

ture, as they go back for more than a decade of re-

nothing else but generate materialized views, a smarter

search [BLT86, RK86b, Rou87, Rou91, RES93]. The

system can extract this valuable meta-data and, with a

same techniques were the foundation for the manage-

query feedback mechanism [CR94a], maintain precise

ment of replicated data [RK86a, RK86b] which found

statistics. This was shown to incur no additional I/O

its way to the log-based to ols of commercial

cost and have negligible CPU only overhead for a big

RDBMSs.

return, as the error in the estimation is very close to

Incremental up date of a view dep ends again on its

none.

underlying form. In its un-materialized form the cost

of an incremental up date is the cost of re-execution.

6 Harvesting the Executions of Views

For other forms wemust distinguish two cases. The

rst case o ccurs when the incremental up date is done

Consider a view as a program again. Whether we exe-

in real-time during the query execution. In this case,

cute it to materialize it or to incrementally up date it,

the up date is combined with scanning and therefore,

the system's b ehavior can b e observed during this ex-

the cost of incremental up date is subsumed by the

ecution and b e used to adapt resource allo cation dur-

scanning cost. This was the main ob jective of the

ing subsequent executions. For example, bu er page

one-pass incremental up date algorithms of ViewCache.

fault b ehavior during a view's execution can b e accu-

The subsumed cost savings are signi cant and this was

rately predicted using regression from a few executions

shown by comparing worst case analysis estimations

of it [CR93]. Using this information, bu er allo cation

against actual timed exp eriments [RES93]. This was

is done much more eciently using a marginal gain

esp ecially true for views-on-views b ecause of the elim-

technique [FNS91]. Clearly, bu ers are some of the

ination of storing and accessing intermediate results.

most imp ortant resources to b e managed, but other

resources suchaslocks, logs, threads, etc., can b e ob- The second case is when the incremental up date of

served during view execution and used to adapt strate-

a view is done at times other than scanning. This is

gies for improving p erformance. the typical case in a data warehouse where up dates

Rep eated materialization of views can also b e har- from multiple sources are applied asynchronously ei-

vested to obtain patterns of access and use them for ther when they arriveoratscheduled often o -line

just-in-time page prefetching from disk. This not only times. The b ene t of combining scanning and up-

Page 3

enhances bu er management, but more imp ortantly, ACM SIGMOD International Confer-

reduces context switching overhead. Similar tech- ence, pages 61{71, August 1986.

niques have b een engaged successfully in OS studies

[BR96] Lars Baekgraard and Nick Roussop ou-

[PG96] where the patterns are much less predictable

los. \Ecient Refreshment of Data

than re-materializing a view.

Warehouse Views". Technical Rep ort

CS-TR-3642, Dept. of Computer Sci-

7 Conclusion

ence, Univ of Maryland, College Park,

Views are the most imp ortant asset of the relational

MD, May 1996.

mo del. They provide a uniform conceptual and imple-

[CR93] C. M. Chen and N. Roussop oulos.

mentation mo del of relational programs, derived data,

\Adaptive Database Bu er Allo cation

indexes, and aggregated derived data. I can think of a

Using Query Feedback". In Procs. of

very few things that are so elegant and practical to o.

the 19th Int. Conf. on Very Large Data

These are my views on views. And, like most views,

Bases, Dublin, Ireland, 1993.

they are evolving and incrementally up dating with

time. I am impressed with views' versatility, resilience,

[CR94a] C. M. Chen and N. Roussop oulos.

and refusal of retiring [Mum96]. As for the yet-to-b e

\Adaptive Selectivity Estimation Using

discovered uses of views that I promised at the b e-

Query Feedback". In Procs. of the ACM

ginning of the pap er, I remind you that it is just a

SIGMOD Intl. Conf. on Management of

sp eculative view on my part.

Data, Minneap olis, Minnesota, 1994.

References

[CR94b] Chungmin Melvin Chen and Nick Rous-

sop oulos. \The Implementation and

+

[ABC 76] M. M. Astrahan, M. W. Blasgen, D. D.

Performance Evaluation of the ADMS

Chamb erlin, K. P. Eswaran, Jim Gray,

Query Optimizer: Integrating Query

P.P. Griths, W. F. King, R. A. Lo-

Result Caching and Matching". In Proc.

rie, P. R. McJones, J. W. Mehl, G. R.

of the International ConferenceonEx-

Putzolu, I. L. Traiger, B. W. Wade,

tending Database Technology EDBT,

and V. Watson. \System R: Relational

pages 323{336, Cambridge, UK, March

Approach to Database Management".

1994.

ACM Transactions on Database Sys-

tems, 12:97{137, June 1976.

[DR92] A. Delis and N. Roussop oulos. \Evalua-

tion of an Enhanced Workstation-Server

[AL80] M. E. Adiba and B. G. Lindsay.

DBMS Architecture". In Procs. of 18th

\Database Snapshots". In Procs. of 6th

VLDB, 1992.

VLDB, 1980.

[Fin82] S. Finkelstein. \common Expression

[AR90] A. Amir and N. Roussop oulos. \Opti-

Analysis in Database Applications". In

mal View Caching". Information Sys-

Procs. of the ACM SIGMOD Intl. Conf.

tems, 152:169{171, 1990.

on Management of Data, pages 235{245,

_

[BALT86] Jos e A. Blakeley,Per Ake Larson, and

1982.

Frank Wm. Tompa. \Eciently Up dat-

[FNS91] C. Faloutsos, R. Ng, and T. Sellis.

ing Materialized Views". In Carlo Zan-

\Predictive Load Control for Flexible

iolo, editor, Proc. of the ACM SIGMOD

Bu er Allo cation". In VLDB Conf.

Conference, pages 61{71, Washington,

Proceedings, pages 265{274, Barcelona,

D.C., May 1986.

Spain, Septemb er 1991. Also available

[BJNS94] M. Buchheit, M. A. Jeusfeld, W. Nutt,

as UMIACS-TR-91-34 and CS-TR-2622.

and M. Staudt. \Subsumption Between

[GBLP96] J. Gray, Bosworth, Layman, and Pi-

Queries to Ob ject-Oriented Databases".

ramish. \Data Cub e: A Relational Ag-

In Proc. of the International Confer-

gregation". In Proc. of the 12th Int.

ence on Extending Database Technology

Conference on Data Engineering, New

EDBT, Cambridge, UK, March 1994.

Orleans, February 1996. IEEE.

[BLT86] J. A. Blakeley,P. A. Larson, and F. W.

[GHRU96] H. Gupta, V. Harinarayan, Ra jaraman, Tompa. \Eciently Up dating Materi-

and J. Ullman. \Index Selection for alized Views". In Proc. of the 1986

Page 4

[PG96] R. Hugo Patterson and Garth A. Gib- OLAP". In Proc. of of VLDB, Bombay,

son. \Exp osing I/O Concurrency with India, August 1996.

Informed Prefetching". In IPDIS, 1996.

[GJM96] Ashish Gupta, H. V. Jagadish, and

[RB85] N. Roussop oulos

Inderpal Singh Mumick. \Data Inte-

and C. Bader. \Dynamic Access Con-

gration using Self-Maintainable Views".

trol for Relational Views". Information

In Proc. of the International Confer-

Systems, 103:361{369, August 1985.

ence on Extending Database Technol-

ogy EDBT, pages 140{144, Avignon,

+

[RCK 95] N. Roussop oulos, C. M. Chen, S. Kel-

France, March 25-26 1996.

ley, A. Delis, and Y. Papakonstantinou.

\The ADMS Pro ject: Views \R"Us".

[Han87a] E. Hanson. \A Performance Analysis

Data Engineering Bul letin, 182:19{28,

of View Materialization Strategies". In

June 1995.

Proc. of SIGMOD Conference on Man-

agement of Data, pages 440{453. ACM,

[RES93] Nick Roussop oulos, Nikos Economou,

1987.

and Antony Stamenas. \ADMS: A

Testb ed for Incremental Access Meth-

[Han87b] Eric N. Hanson. \A Performance Anal-

o ds". IEEE Transactions on Know l-

ysis of View Materialization Strategies".

edge and Data Engineering, 55:762{

In Proc. of the ACM SIGMOD Confer-

774, Octob er 1993.

ence, pages 440{453, Brighton, England,

Septemb er 1987.

[RK86a] N. Roussop oulos and H. Kang. \Pre-

liminary Design of AD M S : A

[HS93] J. M. Hellerstein and M. Stone-

Workstation-Mainframe Integrated Ar-

braker. \Predicate Migration: Optimiz-

chitecture for Database Management

ing Queries with Exp ensive Predicates".

Systems". In Proc. 12th International

In Procs. of ACM-SIGMOD, 1993.

Conference on VLDB,Kyoto, Japan,

August 1986.

[Jhi88] A. Jhingran. \A p erformance Study

[RK86b] N. Roussop oulos and H. Kang. \Prin-

of Query Optimization Algorithms on

ciples and Techniques in the Design of

a Database System Supp orting Pro ce-

AD M S ". Computer, 1912:19{25,

dures". In Procs. of the 14th Intl. Conf.

1986.

on VLDB, 1988.

[RK87] N. Roussop oulos and H. Kang. \A

[JMRS93] C. S. Jensen, Leo Mark, Nick Rous-

Pip eline N-way Join Algorithm based on

sop oulos, and Timos Sellis. \Using Dif-

the 2-way Semijoin Program". IEEE

ferential Techniques to Eciently Sup-

Trans. on Know ledge and Data Engi-

p ort Transaction Time". VLDB Jour-

neering, 34:486{495, Decemb er 1987.

nal, 21:75{111, 1993.

[RKR97] Nick Roussop oulos, Yannis Kotidis, and



[LY85] P.-A. Larson and H. Z. Yang. \Comput-

Mema Roussop oulos. \Cub etree: Or-

ing Queries from Derived Relations". In

ganization of and Bulk Up dates on the

Procs. of the 11th Intl. Conf. on VLDB,

Data Cub e". In Proc. of the ACM

pages 259{269, 1985.

SIGMOD Conference,Tucson, Arizona,

May 13-15 1997.

[Mum96] In Inderpal Mumick and Ashish Gupta,

editors, Proceedings of the Workshop

[RL85] N. Roussop oulos and D. Leifker. \Direct

on Materialized Views: Techniques and

Spatial Search on Pictorial Databases

Applications, Montreal, Canada, June 7

Using Packed R-Trees". In Proc. ACM

1996.

SIGMOD, pages 17{31, Austin, Texas,

May 1985.

[Pap94] Y. Papakonstantinou. \Computing a

[Rou82a] N. Roussop oulos. \The Logical Access Query as a Union of Disjoint Horizontal

Path Schema of a Database". IEEE Fragments". Technical rep ort, Depart-

Trans. on Software Engineering, SE- ment of Computer Science, University

86:563{573, 1982. of Maryland, College Park, MD, 1994.

Page 5

[Rou82b] N. Roussop oulos. \View Indexing in

Relational Databases". ACM TODS,

72:258{290, 1982.

[Rou87] Nick Roussop oulos. \Overview of

ADMS: A High Performance Database

Management System". In Fal l Joint

Computer Conference, Dallas, Texas,

Octob er 25-29 1987.

[Rou91] N. Roussop oulos. \The Incremental

Access Metho d of View Cache: Con-

cept, Algorithms, and Cost Analysis".

ACM Transactions on Database Sys-

tems, 163:535{563, Septemb er 1991.

[Sel87] T. Sellis. \Eciently Supp orting Pro ce-

dures in Relational Database Systems".

Proc. ACM SIGMOD,May 1987.

[Sel88] T. K. Sellis. \Intelligent Caching

and Indexing Techniques for Relational

Database Systems". Inform. Systems,

132, 1988.

[Sta89] Antonios G. Stamenas. \High Perfor-

mance Incremen-

tal Relational Databases". Technical re-

p ort, UMIACS-TR-89-49, CS-TR-2245,

Department of Computer Science, Uni-

versity of Maryland, College Park, MD

20742, May 1989.

[Sto75] M. Stonebraker. \Implementation of In-

tegrity Constraints and Views by Query

Mo di cation". In Proceedings of the

ACM SIGMOD Intl. Conf. on the Man-

agement of Data, pages 65{78, San Jose,

CA, 1975.

[Sys96] Inc. Red Brick Systems. \Star Schemas

And STARjoin Technology". Technical

rep ort, 1996. White Pap er.

[Val87] Patrick Valduriez. \Join Indices".

ACM Transactions on Database Sys-

tems, 122:218{246, June 1987.

[ZGMHW95] Yue Zhuge, Hector Garcia-Molina,

Joachim Hammer, and Jennifer Widom.

\View Maintenance in a Warehousing

Environment". In Michael J. Carey and

Donovan A. Schneider, editors, Proc. of

the ACM SIGMOD Conference, pages

316{327, San Jose, California, May

1995.

Page 6