Materialized Views and Data Warehouses
Nick Roussop oulos
Department of Computer Science
and
Institute of Advanced Computer Studies
University of Maryland
2 The Multifaceted Form of Views
Relational views have several forms:
Abstract
A data warehouse is a redundant collection pureprogram: an unmaterialized view is a pro-
gram sp eci cation, \the intension", that gener-
of data replicated from several p ossibly dis-
ates data. Query mo di cation [Sto75] and com-
tributed and lo osely coupled source databases,
+
organized to answer OLAP queries. Rela- piled queries [ABC 76]were the rst techniques
tional views are used b oth as a sp eci cation exploiting views{ their basic di erence is that the
technique and as an execution plan for the rst is used as a macro that do es not optimize
until run-time while the second stores optimized
derivation of the warehouse data. In this p o-
execution plans. Such a view form is a pure pro-
sition pap er, we summarize the versatilityof
gram with no extensional attachments. Each time
relational views and their p otential.
the view program is invoked, it generates materi-
alizes the data at a cost that is roughly the same
1 Views
for eachinvo cation.
The imp ortance of the \algebraic closedness" of the
relational mo del has not b een recognized enough in
derived data: a materialized view is \the exten-
its 27 years of existence. Although a lot of energy
sion" of the pure program form and has the char-
has b een consumed on dogmatizing on the \relational
acteristics of data likeany other relational data.
purity", on its interface simplicity, on its mathemati-
Thus, it can b e further queried to build views-
cal foundation, etc., there has not b een a single pap er
on-views or collectively group ed [Pap94] to build
with a central fo cus on the imp ortance of relational
super-views. The derivation op erations are at-
views, their versatility, and their yet-to-b e exploited
tached to materialized views. These pro cedural
p otential.
attachments along with some \delta" relational
What is a relational view? Is it a program? Is it
algebra are used to p erform incremental up dates
data? Is it an index? Is it an OLAP aggregate? It is
on the extension.
all these. And a lot more. Below I summarize the most
imp ortant uses, techniques, and b ene ts p ertaining to
pure data: when materialized views are converted
views. Note that the cited work here is not meantto
to snapshots, the derivation pro cedure is detached
b e exhaustive but representative and easily accessible
and the views b ecome pure data that is not main-
from my short-term memory.
tainable pure data is at the opp osite end of the
sp ectrum from pure program.
The copyright of this paper belongs to the paper's authors. Per-
mission to copy without fee al l or part of this material is granted
pure index: view indexes [Rou82b] and View-
provided that the copies are not made or distributed for direct
commercial advantage.
Caches [Rou91] illustrate this avor of views.
Pro ceedings of the 4th KRDB Workshop
Their extension has only p ointers to the underly-
Athens, Greece, 30-August-1997
ing data which are dereferenced when the values
F. Baader, M.A. Jeusfeld, W. Nutt, eds.
are needed. Like all indexing schemes, the imp or-
tance of indexes lies in their organization, which http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-8/
Page 1
facilitates easy manipulation of p ointers and ef- JMRS93]. The goals of these studies range from im-
cient single-pass dereferencing, and thus avoids proving query optimization and pro cessing to supp ort-
thrashing. ing rules in active databases, to query pro cessing in
client-server and distributed/replicated database ar-
hybrid data & index: a partially materialized view
chitectures, to handling time queries, to obtaining ef-
[BR96] stores some attributes as data while the
cient up date dissemination, to avoiding exp ensive
rest are referenced through p ointers. This form
computations of external predicates, etc. All these
combines data and indexes. B-trees, Join indexes
techniques have one common underlying theme: the
[Val87], star-indexes [Sys96] and most of the other
re-use of views to save cost.
indexing schemes b elong to this category, with ap-
Amortization and re-use of views can only b e p os-
2
propriate schema mapping for translating p oint-
sible if they can b e discovered by the query optimizer
ers to record eld values. Note that in this form,
which decides to plug-in those views which reduce the
the data values are drawn directly from the un-
cost of the query. The b ene ts are multiplied in a
derlying relations and no transformation to these
multi-user environment with a lot of shared access
1
values is required .
to views. Despite this, only the ADMS prototyp e
has extended the query optimizer and its cost mo del
OLAP aggregate/indexing: a data cub e [GBLP96]
[CR94b] to include in its plan selection materialized
is a set of materialized or indexed views [GHRU96,
views, ViewCaches, and incremental access metho ds
RKR97]. They corresp ond to pro jections of the
and a tailored bu er manager , as well as a tailored
multi-dimensional space data to lesser dimension-
bu er manager designed to supp ort these access meth-
ality subspaces and store aggregate values in it.
o ds [CR93]. However, b oth IBM and Microsoft plan to
In this form, the data values are aggregated from
incorp orate similar constructs in their DB2 and Sequel
a collection of underlying relation values. Sum-
Server future releases.
mary tables and Star Schemas [Sys96] b elong in
The most common technique for discovering views
this form the latter b elongs here as muchasin
in any of its forms is subsumption [Rou82a, Fin82,
the previous category.
LY85, Rou91, BJNS94 ]. Subsumption in its most gen-
eral form is an undecidable problem, but for the most
Each of these forms is used by some comp onentof
common queries can b e reduced to an NP-complete
a relational system. Having a uni ed view of all forms
problem. For simple conjunctive query views, it fur-
of relational views is imp ortant in recognizing com-
ther reduces to p olynomial-time and very ecient al-
monalities, re-using implementation techniques, and
gorithms [Rou82a, CR94b].
discovering p otential uses not yet exploited.
In a data warehouse where query execution and I/O
are magni ed, the mandate for re-use cannot b e ig-
3 Discovery and Re-use of Views
nored. Furthermore, in an OLAP environment, unlike
OLTP, up dates come in bulk rather than a few-at-
RDBMSs do nothing else but generate or access mate-
a-time, making incremental up date techniques more
rialized views 24 hours a day whether these are prede-
e ectively amortized [RKR97]. Therefore, query op-
ned views, results of compiled queries, ad ho c queries,
+
timizers based on materialized view fragments are a
or even materialized view fragments [RCK 95], i.e.,
necessity.At this p oint, data warehouses rely solely
temp orary results generated during the execution of
on users' memory for re-using precomputed summary
a larger query. Unfortunately, commercial RDBMSs
tables. This severely limits their p erformance p oten-
discard these views immediately after they are deliv-
tial.
ered to the user or to a subsequent execution phase.
The cost for generating the views is for one-time-use
4 Pro cessing of Views
only instead of b eing amortized over multiple and/or
shared accesses [Rou91].
Now let's examine view pro cessing for all the view
Caching query intermediate results for sp eeding
forms except for the pure data snapshots which are
up intra- and inter-query pro cessing has b een stud-
not maintainable. View pro cessing involves view scan-
ied widely [Fin82,LY85, Rou91, Sel87, Jhi88, DR92,
ning, incremental up date, or b oth applied simultane-
AL80, Rou82b, Rou91, Sel88, Jhi88, RK86b, BALT86,
ously. Scanning and incremental up date of views im-
Fin82,LY85, DR92, HS93, RK86b, Han87a, Han87b,
ply sp ecial lo cks, lo cking proto cols [RES93], autho-
1
rization [RB85], and consistency proto cols for asyn-
This is how the indexed form is almost exclusively used
chronous up dates from multiple sources [ZGMHW95]. although there is no intrinsic reason for not applying a transfor-
mation function, other than the identity one, to the underlying
2
values b efore indexing them- e.g., calibrate the values b efore the user cannot b e aware of views generated by the system
entered in a B-tree. and other users.
Page 2
I will concentrate here on p erformance issues. dating is not a factor any more. Therefore, minimal
dereferencing is a go o d target optimization. Partially
View scanning in the pure program view form is typ-
materialized views [BR96] which materialize only the
ically the same as re-execution of the query that cre-
subset of the attributes useful for the incremental up-
ated the view. There is no p erformance b ene t for un-
date, outer-joins instead of joins, or other appropriate
materialized views other than predicting re-execution
attribute caching techniques [Sta89] are b est suited.
cost more accurately after the rst time. The p erfor-
On the other hand, fully materialized views are cum-
mance is bad but predictable. Scanning a materialized
b ersome and generate a lot of unnecessary I/O and
view has a cost that dep ends on the ratio of the useful
data movement for just up dating views that are to b e
tuples in it to answer a given query, called density of
used in the future.
the view. For a 100 ratio, scanning a materialized
It should b e mentioned here that the issue of self-
view is optimal b ecause it has all the data for answer-
maintenance [GJM96] of views is imp ortant. However,
ing the query compacted in a tight storage space. If the
the additional information necessary for the incremen-
densityislow, the noise can b e more than the amount
tal up date and its storage organization must b e well
of useful data. For the index view form, scanning cost
designed since this a ects p erformance. For example,
can range from near optimal, when the p ointers are
the storage organization of the deltas mayhavean
aligned and p oint to a tight space, to very high, when
equivalent adverse e ects to thrashing if their tuples
p ointer dereferencing causes thrashing similar to un-
are scattered in an unclustered space.
clustering indexes in RDBMSs or in OODBMSs. For
this reason, in the index form, it is very imp ortant that
the p ointers b e well organized [RK87, AR90] and use a
5 Data Mining within Views
tailored bu er manager [CR93] whichavoids thrashing
Views which are materialized or partially materialized
caused by the multi-dimensionality of the view. View-
contain valuable information in them suchasvalue dis-
Cache uses a form of puzzle-shap ed packed R-trees
tributions and other statistical information that are
[RL85] and tailored cache replacement strategies. Cu-
much more accurate than the clumsy ones maintained
b etrees [RKR97] utilize multi-dimensional compressed
by the DBMS statistics utility which is run once in
and packed R-trees [RL85]. Again, for p erformance,
a while. Having accurate value distributions avoids
the organization is the only thing that matters.
those unrealistic assumptions and guestimates based
Incremental up date techniques for views are ma-
on the uniformity assumption. Since RDBMSs do
ture, as they go back for more than a decade of re-
nothing else but generate materialized views, a smarter
search [BLT86, RK86b, Rou87, Rou91, RES93]. The
system can extract this valuable meta-data and, with a
same techniques were the foundation for the manage-
query feedback mechanism [CR94a], maintain precise
ment of replicated data [RK86a, RK86b] which found
statistics. This was shown to incur no additional I/O
its way to the log-based replication to ols of commercial
cost and have negligible CPU only overhead for a big
RDBMSs.
return, as the error in the estimation is very close to
Incremental up date of a view dep ends again on its
none.
underlying form. In its un-materialized form the cost
of an incremental up date is the cost of re-execution.
6 Harvesting the Executions of Views
For other forms wemust distinguish two cases. The
rst case o ccurs when the incremental up date is done
Consider a view as a program again. Whether we exe-
in real-time during the query execution. In this case,
cute it to materialize it or to incrementally up date it,
the up date is combined with scanning and therefore,
the system's b ehavior can b e observed during this ex-
the cost of incremental up date is subsumed by the
ecution and b e used to adapt resource allo cation dur-
scanning cost. This was the main ob jective of the
ing subsequent executions. For example, bu er page
one-pass incremental up date algorithms of ViewCache.
fault b ehavior during a view's execution can b e accu-
The subsumed cost savings are signi cant and this was
rately predicted using regression from a few executions
shown by comparing worst case analysis estimations
of it [CR93]. Using this information, bu er allo cation
against actual timed exp eriments [RES93]. This was
is done much more eciently using a marginal gain
esp ecially true for views-on-views b ecause of the elim-
technique [FNS91]. Clearly, bu ers are some of the
ination of storing and accessing intermediate results.
most imp ortant resources to b e managed, but other
resources suchaslocks, logs, threads, etc., can b e ob- The second case is when the incremental up date of
served during view execution and used to adapt strate-
a view is done at times other than scanning. This is
gies for improving p erformance. the typical case in a data warehouse where up dates
Rep eated materialization of views can also b e har- from multiple sources are applied asynchronously ei-
vested to obtain patterns of access and use them for ther when they arriveoratscheduled often o -line
just-in-time page prefetching from disk. This not only times. The b ene t of combining scanning and up-
Page 3
enhances bu er management, but more imp ortantly, ACM SIGMOD International Confer-
reduces context switching overhead. Similar tech- ence, pages 61{71, August 1986.
niques have b een engaged successfully in OS studies
[BR96] Lars Baekgraard and Nick Roussop ou-
[PG96] where the patterns are much less predictable
los. \Ecient Refreshment of Data
than re-materializing a view.
Warehouse Views". Technical Rep ort
CS-TR-3642, Dept. of Computer Sci-
7 Conclusion
ence, Univ of Maryland, College Park,
Views are the most imp ortant asset of the relational
MD, May 1996.
mo del. They provide a uniform conceptual and imple-
[CR93] C. M. Chen and N. Roussop oulos.
mentation mo del of relational programs, derived data,
\Adaptive Database Bu er Allo cation
indexes, and aggregated derived data. I can think of a
Using Query Feedback". In Procs. of
very few things that are so elegant and practical to o.
the 19th Int. Conf. on Very Large Data
These are my views on views. And, like most views,
Bases, Dublin, Ireland, 1993.
they are evolving and incrementally up dating with
time. I am impressed with views' versatility, resilience,
[CR94a] C. M. Chen and N. Roussop oulos.
and refusal of retiring [Mum96]. As for the yet-to-b e
\Adaptive Selectivity Estimation Using
discovered uses of views that I promised at the b e-
Query Feedback". In Procs. of the ACM
ginning of the pap er, I remind you that it is just a
SIGMOD Intl. Conf. on Management of
sp eculative view on my part.
Data, Minneap olis, Minnesota, 1994.
References
[CR94b] Chungmin Melvin Chen and Nick Rous-
sop oulos. \The Implementation and
+
[ABC 76] M. M. Astrahan, M. W. Blasgen, D. D.
Performance Evaluation of the ADMS
Chamb erlin, K. P. Eswaran, Jim Gray,
Query Optimizer: Integrating Query
P.P. Griths, W. F. King, R. A. Lo-
Result Caching and Matching". In Proc.
rie, P. R. McJones, J. W. Mehl, G. R.
of the International ConferenceonEx-
Putzolu, I. L. Traiger, B. W. Wade,
tending Database Technology EDBT,
and V. Watson. \System R: Relational
pages 323{336, Cambridge, UK, March
Approach to Database Management".
1994.
ACM Transactions on Database Sys-
tems, 12:97{137, June 1976.
[DR92] A. Delis and N. Roussop oulos. \Evalua-
tion of an Enhanced Workstation-Server
[AL80] M. E. Adiba and B. G. Lindsay.
DBMS Architecture". In Procs. of 18th
\Database Snapshots". In Procs. of 6th
VLDB, 1992.
VLDB, 1980.
[Fin82] S. Finkelstein. \common Expression
[AR90] A. Amir and N. Roussop oulos. \Opti-
Analysis in Database Applications". In
mal View Caching". Information Sys-
Procs. of the ACM SIGMOD Intl. Conf.
tems, 152:169{171, 1990.
on Management of Data, pages 235{245,
_
[BALT86] Jos e A. Blakeley,Per Ake Larson, and
1982.
Frank Wm. Tompa. \Eciently Up dat-
[FNS91] C. Faloutsos, R. Ng, and T. Sellis.
ing Materialized Views". In Carlo Zan-
\Predictive Load Control for Flexible
iolo, editor, Proc. of the ACM SIGMOD
Bu er Allo cation". In VLDB Conf.
Conference, pages 61{71, Washington,
Proceedings, pages 265{274, Barcelona,
D.C., May 1986.
Spain, Septemb er 1991. Also available
[BJNS94] M. Buchheit, M. A. Jeusfeld, W. Nutt,
as UMIACS-TR-91-34 and CS-TR-2622.
and M. Staudt. \Subsumption Between
[GBLP96] J. Gray, Bosworth, Layman, and Pi-
Queries to Ob ject-Oriented Databases".
ramish. \Data Cub e: A Relational Ag-
In Proc. of the International Confer-
gregation". In Proc. of the 12th Int.
ence on Extending Database Technology
Conference on Data Engineering, New
EDBT, Cambridge, UK, March 1994.
Orleans, February 1996. IEEE.
[BLT86] J. A. Blakeley,P. A. Larson, and F. W.
[GHRU96] H. Gupta, V. Harinarayan, Ra jaraman, Tompa. \Eciently Up dating Materi-
and J. Ullman. \Index Selection for alized Views". In Proc. of the 1986
Page 4
[PG96] R. Hugo Patterson and Garth A. Gib- OLAP". In Proc. of of VLDB, Bombay,
son. \Exp osing I/O Concurrency with India, August 1996.
Informed Prefetching". In IPDIS, 1996.
[GJM96] Ashish Gupta, H. V. Jagadish, and
[RB85] N. Roussop oulos
Inderpal Singh Mumick. \Data Inte-
and C. Bader. \Dynamic Access Con-
gration using Self-Maintainable Views".
trol for Relational Views". Information
In Proc. of the International Confer-
Systems, 103:361{369, August 1985.
ence on Extending Database Technol-
ogy EDBT, pages 140{144, Avignon,
+
[RCK 95] N. Roussop oulos, C. M. Chen, S. Kel-
France, March 25-26 1996.
ley, A. Delis, and Y. Papakonstantinou.
\The ADMS Pro ject: Views \R"Us".
[Han87a] E. Hanson. \A Performance Analysis
Data Engineering Bul letin, 182:19{28,
of View Materialization Strategies". In
June 1995.
Proc. of SIGMOD Conference on Man-
agement of Data, pages 440{453. ACM,
[RES93] Nick Roussop oulos, Nikos Economou,
1987.
and Antony Stamenas. \ADMS: A
Testb ed for Incremental Access Meth-
[Han87b] Eric N. Hanson. \A Performance Anal-
o ds". IEEE Transactions on Know l-
ysis of View Materialization Strategies".
edge and Data Engineering, 55:762{
In Proc. of the ACM SIGMOD Confer-
774, Octob er 1993.
ence, pages 440{453, Brighton, England,
Septemb er 1987.
[RK86a] N. Roussop oulos and H. Kang. \Pre-
liminary Design of AD M S : A
[HS93] J. M. Hellerstein and M. Stone-
Workstation-Mainframe Integrated Ar-
braker. \Predicate Migration: Optimiz-
chitecture for Database Management
ing Queries with Exp ensive Predicates".
Systems". In Proc. 12th International
In Procs. of ACM-SIGMOD, 1993.
Conference on VLDB,Kyoto, Japan,
August 1986.
[Jhi88] A. Jhingran. \A p erformance Study
[RK86b] N. Roussop oulos and H. Kang. \Prin-
of Query Optimization Algorithms on
ciples and Techniques in the Design of
a Database System Supp orting Pro ce-
AD M S ". Computer, 1912:19{25,
dures". In Procs. of the 14th Intl. Conf.
1986.
on VLDB, 1988.
[RK87] N. Roussop oulos and H. Kang. \A
[JMRS93] C. S. Jensen, Leo Mark, Nick Rous-
Pip eline N-way Join Algorithm based on
sop oulos, and Timos Sellis. \Using Dif-
the 2-way Semijoin Program". IEEE
ferential Techniques to Eciently Sup-
Trans. on Know ledge and Data Engi-
p ort Transaction Time". VLDB Jour-
neering, 34:486{495, Decemb er 1987.
nal, 21:75{111, 1993.
[RKR97] Nick Roussop oulos, Yannis Kotidis, and
[LY85] P.-A. Larson and H. Z. Yang. \Comput-
Mema Roussop oulos. \Cub etree: Or-
ing Queries from Derived Relations". In
ganization of and Bulk Up dates on the
Procs. of the 11th Intl. Conf. on VLDB,
Data Cub e". In Proc. of the ACM
pages 259{269, 1985.
SIGMOD Conference,Tucson, Arizona,
May 13-15 1997.
[Mum96] In Inderpal Mumick and Ashish Gupta,
editors, Proceedings of the Workshop
[RL85] N. Roussop oulos and D. Leifker. \Direct
on Materialized Views: Techniques and
Spatial Search on Pictorial Databases
Applications, Montreal, Canada, June 7
Using Packed R-Trees". In Proc. ACM
1996.
SIGMOD, pages 17{31, Austin, Texas,
May 1985.
[Pap94] Y. Papakonstantinou. \Computing a
[Rou82a] N. Roussop oulos. \The Logical Access Query as a Union of Disjoint Horizontal
Path Schema of a Database". IEEE Fragments". Technical rep ort, Depart-
Trans. on Software Engineering, SE- ment of Computer Science, University
86:563{573, 1982. of Maryland, College Park, MD, 1994.
Page 5
[Rou82b] N. Roussop oulos. \View Indexing in
Relational Databases". ACM TODS,
72:258{290, 1982.
[Rou87] Nick Roussop oulos. \Overview of
ADMS: A High Performance Database
Management System". In Fal l Joint
Computer Conference, Dallas, Texas,
Octob er 25-29 1987.
[Rou91] N. Roussop oulos. \The Incremental
Access Metho d of View Cache: Con-
cept, Algorithms, and Cost Analysis".
ACM Transactions on Database Sys-
tems, 163:535{563, Septemb er 1991.
[Sel87] T. Sellis. \Eciently Supp orting Pro ce-
dures in Relational Database Systems".
Proc. ACM SIGMOD,May 1987.
[Sel88] T. K. Sellis. \Intelligent Caching
and Indexing Techniques for Relational
Database Systems". Inform. Systems,
132, 1988.
[Sta89] Antonios G. Stamenas. \High Perfor-
mance Incremen-
tal Relational Databases". Technical re-
p ort, UMIACS-TR-89-49, CS-TR-2245,
Department of Computer Science, Uni-
versity of Maryland, College Park, MD
20742, May 1989.
[Sto75] M. Stonebraker. \Implementation of In-
tegrity Constraints and Views by Query
Mo di cation". In Proceedings of the
ACM SIGMOD Intl. Conf. on the Man-
agement of Data, pages 65{78, San Jose,
CA, 1975.
[Sys96] Inc. Red Brick Systems. \Star Schemas
And STARjoin Technology". Technical
rep ort, 1996. White Pap er.
[Val87] Patrick Valduriez. \Join Indices".
ACM Transactions on Database Sys-
tems, 122:218{246, June 1987.
[ZGMHW95] Yue Zhuge, Hector Garcia-Molina,
Joachim Hammer, and Jennifer Widom.
\View Maintenance in a Warehousing
Environment". In Michael J. Carey and
Donovan A. Schneider, editors, Proc. of
the ACM SIGMOD Conference, pages
316{327, San Jose, California, May
1995.
Page 6