Temporal Databases

Richard T. Snodgrass

Department of Computer Science, University of Arizona, Tucson, AZ

rts@cs.arizona.edu

Abstract. This paper summarizes the major concepts, approaches, and
implementation strategies that have been generated over the last fifteen
years of research into database management system support for time-varying
information. We first examine the time domain: its structure,
dimensionality, indeterminacy, and representation. We then discuss how
facts may be associated with time, and consider data modeling and
representational issues. We survey the many temporal query languages that
have been proposed. Finally, we examine the impact to each of the
components of a DBMS of adding temporal support, focusing on query
optimization and evaluation.

Introduction

Time is an important aspect of all real-world phenomena. Events occur at specific
points in time; objects and the relationships among objects exist over time.
The ability to model this temporal dimension of the real world is essential to
many computer applications, such as econometrics, banking, inventory control,
accounting, law, medical records, land and geographical information systems,
and airline reservations.

Conventional databases represent the state of an enterprise at a single moment
of time. Although the contents of the database continue to change as new
information is added, these changes are viewed as modifications to the state,
with the old, out-of-date data being deleted from the database. The current
contents of the database may be viewed as a snapshot of the enterprise. In such
systems the attributes involving time are manipulated solely by the application
programs; the database management system (DBMS) interprets dates as values
in the base data types. No conventional system interprets temporal domains
when deriving new relations.

Application-independent DBMS support for time-varying information has
been an active area of research for about fifteen years, with approximately
[...] papers generated thus far [Bolour, mckenzieA, stam, soo]. This paper
attempts to capture and summarize the major concepts, approaches, and
implementation strategies that have been generated by that research.

We first examine the time domain: its structure, dimensionality (interestingly,
there are several time dimensions), and temporal indeterminacy, followed
by issues in representing values in this domain. We demonstrate that time is
actually more complex than the spatial domain, as the former's dimensions are
nonhomogeneous.

Section [...] follows a similar organization in examining how facts may be
associated with time. Data modeling issues are first examined; then
representational alternatives are explored, with frequent comparisons with
space. We briefly consider how facts may be simultaneously associated with both
space and time, a common phenomenon in land and geographic information systems.

We next consider languages for expressing temporal queries. We illustrate
the various types of queries through examples in the temporal query language
TQuel, and briefly appraise various standards efforts.

Temporal DBMS implementation is the topic of Sec. [...]. We examine the
impact to each of the components of a DBMS of adding temporal support,
discussing query optimization and evaluation in some detail.

We conclude with a summary of the major accomplishments and disappointments
of research into temporal databases.

We omit one major aspect, that of database design, due to lack of space.

The Time Domain

In this section we focus on time itself: how it is modeled and how it is
represented. The next section will then combine time with facts, to model
time-varying information.

Structure

We initially assume that there is one dimension of time. The distinctions we
address here will apply to each of the several dimensions we consider in the
next section.

Early work on temporal logic centered around two structural models of time,
linear and branching [vanBenthem]. In the linear model, time advances
from the past to the future in a totally ordered fashion. In the branching
model, also termed the possible futures model, time is linear from the past to
now, where it then divides into several time lines, each representing a
potential sequence of events [WorboysA]. Along any future path, additional
branches may exist. The structure of branching time is a tree rooted at now.
The most general model of time in a temporal logic represents time as an
arbitrary set with a partial order imposed on it. Additional axioms introduce
other, more refined models of time. For example, linear time can be specified
by adding an axiom imposing a total order on this set. Recurrent processes may
be associated with a cyclic model of time [ChomickiA, LorentzosC, LorentzosB].

In spatial models, there is much less diversity, and a linear model is
generally adequate.

Axioms may also be added to temporal logics to characterize the density of
the time line [vanBenthem]. Combined with the linear model, discrete
models of time are isomorphic to the natural numbers, implying that each point
in time has a single successor [Clifford]. Dense models of time are
isomorphic to either the rationals or the reals: between any two moments of
time another moment exists. Continuous models of time are isomorphic to the
reals, i.e., they are both dense and, unlike the rationals, contain no gaps.

In the continuous model, each real number corresponds to a point in time;
in the discrete model, each natural number corresponds to a nondecomposable
unit of time with an arbitrary duration. Such a nondecomposable unit of time
is referred to as a chronon [AriavA, CliffordB] (other, perhaps less
desirable, terms include time quantum [Anderson], moment [AllenB],
instant [GadiaA], and time unit [Navathe, TanselE]). A chronon
is the smallest duration of time that can be represented in this model. It is
not a point, but a line segment on the time line.
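The relationship between time points and chronons can be illustrated with a minimal sketch. It assumes, purely for illustration, that time points are given in seconds from some origin and that the chronon is 60 seconds long; the function names are hypothetical and do not come from the literature cited above.

```python
import math

def chronon_of(time_point, chronon_size, origin=0):
    """Index of the chronon (a half-open line segment) containing time_point."""
    return math.floor((time_point - origin) / chronon_size)

def chronon_bounds(index, chronon_size, origin=0):
    """Start (inclusive) and end (exclusive) of the chronon with this index."""
    start = origin + index * chronon_size
    return (start, start + chronon_size)

# With a 60-second chronon, every point in [120, 180) maps to chronon 2,
# so the model cannot distinguish events that fall within that segment.
same_chronon = chronon_of(125.7, 60) == chronon_of(179.9, 60)
```

Under these assumptions a chronon is identified by an integer, while each integer stands for a segment of the time line rather than a point.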

Although time itself is generally perceived to be continuous, most proposals
for adding a temporal dimension to the relational data model are based on the
discrete time model. Several practical arguments are given in the literature
for this preference for the discrete model over the continuous model. First,
measures of time are inherently imprecise [ANDERSON, CLIFFORD]. Clocking
instruments invariably report the occurrence of events in terms of chronons,
not time points. Hence, events, even so-called instantaneous events, can at
best be measured as having occurred during a chronon. Secondly, most natural
language references to time are compatible with the discrete time model. For
example, when we say that an event occurred at [...] p.m., we usually don't
mean that the event occurred at the point in time associated with [...] p.m.,
but at some time in the chronon (perhaps a minute) associated with [...] p.m.
[ANDERSON, CLIFFORDB, DyresonD]. Thirdly, the concepts of chronon and interval
allow us to naturally model events that are not instantaneous, but have
duration [ANDERSON]. Finally, any implementation of a data model with a
temporal dimension will of necessity have to have some discrete encoding for
time (Sec. [...]).

Space may similarly be regarded as discrete, dense, or continuous. Note that
in all three of these alternatives, two separate space-filling objects cannot
be located in the same point in space and time: they can be located in the same
place at different times, or at the same time in different places.

Axioms can also be placed on the boundedness of time. Time can be bounded
orthogonally in the past and in the future. The same applies to models of space.

Models of time may include the concept of distance (most temporal logics
do not do so, however). Both time and space are metrics, in that they have
a distance function satisfying four properties: (1) the distance is
nonnegative, (2) the distance between any two non-identical elements is
non-zero, (3) the distance from time a to time b is identical to the distance
from b to a, and (4) the distance from a to c is less than or equal to the
distance from a to b plus the distance from b to c (the triangle inequality).

With distance and boundedness, restrictions on range can be applied. The
scientific cosmology of the "Big Bang" posits that time begins with the Big
Bang, [...] billion years ago. There is much debate on when it will end,
depending on whether the universe is open or closed (Hawking provides a
readable introduction to this controversy [Hawking]). If the universe is
closed, then time will have an end when the universe collapses back onto itself
in what is called the "Big Crunch". If it is open, then time will go on forever.

Similar considerations apply to space. In particular, an open universe
implies unbounded space. However, many applications assume a bound as well as
a range; geographical information systems don't need to contend with values
greater than approximately [...] million meters.

Finally, one can differentiate relative time from absolute time (more precise
terms are unanchored and anchored). For example, "[...] AM, January [...]" is
an absolute time, whereas "[...] hours" is a relative time. This distinction,
though, is not as crisp as one would hope, because absolute time is with
respect to another time (in this example, midnight, January [...], AD [...]).
We will show in Sec. [...] how to exploit this interaction. Relative time
differs from distance in that the former has a direction, e.g., one could
envision a relative time of -[...] hours, whereas a distance is unsigned.

One can also differentiate between relative and absolute space, with the same
provisos.
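The distinction between anchored time, unanchored time, and distance can be sketched with Python's standard `datetime` module. The specific dates, the origin, and the helper names below are illustrative assumptions, not values taken from the text.

```python
from datetime import datetime, timedelta

# Hypothetical values chosen only for illustration.
ORIGIN = datetime(1970, 1, 1)            # the time an absolute time is relative to
absolute = datetime(1992, 3, 15, 9, 0)   # anchored (absolute) time
relative = timedelta(hours=-9)           # unanchored (relative) time, signed

def distance(a, b):
    """Distance between two absolute times: unsigned and symmetric."""
    return abs(a - b)

def as_relative(absolute_time, origin=ORIGIN):
    """An absolute time viewed as a relative time with respect to an origin."""
    return absolute_time - origin

earlier = absolute + relative            # nine hours before `absolute`
```

Here `relative` carries a direction, while `distance` discards it; `as_relative` makes explicit that an absolute time is itself a relative time measured from some other, agreed-upon time.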

Dimensionality

Time is multi-dimensional [SnodgrassA]. Valid time concerns the time a
fact was true in reality. The valid time of an event is the wall clock time at
which the event occurred in the real world, independent of the recording of
that event in some database. Valid times can also be in the future, if it is
known that some fact will be true at a specified time in the future.
Transaction time concerns the time the fact was in the database as stored data.
The transaction time (an interval) of an event identifies the transactions
that inserted the information about the event into the database and removed
this information from the database. As with space, these two dimensions are
orthogonal. A data model supporting neither is termed snapshot, as it captures
only a single snapshot in time of both the database and the enterprise that the
database models. A data model supporting only valid time is termed historical;
one that supports only transaction time is termed rollback; and one that
supports both valid and transaction time is termed bitemporal (temporal is a
generic term implying some kind of time support).

Figure [...] illustrates a single bitemporal relation (i.e., table) composed
of a sequence of historical states indexed by transaction time. It is the
result of four transactions starting from an empty relation: (1) three tuples
(i.e., rows) were added, (2) one tuple was added, (3) one tuple was added and
an existing one terminated (logically deleted), and (4) the starting time of a
previous tuple, the middle one added in transaction (1), was changed to a
somewhat later time (presumably the original starting time was incorrect) and a
recently added tuple (the bottom one) was deleted (presumably it should not
have been there in the first place). Each update operation involves copying the
historical state, then applying the update to the newly created state. Of
course, less redundant representations than the one shown are possible. While
we'll consider only linear time, branching transaction time provides a useful
model for versioning in computer-aided design tasks [Dittrich], such as
CAD [Ecklund, Katz] and CASE [BernsteinA, Hsieh].

Fig. [...]. A bitemporal relation

A different depiction that has proven useful is to timestamp each fact with
a bitemporal element[1], which is a set of bitemporal chronons. Each bitemporal
chronon represents a tiny rectangle in valid-time/transaction-time space.
Figure [...] shows the bitemporal element associated with the middle tuple of
Fig. [...]. Historical and rollback databases effectively record historical
chronons and rollback chronons, respectively.

Fig. [...]. A bitemporal element

[1] This term is a generalization of temporal element, previously used to
denote a set of single dimensional chronons [GadiaB]. An alternative, equally
desirable term is bitemporal lifespan [CliffordA].
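A bitemporal element can be sketched as a plain set of (transaction time, valid time) pairs, one pair per bitemporal chronon. The chronon numbers and helper names below are hypothetical; the example only mimics the kind of correction described for the fourth transaction above.

```python
def rectangle(tt_start, tt_end, vt_start, vt_end):
    """Bitemporal chronons covering [tt_start, tt_end) x [vt_start, vt_end)."""
    return {(tt, vt)
            for tt in range(tt_start, tt_end)
            for vt in range(vt_start, vt_end)}

# Hypothetical fact: stored at transaction time 5 as valid during [10, 15);
# at transaction time 8 its starting valid time is corrected to 12.
# The element is shown up to (but not including) transaction time 10.
element = rectangle(5, 8, 10, 15) | rectangle(8, 10, 12, 15)

def valid_timeslice(element, tt):
    """Historical chronons believed valid as of transaction time tt."""
    return {vt for (t, vt) in element if t == tt}

def transaction_timeslice(element, vt):
    """Rollback chronons during which the fact was stored as valid at vt."""
    return {t for (t, v) in element if v == vt}
```

Slicing the set along one dimension yields the one-dimensional sets of chronons that a historical or a rollback database would record.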

While valid time may be bounded or unbounded (as we saw, cosmologists
feel that it is at least bounded in the past), transaction time is bounded on
both ends. Specifically, transaction time starts when the database is created
(before which time, nothing was stored), and doesn't extend past now (no facts
are known to have been stored in the future). Changes to the database state are
required to be stamped with the current transaction time. Hence, rollback and
bitemporal relations are append-only, making them prime candidates for storage
on write-once optical disks. As the database state evolves, transaction times
grow monotonically. In contrast, successive transactions may mention widely
varying valid times. For instance, the fourth transaction in Fig. [...] added
information to the database that was transaction timestamped with time [...],
while changing a valid time of one of the tuples to [...].

The three dimensions in space are truly orthogonal and homogeneous, the one
exception being the special treatment sometimes accorded elevation. In
contrast, the two time dimensions are not homogeneous; transaction time has a
different semantics than valid time. Valid and transaction time are orthogonal,
though there are generally some application-dependent correlations between the
two times. As a simple example, consider the situation where a fact is recorded
as soon as it becomes valid in reality. In such a specialized bitemporal
database, termed degenerate [Jensen], valid and transaction time are identical.
As another example, if a cloud cover measurement is recorded at most two days
after it was valid in reality, and if it takes at least six hours from the
measurement time to record the measurement, then such a relation is delayed
strongly retroactively bounded with bounds six hours and two days.
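The two specializations just described can be written as simple predicates over a fact's valid time and transaction time. This is a sketch that follows only the informal description above; the formal definitions in the cited work may differ in detail, and the function names are hypothetical.

```python
from datetime import datetime, timedelta

def is_degenerate(valid_time, transaction_time):
    """Fact recorded as soon as it becomes valid: both times coincide."""
    return valid_time == transaction_time

def is_delayed_retroactively_bounded(valid_time, transaction_time,
                                     min_delay, max_delay):
    """Fact recorded no sooner than min_delay and no later than max_delay
    after it was valid in reality."""
    delay = transaction_time - valid_time
    return min_delay <= delay <= max_delay

# The cloud cover example: bounds of six hours and two days.
SIX_HOURS = timedelta(hours=6)
TWO_DAYS = timedelta(days=2)
measured = datetime(1992, 1, 1, 0, 0)    # hypothetical measurement time
```

A relation satisfies a specialization when every one of its tuples does, so a DBMS could check the predicate on each insertion.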

Multiple transaction times may also be stored in the same relation, termed
temporal generalization [Jensen]. These times may also be related to each
other, or to the valid time, in various specialized ways. For example, a
particular value for the reflectivity of a cloud over a point on the Earth may
be recorded by an Earth Sensing Satellite at a particular time. Here the valid
time and transaction time are correlated, and the satellite's database may be
considered to be a degenerate bitemporal database. Later this data is sent to a
ground station and stored; the transaction time of the stored data will be
different from the valid time; this database may be classified as a bounded
retroactive database. Later still, the data from several ground stations are
merged into a central database, storing the original valid time, the
transaction time of the recording into the central database, and the inherited
transaction time when the data was stored in the ground station database. All
three times may be needed, for instance, if data massaging was done with
algorithms that were being improved over time. Such multiple transaction time
dimensions do not have a spatial analogue.

Indeterminacy

Information that is historically indeterminate can be characterized as "don't
know exactly when" information. This kind of information is prevalent; it
arises in various situations, including the following.

- Finer system granularity. In perhaps most cases, the granularity of the
  database does not match the precision to which an event time is known. For
  example, an event time known to within one day and recorded on a system
  with timestamps in the granularity of a millisecond happened sometime
  during that day, but during which millisecond is unknown.
- Imperfect dating techniques. Many dating techniques are inherently
  imprecise, such as radioactive and Carbon-14 dating. All clocks have an
  inherent imprecision [DyresonD].
- Uncertainty in planning. Projected completion dates are often inexactly
  specified, e.g., the project will complete three to six months from now.
- Unknown or imprecise event times. In general, event times could be
  unknown or imprecise. For example, if we do not know when an individual
  was born, the individual's date of birth could be recorded in the database
  as either unknown (she was born between now and the beginning of time)
  or imprecise (she was born between now and [...] years ago).

There have been several proposals for adding historical indeterminacy to
the time model [GadiaA, KahnA], as well as more specific work on
accommodating multiple time granularities [LadkinA, WiederholdA].
The possible chronons model unifies treatment of both aspects [DyresonD].
In this model, an event is determinate if it is known when (i.e., during which
chronon) it occurred. A determinate event cannot overlap two chronons. If it is
unknown when an event occurred, but known that it did occur, then the event is
historically indeterminate. The indeterminacy refers to the time when the event
occurred, not whether the event did or did not occur.

Two pieces of information completely describe an indeterminate event: a set
of possible chronons and an event probability distribution. A single chronon
from the set of possible chronons denotes when the indeterminate event actually
occurred. However, it is unknown which possible chronon is the actual one. The
event probability distribution gives the probability that the event occurred
during each chronon in the set of possible chronons.

The implementation of the possible chronons model supports a fixed, minimal
chronon size. Multiple granularities are handled by representing the
indeterminacy explicitly. For example, if the underlying chronon is a
microsecond, and an event is known to within a day, then this indeterminate
event would be associated with a set of 86,400,000,000 possible chronons and
perhaps a uniform event probability distribution.
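The day-versus-microsecond example can be made concrete with a small sketch. It assumes a contiguous set of possible chronons and a uniform distribution only; the class and method names are hypothetical and are not the cited implementation.

```python
from fractions import Fraction

MICROSECONDS_PER_DAY = 24 * 60 * 60 * 1_000_000

class IndeterminateEvent:
    """Event known to lie in chronons first..last (inclusive), uniformly."""

    def __init__(self, first, last):
        self.first, self.last = first, last

    def possible_chronons(self):
        """Number of chronons in which the event may have occurred."""
        return self.last - self.first + 1

    def probability_in(self, lo, hi):
        """Probability that the event occurred in some chronon lo..hi."""
        overlap = min(hi, self.last) - max(lo, self.first) + 1
        return Fraction(max(overlap, 0), self.possible_chronons())

    def probability_before(self, chronon):
        """Probability that the event occurred strictly before chronon."""
        return self.probability_in(self.first, chronon - 1)

# An event known only to within one day, on a microsecond chronon; the day
# is assumed to start at chronon 0.
event = IndeterminateEvent(0, MICROSECONDS_PER_DAY - 1)
noon = MICROSECONDS_PER_DAY // 2
```

Because the set is contiguous, only its two end points need to be stored; the 86,400,000,000 possible chronons are never enumerated.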

As a practical matter, events that occurred in the prehistoric past cannot
be dated as precisely as events that occur in the present. There is an implicit
"telescoping view" of time. Dating of recent events can often be done to the
millisecond, while events that occurred [...] million years ago can be dated to
perhaps, at best, the nearest [...] years. Dating future events is also
problematic. It is impossible to say how many seconds will be between Midnight,
January [...], and Midnight, January [...], because we don't know how many leap
seconds will be added to correct for changes in the rotational clock. We can
guess at the number of seconds, but leap shifts to the current clock are likely
to invalidate our guess.

Historical indeterminacy occurs only in valid time. The granularity of a
transaction time timestamp is the smallest inter-transaction time. Transaction
times are always determinate, since the chronon during which a transaction
takes place is always known.

Most of the above may be applied to space. Information that is spatially
indeterminate can be characterized as "don't know exactly where" information.
It is also prevalent, due to granularity concerns, measurement techniques, and
unknown or imprecise location specifiers. One could envision an analogous
possible space quanta model that could capture the variety of spatial
indeterminacy. The "telescoping view" phenomenon also occurs in space, as
distant locations are less precisely known.

As with time, a spatial data granularity coarser than the database management
system (DBMS) granularity is often adopted. One of the more common
models (Type [...], Sec. [...]) covers the two-dimensional space with a grid,
with point locations and associated attributes reported to the nearest cell
center. In this model, the data granularity is a multiple of the underlying
DBMS granularity. For example, the DBMS granularity might be a meter, with all
location specifiers being expressed in this unit, while the grid cells may be
[...] kilometers on a side.

Representation

Since time and space are metrics, a system of units is required to represent
particular events or locations. A timestamp or location specifier has a
physical realization and an interpretation. The physical realization is a
pattern of bits, while the interpretation is the meaning of each bit pattern,
that is, the time or location each pattern represents.

Interpretation. For time, the central unit is the second. However, there are
at least seven different definitions of this fundamental unit [DyresonD].

- Apparent solar: 1/86,400 of the interval from noon to noon; varies from day
  to day.
- Mean solar (UT0): 1/86,400 of a mean solar day, averaged over a year;
  varies from year to year.
- Mean sidereal: 1/86,400 of a mean sidereal day, measuring the rotation of
  the Earth with respect to a distant star; varies from year to year.
- UT1: UT0 corrected for polar wander.
- UT2: UT1 corrected for seasonal variations.
- Ephemeris: mean solar second for the year 1900; does not vary. This was
  the standard definition from [...] to [...].
- International System of Units (SI): the duration of 9,192,631,770 periods
  of the radiation corresponding to the transition between the two hyperfine
  levels of cesium-133 atoms [Petley].

When a range of less than [...] years is supported, the differences between
these definitions are generally inconsequential, except for the apparent solar
second, which varies by [...] over the course of a year. When ranges of several
billion years are supported, however, all of these definitions differ
significantly.

Different regions of the time line are used by different communities. For
example, apparent solar time is important to historians, who care about whether
something happened in the daylight or in darkness, as well as to users of
cadastral (real estate) databases, which utilize civil time [Hunter]. Ephemeris
time is used by astronomers, while the SI second is the basis for radioactive
dating, used by geochronologists. Because of these different needs, as well as
the telescoping view of time, we have proposed a specific temporal
interpretation, termed the baseline clock, that constructs a timeline by using
different well-defined clocks in different periods. This clock, shown in
Fig. [...] (not to scale), partitions the time line into a set of contiguous
periods. Each period runs on a different clock. A synchronization point, where
two clocks are correlated, delimits a period boundary. The synchronization
points occur at Midnight on the specified date.

[Figure: time line, not to scale, running from the Dawn of Time (the Big Bang,
14,000,000,000 B.C. +/- 4,000,000,000) to the End of Time?, divided into four
periods: Ephemeris, Mean Solar Days, UTC, and TDT. The periods are delimited by
the Past Synchronization Point (1/1/9,000 B.C.), the UTC/TAI Synchronization
Point (A.D. 1/1/1972), and the Future (Moving) Synchronization Point (currently
A.D. 7/1/1992).]

Fig. [...]. The baseline clock

From the Big Bang until Midnight, January 1, 9000 B.C., the baseline clock
runs on ephemeris time. This clock is preferable to the solar clock since
ephemeris time is independent of the formation of the Earth and the Solar
System. Also, we prefer using the ephemeris clock to the solar clock because an
ephemeris year is a fixed duration, unlike the tropical year. For historic
events (9000 B.C. to January 1, 1972), the baseline clock follows the mean
solar day clock. Historic events are usually dated with calendars. Calendar
dates invariably count days and use an intercalation rule to relate the number
of days to longer-term celestial clocks, e.g., the Gregorian calendar relates
days to months and tropical years. At Midnight, January 1, 1972, the baseline
clock switches to Universal Coordinated Time (UTC). Midnight, January 1, 1972,
is when UTC was synchronized with the SI definition of the second and the
current system of leap seconds was adopted. The baseline clock runs on UTC
until one second before Midnight, July 1, 1992. This is the next time at which
a leap second may be added (a leap second will be added on this date according
to the latest International Earth Rotation Service bulletin [USNO]). After
Midnight, July 1, 1992, until the Big Crunch or the end of our baseline clock,
the baseline clock follows Terrestrial Dynamic Time (TDT), an idealized atomic
time [Guinot] based on the SI second, since both UTC and mean solar time are
unknown and unpredictable.
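The period structure of the baseline clock can be sketched as a lookup from a date to the constituent clock in force. The sketch works at day resolution, represents dates as (year, month, day) tuples in astronomical year numbering (so 9000 B.C. is year -8999), and treats the moving future synchronization point as fixed at its current value; all names are illustrative assumptions.

```python
# Synchronization points, each starting a new period, in chronological order.
SYNCHRONIZATION_POINTS = [
    ((-8999, 1, 1), "mean solar day"),  # 1/1/9000 B.C.
    ((1972, 1, 1), "UTC"),              # A.D. 1/1/1972
    ((1992, 7, 1), "TDT"),              # currently A.D. 7/1/1992 (moving)
]

def baseline_clock(date):
    """Name of the constituent clock in force on date = (year, month, day)."""
    clock = "ephemeris"  # from the Big Bang up to the first point
    for start, name in SYNCHRONIZATION_POINTS:
        if date >= start:  # tuples compare year first, then month, then day
            clock = name
    return clock
```

Keeping the synchronization points in a table means that moving the future point forward, as each leap second announcement is made, is a change to data rather than to code.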

The situation is much simpler for space. Here, the SI meter is the commonly
accepted unit, with a single accepted definition: the length of the path
traveled by light in vacuum during a time interval of 1/299,792,458 of a second
[Petley]. Distance is defined in terms of time, rather than the other way
around, because time can be measured more accurately (1 part in 10^10 over long
intervals, and 1 part in 10^15 for between a minute and a day [Quinn, RamseyC]).

The baseline clock and its representation are independent of any calendar.
We used Gregorian calendar dates in the above discussion only to provide an
informal indication of when the synchronization points occurred. Many calendar
systems are in use today; example calendars include academic years (consisting
of semesters), the common fiscal/financial year (beginning at the New Year),
the federal fiscal/financial year (beginning the first of October), and time
card ([...] hour days and [...] day weeks, year-round). The usage of a calendar
depends on the cultural, legal, and even business orientation of the user
[Soo]. A DBMS attempting to support time values must be capable of supporting
all the multiple notions of time that are of interest to the user population.

Space also has multiple notions, though with less variability. The metric,
US, and nautical unit systems are the most prevalent. Both time and space
have precisely defined underlying semantics that may be mapped to multiple
display formats. The spatial "baseline" is less complex than that for time: it
consists of a single measure, the meter.

Physical Realization. The baseline clock defines the meaning of each
timestamp bit pattern in the physical realization of a timestamp. The chronons
of the baseline clock are the chronons in its constituent clocks. We assume
that each chronon is one second in the underlying constituent clock. A chronon
may be denoted by an integer, corresponding to a single DBMS granularity, or it
may be denoted by a sequence of integers, corresponding to a nested
granularity. For example, if we assume a granularity of a second relative to
Midnight, January [...], then in a single granularity the integer [...] denotes
[...] AM, March [...]. If we assume a nested granularity of <year, month, day,
hour, minute, second>, then the sequence <[...]> denotes that same time.
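The two ways of denoting a chronon can be converted into each other, as the following sketch shows. The origin is a hypothetical one chosen for illustration, the Gregorian calendar is assumed, and Python's `datetime` ignores leap seconds, so this is only an approximation of a clock such as UTC.

```python
from datetime import datetime, timedelta

ORIGIN = datetime(1980, 1, 1)  # hypothetical origin, midnight

def to_nested(count):
    """Single granularity (seconds since ORIGIN) to
    <year, month, day, hour, minute, second>."""
    t = ORIGIN + timedelta(seconds=count)
    return (t.year, t.month, t.day, t.hour, t.minute, t.second)

def to_single(nested):
    """<year, month, day, hour, minute, second> to seconds since ORIGIN."""
    delta = datetime(*nested) - ORIGIN
    return delta.days * 86_400 + delta.seconds
```

The single-granularity form is compact and easy to compare, while the nested form exposes calendar components without any arithmetic.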

Various timestamps are in use in commercial database management systems
and operating systems; a summary is provided in Table [...]. This table
compares formats from operating systems (specifically Unix, MS-DOS, and the
Macintosh operating systems), the database systems DB2 [DateB] and SQL
[DateA, Melton], and several proposed formats. The SQL datetime timestamp
appears twice in the comparison, once with its optional fractional second
precision field set to microseconds, and once without the optional field. The
last three representations have recently been proposed and will be discussed
shortly.

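SYSTEM                    Size     Range                Granularity
                          (bytes)
OS (several)              [...]    [...] years          second
DB2 date                  [...]    [...] years          day
DB2 time                  [...]    [...] hours          second
DB2 timestamp             [...]    [...] years          microsecond
SQL datetime              [...]    [...] years          second
SQL fractional datetime   [...]    [...] years          microsecond
Low Resolution            [...]    [...] billion years  second
High Resolution           [...]    [...] years          microsecond
Extended Resolution       [...]    [...] billion years  nanosecond

SYSTEM                    Number of    Bytes     Space
                          Components   Needed    Efficiency
OS (several)              [...]        [...]     [...]
DB2 date                  [...]        [...]     [...]
DB2 time                  [...]        [...]     [...]
DB2 timestamp             [...]        [...]     [...]
SQL datetime              [...]        [...]     [...]
SQL fractional datetime   [...]        [...]     [...]
Low Resolution            [...]        [...]     [...]
High Resolution           [...]        [...]     [...]
Extended Resolution       [...]        [...]     [...]

Table [...]. A comparison of some physical layouts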

Size is the number of bytes devoted to the representation, while range refers
to the difference between the youngest and oldest time values that can be
represented. The granularity of a timestamp is the precision to which a time
value can be represented. If a representation has more than one component, it
is of a nested granularity. The extreme is the DB2 timestamp representation, in
which the year, month, day, hour, minute, second, and number of microseconds
are individually represented. Space efficiency is a measure of how much of the
representation is actually needed. It is computed as a percentage: the number
of bits needed to represent every chronon in the temporal interpretation (DB2
and SQL both use the Gregorian calendar temporal interpretation) versus the
number of bits devoted to the physical realization. The minimum number of bytes
needed to store the number of chronons dictated by a timestamp's granularity
and range is shown as a separate column. For instance, SQL's datetime timestamp
uses [...] bytes, but only [...] bytes of space are needed to store a range of
[...] years to the granularity of a second.
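The space efficiency measure can be worked through with a short sketch. The range (10,000 Gregorian years), the granularity (one second), and the 20-byte layout are hypothetical figures chosen for illustration; they are not the values from the table.

```python
def bits_needed(chronons):
    """Minimum number of bits that can distinguish `chronons` values."""
    return max(1, (chronons - 1).bit_length())

def bytes_needed(chronons):
    """Minimum whole number of bytes for that many values."""
    return (bits_needed(chronons) + 7) // 8

def space_efficiency(chronons, size_bytes):
    """Bits needed as a fraction of the bits the layout actually uses."""
    return bits_needed(chronons) / (8 * size_bytes)

# 10,000 Gregorian years contain 3,652,425 days of 86,400 seconds each.
CHRONONS = 3_652_425 * 86_400

needed = bytes_needed(CHRONONS)               # bytes a dense count would use
efficiency = space_efficiency(CHRONONS, 20)   # for a hypothetical 20-byte layout
```

A layout that stores each calendar component separately uses more bytes than `needed`, which is what pulls its space efficiency well below 100 percent.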

The evaluated timestamps fall into two camps: OS-style timestamps and
database-style timestamps. OS-style timestamps have a limited range and
granularity; these limitations are dictated by the size of the timestamp.
OS-style timestamps are maximally space efficient, having a single granularity.
The timestamp itself is merely a count of the number of chronons that have
elapsed since the origin in the temporal interpretation. But optimal space
efficiency is attained at the expense of some time efficiency.

In contrast, database-style timestamps, as exemplified by the DB2 timestamp
format, are generally larger than OS-style timestamps; they have a wider range
and finer granularity. But, as a group, they also have poorer space
utilization, having a nested granularity. The advantage of representing values
separately is that they can be quickly accessed. Extracting the number of years
from an OS-style timestamp is more involved than performing a similar task on a
DB2 timestamp.

These existing timestamp representations suffer from inadequate range, too
coarse a granularity, excessive space requirements, or a combination of these
drawbacks. Finally, none of the timestamps are able to represent historical
indeterminacy. Consequently, we recently proposed new timestamp formats
incorporating features from both the OS-style and database-style timestamps.
These formats combine high space efficiency with high time efficiency for
frequent timestamp operations [DyresonD].

There is a natural tradeoff between range and granularity in timestamp
development. Using the same number of bits, a timestamp designer can make
the granularity coarser to extend the range, or she can limit the range to
support finer granularities. These observations imply that a format based on a
single [...] bit word is inadequate for our purposes; there are simply not
enough bits. Since we wanted to keep the timestamp formats on [...] bit word
boundaries, we allocated the next word increment, [...] bits, to our basic
format. Using [...] bits, it is possible to represent all of time (that is, a
range of [...] billion years) to the granularity of a second, and a range of
historical time to granularities much finer than a second.
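The tradeoff between range and granularity follows directly from the number of distinct bit patterns available. The sketch below assumes word sizes of 32 and 64 bits for illustration (the word sizes are not recoverable from the text above) and uses the average Gregorian year.

```python
SECONDS_PER_YEAR = 31_556_952  # average Gregorian year (365.2425 days)

def range_in_years(bits, granularity_seconds):
    """Range covered by a plain chronon count of `bits` bits."""
    return (2 ** bits) * granularity_seconds / SECONDS_PER_YEAR

one_word_seconds = range_in_years(32, 1)          # about 136 years
two_words_seconds = range_in_years(64, 1)         # hundreds of billions of years
two_words_microseconds = range_in_years(64, 1e-6) # hundreds of thousands of years
```

Under these assumptions, one 32-bit word counted in seconds spans little more than a century, whereas two words counted in seconds exceed any cosmological estimate of the age of the universe, and two words counted in microseconds still cover far more than recorded history.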

There are three basic types of timestamps: events, spans, and intervals
[SOO]. We developed three new event timestamp formats with different
resolutions: high, low, and extended. Resolution is a rough measure of a
timestamp's precision. The low resolution format can represent times to the
precision of a second. High resolution narrows the precision to a microsecond,
while extended resolution is even more precise; it can represent times to the
precision of a nanosecond. High resolution has limited range but extended
precision, while low resolution has extended range but limited precision.
Extended resolution handles those uncommon cases where the user wants both an
extended range and an extended precision, at the cost of an extra word of
storage.

Interval timestamp formats are simply two event timestamps, one for the starting event of the interval and one for the terminating event of the interval. We use this representation because all operations on intervals are actually operations on their delimiting events citeSOOA. There are sixteen interval timestamp formats in toto. The type fields in the delimiting event timestamps distinguish each format.
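The remark that interval operations reduce to operations on the delimiting events can be sketched as follows. This is an illustration only: events are modeled as plain integer chronon counts, the terminating event is treated as inclusive, and the class and method names are hypothetical rather than taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    start: int  # chronon count of the starting event
    end: int    # chronon count of the terminating event (inclusive, by assumption)

    def __post_init__(self):
        if self.start > self.end:
            raise ValueError("starting event must not follow terminating event")

    def overlaps(self, other: "Interval") -> bool:
        # Two comparisons between delimiting events decide overlap.
        return self.start <= other.end and other.start <= self.end

    def precedes(self, other: "Interval") -> bool:
        # One comparison between delimiting events decides precedence.
        return self.end < other.start

    def contains_event(self, event: int) -> bool:
        return self.start <= event <= self.end
```

Every predicate here is a conjunction of comparisons between events, so an interval format needs no operations beyond those already defined on event timestamps.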

Spans are relative times. There are two kinds of spans, fixed and variable citeSOO. A fixed span is a count of chronons. It represents a fixed duration, in terms of chronons on the baseline clock, between two time values. The fixed span formats use exactly the same layouts as the standard event formats, with a different interpretation. The chronon count in the span representation is independent of the origin, instead of being interpreted as a count from the origin. The sign bit indicates whether the span is positive or negative, rather than indicating the direction from the origin.

A variable span's duration is dependent on an associated event. A common variable span is a month. The duration represented by a month depends on whether that month is associated with an event in June (30 days), in July (31 days), or even in February (28 or 29 days). Variable spans use a specialized format requiring bits.
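The difference between the two kinds of span can be seen in a few lines. This sketch uses Python's standard Gregorian calendar support; the function names are hypothetical and the code says nothing about the storage format of either kind of span.

```python
import calendar
from datetime import date


def month_span_in_days(anchor: date) -> int:
    """Length in days of a one-month span anchored in anchor's month."""
    return calendar.monthrange(anchor.year, anchor.month)[1]


def fixed_span_in_days(chronons: int, chronons_per_day: int) -> float:
    """A fixed span needs no anchor: it is just a count of chronons."""
    return chronons / chronons_per_day
```

For example, `month_span_in_days(date(2023, 6, 15))` and `month_span_in_days(date(2023, 7, 15))` differ by one day, whereas a fixed span of 172800 one-second chronons is two days regardless of where it is anchored.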

To represent indeterminate events we added nine formats, three of each resolution. There are three analogous formats for low resolution and three for extended resolution. The most common, for high resolution with a uniform distribution, requires only bits.

As we have seen in other areas, the considerations for space are similar, yet considerably simpler. A spatial representation of bits per dimension, for a range required to map the Earth, results in a granularity of one decimeter, and of one centimeter for the third dimension (the atmosphere, the oceans, and the Earth's interior) or for restricted areas such as the United States or Europe. Moving up to bits makes spatial indeterminacy representations feasible and reduces the granularity to a nanometer, which should be adequate for quite a while.

Associating Facts with Time

The previous section explored models and representations for the time domain. We now turn to associating time with facts.

Underlying Data Model

Time has been added to many data models: the entity-relationship model citeDeAntonellisKlopprogge, semantic data models citeHammerUrban, knowledge-based data models citeDayalA, deductive databases citeChomickiChomickiAChomickiKabanza, and object-oriented models citeDayalManolaANarasimhaluRoseScioreScioreAWuu. However, by far the majority of work in temporal databases is based on the relational model. For this reason we will assume this data model in subsequent discussion.

Attribute Variability

There are several basic ways in which an attribute associated with an object can interact with time and space. A time-invariant attribute citeNavathe does not change over time. Some temporal data models require that the key of a relation be time-invariant; some others identify the objects participating in the relation with a time-invariant surrogate, a system-generated unique identifier of an item that can be referenced and compared for equality but not displayed to the user citeHall. Secondly, the value of an attribute may be drawn from a temporal domain. An example is date stamping, where cadastral parcel records in a land information system contain fields that note the dates of registration of deeds, transfers of titles, and other pertinent historical information citeVrana. Such temporal domains are termed user-defined time citeSnodgrassA; other than being able to be read in, displayed, and perhaps compared, no special semantics is associated with such domains. Interestingly, most such attributes are time-invariant. For example, the transfer date for a particular title transfer is valid over all time.

An analogous situation exists for space. There are space-invariant attributes, as well as attributes that are drawn from spatial domains, an example being an attribute recording the square feet of a residence, which is a relative spatial measure in two dimensions.

Time- and space-varying attributes are more interesting. There are five basic cases to consider.

1. The value of an attribute associated with a space-invariant object may vary over time, termed attribute temporality citeVrana. An example is percentage of cloud cover over the Earth, which has a single value at each point in time but varies over time. The value is uniquely specified by the temporal coordinate(s), either valid time or a combination of valid and transaction time.

2. The value of an attribute associated with a region in space may be time-invariant. An example is elevation: the value varies spatially but not temporally (assuming historical time; certainly the elevation varies over geologic time). The value is unique given spatial coordinates.

3. The value of an attribute associated with a region in space may vary over time. An example is the percentage of cloud cover over each kilometer square grid element. Here the object is identified spatially, and each object is associated with a time-varying sequence of values. Both the temporal and the spatial coordinates are required to uniquely identify a value.

4. The boundary lines identifying a cadastral object (e.g., a particular land packet) may vary over time, termed temporal topology citeVrana. The orientation and interaction of spatial objects change over time; such objects could nevertheless have time-invariant attributes, such as initial purchase price. The temporal and spatial coordinates are required to uniquely identify a value, but this identification is indirect, via the topology.

5. The final case is the most complex: the value of an attribute which varies over time is associated with a cartographic feature that also varies over time citeLangran. An example is the appraised value of a land packet. The appraised value may change yearly, and the boundary of the land packet changes as it is reapportioned and parts sold to others. As with the previous two cases, both temporal and spatial coordinates are required to identify a value, but the temporal coordinates are utilized twice, first to identify a cartographic feature and then to select a particular attribute value associated with that feature.

A further categorization is possible concerning which temporal domains are involved (only valid time, only transaction time, or both) and which spatial domains are involved (two dimensions, 2 1/2 dimensions, or a full three dimensions).

The first case is the traditional domain of temporal DBMSs. The second case is the domain of conventional LISs and GISs. While there has been some conceptual work on merging temporal and spatial support, as discussed throughout this paper, current implemented systems are fairly weak in this regard.

Representational Alternatives

Over two dozen extensions to the relational model to incorporate time have been proposed over the last years. These models may be compared by asking four basic questions: how is valid time represented, how is transaction time represented, how are attribute values represented, and is the model homogeneous, i.e., are all attributes restricted to be defined over the same valid time(s) citeGadiaB?

Data Models. The accompanying table lists most of the temporal data models that have been proposed to date. If the model is not given a name, we appropriate the name given the associated query language, where available. Many models are described in several papers; the one referenced is the initial journal paper in which the model was defined. Some models are defined only over valid time or transaction time; others are defined over both. Whether the model is homogeneous is indicated in the next column. Tuple-timestamped data models (to be identified in the next section) and data models that use single chronons as timestamps are of necessity homogeneous. The issue of homogeneity is not relevant for those data models supporting only transaction time. The last column indicates a short identifier which denotes the model; the table is sorted on this column.

We omit a few intermediate data models, specifically Gadia's multihomogeneous model citeGadiaA, which was a precursor to his heterogeneous model (Gadia), and Gadia's two-dimensional temporal relational database model citeBhargavaB, which is a precursor to Gadia. We also do not include the data model used as the basis for defining temporal relational completeness citeTuzhilin, because it is a generic data model that does not force decisions on most of the aspects to be discussed here.

More detail on these data models, including a comprehensive comparison, may be found elsewhere citeMcKenzieBSnodgrassA.

Data Model                        Citation           Temporal      Homogeneous  Identifier
                                                     Dimensions
-                                 citeSnodgrassA     both          yes          Ahn
Temporally Oriented Data Model    citeAriavA         both          yes          Ariav
Time Relational Model             citeBenZvi         both          yes          BenZvi
Historical Data Model             citeClifford       valid         yes          Clifford
Historical Relational Data Model  citeCliffordA      valid         no           Clifford
Homogeneous Relational Model      citeGadiaB         valid         yes          Gadia
Heterogeneous Relational Model    citeGadiaA         valid         no           Gadia
TempSQL                           citeGadia          both          yes          Gadia
DMT                               citeJensenD        transaction   NA           Jensen
LEGOL                             citeJones          valid         yes          Jones
DATA                              citeKimball        transaction   NA           Kimball
-                                 citeLometA         transaction   NA           Lomet
Temporal Relational Model         citeLorentzosB     valid         no           Lorentzos
-                                 citeLum            transaction   yes          Lum
-                                 citeMcKenzieC      both          no           McKenzie
Temporal Relational Model         citeNavathe        valid         yes          Navathe
HQL                               citeSadeghiB       valid         yes          Sadeghi
HSQL                              citeSarda          valid         yes          Sarda
Temporal Data Model               citeSegev          valid         yes          Shoshani
TQuel                             citeSnodgrassA     both          yes          Snodgrass
Postgres                          citeStonebrakerD   transaction   no           Stonebraker
HQuel                             citeTanselB        valid         no           Tansel
Accounting Data Model             citeThompsonA      both          yes          Thompson
Time Oriented Databank Model      citeWiederhold     valid         yes          Wiederhold

Table: Temporal Data Models

Valid Time. Two fairly orthogonal aspects are involved in representing valid time. First, is valid time represented with single chronon identifiers (i.e., event timestamps), with intervals (i.e., as interval timestamps), or as historical elements (i.e., as a set of chronon identifiers, or equivalently as a finite set of intervals)? Second, is valid time associated with entire tuples or with individual attribute values? A third alternative, associating valid time with sets of tuples, i.e., relations, has not been incorporated into any of the proposed data models, primarily because it lends itself to high data redundancy. The data models are evaluated on these two aspects in the table on the representation of valid time. Interestingly, only one quadrant, timestamping tuples with an historical element, has not been considered (but see the conceptual model discussed later in this section).

                    Single chronon   Interval             Historical element
                                     (pair of chronons)   (set of chronons)
Timestamped         Lorentzos        Gadia                Clifford
attribute values    Thompson         McKenzie             Gadia
                                     Tansel               Gadia
Timestamped         Ariav            Ahn
tuples              Clifford         BenZvi
                    Lum              Jones
                    Sadeghi          Navathe
                    Shoshani         Sarda
                    Wiederhold       Snodgrass

Table: Representation of Valid Time

Transaction Time. The same general issues are involved in transaction time, but there are about twice as many alternatives. Transaction time may be associated with

- a single chronon, which implies that tuples inserted on each transaction signify the termination (logical deletion) of previously current tuples with identical keys, with the timestamps of these previously recorded tuples not requiring change;

- an interval: a newly inserted tuple would be associated with the interval starting at now and ending at the special value UC, until-changed;

- three chronons: BenZvi's model records the transaction time when the valid start time was recorded, the transaction time when the valid stop time was recorded, and the transaction time when the tuple was logically deleted;

- a transaction-time element, which is a set of not-necessarily-contiguous chronons.

Another issue concerns whether transaction time is associated with individual attribute values, with tuples, or with sets of tuples.
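The first alternative, a single transaction-time chronon per tuple, can be sketched as an append-only relation. This is an illustration under simplifying assumptions, not any of the cited models: chronons are integers, each tuple has one key and one value, and the names `StampedTuple`, `insert`, and `current_as_of` are hypothetical. Deleting a key without inserting a replacement is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StampedTuple:
    key: str
    value: int
    tx: int  # the single transaction-time chronon: when the tuple was inserted


def insert(relation: list, key: str, value: int, tx: int) -> None:
    """Append a new version; older tuples for the same key are never touched."""
    relation.append(StampedTuple(key, value, tx))


def current_as_of(relation: list, tx: int) -> dict:
    """State that was current at transaction time tx.

    For each key, the latest tuple inserted at or before tx wins, which is
    how a later insert implicitly terminates the earlier tuple.
    """
    state = {}
    for t in sorted(relation, key=lambda t: t.tx):
        if t.tx <= tx:
            state[t.key] = t.value
    return state
```

Because termination is implied by the next insert with the same key, stored tuples are immutable. The interval alternative instead makes the end time explicit, at the cost of updating the previously current tuple.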

The choices made in the various data models are characterized in the table on the representation of transaction time. Gadia is the only data model to timestamp attribute values; it is difficult to efficiently implement this alternative directly. Gadia also is the only data model that uses transaction-time elements (but see the conceptual model discussed later in this section). BenZvi is the only one to use three transaction-time chronons. All of the rows and columns are represented by at least one data model.

                    Single chronon   Interval             Three      Transaction-time element
                                     (pair of chronons)   chronons   (set of chronons)
Timestamped                                                          Gadia
attribute values
Timestamped         Ariav            Snodgrass            BenZvi
tuples              Jensen           Stonebraker
                    Kimball
                    Lomet
Timestamped         Ahn              McKenzie
sets of tuples      Thompson

Table: Representation of Transaction Time

Attribute Value Structure. The final major decision to be made in designing a temporal data model is how to represent attribute values. There are six basic alternatives. In some models, the timestamp appears as an explicit attribute; we do not consider such attributes in this analysis.

- Atomic valued: values do not have any internal structure. Ariav, BenZvi, Clifford, Jensen, Jones, Kimball, Lomet, Lorentzos, Lum, Navathe, Sadeghi, Sarda, Shoshani, Snodgrass, Stonebraker, and Thompson all adopt this approach. Tansel allows atomic values, as well as others listed below.

- Set valued: values are sets of atomic values. Tansel supports this representation.

- Functional, atomic valued: values are functions from the (generally valid) time domain to the attribute domain. Clifford, Gadia, Gadia, and Gadia adopt this approach.

- Ordered pairs: values are an ordered pair of a value and a (historical element) timestamp. McKenzie adopts this approach.

- Triplet valued: values are a triple of attribute value, valid from time, and valid to time. This is similar to the ordered pairs representation, except that only one interval may be represented. Tansel supports this representation.

- Set-triplet valued: values are a set of triplets. This is more general than ordered pairs, in that more than one value can be represented, and more general than functional valued, since more than one attribute value can exist at a single valid time citeTanselB. Tansel supports this representation.

In the conventional relational model, if attributes are atomic-valued, they are considered to be in first normal form citeCoddB. Hence, only the data models placed in the first category may be considered to be strictly in first normal form. However, in several of the other models, the non-atomicity of attribute values comes about because time is added. It turns out that the property of "all snapshots are in first normal form" is closely associated with homogeneity (defined above).

Separating Semantics from Representation. It is our contention that focusing on data presentation (how temporal data is displayed to the user), on data storage, with its requisite demands of regular structure, and on efficient query evaluation has complicated the central task of capturing the time-varying semantics of data. The result has been, as we have seen, a plethora of incompatible data models, with many query languages (see the Querying section), and a corresponding surfeit of database design and implementation strategies that may be employed across these models.

We advocate instead a very simple conceptual temporal data model that captures the essential semantics of time-varying relations, but has no illusions of being suitable for presentation, storage, or query evaluation. Existing data models may be used for these latter tasks. This conceptual model timestamps tuples with bitemporal elements, sets of bitemporal chronons, which are rectangles in the two-dimensional space spanned by valid time and transaction time (see the accompanying figure). Because no two tuples with mutually identical explicit attribute values (termed value-equivalent tuples) are allowed in a bitemporal relation instance, the full time of a fact is contained in a single tuple.
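A minimal sketch of this idea follows. It is an illustration, not the paper's formal definition: chronons are modeled as integers, a bitemporal chronon as a (transaction, valid) pair, and the class `BitemporalRelation` with its methods is a hypothetical name. Keying the relation on the explicit attribute values is what rules out value-equivalent tuples.

```python
from itertools import product


class BitemporalRelation:
    """Conceptual relation: one tuple per distinct set of explicit attribute values."""

    def __init__(self):
        # explicit attribute values -> set of (transaction, valid) chronon pairs
        self._elements = {}

    def record(self, attrs: tuple, tx_chronons, valid_chronons) -> None:
        """Add the rectangle tx_chronons x valid_chronons to the fact's element."""
        element = self._elements.setdefault(attrs, set())
        element.update(product(tx_chronons, valid_chronons))

    def element(self, attrs: tuple) -> frozenset:
        """The bitemporal element (possibly empty) stamped on the given fact."""
        return frozenset(self._elements.get(attrs, ()))

    def __len__(self) -> int:
        return len(self._elements)
```

Recording the same fact twice with different rectangles grows the fact's single bitemporal element rather than adding a second tuple. Enumerating individual chronons is hopelessly inefficient, which is consistent with the point that this model is meant to capture semantics, not to serve as a storage format.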

In the table on the representation of valid time, the conceptual temporal data model occupies the unfilled entry corresponding to timestamping tuples with historical elements, and it occupies the entry in the table on the representation of transaction time corresponding to timestamping tuples with transaction-time elements. An important property of the conceptual model, shared with the conventional relational model but not held by the representational models, is that relation instances are semantically unique: distinct instances model different realities and thus have distinct semantics.

It is possible to demonstrate equivalence mappings between the conceptual model and several representational models citeJensenA. Mappings have already been demonstrated for three data models: Gadia (attribute timestamping), Jensen (tuple timestamping with a single transaction chronon), and Snodgrass (tuple timestamping with interval valid and transaction times). This equivalence is based on snapshot equivalence, which says that two relation instances are equivalent if all their snapshots, taken at all times (valid and transaction), are identical. Snapshot equivalence provides a natural means of comparing rather disparate representations. An extension to the conventional relational algebraic operators may be defined in the conceptual data model and can be mapped to analogous operators in the representational models. Finally, we feel that the conceptual data model is the appropriate location for database design and query optimization.
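Snapshot equivalence can be sketched for two toy representations. Everything here is an assumption made for illustration: integer chronons, half-open intervals, rows laid out as (attrs, valid_from, valid_to, tx_from, tx_to), and the function names. The cited mappings are considerably more involved.

```python
from itertools import product


def snapshot_of_rows(rows, tt, vt):
    """Snapshot of an interval-stamped relation at transaction time tt, valid time vt."""
    return {attrs for (attrs, vs, ve, ts, te) in rows
            if vs <= vt < ve and ts <= tt < te}


def snapshot_of_elements(elements, tt, vt):
    """Snapshot of a relation whose tuples carry sets of (tt, vt) chronon pairs."""
    return {attrs for attrs, chronons in elements.items() if (tt, vt) in chronons}


def rows_to_elements(rows):
    """Map interval-stamped rows to one chronon set per distinct fact."""
    elements = {}
    for attrs, vs, ve, ts, te in rows:
        elements.setdefault(attrs, set()).update(
            product(range(ts, te), range(vs, ve)))
    return elements


def snapshot_equivalent(snap_a, snap_b, tts, vts):
    """True if both snapshot functions agree at every (tt, vt) examined."""
    return all(snap_a(tt, vt) == snap_b(tt, vt) for tt, vt in product(tts, vts))
```

Two interval-stamped instances that split the same fact differently (one row for valid times 0 to 8, or two rows for 0 to 5 and 5 to 8) are distinct instances, yet they are snapshot equivalent and both map to the same element-stamped instance. This is the sense in which the representational models lack semantic uniqueness.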

In essence, we advocate moving the distinction between the various existing temporal data models from a semantic basis to a physical, performance-relevant basis, utilizing the proposed conceptual data model to capture the time-varying semantics. Data presentation, storage representation, and time-varying semantics should be considered in isolation, utilizing different data models. Semantics, specifically as determined by logical database design, should be expressed in the conceptual model. Multiple presentation formats should be available, as different applications require different ways of viewing the data. The storage and processing of bitemporal relations should be done in a data model that emphasizes efficiency.

Querying

A data model consists of a set of objects with some structure, a set of constraints on those objects, and a set of operations on those objects citeTsichritzis. In the two previous sections we have investigated in detail the structure of and constraints on the objects of temporal relational databases, the temporal relation. In this section we complete the picture by discussing the operations, specifically temporal query languages.

Many temporal query languages have been proposed. In fact, it seems obligatory for each researcher to define their own data model and query language (we return to this issue at the end of this section). We first summarize the twenty-odd query languages that have been proposed thus far. We then briefly discuss the various activities that should be supported by a temporal query language, using a specific language in the examples. Finally, we touch on work being done in the standards arena that is attempting to bring highly needed order to this confusing collection of languages.

We do not consider the related topic of temporal reasoning (also termed inferencing or rule-based search) citeChomickiKahnAKarlssonALeeShengSripada, which uses artificial intelligence techniques to perform more sophisticated analyses of temporal relationships, generally with much lower query processing efficiency.

Language Proposals

The table of temporal query languages lists the major temporal query language proposals to date. While many of these languages each have several associated papers, we have indicated the most comprehensive or most readily available reference. The underlying data model is a reference to the table of temporal data models. The next column lists the conventional query language the temporal proposal is based on, from the following.

SQL: Structured Query Language citeDateB, a tuple calculus-based language; the lingua franca of conventional relational databases.

Quel: The tuple calculus based query language citeHeld, originally defined for the Ingres relational DBMS citeStonebrakerA.

QBE: Query-by-Example citeZloof, a domain calculus based query language.

IL_s: An intensional logic formulated in the context of computational linguistics citeMontague.

relational algebra: A procedural language with relations as objects citeCodd.

DEAL: An extension of the relational algebra incorporating functions, recursion, and deduction citeDeen.

Most of the query languages have a formal definition. Some of the calculus-based query languages have an associated algebra that provides a means of evaluating queries.

More comprehensive comparisons may be found elsewhere citeMcKenzieBSnodgrassA.

Types of Temporal Queries

We now examine the types of temporal queries that each of the above-listed query languages support to varying degrees. We'll use TQuel citeSnodgrassA in the examples, as it is the most completely defined temporal language citeSnodgrassA.

Name                         Citation           Underlying    Based On             Formal      Equivalent
                                                Data Model                         Semantics   Algebra
HQL                          citeSadeghiA       Sadeghi       DEAL                 partial     citeSadeghiB
HQuel                        citeTanselB        Tansel        Quel                 yes         citeTanselB
HSQL                         citeSarda          Sarda         SQL                  no          citeSardaB
HTQuel                       citeGadiaB         Gadia         Quel                 yes         citeGadiaB
Legol                        citeJones          Jones         relational algebra   no          NA
Postquel                     citeStonebrakerB   Stonebraker   Quel                 no          none
TDM                          citeSegev          Shoshani      SQL                  no          none
Temporal Relational Algebra  citeLorentzosB     Lorentzos     relational algebra   yes         NA
TempSQL                      citeGadia          Gadia         SQL                  partial     none
TimeByExample                citeTansel         Tansel        QBE                  yes         citeTanselB
TOSQL                        citeAriavA         Ariav         SQL                  no          none
TQuel                        citeSnodgrassA     Snodgrass     Quel                 yes         citeMcKenzieC
TSQL                         citeNavathe        Navathe       SQL                  no          none
-                            citeBenZvi         BenZvi        SQL                  yes         citeBenZvi
-                            citeClifford       Clifford      IL_s                 yes         NA
-                            citeCliffordA      Clifford      relational algebra   yes         NA
-                            citeGadiaA         Gadia         Quel                 no          none
-                            citeJensenA        Jensen        relational algebra   yes         NA
-                            citeMcKenzieC      McKenzie      relational algebra   yes         NA
-                            citeThompsonA      Thompson      relational algebra   yes         NA
-                            citeTuzhilin       several       relational algebra   yes         NA

Table: Temporal query languages

Schema Definition. We will use one relation in these examples.

Example. Define the Cities relation.

    create persistent interval Cities(Name is char, State is char,
                                      Population is I, IncorporationDate is event,
                                      Size is area)

Cities has five explicit attributes: two strings (denoted by char), an integer (denoted by I), a user-defined event, and a user-defined area. The persistent and interval keywords specify a bitemporal relation, with four implicit timestamp attributes: a valid start time, a valid end time, a transaction start time, and a transaction end time. The valid timestamps define the interval when the attribute values were true in reality, and the transaction timestamps specify the interval when the information was current in the database.

Quel Retrieval Statements. Since TQuel is a strict superset of Quel, all Quel queries are also TQuel queries citeSnodgrassA. Here we give one such query as a review of Quel. The query uses a range statement to specify the tuple variable C, which will remain active for use in subsequent queries.

Example. What is the current population of the cities in Arizona?

    range of C is Cities
    retrieve (C.Name, C.Population)
    where C.State = "Arizona"

The target list specifies which attributes of the qualifying tuples are to be retained in the result, and the where clause specifies which underlying tuples from the underlying relations qualify to participate in the query. Because the defaults have been defined appropriately, each TQuel query yields the same result as its Quel counterpart.

Rollback (Transaction-time Slice). The as of clause rolls back a transaction-time relation (consisting of a sequence of snapshot relation states) or a bitemporal relation (consisting of a sequence of valid-time relation states) to the state that was current at the specified transaction time. It can be considered to be a transaction time analogue of the where clause, restricting the underlying tuples that participate in the query.

Example. What was the population of Arizona's cities as best known in

    retrieve (C.Name, C.Population)
    where C.State = "Arizona"
    as of begin of |January|

This query uses an event temporal constant, delimited with vertical bars. TQuel supports multiple calendars and calendric systems citeSoo SooA SooC. In this case, the default is the Gregorian calendar with English month names.

Valid-time Selection. The when clause is the valid-time analogue of the where clause: it specifies a predicate on the event or interval timestamps of the underlying tuples that must be satisfied for those tuples to participate in the remainder of the processing of the query.

Example. What was the population of the cities in Arizona in, as best known right now

    retrieve (C.Name, C.Population)
    where C.State = "Arizona"
    when C overlap |January|
    as of present

A careful examination of the prose statement of this and the previous query illustrates the fundamental difference between valid time and transaction time. The as of clause selects a particular transaction time, and thus rolls back the relation to its state stored at the specified time. Corrections stored after that time will not be incorporated into the retrieved result. The particular when statement given here selects the facts valid in reality at the specified time. All corrections stored up to the time the query was issued are incorporated into the result. In this case, all later corrections to the census will be included in the resulting relation.

Example. What was the population of the cities in Arizona in, as best known at that time

    retrieve (C.Name, C.Population)
    where C.State = "Arizona"
    when C overlap |January|
    as of |January|

The result of this query, executed any time after the specified date, will be identical to the result of the first query specified, "What is the current population of the cities in Arizona?", executed exactly at midnight of that date.
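The interplay of the as of and when clauses in the last three examples can be sketched over a toy bitemporal relation. This is an illustration of the semantics only, not of TQuel's evaluation: years stand in for chronons, intervals are half-open, the rows and population figures are invented, and the names `CityRow`, `as_of`, `when_overlap`, and `populations` are hypothetical.

```python
from typing import NamedTuple

UNTIL_CHANGED = 10**9  # stand-in for an open-ended transaction end time


class CityRow(NamedTuple):
    name: str
    state: str
    population: int
    valid_from: int   # valid start time (inclusive)
    valid_to: int     # valid end time (exclusive)
    tx_from: int      # transaction start time (inclusive)
    tx_to: int        # transaction end time (exclusive)


def as_of(rows, tx):
    """Rollback: keep the rows that were current in the database at time tx."""
    return [r for r in rows if r.tx_from <= tx < r.tx_to]


def when_overlap(rows, event):
    """Valid-time selection: keep the rows valid in reality at the given event."""
    return [r for r in rows if r.valid_from <= event < r.valid_to]


def populations(rows, state, valid_at, known_at):
    """Population per city, valid at valid_at, as best known at known_at."""
    selected = when_overlap(as_of(rows, known_at), valid_at)
    return {r.name: r.population for r in selected if r.state == state}


# Invented data: a figure recorded in 1981 and corrected in 1985.
CITIES = [
    CityRow("Tucson", "Arizona", 300, 1980, 1990, 1981, 1985),
    CityRow("Tucson", "Arizona", 330, 1980, 1990, 1985, UNTIL_CHANGED),
]
```

With this data, `populations(CITIES, "Arizona", 1980, 1982)` sees only the figure stored before the correction, while `populations(CITIES, "Arizona", 1980, 2000)` sees the corrected one. The valid time asked about is the same in both calls; only the transaction time differs.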

Valid-Time Projection. The valid clause serves the same purpose as the target list: it specifies some aspect of the derived tuples, in this case, the valid time of the derived tuple.

Example. For what date is the most recent information on Arizona's cities valid?

    retrieve (C.All)
    valid at begin of C
    where C.State = "Arizona"

This query extracts relevant events from an interval relation.

Aggregates. As TQuel is a superset of Quel, all Quel aggregates are still available citeSnodgrassA.

Example. What is the current population of Arizona?

    retrieve (sum(C.Population where C.State = "Arizona"))

Note that this query only counts city residents.

This query, applied to a bitemporal relation, yields the same result as its conventional analogue, that is, a single value. With just a little more work, we can extract its time-varying behavior.

Example. How has the population of Arizona fluctuated over time?

    retrieve (sum(C.Population where C.State = "Arizona"))
    when true
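What a time-varying aggregate produces can be sketched as a sweep over valid-time boundaries. This is an illustration of the intended result, not of how TQuel computes it: rows are reduced to (population, valid_from, valid_to) with integer, half-open intervals, and the function name `sum_over_time` is hypothetical.

```python
def sum_over_time(rows):
    """Return (start, end, total) triples over which the summed value is constant.

    rows: iterable of (population, valid_from, valid_to), half-open intervals.
    """
    rows = list(rows)
    # Every start or end of a row is a point where the sum may change.
    boundaries = sorted({t for _, start, end in rows for t in (start, end)})
    result = []
    for start, end in zip(boundaries, boundaries[1:]):
        # A row contributes if it is valid throughout [start, end).
        total = sum(p for p, s, e in rows if s <= start and end <= e)
        if result and result[-1][2] == total:
            # Coalesce adjacent periods with the same total.
            result[-1] = (result[-1][0], end, total)
        else:
            result.append((start, end, total))
    return result
```

Instead of a single value, the result is a sequence of valid-time intervals, each carrying the sum that held during that interval. Periods between rows in which no city is recorded appear with a total of zero.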

New temporally-oriented aggregates are also available in TQuel. One of the most useful computes the interval when the argument was rising in value. This aggregate may be used wherever an interval expression is expected.

Example. For each growing city, when did it start growing?

    retrieve (C.Name)
    valid at begin of rising(C.Population by C.Name)
    where C.State = "Arizona"

Historical Indeterminacy. Indeterminacy aspects can hold for individual tuples or for all the tuples in a relation.

Example. The information in the Cities relation is known only to within thirty days.

    modify cities to indeterminate span 30 days

Here, 30 days is a span, an unanchored length of time citeSooC. Spans can be created by taking the difference of two events; spans can also be added to an event to obtain a new event.

Example. What cities in Arizona definitely had a population over at the beginning of

    retrieve (C.Name)
    where C.State = "Arizona" and C.Population >
    when C overlap |January|

The default is to only retrieve tuples that fully satisfy the predicate. This is consistent with the Quel semantics.

Historical indeterminacy enters queries at two places: specifying the credibility of the underlying information to be utilized in the query, and specifying the plausibility of temporal relationships expressed in the when and valid clauses. We'll only illustrate plausibility here.

Example. What cities in Arizona had a population over probably at the beginning of

    retrieve (C.Name)
    where C.State = "Arizona" and C.Population >
    when C overlap |January| probably

Here, probably is syntactic sugar for an explicit with plausibility clause.

Schema Evolution. Often the schema needs to be modified to accommodate a changing set of applications. The modify statement has several variants, allowing any previous decision to be later changed or undone. Schema evolution involves transaction time, as it concerns how the data is stored in the database citeMcKenzieA. As an example, changing the type of a relation from a bitemporal relation to an historical relation will cause future intermediate states to not be recorded; states stored when the relation was a temporal relation will still be available.

Example. The Cities relation should no longer record all errors.

    modify Cities to not persistent

Standards

Support for time in conventional data base systems (e.g., citeENFORM ORACLE) is entirely at the level of user-defined time (i.e., attribute values drawn from a temporal domain). These implementations are limited in scope and are, in general, unsystematic in their design citeDATE DATEB. The standards bodies (e.g., ANSI) are somewhat behind the curve, in that the current SQL standard includes no time support. Date and time support very similar to that in DB2 is currently being proposed for the next version of SQL citeMELTON. That proposal corrects some of the inconsistencies in the time support provided by DB2 but inherits its basic design limitations citeSooC.

An effort is currently underway within the research community to consolidate approaches to temporal data models and calculus-based query languages, to achieve a consensus extension to SQL and an associated data model upon which future research can be based. This extension is termed the Temporal Structured Query Language, or TSQL (not to be confused with an existing language proposal of the same name).

System Architecture

The three previous sections in concert sketched the boundaries of a temporal data model, by examining the temporal domain, how facts may be associated with time, and how temporal information may be queried. We now turn to the implementation of the temporal data model, as encapsulated in a temporal DBMS.

Adding temporal support to a DBMS impacts virtually all of its components. The accompanying figure provides a simplified architecture for a conventional DBMS. The database administrator (DBA) and her staff design the database, producing a physical schema specified in a data definition language (DDL), which is processed by the DDL Compiler and stored, generally as system relations, in the System Catalog. Users prepare queries, either ad hoc or embedded in procedural code, which are submitted to the Query Processor. The query is first lexically and syntactically analyzed, using information from the system catalog, then optimized for efficient execution. A query evaluation plan is sent to the Query Evaluator. For ad hoc queries, this occurs immediately after processing; for embedded queries, this occurs when the cursor associated with a particular query is opened. The query evaluator is usually an interpreter for a form of the relational algebra annotated with access methods and operator strategies. While evaluating the query, this component accesses the database via a Stored Data Manager, which implements concurrency control, transaction management, recovery, buffering, and the available data access methods.

Fig. Components of a data base management system

In the following, we visit each of these components in turn, reviewing what changes need to be made to add temporal support.

DDL Statements

Relational query languages such as Quel and SQL actually do much more than simply specify queries; they also serve as data definition languages (e.g., through Quel's create statement, cf. the Schema Definition example above) and as data manipulation languages (e.g., through SQL's INSERT, DELETE, and UPDATE statements). The changes to support time involve adding temporal domains, such as event, interval, and span citeSooC, and adding constructs to specify support for transaction and valid time, such as the TQuel keywords persistent and interval.

System Catalog

The big change here is that the system catalog must consist of transaction-time relations. Schema evolution concerns only the recording of the data, and hence does not involve valid time. The attributes and their domains, the indexes, and even the names of the relations all vary over transaction time.
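As a rough illustration of what a transaction-time system catalog implies, the following Python sketch keeps every schema version together with the transaction-time interval during which it was current, so that the schema as of any past transaction time can be recovered. The names CatalogEntry, Catalog, define, and schema_as_of are hypothetical and chosen only for this example; they are not drawn from any particular system.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

FOREVER = float("inf")  # assumed sentinel: the version is still current


@dataclass
class CatalogEntry:
    relation: str
    attributes: Dict[str, str]   # attribute name -> domain name
    tt_start: int                # transaction time when recorded
    tt_stop: float = FOREVER     # transaction time when superseded


@dataclass
class Catalog:
    entries: List[CatalogEntry] = field(default_factory=list)

    def define(self, relation, attributes, now):
        """Record a new schema version; old versions are closed, never deleted."""
        for e in self.entries:
            if e.relation == relation and e.tt_stop == FOREVER:
                e.tt_stop = now
        self.entries.append(CatalogEntry(relation, dict(attributes), now))

    def schema_as_of(self, relation, t) -> Optional[Dict[str, str]]:
        """Return the schema that was current at transaction time t, if any."""
        for e in self.entries:
            if e.relation == relation and e.tt_start <= t < e.tt_stop:
                return e.attributes
        return None


catalog = Catalog()
catalog.define("Employee", {"Name": "string", "Salary": "int"}, now=10)
catalog.define("Employee",
               {"Name": "string", "Salary": "int", "Dept": "string"}, now=20)
```

A query rolled back to transaction time 15 would be analyzed against the two-attribute schema, while a query at time 25 would see the Dept attribute as well.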

Query Processing

There are two aspects here: one easily extended (language analysis) and one for which adding temporal support is much more complex (query optimization).

Language analysis needs to consider multiple calendars, which extend the language with calendar-specific functions. An example is monthof, which only makes sense in calendars for which there are months. The changes to language processing, primarily involving modifications to semantic analysis (name resolution and type checking), have been worked out in some detail citeSooA.

Optimization of temporal queries is substantially more involved than that for conventional queries, for several reasons. First, optimization of temporal queries is more critical, and thus easier to justify expending effort on, than conventional optimization. The relations that temporal queries are defined over are larger and grow monotonically, with the result that unoptimized queries take longer and longer to execute. This justifies trying harder to optimize the queries, and spending more execution time to perform the optimization.

Second, the predicates used in temporal queries are harder to optimize citeLeungLeungA. In traditional database applications, predicates are usually equality predicates (hence the prevalence of equi-joins and natural joins); if a less-than join is involved, it is rarely in combination with other less-than predicates. In temporal queries, on the other hand, less-than joins appear more frequently, as a conjunction of several inequality predicates. As an example, the TQuel overlap operator is translated into two less-than predicates on the underlying timestamps. Optimization techniques in conventional databases focus on equality predicates, and often implement inequality joins as cartesian products, with their associated inefficiency.
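To make the translation concrete, here is a minimal Python sketch of an overlap test written as a conjunction of two less-than predicates, followed by the naive join that results when no better strategy is available. It assumes half-open intervals [start, stop) over integer chronons; the names Interval, overlaps, and overlap_join are illustrative and are not TQuel constructs.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    start: int  # first chronon in the interval
    stop: int   # first chronon after the interval (half-open, an assumption)


def overlaps(a: Interval, b: Interval) -> bool:
    # Two inequality predicates on the underlying timestamps; there is no
    # equality predicate for an equi-join strategy to exploit.
    return a.start < b.stop and b.start < a.stop


def overlap_join(r, s):
    """Naive nested-loop join: every pair is tested, as in a filtered
    cartesian product."""
    return [(x, y) for x in r for y in s if overlaps(x, y)]
```

The nested-loop form examines every pair of tuples, which is the inefficiency that the temporal join algorithms surveyed later in this section try to avoid.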

And third, there is greater opportunity for query optimization when time is present citeLeungA. Time advances in one direction: the time domain is continuously expanding, and the most recent time point is the largest value in the domain. This implies that a natural clustering or sort order will manifest itself, which can be exploited during query optimization and evaluation. The integrity constraint beginof(t) < endof(t) holds for every time-interval tuple t. Also, for many relations the intervals associated with a key are contiguous in time, with one interval starting exactly when the previous interval ended. An example is salary data, where the intervals associated with the salaries for each employee are contiguous. Semantic query optimization can exploit these integrity constraints, as well as additional ones that can be inferred citeShenoy.
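The two integrity constraints just mentioned can be stated as executable checks. The Python sketch below assumes half-open intervals, a strict inequality between the begin and end of each interval, and versions represented as (key, begin, end) tuples; the function names are hypothetical.

```python
from collections import defaultdict


def well_formed(versions):
    """Every time-interval tuple must begin strictly before it ends."""
    return all(begin < end for _key, begin, end in versions)


def contiguous_per_key(versions):
    """For each key, every interval starts exactly where the previous ended."""
    by_key = defaultdict(list)
    for key, begin, end in versions:
        by_key[key].append((begin, end))
    for intervals in by_key.values():
        intervals.sort()
        for (_, prev_end), (next_begin, _) in zip(intervals, intervals[1:]):
            if prev_end != next_begin:
                return False
    return True


salaries = [("ann", 0, 10), ("ann", 10, 25), ("bob", 3, 8), ("bob", 8, 30)]
```

Under these assumptions, a relation that satisfies both checks has at most one version per key valid at any instant, which is the kind of inferred fact a semantic optimizer might use to simplify a plan.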

In this section, we first examine local query optimization, of a single query, then consider global query optimization, of several queries simultaneously. Both involve the generation of a query evaluation plan, which consists of an algebraic expression annotated with access methods.

Local Query Optimization. A single query can be optimized by replacing the algebraic expression with an equivalent one that is more efficient, by changing an access method associated with a particular operator, or by adopting a particular implementation of an operator. The first alternative requires a definition of equivalence, in the form of a set of tautologies. Tautologies have been identified for the conventional relational algebra citeEndertonSmithBUllmanB, as well as for many of the algebras listed in Table. Some of these temporal algebras support the standard tautologies, enabling existing query optimizers to be used.

To determine which access method is best for each algebraic operator, meta-data (that is, statistics on the stored temporal data) and cost models (that is, predictors of the execution cost for each operator implementation/access method combination) are needed. Temporal data requires additional meta-data, such as the lifespan of a relation (the time interval over which the relation is defined), the lifespans of the tuples, the surrogate and tuple arrival distributions, the distributions of the time-varying attributes, the regularity and granularity of temporal data, and the frequency of null values, which are sometimes introduced when attributes within a tuple aren't synchronized citeGunadhiA. Such statistical data may be updated by random sampling or by a scan through the entire relation.

There has been some work in developing cost models for temporal operators. An extensive analytical model has been developed and validated for TQuel queries citeAhnBAhn, and selectivity estimates on the size of the results of various temporal joins have been derived citeGunadhiDGunadhiA.

Global Query Optimization. In global query optimization, a collection of queries is simultaneously optimized, the goal being to produce a single query evaluation plan that is more efficient than the collection of individual plans citeSatohASellisA. A state transition network appears to be the best way to organize this complex task citeJensenG. Materialized views citeBlakeleyBBlakeleyRoussopoulosRoussopoulos are expected to play an important role in achieving high performance in the face of temporal databases of monotonically increasing size. For an algebra to utilize this approach, incremental forms of the operators are required (cf. citeMcKenzieJensenD).
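To illustrate what an incremental form of an operator looks like, the sketch below maintains a materialized selection view over an append-only relation by applying the selection only to newly appended tuples, rather than recomputing the view from scratch. This is a generic Python illustration under simplifying assumptions (an append-only base relation and a selection operator only); the class MaterializedSelection is hypothetical and does not reproduce any of the cited algebras.

```python
class MaterializedSelection:
    """A selection view kept up to date incrementally (append-only base)."""

    def __init__(self, predicate):
        self.predicate = predicate
        self.rows = []        # the materialized result
        self.examined = 0     # number of base tuples ever inspected

    def apply_delta(self, new_tuples):
        # Incremental form: select(R + delta) = select(R) + select(delta),
        # so only the delta needs to be examined.
        for t in new_tuples:
            self.examined += 1
            if self.predicate(t):
                self.rows.append(t)


def recompute(base, predicate):
    """Non-incremental form, for comparison: rescans the whole relation."""
    return [t for t in base if predicate(t)]


base = []
view = MaterializedSelection(lambda t: t["salary"] > 50)
for delta in ([{"name": "ann", "salary": 40}, {"name": "bob", "salary": 60}],
              [{"name": "cy", "salary": 70}]):
    base.extend(delta)
    view.apply_delta(delta)
```

Each base tuple is inspected once over the lifetime of the view, whereas recomputation after every append would rescan a relation that only grows.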

Query Evaluation

Achieving adequate efficiency in query evaluation is very important. We first examine operations on timestamps, some of which are critical to high performance. We then review a study that showed that a straightforward implementation would not result in reasonable performance. Since joins are the most expensive, yet very common, operations, they have been the focus of a significant amount of research. Finally, we examine the many temporal indexes that have been proposed.

Domain Operations. In Sec we outlined the domain of timestamps. Query evaluation performs input, comparison, arithmetic, and output operations on values of this domain. Ordered by contribution to execution efficiency, they are comparison, which is often in the inner loop of join processing; arithmetic, which is most often performed during creation of the resulting tuple; output, which is only done when transferring results to the screen or to paper, a much slower process than execution or even disk I/O; and finally input, which is done exactly once per value. However, the SQL format, with its five components (see Table), is optimized for the relatively infrequent operations of Gregorian input and output, and is rather slow at comparison and addition. The proposed formats instead optimize comparison at the expense of input and output. For a sequence of operations that inputs two relations and computes and outputs the overlap (favoring input and output more than expected), the high resolution format is more efficient, with only tuples, than the SQL format, even though the high resolution format has much greater range and smaller granularity citeDyresonC.

Performing these operations efficiently in the presence of historical indeterminacy is more challenging. For the default range credibility and ordering plausibility, and for comparing events whose sets of possible chronons do not overlap, there is little overhead, even when historical indeterminacy is supported citeDyresonD. The average worst case for comparison, over all plausibilities, when the sets of possible chronons overlap significantly, is less than microseconds on a Sun, or about the time to transfer a byte tuple from disk.

A Straightforward Implementation. The importance of efficient query optimization and evaluation for temporal databases was underscored by an initial study that analyzed the performance of a brute-force approach to adding time support to a conventional DBMS. In this study, the university Ingres DBMS was extended in a minimal fashion to support TQuel querying citeAhnB. The results were very discouraging for those who might have been considering such an approach. Sequential scans, as well as access methods such as hashing and ISAM, suffered from rapid performance degradation due to ever-growing overflow chains. Because adding time creates multiple tuple versions with the same key, reorganization does not help to shorten overflow chains. The objective of work in temporal query evaluation, then, is to avoid looking at all of the data, because the alternative implies that queries will continue to slow down as the database accumulates facts.

There were four basic responses to this challenge. The first was a proposal to separate the historical data, which grew monotonically, from the current data, whose size was fairly stable and whose accesses were more frequent citeLum. This separation, termed temporal partitioning, was shown to significantly improve performance of some queries citeAhnB, and was later generalized to allow multiple cached states, which further improves performance citeJensenG. Second, new query optimization strategies were proposed (Sec). Third, new join algorithms, to be discussed next, were proposed. And finally, new temporal indexes, also to be discussed, were proposed.
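Temporal partitioning can be sketched in a few lines. In the Python example below, a small current store holds one version per key and superseded versions migrate to an append-only history store, so a query about the current state never touches the history. The class PartitionedRelation and its method names are assumptions made for illustration, not the design of the cited systems.

```python
class PartitionedRelation:
    def __init__(self):
        self.current = {}   # key -> (value, start): roughly stable in size
        self.history = []   # (key, value, start, stop): grows monotonically

    def update(self, key, value, now):
        """Install a new version; the superseded one moves to the history."""
        if key in self.current:
            old_value, start = self.current[key]
            self.history.append((key, old_value, start, now))
        self.current[key] = (value, now)

    def current_value(self, key):
        """Current-state query: one lookup, independent of history size."""
        return self.current[key][0]

    def value_as_of(self, key, t):
        """Past-state query: try the current store, then scan the history."""
        if key in self.current and self.current[key][1] <= t:
            return self.current[key][0]
        for k, value, start, stop in self.history:
            if k == key and start <= t < stop:
                return value
        return None


r = PartitionedRelation()
r.update("ann", 40, now=1)
r.update("ann", 55, now=5)
r.update("ann", 60, now=9)
```

Only value_as_of pays a cost that grows with the accumulated history; the linear scan there is the part that a temporal index or a cached past state would replace.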

Joins. Three kinds of temporal joins have been studied: binary joins, multiway joins, and joins executed on multiprocessors.

A wide variety of binary joins have been considered, including time-join, time-equijoin (TE-join) citeCliffordA, event-join, TE-outerjoin citeGunadhi, contain-join, contain-semijoin, intersect-join citeLeungA, and contain-semijoin citeLeung. The various algorithms proposed for these joins have generally been extensions to nested loop or merge joins that exploit sort orders or local workspace.
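As an illustration of how a sort order can be exploited, the following Python sketch computes an overlap join with a single sweep over two inputs sorted by start time, keeping in a local workspace only those tuples whose intervals may still overlap later arrivals. It is a generic sweep-based merge join under the assumption of non-empty, half-open intervals, not a reproduction of any specific algorithm from the cited papers; all names are illustrative.

```python
def sweep_overlap_join(r, s):
    """r and s are lists of (start, stop, payload), each sorted by start."""
    out = []
    active_r, active_s = [], []   # workspace: tuples that may still match
    i = j = 0
    while i < len(r) or j < len(s):
        # Take the next tuple in global start-time order.
        take_r = j >= len(s) or (i < len(r) and r[i][0] <= s[j][0])
        if take_r:
            x = r[i]
            i += 1
            # Discard s-tuples that ended at or before the start of x.
            active_s = [y for y in active_s if y[1] > x[0]]
            out.extend((x, y) for y in active_s)
            active_r.append(x)
        else:
            y = s[j]
            j += 1
            # Discard r-tuples that ended at or before the start of y.
            active_r = [x for x in active_r if x[1] > y[0]]
            out.extend((x, y) for x in active_r)
            active_s.append(y)
    return out


r = [(0, 5, "a"), (6, 9, "b")]
s = [(4, 7, "p"), (9, 12, "q")]
pairs = {(x[2], y[2]) for x, y in sweep_overlap_join(r, s)}
```

Because both inputs arrive in start-time order, any tuple that has already ended can be dropped from the workspace for good, so each input is read once and only candidates that can still overlap are compared.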

Leung argues that a checkpoint index (Sec) is useful when stream processing is employed to evaluate both two-way and multiway joins citeLeung.

Finally, Leung has explored in depth partitioning strategies and temporal query processing on multiprocessors citeLeung.

Temporal Indexes. Conventional indexes have long been proposed to reduce the need to scan an entire relation to access a subset of its tuples. Indexes are even more important in temporal relations that grow monotonically in size. In Table we summarize the temporal index structures that have been proposed to date. Most of the indexes are based on B+-Trees citeComer, which index on values of a single key; the remainder are based on R-Trees citeGuttman, which index on ranges (intervals) of multiple keys. There has been considerable discussion concerning the applicability of point-based schemes for indexing interval data. Some argue that structures that explicitly accommodate intervals, such as R-Trees and their variants, are preferable; others argue that mapping intervals to their endpoints is efficient for spatial search citeLomet.

If the structure requires that exactly one record with each key value exist at any time, or if the data records themselves are stored in the index, then it is designated a primary storage structure; otherwise, it can be used either as a primary storage structure or as a secondary index. The checkpoint index is associated with a particular indexing condition, making it suitable for use during the processing of queries consistent with that condition.

A majority of the indexes are tailored to transaction time, exploiting the append-only nature of such information. Most utilize as a key the valid-time or transaction-time interval (or possibly both, in the case of the Mixed Media R-Tree). Lum's index doesn't include time at all; rather, it is a means of accessing the history, represented as a linked list of tuples, of a key value. The Append-only Tree indexes the transaction-start time of the data, and the Lop-Sided B+-Tree is most suited for indexing events such as bank transactions. About half the indexes utilize only the timestamp as a key; some include a single non-temporal attribute; and the two based on R-Trees can exploit its multi-dimensionality to support an arbitrary number of non-temporal attributes. Of the indexes supporting non-temporal keys, most treat such keys as a true separate dimension, the exceptions being the indexes discussed by Ahn, which support a single composite key with the interval as a component.

While preliminary performance studies have been carried out for each of these indexes in isolation, there has been little effort to compare them concerning their space and time efficiency. Such a comparison would have to consider the differing abilities of each (those supporting no non-temporal keys would be useful for doing temporal cartesian products, but perhaps less useful for doing temporal joins that involve equality predicates on non-temporal attributes) as well as various underlying distributions of time and non-temporal keys (the indexes presume various non-uniform distributions to achieve their performance gains over conventional indexes, which generally assume a uniform key distribution).

Stored Data Manager

We examine three topics: storage structures (including page layout), concurrency control, and recovery. Page layout for temporal relations is more complicated than for conventional relations if non-first normal form (i.e., non-atomic attribute values) is adopted, as is proposed in many of the temporal data models listed in Sec. Often such attributes are stored as linked lists, for example representing a valid-time element (a set of valid-time chronons) as a linked list of intervals. Hsu has developed an analytical model to determine the optimal block size for such linked lists citeHsuA.

Many structures have been proposed, including reverse chaining (all history versions for a key are linked in reverse order) citeBenZviDadamLum, accession lists (a block of time values and associated tuple ids between the current store and the history store), clustering (storing history versions together on a set of blocks), stacking (storing a fixed number of history versions), and cellular chaining (linking blocks of clustered history versions), with analytical performance modeling citeAhnC being used to compare their space and time efficiency citeAhnB.
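Reverse chaining, the first of these structures, can be sketched as follows. In this Python illustration each version holds a pointer to its predecessor and the current store points at the newest version, so the history of a key is traversed from newest to oldest. The class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Version:
    value: object
    start: int                          # time this version became current
    prev: Optional["Version"] = None    # link to the version it replaced


class ReverseChainedStore:
    def __init__(self):
        self.head = {}   # key -> newest Version (the current store)

    def put(self, key, value, now):
        # The new version points back at the one it supersedes.
        self.head[key] = Version(value, now, self.head.get(key))

    def history(self, key):
        """Yield versions of a key from newest to oldest along the chain."""
        v = self.head.get(key)
        while v is not None:
            yield v
            v = v.prev

    def as_of(self, key, t):
        """Walk back to the first version already current at time t."""
        for v in self.history(key):
            if v.start <= t:
                return v.value
        return None


store = ReverseChainedStore()
for value, now in [(40, 1), (55, 5), (60, 9)]:
    store.put("ann", value, now)
```

Access to the current version costs one lookup, while the cost of reaching an old version grows with the number of later versions, which is one reason the clustered and cellular variants group several versions per block.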

Several researchers have investigated adapting existing concurrency control and transaction management techniques to support transaction time. The subtle issues involved in choosing whether to timestamp at the beginning of a transaction (which restricts the concurrency control method that can be used) or at the end of the transaction (which may require data earlier written by the transaction to be read again to record the transaction) have been resolved in favor of the latter through some implementation tricks citeDadamStonebrakerDLometA.

Name                    Citation       Based On            Primary/   Temporal               Temporal            Non-Temporal
                                                           Secondary  Dimension(s)           Key(s)              Key(s)
----------------------------------------------------------------------------------------------------------------------------
Append-only Tree        citeGunadhiA   B+-Tree             primary    transaction            event
Checkpoint Index        citeLeung      B+-Tree             secondary  transaction            event
Lop-Sided B+-Tree       citeKolovsonB  B+-Tree             both       transaction            event
Monotonic B+-Tree       citeElmasri    Time Index          both       transaction            interval
                        citeLum        B+-Tree or Hashing  primary    transaction            none
Time-Split B-Tree       citeLomet      B+-Tree             primary    transaction            interval
Mixed Media R-Tree      citeKolovson   R-Tree              both       transaction            interval            k ranges
                                                                      trans+valid            pairs of intervals  k
Time Index              citeElmasri    B+-Tree             both       both                   interval
Two-level Combined      citeElmasri    B+-Tree             both       both                   interval
Attribute/Time Index                   Time Index
                        citeAhnB       B+-Tree             various    various                interval
                                       Hashing
SR-Tree                 citeKolovson   Segment Index       both       both                   interval            k ranges
                                       R-Tree                                                pairs of intervals  k

Table. Temporal Indexes

The Postgres system is an impressive prototype DBMS that supports transaction time citeStonebrakerB. Timestamping in a distributed setting has also been considered citeLometA. Integrating temporal indexes with concurrency control to increase the available concurrency has been studied citeLometB.
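The cost of timestamping at the end of a transaction can be illustrated with a small sketch. In the Python code below, tuples written by a transaction initially carry the transaction's identifier in place of a timestamp, and are revisited at commit so that the commit time can be recorded; this revisiting is the extra work that end-of-transaction timestamping may require. This is a simplified illustration under assumed names (Transaction, write, commit) and an assumed logical clock, not the specific technique of any cited system.

```python
import itertools

_clock = itertools.count(1)   # assumed logical clock; each tick is a chronon


class Transaction:
    _ids = itertools.count(1)

    def __init__(self, store):
        self.tid = next(Transaction._ids)
        self.store = store
        self.written = []     # positions of tuples this transaction wrote

    def write(self, key, value):
        # The commit time is not yet known, so the tuple is tagged with
        # the transaction id as a placeholder.
        self.store.append({"key": key, "value": value,
                           "tt": None, "tid": self.tid})
        self.written.append(len(self.store) - 1)

    def commit(self):
        commit_time = next(_clock)
        # Revisit every tuple written earlier to stamp the commit time.
        for pos in self.written:
            self.store[pos]["tt"] = commit_time
        return commit_time


store = []
t1, t2 = Transaction(store), Transaction(store)
t1.write("ann", 40)
t2.write("bob", 70)
c2 = t2.commit()   # t2 began later but commits first
c1 = t1.commit()
```

In this sketch the recorded transaction times follow commit order rather than start order, at the price of a second pass over the tuples each transaction wrote.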

Finally, since a transaction-time database contains all past versions of the database, it can be used to recover from media failures that cause a portion or all of the current version to be lost citeLomet.

Conclusion

We conclude with a list of accomplishments, a list of disappointments, and a pointer to future work.

There have been many significant accomplishments over the past fifteen years of temporal database research.

- The semantics of the time domain, including its structure, dimensionality, and indeterminacy, is well-understood.
- Representational issues of timestamps have recently been resolved.
- Operations on timestamps are now well-understood, and efficient implementations exist.
- A great amount of research has been expended on temporal data models, addressing this extraordinarily complex and subtle design problem.
- Many temporal query languages have been proposed. The numerous types of temporal queries are fairly well-understood. Half of the proposed temporal query languages have a strong formal basis.
- Temporal joins are well-understood, and a multitude of implementations exist.
- Approximately a dozen temporal index structures have been proposed.
- The interaction between transaction time support and concurrency control and transaction management has been studied to some depth.
- Several prototype temporal DBMS implementations have been developed.

There have also been some disappointments.

- The user-defined time support in the SQL standard is poorly designed. The representation specified in that standard suffers from inadequate range, excessive space requirements, and inefficient operations.
- There has been almost no work done in comparing the two dozen temporal data models, to identify common features and define a consensus data model upon which future research and commercialization may be based. It is our feeling that expecting a data model to simultaneously express time-varying semantics while optimizing data presentation, data storage, and query evaluation is unrealistic. We advocate a two-tiered data model, with a conceptual data model expressing the semantics, and with several representational data models serving these other objectives.
- There is also a need to consolidate approaches to temporal query languages, identify the best features of each of the proposed languages, and incorporate these features into a consensus query language that could serve as the basis for future research into query optimization and evaluation. Also, more work is needed on adding time to so-called fourth generation languages that are revolutionizing user interfaces for commercially available DBMSs.
- It has been demonstrated that a straightforward implementation of a temporal DBMS will exhibit poor performance.
- More empirical studies are needed to compare join algorithms, and to possibly suggest even more efficient variants.
- While there are a host of individual approaches to isolated portions of a DBMS, no coherent architecture has arisen. While the analysis given in Sec may be viewed as a starting point, much more work is needed to integrate these approaches into a cohesive structure.
- There has been little effort to compare the relative performance of temporal indexes, making selection in specific situations difficult or impossible.
- Temporal database design is still in its infancy, hindered by the plethora of temporal data models.
- There are as yet no prominent commercial temporal DBMSs, despite the obvious need in the marketplace.

Obviously, these disappointments should be addressed. In addition, future work is also needed on adding time to the newer data models that are gaining recognition, including object-oriented data models and deductive data models. Finally, there is a great need for integration of spatial and temporal data models, query languages, and implementation techniques.

Acknowledgements

This work was supported in part by NSF grant ISI. James Clifford was helpful in understanding structural aspects of models of time. Curtis Dyreson, Christian S. Jensen, Nick Kline, and Michael Soo provided useful comments on a previous draft.

Bibliography

Table of Contents

Introduction
The Time Domain
Structure
Dimensionality
Indeterminacy
Representation
Interpretation
Physical Realization
Associating Facts with Time
Underlying Data Model
Attribute Variability
Representational Alternatives
Data Models
Valid Time
Transaction Time
Attribute Value Structure
Separating Semantics from Representation
Querying
Language Proposals
Types of Temporal Queries
Schema Definition
Quel Retrieval Statements
Rollback (Transaction-time Slice)
Valid-time Selection
Valid Time Projection
Aggregates
Historical Indeterminacy
Schema Evolution
Standards
System Architecture
DDL Statements
System Catalog
Query Processing
Local Query Optimization
Global Query Optimization
Query Evaluation
Domain Operations
A Straightforward Implementation
Joins
Temporal Indexes
Stored Data Manager
Conclusion
Acknowledgements
Bibliography

This article was processed using the LaTeX macro package with LLNCS style