PREPRINT Please dont distribute

To app ear in Vittorio Castelli and Lawrence Bergman eds

Image Search and Retrieval of Digital Imagery

John Wiley and Sons

Database Supp ort for Multimedia Applications

Michael OrtegaBinderb erger Kaushik Chakrabarti

University of Illinois at UrbanaChampaign

Sharad Mehrotra

University of California at Irvine

August

Intro duction

Advances in high p erformance communication and storage technologies as well as

emerging largescale multimedia applications have made the design and development of multi

media information systems one of the most challenging and imp ortant directions of research and

development within computer science The payos of a multimedia infrastructure are tremendous

it enables many multibilli on dollarayear application areas Examples are medical information sys

tems electronic commerce digital libraries like multimedia data rep ositories for training educa

tion broadcast and entertainment sp ecial purp ose databases such as facengerprint databases

for security and geographical information systems storing satellite images maps etc

An integral comp onent of the multimedia infrastructure is a multimedia management

system Such a system supp orts mechanisms to extract and represent the content of multimedia

ob jects provides ecient storage of the content in the database supp orts contentbased queries

over multimedia ob jects and provides a seamless integration of the multimedia ob jects with the

traditional information stored in existing databases A multimedia database system consists of

multiple comp onents which provide the following functionalities

Multimedia Ob ject Representation techniquesmo dels to succinctly represent b oth

structure and content of multimedia ob jects in databases

Content Extraction mechanisms to automaticallysemiautomatically extract meaningful

features that capture the content of multimedia ob jects and that can b e indexed to supp ort

retrieval

Multimedia Information Retrieval techniques to match and retrieve multimedia ob jects

based on the similarity of their representation ie similaritybased retrieval

Multimedia Database Management extensions to data management technologies of in

dexing and query pro cessing to eectively supp ort ecient contentbased retrieval in database

management systems

Many of the ab ove issues have b een extensively addressed in other chapters of this b o ok Our

fo cus in this chapter is on how contentbased retrieval of multimedia ob jects can b e integrated into

database management systems as a primary access mechanism In this context we rst explore the

PREPRINT Please dont distribute

supp ort provided by existing ob jectoriented and ob jectrelational systems for building multimedia

applications We then identify limitations of existing systems in supp orting contentbased retrieval

and summarize approaches prop osed in the literature to address these limitations We b elieve

that this research will culminate in improved data management pro ducts that supp ort multimedia

ob jects as rstclass ob jects capable of b eing eciently stored and retrieved based on their

internal content

The rest of the chapter is organized as follows In Section we describ e a simple mo del

for contentbased retrieval of multimedia ob jects which is widely implemented and commonly

supp orted by commercial vendors We use this mo del throughout the chapter to explain the issues

that arise in integrating contentbased retrieval into database management systems DBMSs In

Section we explore how the evolution of relational databases into ob jectoriented and ob ject

relational systems which supp ort complex data typ es and userdened functions facilitates building

multimedia applications We apply the analysis framework of Section to the Oracle the

Informix and the IBM DB database systems in Section The chapter then identies limitations of

existing stateoftheart data management systems from the p ersp ective of supp orting multimedia

applications Finally Section outlines a set of research issues and approaches that we b elieve

are crucial for the development of database technology providing seamless supp ort for complex

multimedia information

A Mo del for ContentBased Retrieval

Traditionally contentbased retrieval from multimedia databases was supp orted by describing mul

timedia ob jects with textual annotations Textual information retrieval tech

niques were then used to search for multimedia information indirectly using the

annotations Such a textbased approach suers from numerous limitations including the imp ossi

bility of scaling it to large data sets due to the high degree of manual eort required to pro duce

the annotations the diculty of expressing visual content eg texturepatterns or shap e in an

image using textual annotations and the sub jectivity of manually generated annotations

To overcome several of these limitations a visual featurebased approach has emerged as a

promising alternative as is evidenced by several prototyp e and commercial systems

In a visual featurebased approach a multimedia ob ject is represented using visual

prop erties for example a digital photograph may b e represented using color texture shap e and

textual features Typically a user formulates a query by providing examples and the system

returns the most similar ob jects in the database The retrieval consists of ranking the similarity

b etween the featurespace representations of the query and of the images in the database The

query pro cess can therefore b e describ ed by dening the mo dels for ob jects queries and retrieval

Ob ject Mo del

A multimedia ob ject is represented as a collection of extracted features Each feature may have

multiple representations capturing it from dierent p ersp ectives For instance the color his

togram descriptor represents the color distribution in an image using value counts while the

color moments descriptor represents the color distribution in an image using statistical pa

rameters eg mean variance and skewness Asso ciated with each representation is a similarity

function that determines the similarity b etween two descriptor values Dierent representations

capture the same feature from dierent p ersp ectives The simultaneous use of dierent represen

tations often improves retrieval eectiveness but it also increases the dimensionality of the

PREPRINT Please dont distribute

Query O ith Ob ject

i

H

H W imp ortance of the ith

i

H W W

1 2

H Ob ject relative to

H

Hj the other query Ob jects

O O W imp ortance of Feature j

i;j

1 2

of Ob ject i relative to

W W W W Feature j of other Ob jects

11 12 21 22

R R R Representation of

i;j

Feature j of Ob ject i

R R R R

11 12 21 22

Figure Query Mo del

search space which reduces retrieval eciency and has the p otential for intro ducing redundancy

which can negatively aect eectiveness

Each feature space eg a color histogram space can b e viewed as a multidimensional space

in which a feature vector representing an ob ject corresp onds to a p oint A metric on the feature

space can b e used to dene the dissimilarity b etween the corresp onding feature vectors Distance

1

values are then converted to similarity values Two p opular conversion formulae are s d and

2

d

where s and d resp ectively denote similarity and distance With the rst formula s exp

2

if d is measured using the Euclidean distance function s b ecomes the cosine similarity b etween

the vectors while if d is measured using the Manhattan distance function s b ecomes the histogram

intersection similarity b etween them While cosine similarity is widely used in keywordbased

do cument retrieval histogramintersection similarity is common for color histograms A numb er of

image features and feature matching functions are further describ ed in Chapters to

Query Mo del

The query mo del sp ecies how a query is constructed and structured Much like multimedia

ob jects a query is also represented as a collection of features One dierence is that a user may

simultaneosly use multiple exampleob jects in which case the query can b e represented in either

of the following two ways

Featurebased representation The query is represented as a collection of features Each

feature contains a collection of feature representations with multiple values The values

corresp ond to the feature descriptors of the ob jects

Ob jectbased representation A query is represented as a collection of ob jects and each

ob ject consists of a collection of feature descriptors

In either case each comp onent of a query is asso ciated with a weight indicating its relative

imp ortance

Figure shows a structure of a query tree in an ob jectbased mo del In the gure the query

structure consists of multiple ob jects O and each ob ject is represented as a collection of multiple

i

feature values R

ij

1

The conversion formula assumes that the space is normalized to guarantee that the maximum distance b etween

p oints is equal to

PREPRINT Please dont distribute

Retrieval Mo del

The retrieval mo del determines the similarity b etween a query tree and ob jects in the database

The leaf level of the tree corresp onds to feature representations A similarity function sp ecic

to a given representation is used to evaluate the similarity b etween a leaf no de R and the

ij

corresp onding feature representation of the ob jects in the database Assume for example that

the leaf no des of a query tree corresp ond to two dierent color representations color histogram

and color moments While histogram intersection may b e used to evaluate the similarity

b etween the color histogram of an ob ject and that of the query the weighted Euclidean distance

metric may b e used to compute the similarity b etween the color moments descriptor of an ob ject

and that of the query The matching or retrieval pro cess at the feature representation level

pro duces one ranked list of results for each leaf of the query tree These ranked lists are combined

using another function to generate a ranked list describing the match results at the parent no de

Dierent functions may b e used to merge ranked lists at dierent no des of the query tree resulting

in dierent retrieval mo dels A common technique used is the weighted summation model Let a

no de N in the query tree have children N to N The similarity of an ob ject O in the database

i i1 in

with no de N represented as simil ar ity is computed as

i i

n

X

w simil ar ity where simil ar ity

ij ij i

j =1

n

X

w

ij

j =1

and simil ar ity is the measure of similarity of the ob ject with the j th child of no de N

ij i

Many other retrieval mo dels to generate overall similarity b etween an ob ject and a query have

b een explored in the literature For example in a Bo olean mo del suitably extended with fuzzy

and probabilistic interpretations is used to combine ranked lists A Bo olean op erator AND

OR NOT is asso ciated with each no de of the query tree and the similarity is interpreted

as a fuzzy value or a probability and combined with suitable merge functions Desirable prop erties

of such merge functions are studied by Fagin and Wimmers in

Extensions

In the previous section we have describ ed a simple mo del for contentbased retrieval that will

serve as the base reference in the remainder of the chapter Many extensions are p ossible and have

b een prop osed in the literature For example we have implicitly assumed that the user provides

appropriate weights for no des at each level of the query tree reecting the imp ortance of a given

featureno de to the users information need In practice however it is dicult for a user to

sp ecify the precise weights An approach followed in some research prototyp es ie MARS

MindReader is to learn these weights automatically using the pro cess of relevance feedback

Relevance feedback is used to mo dify the query representation by altering the weights

and structure of the query tree to b etter reect the users sub jective information need

Another limitation of our reference mo del is that it fo cuses on representation and contentbased

retrieval of images it has limited ability to represent structural spatial or temp oral prop erties

of general multimedia ob jects ie multiple synchronized audio and video streams and to mo del

retrieval based on these prop erties Even in the context of image retrieval the mo del describ ed

needs to b e appropriately extended to supp ort a more structured retrieval based on lo calregion

PREPRINT Please dont distribute

based prop erties Retrieval based on lo cal regionsp ecic prop erties and the spatial relationships

b etween the regions has b een studied in many prototyp es including

Overview of Current Database Technology

In this section we explore how multimedia applications requiring contentbased retrieval can b e

built using existing commercial data management systems Traditionally relational database tech

nology has b een geared towards business applications where data is largely in tabular form with

simple atomic attributes Relational systems usually supp ort only a handful of data typ es a

2

numeric typ e with its usual variations in precision a text typ e with some variations in the as

3

sumptions ab out the storage space available some temp oral data typ es such as date and time

4

with some variations Providing supp ort for multimedia ob jects in relational database systems

p oses many challenges First in contrast to the limited storage requirements of traditional data

typ es multimedia data such as images video and audio are quite voluminous a single record

may span several pages One alternative is to store the multimedia data in les outside of the

DBMS control with only pointers or references to the multimedia ob ject stored in the DBMS This

approach has numerous limitations since it makes the task of optimizing access to data dicult

and furthermore prevents DBMS access control over multimedia typ es An alternative solution is

to store the multimedia data in databases as binary large objects BLOBs which are supp orted

by almost all commercial systems BLOB is a data typ e used for data that do es not t into one of

the standard categories b ecause of its large size or its widely variable length or b ecause the only

needed op eration is storage rather than interpretation analysis or manipulation

While mo dern databases provide eective mechanisms to store very large multimedia ob jects

in a BLOB BLOBs are uninterpreted sequences of bytes which cannot represent the rich internal

structure of multimedia data Such a structure can b e represented in a DBMS using the supp ort for

userdened abstract data typ es ADTs oered by mo dern ob jectoriented and ob jectrelational

databases Such systems also provide supp ort for userdened functions UDFs or metho ds which

can b e used to implement similarity retrieval for multimedia typ es Similarity mo dels implemented

as UDFs can b e called from within SQL allowing contentbased retrieval to b e seamlessly integrated

into the database query language In the remainder of this section we discuss the supp ort for ADTs

UDF and BLOBs in mo dern databases that provides the core technology for building multimedia

database applications

UserDened Abstract Data Typ es

The basic relational mo del requires tables to b e in the rst normal form where every attribute

is atomic This p oses serious limitations in supp orting applications that deal with ob jectsdata

typ es with rich internal structure The only recourse is to translate b etween the complex structure

of the applications and the relational mo del every time an ob ject is read or written This results

in extensive overhead making the relational approach unsuitable for advanced applications that

require supp ort for complex data typ es

2

Typically numeric data can b e of integral typ e fractional data such as oating p oint in various precisions and

sp ecialized money typ es such as packed decimal that retained high precision for detailed money transactions

3

Notably the char data typ e sp ecies a maximum length of a character string and this space is always reserved

Varchar data in contrast o ccupies only the needed space for the stored character string and also has a maximum

length

4

Variations of temp oral data typ es include time date datetime sometimes with a precision sp ecication such as

year down to hours timestamp used to mark a sp ecic time for an event and interval to indicate the length of time

PREPRINT Please dont distribute

These limitations of relational systems have resulted in much research and commercial develop

ment to extend the database functionality with rich userdened data typ es in order to accommo

date the needs of advanced applications Research in extending the relational database technology

has pro ceeded along two parallel directions

The rst approach referred to as the ob jectoriented database OODBMS approach attempts

to enrich ob jectoriented languages such as C and Smalltalk with the desirable features of

databases such as concurrency control recovery and security while retaining supp ort for the rich

data typ es and semantics of ob jectoriented languages Examples of systems that have followed this

approach include research prototyp es such as and a numb er of commercial pro ducts

The ob jectrelational database ORDBMS systems on the other hand approach the problem

of adding additional data typ es by extending the existing relational mo del with the fullblown typ e

hierarchy of ob jectoriented languages The key observation was that the concept of domain of

an attribute need not b e restricted to simple data typ es Given its foundation in the relational

mo del the ORDBMS approach can b e considered a less radical evolution than the OODBMS

approach The ORDBMS approach pro duced such research prototyp es as Postgres and

Starburst and commercial pro ducts such as Illustra The ORDBMS technology has

now b een embraced by all ma jor vendors including Informix IBM DB Oracle

Sybase and UniSQL among others The ORDBMS mo del has b een incorp orated in the

SQL standards

While OODBMSs provide the full p ower of an ob jectoriented language they have lost ground

to ORDBMSs Interested readers are referred to for go o d insight from b oth a technical and

commercial p ersp ective into reasons for this development In the remainder of this chapter we will

concentrate on the ORDBMS approach

The ob jectrelational mo del retains relational mo del concepts of tables and columns in tables

Besides the basic typ es it provides for additional userdened abstract data typ es ADTs as well

as collections of basic and userdened typ es The functions that op erate on these ADTs known

as UserDened Functions UDFs are written by the user and are equivalent to methods in the

ob jectoriented context In the ob jectrelational mo del the elds of a table may corresp ond to basic

DBMS data typ es to other ADTs or can even just contain storage space whose interpretation is

entirely left to the userdened metho ds for the typ e The following example illustrates how a

user may create an ADT and include it in a table denition

create type ImageInfoType date varchar

location latitude real

longitude real location

create table SurveyPhotos photo id integer primary key not null

photographer varchar not null

photo location ImageInfoType not null

photo blob not null

The typ e ImageInfoType denes a structure to store the lo cation at which a photograph was taken

together with the date stored as a string This can b e useful for nature survey applications where

a biologist may wish to attach a geographic lo cation and a date to a photograph This abstract

data typ e is then used to create a table with an id for the photograph the photographers name

the photograph itself stored as a BLOB and the lo cation and date when it was taken

ORDBMSs extend the basic SQL language to allow userdened functions once they are com

piled and registered with the DBMS to b e called directly from within SQL queries thereby pro

viding a natural mechanism to develop domainsp ecic extensions to databases The following

example shows a sample query that calls a userdened function on the typ e declared ab ove

PREPRINT Please dont distribute

to grayscalephoto select photographer convert

from SurveyPhotos

where within distancephoto location

This query returns the photographer and a grayscale version of the image stored in the table The

distance UDF is a predicate that returns true if the place where the image was shot is within within

mile of the given lo cation This UDF ignores the date the picture was taken demonstrating how

predicates are free to implement any semantically signicant prop erties of an application Note

that the UDF convert to grayscale to convert the image to grayscale is not a predicate since it is

applied to an attribute in the select clause and returns a grayscale image

ADTs also provide for typ e inheritance and as a consequence p olymorphism This intro duces

some problems in the storage of ADTs as existing storage mangers assume that all rows in a table

share the same structure Several strategies have b een develop ed to cop e with this problem

including dynamic interpretation and using distinct physical tables for each p ossible typ e of a

larger logical table Section contains more details on this topic

Binary Large Ob jects

As mentioned previously binary large ob jects BLOBs are used for data that do es not t into any

of the conventional data typ es supp orted by a DBMS BLOBs are used as a data typ e for ob jects

that are either large have wildly varying size cannot b e represented by a traditional data typ e

5

or whose data might b e corrupted by character table translation Two main characteristics set

BLOBs apart from other data typ es they are stored separately from the record and their data

typ e is just a string of bytes

BLOBs are stored separately due to their size if placed inline with the record they could

span multiple pages and hence intro duce loss of clustering in the table storage Furthermore

applications may frequently only cho ose to access other attributes and not BLOBs or access

BLOBs selectively based on other attributes Indeed BLOBs have a dierent access pattern than

other attributes As observed in it is unreasonable to assume that applications will read

andor up date all the bytes b elonging to a BLOB at once It is more reasonable to assume that

only p ortions or substrings byte or bit will b e read or up dated during individual op erations To

cop e with such an access pattern many DBMSs distinguish b etween two typ es of BLOBs

regular BLOBs in which the application receives the whole data in a host variable all at once

and

smart BLOBs in which the application receives a handle and uses it to read from the BLOB

using the wellknown le system interfaces open close read write and seek This allows

negrained access to small parts of the BLOB

Besides the ab ove two mechanisms to deliver BLOBs from the database to applications that is

either via whole chunks or via a le interface a third option of a streaming interface is also p ossible

Such an interface is imp ortant for guaranteeing timely delivery of continuous media ob jects such

5

Most DBMSs supp ort data typ es that could b e used to store ob jects of miscellaneous typ es For example a

small image icon can b e represented using a varchar typ e The icon would b e stored inline with the record instead

of separately as would b e the case if the image icon is stored as a BLOB Even though there may b e p erformance

b enets from storing the icon inline say it is very frequently accessed it may still not b e desirable to store it

as a varchar since the icon may get corrupted in transmission and interpretation across dierent hardware due to

the dierences in character set representation across dierent machines Such data typ es sensitive to character translation should b e stored as BLOBs

PREPRINT Please dont distribute

as audio or video Currently to the b est of our knowledge no DBMS oers a streaming interface to

BLOBs Continuous media ob jects are stored outside the DBMSs in sp ecialized storage servers

and accessed from applications directly and not through a database interface This may however

change with the increasing imp ortance of continuous media data in enterprise computing

BLOBs present an additional challenge during query pro cessing Unless a BLOB is part of

a query predicate it is b est to avoid the inclusion of the corresp onding column during query

pro cessing since it saves an extra le access during pro cessing and more imp ortantly since BLOBs

due to their size tend to thrash the database buers used for query pro cessing For this reason

BLOB handles are often used and when the user requests the BLOB content separate database

buers are used to complete this transfer

For access control purp oses BLOBs are treated as a single atomic eld in a record Large

BLOBs could in principle b e shared by multiple users but the most ne grained lo cking unit

in current databases is a tuple or row lo ck which simultaneously lo cks all the elds inside the

tuple including the BLOBs Some of the SQL extensions needed to supp ort parallel op erations

from applications into database systems are discussed in

Supp ort for Extensible Indexing

While userdened ADTs and UDFs provide adequate mo deling p ower to implement advanced

applications with complex data typ es the existing access metho ds that supp ort the traditional

relational mo del ie Btree and hashing may not provide ecient retrieval of these data typ es

Consider for example a data typ e corresp onding to the geographical lo cation of an ob ject A

spatial data structure such as an Rtree or a grid le might provide a much more ecient

retrieval of ob jects based on spatial lo cation than a collection of Btrees each indexing a separate

spatial dimensions Access metho ds that exploit the semantics of the data typ e may reduce the cost

of retrieval As discussed in Chapters and this is certainly true for multimedia typ es such

as images where features ie color texture and shap e used to mo del image content corresp ond

to highdimensional feature spaces Retrieval of multimedia ob jects based on similarity in these

feature spaces cannot b e adequately supp orted using Btrees or for that matter common multi

dimensional data structures such as Rtree and region quadtree that are currently supp orted by

certain commercial DBMSs Sp ecialized access metho ds see Chapter need to b e incorp orated

into the DBMS to supp ort ecient contentbased retrieval of multimedia ob jects

Commercial ORDBMS vendors supp ort extensible access metho ds since it is not

feasible to provide native supp ort for all p ossible typ esp ecic indexing mechanisms These typ e

sp ecic access metho ds can then b e used by the query pro cessor to access data that is implement

typ e sp ecic UDFs eciently While these systems supp ort extensibility at the level of access

metho ds the interface exp orted for this purp ose is at a fairly low level and requires that access

metho d implementors write their own co de to pack records into pages maintain links b etween pages

handle physical consistency as well as concurrency control for the access metho d etc This makes

access metho d integration a daunting task Other cleaner approaches to adding new typ esp ecic

access metho ds are currently a topic of active research and will b e discussed in Section

Integrating External Data Sources

Many data sources are external to database systems therefore it is imp ortant to extend querying

capabilities to such data This can b e accomplished by providing a relational interface to external

data and making it lo ok like tables or by storing external data in the database while maintaining

an external interface for traditional applications to access the data These two approaches are

PREPRINT Please dont distribute

discussed next in more detail

External data can b e made to app ear as an internal database table by registering userdened

functions that access resources external to the database server even including remote services such

as search engines remote servers etc For example Informix has extended its Universal Server to

oer the capability of Virtual Tables VTI in which the user denes a set of functions designed

to access an external entity and make it app ear to b e a normal relational table suitable for searching

and up dating Similarly DB uses table functions and sp ecial SQL TABLE op erators to simulate

the existence of an internal table The primary aim of the table functions is to access external

search engines to assist DB in computing the answers for a query A detailed discussion of their

supp ort is found in

Another approach to integrate external data is based on the realization that much unstructured

data up to resides outside of DBMSs This led several vendors to develop a way to extend

their database oerings to incorp orate such external data into the database while maintaining its

current functional characteristics intact IBM develop ed an extension to their DB database named

Datalinks in which a DBMS table can contain a column which is an external le This le is

accessible by the table it logically resides in and through the traditional le system interface Users

have the illusion of interacting with a le system with traditional le system commands while the

data is stored under DBMS control In this way traditional applications can still access their

data les without restrictions and enjoy the recovery and protection b enets of the DBMS This

functionality implies protection against data corruption

Similarly the Oracle Internet File System addresses the same problem by mo difying

the le system to store les in database tables as BLOBs The Oracle Internet File System is of

interest here b ecause it allows normal users including web servers to access images through le

system interfaces while retaining all DBMS advantages

These advantages translate into small changes to existing delivery infrastructure such as web

servers and text pro cessing programs while retaining advanced functionality including searching

storage management and scalability

Commercial Extensions to DBMSs

We have discussed the evolution of the traditional relational mo del to mo dern extensible database

technology that supp orts userdened abstract data typ es and functions and the ability to call

such functions from SQL These extensions provide a p owerful mechanism for thirdparty vendors to

develop domainsp ecic extensions to the basic data management system Such extensions are called

Datablades in Informix Data Cartridges in Oracle and Extenders in DB Many Datablades are

commercially available for the Informix Universal Server some of which are shipp ed as standard

packages while others can b e purchased separately Example Datablades include the Geodetic

datablade that supp orts all the imp ortant data typ es and functions for geospatial applications

and includes an Rtree implementation for indexing spatiotemp oral data typ es Other Datablades

available are the Timeseries Datablade for timevarying numeric data such as sto cks the Web

Datablade that provides a tight coupling b etween the database server and a web server and a Video

Foundation Datablade to handle video les among others Similar Cartridges and Extenders are

also available for Oracle and DB resp ectively

Besides commercially available DatabladesCartridgesExtenders users can develop their own

domainsp ecic extensions For this purp ose each DBMS supp orts an API that a programmer

must conform to in developing the extensions Details of the API oered by Informix can b e found

in The API supp orted by Oracle referred to as the Oracle Data Cartridge Interface ODCI is discussed in

PREPRINT Please dont distribute

While each of the dierent systems that is Informix Oracle and DB supp ort the notion

of extensibility they dier somewhat in the degree of control and protection oered Informix

supp orts extensibility at a low level with very negrained access to the database server There

are a considerable numb er of ho oks into the server to customize many asp ects of query pro cessing

6

For example for predicates involving userdened functions over userdened typ es the predicate

functions have access to the conditions in the where clause itself This level of access allows for

very exible functionality and sp eed at a certain cost in safety Informix relies on the develop ers

of Datablades to follow their proto col closely and not do any damage Another feature oered by

the Informix Datablade API is allowing UDFs to acquire and maintain memory across multiple

invo cations Memory is released by the server based on the duration sp ecied by the data typ e

ie transaction duration query duration etc Such a feature simplies the task of implementing

certain userdened functions ie userlevel aggregation and grouping op erators

While Informix oers a p otentially more p owerful mo del for extensibility IBM DB is the only

system that isolates the server from faults in UDFs by allowing the execution of UDFs in their own

separate address space in addition to the server address space With this negrained fault

containment errors in UDFs will not bring the database server oline

Image Retrieval Extensions to Commercial DBMSs

In this section we discuss the image retrieval extensions available in commercial systems We

sp ecically explore the image retrieval technologies supp orted by Informix Universal Server Oracle

and IBM DB pro ducts These pro ducts oer a wide variety of desirable features designed to

provide integrated image retrieval in databases We illustrate some of the functionalities oered by

discussing how applications requiring image retrieval can b e built in these systems While other

vendors supp ort a subset of the desired technologies none integrate them to the same degree

resulting in a large eort on the part of customers wishing to create multimedia applications

To demonstrate how image retrieval applications can b e built using database extensions for

commercial DBMSs we will use a very simple example of a digital catalog of color pictures In this

application a collection of pictures is stored into a table For each picture the photographer and

date are stored into a table The basic table schema is

photo id an integer numb er to identify the item in the catalog

photographer a character string with the photographers name

date the date the picture was taken for simplicity we will use a character string instead of

a date datatyp e

photo the photo image and its necessary features for retrieval

The implementation of the photo attribute changes dep ending on the pro duct and is describ ed in

the following subsections In addition to these attributes any additional attributes tables and

steps necessary to store such a catalog in the database and execute contentbased retrieval queries

will b e illustrated in the subsections b elow corresp onding to the three systems discussed

Informix Image Retrieval Datablade

The Informix system includes a complete media asset management suite called Informix Me

diaTM to manage digital content in a central rep ository The pro duct is designed to

6 These are sp ecial userdened functions declared as op erators

PREPRINT Please dont distribute

handle any media typ e including images maps blueprints audio and video and is extensible to

supp ort additional media typ es It manages the entire life cycle of media ob jects from pro duction

to delivery to archiving including access control and rights management The pro duct is integrated

with image video and audio catalogers and image video keyframe and audio contentbased search

functionality This suite includes asset management software and a numb er of contentsp ecic Dat

ablades to tackle datatyp esp ecic needs The Excalibur Visual Retrievalware Datablade is

one such typ esp ecic Datablade that manages the storage transco ding and contentbased search

of images The image Datablade is also used for video keyframe search Image retrieval based on

color texture shap e brightness layout color structure and image asp ect ratio is supp orted Color

refers to the global color content of the image ie regardless of its lo cation Texture seeks to

distinguish such prop erties as smo othness or graininess of a surface Shap e seeks to express the

shap e of ob jects in an image for example a ballo on is a circular shap e Brightness layout captures

the relative energy in an image based on its lo cation in the image and similarly color structure

seeks to lo calize the color prop erties to regions of the image

For each image in the database a similarity score is computed to determine the degree to

which that image satises the query All featuretofeature matches are weighted with usersupplied

weights and combined into a nal score Only those images with a score ab ove a given similarity

threshold are returned to the user and the remaining images are deemed not relevant to the query

The Datablade supp orts datatyp es to store images and their image feature vectors Feature vectors

combine all the feature representations supp orted into a single attribute for the whole image

Therefore no subimage or region searching is p ossible

In order to build an image retrieval application using the image datablade in Informix the

following tasks must b e p erformed

Install Informix with the Universal Data Option and the Excalibur Visual Retrievalware

Datablade pro duct then congure the necessary table and index storage space in the server

Create a Database to store all tables and auxiliary data needed for our example We will call

this the Gal lery database

CREATE DATABASE Gallery

Create a table with the desired elds of which two are for the image retrieval Following our

example this statement creates such a table

collection CREATE TABLE photo

photo id integer primary key not null

photographer varchar not null

date varchar not null

photo IfdImgDesc not null

fv IfdFeatVect

The photo eld stores the image descriptor and the fv eld stores the feature vector for the

image which will b e used for contentbased search

Insert data into the table with all the values except for the fv eld which will b e lled

elsewhere

INSERT INTO photo collection photo id photographer date photo VALUES

Ansel Adams

IfdImgDescFromFiletmpjpg

PREPRINT Please dont distribute

Notice that the feature vector attribute was not sp ecied and thus retains a value of NULL

More photo collection entries can b e added using this metho d

At a later time the features are extracted to p opulate the fv attribute in the table

collection UPDATE photo

SET fv GetFeatureVectorphoto

WHERE fv IS NULL

This command sets the feature vector attribute for tuples where the features have not yet

b een extracted ie where the fv attribute is NULL The features are extracted from each

photo with the GetFeatureVector userdened function that is part of the Datablade Manu

ally extracting the feature information and up dating it in the table is desirable if many images

are loaded quickly and feature extraction can b e p erformed at a later time An alternative to

manual feature extraction is to automatically extract the features when each tuple is inserted

or up dated To accomplish this a database trigger can b e created that will automatically

execute the ab ove statement whenever there is an up date to the tuple

Once the Images are loaded and the features extracted the Resembles function is used to retrieve

those images similar to a given image The Resembles function accepts a numb er of parameters

The database image and query feature vectors to b e compared

A real numb er b etween and that is a cut o threshold in the similarity score Only images

that match with a score higher than the threshold are returned We refer to such a cuto as

the alpha cut value

A weighting value for each of the features used The weights do not have to add up to any

particular value but taken together they cannot exceed Weights are relative so the

weights and are equivalent

An output variable that contains the returned match score value

Query the photo col lection table with an exampleimage

The user provides an image feature vector as a query template This feature vector can either

b e stored in the table or corresp ond to an external image Using a feature vector for an

image already in the table requires a self join to identify the query feature vector A feature

vector for an external image requires calling the GetFeatureVector userdened function

The rst example uses an image already in the table the one with image id as the query

image

id score SELECT gphoto

FROM photo collection g photo collection s

WHERE

sphoto id

AND

Resemblesgfv sfv score REAL

ORDER BY score

The Resembles function takes two extracted feature vectors here gfv and sfv computes a

similarity score and compares it to the indicated threshold In this example the threshold is

which means all images will b e returned to the user Following the threshold six values

in the argument list identify the weights for each of the features Here only the rst three

PREPRINT Please dont distribute

features color shap e and texture are used while the remaining three are unused their

weights are set to The last parameter is an output variable named score of typ e REAL

which contains the similarity score for the image match b etween the query feature vector sfv

and the images stored in the table The score is then used to sort the result vectors to provide

a ranked output

The next example uses an external image as the query image with all features used for

matching and a nonzero threshold sp ecied

SELECT photo id score

FROM photo collection

WHERE Resemblesfv

GetFeatureVectorIfdImgDescFromFiletmpjpg

score REAL

ORDER BY score

Note how the features are extracted insitu by the GetFeatureVector function and passed to

the Resembles function to compute the score b etween each image and the query image In

this query only those images with a match score greater than will b e returned

DB UDB Image Extender

IBM oers a full content management suite that like the Media Asset Management Suite of In

formix provides a range of content administration delivery privilege management protection and

other services The IBM Content Manager pro duct has evolved over a numb er of years incorp o

rating technology from several sources including OnDemand DB ImagePlus and

VideoCharger The early fo cus of these pro ducts was to provide integrated storage for and access to

data of diverse typ es ie scanned handwritten notes images etc These pro ducts however only

provided search based on metadata For example searching was supp orted on manually entered

attributes asso ciated with each digitized image but not on the image itself This however changed

7

with the conversion of the IBM QBIC prototyp e image retrieval system into a DB Extender DB

now oers integrated image search from within the database via the DB UDB Image Extender

which supp orts several color and texture feature representations

In order to build the image retrieval application using the Image Extender the following tasks

need to b e p erformed

Install DB and the Image Extender and congure the necessary storage space for the server

This installs a numb er of extender supplied userdened distinct typ es and functions

Create a Database to store all tables and auxiliary data needed for our example We will call

this the Gal lery database

CREATE DATABASE Gallery

7

QBIC standing for Query By Image Content was the rst commercial contentbased Image Retrieval system

and was initiall y develop ed as an IBM research prototyp e Its system framework and techniques had profound

eects on later Image Retrieval systems QBIC supp orts queries based on example images userconstructed sketches

and drawings and selected color and texture patterns The color features used in QBIC are the average RGB

YiqLab and MTM Mathematical Transform to Munsell co ordinates and a k element Color Histogram Its

texture feature is an improved version of the Tamura texture representation ie combinations of coarseness

contrast and directionali ty Its shap e feature consists of shap e area circularity eccentricity ma jor axis orientation

and a set of algebraic moments invariants

PREPRINT Please dont distribute

Enable the Gal lery database for Image searches From the command line not the SQL

use the Extender manager and execute

dbext ENABLE DATABASE Gallery FOR DBIMAGE

This example uses the DB UDB version for UNIX and Microsoft Windows op erating systems

Create a table with the desired elds

CREATE TABLE photo collection

id integer PRIMARY KEY NOT NULL photo

photographer varchar NOT NULL

date varchar NOT NULL

photo DBIMAGE

Enable the table photo col lection for contentbased image retrieval This step again uses the

external Extender manager and is comp osed of several substeps

Set up the main table create auxiliary tables and indexes

dbext ENABLE TABLE photo collection FOR DBIMAGE USING TSPLTSP

This creates some auxiliary supp ort tables used by the Extender to supp ort image re

col lection table These tables are stored in the database tablespace trieval for the photo

named TSP while the supp orting large ob jects BLOBs are stored in the LTSP

tablespace The necessary indexes on auxiliary tables are also created in this step

Enable the photo column for contentbased image retrieval This step again uses the

external Extender manager

dbext ENABLE COLUMN photo collection photo FOR DBIMAGE

This makes the photo column active for use with the Image Extender and creates triggers

that will up date the auxiliary administrative tables in resp onse to any change insertion

col lection deletion up date to the data in table photo

Create a catalog for querying the column by image content This is done with the

extender manager

collection photo ON dbext CREATE QBIC CATALOG photo

This creates all the supp ort tables necessary to execute a contentbased image query

The keyword ON indicates that the cataloging pro cess ie the feature extraction will

b e p erformed automatically otherwise p erio dic manual recataloging is necessary

Op en a catalog for adding features for which feature extraction is to take place only

those features present in the catalog will b e available for querying Using the Extender

manager we issue the following command

dbext OPEN QBIC CATALOG photo collection photo

Add those features for which feature extraction should take place to the catalog Here

we will add all four supp orted features

dbext ADD QBIC FEATURE QbColorFeatureClass

dbext ADD QBIC FEATURE QbColorHistogramFeatureClass

dbext ADD QBIC FEATURE QbDrawFeatureClass

dbext ADD QBIC FEATURE QbTextureFeatureClass

These corresp ond to Average Color Histogram Color Positional Color and Texture

PREPRINT Please dont distribute

Not all features need to b e present including unnecessary features will only decrease

p erformance

Close the catalog

dbext CLOSE QBIC CATALOG

col lection table The examples presented here use embedded SQL to Insert into the photo

access a DB database server

EXEC SQL BEGIN DECLARE SECTION

long int Stor

id long the

EXEC SQL END DECLARE SECTION

the id the image id

Stor MMDB STORAGE TYPE INTERNAL int

EXEC SQL INSERT INTO photo collection VALUES

id id the

Ansel Adams name

date

DBIMAGE Image Extender UDF

CURRENT SERVER database server name

imagespicjpg image source file

ASIS keep image format

int Stor store in DB as BLOB

BW Picture comment

This insert p opulates the image data in the auxiliary tables and stores an image handle into

the photo col lection table The DBIMAGE userdened function uses the current server

reads the image lo cated in imagespicjpg and stores it in the server as sp ecied by the

int Stor variable The image is stored without a format change and the discovery of the

image format is left to the Image Extender this is sp ecied by the ASIS option Features are

extracted and stored for the image The comment BW picture is attached to the image in

the auxiliary tables The DBIMAGE userdened function oers several dierent parameter

lists that is it is an overloaded function to supp ort dierent sources to imp ort images

col lection table with an exampleimage Query the photo

SELECT Tphoto id Tphotographer SSCORE

collection T FROM photo

TABLE QbScoreTBFromStr

QbColorFeatureClass color and

QbColorHistogramFeatureClass fileserverimgpic and

QbDrawFeatureClass fileserverimgpicgif and

QbTextureFeatureClass fileserverimgpicgif

photo collection

photo

AS S

ID as varchar CASTTphoto as varchar WHERE CASTSIMAGE

This query uses the image stored in imgpicgif as a query image and uses all four fea

tures The QbScoreTBFromStr userdened function takes a query string an enabled table

photo col lection a column photo name and a maximum numb er of images to return

PREPRINT Please dont distribute

This userdened function returns a table with two columns The rst column is named IM

AGE ID and contains the image handle used by the Image Extender in the original table e

col lection The second column is named SCORE and is a numeric value which table photo

denotes the query to image similarity score interpreted as a distance A score of denotes a

p erfect match and higher values indicate progressively worse matches

The query string is structured as an and separated chain of feature name feature value

feature weight triplets The feature name indicates which feature to match The feature

value is a sp ecication of the value for the desired feature and can b e sp ecied in several

ways literally sp ecifying the values which is cumb ersome as it requires that the user

know the internal representation of each feature an image handle returned by the image

Extender itself so an already stored image can b e used as the query and an external le

for which the features are extracted and used The ab ove example uses the rst approach for

the average color feature sp ecifying an average color of red The remaining three features

use the third approach and use an external image from which features are extracted for the

query The feature weight indicates the weight for this feature and is relative to the other

features if a weight is omitted then a default value of is assumed

The table returned by the QbScoreTBFromStr userdened function is joined on the image

handle with the photo col lection table to retrieve the photo id and photographer attributes

and keep the score of the image match with the query

Oracle Visual Image Retrieval Cartridge

Like Informix and IBM Oracle supp orts a comprehensive media management framework named

Oracle Intermedia that incorp orates a numb er of technologies Oracle Intermedia is designed with

the ob jective of managing diverse media by providing many services from long term archival to

contentbased search of text and images to video storage and delivery The Oracle Intermedia

media management suite contains a numb er of pro ducts designed to manage rich multimedia

content particularly in the context of the web Sp ecically it includes comp onents to handle audio

image and video data typ es A sample application for this pro duct would b e an online music store

that wishes to oer music samples photos of the CD cover and p erformers and a sample video

of the p erformers Intermedia is a to ol b ox that includes a numb er of ob jectrelational datatyp es

indices etc that provide storage and retrieval facilities to web servers and other delivery channels

including streaming video and audio servers The actual media data can b e stored in the server for

full control or externally without full transactional supp ort in a le system web server streaming

server or other userdened source Functions implemented by this suite include among others

dynamic image transco ding to provide b oth thumbnails and full resolution images to the client

up on request As part of the Intermedia suite the Oracle Visual Image Retrieval VIR pro duct

8

supplied by Virage provides image search capabilities

VIR supp orts matching on global color lo cal color texture and structure Global color captures

the images global color content while lo cal color takes into account the lo cation of the color in

the image Texture distinguishes dierent patterns and nuances in images such as smo othness

or graininess Structure seeks to capture the overall layout of a scene such as the horizon in a

photo or the tall vertical b oxes of skyscrap ers The pro duct supp orts arbitrary combinations of

the supp orted feature representations as a query Users can adjust the weights asso ciated with the

features in the query according to the asp ects they wish to emphasize A score that incorp orates

the matching of all features is computed for each image via a weighted summation of the individual

8

Virage also provided a version of its image retrieval system to Informix and is supp orted as a Datablade

PREPRINT Please dont distribute

feature matches The score is akin to the distance b etween two images where lower p ositive values

indicate higher similarity while larger values indicate lower similarity Only those images with a

score b elow a given threshold are returned and the remaining images are deemed not relevant

to the query Oracle Visual Image Retrieval uses a proprietary index to sp eed up the matching

referred to as an index of typ e ORDVIRIDX

We now sp ecify the steps needed to build an image retrieval application The example co de

presented b elow uses Oracles PLSQL language extensions PLSQL is a pro cedural extension to

SQL To supp ort image retrieval the following steps are required

Have Oraclei Enterprise Edition and the Visual Image Retrieval pro duct installed and suit

ably congured storage tablespaces

Create a Database to store all tables and auxiliary data needed for our example We will call

this the Gal lery database

CREATE DATABASE Gallery

Create a table with the desired attributes and the image datatyp e

CREATE TABLE photo collection

id number PRIMARY KEY NOT NULL photo

photographer VARCHAR NOT NULL

date VARCHAR NOT NULL

photo ORDSYSORDVir

Insert images into the newly created table In Oracle this will b e done through their PLSQL

language as there are multiple steps to insert an image

DECLARE

image ORDSYSORDVIR

the id NUMBER

BEGIN

id use a serial number the

INSERT INTO photo collection VALUES

the id Ansel Adams

ORDSYSORDVIRORDSYSORDImageORDSYSordsource

BLOB FILE ORDVIRDIR the imagejpg sysdate empty

NULL NULL NULL NULL NULL NULL NULL NULL

SELECT photo INTO image

FROM photo collection

WHERE photo id the id

FOR UPDATE

imageSetProperties

imageimportNULL

imageAnalyze

UPDATE photo collection

SET photo image

id WHERE id the

END

The insert command only stores an image descriptor not the image itself To get the image

rst its prop erties have to b e determined using the SetProperties command Then the

image itself is loaded in with the importNULL command and its features extracted with the

Analyze command Lastly the table is up dated with the image and its extracted features

PREPRINT Please dont distribute

Create an index on the features to sp eed up the similarity queries

CREATE INDEX imgindex

photosphotosignature ON catalog

INDEXTYPE IS ordsysordviridx

PARAMETERS ORDVIR DATA TABLESPACE tbs

INDEX TABLESPACE tbs ORDVIR

and tbs are suitable tablespaces that provide storage Here tbs

Query the catalog photos table

The following example selects images that are similar to an image already in the table with

id equal to

SELECT Tphoto id Tphoto ORDSYSVIRScore SCORE

FROM catalog photos T catalog photos S

WHERE

id Sphoto

AND

ORDSYSVIRSimilarTphotosignature Sphotosignature

globalcolor localcolor texture structure

This statement returns three columns the rst one is the id of the returned image the second

column is the image itself and the third column is the score of the similarity b etween the

query image and the result image the parameter to the VIRScore function is discussed b e

low The query do es a self join to fetch the value Sphotosignature for the image with an id of

which is the signature of the query image The image similarity computation is p erformed

by the VIRSimilar function in the query condition This function has ve arguments

Tphotosignature the compared images features

Sphotosignature the query image features

A string value that describ es the features and weights to b e used in matching This

example has the string

globalcolor localcolor texture structure

The value for a weight indicates the feature is unimp ortant and the value indicates

the highest imp ortance for that feature Only those features listed are used for matching

If for example global color is not needed then it may b e removed from the list In

this example all features are used and their weights are for global color for lo cal

color for texture and for structure

The fourth parameter is a threshold for deciding which images are similar enough to

the query signature to b e returned for the query The Image Retrieval Cartridge uses a

distance interpretation of similarity A score of indicates the signatures are identical

while scores higher than indicate progressively worse matches In this example the

threshold value is ie those images with a score larger than will not b e returned

in resp onse to the query

The last value is optional and is used to recover the computed similarity score The alert

reader may have noticed that the VIRSimilar function is in a where clause a Bo olean

condition and therefore must return true or false as opp osed to the computed similarity

score The function returns true if the computed score is b elow the threshold and false

otherwise If the query wishes to list for each retrieved image its similarity score to the

PREPRINT Please dont distribute

query as is the case here a dierent mechanism is needed to retrieve the score elsewhere

in the query This parameter value is thus used to uniquely identify the similarity score

computed by the VIRSimilar function within the query in order to make it available

elsewhere in the query through the use of the VIRScore function VIRScore retrieves

the similarity score by providing the same numb er as in the VIRSimilar function This

keybased identication mechanism enables multiple calls to scoring functions within the

same query

The nal step in the query is to sort the result in increasing order of SCORE such that the

most similar image will b e the rst one returned

This example uses an image already in the table as the query image but an external image

may also b e used To do this extra steps are needed similar to the insert command where

an external image is read in and its features extracted and used in the VIRSimilar function

This scenario do es not require a self join as the query feature vector is directly accessible

Additional functionality is provided by a thirdparty software package from Visual Technology

This comp onent supp orts sp ecialpurp ose op erators for searching for human faces among images

stored in the database Besides image search the Visual Image Retrieval package oers a numb er of

additional op erational options such as image format conversion and ondemand image transco ding

of query results

Discussion

We have discussed the extensions supp orted for incorp orating images and multimedia into databases

by three of the ma jor DBMS vendors All the vendors discussed oer media asset management suites

to archive and manage digital media Their oerings dier in the details of their comp osition scop e

and source ie third party vs home grown and their maturity The image retrieval capabilities of

all vendors are roughly comparable Despite minor administrative dierences in table and column

setup once the tables and p ermissions are set prop erly the insertion and querying pro cess is

comparable Each of the image retrieval pro ducts discussed ab ove essentially supp orts the base

contentbased image retrieval mo del discussed in Section There is however one dierence

Recall that in Section the mo del p ermits several queryexample images but so far in this

section we only considered singleexampleimage queries Multiple exampleimage query supp ort is

b eyond the current query mo del implemented by these vendors but is not imp ossible to implement

Indeed the mo del can b e incorp orated in a query alb eit in an exp osed fashion Exp osed b ecause

now the user writing the query is exp osed to the retrieval mo del and is resp onsible for formulating

a query prop erly To see how such a query can b e sp ecied we will use Informix as an example

SELECT photo id score score as score

collection FROM photo

WHERE Resembles fv

GetFeatureVectorIfdImgDescFromFileimgqueryjpg

score REAL

AND Resembles fv

GetFeatureVectorIfdImgDescFromFileimgqueryjpg

score REAL

ORDER BY score

This query uses two external images queryjpg and queryjpg and computes the score b etween

each individual image in the table and the queryjpg and queryjpg image feature vectors fv result

ing in one score for each of the two exampleimages Then it combines b oth scores with a weighted

PREPRINT Please dont distribute

Query

P

P

P

P

P

P

P

Pq

queryjpg queryjpg

Q Q

B B

Q Q

B Q B Q

10 5 20 5

Q Q

B B

Q Q

30 40 10 5 20 20 5 5

B Qs R B Qs R

BN BN

v v v v v v v v v v v v

Figure Query example

summation with of weight for queryjpg and of the weight to queryjpg Notice that

b oth Resembles function calls sp ecify a threshold of and that they use dierent weights for

dierent features Figure shows the query tree that corresp onds to this example In this gure

the leaf no des corresp ond to actual values v for the query image i and the feature j

ij

It is not clear if the featurebased mo del describ ed in Section can b e supp orted by existing

systems Furthermore we note that none of the pro ducts currently available is p owerful enough to

supp ort regionbased image retrieval relevance feedback mechanisms or merge functions other

than weighted summation at dierent levels of the query tree Extending image retrieval with these

functionalities is a signicant research challenge which the research community has yet to address

Finally we note that the ab ove description of the image extensions to commercial DBMSs

is certainly not complete Besides contentbased image retrieval pro ducts include many functions

that are designed to handle op erational considerations such as image format conversions and image

pro cessing Interested readers are referred to the pro duct manuals

Current Research

The advent of ob jectrelational technology has greatly facilitated the building of contentbased

image retrieval applications on top of commercial database systems Using ADTs UDFs and

BLOBs supp orted by commercial systems applications can extract the visual features of images

store the features in relational tables along with the raw images themselves and use these features

to compute query matches While this represents signicant progress several prop osals to improve

the ab ove approach have app eared in the research literature One of them is an ecient technique

to implement abstract data typ es on top of relational storage managers Another issue is that

of allowing users to easily integrate multidimensional indexing structures into the DBMS and use

them as access metho ds This will allow applications to build indexes on the image features and use

them to eciently answer contentbased queries To realize the full p otential of the feature indexes

the pro cessing of contentbased queries ie topk and thresholdbased queries must b e pushed

inside the database engine Commercial systems such as those describ ed in Section retrieve all

the matching ob jects from the database with their computed scores and then p erform most of the

pro cessing ie sorting and pruning as an indep endent step This misses out on opp ortunities

for optimization and ecient evaluation of contentbased queries Pushing the pro cessing into the

engine would op en up such optimization opp ortunities leading to tremendous p erformance gains

Another prop osal is that of ecient supp ort for similarity joins to facilitate nding similar pairs of images

PREPRINT Please dont distribute

Implementing ADTs using Relational Storage Managers

While abstract data typ es ADTs have app eared in mainstream commercial databases

they present several challenges in terms of storage management Abstract data typ es

supp ort varied functionalities such as inheritance p olymorphism substitutability encapsulation

structures and collections among others We discuss the storage management problems that arise

when an ADT is dened as an aggregation of base data typ es andor already dened ADTs like

the way structs are dened in C In our discussion we do not consider opaque typ es where

the system treats the typ e as an uninterpreted chunk of memory which is interpreted by the

userdened functions We also do not consider the functions dened for the ADT here but

will cover them in more detail in Section

9

Consider an ADT for a few geometric shap es in two dimensions

Type dShape

Type Point inherits from dShape xy integer

Type Circle inherits from dShape xy radius integer

Type Rectangle inherits from dShape xy xy integer

Type Triangle inherits from dShape xy xy xy integer

Supp ose we want to asso ciate one or more regions with each image in our Gal lery database from

Section The idea is to create image maps for the users to click on Assuming that each region

10

is one of the ab ove d shap es we now create a table to store the regions

regionsregion id integer photo id integer region dShape create table photo

This table would allow a region to b e a p oint a circle a rectangle or a triangle by virtue of typ e

11

substitutability Now we add the following data to our table

insert into photo regions values Point

regions values Circle insert into photo

insert into photo regions values Rectangle

insert into photo regions values Triangle

Here we insert a p oint lo cated at the co ordinates a circle of radius centered at the

p oint a rectangle with opp osing corner p oints and and a triangle with the

three corners and Each tuple ab ove stores one integer for the region id one

id plus an internal xed sized tuple header whose size dep ends on the DBMS integer for the photo

used The rest of the information in all four tuples diers from each other the rst tuple needs

to store two more integers in addition to header region id and photo id the second needs three

more the third needs four more and the fourth needs to store six more integers The question

is how to organize the ab ove tuples on disk in order to handle the diversity among them without

sacricing query eciency The following options are available

One table for each possible data type

Within this scheme although there is only one logical table photo regions the system creates

12

physical tables one for each of dShap e Point Circle Rectangle and Triangle Each

9

Here we have used generic SQL pseudoco de For sp ecic vendor implementations and syntax the appropriate

manual should b e consulted

10

More exible shap es such as p olygons are necessary for such an application Here we restrict ourselves to the

ab ove shap es as our goal is to show the problems faced in storage management of ADTs and not to provide a

complete image mapping solution

11

We assume that the appropriate ob ject constructors have b een dened ie Pointxy has b een dened as a

constructor for the Point typ e

12

There should b e no table for dShap e itself assuming it is a pure abstract data typ e whose only purp ose is to

PREPRINT Please dont distribute

table has a uniform and xed schema and it is up to the query pro cessor to lo ok in all the

physical tables for a query on the logical table The advantages of this approach are that

regular relational storage managers can b e used the tuples have xed length and are therefore

more amenable to optimizations and since the schema for each physical table is known in

advance no dynamic interpretation of ob ject typ es is needed The disadvantages are that

the query pro cessor must decide which tables to search for a query and on some o ccasions

it might b e necessary to search all the physical tables More imp ortantly there could b e an

explosion in the numb er of physical tables if the inheritance hierarchy is deep The Postgres

and Illustra systems used an approach similar to this one

Colocate tuples of dierent types in one table

In this approach a single physical table is used to store all the ve dierent typ es of tuples

Like approach almost no dynamic deco ding of the ob ject typ e is required as the layout

and column information of each tuple typ e is fully precomputed Another advantage is that

all the tuples are stored in the same table avoiding the need for multipletable lo okups The

disadvantage is that there could b e an explosion of the numb er of tuple typ es in the table

This approach is used in the MARS le system

Flatten ADT and map to regular relation

In this approach all the tuples are stored in a single table which is managed by a regular

relational storage manager ie the storage manager need not supp ort multiple tuple typ es

p er relation All the typ es are expanded into their comp onents and stored in individual

columns of the table The columns that are not used are lled with NULL values In this

regions table would have columns region id photo id columns for approach the photo

13

Point for Circle for Rectangle and for Triangle and most of them will contain NULL

values as only those columns that corresp ond to the actual ob ject stored will b e non NULL

ie only columns for a Point ob ject would b e non NULL and the remaining would

b e NULL The advantage of this approach is that it can b e readily implemented in regular

relational storage managers The disadvantages are that space is wasted and additional work

is needed in the query pro cessor to dynamically interpret the correct columns for a tuple Note

that as in the previous approaches there could b e an explosion in storage requirements since

here the numb er of columns needed may increase rapidly

Serialize object inline and dynamical ly interpret content to determine type

In this approach the table schema is exactly as desired with three attributes one each for

region id photo id and dShap e The last attribute is now stored as a variable length column

either inline in the tuple or out of line in a BLOB if its size is to o large This approach

has several advantages It is the most exible since it can most easily handle changes in the

typ e hierarchy the other three approaches must make signicant changes in the schema of

the tables when the typ e hierarchy changes It also avoids the combinatorial explosion and

space overhead problems of the previous approaches The only disadvantage is the overhead

of dynamically interpreting the contents of each tuple to determine its typ e IBM DB follows

a similar approach to store ADTs Sybase also follows a similar approach for Java ob jects

serve as the common sup erclass ie there can b e no instantiation s of this typ e However here we have included

dShap e as a normal ADT

13

In practice one more column is needed to keep track of which ob ject is stored in the table

PREPRINT Please dont distribute

Table T Tuple t Collectionattribute A

Co-location Sidetable

Co-located Base Side Table Table Table t t h h val1 val1 h val2 val2 h val3

Overflow

val3

(a) (b)

Figure Two ways of implementing collections inside attributes a The colo cation approach

b The side table approach

A study on the implementation of ADTs comparing several variations of approaches and can

b e found in Storage management of rich data typ es has also b een addressed by ob jectoriented

databases

Another imp ortant problem in ORDBMSs that is relevant to image retrieval is the management

of collectiontyp e attributes An example of such an attribute is the p olygonal contour shap e

descriptor represented by the corner p oints the numb er of which can vary from shap e to shap e

Such attributes are usually handled by colo cation or by using side tables In the colo cation

approach the items in the collection attribute of a tuple are stored along with the rest of the tuple

with an optional p ointer to an overow area Figure a shows how the items fv al v al v al g in

the collection attribute A of a tuple t would b e stored The advantage of this approach is eciency

as most of the time all the items in the collection will b e colo cated with the tuple ie no overow

p ointer required The disadvantages are that up dates to a tuple may cause the use of the overow

areas thus increasing fragmentation and degrading p erformance Also the storage manager must

b e able to supp ort co existence of tuples from dierent schemas in the same le In the side table

approach there is a base table and there is a separate side table for each column with a collection

attribute Each tuple t in the base table stores a system generated handle h for each collection

attribute A The handle h is a key into a side table where a tuple hh itemi is stored for each item

item in the collection attribute A of tuple t Figure b shows how the same table t with items

fv al v al v alg in the collection attribute A is stored using this approach An advantage of this

approach is that it do es not require any sp ecial supp ort from storage managers as all tuples in

a relation are the same Fragmentation in the side table can also b e avoided by having the le

clustered on the handle A clear disadvantage is that p otentially exp ensive joins or table lo okups are needed

PREPRINT Please dont distribute

Multidimensional Access Metho ds

As discussed in Section image retrieval systems represent the content of the images using visual

features like color texture and shap e Pro cessing contentbased queries on large image collections

can b e sp eeded up signicantly by building indices on the individual features known as the feature

indices or simply Findices and using them to answer contentbased queries Since the feature

spaces are highdimensional in nature ie dimensional color histogram space novel indexing

techniques must b e develop ed and incorp orated into the DBMS cf Chapters and The

purp ose of a feature index is to eciently retrieve the b est matches with resp ect to that feature

by executing a range search or a k nearest neighb or k NN search on the multidimensional index

structure How these individual feature matches returned by the feature indices can b e used to

obtain the overall b est matches will b e discussed in Section In this section we discuss research

issues that arise in designing index structures that can execute range and k NN searches eciently

over image featurespaces in supp orting new typ es of queries for image retrieval applications and

in integrating into the DBMS multidimensional index structures

Designing Index Structures for Image Feature Spaces

The main problem that arises in indexing image feature spaces is high dimensionality For example

the color histograms used in the MARS system are usually or dimensional Many

multidimensional index structures do not scale to such high dimensionalities Designing scalable

index structures has b een an active area of research and is discussed in detail in Chapters and

Supp orting Multimedia Queries on top of Multidimensional Index Structures

Traditionally multidimensional index structures supp ort only p oint range and k NN queries with

a single query p oint and only the Euclidean distance function In multimedia retrieval the retrieval

mo del denes what a match b etween two images means with resp ect to each individual feature The

measure of match rather mismatch is dened in terms of a distance function The retrieval mo del

uses arbitrary distance functions typically an L metric and arbitrary weights along the dimensions

p

of the feature space to capture the visual p erception of the user This implies that the index

structure must supp ort arbitrary distance functions and arbitrary weights along the dimensions

that are sp ecied by the user at query time Such techniques have b een develop ed in

Another requirement of the index structure is to supp ort multiexample queries since the user

might submit multiple images as part of the query see section Such queries are particularly

imp ortant for retrieval mo dels that represent the query using multiple query p oints A

multip oint query Q for a feature F is formally dened as Q hn P W D i where n is the

F F F F F F F

(1) (n )

F

numb er of p oints in the query P fP P g is the set of n p oints in the feature space

F F F F

(1) (n )

F

W fw w g are the corresp onding weights and D is a distance function which given

F F F F

two p oints in the feature space returns the distance b etween them usually a weighted L metric

p

The distance b etween the multip oint query Q and an ob ject p oint O with resp ect to feature F

F F

(i)

is dened as the aggregate of the distances b etween O and the individual p oints P P in

F F F

Q The weighted sum

F

n

F

X

(i) (i)

D Q O w D P O

F F F F F F F

i=1

PREPRINT Please dont distribute

(i)

may b e used as an aggregation function The individual p ointtop oint distance D P O is

F F F

given by a weighted L metric

p

1p

d

F

X

p

(i) (j ) (i)

D P O jP j O j j

F F F F F F

j =1

(j )

denotes the intrafeature weight where d is the dimensionality of the feature space and

F F

asso ciated with the j th dimension This is the aggregation function used in the MARS system

The problem is to nd the k nearest neighb ors of Q using the Findex

F

One way to implement a multip oint query is the multiple expansion approach prop osed by

Porkaew et al and Wu et al The approach explores the nearest neighb ors of each

(i)

individual p oint P using the traditional singlep oint k NN algorithm and combines them An

F

alternate way prop osed in is to develop a new k NN algorithm that can handle multip oint

queries The latter technique involves redening the MINDIST function which is used to

compute the distance of an index no de from the query and using the distance function describ ed

ab ove to compute the distance of a indexed ob ject from the multip oint query Exp eriments show

that the latter technique can pro cess a multip oint query much more eciently compared to the

multiple expansion approach

Integration of Multidimensional Index Structures as Access Metho ds in a DBMS

While there exists several research challenges in designing scalable index structures and developing

algorithms for ecient contentbased search using them one of the most imp ortant practical chal

lenges is that of integration of such indexing mechanisms as access metho ds AMs in a database

management system Building a database server with native supp ort for all p ossible kinds of

complex multimedia features and the featuresp ecic indexing mechanisms along with supp ort for

featuresp ecic queriesop erations is not feasible The solution is to build an extensible database

server that allows the application develop er to dene data typ es and related op erations as well as

indexing mechanisms on the stored data which the database query optimizer can exploit to access

the data eciently As discussed in Section commercial ORDBMSs have started providing

extensibility options for users to incorp orate their own index structures As p ointed out earlier

the interfaces exp osed by current commercial systems are to o lowlevel and place the burden of

writing structural maintenance co de ie concurrency control on the access metho d implementor

The Generalized Search Tree GiST provides a more elegant solution to the ab ove problem

Generalized Search Tree GiST A GiST is a balanced tree of variable fanout b etween k M

2 1

and M k with the exception of the ro ot no de which may have fanout b etween and M

M 2

The constant k is termed the minimum ll factor of the tree Leaf no des in a GiST contain p ptr

pairs where p is a predicate that is used as a search key and ptr is the identier of some tuple in

the database Nonleaf no des contain p ptr pairs where p is a predicate used as a search key

referred to as b ounding predicate BP and ptr p oints to another tree no de The predicates

can b e arbitrary as long as they satisfy the following condition the predicate p in a leaf no de entry

p ptr must hold for the tuple identied by ptr while the b ounding predicate p in a nonleaf no de

entry p ptr must hold for any tuple reachable from ptr A GiST for a key set comprised of d

rectangles is shown in Figure

Generalizing the notion of a search key to an arbitrary predicate makes GiST extensible b oth in

the datatyp es it can index and the queries it can supp ort GiST is like a template the applica

tion develop er can implement a new AM using GiST by simply registering a few domainsp ecic

PREPRINT Please dont distribute

x=6 x=10

y=11

N1 O2 x=3 O1 x=5 P1: P2: O4 x<10 and x>3 and O3 y>4 y<6 y=9

N2 N3 x=11 x=14 P3: P4: P5: P6: x>3 and x<5 and x>6 y<5 and x>11 and O5 y>9 and y<11 x<8 x<14 y=6 O10 y=5 O9 O7 N4 N5 N6 N7 y=4

O1 O2 O3 O4 O5 O6 O7 O8 O9 O10 O6 O8

x=3 x=8

Figure A GiST for a key set comprised of rectangles in dimensional space Note that the

b ounding predicates are arbitrary ie not necessarily b ounding rectangles as in Rtrees

extension metho ds with the DBMS Examples of the extension metho ds are C onsistentE q

which given an entry E p ptr and a query predicate q returns false if p q can b e guaranteed

unsatisable and true otherwise and P enal ty E E which given two entries E p ptr

1 2 1 1 1

E p ptr returns a domainsp ecic p enalty for inserting E into the subtree ro oted at

2 2 2 2

E GiST uses the extension metho ds provided by the AM develop er to implement the stan

1

dard index op erations search insertion and deletion For example the search op eration uses

C onsistentE q to determine which no des to traverse to answer the query while the insert op

eration uses P enal ty E E to determine the leaf no de in which to place the inserted item in

1 2

The AM develop er thus controls the organization of keys within the tree and the b ehavior of the

search op eration thereby sp ecializing GiST to the desired AM The original GiST pap er deals only

with range queries Several extensions to supp ort more general queries ie rankednearest

neighb or queries on top of GiST are prop osed in

Concurrency Control in GiST Although GiST considerably reduces the eort of integrating

new AMs in DBMSs it do es not automatically provide concurrency control It is essential to

develop ecient techniques to manage concurrent access to data via the GiST b efore it can b e

supp orted by commercial strength DBMSs Concurrent access to data via an index structure

intro duces two indep endent concurrency control problems

Preserving consistency of the data structure in the presence of concurrent insertions deletions

and up dates

Protecting search regions from phantoms

Techniques for concurrency control CC in multidimensional data structures and in particular

GiST have b een prop osed recently Developing CC techniques for GiST is particularly

b enecial since the CC co de can b e implemented once by the database develop er the enduser

do es not need to implement individual algorithms for each AM

Preserving Consistency of GiST We rst discuss the consistency problem and its solution

Consider a GiST congured as say and Rtree with a ro ot no de R and two children no des A and

PREPRINT Please dont distribute

B Consider two op erations executing concurrently on the Rtree an insertion of a new key k into

B and a deletion of a key k from B Supp ose the deletion op eration examines R and discovers

that k if present must b e in B Before it can examine B the insertion op eration causes B to

0 0

split into B and B as a result of which k moves to B and subsequently up dates R The delete

op eration now examines B and incorrectly concludes that k do es not exist To avoid the ab ove

problem Kornaker et al prop ose a linkedbased technique that was originally used in Btrees

By adding a right link b etween a no de and its splito right sibling and a no de sequence numb er to

every no de the op erations ie the deletion op eration in the ab ove example can detect whether

the no de has split since the parent was examined and if so can comp ensate for the missed splits

by following the right links

Phantom Protection in GiST We now move on to the problem of phantom protection Consider

a transaction T reading a set of data items from a GiST that satisfy some search predicate Q

Transaction T then inserts a data item that satises Q and commits If T now rep eats its scan

with the same search predicate Q it gets a set of data items known as phantoms dierent from

the rst read Phantoms must b e prevented to guarantee serializable execution Note that ob ject

level lo cking do es not prevent phantoms since even if all ob jects currently in the database

14

that satisfy the search predicate are lo cked concurrent insertions into the search range cannot

b e prevented There are two general strategies to solve the phantom problem namely predicate

locking and its engineering approximation granular locking In predicate lo cking transactions

acquire lo cks on predicates rather than individual ob jects Although predicate lo cking is a complete

solution to the phantom problem it is usually to o costly In contrast in granular lo cking the

predicate space is divided into a set of lo ckable resource granules Transactions acquire lo cks on

granules instead of on predicates The lo cking proto col guarantees that if two transactions request

0 0

conictingmo de lo cks on predicates p and p such that p p is satisable then the two transactions

will request conicting lo cks on at least one granule in common Granular lo cks can b e set and

released as eciently as ob ject lo cks An example of the granular lo cking approach is the multi

granularity locking protocol MGL Application of MGL to the key space asso ciated with a

Btree is referred to as key range locking KRL

In Kornaker et al develop a solution for phantom protection in GiSTs based on predicate

lo cking In the prop osed proto col a searcher attaches its search predicate Q to all the index no des

whose b ounding predicates BPs are consistent with Q Subsequently the searcher acquires shared

mo de lo cks on all ob jects consistent with Q An inserter checks the ob ject to b e inserted against

all the search predicates attached to the no de in which the insertion takes place If it conicts with

any of them the inserter attaches its predicate to the no de to prevent starvation and waits for the

conicting transactions to commit If the insertion causes a BP of a no de N to grow the predicate

attachments of the parent of N are checked with the new BP of N and are replicated at N if

necessary The pro cess is carried out topdown over the entire path where no de BP adjustments

take place Similar predicate checking and replication is done b etween sibling no des during split

propagation The details of the proto col can b e found in

In Chakrabarti and Mehrotra prop ose an alternative approach based on granular lo cking

Note that the granular lo cking technique for Btrees viz key range lo cking KRL cannot b e

applied for phantom protection in multidimensional data structures since it relies on a total order

of key values which do es not exist for multidimensional data Imp osing an articial total order

say a Zorder over multidimensional data to adapt KRL is not a viable technique either The

rst step is to dene lo ckable resource granules over the multidimensional key space One way to

14

These insertions may b e a result of insertion of new ob jects up dates to existing ob jects or rollingback deletions

made by other concurrent transactions

PREPRINT Please dont distribute

dene the granules is to statically partition the key space eg as a grid and treat each partition

ie each grid cell as a granule The problem with such a partitioning is that it do es not adapt

to the key distribution some granules may contain many more keys than others causing them

to b ecome hot sp ots In the authors use the predicate space partitioning generated by the

GiST to dene the granules There is a lo ckable granule TGN asso ciated with each index no de

N of a GiST whose coverage is dened by the granule predicate GP N asso ciated with the no de

GP N is dened as follows Let P denote the parent no de of N P is NULL if N is the ro ot

no de and BP N denote the b ounding predicate of N The granule predicate GP N of no de

N is equal to BP N if N is the ro ot and BP N GP P otherwise For example the granule

predicate asso ciated with the nonleaf no de N in Figure is P x y while that

asso ciated with leaf no de N is P P x y The granules asso ciated with

leaf no des are called leaf granules while those asso ciated with nonleaf no des are called nonleaf

granules Note that the ab ove partitioning scheme do es not suer from the hot sp ot problem of

static partitioning since the granules dynamically adapt to the key distribution as keys are inserted

into and deleted from the GiST

Once the granules are dened the authors develop lo ck proto cols for the various op erations

on the GiST As mentioned b efore for correctness if two op erations conict they must request

conicting lo cks on at least one granule in common The proto col exploits in addition to shared

mo de S and exclusive mo de X lo cks intention mo de lo cks which represent the intention to

set lo cks at ner granularity The compatibility matrix for the various lo ck mo des used by the

proto col can b e found in The lo ck proto col of the search op eration is simple A searcher

acquires commitduration Smo de lo cks on all granules b oth leaf and nonleaf consistent with

its search predicate Note that in this technique unlike the approach of the searcher do es not

acquire ob jectlevel lo cks The lo ck proto col of the insertion op eration is slightly more involved

Let O b e the ob ject b eing inserted and g b e the granule corresp onding to the leaf no de in which

O is b eing inserted The proto col has the following two cases If the insertion do es not cause g

to grow the inserter acquires a commitduration IXmo de lo ck on g where the IX mo de is

an intention mo de intention to set shared or exclusive mo de lo cks at ner granularity and

a commitduration Xmo de lo ck on O Otherwise the insertion acquires a commitduration

IXmo de lo ck on g a commitduration Xmo de lo ck on O and a short duration IXmo de

lo ck on TGLUno de where the LU no de Lowest Unchanged No de denotes the lowest no de in

the insertion path whose GP do es not change due to the insertion The ab ove proto col guarantees

that a transaction cannot insert an ob ject into the search region of another concurrentlyrunning

transaction ie it will request a conicting lo ck on at least one common granule and hence will

blo ck till the search transaction is over The correctness pro ofs and the lo ck proto cols of the other

op erations can b e found

Supp orting Topk Queries in Databases

In contentbased image retrieval almost all images match the query image to some degree or

another The user is typically not interested in all matching images ie all images with degree

of match as that might retrieve the entire database Rather she is interested in only the top

few matching images There are two ways of retrieving the top few images Topk queries return

the k b est matches irresp ective of their scores For example in Section we requested the top

images matching imag epicg if Range queries or alphacut queries return all the images

whose matching score exceeds a usersp ecied threshold irresp ective of their numb er For example

in Section we requested all images whose degree of match to the image tmpj pg exceeds

PREPRINT Please dont distribute

15

the alphacut of Database query optimizers and query pro cessors do not supp ort queries

with usersp ecied limits on result cardinality the limiting of the result cardinalities in the ab ove

examples in Section is achieved at the application level by Excalibur Visual Retrievalware

in Informix QBIC in DB and Virage in Oracle The database engine returns all the tuples

that satisfy the nonimage selection predicates if any and all tuples otherwise the application

then evaluates the image match for each returned tuple and retains only those that satisfy user

sp ecied limit This causes large amounts of wasted work by the database engine as it accesses and

retrieves tuples most of which are eventually discarded leading to long resp onse times Signicant

p erformance improvements can b e obtained by pushing the topk and range query pro cessing inside

the database engine In this section we discuss query optimization and query pro cessing issues

that arise in that context

Query Optimization

Relational query languages particularly SQL are declarative in nature they sp ecify what the

answers should b e and not how they are to b e computed When a DBMS receives an SQL query it

rst validates the query and then determines a strategy for evaluating it Such a strategy is called

the query evaluation plan or simply plan and is represented using an op erator tree For a given

query there are usually several dierent plans that will pro duce the same result they only dier

in the amount of resources needed to compute the result The resources include time and memory

space in b oth disk and main memory The query optimizer rst generates a variety of plans by

cho osing dierent orders among the op erators in the op erator tree and cho osing dierent algorithms

to implement these op erators and then cho oses the b est plan based on the available resources

The two common strategies to compute optimized plans are rulebased optimization and

costbased optimization In the rulebased approach a numb er of heuristics are enco ded

in the form of pro duction rules that can b e used to transform the query tree into an equivalent

tree that is more ecient to execute For example a rule might sp ecify that selections are to b e

pushed b elow joins since this reduces the sizes of the input relations and hence the cost of the

join op eration In the costbased approach the optimizer rst generates several plans that would

correctly compute the answers to a query and computes a cost estimate for each plan The system

maintains some running statistics for each relation ie numb er of tuples numb er of disk pages

o ccupied by the relation etc as well as for each index ie numb er of distinct keys numb er of

pages etc which are used to obtain the cost estimates Subsequently the optimizer cho oses the

plan with the lowest estimated cost

Access Path Selection for Topk Queries Pushing topk query pro cessing inside the database

engine op ens up several query optimization issues One of them is access path selection The access

path represents how the topk query accesses the tuples of a relation in order to compute the result

To illustrate the access path selection problem let us consider an example image database where

the images are represented using two features color and texture Assuming that all the extracted

feature values are stored in the same tuple along with other image information ie the photo id

photographer and date in the example in Section one option is to sequentially scan through

the entire table computing the similarity score for each tuple by rst computing the individual

feature scores and then combining them using the merge function while retaining the k tuples with

the highest similarity scores This option may b e to o slow esp ecially if the relation is very large

15

For b oth typ es of queries the user typically exp ects the answers to b e ranked based on their degree of match

the b est matches b efore the less go o d ones For topk queries a get more feature is desirable so that the user

can ask for additional matches if she wants

PREPRINT Please dont distribute

Another option that avoids this problem is to index each feature using a multidimensional index

structure With the indexes in place the optimizer has several choices of access paths

Filter the images on the color feature using k NN search on the color index access the full

records of the returned images which contain the texture feature values and compute the

overall score

Filter the images on the texture feature analyze the full records of the returned images which

contain the color feature values and compute the overall score

Use b oth the color and texture indexes to nd the b est matches with resp ect to each feature

16

individually and merge the individual results

Note the numb er of execution alternatives increases exp onentially with the numb er of features

The presence of other selection predicates ie the date predicate in the ab ove

example also increases the size of the execution space It is up to the query optimizer to determine

the access path to b e used for a given query Database systems use a costbased technique for access

path selection as prop osed by Selinger et al To apply this technique several issues need to b e

considered In image databases the features are indexed using multidimensional index structures

which serve as access paths to the relation New kinds of statistics need to b e maintained for such

index structures and new cost formulae need to b e develop ed for accurate estimation of their access

costs In addition the cost mo dels for topk queries are likely to b e signicantly dierent from

traditional database queries that return all tuples satisfying the usersp ecied predicates The cost

mo del would also dep end on the retrieval mo del used ie on the similarity functions used for each

individual feature as well as the ones used to combine the individual matches cf Section

Such cost mo dels need to b e designed

In Chaudhari and Gravano prop ose a cost mo del to evaluate the costs of the various

17

execution alternatives and develop an algorithm to determine the cheap est alternative The

cost mo del relies on techniques for estimating selectivity of queries in individual feature spaces

estimating the costs of k NN searches using individual feature indices and probing a relation for a

tuple to evaluate one or more selection predicates Most selectivity estimation techniques prop osed

so far for multidimensional feature spaces work well only for low dimensional spaces but are not

accurate in the highdimensional spaces commonly used to represent images features

More suitable techniques based for instance on fractals are b eginning to app ear in the literature

Work on cost mo dels for range and k NN searches on multidimensional index structures

includes earlier prop osals for low dimensional index structures ie Rtree in and more

recent work for higher dimensional spaces in

Optimization of Exp ensive UserDened Functions The system may need to evaluate

multiple selection predicates on each tuple accessed via the chosen access path Relational query

optimizers typically place no imp ortance on the order in which the selection predicates are evaluated

on the tuples The same is true for pro jections Selections and pro jections are assumed to b e zero

time op erations This assumption is not true in contentbased image retrieval applications where

selection predicates may involve evaluating exp ensive userdened functions Let us consider the

following query on the photo collection table in Section

16

Yet another option would b e to create a combined index on color and texture features so that the ab ove query

can b e answered using a single index We do not consider this option in this discussion

17

The authors do not consider all the execution alternatives but only a small subset of them called searchminimal

executions Also the authors only consider Bo olean queries and do not handle more complex retrieval mo dels

ie weighted sum mo del probabilisti c mo del

PREPRINT Please dont distribute

Retrieve all photographs taken since the year

that match the query image more than

SELECT photo id score

collection FROM photo

WHERE Resemblesfv

GetFeatureVectorIfdImgDescFromFiletmpjpg

score REAL

AND

date

ORDER BY score

The query has two selection predicates the Resembles predicate and the predicate involving the

date The rst one is a complex predicate that may take hundreds or even thousands of instructions

to compute while the second one is a simple predicate that can b e computed in a few instructions

If the chosen access path is a sequential scan over the table the query will run faster if the second

selection is applied b efore the rst since doing so minimizes the numb er of calls to Resembles The

query optimizer must therefore take into account the computational cost of evaluating the UDF

in order to determine the b est query plan Costbased optimizers only use the selectivity of the

predicates to order them in the query plan but do not consider their computational complexities

In Hellerstein et al use b oth the selectivity and the cost of selection predicates to optimize

queries with exp ensive UDFs In Chaudhari and Shim prop oses dynamicprogrammingbased

algorithms to optimize queries with exp ensive predicates

The techniques discussed ab ove deal with UDF optimization when the functions app ear in

the where clause of an SQL query Exp ensive UDFs can also app ear in the pro jection clause as

op erations on ADTs Let us consider the following example taken from A table stores images

in a column and oers functions to crop an image and apply lters to it eg a sharp ening lter

Consider the following query using the syntax syntax of

select imagesharpencrop from image table

The user requests that all images b e ltered using sharpen and then cropped in order to extract the

p ortion of the image inside the rectangular region with diagonal endp oints and

Sharp ening an image rst and then cropping is wasteful as the entire image would b e sharp ened

and of it would b e discarded later assuming the width and height of the images are

Inverting the execution order to imagecropsharpen reduces the total CPU cost It may

not b e always p ossible for the user to enter the op erations in the b est order ie crop function b e

fore the sharpen function sp ecially in the presence of relational views dened over the underlying

tables In such cases the optimizer should reorder these functions to optimize the CPU cost

of the query The Predator pro ject prop oses to use EnhancedADTs or EADTs to address

the ab ove problem The EADTs provide information regarding the execution cost of various op

erations their prop erties ie commutativity b etween sharpen and crop op erations in the ab ove

example etc which the optimizer can use to reorder the op erations and reduce the execution cost

of the query In some cases it might b e p ossible to remove function calls For example if X is an

image rotate is a function that rotates an image and count dierent colors is a function that

counts the numb er of dierent colors in an image the op eration Xrotatecount dierent colors

dierent colors thus saving the cost of rotation do cuments can b e replaced by Xcount

p erformance improvements of up to several orders of magnitude using these techniques

PREPRINT Please dont distribute

Query Pro cessing

In the previous section we discussed how pushing the topk query supp ort into the database engine

can lead to choice of b etter query evaluation plans In this section we discuss algorithms that the

query pro cessor can use to execute topk queries for some of the query evaluation plans discussed

in the previous section

Evaluating Topk Queries Let us again consider an image database with two features color

and texture each of which is indexed using a multidimensional index structure Let F and

color

F denote the similarity functions for the color and texture features individuall y Examples of

textur e

individual feature similarity functions or equivalent distance functions are the various L metrics

p

Let F denote the function that combines or aggregates the individual similarities with resp ect

ag g

to color and texture features of an image to the query image to obtain its overall similarity to

the query image Examples of aggregation functions are weighted summation probabilistic and

fuzzy conjunctive and disjunctive mo dels etc The functions F F and F and their

color textur e ag g

asso ciated weights together constitute the retrieval mo del cf Section In order to supp ort top

k queries inside the database engine the engine must allow users to plug in their desired retrieval

mo dels for the queries and tune the weights and the functions in the mo del at query time The

query optimization and evaluation must b e based on the retrieval mo del sp ecied for the query

We next discuss some query evaluation algorithms that have b een prop osed for the various retrieval

mo dels

One of the evaluation plans for this example database discussed in Section is to use the

individual feature indexes to nd the b est matches with resp ect to each feature individual ly and

then merge the individual results Fagin prop osed an algorithm to evaluate a topk query

eciently according to the ab ove plan The input to this algorithm is a set of ranked lists X

i

generated by the individual feature indices by the k NN algorithm The algorithm accesses each

X in sorted order based on its score and maintains a set L X that contains the intersection

i i i

of the ob jects retrieved from the input ranked lists Once L contains k ob jects the full tuples of all

the items in X are prob ed by accessing the relation their overall scores are computed and the

i i

tuples with the k b est overall scores are returned The ab ove algorithm works as long as F is

ag g

0 0 0

monotonic ie F x x F x x for x x for every i Most of the interesting

ag g 1 m ag g i

m

1 i

aggregation functions like the weighted sum mo del and the fuzzy and probabilistic conjunctive

and disjunctive mo dels satisfy the monotonicity prop erty Fagin also prop oses optimizations to his

algorithm for sp ecic typ es of scoring functions

While Fagin prop oses a general algorithm for all monotonic aggregation functions Ortega et

al prop ose evaluation algorithms that are tailored to sp ecic aggregation functions ie

separate algorithms for weighted sum fuzzy conjunctive and disjunctive mo dels and probabilistic

conjunctive and disjunctive mo dels This approach allows incorp orating functionsp ecic opti

mizations in the evaluation algorithms and is used by the MARS system One of the main

advantages of these algorithms is that they do not need to prob e the relation for the full tuples

This can lead to signicant p erformance improvements since according to the cost mo del prop osed

in the total database access cost due to probing can b e much higher than the total cost due

to sorted access ie accesses using the individual feature indices Another advantage comes from

the demanddriven data ow followed in While Fagins approach retrieves ob jects from the

individual streams and buers them until it reaches the termination condition jLj k and then

returns all the k ob jects to the user the algorithms in retrieve only the necessary numb er of

ob jects from each stream in order to return the next b est match This demanddriven approach

reduces the wait time of intermediate answers in temp orary les or buers b etween the op erators in

PREPRINT Please dont distribute

a query tree On the other hand in the items returned by the individual feature indexes must

wait in a temp orary le or buer until the completion of the probing and sorting pro cess Note

that b oth approaches are incremental in nature and can supp ort the get more feature eciently

Several other optimizations of the ab ove algorithms have b een prop osed recently

An alternative approach to evaluating topk queries has b een prop osed by Chaudhuri and

Gravano It uses the results in to convert topk queries to alphacut queries and

pro cesses them as lter conditions Under certain conditions uniquely graded rep ository this

approach is exp ected to access no more ob jects than the strategy in Much like the former

approach this approach also requires temp orary storage and sorting of intermediate answers b efore

returning the results Unlike the former approaches this approach cannot supp ort the get more

feature without reexecuting the query from scratch Another way to convert topk queries to

threshold queries is presented in This work mainly deals with traditional datatyp es It uses

selectivity information from the DBMS to probabilisticall y estimate the value of the threshold that

would retrieve the desired numb er of items and then uses this threshold to execute a normal range

query Carey and Kossman prop ose techniques to limit the answers in an ORDER BY query to a

usersp ecied numb er by placing stop op erators in the query tree

Similarity Joins Certain queries in image retrieval applications may require join op erations An

example is nding all pairs of images that are most similar to each other Such joins are called

similarity joins A join b etween two relations returns every pair of tuples from the two relations

18

that satisfy a certain selection predicate In a traditional relation join the selection predicate is

precise ie equality b etween two numb ers hence a pair of tuples either do es or do es not satisfy

the predicate This is not true for general similarity joins where the selection predicate is imprecise

ie similarity b etween two images each pair of tuples satises the predicate to a certain degree

but some pairs satisfy the predicate more than other pairs Thus just using similarity as the join

predicate would return almost all pairs of images including pairs with very low similarity scores

Since the user is typically interested in the top few b est matching pairs we must restrict the

result to only those pairs with go o d matching scores This notion of restricted similarity joins

have b een studied in the context of geographic proximity distance and in the similarity

context The problem of a restricted similarity join b etween two datasets A and

B containing ddimensional p oints is dened as follows Given p oints X x x x

1 2 d

A and Y y y y B and a threshold distance the result of the join contains all pairs

1 2 d

X Y is is typically an X Y is less than The distance function D X Y whose distance D

d d

L metric Chapter but other distance measures are also p ossible

p

A numb er of nonindexed and indexed metho ds have b een prop osed for similarity joins Among

the nonindexed ones the nested lo op join is the simplest but has the highest execution cost If jAj

and jB j are the cardinality of the datasets A and B then the nested lo op join has a time complexity

of jAj jB j which degenerates to a quadratic cost if the datasets are similarity sized Among the

indexed metho ds one of the earliest prop osed ones is the RTree based metho d is presented in

This RTree metho d traverses synchronously the structure of two RTrees matching corresp onding

branches in the two trees It expands each b ounding rectangle by on each side along each

dimension and recursively searches each pair of overlapping b ounding rectangles Most other index

based techniques for similarity joins employ variations of this technique The generalization

of the multidimensional similarity join technique describ ed ab ove which can b e applied to obtain

similar pairs with resp ect to individual features to obtain overall similar pairs of images remains

a research challenge

18

Note that the two input relation to a join can b e the same relation known as a self join

PREPRINT Please dont distribute

Conclusions

In this chapter we explored how the evolution of traditional relational databases into p owerful

extensible ob jectrelational systems has facilitated the development of applications that require

storage of multimedia ob jects and retrieval based on their content In order to supp ort content

based retrieval the representation of the multimedia ob ject ob ject mo del must capture its visual

prop erties A user formulates a contentbased query by providing examples of ob jects similar to the

ones heshe wishes to retrieve Such a query is internally mapp ed to a featurebased representa

tion query mo del Retrieval is p erformed by computing similarity b etween the multimedia ob ject

and the query based on the feature values and the results are ranked by the computed similarity

values The key technologies provided by mo dern databases that facilitate the implementation of

applications requiring contentbased retrieval are the supp ort for userdened data typ es including

userdened functions and ways to call these functions from within SQL Contentbased retrieval

mo dels can b e incorp orated within databases using these extensibility options the internal struc

ture and content of multimedia ob jects can b e represented in DBMSs as abstract data typ es and

similarity mo dels can b e implemented as userdened functions Most ma jor DBMSs now sup

p ort multimedia extensions either develop ed in house or by thirdparty develop ers that consist

of predened multimedia data typ es and commonly used functions on those typ es includin g func

tions that supp ort similarity retrieval Examples of such extensions include the Image Datablade

supp orted by the Informix Universal Server and the QBIC extender of DB

While existing commercial ob jectrelational databases have come a long way in providing sup

p ort for multimedia applications we b elieve that there are still many technical challenges that need

to b e addressed in incorp orating multimedia ob jects into DBMSs Among the primary challenges

is that of extensible query pro cessing and query optimization Multimedia similarity queries dier

signicantly from traditional database queries they deal with high dimensional data sets for

which existing indexing mechanisms are not sucient Furthermore a user is typically interested

in retrieving the topk matching answers p ossibly progressively to the query Many novel index

metho ds that provide ecient retrieval over highdimensional multimedia feature spaces have b een

prop osed in the literature Furthermore ecient query pro cessing algorithms for evaluating topk

queries have b een develop ed These indexing mechanisms and query pro cessing algorithms need to

b e incorp orated into database systems To this end database systems have b egun to supp ort ex

tensibility options for users to add typ esp ecic access metho ds However current mechanisms are

limited in scop e and quite cumb ersome to use For example to incorp orate a new access metho d

a user has to address the daunting task of concurrency control and recovery for the access metho d

Query optimizers are still not suciently extensible to supp ort optimizing access path selection

based on userdened functions and access metho ds Research on these issues is ongoing and we

b elieve that solutions will b e incorp orated into future DBMSs resulting in systems that eciently

supp ort contentbased queries on multimedia typ es

Acknowledgements

This work was supp orted by NSF CAREER award I IS and in part by the Army Research

Lab oratory under Co op erative Agreement No DAAL

PREPRINT Please dont distribute

References

Swarup Acharya Viswanath Po osala and Sridhar Ramaswamy Selectivity estimation in

spatial databases In Proc ACM SIGMOD Int Conf on Management of Data pages

K Alsabti S Ranka and V Singh An ecient parallel algorithm for high dimensional

similarity join In Proc Int Paral lel and Distributed Processing Symp Orlando Florida

March

P Aoki Generalizing search in generalized search trees In Proc th Int Conf on Data

Engineering pages

Walid G Aref and Hanan Samet Optimization for spatial query pro cessing In Guy M

Lohman Amlcar Sernadas and Rafael Camps editors Proc th Int Conf on Very Large

Data Bases VLDB pages Barcelona Catalonia Spain Morgan Kaufmann

Walid G Aref and Hanan Samet Cascaded spatial join algorithms with spatially sorted

output In Proc th ACM Workshop on Advances in Geographic Information Systems pages

Jerey R Bach Charles Fuller Amarnath Gupta Arun Hampapur Bradley Horowitz Rich

Humphrey Ramesh Jain and Chiao fe Shu The virage image search engine An op en

framework for image management In Proc SPIE volume Storage and Retrieval for

Stil l Image Video Databases pages

Francois Bancilhon Claude Delob el and Paris Kanellakis Building an ObjectOriented

Database System The Story of O The Morgan Kaufmann Series in Data Management

Systems May

A Belussi and C Faloutsos Estimating the selectivity of spatial queries using the correlation

fractal dimension In Proc st Int Conf on Very Large Data Bases VLDB pages

S Berchtold C Bohm D Keim and H P Kriegel A cost mo del for nearest neighb or search

in high dimensional data spaces In Proc th ACM Symp Principles of Database Systems

PODS pages Tucson AZ May

K Beyer J Goldstein R Ramakrishnan and U Shaft When is nearest neighb or mean

ingful In Proc Int Conf Database Theory ICDT pages Jerusalem Israel

R Bliuhute S Saltenis G Slivinskas and C Jensen Developing a datablade for a new

index In Proc th Int Conf on Data Engineering pages

Thomas Brinkho HansPeter Kriegel and Bernhard Seeger Ecient pro cessing of spatial

joins using RTrees In Peter Buneman and Sushil Ja jo dia editors Proc ACM SIGMOD

Int Conf on Management of Data pages Washington DC USA May

ACM Press

John L Bruno Jose Carlos Brustoloni Eran Gabb er Banu Ozden and Abraham Silb er

schatz Disk scheduling with quality of service guarantees In IEEE Int Conf Multimedia

Computing and Systems pages June

PREPRINT Please dont distribute

Michael J Carey and Donald Kossmann On Saying Enough Already in SQL In Proc

ACM SIGMOD Int Conf on Management of Data pages

Michael J Carey and Donald Kossmann Reducing the Braking Distance of an SQL Query

Engine In Proc th Int Conf on Very Large Data Bases VLDB pages

Michal J Carey David J DeWitt Daniel Frank M Muralikrishna Go etz Graefe Jo el E

Richardson and Eugene J Shekita The Architecture of the EXODUS extensible DBMS In

Proc of the Int Workshop on ObjectOriented database systems pages

Chad Carson Megan Thomas Serge Belongie Joseph M Hellerstein and Jitendra Malik

Blobworld A system for regionbased image indexing and retrieval In Proc of the rd Int

Conf on Visual Information Systems pages June

Kaushik Chakrabarti and Sharad Mehrotra Dynamic granular lo cking approach to phantom

protection in Rtrees In Proc th Int Conf on Data Engineering pages February

Kaushik Chakrabarti and Sharad Mehrotra Ecient concurrency control in multidimensional

access metho ds In Proc ACM SIGMOD Int Conf on Management of Data pages

May

Kaushik Chakrabarti and Sharad Mehrotra The Hybrid Tree An Index Structure for High

Dimensional Feature Spaces In Proc th Int Conf on Data Engineering pages

March

Kaushik Chakrabarti Kriengkrai Porkaew Michael Ortega and Sharad Mehrotra Evaluating

Rened Queries in Topk Retrieval Systems Available as Technical Report TRMARS

University of California at Irvine online at httpwwwdbicsuciedupagespublications

July

Donald D Chamb erlin A Complete Guide to DB Universal Database Morgan Kaufmann

July

Donald D Chamb erlin Morton M Astrahan Kapali P Eswaran Patricia P Griths Ray

mond A Lorie James W Mehl Phyllis Reisner and Bradford W Wade SEQUEL A

Unied Approach to Data Denition Manipulation and Control IBM J Research and De

velopment

Sura jit Chaudhuri and Luis Gravano Optimizing queries over multimedia rep ositories In

Proc ACM SIGMOD Int Conf on Management of Data pages

Sura jit Chaudhuri and Luis Gravano Evaluating topk selection queries In Proc th Int

Conf on Very Large Data Bases VLDB pages Edinburgh Scotland

Sura jit Chaudhuri and Kyuseok Shim Optimization of queries with userdened predicates

In Proc nd Int Conf on Very Large Data Bases VLDB pages

E F Co dd A relational mo del of data for large shared data banks Communications of the

ACM

Stefan Desslo ch and Nelson Mattos Integrating SQL databases with contentsp ecic search

engines In Proc rd Int Conf on Very Large Data Bases VLDB pages

Athens Greece IBM Database Technology Institute Santa Teresa Lab

PREPRINT Please dont distribute

Donko Donjerkovic and Raghu Ramakrishnan Probabilistic optimization of topn queries

In Proc th Int Conf on Very Large Data Bases VLDB pages Edinburgh

Scotland

W Niblack et al The QBIC pro ject Querying images by content using color texture and

shap e In IBM Research Report February

Ronald Fagin Combining fuzzy information from multiple systems In Proc th ACM

Symp Principles of Database Systems PODS pages

Ronald Fagin and Edward L Wimmers Incorp orating user preferences in multimedia queries

In Proc Int Conf Database Theory ICDT pages

C Faloutsos M Flo cker W Niblack D Petkovic W Equitz and R Barb er Ecient

and eective querying by image content Technical Rep ort RJ IBM Research

Rep ort Aug

Christos Faloutsos and Ibrahim Kamel Beyond Uniformity and Indep endence Analysis of

Rtrees Using the Concept of Fractal Dimension In Proc th ACM Symp Principles of

Database Systems PODS pages

Christos Faloutsos and Douglas Oard A survey of information retrieval and ltering metho ds

Technical Rep ort CSTR Dept of Computer Science Univ of Maryland

Christos Faloutsos Bernhard Seeger Agma Traina and Caetano Traina Jr Spatial join

selectivity using p ower laws In Proc ACM SIGMOD Int Conf on Management of

Data pages

S Flank P Martin A Balogh and J Rothey Photole A Digital Library for Image

Retrieval In Proc nd Int Conf on Multimedia Computing and Systems pages

M Flickner Harpreet Sawhney Wayne Niblack and Jonathan Ashley Query by Image and

Video Content The QBIC System IEEE Computer Septemb er

YouChin Gene Fuh Stefan Desslo ch Weidong Chen Nelson Mattos Brian Tran Bruce

Lindsay Linda DeMichiel Serge Rielau and Danko Mannhaupt Implementation of SQL

Sturctured Typ e with Inheritance and Value Substitutability In Proc th Int Conf on

Very Large Data Bases VLDB pages Edinburgh Scotland

Hector GarciaMolina Jerey D Ullman and Jennifer Widom Database System Implemen

tation Prentice Hall

Vibby Gottemukkala Anant Jhingran and Sriram Padmanabhan Interfacing parallel ap

plications and parallel databases In Alex Gray and PerAke Larson editors Proc th

Int Conf on Data Engineering pages Birmingham UK April IEEE

Computer So ciety

J Gray and A Reuter Transaction Processing Concepts and Techniques Morgan Kauf

mann San Mateo CA

U Guntzer W Balke and W Kiebling Optimizing MultiFeature Queries for Image

Databases In Proc th Int Conf on Very Large Data Bases VLDB pages

PREPRINT Please dont distribute

Amarnath Gupta and Ramesh Jain Visual Information Retrieval Communications of the

ACM

A Guttman Rtree a dynamic index structure for spatial searching In Proc ACM

SIGMOD Int Conf on Management of Data pages

L M Haas J C Freytag G M Lohman and H Pirahesh Extensible query pro cessing in

starburst In Proc ACM SIGMOD Int Conf on Management of Data pages

J Hellerstein J Naughton and A Pfeer Generalized search trees in database systems In

Proc st Int Conf on Very Large Data Bases VLDB pages Septemb er

Joseph M Hellerstein Practical predicate placement In Proc ACM SIGMOD Int

Conf on Management of Data pages

Joseph M Hellerstein and Michael Stonebraker Predicate migration Optimizing queries

with exp ensive predicates In Proc ACM SIGMOD Int Conf on Management of Data

pages

G Hjaltason and Hanan Samet Incremental distance join algorithms for spatial databases

In Proc ACM SIGMOD Int Conf on Management of Data pages Seattle

WA USA June

MK Hu Visual Pattern Recognition by Moment Invariants Computer Methods in Image

Analysis IEEE Computer So ciety

Thomas S Huang Sharad Mehrotra and Kannan Ramchandran Multimedia analysis and

retrieval system MARS pro ject In Proc of rd Annual Clinic on Library Application of

Data Processing Digital Image Access and Retrieval

Informix Getting Started with Informix Universal Server Version Informix March

Informix Informix Universal Server Datablade Programmers Manual Version In

formix March

Informix Excalibur Image DataBlade Module Users Guide Version Informix

Informix Media Any kind of content Everywhere you need it Informix

Yoshiharu Ishikawa Ravishankar Subramanya and Christos Faloutsos Mindreader Query

ing databases through multiple examples In Proc th Int Conf on Very Large Data Bases

VLDB pages

H Jiang D Montesi and A Elmagarmid Videotext database system In IEEE Proc Int

Conf on Multimedia Computing and Systems pages June

Setrag Khoshaan and AB Baker Multimedia and Imaging Databases Morgan Kaufmann

Publishers San Francisco CA

Won Kim Unisqlx unied relational and ob jectoriented database system In Proc

ACM SIGMOD Int Conf on Management of Data page

Rob ert R Korfhage Information Storage and Retrieval Wiley Computer Publishing

PREPRINT Please dont distribute

Flip Korn Nikolaos Sidirop oulos Christos Faloutsos Eliot Siegel and Zenon Protopapas

Fast nearest neighb or search in medical image databases In Proc nd Int Conf on Very

Large Data Bases VLDB pages

M Kornacker C Mohan and J Hellerstein Concurrency and recovery in generalized search

trees In Proc ACM SIGMOD Int Conf on Management of Data pages

Nick Koudas and K C Sevcik High dimensional similarity joins Algorithms and p erfor

mance evaluation In Proc th Int Conf on Data Engineering pages

Gerald Kowalski Information Retrieval Systems theory and implementation Kluwer Aca

demic Publishers

Charles Lamb Gordon Landis Jack Orenstein and Dan Weinreb The ob jectstore database

system Communications of the ACM Octob er

ChungSheng Li YuanChi Chang John R Smith Lawrence D Bergman and Vittorio

Castelli Framework for ecient pro cessing of contentbased fuzzy cartesian queries In Proc

SPIE volume Storage and Retrieval for Media Databases pages

David B Lomet Key range lo cking strategies for improved concurrency In Proc th Int

Conf on Very Large Data Bases VLDB pages August

Yossi Matias Jerey Scott Vitter and Min Wang Waveletbased histograms for selectivity

estimation In Proc ACM SIGMOD Int Conf on Management of Data pages

C Mohan ARIESKVL A key value lo cking metho d for concurrency control of multiaction

transactions op erating on Btree indexes In Proc th Int Conf on Very Large Data Bases

VLDB pages August

Ap ostol Natsev Ra jeev Rastogi and Kyuseok Shim Walrus A similarity retrieval algorithm

for image databases In Proc ACM SIGMOD Int Conf on Management of Data pages

J Neivergelt et al The grid le An adaptable symmetric multikey le structure ACM

Trans Database Systems TODS March

Surya Nepal and MV Ramakrishna Query pro cessing issues in image multimedia

databases In Proc th Int Conf on Data Engineering pages

ORACLE Oracle i Concepts Release Oracle Decemb er

ORACLE Oracle i interMedia Audio Image and Video Users Guide and Reference

Oracle February

ORACLE Oracle i Visual Information Retrieval Users Guide and Reference Oracle

ORACLE Oracle Data Cartridge Developerss Guide Release Oracle February

ORACLE Oracle Internet File System Developerss Guide Release Oracle May

ORACLE Oracle Internet File System Users Guide Release Oracle April

PREPRINT Please dont distribute

J Orenstein and T Merett A class of data structures for asso ciative searching In Proc rd

ACM SIGACTSIGMOD Symp Principles of Database Systems pages

Michael Ortega Yong Rui Kaushik Chakrabarti Sharad Mehrotra and Thomas S Huang

Supp orting Similarity Queries in MARS In Proc of ACM Multimedia pages

Michael Ortega Yong Rui Kaushik Chakrabarti Kriengkrai Porkaew Sharad Mehrotra and

Thomas S Huang Supp orting Ranked Bo olean Similarity Queries in MARS IEEE Trans

Know ledge and Data Engineering Decemb er

Michael OrtegaBinderb erger Sharad Mehrotra Kaushik Chakrabarti and Kriengkrai

Porkaew WebMARS A Multimedia Search Engine for Full Do cument Retrieval and

Cross Media Browsing In th International Workshop on Multimedia Information Systems

MIS pages Octob er

BerndUwe Pagel Flip Korn and Christos Faloutsos Deating the dimensionality curse using

multiple fractal dimensions In Proc th Int Conf on Data Engineering pages

Ap ostolos N Papadop oulos and Yannis Manolop oulos Similarity query pro cessing using disk

arrays In Proc ACM SIGMOD Int Conf on Management of Data pages

A Pentland RW Picard and SSclaro Photob o ok Contentbased manipulation of image

databases International Journal of Computer Vision

V Po osala and Y Ioannidis Selectivity estimation without the attribute value indep endence

assumption In Proc rd Int Conf on Very Large Data Bases VLDB pages

Kriengkrai Porkaew Kaushik Chakrabarti and Sharad Mehrotra Query renement for

contentbased multimedia retrieval in MARS Proc of the th ACM Int Conf on Mul

timedia pages

Kriengkrai Porkaew Sharad Mehrotra Michael Ortega and Kaushik Chakrabarti Similarity

search using multiple examples in MARS In Proc Int Conf on Visual Information Systems

pages

F Rabitti and P Stanchev GRIM DBMS A GRaphical IMage Database Management

System In Visual Database Systems IFIP TCWG Working Conference on Visual

Database Systems pages

Yong Rui Thomas S Huang Michael Ortega and Sharad Mehrotra Relevance feedback A

p ower to ol for interactive contentbased image retrieval IEEE Trans Circuits and Systems

for Video Technology Septemb er

Gerard Salton and Michael J McGill Introduction to Modern Information Retrieval

McGrawHill Bo ok Company

Thomas Seidl and HansPeter Kriegel Optimal multistep knearest neighb or search In Proc

ACM SIGMOD Int Conf on Management of Data pages

PREPRINT Please dont distribute

Patricia G Selinger Morton M Astrahan Donald D Chamb erlin Raymond A Lorie and

Thomas G Price Access path selection in a relational database management system In

Philip A Bernstein editor Proc ACM SIGMOD Int Conf on Management of Data

pages Boston MA USA May June ACM

Praveen Seshadri EADTs and Relational Query Optimization Technical rep ort Cornell

University Technical Rep ort June

Praveen Seshadri Enhanced abstract data typ es in ob jectrelational databases VLDB Jour

nal August

John C Shafer and Rakesh Agrawal Parallel algorithms for highdimensional proximity joins

for applications In Proc rd Int Conf on Very Large Data Bases VLDB

pages

Kyuseok Shim Ramakrishnan Srikant and Rakesh Agrawal Highdimensional similarity

joins In Proc th Int Conf on Data Engineering pages

John R Smith and SF Chang Integrated spatial and feature image query Multimedia

Systems Journal

John R Smith and SF Chang Visually Searching the Web for Content IEEE Multimedia

Summer

T G Aguierre Smith Parsing movies in context In Summer Usenix Conference Nashvil le

Tennessee pages

J Srinivasan R Murthy S Sundara N Agarwal and S DeFazio Extensible indexing A

framework for integrating domainsp ecic indexing schemes into oraclei In Proc th Int

Conf on Data Engineering pages

M Stonebraker and G Kemnitz The p ostgres nextgeneration database management system

Communications of the ACM

Michael Stonebreaker and Dorothy Mo ore ObjectRelational DBMSs The Next Great Wave

Morgan Kaufman

Michael J Swain Interactive indexing into image databases In Storage and Retrieval

for Image and Video Databases volume pages

Sybase Java in Adaptive Server Enteprise Version Sybase Octob er

Sybase Sybase Adaptive Server Enterprise Version Sybase

Hideyuki Tamura et al Texture features corresp onding to visual p erception IEEE Trans

Systems Man and Cybernetics SMC June

Yannis Theo doridis and Timos Sellis A Mo del for the Prediction of Rtree Performance In

Proc th ACM Symp Principles of Database Systems PODS pages

L Wu C Faloutsos K Sycara and T Payne Falcon Feedback adaptive lo op for content

based retrieval In Proc th Int Conf on Very Large Data Bases VLDB pages