IEEE TRANSACTIONS ON CIRCUITS AND VIDEO TECHNOLOGY

Relevance Feedback: A Power Tool for Interactive Content-Based Image Retrieval

Yong Rui, Thomas S. Huang, Michael Ortega, and Sharad Mehrotra

Beckman Institute for Advanced Science and Technology
Dept. of Electrical and Computer Engineering and Dept. of Computer Science
University of Illinois at Urbana-Champaign
Urbana, IL, USA

E-mail: {yrui, huang}@ifp.uiuc.edu, {ortegab, sharad}@cs.uiuc.edu

Abstract: Content-Based Image Retrieval (CBIR) has become one of the most active research areas in the past few years. Many visual feature representations have been explored and many systems built. While these research efforts establish the basis of CBIR, the usefulness of the proposed approaches is limited. Specifically, these efforts have relatively ignored two distinct characteristics of CBIR systems: (1) the gap between high level concepts and low level features; (2) the subjectivity of human perception of visual content. This paper proposes a relevance feedback based interactive retrieval approach, which effectively takes into account the above two characteristics in CBIR. During the retrieval process, the user's high level query and perception subjectivity are captured by dynamically updated weights based on the user's feedback. The experimental results over more than [...] images show that the proposed approach greatly reduces the user's effort of composing a query and captures the user's information need more precisely.

Keywords: Content-Based Image Retrieval, interactive multimedia processing, relevance feedback.

This work was supported in part by NSF/DARPA/NASA DLI Program under Cooperative Agreement [...], in part by ARL Cooperative Agreement No. DAAL[...], and in part by NSF CISE Research Infrastructure Grant CDA [...]. Yong Rui was also supported in part by a CSE Fellowship, the University of Illinois. Michael Ortega was also supported in part by CONACYT grant [...].

I. Introduction

With advances in the computer technologies and the advent of the World-Wide Web, there has been an explosion in the amount and complexity of digital data being generated, stored, transmitted, analyzed, and accessed. Much of this information is multimedia in nature, including digital images, video, audio, graphics, and text data. In order to make use of this vast amount of data, efficient and effective techniques to retrieve multimedia information based on its content need to be developed. Among the various media types, images are of prime importance. Not only are they the most widely used media type besides text, but they are also one of the most widely used bases for representing and retrieving videos and other multimedia information. This paper deals with the retrieval of images based on their contents, even though the approach is readily generalizable to other media types.

Keyword annotation is the traditional image retrieval paradigm. In this approach, the images are first annotated manually by keywords. They can then be retrieved by their corresponding annotations. However, there are three main difficulties with this approach, i.e. the large amount of manual effort required in developing the annotations, the differences in interpretation of image contents, and the inconsistency of the keyword assignments among different indexers. As the size of image repositories increases, the keyword annotation approach becomes infeasible.

To overcome the difficulties of the annotation based approach, an alternative mechanism, Content-Based Image Retrieval (CBIR), was proposed in the early 1990s. Besides using human-assigned keywords, CBIR systems use the visual content of the images, such as color, texture, and shape features, as the image index. This greatly alleviates the difficulties of the pure annotation based approach, since the feature extraction process can be made automatic and the image's own content is always consistent. Since its advent, CBIR has attracted great research attention, ranging from government and industry to universities. Even ISO/IEC has launched a new work item, MPEG-7, to define a standard Multimedia Content Description Interface. Many special issues of leading journals have been dedicated to CBIR, and many CBIR systems, both commercial and academic, have been developed recently.

Despite the extensive research effort, the retrieval techniques used in CBIR systems lag behind the corresponding techniques in today's best text search engines, such as Inquery, Alta Vista, Lycos, etc. At the early stage of CBIR, research primarily focused on exploring various feature representations, hoping to find a "best" representation for each feature. For example, for the texture feature alone, almost a dozen representations have been proposed, including Tamura, MSAR, Wold decomposition, Fractal, Gabor Filter, and Wavelets, etc. The corresponding system design strategy for early CBIR systems is to first find the "best" representations for the visual features. Then:

- During the retrieval process, the user selects the visual features that he or she is interested in. In the case of multiple features, the user also needs to specify the weights for each of the features.

- Based on the selected features and specified weights, the retrieval system tries to find similar images to the user's


query.

We refer to such systems as computer centric systems. While this approach establishes the basis of CBIR, the performance is not satisfactory due to the following two reasons.

The gap between high level concepts and low level features. The assumption that the computer centric approach makes is that the mapping from high level concepts to low-level features is easy for the user to do. While in some cases the assumption is true, e.g. mapping a high level concept (fresh apple) to low level features (color and shape), in other cases this may not be true. One example is to map an ancient vase with sophisticated design to an equivalent representation using low level features. A gap exists between the two levels.

The subjectivity of human perception. Different persons, or the same person under different circumstances, may perceive the same visual content differently. This is called human perception subjectivity. The subjectivity exists at various levels. For example, one person may be more interested in an image's color feature while another may be more interested in the texture feature. Even if both people are interested in texture, the way they perceive the similarity of texture may be quite different. This is illustrated in Figure 1. Among the three texture images, some may say that (a) and (b) are more similar if they do not care about the intensity contrast, while others may say that (a) and (c) are more similar if they ignore the local property on the seeds. No single texture representation can capture everything. Different representations capture the visual feature from different perspectives.

In the computer centric approach, the "best" features and representations and their corresponding weights are fixed, which cannot effectively model high level concepts and the user's perception subjectivity. Furthermore, specification of weights imposes a big burden on the user, as it requires the user to have a comprehensive knowledge of the low level feature representations used in the retrieval system, which is normally not the case.

Motivated by the limitations of the computer centric approach, the recent research focus in CBIR has moved to an interactive mechanism that involves a human as part of the retrieval process. Examples include interactive region segmentation, interactive image database annotation, usage of supervised learning before the retrieval, and interactive integration of keywords and high level concepts to enhance image retrieval performance.

In this paper, to address the difficulties faced by the computer centric approach, we present a Relevance Feedback based approach to CBIR, in which human and computer interact to refine high level queries to representations based on low-level features. Relevance feedback is a powerful technique used in traditional text-based Information Retrieval systems. It is the process of automatically adjusting an existing query using the information fed back by the user about the relevance of previously retrieved objects, such that the adjusted query is a better approximation to the user's information need. In the relevance feedback based approach, the retrieval process is interactive between the computer and the human. Under the assumption that high-level concepts can be captured by low-level features, the relevance feedback technique tries to establish the link between high-level concepts and low-level features from the user's feedback. Furthermore, the burden of specifying the weights is removed from the user. The user only needs to mark which images he or she thinks are relevant to the query. The weights embedded in the query object are dynamically updated to model the high level concepts and perception subjectivity.

The rest of the paper is organized as follows. Section 2 introduces a multimedia object model which supports multiple features, multiple representations, and their corresponding weights. The weights are essential in modeling high level concepts and perception subjectivity. Section 3 discusses how the weights are dynamically updated based on the relevance feedback to track the user's information need. Sections 4 and 5 discuss the normalization procedure and the dynamic weight updating process, the two bases of the retrieval algorithm. Extensive experimental results over more than [...] images, testing both the efficiency and the effectiveness of the retrieval algorithm, are given in Section 6. Concluding remarks are given in Section 7.

II. The Multimedia Object Model

Before we describe how the relevance feedback technique can be used for CBIR, we first need to formalize how an image object is modeled. An image object $O$ is represented as

$$O = O(D, F, R)$$

- $D$ is the raw image data, e.g. a JPEG image.

- $F = \{f_i\}$ is a set of low-level visual features associated with the image object, such as color, texture, and shape.

- $R = \{r_{ij}\}$ is a set of representations for a given feature $f_i$, e.g. both color histogram and color moments are representations for the color feature. Note that each representation $r_{ij}$ itself may be a vector consisting of multiple components, i.e.

$$r_{ij} = [r_{ij1}, \ldots, r_{ijk}, \ldots, r_{ijK}]$$

where $K$ is the length of the vector.

In contrast to the computer centric approach's single representation and fixed weights, the proposed object model supports multiple representations with dynamically updated weights to accommodate the rich content in the image objects. Weights exist at various levels. $W_i$, $W_{ij}$, and $W_{ijk}$ are associated with features $f_i$, representations $r_{ij}$, and components $r_{ijk}$, respectively. The goal of relevance feedback, described in the next section, is to find the appropriate weights to model the user's information need. Further note that a query $Q$ has the same model as that of the image objects, since it is also an image object in nature.
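The three-level structure of the object model can be made concrete with a small data-structure sketch. The following is only an illustration under stated assumptions: the class names (`Representation`, `Feature`, `ImageObject`), the use of dictionaries keyed by name, and the default weight values are all hypothetical choices, not part of the paper. It shows where $D$, $F$, $R$ and the weights $W_i$, $W_{ij}$, $W_{ijk}$ would live, and that a query can reuse the same structure as a database object.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Representation:
    """One representation r_ij of a feature, e.g. a color histogram."""
    components: List[float]                                        # r_ij1 ... r_ijK
    weight: float = 1.0                                            # W_ij (placeholder default)
    component_weights: List[float] = field(default_factory=list)  # W_ijk

    def __post_init__(self) -> None:
        # Assumption: with no weights supplied, every component counts equally.
        if not self.component_weights:
            k = len(self.components)
            self.component_weights = [1.0 / k] * k if k else []
        if len(self.component_weights) != len(self.components):
            raise ValueError("need exactly one weight W_ijk per component")


@dataclass
class Feature:
    """One low-level feature f_i, holding its representations r_ij by name."""
    representations: Dict[str, Representation]
    weight: float = 1.0                                            # W_i (placeholder default)


@dataclass
class ImageObject:
    """O = O(D, F, R): raw data D plus features F and their representations R."""
    data: bytes                                                    # D
    features: Dict[str, Feature]                                   # F, with R nested inside


# A query Q has the same structure as a database object.
example_query = ImageObject(
    data=b"",  # placeholder; a real object would carry the image bytes
    features={
        "color": Feature({
            "histogram": Representation([0.2, 0.5, 0.3]),
            "moments": Representation([0.4, 0.1]),
        }),
        "texture": Feature({"wavelet": Representation([0.7, 0.2, 0.05, 0.05])}),
    },
)
```

Keeping the weights next to the entities they belong to means that updating a query's weights never touches the stored feature values, which matches the paper's view that weights, not features, carry the user's information need.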

RUI HUANG ORTEGA AND MEHROTRA

(a) (b) (c)

Fig. 1. Subjectivity in perceiving the texture feature.

III. Integrating Relevance Feedback in CBIR

An image object model $O(D, F, R)$, together with a set of similarity measures $M = \{m_{ij}\}$, specifies a CBIR model $(D, F, R, M)$. The similarity measures are used to determine how similar or dissimilar two objects are. Different similarity measures may be used for different feature representations. For example, Euclidean distance is used for comparing vector-based representations, while Histogram Intersection is used for comparing color histogram representations.

Based on the image object model and the set of similarity measures, the retrieval process is described below and also illustrated in Figure 2.

1. Initialize the weights $W = [W_i, W_{ij}, W_{ijk}]$ to $W0$, which is a set of no-bias weights. That is, every entity is initially of the same importance:

$$W_i = W0_i = \frac{1}{I}$$

$$W_{ij} = W0_{ij} = \frac{1}{J_i}$$

$$W_{ijk} = W0_{ijk} = \frac{1}{K_{ij}}$$

where $I$ is the number of features in set $F$; $J_i$ is the number of representations for feature $f_i$; $K_{ij}$ is the length of the representation vector $r_{ij}$.

2. The user's information need, represented by the query object $Q$, is distributed among different features $f_i$, according to their corresponding weights $W_i$.

3. Within each feature $f_i$, the information need is further distributed among different feature representations $r_{ij}$, according to the weights $W_{ij}$.

4. The objects' similarity to the query, in terms of $r_{ij}$, is calculated according to the corresponding similarity measure $m_{ij}$ and the weights $W_{ijk}$:

$$S(r_{ij}) = m_{ij}(r_{ij}, W_{ijk})$$

5. Each representation's similarity values are then combined into a feature's similarity value:

$$S(f_i) = \sum_j W_{ij} S(r_{ij})$$

6. The overall similarity $S$ is obtained by combining the individual $S(f_i)$'s:

$$S = \sum_i W_i S(f_i)$$

7. The objects in the database are ordered by their overall similarity to $Q$. The $N_{RT}$ most similar ones are returned to the user, where $N_{RT}$ is the number of objects the user wants to retrieve.

8. For each of the retrieved objects, the user marks it as highly relevant, relevant, no-opinion, non-relevant, or highly non-relevant, according to his or her information need and perception subjectivity.

9. The system updates the weights (described in Section 5) according to the user's feedback, such that the adjusted $Q$ is a better approximation to the user's information need.

10. Go to Step 2 with the adjusted $Q$ and start a new iteration of retrieval.

Fig. 2. The retrieval process. [Diagram: objects $O$ at the top and the query $Q$ at the bottom, each expanded into features $f_i$ (weights $W_i$), representations $r_{ij}$ (weights $W_{ij}$), and components (weights $W_{ijk}$), meeting at a dashed line labeled "Similarity measures".]

In Figure 2, the information need embedded in $Q$ flows up while the content of the $O$'s flows down. They meet at the dashed line, where the similarity measures $m_{ij}$ are applied to calculate the similarity values $S(r_{ij})$ between $Q$ and the $O$'s.

Following the Information Retrieval theories, the objects stored in the database are considered objective and their weights are fixed. Whether the query is considered objective or subjective, and whether its weights can be updated, distinguishes the proposed relevance feedback approach from the computer centric approach. In the computer centric approach, a query is considered objective, the same as the objects stored in the database, and its weights are fixed. Because of the fixed weights, this approach cannot effectively model high level concepts


and human perception subjectivity. It requires the user to specify a precise set of weights at the query stage, which is normally not possible. On the other hand, queries in the proposed approach are considered subjective. That is, during the retrieval process, the weights associated with the query can be dynamically updated via relevance feedback to reflect the user's information need. The burden of specifying the weights is removed from the user.

Note that in the proposed retrieval algorithm, both $S$ and $S(f_i)$ are linear combinations of their corresponding lower level similarities. The basis of the linear combination is that the weights are proportional to the entities' relative importance. For example, if a user cares twice as much about one feature (color) as he or she does about another feature (shape), the overall similarity would be a linear combination of the two individual similarities with the weights being 2/3 and 1/3, respectively. Furthermore, because of the nature of linearity, these two levels can be combined into one, i.e.

$$S = \sum_i \sum_j W_{ij} S(r_{ij})$$

where the $W_{ij}$'s are now re-defined to be the weights by which the information need in $Q$ is distributed directly into the $r_{ij}$'s. Note that it is not possible to absorb $W_{ijk}$ into $W_{ij}$, since the calculation of $S(r_{ij})$ can be a non-linear function of the $W_{ijk}$'s, as with Euclidean distance or Histogram Intersection.

In the next two sections, we will discuss two key components of this retrieval algorithm, i.e. normalization and weight updating.

IV. Normalization

In the retrieval algorithm described in the previous section, we have assumed that the similarity values of each representation, the $S(r_{ij})$'s, are of the same dynamic range, say from 0 to 1. Otherwise the linear combination of the $S(r_{ij})$'s to form $S$ (the combined double-sum equation of the previous section) becomes meaningless. One $S(r_{ij})$ may overshadow the others just because its magnitude is large. For the same reason, when calculating the $S(r_{ij})$'s, the vector components $r_{ijk}$ should also be normalized before applying the similarity measure $m_{ij}$. We refer to the normalization of the $r_{ijk}$'s as intra-normalization and the normalization of the $S(r_{ij})$'s as inter-normalization.

A. Intra-Normalization

This normalization process puts equal emphasis on each component $r_{ijk}$ within a representation vector $r_{ij}$. To see the importance of this, note that different components within a vector may be of totally different physical quantities. Their magnitudes can vary drastically, thereby biasing the similarity measure.

To simplify the notation, define $V = r_{ij}$. That is, every representation vector $r_{ij}$ will go through the following normalization procedure.

Assume there are $M$ images in the database and let $m$ be the image id index. Then

$$V = V_m = [V_{m1}, V_{m2}, \ldots, V_{mk}, \ldots, V_{mK}]$$

is the representation vector for image $m$, where $K$ is the length of vector $V = r_{ij}$. If we put all the $V_m$'s into matrix form, we have an $M \times K$ matrix

$$V = [V_{mk}], \quad m = 1, \ldots, M, \quad k = 1, \ldots, K$$

where $V_{mk}$ is the $k$-th component in vector $V_m$. Now, the $k$-th column of matrix $V$ is a length-$M$ sequence. Denote this sequence as $V_k$. Our goal is to normalize the entries in each column to the same range, so as to ensure that each individual component receives equal emphasis when calculating the similarity between two vectors. One way of normalizing the sequence $V_k$ is to find the maximum and minimum values of $V_k$ and normalize the sequence to $[0, 1]$ as follows:

$$V_{mk} = \frac{V_{mk} - \min_k}{\max_k - \min_k}$$

where $\min_k$ and $\max_k$ refer to the smallest and the biggest values in the sequence $V_k$. Although simple, this is not a desirable normalization procedure. Let's consider a sequence {[...]}. If we use the min-max formula above to normalize the sequence, most of the $[0, 1]$ range will be taken away by a single entry, and most of the information in the remaining entries is warped into a very narrow range.

A better approach is to use Gaussian normalization. Assuming the sequence $V_k$ to be a Gaussian sequence, we compute the mean $\mu_k$ and standard deviation $\sigma_k$ of the sequence. We then normalize the original sequence to an $N(0, 1)$ sequence as follows:

$$V_{mk} = \frac{V_{mk} - \mu_k}{\sigma_k}$$

It is easy to prove that after this Gaussian normalization, the probability of an entry's value being in the range of $[-1, 1]$ is 68%. If we use $3\sigma_k$ in the denominator, according to the 3-$\sigma$ rule, the probability of an entry's value being in the range of $[-1, 1]$ is approximately 99%. In practice, we can consider all the entry values to be within the range of $[-1, 1]$ by mapping the out-of-range values to either $-1$ or $1$. The advantage of this normalization process over min-max normalization is that the presence of a few abnormally large or small values (such as the outlying entry in the example sequence) does not bias the importance of a component $r_{ijk}$ in computing the similarity between vectors.

B. Inter-Normalization

The intra-normalization procedure ensures equal emphasis of each component $r_{ijk}$ within a representation vector $r_{ij}$. On the other hand, the inter-normalization procedure ensures equal emphasis of each individual similarity value $S(r_{ij})$ within the overall similarity value $S$.

Depending on the similarity measure $m_{ij}$ used, the values of the $S(r_{ij})$'s can be of quite different dynamic ranges. In order to ensure that no single $S(r_{ij})$ will overshadow the others only because it has a larger magnitude, inter-normalization should be applied. This procedure is summarized as follows:


1. For any pair of images $I_m$ and $I_n$ in the image collection, compute their similarity $S_{mn}(r_{ij})$:

$$S_{mn}(r_{ij}) = m_{ij}(r_{ij}, W_{ijk})$$

A. Update of $W_{ij}$ (inter-weight)

The $W_{ij}$'s associated with the $r_{ij}$'s reflect the user's different emphasis of a representation in the overall similarity. The support of different weights enables the user to specify his or her i