A New Data Model: Persistent Attribute-Centric Objects

Terry Jones
Distributed Cognition HCI Lab
Cognitive Science Dept.
Univ. of California, San Diego
La Jolla, CA, USA
terry@clis.ucsd.edu

Ricardo A. Baeza-Yates
Dept. of Computer Science
University of Chile
Blanco Encalada
Santiago, Chile
rbaeza@dcc.uchile.cl

Gregory J. E. Rawlins
Dept. of Computer Science
Indiana University
Bloomington, IN, USA
rawlins@cs.indiana.edu

June

Abstract

Trying to find information on the Web is like trying to find something at a jumble sale: it's fun, and you can make serendipitous discoveries, but for directed search it's better to go to a department store; there, someone has already done most of the arranging for you. Unfortunately, the Web's continuing explosion in size, its enormous diversity of topics, and its great volatility make unaided human indexing impossible.

This problem is just a special case of the general problem of organizing information to create knowledge. A similar problem arises on the desktop when dealing with file systems, where users must search by name, and often they do not remember the file's name or location. File names are artifacts of current operating systems, but human understanding neither requires objects to be named nor has problems with multiple objects sharing properties (names, for instance).

The limitations mentioned above arise because today's computer systems do not analyze the files they are asked to store. Instead, they note only simple attributes like creation date and file type, and leave the bulk of organizing, naming, annotating, and finding those files to their users. This may have made sense in earlier decades, when computers were slow and expensive, but it makes no sense today.

We argue for a new data model for information representation, based on the use of persistent objects with dynamic attributes and search operations over them. This representation is organization-neutral, thereby giving a flexible substrate for anyone to build multiple simultaneous organizations. In addition, it is uniform, allows easy sharing of objects, and can be simply extended to the Web. We present an initial prototype of this idea, AVS, and, to show its potential, two system prototypes that have as their main goal to organize personal information: DomainView and KnownSpace.

The work of the first author was supported by Chile Fondecyt Grant and the second author by DARPA Contract NC and by a grant from Intel.

Introduction

Storing and organizing information is the kernel of any computer application. The widespread use and exponential growth of the World Wide Web, as well as other data sources, has had a crucial impact on the problem of managing personal information. Currently, the abstraction of files and folders is ubiquitous as the main way to organize and store documents. This abstraction, however, arose when computer resources were expensive, which is no longer true.

The real limitations of any information system are not storage space or computing power. They are the narrow communication pipelines between the computer and the human user (typically through a screen) and between the user's eyes and brain. In that sense, any information system should not depend on the abilities of the user; it should try to represent information as closely as possible to the user's model of it; and any internal representation should be transparent to the user. However, the paradigms that are used today do not satisfy any of these goals. Why? One main reason is that the historical development of software has forced the user to be aware of many unnecessary things.

The question is how to organize personal information and how to present it to the user in a meaningful way through a simple but effective user interface. Any system for managing information, implemented either on top of a file system or a database, is limited by the underlying data model of the storage mechanism. On the other hand, if we write down, as the goals of the system, what functionality the underlying data model should provide, we do not necessarily obtain a known data model.

We can think of information as organized data, which has to be comprehended and managed. So there are two main problems to be solved: how to organize data, and how to communicate that data to a person through a user interface. In this paper we present a new approach for the first problem: a new model of the storage, representation, and organization of information based on what we call persistent attribute-centric objects (PACO). PACO is based on:

- storing data in objects having attributes and values in a uniform way, each having a dynamic structure;

- being able to organize objects in collections, which are also dynamic and independent of the storage mechanism, through powerful search capabilities.

We present our vision of how to manage information relative to our data model in a separate paper, which also discusses a few user interfaces.

This paper is organized as follows. We first present some of the motivations behind this paper, which are the basic functions needed to build efficient user interfaces for managing information, together with a list of the main limitations of today's file systems and databases. Next we present the main concepts behind the new data model and how they can be used to manage information, clearly separating the storage, representation, and organization layers of the data being handled. We also compare our data model with previous work and discuss some of the main differences and their consequences. Next we show a specific generic prototype of this data model, AVS (the Attribute Value System), which maintains a collection of objects composed solely of attribute-value pairs and which provides facilities for creating, altering, and locating these objects. This simple substrate provides flexibility in the representation of information, emphasizes the role of search, generalizes hierarchical file systems, provides for the dynamic construction of arbitrary data structures and inter-object relationships, emphasizes the distinction between informational objects and structures that contain and organize them, facilitates multiple views of the same information, and provides a novel form of object ownership.

To show the potential of our model, we summarize the functionality of two systems to manage personal information: DomainView and KnownSpace. Although they were developed without using the data model proposed, their essence is based on it. In fact, the main ideas of this data model were developed independently by the three authors and later were unified into a single view. We finish by presenting work in progress by the authors and the consequences of our vision of organizing information with respect to programming, user interfaces, and the Web. Since we change basic assumptions, we have to start from basic principles and levels, which may seem naive to some readers and very radical to others.

Motivation: Organizing Information

Web search engine queries now often return millions of irrelevant pages. Those pages are not spatially arranged, collected by topic, or distinguished in any way other than by their titles, so we have no idea of the relevance of any page before reading it. The same is true for mail and news. After we save web pages, mail messages, news articles, ftp pages, or any pages produced with an editor or any other application, those pages then become lost on our desktops. They are not analyzed in any way, grouped according to our interests, or laid out spatially to show their similarities to other pages already there.

Even after we organize the pages on our desktops by hand, there is no automation to help us reorganize them, search them, navigate through them, or find more pages like them. In short, our computers don't help us manage our own data and, as we discuss in the next section, we must do that task ourselves by organizing information in a file system.

Hierarchical file systems are valid as an internal storage mechanism for operating systems, but the user need not be aware of them. Many users do not even understand this concept, and typically use applications that each put all their files in one directory (a flat hierarchy). Not all information can be classified as a tree: there are many other ways to classify a group of objects, and many hierarchical ways to classify the same information.

Let us start over without assuming anything about computer resources. What might be the right way to store and organize information? Which functionality should a data model provide to best help users manage their information? We give one possible answer to this question.

To help make our claims concrete, we present them in the context of aiding the development of two prototypes that organize personal information. They are DomainView (DV) and KnownSpace (KS). DomainView is a desktop user interface that organizes information in domains chosen by the user, using a uniform and simple interface. The interface is based on retrieval of documents by attribute values (including the content) in a flat and dynamic universe of domains. The desktop is tailored to and by a user, who creates his or her own document-driven and knowledge-domain world. KnownSpace is an adaptive, visual, and autonomous information manager of all of a user's information, whether that information is data or programs, and whether it originates on the Web, via mail, news, ftp, an editor, or any other application. It is not a search engine, browser, desktop, or operating system, although it shares elements of all four programs.

Both prototypes manage information using collections of objects, each object having attributes with values. In our systems, objects are also called entities, pages, or documents, and collections are also called clusters or domains. In both DV and KS the basic data relationship is the collection of objects. While interacting with the DV or KS interface, the user may want to do any of the following:

- navigate through a space of objects;

- search for various kinds of objects;

- organize objects into groups, and regroup objects already in groups;

- mark up various objects, either singly or in groups, and edit or delete markups; and

- browse, create, and delete objects.

Note that these groupings are not mutually exclusive. A single object can be in several dozen or more different groupings of objects simultaneously. The user may choose to view any of these groupings at any time.

To support these actions, the interface needs to be able to pose the following queries to objects of the underlying data model:

- do you have attribute X? do your attributes satisfy property X?
- are you an attribute of any object? which objects are you an attribute of?
- what collections do you belong to? do you belong to collection X?
- what are your readable/modifiable attributes?
- are you locked at the moment? is your attribute X locked at the moment?
- can I read/write your attribute X? can I add/delete your attribute X?

and the following requests:

- give me read/append/modify access to all the objects with property X
- tell me the value of your attribute X
- change the value of your attribute X to Y
- add attribute X to yourself
- delete your attribute X
- tell me when your attribute X changes
- tell me when your attributes no longer satisfy property X
- attach the following (new or old) object to yourself
- delete object X from yourself
- give me a list of all the objects attached to you whose attributes satisfy property X
- delete all objects attached to you whose attributes satisfy property X

Note that we don't really need all these requests: some can be expressed in terms of others.
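
A few of the queries and requests listed above can be sketched as methods on a single class. This is only an illustration, not the DV, KS, or AVS interface: the class name `PacoObject` and all method names are hypothetical, and locking, permissions, and collections are left out.

```python
from typing import Any, Callable, Dict, List


class PacoObject:
    """Hypothetical object: named attributes plus a list of attached objects."""

    def __init__(self) -> None:
        self.attributes: Dict[str, Any] = {}
        self.attached: List["PacoObject"] = []
        self._watchers: Dict[str, List[Callable[[str, Any], None]]] = {}

    def has(self, name: str) -> bool:
        # "do you have attribute X?"
        return name in self.attributes

    def get(self, name: str) -> Any:
        # "tell me the value of your attribute X" (None if absent)
        return self.attributes.get(name)

    def put(self, name: str, value: Any) -> None:
        # "add attribute X to yourself" / "change the value of X to Y"
        self.attributes[name] = value
        for callback in self._watchers.get(name, []):
            callback(name, value)

    def watch(self, name: str, callback: Callable[[str, Any], None]) -> None:
        # "tell me when your attribute X changes"
        self._watchers.setdefault(name, []).append(callback)

    def attach(self, other: "PacoObject") -> None:
        # "attach the following object to yourself"
        self.attached.append(other)

    def attached_where(self, prop: Callable[["PacoObject"], bool]) -> List["PacoObject"]:
        # "give me a list of all the objects attached to you whose attributes satisfy X"
        return [o for o in self.attached if prop(o)]

    def detach_where(self, prop: Callable[["PacoObject"], bool]) -> None:
        # "delete all objects attached to you ...", expressed via the previous request
        doomed = self.attached_where(prop)
        self.attached = [o for o in self.attached if o not in doomed]


folder, note, photo = PacoObject(), PacoObject(), PacoObject()
note.put("kind", "note")
photo.put("kind", "image")
folder.attach(note)
folder.attach(photo)

seen = []
note.watch("kind", lambda name, value: seen.append((name, value)))
note.put("kind", "memo")

images = folder.attached_where(lambda o: o.get("kind") == "image")
folder.detach_where(lambda o: o.get("kind") == "memo")
```

Here `detach_where` is written in terms of `attached_where`, which is one small instance of a request being expressible through another.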

The only way we have found to support all of these abilities is to use the flexible data model outlined in this paper. Any object may have any other object, of whatever type, as an attribute. Further, those attributes may in turn have yet more attributes of their own, and so on recursively. This underlying data model gives both DV and KS the ability to do arbitrarily sophisticated things in their interfaces, each of which may be replaced by another at runtime. However, presently this comes at great cost: to maintain maximum flexibility, our current implementation is quite slow. This is the single biggest issue to be worked on next, and by formalizing the model we take one more step in that direction. In the next section we discuss why current data models do not allow easy implementation of the capabilities above.

What is Wrong with File Systems and Databases?

File systems and databases have the same final goal: to store and organize information. However, in both cases they trade flexibility of data organization for access speed and reduced space. Although more recent database models (object-oriented, multimedia) are more generic, they still have limitations. For example, object structure is typically static and cannot change at run time. Other limitations are the lack of a standard query language (such as SQL is for relational databases) and the power of the query language itself. File systems, on the other hand, are even more rigid, and we will concentrate on this issue, as our proposal aims to replace both file systems and databases, which in fact should be the same thing. Indeed, the final goal should be to have the capabilities of file systems as well as of all database models, without the problems that currently arise when we use different data types or try to integrate two different database models. In particular, we would like to have, at the same time, all the good things of relational databases and of full-text databases.

A collection of named files of information located in a hierarchical file system (HFS) is perhaps the most ubiquitous feature of modern operating systems. Virtually everyone who sits in front of a computer stores information in files and places these files within an HFS. As well as explicitly storing information in files, we store information in the hierarchical structure itself (mainly in file names) and rely on our memory to maintain information about what we create. Our applications also operate within this framework, often encoding information in specific formats within files. This working environment is so widespread that it is easy to forget that computational systems were once without the luxury of convenient information containers (files) and a structure in which to place them (directories). It has become hard to imagine an operating system without the familiar backdrop of files and folders. However, although there are occasions when it makes good sense to store information in a hierarchy of named files, being obliged to do so is a burden when the information being stored does not fit this paradigm or has little or no relevance to the rest of the hierarchic structure.

The semantics of a document (file) depends on its use. It could depend on information about its creation and definition, a specific context, associations, structure and, of course, its content. Some proposals, like semantic file systems, try to cover all these views using several approaches. A subtle assumption is that a document has content as well as metadata (that is, attributes with values), which are data related to the content. This asymmetry is based on our physical abstraction of a document, but it is not necessary for the storage mechanism. A document could be just a set of attributes and values, only one of them being the content. For different applications, different attributes will be more important than others, but intrinsically there is no reason to separate data and metadata.

There are several studies about how users organize and retrieve documents in a desktop metaphor. Retrieval is usually based on location (that is, where the document is on the screen), content (by using a file searching tool), history (which file was being used before in an application; this is known as reminding) and, in most cases, name (that is, where it is stored in the HFS; known as archiving). In all cases we are relying on the user's memory of past events, positions, and names. There is no real model for the user's actions.

Although searching by content is closer to a semantic search, it too can return non-relevant documents, because in a different context the same search keywords are valid (these are problems with polysemy and synonymy). This problem can be partially solved by adding additional information, for example, if we know that the document is not too old and is small. However, this kind of fuzzy information usually cannot be used, and even when it can, there is a lack of good systems that integrate search by content (text databases) and search by attribute values (relational databases). Part of the problem is due to the asymmetry mentioned before.

Users accept HFSs because they are not hard to understand, they resemble physical archiving, and, mainly, because there is no alternative. However, HFSs have several disadvantages. First, they try to simultaneously solve different problems: storing, representing, and organizing information. Second, they rely on the user's memory, because every file and folder has to be named, and named consistently. Third, the only semantic information about a file is given by the name path to it, which could have been named by another person or without enforcing a specific naming strategy. Fourth, naming is not easily scalable: typically there are limitations on how long names can be and on what symbols can be used. Fifth, many files could conceptually belong to more than one folder and, although feasible in some systems using symbolic links or their equivalent, it is too awkward to be done routinely by most users. In summary, we often have problems finding a specific file because we do not remember where it is or what it is named.

Specific problems with the HFS usually include the following:

- Files can only live in one HFS. Although UNIX symbolic links (and roughly equivalent mechanisms in other operating systems) can be used to point at objects in another file system, files actually live in a single file system. These mechanisms are really just stopgap attempts to fix a fundamental problem. They create problems of their own, since they essentially maintain a copy of the name of a file that is elsewhere. Problems arise when the destination file disappears or moves, and attempts to deal with these issues are expensive. Not dealing with these issues (as in some types of UNIX) leaves dangling symbolic links, another problem.

- An HFS is the only organizational structure in which a file can appear, and every file must be in an HFS. It is not possible to natively maintain a set of files in, for example, a priority queue, a linked list, or another organizational structure. Of course, this illusion can be created with external programs and interfaces, but the underlying storage is still an HFS. A file cannot be on disk but not in an HFS. Thus there is no native provision for the organization of files other than in a hierarchical tree.

- Files cannot be annotated without being disturbed. File formats and file and directory permissions often make it impossible to annotate existing files. Permissions will typically prevent a user from modifying another user's files, and specific file formats often make it impossible to read or alter an existing file using anything but a single application, which may not be available, or even known, to a user wanting to annotate a file.

- A single organization of files into directories must be chosen. The problem with choosing the directory structure ahead of time is that very often one's first choice of organization will not be the best. Once made, the directory structure in use can be awkward to change. Reorganizing large collections of files and directories into an alternate organization is not a well-supported activity and is fraught with dangers. Once done, there is no support for reverting to the former organization.

- It can be difficult to find files either by name or content, or especially both. UNIX, for example, stores file information in three locations: inodes (permissions, size, dates, etc.), directory files (file name), and the files themselves (content). This organization makes it awkward to, for example, simultaneously search for files by name and content, as different tools to read these different sources of information must be invoked and their results somehow combined. While this at least can be done, it is very awkward.

- The hierarchical structure can only contain files. A hierarchical organization structure can be appropriate for many tasks, but an HFS only makes this structure available for the organization of files. It is not possible to maintain a hierarchy of desktops or databases unless information representing these complex objects is somehow encoded into and decoded from files. The HFS does not separate the logical organization of files into a hierarchy from the objects (files) that it organizes into this structure. With a separation between these two, hierarchies of other objects could be supported.

- The use and organization of an HFS relies heavily on our brains. People typically maintain a tremendous amount of information about an HFS in their heads. For example, we know what kinds of names we tend to use for files, and that email correspondence can be found in a top-level directory called mail. Users are often forced to deal with file name extensions whose meaning they are expected to remember. Users are expected to remember the structure of the hierarchies they create, as well as aspects of the underlying system hierarchy and often equally idiosyncratic hierarchies created by other users.

- File names must be chosen and remembered. A big part of working with an HFS requires users to constantly be choosing and remembering file names. Conventions arise and must also be remembered.

- File locations must be chosen and remembered. As well as remembering names and naming conventions, users must remember positional information about the placement within the tree of files and directories.

- Further symptoms of the problems with the hierarchical structure found in today's operating systems are the very presence of symbolic links, magic numbers used to indicate file content type, so-called invisible files (that follow a naming convention that applications can be separately written to optionally ignore), and the very frequent encoding of information about file content in file names. All of these mechanisms were afterthoughts designed to get around problems inherent in the original underlying organizational design.

Note that some of these problems also apply to databases.

Adding to the restrictions inherent in the HFS, today's applications and user interfaces provide almost nothing to help the user deal with these problems. There is typically little or no support for deducing potential names or locations of files; for retrieval based on content rather than name; for categorizing existing files into more useful organizational structures; for navigation other than the one-dimensional up-and-then-down walk through a hierarchy; for helping to reorganize hierarchical structures or to support multiple simultaneous organizations; or for anything but the most primitive notions of history of user behavior.

In short, being forced to operate on files with coarse ownership, often holding data in fixed formats, all organized within a single hierarchical tree, presents a wide range of problems. Making matters worse, applications and user interface software typically offer virtually nothing to help alleviate these problems.

A New Data Model

We can distinguish three different layers in a data model: data storage (physical), data representation (logical), and data relationships (organizational). In most data models used today, such as a data structure implemented in a programming language, an HFS, or a relational database, those three layers are closely bound together for different reasons (efficiency, consistency, etc.). We believe that this binding restricts the way programmers, designers, and users can manipulate data, and it forces early design decisions that affect whole systems and even, as we've seen, user interfaces.

Our data model explicitly separates these three layers. First, we do not impose any condition on how objects should be stored, as that depends on technology. We only want our objects to be persistent. In fact, AVS uses a relational database to store objects, DV uses an HFS to store documents, and KS uses Java objects to store entities. Moreover, our model allows objects to be distributed, and it makes that transparent to the programmer and to the user.
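
The independence of the storage layer can be illustrated with a small sketch in which the same operation runs unchanged over two interchangeable back ends. Everything here is an assumption made for illustration: the `Store` protocol, the two store classes, and the `rename_title` helper are hypothetical names and do not describe how AVS, DV, or KS store their objects.

```python
import json
import os
import tempfile
from typing import Dict, Protocol


class Store(Protocol):
    """Hypothetical storage layer: persist and fetch an object's attributes by id."""

    def save(self, oid: str, attrs: Dict[str, object]) -> None: ...
    def load(self, oid: str) -> Dict[str, object]: ...


class MemoryStore:
    """Keeps attributes in a dictionary (stands in for any in-process store)."""

    def __init__(self) -> None:
        self._data: Dict[str, Dict[str, object]] = {}

    def save(self, oid, attrs):
        self._data[oid] = dict(attrs)

    def load(self, oid):
        return dict(self._data[oid])


class JsonDirStore:
    """Keeps each object's attributes in one JSON file inside a directory."""

    def __init__(self, directory: str) -> None:
        self._dir = directory

    def _path(self, oid: str) -> str:
        return os.path.join(self._dir, oid + ".json")

    def save(self, oid, attrs):
        with open(self._path(oid), "w", encoding="utf-8") as f:
            json.dump(attrs, f)

    def load(self, oid):
        with open(self._path(oid), encoding="utf-8") as f:
            return json.load(f)


def rename_title(store: Store, oid: str, title: str) -> None:
    """Representation-level operation: it never asks how objects are stored."""
    attrs = store.load(oid)
    attrs["title"] = title
    store.save(oid, attrs)


results = []
with tempfile.TemporaryDirectory() as tmp:
    for store in (MemoryStore(), JsonDirStore(tmp)):
        store.save("o1", {"title": "draft"})
        rename_title(store, "o1", "final")
        results.append(store.load("o1"))
```

The code above the storage classes sees only attributes and values, which is the separation the paragraph describes.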

The second layer of our data model, the representation layer, is crucial, for it is that which presents objects with dynamic attributes and values. In normal programming, objects have a fixed number of attributes and dynamic values. In most OO programming languages, objects with more attributes can be created through inheritance or delegation. In our case we go one step further: we can add and delete attributes of an existing object at runtime, as in prototype-based languages like Self. This means that we can also add attributes to, and delete attributes from, a collection of objects. Formally, an object in our model is:

- A set of attribute instances, which may be empty.

- Each attribute instance is a value and an object, either or both of which may be empty.

- A value can be any predefined atomic data type, or it can be a reference to an object.

Hence, our objects have identity (as in OO programming), attributes have names, and values are typed. An attribute may contain an object because we want to have attributes of attributes, and so on. As attributes are dynamic, ownership and permissions should be on attribute instances and not on objects. The same is true for concurrency: locking is also done on attribute instances.
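
The formal definition above maps naturally onto two small record types. The following is a minimal sketch under stated assumptions: the class names `Obj` and `AttributeInstance`, the choice of atomic types, and the `owner` and `locked` fields are hypothetical and only illustrate where ownership and locking would sit.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Union

# Assumed set of "predefined atomic data types" for this sketch.
Atomic = Union[int, float, str, bytes, bool]


@dataclass(eq=False)  # eq=False: objects compare by identity, not by content
class AttributeInstance:
    """A value and an object, either or both of which may be empty (None)."""

    value: Optional[Union[Atomic, "Obj"]] = None  # atomic value or object reference
    obj: Optional["Obj"] = None                   # holds attributes of this attribute
    owner: Optional[str] = None                   # ownership lives on the instance
    locked: bool = False                          # so does locking


@dataclass(eq=False)
class Obj:
    """A possibly empty set of named attribute instances."""

    attributes: Dict[str, AttributeInstance] = field(default_factory=dict)


doc = Obj()
doc.attributes["title"] = AttributeInstance(value="Minutes", owner="terry")
# An attribute of an attribute: the language of the title.
doc.attributes["title"].obj = Obj({"language": AttributeInstance(value="en")})
# Attributes can be added and removed at runtime.
doc.attributes["draft"] = AttributeInstance(value=True)
del doc.attributes["draft"]
```

Because attributes live in a dictionary rather than in the class definition, two objects of the same type need not share any attribute names.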

This representation choice has some fundamental consequences for the relationship layer. In particular, it does not restrict the relationships to compile time, or to tables in a database, etc. By adding attributes, we can create any and many relationships between different groups of objects at any time. For example, the same data can simultaneously appear in a search tree and a hash table, simply by manipulating attributes. This is difficult to do in a normal programming language. To stress this idea, we allow search capabilities on attributes and their values, such that we can dynamically create a set of objects satisfying any given attribute-value query. Consequently, the representation layer gives power and flexibility to the relationship layer.
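
As a small illustration of building several organizations over the same objects purely by adding attributes, the sketch below threads three objects into two independent linked lists. Linked lists stand in here for the search tree and hash table mentioned above; the `Obj` class, the `link` and `walk` helpers, and the attribute names are hypothetical.

```python
class Obj:
    """Hypothetical object: just a dictionary of attributes."""

    def __init__(self, **attrs):
        self.attrs = dict(attrs)


def link(objs, attr):
    """Thread objs into a linked list by storing each successor in attribute `attr`."""
    for current, successor in zip(objs, objs[1:]):
        current.attrs[attr] = successor
    if objs:
        objs[-1].attrs[attr] = None  # end of this particular list
    return objs[0] if objs else None


def walk(head, attr):
    """Follow the list stored in attribute `attr`, starting from head."""
    while head is not None:
        yield head
        head = head.attrs.get(attr)


a = Obj(name="notes", size=30)
b = Obj(name="budget", size=10)
c = Obj(name="thesis", size=20)
objs = [a, b, c]

# Two organizations of the same three objects, added at runtime.
by_name = link(sorted(objs, key=lambda o: o.attrs["name"]), "next_by_name")
by_size = link(sorted(objs, key=lambda o: o.attrs["size"]), "next_by_size")

names = [o.attrs["name"] for o in walk(by_name, "next_by_name")]
sizes = [o.attrs["size"] for o in walk(by_size, "next_by_size")]
```

Neither list required changing a class definition or copying the objects; each organization is only an extra attribute on objects that already exist.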

Formally, we want support for the following types of queries and operations on objects:

- Search for objects having a given attribute, or with an attribute value satisfying a certain property. The property will depend on the value type, and could be a range if it is a number, or a regular expression if it is a string, and so on. We do not impose any restriction on what this property can be.

- Same as before, but with any level of recursion, such that we can query on subattributes, and so on.

- Operations on sets of objects: union, intersection, and subtraction.

- The addition or removal of attributes, or sets of them, to sets of objects.
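
The four kinds of operation listed above can be sketched in a few lines. This is an illustrative sketch, not the AVS query interface: the `Obj` class and the `search` and `add_attribute` functions are hypothetical, sub-attributes are simplified to attribute values that are themselves objects, and the search is a plain scan over all objects.

```python
import re


class Obj:
    """Hypothetical object: a dictionary of attributes, hashable by identity."""

    def __init__(self, **attrs):
        self.attrs = dict(attrs)


def search(objects, path, prop=lambda value: True):
    """Return the set of objects that have the attribute named by `path`
    (a tuple of names followed through sub-objects) with a value satisfying prop."""
    found = set()
    for o in objects:
        node = o
        for name in path:
            if not isinstance(node, Obj) or name not in node.attrs:
                break  # attribute missing at this level
            node = node.attrs[name]
        else:
            if prop(node):
                found.add(o)
    return found


def add_attribute(objects, name, value):
    """Add one attribute to every object in a set."""
    for o in objects:
        o.attrs[name] = value


a = Obj(title="PACO draft", size=120, author=Obj(name="Terry"))
b = Obj(title="AVS notes", size=15, author=Obj(name="Ricardo"))
c = Obj(title="Shopping list", size=2)
store = [a, b, c]

big = search(store, ("size",), lambda n: 10 <= n <= 200)              # numeric range
titled = search(store, ("title",),
                lambda s: re.search(r"draft|notes", s) is not None)   # regular expression
by_terry = search(store, ("author", "name"), lambda s: s == "Terry")  # sub-attribute
has_author = search(store, ("author",))                               # attribute exists

result = (big & titled) - by_terry     # intersection, then subtraction
add_attribute(result, "collection", "to-read")
```

Since `prop` is an arbitrary function, the sketch places no restriction on what the property can be, in line with the first item of the list.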

Although a PACO system might use a back-end relational database as a storage tool, it is not itself a database. The data model we advocate prescribes only object structure: attributes and values. It is independent of object content and says nothing about information organization. A relational database does just the opposite. The design of a set of database tables is heavily dependent on a priori knowledge of content. Given this knowledge of content, a database implements a single organization of the data. The prior conditions and the result are quite different. Our data model contains no a priori view of how information should be organized, and permits many simultaneous organizations, but a database is an explicit implementation of a single view of the organization of given information. This is notwithstanding database research on materialized views, since for the most part they are merely subsets of the database and not truly new extensions of the data. However, recent work on adapting traditional relational databases to the exigencies of the web is increasing along the lines of evolvable materialized views.

An important part of our data model is that objects have no special piece of content to which attributes are attached. Although this has been done in the past (see the discussion of related work), it seems to be a move that has been ignored by more recent work. We believe this is an important step because:

- It allows objects that have no content but other useful attributes. Such objects might, for example, hold attributes that serve to organize other objects.

- Objects may have many pieces of content: for example, different translations of the same document (all equally important), or various versions of a document or program.

- Attributes may exist before the content does. For example, notes by various people regarding a meeting may collect as attributes of an object before the minutes of the meeting (the main content) are available and added to the object as the value of yet another attribute.

- Attributes may exist after the content goes away. Attributes may summarize a large object such as an image or book. It should be possible to remove the original content and not have all attributes suddenly vanish too.

- The focus of an object may change. The content which provokes the creation of a new object may over time become less important than some other attribute in the same object. By not treating the original material in a special fashion, there is no problem with changing focus.

- It increases uniformity of storage and access. Special content does not have to be stored or accessed differently from other information in the object. This means that awkward situations requiring the use and combination of multiple tools to access the same conceptual object do not arise.

- It is in accord with our general aims of providing more freedom to people who end up using our system.

While it is true that some of these problems can be avoided in a system that adds attributes to a special content object by simply leaving that object empty, that approach is less attractive. Under that approach, everyone using the system must deal with the decision to always have a special content object around which attributes aggregate. Instead, with no special content, systems that require all objects to have such content can simply add it as an attribute and treat it specially. Systems that do not require this do not inherit the obligation to deal with this (irrelevant to them) content object.
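
The meeting example from the list above can be traced in a few lines. This is a toy sketch that uses a plain dictionary as the object; the attribute names and the `content_of` helper are hypothetical, and `CONTENT` is simply the name that one imagined application chooses to treat specially.

```python
CONTENT = "content"  # attribute name chosen by one hypothetical application

meeting = {}  # a new object with no content and no attributes yet

# Attributes may exist before the content does.
meeting["note:alice"] = "Ask about the budget"
meeting["note:bob"] = "Room is booked until 3pm"

# The content arrives later, as the value of yet another attribute.
meeting[CONTENT] = "Minutes: the budget was approved ..."
meeting["summary"] = "Budget approved"

# An object may have several pieces of content, for example a translation.
meeting["content:es"] = "Acta: presupuesto aprobado ..."

# Attributes may exist after the content goes away.
del meeting[CONTENT]


def content_of(obj):
    """An application that wants special content just singles out one attribute."""
    return obj.get(CONTENT)


remaining = sorted(meeting)
```

Applications that do not care about `CONTENT` never see it as anything other than one attribute among many.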

Comparison with Related Work

Using attributes as a mechanism for avoiding the difficulties of files, or simply as a flexible storage layer, is an approach that has been taken in several earlier projects. We can classify the majority of related work by the characteristics described in the following sections.

Object Persistence

Persistent objects are not new. They are the core of object-oriented databases, which typically are implemented over the standard file system. Another related topic is persistent programming languages, where everything persists from one run to the next, storing the state of a program in secondary memory. The main difference between our proposal and standard object-oriented databases is that our objects have dynamic attributes. That changes completely how they should be stored and also how the database can be queried. This difference also applies to persistent programming languages. In addition, our proposal replaces both a database and a file system layer. It is a different paradigm for accessing information.

Adding Attributes to Existing Objects

Several projects add attributes to special objects: Placeless Documents (documents), Tapestry (mail messages, news articles), Semantic File Systems (files), SHORE (files), the BeOS File System (files), and the Synopses (files).

Our data model does not treat as special any preexisting object (such as a document, file, or other content). This approach was taken earlier in work on Entities and Self, but is a departure from more recent work in which attributes were added to special objects, as mentioned above. A discussion of this can be found in the section on our data model.

Our ob jects could b e implemented using asso ciative arrays as in AWK or p erl or string hashing

tables in Java However in these contexts implementing the search capabilities of our mo del would

b e extremely inecient On the other hand ob ject p ersistence is related to p ersistent programming

languages and ob ject oriented databases

Our work is closely related in spirit to the Placeless Do cuments pro ject at XeroxPARC

That pro ject has a wide range of aims for building do cument management systems based on

attributes and search

Emphasis on Automated Processes

Some related work on attributes emphasizes automated processes, which are used to gather
information and incorporate it into the system as attributes, either from outside sources or by
looking at local objects, mainly files (Semantic File Systems, Harvest).

As well as allowing for the automated addition of attributes to objects (especially in KnownSpace),
our work also explicitly emphasizes the importance of ad hoc addition of attributes to objects,
particularly by normal users interacting with information through graphical interfaces. In this it has
similarities with Presto's Vista browser, with a distinct focus on enabling the end user to
interact in a flexible fashion with information by means of attributes.

Applications Versus Platforms

Very often the move to use attributes is made as a means to an end: building a single application.
It is clear that information processing via attributes is a flexible and powerful approach, but it
has not resulted in the separation into an attribute-based information system that may be used to
build multiple applications. Exceptions to this rule are Entities and the BeOS file system.

We believe it is time that the power and flexibility of the attribute-based approach was separated
from any given application and taken down to a more fundamental level in computational systems.
In this way the advantages of the approach will be available to any application that cares to build
on this foundation. This seems to have been an aim in the BeOS file system (BFS), the NT
file system, and others. The AVS system described in this paper will eventually aim for a
low-level implementation such as that taken with the BFS, though the aims of the BFS implementers
seem to have been less broad, directed mainly towards allowing attributes for a small number of
applications (e.g., a mail reader using a small number of attributes). An important difference is
our move away from monolithic file objects with coarse ownership to which attributes are attached.
Instead we use PACOs: persistent objects composed solely of attributes and values.

Tuples

Our work has similarities with Linda and its derivatives, JavaSpaces from Sun and T Spaces.
Both present a flexible persistent forum for communications amongst applications, and the
objects involved have a uniform representation that is not based on any preexisting content. The
focus of these tuple systems is centered on communication, probably reflecting the initial focus of
Linda. The AVS system described in this paper also provides a persistent forum for this kind of
inter-application communication, though that is not its main focus. We are more concerned with
flexible persistent information representation and organization.

An Instance of Our Model: Attribute Value Systems

AVS is a prototype implementation of the above data model. Its explicit aim is to provide a
convenient and flexible storage substrate, and hence computational environment. Many of the
problems that AVS seeks to address stem from the difficulties inherent in an HFS, discussed earlier.
A traditional HFS provides just one organizational structure for information, and modern operating
systems insist that we use it as a basis for storage. The object storage (files) and organization (a
hierarchy) are tightly bound. While it is clearly possible to use this base for many things, there are
important common operations that are very awkward. AVS is designed to make these operations
simple.

Conceptual Model

AVS attempts to provide a flexible information management environment for programmers, users,
groups of users, and applications running on their behalf. It does this by providing a very general
form of object to hold information and a flexible attribute-based method of creating relationships
between these objects. AVS is based on: objects composed solely of attribute-value pairs;
editing of these attributes and values; and object relationships built via assigning attributes to
objects and discovered via subsequent search. These mechanisms generalize the HFS and provide
a natural basis for dealing with information.

There is no a priori set of attributes that objects contain. Objects, by virtue of their attributes,
may exist in many organizational structures (e.g., on graphs, on spreadsheets, on a desktop, in
hierarchies) simultaneously, and exist there for real, not just as a copy of the name of an object in
a distant structure which then has to be used to fetch the actual object (supposing it still exists).
The search system provides access to attribute names and values equally, allowing searches for
objects based on their properties. Objects may be annotated by the addition of attributes without
disturbing the original content.

The remainder of this section presents a broad outline of a system which provides an environment
in which all these objectives are satisfied in a natural fashion.

Attribute Value Objects

In AVS, objects are collections of named attributes and their corresponding values. There are no
special attributes and there is no separate entity to which the attributes are attached. An object
corresponding to what we call a file will have an attribute, perhaps named content, whose value
contains the content of the file. It will likely have other attributes to hold information such as
name, size, date, etc.

Any user or application may attach attribute-value pairs to any object they are able to find.
This allows the addition of information to the object without disturbing the existing content.
Content in a proprietary format can be augmented by attributes containing comments, annotations,
summaries, questions, additional content, etc. The addition of these attributes does not require
tools (which may not be known, available, or usable) to edit the existing attribute values.

Attribute names exist within namespaces. Namespaces allow applications to use simple attribute
names. AVS always contains a namespace called public, with common attributes that applications
might want to attach to objects and share (e.g., text, creationdate).

There are a small number of basic operations in AVS. These are the addition and removal of
attributes to/from objects, the altering of attribute names and values, and search. This simplicity
means that implementations can be small and easily written, and increases the potential for code
reuse. Operations such as moving an object within a file system, changing the name of an object,
adding comments to an object, and causing the object to appear in an application all reduce to these
operations.

Search

In AVS, search based on attribute names and values is fundamental and ubiquitous. Applications
use search to locate objects and collections of objects. Objects are assigned attributes that cause
them to be found in later searches. A hierarchy manager will add attributes to an object to cause
that object to appear at some location in the hierarchy. An application putting objects onto a
graph gives the objects attributes whose values store the coordinates of the object. These will later
be retrieved via search by the application that displays the graph. A desktop manager will add
attributes that cause an object to appear on a desktop at a given location. All these attributes
may be completely independent of one another, or they may be shared by various applications. In
each case the object is as much a member of the hierarchy, graph, desktop, etc., as it is of any
other structure.

Information about searches need not be discarded. Objects may contain a search query
(allowing a later identical or similar search), a copy of search results, or references to found
objects. Search and its results are treated as important members of the information system. There
is low-level support to ensure that saving search information is a natural operation.

In AVS, all actions reduce to attribute editing and search. All applications and users wishing
to operate on objects within the system must first find the objects they wish to act on (though
references to previously found objects may also be used). Search does not discriminate amongst
attribute names or values. There are no special attributes. Finding a file with content matching a
regular expression and whose name has a certain suffix is difficult in an HFS because the content
and name are stored separately and treated differently by the file system. In AVS such distinctions,
and thus problems, simply do not exist.
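
The regular-expression example can be sketched in a few lines of Python. This is only an
illustration under our own assumptions: objects are modeled as plain dictionaries, and the
function name search, the attribute names name and content, and the sample data are
hypothetical and are not taken from the AVS prototype.

```python
import re

# Hypothetical in-memory store: each object is just a dict of attribute -> value.
objects = [
    {"name": "report.tex", "content": "\\section{Results} The error rate fell."},
    {"name": "notes.txt", "content": "error log from Tuesday"},
    {"name": "draft.tex", "content": "Nothing to see here."},
    {"comment": "an object with neither a name nor content"},
]


def search(store, **patterns):
    """Return the objects in which every named attribute matches its regex.

    All attributes are treated uniformly: 'name' and 'content' receive no
    special handling, and an object lacking an attribute does not match.
    """
    compiled = {attr: re.compile(rx) for attr, rx in patterns.items()}
    return [
        obj for obj in store
        if all(attr in obj and rx.search(str(obj[attr]))
               for attr, rx in compiled.items())
    ]


# Content matching a regular expression AND a name with a given suffix.
hits = search(objects, name=r"\.tex$", content=r"error\s+rate")
```

Because the name and the content are both ordinary attributes, the combined query is a single
call rather than a directory walk followed by a separate scan of file contents.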

Objects are not Owned, Attributes Are

The objects in AVS are not owned. They are composed of owned attributes. Objects may frequently
be composed of attribute-value pairs added by several users or applications. The original attributes
may be later deleted, but the object continues to exist. The object remains even if all its attributes
are deleted. Such an empty object might be used for later communication between applications, as
a place to announce events, etc.
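
A minimal sketch of this ownership arrangement follows. The class AVObject, its method names,
and the rule that only an attribute's owner may delete it are assumptions made for the
illustration; the permissions model described later is richer than this.

```python
class AVObject:
    """Hypothetical object: an unowned bag of individually owned attributes."""

    def __init__(self):
        # attribute name -> (owner, value); the object itself has no owner field
        self._attrs = {}

    def add(self, owner, name, value):
        self._attrs[name] = (owner, value)

    def remove(self, owner, name):
        # In this sketch only the attribute's owner may delete it.
        if self._attrs[name][0] != owner:
            raise PermissionError(f"{owner} does not own {name!r}")
        del self._attrs[name]

    def owners(self):
        return {owner for owner, _ in self._attrs.values()}

    def __len__(self):
        return len(self._attrs)


doc = AVObject()
doc.add("alice", "content", "draft text")
doc.add("bob", "comment", "needs a title")

doc.remove("alice", "content")      # the original attribute is deleted
doc.remove("bob", "comment")        # now no attributes remain
empty_but_present = len(doc) == 0   # the object itself still exists

# The empty object is reused as a place to announce an event.
doc.add("mailer", "event", "new mail arrived")
```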

This arrangement reflects the aggregation of information into and around objects in the real
world. Memories, opinions, comments, and various other attributes exist regarding objects, and
the fact that some attributes may disappear (or never appear) does not alter the fact that there is
still an ownerless conceptual ball of information concerning the original material.

Ubiquity of Attribute Editing Operations

There are a small number of basic operations in AVS. These are the addition and removal of
attributes to/from objects, the altering of attribute names and values, and search. This extreme
conceptual simplicity means that implementations can be small and easily written, and increases
the potential for code reuse. Operations such as moving an object within a file system, changing the
name of an object, adding comments to an object, and causing the object to appear in an application
all reduce to these simple operations.
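
To make the reduction concrete, the sketch below implements only the four basic operations and
then expresses the higher-level actions in terms of them. The Store class, its method names, and
the path attribute are hypothetical choices for this illustration, not the AVS interface.

```python
class Store:
    """Hypothetical store offering only add, remove, alter, and search."""

    def __init__(self):
        self.objects = []

    def new_object(self):
        obj = {}
        self.objects.append(obj)
        return obj

    @staticmethod
    def add(obj, name, value):
        obj[name] = value

    @staticmethod
    def remove(obj, name):
        del obj[name]

    @staticmethod
    def alter(obj, name, value=None, new_name=None):
        # Alter an attribute's value, its name, or both.
        old = obj.pop(name)
        obj[new_name or name] = old if value is None else value

    def search(self, predicate):
        return [o for o in self.objects if predicate(o)]


store = Store()
f = store.new_object()
store.add(f, "name", "todo.txt")
store.add(f, "path", "/home/user")

# Moving within a file system: alter the value of the path attribute.
store.alter(f, "path", value="/home/user/archive")
# Renaming: alter the value of the name attribute.
store.alter(f, "name", value="done.txt")
# Commenting: add an attribute.
store.add(f, "comment", "finished in June")
# Appearing in an application: add the attribute that application searches for.
store.add(f, "deskx", 40)

on_desktop = store.search(lambda o: "deskx" in o)
```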

An Interpreted Object Language

AVS will provide an interpreted language allowing operations on objects and attributes. This
language will be used by applications to specify operations to be executed upon specified events
(attribute assignment or deletion, object found in a search, etc.). The language may also serve as
an extension language for applications.

Permissions

The permissions model of AVS places restrictions on attribute names and their values. A user or
application that creates an attribute controls whether other users and applications can: see the
new attribute name (i.e., that the attribute exists and/or that an object has an instance of the
attribute); create or delete instances of the new attribute; and read or write the value of
attribute instances.

This allows users to assign private attributes to objects. A user may build objects composed
entirely of attributes that cannot be seen by other users or applications. This offers a form of
privacy. In AVS, if you cannot find an object by search, you cannot view or alter it. Objects may
thus be inaccessible (no attributes visible), partly accessible (some attributes visible, alterable, etc.),
or fully accessible (all attributes visible), according to the permissions on their individual attributes.

Attribute permissions exist independent of any instance of an attribute and serve to implement
a permissions policy for future instances of the attribute. In addition, attribute instances may carry
permissions information relevant to the instance in the object that is associated with each attribute.
In the absence of these instance-specific permissions, the permissions for an attribute instance are
inherited from those of its attribute.
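
The inheritance rule can be sketched as follows. The class names Perms, Attribute, and Instance,
the four permission flags, and the helper visible_to are our own assumptions for the
illustration; the prototype may represent permissions quite differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Perms:
    """Hypothetical set of permissions granted to users other than the owner."""
    see: bool = False
    create_delete: bool = False
    read: bool = False
    write: bool = False


@dataclass
class Attribute:
    """An attribute definition carrying the default policy for its instances."""
    name: str
    owner: str
    perms: Perms = Perms()


@dataclass
class Instance:
    """One occurrence of an attribute inside an object."""
    attribute: Attribute
    value: object
    perms: Optional[Perms] = None   # instance-specific override, if any

    def effective_perms(self) -> Perms:
        # Without instance-specific permissions, inherit from the attribute.
        return self.perms if self.perms is not None else self.attribute.perms


def visible_to(obj, user):
    """Attribute instances of obj (a list of Instance) that the user can find."""
    return [i for i in obj
            if i.attribute.owner == user or i.effective_perms().see]


rating = Attribute("rating", owner="alice", perms=Perms(see=True, read=True))
diary = Attribute("diary", owner="alice")        # private by default

obj = [
    Instance(rating, 5),                           # inherits: visible to others
    Instance(rating, 1, perms=Perms()),            # override: hidden from others
    Instance(diary, "dear diary"),                 # inherits: hidden from others
]
bob_sees = [i.value for i in visible_to(obj, "bob")]
alice_sees = [i.value for i in visible_to(obj, "alice")]
```

In this sketch the object is partly accessible to bob and fully accessible to alice, which
corresponds to the accessibility levels described above.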

In order to prevent normal users from altering permissions on arbitrary attributes or attribute
instances, the system initially creates various important attributes and gives itself the permission
to add, delete, and alter these in objects. Various system properties that exist in normal PACO
objects are protected in this way, with attributes that are owned by the system. It is envisaged
that tools such as compilers will have attributes, such as executable, that only the compiler may
attach to objects.

Attribute Managers

Applications in AVS will usually implement an Attribute Manager component. A standard task
for an application will be to manage a set of attributes which it routinely attaches to and manages
in objects that are (or become) relevant to the application. A hierarchy manager might add a
name attribute whose value encodes the location of the object in the hierarchy. Although no
other attributes are necessary to maintain a hierarchy, such a manager might also manage other
attributes such as dates and access history, or may split the name into a path attribute and a name
attribute, etc. Other applications will manage sets of attributes in a similar fashion. For example, a
desktop application might manage a set of attributes (deskx, desky, deskicon) which it attaches
to objects that should appear on its desktop, and an annotation application might manage a set of
comment, offset, date, and author attributes.
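
A desktop attribute manager of this kind might look like the sketch below. The attribute names
deskx, desky, and deskicon come from the text; the DesktopManager class, its methods, and the
representation of the store as a list of dictionaries are assumptions made for the example.

```python
class DesktopManager:
    """Hypothetical attribute manager for a desktop application."""

    MANAGED = ("deskx", "desky", "deskicon")

    def __init__(self, store):
        self.store = store           # shared list of attribute dictionaries

    def place(self, obj, x, y, icon="generic"):
        # Attaching the managed attributes makes the object appear on the desktop.
        obj.update(deskx=x, desky=y, deskicon=icon)

    def unplace(self, obj):
        # Only the managed attributes are removed; others are left untouched.
        for name in self.MANAGED:
            obj.pop(name, None)

    def on_desktop(self):
        # The desktop's contents are discovered by search, not kept in a list.
        return [o for o in self.store if all(n in o for n in self.MANAGED)]


store = [{"name": "paper.tex", "author": "rb"}, {"name": "mail-17"}]
desk = DesktopManager(store)

desk.place(store[0], 10, 20)
shown = [o["name"] for o in desk.on_desktop()]

desk.unplace(store[0])
shown_after = [o["name"] for o in desk.on_desktop()]
```

The manager holds no private list of desktop items. Whatever carries its attributes is on the
desktop, regardless of which application created the object.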

Data Relationships via Attributes and Search

In AVS, as relationships between objects arise, they are made real by the addition of attributes
linking the objects involved. These objects and their relationships are dynamic and are later
discovered via search rather than through formal predefined data structures or following pointers.
There are no a priori assumptions about data structures. Multiple relationships between objects
can exist simultaneously, allowing multiple views of the same objects, no one more important than
any other.

An object with a collection of attributes and corresponding values is an instance of some data
structure (recall that data structures in programming languages are composed of fields and values).
However, unlike data structures as they are used in programming languages, in AVS no one need
anticipate the relationships that may exist amongst objects, or the fields (attributes) that might
comprise an object. Similarly, building relationships via the addition of attributes is quite different
from building the same relationships through the design of object-oriented class hierarchies.

Predetermined data structures and class hierarchies are of limited use in a world full of
unexpected relationships between objects, programmers, and users with widely differing brains,
perspectives, and needs. By using attributes and search in place of formal data structures, classes, and
pointers, AVS eliminates the need to anticipate relationships or to support a fixed number of them.
Similar comments apply to the restrictions imposed by traditional databases.
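
The following sketch shows relationships being added after the fact and then discovered by
search. The id attribute, the Ref value type, and the relation names reviews and derived-from
are hypothetical devices for the example; the model itself does not prescribe how one object
refers to another.

```python
import itertools
from collections import namedtuple

# Hypothetical reference value, so a link is never confused with ordinary data.
Ref = namedtuple("Ref", "target")

_ids = itertools.count(1)


def new_object(**attrs):
    """Create an object; the 'id' attribute is an assumption used for linking."""
    return {"id": next(_ids), **attrs}


paper = new_object(title="A New Data Model")
review = new_object(text="Interesting idea")
slides = new_object(title="Talk slides")
store = [paper, review, slides]

# Relationships arise later and are recorded as ordinary attributes.
review["reviews"] = Ref(paper["id"])
slides["derived-from"] = Ref(paper["id"])


def related(store, target, relation=None):
    """Find objects holding a Ref to target, optionally under one attribute name."""
    wanted = Ref(target["id"])
    return [
        o for o in store
        if any(v == wanted and (relation is None or k == relation)
               for k, v in o.items())
    ]


anything_about_paper = related(store, paper)
only_reviews = related(store, paper, "reviews")
```

No class or schema had to anticipate the reviews or derived-from relations. A third
application could add its own relation between the same objects without affecting these two.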

Applications of the Model

In this section we briefly present two prototypes of systems that use the proposed data model to
manage personal information.

DomainView

DomainView (DV) is a desktop metaphor which tries to address the user interface problems mentioned
earlier. Although the initial motivation for DomainView was to use retrieval by content in
a smarter way and to present a simpler interface for the user, we later realized that a different
conceptual framework was needed to organize information and store documents. We now summarize
how the proposed data model is used in DV.

An object is a document which has a dynamic number of attributes. Naming an object is
optional: its name is just one of the possible attributes of a document. Attributes can potentially
be added by the user or by applications. Some initial attributes are creation time and creating
application, size, and content. User documents are organized in collections, each one defining an
information or application domain. Domains have a flat organization. That is, there is no hierarchy
associated with them. Nevertheless, domains can naturally nest or overlap. However, they are
dynamic, so those set relationships are not fixed. Hence, a document may belong to more than one
domain.

Domains can be predefined and/or created by the user, and are dynamic. Each domain has
an associated set of words which defines it, chosen from a global thesaurus. Predefined domains could
depend on specific user tasks or applications. Like documents, domains may have a name, but this
is optional. The thesaurus can be initially defined by a system manager, extracted automatically
from a subset of documents, or created by the user. In all cases the thesaurus is dynamic and is
modified by user actions. Documents (or a set of documents) can be retrieved by using a set of words
(which will be searched on all attributes) and/or specific values in specific document attributes such
as date, size, etc. Queries are stored and their result can define a new domain. Notice that there
is no notion of an HFS, and a document can only be retrieved using the value (or a range of values)
of one or more attributes.
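
A small sketch of domains defined by word sets follows. It is not the DomainView implementation,
which is written in Java. The membership rule used here (a document belongs to a domain if any
of its attribute values shares at least one word with the domain's word set), the function
names, and the sample data are assumptions made for the illustration.

```python
import re


def words_of(doc):
    """Lower-cased words found in the values of all attributes of a document."""
    text = " ".join(str(v) for v in doc.values())
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def members(domain_words, docs):
    """Documents sharing at least one word with the domain's defining word set."""
    return [d for d in docs if words_of(d) & domain_words]


docs = [
    {"name": "budget", "content": "travel costs for the Santiago workshop"},
    {"content": "workshop program and invited talks"},     # an unnamed document
    {"name": "recipe", "content": "bread, water, salt", "size": 120},
]

# Flat, possibly overlapping domains, each defined by a set of thesaurus words.
domains = {
    "travel": {"travel", "flight", "hotel"},
    "events": {"workshop", "conference"},
}
membership = {name: members(ws, docs) for name, ws in domains.items()}

# A query combining words with a numeric range; its result could be kept
# as the definition of a new domain.
small_bread_docs = [d for d in members({"bread"}, docs) if d.get("size", 0) < 500]
```

The first document falls into both domains, and the second is retrievable even though it has
no name attribute.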

Documents can only be used through the desktop interface. That means that the concept of
an application opening a document does not exist; moreover, it is forbidden (as was the original
intent of one of the Apple Macintosh designers). Applications are associated with domains
and executed by documents, and not the other way around. To create a new document, there
is a generic new document with no attributes which can call any application, or the applications
associated with the document domain, if any.

The user interface allows the following capabilities:

- Visualizing domains, that is, sets of objects. In particular, their intersection.
- Visualizing a given domain or the result of a search, that is, a set of documents.
- Retrieving sets of documents by searching on document attributes, by ranges in numerical
  attributes, or by full-text retrieval operations in strings.

Clearly, all these capabilities are easily implemented in our data model. Our user interface prototype
has been implemented in Java and has almost all the functionality already described.

Our desktop can be seen as a simple interface to a different operating system, where there is
no HFS but a uniform universe of objects (in our case called documents). We think that this
metaphor is closer to reality and does not rely so much on the user's memory, although we
expect normal users to name all domains. However, the number of domains will in general be small,
so this organization is much more scalable than an HFS. On the other hand, a user can have just
one domain (his/her universe) and retrieve everything by content. That should be the final goal.

KnownSpace

Data within KnownSpace is stored in Entities with Attributes. Entities may be anything: emails,
webpages, desktop documents, or more abstract things like Persons, Organizations, and Websites.
Attributes may also be arbitrary: date added, phone numbers contained in, website pointed to by,
and so on.

As with DomainView, a typical KnownSpace interface presents a universe of entities, each
of which may represent a document, a webpage, an email message, and so on, but it may also
present entities representing more abstract things that seemingly have nothing whatsoever to do
with documents in the traditional sense, like a Person. A Person, however, is a powerful organizing
principle in daily life. A Person sends email, a Person has a phone number, and that phone number
may be stored in an email from yet another Person. A Person has an address, and so on. All these
pieces of information can be hung on one entity inside KnownSpace. Thus users can browse not
just traditional documents, but also any arbitrary collection of pieces of information the interface
chooses to support.

Further, interfaces in KnownSpace are not tied to the system in the sense that most other
interfaces are tied to their systems. KnownSpace has many faces. It already has five interfaces
and more are on the way. Each user can potentially have a unique interface, since part of what
KnownSpace supports is the ability to build new interfaces inside KnownSpace (this work is not yet
complete). Consequently, users may even have one KnownSpace interface they use when they're at
work and at home use a completely different one (although perhaps carrying over the same set of
icons for each entity).

Within a particular interface, users may browse, delete, create, edit, or mark up any entity.
However, KnownSpace itself further marks up those entities, and any entities it fetches autonomously
for presentation to the user. KnownSpace uses all that markup (all those attributes) to cluster the
entities in many different ways, each of which is available to any KnownSpace interface. It is up
to the interface designer to choose which particular views of the data space it will emphasize to its
users.

To make all of this flexibility possible, KnownSpace depends heavily on the flexible
object-attribute model we presented earlier.

Concluding Remarks

Arguments for our data model include its underlying simplicity and flexibility, the removal of a
priori assumptions about data structures and relationships between information objects, ease of use
and implementation, its natural representation of real-world information, the support for multiple
views of the same information, and the fact that it generalizes the current structural imperative:
the hierarchical file system.

The new data model should make it simpler for users, programmers, and applications to work
with information: to accumulate it, share it, add to it, delete it, and to organize, retrieve, and view
it in many ways. This data model is intended to replace (or supplement) the traditional HFS with
a substrate that better reflects the types of operations on information that we hope to perform in
many modern computational environments.

Our model can also easily be extended to the Web. For example, all Web objects could be
wrapped in XML, where we have attributes and values independent of the type of object (HTML,
image, etc.). To maintain the XML philosophy, binaries would be converted to visible ASCII with
no distinction between data and metadata, although for complex objects it would be better to use
a link to them and maintain them in their native format. We think that this is a very uniform
and portable object model for the Web. Consequently, searching the Web with agents would be
much easier, and any search engine index would be much more powerful, because attributes give
semantics to the data, which is also one of the goals of XML. All of our systems add richer layers
of markup/interpretation to data, which may or may not already be in some XML format.
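
One possible XML wrapping is sketched below using Python's standard library. The element names
object and attribute, the encoding marker, and the choice of base64 as the visible-ASCII
encoding are all assumptions made for the example; the text above does not fix a schema.

```python
import base64
import xml.etree.ElementTree as ET


def to_xml(obj):
    """Wrap a dict of attributes as one XML element (element names are assumed)."""
    root = ET.Element("object")
    for name, value in obj.items():
        attr = ET.SubElement(root, "attribute", name=name)
        if isinstance(value, bytes):
            # Binary values become visible ASCII; base64 is one possible encoding.
            attr.set("encoding", "base64")
            attr.text = base64.b64encode(value).decode("ascii")
        else:
            attr.text = str(value)
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    """Recover the attribute dict; non-binary values come back as strings."""
    obj = {}
    for attr in ET.fromstring(text).iter("attribute"):
        raw = attr.text or ""
        obj[attr.get("name")] = (base64.b64decode(raw)
                                 if attr.get("encoding") == "base64" else raw)
    return obj


# Data (content) and metadata (type, title) are wrapped in exactly the same way.
page = {"type": "image", "title": "logo", "content": b"\x89PNG\r\n"}
xml_text = to_xml(page)
restored = from_xml(xml_text)
```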

Future work for AVS will revolve around evaluating how programmers find the task of designing
applications using the data model that we propose. Based on our experience with DV and KS,
having AVS would have greatly simplified the development of those prototypes. Another important
evaluation is how efficiently this model can be implemented, either from scratch or on top of an HFS
or a relational data model, although the latter imposes many restrictions that are already mentioned.
Finally, our model leaves open many problems that we have not been able to address yet.

References

Ricardo Baeza-Yates, Terry Jones, and Gregory Rawlins. New Approaches to Managing
Information: Attribute-Centric Data Systems. Submitted.

Ricardo Baeza-Yates and Claudio Mecoli. DomainView: A Desktop Metaphor based on
User Defined Domains. Dept. of Computer Science, Univ. of Chile.

Deborah Barreau and Bonnie A. Nardi. Finding and Reminding: File Organization from
the Desktop. SIGCHI Newsletter, July.

Mic Bowman and Ranjit John. The Synopsis File System: From Files to File Objects.
Position paper for the Joint W3C/OMG Workshop on Distributed Objects and Mobile
Code, June.

C. Mic Bowman, Peter B. Danzig, Darren R. Hardy, Udi Manber, and Michael F.
Schwartz. The Harvest information discovery and access system. In Proc. 2nd Int. WWW
Conf., October.

M. Carey, D. DeWitt, J. Naughton, M. Solomon, et al. Shoring Up Persistent
Applications. Proc. of the ACM SIGMOD Conference, Minneapolis, MN, May.

Craig Chambers, David Ungar, Bay-Wei Chang, and Urs Hölzle. Parents are Shared Parts:
Inheritance and Encapsulation in Self. In Lisp and Symbolic Computation, Kluwer
Academic Publishers, June.

Paul Dourish, W. Keith Edwards, Anthony LaMarca, John Lamping, Karin Petersen,
Michael Salisbury, Douglas B. Terry, and James Thornton. Extending Document
Management Systems with User-Specific Active Properties. Xerox PARC Working Paper.

Paul Dourish, W. Keith Edwards, Anthony LaMarca, and Michael Salisbury. Presto: An
Experimental Architecture for Fluid Interactive Document Spaces. Xerox PARC Working
Paper.

Dominic Giampaolo. Practical File System Design with the Be File System. Morgan
Kaufmann.

David Gelernter. Generative communication in Linda. ACM Transactions on
Programming Languages and Systems, January.

David K. Gifford, Pierre Jouvelot, Mark Sheldon, and James O'Toole. Semantic file
systems. In ACM Symposium on Principles of Programming Languages, October.

David Goldberg, David Nichols, Brian M. Oki, and Douglas Terry. Using Collaborative
Filtering to Weave an Information Tapestry. Communications of the ACM,
December.

Terry Jones. Attribute Value Systems: An Overview. Dept. of Cognitive Science, Univ. of
California at San Diego.

Gregory J. Rawlins. KnownSpace. http://www.knownspace.org

Elke A. Rundensteiner, Andreas Koeller, Xin Zhang, Amy J. Lee, and Anisoara Nica.
Evolvable View Environment (EVE): Non-Equivalent View Maintenance under Schema
Changes. SIGMOD Software system demonstration, Philadelphia, USA, May.

Bruce Tognazzini. Tog on Software Design. Addison Wesley.

Andrew John Wilkes. Workstation Design for Distributed Computing (see the chapter on
Entities). PhD Dissertation, University of Cambridge. Reproduced as Hewlett-Packard
Technical Report ACS, April.

P. Wyckoff, S. W. McLaughry, T. J. Lehman, and D. A. Ford. T Spaces. IBM Systems
Journal, Java Technology, Aug.