Garbage Collection in Object Oriented Databases Using Transactional Cyclic Reference Counting

S. Ashwin*, Prasan Roy, S. Seshadri, Avi Silberschatz, S. Sudarshan

Indian Institute of Technology, Mumbai, India
Bell Laboratories, Murray Hill, NJ

sashwin@cs.wisc.edu, avi@bell-labs.com, {prasan,seshadri,sudarsha}@cse.iitb.ernet.in

Abstract

Garbage collection is important in object oriented databases to free the programmer from explicitly deallocating memory. In this paper, we present a garbage collection algorithm, called Transactional Cyclic Reference Counting (TCRC), for object oriented databases. The algorithm is based on a variant of a reference counting algorithm proposed for functional programming languages. The algorithm keeps track of auxiliary reference count information to detect and collect cyclic garbage. The algorithm works correctly in the presence of concurrently running transactions, and system failures. It does not obtain any long term locks, thereby minimizing interference with transaction processing. It uses recovery subsystem logs to detect pointer updates; thus, existing code need not be rewritten. Finally, it exploits schema information, if available, to reduce costs. We have implemented the TCRC algorithm and present results of a performance study of the implementation.

Introduction

Object oriented databases (OODBs), unlike relational databases, support the notion of object identity, and objects can refer to other objects via object identifiers. Requiring the programmer to write code to track objects and their references, and to delete objects that are no longer referenced, is error prone and leads to common programming errors such as memory leaks (garbage objects that are not referred to from anywhere, and haven't been deleted) and dangling references. While these problems are present in traditional programming languages, the effect of a memory leak is limited to individual runs of programs, since all garbage is implicitly collected when the program terminates. The problem becomes more serious in persistent object stores, since objects outlive the programs that create and access them. Automated garbage collection is essential in an object oriented database to protect from the errors mentioned above. In fact, the Smalltalk binding for the ODMG object database standard requires automated garbage collection.

We model an OODB in the standard way as an object graph, wherein the nodes are the objects and the arcs are the references between objects. The graph has a persistent root. All objects that are reachable from the persistent root or from the transient program state of an ongoing transaction are live, while the rest are garbage. We often call object references as pointers.
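The liveness definition above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the object graph is held in memory as a dictionary from oid to the list of oids it references; the names `graph`, `live_objects` and `garbage_objects` are hypothetical and are not part of TCRC, which is designed precisely to avoid such a global traversal.

```python
from collections import deque


def live_objects(graph, persistent_root, transient_roots=()):
    """Return the oids reachable from the persistent root or from oids
    held in the transient state of ongoing transactions."""
    live = set()
    frontier = deque([persistent_root, *transient_roots])
    while frontier:
        oid = frontier.popleft()
        if oid in live:
            continue
        live.add(oid)
        frontier.extend(graph.get(oid, ()))
    return live


def garbage_objects(graph, persistent_root, transient_roots=()):
    """Everything in the graph that is not live is garbage."""
    return set(graph) - live_objects(graph, persistent_root, transient_roots)


# Hypothetical example graph: oid -> list of referenced oids.
graph = {
    "root": ["a"],
    "a": ["b"],
    "b": [],
    "c": ["d"],  # c and d form a cycle that nothing live refers to
    "d": ["c"],
    "e": [],     # referenced only from the memory of an active transaction
}
```

In this example the cycle formed by `c` and `d` is garbage even though each of the two objects is still pointed to, which is the case that basic reference counting cannot detect.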



* Currently at the University of Wisconsin, Madison.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.

Proceedings of the 23rd VLDB Conference, Athens, Greece

There have been two approaches to garbage collection in object oriented databases: Copying Collector based [YNY] and Mark and Sweep based [AFG]. The copying collector algorithm traverses the entire object graph and copies live objects into a new space; the entire old space is then reclaimed. In contrast, the Mark and Sweep algorithm marks all live objects by traversing the object graph and then traverses (sweeps) the entire database and deletes all objects that are unmarked. The copying collector algorithm reclusters objects dynamically; the reclustering can improve locality of reference in some cases, but may destroy programmer specified clustering, resulting in worse performance in other cases. The garbage collection algorithms of [YNY] as well as [AFG] handle concurrency control and recovery issues.

With both the above algorithms, the cost of traversing the entire object graph can be prohibitively expensive for databases larger than the memory size, particularly if there are many cross-page references. In the worst case, when the buffer size is a small fraction of the database size and objects in a page refer to objects in other pages only, there may be an I/O for every pointer in the database. To alleviate this problem, earlier work [YNY, AFG] has attempted to divide the database into partitions consisting of a few pages. Each partition stores inter-partition references, that is, references to objects in the partition from objects in other partitions, in a persistent data structure. Objects referred to from other partitions are treated as if they are reachable from the persistent root, and are not garbage collected even if they are not referred to from within the partition. Each partition is garbage collected independent of other partitions; references to objects in other partitions are not followed. Thus, partitioning makes the traversal more efficient; the smaller the partition, the more efficient the traversal, with maximum efficiency occurring if the whole partition fits into the buffer space.

Unfortunately, small partitions increase the probability of self-referential cycles of garbage that cross partition boundaries; such cyclic garbage is not detected by the partitioned garbage collection algorithms. Previous work has maintained that such cross cycle structures will be few, and will probably not be a problem. However, simulations by [CWZ] showed that even small increases in database connectivity can produce significant amounts of such garbage. Therefore, it is not clear that partition sizes can be made very small without either failing to collect large amounts of garbage, or employing special (and expensive) techniques to detect such cyclic garbage.

A natural alternative is Reference Counting. Reference Counting is based on the idea of keeping a count of the number of pointers pointing to each object. When the reference count of the object becomes zero, it is garbage and eligible for collection. Reference counting has the attractive properties of localized and incremental processing. Unfortunately, basic reference counting cannot deal with self-referential cycles of objects; each object could have a positive reference count, yet all the objects in the cycle may be unreachable from the persistent root, and therefore be garbage. However, a number of extensions of the basic reference counting algorithm to handle cyclic data have been proposed in the programming language community, including [Bro, Bro, PvEP]. More recent work in this area includes [Lin, MWL, JL].

In this paper, we consider a version of reference counting, proposed by Brownbridge [Bro, Bro] for functional programming languages, which handles self referential cycles of garbage. We present an algorithm, called Transactional Cyclic Reference Counting (TCRC), based on Brownbridge's algorithm, which is suitable for garbage collection in an OODB. The salient features of the TCRC algorithm are:

• It detects all self referential cycles of garbage, unlike basic reference counting and the partitioned garbage collection algorithms.

• It performs a very localized version of mark-and-sweep to handle cyclic data, with each mark-and-sweep likely to access far fewer objects than a global mark-and-sweep. Thus, it does not have to examine the entire database while collecting garbage, except in the worst case.

• It allows transactions to run concurrently, and does not obtain any long term locks, thereby minimizing interference with transaction processing.

• It is integrated with recovery algorithms, and works correctly in spite of system crashes. It also uses recovery subsystem logs to detect pointer updates; thus, existing application code need not be rewritten.

• It exploits schema information, if available, to reduce costs. In particular, if the schema graph is acyclic, no cyclic references are possible in the database and TCRC behaves identically to reference counting.

A proof of correctness of the TCRC algorithm is presented in [ARS+]. Designing a cyclic reference counting algorithm which allows concurrent updates and handles system crashes is rather non-trivial and, to our knowledge, has not been done before; we believe this is one of the central contributions of our paper.

A problem often cited against reference counting schemes is the overhead of updating reference counts. However, each pointer update can only result in at most one reference count being updated. This overhead will have only a small impact on performance if, as we expect is true in any realistic scenario, pointer updates are only a small fraction of the overall updates. For TCRC, moreover, the overhead is offset by the reduced cost of traversals while collecting garbage.

We have implemented a prototype of the TCRC algorithm as well as the partitioned mark and sweep

algorithm on a storage manager called Brahma, developed in IIT Bombay. We present a performance study of TCRC based on the implementation; the study clearly illustrates the benefits of TCRC.

Brownbridge's Cyclic Reference Counting Algorithm

Our Transactional Cyclic Reference Counting algorithm is based on the Cyclic Reference Counting (CRC) algorithm proposed by Brownbridge [Bro, Bro] in the context of functional programming languages.

The basic idea behind the Cyclic Reference Counting (CRC) algorithm of Brownbridge [Bro, Bro] is to label edges in the object graph as strong or weak. The labelling is done such that a cycle in the object graph cannot consist of strong edges alone; it must have at least one weak edge. Two separate reference counts for strong and for weak edges (denoted SRefC and WRefC respectively) are maintained per object. It is not possible in general to cheaply determine whether labelling a new edge as strong creates a cycle of strong edges or not. Hence, in the absence of further information, the algorithm takes the conservative view that labelling a new edge strong could create a cycle of strong edges, and labels the new edge weak.

The SRefC and WRefC are updated as edges are created and deleted. If, for an object S, the SRefC as well as the WRefC is zero, then S is garbage, and S and the edges from it are deleted. If the SRefC is zero but the WRefC is nonzero, there is a chance that S is involved in a self referential cycle of garbage. If the SRefC of an object S is greater than zero, then S is guaranteed to be reachable from the root (however, our TCRC algorithm does not guarantee this last property).

If the object graph did not have any garbage before the deletion of an edge to S, then the only potential candidates for becoming garbage are S and objects reachable from S. If the SRefC of S is zero and the WRefC of S is nonzero, a localized mark and sweep algorithm detects whether S and any of the objects reachable from S are indeed garbage. The localized mark and sweep performs a traversal from S and identifies all objects reachable from S and colours them red. Let us denote the above set by R. It then colours green every object in R that has a reference from an object outside R (detected using reference counts). It also colours green all objects reachable from any green object. During this green marking phase some pointer strengths are updated to ensure that every object has at least one strong pointer to it. We will describe this pointer strength update in detail in the context of our transactional cyclic reference counting algorithm. At the end, all objects in R not marked green are garbage and are deleted.

However, prior cyclic reference counting algorithms, including Brownbridge's algorithm, were designed for a single user system. They cannot be used in a multi-user environment with concurrent updates to objects, and do not deal with persistent data and failures. Our contributions lie in extending Brownbridge's algorithm to (a) use logs of updates to detect changes to object references, (b) work in an environment with concurrent updates, (c) work on persistent data in the presence of system failures and transaction aborts, (d) handle a batch of updates at a time rather than one update at a time, and (e) optimize the localized mark and sweep significantly by following only strong pointers.

System Model and Assumptions

In this section, we describe our system model and outline the architectural assumptions on which our garbage collector is based, which is very similar to the model and assumptions in [AFG].

In our model, transactions log undo and redo information for all updates. Undo and redo records are represented as undo(tid, oid, offset, old-value) and redo(tid, oid, offset, new-value), where tid denotes a transaction identifier and oid an object identifier. Object creation is logged as object-allocation(tid, oid). The commit log is represented as commit(tid) and the abort log is represented as abort(tid). We require that from the oid we can identify the type of the object (perhaps by first fetching the object), and from the offset we can determine if the value that has been updated is a pointer field. These requirements are satisfied by most database systems.

We make the following important assumption about transactions:

Assumption: Transactions follow strict two-phase locking on objects. That is, transactions acquire read or write locks on objects as appropriate, and hold read as well as write locks until end of transaction. □

As with any other garbage collection scheme, we assume that an object identifier is valid only if it is either a persistent root, or is present in a pointer field of an object in the database, or is in the transient memory (program variables or registers) of an active transaction that read the value from an object in the database. Note that this precludes transactions from passing oids to other transactions, and from storing oids in external persistent storage.

Our algorithms can be used in centralized as well as client-server settings. Let us consider first the centralized setting.

Assumption: In the centralized setting, we assume that transactions follow strict WAL, that is,

they log both the undo and the redo value before actually performing the update. □

Our algorithms also work in a data-shipping client-server environment under the following assumptions:

Assumption: In the client-server setting, we assume that clients follow:

• strict WAL with respect to the server. That is, before any data is received by the server, the undo as well as redo information for the data must have already been received by the server.

• force with respect to the server. That is, before the transaction commits, all the updated data must have been received by the server. □

These assumptions make the client transaction behave, as far as the server is concerned, just like a local transaction that follows strict WAL.

Our techniques are not affected by the unit of data shipping (such as page or object) and whether or not data is cached at the client. The clients can retain copies of updated data after it has been sent to the server.

Most of the assumptions above are satisfied by typical storage managers for object oriented databases. Our client-server assumptions are also very similar to those of [AFG].

Transactional Cyclic Reference Counting

We will now describe the Transactional Cyclic Reference Counting (TCRC) algorithm. We first describe the data structures needed by the transactional cyclic reference counting algorithm.

Data Structures

Associated with each object, we persistently maintain a strong reference count (SRefC) giving the number of strong pointers pointing to the object, a weak reference count (WRefC) giving the number of weak pointers pointing to the object, and a strength bit for the object. Each pointer also has a strength bit. Both strength bits are persistent. The pointer is strong if the strength bit in the pointer and the strength bit in the object pointed to have the same value; otherwise, the pointer is weak. This representation of strength using two bits is an important implementation trick from Brownbridge [Bro, Bro]. It makes very efficient the operation of flipping the strength of all pointers to an object, that is, making all strong pointers to the object weak and all weak pointers to the object strong. All that need be done is to flip the value of the strength bit in the object.

The TCRC algorithm also maintains a persistent table, the Weak Reference Table (WRT), which contains oids for the objects which have a zero SRefC, i.e., no strong pointers incident on them. The persistent root is never put into the WRT.

All the above information can be constructed from the object graph, and therefore it could be made transient. However, we would then have to reconstruct the information after a system crash by scanning the entire database, which would be expensive. Hence we make it persistent. Updates to SRefC and WRefC, update of the strength bit of an object or of a pointer, and the insert or delete of entries from the WRT are logged as part of the transaction whose pointer update caused the information to be updated/inserted/deleted.

There is also a non-persistent table which is used during garbage collection: the Red Reference Table (RRT). This table associates with (some) objects a strong red reference count (SRedRefC), a weak red reference count (WRedRefC), and a bit that indicates whether the colour of the object is red or green. This table is stored on disk, since the size of this table could be large in the worst case, but updates to this table are not logged.

Finally, similar to [AFG], TCRC maintains a non-persistent in-memory table called the Temporary Reference Table (TRT), which contains all those oids such that a reference to the object was added or deleted by an active transaction, or the object was created by the transaction. Such an oid may be stored in the transient memory of an active transaction although the object may not be referenced by any other object in the database. An object whose oid is in TRT may not be garbage even if it is unreachable from any other object, since the transaction may store a reference to the object back in the database. Updates to TRT are also not logged. The TRT also provides a simple way of handling the persistent root: its oid is entered in the TRT at system start up, and is never removed. This prevents the garbage collector from collecting the persistent root.

The Algorithm

TCRC consists of two distinct algorithms, run by different processes. The first is the log-analyzer algorithm. The second algorithm is the actual garbage collection algorithm. We describe them below.

Log analyzer

The log-analyzer algorithm analyzes log records generated by the transaction, and performs various actions based on the log records. As part of its actions, it may also insert records into the log. We shall assume it is run as part of the transaction itself, is invoked each time a log record is appended to the system log tail, and is atomic with respect to the appending of the log

record.

Procedure CollectGarbage() {
    acquire gcLatch
    RRT = {}
S1: for each oid in WRT that is not in TRT
        RedTraverse(oid)
S2: for each oid in RRT
        latch the reference count entry of oid
        if (SRefC_oid + WRefC_oid > SRedRefC_oid + WRedRefC_oid)
            mark oid as green
        unlatch the reference count entry of oid
    for each oid in RRT that is marked green
        if (SRefC_oid = SRedRefC_oid)
            /* all external pointers to the object are weak */
            if (SRefC_oid = 0)    /* oid is in WRT */
                remove oid from WRT
            flip the strength of all pointers to oid
            swap SRefC_oid and WRefC_oid
        GreenTraverse(oid)
    done = FALSE
S3: while (done = FALSE)
        done = TRUE
        acquire log analyzer latch
S4:     for each oid in RRT that is marked red
            if (oid in TRT)
                release log analyzer latch
                GreenTraverse(oid)
                done = FALSE
                acquire log analyzer latch
        release log analyzer latch
S5: for each oid in RRT that is marked red
        Collect(oid)
    release gcLatch
    remove all flagged entries from TRT
}

Procedure GreenTraverse(oid) {
    starting with oid as the root, do a depth-first traversal
    restricted to the objects marked red in RRT;
    when visiting an object during the traversal:
        mark the object green
        make strong all pointers from the object to any red object not yet visited
        make weak all pointers from the object to any green object already visited
}

Figure 1: Pseudo-Code for Garbage Collector

In the actual implementation, it is possible to run the log-analyzer as a separate thread, and when a transaction appends a log record to the system log, it actually only delivers it to the log-analyzer, which then appends the log record to the system log. In particular, in the client-server implementation the log-analyzer process is run at the server end, not at the client.

The log-analyzer makes use of the following procedures. Procedure DeletePointer decrements the WRefC or SRefC for an object when a pointer to the object is deleted. If the SRefC falls to zero after the decrement, then the object's oid is put into WRT. Procedure AddPointer by default sets the strength of the pointer to be weak, and increments the WRefC of the object pointed to. The strength is set to weak so that cycles of strong edges are not created; however, we will see in the section on using the schema graph that we may be able to make some new pointers strong.

The procedure LogAnalyzer works as follows. First, it obtains the log analyzer latch (which is also acquired by the garbage collection thread) to establish a consistent point in the log. The latch is obtained for the duration of the procedure. The log is analyzed by the log analyzer and, depending on the type of the log record, various actions as outlined below are taken. For undo/redo log records caused by pointer updates, the reference counts for the affected objects are updated. This is done by DeletePointer in case of undo logs and AddPointer in case of redo logs. For log records corresponding to the allocation of objects, the reference counts for the new object are initialized to zero, and the oid of the object is inserted into the WRT. In all the above cases (i.e., for pointer updates and object allocation), the oid of the affected object is inserted into the TRT with the tid of the transaction that generated the record.

For end-of-transaction (commit or abort) log records, the algorithm first tries to get the gcLatch. If the latch is obtained immediately, then garbage collection is not in progress, and all the oid entries for the terminating transaction are removed from the TRT and the gcLatch is released thereafter. However, if the gcLatch cannot be obtained immediately, then a garbage collection is in progress concurrently. In this case, the oid entries for the terminating transaction are not removed, but instead flagged for later removal by the garbage collector.

All operations on pointer strengths and reference counts are protected by a latch on the object pointed to, although this is not explicitly mentioned in our algorithms. Accesses to WRT and TRT are also protected by latches.
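The bookkeeping performed by the log-analyzer, together with the two-bit strength representation described under Data Structures, can be sketched as follows. This is an illustrative in-memory model under stated assumptions, not the Brahma implementation: the class and method names (`Obj`, `LogAnalyzer`, `add_pointer`, `delete_pointer`, `flip_strength`, `end_transaction`) are hypothetical, the per-object latches and the logging of count updates are omitted, flagging is done per transaction rather than per TRT entry, and `flip_strength` keeps the WRT consistent on its own, whereas in the paper this is done by the garbage collector.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Obj:
    strength_bit: int = 0  # compared with the bit stored in each incoming pointer
    srefc: int = 0         # SRefC: number of strong pointers to this object
    wrefc: int = 0         # WRefC: number of weak pointers to this object
    fields: dict = field(default_factory=dict)  # offset -> (target oid, pointer bit)


class LogAnalyzer:
    def __init__(self):
        self.db = {}          # oid -> Obj
        self.wrt = set()      # Weak Reference Table: oids whose SRefC is zero
        self.trt = {}         # Temporary Reference Table: oid -> set of tids
        self.flagged = set()  # tids whose TRT entries await removal by the collector
        self.gc_latch = threading.Lock()

    def is_strong(self, pointer):
        # A pointer is strong iff its bit equals the bit of the object pointed to.
        target, bit = pointer
        return bit == self.db[target].strength_bit

    def _note(self, tid, oid):
        self.trt.setdefault(oid, set()).add(tid)

    def allocate(self, tid, oid):
        # object-allocation(tid, oid): counts start at zero, so oid enters the WRT.
        self.db[oid] = Obj()
        self.wrt.add(oid)
        self._note(tid, oid)

    def add_pointer(self, tid, src, offset, target):
        # New pointers are weak by default: store the opposite of the target's bit.
        obj = self.db[target]
        self.db[src].fields[offset] = (target, 1 - obj.strength_bit)
        obj.wrefc += 1
        self._note(tid, target)

    def delete_pointer(self, tid, src, offset):
        pointer = self.db[src].fields.pop(offset)
        target = pointer[0]
        obj = self.db[target]
        if self.is_strong(pointer):
            obj.srefc -= 1
            if obj.srefc == 0:
                self.wrt.add(target)
        else:
            obj.wrefc -= 1
        self._note(tid, target)

    def flip_strength(self, oid):
        # Flipping one bit turns every strong incoming pointer weak and vice versa,
        # so the two counts are swapped.
        obj = self.db[oid]
        obj.strength_bit ^= 1
        obj.srefc, obj.wrefc = obj.wrefc, obj.srefc
        if obj.srefc == 0:
            self.wrt.add(oid)
        else:
            self.wrt.discard(oid)

    def end_transaction(self, tid):
        # commit(tid) or abort(tid): clean the TRT only if no collection is running.
        if self.gc_latch.acquire(blocking=False):
            try:
                for oid in list(self.trt):
                    self.trt[oid].discard(tid)
                    if not self.trt[oid]:
                        del self.trt[oid]
            finally:
                self.gc_latch.release()
        else:
            self.flagged.add(tid)
```

Storing the opposite of the target's current bit in `add_pointer` is what makes a new pointer weak, and a single assignment in `flip_strength` changes the strength of every incoming pointer without touching the objects that hold them.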

Garbage Collector

The garbage collection algorithm is activated periodically, possibly depending on availability of free space. The algorithm makes use of the following support functions. Procedure Collect actually deletes an object; before doing so, it deletes all pointers out of the object, updating the stored reference counts of the objects pointed to. It also deletes the object from RRT and WRT.

Procedure RedTraverse performs a reachability scan from the specified object, following only strong pointers, marks all reachable objects red, and puts them in RRT. RedTraverse also maintains, for each object present in RRT, two counts, SRedRefC and WRedRefC, giving respectively the number of strong and weak pointers to the object from all other objects present in RRT. These counts are maintained on the fly during the traversal; in order to do so, RedTraverse also maintains these counts for objects that are reachable by a single weak edge from objects in RRT, since such objects may be added to RRT later in the scan.

The garbage collection algorithm is implemented by Procedure CollectGarbage, shown in Figure 1. Initially, all nodes reachable from objects in WRT using only the strong pointers are coloured red and put in RRT by calling RedTraverse. This function performs a fuzzy localized traversal of the object graph, during which no locks are obtained on the objects being traversed. Short term latches may be obtained on objects or pages to ensure physical consistency.

After this, in Step S2, some nodes are marked green based on the values of their WRefC/SRefC and WRedRefC/SRedRefC. WRedRefC is the number of weak pointers pointing to an object amongst pointers from objects in RRT. Similarly, SRedRefC is the number of strong pointers pointing to an object amongst pointers from objects in RRT. The expression WRedRefC + SRedRefC counts how many pointers to a node s are from nodes in RRT. If this count is less than the total number of pointers to node s, there must be an external (to objects in RRT) pointer to s, and s is not garbage. Such objects are marked green in Step S2. The Procedure GreenTraverse called in Procedure CollectGarbage can be found in Figure 1.

Next, in Step S4, any objects in RRT that are in TRT are also marked green, since their references may still be stored in an ongoing transaction and stored back in the database. Objects that are reachable from the above objects are also marked green by invoking GreenTraverse. The reason for performing Step S4 repeatedly in the while loop at Step S3 is to establish a consistent point in the log at which no object in the RRT is in TRT; this helps simplify the proof of correctness. Let us denote the time instant when we acquire the log analyzer latch for the last time in the while loop at Step S3 as T. This guarantees that all objects in RRT that are marked red at Step S5 are not in TRT according to the log at T.

Support for Logical Undo by the Recovery Manager

The TCRC algorithm needs some support from the recovery manager, in the form of supporting logical undos, to ensure correctness. There are some actions whose undos have to be performed logically and not physically. We discuss them below and discuss what the logical undo should do in each case.

Pointer Deletion and Strength Update: Undo of a pointer deletion or strength update, if performed naively, may introduce strong cycles in the graph, which can affect the correctness of the algorithm. The right way to undo a pointer deletion is to reinsert the pointer with the strength set to be weak, even if it was strong earlier. Similarly, the undo of a pointer strength update (done in case of system crash during the garbage collection phase) is to set the strength of the pointer as weak, irrespective of the original strength.

Reference Counts Update: The reference counts of an object O can be concurrently updated by multiple transactions, including the garbage collector, through different objects which are locked by the transactions. The object O itself need not be locked, since only a reference to it is being updated. Only short term latches are necessary for maintaining physical consistency. If a transaction that updated the reference count of an object aborts, it should be logically undone: the undo of a reference count increment is a decrement of the same reference count, while the undo of a reference count decrement is always an increment of WRefC, since a reinserted pointer is always weak.

Correctness

Theorem: The TCRC algorithm:

• eventually collects any object that is garbage.

• does not incorrectly reclaim live objects as garbage. □

The above theorem establishes the correctness of the TCRC algorithm; a proof is presented in [ARS+]. The theorem holds in the presence of concurrent transactions and system failures.

An interesting point to note is that RedTraverse follows only strong pointers and not weak pointers, in contrast to Mark-and-Sweep. Our proof of correctness shows that every garbage object is either in WRT, or is reachable by a sequence of strong edges from an

object in WRT, and thus RedTraverse finds all garbage objects. We also show that all non-garbage objects coloured red are later coloured green by a call on GreenTraverse, even though GreenTraverse only follows edges through red objects.

Another interesting point is that although our traversals (both RedTraverse and GreenTraverse) are fuzzy, that is, they do not acquire any long term locks, the algorithms are still correct. The TRT (also used by [AFG]) plays an important role here, since any pointers that are added or deleted during the traversal are inserted into the TRT. Objects reachable from TRT are not garbage collected.

A badly designed garbage collection algorithm could create infinite work for itself by leaving oids in WRT, which will be traversed by another garbage collection phase, which in turn leaves oids in WRT, ad infinitum. We now state a theorem which guarantees that this does not happen, that is, in the absence of updates, the system eventually reaches a state where the garbage collection thread does no more work.

Theorem: If there are no updates from the beginning of one garbage collection phase to the end of the next garbage collection phase, no object will be in WRT at the end of the second garbage collection phase. □

The proof is presented in [ARS+].

Using the Schema Graph

We now see how to use information from the database schema to optimize TCRC. The schema graph is a directed graph in which the nodes are the classes in the schema. An edge from node i to node j in the schema graph denotes that Class i has an attribute that is a reference to Class j. The pointers in the schema graph thus form a template for the pointers between the actual instances of the objects. If an edge E in a schema graph is not involved in a cycle, then neither can an edge e in the object graph for which E is the template.

We label edges which are not part of a cycle in the schema graph as acyclic and the others as cyclic. When adding an edge e to the object graph, if its corresponding template edge in the schema graph is acyclic, the

... object manager called Brahma, developed at IIT Bombay. Brahma supports concurrent transactions using two phase locking and a complete implementation of the ARIES recovery algorithm. It provides extendible hash indices as well as B+-tree indices as additional access mechanisms.

The WRT is implemented as a persistent extendible hash table indexed on the oid, while the TRT is an in-memory hash table indexed separately on the oid and the transaction id (to allow easy deletion of all entries of a transaction). The reference counts (SRefC and WRefC) are stored with the object itself. The only persistent structures required by PMS are one Incoming Reference List (IRL) per partition, which is maintained as a persistent B+-tree.

Our performance study in this section is based on the standard OO7 benchmark [CDN]. In particular, we worked on the standard small dataset in OO7, which was also used in [YNY] for their simulation study. The OO7 parameters and their values for this dataset are given in the accompanying table and are explained below.

The OO7 dataset is composed of a number of modules, specified by NUMMODULES. Each module consists of a tree of objects called assemblies. The tree is a complete tree with a fanout of NUMASSMPERASSM, and has NUMASSMLEVELS levels. The last level of the tree is called a base assembly, while the upper levels are called complex assemblies. In addition, each module consists of NUMCOMPPERMODULE composite objects. The base assemblies point to NUMCOMPPERASSM of these composite objects. Many base assemblies may share a composite object. Each composite object points to (a) a private set of NUMATOMICPERCOMP atomic objects, (b) a distinguished atomic object (called the composite root), and (c) a document object. An atomic object has a fixed number of connections (specified by NUMCONNPERATOMIC) out of it, to other atomic objects in the same set. A connection is itself modeled as an object (called a connection object) pointed to by the source of the connection, and in turn points to the destination of the connection. The connections connect the atomic objects into a cycle with chords. We will call a composite object along with its private set of

atomic ob jects connection ob jects and the do cument

strength of e i s s et to b e strong Dur ing garbage col

ob ject together as an object composite All ob ject ref

lection in Re dTravers e we do not follow strong e dge s

erence s in the b enchmark have invers e s and we always

whos e template e dge i s acyclic In the extreme cas e

ins ert or delete reference s in pairs the reference and

where the schema graph i s acyclic no e dge s are tra

its invers e

vers e d and TCRC b ehave s just like reference counting

The datas et cons i ste d of ob jects o ccupying

re ducing the cost s ignicantly

megabyte s of space Each ob ject comp os ite con

s i ste d of ob jects and had a s ize of byte s Dur

Performance Evaluation

ing the cours e of exp er iments the s ize was maintaine d

We implemente d the TCRC algor ithm and the Parti constant by adding and deleting the same amount of

tione d Mark and Sweep PMS algor ithm on an ob data The ob ject manager us e d a buer p o ol cons i sting

of KB pages. The I/O cost is measured in terms of the number of KB pages read from or written to the disk. All the complex and base assemblies forming the tree structure were clustered together. We also clustered together all the objects created for a composite.

Parameter             Value
NUMMODULES
NUMCOMPPERMODULE
NUMCONNPERATOMIC
NUMATOMICPERCOMP
NUMCOMPPERASSM
NUMASSMPERASSM
NUMASSMLEVELS

Table: Parameters for the OO7 benchmark

For PMS, the data was divided into partitions; each partition fits in memory. The inter-partition references were kept very small. All the complex and base assemblies forming the tree structure were put in the same partition. Approximately one out of every fifty composites spanned partitions.

We conducted two sets of experiments: the first was based on structure modifications suggested in the OO7 benchmark, while the second modifies complex assemblies. We discuss each in turn.

Structure Modifications

The workload in this experiment consisted of repeatedly inserting five object composites and attaching each composite to a distinct base assembly object, and then pruning the newly created references to the same five object composites; we call this whole set of inserts and deletes an update pass. This corresponds to the structure modification operations of the OO7 benchmark. This workload represents the case when an application creates a number of temporary objects during execution and disposes of them at the end of the execution. The results presented are over update passes interspersed with garbage collection; garbage collection is invoked when the database size crosses MB (recall the steady state database size is MB).

We first present the cumulative overheads (cost during normal processing as well as the overhead due to the garbage collection thread) for this workload.

Metric             TCRC    PMS
Logs (MB)
I/O: Read+Write

Although the amount of logs generated by the TCRC algorithm is more than that of the PMS algorithm, the overall I/O performance (including the I/Os for logs) of TCRC is about better than PMS for this workload. Three factors contribute to the overall performance: the frequency of invocation of the garbage collector, the overhead during a garbage collection pass, and the overhead due to normal processing. We study these three factors in detail now.

Invocation Frequency

We checked the database size at the end of every update pass, and invoked the garbage collector if the database size exceeded MB. TCRC collects all garbage, and therefore the amount of garbage, which is generated at the rate of bytes per update pass, exceeded MB, and thus the total database size exceeded MB, after seven update passes. Thus garbage collection in the case of TCRC is consistently invoked after every seven update passes.

The pattern is more interesting in the case of PMS. Approximately one out of fifty composites spanned partitions; such a composite, which is cyclic, is never collected. This caused the database size to increase with time. Since the threshold remained fixed at MB, this caused the garbage collection to be invoked more frequently as time progressed. During the course of the update passes, the TCRC garbage collector was invoked times while PMS was invoked times. Initially, the PMS collector was invoked every seven update passes, then every six update passes, and by the end of the update passes, every five update passes. By the end of the update passes, there were bytes of uncollected garbage for PMS.

Overhead of a Garbage Collection Pass

The table below gives the average I/O overhead and the amount of logs generated by TCRC and PMS for an invocation of the collector. To get the total cost, the figures have to be multiplied by the number of invocations, which is for PMS and for TCRC.

Metric             TCRC    PMS
Logs (MB)
I/O: Read+Write

Since garbage collection was invoked right after the insertions, TCRC found all the objects that it had to traverse in the cache and incurred no reads. PMS needed to make a reachability scan from the root, and therefore had to visit all of the objects in the dataset. This accounts for the excessive reads incurred by PMS. The amount of logs generated by TCRC is, however, bigger than that of PMS, since (i) the size of an object is bigger due to the presence of reference counts, and therefore the logs corresponding to the deletion of garbage objects are larger, and (ii) the garbage objects are deleted

from WRT and these deletions have to be logged (recall that all newly created objects will be in WRT, since all new pointers are weak).

Normal Processing Overheads

The following table shows the amount of I/O performed and the amount of logs generated during normal processing (when the collector is not running) over the course of the update passes.

Metric             TCRC    PMS
Logs (MB)
I/O: Read+Write

The algorithms have to maintain the persistent data structures consistent with the data during normal processing. In the case of PMS, the only persistent data structure is the IRL, which is updated quite rarely. On the other hand, in the case of TCRC, the reference counts as well as the WRT may be updated. The amounts of log generated show the additional logging that has to be performed by TCRC for maintaining these persistent structures. The additional logs account for about extra writes for TCRC. The rest of the extra writes performed by TCRC (about ) are due to writing parts of WRT back as a result of normal cache replacement. The amount of reads performed by TCRC is significantly smaller than that of PMS because the cache is not disturbed much by the garbage collection thread in the case of TCRC. In the case of PMS, at the end of the collection pass the cache could contain many objects from the assembly tree which are not required during normal processing.

Updating Complex Assemblies

In this set of experiments, we updated the assembly hierarchy tree by replacing a subtree rooted at a complex assembly by a different one. The lowest level base assemblies in the new hierarchy tree pointed to the same composite objects. In this experiment, we modified the OO7 benchmark by removing the back pointers to the base assembly objects from the composite objects. This provides acyclic data, which enables us to test our schema graph optimization. It also limits the traversal of TCRC.

We varied the level of the root of the subtree that we were replacing. The level was varied from two to six (level n corresponds to the level which is the n-th level upwards from the base assemblies). Notice that the subtree that was replaced is garbage after this update. After such an update, we invoked the garbage collector. The higher the level of the root of the subtree being replaced, the more the number of object composites reachable, and therefore the more the number of objects TCRC had to traverse. In this experiment, we report only on the overheads of the garbage collection pass. The normal processing overheads are very similar to the previous experiment, since we are creating some number of objects and pruning references to others, like the previous experiment. The cost of the garbage collection phase for TCRC is tabulated below.

Metric             Level of Root of Subtree
Logs (MB)
I/O: Read
I/O: Write

The cost of the garbage collection phase for PMS is tabulated below.

Metric             Level of Root of Subtree
Logs (MB)
I/O: Read
I/O: Write

The results show that the number of reads by TCRC is smaller than the number of reads by PMS for modifications at the lower levels, but degrades for modifications higher up the hierarchy. This is expected, since TCRC performs a local traversal. The number of reads for PMS is the same for modifications at all levels. Notice, however, that even though PMS traverses the entire graph, the cost of TCRC is significantly higher than that of PMS for modifications higher up the hierarchy. There are two reasons for this. The first is that TCRC reads all objects as it encounters their references during the traversals, unlike PMS, which follows only intra-partition references. This results in excessive read overhead, since there are a lot of cache conflicts for objects on different pages. Secondly, the RRT is disk resident, and as its size grows there is extra I/O overhead for accessing RRT. In contrast, our implementation of PMS assumes that information about which objects in a partition have been marked during the mark phase can be maintained in memory itself.

The amount of logs generated by TCRC (a for the amount of logs generated indicates that the amount of logs generated is less than KB) grows in comparison to the logs generated by PMS as the level number grows, since GreenTraverse updates pointer strengths of objects, which are also logged. The more the objects traversed, the more the number of pointers whose strengths get changed. In fact, most of the information in the logs generated by TCRC is very small: either a pointer strength update, an update to WRT, or an update to the reference count. However, each of these logs has a significant log header overhead in

the Brahma system. In a system which can club all these logs under a single log header along with the log for the actual pointer update, the overheads will come down drastically. We are currently modifying the log subsystem in Brahma to do this.

The TCRC algorithm can be optimized by using semantics available from the schema graph. Notice that the template for the pointer from a complex assembly to a base assembly is acyclic, and therefore need not be traversed by the RedTraverse algorithm, thus preventing TCRC from unnecessarily traversing the object composites. The cost of the TCRC garbage collection pass when the experiment was repeated with this schema-based optimization is tabulated below. It can be seen that TCRC with the optimization outperforms the basic TCRC as well as the PMS algorithm.

Metric             Level of Root of Subtree
Logs (MB)
I/O: Read
I/O: Write

Conclusions and Future Work

We have presented a garbage collection algorithm, called TCRC, based on cyclic reference counting, and proved it correct in the face of concurrent updates and system failures. We have implemented and tested the algorithm.

Our performance results indicate that TCRC can be much cheaper, at least in certain cases, than partitioned mark-and-sweep, since it can concentrate on local cycles of garbage. We believe our algorithm will lay the foundation for cyclic reference counting in database systems.

We plan to explore several optimizations of the TCRC algorithm in the future. For instance, we have observed that just after creation of the datasets, garbage collection has to perform extra work to convert weak pointers into strong pointers. However, once the conversion has been performed, a good set of strong pointers is established and the further cost of garbage collection is quite low. It would be interesting to develop bulk-loading techniques for reducing the cost of setting up pointer strengths.

We plan to optimize RedTraverse by only following a strong pointer into an object if all other strong pointers into that object have been already encountered. This will greatly reduce the number of objects traversed and may lead to significant performance benefits. Finally, another interesting extension of the TCRC algorithm would be to develop a partitioned TCRC algorithm in which, during a local mark and sweep, only intra-partition edges are traversed.

Acknowledgments

We thank Jeff Naughton and Jiebing Yu for giving us a version of their garbage collection code, which provided us insight into garbage collection implementation. We also thank Sandhya Jain for bringing the work by Brownbridge to our notice.

References

[AFG] L. Amsaleg, M. Franklin, and O. Gruber. Efficient Incremental Garbage Collection for Client-Server Object Database Systems. In Procs. of the International Conf. on Very Large Databases, September.

[ARS+] S. Ashwin, Prasan Roy, S. Seshadri, Avi Silberschatz, and S. Sudarshan. Garbage Collection in Object Oriented Databases Using Transactional Cyclic Reference Counting. Technical report, Indian Institute of Technology, Mumbai, India, June.

[Bro] D.R. Brownbridge. Recursive Structures in Computer Systems. PhD thesis, University of Newcastle upon Tyne, United Kingdom, September.

[Bro] D.R. Brownbridge. Cyclic Reference Counting for Combinator Machines. In Jean-Pierre Jouannaud, editor, ACM Conf. on Functional Programming Languages and Computer Architecture. Springer-Verlag.

[CDN] M. Carey, D. DeWitt, and J. Naughton. The OO7 Benchmark. In Proc. of the ACM SIGMOD Int. Conf., Washington D.C., May.

[CWZ] J. Cook, A. Wolf, and B. Zorn. Partition Selection Policies in Object Database Garbage Collection. In Procs. of the ACM SIGMOD Conf. on Management of Data, May.

[JL] Richard E. Jones and Rafael D. Lins. Cyclic weighted reference counting. Technical report, University of Kent, Canterbury, United Kingdom, December.

[Lin] Rafael D. Lins. Cyclic reference counting with lazy mark-scan. Technical report, University of Kent, Canterbury, United Kingdom, June.

[MWL] A.D. Martinez, R. Wachenchauzer, and Rafael D. Lins. Cyclic reference counting with local mark-scan. Information Processing Letters.

[PvEP] E.J.H. Pepels, M.C.J.D. van Eekelen, and M.J. Plasmeijer. A cyclic reference counting algorithm and its proof. Internal Report, University of Nijmegen, Nijmegen.

[YNY] V. Yong, J. Naughton, and J. Yu. Storage Reclamation and Reorganization in Client-Server Persistent Object Stores. In Proc. of the Data Engineering Int. Conf., February.
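As an illustration of the schema-graph labeling described in the section Using the Schema Graph, the following is a minimal Python sketch, not the Brahma implementation. It labels a schema edge from class i to class j as cyclic exactly when i is reachable from j, and as acyclic otherwise. The function name label_schema_edges and the class names in the example schema are hypothetical, chosen only to resemble the modified benchmark schema in which composites hold no back pointers to base assemblies.

```python
from collections import defaultdict


def label_schema_edges(edges):
    """Label each schema edge (i, j) as 'cyclic' or 'acyclic'.

    An edge i -> j lies on a directed cycle exactly when i is
    reachable from j, so one reachability search per edge suffices
    for the small graphs that schemas typically are.
    """
    successors = defaultdict(set)
    for src, dst in edges:
        successors[src].add(dst)

    def reachable(start, target):
        # Iterative depth-first search from start, looking for target.
        stack, seen = [start], {start}
        while stack:
            node = stack.pop()
            if node == target:
                return True
            for nxt in successors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False

    return {
        (src, dst): "cyclic" if reachable(dst, src) else "acyclic"
        for src, dst in edges
    }


# Hypothetical, simplified schema (an assumption for illustration only).
schema = [
    ("ComplexAssembly", "BaseAssembly"),
    ("BaseAssembly", "CompositePart"),
    ("CompositePart", "AtomicPart"),
    ("AtomicPart", "Connection"),
    ("Connection", "AtomicPart"),
]
labels = label_schema_edges(schema)
for edge, label in labels.items():
    print(edge, label)
```

Under this sketch, only the two edges between the atomic-part and connection classes are labeled cyclic; pointers whose template edge is labeled acyclic are the ones that would be created strong and skipped by RedTraverse. For large schemas, a strongly connected components computation would give the same labels in linear time.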