Garbage Collection in Object Oriented Databases Using Transactional Cyclic Reference Counting

S. Ashwin*, Prasan Roy, S. Seshadri, Avi Silberschatz, S. Sudarshan

Indian Institute of Technology, Mumbai, India
Bell Laboratories, Murray Hill, NJ

sashwin@cs.wisc.edu, avi@bell-labs.com, {prasan, seshadri, sudarsha}@cse.iitb.ernet.in

Abstract

Garbage collection is important in object-oriented databases to free the programmer from explicitly deallocating memory. In this paper we present a garbage collection algorithm for object-oriented databases, called Transactional Cyclic Reference Counting (TCRC). The algorithm is based on a variant of a reference counting algorithm proposed for functional programming languages. It keeps track of auxiliary reference count information to detect and collect cyclic garbage. The algorithm works correctly in the presence of concurrently running transactions and system failures. It does not obtain any long-term locks, thereby minimizing interference with transaction processing. It uses recovery subsystem logs to detect pointer updates, so existing code need not be rewritten. Finally, it exploits schema information, if available, to reduce costs. We have implemented the TCRC algorithm and present the results of a performance study of the implementation.

* Currently at the University of Wisconsin, Madison.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.

Proceedings of the 23rd VLDB Conference, Athens, Greece

Introduction

Object-oriented databases (OODBs), unlike relational databases, support the notion of object identity, and objects can refer to other objects via object identifiers. Requiring the programmer to write code that tracks objects and their references, and that deletes objects that are no longer referenced, is error prone. It leads to common programming errors such as memory leaks and dangling references. Memory leaks are garbage objects that are not referred to from anywhere and have not been deleted. These problems are also present in traditional programming languages. There, however, the effect of a memory leak is limited to individual runs of programs, since all garbage is implicitly collected when the program terminates. The problem becomes more serious in persistent object stores, since objects outlive the programs that create and access them. Automated garbage collection is therefore essential in an object-oriented database to protect against the errors mentioned above. In fact, the Smalltalk binding for the ODMG object database standard requires automated garbage collection.

We model an OODB in the standard way as an object graph, wherein the nodes are the objects and the arcs are the references between objects. The graph has a persistent root. All objects that are reachable from the persistent root, or from the transient program state of an ongoing transaction, are live, while the rest are garbage. We often refer to object references as pointers.

There have been two approaches to garbage collection in object-oriented databases: copying-collector based [YNY] and mark-and-sweep based [AFG]. The copying collector algorithm traverses the entire object graph and copies live objects into a new space; the entire old space is then reclaimed. In contrast, the mark-and-sweep algorithm marks all live objects by traversing the object graph. It then traverses (sweeps) the entire database and deletes all objects that are unmarked. The copying collector algorithm reclusters objects dynamically. The reclustering can improve locality of reference in some cases, but it may destroy programmer-specified clustering, resulting in worse performance in other cases. The garbage collection algorithms of [YNY] as well as [AFG] handle concurrency control and recovery issues.

With both of the above algorithms, the cost of traversing the entire object graph can be prohibitively expensive for databases larger than the memory size, particularly if there are many cross-page references. In the worst case, when the buffer size is a small fraction of the database size and objects in a page refer only to objects in other pages, there may be an I/O for every pointer in the database. To alleviate this problem, earlier work [YNY, AFG] has attempted to divide the database into partitions consisting of a few pages. Each partition stores its inter-partition references in a persistent data structure. These are references to objects in the partition from objects in other partitions. Objects referred to from other partitions are treated as if they are reachable from the persistent root. They are not garbage collected, even if they are not referred to from within the partition. Each partition is garbage collected independently of other partitions, and references to objects in other partitions are not followed. Thus partitioning makes the traversal more efficient: the smaller the partition, the more efficient the traversal. Maximum efficiency occurs if the whole partition fits into the buffer space.

Unfortunately, small partitions increase the probability of self-referential cycles of garbage that cross partition boundaries, and the partitioned garbage collection algorithms do not detect such cyclic garbage. Previous work has maintained that such cross-partition cycles will be few and will probably not be a problem. However, simulations by [CWZ] showed that even small increases in database connectivity can produce significant amounts of such garbage. Therefore, it is not clear that partition sizes can be made very small without either failing to collect large amounts of garbage or employing special and expensive techniques to detect such cyclic garbage.

A natural alternative is reference counting. Reference counting is based on the idea of keeping a count of the number of pointers pointing to each object. When the reference count of an object becomes zero, the object is garbage and eligible for collection. Reference counting has the attractive properties of localized and incremental processing. Unfortunately, basic reference counting cannot deal with self-referential cycles of objects. Each object in a cycle could have a positive reference count, yet all the objects in the cycle may be unreachable from the persistent root and therefore be garbage. However, the programming language community has proposed a number of extensions of the basic reference counting algorithm to handle cyclic data, including [Bro, Bro, PvEP]. More recent work in this area includes [Lin, MWL, JL].

In this paper we consider a version of reference counting proposed by Brownbridge [Bro, Bro] for functional programming languages, which handles self-referential cycles of garbage. We present an algorithm called Transactional Cyclic Reference Counting (TCRC), based on Brownbridge's algorithm, which is suitable for garbage collection in an OODB. The salient features of the TCRC algorithm are:

- It detects all self-referential cycles of garbage, unlike basic reference counting and the partitioned garbage collection algorithms.
- It performs a very localized version of mark-and-sweep to handle cyclic data. Each such mark-and-sweep is likely to access far fewer objects than a global mark-and-sweep. Thus, except in the worst case, it does not have to examine the entire database while collecting garbage.
- It allows transactions to run concurrently and does not obtain any long-term locks, thereby minimizing interference with transaction processing.
- It is integrated with recovery algorithms and works correctly in spite of system crashes. It also uses recovery subsystem logs to detect pointer updates, so existing application code need not be rewritten.
- It exploits schema information, if available, to reduce costs. In particular, if the schema graph is acyclic, then no cyclic references are possible in the database, and TCRC behaves identically to reference counting.

A proof of correctness of the TCRC algorithm is presented in [ARS+]. Designing a cyclic reference counting algorithm that allows concurrent updates and handles system crashes is rather nontrivial and, to our knowledge, has not been done before. We believe this is one of the central contributions of our paper.

A problem often cited against reference counting schemes is the overhead of updating reference counts. However, each pointer update can result in at most one reference count being updated. This overhead will have only a small impact on performance if pointer updates are only a small fraction of the overall updates, as we expect is true in any realistic scenario. For TCRC, moreover, the overhead is offset by the reduced cost of traversals while collecting garbage.

We have implemented a prototype of the TCRC algorithm, as well as the partitioned mark-and-sweep algorithm, on a storage manager called Brahma, developed at IIT Bombay. We present a performance study of TCRC based on the implementation; the study

and are deleted. However, prior cyclic reference counting algorithms, including Brownbr