Generational Reference Counting: a Reduced-Communication Distributed Storage Reclamation Scheme

Generational Reference Counting: A Reduced-Communication Distributed Storage Reclamation Scheme Benjamin Goldberg Department of Computer Science New York University 251 Mercer Street New York, NY 10012 Abstract study of efficient distributed methods for storage reclamation. New reclamation algorithms are es- This paper describes generational reference pecially important for loosely-coupled (distributed counting, a new distributed storage reclamation memory) multiprocessors in which each processor is scheme for loosely-coupled multiprocessors. It has a responsible for allocating and reclaiming structures significantly lower communication overhead than dis- residing in its local memory. The reclamation schemes tributed versions of conventional reference counting. that were suitable for uniprocessors must be altered to Although generational reference counting has greater work on loosely-coupled multiprocessors. Additional computational and space requirements than ordinary message passing and synchronization is generally re- reference counting, it may provide a significant sav- quired to maintain the correctness of these schemes. ing in overall execution time on machines in which Many current loosely-coupled multiprocessors have message passing is expensive. a very large communication cost associated with send- The communication overhead for generational ref- ing a message. Each message may consume hundreds erence counting is one message for each copy of an in- or even thousands of processor cycles. For many dis- terprocessor reference (pointer). Unlike conventional tributed algorithms, it is worth sacrificing space and reference counting, when a reference to an object is computation in order to avoid communication. copied no message is sent to the processor on which In this paper we present a distributed storage recla- the object lies. A message is sent only when a refer- mation algorithm that makes such a sacrifice. The ence is discarded. Unfortunately, generational refer- algorithm, called generational reference counting ence counting shares conventional reference counting’s or GRC, is based on the conventional reference count- inability to reclaim cyclical structures. ing scheme. Although it has somewhat greater storage In this paper, we present the generational reference and computational requirements then reference count- counting algorithm, prove it correct, and discuss some ing, GRC has one-third the communication overhead refinements that make it more efficient. We also com- of distributed reference counting. pare it with weighted reference counting, another dis- The GRC algorithm was developed independently tributed reference counting scheme described in the of the work on weighted reference counts[3,17] and literature. has similar properties, namely a communication overhead of one message per interprocessor reference and extra space associated with each reference. 1 Introduction The relative merits of reference counting versus garbage collection have been well argued (see [5] for The emergence of LISP and functional language im- a survey of the field). A method based on reference plementations for multiprocessors has encouraged the counting is of interest to us for the usual reasons: 0 It is incremental: Storage reclamation occurs throughout the computation. The computation Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM does not have to be interrupted for a significant copyright notice and the title of the publication aqd its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. A preliminary draft of this paper was presented to the To copy otherwise, or to republish. requires a fee and/or specific permission. Fourth Conference on Hypercubes, Concurrent Computers, and 0 1989 ACM O-8979 I -306-X/89/0006/03 13 I I SO Applications. 313 period of time for storage reclamation to proceed. ence count. Likewise, if a remote reference is dis- This is especially valuable in real time applica- carded then a decrement message must be sent. How- tions. ever, this simple extension of reference counting does not preserve the correctness of uniprocessor reference l The storage reclamation overhead depends on the amount of garbage in the system. This is not counting. It is possible for an object to be reclaimed the case with mark/sweep garbage collection. If while references to it still exist. memory is nearly full (with useful data), garbage Suppose an object o resides on a processor P, and collection will have to be invoked often and at the only o-reference p resides on a remote processor great expense. PP. In this case o’s reference count will be one. Now suppose that p is copied to create a new o-reference q l Reference counting, even in loosely-coupled mul- that resides on a third processor Pp. When p is copied, tiprocessors, is relatively simple. This makes Pp sends an increment message, call it IP, to PO. At storage reclamation easier to implement and some point after q arrives at Pq, q will be discarded. prove correct. When q is discarded, Pq sends a decrement message A number of interesting algorithms based on D, to PO. Likewise, a decrement message D, will mark/sweep garbage collection have been proposed to eventually be sent to PO when p is discarded. The solve the problems listed above. In particular, these following two scenarios result in o’s reference count algorithms have striven to make mark/sweep garbage being prematurely set to zero. collection incremental, both for uniprocessor and mul- tiprocessor architectures [8,1,5,12,2,13]. These algo- If q is discarded soon after being created, D, may rithms, however, have tended to be rather complex arrive at PO before Ip, even though IP was sent and require elaborate proofs of correctness. In addi- first. A decrement is performed on o’s reference tion, some incremental mark/sweep garbage collection count making its value zero even though p still schemes require at least two concurrent processes, a exists. mutator and a collector. In current loosely-coupled If p is discarded soon after q is created, Dp could multiprocessors, both of these processes would have arrive before IP . This can only happen in systems to be implemented (via context-switching) by each in which messages do not necessarily arrive in the processor in the system. order in which they are sent. This would result in We find distributed reference counting especially at- o’s reference count being set to zero even though tractive for its simplicity. The disadvantage of refer- Q still exists. ence counting is that it cannot collect cyclical structures. Although the algorithm we present in this pa- A solution to these problems was published in [16]. per has the same problem, the modifications that have The protocol described in that paper is as follows: been proposed [9,4,6] for reclaiming cyclical structures using uniprocessor reference counting are also applica- l In order to solve the second problem listed above, ble to generational reference counting (with the con- the protocol assumes that the architecture pre- comitant loss of simplicity). serves the order of messages sent by each processor. That is, if two messages are sent from pro cessor Pi to processor Pj, they will arrive at Pj 2 Distributed Reference in the order in which Pi sent them. This assump- tion can be enforced either if the system provides Counting fixed routing of messages or if messages have tags that indicate the order in which they are sent. Distributed reference counting is a simple extension to uniprocessor reference counting in which the each l When p is copied, Pp sends a create message to object o keeps count of the number of outstanding P, indicating that a new o-reference has been cre- references (pointers) to it. When a reference to o ated. When q is created on P4, Pq sends an ac- (which we will refer to as an o-reference after [IS]) is knowledge message to PO. Later, when q is dis- copied, o’s reference count is incremented. When an carded, Pq sends a delete message to PO (notice o-reference is discarded, o’s reference count is decre- that the delete message cannot be received before mented. If o’s reference count becomes zero, then o is the acknowledge message). garbage and can be reclaimed. The object o keeps track of the number of create, On a loosely-coupled system, the creation of a new acknowledge, and delete messages it receives and o-reference on a remote processor requires that a mes- is only reclaimed when an equal number of each sage be sent to o in order to increment o’s refer- has been received. 314 While this protocol does provide a correct dis- each. Thus, a and b (and any of their descendants) tributed reference counting scheme, it requires an access o through i, using an extra level of indirec- overhead of three messages per interprocessor refer- tion. When any of these references are discarded, the ence. delete messages are sent to i and i’s reference count is reduced appropriately. When i’s reference count 2.1 Weighted Reference Counts becomes zero a delete message containing a weight of one (a’s original weight) is sent to the object. Weighted Reference Counting (WRC) is a distributed An unfortunate aspect of the use of an indirection reference counting scheme recently presented in [3] pointer is that a reference, its indirection pointer, and and [17] (although Watson&Watson attribute it to the object may reside on different processors. If so, [18]) in which only one message is required for each accessing the object would require an extra message. interprocessor reference. The WRC scheme associates In section 4.2 we describe how this can be avoided a weight with each reference such that the sum of the (but only at the price of extra space).

Load more