
Packer: an Innovative Space-Time-Efficient Parallel Garbage Collection Algorithm Based on Virtual Spaces

Shaoshan Liu 1, Ligang Wang 2, Xiao-Feng Li 2, and Jean-Luc Gaudiot 1
1 EECS, University of California, Irvine
2 Intel China Research Center

ABSTRACT— The fundamental challenge of garbage collector (GC) design is to maximize the recycled space with minimal time overhead. For efficient memory management, in many GC designs the heap is divided into a large object space (LOS) and a non-large object space (non-LOS). When one of the spaces is full, garbage collection is triggered even though the other space may still have plenty of free room, leading to inefficient space utilization. Also, space partitioning in existing GC designs implies different GC algorithms for different spaces. This not only prolongs the pause time of garbage collection, but also makes collection inefficient across multiple spaces. To address these problems, we propose Packer, a space-and-time-efficient parallel garbage collection algorithm based on the novel concept of virtual spaces. Instead of physically dividing the heap into multiple spaces, Packer manages multiple virtual spaces in one physically shared space. With multiple virtual spaces, Packer offers the advantage of efficient memory management. At the same time, with one physically shared space, Packer avoids the problem of inefficient space utilization. To reduce the garbage collection pause time of Packer, we also propose a novel parallelization method that is applicable to multiple virtual spaces: we reduce the compacting GC parallelization problem to a tree traversal parallelization problem, and apply it to both normal and large object compaction.

1. INTRODUCTION

Garbage collection technology has been widely used in managed runtime systems, such as Java virtual machine (JVM) and Common Language Runtime (CLR) systems. For efficient memory management, a modern high-performance GC usually manages large and normal objects separately, such that the heap is divided into a large object space (LOS) and a non-large object space (non-LOS). However, the object size distribution varies from one application to another, and from one execution phase to the next even within one application, so it is impossible to predefine a proper heap partitioning between LOS and non-LOS. Existing GCs with separate allocation spaces mostly suffer from the problem that they do not fit well with the dynamic variations of object size distribution at runtime. This leads to imbalanced space utilization and thus negatively impacts overall GC performance. On the algorithm side, conventional mark-sweep and reference counting collectors are susceptible to fragmentation. To address this problem, copying or compacting GCs were introduced. Compaction eliminates fragmentation in place by grouping live objects together in the heap and freeing up large contiguous spaces for future allocation. As multi-core architectures prevail, parallel compaction algorithms have been designed to achieve time efficiency. However, large object compaction is hard to parallelize due to strong data dependencies: the source object cannot be moved to its target location until the object originally in the target location has been moved out. In particular, when there are few large objects, the available parallelism is seemingly inadequate.

In this paper, we propose Packer, a space-and-time-efficient parallel garbage collection algorithm based on the novel concept of virtual spaces. Unlike conventional garbage collectors [2] that physically divide the heap into multiple spaces, Packer manages multiple virtual spaces in one physical space. With multiple virtual spaces, Packer offers the advantage of efficient memory management, so that each virtual space can employ the collection algorithm best suited to it. Meanwhile, with one physical space, Packer avoids the problem of inefficient space utilization, since there is no space partitioning problem any more. Object allocation is highly efficient in Packer. The free space in the physical heap is centrally controlled; when one of the virtual spaces needs more space, it searches this central structure to fetch a suitable free region. In particular, normal object allocation is done in thread-local blocks with bump pointers, requiring no synchronization. Garbage collection is triggered only when the heap contains no free region, thus guaranteeing that the heap is fully utilized.
Packer supports both compaction and mark-sweep for large objects. Hence, it incorporates the advantages of both the mark-sweep and compaction algorithms, and is able to achieve high performance whenever either algorithm is suitable. To further reduce the garbage collection pause time of Packer, we reduce the compacting GC parallelization problem to a tree traversal parallelization problem, and apply it to both normal and large object compaction.

In this paper we present the design details of the proposed algorithms and evaluate their efficiency. These algorithms are implemented in Apache Harmony, a product-quality open source Java SE implementation. The rest of the paper is organized as follows. Section 2 discusses related work. Section 3 introduces the basic algorithm designs of Packer. Section 4 presents the parallelization of normal and large object compaction in Packer. Section 5 presents the evaluation results with the SPECjbb2005 and Dacapo benchmark suites. Finally, Section 6 summarizes the project and discusses future work.

2. RELATED WORK

Conventional garbage collectors utilize mark-sweep algorithms [14, 15, 16] to manage the whole heap. When the heap has no more free space, GC is triggered and the collectors start to trace the heap. If an object can be reached, it is a live object and its header is marked. After tracing, all unmarked objects are swept and their occupied space is recycled. The free space in the heap is managed with a linked list, and during allocation the allocators traverse this linked list to fetch a suitable free region. The main advantage of this algorithm is that no data movement is necessary, so it incurs low overhead during garbage collection. However, it also has three disadvantages. First, it introduces heavy fragmentation in the heap, leading to inefficient space utilization. Second, fragmentation destroys the spatial locality of data allocation, leading to inefficient data access. Last, each allocation requires a linear search of the linked list that manages the free regions, which is inefficient.
As exemplified by the LISP2 algorithm [5], compaction algorithms are utilized in GC designs to address the disadvantages of mark-sweep algorithms. However, compaction usually imposes lengthy pause times. To reduce pause time, several parallel compaction algorithms have been proposed. Flood et al. [6] present a parallel compaction algorithm that runs three passes over the heap: first, it determines a new location for each object and installs a forwarding pointer; second, it fixes all pointers in the heap to point to the new locations; and finally, it moves all objects. To make this algorithm run in parallel, the heap is split into N areas such that N threads compact the heap into N/2 chunks of live objects. The main disadvantage of this design is that the resulting free space is noncontiguous. Abuaiadh et al. [7] propose a three-phase parallel compactor that uses a block-offset array and a mark-bit table to record the moving distance of live objects in blocks. Kermany and Petrank [8] propose the Compressor, which requires two phases to compact the heap; Wegiel and Krintz [9] design the Mapping Collector, which needs nearly one phase. Both approaches depend on virtual memory support from the underlying operating system. Note that concurrent GC and stop-the-world (STW) GC designs are fundamentally different: they have different design goals, evaluation metrics, and algorithms. Concurrent GC is designed to reduce pause time, whereas STW GC is designed to increase throughput. In this paper, we focus only on STW GC design.

For efficient memory management, Caudill and Wirfs-Brock first proposed using separate spaces to manage objects of different sizes: a large object space (LOS) for large objects and a non-large object space (non-LOS) for normal objects [1]. Hicks et al. have done a thorough study of large object spaces [2]. The results of this study indicate three problems for LOS designs. First, LOS collection is hard to parallelize. Second, LOS shares the same heap with non-LOS, so it is hard to achieve full utilization of the heap space. Third, LOS and non-LOS collections are done in different phases, which may affect the scalability of parallel garbage collection. In [3], Soman et al. discuss applying different GC algorithms in the same heap space, but their work does not involve dynamically adjusting the heap partitioning. The study by Barrett and Zorn [4] is the only known publication that studies space boundary adjustment, and their work aims at meeting resource constraints such as pause time. By contrast, Packer does not require any boundary adjustment mechanism. Instead, it manages different virtual spaces in the same physical space, avoiding the problem of inefficient space utilization while keeping the advantage of efficient memory management.

3. BASIC ALGORITHM DESIGNS IN PACKER

In this section, we first introduce the basic heap design of Packer and compare it to other heap designs. Then we present the data allocation scheme and garbage collection algorithm in Packer. Finally, we discuss further implications of the Packer design.

3.1 The Basic Design of Packer

As shown in Figure 1, with the Move-Compact algorithm (GC-MC), when the heap is partitioned into multiple spaces, for instance LOS and non-LOS, garbage collection is triggered when either space is full. When garbage collection is triggered by one space while the other space is only partially filled, the heap is not fully utilized. Consequently, this leads to more frequent collections and lower performance.

Figure 1: Compacting GC (GC-MC) with separate allocation spaces

The key question here is why one space gets full before the other one does. This happens because, within the same amount of time, a higher fraction of one space's free region is allocated than that of the other space, yet the free regions of different spaces cannot easily be shared. Hence, if the free regions of the heap are centrally controlled and can be shared by both spaces, the problem of low space utilization can be solved.
However, this is not possible when the heap is partitioned into two spaces with a boundary in between, because it would require the GC algorithm for one space to virtually manage the regions of the other space when its own space is full. For instance, as shown in the lower part of Figure 1, if LOS is full, we cannot allocate large objects in the free region of non-LOS; otherwise, we destroy the advantage brought by separate spaces, namely, efficient memory management.

Figure 2: Mark-Sweep GC (GC-MS)

On the other hand, the mark-sweep algorithm (GC-MS) does not divide the heap into separate spaces, so normal and large objects share the same allocation space. When garbage collection is triggered, it scans the heap and marks all live objects. Then it sweeps all the unmarked objects, leaving holes in the heap. These holes are added to a linked list for future allocations. Although this algorithm is efficient and does not require object movement, it introduces several serious problems. As shown in the lower part of Figure 2, when fragmentation is severe, the collector may be unable to find a hole on the linked list that fits a newly allocated object, such as the one at the bottom of Figure 2, even though the heap has about 50% free space. This creates a "deadlock" situation in which compaction has to be initiated to alleviate the problem. In addition, GC-MS's allocation scheme breaks spatial locality: when it allocates several objects that are meant to be accessed together, it may have to scatter them all over the heap. Furthermore, each object allocation has to traverse the free-region linked list to find a suitable free region, which is inefficient, especially when the list is long.

Figure 3: the design of the heap structure in Packer

Packer solves these problems by managing multiple virtual spaces in one physical space, such that the virtual spaces can share the free regions. As shown in Figure 3, to coordinate data management in Packer, we utilize three data structures: a virtual non-LOS list, a virtual LOS list, and a Free Area Pool. The virtual non-LOS list points to the first normal object block in the heap; this block contains a pointer to the next normal object block, and so on. Hence, it is easy to find all normal blocks through the virtual non-LOS list, and together they form the virtual non-large object space. Similarly, the virtual LOS list points to the first large object, and this large object contains a pointer to the next large object, and so on. The virtual LOS list and the blocks of large objects form the virtual large object space. The Free Area Pool manages all free blocks in the heap; it is a table of linked lists indexed by the number of blocks, where each linked list manages all free regions with a certain number of contiguous blocks. For instance, block 2 and block 7 in Figure 3 are both free regions with only one block; hence slot 1 of the Free Area Pool contains a pointer to block 2, and block 2 contains a pointer to block 7. Also, blocks 10, 11, and 12 form a free region of 3 free blocks, so slot 3 of the Free Area Pool contains a pointer to block 10. All free regions that contain more than 32 free blocks are organized in slot >32. In this way, all free blocks in the heap are centrally managed. With this design, the virtual spaces can grow on demand, and garbage collection only happens when the heap is fully utilized.
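To make the heap metadata concrete, the following C sketch shows one possible layout of the block header and the Free Area Pool described above. It is a minimal illustration assuming a table of slots 1 through 32 plus one >32 slot; the type and field names are ours, not Apache Harmony's actual definitions.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_EXACT  32                /* slots 1..32 index regions by exact length */
#define SLOT_COUNT (MAX_EXACT + 2)   /* last slot holds all regions > 32 blocks   */

/* Per-block metadata kept in the block header. */
typedef struct Block {
    void         *base;        /* base address of the allocatable area       */
    void         *ceiling;     /* end address of the block                   */
    uint32_t      state;       /* e.g. FREE, NORMAL, LARGE                   */
    uint32_t      num_blocks;  /* length of this free region, in blocks      */
    struct Block *next;        /* next region in the same pool slot, or the
                                  next block of the same virtual space       */
} Block;

/* The Free Area Pool: a table of linked lists of free regions,
   indexed by region length in blocks. */
typedef struct {
    Block *slot[SLOT_COUNT];   /* slot[0] is unused                          */
} FreeAreaPool;

/* Map a free region of n contiguous blocks to its slot index. */
static size_t pool_index(uint32_t n) {
    return n <= MAX_EXACT ? n : MAX_EXACT + 1;
}

/* Insert a free region into its slot. Shown unsynchronized for clarity;
   concurrent mutators would claim list heads with atomic operations. */
static void pool_insert(FreeAreaPool *pool, Block *region) {
    size_t i = pool_index(region->num_blocks);
    region->next = pool->slot[i];
    pool->slot[i] = region;
}
```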
It is relatively easy to come up with a design that manages the heap with a block-number-indexed table; the difficulty and subtlety of Packer actually lie in its allocation and collection algorithms. A first glimpse of the Packer heap structure might give the impression that Packer is similar to a mark-sweep GC that employs size-segregated lists as its main data structure. But unlike a mark-sweep GC, Packer achieves the benefits of fast bump-pointer allocation and parallel compacting collection for both large and normal objects, which are not possible in a mark-sweep GC. In the following subsections, we describe the object allocation and garbage collection mechanisms in Packer.

3.2 Object Allocation in Packer

When multiple threads are running in an application program, they share the heap resources, so accesses to the Free Area Pool for object allocation must be synchronized. However, if one atomic operation were required for each object allocation, the overhead would be severe. In most application programs, the majority of objects are normal objects that are much smaller than the block size, so an efficient allocation scheme for normal objects is essential. The mutator thread is responsible for object allocation. To reduce the synchronization overhead, each mutator in Packer fetches a thread-local block from the Free Area Pool through an atomic operation, and then allocates normal objects in this thread-local block with bump-pointer allocation. Note that when objects are allocated locally by the mutator, no synchronization is necessary. When there is no more space in the thread-local block, the mutator atomically fetches another thread-local block from the Free Area Pool and resumes allocation. In this way, only one atomic operation is required per block instead of per object.

To guarantee fast normal object allocation, Packer only allocates thread-local blocks from slot 1 or slot >32 of the Free Area Pool. It first checks whether slot 1 is null; if not, it allocates from slot 1, otherwise it allocates from the last slot, slot >32. In this way, only one atomic operation is required per thread-local block. To grab a region from any other slot, Packer would need to pick off the region, allocate a block, and put the remainder back into the corresponding slot, which requires two atomic operations. When picking a thread-local block from slot 1, one atomic operation is enough because there is never a remainder to put back. For thread-local block allocation from slot >32, instead of picking off a region, Packer simply decrements the block count of a region in the last slot. This decrement is atomic, so it guarantees thread-safe block allocation with only one atomic operation.

Unlike normal objects, each large object occupies one or more blocks. Thus, the mutators allocate large objects directly from the Free Area Pool. On an allocation request, a mutator first determines the number of blocks requested, n. It then searches the Free Area Pool using n as the index. If slot n is null, it searches slot n+1 and so forth until a non-null slot is found. Next, it fetches the first free region from this non-null slot, grabs n blocks, and stores the remainder back into the Free Area Pool. As illustrated in Figure 4, a request needs one free block but slot 1 of the Free Area Pool is null. The search moves down and fetches a free region from slot 2, which contains free blocks 1 and 2. Packer allocates block 1 and stores block 2 back into the Free Area Pool, inserting it into slot 1, since only one free block, block 2, is left in this region. With this design, Packer achieves fast object allocation for both large and normal objects, much faster than a mark-sweep GC. Next we describe how Packer achieves fast garbage collection.

Figure 4: block allocation from the Free Area Pool
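Building on the Block and FreeAreaPool types from the sketch in Section 3.1, the following hedged example illustrates the two allocation paths just described: the unsynchronized bump-pointer fast path inside a thread-local block, and the large-object search that walks the Free Area Pool upward from slot n. The helper names and the simplified, non-atomic pool access are assumptions for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE (32 * 1024)   /* 32 KB blocks, as in the evaluation      */

/* Per-mutator allocation state for the current thread-local block. */
typedef struct {
    char *free;                  /* bump pointer                            */
    char *ceiling;               /* end of the thread-local block           */
} Mutator;

/* Fast path: bump-pointer allocation; no synchronization is needed
   because the block is owned by exactly one mutator. */
static void *alloc_normal(Mutator *m, size_t size) {
    if (m->free + size > m->ceiling)
        return NULL;             /* slow path: atomically fetch a fresh
                                    block from slot 1 or slot >32           */
    void *obj = m->free;
    m->free += size;
    return obj;
}

/* Large-object path: search slots n, n+1, ... until a non-null slot is
   found; take n blocks and return the remainder to the pool. A real
   implementation pops the list head with an atomic operation. */
static Block *alloc_large(FreeAreaPool *pool, uint32_t n) {
    for (size_t i = pool_index(n); i < SLOT_COUNT; i++) {
        Block *region = pool->slot[i];
        if (region == NULL)
            continue;
        pool->slot[i] = region->next;
        if (region->num_blocks > n) {          /* split off the remainder   */
            Block *rest = (Block *)((char *)region + n * BLOCK_SIZE);
            rest->num_blocks = region->num_blocks - n;
            pool_insert(pool, rest);
            region->num_blocks = n;
        }
        return region;
    }
    return NULL;                 /* no free region left: trigger collection */
}
```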
3.3 Garbage Collection in Packer

When the heap is fully occupied by normal and large objects, garbage collection is triggered. Packer utilizes compaction algorithms, and its garbage collection is divided into four phases. In the first phase, it scans the heap and marks all live objects; it then builds the virtual spaces by adding all normal blocks to the virtual non-LOS linked list and all large-object blocks to the virtual LOS linked list. This phase corresponds to lines 1 and 2 of the Packer_Compact_Collection() pseudo-code shown in Figure 5. In phase 2, normal blocks are compacted towards the left end of the heap and a forwarding table is set up in each block. These forwarding tables store the offsets between the source and target addresses of objects, and they are used by the reference-fixing operation of the next phase. This phase corresponds to lines 3, 4, and 5 of the pseudo-code. In phase 3, Packer fixes all references from both normal and large objects using the forwarding tables set up in the previous phase. If a reference points to an object that has already been compacted, Packer checks the forwarding table in that object's block to look up the address offset, then subtracts this offset from the original address stored in the reference to obtain the object's new address. This phase corresponds to line 6 of the pseudo-code. In the last phase, large-object blocks are compacted and the freed blocks are added to the Free Area Pool. This phase corresponds to lines 7, 8, and 9 of the pseudo-code.

Figure 5: Garbage Collection algorithm in Packer
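A sketch of the overall collection flow, consistent with the line numbering referenced above (lines 1-2: marking and virtual space construction; lines 3-5: normal object compaction and forwarding tables; line 6: reference fixing; lines 7-9: large object compaction and pool refill). The function names are illustrative assumptions, not Harmony's actual API.

```c
typedef struct Heap  Heap;
typedef struct Block Block;

/* Phase helpers assumed to be provided by the collector. */
void   mark_live_objects(Heap *h);                /* line 1 */
void   build_virtual_spaces(Heap *h);             /* line 2 */
Block *first_normal_block(Heap *h);               /* virtual non-LOS list  */
Block *next_normal_block(Block *b);
void   compact_block_leftward(Heap *h, Block *b); /* lines 3-4 */
void   record_forwarding_offsets(Block *b);       /* line 5 */
void   fix_references(Heap *h);                   /* line 6: per slot,
                                                     new = old - offset    */
void   compact_large_objects(Heap *h);            /* lines 7-8: or, at
                                                     option, mark-sweep    */
void   return_free_blocks(Heap *h);               /* line 9 */

/* One stop-the-world collection cycle. Steps 1, 3, 6, and 8 are executed
   by parallel collector threads in the real implementation. */
void Packer_Compact_Collection(Heap *h) {
    mark_live_objects(h);
    build_virtual_spaces(h);
    for (Block *b = first_normal_block(h); b != NULL;
         b = next_normal_block(b)) {
        compact_block_leftward(h, b);
        record_forwarding_offsets(b);
    }
    fix_references(h);
    compact_large_objects(h);
    return_free_blocks(h);
}
```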

Packer can optionally choose not to compact large objects, in which case large objects are mark-swept. With this support, Packer incorporates the advantages of both GC-MS and GC-MC. Consider the two extreme cases: 1) if the application is large-object-intensive, Packer can choose to mark-sweep large objects, as GC-MS does, thus avoiding the object-moving overhead; 2) if there are few large objects in the application, Packer behaves the same as GC-MC, creating a large contiguous free region while keeping the object-moving overhead low. Note that the major steps, marking, normal object moving, reference fixing, and large object moving, which correspond to steps 1, 3, 6, and 8 of the pseudo-code, are fully parallelized. The detailed parallelization mechanism is explained in the next section.

Since all the live objects are identified during the first marking phase, the compaction algorithm can pack all the live objects to one end of the heap without leaving any holes unfilled, so there is no fragmentation issue. Also note that Packer compacts the normal objects before the large objects: it squeezes out a large contiguous free space with the normal object compaction and then uses that free space for large object compaction.

3.4 Further Implications of Packer

Besides its advantages in object allocation and garbage collection, the Packer design has three further implications: it facilitates pinned object management, the management of managed and native data in the same heap, and the management of discrete physical areas. Pinned object support is required in some runtime systems that use conservative GC. When a garbage collector scans the heap for live objects, it may trace to a location whose content (pointer or value) is unknown. In this situation, a conservative garbage collector must assume that the location stores the address of an object. However, the collector cannot update this reference slot, because it may actually store a value rather than an address; the referenced object therefore becomes a pinned object, because it cannot be moved. In conventional moving GC designs, the algorithms have to sacrifice performance when there is a pinned object in the heap. By introducing the concept of virtual spaces, Packer is able to skip over blocks containing pinned objects when building the virtual spaces, without any performance compromise.

Pinned objects are also desirable when there are many interactions between managed and unmanaged code in the application. In most virtual machines, including JVM and CLR, the managed and unmanaged environments often need to communicate with each other and pass data around. The most common approach is to copy data from the managed environment to the native environment and vice versa [17]. This is highly inefficient, because the large amount of data copying incurs very high time and space overheads. The problem can be solved by either temporarily disabling garbage collection or pinning the data passed across the boundary; with Packer's support for pinned objects, it is easily resolved.

Also, in some cases, a process's address space is segmented by the operating system. For example, the system may load DLLs at arbitrary address ranges, breaking the heap into multiple chunks. Packer is able to link these discrete chunks into a virtual heap for the process, therefore providing as large a managed heap as possible to the application and making heap management efficient.

4. PARALLELIZATION OF GARBAGE COLLECTION IN PACKER

In this section we first demonstrate how we reduce the compaction parallelization problem to a tree traversal parallelization problem. Then we present the implementation of parallel normal and large object compaction in Packer.

4.1 Parallelization of Compacting GC

Compacting GCs move live objects towards one end of the heap so as to eliminate fragmentation. To increase GC efficiency, parallel compaction algorithms are essential in modern GC designs. The fundamental goal of a parallel compaction algorithm is to exploit as much parallelism as possible while keeping the synchronization overhead as low as possible.
Figure 6: normal and large object compaction

As shown in Figure 6, there are many normal objects in virtual non-LOS, and the data dependencies between these normal objects are fairly weak, implying a high degree of parallelism. However, parallelizing the compaction process in a straightforward manner requires an atomic operation, which is notorious for its inefficiency, for each object movement; the cost of parallelization may well surpass the performance gain. On the other hand, there are strong data dependencies in virtual LOS: a source object cannot be moved to its target location until the object originally in the target location has been moved out. In particular, when there are only a few very large objects, the available parallelism is seemingly inadequate.

This observation indicates that we need to set a proper parallelization granularity in the heap, one that reduces both the high synchronization overheads caused by fine-grained data movement (as in virtual non-LOS) and the false data dependencies caused by coarse-grained data movement (as in virtual LOS). Our design divides the heap into equal-sized blocks, so the parallelization granularity is a block. For virtual non-LOS, each block contains multiple objects. During collection, each thread obtains a block and moves all the objects in the block. Thus, at most one atomic operation is required for the movement of multiple objects in a block, greatly reducing the synchronization overhead. For virtual LOS, on the other hand, each object spans one or more blocks. When one block of a large object cannot be moved due to a data dependency, the other blocks can still be moved, reducing the false dependency problem. For instance, in Figure 7, blocks 7 and 8 belong to one object, and blocks 11 and 12 belong to another. Without blocking, all of a large object would have to be moved together, so blocks 7 and 8 could not be moved until block 5 had been moved out. With equal-sized blocks, a dependency exists only between blocks 7 and 5; the dependency between blocks 8 and 5 is a false data dependency, and block 8 can be moved independently of block 7. Thus the movements of blocks 7 and 8 can be parallelized.
Figure 7: Packer with equal-sized blocks

By dividing the heap into equal-sized blocks, both the false data dependencies and the synchronization costs can be reduced. To capture all data dependencies and facilitate the parallelization of compaction, dependency lists are generated for virtual LOS and a dependency tree is generated for virtual non-LOS.

Further complications exist in parallelizing the compaction process. For virtual non-LOS, races between multiple collectors arise when they move objects from a source block to a target block. For instance, two collectors may move data from two source blocks into the same target block, or one collector may write into a target block from which the original objects have not yet been moved away. This observation reveals two properties. First, each block has two roles: it is a source block when its objects are compacted into some other block, and it can be a target block after its original data has been moved away. Second, in virtual non-LOS, multiple source blocks may compact into one target block, so access to this target block must be synchronized. To achieve high performance, the complex relations between the blocks need to be clarified before the compacting threads start. To this end, we generate dependency trees, such as the one in Figure 7, which capture all the data dependencies between the blocks. For instance, in virtual LOS, block 5 is the source block for block 4 and also the target block for block 7; thus, block 7 cannot be moved to block 5 until block 5 has been moved to block 4. In virtual non-LOS, block 1 is the target block for block 3, and block 3 is in turn the target block for blocks 6, 9, and 10; thus, blocks 6, 9, and 10 cannot be moved to block 3 until block 3 has been moved to block 1.

When compaction starts, the threads traverse the tree to obtain a source block and a target block. After the current data movement is done, a thread moves down the tree to obtain a new source block and sets the old source block to be the new target block. This process finishes when the thread reaches the leaf nodes of the tree. We have thus reduced the compaction parallelization problem to a tree traversal parallelization problem. For virtual LOS compaction, the situation is simpler because one source block has only one target block, and vice versa; therefore, the dependency trees degenerate into dependency lists.
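As a concrete illustration of this traversal, the sketch below models a dependency-tree node and a recursive compaction walk over it. One atomic claim per block replaces per-object synchronization; in a real collector the recursive calls over independent subtrees would be handed to different threads. All names here are illustrative assumptions, not the paper's actual data structures.

```c
#include <stdatomic.h>
#include <stddef.h>

typedef struct Block Block;

/* A dependency-tree node: once the data of `src` has been moved into its
   target, `src` is empty and becomes the target for all children. */
typedef struct DepNode {
    Block           *src;          /* block whose live data moves next      */
    struct DepNode **children;     /* nodes waiting for `src` to be freed   */
    size_t           num_children;
    atomic_flag      claimed;      /* one atomic operation per block        */
} DepNode;

void move_block(Block *src, Block *dst);   /* assumed helper                */

/* Walk the tree: claim the node, move its source block into the target,
   then descend; the freed source block is the children's new target.
   Independent subtrees can be compacted by different threads in parallel. */
static void compact_subtree(DepNode *node, Block *target) {
    if (atomic_flag_test_and_set(&node->claimed))
        return;                    /* another collector already owns it     */
    move_block(node->src, target);
    for (size_t i = 0; i < node->num_children; i++)
        compact_subtree(node->children[i], node->src);
}
```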

4.2 Implementation of Parallel Normal Object Compaction

Packer utilizes the Move-Compact algorithm of the Apache Harmony JVM for normal object compaction [7]. This algorithm involves three phases:

Phase 1: Live object marking. It traces the heap from the root set and marks all the live objects.
Phase 2: Object moving. It copies the live objects to their new locations.
Phase 3: Reference fixing. It adjusts all the reference values in the live objects to point to the referenced objects' new locations.

Although all three phases are parallel, we focus here on the parallelization of the moving phase, which is most related to our proposed design. In this phase, a collector first atomically grabs a source block in heap address order. Then it grabs a target block that has a lower address than the source block. Each block is divided into multiple sectors, each of which encapsulates a number of live objects. For each sector of live objects in the source block, the collector computes its target address in the target block, moves the sector to its target position, and stores the address offset in the forwarding table in the block header. When the target block does not have enough space, the collector grabs the next target block. When the source block has no more live objects, the collector grabs another source block in heap address order, until all the blocks have been visited. In this phase, two atomic operations are needed per block to eliminate data races: one for taking ownership of the source block, and the other for taking ownership of the target block. If multiple collectors try to grab the same target block, a synchronization mechanism is necessary to coordinate their operations. Note that this process can be seen as a parallel tree traversal. When a collector grabs a source block and a target block in heap address order, it is actually traversing from the top of the tree. When it finishes moving the data of the current source block, the source block is released and can be used as a target block in the next iteration; the collector is thus traversing down the dependency tree until all blocks have been compacted. Also note that in this three-phase algorithm, target address calculation and object movement are done in the same phase, so the dependency tree is generated dynamically rather than pre-generated.
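A sketch of one collector thread's moving-phase loop under this scheme. The two grab helpers stand for the two per-block atomic claims described above; all names are illustrative, not Harmony's actual functions.

```c
#include <stddef.h>

typedef struct Block  Block;
typedef struct Sector Sector;    /* a run of contiguous live objects        */

/* Assumed helpers: each grab claims ownership with one atomic operation
   and returns NULL when no suitable block remains. */
Block  *grab_next_source(void);               /* in heap address order      */
Block  *grab_next_target(Block *below);       /* lower address than source  */
Sector *first_live_sector(Block *src);
Sector *next_live_sector(Block *src, Sector *s);
size_t  sector_size(Sector *s);
size_t  target_room(Block *dst);
void    move_sector(Sector *s, Block *dst);   /* copy the sector and record
                                                 its offset in dst's
                                                 forwarding table           */

void moving_phase_worker(void) {
    Block *dst = NULL;
    Block *src;
    while ((src = grab_next_source()) != NULL) {          /* atomic op 1 */
        for (Sector *s = first_live_sector(src); s != NULL;
             s = next_live_sector(src, s)) {
            if (dst == NULL || target_room(dst) < sector_size(s))
                dst = grab_next_target(src);              /* atomic op 2 */
            move_sector(s, dst);
        }
        /* src is now empty; a later iteration (of this or another
           collector) may claim it as a target block. */
    }
}
```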
4.3 Implementation of Parallel Large Object Compaction

To demonstrate the effect of the parallel virtual LOS compaction algorithm, we implemented the parallel compaction algorithm presented above in the Apache Harmony GC; the pseudo-code is shown in Figure 8. Before collection starts, a number of disjoint dependency lists are generated to capture the dependence relationships among the large-object blocks. Each collector can then atomically grab a dependency list and work on it independently. In this case, only one atomic operation is required per dependency list instead of per block. In essence, a thread first acquires ownership of a dependency list through an atomic operation. From the list, it takes the first block, which is the target block, and the second block, which is the source block, and moves the source to the target. When it finishes this block movement, the source block becomes the new target block, and a new source block is obtained by taking the next block in the dependency list. This repeats until no block is left in the dependency list; the thread then obtains another dependency list from the task pool.

Figure 8: parallel large object compaction
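A sketch of the per-thread loop over pre-generated dependency lists that Figure 8 expresses as pseudo-code. The task-pool helper and list layout are illustrative assumptions.

```c
#include <stddef.h>

typedef struct Block Block;

/* A dependency list for virtual LOS: block[0] is the initial target;
   each subsequent entry is a source that becomes the next target. */
typedef struct {
    Block **block;
    size_t  length;
} DepList;

DepList *grab_dep_list(void);                /* one atomic pop per list     */
void     move_block(Block *src, Block *dst); /* assumed helper              */

void large_object_worker(void) {
    DepList *list;
    while ((list = grab_dep_list()) != NULL) {
        Block *target = list->block[0];
        for (size_t i = 1; i < list->length; i++) {
            Block *source = list->block[i];
            move_block(source, target);      /* safe: target is free        */
            target = source;                 /* freed source = new target   */
        }
    }
}
```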
5. EXPERIMENTS AND RESULTS

In this section, we present the experimental results for the Packer algorithm. All proposed algorithms have been implemented in Apache Harmony, a product-quality open source Java SE implementation [10]. The heap is divided into equal-sized blocks, and each block contains a block header for its metadata, including the block base address, block ceiling address, block state, etc. The block size is adjustable, but the block header size is a constant, independent of the block size. For this study, the block size is set to 32 KB and the size threshold for large objects is set to 16 KB. The evaluation of Packer is done with the SPECjbb2005 [11] and Dacapo [12] benchmark suites. SPECjbb2005 is a large server benchmark that employs several program threads; it is representative of commercial server-side applications. Dacapo, on the other hand, is a suite of client-side Java applications. For all experiments, we use a 256 MB heap by default.

In these experiments, we compare three GC designs: GC-MC, GC-MS, and Packer. GC-MC is the default GC algorithm in Apache Harmony; it utilizes the Move-Compact algorithm for garbage collection and divides the heap into separate spaces, Large Object Space (LOS) and non-LOS, to manage large and normal objects. However, this algorithm cannot parallelize the compaction of large objects. For GC-MC, with a heap size of 256M, we experimented with four configurations: GC-MC with 50M LOS (GC-MC 50M), with 100M LOS (GC-MC 100M), with 150M LOS (GC-MC 150M), and with 200M LOS (GC-MC 200M). GC-MS uses mark-sweep for garbage collection of the whole heap. Packer manages virtual LOS and virtual non-LOS in the same heap, and parallelizes both normal and large object compaction.

5.1 Comparison of Space Utilization

In real applications, the object size distribution varies from one application to another and from one execution phase to the next even within one application. For instance, specjbb2005 is a non-large-object-intensive benchmark that allocates a very small number of large objects, so it requires a large non-LOS. On the other hand, xalan, jython, and bloat from the Dacapo benchmark suite are large-object-intensive and thus require a large LOS. In addition, specjbb2005 allocates all its large objects at the beginning of its execution and very few large objects afterwards; thus it requires different LOS sizes in different phases of its execution.

Figure 9: Space Utilization of GC-MC, GC-MS, and Packer. The Y-axis shows the fraction of utilized heap space when garbage collection happens, averaged over all garbage collections throughout execution. The last set of results, avg, compares the average heap utilization of all selected benchmarks under the different GC designs.

Figure 9 shows the space utilization of the different designs. Packer guarantees that the heap space is fully utilized, because collection is triggered only when there is no free region in the Free Area Pool. The average space utilization of GC-MS is 81%; for lusearch, the space utilization is only 49%, which is caused by heavy fragmentation. The average space utilization ratios are 78%, 69%, 49%, and 26% for GC-MC 50M, GC-MC 100M, GC-MC 150M, and GC-MC 200M, respectively. In most application programs the majority of objects are normal objects, so when LOS gets too big, there is insufficient space for normal object allocation, causing frequent garbage collections and low space utilization. Nonetheless, for xalan, GC-MC space utilization is maximized when the LOS size is 100M; this is because xalan is a large-object-intensive application that holds a large number of large objects when garbage collection happens. In general, space utilization is worse when the heap is statically partitioned into multiple spaces: a static partition fails to meet the needs of both large object and non-large object space utilization, precisely because object size distribution is a dynamic behavior. On the other hand, although GC-MS does not suffer from this problem, it creates a heavy fragmentation problem, often leading to low space utilization. By managing multiple virtual spaces in one physical space, Packer overcomes all these problems.

Table 1: GC pause time comparison of Packer, GC-MS, and GC-MC. Each cell gives the number of collections / total GC pause time; "F" denotes a run that failed to complete.

Benchmark | Packer   | GC-MS    | GC-MC 50M | GC-MC 100M | GC-MC 150M | GC-MC 200M
-         | 199/1863 | 206/1602 | 291/2427  | 225/2048   | 338/2670   | 845/5474
hsqldb    | 32/1410  | F        | 44/2085   | 125/5975   | F          | F
jython    | 190/1109 | 266/436  | 232/1259  | 303/1396   | 455/1686   | 918/2573
lusearch  | 84/3389  | 175/3465 | 100/3707  | 134/4323   | 203/5553   | 416/9331
specjbb   | 39/1204  | 41/1160  | 119/3927  | F          | F          | F

Table 1 shows the single-thread performance of Packer, GC-MS, and GC-MC, including the number of garbage collections triggered throughout execution (left value) and the total GC pause time (right value). The first observation is that Packer always triggers fewer garbage collections than the other designs, because Packer guarantees that the heap is fully utilized. The second observation is that some applications fail to finish execution, denoted by "F" in the table. For specjbb and hsqldb, the GC-MC configurations with large LOS sizes fail to complete because they do not leave sufficient space for normal object allocation. In addition, for hsqldb, GC-MS fails to complete because of heavy fragmentation: it cannot find a suitable free region in the heap for a newly allocated object. The third observation is that GC-MS usually has a lower pause time than both Packer and GC-MC. One extreme case is jython, for which GC-MS's pause time is only 1/3 of that of Packer. This is because mark-sweep does not involve object movement, which can incur a high performance overhead.

5.2 Scalability of Packer

To demonstrate the effect of Packer's parallel compaction algorithms, we compare the scalability of Packer and GC-MS with 1, 2, 3, and 4 threads; the results are shown in Figures 10 and 11. The Y-axis of these figures represents the normalized total GC pause time. Figure 10 shows Packer's scalability. In general, Packer demonstrates very good scalability: on average, the speedups of Packer are 1.92x, 2.64x, and 2.67x with 2, 3, and 4 collectors, respectively.

Figure 10: Packer Scalability

Figure 11 shows GC-MS's scalability. Compared to Packer, GC-MS's scalability is lower: on average, the speedups of GC-MS are 1.5x, 1.72x, and 1.64x with 2, 3, and 4 collectors, respectively. Note that the average speedup for the 4-thread case is actually lower than that of the 3-thread case. This is because for some benchmarks, such as lusearch and luindex, the 4-thread case introduces a long pause time. This is particularly true for lusearch, where the pause time for the 4-thread case is much higher than in the sequential case due to heavy fragmentation. When fragmentation is serious, garbage collections become much more frequent and the elapsed time between two garbage collections is very short; hence, only a small number of objects are allocated and collected in each allocation-collection period. In this situation, synchronization overhead becomes the major component of the GC pause time, negatively impacting GC performance. For applications with a low degree of fragmentation, such as xalan and specjbb, the speedups are comparable to those of Packer.

Figure 11: Mark-Sweep Scalability

Table 1 indicates that in the sequential case, mark-sweep is more efficient than compaction because it does not involve the movement of objects. Nevertheless, as the number of threads increases, Packer gradually takes the performance advantage over GC-MS due to its better scalability. As an illustration, in Figure 12 we compare the performance of parallel GC-MS and Packer on jython. It clearly shows that although GC-MS's GC pause time is only 1/3 of that of Packer in the sequential case, the two numbers converge as the number of threads increases.

Figure 12: Comparison of Parallel Packer and GC-MS

5.3 Impacts on Overall Performance

This section presents how Packer impacts the performance of overall program execution. To collect this data, we run the respective benchmarks on an Intel 8-core Tulsa platform and compare the performance of Packer, GC-MC, and GC-MS. For GC-MC, we manually optimized the LOS size to maximize space and time efficiency for each application. Figure 13 shows the results on the SPECjbb2005 benchmark. The X-axis shows the number of warehouses used in execution and the Y-axis shows the normalized SPECjbb score; a higher score represents higher performance. Packer's performance is almost always 1.2% higher than that of GC-MS. Although this seems a very small gain, considering that garbage collection takes only about 10% of the total execution time, it is actually a significant improvement in GC performance. Packer's performance is also higher than that of GC-MC, but the advantage is small, because both GC-MC and Packer utilize the same algorithm for normal object compaction and SPECjbb2005 is not a large-object-intensive benchmark.

Figure 13: Impacts on Specjbb2005 Overall Performance

Figure 14 presents the results with the Dacapo benchmark suite. Compared to specjbb, jython and bloat are large-object-intensive. Packer's performance is 3% higher than that of GC-MS and 8% higher than that of GC-MC. Note that in GC-MC, large object compaction is not parallelized; thus in the sequential case, the mark-sweep algorithm performs better than compaction for large object garbage collection. Nevertheless, with the parallel large object compaction algorithm proposed in this paper, compaction can be more efficient than mark-sweep.

Figure 14: Impacts on Dacapo Overall Performance

6. CONCLUSION

Space and time efficiency are the two most important design goals in garbage collector design; however, many garbage collection algorithms trade space utilization for performance and vice versa. In this paper, we proposed Packer, a novel garbage collection algorithm that manages multiple virtual spaces in one physical space, thereby guaranteeing that the space is fully utilized while avoiding fragmentation problems. To improve performance, we first reduced the heap compaction parallelization problem to a parallel tree traversal problem, and then designed solutions to eliminate false data dependencies and to reduce the synchronization overhead, thereby maximizing the exploitable parallelism for both normal and large object compaction. It is noteworthy that Packer is generic enough to be used in any situation that involves the management and coordination of multiple virtual spaces in one physical space, and vice versa.

The experimental results show that Packer has much better space utilization than GC-MC and GC-MS, and that the parallel compaction algorithms in Packer demonstrate great scalability. Although GC-MS has a lower GC pause time than Packer in the sequential case, as the number of threads increases, Packer gradually takes the performance advantage over GC-MS due to its better scalability. In addition, we evaluated Packer's impact on overall performance. Although GC takes only about 10% of the total execution time in the application programs, Packer achieves 1.2% and 3% performance gains over GC-MS on the SPECjbb2005 and Dacapo benchmark suites, respectively. Hence, our results demonstrate that Packer is highly space-and-time efficient.

Our ongoing work is to apply Packer to more GC designs. Specifically, we intend to implement a generational Packer, which consists of a physical Nursery Object Space (NOS), a virtual Large Object Space (LOS), and a virtual Mature Object Space (MOS). In a minor collection, live normal objects are copied from NOS to virtual MOS, and virtual LOS can be marked and swept; in a major collection, the full heap is compacted. As a next step, we would attempt to manage virtual NOS, virtual MOS, and virtual LOS in one physical space, thereby achieving a generational GC with fully virtualized space management.

REFERENCES

1. P.J. Caudill and A. Wirfs-Brock. A Third Generation Smalltalk-80 Implementation. In Proceedings of the Conference on Object-Oriented Programming Systems, Languages and Applications (OOPSLA), Portland, Oregon, USA, 1986.
2. M. Hicks, L. Hornof, J.T. Moore, and S.M. Nettles. A Study of Large Object Spaces. In Proceedings of ISMM 1998.
3. S. Soman, C. Krintz, and D.F. Bacon. Dynamic Selection of Application-Specific Garbage Collectors. In Proceedings of ISMM 2004.
4. D. Barrett and B.G. Zorn. Garbage Collection Using a Dynamic Threatening Boundary. In Proceedings of PLDI 1995.
5. R.E. Jones. Garbage Collection: Algorithms for Automatic Dynamic Memory Management. Wiley, Chichester, July 1996. With a chapter on Distributed Garbage Collection by R. Lins.
6. C. Flood, D. Detlefs, N. Shavit, and C. Zhang. Parallel Garbage Collection for Shared Memory Multiprocessors. In the USENIX JVM Symposium, 2001.
7. D. Abuaiadh, Y. Ossia, E. Petrank, and U. Silbershtein. An Efficient Parallel Heap Compaction Algorithm. In the ACM Conference on Object-Oriented Programming, Systems, Languages and Applications (OOPSLA), 2004.
8. H. Kermany and E. Petrank. The Compressor: Concurrent, Incremental and Parallel Compaction. In PLDI 2006.
9. M. Wegiel and C. Krintz. The Mapping Collector: Virtual Memory Support for Generational, Parallel, and Concurrent Compaction. In ASPLOS '08, Seattle, WA, March 2008.
10. Apache Harmony: Open-Source Java SE. http://harmony.apache.org/
11. SPEC: The Standard Performance Evaluation Corporation. http://www.spec.org/
12. DaCapo Project: The DaCapo Benchmark Suite. http://www-ali.cs.umass.edu/dacapo/index.html
13. Ming Wu and Xiao-Feng Li. Task-pushing: a Scalable Parallel GC Marking Algorithm without Synchronization Operations. In IEEE IPDPS 2007.
14. J. McCarthy. Recursive Functions of Symbolic Expressions and their Computation by Machine. Commun. ACM, 3(4):184-195, 1960.
15. T. Endo, K. Taura, and A. Yonezawa. A Scalable Mark-Sweep Garbage Collector on Large-Scale Shared-Memory Machines. In Proceedings of the ACM/IEEE Conference on Supercomputing, New York, USA, 1997.
16. T. Domani, E.K. Kolodner, E. Lewis, et al. Implementing an On-the-fly Garbage Collector for Java. ACM SIGPLAN Notices, 36(1):155-166, 2001.
17. The Mono Project. http://www.mono-project.com