Gang Scheduling Java Applications with Tessellation
Total Page:16
File Type:pdf, Size:1020Kb
Gang Scheduling Java Applications with Tessellation ∗ Benjamin Le Jefferson Lai Wenxuan Cai John Kubiatowicz {benjaminhoanle, jefflai2, wenxuancai}@berkeley.edu, [email protected] Department of Electrical Engineering and Computer Science University of California, Berkeley Berkeley, CA 94720-1776 ABSTRACT 1. INTRODUCTION Java Virtual Machines (JVM) are responsible for perform- The Java Virtual Machine (JVM) abstraction provides a ing a number of tasks in addition to program execution, consistent, contained execution environment for Java appli- including those related to garbage collection and adaptive cations and has played a significant role in Java's popularity optimization. Given the popularity of the Java platform and success in recent years. The JVM abstraction encapsu- and the rapid growth in use of cloud computing services, lates many tasks and responsibilities that make development cluster machines are expected to host a number of concur- faster and easier. For example, by handling the translation rently executing JVMs. Despite the virtual machine ab- of portable Java bytecode to machine code specific to the straction, interference can arise between JVMs and degrade lower level kernel and system architecture, the JVM allows performance. One approach to prevent such multi-tenancy developers to write an application once and run it in a num- issues is the utilization of resource-isolated containers. This ber of different environments. While there exist a number of paper profiles the performance of Java applications execut- JVM implementations, maintained both privately and pub- ing within the resource-isolated cells of Adaptive Resource- licly, in general, implementations adhere to a single JVM Centric Computing and the Tessellation OS. Within this specification. We refer the reader to [16] for a complete setup, we demonstrate how overall performance and JVM specification, but summarize the roles of two key, highly efficiency for the DaCapo benchmarks and Cassandra on researched components of the JVM: the garbage collector the Yahoo Cloud Serving Benchmark degrade. We compare (GC) and the adaptive optimization component of the \just the effect of gang scheduling the threads of a cell together in time" (JIT) compiler [20]. with using a simple global credit scheduler. We show that Garbage collection is the process of cleaning up unused from the limited results we observed, there is no convincing memory so that developers do not need to manage mem- evidence of gang scheduling yielding any kind of benefit, but ory themselves. While this makes development easier and also that a more complete evaluation may still be needed. can drastically reduce the number of memory related bugs, garbage collectors are very complex and consumes resources, Categories and Subject Descriptors including time. Many techniques for garbage collection have been developed, from simple reference counting, to paral- D.4.1 [Operating Systems]: Process Management { Schedul- lelized stop-and-copy, to generational garbage collection [17]. ing; D.4.8 [Operating Systems]: Performance { Measure- Each technique differs in how to locate \live" objects, when ments, Monitors to run, whether and when execution needs to be halted, what memory needs to be touched, and how objects in memory General Terms should be moved. The JIT compiler, on the other hand, is responsible for Multicore, parallel, quality of service, resource containers compiling segments of bytecode into native machine instruc- tions. Modern JIT compilers are extended with adaptive op- Keywords timization techniques [23], which attempt to optimize per- Adaptive resource management, performance isolation, qual- formance by dynamically compiling and caching these seg- ity of service ments of code during run time. The result is a drastic perfor- mance improvement over simply interpreting the bytecode. ∗The Parallel Computing Laboratory, UC Berkeley, CA, Exactly how much compilation occurs before versus after a USA program begins executing and how much of a given program is interpreted varies across JVMs and significantly affects the performance characteristics of the program. While more advanced optimization results in higher performing code, as with GC techniques, it generally also requires more resources Permission to make digital or hard copies of all or part of this work for to perform, incurring higher overhead and impact on the personal or classroom use is granted without fee provided that copies are program's performance. not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to Together, garbage collection and adaptive optimization republish, to post on servers or to redistribute to lists, requires prior specific require the JVM to perform many background tasks in ad- permission and/or a fee. dition to program execution. While this design works well Copyright 2014 ACM X-XXXXX-XX-X/XX/XX ...$15.00. when there is a small number of Java applications on a ma- make the, we believe valid, assumption that our discussion chine, it may not scale well when there are many Java appli- applies wholly to the OpenJDK implementation. cations running concurrently on a single machine. This type The HotSpot JVM is designed as a high-performance en- of situation can often be found on the machines in cloud com- vironment for Java applications with a focus on performance puting clusters. Given the ubiquity of Java based software optimization. On top of implementing the features required applications, as demand for cloud computing and hosting by the aforementioned Java Virtual Machine specification, services increase, understanding the types of multi-tenancy HotSpot provides a number of options and operating modes issues that may arise becomes critical to progress. For ex- for tuning for different use cases. This focus is embodied ample, a machine in the cloud could be asked to host multi- in HotSpot's implementation of the garbage collector and ple instances of Hadoop File System (HDFS) [22], Hadoop adaptive optimizer. We now summarize the main features MapReduce [4], and Spark [25], simultaneously, with each of these two components and describe how the synchroniza- instance having a dedicated JVM. As all of these long-running tion characteristics of each make gang scheduling an appeal- JVMs must ultimately be multiplexed onto a single set of ing option. hardware, interference may arise between the tasks of each JVM. For certain applications, such as those with strict tim- 2.1.1 HotSpot Garbage Collection ing requirements or quality of service (QoS) guarantees, this The garbage collector in HotSpot is a generational garbage interference may lead to unacceptable performance. collector [19] that partitions the heap into two generations: In addition to investigating the significance of the issues the young generation and the old generation. The idea be- named above, we consider a potential solution made possi- hind this is that most allocated objects have short lives and ble with the Tessellation operating system architecture and do not survive more than a small number of garbage col- the Adaptive Resource-Centric Computing (ARCC) system lections. Thus, efficiency is improved by primarily allocat- design paradigm [6, 7, 18]. In ARCC, applications execute ing from the smaller young generation and collecting this within stable, isolated resource containers called cells. In ad- frequently while holding long-living, tenured objects in the dition to implementing cells, the ARCC-based Tessellation larger old generation and collecting it less frequently. These kernel uses two-level scheduling, which decouples the allo- significantly differing characteristics mean that the young cation of resources to cells (the first level) from scheduling and old generations can, and should, be collected using dif- how these resources are used within cells (the second level). ferent collection algorithms in the interest of performance In this paper, we apply and evaluate gang scheduling [9] as and efficiency. a second level scheduling policy. In particular, we execute The young generation is separated into an eden space and a single application in each cell and run multiple cells on two survivor spaces. Most new objects are allocated from a single, multi-core machine, gang scheduling each cell's re- the eden space.1 When that becomes full a minor collection sources together. By measuring the performance of these occurs, which collects space from dead objects and moves applications with and without gang scheduling, we show the them between the survivor spaces. At the end of the collec- effect of gang scheduling at mitigating interference for Open- tion the only one survivor space contains active data. These JDK's HotSpot JVM. are the live objects that survived collection but have not The remainder of this paper is organized as follows. Sec- been promoted to the old generation.2 HotSpot allows this tion 2 provides background on the HotSpot JVM and the collection to be performed serially or in parallel. While the Tesselation architecture. Section 3 and section 4 describe serial version may be applicable for small scale, personal our experimental setup and results with two different Java uses, we focus on the the parallel version, which is more ap- benchmark suites. Section 5 describes our results when using propriate and the default for the server class machines we a gang scheduler. In section 6 we present a survey of related are interested in. In both cases, minor