Dynamically Loaded Classes As Shared Libraries: an Approach to Improving Virtual Machine Scalability
Bernard Wong
University of Waterloo
200 University Avenue W., Waterloo, Ontario N2L 3G1, Canada
[email protected]

Grzegorz Czajkowski, Laurent Daynès
Sun Microsystems Laboratories
2600 Casey Avenue, Mountain View, CA 94043, USA
{grzegorz.czajkowski, laurent.daynes}@sun.com

Abstract

Sharing selected data structures among virtual machines of a safe language can improve resource utilization of each participating run-time system. The challenge is to determine what to share and how to share it in order to decrease start-up time and lower memory footprint without compromising the robustness of the system. Furthermore, the engineering effort required to implement the system must not be prohibitive. This paper demonstrates an approach that addresses these concerns in the context of the Java™ virtual machine. Our system transforms packages into shared libraries containing classes in a format matching the internal representation used within the virtual machine. We maximize the number of elements in the read-only section to take advantage of cross-process text segment sharing. Non-shareable data are automatically replicated when written to, thanks to the operating system's streamlined support for copy-on-write. Relying on the existing shared libraries manipulation infrastructure significantly reduces the engineering effort.

1. Introduction

The Java™ programming language [1] has, in less than a decade, become very popular for use in the development of a variety of personal and enterprise-level applications. Its features, ease of use, and “write once run anywhere” methodology make the lives of programmers easier and can considerably reduce program development time. These characteristics are often overshadowed by the Java platform's performance, as some users feel that its start-up time is too slow, its memory footprint is too large, and that in general the underlying resources are not used optimally.

Different types of users are affected in varying degrees by these problems. For the majority of personal computer users, footprint is not necessarily a critical issue, as most machines sold today have a sufficient amount of memory to hold a few instances of the Java virtual machine (JVM™) [2]. However, on servers hosting many applications, it is common to have many instances of the JVM running simultaneously, often for lengthy periods of time. The aggregate memory requirement can therefore be large, and meeting it by adding more memory is not always an option. In these settings, overall memory footprint must be kept to a minimum. For personal computer users, the main irritation is long start-up time, as they expect good responsiveness from interactive applications; start-up time, in contrast, is of little importance in enterprise computing.

C/C++ programs typically do not suffer from excessive memory footprint or lengthy start-up time. To a large extent, this is due to the intensive use of shared libraries. Applications share a set of shared libraries that only needs to be loaded once into memory and is simultaneously mapped into the address spaces of several running processes. This, combined with demand paging, which loads only the actually used parts of the libraries, can significantly reduce physical memory usage. Furthermore, performance and start-up time are improved, as the required data is already loaded into memory and does not need to be retrieved from disk.

The JVM itself is typically compiled as a shared library. However, the actual programs that are dynamically loaded from disk or network are stored in the private memory space of each process. This can account for a significant amount of the total memory usage. Sharing certain data structures across virtual machines may lower the memory footprint, as well as improve the performance and decrease the start-up time of applications, if the format in which the data is shared requires less effort to transform into the run-time format than standard Java class files.

One approach to sharing is to execute multiple applications in the same instance of the JVM, modified to remove inter-application interference points [3], [4]. Virtually all data structures can be shared among computations, and the resource utilization improves dramatically when compared to running multiple unmodified JVMs. However, if any application causes the virtual machine to crash or malfunction, all of the other programs executing in the faulty JVM will be affected. A different approach to sharing is demonstrated by ShMVM (for “Shared Memory Virtual Machine”) [5], where certain meta-data is shared among virtual machines, each of which executes in a separate operating system (OS) process. The shared region can dynamically change, which poses various stability and robustness problems. In fact, a corruption of the shared region can lead to cascading failures of all virtual machines participating in the sharing. The design of ShMVM required complex engineering, in part due to the aggressive choices with respect to what to share.

This work is motivated by the systems mentioned above, and in particular by the lessons learned from ShMVM. We show a design for cross-JVM meta-data sharing which is simpler, relies on existing software and OS mechanisms, and matches the performance improvements of ShMVM. At the same time, there is no loss of reliability of the underlying JVM.

The main idea is to encode Java class files as ELF shared libraries, in a parsed, verified, and mostly resolved format almost identical to the internal representation used by the JVM. Each application executes in its own virtual machine in a separate OS process, mapping such shared libraries into its address space. Efficient use of resources is accomplished by maximizing the read-only parts of the libraries, which enables cross-process sharing of read-only physical memory pages. Run-time system robustness is preserved by relying on the copy-on-write mechanism to lazily replicate process-private data. Finally, relative ease of engineering is a result of using standard shared libraries manipulation tools.

Challenges of this approach include storing meta-data outside of the regular heap, modifying the garbage collector to operate correctly on the modified heap layout, avoiding internal fragmentation in the generated shared libraries, and maximizing the amount of pre-initialization and read-only data structures in the class-encoding shared libraries. The prototype, SLVM (“Shared Library Virtual Machine”), based on the Java HotSpot™ virtual machine (or HSVM) [6], is a fully compliant JVM.

2. Overview of SLVM

This section gives an overview of using SLVM. The virtual machine can be run in two modes: (i) to generate shared libraries, or (ii) to execute applications, whenever possible using the generated libraries.

The shared libraries are generated by the SLVMGen program executed on SLVM with a -XSLVMOuput:class flag. The program reads in a list of names of classes and/or packages from a file and produces shared libraries according to the transformations described in Section 4. In particular, the shared libraries contain bytecodes and constant pools in a format similar to the virtual machine's internal representation of these data structures, and auxiliary data structures used to quickly construct other portions of the run-time class representation. Read-only and read-write sections of the shared libraries' contents are separated, with the intent to maximize the amount of read-only items (Fig. 1).

The generation process consists of two steps. First, each class named either explicitly or implicitly through its package name is transformed into an object file, whenever necessary invoking native methods that interface with a library of shared object manipulation routines. Then all object files corresponding to classes of the same package are linked into a shared library, named after the package name (e.g., libjava.util.so). In addition, class files are inserted verbatim into the object file as read-only entries. This is needed to compute certain transient load-time information and to construct string symbols. The addition of class files increases the libraries' size by between 16% and 34%, all of which is read-only data, and thus of low impact.

At this point the libraries are ready to be used when SLVM executes applications. The information stored in the generated shared libraries describes the transformed classes completely, and is sufficient to build a run-time representation of classes during program execution. SLVM is a fully compliant JVM and can execute ordinary classes, whether stored as separate files or in jar archives, but the performance and memory utilization improvements are realized only when the classes are encoded as shared libraries. These can co-exist: some classes can be loaded as class files while the others can be encoded in a shared library.

To use the pre-generated shared libraries, SLVM should be executed with the -XSLVMObjectPath:{Path to object files} flag. Whenever a class needs to be loaded, the virtual machine tries first to locate a shared library with the required class; if such a library does not exist, regular class loading takes place. An optional -XSLVMOnly flag can be specified to ensure that the classes are taken from the shared libraries only. This causes SLVM to exit if any required class cannot be found in the shared libraries.

Except for the optional -XSLVM flags described above, SLVM can be executed in the same way as HSVM. At the virtual machine and OS levels, the difference between HSVM and SLVM becomes apparent, as the use of SLVM's shared libraries moves certain meta-data outside of the heap, and the libraries themselves can be shared across multiple instances of SLVM (Fig. 2).

3. Design decisions

Several design decisions are described below, along with alternatives we considered during SLVM's construction. They include (i) the choice of an existing shared libraries format over a potentially more flexible customized sharing protocol, (ii) the extent of sharing, (iii) the choice of shareable data structures, and (iv) the granularity of the contents of shared libraries.
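
Before turning to the individual decisions, the class lookup order from Section 2 and the one-library-per-package granularity can be made concrete with a short sketch. This is a minimal illustration under stated assumptions, not SLVM's implementation. The class SlvmLookupSketch, its methods, and the Source enum are hypothetical names. The naming rule is generalized from the single example libjava.util.so. Checking only that the library file exists is a simplification, since the paper says the virtual machine looks for a library that contains the required class.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

/** Illustrative only: none of these names come from SLVM itself. */
final class SlvmLookupSketch {

    /** Possible outcomes of a class lookup under the policy described in Section 2. */
    enum Source { SHARED_LIBRARY, REGULAR_CLASS_LOADING, FAIL }

    /**
     * Maps a fully qualified class name to its package library name,
     * e.g. "java.util.HashMap" to "libjava.util.so".
     * Assumption: the rule is inferred from the paper's one example.
     */
    static String libraryNameFor(String className) {
        int lastDot = className.lastIndexOf('.');
        if (lastDot < 0) {
            // The paper does not say how default-package classes are named.
            throw new IllegalArgumentException("no package in: " + className);
        }
        return "lib" + className.substring(0, lastDot) + ".so";
    }

    /** Returns the package library under objectPath if a regular file with that name exists. */
    static Optional<Path> locateLibrary(Path objectPath, String className) {
        Path candidate = objectPath.resolve(libraryNameFor(className));
        return Files.isRegularFile(candidate) ? Optional.of(candidate) : Optional.empty();
    }

    /**
     * Shared library first; otherwise regular class loading, unless
     * sharedOnly (standing in for -XSLVMOnly) turns the miss into a failure.
     */
    static Source decide(boolean libraryFound, boolean sharedOnly) {
        if (libraryFound) {
            return Source.SHARED_LIBRARY;
        }
        return sharedOnly ? Source.FAIL : Source.REGULAR_CLASS_LOADING;
    }

    public static void main(String[] args) {
        // The first argument stands in for the -XSLVMObjectPath directory.
        Path objectPath = Paths.get(args.length > 0 ? args[0] : ".");
        String className = "java.util.HashMap";
        boolean found = locateLibrary(objectPath, className).isPresent();
        System.out.println(libraryNameFor(className) + " -> " + decide(found, false));
    }
}
```

Keeping the decision in a pure function separate from the file-system probe makes the three outcomes easy to check in isolation. Passing true for sharedOnly models the stricter -XSLVMOnly behaviour, where a missing library ends execution instead of falling back to class files.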