VIProf: Vertically Integrated Full-System Performance Profiler

Hussam Mousa, Chandra Krintz, Lamia Youseff, and Rich Wolski Dept. of Computer Science University of California, Santa Barbara {husmousa, ckrintz, lyouseff, rich}@cs.ucsb.edu

Abstract

In this paper, we present VIProf, a full-system performance sampling system capable of extracting runtime behavior across an entire software stack. Our long-term goal is to employ VIProf profiles to guide online optimization of programs and their execution environments according to dynamically changing execution behavior and resource availability. VIProf, thus, must be transparent while producing accurate and useful performance profiles.

We overview the design and implementation of VIProf and empirically evaluate the system using a popular software stack – one that includes a Linux operating system, a Java Virtual Machine, and a set of applications. This composition is commonly employed and important for high-end systems such as application and web servers as well as Computational Grid services. We show that VIProf introduces little overhead and is able to capture accurate (function-level) full-system performance data that previously required multiple profiles and extensive, manual, and offline post-processing of profile data.

1. Introduction

Recent advances in virtualization techniques expose a number of new opportunities for applications that execute using them. Virtualization systems are increasingly popular software systems that multiplex lower-level resources among higher-level software programs and systems. Examples of virtualization systems include a vast body of work in the area of operating systems [22, 14, 7], high-level language virtual machines such as those for Java and .Net, and, most recently, virtual machine monitors (VMMs) [28, 20, 1].

In our research, we investigate opportunities for performance optimization across the entire system stack (OS, Java Virtual Machine, application server, application) when we use virtualization to isolate system instances. Key to our approach is our assumption that a single application executes at a time. This assumption holds for batched, cluster systems that execute scientific applications, as well as for a wide range of web, Grid service, and application server systems. Virtualization enables us to customize the system stack very aggressively for a particular application, i.e., the application that is currently executing (or will execute once scheduled). Then, when the application completes, virtualization facilitates the replacement of the entire system stack with another one (e.g., for another application).

Given this execution model, i.e., isolated application instances that execute within guest operating systems over a virtualizing software layer, we can consider novel techniques for optimization and specialization across all layers of the software system. In particular, we are interested in automatic and dynamic adaptation, customization, and integration of the application, runtime, and operating system according to the dynamically changing characteristics of program behavior and underlying resource availability. Our approach is called VIVA – Vertically Integrated VirtualizAtion – and our preliminary results indicate significant potential and opportunity [12, 32, 31, 30].

As a first step toward enabling dynamic customization, in this paper we investigate an efficient and accurate profiling system that captures performance data transparently while the application executes. Such information is vital for guiding effective runtime (re-)optimization and (re-)adaptation according to dynamically changing conditions.

Current profiling systems are limited in that they operate on a single layer of the execution stack at a time. Analyzing profiles and identifying common bottlenecks across layers (OS, JVM, application) is thus currently not possible without manual, offline, and tedious post-processing of profiles collected via different tools. Moreover, even across layers, existing profiling techniques have difficulty capturing performance data on all code that executes, e.g., code that is dynamically compiled. If we are to optimize across the different layers of the software stack, we must be able to sample all code that executes and interrelate the performance data across all software layers. To address this need for a transparent, online, full-system profiling system, we present VIProf, a Vertically Integrated Profiler.

VIProf is an extension of the system-wide profiler OProfile for Linux [19]. We extend OProfile by integrating its sampling system with program-level information from any application virtual machine. We implement a VIProf prototype using the open-source Jikes Research Virtual Machine (JikesRVM) from the IBM T.J. Watson Research Center. We evaluate the overhead and efficacy of this VIProf prototype and detail how VIProf captures full-system performance behavior.

In the sections that follow, we present the background and motivation for our research, as well as other related work in this area. We then describe VIProf in Section 3. In Section 4, we demonstrate the potential benefits of vertically integrated profiling and empirically evaluate the overhead and efficacy of our system. Finally, we conclude and discuss our future research plans in Section 5.

(Footnote: This work was funded in part by NSF grants CNS-0546737 and ST-HEC-0444412 and Microsoft. 1-4244-0910-1/07/$20.00 © 2007 IEEE.)

2. Background and Related Work

To enable dynamic and adaptive customization of the system stack (all software layers including the application), we require techniques that capture accurate performance data across the system that we can use to guide optimization decisions. Currently, extant approaches to profiling capture only single-layer information, e.g., within a virtual execution environment (JVM or CLR) [2, 9, 5, 17, 11, 21], across the system but agnostic to JVM-process internals [19, 15], employ complex software instrumentation [13, 23, 26], or require hardware extensions [16, 18, 10, 4]. Achieving efficient, full-system profiling currently requires multiple, complex profiling tools and tedious, inaccurate, offline integration of profile data. To address these limitations, we investigate a full-system profiling system that captures lightweight hardware performance event data across both the runtimes for high-level languages, e.g., Java and Microsoft .Net, and the code within the operating system. Our system is called the Vertically Integrated Profiler (VIProf).

VIProf is based on a popular, software-based operating system profiler called OProfile [19], which we extended to enable the integration of internal JVM profile data. OProfile is a system-wide profiling system that captures hardware performance counter events and correlates them with various parts of the executing system, including applications, libraries, and operating system functions. This system, however, does not support dynamically generated code, such as JIT or dynamically compiled code, and views a JVM instance as a single application without regard to which applications or programs execute within it. As a result, we are unable to investigate the performance of high-end Java programs, application servers and their applications, Grid services, and Java middleware in concert with OS performance and OS-JVM interaction. Extant approaches to JVM profiling, in particular vertical profiling [8], provide a thorough examination of JVM and Java program internals, but do not capture fine-grain OS activity and the JVM-OS interaction. To address the limitations of these two types of profiling systems, we have developed VIProf. VIProf is able to correlate hardware performance counter events with all code that executes in the system, regardless of where it executes (user or kernel space) and how it is generated and linked (dynamically or statically).

An alternative approach to the one that we take with VIProf is that of the Performance and Event Monitoring (PEM) infrastructure from IBM Research [27]. PEM combines information collected by several distinct monitoring streams that are provided by agents instrumented into the application or operating system (K42), or by native hardware counter agents on the supported hardware (Power Mac G5). VIProf, however, is a lightweight, whole-system, hardware-event profiler that provides a unified perspective of the entire execution stack and requires no instrumentation or combination of multiple monitored streams. Moreover, as a result of our use of OProfile as a base infrastructure, VIProf supports different operating systems and hardware platforms – extant profiling systems commonly support a single OS or architecture. Finally, our implementation is simple and general enough to support a wide range of virtual execution environments (multiple Java virtual machines as well as Microsoft .Net common language runtimes).

3. Vertically Integrated Profiler (VIProf)

VIProf extends OProfile to enable integrated profiling across the virtual layers of a system. OProfile consists of a Linux kernel module and a user-level application. The kernel module sets up the hardware performance counters (HPCs) with the user's settings and requests that a Non-Maskable Interrupt (NMI) be raised whenever a configurable threshold value is reached. When the prescribed number of hardware events occurs, an HPC overflows and an NMI is raised by the OS. The kernel module handles the interrupt by reading the active Program Counter (PC) value from the processor. OProfile matches the PC value to a virtual memory region and identifies the corresponding binary or library. Further, OProfile computes the offset into the corresponding object file to pinpoint the method that was executing at the time the interrupt is raised. OProfile adds this information, which we refer to as a sample, to a buffer for later servicing by a user-level OProfile daemon. Periodically, this daemon processes the sample buffer and writes the samples to disk. OProfile also includes post-processing utilities to enable flexible reporting.

The primary limitation of OProfile (or any other system-wide profiler) is in profiling dynamically generated code.

Time %   Dmiss %   Image name      Symbol name
9.6393   0.4586    RVM.map         com.ibm.JikesRVM.classloader.VM_NormalMethod.getOsrPrologueLength
5.4750   0.2293    RVM.map         com.ibm.JikesRVM.classloader.VM_NormalMethod.hasArrayRead
5.3216   0.2784    RVM.map         com.ibm.JikesRVM.opt.VM_OptCompiledMethod.createCodePatchMaps
4.9421   1.3429    JIT.App         edu.unm.cs.oal.DaCapo.JavaPostScript.Red.Scanner.Scanner.parseLine
3.6599   0.1146    RVM.map         com.ibm.JikesRVM.opt.VM_OptGenericGCMapIterator.checkForMissedSpills
3.6019   0.5077    RVM.map         com.ibm.JikesRVM.MainThread.run
2.6813   28.2673   libc-2.3.6.so   memset
1.8861   0.0491    RVM.map         com.ibm.JikesRVM.classloader.VM_NormalMethod.finalizeOsrSpecialization
1.7619   0         RVM.map         com.ibm.JikesRVM.opt.VM_OptMachineCodeMap.getMethodForMCOffset
1.5274   0.1802    RVM.map         java.util.Vector.trimToSize
1.3186   10.0066   libxul.so.0d    (no symbols)
1.1783   0.1474    libfb.so        fbCopyAreammx
1.1268   0.0164    libfb.so        fbCompositeSolidMask_nx8x8888mmx

Time %    Dmiss %   Image name                                    Symbol name
69.6552   15.9909   RVM.code.image                                (no symbols)
2.3053    39.6149   libc-2.3.6.so                                 memset
1.9327    0.6569    anon (range:0x65000000-0x65700000),JikesRVM   (no symbols)
1.2961    0.4304    anon (range:0x65000000-0x65b00000),JikesRVM   (no symbols)

Figure 1. Sample profiles generated by VIProf (above) and OProfile (below) for the DaCapo ps benchmark (truncated for brevity). The hardware events profiled are GLOBAL_POWER_EVENTS (time) and BSQ_CACHE_REFERENCE (L2 data cache misses), shown in the first and second columns, respectively.
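As a hedged illustration of how report rows like those in Figure 1 can be consumed downstream, the following sketch (our own, not part of OProfile or VIProf) aggregates per-image time percentages; the sample values are copied from the VIProf rows above:

```python
from collections import defaultdict

# Rows shaped like Figure 1: (time %, L2 miss %, image name, symbol name).
rows = [
    (9.6393, 0.4586, "RVM.map", "VM_NormalMethod.getOsrPrologueLength"),
    (4.9421, 1.3429, "JIT.App", "Scanner.parseLine"),
    (2.6813, 28.2673, "libc-2.3.6.so", "memset"),
    (5.4750, 0.2293, "RVM.map", "VM_NormalMethod.hasArrayRead"),
]

def per_image_time(rows):
    """Sum the time percentage of every sampled symbol, grouped by image."""
    totals = defaultdict(float)
    for time_pct, _miss_pct, image, _symbol in rows:
        totals[image] += time_pct
    return dict(totals)

totals = per_image_time(rows)
print(totals["RVM.map"])  # 9.6393 + 5.4750
```

This kind of grouping is what lets VM-internal symbols (RVM.map), JIT code (JIT.App), and native libraries be compared side by side in one report.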

Such code is typical for programs that execute using a virtual machine (VM), e.g., those developed in languages like Java and the Microsoft .Net languages. The format of such programs is an architecture-independent format (to enable portability) that is converted by a VM using an interpreter or a dynamic (sometimes referred to as just-in-time (JIT)) compiler. The location and layout of dynamically generated code bodies (needed by OProfile to map a PC value to its corresponding binary) are thus determined and assigned at runtime and are stored in the VM's virtual memory region. To further complicate matters, virtual machines recompile frequently executing code in an effort to extract higher performance through additional optimization. Thus, the location and layout of code can change dynamically. Finally, if these code bodies are stored in a garbage-collected section of JVM memory, certain garbage collectors may also move code bodies. We refer to dynamically generated code bodies as JIT code throughout.

VIProf is a set of OProfile extensions that handle JIT code and VM internals. The extensions enable OProfile to identify a sample as belonging to JIT code and to retrieve information from the VM in which the code is executing. We also map code bodies to their high-level information (methods in an application) so that we can attribute performance data to particular methods. The two key extensions we contribute are the Runtime Profiler and the VM Agent.

Runtime Profiler. The runtime profiler is the OProfile daemon that runs whenever we wish to log samples. It is the main source of profiling overhead; extra care must therefore be taken to ensure that minimal work is done by this daemon. We extend the daemon with a mechanism that allows a VM to register the fact that it is executing dynamically generated code. The virtual machine also registers the boundaries of its memory heap. Within the daemon, the logging code consults this information before deciding to log a sample as anonymous: if the sample falls within the boundaries of the VM's heap, it is instead logged as a JIT.App sample. Apart from a few other limited VM probing routines, this added mechanism is the only extra work that must be done at runtime. We evaluate the overhead associated with this daemon in Section 4.

VM Agent. A counterpart to the runtime profiler is the VM agent. This module is responsible for tracking JIT compilations and any GC-induced code body moves. The agent is implemented as a library with several hooks in the VM's code. Specifically, we add instructions in the bodies of the 'compile' and 'recompile' methods within the VM to log the beginning address, size, and signature of each newly compiled method into a buffer. We also instrument the GC 'move' method within the VM to mark a method's body as having been moved. We simply flag the method instead of logging it in order to avoid undue overhead: the bodies of the GC methods are highly tuned, and any calls outside of their code space would result in a significant performance hit.

At specific points during execution, we process the code buffers by writing out a JIT code map to disk. We also traverse a list of known compiled methods and write out the information about any method that was flagged by the GC. We then notify the OProfile daemon and request that the written map be associated with the logged JIT.App samples.

3.1. Handling GC-Managed Code

In some VMs, such as the open-source Jikes Research Virtual Machine (Jikes RVM), the code and data regions are interwound into a single heap, even though the heap may be segregated for optimized garbage collection and memory performance. This presents an additional challenge, since the body of a method can exist at several different memory locations during a single execution.

We overcome this challenge by viewing each instance of a garbage collection as delimiting a cascaded execution epoch. In our runtime profiler, instead of designating samples to the JIT application, we designate them to a particular execution epoch of the JIT application. At the end of each execution epoch, just before the launch of the garbage collection, we write out a code map corresponding to the method addresses of this particular epoch. It is important to note that this is a partial write, since it only includes methods that were compiled (or recompiled) since the previous code-map write; it also includes the methods that were moved by the previous garbage collection.

At logging time, a sample is assigned to a particular epoch. The post-processing tool then attempts to find the corresponding method in that particular code map. It is possible that the method will not be found, indicating that the sample belongs to a method that was neither compiled nor moved during this particular epoch. In that case, the post-processing tool traverses the set of code maps backwards until it identifies the first occurrence of a method whose addresses correspond to the sample.

3.2. Post Processing Tools

A key to our low-overhead implementation of the whole-system profiler is that we delay most of the work to the offline profile-analysis stage. OProfile comes with a powerful set of post-processing tools that are able to categorize, sort, and display sample information in a variety of ways. The initial step in these tools is the reading and grouping of the sample files.

We modify this component by adding code that reads the map files generated by the VM agent. Since these maps are organized into sequential files - corresponding to the GC epochs of the VM - the tools initially search for a sample in the map file corresponding to the epoch during which the sample was recorded. If the sample is not found in that epoch's map, the tool searches the immediately preceding map, and so on. This guarantees that the method with which a sample is associated is the most recently compiled - or moved - method to occupy that address space.

The VM itself is an application whose job is to execute the portable object files (e.g., Java bytecode). Many VMs are written in C or C++ (e.g., Sun HotSpot) and are compiled directly into object files. Since object files are profiled directly by OProfile, no additional work is needed to include the execution information associated with the VM. However, some VMs, such as the Jikes RVM, are written mostly in Java and as such are not profiled directly by OProfile. Luckily, the build mechanism for Jikes RVM produces a static image (in a Jikes internal format) and an associated map. We modify the OProfile post-processing tool to read in the Jikes RVM internal map and use it to process samples associated with the VM component of the execution. Moreover, there is a small bootstrap application, written in C, that is responsible for loading the Jikes RVM class image. This application is compiled into an object file, and no additional work is needed to profile it.

4. Results

To demonstrate the usefulness and low overhead of VIProf, we used it to profile several benchmarks. In this section, we present our experimental methodology and the results of our tests, including a sample profile output and an evaluation of the overhead of our system.

4.1. Methodology

VIProf is an extension to version 0.9.2 of the OProfile system [19]. The virtual machine used is version 2.4.5 of the Jikes Research Virtual Machine [11]. We use a Debian sarge distribution [6] running Linux kernel 2.6.20.16. All experiments are performed on a single-core Intel Pentium 4 Xeon running at 3.4GHz with 2GB of RAM.

We profile three groups of benchmarks: SPEC JVM98 [25], DaCapo [3], and SPEC pseudoJBB [24]. JVM98 and DaCapo are collections of short- and medium-length applications for a variety of tasks, including compilers, string manipulation, databases, decoders, and other applications designed to model typical Java applications. PseudoJBB is a variation of SPEC JBB that models several warehouses servicing transactions; it is a longer-running application than the JVM98 and DaCapo programs. PseudoJBB is configured to have a fixed number of transactions, allowing a direct measure of the execution time. We use an input of '100' for the JVM98 benchmarks and 'large' for the DaCapo benchmarks. We use 3 warehouses with 100K transactions for pseudoJBB.

For each benchmark, we measure execution time by running the benchmark 10 times, eliminating the fastest and slowest runs, and then averaging the remaining 8 runs. We start VIProf just prior to benchmark launch, and we configure it to measure the execution of the benchmarks only.

4.2. Case Study

Figure 1 shows the output of VIProf (the upper portion) versus the output of OProfile (the lower portion) for an identical run of the DaCapo ps benchmark. In the OProfile output, the Java application and the virtual machine are both profiled as black boxes. Using VIProf, however, we are able to identify the system's execution patterns and view all methods side by side. In addition to relative method weights and correlated hardware events, VIProf also extends the call-graph functionality of OProfile to include call-sequence profiles across layers. We omit these results for brevity.

The VIProf output demonstrates the utility and convenience of our profiler for whole-system profiling. This level of detailed profiling is indispensable to our overall goal of vertically integrated optimization.

4.3. Overhead

[Figure 2: bar chart (graphic omitted) of per-benchmark slowdown, normalized to unprofiled execution, for base, OProfile at 90K, and VIProf at the 45K, 90K, and 450K sampling intervals.]

Figure 2. Overhead of profiling with VIProf compared to OProfile. Higher bars indicate slower relative execution time.

Benchmark         Base time (s)
pseudoJBB         31
JVM98 (average)   5.74
antlr             8.7
bloat             28.5
fop               3.2
hsqldb            43
pmd               16.3
xalan             137.9
pseudoJBB         22.2
Average           32.9

Figure 3. Base execution time in seconds for the benchmarks.

Figure 2 shows the execution overhead associated with VIProf at 3 different profiling frequencies. The figure also shows the OProfile overhead for the median frequency (90K). The execution times are normalized to the base execution time (i.e., no profiling or running VM agents). Figure 3 shows the base execution time of the evaluated benchmarks for reference.

On average, VIProf adds negligible overhead to what OProfile already introduces. While most benchmarks experienced a slight slowdown compared to OProfile, a few experienced speedups. We believe this is due to VIProf avoiding the anonymous-memory logging code in OProfile (which we replace with our VIProf mapping code). A few benchmarks exhibited speedups as compared to the base execution time. We found these to be medium-length DaCapo applications (hsqldb, bloat). We believe this is due to system noise and the uncertainty involved in full-system measurements.

For a moderate level of sampling (1 in 90K cycles), OProfile generally slows down the system by an average of 5%. Our system exhibits a similar slowdown on average. This compares favorably to other similar profilers such as vertical profiling, which reported 7% average profiling overhead (although vertical profiling is limited to covering the VM and application domains only).

For the median sampling frequency (90K) with VIProf, the majority of benchmarks experienced slowdowns of less than 10%, with one exception, antlr, recording a slowdown above 10%. Four benchmarks had slowdowns of less than 5%. Longer-running benchmarks generally experienced the smaller slowdowns, due to the amortization of the cost of writing out the code maps. Further, as the code reaches higher optimization levels and the GC moves these regions to the mature space, there is less need for any runtime work to be done to support our VIProf system.

5. Conclusions and Future Work

Investigating optimization and specialization opportunities across the entire system stack is the main focus of our research. In this paper, we present VIProf: an efficient profiling system that captures performance transparently across the entire system stack. VIProf is a first step towards interrelating performance bottlenecks across layers (OS, VM, and application), and identifying potential optimization and specialization opportunities for the entire execution stack. We show that VIProf captures interesting events across the system with very low overhead.

As part of future work, we plan to integrate Xen virtualization extensions into VIProf to enable profiling of the Xen layer (via XenoProf [29]) as well as of multiple concurrently executing software stacks. In addition, we plan to investigate profile-guided optimizations across multiple layers of the execution stack. In particular, we are interested in customizing the VM and OS and their interaction according
to the changing behavior of an application and the available resources.

References

[1] A. Whitaker, M. Shaw, and S. Gribble. Scale and Performance in the Denali Isolation Kernel. In Symposium on Operating Systems Design and Implementation (OSDI), 2002.
[2] M. Arnold and B. Ryder. A Framework for Reducing the Cost of Instrumented Code. In ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), June 2001.
[3] S. M. Blackburn, R. Garner, C. Hoffman, A. M. Khan, K. S. McKinley, R. Bentzur, A. Diwan, D. Feinberg, D. Frampton, S. Z. Guyer, M. Hirzel, A. Hosking, M. Jump, H. Lee, J. E. B. Moss, A. Phansalkar, D. Stefanović, T. VanDrunen, D. von Dincklage, and B. Wiedermann. The DaCapo benchmarks: Java benchmarking development and analysis. In ACM Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), Oct. 2006.
[4] C. Zilles and G. Sohi. A Programmable Co-processor for Profiling. In High Performance Computer Architecture (HPCA), pages 241–253, 2001.
[5] T. Chilimbi and M. Hauswirth. Low-overhead memory leak detection using adaptive statistical profiling. In Symposium on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Oct. 2004.
[6] Debian GNU/Linux. http://www.debian.org.
[7] S. W. Galley. PDP-10 virtual machines. In Proceedings of the Workshop on Virtual Computer Systems, pages 30–34, New York, NY, USA, 1973. ACM Press.
[8] M. Hauswirth, A. Diwan, P. F. Sweeney, and M. Mozer. Automating vertical profiling. In ACM Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), Oct. 2005.
[9] M. Hirzel and T. Chilimbi. Bursty tracing: A framework for low-overhead temporal profiling. In Workshop on Feedback-Directed and Dynamic Optimization (FDDO-4), 2001.
[10] J. Dean, J. Hicks, C. Waldspurger, W. Weihl, and G. Chrysos. ProfileMe: Hardware Support for Instruction-Level Profiling on Out-of-Order Processors. In ACM/IEEE MICRO, pages 292–302, 1997.
[11] JikesRVM IBM research publications. http://www-124.ibm.com/developerworks/oss/jikesrvm/info/pubs.shtml.
[12] C. Krintz and R. Wolski. Using phase behavior in scientific application to guide Linux operating system customization. In Workshop on Next Generation Software at the IEEE International Parallel and Distributed Processing Symposium (IPDPS), April 2005.
[13] M. Arnold and B. Ryder. A Framework for Reducing the Cost of Instrumented Code. In ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 168–179, 2001.
[14] S. E. Madnick and J. J. Donovan. Application and analysis of the virtual machine approach to information system security and isolation. In Proceedings of the Workshop on Virtual Computer Systems, pages 210–224, New York, NY, USA, 1973. ACM Press.
[15] A. V. Mirgorodskiy and B. P. Miller. CrossWalk: A tool for performance profiling across the user-kernel boundary. In International Conference on Parallel Computing (ParCo), pages 745–752, 2003.
[16] H. Mousa and C. Krintz. HPS: Hybrid profiling support. In International Conference on Parallel Architectures and Compilation Techniques (PACT), pages 38–50, 2005.
[17] Microsoft Phoenix framework. http://research.microsoft.com/phoenix.
[18] P. Nagpurkar, H. Mousa, C. Krintz, and T. Sherwood. Efficient remote profiling for resource-constrained devices. ACM Transactions on Architecture and Code Optimization (TACO), 3(1):35–66, 2006.
[19] OProfile. http://oprofile.sourceforge.net.
[20] P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, and R. Neugebauer. Xen and the art of virtualization. In Symposium on Operating Systems Principles (SOSP), 2003.
[21] M. Paleczny, C. Vick, and C. Click. The Java HotSpot(TM) Server Compiler. In USENIX Java Virtual Machine Research and Technology Symposium (JVM'01), Apr. 2001.
[22] R. A. Meyer and L. H. Seawright. A Virtual Machine Time-Sharing System. IBM Systems Journal, 1970.
[23] S. Sastry, R. Bodík, and J. Smith. Rapid Profiling via Stratified Sampling. In Annual International Symposium on Computer Architecture (ISCA), pages 278–289, July 2001.
[24] SPEC JBB Benchmarks. http://www.spec.org/jbb2000.
[25] SPEC JVM98 Benchmarks. http://www.spec.org/osg/jvm98.
[26] T. Heil and J. Smith. Relational Profiling: Enabling Thread-Level Parallelism in Virtual Machines. In ACM/IEEE MICRO, pages 281–290, 2000.
[27] R. W. Wisniewski, P. F. Sweeney, K. Sudeep, M. Hauswirth, E. Duesterwald, C. Cascaval, and R. Azimi. Performance and environment monitoring for whole-system characterization and optimization. In IBM PAC2 Conference, Oct. 2004.
[28] J. Xenidis. rHype: IBM Research Hypervisor. IBM Research, March 2005.
[29] XenoProf. http://xenoprof.sourceforge.net.
[30] L. Youseff, R. Wolski, B. Gorda, and C. Krintz. Evaluating the Performance Impact of Xen on MPI and Process Execution for HPC Systems. In International Workshop on Virtualization Technology in Distributed Computing (VTDC), held with SC06, Nov. 2006.
[31] L. Youseff, R. Wolski, B. Gorda, and C. Krintz. Paravirtualization for HPC Systems. In ISPA Workshops, volume 4331 of Lecture Notes in Computer Science, pages 474–486. Springer, 2006.
[32] L. Youseff, R. Wolski, and C. Krintz. Linux Kernel Specialization for Scientific Application Performance. Technical Report 2005-29, UCSB, Nov. 2005.
