JOURNAL OF SOFTWARE, VOL. 6, NO. 12, DECEMBER 2011 2399

RPPA: A Remote Parallel Program Performance Analysis Tool

Yunlong Xu, Zeng Zhao, Weiguo Wu*, and Yixin Zhao
School of Electronics and Information Engineering, Xi'an Jiaotong University, Xi'an, China
Email: [email protected], [email protected]

Abstract—Parallel program performance analysis plays an important role in exploring parallelism and improving the efficiency of parallel programs. To ease performance analysis for remote programs, this paper presents a remote parallel program performance analysis tool, RPPA (Remote Parallel Performance Analyzer), which is based on dynamic code instrumentation. RPPA adopts a hierarchical structure consisting of three parts: client, server, and computing nodes. Performance analysis tasks are submitted to the server via the graphical user interface of the client, and then the actual analysis processes are started on the computing nodes by the server to collect performance data for visualization in the client. The performance information gained by RPPA is comprehensive and intuitive, so it is quite helpful for users to analyze performance, locate program bottlenecks, and optimize programs.

Index Terms—parallel programming tools, program performance analysis, dynamic code instrumentation, performance visualization

I. INTRODUCTION

In the past few years, parallel computers have grown very fast, while software for parallel computing has grown comparatively slowly. This situation makes the efficiency of parallel systems relatively low, so hardware performance cannot be fully utilized [1]. Software has therefore become the bottleneck of parallel computing and limits the widespread use of parallel computers. For instance, parallel program development is challenging due to the lack of effective tools for coding, debugging, performance analysis, and optimization.

Motivated by the preceding challenge, a remote visualization parallel program performance analysis tool, RPPA, which aims to help programmers optimize the performance of applications and make full use of parallel computing resources, is designed and developed based on a survey of existing tools and our former research on a parallel program integrated development environment (IDE). RPPA provides a solution for the performance analysis of MPI [4] applications running on parallel computers with SMP nodes. To gain comprehensive performance information about programs, a dynamic instrumentation based approach, which includes modifying, deleting, and inserting code to change the execution behavior of programs, is applied to the performance measurement. The framework of RPPA consists of three parts: client, server, and computing nodes. A performance analysis task is committed to the server through the client graphical user interface (GUI); a daemon running on the server then starts the program performance data collection processes on several computing nodes; later, the server gathers the performance data files from the computing nodes and sends them back to the client for visualization.

The remainder of this article is organized as follows. First, we consider related work and point out RPPA's strengths through comparison with existing tools in the next section. In Section 3, we describe the overall architecture of RPPA. After that, we present the methodology applied in the client, the server, and the computing nodes in detail in Section 4. In Section 5, we present the evaluation of a RPPA prototype. Section 6 is a discussion. Finally, we consider future work and conclude the paper.

The journal article is based on the conference paper, "Research and Design of a Remote Visualization Parallel Program Performance Analysis Tool," which appeared in PAAP'10.
*Corresponding author: Weiguo Wu

II. RELATED WORK

Performance analysis is crucial to parallel program development. This work combines knowledge from various fields, e.g., parallel computing, computer architecture, and data mining. Related work on program performance analysis has been carried out by many organizations and agencies.

MPE [2] is an extension of MPICH [3], a portable implementation of MPI. It consists of a large number of programming interfaces and examples for correctness debugging, performance analysis, and visualization. It supports several file formats for logging the performance information of programs, which can be viewed by Upshot, Jumpshot [5], and other visualization tools.

Paradyn [6], developed by the University of Wisconsin, is a software package which aids in analyzing the performance of large-scale parallel applications. It supports analyzing MPI and PVM [7] applications. The cause of performance problems is systematically detected by inserting and modifying program code automatically, and performance bottlenecks are automatically searched for by means of a W3 (i.e., When, Where, and Why) model.

TAU [8] is a parallel program performance analysis system developed by the University of Oregon, Juelich

© 2011 ACADEMY PUBLISHER doi:10.4304/jsw.6.12.2399-2406

Research Center, and Los Alamos National Laboratory together. It focuses on system robustness, flexibility, portability, and integration with other related tools, and supports multiple programming languages, including C/C++, Fortran, Java, and Python. An MPI wrapper library is provided to analyze the performance of MPI functions, as well as a log format conversion tool.

VENUS (Visual Environment of Neocomputer Utility System) [17] is a parallel program performance visualization environment developed by Xi'an Jiaotong University. It provides solutions for monitoring, analyzing, and optimizing large-scale PVM parallel programs based on a profiling method. The system supports real-time and post-mortem performance visualization. Measurement code is automatically inserted into the source code of programs to collect performance data for data analysis and performance visualization. In this way, performance analysis helps users optimize their programs.

Open|SpeedShop [9], a dynamic binary instrumentation based performance tool using DPCL [16]/Dyninst [12], aims to overcome a common limitation of performance analysis tools: each tool alone is often limited in scope and comes with widely varying interfaces and workflow constraints, requiring different changes in the often complex build and execution infrastructure of the target application. Thus it provides efficient, easy to apply, and integrated performance analysis for parallel systems.

HPCToolkit [10], developed by Rice University, is an integrated suite of tools that supports measurement, analysis, attribution, and presentation of application performance for both sequential and parallel programs. HPCToolkit can pinpoint and quantify scalability bottlenecks in fully optimized parallel programs and multithreaded programs at a low cost. Call path profiles for fully optimized codes can also be collected without compiler support.

The tools enumerated above have a common defect: performance visualization and data collection are both carried out on parallel computers or clusters. It is hard for users who are not familiar with parallel systems to install and apply these tools. This defect hinders the widespread use of parallel program development environments and parallel computing resources to some extent. Therefore, there is an urgent need for a user-friendly remote parallel program performance analysis tool; RPPA is designed to meet this need. Users connect to remote parallel systems over the internet via RPPA's client GUI, which is part of a parallel program IDE built on Eclipse Plug-in [11] technology. Interactive operations and performance visualization are both carried out in the client, while the actual operations are executed on remote clusters that are transparent to common users. RPPA is easy to use and portable to Windows and other operating systems.

III. OVERALL ARCHITECTURE

The RPPA tool adopts a hierarchical structure, through which users are shielded from the complexity of parallel systems. The performance information of programs is presented in RPPA's GUI in an intuitive way, which is quite helpful for locating performance bottlenecks and improving the efficiency of programs. A performance analysis task is committed to the server through the graphical client; a daemon running on the server starts the program performance data collection processes, whose core step is dynamic instrumentation, on several computing nodes; the server then gathers the performance data files from the computing nodes and sends them back to the client for visualization.

RPPA is customized to the cluster structure. The overall architecture of RPPA is depicted in Figure 1.

Figure 1. Overall architecture of RPPA (client modules for command handling, task launching, data downloading, and visualization; server modules for data collection launching and data file gathering; computing nodes running the MPI communication library, performance data collectors, MPI processes, and performance data files).

The client connects with the server over the internet, and the computing nodes connect with the server, which can itself also be a computing node, over a high-speed interconnection network. Users control the whole system through a GUI, which includes two core modules for submitting performance analysis tasks and for downloading and visualizing performance data. The server-side operational details (e.g., how the run-time processes are assigned to the computing nodes) are transparent to users. As a result, operations are greatly simplified for users who are not familiar with the underlying parallel cluster structure. The server, the entry node of a parallel cluster, is responsible for maintaining the connection between the client and the cluster, accepting instructions from the client, and then starting performance analysis processes on the computing nodes to collect performance data. The actual performance analysis tasks and program execution tasks are carried out on the computing nodes: RPPA creates user program processes, into which measurement code is inserted, controls their execution, generates performance data files, and integrates and analyzes the files when the analysis processes finish.

IV. DETAILED DESIGN

A. Client

The client provides an interface for users to commit


analysis tasks and visualizes the performance analysis results. The GUI, implemented with Eclipse Plug-in technology, greatly simplifies users' operation. The client is composed of four functional modules: the performance analysis command handling module, the performance analysis launching module, the performance data downloading module, and the performance data visualization controlling module. The relationship between the four modules is shown in Figure 2.

Figure 2. Structure of client (the user command interface dispatches launch, download, and visualize commands to the corresponding modules, which exchange performance data files through the communication interface).

Performance Analysis Command Handling Module. This module is implemented by extending plug-in extension points of the Eclipse platform. Graphical items for performance analysis (e.g., buttons, menus, pop-up lists, and editors) are added to the original GUI of the Eclipse platform. Each item corresponds to a specific performance analysis operation. This module supports interactions between users and RPPA.

Performance Analysis Launching Module. This module sends commands to the server and starts remote performance analysis tasks. A customized Eclipse launching mode is implemented by extending the launchModes extension point of the original platform, and the actions carried out while launching are implemented by extending the launchDelegates extension point of Eclipse.

Performance Data Downloading Module. Performance data files are distributed among the computing nodes when an analysis task finishes, due to the C/S architecture adopted by RPPA. This module gathers and integrates the distributed performance data files and then transfers the integrated file to the client for visualization.

Performance Data Visualization Controlling Module. The structure of this module is shown in Figure 3.

Figure 3. Performance visualization control module of client (the visualization controller reads performance data files through the parser module and renders them in the visualization views).

Performance information can be viewed from multiple aspects through the various display commands and views RPPA provides. A command deals with a specific kind of data set, and a view displays one kind of performance information. It is worth mentioning that, in accordance with the characteristics of parallel programs, a view of performance comparison between multiple processes is provided.

Performance data visualization is an important means to help users analyze program performance and pinpoint performance bottlenecks, because it is difficult to assess a program's performance directly from a large volume of performance data files of various entries, not to mention that the data volume grows rapidly as the application scales. Therefore, RPPA offers vivid and intuitive graphical charts to help users visualize performance information, analyze program performance, and locate bottlenecks quickly.

B. Server

The server is the communication relay between the client and the computing nodes: it serves as the entry point of a cluster, which is composed of several computing nodes, and controls the execution of tasks on the computing nodes; it also receives commands from the client and responds to these commands. The functionalities of the server include launching performance analysis tasks, gathering performance data files from the computing nodes, and maintaining communication with the client. The performance data file gathering module is the core of the server.

According to when the collected data is analyzed and visualized, program performance analysis falls into two categories: real-time analysis and post-mortem analysis. Each mode has its strength: the real-time mode can adjust data collection while a task is running, while the post-mortem mode introduces only minor perturbation into the original program. As for RPPA, if a real-time mode were adopted in the C/S structure, the server would have to receive performance analysis instructions frequently, process them, transmit them to the computing nodes on which they are actually executed, and finally transfer a large volume of data back to the client during execution. This whole process, greatly affected by the network, would introduce major perturbation into the original program. Thus the post-mortem mode is adopted in RPPA.

Unlike the launch process of a common MPI parallel program, the launch process of a performance analysis task has to ensure that all MPI processes running on the computing nodes are under the control of the program performance analysis tool, specifically the control of the performance analysis launching module. In this way, measurement code is instrumented into user programs, and then performance data can be collected.


The launch process of an MPI parallel program under the control of the launching module is shown in Figure 4.

Figure 4. MPI program starting process under the control of the data collection module (the launching module on computing node A starts local and remote performance data collection processes via rsh/ssh, which in turn start and communicate with the user "hello" processes on nodes A, B, and C).

When receiving a performance analysis launch command, the launching module of the server (e.g., node A in Figure 4) analyzes the parameters of the command, creates a performance analysis task, sets some parameters (e.g., the name and full path of the user program, the number of processes) for the task, and then starts data collection processes on the computing nodes. The data collection processes analyze the parameters and then create the user ("hello") processes based on the name of the user processes and the parameters of the performance analysis task. After that, the hello processes are considered part of the whole task and are managed by the MPI runtime environment.

C. Computing Nodes

The computing nodes, on which the MPI runtime environment is deployed, are the substrate of RPPA. A bijection is set up between MPI processes and performance data collection processes on the computing nodes: the data collection processes start and control the MPI processes. The performance data collection module of the computing nodes is launched by the performance data collection launching module of the server. MPI processes are started by the data collection module according to some parameters (e.g., the absolute path of the MPI program). The collection module then inserts measurement code into the MPI processes and integrates the generated performance data files when they finish.

The main purpose of performance analysis is to help users locate program bottlenecks, i.e., code segments which account for large portions of the running time. Besides running time, the number of function calls is also collected, to estimate the average running time of each function. As for MPI parallel programs, the communication volume of MPI functions also concerns programmers a lot. To sum up, three kinds of data need to be collected: running time, number of function calls, and communication volume.

Performance data is collected by means of code instrumentation [12], which includes modifying, deleting, and inserting code to change the execution behavior of programs. According to the timing of code insertion, instrumentation can be divided into two categories: static and dynamic code instrumentation. The static one inserts code before the execution of programs, and the dynamic one, per contra, during execution. Static instrumentation introduces little perturbation into the original programs, but the information it collects is not comprehensive, while dynamic instrumentation is able to collect comprehensive performance information at the cost of major perturbation. Using dynamic instrumentation, programs are simply resumed, rather than recompiled, relinked, and restarted after instrumentation. Apparently, dynamic instrumentation is more complicated to implement than the static one.

In this paper, we use DyninstAPI [12], a dynamic binary instrumentation interface developed by the University of Maryland, to insert binary code into running processes. Instrumentation processes (i.e., mutators) start the user program processes which are to be instrumented (i.e., mutatees) and attach themselves to the user processes. Measurement code segments (i.e., snippets) are inserted into the user processes by the instrumentation processes while the user processes are suspended, and performance data is collected when the instrumented code of the user processes is executed.

The process of performance data collection is described as Algorithm 1.

Algorithm 1: Performance data collection procedure
Input: A, attributes of the user task
Output: PF, an integrated performance data file
define: proc_array, array of MPI process pids
proc_array ← spawn(A)
suspend(proc_array)
foreach i in proc_array
    define: node, name of computing node
    define: file, path of data file
    file ← new(node, i)
    define: procImage, image of user process
    procImage ← getImage(i)
    define: usrModule, module of user process
    usrModule ← search(procImage)
    define: func_array, array of user functions
    func_array ← search(usrModule)
    for j in func_array
        define: snippet, binary code segment
        define: gettimeofday, function to get current time
        define: fscanf/fprintf, functions to read/write a file
        snippet ← generate(gettimeofday, fscanf/fprintf)
        insert(j, snippet, file)
        if j is an MPI function
            define: commAttr_array, array of communication attributes (e.g., volume, source, destination)
            snippet ← generate(commAttr_array, file)
            insert(j, snippet)
        end if
    end for
    wake(proc_array)
    if i finishes
        acknowledge(server, file, i)
    end if
end foreach
while (TRUE)
    if all processes complete
        define: file_array, array of performance data files
        PF ← integrate(file_array)
        return PF
    end if
end while


We note that, when no MPI communication function is found, the data collection process skips the step of collecting MPI communication information and goes ahead rather than returning, because the program being analyzed could be a serial program, which can also be executed by the mpiexec command in MPI runtime environments.

V. EVALUATION

This section describes the evaluation of a RPPA prototype in two aspects: functional evaluation and perturbation evaluation. The pros and cons of RPPA are also presented on the basis of the evaluations.

A. Test Environment and Test Case

In this paper, the evaluations are carried out on a parallel computing system, a Dawning 4000L cluster composed of 18 nodes. Each node is equipped with an Intel(R) Xeon(TM) CPU (3.00 GHz × 2), 2 GB memory, a Broadcom BCM5721 1000Base-T LAN adapter, a Linux 2.4 kernel, and the OpenMPI [14] 1.2.3 parallel computing environment. The test case is an MPI parallel program solving an ocean circulation problem using the two-dimensional stommel model, developed by Timothy H. Kaiser.

B. Functional Evaluation of Performance Visualization

Users can check performance information in 7 different views, including: function running time of a single process, function call numbers of a single process, communication volume of a single process, function running time comparison of multiple processes, function call number comparison of multiple processes, communication volume comparison of multiple processes, and a space-time diagram of all processes. In this evaluation, the test program is composed of 5 performance data collection processes, each of which creates a stommel process, on 2 computing nodes separately. Data files are downloaded and performance information is visualized when the data collection task finishes. The function running time of one process is shown in Figure 5; the function running time comparison of multiple processes is shown in Figure 6; and the space-time diagram of all processes is shown in Figure 7.

The functional evaluation shows that the performance data collected by RPPA, which is based on dynamic instrumentation, is comprehensive, because performance data for all kinds of functions (i.e., user-defined functions, external library functions, and standard MPI communication functions) can be collected through dynamic instrumentation. This evaluation also shows that RPPA meets the requirements of program performance analysis and brings great convenience to programmers by providing a user-friendly GUI.

C. Perturbation Evaluation

The perturbation introduced into the original programs by an analysis tool is an important metric for measuring the performance of the analysis tool itself [15]. The perturbation is reflected in the running time of programs, and the degree of perturbation can be estimated by comparing the running time of a program with measurement code inserted against the running time of the original program.

What concerns us most is the difference in running time between the original program and the program with code inserted. The running of parallel programs is affected by many factors; in this paper, the perturbation evaluation is influenced mainly by network status and system background workload. Thus a single node, on which no processes run besides Linux system processes, is used to run the evaluation programs. In this way, the network and system load factors are ruled out. The Paradyn performance analysis tool, which also uses the Dyninst tool for dynamic instrumentation, is selected for comparison with RPPA. To make the evaluation data more representative, we carry out 3 kinds of tests using the stommel program: running the program without interference from any performance analysis tool, with interference from RPPA, and with interference from the Paradyn tool. 10 groups of these 3 kinds of tests are conducted. The test results are shown in Figure 8. Test type 1 represents the running time without interference from any analysis tool, type 2 represents the running time with RPPA, and type 3 represents the Paradyn tool.

It can be seen that, under the same conditions (i.e., the same running environment and the same types of performance data to be collected), the running time of the stommel program with interference from RPPA is closer to the running time without any tools than to the running time with Paradyn. From this perspective, RPPA surpasses the Paradyn program performance analysis tool. However, Figure 8 also shows that both RPPA and Paradyn introduce major perturbations into the original programs. The primary cause is that they both build on dynamic instrumentation, which intrudes on the original programs frequently by copying, modifying, transferring, and inserting code during the runtime of the programs. But for the same reason, the performance data collected by analysis tools based on dynamic instrumentation is comprehensive, because performance information about all kinds of functions can be gained. Obviously, major perturbation is the price of comprehensive performance information.

VI. DISCUSSION

In this section, we discuss some new ideas and methods to improve RPPA. As mentioned in the previous section, dynamic instrumentation based performance analysis tools introduce major perturbation into the original programs. Hence we seek a novel approach to collect performance data on the server and computing nodes. A multi-level instrumentation based data collection approach is proposed in our recent research. The implementation of this approach is part of our recent and future work and is planned to be presented in follow-up work. In this section, we first discuss why this approach would reduce the perturbation, and how it works.


Figure 5. Function running time of one process

Figure 6. Function running time comparison of multi-processes

Figure 7. Space-time diagram of all processes


Figure 8. Running time comparison of the stommel program interfered with by different analysis tools (running time in seconds, 0 to 3.5, for 10 groups of tests; test type 1 = no analysis tool, test type 2 = RPPA, test type 3 = Paradyn).

As the analysis in Section V shows, dynamic instrumentation collects comprehensive program performance information at the cost of major perturbation. Therefore we propose the multi-level instrumentation based approach, through which the comprehensiveness of the performance information would not be compromised. The essence of this approach is to collect different kinds of information in distinct, appropriate ways, and to perform dynamic instrumentation only when really necessary. We separate the functions whose performance information concerns programmers most into three categories: user-defined functions, standard MPI functions, and external library functions. Performance information for user-defined functions is collected through a source-level instrumentation [8] based method. As for standard MPI functions, we adopt PMPI [4], the MPI standard profiling interface, to collect their performance information. And for external library functions, to the best of our knowledge, dynamic instrumentation is the best method to collect their performance information. We note that the source-level instrumentation could be replaced by the dynamic one when, for some reason, no files other than the binaries are available. We also note that dynamic instrumentation could be skipped if the external library functions' performance information is not a concern or if only little perturbation can be tolerated by users; there is still plenty of performance information presented to programmers even if the external library functions are ignored.

By using this multi-level approach, the perturbation introduced into the original programs would surely be at a lower level, and the comprehensiveness of the performance information would not be compromised. Firstly, the source-level instrumentation introduces little perturbation into the original programs, because only a little measurement code is inserted into the source files, at the entry and exit points separately, at compilation time; and all the information that the dynamic approach collects can also be collected by this source-level one. Secondly, when using PMPI profiling to collect standard MPI functions' performance information, there is no need to copy, modify, transfer, or insert code during the runtime of the programs, because all MPI_-prefixed functions are substituted by equivalent PMPI_-based ones in which some additional measurement code is added. Thus less perturbation is introduced, and the comprehensiveness of the performance information is likewise not compromised.

The philosophy of this tool resembles SaaS (i.e., Software-as-a-Service) in cloud computing, which has emerged and received significant attention in the media recently. The cloud conceals the complexity of the infrastructure from common users through the internet. Cloud computing partly refers to software delivered as services over the internet. Users could shift computing power away from local servers, across the network cloud, and into large clusters of machines hosted by companies such as Amazon, Google, IBM, Microsoft, Yahoo!, and so on. We plan to transplant our parallel program IDE's client, which includes RPPA's client, into the internet environment by rewriting its GUI with web programming languages. The Eclipse platform would then be replaced by a web browser. And with the support of a daemon running on the server node of a remote cluster, users could develop parallel programs remotely over the internet on a laptop rather than on the parallel cluster. In this way, great convenience would be brought to programmers, while excellent portability of this tool would also be achieved.

VII. CONCLUSION AND FUTURE WORK

A remote parallel program performance analysis tool, RPPA, is designed based on research into the existing program performance analysis tools, most of which are not user-friendly. RPPA meets the urgent need for parallel program performance analysis mentioned in Section 1. The workflow of RPPA is as follows: performance analysis tasks are submitted to the server through the client; then performance data collection processes are started on the computing nodes by the server to collect performance data; at last, the program performance information is visualized in the client. RPPA is user-friendly and easy to use, and the collected performance information is comprehensive, so it is quite helpful for users to analyze performance, locate program bottlenecks, and optimize programs.

However, there is a common shortcoming of dynamic instrumentation based performance analysis tools: major perturbation is introduced into the original programs. So research on a multi-level instrumentation based performance analysis tool which introduces only minor perturbation is part of our future work. Our future work also includes providing support for various kinds of parallel programming models and for heterogeneous high performance computing systems.


ACKNOWLEDGMENT

This work was supported by the National High-Tech Research and Development Plan of China under Grant Nos. 2009AA01A131 and 2009AA01A135, and by the Chinese Ministry of Science and Technology under Grant No. 2009DFA12110.

We would like to thank all the people who gave helpful suggestions on our work, especially the reviewers of PAAP'10 for their comments on the original conference paper [18]. We would also like to thank all the members of the integration and application of heterogeneous resources service project team, Department of Computer Science and Technology, Xi'an Jiaotong University, for their support of our work.

REFERENCES

[1] C.C. Chen and M.X. Chen, "A generic parallel computing model for the distributed environment," Proc. of the 2006 7th International Conference on Parallel and Distributed Computing, Applications and Technologies, pp. 4-7, December 2006.
[2] A. Chan, W. Gropp, and E. Lusk, "User's guide for MPE: extensions for MPI programs," Technical Report MCS-TM-ANL-98, Argonne National Laboratory, 1998.
[3] W. Gropp, "MPICH2: A new start for MPI implementations," Lecture Notes in Computer Science, Springer Berlin/Heidelberg, vol. 2474, pp. 37-42, 2002.
[4] Message Passing Interface Forum, "MPI: A message-passing interface standard," Technical Report, University of Tennessee, 1994.
[5] O. Zaki, E. Lusk, W. Gropp, and D. Swider, "Toward scalable performance visualization with Jumpshot," International Journal of High Performance Computing Applications, vol. 13, pp. 277-288, 1999.
[6] B.P. Miller, M. Callaghan, G. Cargille, J.K. Hollingsworth, R.B. Irvin, K.L. Karavanic, et al., "The Paradyn parallel performance measurement tool," Computer, vol. 28, pp. 37-46, November 1995.
[7] W. Gropp and E. Lusk, "Goals guiding design: PVM and MPI," Proc. of the IEEE International Conference on Cluster Computing, pp. 257-265, 2002.
[8] S. Shende and A.D. Malony, "The TAU parallel performance system," International Journal of High Performance Computing Applications, vol. 20, pp. 287-311, 2006.
[9] M. Schulz, J. Galarowicz, D. Maghrak, W. Hachfeld, D. Montoya, and S. Cranford, "Open|SpeedShop: An open source infrastructure for parallel performance analysis," Scientific Programming, vol. 16, pp. 105-121, April 2008.
[10] L. Adhianto, S. Banerjee, M. Fagan, M. Krentel, G. Marin, J. Mellor-Crummey, and N.R. Tallent, "HPCTOOLKIT: Tools for performance analysis of optimized parallel programs," Concurrency and Computation: Practice & Experience, vol. 22, pp. 1-7, April 2010.
[11] E. Clayberg and D. Rubel, Eclipse Plug-ins, Addison-Wesley Professional, 2008.
[12] B. Buck and J.K. Hollingsworth, "An API for runtime code patching," International Journal of High Performance Computing Applications, vol. 14, pp. 317-329, 2000.
[13] G. Ravipati, A. Bernat, N. Rosenblum, B.P. Miller, and J.K. Hollingsworth, "Towards the deconstruction of Dyninst," Technical Report, University of Maryland, 2007.
[14] E. Gabriel, G. Fagg, G. Bosilca, T. Angskun, J. Dongarra, J. Squyres, et al., "Open MPI: goals, concept, and design of a next generation MPI implementation," Lecture Notes in Computer Science, Springer Berlin/Heidelberg, vol. 3241, pp. 97-104, 2004.
[15] J. Liu, M.M. Shen, and W.M. Zheng, "Research on perturbation imposed on parallel programs by debugger," Chinese Journal of Computers, vol. 25, no. 2, pp. 122-127, 2002.
[16] L. DeRose, T. Hoover, and J. Hollingsworth, "The dynamic probe class library — an infrastructure for developing instrumentation for performance tools," Proc. of the 15th International Parallel and Distributed Processing Symposium, April 2001.
[17] X.H. Shi, Y.L. Zhao, S.Q. Zheng, and D.P. Qian, "VENUS: A general parallel performance visualization environment," Mini-micro Systems, vol. 19, pp. 1-7, 1998.
[18] Y.L. Xu, Z. Zhao, W.G. Wu, and Y.X. Zhao, "Research and design of a remote visualization parallel program performance analysis tool," Proc. of the Third International Symposium on Parallel Architectures, Algorithms and Programming, pp. 214-220, December 2010.

Yunlong Xu was born in Sichuan, China, in 1987. He is currently a Ph.D. candidate at the School of Electronic and Information Engineering, Xi'an Jiaotong University, China. His research interests include parallel computing and cloud computing.

Zeng Zhao received his master's degree from the School of Electronic and Information Engineering, Xi'an Jiaotong University, China, in 2010. He is currently a member of Tencent, Inc., China.

Weiguo Wu, born in 1963, is a professor and Ph.D. supervisor at the School of Electronic and Information Engineering, Xi'an Jiaotong University, China. He received his master's and Ph.D. degrees from Xi'an Jiaotong University. His research interests include high-performance computer architecture, massive storage systems, computer networks, and embedded systems.

Yixin Zhao received his master's degree from the School of Electronic and Information Engineering, Xi'an Jiaotong University, China, in 2009.

© 2011 ACADEMY PUBLISHER