Dpomp: an Infrastructure for Performance Monitoring of Openmp Applications
Total Page:16
File Type:pdf, Size:1020Kb
DPOMP: OpenMP Tool Infrastructure SCICOMP Bologna, March 2004 dPOMP: An Infrastructure for Performance Monitoring of OpenMP Applications Bernd Mohr Forschungszentrum Jülich (FZJ) John von Neumann - Institut für Computing (NIC) Zentralinstitut für Angewandte Mathematik (ZAM) 52425 Jülich, Germany [email protected] dPOMP Team • Luiz DeRose •IBM Research, ACTC • Yorktown Heights, NY, USA • [email protected] Thomas J. Watson Research Center PO Box 218 Yorktown Heights, NY 10598 • Seetharami Seelam •IBM Research, ACTC • Yorktown Heights, NY, USA • [email protected] • Bernd Mohr • Forschungszentrum Jülich, ZAM • [email protected] © 2004 Forschungszentrum Jülich, NIC-ZAM, Bernd Mohr [2] © 2004 Bernd Mohr 1 DPOMP: OpenMP Tool Infrastructure SCICOMP Bologna, March 2004 Outline •What is POMP? •What is DPCL? • IBM compiler and run-time library features that makes dPOMP possible • dPOMP Implementation •Examples of use © 2004 Forschungszentrum Jülich, NIC-ZAM, Bernd Mohr [3] The Motivation: PMPI - The MPI Profiling Interface • PMPI allows selective replacement of MPI routines at link time ⇒ no re-compilation necessary • Uses technique of “wrapper” function libraries • Used by most MPI performance tools • Vampirtrace, MP_profiler, MPICH MPE, TAU, EPILOG, … Call MPI_Send MPI_Send MPI_Send PMPI_Send Call MPI_Bcast MPI_Bcast User program Profiling library MPIMPI Librarylibrary © 2004 Forschungszentrum Jülich, NIC-ZAM, Bernd Mohr [4] © 2004 Bernd Mohr 2 DPOMP: OpenMP Tool Infrastructure SCICOMP Bologna, March 2004 “Standard” OpenMP Monitoring API? • Problem: • OpenMP (unlike MPI) does not define standard monitoring interface • OpenMP is defined mainly by directives/pragmas •Solution: • POMP: OpenMP Monitoring Interface •Joint Development – Forschungszentrum Jülich –University of Oregon • Presented at EWOMP’01, LACSI’01 and SC’01 “The Journal of Supercomputing”, 23, Aug. 2002. © 2004 Forschungszentrum Jülich, NIC-ZAM, Bernd Mohr [5] POMP Instrumentation POMP POMP instrumented OpenMP preprocessor program compiler POMP monitoring library POMP OpenMP OpenMP compiler enabled program with --pomp executable POMP OpenMP enabled compiler RTS OpenMP binary executable compiler instrumentor © 2004 Forschungszentrum Jülich, NIC-ZAM, Bernd Mohr [6] © 2004 Bernd Mohr 3 DPOMP: OpenMP Tool Infrastructure SCICOMP Bologna, March 2004 Prototype POMP Instrumentation Tool • OpenMP Pragma And Region Instrumentor • Source-to-source translator to insert POMP calls around OpenMP constructs and API functions • Implemented in C++ • Supports: • Fortran77 und Fortran90, OpenMP 2.0 • C und C++, OpenMP 1.0 • Additional POMP directives for control and region definition • EPILOG and TAU POMP measurement libraries • Preserves source code information (#line line file) • Does not support: Instrumentation of user functions • http://www.fz-juelich.de/zam/kojak/opari/ © 2004 Forschungszentrum Jülich, NIC-ZAM, Bernd Mohr [7] 44 OpenMP Monitoring APIs: Other Projects • European IST Project INTONE • Development of OpenMP programming environment (includes monitoring interface) • Pallas, CEPBA, Royal Inst. Of Technology, TU Dresden • http://www.cepba.upc.es/intone/ • Intel KAI Software Laboratory (KSL), VGV (Vampir+Guide) • Development of OpenMP monitoring interface inside ASCI • Based on POMP, but further developed in other directions • Current status: • Design of joint proposal Ö POMP2 == POMP (presented at EWOMP’02) • Investigating standardization through OpenMP Forum (??) © 2004 Forschungszentrum Jülich, NIC-ZAM, Bernd Mohr [8] © 2004 Bernd Mohr 4 DPOMP: OpenMP Tool Infrastructure SCICOMP Bologna, March 2004 POMP Functionality • Call of POMP routines at significant points (“events”) during execution of OpenMP programs • Instrumentation-time (static) and run-time (dynamic) event context get passed as parameter to POMP routines • Allows specification of extent of • Instrumentation • Monitoring • Organization of events into groups and assignment to levels allows for flexible yet simple control © 2004 Forschungszentrum Jülich, NIC-ZAM, Bernd Mohr [9] OpenMP Event Model • OpenMP Directives/Pragmas •ENTER/EXIT of OpenMP construct plus BEGIN/END of corresponding structured block • Special case parallel loop: CHUNKBEGIN/END, ITERBEGIN/END or ITEREVENT instead of BEGIN/END • “Single events” for small constructs like atomic or flush • OpenMP API calls •ENTER/EXIT for omp_set_*_lock() functions • “Single events” for all API functions • User functions and regions •ENTER/EXIT or “single events” © 2004 Forschungszentrum Jülich, NIC-ZAM, Bernd Mohr [10] © 2004 Bernd Mohr 5 DPOMP: OpenMP Tool Infrastructure SCICOMP Bologna, March 2004 Example: Standard Instrumentation 1: int main() { 2: int id; ***3: POMP_Init(); 3:4: #pragma omp parallel private(id) ***5: { POMP_handle_t pomp_hd1 = 0; ***6: int32id = omp_get_thread_num();pomp_tid = omp_get_thread_num(); ***7: POMP_Parallel_enter(&pomp_hd1,printf("hello from %d\n", id); pomp_tid, -1, 1, ***8: } "49*type=pregion*file=demo.c*slines=4,4*elines=8,8**"); 4:9: } #pragma omp parallel private(id) 5: { *** int32 pomp_tid = omp_get_thread_num(); *** POMP_Parallel_begin(pomp_hd1, pomp_tid); 6: id = omp_get_thread_num(); 7: printf("hello from %d\n", id); *** POMP_Parallel_end(pomp_hd1, pomp_tid); 8: } *** POMP_Parallel_exit(pomp_hd1, pomp_tid); *** } *** POMP_Finalize(); 9: } © 2004 Forschungszentrum Jülich, NIC-ZAM, Bernd Mohr [11] Example: Optimized Instrumentation 1: int main() { 2: int id; *** POMP_handle_t pomp_hd1 = 0; *** POMP_Init(); *** POMP_Get_handle(&pomp_hd1, *** "49*type=pregion*file=demo.c*slines=4,4*elines=8,8**"); 3: *** { int32 pomp_tid = omp_get_thread_num(); *** POMP_Parallel_enter(&pomp_hd1, pomp_tid, -1, 1, NULL); 4: #pragma omp parallel private(id) 5: { *** int32 pomp_tid = omp_get_thread_num(); *** POMP_Parallel_begin(pomp_hd1, pomp_tid); 6: id = omp_get_thread_num(); 7: printf("hello from %d\n", id); *** POMP_Parallel_end(pomp_hd1, pomp_tid); 8: } *** POMP_Parallel_exit(pomp_hd1, pomp_tid); *** } *** POMP_Finalize(); 9: } © 2004 Forschungszentrum Jülich, NIC-ZAM, Bernd Mohr [12] © 2004 Bernd Mohr 6 DPOMP: OpenMP Tool Infrastructure SCICOMP Bologna, March 2004 dPOMP Motivation • Need for testbed for POMP2 proposal • Could be gets never accepted by OpenMP ARB • Even if accepted, may take too long to be implemented • Need for POMP implementation based on dynamic instrumentation •src-to-src: OPARI •compiler: INTONE •run-time lib: KSL-POMP • Our Approach • A POMP implementation based on dynamic probes • Built on top of IBM's DPCL © 2004 Forschungszentrum Jülich, NIC-ZAM, Bernd Mohr [13] What Is DPCL? • C++ Based Class Library • IBM Poughkeepsie Unix Development Lab • 11 Classes, Plus Additional API's • Dynamic Instrumentation - Software Probes • Based on DynInst and Paradyn • Language/Programming Model Independent • Supports Fortran, Fortran 90, C, C++ • Requires only information from the executable (a.out) •Provides a general purpose infrastructure for: • Serial, shared memory, and message passing • A Platform to Enable Tools Developers To Build Tools With Less Time And Effort © 2004 Forschungszentrum Jülich, NIC-ZAM, Bernd Mohr [14] © 2004 Bernd Mohr 7 DPOMP: OpenMP Tool Infrastructure SCICOMP Bologna, March 2004 DPCL Probes • DPCL allows tools to insert data, functions, and code patches (probes) into a program dynamically • Call site • Call entry • Call exit • Probes can collect and report program information, program state, or modify the program execution • Probes may be placed at specific locations in the program and can be activated: • Whenever execution reaches that location • By expiration of a timer •Exactly once © 2004 Forschungszentrum Jülich, NIC-ZAM, Bernd Mohr [15] The IBM Compiler and Run-time Library Source code Compiler generated master thread all threads main() { main() { A@0L1 { POMP_Parallel_begin POMP_Function_enter POMP_Loop_enter A() A() xlf_DoPar POMP_Function_exit POMP_Loop_exit } } run-time library POMP_Parallel_end } A() { A() { OMP parallel POMP_Parallel_enter A@0L1@OL2 { OMP loop xlf_Par POMP_Loop_chunk_begin do I=start,end POMP_Parallel_exit loop body OMP end parallel POMP_Loop_chunk_end enddo } } } © 2004 Forschungszentrum Jülich, NIC-ZAM, Bernd Mohr [16] © 2004 Bernd Mohr 8 DPOMP: OpenMP Tool Infrastructure SCICOMP Bologna, March 2004 Limitations • 63 out of 68 POMP events supported ! • Limitations due to compiler issues • POMP_Loop_iter_(begin, or end, or event) • POMP_Implicit_barrier_(end, or exit) • OMP Parallel Loop NOT = OMP Parallel / OMP Loop •Compile Time Context (CTC) – hasFirstPrivate, hasLastPrivate, hasNowait, hasCopyin, schedule, hasOrdered, and hasCopypriv not available • Limitations due to DPCL issues • Loop iteration values (init, final, incr, chunk) • Limitations due to lack of time … • C++ methods instrumentation support © 2004 Forschungszentrum Jülich, NIC-ZAM, Bernd Mohr [17] Changes and Extensions Due to Open Issues • Fully defined attribute and values for CTC string • Event handler is always passed by reference • Finer instrumentation control • User defined functions – Function calls in “main” program (outside parallel regions) + all MPI calls are instrumented by default – User can provide a file with functions to instrument • POMP Events – Only events supplied in the monitoring libraries are instrumented © 2004 Forschungszentrum Jülich, NIC-ZAM, Bernd Mohr [18] © 2004 Bernd Mohr 9 DPOMP: OpenMP Tool Infrastructure SCICOMP Bologna, March 2004 dPOMP Tool • Basic usage % dpomp <pomp-lib> <exe> • <pomp-lib> POMP compliant monitoring library • <exe> OpenMP application (or mixed-mode)