DPOMP: OpenMP Tool Infrastructure SCICOMP Bologna, March 2004

dPOMP: An Infrastructure for Performance Monitoring of OpenMP Applications

Bernd Mohr

Forschungszentrum Jülich (FZJ) John von Neumann - Institut für Computing (NIC) Zentralinstitut für Angewandte Mathematik (ZAM) 52425 Jülich, Germany [email protected]

dPOMP Team

• Luiz DeRose •IBM Research, ACTC • Yorktown Heights, NY, USA • laderose@us..com

Thomas J. Research Center PO Box 218 Yorktown Heights, NY 10598 • Seetharami Seelam •IBM Research, ACTC • Yorktown Heights, NY, USA • [email protected]

• Bernd Mohr • Forschungszentrum Jülich, ZAM • [email protected]

© 2004 Forschungszentrum Jülich, NIC-ZAM, Bernd Mohr [2]

© 2004 Bernd Mohr 1 DPOMP: OpenMP Tool Infrastructure SCICOMP Bologna, March 2004

Outline

•What is POMP?

•What is DPCL?

• IBM compiler and run-time features that makes dPOMP possible

• dPOMP Implementation

•Examples of use

© 2004 Forschungszentrum Jülich, NIC-ZAM, Bernd Mohr [3]

The Motivation: PMPI - The MPI Profiling Interface

• PMPI allows selective replacement of MPI routines at link time ⇒ no re-compilation necessary • Uses technique of “wrapper” function libraries • Used by most MPI performance tools • Vampirtrace, MP_profiler, MPICH MPE, TAU, EPILOG, …

Call MPI_Send MPI_Send MPI_Send

PMPI_Send

Call MPI_Bcast MPI_Bcast

User program Profiling library MPIMPI Librarylibrary

© 2004 Forschungszentrum Jülich, NIC-ZAM, Bernd Mohr [4]

© 2004 Bernd Mohr 2 DPOMP: OpenMP Tool Infrastructure SCICOMP Bologna, March 2004

“Standard” OpenMP Monitoring API?

• Problem: • OpenMP (unlike MPI) does not define standard monitoring interface • OpenMP is defined mainly by directives/pragmas

•Solution: • POMP: OpenMP Monitoring Interface •Joint Development – Forschungszentrum Jülich –University of Oregon • Presented at EWOMP’01, LACSI’01 and SC’01

“The Journal of Supercomputing”, 23, Aug. 2002.

© 2004 Forschungszentrum Jülich, NIC-ZAM, Bernd Mohr [5]

POMP Instrumentation

POMP POMP instrumented OpenMP preprocessor program compiler

POMP monitoring library POMP OpenMP OpenMP compiler enabled program with --pomp executable POMP OpenMP enabled compiler RTS

OpenMP binary executable compiler instrumentor

© 2004 Forschungszentrum Jülich, NIC-ZAM, Bernd Mohr [6]

© 2004 Bernd Mohr 3 DPOMP: OpenMP Tool Infrastructure SCICOMP Bologna, March 2004

Prototype POMP Instrumentation Tool

• OpenMP Pragma And Region Instrumentor • Source-to-source translator to insert POMP calls around OpenMP constructs and API functions • Implemented in ++

• Supports: • Fortran77 und Fortran90, OpenMP 2.0 • C und C++, OpenMP 1.0 • Additional POMP directives for control and region definition • EPILOG and TAU POMP measurement libraries • Preserves information (#line line file) • Does not support: Instrumentation of user functions

• http://www.fz-juelich.de/zam/kojak/opari/

© 2004 Forschungszentrum Jülich, NIC-ZAM, Bernd Mohr [7] 44

OpenMP Monitoring : Other Projects

• European IST Project INTONE • Development of OpenMP programming environment (includes monitoring interface) • Pallas, CEPBA, Royal Inst. Of Technology, TU Dresden • http://www.cepba.upc.es/intone/ • KAI Software Laboratory (KSL), VGV (Vampir+Guide) • Development of OpenMP monitoring interface inside ASCI • Based on POMP, but further developed in other directions

• Current status: • Design of joint proposal Ö POMP2 == POMP (presented at EWOMP’02) • Investigating standardization through OpenMP Forum (??)

© 2004 Forschungszentrum Jülich, NIC-ZAM, Bernd Mohr [8]

© 2004 Bernd Mohr 4 DPOMP: OpenMP Tool Infrastructure SCICOMP Bologna, March 2004

POMP Functionality

• Call of POMP routines at significant points (“events”) during execution of OpenMP programs

• Instrumentation-time (static) and run-time (dynamic) event context get passed as parameter to POMP routines

• Allows specification of of • Instrumentation • Monitoring

• Organization of events into groups and assignment to levels allows for flexible yet simple control

© 2004 Forschungszentrum Jülich, NIC-ZAM, Bernd Mohr [9]

OpenMP Event Model

• OpenMP Directives/Pragmas •ENTER/EXIT of OpenMP construct plus BEGIN/END of corresponding structured block • Special case parallel loop: CHUNKBEGIN/END, ITERBEGIN/END or ITEREVENT instead of BEGIN/END • “Single events” for small constructs like atomic or flush

• OpenMP API calls •ENTER/EXIT for omp_set_*_lock() functions • “Single events” for all API functions

• User functions and regions •ENTER/EXIT or “single events”

© 2004 Forschungszentrum Jülich, NIC-ZAM, Bernd Mohr [10]

© 2004 Bernd Mohr 5 DPOMP: OpenMP Tool Infrastructure SCICOMP Bologna, March 2004

Example: Standard Instrumentation

1: int main() { 2: int id; ***3: POMP_Init(); 3:4: #pragma omp parallel private(id) ***5: { POMP_handle_t pomp_hd1 = 0; ***6: int32id = omp_get_thread_num();pomp_tid = omp_get_thread_num(); ***7: POMP_Parallel_enter(&pomp_hd1,printf("hello from %d\n", id); pomp_tid, -1, 1, ***8: } "49*type=pregion*file=demo.c*slines=4,4*elines=8,8**"); 4:9: } #pragma omp parallel private(id) 5: { *** int32 pomp_tid = omp_get_thread_num(); *** POMP_Parallel_begin(pomp_hd1, pomp_tid); 6: id = omp_get_thread_num(); 7: printf("hello from %d\n", id); *** POMP_Parallel_end(pomp_hd1, pomp_tid); 8: } *** POMP_Parallel_exit(pomp_hd1, pomp_tid); *** } *** POMP_Finalize(); 9: }

© 2004 Forschungszentrum Jülich, NIC-ZAM, Bernd Mohr [11]

Example: Optimized Instrumentation

1: int main() { 2: int id; *** POMP_handle_t pomp_hd1 = 0; *** POMP_Init(); *** POMP_Get_handle(&pomp_hd1, *** "49*type=pregion*file=demo.c*slines=4,4*elines=8,8**"); 3: *** { int32 pomp_tid = omp_get_thread_num(); *** POMP_Parallel_enter(&pomp_hd1, pomp_tid, -1, 1, NULL); 4: #pragma omp parallel private(id) 5: { *** int32 pomp_tid = omp_get_thread_num(); *** POMP_Parallel_begin(pomp_hd1, pomp_tid); 6: id = omp_get_thread_num(); 7: printf("hello from %d\n", id); *** POMP_Parallel_end(pomp_hd1, pomp_tid); 8: } *** POMP_Parallel_exit(pomp_hd1, pomp_tid); *** } *** POMP_Finalize(); 9: }

© 2004 Forschungszentrum Jülich, NIC-ZAM, Bernd Mohr [12]

© 2004 Bernd Mohr 6 DPOMP: OpenMP Tool Infrastructure SCICOMP Bologna, March 2004

dPOMP Motivation

• Need for testbed for POMP2 proposal • Could be gets never accepted by OpenMP ARB • Even if accepted, may take too long to be implemented • Need for POMP implementation based on dynamic instrumentation •src-to-src: OPARI •compiler: INTONE •run-time lib: KSL-POMP

• Our Approach • A POMP implementation based on dynamic probes • Built on of IBM's DPCL

© 2004 Forschungszentrum Jülich, NIC-ZAM, Bernd Mohr [13]

What Is DPCL?

• C++ Based Class Library • IBM Poughkeepsie Development Lab • 11 Classes, Plus Additional API's

• Dynamic Instrumentation - Software Probes • Based on DynInst and Paradyn

• Language/Programming Model Independent • Supports , Fortran 90, C, C++ • Requires only information from the executable (a.out)

•Provides a general purpose infrastructure for: • Serial, shared memory, and message passing

• A Platform to Enable Tools Developers To Build Tools With Less Time And Effort

© 2004 Forschungszentrum Jülich, NIC-ZAM, Bernd Mohr [14]

© 2004 Bernd Mohr 7 DPOMP: OpenMP Tool Infrastructure SCICOMP Bologna, March 2004

DPCL Probes

• DPCL allows tools to insert data, functions, and code patches (probes) into a program dynamically • Call site • Call entry • Call exit

• Probes can collect and report program information, program state, or modify the program execution

• Probes may be placed at specific locations in the program and can be activated: • Whenever execution reaches that location • By expiration of a timer •Exactly once

© 2004 Forschungszentrum Jülich, NIC-ZAM, Bernd Mohr [15]

The IBM Compiler and Run-time Library

Source code Compiler generated master thread all threads main() { main() { A@0L1 { POMP_Parallel_begin POMP_Function_enter POMP_Loop_enter A() A() xlf_DoPar POMP_Function_exit POMP_Loop_exit } } run-time library POMP_Parallel_end } A() { A() {

OMP parallel POMP_Parallel_enter A@0L1@OL2 { OMP loop xlf_Par POMP_Loop_chunk_begin do I=start,end POMP_Parallel_exit loop body OMP end parallel POMP_Loop_chunk_end enddo }

} }

© 2004 Forschungszentrum Jülich, NIC-ZAM, Bernd Mohr [16]

© 2004 Bernd Mohr 8 DPOMP: OpenMP Tool Infrastructure SCICOMP Bologna, March 2004

Limitations

• 63 out of 68 POMP events supported !

• Limitations due to compiler issues • POMP_Loop_iter_(begin, or end, or event) • POMP_Implicit_barrier_(end, or exit) • OMP Parallel Loop NOT = OMP Parallel / OMP Loop •Compile Time Context (CTC) – hasFirstPrivate, hasLastPrivate, hasNowait, hasCopyin, schedule, hasOrdered, and hasCopypriv not available

• Limitations due to DPCL issues • Loop iteration values (, final, incr, chunk)

• Limitations due to lack of time … • C++ methods instrumentation support

© 2004 Forschungszentrum Jülich, NIC-ZAM, Bernd Mohr [17]

Changes and Extensions Due to Open Issues

• Fully defined attribute and values for CTC string

• Event handler is always passed by reference

• Finer instrumentation control • User defined functions – Function calls in “main” program (outside parallel regions) + all MPI calls are instrumented by default – User can provide a file with functions to instrument

• POMP Events – Only events supplied in the monitoring libraries are instrumented

© 2004 Forschungszentrum Jülich, NIC-ZAM, Bernd Mohr [18]

© 2004 Bernd Mohr 9 DPOMP: OpenMP Tool Infrastructure SCICOMP Bologna, March 2004

dPOMP Tool

• Basic usage % dpomp

POMP compliant monitoring library • OpenMP application (or mixed-mode)

•Performs binary instrumentation • Amount of instrumentation can be controlled by – By the tool builder: Set of POMP calls available in the monitoring library – By the user: Environment variables

• Executes instrumented application

© 2004 Forschungszentrum Jülich, NIC-ZAM, Bernd Mohr [19]

dPOMP Tool

• Selective instrumentation of user functions % dpomp –l Edit % dpomp –f

• Predefined POMP libraries (probes) • pomprof_probe (to generate *.viz profiles) • elg_probe (to generate EPILOG trace files)

• Trial package available from IBM Alphaworks for 2004 • dPOMP + pomprof_probe • http://www.alphaworks.ibm.com/tech/dpomp/

© 2004 Forschungszentrum Jülich, NIC-ZAM, Bernd Mohr [20]

© 2004 Bernd Mohr 10 DPOMP: OpenMP Tool Infrastructure SCICOMP Bologna, March 2004

POMP Profiler Library (POMPROF)

• POMP compliant library from IBM ACTC

• Generates a detailed profile describing overheads and time spent by each thread in three key regions of the parallel application: • Parallel regions • OpenMP loops inside a parallel region • User defined functions

• Profile data • Presented in the form of an XML file •Visualized with PeekPerf

© 2004 Forschungszentrum Jülich, NIC-ZAM, Bernd Mohr [21]

Example: PeekPerf Visualization of POMPROF Output

© 2004 Forschungszentrum Jülich, NIC-ZAM, Bernd Mohr [22]

© 2004 Bernd Mohr 11 DPOMP: OpenMP Tool Infrastructure SCICOMP Bologna, March 2004

KOJAK POMP Tracing Library: elg_probe

• POMP monitoring library which generates EPILOG event traces • Processed by KOJAK’s automatic event tracer analyzer EXPERT

© 2004 Forschungszentrum Jülich, NIC-ZAM, Bernd Mohr [23]

The KOJAK Project

• Kit for Objective Judgement and Automatic Knowledge-based detection of bottlenecks

• Lomg-term goals • Design and Implementation of a Portable, Generic, and Automatic Performance Analysis Environment

• Current focus •Event Tracing • Parallel with SMP nodes • MPI, OpenMP, Hybrid (OpenMP + MPI) programming model • Development of research prototypes

© 2004 Forschungszentrum Jülich, NIC-ZAM, Bernd Mohr [24]

© 2004 Bernd Mohr 12 DPOMP: OpenMP Tool Infrastructure SCICOMP Bologna, March 2004

Overall KOJAK Architecture

Semi-automatic user OPARI / modified Instrumentation program TAU instr. program POMP+PMPI Compiler / libraries executable Linker EPILOG trace library

execute EXPERT analysis EXPERT Analyzer result Presenter EPILOG EARL event trace Automatic Analysis

trace VTF3 VAMPIR converter event trace

© 2004 Forschungszentrum Jülich, NIC-ZAM, Bernd Mohr [25] Manual Analysis

KOJAK Architecture on IBM AIX

executable

execute with dpomp EXPERT analysis EXPERT Analyzer result Presenter EPILOG EARL Event trace Automatic Analysis

trace VTF3 VAMPIR converter event trace

© 2004 Forschungszentrum Jülich, NIC-ZAM, Bernd Mohr [26] Manual Analysis

© 2004 Bernd Mohr 13 DPOMP: OpenMP Tool Infrastructure SCICOMP Bologna, March 2004

Performance Property Region Tree Location What problem? Where in source code? How is the In what context? problem distributed across the machine?

Color Coding How severe is the problem?

© 2004 Forschungszentrum Jülich, NIC-ZAM, Bernd Mohr [27]

EPILOG Trace Converted to VTF3

• EPILOG-to-VTF3 • Maps OpenMP constructs into VAMPIR symbols and activities

© 2004 Forschungszentrum Jülich, NIC-ZAM, Bernd Mohr [28]

© 2004 Bernd Mohr 14 DPOMP: OpenMP Tool Infrastructure SCICOMP Bologna, March 2004

Conclusion

•Veryproductive and effective collaboration with IBM ACTC

• Innovative tool infrastructure for OpenMP

• Available at IBM alphaworks

Future Work

• OPARI • Support for POMP2 • dPOMP • More extensive evaluations • Finish missing features • Remove limitations?

© 2004 Forschungszentrum Jülich, NIC-ZAM, Bernd Mohr [29]

© 2004 Bernd Mohr 15