BlueGene External Performance Instrumentation Facility
Roy Musselman
Other Contributors: Dave Hermsmeier, Kurt Pinnow, Brent Swartz
Blue Gene Software Development, IBM Rochester, Minnesota
ScicomP 12 IBM System Scientific Computing User Group Boulder, CO, July 18-21, 2006
© 2006 IBM Corporation IBM Systems & Technology Group Rochester
Agenda
Performance Monitor Tools Overview
– PAPI
– HPC Toolkit (LIBHPM)
– External Performance Instrumentation Facility (EPIF)
EPIF
– Interface to the Hardware Performance Counters
– Features
– Operation
– Commands
Demo
– Features and application example
Performance Monitoring Tools Overview
PAPI – Performance Application Programming Interface
– Defines a standard interface for accessing performance counter hardware
– Instrumentation and data collection is managed from within the application.
– Available at: http://icl.cs.utk.edu/papi/index.html
High Performance Computing (HPC) Toolkit
– Developed by ACTC, IBM Research: http://www.research.ibm.com/actc
– LIBHPM: detailed hardware performance monitoring
– Instrumentation and data collection is managed from within the application.
– Packaged with other complementary tools to profile and visualize results
BlueGene/L External Performance Instrumentation Facility (EPIF)
– No change to the application required, thus no direct correlation to program execution.
– Negligible impact on performance: uses the control system network to extract counter data asynchronously with the execution of the applications
– Expanded with new function and now generally available in BlueGene/L V1R3.
All three tools utilize the hardware performance counters implemented on BlueGene/L.
BlueGene’s Hardware Performance Counter Mechanism
All three performance monitoring tools utilize this mechanism.
Special logic within the Compute node taps into the various components.
– Processors & FPUs, L2 and L3 hit/miss, torus and tree network events
– 328 total unique events
Up to 52 of the 328 events can be counted concurrently using 32-bit counter registers.
At periodic intervals, the 32-bit counters are read and accumulated into 64-bit locations in SRAM.
Current Limitations
– The 32-bit counters may overflow, thus necessitating the software accumulation.
– Contention for limited FPU event counter resources
– Only one type of Load/Store instruction count per processor
– Only one type of FPU instruction count per processor
– In V1R3, the derived FPU counters will sample the FPU instructions in a round-robin fashion across the processors
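The accumulation of 32-bit counters into 64-bit totals can be illustrated with a minimal Python sketch. It assumes the hardware registers are free-running and wrap modulo 2^32, and that at most one wrap occurs between two reads; the slides do not state either detail, and the function names are hypothetical.

```python
COUNTER_BITS = 32
COUNTER_MODULUS = 1 << COUNTER_BITS
MASK_64 = (1 << 64) - 1


def accumulate(total_64, previous_raw, current_raw):
    """Fold one 32-bit reading into a running 64-bit total.

    The modulo recovers the true increment even if the register
    wrapped once between the two readings (assumed behavior).
    """
    delta = (current_raw - previous_raw) % COUNTER_MODULUS
    return (total_64 + delta) & MASK_64


def seconds_until_wrap(events_per_second):
    """Longest read interval that still sees at most one wrap."""
    return COUNTER_MODULUS / events_per_second


# A register that wraps between two reads: 0xFFFFFFF0 -> 0x00000010.
print(accumulate(0, 0xFFFFFFF0, 0x00000010))  # 32
```

The second helper shows why the read interval matters: the faster an event fires, the sooner a 32-bit register wraps, so the periodic read has to happen at least that often.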
Compute Node Counters Monitor Hardware Events
[Figure: block diagram of the Blue Gene/L compute node. Components shown: two 440 CPUs (one labeled I/O proc), each with a “Double FPU” and 32k/32k L1; L2; shared L3 directory for EDRAM (includes ECC); 4MB EDRAM L3 cache or shared memory; shared SRAM; DDR control with ECC (144 bit wide DDR, 256MB); Gbit Ethernet; JTAG access; torus (6 out and 6 in, each at 1.4 Gbit/s link); tree (3 out and 3 in, each at 2.8 Gbit/s link); global interrupt (4 global barriers or interrupts).]
EPIF’s Interface to the Hardware Performance Counters
Prior to program invocation, the hardware counter logic on the Compute Node chip is programmed to capture the occurrences of a subset of hardware events:
– Ex. L3 hits/misses, FPU operations, Torus packet activity
– The user can choose one of 22 possible predefined subsets (a.k.a. counter definition ids) consisting of up to 52 of the 328 possible events
– Counter definition ids 0:16 are identical to those used by LIBHPM
The counter data is periodically read from the SRAM and retrieved by the service node via the control system network (JTAG)
EPIF manages the collection, filtering, and storage of the counter data.
File system storage required: 340KB per sample per midplane for each sample that is preserved.
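As a rough planning aid, the storage figure above can be turned into a small estimate. This is a sketch only: it assumes every sample is preserved and that the number of samples is simply the run length divided by the sample interval; the function and parameter names are hypothetical.

```python
KB_PER_SAMPLE_PER_MIDPLANE = 340  # figure quoted on the slide


def storage_kb(midplanes, run_seconds, sample_interval_seconds):
    """Estimate file system usage if every sample is preserved."""
    samples = run_seconds // sample_interval_seconds
    return KB_PER_SAMPLE_PER_MIDPLANE * midplanes * samples


# Hypothetical run: 2 midplanes, one hour, one sample per minute.
print(storage_kb(2, 3600, 60))  # 40800
```

For that hypothetical run the estimate is 40800 KB, or roughly 40 MB, which suggests that long runs on many midplanes with short sample intervals deserve a storage check beforehand.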
EPIF Key Features
Easy to use
Provides a non-intrusive mechanism for monitoring system and job performance characteristics.
– No application changes are required other than relinking the application with the -lbgl_perfctr.rts library
– The interval timer is used to trigger the counter sample and accumulate to SRAM.
Minimal performance impact to the running applications
– Sampling of counter data is done with negligible performance impact.
– Collection of data is done via the control system network (JTAG)
EPIF provides the following:
– A GUI to browse results
– Storage of results to the external file system
– Option to store results to the MMCS SQL database
– Ability to filter and organize the counter and attribute data.
– Supports CSV export formats for easy import into spreadsheets
– Derived FPU counters for aggregate estimates of FLOP rates
Perfmon Operation
Jobs are initiated as usual with either a default or specified counter definition ID
– Counter definition ID: a predefined subset of counters consisting of up to 52 of the 328 possible hardware events that can be monitored
– Specified by the BGL_PERFMON environment variable
One or more instances of perfmon can be started on the service node, each with its own set of parameters including:
– Sample interval
– Attributes to filter the set of jobs to be monitored (e.g. user name, block id)
– Sample type: detailed or summary
– Destination of the collected data: file system directory and optionally the MMCS SQL database
Perfmon will monitor all running jobs except for the following:
– Those jobs that do not match the filter criteria used to initiate the perfmon application.
– Those jobs that have not been linked with the performance counter library
– Those jobs that have been instrumented with other tools using the performance counters (i.e. PAPI or LIBHPM)
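The three exclusion rules above can be read as a simple predicate. The sketch below is illustrative only and is not perfmon's implementation: the Job fields are hypothetical, and treating block id patterns such as R0* as shell-style wildcards is an assumption.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass
class Job:
    # Hypothetical job description; field names are not from EPIF.
    username: str
    block_id: str
    linked_with_perfctr: bool
    uses_papi_or_libhpm: bool


def is_monitored(job, usernames=None, block_patterns=None):
    """Apply the three exclusion rules; None means no filter given."""
    if usernames is not None and job.username not in usernames:
        return False
    if block_patterns is not None and not any(
        fnmatchcase(job.block_id, pattern) for pattern in block_patterns
    ):
        return False  # does not match the filter criteria
    if not job.linked_with_perfctr:
        return False  # not linked with the performance counter library
    return not job.uses_papi_or_libhpm  # counters owned by another tool


job = Job("userid1", "R01", linked_with_perfctr=True, uses_papi_or_libhpm=False)
print(is_monitored(job, usernames={"userid1"}, block_patterns=["R0*"]))  # True
```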
EPIF Commands
perfmon – Starts an instance of the performance monitor tool. Options control the collection of hardware counter data.
– --username='(userid1,userid2,userid3)'
– --block_id='(R0*,R0R1R2R3)'
– --sample_type=d
dsp_perfmon – Provides a simple GUI to view performance data and do some high-level distillation of the collected data
– Works on data actively being collected and data that was previously collected
ext_perfmon_data – Extracts performance data to CSV files for analysis by other tools. Many options available to filter the extracted data.
EPIF Commands (cont.)
imp_perfmon_data – Imports collected performance data to the MMCS SQL performance database
exp_perfmon_data – Exports performance data from the MMCS SQL performance database to the external file system, optionally deleting the data from the SQL database
end_perfmon – Ends an instance of perfmon prior to the ending criteria specified on the perfmon command
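Once ext_perfmon_data has produced CSV files, ordinary scripting is enough for a first look at the data. The sketch below totals counter values per counter name. The column names (node, counter, value), the node labels, and the counter names are all hypothetical, since the slides do not show the extracted file layout; adjust them to match the real header row.

```python
import csv
import io
from collections import defaultdict

# Stand-in for an extracted file; layout and names are assumptions.
SAMPLE_CSV = """node,counter,value
node0,torus_packets_sent,30
node1,torus_packets_sent,44
node0,l3_misses,7
"""


def totals_by_counter(csv_text):
    """Sum the value column for each distinct counter name."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["counter"]] += int(row["value"])
    return dict(totals)


print(totals_by_counter(SAMPLE_CSV))
```

For a real file, the same function could be fed the result of reading the file as text.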
EPIF complements other performance monitoring tools
EPIF is not intended to be an all-inclusive, comprehensive set of performance tools.
EPIF deals exclusively with the performance data obtainable from the hardware performance counters.
EPIF does not replace PAPI or LIBHPM, which can be used to zero in on specific code segments.
EPIF can be used by system administrators for real-time system and job activity monitoring (detecting hung jobs, summarizing job statistics).
EPIF can be used by programmers with access to the service node for an aggregate view of application performance.
Other data analysis and visualization tools can utilize the detailed data obtained from EPIF.
Demo of dsp_perfmon
python dsp_perfmon.py
Navigate to find and select the .mon file
dsp_perfmon demo: List of jobs monitored by this perfmon instance
dsp_perfmon demo: List of filters and runtime attributes
dsp_perfmon demo: List of job and block attributes
dsp_perfmon demo: Explore Via Samples/Nodes/Counters
dsp_perfmon demo: Extract Perfmon Data (right click on Sample 4)
dsp_perfmon demo: Extracted .csv file
dsp_perfmon demo: Extracted histogram data
Application Example
An application was exhibiting very poor performance when executing multiple concurrent point-to-point MPI operations.
Suspected network congestion.
Needed a method to detect and visualize the torus network activity within the system.
With no source code changes, the EPIF was used to capture the torus network packet transmission counters and export the specific data to file.
A custom visualization tool was then used to colorize the various ranges of counter values and map them to the node locations to reveal the congested areas (hot spots).

X=0
Z\Y  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
  0 17 19 21 30 30 21 19 17 17 19 21 30 30 21 19 17
  1 19 22 25 30 30 25 22 19 19 22 25 30 30 25 22 19
  2 21 25 30 35 35 30 25 21 21 25 31 35 35 31 25 21
  3 30 30 35 44 44 35 31 30 30 31 35 44 44 35 31 30
  4 30 30 35 44 44 35 30 30 30 31 35 44 44 35 31 30
  5 21 25 31 35 35 30 25 21 21 25 30 35 35 31 25 21
  6 19 22 25 30 30 25 22 19 19 22 25 30 30 25 22 19
  7 17 19 21 30 30 21 19 17 17 19 21 30 30 21 19 17
  8 17 19 21 30 30 21 19 17 17 19 21 30 30 21 19 17
  9 19 22 25 30 30 25 22 19 19 22 25 30 30 25 22 19
 10 21 25 31 35 35 30 25 21 21 25 30 35 35 30 25 21
 11 30 31 35 44 44 35 30 30 30 30 35 44 44 35 30 30
 12 30 31 35 44 44 35 30 30 30 31 35 44 44 35 30 30
 13 21 25 31 35 35 31 25 21 21 25 31 35 35 30 25 21
 14 19 22 25 30 30 25 22 19 19 22 25 30 30 25 22 19
 15 17 19 21 30 30 21 19 17 17 19 21 30 30 21 19 17
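The idea of binning counter values into ranges and mapping them to node positions can be sketched in a few lines of Python. This is not the custom visualization tool mentioned above. The grid is the Z=0..3, Y=0..7 corner of the X=0 plane shown on this slide, while the bin boundaries, the symbols, and the hot-spot threshold are arbitrary choices made for illustration.

```python
# Corner of the X=0 plane from the slide: rows are Z, columns are Y.
GRID = [
    [17, 19, 21, 30, 30, 21, 19, 17],
    [19, 22, 25, 30, 30, 25, 22, 19],
    [21, 25, 30, 35, 35, 30, 25, 21],
    [30, 30, 35, 44, 44, 35, 31, 30],
]

# (exclusive upper bound, symbol) pairs; boundaries are assumptions.
BINS = [(25, "."), (35, "o"), (44, "O")]


def symbol(value):
    """Map a counter value to the symbol of the first bin it fits."""
    for upper, char in BINS:
        if value < upper:
            return char
    return "#"  # anything at or above the last bound


def render(grid):
    """Return one text line per Z row, one symbol per Y position."""
    return ["".join(symbol(v) for v in row) for row in grid]


def hot_spots(grid, threshold):
    """List (z, y) positions whose value is at or above threshold."""
    return [
        (z, y)
        for z, row in enumerate(grid)
        for y, value in enumerate(row)
        if value >= threshold
    ]


print("\n".join(render(GRID)))
print(hot_spots(GRID, 44))
```

A text rendering like this is a stand-in for colorizing; the same binning function could drive a color map in a plotting library.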
Future Development
We believe that this style of external instrumentation has great potential in high-performance computing.
The definition of future functionality is currently being considered.
We solicit feedback and suggestions to
– Evaluate and experiment with the current facility
– Influence future design
– Help us to provide functionality that is most important to the high performance computing community.
We encourage other analysis and visualization tool developers to consider the possibilities of utilizing the data provided by EPIF for enhancements to their offerings.
Resources: Support Website http://www-03.ibm.com/servers/eserver/support/bluegene/index.html
Resources: Redbooks
Detailed documentation of the External Performance Instrumentation Facility is available in the Redbook entitled: “Blue Gene/L: Performance Analysis Tools”