IBM ACTC: Helping to Make Supercomputing Easier

Luiz DeRose, Advanced Computing Technology Center, IBM Research

HPC Symposium, University of Oklahoma, September 12, 2002
laderose@us.ibm.com
IBM Thomas J. Watson Research Center, P.O. Box 218, Yorktown Heights, NY 10598

Outline

„ Who we are

¾ Mission statement

¾ Functional overview and organization

¾ History

„ What we do

¾ Industry solutions and activities

™ Education and training

™ STC community building

™ Application consulting

™ Performance tools research


ACTC

„ Mission
¾ To close the gap between HPC users and IBM
¾ Conduct research on applications for IBM servers within the scientific and technical community
™ Technical directions
™ Emerging technologies
„ ACTC Research
¾ Software tools and libraries
¾ HPC applications
¾ Research collaborations
¾ Education and training
„ Focus
¾ AIX and Linux platforms


ACTC Functional Overview

[Diagram: ACTC at the center, combining outside technology (Linux, computational grids, DPCL, OpenMP, MPI) and IBM technology (Power4, RS/6000 SP, GPFS, PSSP, LAPI) to meet customer needs (reduced cost of ownership, optimized applications, education and user training) through its solutions: tools, libraries, application consulting, collaboration, education, and user groups.]


ACTC History

„ Created in September, 1998

¾ Emphasis on helping new customers port and optimize codes on IBM systems

¾ Required establishing relationships with scientists at the research level

„ Expanded operations via alignment with Web/Server Division:

¾ EMEA extended in April, 1999

¾ AP extended (TRL) in September, 2000

¾ Partnership with IBM Centers of Competency (Server Group)


ACTC Education

„ 1st Power4 Workshop
¾ Jan. 24-25, 2001, at Watson Research
¾ Full Architecture Disclosure from Austin
„ 1st Linux Clusters Optimization Workshop
¾ May 25-27, 2001, at MHPCC
¾ Application Tuning for IA32 and Myrinet
„ 2001 European ACTC Workshop on IBM SP
¾ Feb. 19-20, 2001, in Trieste (Cineca)
¾ http://www.cineca.it/actc-workshop/index.html
„ 2001 Japan HPC Forum and ACTC Workshop
¾ July 17-19, 2001, in Tokyo and Osaka
„ 1st Linux Clusters Institute Workshop
¾ October 1-5, 2001, at UIUC/NCSA (Urbana)
„ IBM Research Internships at Watson


HPC Community: IBM ScicomP

„ Established in 1999 via ACTC Workshop
„ IBM External Customer Organization
¾ Equivalent to “CUG” (Cray User Group)
„ 2000 – US and European groups merge
„ 2001
¾ 1st European Meeting (ScicomP 3)
™ May 8-11, Barcelona, Spain (CEPBA)
¾ US Meeting: October 9-12, Knoxville, TN
„ 2002
¾ ScicomP 5, Manchester, England (Daresbury Lab)
¾ ScicomP 6, Berkeley, CA (NERSC)
™ Joint meeting with SP/XXL
„ 2003
¾ ScicomP 7 in Goettingen, Germany, March 3-7
¾ ScicomP 8 in Minneapolis (MSI), August 5-9
„ http://www.spscicomp.org


HPC Community: LCI

„ Linux Clusters Institute (LCI) ¾ http://www.linuxclustersinstitute.org

¾ Established April, 2001 ¾ ACTC Partnership with NCSA and HPCERC

¾ Mission: education and training for deployment of Linux clusters in the STC community

¾ Primary activity: intensive, hands-on workshops

¾ Upcoming Workshops

™ Albuquerque, NM (HPCERC) Sep 30-Oct 04, 2002

™ University of Kentucky, Lexington, Jan 13-17, 2003
„ The third LCI International Conference on Linux Clusters
¾ October 23-25, 2002; St. Petersburg, FL
¾ http://www.linuxclustersinstitute.org/Linux-HPC-Revolution/

Porting Example - NCSC (Cray)

„ ACTC Workshop at NCSC

¾ Bring your code and we show you how to port it (5-day course)

¾ 32 researchers attended

¾ All but one of the codes were ported by the end of the week

™ The one code that was not ported was written in PVM, which was not installed on the SP system

„ NCSC now conducts its own workshops


Porting Examples

„ NAVO (Cray)

¾ Production benchmark code required a maximum of 380-400 elapsed seconds

™ ACTC ported and tuned this code to run in under 315 seconds on the SP

¾ ACTC Workshop at NAVO

™ 8 production codes ported and optimized by end of week

„ EPA (Cray)

¾ One ACTC man-month

™ Converted the largest user code to the SP

™ Codes now run 3 to 6 times faster than they did on the T3E

„ NASA/Goddard

¾ 5 codes ported with the help of one ACTC scientist working onsite for two weeks

HPC Tools Requirements

„ User demands for performance tools

¾ e.g., memory analysis tools, MPI and OpenMP tracing tools, system utilization tools

„ Insufficient tools on IBM systems

¾ Limited funding for development

¾ Long testing cycle (18 months)

„ STC users are comfortable with “AS-IS” software


Performance Tools Challenge

[Slide figure: the source-level statement C = A + B (an array assignment with its element-wise semantics) shown side by side with the PowerPC assembly the compiler generates for it (addi, fmr, stfd, mfspr, lwz, stmw, stw, addis, ...).]

„ The user’s mental model of the program does not match the executed version

¾ Performance tools must be able to bridge this semantic gap


ACTC Tools

„ Goals

¾ Help IBM sales in HPC and increase user satisfaction

¾ Complement IBM application performance tools offerings

„ Methodology

¾ Development in-house and in collaboration with Universities and Research Labs

¾ When available, use infrastructure under development by the IBM development divisions

¾ Deployment “AS IS” to make tools rapidly available to the user community

¾ Close interaction with application developers for quick feedback, targeting functionality enhancements

„ Focus

¾ AIX and Linux


ACTC Software: Tools

„ Performance Tools

¾ HW Performance Monitor Toolkit (HPM)

™ HPMCount

™ LIBHPM (see the usage sketch after this list)

™ HPMViz

™ CATCH

¾ UTE Gantt Chart

™ MPI trace visualization
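
For readers unfamiliar with LIBHPM, the following is a minimal sketch of instrumenting one code section. It assumes the commonly documented libhpm C calls (hpmInit, hpmStart, hpmStop, hpmTerminate) and a header named libhpm.h; exact names and arguments may differ between HPM Toolkit releases.

    #include <stdio.h>
    #include "libhpm.h"   /* assumed header name; check the HPM Toolkit documentation */

    int main(void)
    {
        double a[1000], b[1000], c[1000];
        int i;

        for (i = 0; i < 1000; i++) { a[i] = i; b[i] = 2.0 * i; }

        hpmInit(0, "vector_add");        /* task id, program name */
        hpmStart(1, "c = a + b loop");   /* begin counting for section 1 */
        for (i = 0; i < 1000; i++)
            c[i] = a[i] + b[i];
        hpmStop(1);                      /* stop counting for section 1 */
        hpmTerminate(0);                 /* write the per-section counter report */

        printf("c[999] = %f\n", c[999]);
        return 0;
    }

Linking against libhpm produces, at the end of the run, hardware-counter statistics for the labeled section; HPMCount provides the same kind of data for an entire unmodified executable.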


ACTC Software: Collaborative Tools

„ OMP/OMPI trace and Paraver
¾ Graphical user interface for performance analysis of OpenMP programs
¾ Center for Parallelism of Barcelona (CEPBA) at UPC, Spain
„ SvPablo
¾ Performance analysis and visualization system
¾ Dan Reed at University of Illinois
„ Paradyn and Dyninst
¾ Dynamic system for performance analysis
¾ Bart Miller at University of Wisconsin
¾ Jeff Hollingsworth at University of Maryland


ACTC Software: Libraries

„ MPItrace
¾ Performance trace library for MPI analysis (see the sketch after this list for how such a library typically hooks into MPI)
„ TurboMPI
¾ Collective communication functions to enhance MPI performance for SMP nodes
„ TurboSHMEM
¾ “Complete” implementation of the Cray SHMEM interface to allow easy and well-tuned porting of Cray applications to the IBM RS/6000 SP
„ MIO (Modular I/O)
¾ I/O prefetching and optimization library to enhance AIX POSIX I/O handling
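
To make the MPItrace idea concrete, here is a minimal sketch of how an MPI trace library can interpose on communication calls through the standard PMPI profiling interface. This is an illustrative pattern only, not the actual MPItrace source; a real library would write compact event records instead of printing.

    #include <mpi.h>
    #include <stdio.h>

    /* The MPI standard exposes every routine under a second name (PMPI_Xxx),
       so a trace library can define its own MPI_Send that records timing and
       then forwards to the real implementation.  (Older MPI versions declare
       the buffer without const.) */
    int MPI_Send(const void *buf, int count, MPI_Datatype datatype,
                 int dest, int tag, MPI_Comm comm)
    {
        double t0 = MPI_Wtime();
        int rc = PMPI_Send(buf, count, datatype, dest, tag, comm);
        printf("MPI_Send: %d elements to rank %d took %g s\n",
               count, dest, MPI_Wtime() - t0);
        return rc;
    }

    int main(int argc, char **argv)
    {
        int rank, x = 42;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 0)
            MPI_Send(&x, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);   /* goes through the wrapper */
        else if (rank == 1)
            MPI_Recv(&x, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Finalize();
        return 0;
    }

In practice the wrappers live in a separate library linked ahead of the MPI library, so applications pick up tracing without source changes.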


Current Research Projects

„ Hardware Performance Monitor (HPM) Toolkit
¾ Data capture, analysis and presentation of hardware performance metrics for application and system data
™ Correlation of application behavior with hardware components
™ Hints for program optimization
™ Dynamic instrumentation and profiler
„ Simulation Infrastructure to Guide Memory Analysis (SIGMA)
¾ Data-centric tool under development to
™ Identify performance problems and inefficiencies caused by the data layout in the memory hierarchy
™ Propose solutions to improve performance in current and new architectures
™ Predict performance

HPMVIZ


SIGMA

„ Where is the memory?

„ Motivation

¾ Efficient utilization of the memory subsystem is a critical factor to obtain performance on current and future architectures

„ Goals:

¾ Identify bottlenecks, problems, and inefficiencies in a program due to the memory hierarchy (a simple illustration follows this slide)

¾ Support serial and parallel applications

¾ Support present and future architectures
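
As a hypothetical illustration of the kind of inefficiency such a data-centric memory tool is meant to expose (not an example taken from SIGMA itself), compare a strided traversal of a C array with a unit-stride traversal of the same data:

    #include <stdio.h>

    #define N 1024

    static double a[N][N];

    int main(void)
    {
        double sum = 0.0;
        int i, j;

        /* Column-wise traversal of a row-major C array: consecutive accesses
           are N*sizeof(double) bytes apart, so most loads miss in cache. */
        for (j = 0; j < N; j++)
            for (i = 0; i < N; i++)
                sum += a[i][j];

        /* Row-wise (unit-stride) traversal: consecutive accesses share cache
           lines, so the memory hierarchy is used far more effectively. */
        for (i = 0; i < N; i++)
            for (j = 0; j < N; j++)
                sum += a[i][j];

        printf("sum = %f\n", sum);
        return 0;
    }

A memory-analysis tool that attributes misses to data structures and source lines can point at the first loop nest and suggest the interchanged form.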

Current Collaboration Efforts

„ LLNL (CASC) – with Jeff Vetter
¾ Interpreting HW Counters analysis
™ Handling of large volume of multi-dimensional data
™ Correlate hardware events with performance bottlenecks
„ Research Centre Juelich – with Bernd Mohr
¾ Usability and reliability of performance data obtained from performance tools
¾ Feasibility of automatic performance tools
„ CEPBA (UPC) – with Jesus Labarta
¾ Performance analysis of OpenMP programs
¾ OMP/OMPI Trace & Paraver interface
„ University of Karlsruhe – with Klaus Geers
¾ HPM Collect


Program Analysis - Paraver

[Slide figure: Paraver timeline showing program phases (x_solve, y_solve, z_solve, rhs).]
„ Basics:
¾ Functions

¾ Loops

¾ Locks

„ Hardware counters:
¾ TLB misses
¾ L1 misses
¾ …


Hardware Counters Analysis

„ Derived metrics (worked example below)
¾ Instruction mix

™ %Loads

™ %Flops

¾ Computational intensity

™ flops/loads

¾ Program on architecture

™ miss ratios

™ FLOPS per L1 miss

¾ Performance

™ IPC

™ FPC
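
As a worked illustration of these derived metrics (using made-up counter values, not HPM Toolkit output), each ratio is computed directly from the raw counts:

    #include <stdio.h>

    int main(void)
    {
        /* Hypothetical raw counts for one code section. */
        double flops        = 2.0e9;   /* floating-point operations */
        double loads        = 1.0e9;   /* load instructions         */
        double l1_misses    = 5.0e7;   /* L1 data cache misses      */
        double instructions = 6.0e9;   /* instructions completed    */
        double cycles       = 4.0e9;   /* processor cycles          */

        printf("%%Loads                  : %.1f%%\n", 100.0 * loads / instructions);
        printf("%%Flops                  : %.1f%%\n", 100.0 * flops / instructions);
        printf("Computational intensity : %.2f flops/load\n", flops / loads);
        printf("L1 misses per load      : %.3f\n", l1_misses / loads);
        printf("Flops per L1 miss       : %.1f\n", flops / l1_misses);
        printf("IPC                     : %.2f\n", instructions / cycles);
        printf("FPC                     : %.2f\n", flops / cycles);
        return 0;
    }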

Analysis of Decomposition Effects

„ 1 x 12

„ 3 x 4

„ 12 x 1
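
If these denote 2-D decompositions of 12 parallel tasks (1x12, 3x4, and 12x1 process grids), the different layouts can be set up for comparison with MPI's Cartesian topology routines. A minimal, illustrative sketch (not taken from the slide's experiment):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size, coords[2];
        int dims[2] = {3, 4};      /* one of the slide's cases; {1,12} or {12,1} give the others */
        int periods[2] = {0, 0};   /* non-periodic grid */
        MPI_Comm cart;

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* expected to be 12 for these cases */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Build the 2-D process grid and report each task's position in it. */
        MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 0, &cart);
        MPI_Cart_coords(cart, rank, 2, coords);
        printf("rank %d -> grid position (%d, %d)\n", rank, coords[0], coords[1]);

        MPI_Comm_free(&cart);
        MPI_Finalize();
        return 0;
    }

Hardware-counter profiles collected for each grid shape then show how the decomposition changes communication and memory behavior.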


Summary

„ ACTC is a consolidation of IBM’s highest level of technical expertise in HPC

„ ACTC research is rapidly disseminated to maximize benefits of IBM servers for HPC applications

