IBM ACTC: Helping to Make Supercomputing Easier
Luiz DeRose
Advanced Computing Technology Center, IBM Research
Thomas J. Watson Research Center, PO Box 218, Yorktown Heights, NY 10598
laderose@us.ibm.com
HPC Symposium, University of Oklahoma
Sept 12, 2002 © 2002
Outline
Who we are
- Mission statement
- Functional overview and organization
- History
What we do
- Industry solutions and activities
- Education and training
- STC community building
- Application consulting
- Performance tools research
Sept 12, 2002 ACTC - © 2002 - Luiz DeRose [email protected] 2
ACTC
Mission
- To close the gap between HPC users and IBM
- Conduct research on applications for IBM servers within the scientific and technical community
Technical directions
- Emerging technologies
- Software tools and libraries
- HPC applications
- Research collaborations
- Education and training
Focus
- AIX and Linux platforms
ACTC Functional Overview
Outside Technology: Linux, Computational Grids, DPCL, OpenMP, MPI
IBM Technology: Power4, RS/6000 SP, GPFS, PSSP, LAPI
Customer Needs: reduce cost of ownership, optimized applications, education and user training
ACTC Solutions: tools, libraries, application consulting, collaboration, education, user groups
ACTC History
Created in September 1998
- Emphasis on helping new customers port and optimize on IBM systems
- Required establishing relationships with scientists at the research level
Expanded operations via alignment with Web/Server Division:
- EMEA extended in April 1999
- AP extended (TRL) in September 2000
- Partnership with IBM Centers of Competency (Server Group)
ACTC Education
1st Power4 Workshop
- Jan 24-25, 2001, at Watson Research
- Full architecture disclosure from Austin
1st Linux Clusters Optimization Workshop
- May 25-27, 2001, at MHPCC
- Application tuning for IA32 and Myrinet
2001 European ACTC Workshop on IBM SP
- Feb 19-20, 2001, in Trieste (Cineca)
- http://www.cineca.it/actc-workshop/index.html
2001 Japan HPC Forum and ACTC Workshop
- July 17-19, 2001, in Tokyo and Osaka
1st Linux Clusters Institute Workshop
- October 1-5, 2001, at UIUC/NCSA (Urbana)
IBM Research Internships at Watson
HPC Community: IBM ScicomP
Established in 1999 via ACTC Workshop
IBM external customer organization
- Equivalent to "CUG" (Cray User Group)
2000
- US and European groups merge
2001
- 1st European meeting (ScicomP 3): May 8-11, Barcelona, Spain (CEPBA)
- US meeting: October 9-12, Knoxville, TN
2002
- ScicomP 5: Manchester, England (Daresbury Lab)
- ScicomP 6: Berkeley, CA (NERSC); joint meeting with SP/XXL
2003
- ScicomP 7: Goettingen, Germany, March 3-7
- ScicomP 8: Minneapolis (MSI), August 5-9
http://www.spscicomp.org
HPC Community: LCI
Linux Clusters Institute (LCI)
- http://www.linuxclustersinstitute.org
- Established April 2001
- ACTC partnership with NCSA and HPCERC
- Mission: education and training for deployment of Linux clusters in the STC community
- Primary activity: intensive, hands-on workshops
- Upcoming workshops
  - Albuquerque, NM (HPCERC): Sep 30 - Oct 4, 2002
  - University of Kentucky, Lexington: Jan 13-17, 2003
The third LCI International Conference on Linux Clusters
- October 23-25, 2002; St Petersburg, FL
- http://www.linuxclustersinstitute.org/Linux-HPC-Revolution/
Porting Example - NCSC (Cray)
ACTC Workshop at NCSC
- Bring your code and we show you how to port it (5-day course)
- 32 researchers attended
- All but one of the codes were ported by the end of the week
  - The one code not ported was written in PVM, which was not installed on the SP system
NCSC now conducts its own workshops
Porting Examples
NAVO (Cray)
- Production benchmark code was required to run in at most 380-400 elapsed seconds
  - ACTC ported and tuned this code to run in under 315 seconds on the SP
- ACTC Workshop at NAVO
  - 8 production codes ported and optimized by the end of the week
EPA (Cray)
- One ACTC man-month to convert the largest user code to the SP
  - Codes now run 3 to 6 times faster than they did on the T3E
NASA/Goddard
- 5 codes ported with the help of one ACTC scientist working onsite for two weeks
HPC Tools Requirements
User demands for performance tools
- e.g., memory analysis tools, MPI and OpenMP tracing tools, system utilization tools
Insufficient tools on IBM systems
- Limited funding for development
- Long testing cycle (18 months)
STC users are comfortable with "as-is" software
Performance Tools Challenge
[Figure: the source-level statement C = A + B shown side by side with the compiled PowerPC assembly it becomes (addi, fmr, stfd, mfspr, lwz, stmw, ...)]
The user's mental model of the program does not match the executed version
- Performance tools must bridge this semantic gap
ACTC Tools
Goals
- Help IBM sales in HPC and increase user satisfaction
- Complement IBM application performance tools offerings
Methodology
- Development in-house and in collaboration with universities and research labs
- When available, use infrastructure under development by the IBM development divisions
- Deployment "as is" to make it rapidly available to the user community
- Close interaction with application developers for quick feedback, targeting functionality enhancements
Focus
- AIX and Linux
ACTC Software: Tools
Performance Tools
- HW Performance Monitor Toolkit (HPM)
  - HPMCount
  - LIBHPM
  - HPMViz
  - CATCH
- UTE Gantt Chart
  - MPI trace visualization
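LIBHPM in the toolkit above works by letting the application bracket code regions with start/stop calls and reporting per-region counter data. The following is a rough illustration of that region-based pattern, not the LIBHPM API: a Python sketch that records wall-clock time where LIBHPM would read hardware counters.

```python
import time
from collections import defaultdict

class RegionMonitor:
    """Toy stand-in for LIBHPM-style region instrumentation.

    LIBHPM reads hardware performance counters for each bracketed
    region; this sketch records wall-clock time instead."""

    def __init__(self):
        self.totals = defaultdict(float)   # label -> accumulated seconds
        self.counts = defaultdict(int)     # label -> number of calls
        self._open = {}                    # label -> start timestamp

    def start(self, label):
        self._open[label] = time.perf_counter()

    def stop(self, label):
        self.totals[label] += time.perf_counter() - self._open.pop(label)
        self.counts[label] += 1

    def report(self):
        return {label: (self.counts[label], self.totals[label])
                for label in self.totals}

mon = RegionMonitor()
mon.start("daxpy")                          # begin instrumented region
y = [2.0 * x + 1.0 for x in range(1000)]    # the "work" being measured
mon.stop("daxpy")                           # end instrumented region
calls, seconds = mon.report()["daxpy"]
```

HPMCount, by contrast, reports counts for a whole program run without requiring any source changes.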
ACTC Software: Collaborative Tools
OMP/OMPI trace and Paraver
- Graphical user interface for performance analysis of OpenMP programs
- Center for Parallelism of Barcelona (CEPBA) at UPC, Spain
SvPablo
- Performance analysis and visualization system
- Dan Reed at University of Illinois
Paradyn and Dyninst
- Dynamic system for performance analysis
- Bart Miller at University of Wisconsin
- Jeff Hollingsworth at University of Maryland
ACTC Software: Libraries
MPItrace
- Performance trace library for MPI analysis
TurboMPI
- Collective communication functions to enhance MPI performance on SMP nodes
TurboSHMEM
- "Complete" implementation of the Cray SHMEM interface to allow easy, well-tuned porting of Cray applications to the IBM RS/6000 SP
MIO (Modular I/O)
- I/O prefetching and optimization library to enhance AIX POSIX I/O handling
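Trace libraries such as MPItrace generally work by interposing on communication calls and logging a timestamped event per call. A minimal sketch of that interposition idea in Python, where a decorator plays the role of the interposed entry point (the names are illustrative, not the MPItrace interface):

```python
import functools
import time

trace_log = []   # (event name, start timestamp, duration) records

def traced(func):
    """Log a timestamped event per call, the way an MPI trace library
    interposes on communication routines such as MPI_Send."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        t0 = time.perf_counter()
        result = func(*args, **kwargs)
        trace_log.append((func.__name__, t0, time.perf_counter() - t0))
        return result
    return wrapper

@traced
def send(dest, payload):
    return len(payload)       # stand-in for the real communication call

send(1, b"hello")
send(2, b"world!")
events = [name for name, _, _ in trace_log]
```

The resulting event stream is what a viewer such as the UTE Gantt chart renders as a timeline.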
Current Research Projects
Hardware Performance Monitor (HPM) Toolkit
- Data capture, analysis, and presentation of hardware performance metrics for application and system data
  - Correlation of application behavior with hardware components
  - Hints for program optimization
- Dynamic instrumentation and profiler
Simulation Infrastructure to Guide Memory Analysis (SIGMA)
- Data-centric tool under development to
  - Identify performance problems and inefficiencies caused by the data layout in the memory hierarchy
  - Propose solutions to improve performance on current and new architectures
  - Predict performance
HPMViz
SIGMA
Where is the memory?
Motivation
- Efficient utilization of the memory subsystem is a critical factor for obtaining performance on current and future architectures
Goals:
- Identify bottlenecks, problems, and inefficiencies in a program due to the memory hierarchy
- Support serial and parallel applications
- Support present and future architectures
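The effect SIGMA looks for can be illustrated with a toy model: the same array traversed in two different orders produces very different miss counts in a small direct-mapped cache. This is a self-contained sketch of the underlying phenomenon, not SIGMA itself; the cache parameters and array size are made up.

```python
def count_misses(addresses, num_lines=64, line_size=8):
    """Count misses for a word-address stream in a tiny direct-mapped cache."""
    cache = [None] * num_lines          # line tag stored per cache slot
    misses = 0
    for addr in addresses:
        line = addr // line_size        # memory line holding this word
        index = line % num_lines        # direct-mapped placement
        if cache[index] != line:
            cache[index] = line
            misses += 1
    return misses

N = 64   # a 64 x 64 array of words, stored row-major: address = i * N + j
row_major = [i * N + j for i in range(N) for j in range(N)]   # unit stride
col_major = [i * N + j for j in range(N) for i in range(N)]   # stride N

row_misses = count_misses(row_major)    # one miss per 8-word line
col_misses = count_misses(col_major)    # every access misses
```

The sequential walk misses only once per cache line (512 misses for 4096 accesses), while the stride-N walk misses on every access; a data-centric tool attributes such misses back to the array and loop nest that caused them.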
Current Collaboration Efforts
LLNL (CASC), with Jeff Vetter
- Interpreting HW counter data
  - Handling of large volumes of multi-dimensional data
  - Correlating hardware events with performance bottlenecks
Research Centre Juelich, with Bernd Mohr
- Usability and reliability of performance data obtained from performance tools
- Feasibility of automatic performance tools
CEPBA (UPC), with Jesus Labarta
- Performance analysis of OpenMP programs
- OMP/OMPI Trace & Paraver interface
University of Karlsruhe, with Klaus Geers
- HPM Collect
Program Analysis - Paraver
[Paraver timeline of the x_solve, y_solve, z_solve, and rhs phases]
Basics:
- Functions
- Loops
- Locks
Hardware counters:
- TLB misses
- L1 misses
- ...
Hardware Counters Analysis
Derived metrics
- Instruction mix
  - %Loads
  - %Flops
- Computational intensity
  - flops/loads
- Program on architecture
  - Miss ratios
  - Flops per L1 miss
- Performance
  - IPC
  - FPC
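All of the metrics above are simple ratios over raw counter values. A sketch of how they are computed, using made-up counter numbers (illustrative only, not measurements from any real run):

```python
# Raw counter values for one code region (made-up numbers, for illustration)
counters = {
    "cycles":       1_000_000,
    "instructions": 1_500_000,
    "loads":          400_000,
    "flops":          200_000,
    "l1_misses":       10_000,
}

# Instruction mix
pct_loads = 100.0 * counters["loads"] / counters["instructions"]
pct_flops = 100.0 * counters["flops"] / counters["instructions"]

# Computational intensity
comp_intensity = counters["flops"] / counters["loads"]

# Program on architecture
l1_miss_ratio = counters["l1_misses"] / counters["loads"]
flops_per_l1_miss = counters["flops"] / counters["l1_misses"]

# Performance
ipc = counters["instructions"] / counters["cycles"]
fpc = counters["flops"] / counters["cycles"]
```

These derived numbers, rather than the raw counts, are what the analysis slides that follow compare across program decompositions.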
Analysis of Decomposition Effects
[Paraver views comparing three process decompositions of the same run: 1 x 12, 3 x 4, and 12 x 1]
Summary
ACTC is a consolidation of IBM’s highest level of technical expertise in HPC
ACTC research is rapidly disseminated to maximize benefits of IBM servers for HPC applications