IBM ACTC: Helping to Make Supercomputing Easier
Luiz DeRose
Advanced Computing Technology Center, IBM Research
Thomas J. Watson Research Center, PO Box 218, Yorktown Heights, NY 10598
[email protected]
HPC Symposium, University of Oklahoma, Sept 12, 2002

Outline
Who we are
- Mission statement
- Functional overview and organization
- History
What we do
- Industry solutions and activities
  - Education and training
  - STC community building
  - Application consulting
  - Performance tools research

ACTC Mission
- Close the gap between HPC users and IBM
- Conduct research on applications for IBM servers within the scientific and technical community
  - Technical directions
  - Emerging technologies
ACTC Research
- Software tools and libraries
- HPC applications
- Research collaborations
- Education and training
Focus
- AIX and Linux platforms

ACTC Functional Overview
Outside technology
- Linux
- Computational grids
- DPCL, OpenMP, MPI
IBM technology
- POWER4, RS/6000 SP
- GPFS
- PSSP, LAPI
Customer needs
- Reduce cost of ownership
- Optimized applications
- Education
- User groups
Solutions
- Tools
- Libraries
- Application consulting
- Collaboration
- Education and user training

ACTC History
Created in September 1998
- Emphasis on helping new customers port and optimize on IBM systems
- Required establishing relationships with scientists at the research level
Expanded operations via alignment with the Web/Server Division
- EMEA extended in April 1999
- AP extended (TRL) in September 2000
- Partnership with IBM Centers of Competency (Server Group)

ACTC Education
1st POWER4 Workshop
- Jan 24-25, 2001, at Watson Research
- Full architecture disclosure from Austin
1st Linux Clusters Optimization Workshop
- May 25-27, 2001, at MHPCC
- Application tuning for IA32 and Myrinet
2001 European ACTC Workshop on IBM SP
- Feb 19-20, 2001, in Trieste (Cineca)
- http://www.cineca.it/actc-workshop/index.html
2001 Japan HPC Forum and ACTC Workshop
- July 17-19, 2001, in Tokyo and Osaka
1st Linux Clusters Institute Workshop
- October 1-5, 2001, at UIUC/NCSA (Urbana)
IBM Research internships at Watson

HPC Community: IBM ScicomP
Established in 1999 via ACTC workshop
IBM external customer organization
- Equivalent to CUG (Cray User Group)
2000
- US and European groups merge
2001
- 1st European meeting (ScicomP 3): May 8-11, Barcelona, Spain (CEPBA)
- US meeting: October 9-12, Knoxville, TN
2002
- ScicomP 5: Manchester, England (Daresbury Lab)
- ScicomP 6: Berkeley, CA (NERSC); joint meeting with SP/XXL
2003
- ScicomP 7: Goettingen, Germany, March 3-7
- ScicomP 8: Minneapolis (MSI), August 5-9
http://www.spscicomp.org

HPC Community: LCI
Linux Clusters Institute (LCI)
- http://www.linuxclustersinstitute.org
- Established April 2001
- ACTC partnership with NCSA and HPCERC
- Mission: education and training for deployment of Linux clusters in the STC community
- Primary activity: intensive, hands-on workshops
- Upcoming workshops:
  - Albuquerque, NM (HPCERC): Sep 30-Oct 4, 2002
  - University of Kentucky, Lexington: Jan 13-17, 2003
Third LCI International Conference on Linux Clusters
- October 23-25, 2002; St Petersburg, FL
- http://www.linuxclustersinstitute.org/Linux-HPC-Revolution/

Porting Example: NCSC (Cray)
ACTC workshop at NCSC
- Bring your code and we show you how to port it (5-day course)
- 32 researchers attended
- All codes but one were ported by the end of the week; the one exception was written in PVM, which was not installed on the SP system
NCSC now conducts its own workshops

Porting Examples
NAVO (Cray)
- Production benchmark code required a maximum of 380-400 elapsed seconds; ACTC ported and tuned this code to run in under 315 seconds on the SP
- ACTC workshop at NAVO: 8 production codes ported and optimized by the end of the week
EPA (Cray)
- One ACTC man-month to convert the largest user code to the SP
- Codes now run 3 to 6 times faster than they did on the T3E
NASA/Goddard
- 5 codes ported with the help of one ACTC scientist working on site for two weeks

HPC Tools Requirements
User demands for performance tools
- e.g., memory analysis tools, MPI and OpenMP tracing tools, system utilization tools
Insufficient tools on IBM systems
- Limited funding for development
- Long testing cycle (18 months)
STC users are comfortable with "AS-IS" software

Performance Tools Challenge
[Slide figure: a source-level statement such as C = A + B, shown next to the POWER assembly the compiler emits for it and the element-wise semantics the programmer has in mind.]
The user's mental model of the program does not match the executed version
- Performance tools must be able to bridge this semantic gap
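To make that gap concrete, here is a small hedged illustration (not from the original slides; the variable names are invented): the single statement the user reasons about versus the scalarized form that actually executes, which is what a performance tool observes and must map back to the source.

```c
#include <complex.h>
#include <stdio.h>

int main(void) {
    double complex a = 1.0 + 2.0 * I;
    double complex b = 3.0 + 4.0 * I;

    /* What the user writes and reasons about: one addition. */
    double complex c = a + b;

    /* What effectively executes: separate real/imaginary scalar adds,
     * plus loads, stores, and register traffic -- the level at which
     * counters and traces are collected. */
    double c_re = creal(a) + creal(b);
    double c_im = cimag(a) + cimag(b);

    printf("%g%+gi vs %g%+gi\n", creal(c), cimag(c), c_re, c_im);
    return 0;
}
```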
ACTC Tools
Goals
- Help IBM sales in HPC and increase user satisfaction
- Complement IBM's application performance tool offerings
Methodology
- Development in-house and in collaboration with universities and research labs
- When available, use infrastructure under development by the IBM development divisions
- Deployment "AS IS" to make tools rapidly available to the user community
- Close interaction with application developers for quick feedback, targeting functionality enhancements
Focus
- AIX and Linux

ACTC Software: Tools
Performance tools
- HW Performance Monitor Toolkit (HPM)
  - HPMCount
  - LIBHPM
  - HPMViz
  - CATCH
- UTE Gantt chart
  - MPI trace visualization
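As a flavor of how LIBHPM-style instrumentation is typically used, here is a minimal sketch in C. The call names (hpmInit, hpmStart, hpmStop, hpmTerminate) and the header name follow the HPM Toolkit documentation of the period, but exact signatures are an assumption here; verify against the installed toolkit.

```c
/* Minimal sketch of LIBHPM-style instrumentation. Assumed API:
 * hpmInit/hpmStart/hpmStop/hpmTerminate, per the HPM Toolkit docs
 * of the period -- check your installed version. */
#include "libhpm.h"   /* assumed header name */

void compute(double *a, double *b, double *c, int n) {
    hpmStart(1, "daxpy-like loop");   /* begin counting region 1 */
    for (int i = 0; i < n; i++)
        c[i] = a[i] + 2.0 * b[i];
    hpmStop(1);                       /* stop counting region 1 */
}

int main(void) {
    enum { N = 1 << 20 };
    static double a[N], b[N], c[N];

    hpmInit(0, "example");   /* task id 0, program name */
    compute(a, b, c, N);
    hpmTerminate(0);         /* emits the per-region counter report */
    return 0;
}
```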
ACTC Software: Collaborative Tools
OMP/OMPI trace and Paraver
- Graphical user interface for performance analysis of OpenMP programs
- Center for Parallelism of Barcelona (CEPBA) at UPC, Spain
SvPablo
- Performance analysis and visualization system
- Dan Reed at the University of Illinois
Paradyn and Dyninst
- Dynamic system for performance analysis
- Bart Miller at the University of Wisconsin
- Jeff Hollingsworth at the University of Maryland

ACTC Software: Libraries
MPItrace
- Performance trace library for MPI analysis
TurboMPI
- Collective communication functions to enhance MPI performance on SMP nodes
TurboSHMEM
- "Complete" implementation of the Cray SHMEM interface to allow easy, well-tuned porting of Cray applications to the IBM RS/6000 SP
MIO (Modular I/O)
- I/O prefetching and optimization library to enhance AIX POSIX I/O handling
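To show the kind of code TurboSHMEM is meant to let run unchanged on the SP, here is a hedged sketch written against the classic Cray-style SHMEM calls (start_pes, shmem_my_pe, shmem_n_pes, shmem_long_put, shmem_barrier_all). It is an illustration, not taken from the TurboSHMEM documentation; verify names and the header path against the library you have.

```c
/* Hedged sketch of classic Cray-style SHMEM usage -- the kind of
 * program TurboSHMEM aims to support on the RS/6000 SP. Call names
 * follow the traditional SHMEM interface; verify against your library. */
#include <stdio.h>
#include <mpp/shmem.h>   /* traditional SHMEM header path */

static long value;       /* symmetric object: same address on every PE */

int main(void) {
    start_pes(0);                        /* classic SHMEM startup */
    int me   = shmem_my_pe();
    int npes = shmem_n_pes();

    long mine = 100 + me;
    /* One-sided put: each PE writes into its right neighbor's "value". */
    shmem_long_put(&value, &mine, 1, (me + 1) % npes);

    shmem_barrier_all();                 /* ensure all puts have completed */
    printf("PE %d received %ld\n", me, value);
    return 0;
}
```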
Current Research Projects
Hardware Performance Monitor (HPM) Toolkit
- Data capture, analysis, and presentation of hardware performance metrics for application and system data
  - Correlation of application behavior with hardware components
  - Hints for program optimization
- Dynamic instrumentation and profiler
Simulation Infrastructure to Guide Memory Analysis (SIGMA)
- Data-centric tool under development to:
  - Identify performance problems and inefficiencies caused by data layout in the memory hierarchy
  - Propose solutions to improve performance on current and new architectures
  - Predict performance

HPMViz
[Slide: screenshot of the HPMViz graphical interface.]

SIGMA: Where Is the Memory?
Motivation
- Efficient utilization of the memory subsystem is a critical factor in obtaining performance on current and future architectures
Goals
- Identify bottlenecks, problems, and inefficiencies in a program due to the memory hierarchy
- Support serial and parallel applications
- Support present and future architectures

Current Collaboration Efforts
LLNL (CASC), with Jeff Vetter
- Interpreting hardware-counter analysis
  - Handling large volumes of multi-dimensional data
  - Correlating hardware events with performance bottlenecks
Research Centre Juelich, with Bernd Mohr
- Usability and reliability of performance data obtained from performance tools
- Feasibility of automatic performance tools
CEPBA (UPC), with Jesus Labarta
- Performance analysis of OpenMP programs
- OMP/OMPI trace & Paraver interface
University of Karlsruhe, with Klaus Geers
- HPM Collect

Program Analysis: Paraver
[Slide figure: Paraver timeline of the x_solve, y_solve, z_solve, and rhs phases.]
Basics
- Functions
- Loops
- Locks
Hardware counters
- TLB misses
- L1 misses
- ...

Hardware Counters Analysis
Derived metrics (a worked sketch follows the Summary)
- Instruction mix: %loads, %flops
- Computational intensity: flops/loads
- Program on architecture: miss ratios, FLOPs per L1 miss
- Performance: IPC, FPC

Analysis of Decomposition Effects
[Slide figure: Paraver views comparing 1 x 12, 3 x 4, and 12 x 1 process decompositions.]

Summary
- ACTC is a consolidation of IBM's highest level of technical expertise in HPC
- ACTC research is rapidly disseminated to maximize the benefits of IBM servers for HPC applications
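As a worked illustration of the derived metrics listed on the Hardware Counters Analysis slide, the sketch below computes them from raw counter values. The raw numbers are invented for illustration; only the formulas (instruction mix, computational intensity, FLOPs per L1 miss, IPC, FPC) come from the slide.

```c
/* Worked sketch: derived metrics from raw hardware-counter values.
 * The raw counts below are hypothetical; the formulas follow the
 * "Hardware Counters Analysis" slide. */
#include <stdio.h>

int main(void) {
    /* Hypothetical raw counts from a profiled region. */
    double cycles       = 4.0e9;
    double instructions = 6.0e9;
    double loads        = 1.5e9;
    double flops        = 2.0e9;
    double l1_misses    = 5.0e7;

    /* Instruction mix */
    printf("%%loads       = %.1f%%\n", 100.0 * loads / instructions);
    printf("%%flops       = %.1f%%\n", 100.0 * flops / instructions);

    /* Computational intensity: flops per load */
    printf("intensity    = %.2f flops/load\n", flops / loads);

    /* Program on architecture: locality with respect to the L1 cache */
    printf("L1 miss rate = %.2f%% of loads\n", 100.0 * l1_misses / loads);
    printf("flops/L1miss = %.1f\n", flops / l1_misses);

    /* Performance: instructions and flops per cycle */
    printf("IPC          = %.2f\n", instructions / cycles);
    printf("FPC          = %.2f\n", flops / cycles);
    return 0;
}
```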