The IBM PERCS Project: Hardware-Software Co
The IBM PERCS Project: Hardware-Software Co-design of a Supercomputer for High Programmer Productivity

Kemal Ebcioglu
Co-leader, Programming Model and Tools Area
IBM PERCS Project, IBM Research
Email: [email protected]

This work has been supported in part by the Defense Advanced Research Projects Agency (DARPA) under contract No. NBCH30390004.

Disclaimer: The research material described in this presentation implies no commitment regarding future IBM software or hardware products.

WASP 2005 Keynote, Kemal Ebcioglu, September 22, 2005

PERCS Programming Model & Tools Team
  X10
    – Philippe Charles
    – Chris Donawa
    – Kemal Ebcioglu
    – Christian Grothoff
    – Allan Kielstra
    – Christoph von Praun
    – Vijay Saraswat
    – Vivek Sarkar
  PERCS Tools
    – Marina Biberstein
    – Bill Chung
    – Robert Fuhrer
    – Matthias Hauswirth
    – Eugen Nistor
    – Peter Sweeney
    – Beth Tibbitts
    – Frank Tip
    – Mandana Vaziri
    – Justin Xue
  PERCS Productivity
    – Catalina Danis
    – Christine Halverson
    – Wendy Kellogg
  University partners
    – MIT
    – Purdue University
    – UC Berkeley
    – U. Delaware
    – U. Illinois
    – U. Pittsburgh
    – UT Austin
    – Vanderbilt University
  Leads
    – Mootaz Elnozahy (PERCS Principal Investigator)
    – Rama Govindaraju (IBM STG software lead)
    – Vivek Sarkar, Kemal Ebcioglu (IBM Research PERCS Programming Model & Tools leads)

Talk outline
  Programming productivity trends in HPC
  Overview of PERCS project and X10 language
  Productivity experiments with X10
  PERCS hardware-software co-design research agenda
    – Programmer productivity features
    – Scalability/performance features
    – Virtualization features
  Summary and future challenges

Programming productivity trends in High Performance Computing
  Step function breakthroughs in productivity have historically occurred rarely, with fierce natural selection
    – Fortran; “structured programming” in early days
    – Integrated Debugging Environments
    – Safe OO programming
    – Re-usable
      components and model-based design
    – Separation of concerns: aspect-oriented programming (jury still out)
  The HPC community has lagged behind advances in commercial software engineering
    – C/C++/Fortran/MPI
    – Command line tools
    – Very performance driven: functionality and performance concerns often tangled
    – Modern HPC machine complexities lead to expertise gap:
      • Only a small percentage of employees are able to produce HPC software on a deadline
      • [Sarkar, Williams, Ebcioglu 2004]

Productivity crisis in future scalable computing systems
  Memory wall: Severe non-uniformities in bandwidth & latency in memory hierarchy
  Frequency wall: Multiple layers of hierarchical heterogeneous parallelism to compensate for slowdown in frequency scaling
  [Figure: memory hierarchy diagram with labels Proc Cluster, PEs, L1 $, L2 Cache, L3 Cache, Memory; layers of parallelism listed as Clusters (scale-out), SMP, Multiple cores on a chip, Coprocessors (SPUs), SMTs, SIMD, ILP]
  Attempting to overcome these
  obstacles in a 10^5-processor future system reduces productivity:
    – Lengthen SW life cycle
    – Increase in expertise gap

High Productivity Computing Systems (slide from DARPA)
  Goal:
    • Provide a new generation of economically viable high productivity computing systems for the national security and industrial user community (2010)
  Impact:
    • Performance (time-to-solution): speed up critical national security applications by a factor of 10X to 40X
    • Programmability (idea-to-first-solution): reduce cost and time of developing application solutions
    • Portability (transparency): insulate research and operational application software from system
    • Robustness (reliability): apply all known techniques to protect against outside attacks, hardware faults, & programming errors
  Applications:
    • Intelligence/surveillance, reconnaissance, cryptanalysis, weapons analysis, airborne contaminant modeling and biotechnology
  [Figure: HPCS Program Focus Areas]
  Fill the Critical Technology and Capability Gap: Today (late 80’s HPC technology) ... to ... Future (Quantum/Bio Computing)

Overview of IBM PERCS Project
  PERCS: DARPA-sponsored hardware-software project
    – 10X productivity improvement – grand challenge
    – Multi-petaflop performance – more than 100K processors
    – Commercial viability as an IBM product
    – Project addresses all levels of hardware-software stack: from circuits to programming model
  The PERCS productivity strategy
    – New programming model for scalability and productivity, with embodiment in X10 language
    – Integrated tools for reduced time-to-solution, built on open-source Eclipse framework
    – Productivity model and measurement tools, with a focus on addressing the expertise gap

X10 design goals (productivity)
  By design, X10 aims to rule out large classes of bugs
    – Building on
      proven baseline OO productivity features:
      • Type safety, memory safety, pointer safety, portability, maintainability
    – New X10 language features help avoid concurrency errors: e.g. eliminating deadlock with X10 clocks (generalized barriers)
  Concise specification of distributed aggregate operations
    – For rapid prototyping
  Language features for productivity, e.g.:
    – Atomic sections
      • Free the programmer from the complexity of lock management
    – Rooted exception model:
      • Handling errors from deeply nested parallel asynchronous activities
    – Integrated fine grain parallelism inside a place and across places
      • Going beyond the SPMD model
  Ability to re-use legacy software components
  Eclipse based tool chain for race detection, refactoring, performance optimization, visualization

X10 Design Goals (scalability)
  Aiming to scale X10 programs to O(10^5) threads
    – Language constructs for explicitly programming non-uniform data accesses
      • Performance transparency for remote accesses
    – Ability to specify high degrees of asynchronous parallelism
    – Scalable memory consistency and synchronization primitives
    – Automatic and semi-automatic performance optimization based on dynamic feedback
      • Continuous Program Optimization

An X10 example program: (HPC Challenge) RandomAccess

    public boolean run() {
        dist D = dist.factory.block([0:TABLE_SIZE-1]);

        // Allocate and initialize Table as a block-distributed array.
        final long[.] Table = new long[D] (point [i]) { return i; };

        // Allocate and initialize RanStarts with one random number seed for each place.
        final long[.] RanStarts = new long[dist.factory.unique()]
            (point [i]) { return starts(i*N_UPDATES_PER_PLACE); };

        // Allocate a small immutable table that can be copied to all places.
        final long value[.] SmallTable = new long value[[0:TABLE_SIZE-1]]
            (point [i]) { return i*S_TABLE_INIT; };

        // Everywhere in parallel, repeatedly generate random Table indices
        // and atomically read/modify/write Table element.
        finish ateach (point [i] : RanStarts) {
            long ran = nextRandom(RanStarts[i]);
            for (point [count] : [1:N_UPDATES_PER_PLACE]) {
                final int J = f(ran);
                final long K = SmallTable[g(ran)];
                async (D[J]) atomic Table[J] ^= K;
                ran = nextRandom(ran);
            }
        }
        return Table.sum() == EXPECTED_RESULT;
    }

Current X10 Status and Schedule
  X10 status and schedule
    6/2003   PERCS programming model concept
    2/2004   Kickoff of X10 as concrete embodiment of PERCS Prog Model
    7/2004   First draft of X10 language specification
    2/2005   First (unoptimized) X10 prototype – reference implementation
    7/2005   X10 application and productivity studies
    3Q2005   Start participation in High Productivity Language “consortium”
    1/2006   Second (optimized) X10 prototype
    6/2006   Open source release of X10 reference implementation
  Structure of X10 reference implementation
    [Figure: pipeline of X10 source, Parser, Analysis passes, Code emitter, JVM; remaining labels: X10 Grammar, AST, Annotated AST, Code Templates, Target Java code, X10 Multithreaded RTS, Native code, PEM Events]

Productivity experiments with X10
  Major study involving 27 subjects, 05/23/2005 – 05/27/2005
    – Mostly CS and Science students at Pittsburgh Supercomputing Center
  Sequence alignment problem in bio-informatics (SSCA#1)
    – Suggested by David Bader, and refined by PSC bio-informatics experts
    – Given the sequential version of the code, parallelize it.
  Three conditions studied on a 3000-processor Alpha supercomputer (lemieux) at PSC
    – C+MPI
    – X10 (Parallel execution through emulation only)
    – UPC
  4.5 day experiment
    – Two days of tutorials taught by experts, hands-on exercises
    – Two days of coding under observation (both human observation and automated recording of activity)
    – ½ day exit interview
  Experiment professionally run by the IBM Research Social Computing Group and PSC team.
    – Subjects anonymous to the technical team, known as X1, X2,…
    – All interactions were recorded.
    …
  Feedback from study now influencing X10 language and tools design
    – Unique approach to validating and improving language and tools productivity

Development Time (slide from PSC team)
  Development time from the first serial run (if performed) or parallel construct to the first correct parallel output (where obtained) or to end (where no parallel output obtained)
  MPI and UPC both exhibited larger maximum and median times to correct parallel outputs.
  Variability of development times for