COSC 7364 Advanced Parallel Computations
Introduction and Organizational Issues
Edgar Gabriel, Fall 2009
Organizational issues (I)
• Classes:
  – Monday, 2:30pm – 4:00pm, PGH 3
  – Wednesday, 2:30pm – 4:00pm, PGH 3
• Evaluation:
  – 2 presentations: 20% each
  – paper reviews: 20%
  – project: 40% (might include one more presentation)
• In case of questions:
  – email: [email protected]
  – Tel: (713) 743 3358
  – Office hours: PGH 524, Tue, 11am–12pm, or by appointment
• All slides available on the website:
  – http://www.cs.uh.edu/~gabriel/cosc7364_f09/
COSC 6385 – Computer Architecture Edgar Gabriel
Goals of this course (I)
• Evaluate new parallel programming paradigms
  – language features
    • difficulty of converting a sequential application into a parallel one
    • ability to express data locality and affinity information
    • I/O operations
  – current status
    • available implementations
    • difficulty to install and get running
    • performance obtained
Goals of this course (II)
• Codes to be parallelized:
  – an iterative equation solver using the TFQMR algorithm
  – a k-means clustering algorithm
• Sequential C and Fortran versions are available for both codes
  – MPI and OpenMP versions are also available
• Platforms of interest:
  – multi-core processors
  – clustered environments
Goals of this course (III)
• Programming languages of interest:
  – Partitioned Global Address Space (PGAS) languages
    • Unified Parallel C (UPC)
    • Co-Array Fortran (CAF) and/or Global Arrays (GA)
  – High Productivity Computing Systems (HPCS) languages
    • X10
  – Commercial approach
    • Intel Threading Building Blocks (TBB)
• Performance will be compared to the MPI and OpenMP versions of the codes
Procedure
• A list of papers will be available on the webpage soon
  – everybody has to deliver a short review (1–2 pages) of each paper by the scheduled day of its presentation
• Teams of two persons get assigned to one language
  – 1st presentation: language features and answers to the questions outlined by me
  – 2nd presentation: experiences with installing the compiler, initial parallelization of the TFQMR solver
• Final project report:
  – a report summarizing everything: installation, availability, parallelization of the two codes, performance results
Time schedule
• Mon., Aug. 24: introduction, organizational issues
• Wed., Aug. 26: process, thread, and memory affinity
• Mon., Aug. 31: project sign-up, discussion of the TFQMR code
• Sept. 14 and 16: 1st presentations on the language issues (2 on Mon., 2 on Wed.); paper reviews due
• Oct. 5 and 7: 1st project presentations (installation, preliminary parallelization experience with the TFQMR code)
• Mon., Oct. 12: handout and discussion of the k-means code
• Nov. 2 and 4: 2nd project presentations (preliminary results with the k-means code)
• Nov. 30: if required, final project presentations
• Dec. 7: final project reports due, with all performance numbers
State of the Art: MPI
• Distributed-memory programming model, i.e. each process executes independently of the other processes
• Specifies an Application Programming Interface (API) for data exchange between processes
  – no wire protocol, i.e. different implementations cannot interoperate
• Notion of process groups and collective communication
• Supports parallel I/O operations
• Can be used to write SIMD and MIMD applications
State of the Art: MPI (II)
• Large number of commercial and public-domain implementations available
• Very large number of applications and tools available
  – in fact, the only approach as of today that has been used for applications running on > 10,000 processors
• Downside:
  – programming effort can be fairly significant
  – some aspects of parallel programming, especially on multi-core processors, are missing (e.g. affinity information, discussed in detail in the next lecture)
  – memory overhead due to replicating certain data items (e.g. ghost cells) can be significant
State of the Art: OpenMP
• Shared-memory parallel programming paradigm, i.e. all threads share the same address space
  – data exchange between different threads through main memory
• Based on compiler directives and library functions
• Allows for incremental parallelization
  – but think of Amdahl’s law!
• No notion of parallel I/O
• Can be used to write SIMD- and MIMD-style parallel applications
  – although until recently mostly used for large loops
5 Software Developer Today and Tomorrow
fault-tolerance scalability cloud/grid computing
I/O virtualization
GPGPUs
power awareness security
COSC 6385 – Computer Architecture Edgar Gabriel multi-core processors
Code Optimization Challenges
• Influence of hardware components:
  – processor, memory, network, storage
• Influence of software components:
  – operating system, device drivers, compiler, (MPI) communication library
  – application: problem size, communication frequencies
• External influences:
  – process placement by the batch scheduler
  – shared resources: network, file system
  – disturbances: OS jitter, process arrival patterns
  – virtualization
Energy-Related Costs of an HPC Center
• Data from the High Performance Computing Center Stuttgart (HLRS):
  – 2002: 120 kW
  – 2005: 650 kW (new supercomputer)
  – 2009: 1250 kW (system upgrade), ~1 million € / year
  – 2011: 5000 kW (estimate), ~5 million € / year
• Fraction of energy and cooling costs vs. the overall budget of the HLRS:
  – 2005: ~12%
  – 2011: estimated ~33%
Data courtesy of Michael M. Resch, HLRS
Current Status
• MPI-2.1
  – released September 4, 2008
  – merging of the MPI-1.1, MPI-1.2, and MPI-2.0 documents
  – minor corrections and clarifications
• MPI-2.2
  – scheduled for release in Fall 2009
  – corrections and clarifications to the previous standards
  – new functionality
    • must not break existing codes
    • must be modest from the implementation perspective
Currently being discussed
• MPI-3
  – Fault Tolerance
  – Hybrid Programming
  – Collective Operations
  – New Fortran Bindings
  – Remote Memory Access (one-sided communication)
  – Tools Support
  – Active Messages
  – …
Current Status
• OpenMP 3.0
  – released in June 2008
  – new features:
    • tasking (tasks vs. nested parallel regions)
    • C++ random-access iterators
    • loop collapse
    • pointer-type loop control variables for C and C++
    • threadprivate static class members for C++
    • Fortran allocatable arrays
    • routines for nesting support
    • …
Currently being discussed
• Error model: error detection, returning error codes
• Interoperability and composability: with other threading models, PGAS languages, MPI
• Tools support (e.g. the collector API)
• Locality and affinity: thread-to-processor mappings, data-to-thread mappings, code-to-thread mappings, and thread groups are all potential candidates
• Accelerator support (unlikely for upcoming releases)
• Refinements to the tasking model: taskgroups, task dependences, task reductions
• Additional synchronization mechanisms (semaphores, non-flushing locks, point-to-point synchronization)
Information courtesy of Barbara Chapman, University of Houston