COSC 7364 Advanced Parallel Computations

Introduction and Organizational Issues

Edgar Gabriel, Fall 2009


Organizational issues (I)

• Classes:
  – Monday, 2:30pm – 4:00pm, PGH 3
  – Wednesday, 2:30pm – 4:00pm, PGH 3
• Evaluation:
  – 2 presentations: 20% each
  – paper reviews: 20%
  – project: 40% (might include one more presentation)
• In case of questions:
  – email: [email protected]
  – Tel: (713) 743 3358
  – Office hours: PGH 524, Tue, 11am–12pm or by appointment
• All slides available on the website:
  – http://www.cs.uh.edu/~gabriel/cosc7364_f09/


Goals of this course

• Evaluate new parallel programming paradigms
  – language features
    • difficulty of converting a sequential application into a parallel one
    • ability to express data locality and affinity information
    • I/O operations
  – current status
    • available implementations
    • difficulty of installing and getting it to run
    • performance obtained


Goals of this course (II)

• Codes to be parallelized:
  – an iterative equation solver using a TFQMR algorithm
  – a k-means clustering algorithm (see the sketch below)
• sequential C and Fortran versions available for both codes
  – MPI and OpenMP versions also available
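For orientation, the assignment step of a single k-means iteration is sketched below in plain C. The function and variable names are illustrative only and are not taken from the course codes; the actual handout may structure the data differently.

/* One k-means assignment step: map each point to its nearest centroid
   (squared Euclidean distance). Illustrative sketch only. */
void assign_points(const double *points, int n, int dim,
                   const double *centroids, int k, int *membership)
{
    for (int i = 0; i < n; i++) {
        int best = 0;
        double best_dist = -1.0;
        for (int c = 0; c < k; c++) {
            double dist = 0.0;
            for (int d = 0; d < dim; d++) {
                double diff = points[i * dim + d] - centroids[c * dim + d];
                dist += diff * diff;
            }
            if (best_dist < 0.0 || dist < best_dist) {
                best_dist = dist;
                best = c;
            }
        }
        membership[i] = best;
    }
}

The centroid-update step (averaging all points assigned to each cluster) follows this loop; the two steps are repeated until the memberships stop changing.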

• Platforms of interest:
  – multi-core processors
  – clustered environments


Goals of this course (III)

• Programming languages of interest:
  – Partitioned Global Address Space (PGAS) languages
    • Unified Parallel C (UPC)
    • Co-Array Fortran (CoF) and/or Global Arrays (GA)
  – High Productivity Computing Systems (HPCS) languages
    • X10
  – Commercial approach
    • Intel Threading Building Blocks (ITBB)
• Performance will be compared to MPI and OpenMP versions of the code


Procedure

• A list of papers will be available on the webpage soon
  – everybody has to deliver a short review (1–2 pages) of each paper by the scheduled day of the presentation
  – teams of two persons are assigned to one language
    • 1st presentation: language features and answers to the questions outlined by me
    • 2nd presentation: experiences with installing the compiler, initial parallelization of the TFQMR solver
  – final project report:
    • report summarizing everything
      – installation, availability
      – parallelization of the two codes
      – performance results

Time schedule

• Mon., Aug. 24: introduction, organizational issues
• Wed., Aug. 26: , and memory affinity
• Mon., Aug. 31: project sign-up, discussion of the TFQMR code

• Sept. 14 and 16: 1st presentations on the language issues (2 on Mon., 2 on Wed.); paper reviews due
• Oct. 5 and 7: 1st project presentations (installation, preliminary parallelization experience with the TFQMR code)
• Mon., Oct. 12: handout and discussion of the k-means code
• Nov. 2 and 4: 2nd project presentations (preliminary results with the k-means code)
• Nov. 30: if required, final project presentations
• Dec. 7: final project reports due, with all performance numbers


State of the Art: MPI

• distributed-memory programming model, i.e. each process executes independently of the other processes
• specifies an Application Programming Interface (API) for data exchange between processes
  – no protocol specified, i.e. different implementations cannot interoperate
• notion of process groups and collective communication
• supports parallel I/O operations

• can be used to write SIMD and MIMD applications
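As a reminder of this programming model, a minimal MPI program in C could look as follows; the collective reduction shown is purely illustrative and not taken from the course codes.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* id of this process   */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* number of processes  */

    /* each process contributes one value; the collective operation
       combines the contributions across the entire process group */
    int local = rank, global = 0;
    MPI_Allreduce(&local, &global, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

    printf("rank %d of %d: sum of all ranks = %d\n", rank, size, global);
    MPI_Finalize();
    return 0;
}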


State of the Art: MPI (II)

• Large number of commercial and public-domain implementations available
• Very large number of applications and tools available
  – in fact, the only approach as of today that has been used for applications running on > 10,000 processors
• Downside:
  – programming effort can be fairly significant
  – some aspects of parallel programming, especially on multi-core processors, are missing (e.g. affinity information, discussed in detail in the next lecture)
  – memory overhead due to replicating certain data items (e.g. ghost cells) can be significant (see the sketch below)
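To make the ghost-cell point concrete: in a 1-D domain decomposition each process keeps one replicated element per neighbor and refreshes it in every iteration, e.g. with MPI_Sendrecv. The sketch below is an illustration only (array layout, tags, and neighbor handling are assumptions, not the course code); at the domain boundaries the neighbor ranks would simply be MPI_PROC_NULL.

#include <mpi.h>

/* Exchange ghost cells with the left/right neighbors of a 1-D
   decomposition. u holds n interior elements plus one ghost cell at
   each end: u[0] and u[n+1] are copies of the neighbors' boundary
   values and are the replicated data items causing the overhead. */
void exchange_ghostcells(double *u, int n, int left, int right, MPI_Comm comm)
{
    /* send my leftmost interior value to the left neighbor, receive
       the right neighbor's boundary value into my right ghost cell */
    MPI_Sendrecv(&u[1],     1, MPI_DOUBLE, left,  0,
                 &u[n + 1], 1, MPI_DOUBLE, right, 0,
                 comm, MPI_STATUS_IGNORE);
    /* and the same in the other direction */
    MPI_Sendrecv(&u[n], 1, MPI_DOUBLE, right, 1,
                 &u[0], 1, MPI_DOUBLE, left,  1,
                 comm, MPI_STATUS_IGNORE);
}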


State of the Art: OpenMP

• shared-memory parallel programming model, i.e. all threads share the same address space
  – data exchange between different threads through main memory
• based on compiler directives and library functions
• allows for incremental parallelization
  – but think of Amdahl's law!
• no notion of parallel I/O
• can be used to write SIMD- and MIMD-style parallel applications
  – although until recently mostly used for large loops
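A minimal illustration of the directive-based, incremental style (the example is not from the course material): the single pragma is the only change needed to distribute the loop across all threads of the shared address space.

#include <stdio.h>
#include <omp.h>

#define N 1000000
static double a[N];

int main(void)
{
    double sum = 0.0;

    /* all threads share a[]; each thread works on a chunk of the
       iteration space and keeps a private partial sum that is
       combined at the end of the parallel region */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++) {
        a[i] = (double)i;
        sum += a[i];
    }

    printf("sum = %.0f (up to %d threads)\n", sum, omp_get_max_threads());
    return 0;
}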


Software Developer Today and Tomorrow

[Figure: the software developer today and tomorrow, surrounded by current challenges — multi-core processors, GPGPUs, cloud, virtualization, I/O, fault tolerance, power awareness, security]

Code Optimization Challenges

• Influence of Hardware Components
  – Processor, Memory, Network, Storage
• Influence of Software Components
  – Operating system, device drivers, compiler, (MPI) communication library
  – Application: problem size, communication frequencies
• External Influences:
  – Process placement by the batch scheduler
  – Shared resources: network, file system
  – Disturbances: OS jitter, process arrival patterns
  – Virtualization


Energy-Related Costs of an HPC Center

• Data from the High Performance Computing Center Stuttgart (HLRS):
  – 2002: 120 kW
  – 2005: 650 kW (new system)
  – 2009: 1250 kW (system upgrade), ~ €1 million / year
  – 2011: 5000 kW (estimate), ~ €5 million / year
• Fraction of energy and cooling costs vs. the overall budget of the HLRS:
  – 2005: ~12%
  – 2011: estimated ~33%
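A rough plausibility check of these numbers (the electricity price is an assumption, not part of the slides): 1250 kW running around the clock is 1250 kW × 8760 h/year ≈ 11 GWh/year, which at roughly €0.09 per kWh amounts to about €1 million per year, consistent with the 2009 figure above.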

Data courtesy of Michael M. Resch, HLRS

Current Status: MPI

• MPI-2.1
  – released September 4, 2008
  – merging of the MPI-1.1, MPI-1.2, and MPI-2.0 documents
  – minor corrections and clarifications
• MPI-2.2
  – scheduled for release in Fall 2009
  – corrections and clarifications to the previous standards
  – new functionality
    • must not break existing codes
    • must be modest from the implementation perspective


Currently being discussed

• MPI-3
  – Fault Tolerance
  – Hybrid Programming
  – Collective Operations
  – New Fortran Bindings
  – Remote Memory Access (one-sided communication)
  – Tools Support
  – Active Messages
  – …
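One-sided communication (remote memory access) already exists in the MPI-2 standard; the MPI-3 discussion is about revising it. As a reminder of the basic idea, a sketch using the existing MPI-2 interface is shown below (illustrative only; it assumes at least two processes in the communicator).

#include <mpi.h>

/* Each process exposes one double through a window; rank 0 then writes
   directly into rank 1's memory without rank 1 posting a receive. */
void one_sided_example(MPI_Comm comm)
{
    int rank;
    double local = 0.0;
    double value = 42.0;
    MPI_Win win;

    MPI_Comm_rank(comm, &rank);
    MPI_Win_create(&local, sizeof(double), sizeof(double),
                   MPI_INFO_NULL, comm, &win);

    MPI_Win_fence(0, win);                 /* open the access epoch   */
    if (rank == 0)
        MPI_Put(&value, 1, MPI_DOUBLE, 1, 0, 1, MPI_DOUBLE, win);
    MPI_Win_fence(0, win);                 /* complete all transfers  */
    /* after the second fence, 'local' on rank 1 contains 42.0 */

    MPI_Win_free(&win);
}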


Current Status: OpenMP

• OpenMP 3.0
  – released in June 2008
  – new features:
    • Tasking (tasks vs. nested parallel regions)
    • C++ random access iterators
    • Loop collapse
    • Pointer-type loop control variables for C and C++
    • Threadprivate static class members for C++
    • Fortran allocatable arrays
    • Routines for nesting support
    • …
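Two of these features sketched as small C fragments (illustrative only, not from the course material): tasking for irregular, recursive work, and loop collapse for nested loops.

/* OpenMP 3.0 tasking: spawn independent units of work recursively.
   Call fib() from inside a parallel region, e.g.
       #pragma omp parallel
       #pragma omp single
       result = fib(30);                                            */
long fib(int n)
{
    long x, y;
    if (n < 2)
        return n;
    #pragma omp task shared(x)
    x = fib(n - 1);
    #pragma omp task shared(y)
    y = fib(n - 2);
    #pragma omp taskwait            /* wait for both child tasks */
    return x + y;
}

/* OpenMP 3.0 loop collapse: the two nested loops form one iteration
   space, so both dimensions are distributed across the threads. */
void scale(double *a, int rows, int cols, double s)
{
    #pragma omp parallel for collapse(2)
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < cols; j++)
            a[i * cols + j] *= s;
}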


Currently being discussed

• Error model: error detection, returning error codes
• Interoperability and composability: with other threading models, PGAS languages, MPI
• Tools support (e.g. the collector API)
• Locality and affinity: thread-to-processor mappings, data-to-thread mappings, code-to-thread mappings, and thread groups are all potential candidates
• Accelerator support (unlikely for upcoming releases)
• Refinements to the tasking model: taskgroups, task dependences, task reductions
• Additional synchronization mechanisms (semaphores, non-flushing locks, point-to-point synchronization)

Information courtesy of Barbara Chapman, University of Houston
