3. Programming Memory-Coupled Systems


Parallel and High-Performance Computing
3. Programming Memory-Coupled Systems
Hans-Joachim Bungartz

Appearances

• symmetric multiprocessors (SMP):
  – processors of the same type
  – connected to a shared global memory via a bus or a crossbar
• distributed (virtual) shared-memory systems (DSM/VSM):
  – a shared global address space
  – physically distributed memory
• properties:
  – simpler to program
  – granularity: from program to block level
• several models, depending on physical arrangement of memory


Appearances (cont’d)

• Uniform Memory Access (UMA):
  – same memory access behaviour of all processors to the one shared memory
  – same access times of all processors to all stored data
  – no distinction of local or remote memory
  – local caches frequent

  – examples: SGI Power Challenge, Sun SPARCstation 10 and 20
• Non-Uniform Memory Access (NUMA):
  – one shared global address space
  – but physically distributed memory units
  – different access times depending on location of the data (local or remote)
  – often even a hierarchy of access times due to network topology
  – example: CM-5 (fat-tree topology)


Appearances (cont’d)

• Cache-Coherent Non-Uniform Memory Access (CC-NUMA):
  – all access traffic done via local cache
  – cache coherence ensured system-wide
  – examples: SGI Origin, HP/Convex SPP series
• Cache-Only Memory Architecture (COMA):
  – special case of CC-NUMA
  – all memory treated as cache, data are migrating
  – examples: KSR-1, KSR-2
• Non-Cache-Coherent Non-Uniform Memory Access (NCC-NUMA):
  – remote access not done via cache
  – cache coherence has to be ensured explicitly
  – examples: Cray T3D and T3E


3.1. Cache Coherence

Definitions

• problem: independent accesses of different processors with local caches to shared data may cause validity problems (several simultaneous copies or instances of the data)
• cache coherence:
  Cache coherence is obtained if the results of a parallel program behave as if there were a total ordering of all memory accesses satisfying:
  1. This total order is consistent with the program order for accesses to that memory unit from any processor.
  2. The value returned by a ‘READ’ is the last value written in the total ordering – the system must not provide outdated values.
• consistency:
  A system is consistent if all existing copies of a memory word (in main memory and caches) are identical.
• How do inconsistencies occur?
  – A change in a cache is not immediately reflected in main memory (copy-back or write-back policy, in contrast to a write-through policy).
• system-wide permanent consistency is expensive
• temporary inconsistencies can be tolerated if cache coherence is ensured
• for that: cache-coherence protocols:

  – write-update protocol: modification of one copy leads to modifications of all other copies (before the next access at the latest)
  – write-invalidate protocol: modification of one copy causes all other copies to be declared ’invalid’

A Cache-Coherence Strategy: Bus Snooping

• bus snooping: a famous and widespread strategy
• field of application: SMP with local caches, connected to the shared main memory via a bus
• principle: all processors tap the bus; they snoop the bus for addresses put on the bus by the other processors
• If a processor notices an address that is also available in its local cache, the following steps are executed:
  – In case of a detected ‘WRITE’ and a non-modified local copy, the local copy is invalidated.
  – In case of a detected ‘READ’ or ‘WRITE’ and a modified local copy, the bus transfer is interrupted. First, the local copy is written to main memory (a direct cache-to-cache transfer is not that frequent); then, the interrupted transfer is continued.
• Hence, cache coherence (the temporal order) is ensured!
• suitable cache-coherence protocol for bus snooping: MESI protocol


MESI Protocol

• MESI protocol: standard protocol for bus snooping
• cache-coherence protocol of the write-invalidate type
• each block in each cache is assigned one of four possible states:
  – exclusive modified: there has been a ‘WRITE’ modification, but the block is the only copy in any of the caches
  – exclusive unmodified: there have been only ‘READ’ accesses to this block, and the block is the only copy in any of the caches
  – shared unmodified: there is more than one copy in the different caches, but only ‘READ’ accesses so far
  – invalid: the values in the local cache block have been declared invalid
• any kind of action may lead to a state transition:
  – data is needed locally (for a ‘READ’ or a ‘WRITE’) and may be available locally or not
  – a data transfer with local copies involved is snooped on the bus


MESI Protocol (cont’d)

[figure: state-transition diagram of the MESI protocol]


MESI Protocol (cont’d)

• legend:
  – RH (Read Hit): the data needed locally is available locally
  – RMS (Read Miss Shared): the data needed locally is not available locally, and there exist other copies
  – RME (Read Miss Exclusive): the data needed locally is not available locally, but there exist no other copies
  – WH (Write Hit): the data to be modified locally is available locally
  – WM (Write Miss): the data to be modified locally is not available locally
  – SHR (Snoop Hit on a Read): an address of the block is snooped on the bus in a ‘READ’ request
  – SHW (Snoop Hit on a Write or Read-with-intent-to-modify): an address of the block is snooped on the bus in a ‘WRITE’ request
  – dirty line copy back: interrupt bus transfer; the interrupting processor stores its copy to main memory before restarting the interrupted transfer
  – invalidate transaction: the other processors are informed of a modification and caused to invalidate their copies
  – Read-with-intent-to-modify: causes invalidation of potential other copies
  – cache line fill: fill cache with missing data


MESI Protocol: Example 1 (READ)

• scenario:
  – some processor wants to read some invalidated data in its local cache
  – hence, we have an RM (Read Miss), and the data is transferred from main memory
• three possible cases:
  – the cache block was not in any other processor’s cache: an RME is done, the cache block is loaded, and its new state is exclusive unmodified (as long as no other processor loads the block into its cache, and as long as we have only RH’s of the local processor)
  – the cache block is in another processor’s cache, with an exclusive unmodified or shared unmodified state: then, an RMS is done, and all involved cache memories switch this block’s state or attribute to shared unmodified (via the snooping action SHR)
  – if another cache memory owns this block as exclusive modified, the address is detected via bus snooping (SHR), the bus transaction is interrupted, the cache block is written to main memory (dirty line copy back), the state there is set to shared unmodified, and the read operation is repeated


MESI Protocol: Example 2 (WRITE)

• scenario:
  – a processor wants to write some data into its cache
  – if in state exclusive modified: WH, no further action necessary (due to the write-back policy, no immediate copying to main memory)
  – else: put the address on the bus to allow snooping; three possible cases:

   * block not yet in cache, state invalid: send a Read-with-intent-to-modify on the bus; all other caches snoop (SHW) and switch their state to invalid, if it was shared unmodified or exclusive unmodified before (no more direct READ possible); the block is loaded from main memory and gets the attribute exclusive modified; if there was an exclusive modified copy elsewhere, the bus transfer is interrupted and main memory is updated first
   * block in cache, state exclusive unmodified: change state to exclusive modified
   * block in cache, state shared unmodified: send an invalidate transaction on the bus to cause the respective caches to switch their state to invalid, then change state to exclusive modified
  – note that a cache block is never written back without need


Alternative: Cache Coherence by Tables

• bus snooping and the MESI protocol are based on the broadcast abilities of the bus
• typical for SMP, impossible for DSM
• for cache coherence in DSM (CC-NUMA): directory tables:
  – either in hardware, in software, or as a combination of both
  – directory tables are either stored centrally or distributed over the processors (standard)
  – one table for each block of local memory
  – table records whether block has been loaded to the local or to some remote cache
• states, state transitions, and transactions are typically similar to their MESI counterparts (for example, there are now explicit invalidate messages to the involved processors)
• examples: ALLCACHE engine (KSR), LimitLESS (MIT Alewife), DBCCP (Dash)


3.2. Memory Consistency

What Happens on a Monoprocessor?

• all data transfer between registers and cache is executed by the load&store unit of the processor
• only these instructions have an impact on cache coherence
• typical for modern microprocessors: reordering of load&store operations by the load&store unit:

  – load operations (READ) are done immediately where they occur in the program – as soon as possible
  – store operations (WRITE) are postponed and buffered in an internal write buffer (FIFO) – as late as possible
  – objective: improve pipelining properties (a load operation provides needed data, a store operation needs calculated data)
  – control mechanism (synchronization) in order to prevent a READ of outdated (since not yet updated) data
• concept of a non-blocking cache:
  – in case of a cache miss, execution of commands can continue (i.e., the next instruction(s) can be executed without waiting for the successful end of the load command)
  – of course: this is only possible if the next commands do not need the loaded value

• consequence of both strategies: modified order of execution (without impact on the computed result, if synchronized)

Problems on an SMP Multiprocessor

• more complicated:
  – SMP: simultaneous accesses to the same data are possible
  – in spite of cache coherence, the reordering on different processors may have an impact on the overall order of load&store operations
• example: slight modification of Dekker’s algorithm

x := 0; y := 0;
...
process P1:              process P2:
  x := 1;                  y := 1;
  if (y = 0)               if (x = 0)
    { do action A1; }        { do action A2; }

• four possibilities:
  – A1 is executed, A2 is not
  – A1 is not executed, A2 is
  – neither A1 nor A2 is executed
  – both A1 and A2 are executed (possible only with a changed order!)
• cache coherence is not sufficient here!

Problems on a DSM Multiprocessor

• another problem: store instructions are not atomic – the update takes effect earlier on one processor than on the other
• typical for DSM (NUMA) systems: different access times (local – remote)
• hence: a “race” of different memory accesses; a local READ might pass a remote WRITE – although started in the correct order
• directory tables for cache coherence can help here

• generally: a consistency model specifies the order in which memory accesses of one processor are seen by the others
  – sequential consistency: strongest restrictions
  – processor consistency
  – weak consistency
  – release consistency
  – entry consistency


Some Definitions

• let Pi and Pk denote two processors of the system

• a load access of Pi is called completed with respect to Pk at time t, if no store access of Pk can influence the result of Pi’s load access any more

• a store access of Pi is called completed with respect to Pk at time t, if a load access of Pk would result in the value written by Pi
• any memory access is called completed at time t, if it is completed with respect to all processors of the system
• a load access is called globally completed, if both itself and the store access that produced the value read are completed
• Note that a difference between a completed and a globally completed load access is possible only in systems with non-atomic store operations.


Sequential Consistency

• definition:
  A multiprocessor system is called sequentially consistent if the result of an arbitrary computation is always the same as the result obtained on a monoprocessor, with the (or perhaps better ’a’) sequential order of execution given by the program.
• consequences:
  – parallel execution with sequential consistency corresponds to an overlapping sequential execution

  – all memory accesses have to be atomic
  – reorderings of load&store commands are forbidden
  – non-blocking caches are forbidden
  – secure strategy, but often poor efficiency
• sufficient condition for sequential consistency:
  Before any memory access is allowed to be completed with respect to another processor, all preceding load accesses must have been globally completed and all preceding store accesses must have been completed.
• the order of instructions is unmodified on all processors, and this order is globally visible
• sequential consistency does not replace synchronization (for correct access to shared data spaces)!
• not all memory-coupled multiprocessors fulfil sequential consistency

Processor Consistency

• definition:
  A multiprocessor system is called processor consistent if the result of an arbitrary computation is always the same as the result obtained if the instructions on each single processor are executed in the order given by its program.
• weaker than sequential consistency: a globally unique order of all processors’ memory operations is no longer necessary here

• It is possible that Pi sees store operations of Pj and Pk in another order than Pj and Pk do.
• However, Pi’s store operations are seen by all processors in the same order.
• sufficient conditions for processor consistency:
  – Before any load access is allowed to be completed with respect to any other processor, all preceding store accesses must have been completed.
  – Before any store access is allowed to be completed with respect to any other processor, all preceding accesses (load or store) must have been completed.
• hence: no global completion required
• price: the sequentialization of memory accesses may be incorrect (a READ is possible before the WRITE is completed everywhere)
• realized in some multiprocessor systems


Weak Consistency

• so far: synchronization of parallel threads has been neglected
• in practice: the programmer defines and protects critical regions with some synchronization mechanisms (as studied in Chapter 2)
• example (once more): Dekker’s algorithm

mutex m;            process P1:             process P2:
...                   lock(m);                lock(m);
lock(m);              x := 1;                 y := 1;
x := 0;               if (y = 0)              if (x = 0)
y := 0;                 { do action A1; }       { do action A2; }
unlock(m);            unlock(m);              unlock(m);
...

• idea: consistency is ensured at synchronization points only – these are the entry and exit points of critical regions; within critical regions, consistency may be violated (since protected)
• weaker sufficient conditions than before
• however: for synchronization points, sequential consistency is required
• higher potential concerning performance, but also higher responsibility of the programmer

Release Consistency, Entry Consistency

• two more consistency models:
  – release consistency: further weakening
  – entry consistency: still further weakening
• both
  – are highly specialized models
  – have been designed for specific architectures
• there exist a lot of other models, typically defining further weakenings


Summary of Consistency Models

• experience shows: the biggest gain in performance is obtained by the transition from sequential to processor consistency (up to 40 per cent)
• state of the art:
  – some variant of weak consistency (typically not clearly stated by manufacturers)
  – sequential consistency offered by all systems as an option, at least

• hence: careful programming (manual synchronization, critical regions, protection mechanisms) is basically inevitable for explicit parallel programming
• then: no need to know the consistency model (except for hardware optimization purposes)
• if there is no explicit synchronization: there may be non-deterministic characteristics
  – hard to track and understand
  – the result may be false
  – there exist exceptions where the order of memory accesses is not that critical (example: relaxation schemes for the iterative solution of large systems of linear equations)


3.3. Variable Analysis

Analysis of Occurring Variables

• be aware of the different characteristics of the variables used!
• standard scenario: loops, i.e. parallel execution of loop iterations
• concerning the use of variables in loops, we distinguish:
  – local (private) variables: new initialization in each loop iteration
  – shared variables: all others, read or written by more than one loop iteration
• example: matrix-matrix product

  for (i=0; i<n; i++)
    for (j=0; j<n; j++)
      for (k=0; k<n; k++)
        c[i][j] += a[i][k] * b[k][j];

if the outer loop is parallelized, i, j, and k are private and all others are shared variables


Variable Analysis (cont’d)

• the shared variables are further subdivided:
  – independent variables:

   * either variables only read by the loop iterations
   * or variables of an array structure, where each component is accessed by one loop iteration only
  – dependent variables: can be read or written by all loop iterations, a protection mechanism is required

   * reduction variable: a scalar or array variable used at a single position of a single associative and commutative operation only (ADD, MULT, AND, OR, XOR)
     example:

     for (k=0; k<n; k++)
       s += a[k];

3.4. OpenMP

What is OpenMP?

• Techniques for programming distributed-memory systems such as MPI, which will be discussed in Ch. 4, can also be used for programming shared-memory systems. However, using programming language extensions tailored for a shared-memory environment can often improve performance.
• Recently, OpenMP has emerged as a shared-memory standard.
• OpenMP is an API (application programming interface), consisting of a set of compiler directives and a library of support functions that help the compiler to generate multithreaded code that can take advantage of a shared-memory multiprocessor system.
• OpenMP can be used together with Fortran, C, or C++.
• OpenMP can help
  – for working on a dual or quad system;
  – for working on classical SMP systems;
  – for working on hybrid systems, consisting of a (distributed-memory) cluster of (shared-memory) nodes; OpenMP is then most frequently used in combination with MPI.
• We will discuss the most important compiler directives, such as parallel, for, parallel for, sections, parallel sections, critical, and single, as well as the four important OpenMP functions omp_get_num_procs, omp_get_num_threads, omp_get_thread_num, and omp_set_num_threads.

Brief History of OpenMP

• October 1997: development of a standard API for shared-memory programming
• September 1998: C/C++ version 1.0
• April 1999: Fortran version 1.0
• October 1999: OpenMP Architecture Review Board installed
• November 2000: Fortran version 2.0
• April 2002: C/C++ version 2.0



OpenMP Compiler Directives and Functions – Overview

• important compiler directives:
  – parallel: precedes a block of code to be executed in parallel by multiple threads (without a work-sharing construct (see below), the block is executed by each thread)
  – for: precedes a for-loop with independent iterations that may be divided among the threads executing in parallel
  – parallel for: a combination of the above
  – sections: precedes a sequence of blocks that may be executed in parallel
  – parallel sections: a combination of the two single directives
  – single: precedes a code block to be executed sequentially (i.e. by a single thread)
  – critical: precedes a critical section
  for, sections, and single are so-called work-sharing constructs
• important OpenMP functions:
  – omp_get_num_procs: returns the number of CPUs of the system on which this thread is executing (i.e. the number of available CPUs)
  – omp_get_num_threads: returns the number of threads that are active in the current parallel region
  – omp_get_thread_num: returns the thread’s ID
  – omp_set_num_threads: allows one to prescribe the number of threads executing the parallel sections of code

The Underlying Shared-Memory Model

• model: a collection of processors, each with access to the same shared memory
• interaction and synchronization (only in the run-time library) via shared variables
• typical parallel program structure: fork-join model (cf. Sect. 2.4)
  – At the beginning, only a single process (here mostly called thread), the master thread, is active.
  – The master thread executes the algorithm’s sequential parts.

  – Where parallel execution is required, the master thread forks, i.e. creates or awakens additional threads; from then on, all created threads (called a team of threads) work in parallel.
  – At the end of the parallel section, there is a join – the created threads are killed or suspended.
• features:
  – there is dynamic parallelism: the number of active threads may change dynamically throughout the program
  – incremental parallelisation is supported: transforming a sequential program into a parallel one step by step (considered a big advantage)


Parallel for Loops

• for loops as a typical appearance of inherently parallel operations
• OpenMP’s support to indicate when a loop’s iterations may be executed in parallel (no data dependence):
  – principle: tell the compiler the loop’s character, and the compiler will do the rest (code generation for forking/joining the threads and assigning iterations to them)
  – compiler directive: pragmatic information or pragma (may be ignored by the compiler, but can be used by it for parallelisation)
  – general syntax of a pragma:

    #pragma omp <directive> [clause list]

  – simplest form of the parallel for pragma (to be inserted immediately before the loop):

#pragma omp parallel for

– example:

#pragma omp parallel for
for (i=0; i<100; i++)
  a[i] = 0.0;


• requirements to the for loop:
  – canonical shape of the control clause:

    for (index = start; index <op> end; <incr>)

    where <op> is one of <, <=, >=, >, and <incr> is one of

      index++, ++index, index--, --index,
      index += inc, index -= inc,
      index = index + inc, index = inc + index, index = index - inc

  – no early exits, i.e. no break/return/exit statements nor jumps outside the loop


Shared and Private Variables

• Each thread has its own execution context, but the threads share their execution environment (cf. Sect. 2.1).
• Basically, variables may be declared either shared or private.
  – shared variable: has the same address in the context of each thread and can be accessed by each thread
  – private variable: has a different address in each thread’s context and cannot be accessed by any other than the owning thread
• default in the parallel for pragma: shared (apart from the loop index, which is private)
• example on the previous slide: the array a is shared, the integer i is private
• The environment variable OMP_NUM_THREADS contains the default number of threads to be created for parallel execution of code. Under Unix, the printenv and setenv commands can be used to check or modify the current value.
• What’s the difference between setting OMP_NUM_THREADS and using omp_set_num_threads?


Declaring Private Variables

• Consider the following example:

for (i=0; i<m; i++)
  for (j=0; j<n; j++)
    ...   /* work on a[i][j]; the body is truncated in the source */

• Parallel execution of different iterations of the outer loop leads to parallel initialisations and increments of the same shared variable j; hence, some of the iterations of the inner loop might not be executed.
• remedy: declare j to be a private variable, too – with the help of the private clause
• A clause in general is an optional addition to a pragma.
• private clause:
  – directs the compiler to make the listed variables private (i.e., private copies will be allocated for each thread)
  – private copies are accessible only inside the loop
  – their values are undefined on entry and exit of the parallel section
  – i.e.: a value assigned to j before entering the loop is not accessible inside the loop, nor are changes to j during the loop execution visible afterwards outside
• in the above example:

#pragma omp parallel for private(j)
for (i=0; i<m; i++)
  for (j=0; j<n; j++)
    ...

• The firstprivate clause allows threads to inherit initial values: it directs the compiler to create private copies with initial values identical to the master thread’s variable value as the loop is entered.
• example:

x[0] = 0.0;
#pragma omp parallel for private(j) firstprivate(x)
for (i=0; i<m; i++)
  ...

• Conversely, the lastprivate clause copies the value of the private copy from the sequentially last loop iteration back to the master thread’s variable:

#pragma omp parallel for private(j) lastprivate(x)
for (i=0; i<m; i++)
  ...

• Consider the following code which approximates π by performing a suitable numerical integration:

double area, pi, x;
int i, n;
...
area = 0.0;
for (i = 0; i < n; i++) {
  x = (i+0.5)/n;
  area += 4.0/(1.0 + x*x);
}
pi = area/n;

• Simply inserting the pragma

#pragma omp parallel for private(x)

in front of the loop may lead to non-deterministic behaviour: area remains shared, and the unprotected updates of area by different threads may interfere.

The critical Pragma

• Such critical sections can be denoted with the help of the critical pragma. • This pragma directs the compiler to enforce mutual exclusion among the threads trying to execute the block of code following the pragma. • in our above example:

double area, pi, x;
int i, n;
...
area = 0.0;
#pragma omp parallel for private(x)
for (i = 0; i < n; i++) {
  x = (i+0.5)/n;
  #pragma omp critical
  area += 4.0/(1.0 + x*x);
}
pi = area/n;

• However, this will dramatically reduce the speed-up (remember Amdahl’s law: critical sections are pieces of sequential code).


Reductions

• A reduction operation reduces n operands to one single value by always applying the same operation (sum, ...).
• Reductions are so frequent that OpenMP allows one to add a reduction clause to the parallel for pragma.
• The user must specify both the reduction operation and the reduction variable, but everything else is done by OpenMP.
• available reduction operations: sum, product, bitwise AND, bitwise OR, bitwise exclusive OR, logical AND, logical OR
• in our above example:

double area, pi, x;
int i, n;
...
area = 0.0;
#pragma omp parallel for private(x) reduction(+:area)
for (i = 0; i < n; i++) {
  x = (i+0.5)/n;
  area += 4.0/(1.0 + x*x);
}
pi = area/n;

• Note that implementations using the reduction clause are by far faster than those using the critical pragma.

Performance Improvements

• Sometimes, the loops in the sequential code are not well-suited or not optimal for an efficient parallelisation. Then, several measures (manual or OpenMP-supported) can be taken.
• inverting loops (cf. Sect. 6.2): change the order (inner-outer relation) of the loops, if helpful

original loop nest:

for (i = 1; i < m; i++)
    for (j = 0; j < n; j++)
        a[i][j] = 2 * a[i-1][j];

inverted loop nest:

#pragma omp parallel for private(i)
for (j = 0; j < n; j++)
    for (i = 1; i < m; i++)
        a[i][j] = 2 * a[i-1][j];

• conditionally executing loops:
  – for too short loops (too small numbers of iterations), the parallel overhead may even lead to increasing runtimes with increasing numbers of threads involved
  – remedy: direct the compiler to insert code that decides at run-time whether or not the loop should be executed in parallel
  – solution in OpenMP: the if clause in the parallel for pragma
  – in our above example:

#pragma omp parallel for private(x) reduction(+:area) if (n > 5000)
for (i = 0; i < n; i++) {
    ...
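To see that the loop inversion preserves the result, a small self-contained check (array sizes and the helper name inversion_matches are illustrative) can run both loop orders and compare:

```c
#define M  8   /* number of rows (illustrative) */
#define NC 6   /* number of columns (illustrative) */

/* Runs the original (i outer) and the inverted, parallelised (j outer)
 * loop nest on identically initialised arrays and compares the results.
 * The dependence a[i][j] -> a[i-1][j] runs along i only, so distributing
 * the j iterations over threads is safe. */
int inversion_matches(void) {
    double a[M][NC], b[M][NC];
    int i, j;

    for (j = 0; j < NC; j++)
        a[0][j] = b[0][j] = 1.0;

    for (i = 1; i < M; i++)              /* original order */
        for (j = 0; j < NC; j++)
            a[i][j] = 2 * a[i-1][j];

    #pragma omp parallel for private(i)  /* inverted, parallel order */
    for (j = 0; j < NC; j++)
        for (i = 1; i < M; i++)
            b[i][j] = 2 * b[i-1][j];

    for (i = 0; i < M; i++)
        for (j = 0; j < NC; j++)
            if (a[i][j] != b[i][j])
                return 0;
    return 1;
}
```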

• scheduling loops:
  – scheduling: allocation of the iterations of the loop to threads (important if the lengths of the iterations differ much)
  – statically (once for all) or dynamically (successive allocation)
  – OpenMP: schedule clause with various variants
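As a minimal sketch of the schedule clause (the helper work and the chunk size 2 are illustrative), a loop whose iterations have strongly varying cost can be scheduled dynamically; the result is independent of the schedule chosen:

```c
#define N 16

/* Iteration i performs i+1 unit steps, so iteration lengths differ
 * strongly; schedule(dynamic, 2) hands out chunks of two iterations to
 * threads on demand instead of fixing the partition in advance. */
static int work(int i) {
    int s = 0, k;
    for (k = 0; k <= i; k++)
        s += 1;
    return s;
}

int scheduled_sum(void) {
    int i, total = 0;

    #pragma omp parallel for schedule(dynamic, 2) reduction(+:total)
    for (i = 0; i < N; i++)
        total += work(i);

    return total;  /* 1 + 2 + ... + 16 = 136, whatever the schedule */
}
```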



Data Parallelism Beyond the for Loop

• for loops are probably the most common opportunity for parallelism in programs.
• Nevertheless, there is data parallelism outside simple for loops.
• the parallel pragma:
  – precedes a block that should be executed in parallel by all of the threads (use curly braces if the block is not a simple statement)
  – directs the compiler to replicate the respective code block among the threads (unlike the parallel for pragma)
• the for pragma:
  – helps to deal with situations where the parallel for pragma cannot be used (e.g. due to additional statements within the parallel section)
  – often used in combination with the parallel pragma
• the single pragma:
  – tells the compiler that only a single thread should execute the following block


• example:

#pragma omp parallel private(i,j,low,high)
for (i = 0; i < m; i++) {
    low = a[i];
    high = b[i];
    if (low > high) {
        #pragma omp single
        printf("Exiting during iteration %d\n", i);
        break;
    }
    #pragma omp for nowait
    for (j = low; j < high; j++)
        c[j] = (c[j] - a[i])/b[i];
}


Functional Parallelism

• So far, we have discussed the exploitation of data parallelism only.
• However, OpenMP helps to exploit functional parallelism, too.
• The parallel sections pragma precedes a sequence of k blocks that may be executed in parallel by k threads.
• The section pragma precedes each of those k blocks within the surrounding parallel sections pragma.
• example:

#pragma omp parallel sections
{
    #pragma omp section
    x = f1();
    #pragma omp section
    y = f2();
    #pragma omp section
    z = f3();
}

• Sometimes, it is better to use the parallel pragma and the sections pragma instead of the parallel sections pragma (similar to the use of the parallel for, parallel, and for pragmas).
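The sections example above can be sketched as a compilable function (f1, f2, f3 are stand-ins for three independent computations, not from the slides):

```c
static double f1(void) { return 1.0; }
static double f2(void) { return 2.0; }
static double f3(void) { return 3.0; }

/* Each section writes a different shared variable, so the three blocks
 * are independent and may run on up to three threads concurrently. */
double sections_sum(void) {
    double x = 0.0, y = 0.0, z = 0.0;

    #pragma omp parallel sections
    {
        #pragma omp section
        x = f1();
        #pragma omp section
        y = f2();
        #pragma omp section
        z = f3();
    }
    return x + y + z;
}
```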


Combining MPI and OpenMP

• increasing attractiveness of combining MPI (to be discussed in the next chapter) and OpenMP
• reason: more and more architectures are hybrid ones, i.e. mixtures of shared memory and distributed memory, and the hybrid approach often leads to faster program execution
  – clusters: typically use dual- or even quad-processor nodes today
  – supercomputers: constellations, i.e. collections of multiprocessors, are widespread
• principle: create an MPI process (P) for every multiprocessor and create threads (t) to occupy each multiprocessor's CPUs


Discussion of the Hybrid Approach

• Why may the combination of MPI and OpenMP be advantageous?
  – lower communication overhead (only n actively communicating processes instead of nk)
  – sometimes a parallelisation via light-weight processes may be worthwhile, in contrast to heavy-weight processes
• example: let's parallelise a program of 100 sec sequential run-time on a cluster of eight dual-processor nodes:
  – 5% inherently sequential, 90% perfectly parallelisable, 5% in principle parallelisable, but with high communication overhead (hence: not feasible with MPI processes, feasible with OpenMP threads)
  – speed-up for pure MPI (16 processes):

    S(p) ≤ 1 / (0.1 + 0.9/16) = 6.4

  – speed-up for hybrid approach:

    S(p) ≤ 1 / (0.05 + 0.05/2 + 0.9/16) ≈ 7.6
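The two bounds follow directly from Amdahl's law; the small helpers below (function names are illustrative) reproduce the arithmetic:

```c
/* Fractions from the example: 0.05 sequential, 0.90 perfectly parallel,
 * 0.05 parallelisable only with threads; 16 CPUs in total. */

/* pure MPI: the thread-only 5% stays sequential (0.05 + 0.05 = 0.1) */
double speedup_pure_mpi(void) {
    return 1.0 / (0.1 + 0.9 / 16.0);
}

/* hybrid: the 5% runs on the two threads of one dual-processor node */
double speedup_hybrid(void) {
    return 1.0 / (0.05 + 0.05 / 2.0 + 0.9 / 16.0);
}
```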

• higher flexibility concerning load balancing (address situations with idle processes)


3.5. Examples of Memory-Coupled Systems

Sun Multiprocessor Workstations

• a classical UMA-SMP family:
  – SPARCstation 10/514 (4 SuperSPARC processors, since 1993)
  – SPARCstation 20 (2 SuperSPARC processors, since 1994)
  – Ultra1 (since 1995) and Ultra2 (up to 2 UltraSPARC processors, since 1997)
  – SPARCserver 1000E (up to 8 SuperSPARC processors)
  – SPARCcenter 2000E (up to 20 SuperSPARC processors)
  – Enterprise server series (since 1997, up to 64 UltraSPARC II processors)
  – Blade 2000 workstations, Fire server series (UltraSPARC III Cu processors, up to 106 CPUs, up to 1000 CPUs per system), V server series (up to 8 UltraSPARC III Cu processors)
• system bus (older systems) or crossbar interconnect (newer systems)
• write invalidate cache policy, bus snooping
• cache coherence: MOESI protocol (fifth state 'owned')


Threads in Solaris

• Solaris 2.0 came with the development of multiprocessor systems
• new: thread concept
  – first objective: support SMP access to shared memory
  – second objective: support development of parallel programs
  – user threads and kernel threads (operating system)
  – lightweight processes: virtual CPU, layer between CPUs and threads
• synchronization mechanisms for Solaris threads:
  – mutex variables: with functions init, destroy, lock, trylock, unlock
  – condition variables: with functions init, destroy, wait, timedwait, signal, broadcast
  – semaphores: with functions init, destroy, wait, trywait, post
  – R/W locks: protection mechanism for objects with frequent READ and rare WRITE accesses; allows for several simultaneous READs or (XOR) one and only one WRITE


Cray DSM Multiprocessors

• DSM: one logical address space that is physically distributed over the processors
• DSM systems are NUMA architectures
• Cray started its multiprocessor activities in 1989
• primary target application: numerical simulations
• generations:
  – first generation: Cray T3D (up to 2048 processors)
  – second generation: Cray T3E (up to 2048 processors, since 1995)
  – successors: influenced by the takeover of Cray Research by SGI
• core features:
  – NCC-NUMA
  – DEC Alpha processors
  – 3D torus topology
  – combination of virtual cut-through and wormhole routing
  – various synchronization mechanisms


Shared-Memory Computer Amadeus

• installed in 2004 at the Universität Stuttgart
• manufactured by HP
• Intel Itanium2 processors with 1.3 GHz clock speed and a 3 MB cache
• 4 processors with a theoretical peak performance of 20.8 GFlop/s and 32 GB shared memory
• contains an InfiniBand card


