CH18 Parallel Processing {Multi-processor, Multi-}
• Multiple Processor Organizations
• Symmetric Multiprocessors
• Cache Coherence and the MESI Protocol
• Clusters
• Non-Uniform Memory Access
• Vector Computation

=Multiple Processor Organization
• Single instruction, single data stream - SISD
• Single instruction, multiple data stream - SIMD
• Multiple instruction, single data stream - MISD
• Multiple instruction, multiple data stream - MIMD

Single Instruction, Single Data Stream - SISD
• Single processor
• Single instruction stream
• Data stored in single memory
• Uni-processor

[Figure: Parallel Organizations - SISD]

Single Instruction, Multiple Data Stream - SIMD
• Single machine instruction
• Controls simultaneous execution
• Number of processing elements
• Lockstep basis
• Each processing element has associated data memory
• Each instruction executed on different set of data by different processors
• Vector and array processors

[Figure: Parallel Organizations - SIMD]
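The lockstep idea on the SIMD slide can be sketched as a toy simulation. This is a minimal sketch, not a model of any real machine: the `ProcessingElement` class and the `broadcast`, `load` and `add` helpers are hypothetical names introduced here, and the processing elements are stepped one after another only because the simulation is sequential.

```python
from dataclasses import dataclass, field

@dataclass
class ProcessingElement:
    """One PE with its own private data memory (hypothetical model)."""
    memory: list = field(default_factory=list)
    acc: float = 0.0

def broadcast(pes, instruction):
    """The control unit issues ONE instruction; every PE applies it to its own data."""
    for pe in pes:  # conceptually simultaneous; sequential only in this simulation
        instruction(pe)

def load(addr):
    """Build a 'load accumulator from local memory[addr]' instruction."""
    def op(pe):
        pe.acc = pe.memory[addr]
    return op

def add(addr):
    """Build an 'add local memory[addr] to accumulator' instruction."""
    def op(pe):
        pe.acc += pe.memory[addr]
    return op

# Four PEs, each holding a different pair of values in its local memory.
pes = [ProcessingElement(memory=[i, 10 * i]) for i in range(4)]

# A single instruction stream drives all PEs in lockstep.
for instr in (load(0), add(1)):
    broadcast(pes, instr)

results = [pe.acc for pe in pes]
```

Each PE runs the same two instructions but ends with a different result, because each works on its own data memory.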

Multiple Instruction, Single Data Stream - MISD
• Sequence of data
• Transmitted to set of processors
• Each processor executes different instruction sequence
• Never been implemented

Multiple Instruction, Multiple Data Stream - MIMD
• Set of processors
• Simultaneously execute different instruction sequences
• Different sets of data
• SMPs, clusters, and NUMA systems

[Figure: Parallel Organizations - MIMD Shared Memory]

[Figure: Parallel Organizations - MIMD Distributed Memory]

[Figure: Taxonomy of Parallel Processor Architectures]

MIMD - Overview
• General purpose processors
• Each can process all instructions necessary
• Further classified by method of processor communication

[Figure: Block Diagram of Tightly Coupled Multiprocessor]

Tightly Coupled - SMP
• Processors share memory
• Communicate via that shared memory
• Symmetric Multiprocessor (SMP)
  ◦ Share single memory or pool
  ◦ Shared bus to access memory
  ◦ Memory access time to given area of memory is approximately the same for each processor

Tightly Coupled - NUMA
• Non-uniform memory access
• Access times to different regions of memory may differ

Loosely Coupled - Clusters
• Collection of independent uni-processors or SMPs
• Interconnected to form a cluster
• Communication via fixed path or network connections

=Symmetric Multiprocessors
• A stand alone computer with the following characteristics
  ◦ Two or more similar processors of comparable capacity
  ◦ Processors share same memory and I/O
  ◦ Processors are connected by a bus or other internal connection
  ◦ Memory access time is approximately the same for each processor
  ◦ All processors share access to I/O
    ▪ Either through same channels or different channels giving paths to same devices
  ◦ All processors can perform the same functions (hence symmetric)
  ◦ System controlled by integrated operating system
    ▪ Providing interaction between processors
    ▪ Interaction at job, task, file and data element levels

SMP Advantages
• Performance
  ◦ If some work can be done in parallel
• Availability
  ◦ Since all processors can perform the same functions, failure of a single processor does not halt the system
• Incremental growth
  ◦ User can enhance performance by adding additional processors
• Scaling
  ◦ Vendors can offer range of products based on number of processors

=Organization Classification (network)
• Time shared or common bus
• Multiport memory
• Central control unit

-Time Shared Bus
• Simplest form
• Structure and interface similar to single processor system
• Following features provided
  ◦ Addressing - distinguish modules on bus
  ◦ Arbitration - any module can be temporary master
  ◦ Time sharing - if one module has the bus, others must wait and may have to suspend
• Now have multiple processors as well as multiple I/O modules
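The arbitration and time-sharing features of the time shared bus can be illustrated with a small simulation. The slides do not say which arbitration policy is used, so round-robin is an assumption made here; `run_bus` and the module names are hypothetical.

```python
from collections import deque

def run_bus(requests, cycles):
    """Simulate a time-shared bus with round-robin arbitration (an assumed policy).

    requests maps a module name to the number of bus cycles it still needs.
    Returns the bus master for each cycle (None when the bus is idle).
    """
    pending = dict(requests)
    order = deque(pending)  # fixed rotation order of the modules
    trace = []
    for _cycle in range(cycles):
        master = None
        for _attempt in range(len(order)):
            candidate = order[0]
            order.rotate(-1)  # the following module gets first chance next time
            if pending[candidate] > 0:
                master = candidate
                break
        if master is not None:
            pending[master] -= 1  # one transfer; every other module waits this cycle
        trace.append(master)
    return trace

trace = run_bus({"cpu0": 2, "cpu1": 1, "io0": 1}, cycles=5)
```

Only one module is master in any cycle, which is why performance is bounded by the bus cycle time once several processors compete for it.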

Time Shared Bus - Advantages
• Simplicity
• Flexibility
• Reliability

Time Shared Bus - Disadvantage
• Performance limited by bus cycle time
• Each processor should have local cache
  ◦ Reduce number of bus accesses
• Leads to problems with cache coherence
  ◦ Solved in hardware - see later

-Multiport Memory {many access ports}
• Direct independent access of memory modules by each processor
• Logic required to resolve conflicts
• Little or no modification to processors or modules required

Multiport Memory - Advantages and Disadvantages
• More complex
  ◦ Extra logic in memory system
• Better performance
  ◦ Each processor has dedicated path to each module
• Can configure portions of memory as private to one or more processors
  ◦ Increased security
• Write through cache policy

-Central Control Unit
• Funnels separate data streams between independent modules (PE, Memory, I/O)
• Can buffer requests
• Performs arbitration and timing
• Pass status and control
• Perform cache update alerting
• Interfaces to modules remain the same
• e.g. IBM S/370

=Operating System Issues
• Simultaneous concurrent processes
• Scheduling
• Synchronization
• Memory management
• Reliability and fault tolerance

=Cache Coherence
• Problem - multiple copies of same data in different caches
• Can result in an inconsistent view of memory
• Write back policy can lead to inconsistency
• Write through can also give problems unless caches monitor memory traffic

Software Solutions
• Compiler and operating system deal with problem
• Overhead transferred to compile time
• Design complexity transferred from hardware to software
• However, software tends to make conservative decisions
  ◦ Inefficient cache utilization
• Analyze code to determine safe periods for caching shared variables
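The inconsistency described on the Cache Coherence slide can be reproduced with two private caches and no coherence mechanism. This is a minimal sketch under that assumption; `WriteBackCache` is a hypothetical class and each cache line holds a single value.

```python
class WriteBackCache:
    """A private cache with a write-back policy and NO coherence mechanism."""

    def __init__(self, memory):
        self.memory = memory  # shared main memory: address -> value
        self.lines = {}       # locally cached copies: address -> value
        self.dirty = set()    # addresses modified locally but not yet written back

    def read(self, addr):
        if addr not in self.lines:  # miss: fetch the value from main memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value  # update only the local copy
        self.dirty.add(addr)      # main memory is updated later, on flush

    def flush(self):
        for addr in self.dirty:
            self.memory[addr] = self.lines[addr]
        self.dirty.clear()

memory = {0x10: 1}
cache_a, cache_b = WriteBackCache(memory), WriteBackCache(memory)
cache_a.read(0x10)
cache_b.read(0x10)                     # both caches now hold a copy
cache_a.write(0x10, 99)                # A changes its own copy only
stale_in_b = cache_b.read(0x10)        # B still sees the old value
stale_in_memory = memory[0x10]         # so does main memory (write back)
cache_a.flush()                        # memory is now up to date
still_stale_in_b = cache_b.read(0x10)  # B's copy is stale even after the flush
```

The last read shows why updating memory alone is not enough: B keeps hitting on its stale copy unless something invalidates or updates it, which is the job of the hardware protocols that follow.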

Hardware Solution
• Cache coherence protocols
• Dynamic recognition of potential problems
• Run time
• More efficient use of cache
• Transparent to programmer
• Directory protocols
• Snoopy protocols

Directory Protocols
• Collect and maintain information about copies of data in cache
• Directory stored in main memory
• Requests are checked against directory
• Appropriate transfers are performed
• Creates central bottleneck
• Effective in large scale systems with complex interconnection schemes
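The bookkeeping behind a directory protocol can be sketched as follows. The slides do not give a directory layout, so the `Directory` class, its two dictionaries and the method names are assumptions for illustration; data transfers themselves are left out.

```python
class Directory:
    """Central directory, kept with main memory, tracking which caches hold each line."""

    def __init__(self):
        self.sharers = {}  # address -> set of cache ids holding a copy
        self.owner = {}    # address -> cache id with exclusive (writable) access

    def read_request(self, cache_id, addr):
        """Record a new reader. Returns the previous exclusive owner that must
        write the line back first, or None if there is none."""
        previous_owner = self.owner.pop(addr, None)
        self.sharers.setdefault(addr, set()).add(cache_id)
        return previous_owner if previous_owner != cache_id else None

    def write_request(self, cache_id, addr):
        """Grant exclusive access. Returns the caches that must invalidate their copy."""
        to_invalidate = self.sharers.get(addr, set()) - {cache_id}
        self.sharers[addr] = {cache_id}
        self.owner[addr] = cache_id
        return to_invalidate

directory = Directory()
directory.read_request(0, 0x40)
directory.read_request(1, 0x40)
invalidated = directory.write_request(0, 0x40)  # cache 1 must drop its copy
write_back_from = directory.read_request(2, 0x40)  # cache 0 must write back first
```

Every request passes through this one structure, which is the central bottleneck the slide mentions.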

Snoopy Protocols
• Distribute cache coherence responsibility among cache controllers
• Cache recognizes that a line is shared
• Updates announced to other caches
• Suited to bus based multiprocessor
• Increases bus traffic

Write Invalidate
• Multiple readers, one writer
• When a write is required, all other caches of the line are invalidated
• Writing processor then has exclusive (cheap) access until line required by another processor
• Used in Pentium II and PowerPC systems
• State of every line is marked as modified, exclusive, shared or invalid
• MESI
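The four MESI states and the write-invalidate rule can be traced with a small simulation of one cache line. This is a simplified sketch: it tracks states only, treats the write-back of a modified line as implicit, and uses the hypothetical helpers `mesi_read` and `mesi_write`.

```python
def mesi_read(states, c):
    """Processor c reads the line. states holds one MESI letter per cache."""
    if states[c] != "I":
        return  # read hit: no bus transaction and no state change
    others_hold_copy = False
    for i, state in enumerate(states):
        if i != c and state != "I":
            others_hold_copy = True
            states[i] = "S"  # M (after an implicit write-back), E and S all become S
    states[c] = "S" if others_hold_copy else "E"

def mesi_write(states, c):
    """Processor c writes the line: every other copy is invalidated."""
    for i in range(len(states)):
        if i != c:
            states[i] = "I"
    states[c] = "M"

states = ["I", "I", "I"]  # one line, three caches, nobody holds it yet
history = []
operations = [(mesi_read, 0), (mesi_read, 1), (mesi_write, 1),
              (mesi_read, 2), (mesi_write, 0)]
for operation, cpu in operations:
    operation(states, cpu)
    history.append("".join(states))
```

Each entry in `history` lists the state of the line in caches 0, 1 and 2 after one operation, so the trace can be compared step by step with a MESI state transition diagram.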

[Figure: MESI State Transition Diagram]

Write Update
• Multiple readers and writers
• Updated word is distributed to all other processors
• Some systems use an adaptive mixture of both solutions
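For contrast with write invalidate, a write-update step can be sketched as below. Caches are modelled as plain dictionaries and `write_update` is a hypothetical helper; the point is only that remote copies are refreshed rather than discarded.

```python
def write_update(caches, writer, addr, value):
    """Write update: the new word is broadcast to every cache holding the line."""
    caches[writer][addr] = value
    updated = 0
    for i, cache in enumerate(caches):
        if i != writer and addr in cache:
            cache[addr] = value  # the other copy stays valid and current
            updated += 1
    return updated  # number of remote copies refreshed by the broadcast

caches = [{0x20: 5}, {0x20: 5}, {}]  # cache 2 does not hold the line
refreshed = write_update(caches, writer=0, addr=0x20, value=7)
```

Every write to a shared line costs a broadcast here, whereas write invalidate pays once and then writes locally, which is one reason a system might mix the two approaches.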

=Clusters
• Alternative to SMP
• High performance
• High availability
• Server applications
• A group of interconnected whole computers
• Working together as unified resource
• Illusion of being one machine
• Each computer called a node

Cluster Benefits
• Absolute scalability
• Incremental scalability
• High availability
• Superior price/performance

[Figure: Cluster Configurations - Standby Server, No Shared Disk]

[Figure: Cluster Configurations - Shared Disk]

Cluster Configurations
• Passive standby
• Active secondary
• Separate servers
• Servers connected to disks
• Servers share disks

Operating Systems Issues
• Failure management
  ◦ Highly available
  ◦ Failover
  ◦ Failback
• Load balancing
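Failover and failback in a passive standby configuration can be sketched with a two-node model. The heartbeat mechanism, the `Cluster` class and the node names are assumptions made for illustration; the slides name the concepts but not how failures are detected.

```python
class Cluster:
    """Two-node passive-standby cluster (hypothetical model)."""

    def __init__(self, primary, standby):
        self.primary, self.standby = primary, standby
        self.active = primary  # node currently running the service
        self.healthy = {primary: True, standby: True}

    def heartbeat(self, node, ok):
        """Record a node's health and move the service if needed."""
        self.healthy[node] = ok
        if not self.healthy[self.active]:
            # Failover: switch the service to the other node if it is up.
            other = self.standby if self.active == self.primary else self.primary
            if self.healthy[other]:
                self.active = other
        elif self.active != self.primary and self.healthy[self.primary]:
            # Failback: return the service to the original node once it is repaired.
            self.active = self.primary
        return self.active

cluster = Cluster("node-a", "node-b")
after_failure = cluster.heartbeat("node-a", ok=False)  # primary goes down
after_repair = cluster.heartbeat("node-a", ok=True)    # primary comes back
```

Automatic failback as written here is a design choice; an operator-triggered failback would simply drop the `elif` branch.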

Clusters vs. SMP
• Both use multiple processors for high demand applications
• SMP is easier to manage
• SMP takes less physical space and less power
• SMP established and stable technology
• Clusters are better for incremental and absolute scalability
• Clusters are better for availability

=Non-Uniform Memory Access (NUMA)
• Uniform memory access
  ◦ All processors have access to all parts of main memory
  ◦ Access time to all regions of memory the same
  ◦ Access time by all processors the same
• Non-uniform memory access
  ◦ All processors have access to all memory using load and store
  ◦ Access time depends on region of memory being accessed
  ◦ Different processors access different regions of memory at different speeds
• Cache-coherent NUMA
  ◦ Cache coherence is maintained
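The effect of region-dependent access time can be made concrete with a weighted average. The function name and the latency figures below are illustrative assumptions, not measurements from any system.

```python
def average_access_time(local_fraction, t_local_ns, t_remote_ns):
    """Expected memory access time when a given fraction of accesses
    goes to the processor's local memory region."""
    return local_fraction * t_local_ns + (1.0 - local_fraction) * t_remote_ns

# Illustrative latencies only: 100 ns for local memory, 400 ns for remote memory.
mostly_local = average_access_time(0.9, 100, 400)
half_remote = average_access_time(0.5, 100, 400)
```

With these assumed numbers the average rises from about 130 ns to 250 ns as the share of remote accesses grows from 10% to 50%, which is why data placement matters on a NUMA system.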

[Figure: CC-NUMA Organization]

NUMA Pros and Cons
• Effective performance at higher levels of parallelism than SMP
• Not as transparent as SMP
  ◦ Need software changes
• Availability

Required Reading
• Stallings Chapter 16
