CH18 Parallel Processing

• {Multi-processor, Multi-computer}
• Multiple Processor Organizations
• Symmetric Multiprocessors
• Cache Coherence and the MESI Protocol
• Clusters
• Non-Uniform Memory Access
• Vector Computation

TECH Computer Science

=Multiple Processor Organization
• Single instruction, single data stream - SISD
• Single instruction, multiple data stream - SIMD
• Multiple instruction, single data stream - MISD
• Multiple instruction, multiple data stream - MIMD

Single Instruction, Single Data Stream - SISD
• Single processor
• Single instruction stream
• Data stored in single memory
• Uni-processor

[Figure: Parallel Organizations - SISD]

Single Instruction, Multiple Data Stream - SIMD
• Single machine instruction controls simultaneous execution of a number of processing elements
• Lockstep basis
• Each processing element has associated data memory
• Each instruction executed on different set of data by different processors
• Vector and array processors

[Figure: Parallel Organizations - SIMD]

Multiple Instruction, Single Data Stream - MISD
• Sequence of data
• Transmitted to set of processors
• Each processor executes different instruction sequence
• Never been implemented

Multiple Instruction, Multiple Data Stream - MIMD
• Set of processors
• Simultaneously execute different instruction sequences
• Different sets of data
• SMPs, clusters, and NUMA systems

[Figure: Parallel Organizations - MIMD Shared Memory]
[Figure: Parallel Organizations - MIMD Distributed Memory]
[Figure: Taxonomy of Parallel Processor Architectures]

MIMD - Overview
• General purpose processors
• Each can process all instructions necessary
• Further classified by method of processor communication

[Figure: Block Diagram of Tightly Coupled Multiprocessor]

Tightly Coupled - SMP
• Processors share memory
• Communicate via that shared memory
• Symmetric Multiprocessor (SMP)
  - Share single memory or pool
  - Shared bus to access memory
  - Memory access time to given area of memory is approximately the same for each processor

Tightly Coupled - NUMA
• Non-uniform memory access
• Access times to different regions of memory may differ

Loosely Coupled - Clusters
• Collection of independent uni-processors or SMPs
• Interconnected to form a cluster
• Communication via fixed path or network connections

=Symmetric Multiprocessors
• A stand alone computer with the following characteristics
  - Two or more similar processors of comparable capacity
  - Processors share same memory and I/O
  - Processors are connected by a bus or other internal connection
  - Memory access time is approximately the same for each processor
  - All processors share access to I/O
      - Either through same channels or different channels giving paths to same devices
  - All processors can perform the same functions (hence symmetric)
  - System controlled by integrated operating system
      - Providing interaction between processors
      - Interaction at job, task, file and data element levels

SMP Advantages
• Performance
  - If some work can be done in parallel
• Availability
  - Since all processors can perform the same functions, failure of a single processor does not halt the system
• Incremental growth
  - User can enhance performance by adding additional processors
• Scaling
  - Vendors can offer range of products based on number of processors

=Organization Classification (network)
• Time shared or common bus
• Multiport memory
• Central control unit

-Time Shared Bus
• Simplest form
• Structure and interface similar to single processor system
• Following features provided
  - Addressing - distinguish modules on bus
  - Arbitration - any module can be temporary master
  - Time sharing - if one module has the bus, others must wait and may have to suspend
• Now have multiple processors as well as multiple I/O modules

Time Shared Bus - Advantages
• Simplicity
• Flexibility
• Reliability

Time Shared Bus - Disadvantage
• Performance limited by bus cycle time
• Each processor should have local cache
  - Reduce number of bus accesses
• Leads to problems with cache coherence
  - Solved in hardware (see later)
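
The arbitration and time-sharing features of the time shared bus can be illustrated with a small simulation. This is a minimal sketch, not something taken from the slides: the round-robin grant order, the function name `simulate_shared_bus`, and the module names are all assumptions chosen for illustration. It shows one module at a time acting as temporary bus master while the others wait, which is why total time grows with the total number of transfers.

```python
from collections import deque


def simulate_shared_bus(requests, max_cycles=1000):
    """Simulate a time shared bus with round-robin arbitration (assumed policy).

    requests: dict mapping a module name to the number of bus transfers it wants.
    Returns (grants, waited): the list of (cycle, master) grants and a dict of
    how many cycles each module spent waiting with a transfer still pending.
    """
    pending = dict(requests)
    order = deque(sorted(pending))          # fixed rotation order of modules
    waited = {name: 0 for name in pending}
    grants = []

    for cycle in range(max_cycles):
        # Arbitration: pick the next module in rotation that has work pending.
        master = None
        for _ in range(len(order)):
            candidate = order[0]
            order.rotate(-1)
            if pending[candidate] > 0:
                master = candidate
                break
        if master is None:                  # nothing left to transfer
            break

        # Time sharing: the master uses the bus for this cycle...
        pending[master] -= 1
        grants.append((cycle, master))
        # ...and every other module with pending work has to wait.
        for name, left in pending.items():
            if name != master and left > 0:
                waited[name] += 1

    return grants, waited


if __name__ == "__main__":
    grants, waited = simulate_shared_bus({"cpu0": 2, "cpu1": 2, "io0": 1})
    for cycle, master in grants:
        print(f"cycle {cycle}: bus master = {master}")
    print("cycles spent waiting:", waited)
```

With two processors and one I/O module requesting five transfers in total, the bus is busy for five cycles no matter how many processors are attached. This is the bus cycle time limit noted under the disadvantages, and it is why a local cache per processor (fewer bus accesses) matters.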

-Multiport Memory {many access ports}
• Direct independent access of memory modules by each processor
• Logic required to resolve conflicts
• Little or no modification to processors or modules required

Multiport Memory - Advantages and Disadvantages
• More complex
  - Extra logic in memory system
• Better performance
  - Each processor has dedicated path to each module
• Can configure portions of memory as private to one or more processors
  - Increased security
• Write through cache policy

-Central Control Unit
• Funnels separate data streams between independent modules (PE, Memory, I/O)
• Can buffer requests
• Performs arbitration and timing
• Pass status and control
• Perform cache update alerting
• Interfaces to modules remain the same
• e.g. IBM S/370

=Operating System Issues
• Simultaneous concurrent processes
• Scheduling
• Synchronization
• Memory management
• Reliability and fault tolerance

=Cache Coherence
• Problem - multiple copies of same data in different caches
• Can result in an inconsistent view of memory
• Write back policy can lead to inconsistency
• Write through can also give problems unless caches monitor memory traffic

Software Solutions
• Compiler and operating system deal with problem
• Overhead transferred to compile time
• Design complexity transferred from hardware to software
• However, software tends to make conservative decisions
  - Inefficient cache utilization
• Analyze code to determine safe periods for caching shared variables

Hardware Solution
• Cache coherence protocols
• Dynamic recognition of potential problems
• Run time
• More efficient use of cache
• Transparent to programmer
• Directory protocols
• Snoopy protocols

Directory Protocols
• Collect and maintain information about copies of data in cache
• Directory stored in main memory
• Requests are checked against directory
• Appropriate transfers are performed
• Creates central bottleneck
• Effective in large scale systems with complex interconnection schemes
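
The directory idea above (a central record of which caches hold a copy of each line, consulted on every request) can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the protocol of any particular machine: the class name `Directory`, the action names `"invalidate"` and `"write_back"`, and the use of integer cache ids are all hypothetical.

```python
class Directory:
    """Toy directory for cache lines, kept (conceptually) in main memory."""

    def __init__(self):
        self.sharers = {}   # line address -> set of cache ids holding a copy
        self.owner = {}     # line address -> cache id holding a modified copy

    def read(self, cache_id, line):
        """Check a read request against the directory; return required transfers."""
        actions = []
        owner = self.owner.get(line)
        if owner is not None and owner != cache_id:
            # Another cache has the only up-to-date copy: it must write back first.
            actions.append(("write_back", owner))
            del self.owner[line]
        self.sharers.setdefault(line, set()).add(cache_id)
        return actions

    def write(self, cache_id, line):
        """Check a write request; every other copy of the line is invalidated."""
        holders = self.sharers.setdefault(line, set())
        actions = [("invalidate", other) for other in sorted(holders - {cache_id})]
        self.sharers[line] = {cache_id}
        self.owner[line] = cache_id
        return actions


if __name__ == "__main__":
    d = Directory()
    print(d.read(0, 0x40))    # first reader, nothing to do
    print(d.read(1, 0x40))    # second reader, line is now shared
    print(d.write(2, 0x40))   # writer forces invalidation of caches 0 and 1
    print(d.read(0, 0x40))    # reader forces write back from cache 2
```

Every request goes through the single `Directory` object, which makes the central bottleneck noted in the slide easy to see.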

Snoopy Protocols
• Distribute cache coherence responsibility among cache controllers
• Cache recognizes that a line is shared
• Updates announced to other caches
• Suited to bus based multiprocessor
• Increases bus traffic

Write Invalidate
• Multiple readers, one writer
• When a write is required, all other caches of the line are invalidated
• Writing processor then has exclusive (cheap) access until line required by another processor
• Used in Pentium II and PowerPC systems
• State of every line is marked as modified, exclusive, shared or invalid
• MESI

Write Update
• Multiple readers and writers
• Updated word is distributed to all other processors
• Some systems use an adaptive mixture of both solutions

[Figure: MESI State Transition Diagram]

=Clusters
• Alternative to SMP
• High performance
• High availability
• Server applications
• A group of interconnected whole computers
• Working together as unified resource
• Illusion of being one machine
• Each computer called a node

Cluster Benefits
• Absolute scalability
• Incremental scalability
• High availability
• Superior price/performance

[Figure: Cluster Configurations - Standby Server, No Shared Disk]
[Figure: Cluster Configurations - Shared Disk]

Cluster Configurations
• Passive standby
• Active secondary
• Separate servers
• Servers connected to disks
• Servers share disks

Operating Systems Issues
• Failure management
  - Highly available
  - Failover
  - Failback
• Load balancing

Clusters v SMP
• Both use multiple processors for high demand applications
• SMP is easier to manage
• SMP takes less physical space and less power
• SMP established and stable technology
• Clusters are better for incremental and absolute scalability
• Clusters are better for availability

=Non-Uniform Memory Access
NUMA
• Uniform memory access
  - All processors have access to all parts of main memory
  - Access time to all regions of memory the same
  - Access time by all processors the same
• Non-uniform memory access
  - All processors have access to all memory using load and store
  - Access time depends on region of memory being accessed
  - Different processors access different regions of memory at different speeds
• Cache-coherent NUMA
  - Cache coherence is maintained

[Figure: CC-NUMA Organization]

NUMA Pros and Cons
• Effective performance at higher level of parallelism than SMP
• Not transparent like SMP
  - Need software changes
• Availability

Required Reading
• Stallings Chapter 16
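
Worked example for the Write Invalidate slide: the four MESI states (modified, exclusive, shared, invalid) can be traced for a single cache line with a short function. This is a minimal sketch under simplifying assumptions: the function name `mesi_access` is hypothetical, only the resulting states are modelled, and the bus transactions and write-back of modified data are reduced to comments.

```python
def mesi_access(states, cpu, op):
    """Return the new MESI states of one cache line after a processor access.

    states: list of state letters ("M", "E", "S", "I"), one per cache.
    cpu:    index of the processor performing the access.
    op:     "read" or "write".
    """
    states = list(states)                    # do not modify the caller's list
    others = [i for i in range(len(states)) if i != cpu]

    if op == "read":
        if states[cpu] == "I":               # read miss: line must be fetched
            held_elsewhere = any(states[i] != "I" for i in others)
            for i in others:
                if states[i] in ("M", "E"):
                    # A snooping cache with M (after writing back) or E
                    # drops to shared when it sees the read on the bus.
                    states[i] = "S"
            states[cpu] = "S" if held_elsewhere else "E"
        # Read hit (M, E or S): no state change.
    elif op == "write":
        # Write invalidate: all other copies of the line become invalid,
        # and the writer holds the only (modified) copy.
        for i in others:
            states[i] = "I"
        states[cpu] = "M"
    else:
        raise ValueError("op must be 'read' or 'write'")
    return states


if __name__ == "__main__":
    line = ["I", "I", "I"]
    for cpu, op in [(0, "read"), (1, "read"), (0, "write"), (2, "read")]:
        line = mesi_access(line, cpu, op)
        print(f"cpu{cpu} {op:5s} -> {line}")
```

The trace shows the "multiple readers, one writer" rule: after the write by cpu0 every other copy is invalid, and the line only becomes shared again when another processor reads it.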
