CH18 Parallel Processing =Multiple Processor Organization Single Instruction, Single Data Stream

CH18 Parallel Processing =Multiple Processor Organization Single Instruction, Single Data Stream

CH18 Parallel Processing =Multiple Processor Organization • {Multi-processor, Multi-computer} • Single instruction, single data stream - SISD • Multiple Processor Organizations • Single instruction, multiple data stream - SIMD • Symmetric Multiprocessors • Multiple instruction, single data stream - MISD • Cache Coherence and the MESI Protocol • Multiple instruction, multiple data stream- MIMD • Clusters • Non-Uniform Memory Access • Vector Computation TECH Computer Science Single Instruction, Single Data Stream - SISD Parallel Organizations - SISD • Single processor • Single instruction stream • Data stored in single memory • Uni-processor Single Instruction, Multiple Data Stream - SIMD Parallel Organizations - SIMD • Single machine instruction • Controls simultaneous execution • Number of processing elements • Lockstep basis • Each processing element has associated data memory • Each instruction executed on different set of data by different processors • Vector and array processors 1 Multiple Instruction, Multiple Instruction, Single Data Stream - MISD Multiple Data Stream- MIMD • Sequence of data • Set of processors • Transmitted to set of processors • Simultaneously execute different instruction • Each processor executes different instruction sequences sequence • Different sets of data • Never been implemented • SMPs, clusters, and NUMA systems Parallel Organizations - MIMD Shared Parallel Organizations - MIMD Memory Distributed Memory Taxonomy of Parallel Processor Architectures MIMD - Overview • General purpose processors • Each can process all instructions necessary • Further classified by method of processor communication 2 Block Diagram of Tightly Coupled Multiprocessor Tightly Coupled - SMP • Processors share memory • Communicate via that shared memory • Symmetric Multiprocessor (SMP) 4Share single memory or pool 4Shared bus to access memory 4Memory access time to given area of memory is approximately the same for each processor Tightly Coupled - NUMA Loosely Coupled - Clusters • Non-uniform memory access • Collection of independent uni-processors or SMPs • Access times to different regions of memory may • Interconnected to form a cluster differ • Communication via fixed path or network connections =Symmetric Multiprocessors SMP Advantages • A stand alone computer with the following characteristics • Performance 4 Two or more similar processors of comparable capacity 4If some work can be done in parallel 4 Processors share same memory and I/O • Availability 4 Processors are connected by a bus or other internal connection 4 Memory access time is approximately the same for each processor 4Since all processors can perform the same functions, 4 All processors share access to I/O failure of a single processor does not halt the system f Either through same channels or different channels giving paths to same • Incremental growth devices 4 All processors can perform the same functions (hence symmetric) 4User can enhance performance by adding additional 4 System controlled by integrated operating system processors f providing interaction between processors • Scaling f Interaction at job, task, file and data element levels 4Vendors can offer range of products based on number of processors 3 =Organization Classification (network) -Time Shared Bus • Time shared or common bus • Simplest form • Multiport memory • Structure and interface similar to single processor • Central control unit system • Following features provided 4Addressing - distinguish modules on bus 4Arbitration - any module can be temporary master 4Time sharing - if one module has the bus, others must wait and may have to suspend • Now have multiple processors as well as multiple I/O modules Time Share Bus - Advantages Time Share Bus - Disadvantage • Simplicity • Performance limited by bus cycle time • Flexibility • Each processor should have local cache • Reliability 4Reduce number of bus accesses • Leads to problems with cache coherence 4Solved in hardware - see later Multiport Memory – -Multiport Memory {many access ports} Advantages and Disadvantages • Direct independent access of memory modules by • More complex each processor 4Extra login in memory system • Logic required to resolve conflicts • Better performance • Little or no modification to processors or modules 4Each processor has dedicated path to each module required • Can configure portions of memory as private to one or more processors 4Increased security • Write through cache policy 4 -Central Control Unit =Operating System Issues • Funnels separate data streams between • Simultaneous concurrent processes independent modules (PE, Memory, I/O) • Scheduling • Can buffer requests • Synchronization • Performs arbitration and timing • Memory management • Pass status and control • Reliability and fault tolerance • Perform cache update alerting • Interfaces to modules remain the same • e.g. IBM S/370 =Cache Coherence Software Solutions • Problem - multiple copies of same data • Compiler and operating system deal with problem in different caches • Overhead transferred to compile time • Can result in an inconsistent view of memory • Design complexity transferred from hardware to • Write back policy can lead to inconsistency software • Write through can also give problems unless caches • However, software tends to make conservative monitor memory traffic decisions 4Inefficient cache utilization • Analyze code to determine safe periods for caching shared variables Hardware Solution Directory Protocols • Cache coherence protocols • Collect and maintain information about copies of data • Dynamic recognition of potential problems in cache • Run time • Directory stored in main memory • More efficient use of cache • Requests are checked against directory • Transparent to programmer • Appropriate transfers are performed • Directory protocols • Creates central bottleneck • Snoopy protocols • Effective in large scale systems with complex interconnection schemes 5 Snoopy Protocols Write Invalidate • Distribute cache coherence responsibility among • Multiple readers, one writer cache controllers • When a write is required, all other caches of the line • Cache recognizes that a line is shared are invalidated • Updates announced to other caches • Writing processor then has exclusive (cheap) access • Suited to bus based multiprocessor until line required by another processor • Increases bus traffic • Used in Pentium II and PowerPC systems • State of every line is marked as modified, exclusive, shared or invalid • MESI Write Update MESI State Transition Diagram • Multiple readers and writers • Updated word is distributed to all other processors • Some systems use an adaptive mixture of both solutions =Clusters Cluster Benefits • Alternative to SMP • Absolute scalability • High performance • Incremental scalability • High availability • High availability • Server applications • Superior price/performance • A group of interconnected whole computers • Working together as unified resource • Illusion of being one machine • Each computer called a node 6 Cluster Configurations - Standby Server, Cluster Configurations - No Shared Disk Shared Disk Cluster Configurations Operating Systems Issues // • Passive standby • Failure management • Active secondary 4Highly available • Separate servers 4Failover 4Failback • Servers connected to disks • Load balancing • Servers share disks =Non-Uniform Memory Access Clusters v SMP NUMA • Both use multiple processors for high demand • Uniform memory access applications 4All processors have access to all pats of main memory • SMP is easier to manage 4Access time to all regions of memory the same 4Access time by all processors the same • SMP takes less physical space and less power • Non-uniform memory Access • SMP established and stable technology 4All processors have access to all memory using load and • Clusters are better for incremental and absolute store scalability 4Access time depends on region of memory being accessed • Clusters are better for availability 4Different processors access different regions of memory at different speeds • Cache-coherent NUMA 4Cache coherence is maintained 7 CC-NUMA Organization NUMA Pros and Cons • Effective performance at higher level of parallelism than SMP • Not transparently like SMP 4Need software changes • Availability Required Reading • Stallings Chapter 16 8.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    8 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us