POWER5 system microarchitecture
B. Sinharoy, R. N. Kalla, J. M. Tendler, R. J. Eickemeyer, J. B. Joyner
This paper describes the implementation of the IBM POWER5 chip, a two-way simultaneous multithreaded dual-core chip, and systems based on it. With a key goal of maintaining both binary and structural compatibility with POWER4 systems, the POWER5 microprocessor allows system scalability to 64 physical processors. A POWER5 system allows both single-threaded and multithreaded execution modes. In single-threaded execution mode, a POWER5 system allows for higher performance than its predecessor POWER4 system at equivalent frequencies. In multithreaded execution mode, the POWER5 microprocessor implements dynamic resource balancing to ensure that each thread receives its fair share of system resources. Additionally, software-settable thread priority is enforced by the POWER5 hardware. To conserve power, the POWER5 chip implements dynamic power management that allows reduced power consumption without affecting performance.
Introduction
IBM introduced the POWER4* systems in 2001 [1]. Incorporating many features previously included only in mainframe systems, POWER4 systems also included unique packaging and system architecture. A POWER4 chip integrates onto a single die two processor cores, a shared second-level cache, a directory for an off-chip third-level cache, and the circuitry necessary to interconnect it with other POWER4 chips to form a system. In addition to the thread-level parallelism inherent in the dual-processor chip, high instruction-level parallelism was achieved through an out-of-order execution design.
The POWER5* chip is the next-generation chip. In designing the POWER5 system, a key goal was to maintain both binary and structural compatibility with existing POWER4 systems to ensure not only that binaries would continue to execute properly, but that all application optimizations would carry forward to newer systems. With that as a base requirement, we specified increased performance and other server functional enhancements in the areas of server virtualization, reliability, availability, and serviceability at both the chip and system levels.
In this paper, we describe the approach used in improving performance. We enhanced thread-level parallelism by allowing two threads, or instruction streams, to execute on each of the two processor cores.
We first present background information on different multithreaded hardware implementations. We then provide a brief overview of the POWER5 structure and contrast it to a POWER4 system. With that as background, we describe the POWER5 simultaneous multithreading (SMT) implementation in detail. We then describe the memory subsystem and how systems using more than a single POWER5 chip are formed. As SMT makes greater use of chip resources, it consumes greater power. To counteract this, the POWER5 chip incorporates dynamic power management into its design to allow the processor to save switching power when it is not needed without affecting performance. This capability is also described. Finally, the reliability, availability, and serviceability (RAS) enhancements in POWER5 systems are briefly described.
Multithreading background
Typical advanced microprocessors have executed instructions from a single instruction stream. Performance has improved over the years through many architectural techniques, such as caches, branch prediction, and out-of-order execution. These lead to improved performance at a given processor frequency by increasing instruction-level parallelism. At the same time, through the use of longer pipelines and fewer logic levels per stage, processor frequencies have been increasing more rapidly than the technology. Despite the