THE FLASH MULTIPROCESSOR: DESIGNING A FLEXIBLE AND SCALABLE SYSTEM

Jeffrey Scott Kuskin

Technical Report No. CSL-TR-97-744
November 1997

This research has been supported by DARPA contract DABT63-94-C-0054.

Computer Systems Laboratory
Departments of Electrical Engineering and Computer Science
Stanford University
William Gates Computer Science Building, A-408
Stanford, CA 94305-9040
[email protected]

Abstract

The choice of a communication paradigm, or protocol, is central to the design of a large-scale multiprocessor system. Unlike traditional multiprocessors, the FLASH machine uses a programmable node controller, called MAGIC, to implement all protocol processing. The architecture of the MAGIC chip allows FLASH to support multiple communication paradigms, in particular cache-coherent shared memory and high-performance message passing, while minimizing both hardware and software overhead.

Each node in FLASH contains a microprocessor, a portion of the machine's global memory, a port to the interconnection network, an I/O interface, and MAGIC, the custom node controller. The MAGIC chip handles all communication both within the node and among nodes, using hardwired data paths for efficient data movement and a programmable processor optimized for executing protocol operations. The result is a system that is flexible and scalable, yet competitive in performance with a traditional multiprocessor that implements a single communication paradigm completely in hardware.

The focus of this dissertation is the architecture, design, and performance of FLASH. Much of the motivation behind the FLASH system and the MAGIC node controller design stems from an examination of the characteristics of protocol code and the architecture of the DASH system, the predecessor to FLASH.