Packaging the Blue Gene/L supercomputer

P. Coteus, H. R. Bickford, T. M. Cipolla, P. G. Crumley, A. Gara, S. A. Hall, G. V. Kopcsay, A. P. Lanzetta, L. S. Mok, R. Rand, R. Swetz, T. Takken, P. La Rocca, C. Marroquin, P. R. Germann, M. J. Jeanson

As 1999 ended, IBM announced its intention to construct a one-petaflop supercomputer. The construction of this system was based on a cellular architecture—the use of relatively small but powerful building blocks used together in sufficient quantities to construct large systems. The first step on the road to a petaflop machine (one quadrillion floating-point operations in a second) is the Blue Gene/L supercomputer. Blue Gene/L combines a low-power processor with a highly parallel architecture to achieve unparalleled computing performance per unit volume. Implementing the Blue Gene/L packaging involved trading off considerations of cost, power, cooling, signaling, electromagnetic radiation, mechanics, component selection, cabling, reliability, service strategy, risk, and schedule. This paper describes how 1,024 dual-processor compute application-specific integrated circuits (ASICs) are packaged in a scalable rack, and how racks are combined and augmented with host computers and remote storage. The Blue Gene/L interconnect, power, cooling, and control systems are described individually and as part of the synergistic whole.
Overview

Late in 1999, IBM announced its intention to construct a one-petaflop supercomputer [1]. Blue Gene*/L (BG/L) is a massively parallel supercomputer developed at the IBM Thomas J. Watson Research Center in partnership with the IBM Engineering and Technology Services Division, under contract with the Lawrence Livermore National Laboratory (LLNL). The largest system now being assembled consists of 65,536 (64Ki¹) compute nodes in 64 racks, which can be organized as several different systems, each running a single software image. Each node consists of a low-power, dual-processor, system-on-a-chip application-specific integrated circuit (ASIC)—the BG/L compute ASIC (BLC or compute chip)—and its associated external memory. These nodes are connected with five networks: a three-dimensional (3D) torus, a global collective network, a global barrier and interrupt network, an input/output (I/O) network which uses Gigabit Ethernet, and a service network formed of Fast Ethernet (100 Mb) and JTAG (IEEE Standard 1149.1, developed by the Joint Test Action Group). The implementation of these networks resulted in an additional link ASIC, a control field-programmable gate array (FPGA), six major circuit cards, and custom designs for a rack, cable network, clock network, power system, and cooling system—all part of the BG/L core complex.

System overview

Although an entire BG/L system can be configured to run a single application, the usual method is to partition the machine into smaller systems. For example, a 20-rack (20Ki-node) system being assembled at the IBM Thomas J. Watson Research Center can be considered as four rows of four compute racks (16Ki nodes), with a standby set of four racks that can be used for failover. Blue Gene/L also contains two host computers to control the machine and to prepare jobs for launch; I/O racks of redundant arrays of independent disk drives (RAID); and switch racks of Gigabit Ethernet to connect the compute racks, the I/O racks, and the host computers. The host, I/O racks, and switch racks are not described in this paper except in reference to interconnection. Figure 1 shows a top view of the 65,536 compute processors cabled as a single system.

¹The unit "Ki" indicates a "kibi"—the binary equivalent of kilo (K). See http://physics.nist.gov/cuu/Units/binary.html.
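The node counts above, and the wraparound property that distinguishes a 3D torus from a simple 3D mesh, can be sketched in a few lines. This is an illustrative sketch, not code from the paper: the `torus_neighbors` helper and the 64 x 32 x 32 dimensions used in the example are assumptions chosen to match the 64-rack system size (64 racks x 1,024 nodes = 64Ki nodes).

```python
# Illustrative sketch (not from the paper): the arithmetic behind the
# BG/L node counts, and nearest neighbors in a 3D torus network.

KIBI = 1024  # "Ki" = kibi, the binary equivalent of kilo

racks = 64
nodes_per_rack = 1024  # 1,024 dual-processor compute ASICs per rack
total_nodes = racks * nodes_per_rack
assert total_nodes == 64 * KIBI == 65536

def torus_neighbors(x, y, z, dims=(64, 32, 32)):
    """Return the six nearest neighbors of node (x, y, z) in a 3D torus.

    Each dimension wraps around, so every node has exactly six
    neighbors regardless of position; a mesh, by contrast, would
    leave nodes on the faces with fewer links.
    """
    X, Y, Z = dims
    return [
        ((x + 1) % X, y, z), ((x - 1) % X, y, z),
        (x, (y + 1) % Y, z), (x, (y - 1) % Y, z),
        (x, y, (z + 1) % Z), (x, y, (z - 1) % Z),
    ]

# A corner node still has six neighbors thanks to wraparound:
print(torus_neighbors(0, 0, 0))
# -> [(1, 0, 0), (63, 0, 0), (0, 1, 0), (0, 31, 0), (0, 0, 1), (0, 0, 31)]
```

The wraparound links are what allow any partition of the machine to present the same uniform topology to an application, which is one reason partitioning (as in the 20-rack example above) is practical.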