CRAY SV1 Series Service Update Bulletin
Total Page:16
File Type:pdf, Size:1020Kb
CRAY® SV1 Series Service Update Bulletin CRAY® SV1 Series Systems Last Modified: Today June 1998 System Description The new CRAY SV1 series product line provides the following performance enhancements over the successful CRAY J90 product line: • Significantly improved vector and scalar performance • Improved CPU data paths • Improved cache performance • Enhanced instruction set CRAY SV1 series mainframes use the same power, cooling, and control mechanisms and the same memory module design as CRAY J90 series mainframes. The CRAY SV1 processor module is a field-upgradable replacement for the CRAY J90 or CRAY J90se processor module. The I/O system used with the CRAY SV1 can be either the current VME I/O system on upgraded CRAY J90 series systems or scalable I/O architecture (SIO), which is standard on all new CRAY SV1 series systems. SWS Functionality The SWS-ION 3.9 release introduces some of the features required for the CRAY SV1 mainframe, but it is not fully supported yet. Therefore, although the following commands are present, they should not be used: • bootsv1(8) • checksv1(8) • clearsv1(8) • dumpsv1(8) • haltsv1(8) The CRAY SV1 type is provided in the topology(5) file. However, it should not be used yet. Silicon Graphics Proprietary 1 6/16/98 System Configurations CRAY® SV1 Series Service Update Bulletin System Configurations CRAY SV1-1A System The CRAY SV1-1A system is 16 -processor mainframe cabinet (4 x 4 backplane). Each system has a minimum of 3 cabinets: 1 processing cabinet and 2 peripheral (PC-10) cabinets. The system can include 1 or 2 additional PC-10 cabinets, for a maximum of 4 PC-10 cabinets per system. Silicon Graphics Engineering must approve requests for configurations that include more than 2 additional PC-10 cabinets. The first two PC-10 cabinets must be physically joined to the processing cabinet. Additional PC-10 cabinets may be aligned on either side of the processing cabinet. CRAY SV1-1 System The CRAY SV-1 system is 32 -processor mainframe cabinet (8 x 8 backplane). Each system has a minimum of 3 cabinets: 1 processing cabinet and 2 peripheral (PC-10) cabinets. The system can include 1 or 2 additional PC-10 cabinets, for a maximum of 4 PC-10 cabinets per system. Silicon Graphics Engineering must approve requests for configurations that include more than 2 additional PC-10 cabinets. The first 2 PC-10 cabinets must be physically joined to the processing cabinet. Additional PC-10 cabinets may be aligned on either side of the processing cabinet. CRAY SV1 SuperCluster System The CRAY SV-1 SuperCluster systems have a minimum of 6 cabinets: 4 processing cabinets and 2 peripheral cabinets. This configuration of 6 cabinets is called a cell. Each processing cabinet within a cell is called a node. Each node contains one 8 x 8 backplane with 8 memory modules and 4 to 8 processor modules. The SuperCluster can be expanded to contain 8 cells or 32 processor cabinets with an appropriate number of peripheral cabinets. The number of peripheral cabinets is based on the customer’s configuration requirements. 2 Silicon Graphics Proprietary 6/16/98 CRAY® SV1 Series Service Update Bulletin Processor Module Enhancements Processor Module Enhancements The CRAY SV1 processor module includes two major enhancements: increased vector and scalar performance and increased operating frequency. The CRAY SV1 system is expected to provide approximately 4 times the scalar performance and approximately 5 to 6 times the vector performance of CRAY J90 series systems. These performance improvements result from implementing IBM 6S CMOS ASIC technology, adding a 256-Kbyte cache, increasing the operating frequency, and increasing the number of vector pipes from 1 to 2. The dual-pipe design of the CPU remains compatible with current CRAY J90 series systems, CRAY Y-MP series systems, and CRAY C90 series systems running in Y-MP mode. Increased Operating Frequency The operating frequency for the enhanced portions of the CRAY SV1 CPU has been increased. The enhanced portion of the CPU includes all ASICs except the CI ASIC, which controls I/O and each clock ASIC (MC0 and MC1). Increased Vector Performance The CRAY SV1 vector performance is approximately 5 to 6 times greater than that of a CRAY J90 series system. The CRAY SV1 CPU vector functional units communicate directly with the 256-Kbyte cache with sufficient bandwidth to perform 4 load operations and 2 store operations in each clock period. Increased Scalar Performance The CRAY SV1 scalar performance is approximately 4 times greater than that of a CRAY J90 series system. This improvement results from the use of a large local cache and an increased clock frequency. Improved Cache The CRAY SV1 CPU contains a much larger 256-Kbyte cache, which caches all vector and scalar instruction data. The 128-word cache for the PC ASIC will be removed to simplify the design. This does not have any effect on compatibility with CRAY J90 series or CRAY J90se series processor modules. Silicon Graphics Proprietary 3 6/16/98 Processor Module Enhancements CRAY® SV1 Series Service Update Bulletin Improved CPU Data Paths The CPU data paths to memory operate at the increased clock frequency. This does not increase memory bandwidth, but it does increase the efficiency of the memory subsystem. ASIC Modifications The IBM 6S CMOS ASIC technology enabled designers to combine the PC and VU ASICs into a single ASIC (PV) and also to combine the VA and VB ASICs into a single ASIC (VAB). The PV ASIC contains all the logic for the A and S registers, local real-time clock, instruction buffers, B and T registers, performance monitor, issue control, scalar functional units, vector registers, and associated vector and floating-point arithmetic units. The VAB ASIC contains all the logic to control memory access and arbitrate memory conflicts; send scalar, vector, and I/O requests to memory; and distribute memory read data to the requesting functional unit. Instruction Set Enhancements The dual-pipe design of the CRAY SV1 processor module enables designers to incorporate the bit matrix multiply (BMM) functional unit. The BMM instructions (070ij6, 074ij6) perform a logical multiplication of two matrices, designated A and B, which produces a single-bit result for each pair of elements that are multiplied. This instruction is normally disabled. The execution of this instruction is enabled by setting a configuration bit. If the instruction occurs while disabled, the processor executes a floating-reciprocal instruction. The vector BMM execute instruction (174ij6) is implemented to share ports with the floating-add functional unit. The execution of this instruction is enabled by setting a configuration bit. This instruction is normally disabled. If the instruction occurs while it is disabled, the processor executes a parity instruction. The vector load BMM register instruction (1740j4) is implemented in hardware and loads a maximum of 64 BMM registers. The BMM functional unit shares ports with the floating-add functional unit. This instruction is normally disabled. The execution of this instruction is enabled by setting a configuration bit. If the instruction occurs while it is disabled, the processor executes a floating-reciprocal instruction. 4 Silicon Graphics Proprietary 6/16/98 CRAY® SV1 Series Service Update Bulletin Processor Module Enhancements The vector leading zero instruction (174ij3) is implemented to share the floating-reciprocal/pop/parity functional unit. This instruction is normally disabled. The execution of this instruction is enabled by setting a configuration bit. If the instruction occurs while it is disabled, the processor executes floating-reciprocal instruction. Interchangeability The CRAY SV1 processor module can replace, or co-exist with, either a CRAY J90 or CRAY J90se processor module with no restrictions in any CRAY J90 series chassis. The CRAY SV1 systems use the scalable I/O architecture. It will also be possible to install CRAY SV1 processor modules in a CRAY J90 series system that includes a VME I/O subsystem. Multistreaming Processor The multistreaming processor (MSP) tightly couples four CRAY SV1 CPUs together by way of software (refer to Figure 1). Hardware modifications are not required to implement the MSP. A CRAY SV1 node can contain a maximum of 6 MSPs and a CRAY SV1 cell can contain a maximum of 24 MSPs. The MSP has 8 vector pipes with a peak performance of approximately 4 Gflops. Figure 1. Multistreaming Processor MSP MSP CPUCPU CPU CPU SV1 CPUCPU CPU CPU Processor MEMORY Modules CPUCPU CPU CPU CPUCPU CPU CPU Silicon Graphics Proprietary 5 6/16/98 Service and Support Strategy CRAY® SV1 Series Service Update Bulletin Service and Support Strategy Customer and Professional Services has implemented several support service offerings. Refer to the Customer and Professional Services web site at: http://www.csd.sgi.com/prod/sptservice/ This document is the property of Silicon Graphics, Inc. The use of this document is subject to specific license rights extended by Silicon Graphics, Inc. to the owner or lessee of a Silicon Graphics, Inc. computer system or other licensed party according to the terms and conditions of the license and for no other purpose. Silicon Graphics, Inc. Unpublished Proprietary Information — All Rights Reserved. Autotasking, CF77, CRAY, CRAY-1, Cray Ada, CraySoft, CRAY Y-MP, CRInform, CRI/TurboKiva, HSX, LibSci, MPP Apprentice, SSD, SUPERCLUSTER, UNICOS, and X-MP EA are federally registered trademarks and Because no workstation is an island, CCI, CCMT, CF90, CFT, CFT2, CFT77, ConCurrent Maintenance Tools, COS, CRAY-2, Cray Animation Theater, CRAY APP, CRAY C90, CRAY C90D, Cray C++ Compiling System, CrayDoc, CRAY EL, CRAY J90, CRAY J90se, CrayLink, Cray NQS, Cray/REELlibrarian, CRAY S-MP, CRAY SSD-T90, CRAY T3D, CRAY T3E, CRAY T3E-900, CRAY T90, CrayTutor, CRAY X-MP, CRAY XMS, CSIM, CVT, Delivering the power .