STK-6401 - The Affordable Mini-Supercomputer with Muscle
SUPERTEK COMPUTERS

The STK-6401 is a high-performance mini-supercomputer which is fully compatible with the Cray X-MP/48™ instruction set, including some important operations not found in the low-end Cray machines.

The system design combines advanced TTL/CMOS technologies with a highly optimized architecture. By taking advantage of mature, multiple-sourced off-the-shelf devices, the STK-6401 offers performance equal to or better than comparable mini-supers at approximately half their cost.

Additional benefits of this design approach are much smaller size, low power consumption, the ability to operate with fan cooling, and intrinsic high reliability.

Central Processing Unit

The STK-6401 architecture is based on five major, tightly-coupled subsystems: Instruction Unit, Vector Unit, Scalar Unit, Memory Unit, and I/O Processor. This structure yields a peak computational rate of 40 MFLOPS and high throughputs for a wide range of applications with various degrees of vectorizability or inherent parallelism.

The Instruction Unit executes the Cray X-MP instruction set, enabling programs currently running on a Cray to be used without change on the STK-6401.

The Vector Unit contains a multi-ported vector register file which supports as many as 16 word transfers per clock cycle - with a bandwidth of 2.56 GB/s. It can fully support all concurrent vector operations as well as vector-memory and vector-scalar data transfers. Hence, peak or near-optimal vector performance can readily be sustained in most applications.

The Scalar Unit contains a multi-ported scalar register file that supports simultaneous scalar operations with low latencies. Its 20-MIPS peak performance for 64-bit scalar operations, supported by the Instruction Unit which issues instructions at the maximum rate of one per cycle, makes the STK-6401 suitable for many scalar-oriented applications.

Central Memory

The STK-6401 Memory Unit serves the other major subsystems at very high data transfer rates. Its 4-ported design supports two vector reads, one vector write, and one I/O transfer with an aggregate bandwidth of 640 MB/s. Bank conflicts are reduced to a minimum by a 16-way, fully interleaved structure.

Coupled with the multi-ported vector register file and a built-in vector chaining capability, the Memory Unit makes most vector operations run as if they were efficient memory-to-memory operations. This important feature is not offered in most of the machines on the market today.

I/O Subsystem

The I/O subsystem of the STK-6401 communicates with central memory via a high-speed port which is transparent to CPU operation. This port has a bandwidth of 160 MB/sec and is available to multiple data paths with individual bandwidths of up to 50 MB/sec. Controllers, based on the VMEbus, manage the data flow associated with these channels.

The flexibility of this approach enables very high density disk drives to be interfaced easily to the STK-6401, allowing accommodation of new drives as they become available. Currently both 2.4 MB/sec and 12.5 MB/sec drives are offered.

High-performance magnetic tape units, terminals, and networking via both Ethernet (TCP/IP) and HYPERchannel™ are supported.

The STK-6401's I/O structure also makes high bandwidth channels available for customer-specific I/O.

Productive Software Environment

The Cray Time Sharing System (CTSS) was specifically designed to give computational scientists and applications developers a highly productive, interactive, supercomputing environment.

Large, complex computational models can be developed rapidly and efficiently using CTSS' broad range of facilities - including advanced text editing, powerful symbolic debugging, fast turnaround of testing, and interaction with long-running codes.

The STK-6401's sophisticated FORTRAN applications environment, coupled with bit-for-bit instruction compatibility with the Cray X-MP, lets users retain their current FORTRAN applications interface while also taking advantage of the more than 300 third-party and public domain applications developed for the Cray 1™ and Cray X-MP architectures.

A UNIX™ environment is also available under CTSS.

Concurrent Interactive and Batch Processing

CTSS accommodates the differing requirements of applications development and long-running, computationally intensive codes. As a result, concurrent interactive and batch access to the STK-6401 is supported with no degradation in system performance. CTSS manages multiple concurrent processes for efficient sharing of the STK-6401's resources.

User-controlled preemptive priority scheduling allows users to control resource allocation and system workload, for optimal use of the STK-6401.

To simplify the user's interface with the system, CTSS provides a single command language for both interactive and batch access; the batch job manager - COSMOS - accepts a directives file containing commands in the same form as would be used interactively.

Program Recovery/Restart Facility

To aid recovery and restart, CTSS writes a running program's memory image to a file (called a dropfile) whenever the program is temporarily removed from memory or terminates abnormally. CTSS creates the dropfile in the user's directory (where it is accessible to the user) as part of the setup for program execution.

When a program terminates abnormally, the dropfile receives the program's memory image together with all the system information needed to restart execution from the point at which it was interrupted. Since the dropfile is itself an executable file, the program may be recovered simply by executing the dropfile.

High-Performance Disk I/O

Many engineering and scientific applications require very high disk I/O bandwidth to properly support their computational workload. CTSS has been designed to support exceptionally high I/O rates via a combination of features: file system overhead is reduced through a streamlined, optimized disk file index structure; disk positioning overhead is minimized by allocating to new files the largest possible blocks of contiguous disk space; I/O transfers are optimized by moving data in multiples of disk sectors (512 64-bit words); operating system I/O processing overhead is substantially lowered by performing all I/O processing tasks in an intelligent IOP subsystem.

Further, CTSS and the FORTRAN Run-Time Library fully support asynchronous I/O, thus enabling applications to take advantage of computational and I/O overlap.

[Figure: STK-6401 functional block diagram. Labeled blocks: control modules; instruction fetch and decode; instruction buffer; vector, scalar, and address functional units (F.P. multiply, F.P. add and subtract, F.P. reciprocal, addition and logical units, other units); T & S registers; B & A registers; memory buses and functional unit buses; central memory with memory address and data path control; input/output to peripherals and user interface.]

FORTRAN Applications Development Environment

CTSS provides an efficient FORTRAN environment for the applications programmer through a powerful vectorizing compiler, scientific libraries, and dynamic debugging.

Vectorizing Compiler: The Cray FORTRAN Compiler (CFT) is an optimizing, vectorizing compiler that supports language and library enhancements for vector processing. Existing FORTRAN applications programs can therefore take full advantage of the STK-6401's outstanding vector performance.

CFT enhancements also support other manufacturers' extensions to the ANSI '77 FORTRAN standard, such as the VMS™ FORTRAN extensions. Consequently CFT assists the programmer's productivity and maximizes software execution speed and portability.

Scientific Libraries: Under CTSS four system libraries are available to the applications developer. The library interface is optimized to achieve maximum program performance without the programmer having to be concerned with underlying system and hardware dependencies.

MATHLIB and OMNILIB provide optimized and vectorized basic mathematical functions and high-level mathematics and scientific routines, including the Basic Linear Algebra (BLAS) routines. FORTLIB and CFTLIB furnish optimized systems support routines, including high-performance, asynchronous FORTRAN I/O.

Dynamic Symbolic Debugging: Under CTSS the applications developer has a powerful and convenient means of troubleshooting code - the Dynamic Debug Tool (DDT). Since CTSS allows a program to directly control the execution of another, DDT may be used on a program's dropfile for debugging or for post-mortem dumps without having to recompile or relink the applications program.

Hardware Specifications

Architecture
Full Cray X-MP/48 instruction set.
Hardware support for scatter/gather, compressed index, and enhanced addressing mode.

Computation Rate
40 MFLOPS peak vector performance.
20 MIPS peak scalar performance.

Central Memory
640 MB/s aggregate bandwidth.
160 MB/s bandwidth to I/O.
Up to 128 MBytes of storage.
Error detection/correction (SECDED).

Vector Registers (64-Bit)

Software Specifications

CTSS Operating System
Interactive / batch.
Multi-user.
Hierarchical file system.
Process priority levels.
Interprocess communication.
Windowing capability.
UNIX™ environment.

FORTRAN Applications Development Environment
CFT (CRAY FORTRAN compiler)
ANSI '77.
Scalar optimization.
Automatic vectorization.
VMS™ FORTRAN extensions.
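
The rate figures quoted in this brochure (2.56 GB/s for the vector register file, 640 MB/s for central memory, 160 MB/s for the I/O port, 20 MIPS, 40 MFLOPS) can be cross-checked against one another with simple arithmetic. The sketch below is not taken from the brochure. It assumes decimal units (1 MB = 10^6 bytes), one 64-bit word per memory port per clock, and an I/O port one word wide. The brochure never states a clock rate, so the value the script derives is an inference, not a published specification.

```python
# Cross-check of the brochure's rate figures.
# Assumptions (NOT stated in the brochure): decimal units, and one
# 64-bit word moved per port per clock cycle.
WORD_BYTES = 8  # 64-bit words, as quoted in the brochure

VECTOR_REGISTER_BW = 2_560_000_000  # bytes/s (2.56 GB/s)
VECTOR_WORDS_PER_CLOCK = 16         # "16 word transfers per clock cycle"
MEMORY_BW = 640_000_000             # bytes/s aggregate (640 MB/s)
MEMORY_PORTS = 4                    # 2 vector reads, 1 vector write, 1 I/O
IO_PORT_BW = 160_000_000            # bytes/s (160 MB/sec)
PEAK_SCALAR_IPS = 20_000_000        # 20 MIPS at one instruction per cycle
PEAK_FLOPS = 40_000_000             # 40 MFLOPS


def implied_clock_hz(bytes_per_second, words_per_clock):
    """Clock rate needed to sustain a bandwidth at a given width in words."""
    return bytes_per_second / (words_per_clock * WORD_BYTES)


clock_from_registers = implied_clock_hz(VECTOR_REGISTER_BW, VECTOR_WORDS_PER_CLOCK)
clock_from_memory = implied_clock_hz(MEMORY_BW, MEMORY_PORTS)
clock_from_io = implied_clock_hz(IO_PORT_BW, 1)
clock_from_issue = PEAK_SCALAR_IPS / 1  # one instruction issued per cycle
flops_per_clock = PEAK_FLOPS / clock_from_issue

for label, hz in [
    ("vector register file", clock_from_registers),
    ("central memory", clock_from_memory),
    ("I/O port", clock_from_io),
    ("instruction issue", clock_from_issue),
]:
    print(f"implied clock from {label}: {hz / 1e6:.0f} MHz")
print(f"floating-point results per clock at peak: {flops_per_clock:.0f}")
```

Under these assumptions all four figures point to the same 20 MHz clock, and the peak rate works out to two floating-point results per cycle. That would be consistent with a floating-point add and a floating-point multiply completing in the same cycle, though the brochure does not say so.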
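
The brochure states that central memory is 16-way interleaved to keep bank conflicts to a minimum, but it does not describe how addresses map to banks. As an illustration only, the sketch below assumes the common low-order scheme in which word address `a` lives in bank `a mod 16`; the function names are hypothetical. Under that assumption, a vector access with a given stride cycles through `16 / gcd(stride, 16)` distinct banks, which shows why unit and odd strides spread evenly while power-of-two strides concentrate on a few banks.

```python
from math import gcd

NUM_BANKS = 16  # "16-way, fully interleaved" per the brochure


def bank_sequence(start, stride, count, num_banks=NUM_BANKS):
    """Banks visited by a strided vector access (assumed: bank = address mod num_banks)."""
    return [(start + i * stride) % num_banks for i in range(count)]


def banks_touched(stride, num_banks=NUM_BANKS):
    """Number of distinct banks a positive-stride access cycles through."""
    return num_banks // gcd(stride, num_banks)


for stride in (1, 2, 3, 8, 16):
    print(f"stride {stride:2d}: {banks_touched(stride):2d} banks, "
          f"first visits {bank_sequence(0, stride, 6)}")
```

A stride of 1 or 3 visits all 16 banks before returning to the first, while a stride of 16 hits the same bank on every access. How much such repeats cost on the STK-6401 depends on the bank cycle time, which the brochure does not give.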