SUPERTEK COMPUTERS

STK-6401 - The Affordable Mini-Supercomputer with Muscle

The STK-6401 is a high-performance mini-supercomputer which is fully compatible with the Cray X-MP/48™ instruction set, including some important operations not found in the low-end Cray machines.

The system design combines advanced TTL/CMOS technologies with a highly optimized architecture. By taking advantage of mature, multiple-sourced off-the-shelf devices, the STK-6401 offers performance equal to or better than comparable mini-supers at approximately half their cost.

Additional benefits of this design approach are much smaller size, low power consumption, the ability to operate with fan cooling, and intrinsic high reliability.

Central Processing Unit

The STK-6401 architecture is based on five major, tightly-coupled subsystems: Instruction Unit, Vector Unit, Scalar Unit, Memory Unit, and I/O Processor. This structure yields a peak computational rate of 40 MFLOPS and high throughputs for a wide range of applications with various degrees of vectorizability or inherent parallelism.

The Instruction Unit executes the Cray X-MP instruction set, enabling programs currently running on a Cray to be used without change on the STK-6401.

The Vector Unit contains a multi-ported vector register file which supports as many as 16 word transfers per clock cycle - with a bandwidth of 2.56 GB/s. It can fully support all concurrent vector operations as well as vector-memory and vector-scalar data transfers. Hence, peak or near-optimal vector performance can readily be sustained in most applications.

The Scalar Unit contains a multi-ported scalar register file that supports simultaneous scalar operations with low latencies. Its 20-MIPS peak performance for 64-bit scalar operations, supported by the Instruction Unit which issues instructions at the maximum rate of one per cycle, makes the STK-6401 suitable for many scalar oriented applications.

Central Memory

The STK-6401 Memory Unit serves the other major subsystems at very high data transfer rates. Its 4-ported design supports two vector reads, one vector write, and one I/O transfer with an aggregate bandwidth of 640 MB/s. Bank conflicts are reduced to a minimum by a 16-way, fully interleaved structure.

Coupled with the multi-ported vector register file and a built-in vector chaining capability, the Memory Unit makes most vector operations run as if they were efficient memory-to-memory operations. This important feature is not offered in most of the machines on the market today.

I/O Subsystem

The I/O subsystem of the STK-6401 communicates with central memory via a high-speed port which is transparent to CPU operation. This port has a bandwidth of 160 MB/sec and is available to multiple data paths with individual bandwidths of up to 50 MB/sec. Controllers, based on the VMEbus, manage the data flow associated with these channels.

The flexibility of this approach enables very high density disk drives to be interfaced easily to the STK-6401, allowing accommodation of new drives as they become available. Currently both 2.4 MB/sec and 12.5 MB/sec drives are offered.

High-performance magnetic tape units, terminals, and networking via both Ethernet (TCP/IP) and HYPERchannel™ are supported.

The STK-6401's I/O structure also makes high bandwidth channels available for customer-specific I/O.

Productive Software Environment

The Cray Time Sharing System (CTSS) was specifically designed to give computational scientists and applications developers a highly-productive, interactive, supercomputing environment.

Large, complex computational models can be developed rapidly and efficiently using CTSS' broad range of facilities - including advanced text editing, powerful symbolic debugging, fast turnaround of testing, and interaction with long running codes.

The STK-6401's sophisticated applications environment, coupled with bit-for-bit instruction compatibility with the Cray X-MP, lets users retain their current FORTRAN applications interface while also taking advantage of the more than 300 third-party and public domain applications developed for the Cray 1™ and Cray X-MP architectures.

A UNIX™ environment is also available under CTSS.

Concurrent Interactive and Batch Processing

CTSS accommodates the differing requirements of applications development and long-running, computationally intensive codes. As a result, concurrent interactive and batch access to the STK-6401 is supported with no degradation in system performance. CTSS manages multiple concurrent processes for efficient sharing of the STK-6401's resources.

User-controlled preemptive priority scheduling allows users to control resource allocation and system workload, for optimal use of the STK-6401.

To simplify the user's interface with the system, CTSS provides a single command language for both interactive and batch access; the batch job manager - COSMOS - accepts a directives file containing commands in the same form as would be used interactively.

Program Recovery/Restart Facility

To aid recovery and restart, CTSS writes a running program's memory image to a file (called a dropfile) whenever the program is temporarily removed from memory or terminates abnormally. CTSS creates the dropfile in the user's directory (where it is accessible to the user) as part of the setup for program execution.

When a program terminates abnormally, the dropfile receives the program's memory image together with all the system information needed to restart execution from the point at which it was interrupted. Since the dropfile is itself an executable file, the program may be recovered simply by executing the dropfile.

High-Performance Disk I/O

Many engineering and scientific applications require very high disk I/O bandwidth to properly support their computational workload. CTSS has been designed to support exceptionally high I/O rates via a combination of features: file system overhead is reduced through a streamlined, optimized disk file index structure; disk positioning overhead is minimized by allocating to new files the largest possible blocks of contiguous disk space; I/O transfers are optimized by moving data in multiples of disk sectors (512 64-bit words); I/O processing overhead is substantially lowered by performing all I/O processing tasks in an intelligent IOP subsystem.

Further, CTSS and the FORTRAN Run-Time Library fully support asynchronous I/O, thus enabling applications to take advantage of computational and I/O overlap.

FORTRAN Applications Development Environment

CTSS provides an efficient FORTRAN environment for the applications programmer through a powerful vectorizing compiler, scientific libraries, and dynamic debugging.

Vectorizing: The Cray FORTRAN Compiler (CFT) is an optimizing, vectorizing compiler that supports language and library enhancements for vector processing. Existing FORTRAN applications programs can therefore take full advantage of the STK-6401's outstanding vector performance.

CFT enhancements also support other manufacturers' extensions to the ANSI '77 FORTRAN standard, such as the VMS™ FORTRAN extensions. Consequently CFT assists the programmer's productivity and maximizes software execution speed and portability.

Scientific Libraries: Under CTSS four system libraries are available to the applications developer. The library interface is optimized to achieve maximum program performance without the programmer having to be concerned with underlying system and hardware dependencies.

MATHLIB and OMNILIB provide optimized and vectorized basic mathematical functions and high-level mathematics and scientific routines, including the Basic Linear Algebra Subprograms (BLAS) routines. FORTLIB and CFTLIB furnish optimized systems support routines, including high-performance, asynchronous FORTRAN I/O.

Dynamic Symbolic Debugging: Under CTSS the applications developer has a powerful and convenient means of troubleshooting code - the Dynamic Debug Tool (DDT). Since CTSS allows a program to directly control the execution of another, DDT may be used on a program's dropfile for debugging or for post-mortem dumps without having to recompile or relink the applications program.

[Figure: STK-6401 Functional Block Diagram. Blocks shown: instruction fetch and decode, instruction buffer, and control modules; central memory with memory data path and address control; vector, scalar (S and T), and address (A and B) registers; floating-point multiply, add/subtract, and reciprocal units plus addition, logical, and other functional units; memory buses and functional unit buses; input/output to peripherals and user interface.]

Hardware Specifications

Architecture
- Full Cray X-MP/48 instruction set.
- Hardware support for scatter/gather, compressed index, and enhanced addressing mode.

Computation Rate
- 40 MFLOPS peak vector performance.
- 20 MIPS peak scalar performance.

Central Memory
- 640 MB/s aggregate bandwidth.
- 160 MB/s bandwidth to I/O.
- Up to 128 MBytes of storage.
- Error detection/correction (SECDED).

Vector Registers (64-Bit)
- 8 64-word registers.
- 2.56 GB/s aggregate bandwidth.

Scalar Registers (64-Bit)
- 8 registers (S).
- 64 buffer registers (T).

Address Registers (24-Bit)
- 8 address registers (A).
- 64 buffer registers (B).

Functional Units
- 13 functional units.
- Concurrent operation.
- Floating point, integer, logical operations for vector, scalar, and address operands.

Input/Output
- I/O subsystem supports terminals, tape, disk, and networking.
- Disk bandwidth in multiples of 12.5 MB/s.

Software Specifications

CTSS Operating System
- Interactive/batch.
- Multi-user.
- Hierarchical file system.
- Process priority levels.
- Interprocess communication.
- Windowing capability.
- UNIX™ environment.

FORTRAN Applications Development Environment

CFT (CRAY FORTRAN compiler)
- ANSI '77.
- Scalar optimization.
- Automatic vectorization.
- VMS™ FORTRAN extensions.

DDT (Dynamic Debug Tool)
- Interactive symbolic debugging without requiring code recompilation.
- User may specify execution breakpoints and tracepoints, and examine and alter values of variables.

LDR (Loader)
- Run-time code segmentation.

UPDATE (Source code control)
- Source code management librarian.
- Audit trail of code changes.
- Reversibility of changes.

LIB (Object and source code control)
- Mix source, data, object, and binary in a single library.

Math/Science Libraries
- Optimized for maximum run-time performance.

Reference to use of CTSS and CIVIC on the STK-6401 does not imply endorsement by the U.S. Government or the University of California. Reference to CFT, a product of Cray Research Inc., does not imply endorsement by CRI. Cray 1 and Cray X-MP are trademarks of Cray Research Inc. UNIX is a trademark of AT&T. VMS is a trademark of Digital Equipment Corp. HYPERchannel is a trademark of Network Systems Corp.

Specifications subject to change without notice. Printed in U.S.A. 4/87.
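The headline STK-6401 figures quoted in this brochure can be cross-checked against one another with a little arithmetic. The sketch below is illustrative only: the brochure does not state a clock rate, so the clock is derived here from the vector register bandwidth, on the assumption that each transfer moves one full 64-bit word and that GB/s and MB/s are decimal units. The function names are hypothetical.

```python
# Cross-check of figures quoted in the STK-6401 brochure.
# Assumptions (not stated in the brochure): each vector-register transfer
# moves one full 64-bit word, and GB/s and MB/s are decimal units.

WORD_BYTES = 64 // 8           # 64-bit words
VECTOR_WORDS_PER_CYCLE = 16    # "as many as 16 word transfers per clock cycle"
VECTOR_BANDWIDTH = 2.56e9      # bytes/s, vector register file
MEMORY_BANDWIDTH = 640e6       # bytes/s, aggregate central memory
MEMORY_PORTS = 4               # "4-ported design"
PEAK_FLOPS = 40e6              # 40 MFLOPS peak vector performance
PEAK_IPS = 20e6                # 20 MIPS peak scalar performance
SECTOR_WORDS = 512             # disk sector size in 64-bit words


def implied_clock_hz() -> float:
    """Clock rate implied by the vector register file bandwidth."""
    return VECTOR_BANDWIDTH / (VECTOR_WORDS_PER_CYCLE * WORD_BYTES)


def summary() -> dict:
    """Derive per-cycle figures from the quoted peak rates."""
    clock = implied_clock_hz()
    return {
        "clock_mhz": clock / 1e6,
        "cycle_ns": 1e9 / clock,
        "memory_bytes_per_port_per_cycle": MEMORY_BANDWIDTH / (MEMORY_PORTS * clock),
        "flops_per_cycle": PEAK_FLOPS / clock,
        "instructions_per_cycle": PEAK_IPS / clock,
        "sector_bytes": SECTOR_WORDS * WORD_BYTES,
    }


if __name__ == "__main__":
    for name, value in summary().items():
        print(f"{name}: {value}")
```

Under these assumptions the figures are mutually consistent: the implied clock is 20 MHz (a 50 ns cycle), each of the four memory ports moves one 8-byte word per cycle, the 20 MIPS scalar peak matches the stated issue rate of one instruction per cycle, and the 40 MFLOPS vector peak corresponds to two floating-point results per cycle. A disk sector of 512 64-bit words is 4096 bytes.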

SUPERTEK COMPUTERS INC.

MANUFACTURER OF CRAY-COMPATIBLE
2975 Bowers Ave., Suite 203, Santa Clara, CA 95051 (408) 727-5749

The Supertek S-1 Mini-Supercomputer

The S-1 is a high-performance mini-supercomputer which is fully compatible with the Cray X-MP/416™ instruction set, including some important operations not found in the early Cray machines.

The system design combines advanced TTL/CMOS technologies with a highly optimized architecture. By taking advantage of mature, multiple-sourced off-the-shelf devices, the S-1 offers performance higher than its competitors at a substantially lower price.

Additional benefits of this design approach are much smaller size, low power consumption, the ability to operate with fan cooling, and intrinsic high reliability.

Central Processing Unit

The S-1 architecture is based on five major, tightly-coupled subsystems: Instruction Unit, Vector Unit, Scalar Unit, Memory Unit, and I/O Subsystem. This structure yields a peak computational rate of 40 MFLOPS and high throughput for a wide range of applications with various degrees of vectorizability or inherent parallelism.

The Instruction Unit executes the Cray X-MP instruction set, enabling programs currently running on a Cray to be used without modification on the S-1.

The Vector Unit contains a multi-ported vector register file which supports as many as 16 word transfers per clock cycle - with a bandwidth of 2.56 GB/s. It can fully support all concurrent vector operations including vector-memory and vector-scalar data transfers. Hence, peak or near-optimal vector performance can be readily sustained in most applications.

The Scalar Unit contains a multi-ported scalar register file that supports simultaneous scalar operations with low latencies. With 20-MIPS peak performance for 64-bit scalar operations, and an Instruction Unit which can issue instructions at the rate of one per cycle, the S-1 provides robust scalar processing.

The S-1 Memory Unit serves the other major subsystems at very high data transfer rates. Its 4-ported memory design has an aggregate bandwidth of 640 MB/s and supports two vector reads, one vector write, and one I/O transfer. The memory's 16-way, fully interleaved structure reduces bank conflicts to a minimum.

Coupled with the multi-ported vector register file and a built-in vector chaining capability, the Memory Unit makes most vector operations run as if they were efficient register-to-register operations.

I/O Subsystem

The S-1 Series I/O Subsystem (IOS) is comprised of multiple I/O Modules (IOMs) each based on the industry-standard VME bus. The S-1 takes full advantage of this architecture by distributing operating system functions across the central processing unit and the multiple IOMs.

The IOMs connect to the S-1 via the high-speed 160 MB/second channel. Each IOM is controlled by a Master I/O Processor, and contains slots for peripheral I/O processors. The Master I/O Processor is driven by a real-time, event-driven operating system (RTIOS™) that processes external interrupts and the Central Processor I/O requests, and executes peripheral driver routines. The central operating system and the IOMs communicate via messages and queues. By thus shifting the peripheral processing burden to the I/O Subsystem, the central processor is free for high performance computation.

The intelligent peripheral I/O Processors in each I/O Module control various peripheral devices. Included are high speed disks, tapes, printers, terminals, and network interfaces. This provides full, stand-alone functionality for the S-1 as well as networking connectivity.

Reliability-Availability-Serviceability

Each S-1 system incorporates sophisticated features to support a well-conceived Reliability/Availability/Serviceability program.

The Master IOP supports an independent Service Processor (SP) which controls the S-1's advanced diagnostic subsystem, the central operating system "bootstrap", and S-1 CPU initialization. The SP also maintains a log of the system's detected and corrected errors. A self-contained subsystem, the SP includes its own processor and local memory, an 800 MByte disk, cartridge tape drive, and communications ports for the operator's console and for remote diagnosis.

The SP can set and examine the state of internal registers and step the functional units through execution cycles using the independent diagnostic Scanbus. This approach provides quick fault detection with a high level of confidence.
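The effect of the 16-way interleaved memory structure on bank conflicts can be illustrated with a small model. This is a sketch under an assumed mapping, not a description of the actual S-1 hardware: it supposes simple low-order interleaving, in which consecutive word addresses fall in consecutive banks, and the function names are hypothetical. It counts how many distinct banks a strided vector access touches; the fewer the banks, the more often successive accesses contend for the same bank.

```python
# Sketch of 16-way low-order memory interleaving.
# Assumption: bank = word address mod 16. The brochure states only that the
# memory is "16-way, fully interleaved"; the real address mapping is not given.

NUM_BANKS = 16


def bank_of(word_address: int) -> int:
    """Bank holding a given word address under low-order interleaving."""
    return word_address % NUM_BANKS


def banks_touched(start: int, stride: int, count: int) -> int:
    """Number of distinct banks hit by `count` accesses at a fixed stride."""
    return len({bank_of(start + i * stride) for i in range(count)})


if __name__ == "__main__":
    # 64 accesses, matching the 64-word vector register length.
    for stride in (1, 2, 3, 8, 16):
        print(f"stride {stride:2d}: {banks_touched(0, stride, 64)} banks")
```

In this model a unit-stride or odd-stride access spreads across all 16 banks, a stride of 2 uses 8 banks, a stride of 8 uses 2, and a stride of 16 lands on a single bank every time, which is the worst case for conflicts.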

[Figure: Supertek S-1 Functional Block Diagram. Central memory, registers, and functional units are linked by memory buses and functional unit buses, with connections to other units and to peripherals and the user interface.]

Hardware Specifications

Architecture
- Full Cray X-MP/416 instruction set.
- Hardware support for scatter/gather, compressed index, and enhanced addressing mode.

Computation Rate
- 40 MFLOPS peak vector performance.
- 20 MIPS peak scalar performance.

Central Memory
- 640 MB/s aggregate bandwidth.
- 160 MB/s bandwidth to I/O.
- Up to 128 MBytes of storage.
- Error detection/correction (SECDED).

Vector Registers (64-Bit)
- Eight 64-word registers.
- 2.56 GB/s aggregate bandwidth.

Scalar Registers (64-Bit)
- 8 registers (S).
- 64 buffer registers (T).

Address Registers (24-Bit)
- 8 address registers (A).
- 64 buffer registers (B).

Functional Units
- 13 functional units.
- Concurrent operation.
- Floating point, integer, logical operations for vector, scalar, and address operands.

Input/Output
- I/O subsystem supports terminals, tapes, disks, printers, and networking.
- Disk bandwidth 2.4 MB/s standard; optional high-speed disks in multiples of 12.5 MB/s.

Cray X-MP is a registered trademark of Cray Research, Inc. Specifications subject to change without notice.

Supertek Computers, Inc., 2975 Bowers Avenue, Suite 203, Santa Clara, CA 95051 (408) 727-5749 / FAX (408) 727-5794

S-1 - UNIX™ OPERATING SYSTEM

Supertek UNIX™ with Supercomputing Extensions

Supertek UNIX™ sets a new standard for ease of use and efficiency among supercomputer operating systems. Designed specifically for the S-1 series of mini-supercomputers, Supertek UNIX provides the optimal computing environment for engineering and scientific users.

Derived from AT&T UNIX System V, Supertek UNIX provides extensive functionality specifically designed to support the broad range of applications in the scientific computing environment. By combining the familiar and proven timesharing capabilities of UNIX with Supertek-designed extensions to support large-scale, performance-intensive scientific computing, Supertek UNIX creates an outstanding environment for interactive applications development as well as for long running, large production jobs.

Supercomputing features added to UNIX by Supertek include multi-stream batch processing, asynchronous disk I/O, a new user-specified priority scheme, a highly vectorized applications and system runtime environment, resource and job accounting facilities, a process restart and recovery capability for long running production applications, and a channel-based I/O interface with multiple, independent, intelligent I/O processors.

Software Specifications

UNIX™ Operating System
- AT&T SYSTEM V/IEEE POSIX Standard.
- Supercomputing extensions.
- Distributed I/O Subsystem.
- Interactive & Batch Access.
- Process & Job Recovery.
- High Performance I/O.
- User Specified Process Priority Levels.
- Multi-user environment.
- Hierarchical File System.
- Interprocess communication.
- Windowing capability.

FORTRAN Applications Development Environment

sft (Supertek FORTRAN compiler)
- ANSI '77.
- Scalar optimization.
- Automatic vectorization.
- VMS™ FORTRAN extensions.

ddt (Dynamic Debug Tool)
- Interactive, source-level, symbolic debugging without code recompilation.
- User may specify execution breakpoints/tracepoints, and examine and alter values of variables.

upd (Source code control)
- Source code management librarian.
- Audit trail of code changes.
- Reversibility of changes.

Math/Science Libraries
- Optimized for maximum run-time performance.

UNIX is a trademark of AT&T. VMS is a trademark of Digital Equipment Corporation.

VME Based S-1 Input/Output Subsystem

[Figure: VME-based S-1 I/O subsystem block diagram. The CPU connects through a command channel and a 160 MB/s high-speed data channel to the Master IOP (with Service Processor unit) and to peripheral IOPs for disk (2.4 MB/s and 12.5 MB/s transfer rates), tape, terminals, printer, and network (Ethernet, HYPERchannel).]

Features:
• 64 Bit Scientific Minisupercomputer
• Cray XMP/416 Instruction Set Compatible
• 40 MFLOPS Peak Vector Performance
• 20 MIPS Peak Scalar Performance
• 1, 2, 4, 8, 16 MW Memory Configurations
• Four Ported Memory