SX-6 Data Sheet Multi-Node, 28.09.2001, 9:53, page 1

Cave simulation, courtesy of HLRS, Stuttgart, Germany

Dr. Ulrich Freking, Institute for Theoretical Physics II, Westfälische Wilhelms-Universität Münster: Dynamics of Adsorbate-Covered Semiconductor Surfaces

NEC SX-6 MULTI-NODE
SCALABLE TO MEET EVEN THE UTMOST DEMANDS

For Information Contact:

ASIA
NEC HPC MARKETING PROMOTION DIVISION
7-1 Shiba, 5-chome, Minato-ku
Tokyo 108-8001
Japan
+81-3-3798-9131 phone
+81-3-3798-9132 fax
[email protected]..co.jp

EUROPE
NEC EUROPEAN SYSTEMS
Prinzenallee 11
D-40549 Düsseldorf
Germany
+49-211-5369-0 phone
+49-211-5369-199 fax
[email protected]

LATIN AMERICA
NEC DO BRASIL S.A. SX-OFFICE
Rua Arabé, 71
CEP 04042-070 V. Clementino
São Paulo SP
Brazil
+55-11-5591-7147 phone
+55-11-5591-7146 fax
[email protected]

OCEANIA
NEC AUSTRALIA PTY. LTD. HPCD
635 Ferntree Gully Road
Glen Waverley, VIC 3150
Australia
+61-3-9262-1209 phone
+61-3-9262-1534 fax
[email protected]

Design: Zimmermann & Jung, Stuttgart, Germany

SOFTWARE AND APPLICATIONS

With traditional NEC reliability and the SUPER-UX operating system, now in its 12th year, the SX-6 is designed for production computing at production sites. All the functions other systems promise, such as checkpoint-restart or a robust batch environment, are available today.

The SUPER-UX multi-node kernel is enhanced to recognize a multi-node job class. When a multi-node job enters the system, the kernel sequences all of the processes across the nodes, initializes the IXS page translation tables for the job, and provides specialized scheduling commensurate with the resources being used.

Application development is made easy by the simplicity of shared memory programming within a node, and portability is assured by industry-standard MPI and OpenMP support. NEC's PSUITE Integrated Development Environment provides, in a single package, all of the tools and utilities needed for project management, editing, compiling, optimizing, and testing/debugging. PSUITE is available for cross-hosting on all popular workstation-class products as well as on personal computers, to maximize accessibility and development efficiency.

The languages, libraries and tools available include Fortran 90, OpenMP and C++. Tools and libraries dedicated to the development of parallel and multi-node jobs include MPI, the TotalView debugger and the Vampir/SX performance analysis tool.

EXTREME PERFORMANCE THROUGH VECTOR PROCESSING AND SCALABILITY

By combining the ease of use and efficiency of shared memory with the scalability of distributed memory systems, NEC created one of the most powerful computing systems available today: the NEC SX-6 multi-node, which scales up to configurations with 8 TeraFlops (TF) of peak performance, 8 TeraBytes (TB) of main memory and 1 TeraByte per second of inter-node communication bandwidth.

Vector multi-node systems have always provided the absolute highest performance available. High-performance, high-bandwidth memory and powerful processors deliver maximum performance on applications and commercial robustness for production sites.

Now the SX-6 breaks new ground, introducing high-end scalable parallel vector supercomputing to the competitive technical server space. With powerful shared-memory nodes and unequaled inter-node communication bandwidth, the SX-6 is second to none.

CONFIGURATION TABLE

Selected Machine Configurations

Model Group Name               SX-6/M
Model Name                     1024M128  512M64    256M32    128M16    64M8      32M4      16M2      8M2
Number of Nodes                128       64        32        16        8         4         2         2
Number of CPUs                 1024      512       256       128       64        32        16        8
Peak Vector Performance        8 TF      4 TF      2 TF      1 TF      512 GF    256 GF    128 GF    64 GF
Vector Register                144 KB x number of CPUs (e.g. 144 KB x 1024 on the 1024M128)
Scalar Register                64 bits x 128 x number of CPUs (e.g. 64 bits x 128 x 1024 on the 1024M128)

Main Memory Unit
Memory Architecture            Shared/Distributed Memory (all models)
Max. Capacity                  8 TB      4 TB      2 TB      1 TB      512 GB    256 GB    128 GB    128 GB
Peak Data Transfer Rate        32 TB/s   16 TB/s   8 TB/s    4 TB/s    2 TB/s    1 TB/s    512 GB/s  256 GB/s

Input/Output Processor
Max. Number of HIPPI Adapters  512       256       128       64        32        16        8         8
Max. Number of Channels        16256     8128      4064      2032      1016      508       254       254
Peak Data Transfer Rate        1024 GB/s 512 GB/s  256 GB/s  128 GB/s  64 GB/s   32 GB/s   16 GB/s   16 GB/s

Internode Crossbar Switch
Peak Data Transfer Rate        1024 GB/s 512 GB/s  256 GB/s  128 GB/s  64 GB/s   32 GB/s   16 GB/s   16 GB/s
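Every numeric row of the configuration table follows from a handful of per-CPU and per-node figures quoted in this data sheet (8 GFLOPS and 32 GB/s per CPU; 64 GB of memory, one 8 GB/s IXS channel and 8 GB/s of I/O per node). The short Python sketch below rebuilds those rows from the model names, which encode "<CPUs>M<nodes>". It assumes 4 HIPPI adapters and 127 channels per node; the table implies these values (512/128 and 16256/128), but the text never states them. The sketch only illustrates the arithmetic and is not an NEC tool.

```python
from dataclasses import dataclass

# Per-CPU and per-node building blocks quoted in this data sheet.
GFLOPS_PER_CPU = 8        # peak vector performance of one CPU
MEM_BW_PER_CPU_GBS = 32   # memory bandwidth streamed to each CPU
MEM_PER_NODE_GB = 64      # maximum shared main memory of one node
IXS_BW_PER_NODE_GBS = 8   # one 8 GB/s IXS channel per node
IO_BW_PER_NODE_GBS = 8    # single-node I/O bandwidth
# Assumptions inferred from the table itself (not stated as per-node figures):
HIPPI_PER_NODE = 4        # 512 adapters / 128 nodes
CHANNELS_PER_NODE = 127   # 16256 channels / 128 nodes


@dataclass(frozen=True)
class SX6Config:
    cpus: int
    nodes: int

    @classmethod
    def from_model_name(cls, name: str) -> "SX6Config":
        # Model names encode "<CPUs>M<nodes>", e.g. "1024M128".
        cpus, nodes = name.split("M")
        return cls(int(cpus), int(nodes))

    def row(self) -> dict:
        # Sizes in GB and rates in GB/s; the table writes 1 TB = 1024 GB
        # and 1 TF = 1024 GF.
        return {
            "peak_gflops": self.cpus * GFLOPS_PER_CPU,
            "max_memory_gb": self.nodes * MEM_PER_NODE_GB,
            "memory_bw_gbs": self.cpus * MEM_BW_PER_CPU_GBS,
            "hippi_adapters": self.nodes * HIPPI_PER_NODE,
            "channels": self.nodes * CHANNELS_PER_NODE,
            "io_bw_gbs": self.nodes * IO_BW_PER_NODE_GBS,
            "ixs_bw_gbs": self.nodes * IXS_BW_PER_NODE_GBS,
        }


MODELS = ["1024M128", "512M64", "256M32", "128M16", "64M8", "32M4", "16M2", "8M2"]

if __name__ == "__main__":
    for name in MODELS:
        print(name, SX6Config.from_model_name(name).row())
```

The 8M2 is the instructive case. It has two nodes but only 8 CPUs, so peak performance and memory bandwidth (which scale with CPUs) match a single full node, while memory capacity, I/O and IXS bandwidth (which scale with nodes) match the 16M2. The per-CPU figures also give a machine balance of 32 GB/s ÷ 8 GFLOPS = 4 bytes of memory bandwidth per peak floating-point operation.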

SCALABILITY

Model Configuration
An SX-6 node is a complete parallel vector system consisting of up to eight vector processors, each with 8 GFLOPS of peak performance. The processors are coupled to a uniform shared main memory of 64 GB capacity. The powerful SMP single node with uniform shared memory provides very high levels of performance for both capacity and capability requirements. Multi-node configurations with distributed memory scale beyond the limits of a single node. The SX-6 series is compatible with its predecessor, the SX-5 series. Like earlier SX series systems, it excels in the overall balance of processing performance, memory throughput and input/output performance. Existing applications and resources can be easily migrated to the SX-6.

Single-node System
The SX-6 series single-node models scale up to 8 CPUs, delivering up to 64 GF of vector performance and offering a maximum of 64 GB of main memory in a shared memory architecture. The memory can stream data to each processor at 32 GB/s, for a total memory bandwidth of 256 GB/s.

Multi-node System
From two up to 128 nodes can be connected through NEC's proprietary ultra-high-speed Internode Crossbar Switch (IXS). SX-6 series multi-node models cover a range from 8 to 1024 CPUs with a maximum peak performance of 8 TF. The maximum total system memory scales to 8 TB. Each node is connected to the crossbar switch via an 8 GB/s channel; the maximum throughput of the crossbar is 1 TB/s. Although they are built on a distributed memory architecture, multi-node systems still provide a single system image. Extreme execution performance can be obtained on a wide range of applications with the powerful single-node systems connected through the IXS.

HARDWARE FEATURES

Ultra-high-speed Vector and Scalar Unit
The vector unit of the SX-6 series processor consists of vector registers and 8 sets of pipelines for logical operations, multiplication, add/shift operations, division, masked operations and load/store. The scalar unit achieves ultra-high-speed performance through a 4-way superscalar design. The combination of the single-chip vector microprocessor and a reduced clock cycle decreases the processing time of each instruction. This leads to superior short-vector and scalar performance.

Internode Crossbar Switch
The SX-6 series multi-node configurations use an exclusive, proprietary internode crossbar switch (IXS) to connect the individual nodes through an ultrahigh-throughput, low-latency network. Eight gigabytes per second of bisection bandwidth is available for each node, for a maximum total of 1024 gigabytes per second of ultra-high-speed internode data transfer.

The World's First Single-Chip Vector Processor
The high gate density possible with state-of-the-art CMOS technology and LSI design enabled NEC to implement the vector processor on just one chip. This LSI and packaging technology delivers a performance of 8 GFLOPS on a single LSI. Compared with former-generation designs, which used dozens of chips to implement a processor, this ultrahigh integration improves internal latencies and performance, and it greatly reduces memory latencies by drastically narrowing the distance between memory and processors.

SX-6 single-chip vector processor

Main Memory Unit
The SX-6 series uses ultra-high-speed double data rate synchronous DRAM. Single-node systems have a memory capacity of up to 64 GB and a memory bandwidth of 256 GB per second. Multi-node systems have a memory capacity of up to 8 Terabytes and a maximum bandwidth of 32 Terabytes per second.

SX-6 Memory module

Input/Output Subsystem
The Input/Output subsystem of the SX-6 series can be configured to deliver a bandwidth of up to 8 GB per second on a single-node system. Multi-node systems scale up to an I/O capacity of 1 TB per second.

TECHNOLOGY

IXS Enabled Global Main Memory
For memory access across nodes, the IXS provides page translation tables and global data movement instructions. Because of the characteristics of the IXS, block-memory-move instructions offer the highest performance. All memory is protected by lock-and-key mechanisms under the control of the SUPER-UX multi-node operating system. Hardware support tables for memory security are located in both the node hardware and the IXS hardware. Latencies for internode NUMA memory access are lower than those of most workstation-technology NUMA implementations, and the 8 gigabyte per second bandwidth of just a single IXS channel exceeds the entire memory bandwidth of most SMP-class systems. This architecture, introduced with the SX-4 series, lends itself to combining traditional parallel vector processing (OpenMP, microtasking) with message passing (MPI). Message passing alone is also highly efficient on this architecture.

Global File System
The NEC SX-6 operating system supports SX-GFS, a proprietary global file system. SX-GFS presents a common file system view to the entire NEC SX-6 multi-node system. For large I/O requests, SX-GFS can achieve up to 80% of the performance of a similar locally attached storage device. NEC offers its Linux-based Express 5800 1160/Xa machine as a file server for GFS installations.

Ease of Installation
The power consumption and space requirements of the SX-6 models have been reduced by 80% compared with the previous generation of the SX series. The low power consumption allows all models to be fully air-cooled. These two elements greatly reduce installation costs and complexity.

High Reliability
The use of highly integrated CMOS technology has greatly reduced the number of components in a single system. This, in turn, leads to tremendously improved hardware reliability.
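The global main memory description combines three ideas: page translation tables that locate memory on another node, lock-and-key protection checked on every access, and block-memory-move instructions as the fastest way to move data. The Python toy model below illustrates how these fit together. A table maps each global page to a (node, local page) pair with a lock value. Each access must present a matching key. A block move is split at page boundaries, so each piece targets exactly one node. Everything concrete here is a hypothetical assumption: PAGE_SIZE, GlobalPageTable, map_page, block_move_plan, and the rule that an integer key must equal the page's lock. The data sheet does not describe the real SX-6 table format, page size or key semantics.

```python
from dataclasses import dataclass, field

# Hypothetical page size; the data sheet does not specify the IXS page size.
PAGE_SIZE = 1 << 20


class ProtectionError(Exception):
    """Raised when a global address is unmapped or the key does not match."""


@dataclass
class GlobalPageTable:
    # Hypothetical layout: global page number -> (node, local page, lock value).
    entries: dict = field(default_factory=dict)

    def map_page(self, gpage: int, node: int, local_page: int, lock: int) -> None:
        self.entries[gpage] = (node, local_page, lock)

    def translate(self, gaddr: int, key: int) -> tuple:
        """Return (node, local address) for a global address, checking the key."""
        gpage, offset = divmod(gaddr, PAGE_SIZE)
        if gpage not in self.entries:
            raise ProtectionError(f"global page {gpage} is not mapped")
        node, local_page, lock = self.entries[gpage]
        # Lock-and-key check: access is granted only if the key matches the lock.
        if key != lock:
            raise ProtectionError(f"key {key} does not match the lock of page {gpage}")
        return node, local_page * PAGE_SIZE + offset

    def block_move_plan(self, gaddr: int, nbytes: int, key: int) -> list:
        """Split a contiguous global range into (node, local address, length)
        pieces, one per page touched, each checked against its lock."""
        pieces = []
        while nbytes > 0:
            node, local = self.translate(gaddr, key)
            # Never let a piece cross a page boundary, so it stays on one node.
            chunk = min(nbytes, PAGE_SIZE - gaddr % PAGE_SIZE)
            pieces.append((node, local, chunk))
            gaddr += chunk
            nbytes -= chunk
        return pieces


if __name__ == "__main__":
    table = GlobalPageTable()
    table.map_page(0, node=3, local_page=10, lock=7)
    table.map_page(1, node=5, local_page=2, lock=7)
    # A 32-byte move straddling the page boundary becomes two 16-byte pieces.
    print(table.block_move_plan(PAGE_SIZE - 16, 32, key=7))
```

In this toy model, one translation and one key check per page cover an entire piece of the transfer. This is one plausible reading of why a block move can outperform many word-sized remote accesses, but it is an illustration, not a description of the IXS hardware.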