Cray XD1 Supercomputer

CRAY XD1 DATASHEET

The Cray XD1™ supercomputer combines breakthrough interconnect, management and reconfigurable computing technologies to meet users’ demands for exceptional performance, reliability and usability. Designed to meet the requirements of highly demanding HPC applications in fields ranging from product design to weather prediction to scientific research, the Cray XD1 system is an indispensable tool for engineers and scientists to simulate and analyze faster, solve more complex problems, and bring solutions to market sooner.

■ Purpose-built for HPC: delivers exceptional application performance
■ Affordable power: designed for a broad range of HPC workloads and budgets
■ Linux, 32 and 64-bit x86 compatible: runs wide variety of ISV applications and open source codes
■ Simplified system administration: automates configuration and management functions
■ Highly reliable: monitors and maintains system health
■ Scalable to hundreds of compute nodes: high bandwidth and low latency let applications scale

Direct Connected Processor Architecture

The Cray XD1 system is based on the Direct Connected Processor (DCP) architecture, harnessing many processors into a single, unified system to deliver new levels of application performance. Cray’s implementation of the DCP architecture optimizes message-passing applications by directly linking processors to each other through a high performance interconnect fabric, eliminating shared memory contention and PCI bus bottlenecks.

Cray XD1 System Highlights

Compute Processors
12 AMD Opteron™ 64-bit processors run Linux and are organized as six 2-way SMPs to deliver 58 GFLOPs* per chassis. Finely tuned memory and I/O performance removes bottlenecks and maximizes processor performance.
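As a rough check on the 58 GFLOPs per chassis figure, the sketch below multiplies processor count by clock rate and per-cycle floating-point throughput. The processor count and the 2.4 GHz clock come from the datasheet (the clock from its footnote on peak theoretical performance). The value of 2 floating-point operations per cycle per processor is an assumption about this generation of AMD Opteron, not something the datasheet states, and the function name is purely illustrative.

```python
def peak_gflops(processors: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Peak theoretical GFLOPs = processors x clock (GHz) x FLOPs per cycle."""
    return processors * clock_ghz * flops_per_cycle


# 12 processors per chassis and a 2.4 GHz clock are datasheet values.
# ASSUMPTION: 2 floating-point operations per cycle per processor.
chassis_gflops = peak_gflops(processors=12, clock_ghz=2.4, flops_per_cycle=2)

print(f"{chassis_gflops:.1f} GFLOPs per chassis")
```

Under that assumption the product is 57.6 GFLOPs, which is consistent with the rounded 58 GFLOPs quoted per chassis.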
RapidArray™ Interconnect
The industry’s fastest embedded switching fabric, the RapidArray interconnect uses 12 custom communications processors and a 96 gigabytes (GB) per second nonblocking switching fabric per chassis to deliver 8 GB per second bandwidth between SMPs with 1.8 microsecond MPI latency. Each chassis presents 24 RapidArray links externally with an aggregate 48 GB per second bandwidth between chassis.

Application Acceleration
Six Xilinx Virtex-II Pro™ Field Programmable Gate Arrays (FPGAs) per chassis attach to the RapidArray fabric for massively parallel execution of critical algorithm components, promising orders of magnitude performance improvement for target applications.

Active Management
A management processor in each chassis, a realtime operating system, distributed software and an independent supervisory network operate together to monitor, control, and manage every aspect of the computer. The Cray XD1 system enables single system command and control and provides extensive high availability features.

Highly modular, the Cray XD1 base unit is a chassis. Up to 12 chassis can be installed in a rack. Multirack configurations integrate hundreds of processors into a single system.

                                 Chassis        Each Rack
Compute Processors               12             144
Performance                      58 GFLOPs*     691 GFLOPs*
Aggregate Switching Capacity     96 GB/s        1152 GB/s
MPI Interprocessor Latency       1.8 µsec       2.0 µsec
Aggregate Memory Bandwidth       77 GB/s        922 GB/s
Maximum Memory                   96 GB          1.2 TB
Maximum Disk Storage             1.5 TB         18 TB

* peak theoretical performance with 2.4 GHz AMD Opteron

The Cray XD1 HPC system is based on the DCP architecture. This innovative new computer unifies up to hundreds of processors into a single, balanced computer, delivering exceptional performance, reliability, and usability. The Cray XD1 architecture includes four key subsystems:
• Compute Environment
• RapidArray Interconnect
• Application Acceleration
• Active Management

Compute Environment

The Cray XD1 compute subsystem is composed of a high performance Linux operating system and AMD Opteron 64-bit processors.

Direct Connect Bandwidth
The AMD Opteron processor’s integrated memory controller increases memory bandwidth and reduces latencies. Three HyperTransport™ links per processor provide up to 19.2 GB per second I/O bandwidth.

Global Process Synchronization
The system’s Linux scheduler has been optimized to synchronize processes across the system. By ensuring that an application executes in the same timeslot system-wide, time spent in wait-states can be minimized, substantially improving processor efficiency.

Standards Based
The Linux operating system, combined with AMD Opteron’s support for 32 and 64-bit x86 compatible computing, allows users to take advantage of a wide range of commercial and open-source applications.

RapidArray Interconnect

The Cray XD1 RapidArray interconnect directly connects processors over high-speed, low-latency pathways. This DCP architecture eliminates the memory contention and PCI bus bottlenecks.

Each chassis includes:
• 12 custom communications processors
• a 96 GB per second nonblocking embedded switching fabric
• Optimized communications libraries

RapidArray Communications Processors
Tightly coupled to the AMD Opterons and switching fabric, these processors handle memory-to-memory copies, global memory management, and system-wide process synchronization, freeing the AMD Opteron to perform core compute tasks and enabling concurrent computing and communication.

The communications processors deliver 8 GB per second bandwidth with 1.8 microsecond MPI latency between SMPs. The communications processors also deliver 1.3 microsecond latency on the randomly ordered ring latency test (part of the HPCC suite). With interconnect bandwidth on par with memory bandwidth, a major system bottleneck is removed for many applications, improving performance and greatly simplifying software development.

RapidArray Embedded Switching Fabric
A 96 GB per second, nonblocking, crossbar switching fabric in each chassis provides four 2 GB per second links to each two-way SMP and twenty-four 2 GB per second interchassis links.

RapidArray Communications Libraries
The RapidArray communications libraries work in conjunction with the RapidArray communications processors to streamline communications protocols, bypassing the Linux kernel where possible and optimizing the compute/communicate interaction. The RapidArray interconnect delivers outstanding performance for highly parallel applications built upon the common MPI, shmem, and Global Arrays standards.

System Topologies
The Cray XD1 embedded switching fabric and 24 interchassis links enable a wide variety of network topologies. Topologies initially supported include direct connect and fat tree.

Application Acceleration

The application acceleration subsystem incorporates reconfigurable computing capabilities to deliver superlinear speedup of targeted applications.

Each Cray XD1 chassis can be configured with six application acceleration processors, based on Xilinx Virtex-II Pro FPGAs, that can be programmed to accelerate key algorithms. Well suited to functions such as all_reduce operations, searching, sorting and signal processing, the application acceleration subsystem acts as a coprocessor to the AMD Opterons, handling the computationally intensive and highly repetitive algorithms that can be significantly accelerated through parallel execution.

The application acceleration processors are tightly integrated with Linux and the AMD Opterons and use standard software programming APIs, removing a major obstacle to application speedup.

Active Management

The active management subsystem delivers outstanding usability and reliability through single system command and control and self-healing capabilities.

Single System Command and Control
The active management system combines partitioning and intelligent self-configuring features to allow administrators to view and manage hundreds of processors as one or more logical computers.

Partitions divide the Cray XD1 system into logical computers. Administrators view and operate on partitions, rather than on individual SMPs. Self-configuration features automatically translate partition details into SMP configurations. Transaction processing techniques ensure configuration integrity.

Single system control is provided over common administrative functions: system configuration, monitoring and fault tolerance, software upgrades, storage management, network management, user management, security, and resource and queue management. These functions may be accessed using the active manager graphical user interface (GUI) or a command line interface (CLI).

Single system command and control eliminates common configuration problems,

Fault Detection: In each chassis, a dedicated management processor with its own supervisory network continuously monitors over 200 critical hardware functions, including temperatures, voltages, in-rush currents, parity errors, and component diagnostics. The management system also monitors the sanity and operation of the Linux operating system and key internal services such as DNS, NIS, and LDAP.

Proactive Management: Sophisticated proactive controls

Balance: The Key to Exceptional Application Performance

It takes more than simply adding processors to increase HPC application performance. MPI applications, such as simulation and modeling, are compute- and communicate-intensive. They must perform millions of calculations and then exchange intermediate results before beginning another iteration.

To maximize processor performance, this exchange of data must be fast enough to prevent the compute processor from sitting idle, waiting for data.
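The point of the "Balance" sidebar, that data exchange must be fast enough to keep processors from idling, can be illustrated with a first-order latency-plus-bandwidth estimate. The sketch below is not a Cray model or benchmark. It uses the datasheet's 1.8 microsecond MPI latency and 8 GB per second bandwidth between SMPs, assumes 1 GB means 10^9 bytes, and assumes that computation and communication do not overlap, which is pessimistic given that the communications processors are described as enabling concurrent computing and communication. The function names and the compute time passed in are hypothetical.

```python
LATENCY_S = 1.8e-6    # MPI latency between SMPs (datasheet value)
BANDWIDTH_BPS = 8e9   # 8 GB/s between SMPs (datasheet value); ASSUMPTION: 1 GB = 1e9 bytes


def transfer_time(message_bytes: int) -> float:
    """Estimated seconds to move one message: fixed latency plus size / bandwidth."""
    return LATENCY_S + message_bytes / BANDWIDTH_BPS


def compute_fraction(compute_s: float, message_bytes: int) -> float:
    """Share of one iteration spent computing.

    ASSUMPTION: each iteration computes for compute_s seconds, then exchanges
    a single message, with no overlap between the two phases.
    """
    return compute_s / (compute_s + transfer_time(message_bytes))


# Small messages are dominated by latency, large ones by bandwidth.
for size in (8, 1024, 1024 * 1024):
    print(f"{size:>8} bytes: {transfer_time(size) * 1e6:8.2f} microseconds")
```

In this simple model, shrinking either the latency or the per-byte cost raises the share of time a processor spends computing, which is the balance the sidebar argues for.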
