the K computer System overview

Atsuya Uno

System Development Team, Development Group Next-Generation R&D Center (The Institute of Physical and Chemical Research)

SC'11 K computer, “京” ,is …

• “京 [kei]” is a nickname of the next-generation supercomputer system – The name was chosen from public applications in July last year.

– “京” stands for a unit corresponding to 10 P, which is also the performance target of our project.

• No.1 system on the latest TOP500 list. – 10.51 PFlops in LINPACK benchmark (peak : 11.28PFlops) – Efficiency : 93.2% – Execution time : 29hours 28minutes. – Power consumption : 12.66 MW

SC'11 (1) Design targets (in 2006) and Schedule • 10 peta-FLOPS in LINPACK benchmark • Peta-FLOPS sustained performance in real applications • Low power consumption system • Highly reliable system

FY2006 FY2007 FY2008 FY2009 FY2010 FY2011 FY2012

Conceptual System Prototype,Prototype,Production, installation,Production, installation, TuningTuning and and Detailed designDetailed design improvement design evaluationevaluationand adjustment and adjustment improvement

Computer Design ConstructionConstruction building Design

Buildings Research DesignDesign Construction building Construction the present

SC'11 (2) System Overview of the K computer

SC'11 (3) System Configuration

The K computer Compute nodes

Number of CPUs > 80K Total memory capacity > 1PB User Tofu interconnect (6D mesh / torus network) Internet Pre/Post K computer IO Nodes Processing Front-end Servers Servers

Local file system (11PB~)

Control and Management NW

Global IO NW System System Management Control Servers Servers Global file system Job / User System (30PB~) Management Configuration

SC'11 (4) Compute Nodes • CPU : SPARC64TM VIIIfx (8 core) • Network : Tofu interconnect – 128 Gflops (6D mesh / torus network) – 2.0 GHz – provide logical 3D torus network for each job – 5 GB/s x 2 bandwidth of each link

• Memory : 16 GB x 2 5 GB/s

© Limited. (peak)

Compute node

CPU: 128GFLOPS SPARC64TM VIIIfx (8 cores) Core Core 5 GB/s x 2 (peak) Core 5 GB/s x 2 (peak) SIMD(4FMA)Core SIMD(4FMA)Core SIMD(4FMA)Core SIMD(4FMA)Core16GFlops SIMD(4FMA)Core16GFlopsSIMD(4FMA)16GFlops SIMD(4FMA)16GFlopsSIMD(4FMA)16GFlops 16GFlops16GFlops x 16GFLOPS L2$: 6MB

64 GB/s MEM: 16 GB

z x 2(peak) 5 GB/s

y (Image of 3D torus) © Fujitsu Limited. x SC'11 (5) CPU – SPARC64TM VIIIfx

• Superscalar Multi-core processor (8 cores) – 128 GFLOPS (16 GFLOPS/core) • 2 GHz – 256 FP registers (double precision) / core – 2 SIMD (FMA x 4) / core 45nm CMOS process Chip size : 22.7mm x 22.6mm # of transistors : 760M © Fujitsu Limited. • Shared 6MB L2$ (12way) Power : 58W (TYP, 30℃), Water Cooling – Software controllable cache (Sector cache) Specification • Memory throughput Performance (peak) 128 GFLOPS (16 GFLOPS x 8 cores) – L2$ - Main memory : 0.5B/FLOP (64GB/s) Core 8 Clock 2.0 GHz

• Hardware barrier FMA x 4 (2 SIMD) – High-speed synchronization of on-chip cores Floating-point DIVIDE x 2 Execution units COMPARE x 2 • Power consumption (Core spec) Floating-point register (64bit) : 256 – 58W at 30℃ by water-cooling General purpose register (64bit) : 188 L1I$ : 32 KB (2way) Cache L1D$ : 32 KB (2way) You can get more detailed information from ‘SPARC64TM VIIIfx Extensions’ L2$ : Shared 6 MB (12way) (http://img.jp.fujitsu.com/downloads/jp/jhpc/sparc64viiifx-extensions.pdf) Memory throughput 64 GB/s (0.5B/F) SC'11 (6) Tofu (Torus fusion) Interconnect • High communication performance and fault-tolerant network • Network topology : 6D mesh / torus network – Each node has 10 links (5 GB/s x 2 bandwidth) • 4 links for making a basic unit (2x3x2 mesh / torus) and 6 links for XYZ link (XZ:torus / Y:mesh) – Multi-path routing by a combination of XYZ link and 2x3x2 mesh / torus network enables to make a detour of faulty nodes • From user’s point of view, network topology of the job is 1,2 or 3D torus network. – This torus network is dynamically configured when the nodes are assigned to the job.

Z+ Y+

Z X- X+ Basic unit Y (2x3x2 mesh / torus) X Basic units form 3D torus. Neighboring basic units are connected by 12 links. Y- Z-

SC'11 (7) System environments

• OS: based OS on compute nodes

• File system : FEFS (Fujitsu Exabyte File System) based on file system Compute nodes – Two-level large-scale distributed file system • Local file system for temporary files used by jobs IO • Global file system for user’s permanent files Local File System – Staging functions Output • Stage-in files Input files on the global file system which are used in a job are copied on the File local file system. staging • Stage-out Input Output files generated during a job execution are moved back to the global file files system after the job finished. Global File System • Batch job-oriented system – Interactive environments are available for debugging.

SC'11 (8) Programming environments

• Programing languages and compilers – Fortran 95 and subset of Fortran 2003, XPFortran – C99 and C++2003 including several extensions to GNU C and C++ – Optimized compilers to obtain full capabilities of SPARC64TM VIIIfx • SIMD, 256 FP registers, programmable L2 cache (sector cache), and so on. – OpenMP 3.0 is supported

• MPI library based on MPI-2.1 specification – Low-latency and high-throughput – Collective communications are optimized for the Tofu interconnect • Bcast, Allgather, Alltoall, and Allreduce

• Numerical libraries – BLAS, LAPACK, FFTW, Fujitsu scientific numerical library SSL II, and so on will be provided.

SC'11 (9) Hardware Implementation • Compute rack houses : 796mm – System board x 24 (top:12, bottom:12) – IO system board x 6 – Power supply unit x 9 System Board x 12 Water-cooling SPARC64TM VIIIfx – Service processor board x 2 Pipes for (4 CPUs / 1 SB) – System disk (RAID 5) module Water-cooling

• Peak performance : 12.3TF / Rack IO System Board x 6

• Total memory : 1.5TB / Rack Power 2060mm Supply • Weight: 1,500Kg (max.) Unit x 9 Service Processor board x 2

System Disk

DDR3 Memory (16 GB x 4 / 1 SB)

© Fujitsu Limited. System Board x 12

ICC (Tofu NW LSI) System Board (4 ICCs / 1 SB) (24 boards / 1 rack) Rack © Fujitsu Limited. SC'11 (10) Hardware view

Full System

Racks Compute Rack × > 800 Compute Rack Compute Rack×8 System Board (SB) SB×24 Disk Rack×2 Node IOSB×6 Node×4 CPU×1 ICC×1 Memory > 10 PFLOPS > 1 PB 98.4 TFLOPS 12 TB 512 GFLOPS Disk Rack 128 GFLOPS 64 GB 16 GB 12.3 TFLOPS Compute Rack 1.5 TB SC'11 (11) Image of the K computer

864 racks are housed in the computer room.

SC'11 (12) Facilities for the K computer

SC'11 (13) Location of the K computer in AICS (Advanced Institute for Computational Science) was established at RIKEN in July, 2010.

Kobe Kyoto

Tokyo 450km (280miles) west from Tokyo

SC'11 (14) Layout of buildings

Research Building Computer Building

Chillers

Substation

SC'11 (15) Buildings and Cooling System Research Building Computer Building • Six-story above ground and one below • Three-story above ground and one below • Floor area : ~1,800 m2, ~9,000 m2 (total) • Floor area: ~4,300 m2, ~10,500 m2 (total)

Chillers (Area : 1900m2) Substation (Area : 200m2)

Absorption Centrifugal CGS (5MW) 30MW (max.) Refrigerating Chiller Water Chiller x2 77,000V (receiving) x4 x3 → 6,600V

SC'11 (16) Cooling System • K computer is cooled by air and water SB Parts Temperature Input : ~15℃ Compute racks Water CPU and ICC Output : ~17℃ … … Air the others Room temp. : ~20℃ … …

Heat exchanger Air handlers … …

Absorption Refrigerating Chillers & Centrifugal Water Chillers

SC'11 (17) Installation of the K computer Period of Installation : 2010.9 ~ 2011.8 50m

• No pillar Last racks were installed – flexible arrangement of computer racks at the end of Aug. 2011. – easy installation of network cables The computer room on the third floor of the computer building. 60m

First 8 racks installed on 30th Sep 2010.

SC'11 (18) Installation of the K computer (movie)

SC'11 (19) Thank you for your attention !

AICS Booth #112

A photo in the early evening SC'11 (20)