Fujitsu's Vision for High Performance Computing

October 26, 2004
Yuji Oinaga, Fujitsu Ltd.

CONTENTS

- HPC Policy and Concept
- Current HPC Platform
- HPC2500
- IA-Cluster
- Major HPC customers
- Road Map
- Toward Peta-Scale Computing

FUJITSU HPC Policy

~ Developing & Providing Top Level ~

- High Computational Power
- High Speed Interconnect
- Reliability/Availability
- Leading Edge Semiconductor Technology
- Highly Reliable, HPC Featured Operating System & Middleware
- Sophisticated Compilers and Development Tools

History of HPC Platform

[Timeline chart, 1980-2005, of Fujitsu's HPC product history]
- Vector: VP Series, VP2000
- Vector Parallel: VPP500, VPP300/700, VPP5000
- Scalar: F230-75APU, AP1000, AP3000, PRIMEPOWER 2000, PRIMEPOWER HPC2500, IA32/IPF Cluster

Concept of Fujitsu HPC

Provide the best/fastest platform for each application.
- Scalar SMP: PRIMEPOWER, New Linux SMP
- IA Cluster: Cluster

Provide the high-speed interconnect.
- Crossbar: High Speed Optical Interconnect
- Multi-Stage: InfiniBand

Provide high operability/usability.
- Enhancement of operability/usability: Extended Partitioning, Dynamic Reconfiguration
- Cluster middleware: Total Cluster Control System (Parallelnavi), Shared Rapid File System (SRFS)

Fujitsu's Current HPC Platform
~ Fujitsu is a leading edge company in the HPC field ~
[Chart: SPARC/Solaris and IA/Linux product lineup positioned by simulation size (small to large) against reliability, operability, and ease of maintenance (low to high)]
- Large SMP (SPARC/Solaris): PRIMEPOWER HPC2500, 2.08GHz, up to 128 CPUs per node; PRIMEPOWER Enterprise Model (16-128 CPUs, memory max. 512GB) and Midrange Model (2-16 CPUs, memory max. 64GB)
- IPF/Linux cluster: PRIMERGY RXI series, up to 128-node cluster, 2/4-way SMP, 1-4 CPUs, memory max. 32GB
- IA32/Linux cluster: PRIMERGY RX series; thin server (1/2-way, memory max. 6GB) and blade server (2-way x 10, memory max. 6GB x 10)

PRIMEPOWER HPC2500: Configuration

High Speed Optical Interconnect: 4GB/s x4, up to 128 nodes

[System]
- # of Nodes: Max. 128
- # of Processors: Max. 16,384
- System Performance: Max. 136 Tflops
- Memory Capacity: Max. 64TB

[Node] (SMP node, 8-128 CPUs)
- CPU Performance: 8.32 Gflops (2.08GHz)
- # of Processors: Max. 128
- Node Performance: Max. 1,064.96 Gflops
- Memory Capacity: Max. 512GB
- High Speed Interconnect: crossbar network for uniform memory access (within node), 4GB/s x 2 (in/out) x4
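
These headline figures are mutually consistent; a quick check (the 4 floating-point operations per cycle per SPARC64 V CPU is our assumption, the rest follows from the numbers above):

    2.08 GHz x 4 flops/cycle = 8.32 Gflops per CPU
    8.32 Gflops x 128 CPUs = 1,064.96 Gflops per node
    1,064.96 Gflops x 128 nodes = approx. 136 Tflops system peak
    512 GB x 128 nodes = 64 TB system memory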

[Node diagram: up to 16 system boards per node, each carrying CPUs, memory, and DTUs (DTU: Data Transfer Unit); the DTUs connect to the high-speed optical interconnect, and channel adapters connect to PCI boxes and I/O devices.]

PRIMEPOWER HPC2500: Interconnect

~ Parallel Optical data transfer technology for higher scalability and performance ~

Crossbar Architecture
- Connects up to 128 nodes (16,384 CPUs)
- Realizes 4GB/s x4 data throughput for each node
- Allows hundreds of node cabinets to be placed freely with 100m optical cables
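
Taken together, the per-node figures imply an aggregate injection bandwidth on the order of 128 nodes x 4GB/s x 4 = 2,048 GB/s, about 2 TB/s across the full crossbar (a derived figure, not stated on the slide).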

PAROLI: PARallel Optical LInk
High Speed Optical Interconnect: 4GB/s x4
[Diagram: up to 128 nodes of 128 CPUs each, connected through the high-speed optical interconnect]

Parallelnavi: Cluster Middleware
Single system image for job execution / file access

[Diagram: Solaris nodes connected by the high-speed interconnect; single-system-image job execution through the NQS batch system; file sharing by SRFS, with per-node caches over the system disk and user disks.]

SRFS (Shared Rapid File System)
- File sharing in large cluster systems
- High performance / single-system-image file access

[Diagram: application nodes (Solaris) run SRFS clients that access an SRFS server over the high-speed interconnect, with striped parallel transfers and a DTU-side cache.]
- File sharing in a large cluster system
- File consistency assured via parallel file access
- Fast I/O: DTU with large cache; striping across the interconnect
- Interconnects: InfiniBand, Gigabit Ethernet
- Fast parallel I/O; failover with PRIMECLUSTER (GDS/GFS) redundant configuration

Language System
Programming Model on Cluster System: Hybrid Model vs. Flat Model

[Diagram: two nodes connected by the interconnect. Hybrid model: one process (P) per node with threads (T) across the node's CPUs. Flat model: one process per CPU. Processes communicate across the interconnect.]
- Hybrid model: inter-node parallelism by process, intra-node parallelism by threads
- Flat model: inter-node parallelism by process, intra-node parallelism by process

Fujitsu Language System for Cluster (programming model / parallel method / inter-node / intra-node):
- Hybrid Model, Data Parallel: HPF, XPF (Fujitsu) inter-node; OpenMP or compiler auto-parallelization intra-node
- Hybrid Model, Message Passing: MPI inter-node; OpenMP or compiler auto-parallelization intra-node
- Flat Model, Data Parallel: HPF, XPF (Fujitsu)
- Flat Model, Message Passing: MPI
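
As an illustration of the hybrid model described above, here is a minimal, generic MPI + OpenMP sketch (not tied to Fujitsu's specific compilers, MPI library, or job launcher):

```c
/* Hybrid-model sketch: one MPI process per node (inter-node parallelism),
 * OpenMP threads across the CPUs within the node (intra-node parallelism).
 * Generic MPI/OpenMP only; no Fujitsu-specific tooling is assumed. */
#include <stdio.h>
#include <mpi.h>
#include <omp.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);                    /* inter-node: parallel by process */

    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    double local = 0.0;
    #pragma omp parallel reduction(+:local)    /* intra-node: parallel by threads */
    {
        local += 1.0;                          /* each thread contributes one unit */
    }

    double total = 0.0;                        /* combine per-node partial results */
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("MPI processes: %d, total threads counted: %.0f\n", nprocs, total);

    MPI_Finalize();
    return 0;
}
```

In the flat model the same program would instead be launched with one MPI process per CPU and no OpenMP region.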

IA-Cluster: Configuration

Compute Node options
- FUJITSU PRIMERGY (1U): Xeon (~3.2GHz), 2 CPUs
- PRIMERGY BX600: Xeon (~3.2GHz), 2 CPUs, max. 10 blades in a 7U chassis
- PRIMERGY RXI300/RXI600: Itanium2 (1.5GHz), 2/4 CPUs
[System configuration diagram: a control node and compute nodes linked by a Gigabit Ethernet switch and the compute network switch]

Compute Network Switch
- InfiniBand (8Gbps x 2 ports); HCA card: Fujitsu's own development, 133MHz, 64-bit PCI-X bus

Interconnects supported
- InfiniBand
- Myrinet2000 (2+2Gbps)
- Gigabit Ethernet

Software
- OS: Linux
- Middleware: SCore/Beowulf

*SCore is a PC cluster control system developed by PCCC (PC Cluster Consortium in Japan).

Performance Example: SRFS
~ SRFS performance on HPC2500 (at user site: JAXA) ~
[Charts: write and read throughput vs. transfer block size, measured by a client application accessing an SRFS server (QFS) over the 4GB/s high-speed optical interconnect, compared against the local file system; parameters include stripe number, stripe width, and transfer block sizes up to 640MB.]
Achieved high performance (over 3GB/s) with the optical interconnect

JAXA: Large SMP System
- 9.3 Tflops peak performance, 5.406 Tflop/s with Linpack
- 56 compute nodes, 32 CPUs per node at 5.2 Gflop/s per CPU
- Fujitsu PRIMEPOWER HPC2500 x 14 (18 in total)
- 3.6 TB of main memory
- 57 TB total disk space, 620 TB tape library
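
These figures are mutually consistent: 56 nodes x 32 CPUs x 5.2 Gflop/s = 9.3 Tflop/s peak, and the 5.406 Tflop/s Linpack result corresponds to roughly 58% of peak (a derived figure, not stated on the slide).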

[System diagram: 56 compute nodes (32-way SMP), one I/O node (128-way SMP), and 6 service/login nodes (64-way SMP), connected by the crossbar (XBar) network; 57TB FC RAID disk; 620TB tape library.]

Kyoto University: HPC2500 System

Supercomputer PRIMEPOWER HPC2500
[Configuration]
- 9.185 Tflops, memory: 5.75TB
- Compute nodes: PRIMEPOWER HPC2500, 128 CPUs/node, 11 cabinets
- I/O node: PRIMEPOWER HPC2500, 64 CPUs/node, 1 cabinet
- RAID: ETERNUS6000 Model 600, 8.0TB (RAID5)
[Diagram: HPC2500 cabinets connected by the high-speed optical interconnect; the I/O node attaches to the RAID and a tape library; a network router provides external access; separate PRIMEPOWER systems handle pre/post processing.]

RIKEN: IA-Cluster System

IA Cluster 1
- 512 nodes / 1,024 CPUs (Xeon 3.06GHz)
- Performance: 6.2 Tflops
- Interconnect: InfiniBand (8Gbps)

IA Cluster 2
- 512 nodes / 1,024 CPUs ([128 nodes / 256 CPUs] x 4 systems)
- Performance: 6.2 Tflops
- Interconnect: Myrinet (2Gbps)

Total: 1,024 nodes / 2,048 CPUs, total performance: 12.5 Tflops
[System diagram: the two clusters connect through a Gigabit Ethernet network, with user access through firewall/VPN devices, to front-end servers, an HPSS archive system, a large memory system, and a 20TB file server on a SAN (Storage Area Network).]

Hardware Road Map

[Roadmap chart, CY2004-2007]
- SPARC/Solaris, PRIMEPOWER: HPC2500 (SPARC64 V, 1.3GHz / 2MB cache and 2.08GHz / 4MB cache) → Post-HPC2500 with a leading-edge CPU (SPARC64 VI, 90nm process)
- IPF (IA64)/Linux, PRIMERGY: IPF system [2 or 4 CPU] with Itanium 2 1.5GHz → New Linux SMP with dual-core Montecito, then Montvale
- IA32/Linux, PRIMERGY: Perf-Blade, 2 or 4 CPU/node, Xeon 3.2GHz with EM64T; enhancement according to Intel's road map

Software Road Map

[Roadmap chart, CY2004-2007]
- SPARC/Solaris, PRIMEPOWER (HPC2500 → Post-HPC2500): Parallelnavi 2.3 → 2.4 → 2.5 → 2.6
- IPF (IA64)/Linux, PRIMERGY (IPF system [2 or 4 CPU/node] → New Linux SMP): SCore/Beowulf → SRFS, Parallelnavi
- IA32/Linux, PRIMERGY (Perf-Blade, 2 or 4 CPU/node): SCore/Beowulf

Fujitsu's HPC Software Structure

Fujitsu’s HPC Software Products

- Compiler/MPL: Fortran, C, C++ (serial, OpenMP, auto-parallel), XPF, MPI
- Programming tool: Parallelnavi Language Workbench
- Math library: SSL II, C-SSL II, BLAS, LAPACK, ScaLAPACK
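
As a small usage illustration of the math-library layer, here is a matrix multiply through the standard CBLAS interface (cblas_dgemm); whether Fujitsu's BLAS is called through this C interface or a Fortran interface depends on the installation, so treat the header and link details as assumptions:

```c
/* Multiply two 2x2 matrices with BLAS dgemm: C = alpha*A*B + beta*C.
 * Uses the standard CBLAS interface; linking against the vendor BLAS
 * (e.g. the BLAS shipped with SSL II) is an installation-specific assumption. */
#include <stdio.h>
#include <cblas.h>

int main(void)
{
    double A[4] = {1.0, 2.0,
                   3.0, 4.0};            /* row-major 2x2 */
    double B[4] = {5.0, 6.0,
                   7.0, 8.0};
    double C[4] = {0.0, 0.0, 0.0, 0.0};

    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                2, 2, 2,                 /* M, N, K */
                1.0, A, 2,               /* alpha, A, lda */
                B, 2,                    /* B, ldb */
                0.0, C, 2);              /* beta, C, ldc */

    printf("C = [%g %g; %g %g]\n", C[0], C[1], C[2], C[3]);
    return 0;
}
```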

- Job management / HPC environment: Parallelnavi*
- Fast network: Blast Band
- File system: SRFS
- Cluster management

GRID
- Compute grid engine: Systemwalker CyberGrip
- Data grid engine: Interstage Shunsaku Data Manager
- Resource control: Systemwalker Resource Coordinator / Service Quality Coordinator

OS: Solaris OE / Linux

* Parallelnavi also includes the language products.

Integrated HPC Environment
- Support for heterogeneous platforms
- Single system image
- Integrated user interface
- Integrated file system
- Grid-enabled platform
[Diagram: users reach SPARC/Solaris and IA/Linux systems, file servers, and grid computing resources through Parallelnavi (system management) and SRFS (cluster file system), over high-speed networks (InfiniBand, Gigabit Ethernet) and grid middleware, with the systems linked by an interconnect.]

Toward Peta-Scale Computing (2010)

To achieve practical peta-scale computing, Fujitsu's goals are 3 Pflops peak and 1 Pflops sustained performance. This requires high-performance CPUs, low power consumption, and faster memory access and interconnects. It also requires a highly reliable, intelligent interconnect capable of connecting more than 10K nodes, and cluster software able to manage such a highly parallel system.

[Diagram: 1K-10K computing nodes of 500-1,000 Gflops each, with 100GB/s-class interfaces, connected through leaf switches and spine switches.]
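
For scale, the stated peak goal fits this envelope: 3 Pflops at 1,000 Gflops per node would need about 3,000 nodes, and at 500 Gflops per node about 6,000 nodes, both within the 1K-10K node range shown (derived figures, not stated on the slide).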

Fujitsu organized the Peta-Scale Computing Research Center in October 2004.