Fujitsu's Vision for High Performance Computing
October 26, 2004 Yuji Oinaga Fujitsu Ltd.
CONTENTS
- HPC Policy and Concept
- Current HPC Platform: HPC2500, IA-Cluster
- Major HPC Customers
- Road Map Toward Peta-Scale Computing
FUJITSU HPC Policy
~ Developing & Providing Top Level Supercomputer ~
- High computational power
- High-speed interconnect
- Reliability/availability
- Leading-edge semiconductor technology
- Highly reliable operating system and middleware with HPC features
- Sophisticated compilers and development tools
History of HPC Platform
[Figure: history of Fujitsu HPC platforms, 1980-2005]
- Vector: VP Series -> VP2000
- Vector parallel: VPP500 -> VPP300/700 -> VPP5000
- Scalar: F230-75APU -> AP1000 -> AP3000 -> PRIMEPOWER 2000 -> PRIMEPOWER HPC2500
- IA32/IPF Cluster

Concept of Fujitsu HPC
Provide the best/fastest platform for each application.
- Scalar SMP: PRIMEPOWER, New Linux SMP
- IA Cluster: PRIMERGY Cluster

Provide the high-speed interconnect.
- Crossbar: High-Speed Optical Interconnect
- Multi-stage: InfiniBand

Provide high operability/usability.
- Enhancement of operability/usability: Extended Partitioning, Dynamic Reconfiguration
- Cluster middleware: Total Cluster Control System (Parallelnavi), Shared Rapid File System (SRFS)
Fujitsu's Current HPC Platform

~ Fujitsu is a leading-edge company in the HPC field ~

[Figure: product lineup arranged by simulation size (small to large) and by reliability, operability, and ease of maintenance (low to high)]
- PRIMEPOWER HPC2500 (SPARC/Solaris, large SMP): 16-128 CPUs (2.08 GHz) per node, memory up to 512 GB, up to 128-node cluster
- PRIMEPOWER enterprise/midrange models (SPARC/Solaris): 2-16 CPUs, memory up to 64 GB
- PRIMERGY RXI series (IPF/Linux cluster): 2/4-way SMP, 1-4 CPUs, memory up to 32 GB
- PRIMERGY RX series (IA32/Linux cluster): 1/2-way, memory up to 6 GB
- Blade server: 2-way x 10 blades, memory up to 6 GB x 10
- Thin server (SPARC/Solaris, IA/Linux)

PRIMEPOWER HPC2500: Configuration
[System]
- Number of nodes: max. 128, connected by the high-speed optical interconnect (4 GB/s x 4 per node)
- Number of processors: max. 16,384
- System performance: max. 136 Tflops
- Memory capacity: max. 64 TB

[Node]
- CPU performance: 8.32 Gflops (2.08 GHz)
- Number of processors: 8-128 per SMP node
- Node performance: max. 1,064.96 Gflops
- Memory capacity: max. 512 GB
- High-speed crossbar network for uniform memory access within the node: 4 GB/s x 2 (in/out) x 4
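The quoted peaks are mutually consistent; as a hedged check, assuming the SPARC64 V retires 4 floating-point operations per cycle:

\[
\begin{aligned}
2.08\,\mathrm{GHz} \times 4\ \mathrm{flops/cycle} &= 8.32\ \mathrm{Gflops\ per\ CPU},\\
8.32 \times 128\ \mathrm{CPUs} &= 1064.96\ \mathrm{Gflops\ per\ node},\\
1064.96 \times 128\ \mathrm{nodes} &\approx 136\ \mathrm{Tflops\ system\ peak}.
\end{aligned}
\]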
[Figure: node internals. A node holds up to 16 system boards, each carrying CPUs, memory, DTUs (Data Transfer Units), and channel adapters; the DTUs connect to the high-speed optical interconnect, and the channel adapters connect through PCI boxes to I/O devices.]
PRIMEPOWER HPC2500: Interconnect

~ Parallel optical data transfer technology for higher scalability and performance ~
Crossbar architecture:
- Connects up to 128 nodes (16,384 CPUs)
- Realizes 4 GB/s x 4 data throughput for each node
- Allows hundreds of node cabinets to be placed freely, using optical cables of up to 100 m
PAROLI (PARallel Optical LInk): the high-speed optical interconnect, 4 GB/s x 4 per node.

[Figure: up to 128 nodes, each a 128-CPU SMP, attached to the optical crossbar.]
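A hedged reading of the quoted figures (assuming the "x 4" denotes four optical links per node, each carrying 4 GB/s per direction):

\[
\begin{aligned}
4\,\mathrm{GB/s} \times 4\ \mathrm{links} &= 16\ \mathrm{GB/s\ per\ node\ (each\ direction)},\\
16\,\mathrm{GB/s} \times 128\ \mathrm{nodes} &\approx 2\ \mathrm{TB/s\ aggregate}.
\end{aligned}
\]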
Parallelnavi: Cluster Middleware

Single system image for job execution and file access:
[Figure: jobs run across Solaris nodes connected by the high-speed interconnect; each node holds a cache, and the user/system disks are shared by SRFS.]
- Single-system-image job execution via the NQS batch system
- File sharing across all nodes via SRFS
SRFS (Shared Rapid File System)

File sharing in a large cluster system; high-performance, single-system-image file access:
[Figure: an SRFS client on each Solaris node serves its applications and connects over the high-speed interconnect to the SRFS server.]
- High-speed file sharing in a large cluster system
- File consistency assured under parallel file access
- Fast I/O over the interconnect (DTU with large cache, InfiniBand, or Gigabit Ethernet), with striping for parallel access
- Fast parallel I/O; failover through a redundant PRIMECLUSTER GDS/GFS configuration
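Applications see SRFS as a single shared file system, so parallel access can go through standard interfaces. Below is a hedged sketch of the access pattern described above, written with generic MPI-IO rather than any Fujitsu-specific API; the path /srfs/shared.dat is a hypothetical mount point. Each rank writes a disjoint stripe of one shared file, and the collective call lets the I/O layer aggregate and stripe the requests.

```c
/* Hedged sketch: generic MPI-IO, not a Fujitsu-specific API.
 * "/srfs/shared.dat" is a hypothetical SRFS mount point. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    enum { N = 1 << 20 };                 /* doubles written per process */
    double *buf = malloc(N * sizeof(double));
    for (int i = 0; i < N; i++)
        buf[i] = (double)rank;            /* recognizable per-rank payload */

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "/srfs/shared.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Each rank writes its own disjoint region of the single shared file. */
    MPI_Offset offset = (MPI_Offset)rank * N * sizeof(double);
    MPI_File_write_at_all(fh, offset, buf, N, MPI_DOUBLE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    free(buf);
    MPI_Finalize();
    return 0;
}
```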
Language System

Programming model on a cluster system (P: process, T: thread):

[Figure: two nodes connected by the interconnect under each model.]
- Hybrid Model: inter-node parallelism by processes, intra-node parallelism by threads (one multi-threaded process per node)
- Flat Model: inter-node and intra-node parallelism by processes (one process per CPU), communicating via inter-process communication
Fujitsu Language System for Cluster:

  Programming model   Parallel method    Inter-node           Intra-node
  Hybrid Model        Data parallel      HPF, XPF (Fujitsu)   OpenMP, compiler auto-parallelization
                      Message passing    MPI                  OpenMP, compiler auto-parallelization
  Flat Model          Data parallel      HPF, XPF (Fujitsu)   (process parallel, same as inter-node)
                      Message passing    MPI                  (process parallel, same as inter-node)
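To make the two models concrete, here is a minimal sketch of the Hybrid Model in C using standard MPI and OpenMP (generic interfaces, not Fujitsu-specific extensions; the loop and its bounds are illustrative only). MPI ranks provide inter-node process parallelism, and OpenMP threads provide intra-node parallelism within each SMP node.

```c
/* Hybrid Model sketch: one MPI process per node, OpenMP threads within it.
 * Generic MPI/OpenMP code; compile e.g.: mpicc -fopenmp hybrid.c */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* inter-node: process parallel */
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    double local = 0.0;
    /* intra-node: thread parallel over the SMP node's CPUs */
    #pragma omp parallel for reduction(+:local)
    for (int i = rank; i < 1000000; i += nprocs)
        local += 1.0 / (1.0 + (double)i);

    double global = 0.0;
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum = %f (%d processes x %d threads)\n",
               global, nprocs, omp_get_max_threads());

    MPI_Finalize();
    return 0;
}
```

Under the Flat Model, the same program would run one MPI process per CPU and simply omit the OpenMP directive.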
IA-Cluster: Configuration

System configuration:
- Control node: FUJITSU PRIMERGY (1U), Xeon (up to 3.2 GHz), 2 CPUs
- Compute nodes:
  - PRIMERGY BX600: max. 10 blades in a 7U chassis, Xeon (up to 3.2 GHz), 2 CPUs per blade
  - PRIMERGY RXI300/RXI600: Itanium 2 (1.5 GHz), 2/4 CPUs
- Compute network / interconnects supported:
  - InfiniBand (8 Gbps x 2 ports), with an HCA card of Fujitsu's own development (133 MHz, 64-bit PCI-X bus)
  - Myrinet2000 (2+2 Gbps)
  - Gigabit Ethernet (a Gigabit Ethernet switch also serves the control network)
- Software: OS: Linux; middleware: SCore*/Beowulf
* SCore is a PC cluster control system developed by the PCCC (PC Cluster Consortium in Japan).

Performance Example: SRFS

SRFS performance on HPC2500, measured at a user site (JAXA): a client running the measurement application reads and writes through SRFS to the server's local file system (QFS) over the 4 GB/s high-speed optical interconnect, with a 640 MB transfer block size.

[Figure: write and read throughput versus stripe number and stripe width; SRFS achieved high performance (over 3 GB/s) with the optical interconnect.]

JAXA: Large SMP System
- 9.3 Tflops peak performance; 5.406 Tflops with Linpack
- 56 compute nodes, 32 CPUs per node, at 5.2 Gflops per CPU
- Fujitsu PRIMEPOWER HPC2500 x 14 (18 in total)
- 3.6 TB of main memory
- 57 TB total disk space; 620 TB tape library
[Figure: 56 compute nodes (32-way SMP) on the crossbar network, with a 128-way SMP I/O node attached to 57 TB of FC RAID disk and a 620 TB tape library, plus 6 service/login nodes (64-way SMP).]
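As a hedged consistency check of the quoted JAXA figures:

\[
\begin{aligned}
56\ \mathrm{nodes} \times 32\ \mathrm{CPUs} &= 1792\ \mathrm{CPUs},\\
1792 \times 5.2\,\mathrm{Gflops} &\approx 9.3\ \mathrm{Tflops\ peak},\\
5.406 / 9.3 &\approx 58\%\ \mathrm{Linpack\ efficiency}.
\end{aligned}
\]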
Kyoto University: HPC2500 System

Supercomputer: PRIMEPOWER HPC2500

[Configuration]
- 9.185 Tflops; memory: 5.75 TB
- Compute nodes: 128 CPUs/node, 11 cabinets
- I/O node: 64 CPUs/node, 1 cabinet
- RAID: ETERNUS6000 Model 600, 8.0 TB (RAID5)
- Tape library; network router

[Figure: the compute-node cabinets and the I/O node on the high-speed optical interconnect, with the RAID, tape library, and network router attached; pre/post-processing operation is also supported.]

RIKEN: IA-Cluster System
IA Cluster 1:
- 512 nodes / 1,024 CPUs (Xeon 3.06 GHz)
- Performance: 6.2 Tflops
- Interconnect: InfiniBand (8 Gbps)

IA Cluster 2:
- 512 nodes / 1,024 CPUs ([128 nodes / 256 CPUs] x 4 systems)
- Performance: 6.2 Tflops
- Interconnect: Myrinet (2 Gbps), InfiniBand (8 Gbps)

Total: 1,024 nodes / 2,048 CPUs; total performance: 12.5 Tflops

[Figure: the two clusters on a Gigabit Ethernet network, together with a firewall/VPN device, front-end server, large-memory system, file server on a SAN (Storage Area Network), and a 20 TB HPSS archive system.]

Hardware Road Map
- SPARC/Solaris (PRIMEPOWER): HPC2500 today with SPARC64 V (2.08 GHz, 4 MB cache) and UltraSPARC (1.3 GHz, 2 MB cache); Post-HPC2500 with SPARC64 VI, a leading-edge CPU on a 90 nm process
- IPF (IA64)/Linux (PRIMERGY): IPF systems (2 or 4 CPUs) with Itanium 2 (1.5 GHz) today; New Linux SMP with the dual-core Montecito and then Montvale
- IA32/Linux (PRIMERGY, Perf-Blade): Xeon 3.2 GHz, 2 or 4 CPUs/node, EM64T; enhancement according to Intel's roadmap

[Timeline: CY2004-2007]

Software Road Map
- SPARC/Solaris (PRIMEPOWER HPC2500 -> Post-HPC2500): Parallelnavi 2.3 -> 2.4 -> 2.5 -> 2.6
- IPF (IA64)/Linux (PRIMERGY IPF systems, 2 or 4 CPUs/node -> New Linux SMP): SCore/Beowulf, then SRFS and Parallelnavi
- IA32/Linux (PRIMERGY, Perf-Blade, 2 or 4 CPUs/node): SCore/Beowulf

[Timeline: CY2004-2007]

Fujitsu's HPC Software Structure
Fujitsu's HPC Software Products:
- Compiler/MPL: Fortran, C, C++ (serial, OpenMP, auto-parallel); XPF; MPI
- Programming tool: Parallelnavi Language Workbench
- Math library: SSL II, C-SSL II; BLAS, LAPACK, ScaLAPACK
- Job management / HPC environment: Parallelnavi*
- Fast network file system: SRFS
- Cluster management: Blast Band
- Grid: Compute Grid Engine (Systemwalker CyberGrip) and Data Grid Engine (Interstage Shunsaku Data Manager); grid resource control: Systemwalker Resource Coordinator / Service Quality Coordinator
- OS: Solaris OE / Linux
* Parallelnavi also includes the language products.
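The math-library layer above is reached through standard interfaces. As a hedged illustration, here is a minimal C program calling dgemm through the portable CBLAS binding; this is a generic BLAS interface, not a Fujitsu-specific entry point (SSL II has its own routines).

```c
/* Hedged sketch: generic CBLAS dgemm call.
 * Link against any BLAS, e.g.: cc dgemm.c -lcblas */
#include <cblas.h>
#include <stdio.h>

int main(void)
{
    /* C = alpha*A*B + beta*C with 2x2 row-major matrices */
    double A[4] = {1, 2, 3, 4};
    double B[4] = {5, 6, 7, 8};
    double C[4] = {0, 0, 0, 0};

    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                2, 2, 2,        /* M, N, K */
                1.0, A, 2,      /* alpha, A, lda */
                B, 2,           /* B, ldb */
                0.0, C, 2);     /* beta, C, ldc */

    printf("C = [%g %g; %g %g]\n", C[0], C[1], C[2], C[3]);
    return 0;
}
```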
Integrated HPC Environment
- Supports heterogeneous platforms (SPARC/Solaris and IA/Linux)
- Single system image; integrated user interface; integrated file system
- Grid-enabled platform
- Parallelnavi for system management; SRFS as the cluster file system

[Figure: users reach SPARC/Solaris and IA/Linux systems through grid middleware over a high-speed network (InfiniBand, Gigabit Ethernet) with a shared file server; the interconnect ties the grid computing resources together.]

Toward Peta-Scale Computing (2010)
To achieve practical peta-scale computing, Fujitsu's goals are 3 Pflops peak and 1 Pflops sustained performance. This requires high-performance CPUs, low power consumption, and faster memory access and interconnects. It also requires a highly reliable, intelligent interconnect capable of connecting more than 10K nodes, and cluster software able to manage such a highly parallel system.
[Figure: envisioned system. 1K-10K computing nodes of 500-1,000 Gflops each, attached through 100 GB/s-class interfaces to leaf switches, which connect upward to spine switches.]
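The node count follows from the performance goal; as a hedged consistency check:

\[
\frac{3\,\mathrm{Pflops}}{1000\,\mathrm{Gflops/node}} = 3000\ \mathrm{nodes}
\quad\text{to}\quad
\frac{3\,\mathrm{Pflops}}{500\,\mathrm{Gflops/node}} = 6000\ \mathrm{nodes},
\]

within the 1K-10K range shown above.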
Fujitsu organized the Peta-Scale Computing Research Center in October 2004.