Fujitsu's Vision for High Performance Computing
October 26, 2004
Yuji Oinaga, Fujitsu Ltd.

CONTENTS
- HPC Policy and Concept
- Current HPC Platform
  - HPC2500
  - IA-Cluster
- Major HPC customers
- Road Map
- Toward Peta-Scale Computing

FUJITSU HPC Policy
~ Developing & Providing Top Level Supercomputer ~
- High Computational Power
- High Speed Interconnect
- Reliability/Availability
- Leading Edge Semiconductor Technology
- Highly Reliable, HPC Featured Operating System & Middleware
- Sophisticated Compilers and Development Tools

History of HPC Platform
[Figure: timeline of Fujitsu HPC platforms, 1980 to 2005]
- Vector: F230-75APU, VP Series, VP2000
- Vector Parallel: VPP500, VPP300/700, VPP5000
- Scalar: AP1000, AP3000, PRIMEPOWER 2000, PRIMEPOWER HPC2500, IA32/IPF Cluster

Concept of Fujitsu HPC
Provide the best/fastest platform for each application.
- Scalar SMP: PRIMEPOWER, New Linux SMP
- IA Cluster: PRIMERGY Cluster
Provide the high-speed interconnect.
- Crossbar: High Speed Optical Interconnect
- Multi-Stage: InfiniBand
Provide High Operability/Usability
- Enhancement of Operability/Usability: Extended Partitioning, Dynamic Reconfiguration
- Cluster Middleware: Total Cluster Control System (Parallelnavi), Shared Rapid File System (SRFS)

Fujitsu's Current HPC Platform
~ Fujitsu is a leading edge company in HPC field ~
[Figure: platform positioning chart. Vertical axis: Simulation Size (Small to Large). Horizontal axis: Reliability, Operability, Ease of Maintenance (Low to High).]
- SPARC/Solaris: PRIMEPOWER HPC2500 (128 CPU, 2.08GHz, 128 Node Cluster)
- IPF/Linux Cluster: PRIMERGY RXI series (2/4WAY SMP, Memory: Max. 32GB)
- IA32/Linux Cluster: PRIMERGY RX series
  - Thin Server (1/2WAY, Memory: Max. 6GB)
  - Blade Server (2WAY x10, Memory: Max. 6GB x10)
- SPARC/Solaris: PRIMEPOWER Enterprise Model (CPU: 16-128, Memory: Max. 512GB)
- SPARC/Solaris: PRIMEPOWER Midrange Model (CPU: 2-16, Memory: Max. 64GB)
- IA/Linux: PRIMERGY RXI (CPU: 1-4, Memory: Max. 32GB)

PRIMEPOWER HPC2500: Configuration
[Figure: up to 128 SMP nodes (8-128 CPUs each) attached to the High Speed Optical Interconnect at 4GB/s x4]
[System]
- # of Nodes: Max. 128
- # of Processors: Max. 16,384
- System Performance: Max. 136Tflops
- Memory Capacity: Max. 64TB
[Node]
- CPU Performance: 8.32Gflops (2.08GHz)
- # of Processors: Max. 128
- Node Performance: Max. 1064.96Gflops
- Memory Capacity: Max. 512GB
- High Speed Interconnect: 4GB/s x 2 (in/out) x4
[Figure: node block diagram. System Boards (x16) with CPUs and memory are joined by a Crossbar Network for Uniform Mem. Access (within node). The PCI BOX holds Channel Adapters (to Channels, to I/O Device). The DTU Board holds the DTUs (to High Speed Optical Interconnect). DTU: Data Transfer Unit.]

PRIMEPOWER HPC2500: Interconnect
~ Parallel Optical data transfer technology for higher scalability and performance ~
Crossbar Architecture
- Connects up to 128 nodes (16384 CPUs)
- Realizes 4GB/s x4 data throughput for each node
- Allows hundreds of node cabinets to be placed freely with 100m optical cables
PAROLI: PARallel Optical LInk
[Figure: 128-CPU nodes (Max. 128 Nodes) attached to the High Speed Optical Interconnect at 4GB/s x4]

Parallelnavi: Cluster Middleware
Single system image for Job execution / File access
- Single system image job execution by NQS batch system
- File Sharing by SRFS
[Figure: Solaris nodes joined by the High-Speed Interconnect. Jobs are dispatched through NQS; each node has a cache and System/User disks.]

SRFS (Shared Rapid File System)
File Sharing in large cluster system
High performance / Single system image file access
- File sharing in large cluster system
- Assured file consistency
- Fast parallel I/O via parallel file access (striping)
- High speed interconnect: DTU, InfiniBand, Gigabit Ethernet
- Fast I/O with large cache
- Fail over with redundant configuration (Primecluster GDS/GFS)
[Figure: applications on Solaris nodes, each with an SRFS client, reach the SRFS server and its cache over the interconnect.]

Language System
Programming Model on Cluster System
[Figure: two nodes (Node0, Node1) joined by an interconnect, drawn once per model. P: Process, T: Thread. Processes exchange data by Inter-Process Communication.]
- [Hybrid Model] Inter Node: Parallel by Process. Intra Node: Parallel by Threads.
- [Flat Model] Inter Node: Parallel by Process. Intra Node: Parallel by Process.
Fujitsu Language System for Cluster
- Hybrid Model
  - Inter-Node (process parallel): Data Parallel with HPF, XPF (Fujitsu); Message Passing with MPI
  - Intra-Node: OpenMP, Compiler Auto Parallelization
- Flat Model
  - Inter-Node and Intra-Node (process parallel): Data Parallel with HPF, XPF (Fujitsu); Message Passing with MPI

IA-Cluster: Configuration
Compute Node
- FUJITSU PRIMERGY (1U): Xeon (~3.2GHz), 2CPU
- PRIMERGY BX600 (Max. 10 blades in a 7U chassis): Xeon (~3.2GHz), 2CPU
- PRIMERGY RXI300/RXI600: Itanium2 (1.5GHz), 2/4CPU
Compute Network
- InfiniBand (8Gbps x 2 ports). HCA Card: Fujitsu's own development, 133MHz, 64bit PCI-X bus
- Myrinet2000 (2+2Gbps)
- Gigabit Ethernet
Software
- OS: Linux
- Middleware: SCore/Beowulf
[Figure: System Configuration. A Control Node and the Compute Nodes are connected through a Giga Ethernet switch and a compute network switch. Interconnect supported: InfiniBand, Myrinet.]
*SCore is a PC cluster control system developed by PCCC (PC Cluster Consortium in Japan).

Performance Example: SRFS
- SRFS performance on HPC2500 - (On User Site: JAXA)
[Figure: Write and Read throughput plots comparing Local FS and SRFS. A measurement application on an SRFS client reaches the SRFS server and its local file system (QFS) over the High Speed Optical Interconnect at 4GB/s. Labels: Transfer block size (640MB), Stripe No., Stripe Width.]
Achieved high performance (over 3GB/s) with optical interconnect

JAXA: Large SMP System
- 9.3 Tflops peak performance, 5.406 Tflop/s with Linpack
- 56 compute nodes, 32 CPUs per node at 5.2 Gflop/s per CPU
- Fujitsu PRIMEPOWER HPC2500 x 14 (18 in total)
- 3.6 TB of main memory
- 57 TB total disk space, 620 TB tape library
[Figure: Compute nodes (32way SMP x 56), I/O node (128way SMP), and Service/Login nodes (64way SMP x 6) on an XBar Network, with Disk (57TB FC RAID) and Tape library (620TB).]

Kyoto University: HPC2500 System
Supercomputer PRIMEPOWER HPC2500
[Configuration] 9.185Tflops, Memory: 5.75TB
[PRIMEPOWER HPC2500]
- Compute Nodes: 128CPU/Node, 11 Cabinets
- I/O Node: 64CPU/Node, 1 Cabinet
RAID: ETERNUS6000 Model600, 8.0TB (RAID5)
[Figure: HPC2500 at Kyoto University. PRIMEPOWER HPC2500 compute nodes and the I/O node are joined by the High Speed Optical Interconnect, with RAID, Tape Library, Network Router, and pre-post operation.]

RIKEN: IA-Cluster System
Total: 1024Node/2048CPU
Total Performance: 12.5Tflops
IA Cluster 1
- 512Node/1024CPU (CPU: Xeon 3.06GHz)
- Performance: 6.2Tflops
- Interconnect: InfiniBand (8Gbps)
IA Cluster 2
- 512Node/1024CPU ([128Node/256CPU] x 4 Systems)
- Performance: 6.2Tflops
- Interconnect: Myrinet (2Gbps), InfiniBand (8Gbps)
[Figure: the two IA clusters with surrounding systems. Labels: User, Firewall/VPN Device, Gigabit Ethernet Network, Front-end Server, Large Memory System, HPSS System, Archive System, File Server, SAN (Storage Area Network), 20TB.]

Hardware Road Map
[Figure: roadmap by calendar year (CY), 2004 to 2007]
- SPARC/Solaris PRIMEPOWER: HPC2500 (SPARC64 V 1.3GHz/$2MB, SPARC64 V 2.08GHz/$4MB), then Post-HPC2500 (SPARC64 VI, UltraSPARC; 90nm Process; Leading-Edge CPU)
- IPF (IA64)/Linux PRIMERGY: IPF System [2or4cpu] (Itanium(R) 2 1.5GHz), then New Linux SMP (Montecito 2Core, Montvale)
- IA32/Linux PRIMERGY: 2or4cpu/Node (Xeon 3.2GHz), then Perf-Blade (EM64T); enhancement according to Intel's Road Map

Software Road Map
[Figure: roadmap by calendar year (CY), 2004 to 2007]
- SPARC/Solaris PRIMEPOWER: HPC2500, then Post-HPC2500; Parallelnavi 2.3, 2.4, 2.5, 2.6
- IPF (IA64)/Linux PRIMERGY: IPF System [2or4cpu/Node] with SCore/Beowulf, then New Linux SMP
- IA32/Linux PRIMERGY: 2or4cpu/Node with SCore/Beowulf, then Perf-Blade
- Linux rows: SRFS, Parallelnavi

Fujitsu's HPC Software Structure
Fujitsu's HPC Software Products
- Compiler/MPL: Fortran, C, C++ (Serial, OpenMP, Auto-Parallel); XPF, MPI
- Programming tool: Parallelnavi Language Workbench
- Math Library: SSL II, C-SSL II, BLAS, LAPACK, ScaLAPACK
- Job Management / HPC Environment / Cluster Management: Parallelnavi*
- Fast Network: Blast Band
- File System: SRFS
- Compute Grid Engine: Systemwalker CyberGRIP
- Data Grid Engine: Interstage Shunsaku Data Manager
- GRID Resource Control: Systemwalker Resource Coordinator / Service Quality Coordinator
- OS: Solaris OE / Linux
* Parallelnavi also includes language product

Integrated HPC Environment
- Support heterogeneous platform
- Single system image
- Integrated user interface
- Integrated file system
- Grid-enabled platform
[Figure: a User reaches the platforms through Parallelnavi (System Management) and SRFS (Cluster File System). Labels: High speed network, Grid MW, InfiniBand, IA/Linux, GigaEther.]
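
The peak figures quoted on the PRIMEPOWER HPC2500 configuration slide and the JAXA slide follow from simple multiplication. The sketch below is a minimal Python check that uses only numbers printed on those slides. The function and variable names are illustrative and not part of any Fujitsu tool, and the unit conventions (1 Tflops = 1000 Gflops, 1TB = 1024GB) are assumptions chosen because they reproduce the quoted totals.

```python
# Sanity check of the peak figures quoted on the slides.
# Names are illustrative; unit conventions are assumptions (see lead-in).

def peak_gflops(cpu_gflops, cpus):
    """Peak performance in Gflops for `cpus` processors."""
    return cpu_gflops * cpus

# PRIMEPOWER HPC2500 configuration slide
CPU_GFLOPS = 8.32       # per-CPU peak at 2.08GHz
CLOCK_GHZ = 2.08
CPUS_PER_NODE = 128     # Max. processors per node
NODES = 128             # Max. nodes per system
NODE_MEM_GB = 512       # Max. memory per node

flops_per_cycle = CPU_GFLOPS / CLOCK_GHZ             # implied flops per clock
node_peak = peak_gflops(CPU_GFLOPS, CPUS_PER_NODE)   # slide: Max. 1064.96Gflops
system_cpus = CPUS_PER_NODE * NODES                  # slide: Max. 16,384
system_peak_tflops = node_peak * NODES / 1000        # slide: Max. 136Tflops
system_mem_tb = NODE_MEM_GB * NODES / 1024           # slide: Max. 64TB

# JAXA slide: 56 compute nodes, 32 CPUs per node, 5.2 Gflop/s per CPU
jaxa_peak_tflops = peak_gflops(5.2, 56 * 32) / 1000  # slide: 9.3 Tflops
jaxa_linpack_tflops = 5.406
jaxa_efficiency = jaxa_linpack_tflops / jaxa_peak_tflops

print(f"flops per cycle   : {flops_per_cycle:.1f}")
print(f"node peak         : {node_peak:.2f} Gflops")
print(f"system CPUs       : {system_cpus}")
print(f"system peak       : {system_peak_tflops:.1f} Tflops")
print(f"system memory     : {system_mem_tb:.0f} TB")
print(f"JAXA peak         : {jaxa_peak_tflops:.2f} Tflops")
print(f"JAXA Linpack eff. : {jaxa_efficiency:.0%}")
```

Under these assumptions the Linpack result on the JAXA slide corresponds to roughly 58% of the quoted peak.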