How to Use FX10 (Oakleaf-FX)
Parallel Numerical Algorithms 2016, 2016/05/30
Satoshi OHSHIMA (Assistant Professor, Supercomputing Research Division, Information Technology Center, The University of Tokyo)

Outline
1. Introduction of FX10 (Oakleaf-FX)
   – Introduction of SCD/ITC, UTokyo
   – System overview (hardware, software, and services)
2. How to use Oakleaf-FX
   – First step: login
   – How to use the job management system
3. Optimization techniques
Q&A

1. Introduction of FX10 (Oakleaf-FX)

Oakleaf-FX? FX10?
• Product name: FUJITSU PRIMEHPC FX10
   – the commercial version of the "K" computer
• Nickname: Oakleaf-FX
• Oakleaf-FX is installed at the Information Technology Center, The University of Tokyo (ITC, UTokyo).
• [Campus map] Oakleaf-FX sits on the Kashiwa Campus in the Kashiwanoha (「柏の葉」, "oak leaf") area, which gives the system its nickname; the map also shows the Hongo and Komaba Campuses, Oakbridge-FX, and Yayoi.

Information Technology Center (ITC), UTokyo: http://www.cc.u-tokyo.ac.jp/
• Campus- and nation-wide services on information infrastructure for research and education
• Established in 1999, with four divisions:
   – Campus-wide Communication & Computation Division
   – Digital Library/Academic Information Science Division
   – Network Division
   – Supercomputing Division
• Core institute of the Joint Usage/Research Center for Interdisciplinary Large-scale Information Infrastructures (JHPCN) since 2010
• Key institute of HPCI (HPC Infrastructure)
• 11 faculty members + 8 technical staff (system software, numerical libraries, applications, GPU, etc.)
• History:
   – Supercomputing Center, UTokyo (1965~1999): the oldest academic supercomputer center in Japan; a nation-wide, joint-use facility
   – Information Technology Center (1999~), 4 divisions: services & operations, research, education

Research activities of SCD/ITC
• Collaboration with users: linear solvers, parallel visualization, performance tuning
• Research projects:
   – FP3C, collaboration with French institutes (FY.2010-2013), with Tsukuba, Tokyo Tech, Kyoto
   – Feasibility Study of Advanced HPC in Japan, towards the Japanese exascale project (FY.2012-2013): 1 of 4 teams (general-purpose processors, latency cores)
   – ppOpen-HPC (FY.2011-2015)
   – Post-K with RIKEN AICS (FY.2014-)
   – ESSEX-II (FY.2016-2018): Germany-Japan collaboration
• International collaborations:
   – Lawrence Berkeley National Laboratory (USA)
   – National Taiwan University (Taiwan)
   – National Central University (Taiwan)
   – Intel Parallel Computing Center
   – ESSEX-II/SPPEXA/DFG (Germany)

History of supercomputers at ITC (two large systems in operation at a time, on a roughly 6-year cycle):
• Hitachi SR11000/J2: 18.8 TFLOPS, 16.4 TB
• Hitachi SR16000/M1 (based on IBM POWER7): 54.9 TFLOPS, 11.2 TB; fat nodes with large memory; our last SMP system, to be switched to MPP
• Hitachi HA8000 (T2K): 140 TFLOPS, 31.3 TB; (flat) MPI, good communication performance
• Fujitsu PRIMEHPC FX10 (based on SPARC64 IXfx): 1.13 PFLOPS, 150 TB; the turning point to the hybrid parallel programming model; in operation today
• Post-T2K system (initial plan): 25+ PFLOPS, in the peta-scale (京 = K) era
Supercomputers at ITC, UTokyo
• Oakleaf-FX (Fujitsu PRIMEHPC FX10), Yayoi (Hitachi SR16000/M1), and T2K-Todai (Hitachi HA8000-tc/RS425, retired in March 2014)
• Oakbridge-FX: a small FX10 configuration for long-running jobs (136.2 TFLOPS, 576 nodes)
• Total users > 2,000

                        FX10 (Oakleaf-FX)       SMP (Yayoi)            HA8000 (T2K, retired)
                        PRIMEHPC FX10           SR16000/M1             HA8000-tc/RS425
CPU                     Fujitsu SPARC64 IXfx    IBM POWER7             AMD Quad-Core Opteron
                        1.848 GHz               3.83 GHz               2.3 GHz
Total # of cores        76,800                  1,792                  15,232
Total peak FLOPS        1.13 PFLOPS             54.9 TFLOPS            140 TFLOPS
Total # of nodes        4,800                   56                     952
Total memory            150 TB                  11,200 GB              32 TB
# of cores / node       16                      32                     16
Peak FLOPS / node       236.5 GFLOPS            980.5 GFLOPS           147.2 GFLOPS
Memory / node           32 GB                   200 GB                 32 GB, 128 GB
Network                 Tofu 6D mesh/torus      Hierarchical,          Myrinet 10G,
                                                full bisection         full bisection
Storage                 1.1 PB + 2.1 PB         556 TB                 1 PB

Features of Oakleaf-FX
• Well-balanced system
   – Peak performance: 1.13 PFLOPS, 398 TB/sec aggregate memory bandwidth
   – Max. power consumption < 1.40 MW (< 2.00 MW with A/C): a strict requirement after March 11, 2011
   – 1.043 PFLOPS for Linpack at 1.177 MW (excluding A/C)
• 6-dimensional mesh/torus interconnect
   – Highly scalable Tofu interconnect
   – 5.0 x 2 GB/sec/link, 6 TB/sec bi-section bandwidth
• High-performance file system
   – FEFS (Fujitsu Exabyte File System), based on Lustre
• Flexible switching between full and partial operation
• "K"-compatible (but with 16 cores/node; K has 8 cores/node)
• Open-source libraries and applications
• Highly scalable for both flat MPI and hybrid (OpenMP + MPI) programs (a minimal hybrid sketch follows below)
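The last bullet distinguishes flat MPI (one MPI process per core) from the hybrid model (one or a few MPI processes per node, with OpenMP threads inside each process). The following C fragment is not from the lecture material; it is only a minimal, self-contained sketch of what a hybrid program looks like (names and the toy computation are illustrative):

    /* Minimal hybrid MPI + OpenMP sketch (not from the lecture slides). */
    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int provided, rank, nprocs;

        /* Request thread support suitable for OpenMP regions between MPI calls. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        /* Thread-level parallelism inside each MPI process. */
        double local = 0.0;
        #pragma omp parallel for reduction(+:local)
        for (int i = 0; i < 1000000; i++) {
            local += 1.0 / (double)(i + 1 + rank);
        }

        /* Process-level parallelism across nodes. */
        double global = 0.0;
        MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0) {
            printf("procs=%d threads=%d sum=%f\n",
                   nprocs, omp_get_max_threads(), global);
        }
        MPI_Finalize();
        return 0;
    }

On a 16-core FX10 node, flat MPI would run 16 MPI processes per node, while the hybrid model above would typically run 1 MPI process per node with 16 OpenMP threads (or a few processes per node with fewer threads each).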
System configuration
• Compute nodes and interactive nodes: PRIMEHPC FX10 in 50 racks (4,800 compute nodes + 300 I/O nodes)
   – Peak performance: 1.13 PFLOPS; memory capacity: 150 TB; aggregate memory bandwidth: 398 TB/sec
   – Interconnect: 6D mesh/torus "Tofu"
• Management servers (job management, operation management, authentication): PRIMERGY RX200 S6 x 16
• Local file system (for staging): PRIMERGY RX300 S6 x 2 (MDS), ETERNUS DX80 S2 x 150 (OST); 1.1 PB (RAID-5), 131 GB/sec aggregate I/O performance
• Shared file system (for storing data): PRIMERGY RX300 S6 x 8 (MDS), PRIMERGY RX300 S6 x 40 (OSS), ETERNUS DX80 S2 x 4 (MDT), ETERNUS DX410 S2 x 80 (OST); 2.1 PB (RAID-6), 136 GB/sec
• External file system: 3.6 PB
• Login nodes: PRIMERGY RX300 S6 x 8, reached by end users via the campus LAN; internal networks use InfiniBand, Ethernet, and FibreChannel, with an external connection router to the external Ethernet and InfiniBand networks

CPU comparison
                            SPARC64 IXfx (FX10)     SPARC64 VIIIfx ("K" computer)
Clock frequency             1.848 GHz               2.000 GHz
Number of cores / node      16                      8
Size of L2 cache / node     12 MB                   6 MB
Peak performance / node     236.5 GFLOPS            128.0 GFLOPS
Memory / node               32 GB                   16 GB
Memory bandwidth / node     85 GB/sec (DDR3-1333)   64 GB/sec (DDR3-1000)

SPARC64 IXfx processor
• Enhanced instruction set on top of the SPARC-V9 instruction set architecture
   – high performance and power-aware
• Extended number of registers
   – FP registers: 32 → 256
• Software-controllable cache ("sector cache") for keeping reusable data sets in cache
• High-performance, efficient execution
   – optimized FP functions
   – conditional operations

Hardware composition
• A "system board" holds 4 nodes.
• A "rack" holds 24 system boards (= 96 nodes).
• The full system has 50 racks, 4,800 nodes.

Tofu interconnect
• Node group
   – 12 nodes = 1 group
   – A/C axes: within a system board; B axis: across 3 system boards
• 6D coordinates: (X, Y, Z, A, B, C)
   – ABC 3D mesh: connects the 12 nodes of each node group
   – XYZ 3D mesh: connects the "ABC 3D mesh" groups

Software environment
                  Compute/interactive nodes                  Login nodes
OS                Special OS (XTCOS)                         Red Hat Enterprise Linux
Compilers         Fujitsu Fortran 77/90, C/C++               Fujitsu cross compilers (Fortran 77/90, C/C++)
                  GNU GCC, g95                               GNU cross compilers (GCC, g95)
Libraries         Fujitsu SSL II (Scientific Subroutine Library II), C-SSL II, SSL II/MPI
                  Open source: BLAS, LAPACK, ScaLAPACK, FFTW, SuperLU, SuperLU_DIST, PETSc, METIS, Parallel NetCDF
Applications      OpenFOAM, ABINIT-MP, PHASE, FrontFlow/blue, FrontSTR, REVOCAP
File system       FEFS (based on Lustre)
Free software     bash, tcsh, zsh, emacs, autoconf, automake, bzip2, cvs, gawk, gmake, gzip, make, less, sed, tar, vim, etc.
• No ISV/commercial applications (e.g. NASTRAN, ABAQUS, STAR-CD)
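The Fujitsu compilers on the login nodes are cross compilers that produce SPARC64 IXfx binaries. As a sketch (not from the slides; the driver names and options below follow the Fujitsu FX10 environment as commonly documented, so verify them against the ITC manual), a hybrid program such as the one above would be built on a login node roughly like this:

    # Cross compilation on a login node (sketch; check option names in the ITC manual).
    # mpifccpx is the Fujitsu MPI wrapper for the C cross compiler;
    # -Kfast enables the standard optimization set, -Kopenmp enables OpenMP.
    mpifccpx -Kfast,openmp -o hybrid hybrid.c

    # Fortran and C++ equivalents use the mpifrtpx and mpiFCCpx wrappers:
    # mpifrtpx -Kfast,openmp -o hybrid_f  hybrid.f90
    # mpiFCCpx -Kfast,openmp -o hybrid_cc hybrid.cpp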
Usage statistics (Oakleaf-FX + Oakbridge-FX)
• [Chart] Node utilization averaged 83.6% in FY.2014 (systems shown: Oakleaf-FX, Oakbridge-FX, Yayoi)
• [Chart] Usage by research area: engineering, earth/space, material, energy/physics, information science, education, industry, bio, economics
• [Chart] Usage by user category: general group users, HPCI, JHPCN, industry, education, HPC-Challenge, personal users, young researchers

Service fee
• The service is not free.
• The service fee corresponds to the cost of electricity (system + A/C):
   – 2M USD for Oakleaf-FX (2 MW)
   – 1M USD for T2K (1 MW, until March 2014)

Services for industry
• Originally, only academic users were allowed to access our supercomputer systems.
• Since FY.2008, we have provided services for industry:
   – to support companies starting large-scale computing for future business
   – not competing with private data centers, cloud services, etc.
   – basically, results must be made open to the public
   – at most 10% of the total computational resources are open to industrial use
   – special qualification processes and a special (higher) usage fee
• Currently Oakleaf-FX is open to industry:
   – normal usage (more expensive than for academic users); 3-4 groups per year, fundamental research
   – trial usage at a discounted rate
   – research collaboration at the academic rate (e.g. Taisei)
   – open-source and in-house codes only (no ISV/commercial applications)

Education and training
• 2-day hands-on tutorials on parallel programming by faculty members of SCD/ITC (free)
   – Fundamental MPI (3 times per year)
   – Advanced MPI (2 times per year)
   – OpenMP for multicore architectures (2 times per year)
   – Participants from industry are accepted.
• Graduate/undergraduate classes using the supercomputer system (free)
   – We encourage faculty members to introduce hands-on supercomputer tutorials into graduate/undergraduate classes.
   – Up to 12 nodes (192 cores) of Oakleaf-FX
   – Proposal-based
   – Not limited to classes of the University of Tokyo (2-3 of 10)
• RIKEN AICS Summer/Spring School (2011~)

Proposal-based full-system projects
• Each group with an accepted proposal can use the full Oakleaf-FX system (4,800 nodes) for 24 hours.
• Held once per month
• Open to the public

2. How to use Oakleaf-FX
• First step: login
• How to use the job management system

We cannot log in to the compute nodes directly. We log in to the login nodes and use the job management system to run work on the compute nodes.
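Concretely, a batch job on Oakleaf-FX is described by a shell script containing directives for the Fujitsu job manager and submitted with pjsub from a login node. The following is only a sketch: the resource group name and the node/process/thread counts are placeholder values, and the exact set of directives should be taken from the ITC user manual (this is what the next part of the lecture covers).

    #!/bin/sh
    # Sketch of a pjsub batch script (placeholder values; see the ITC manual).
    #PJM -L "rscgrp=debug"        # resource group (placeholder name)
    #PJM -L "node=12"             # number of compute nodes
    #PJM -L "elapse=00:10:00"     # wall-clock time limit
    #PJM --mpi "proc=12"          # one MPI process per node (hybrid run)
    #PJM -j                       # merge stderr into stdout

    export OMP_NUM_THREADS=16     # 16 OpenMP threads per process (16 cores per node)
    mpiexec ./hybrid              # run the MPI program on the allocated nodes

The script is submitted with pjsub (e.g. "pjsub job.sh", where job.sh is an illustrative file name); pjstat shows the job status and pjdel cancels a job.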