SPARC64 Viiifx: CPU for the K Computer

Total Page:16

File Type:pdf, Size:1020Kb

SPARC64 Viiifx: CPU for the K Computer SPARC64 VIIIfx: CPU for the K computer Toshio Yoshida Mikio Hondo Ryuji Kan Go Sugizaki SPARC64 VIIIfx, which was developed as a processor for the K computer, uses Fujitsu Semiconductor Ltd.’s 45-nm CMOS process for semiconductors and is composed of eight cores, a 6 MB shared level 2 cache, and memory controllers. Peak performance of 128 GFLOPS at an operating frequency of 2 GHz is achieved with power consumption as low as 58 W. The performance per unit of power is more than six times that of the SPARC processor, our previous model. To achieve this performance per unit of power, we extended the SPARC-V9 architecture to develop high performance computing-arithmetic computational extensions (HPC-ACE), the optimum instruction set for scientific computations. In addition, we successfully reduced the leakage power by water cooling and dynamic power by clock gating to achieve a lower power consumption. Furthermore, high-reliability technology for mainframes and UNIX servers is used to ensure stable operation of a system connecting more than 80 000 processors. This paper outlines the technologies used to achieve the high performance, low power consumption and high reliability of SPARC64 VIIIfx. 1. Introduction 1) High performance Fujitsu developed SPARC64 VIIIfx1) (Fig­ SPARC64 VIIIfx is a multicore processor ure 1) as a processor for the supercomputer note)i (“K computer”). The K computer has more high-speed serial I/O (HSIO) than 80 000 processors installed to give it a computational performance in excess of 10 Core 5 L2 cache data Core 7 PFLOPS. These processors need to have high performance, low power consumption, and high reliability. This paper outlines the technologies Core 4 Core 6 used to achieve these goals. MAC L2 cache control MAC MAC MAC 2. Goals in development of DDR3 Interface Core 1 Core 3 DDR3 Interface SPARC64 VIIIfx DDR3 Interface The goals in the development of SPARC64 L2 cache data VIIIfx include: Core 0 Core 2 note)i “K computer” is the English name that RIKEN has been using for the supercomputer of this project since July 2010. “K” comes from the Japanese word “Kei,” which means ten peta or 10 to the Figure 1 SPARC64 VIIIfx chip. 16th power. 274 FUJITSU Sci. Tech. J., Vol. 48, No. 3, pp. 274–279 (July 2012) T. Yoshida et al.: SPARC64 VIIIfx: CPU for the K computer integrating eight cores, a shared level 2 cache, consumption of the processor needed to be memory access controllers (MACs), and a high- reduced to 58 W or less. To that end, Fujitsu speed serial I/O (HSIO). used low-leakage transistors and water cooling For each core to exhibit high-execution to lower the junction temperature to 30°C so as performance in a real application, we extended to reduce the leakage power. In addition, the the SPARC-V9 architecture2)–4) and developed processor must make full use of clock gating to High Performance Computing-Arithmetic reduce its dynamic power. Computational Extensions (HPC-ACE),5) an 3) High reliability instruction set capable of efficiently executing The processor requires high-reliability scientific computations. technology used for mainframes and UNIX To achieve a higher speed of parallel servers6),7) to ensure it operates stably. processing by having eight cores on the chip, the architecture must also have a function to share 3. Microarchitecture of SPARC64 the level 2 cache across all cores and synchronize VIIIfx cores by means of hardware. Combining this with The pipeline of SPARC64 VIIIfx is shown Fujitsu’s automatic parallel compiler allows the in Figure 2 and the specifications are shown in user to handle multiple cores when programming Table 1. as if it were one high-speed CPU without having A core is composed of an instruction to be aware of the multiple cores. At Fujitsu, this control unit, execution unit and level 1 cache. is called Virtual Single Processor by Integrated The instruction control unit is responsible for Multicore Parallel Architecture (VISIMPACT). instruction fetch, instruction decode, out-of-order 2) Low power consumption instruction control, and instruction commit Due to the limited amount of power control. available to the entire system, the power The execution unit is equipped with Out-of-order execution Register Fetch Decode Issue Memory access Commit read Execution Commit control Fetch EAGA port Instruction Instruction Address Fixed-point Data Program level 1 decode reservation register counter station EAGB Store level 1 cache port cache Control Fixed-point Fixed-point EXA register reservation renaming station register EXB Branch history Floating-point FLA reservation Floating-point station register FLB FLC Branch Floating-point reservation renaming station register FLD Level 2 MAC cache 8 cores Figure 2 SPARC64 VIIIfx pipeline. FUJITSU Sci. Tech. J., Vol. 48, No. 3 (July 2012) 275 T. Yoshida et al.: SPARC64 VIIIfx: CPU for the K computer Table 1 In addition, a K computer exclusive SPARC64 VIIIfx specifications. Item Specification InterConnect Controller chip and HSIO are No. of cores 8 used for connections to ensure a high inter-chip Level 2 cache 6 MB communication throughput. Operating frequency 2 GHz Process technology FSL 45-nm CMOS 4. Instruction extensions HPC- Die size 22.7 mm × 22.6 mm ACE No. of transistors Approx. 760 million HPC-ACE is an extended instruction set Peak performance 128 GFLOPS intended for scientific computation for the Memory bandwidth 64 GB/s (theoretical peak value) SPARC-V9 architecture. It was developed based Power consumption 58 W (process condition: TYP) on the analysis of many HPC applications jointly FSL: Fujitsu Semiconductor Ltd. with Fujitsu software development department. 1) Expansion of number of registers two fixed-point functional units (EXA/B), The number of floating-point registers of two functional units for load/store address SPARC-V9 is 32, which is not sufficient for HPC computation (EAGA/B) and four floating-point applications. Increasing the number of registers, multiply-and-add (FMA) units (FLA/B/C/D). The however, is not possible with the 32-bit SPARC FMA units have a single instruction multiple architecture because there is an insufficient data (SIMD) architecture and execute two instruction length. As a solution to this, a parallel operations with one instruction. One new prefix instruction called the set extended FMA unit is capable of conducting floating-point arithmetic register (SXAR) has been defined for multiplication and addition for each cycle and HPC-ACE. An SXAR instruction extends register each core can execute eight double-precision addressing for up to two following instructions. floating-point operations per cycle. Hence the The register address length is extended by 3 bits, chip is capable of making 64 double-precision which allows for 256 floating-point registers, floating-point operations per cycle. The operating eight times that of SPARC-V9 (Figure 3). frequency is 2 GHz and the peak performance is The compiler uses this high-capacity set 128 GFLOPS. There are 192 fixed-point registers of registers for optimization including software and 256 floating-point registers. pipelining and maximizes the instruction-level The level 1 cache processes load/store parallelism of an application. In terms of the instructions. Each core has a 32 Kbyte two-way Himeno Benchmark, a representative HPC instruction cache and data cache. The data benchmark, the performance has improved by cache has a dual-port structure capable of two 1.65 times. simultaneous load accesses and executes two 2) SIMD operations and load/store instructions 16-byte SIMD loads or one 16-byte SIMD store. SIMD is a technology to allow parallel The level 2 cache is shared by the eight cores execution of more than one data process with one and cache coherence is ensured for each core. An instruction. HPC-ACE uses SIMD technology inter-core hardware barrier is provided to allow to execute two FMA operations with one high-speed synchronization between the cores, as instruction. SIMD operations for more quickly will be described later. multiplying complex numbers are also supported. SPARC64 VIIIfx incorporates memory con- In addition, SIMD execution is possible with load trollers to reduce its latency and improve the and store instructions. SIMD processing of load throughput of its memory access. The memory instructions is executed without penalty in an bandwidth is 64 GB/s as a theoretical peak value. 8-byte alignment for double precision and 4-byte 276 FUJITSU Sci. Tech. J., Vol. 48, No. 3 (July 2012) T. Yoshida et al.: SPARC64 VIIIfx: CPU for the K computer Floating-point register SPARC-V9 32 SXAR Extended address (3 bits) Address Register address Extended 224 (5 bits) 8 bits in total register Subsequent instruction Figure 3 Register address extension by SXAR instruction. alignment for single precision. To efficiently process loops containing 3) Sector cache mechanism if statements, it is necessary to eliminate For HPC-ACE, a cache mechanism (sector conditional branch instructions. For that cache) that can be software-controlled has been purpose, conditional execution instructions developed. The conventional caches cannot be have been added for HPC-ACE. Specifically, controlled by software. Even if the user is aware a new compare instruction is used to write the of the high frequency with which data is reused, result of comparison in a floating-point register the hardware evicts the data from the cache and a conditional execution instruction is used when registering other data in the cache, which based on the result of such comparison. As the might hinder improvements in performance. To conditional execution instructions, data transfer address this issue, the sector cache mechanism between floating-point registers and store splits the cache into two sectors and allows from a floating-point register to memory have software to be used to register frequently reused been defined.
Recommended publications
  • Arm Arhitektura Napreduje – No Postoje Izazovi
    ARM ARHITEKTURA NAPREDUJE – NO POSTOJE IZAZOVI SAŽETAK Procesori ARM ISA arhitekture ne koriste se više "samo" za mobitele i tablete, nego i za serverska računala (pa i superračunala), te laptop / desktop računala. I Apple je kod Macintosh računala prešao sa Intel arhitekture na ARM arhitekturu. Veliki izazov kod tog prelaska bio je - kako omogućiti da se programski kod pisan za Intelovu ISA arhitekturu, izvršava na procesoru M1 koji ima ARM arhitekturu. Općenito, želja je da se programi pisani za jednu ISA arhitekturu mogu sa što manje napora izvršavati na računalima koja imaju drugačiju ISA arhitekturu. Problem je u tome što različite ISA arhitekture mogu imati dosta različite memorijske modele. ABSTRACT ARM ISA processors are no longer used "only" for mobile phones and tablets, but also for server computers (even supercomputers) and laptops / desktops. Apple has also switched from Intel to ARM on Macintosh computers. The big challenge in that transition was - how to enable program code written for Intel's ISA architecture to run on an M1 processor that has an ARM architecture. In general, the desire is that programs written for one ISA architecture can be executed with as little effort as possible on computers that have a different ISA architecture. The problem is that different ISA architectures can have quite different memory models. 1. UVOD Procesori ARM ISA arhitekture ne koriste se više "samo" za mobitele i tablete, nego i za serverska računala (pa i superračunala), te laptop/desktop računala. Koriste se i u industriji, u proizvodnim procesima, a i kao ugradbeni čipovi u ostale proizvode, npr. za ugradnju u IOT uređaje i u vozila (posebno autonomna vozila).
    [Show full text]
  • Installation and Upgrade General Checklist Report
    VeritasTM Services and Operations Readiness Tools Installation and Upgrade Checklist Report for InfoScale Storage 7.3.1, Solaris 11, SPARC Index Back to top Important Notes Components for InfoScale System requirements Product features Product documentation Patches for InfoScale Storage and Platform Platform configuration Host bus adapter (HBA) parameters and switch parameters Operations Manager Array Support Libraries (ASLs) Additional tasks to consider Important Notes Back to top Note: You have not selected the latest product version. Consider installing or upgrading to the latest version; otherwise, you may encounter issues that have already been fixed. To run a new report for the latest product version, select it from the Veritas product drop-down list. Array Support Libraries (ASLs) When installing Veritas products, please be aware that ASLs updates are not included in the patches update bundle. Please go to the Array Support Libraries (ASLs) to get the latest updates for your disk arrays. Components for InfoScale Back to top The InfoScale product you selected contains the following component(s). Dynamic Multi-Pathing Storage Foundation Storage Foundation Cluster File System Get more introduction of Veritas InfoScale. System requirements Back to top Required CPU number and memory: Required number of CPUs Required memory SF or SFHA N/A 2GB SFCFS or SFCFSHA 2 2GB 1 of 11 VeritasTM Services and Operations Readiness Tools Required disk space: Partitions Minimum space required Maximum space required Recommended space /opt 99 MB 1051 MB 134 MB /root 1 MB 30 MB 1 MB /usr 3 MB 237 MB 3 MB /var 1 MB 1 MB 1 MB Supported architectures: SPARC M5 series [1] SPARC M6 series [1] SPARC M7 series [1][2] SPARC S7 series [1] SPARC T3 series [1] SPARC T4 series [1] SPARC T5 series [1] SPARC T7 series [1][2] SPARC T8 series [1][2] SPARC64 X+ series [1] SPARC64 XII series [1] SPARC64-V series SPARC64-VI series SPARC64-VII/VII+ series SPARC64-X series [1] UltraSPARC II series UltraSPARC III series UltraSPARC IV series UltraSPARC T1 series [1] UltraSPARC T2/T2+ series [1] 1.
    [Show full text]
  • Supercomputer Fugaku
    Supercomputer Fugaku Toshiyuki Shimizu Feb. 18th, 2020 FUJITSU LIMITED Copyright 2020 FUJITSU LIMITED Outline ◼ Fugaku project overview ◼ Co-design ◼ Approach ◼ Design results ◼ Performance & energy consumption evaluation ◼ Green500 ◼ OSS apps ◼ Fugaku priority issues ◼ Summary 1 Copyright 2020 FUJITSU LIMITED Supercomputer “Fugaku”, formerly known as Post-K Focus Approach Application performance Co-design w/ application developers and Fujitsu-designed CPU core w/ high memory bandwidth utilizing HBM2 Leading-edge Si-technology, Fujitsu's proven low power & high Power efficiency performance logic design, and power-controlling knobs Arm®v8-A ISA with Scalable Vector Extension (“SVE”), and Arm standard Usability Linux 2 Copyright 2020 FUJITSU LIMITED Fugaku project schedule 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 Fugaku development & delivery Manufacturing, Apps Basic Detailed design & General Feasibility study Installation review design Implementation operation and Tuning Select Architecture & Co-Design w/ apps groups apps sizing 3 Copyright 2020 FUJITSU LIMITED Fugaku co-design ◼ Co-design goals ◼ Obtain the best performance, 100x apps performance than K computer, within power budget, 30-40MW • Design applications, compilers, libraries, and hardware ◼ Approach ◼ Estimate perf & power using apps info, performance counts of Fujitsu FX100, and cycle base simulator • Computation time: brief & precise estimation • Communication time: bandwidth and latency for communication w/ some attributes for communication patterns • I/O time: ◼ Then, optimize apps/compilers etc. and resolve bottlenecks ◼ Estimation of performance and power ◼ Precise performance estimation for primary kernels • Make & run Fugaku objects on the Fugaku cycle base simulator ◼ Brief performance estimation for other sections • Replace performance counts of FX100 w/ Fugaku params: # of inst. commit/cycle, wait cycles of barrier, inst.
    [Show full text]
  • Fujitsu's Vision for High Performance Computing
    FujitsuFujitsu’’ss VisionVision forfor HighHigh PerformancePerformance ComputingComputing October 26, 2004 Yuji Oinaga Fujitsu Ltd. 1 CONTENTSCONTENTS HPC Policy and Concept Current HPC Platform HPC2500 IA-Cluster Major HPC customers Road Map Toward Peta-Scale Computing 2 FUJITSUFUJITSU HPCHPC PolicyPolicy ~ Developing & Providing Top Level Supercomputer ~ - High Computational Power - High Speed Interconnect - Reliability/Availability - Leading Edge Semiconductor Technology - Highly Reliable, HPC Featured Operating System & Middleware - Sophisticated Compilers and Development Tools 3 HistoryHistory ofof HPCHPC PlatformPlatform PRIMEPOWER HPC2500 VPP5000 Vector Parallel VPP300/700 IA32/IPF Cluster PRIMEPOWER VPP500 2000 Vector VP2000 AP3000 VP Series AP1000 F230-75APU Scalar 1980 1985 1990 1995 2000 2005 4 ConceptConcept ofof FujitsuFujitsu HPCHPC ProvideProvide thethe best/fastestbest/fastest platformplatform forfor eacheach applications.applications. -Scalar SMP : PRIMEPOWER, New Linux SMP -IA Cluster : PRIMERGY Cluster ProvideProvide thethe highhigh speedspeed interconnect.interconnect. -Crossbar : High Speed Optical Interconnect -Multi-Stage : Infiniband ProvideProvide HighHigh Operability/UsabilityOperability/Usability -Enhancement of Operability/Usability : Extended Partitioning, Dynamic Reconfiguration -Cluster Middleware : Total Cluster Control System ( Parallelnavi ) Shared Rapid File System ( SRFS ) 5 FujitsuFujitsu’’ss CurrentCurrent HPCHPC PlatformPlatform ~ Fujitsu is a leading edge company in HPC field ~ SPARC/Solaris
    [Show full text]
  • Computer Architectures an Overview
    Computer Architectures An Overview PDF generated using the open source mwlib toolkit. See http://code.pediapress.com/ for more information. PDF generated at: Sat, 25 Feb 2012 22:35:32 UTC Contents Articles Microarchitecture 1 x86 7 PowerPC 23 IBM POWER 33 MIPS architecture 39 SPARC 57 ARM architecture 65 DEC Alpha 80 AlphaStation 92 AlphaServer 95 Very long instruction word 103 Instruction-level parallelism 107 Explicitly parallel instruction computing 108 References Article Sources and Contributors 111 Image Sources, Licenses and Contributors 113 Article Licenses License 114 Microarchitecture 1 Microarchitecture In computer engineering, microarchitecture (sometimes abbreviated to µarch or uarch), also called computer organization, is the way a given instruction set architecture (ISA) is implemented on a processor. A given ISA may be implemented with different microarchitectures.[1] Implementations might vary due to different goals of a given design or due to shifts in technology.[2] Computer architecture is the combination of microarchitecture and instruction set design. Relation to instruction set architecture The ISA is roughly the same as the programming model of a processor as seen by an assembly language programmer or compiler writer. The ISA includes the execution model, processor registers, address and data formats among other things. The Intel Core microarchitecture microarchitecture includes the constituent parts of the processor and how these interconnect and interoperate to implement the ISA. The microarchitecture of a machine is usually represented as (more or less detailed) diagrams that describe the interconnections of the various microarchitectural elements of the machine, which may be everything from single gates and registers, to complete arithmetic logic units (ALU)s and even larger elements.
    [Show full text]
  • Programming on K Computer
    Programming on K computer Koh Hotta The Next Generation Technical Computing Fujitsu Limited Copyright 2010 FUJITSU LIMITED System Overview of “K computer” Target Performance : 10PF over 80,000 processors Over 640K cores Over 1 Peta Bytes Memory Cutting-edge technologies CPU : SPARC64 VIIIfx 8 cores, 128GFlops Extension of SPARC V9 Interconnect, “Tofu” : 6-D mesh/torus Parallel programming environment. 1 Copyright 2010 FUJITSU LIMITED I have a dream that one day you just compile your programs and enjoy high performance on your high-end supercomputer. So, we must provide easy hybrid parallel programming method including compiler and run-time system support. 2 Copyright 2010 FUJITSU LIMITED User I/F for Programming for K computer K computer Client System FrontFront End End BackBack End End (80000 (80000 Nodes) Nodes) IDE Interface CCommommandand JobJob C Controlontrol IDE IntInteerfrfaceace debugger DDebugebuggerger App IntInteerfrfaceace App Interactive Debugger GUI Data Data Data CoConvnveerrtteerr Data SamplSampleerr debugging partition official op VisualizedVisualized SSaammpplingling Stage out partition Profiler DaDattaa DaDattaa (InfiniBand) 3 Copyright 2010 FUJITSU LIMITED Parallel Programming 4 Copyright 2010 FUJITSU LIMITED Hybrid Parallelism on over-640K cores Too large # of processes to manipulate To reduce number of processes, hybrid thread-process programming is required But Hybrid parallel programming is annoying for programmers Even for multi-threading, procedure level or outer loop parallelism was desired Little opportunity for such coarse grain parallelism System support for “fine grain” parallelism is required 5 Copyright 2010 FUJITSU LIMITED Targeting inner-most loop parallelization Automatic vectorization technology has become mature, and vector-tuning is easy for programmers. Inner-most loop parallelism, which is fine-grain, should be an important portion for peta-scale parallelization.
    [Show full text]
  • Microprocessor
    MICROPROCESSOR www.MPRonline.com THE REPORTINSIDER’S GUIDE TO MICROPROCESSOR HARDWARE FUJITSU’S SPARC64 V IS REAL DEAL Fujitsu’s design differs from HAL’s version; faster than US III By Kevin Krewell {10/21/02-01} At Microprocessor Forum 2002, Fujitsu stepped forward to carry the banner for the fastest SPARC processor, surpassing Sun’s fastest UltraSPARC III on clock frequency and SPEC performance. In fact, at 1.35GHz, the Fujitsu processor has the highest clock frequency of any 64-bit server processor now in production. This by a Fujitsu acquisition, HAL Computer. At Micro- clock-speed advantage—plus large on-chip processor Forum 1999, Mike Shebanow, then vice caches; a fast system-crossbar switch; and a four- president and CTO of the HAL Computer Divi- issue, out-of-order core—translates into good sion of Fujitsu, proposed a complex and very wide SPEC benchmark performance for Fujitsu’s superscalar version of the SPARC64 V processor, SPARC64 V processor. The SPARC64 V is an evo- with an instruction trace cache, superspeculation, lutionary step in Fujitsu’s SPARC processor line and split L2 cache (see MPR 11/15/99-01,“Hal instead of the more-radical approach proposed three Makes Sparcs Fly”). Mike’s innovative chip never made years ago by HAL Computer at Microprocessor Forum it to market, but the trace-cache idea became a mainstream 1999. This latest version of Fujitsu’s SPARC processor fam- processor feature, shipping in Intel’s Pentium 4 microarchi- ily, revealed by Fujitsu’s director of Development Depart- tecture. The Pentium 4 also paralleled another of Mike’s ment (Processor Development Division) Aiichiro Inoue, is based on an enhanced version of the previous SPARC64 GP processor core, with larger caches and advanced semicon- L1I$ Decode & FP (16) L1D$ L2$ 128KB I-buffer Issue (4) SP (10) 128KB 2MB ductor process technology.
    [Show full text]
  • 40 Jahre Erfolgsgeschichte BS2000 Vortrag Von 2014
    Fujitsu NEXT Arbeitskreis BS2000 40 Jahre BS2000: Eine Erfolgsgeschichte Achim Dewor Fujitsu NEXT AK BS2000 5./6. Mai 2014 0 © FUJITSU LIMITED 2014 Was war vor 40 Jahren eigentlich so los? (1974) 1 © FUJITSU LIMITED 2014 1974 APOLLO-SOJUS-KOPPLUNG Die amtierenden Staatspräsidenten Richard Nixon und Leonid Breschnew unterzeichnen das Projekt. Zwei Jahre arbeitet man an den Vorbereitungen für eine neue gemeinsame Kopplungsmission 2 © FUJITSU LIMITED 2014 1974 Der erste VW Golf wird ausgeliefert 3 © FUJITSU LIMITED 2014 1974 Willy Brandt tritt zurück 6. MAI 1974 TRITT BUNDESKANZLER WILLY BRANDT IM VERLAUF DER SPIONAGE-AFFÄRE UM SEINEN PERSÖNLICHEN REFERENTEN, DEN DDR-SPION GÜNTER GUILLAUME 4 © FUJITSU LIMITED 2014 1974 Muhammad Ali gegen George Foreman „Ich werde schweben wie ein Schmetterling, zustechen wie eine Biene", "Ich bin das Größte, das jemals gelebt hat“ "Ich weiß nicht immer, wovon ich rede. Aber ich weiß, dass ich recht habe“ (Zitate von Ali) 5 © FUJITSU LIMITED 2014 1974 Deutschland wird Fußball-Weltmeister Breitners Elfer zum Ausgleich und das unvergängliche, typische Müller- Siegtor aus der Drehung. Es war der sportliche Höhepunkt der wohl besten deutschen Mannschaft aller Zeiten. 6 © FUJITSU LIMITED 2014 Was ist eigentlich „alt“? These: „Stillstand macht alt …. …. ewig jung bleibt, was laufend weiterentwickelt wird!“ 7 © FUJITSU LIMITED 2014 Was ist „alt“ - was wurde weiterentwickelt ? Wird das Fahrrad bald 200 Jahre alt !? Welches? Das linke oder das rechte? Fahrrad (Anfang 1817) Fahrrad (2014) 8 © FUJITSU LIMITED 2014 Was ist „alt“ - was wurde weiterentwickelt ? Soroban (Japan) Im Osten, vom Balkan bis nach China, wird er hier und da noch als preiswerte Rechenmaschine bei kleineren Geschäften verwendet.
    [Show full text]
  • Guía Sobre Tiflotecnología Y Tecnología De Apoyo Para Uso
    Guía sobre Tiflotecnología y Tecnología de Apoyo para uso educativo (Última actualización: febrero 2016) Guía sobre Tiflotecnología y Tecnología de Apoyo para uso educativo ÍNDICE INTRODUCCIÓN ..................................................................................................................................... 6 CONOCIMIENTOS BÁSICOS .................................................................................................................. 11 DEFINICIONES.................................................................................................................................... 11 HARDWARE ....................................................................................................................................... 14 SOFTWARE ........................................................................................................................................ 18 OTROS DISPOSITIVOS ........................................................................................................................ 19 SISTEMA OPERATIVO (VERSIONES) WINDOWS, OS X, IOS, ANDROID, WINDOWS MOBILE, LINUX ........ 22 WINDOWS ......................................................................................................................................... 23 OS X ................................................................................................................................................... 25 LINUX ...............................................................................................................................................
    [Show full text]
  • Toward Automated Cache Partitioning for the K Computer
    IPSJ SIG Technical Report Toward Automated Cache Partitioning for the K Computer Swann Perarnau1 Mitsuhisa Sato1,2 Abstract: The processor architecture available on the K computer (SPARC64VIIIfx) features an hardware cache par- titioning mechanism called sector cache. This facility enables software to split the memory cache in two independent sectors and to select which one will receive a line when it is retrieved from memory. Such control over the cache by an application enables significant performance optimization opportunities for memory intensive programs, that several studies on software-controlled cache partitioning environments already demonstrated. Most of these previous studies share the same implementation idea: the use of page coloring and an overriding of the operating system virtual memory manager. However, the sector cache differs in several key points over these imple- mentations, making the use of the existing analysis or optimization strategies impractical, while enabling new ones. For example, while most studies overlooked the issue of phase changes in an application because of the prohibitive cost of a repartitioning, the sector cache provides this feature without any costs, allowing multiple cache distribution strategies to alternate during an execution, providing the best allocation of the current locality pattern. Unfortunately, for most application programmers, this hardware cache partitioning facility is not easy to take advantage of. It requires intricate knowledge of the memory access patterns of a code and the ability to identify data structures that would benefit from being isolated in cache. This optimization process can be tedious, with multiple code modifications and countless application runs necessary to achieve a good optimization scheme.
    [Show full text]
  • Fujitsu Standard Tool
    Toward Building up ARM HPC Ecosystem Shinji Sumimoto, Ph.D. Next Generation Technical Computing Unit FUJITSU LIMITED Sept. 12th, 2017 0 Copyright 2017 FUJITSU LIMITED Outline Fujitsu’s Super computer development history and Post-K Processor Overview Compiler Development for ARMv8 with SVE Towards building up ARM HPC Ecosystem 1 Copyright 2017 FUJITSU LIMITED Fujitsu’s Super computer development history and Post-K Processor Overview 2 Copyright 2017 FUJITSU LIMITED Fujitsu Supercomputers Fujitsu has been providing high performance FX100 supercomputers for 40 years, increasing application performance while maintaining FX10 Post-K Under Development application compatibility w/ RIKEN No.1 in Top500 (June and Nov., 2011) K computer World’s Fastest FX1 Vector Processor (1999) Most Efficient Performance VPP5000 SPARC in Top500 (Nov. 2008) NWT* Enterprise Developed with NAL No.1 in Top500 PRIMEQUEST (Nov. 1993) VPP300/700 PRIMEPOWER PRIMERGY CX400 Gordon Bell Prize Skinless server (1994, 95, 96) ⒸJAXA HPC2500 VPP500 World’s Most Scalable PRIMERGY VP Series Supercomputer BX900 Cluster node AP3000 (2003) HX600 F230-75APU Cluster node AP1000 PRIMERGY RX200 Cluster node Japan’s Largest Japan’s First Cluster in Top500 Vector (Array) (July 2004) Supercomputer (1977) *NWT: Numerical Wind Tunnel 3 Copyright 2017 FUJITSU LIMITED FUJITSU high-end supercomputers development 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 K computer and PRIMEHPC FX10/FX100 in operation FUJITSU The CPU and interconnect of FX10/FX100 inherit the K PRIMEHPC FX10 PRIMEHPC FX100
    [Show full text]
  • World's Fastest Computer
    HISTORY OF FUJITSU LIMITED SUPERCOMPUTING “FUGAKU” – World’s Fastest Computer Birth of Fugaku Fujitsu has been developing supercomputer systems since 1977, equating to over 40 years of direct hands-on experience. Fujitsu determined to further expand its HPC thought leadership and technical prowess by entering into a dedicated partnership with Japan’s leading research institute, RIKEN, in 2006. This technology development partnership led to the creation of the “K Computer,” successfully released in 2011. The “K Computer” raised the bar for processing capabilities (over 10 PetaFLOPS) to successfully take on even larger and more complex challenges – consistent with Fujitsu’s corporate charter of tackling societal challenges, including global climate change and sustainability. Fujitsu and Riken continued their collaboration by setting aggressive new targets for HPC solutions to further extend the ability to solve complex challenges, including applications to deliver improved energy efficiency, better weather and environmental disaster prediction, and next generation manufacturing. Fugaku, another name for Mt. Fuji, was announced May 23, 2019. Achieving The Peak of Performance Fujitsu Ltd. And RIKEN partnered to design the next generation supercomputer with the goal of achieving Exascale performance to enable HPC and AI applications to address the challenges facing mankind in various scientific and social fields. Goals of the project Ruling the Rankings included building and delivering the highest peak performance, but also included wider adoption in applications, and broader • : Top 500 list First place applicability across industries, cloud and academia. The Fugaku • First place: HPCG name symbolizes the achievement of these objectives. • First place: Graph 500 Fugaku’s designers also recognized the need for massively scalable systems to be power efficient, thus the team selected • First place: HPL-AI the ARM Instruction Set Architecture (ISA) along with the Scalable (real world applications) Vector Extensions (SVE) as the base processing design.
    [Show full text]