HPC on the UltraSPARC CMT Processors

Deepak Jeevan Kumar, Verdi March, Henry Kasim, Simon See
{deepak.jeevankumar, verdi.march, henry.kasim, simon.see}@sun.com
Version 2.7b, 16th September 2008

1. Will the UltraSPARC Tx be Suitable for HPC Workloads?

This question is a Holy Grail pursued not only by SPARC fans but also by hard-core x64 fans. Why? The T1, T2, and T2 Plus are enigmatic processors: they offer an extraordinary number of hardware threads (32/64 threads) and the highest core count on a single die (8 cores). At the same time, they have many seemingly "anti-HPC" characteristics: they are limited to 1 or 2 physical CPUs per server, run at a low "non-HPC" clock speed (1.2–1.4 GHz), have room for just one FPU per core, and offer relatively weak SIMD support compared to x64, which supports the SSE instruction set family.

To verify whether these characteristics hinder the performance of actual floating-point-intensive applications in practice, we summarize and analyze the public SPECfp_rate2006 results of four real dual-chip systems:

● A system with two Sun UltraSPARC T2 Plus CPU chips (eight cores per chip)
● Three dual-chip (aka dual-socket) systems representing the alternative state-of-the-art CPU architectures, namely quad-core AMD Opteron, dual-core IBM Power 6, and quad-core Intel Xeon.

The representative for each CPU architecture is the dual-chip system based on that CPU that attains the highest SPECfp_rate2006. See the References and Acknowledgments section for more information.

Our two major findings are:

1. The T2 Plus system achieves the highest SPECfp_rate2006 result among dual-chip systems. The SPECfp_rate2006 metric represents the overall performance of a system based on 17 benchmark applications. See the References and Acknowledgments section for more information.
2. Of the 17 benchmarks comprising the SPECfp_rate2006 metric, the T2 Plus system achieves top-two performance on 13 applications (76%). By comparison, Opteron, Power 6, and Xeon achieve top-two performance on only 5 (29%), 9 (53%), and 7 (41%) applications, respectively.

This demonstrates that CMT and high memory bandwidth address the needs of existing HPC applications, and it therefore justifies Sun's commitment to balanced system design rather than a sole focus on raw floating-point capability.

2. Processors and Servers in the UltraSPARC Tx Family

We are currently in the third generation of the UltraSPARC Tx family of processors, also known as CMT (Chip Multithreading) processors or CoolThreads processors (http://www.sun.com/coolthreads). Table 1 shows Sun servers based on this processor family.

Table 1

Processor           Server                       Max. GHz  # sockets / server  Max. # cores / server  Rack Units
------------------  ---------------------------  --------  ------------------  ---------------------  ----------
UltraSPARC T1       Sun Fire T1000               1.0       1                   8                      1 RU
UltraSPARC T1       Sun Fire T2000               1.4       1                   8                      2 RU
UltraSPARC T1       Sun Blade T6300              1.4       1                   8                      Blade
UltraSPARC T2       Sun SPARC Enterprise T5120   1.4       1                   8                      1 RU
UltraSPARC T2       Sun SPARC Enterprise T5220   1.4       1                   8                      2 RU
UltraSPARC T2       Sun Blade T6320              1.4       1                   8                      Blade
UltraSPARC T2 Plus  Sun SPARC Enterprise T5140   1.2       2                   16                     1 RU
UltraSPARC T2 Plus  Sun SPARC Enterprise T5240   1.4       2                   16                     2 RU

Note: The number of sockets refers to the number of chips.

3. Theoretical Peak Floating-Point Performance of Dual-Socket (aka Dual-Chip) Servers

Table 2 summarizes the peak floating-point performance of the dual-socket (aka dual-chip) servers available today with the fastest CPUs from different vendors. As mentioned in Section 1, these systems are chosen because they attain the highest SPECfp_rate2006 for their CPU architecture.
Table 2

Server                                   Processor               GHz  # cores / server  Rpeak (GFLOPS)
---------------------------------------  ----------------------  ---  ----------------  --------------
Sun SPARC Enterprise T5240               2 x UltraSPARC T2 Plus  1.4  16                22.40
3DBOXX WORKSTATION 8400 Special Edition  2 x Xeon X5482          3.2  8                 96.00
IBM BladeCenter LS22                     2 x Opteron 2356        2.3  8                 73.60
IBM System p 570                         2 x Power 6             4.7  4                 75.2

Does this low Rpeak of 22.40 GFLOPS mean much in real-world applications? To answer this question, we first analyze the SPECfp_rate2006 results.

4. SPECfp_rate2006 on the UltraSPARC T2 Plus

4.1 Analysis of the Peak Results: UltraSPARC T2 Plus Beats Power 6

As the first step, we took a quick look at SPECfp_rate2006. We were pleasantly shocked. Table 3 below summarizes the peak SPECfp_rate2006.

Table 3

Server                                   Processor               GHz  # cores / server  SPECfp_rate2006
---------------------------------------  ----------------------  ---  ----------------  ---------------
Sun SPARC Enterprise T5240               2 x UltraSPARC T2 Plus  1.4  16                119.53
3DBOXX WORKSTATION 8400 Special Edition  2 x Xeon X5482          3.2  8                 88.7
IBM BladeCenter LS22                     2 x Opteron 2356        2.3  8                 94.7
IBM System p 570                         2 x Power 6             4.7  4                 116

The T5240's peak score of 119 is the highest SPECfp_rate2006 score among all dual-chip servers.

Now let us analyze Table 4, which divides the SPECfp_rate2006 number by the Rpeak. Since SPECfp_rate2006 is not measured in GFLOPS, a ratio by itself does not carry any information. It is meaningful only in comparison with the ratio of another system.

Table 4

Server                                   Processor               GHz  # cores / server  SPECfp_rate2006 / Rpeak
---------------------------------------  ----------------------  ---  ----------------  -----------------------
Sun SPARC Enterprise T5240               2 x UltraSPARC T2 Plus  1.4  16                5.34
3DBOXX WORKSTATION 8400 Special Edition  2 x Xeon X5482          3.2  8                 0.92
IBM BladeCenter LS22                     2 x Opteron 2356        2.3  8                 1.29
IBM System p 570                         2 x Power 6             4.7  4                 1.54

Apparently, the UltraSPARC T2 Plus chip does an excellent job of squeezing the maximum performance out of its FPUs. The 128 threads and high per-core memory bandwidth result in a significantly higher SPECfp_rate2006 per unit of Rpeak: 3.5–5.8 times that of the other three systems. The T2 Plus system even beats the IBM System p 570 (2 x Power 6), which has more than 3.3 times its theoretical peak performance (Rpeak).
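
The ratios in Table 4 and the relative factors quoted above can be re-derived from Tables 2 and 3. The following Python sketch is illustrative only: the values are copied from those two tables, while the dictionary keys, variable names, and the function name are labels invented for this example.

```python
# Values copied from Table 2 (Rpeak, GFLOPS) and Table 3 (peak SPECfp_rate2006).
# The short keys are labels chosen for this sketch, not names from the source.
RPEAK_GFLOPS = {"T2 Plus": 22.40, "Xeon": 96.00, "Opteron": 73.60, "Power 6": 75.2}
SPECFP_RATE = {"T2 Plus": 119.53, "Xeon": 88.7, "Opteron": 94.7, "Power 6": 116}


def rate_per_rpeak(system):
    """SPECfp_rate2006 divided by Rpeak; meaningful only relative to other systems."""
    return SPECFP_RATE[system] / RPEAK_GFLOPS[system]


ratios = {name: rate_per_rpeak(name) for name in RPEAK_GFLOPS}

# How many times larger the T2 Plus ratio is than each other system's ratio.
advantage = {
    name: ratios["T2 Plus"] / ratio
    for name, ratio in ratios.items()
    if name != "T2 Plus"
}

# Rpeak of the Power 6 system relative to the T2 Plus system.
power6_rpeak_factor = RPEAK_GFLOPS["Power 6"] / RPEAK_GFLOPS["T2 Plus"]

if __name__ == "__main__":
    for name, ratio in ratios.items():
        print(f"{name:8s} SPECfp_rate2006 / Rpeak = {ratio:.2f}")
    for name, factor in advantage.items():
        print(f"T2 Plus ratio is {factor:.1f}x that of {name}")
    print(f"Power 6 Rpeak / T2 Plus Rpeak = {power6_rpeak_factor:.2f}")
```

Rounded to two decimals, the computed ratios agree with Table 4, and the smallest and largest entries of `advantage` correspond to the 3.5–5.8 range quoted in the text.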
4.2 Analysis of Individual SPECfp_rate2006 Test Results

To find out whether the UltraSPARC T2 Plus chips are suitable for particular kinds of floating-point workloads, we analyzed the SPECfp_rate2006 results of the four dual-chip servers. These results comprise the results of 17 applications and their geometric mean (Table 5).

Table 5

                  2 x UltraSPARC  2 x Xeon  2 x Opteron  2 x Power 6  Highest-to-
                  T2 Plus         X5482     2356                      Lowest
----------------  --------------  --------  -----------  -----------  -----------
Abbreviation      S               X         O            P
SPECfp_rate2006   119             88.7      94.7         116          S-P-O-X
410.bwaves        151             44.3      96.7         188          P-S-O-X
416.gamess        106             191       123          92.8         X-O-S-P
433.milc          146             30.9      73.1         86.1         S-P-O-X
434.zeusmp        107             84.5      95.8         135          P-S-O-X
435.gromacs       104             162       108          78.6         X-O-S-P
436.cactusADM     105             112       100          121          P-X-S-O
437.leslie3d      102             38.0      62.8         113          P-S-O-X
444.namd          117             136       96.4         99.5         X-S-P-O
447.dealII        193             151       166          159          S-O-P-X
450.soplex        117             48.4      61.8         119          P-S-O-X
453.povray        200             254       137          99.9         X-S-O-P
454.calculix      77.7            179       115          112          X-O-P-S
459.GemsFDTD      87.8            35.9      59.5         88.8         P-S-O-X
465.tonto         127             139       117          112          X-S-O-P
470.lbm           85.3            52.1      59.7         189          P-S-O-X
481.wrf           129             69.7      106          82.3         S-O-P-X
482.sphinx3       148             105       102          174          P-S-X-O

Legend: in the Highest-to-Lowest column, the first letter marks the highest result and the last letter marks the lowest.

From the table above, we observe the following:

● UltraSPARC T2 Plus scores the highest SPECfp_rate2006 result, beating the Power 6 system by 2.6%.
● UltraSPARC T2 Plus achieves good performance on the majority of the 17 applications, as detailed below:
  ○ UltraSPARC T2 Plus emerges as the best performer in 3 of the 17 tests (milc, dealII, and wrf).
  ○ UltraSPARC T2 Plus is 1st or 2nd in 13 of the 17 tests (76%).
  ○ UltraSPARC T2 Plus is 3rd or 4th in 4 of the 17 tests (gamess, gromacs, cactusADM, and calculix).
  ○ UltraSPARC T2 Plus scores the lowest rate in only 1 of the 17 tests (calculix).
● UltraSPARC T2 Plus compares favorably with the other three systems although it has a lower theoretical peak performance (Rpeak):
  ○ UltraSPARC T2 Plus beats both x64 systems (Opteron and Xeon) in 10 of the 17 tests (bwaves, milc, zeusmp, leslie3d, dealII, soplex, GemsFDTD, lbm, wrf, sphinx3).
    ■ UltraSPARC T2 Plus beats the Xeon system in 10 of the 17 tests.
    ■ UltraSPARC T2 Plus beats the Opteron system in 14 of the 17 tests.
  ○ UltraSPARC T2 Plus beats the Power 6 system in 8 of the 17 tests (gamess, milc, gromacs, namd, dealII, povray, tonto, wrf).
● Xeon performs the best in 6 of the 17 tests, especially in molecular dynamics applications.
● The AMD Opteron never emerges as the best performer.
● Power 6 is the worst performer in 4 tests (gamess, gromacs, povray, tonto).

Table 6 shows the applications on which each system achieves top-two SPECfp_rate2006 performance. As can be observed, the T2 Plus system achieves top-two performance on 13 of the 17 applications (76%), whereas the Opteron, Power 6, and Xeon systems achieve top-two performance on only 5 (29%), 9 (53%), and 7 (41%) applications, respectively.
Table 6

                  2 x UltraSPARC  2 x Xeon  2 x Opteron  2 x Power 6  Highest-to-
                  T2 Plus         X5482     2356                      Lowest
----------------  --------------  --------  -----------  -----------  -----------
Abbreviation      S               X         O            P
SPECfp_rate2006   119             88.7      94.7         116          S-P-O-X
410.bwaves        151             44.3      96.7         188          P-S-O-X
416.gamess        106             191       123          92.8         X-O-S-P
433.milc          146             30.9      73.1         86.1         S-P-O-X
434.zeusmp        107             84.5      95.8         135          P-S-O-X
435.gromacs       104             162       108          78.6         X-O-S-P
436.cactusADM     105             112       100          121          P-X-S-O
437.leslie3d      102             38.0      62.8         113          P-S-O-X
444.namd          117             136       96.4         99.5         X-S-P-O
447.dealII        193             151       166          159          S-O-P-X
450.soplex        117             48.4      61.8         119          P-S-O-X
453.povray        200             254       137          99.9         X-S-O-P
454.calculix      77.7            179       115          112          X-O-P-S
459.GemsFDTD      87.8            35.9      59.5         88.8         P-S-O-X
465.tonto         127             139       117          112          X-S-O-P
470.lbm           85.3            52.1      59.7         189          P-S-O-X
481.wrf           129             69.7      106          82.3         S-O-P-X
482.sphinx3       148             105       102          174          P-S-X-O

Legend: in the Highest-to-Lowest column, the first two letters mark the top-two performers and the last two letters mark the bottom-two performers.

Our findings indicate that in many cases, the actual performance is not hindered by the perceived lack of floating-point capability of UltraSPARC T2 Plus.
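
The first-place and top-two counts quoted in this section can be re-derived from the per-benchmark results. The Python sketch below is an illustration rather than part of the original analysis: the numbers are copied from Table 5, and the names `RESULTS`, `ranking`, `count_in_top`, and `geometric_mean` are hypothetical helpers introduced only for this example. It also recomputes the geometric mean of the T2 Plus column, which should land close to the reported overall score because the per-benchmark values in the table are rounded.

```python
from math import exp, log

# Per-benchmark SPECfp_rate2006 results copied from Table 5, in the order
# S (2 x UltraSPARC T2 Plus), X (2 x Xeon X5482), O (2 x Opteron 2356),
# P (2 x Power 6).
SYSTEMS = "SXOP"
RESULTS = {
    "410.bwaves": (151, 44.3, 96.7, 188),
    "416.gamess": (106, 191, 123, 92.8),
    "433.milc": (146, 30.9, 73.1, 86.1),
    "434.zeusmp": (107, 84.5, 95.8, 135),
    "435.gromacs": (104, 162, 108, 78.6),
    "436.cactusADM": (105, 112, 100, 121),
    "437.leslie3d": (102, 38.0, 62.8, 113),
    "444.namd": (117, 136, 96.4, 99.5),
    "447.dealII": (193, 151, 166, 159),
    "450.soplex": (117, 48.4, 61.8, 119),
    "453.povray": (200, 254, 137, 99.9),
    "454.calculix": (77.7, 179, 115, 112),
    "459.GemsFDTD": (87.8, 35.9, 59.5, 88.8),
    "465.tonto": (127, 139, 117, 112),
    "470.lbm": (85.3, 52.1, 59.7, 189),
    "481.wrf": (129, 69.7, 106, 82.3),
    "482.sphinx3": (148, 105, 102, 174),
}


def ranking(scores):
    """Return the system letters ordered from highest to lowest score."""
    pairs = sorted(zip(scores, SYSTEMS), reverse=True)
    return [letter for _, letter in pairs]


def count_in_top(n):
    """Count, per system, the benchmarks on which it ranks among the top n."""
    counts = dict.fromkeys(SYSTEMS, 0)
    for scores in RESULTS.values():
        for letter in ranking(scores)[:n]:
            counts[letter] += 1
    return counts


def geometric_mean(values):
    """Geometric mean computed via the mean of the logarithms."""
    values = list(values)
    return exp(sum(log(v) for v in values) / len(values))


first_place = count_in_top(1)
top_two = count_in_top(2)
t2_plus_geomean = geometric_mean(scores[0] for scores in RESULTS.values())

if __name__ == "__main__":
    print("First place:", first_place)
    print("Top two:    ", top_two)
    print(f"T2 Plus geometric mean: {t2_plus_geomean:.1f}")
```

The `top_two` counts for S, O, P, and X agree with the 13, 5, 9, and 7 applications quoted for Table 6, and `first_place` agrees with the observations that the T2 Plus leads 3 tests, the Xeon leads 6, and the Opteron leads none.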