ASIC DESIGN OF THE OPENSPARC T1 PROCESSOR CORE

By
Mohamed Mahmoud Mohamed Farag

A Thesis Submitted to the Faculty of Engineering at Cairo University in Partial Fulfillment of the Requirements for the Degree of MASTER OF SCIENCE in ELECTRONICS AND COMMUNICATIONS ENGINEERING

Under the Supervision of
Prof. Dr. Serag El-Din Habib, Professor of Electronics, Electronics and Communications Department, Faculty of Engineering, Cairo University
Dr. Hossam A. H. Fahmy, Associate Professor, Electronics and Communications Department, Faculty of Engineering, Cairo University

Approved by the Examining Committee
Prof. Dr. El Sayed Mostafa Saad, External Examiner
Prof. Dr. Ibrahim Mohamed Qamar, Internal Examiner
Prof. Dr. Serag El-Din Habib, Thesis Main Advisor
Dr. Hossam A. H. Fahmy, Thesis Advisor

FACULTY OF ENGINEERING, CAIRO UNIVERSITY
GIZA, EGYPT
2013

Engineer’s Name: Mohamed Mahmoud Mohamed Farag
Date of Birth: 29/12/1985
Nationality: Egyptian
E-mail: [email protected]
Phone: 01068823040
Address: 7 Ramzy Farag Street – Al Haram
Registration Date: …./…./……..
Awarding Date: 6 / 1 / 2013
Degree: Master of Science
Department: ELECTRONICS AND COMMUNICATIONS ENGINEERING
Supervisors: Prof. Dr. Serag El-Din Habib; Dr. Hossam A. H. Fahmy
Examiners:
Prof. El Sayed Mostafa Saad (External examiner)
Prof. Ibrahim Mohamed Qamar (Internal examiner)
Prof. Serag El-Din Habib (Thesis main advisor)
Prof. Hossam A. H. Fahmy (Thesis advisor)
Title of Thesis: ASIC Design of the OpenSPARC T1 Processor
Key Words: ASIC; SPARC; Layout; Processor; Design

Summary: The objective of this thesis is to carry out an ASIC design of the OpenSPARC T1 processor core in a 130 nm CMOS technology. Starting from the open-source RTL description of the OpenSPARC T1 processor core, several modifications, such as memory mapping, reducing the number of processor threads, and suppressing the test pins, were made to shrink the processor so that it fits into a 4 × 4 mm² die. The functionality of the modified RTL description was verified against more than 500 test scripts provided by Sun Microsystems, Inc. Next, the design was synthesized to convert it from RTL to its gate-level equivalent. The design was then physically implemented using the place-and-route flow, including the usual steps of floorplanning, placement, optimization, clock tree synthesis (CTS), and routing. Finally, a series of verification and performance evaluation tests was run to confirm the design's functionality and performance. The designed processor core supports two threads and runs at a 100 MHz clock frequency. It occupies an area of 16 mm², including pads.

Acknowledgments

First I thank ALLAH (God) for being by my side during all the time I spent working on this thesis; HE blessed me with the strength and guidance that I needed to complete it. I would like to give my sincere appreciation to my parents for always giving me the moral support that I needed to continue working on this thesis and for always praying for me.
I would like to give a very special thank you to my wife, who stood by my side and bore a heavy burden during my work on this project. I would like to thank my manager at work, Ahmed Gharieb, for his endless support and understanding. I would also like to thank my Mentor Graphics family. Special thanks to Mohamed Ali, my previous manager, for being there for me whenever I needed him. I would like to thank my supervisors, Dr. Serag and Dr. Hossam, for the support, guidance and understanding that they gave me throughout my research. Finally, I acknowledge MOSIS for providing the PDK of the IBM 130 nm CMOS process under the MEP account number 4960.

Contents

Acknowledgments
Table of contents
List of figures
List of tables
List of symbols and abbreviations
Abstract
1 Introduction
  1.1 SPARC architecture
    1.1.1 SPARC V8 vs V9
    1.1.2 SPARC-V9 Processor
    1.1.3 SPARC-V9 Instructions
    1.1.4 SPARC-V9 Traps
    1.1.5 SPARC-V9 Data Formats
    1.1.6 SPARC-V9 Registers
    1.1.7 SPARC-V9 Memory Models
    1.1.8 SPARC-V9 Operation
  1.2 Thesis Objective
  1.3 Thesis Map
2 OpenSPARC T1
  2.1 OpenSPARC T1 architecture
    2.1.1 SPARC Core
    2.1.2 Floating-Point Unit
    2.1.3 CPU-Cache Crossbar
    2.1.4 L2-Cache
    2.1.5 DRAM Controller
    2.1.6 I/O Bridge
    2.1.7 J-Bus Interface
    2.1.8 Serial System Interface
    2.1.9 Electronic Fuse
  2.2 OpenSPARC T1 core
    2.2.1 Instruction fetch unit (IFU)
    2.2.2 Execution unit (EXU)
    2.2.3 Load store unit (LSU)
    2.2.4 Trap logic unit (TLU)
    2.2.5 Stream processing unit (SPU)
    2.2.6 Memory management unit (MMU)
    2.2.7 Floating-point frontend unit (FFU)
  2.3 OpenSPARC T1 design history
    2.3.1 Xilinx FPGA design 1 [6]
    2.3.2 Xilinx FPGA design 2 [6]
    2.3.3 UltraSPARC T1 (Niagara) release [6]
    2.3.4 Commercial processors based on the UltraSPARC architecture [1]
  2.4 Proposed Design
3 OpenSPARC T1 core Implementation
  3.1 RTL preparation and functional verification
  3.2 Synthesis
  3.3 Design Floorplanning
  3.4 Design Placement
  3.5 CTS
  3.6 Routing
  3.7 Post Layout Verification
4 Results
  4.1 Timing results
  4.2 Physical results
  4.3 Design comparison
    4.3.1 Processor size
    4.3.2 Processor speed
5 Conclusion
  5.1 Future work
    5.1.1 Tighten the design constraints
    5.1.2 Custom implement the IRF memory module
    5.1.3 Optimize clock network
    5.1.4 Use smaller technology node
A Digital design ASIC Flow
  A.1 RTL preparation
  A.2 RTL functionality Verification
  A.3 Logic Synthesis
  A.4 Gate Level functionality Verification
  A.5 Floorplanning
  A.6 Placement
  A.7 Clock Tree Synthesis
  A.8 Routing
  A.9 Post layout verification
References

List of Figures

1.1 Memory Models from Least Restrictive (RMO) to Most Restrictive (TSO) [2]
2.1 OpenSPARC T1 Processor Block Diagram [5]
2.2 SPARC Core Pipeline [5]
2.3 CCX Block Diagram [5]
2.4 SPARC Core Block Diagram [5]
2.5 Execution Unit Diagram [5]
2.6 LSU Pipeline Graph [5]
2.7 TLU Role With Respect to All Other Backlogs in a SPARC Core [5]
2.8 MMU and TLBs Relationship [5]
2.9 Top-Level FFU Block Diagram [5]
2.10 Sun UltraSPARC T1 (Niagara 8 Core) [6]
3.1 Processor blocks rough estimation [5]
3.2 Design Power Grid
3.3 Macro placement
3.4 Empty spaces used up by Macros
3.5 Macro Power Connection
3.6 Top blocks placement
4.1 Setup slack distribution
4.2 Hold slack distribution
4.3 Worst setup timing path
4.4 Worst hold timing path
4.5 Area percentage distribution
4.6 Die area distribution
4.7 Cell density histogram
4.8 UltraSPARC T1 processor core [8]
5.1 IRF block area highlighted
A.1 Digital ASIC Design Flow
A.2 Simulation VS Formal Verification [17]
A.3 Synthesis Stage
A.4 Floorplan of the design [21]
A.5 Pad driven VS Core driven [22]
A.6 Power grid of the design [21]
A.7 Example of macro placement [21]
A.8 Global router G-cells [21]
A.9 Tools used in design flow

List of Tables

3.1 Reduction in the processor core
4.1 Setup and hold timing path groups
4.2 Worst 10 setup paths