Debugging Multicore & Shared- Memory Embedded Systems
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Oracle® Developer Studio 12.6
® Oracle Developer Studio 12.6: C++ User's Guide Part No: E77789 July 2017 Oracle Developer Studio 12.6: C++ User's Guide Part No: E77789 Copyright © 2017, Oracle and/or its affiliates. All rights reserved. This software and related documentation are provided under a license agreement containing restrictions on use and disclosure and are protected by intellectual property laws. Except as expressly permitted in your license agreement or allowed by law, you may not use, copy, reproduce, translate, broadcast, modify, license, transmit, distribute, exhibit, perform, publish, or display any part, in any form, or by any means. Reverse engineering, disassembly, or decompilation of this software, unless required by law for interoperability, is prohibited. The information contained herein is subject to change without notice and is not warranted to be error-free. If you find any errors, please report them to us in writing. If this is software or related documentation that is delivered to the U.S. Government or anyone licensing it on behalf of the U.S. Government, then the following notice is applicable: U.S. GOVERNMENT END USERS: Oracle programs, including any operating system, integrated software, any programs installed on the hardware, and/or documentation, delivered to U.S. Government end users are "commercial computer software" pursuant to the applicable Federal Acquisition Regulation and agency-specific supplemental regulations. As such, use, duplication, disclosure, modification, and adaptation of the programs, including any operating system, integrated software, any programs installed on the hardware, and/or documentation, shall be subject to license terms and license restrictions applicable to the programs. -
Dynamic Helper Threaded Prefetching on the Sun Ultrasparc® CMP Processor
Dynamic Helper Threaded Prefetching on the Sun UltraSPARC® CMP Processor Jiwei Lu, Abhinav Das, Wei-Chung Hsu Khoa Nguyen, Santosh G. Abraham Department of Computer Science and Engineering Scalable Systems Group University of Minnesota, Twin Cities Sun Microsystems Inc. {jiwei,adas,hsu}@cs.umn.edu {khoa.nguyen,santosh.abraham}@sun.com Abstract [26], [28], the processor checkpoints the architectural state and continues speculative execution that Data prefetching via helper threading has been prefetches subsequent misses in the shadow of the extensively investigated on Simultaneous Multi- initial triggering missing load. When the initial load Threading (SMT) or Virtual Multi-Threading (VMT) arrives, the processor resumes execution from the architectures. Although reportedly large cache checkpointed state. In software pre-execution (also latency can be hidden by helper threads at runtime, referred to as helper threads or software scouting) [2], most techniques rely on hardware support to reduce [4], [7], [10], [14], [24], [29], [35], a distilled version context switch overhead between the main thread and of the forward slice starting from the missing load is helper thread as well as rely on static profile feedback executed, minimizing the utilization of execution to construct the help thread code. This paper develops resources. Helper threads utilizing run-time a new solution by exploiting helper threaded pre- compilation techniques may also be effectively fetching through dynamic optimization on the latest deployed on processors that do not have the necessary UltraSPARC Chip-Multiprocessing (CMP) processor. hardware support for hardware scouting (such as Our experiments show that by utilizing the otherwise checkpointing and resuming regular execution). idle processor core, a single user-level helper thread Initial research on software helper threads is sufficient to improve the runtime performance of the developed the underlying run-time compiler main thread without triggering multiple thread slices. -
Chapter 2 Java Processor Architectural
INFORMATION TO USERS This manuscript has been reproduced from the microfilm master. UMI films the text directly from the original or copy submitted. Thus, some thesis and dissertation copies are in typewriter face, while others may be from any type of computer printer. The quality of this reproduction is dependent upon the quality of the copy submitted. Broken or indistinct print, colored or poor quality illustrations and photographs, print bleedthrough, substandard margins, and improper alignment can adversely affect reproduction. In the unlikely event that the author did not send UMI a complete manuscript and there are missing pages, these will be noted. Also, if unauthorized copyright material had to be removed, a note will indicate the deletion. Oversize materials (e.g., maps, drawings, charts) are reproduced by sectioning the original, beginning at the upper left-hand comer and continuing from left to right in equal sections with small overlaps. ProQuest Information and Leaming 300 North Zeeb Road, Ann Arbor, Ml 48106-1346 USA 800-521-0600 UMI" The JAFARDD Processor: A Java Architecture Based on a Folding Algorithm, with Reservation Stations, Dynamic Translation, and Dual Processing by Mohamed Watheq AH Kamel El-Kharashi B. Sc., Ain Shams University, 1992 M. Sc., Ain Shams University, 1996 A Dissertation Submitted in Partial Fulfillment of the Requirements for the Degree of D o c t o r o f P h il o s o p h y in the Department of Electrical and Computer Engineering We accept this dissertation as conforming to the required standard Dr. F. Gebali, Supervisor (Department of Electrical and Computer Engineering) Dr. -
Day 2, 1640: Leveraging Opensparc
Leveraging OpenSPARC ESA Round Table 2006 on Next Generation Microprocessors for Space Applications G.Furano, L.Messina – TEC-EDD OpenSPARC T1 • The T1 is a new-from-the-ground-up SPARC microprocessor implementation that conforms to the UltraSPARC architecture 2005 specification and executes the full SPARC V9 instruction set. Sun has produced two previous multicore processors: UltraSPARC IV and UltraSPARC IV+, but UltraSPARC T1 is its first microprocessor that is both multicore and multithreaded. • The processor is available with 4, 6 or 8 CPU cores, each core able to handle four threads. Thus the processor is capable of processing up to 32 threads concurrently. • Designed to lower the energy consumption of server computers, the 8-cores CPU uses typically 72 W of power at 1.2 GHz. G.Furano, L.Messina – TEC-EDD 72W … 1.2 GHz … 90nm … • Is a cutting edge design, targeted for high-end servers. • NOT FOR SPACE USE • But, let’s see which are the potential spin-in … G.Furano, L.Messina – TEC-EDD Why OPEN ? On March 21, 2006, Sun made the UltraSPARC T1 processor design available under the GNU General Public License. The published information includes: • Verilog source code of the UltraSPARC T1 design, including verification suite and simulation models • ISA specification (UltraSPARC Architecture 2005) • The Solaris 10 OS simulation images • Diagnostics tests for OpenSPARC T1 • Scripts, open source and Sun internal tools needed to simulate the design and to do synthesis of the design • Scripts and documentation to help with FPGA implementation -
Performance Analysis of Multiple Threads/Cores Using the Ultrasparc T1
Performance Analysis of Multiple Threads/Cores Using the UltraSPARC T1 Dimitris Kaseridis and Lizy K. John Department of Electrical and Computer Engineering The University of Texas at Austin {kaseridi, ljohn}@ece.utexas.edu Abstract- By including multiple cores on a single chip, Chip to the Server-on-Chip execution model. Under such an envi- Multiprocessors (CMP) are emerging as promising ways of utiliz- ronment, the diverged execution threads will place dissimilar ing the additional die area that is available due to process scaling demands on the shared resources of the system and therefore, at smaller semiconductor feature-size technologies. However, due to resource contention, compete against each other. Con- such an execution environment with multiple hardware context sequently, such competition could result in severe destructive threads on each individual core, that is able to execute multiple threads of the same or different workloads, significantly diverges interference between the concurrently executing threads. Such from the typical, well studied, uniprocessor model and introduces behavior is non-deterministic since the execution of each a high level of non-determinism. There are not enough studies to thread significantly depends on the behavior of the rest of the analyze the performance impact of the contention of shared re- simultaneously executing applications, especially for the case sources of a processor due to multiple executing threads. We of CMP where multiple processes run on each individual core. demonstrate the existence destructive interference on Chip Mul- So far, many researchers have recognized the need of tiprocessing (CMP) architectures using both a multiprogrammed Quality of Service (QoS) that both the software [6] and hard- and a multithreaded workload, on a real, Chip Multi-Threaded ware stack [7-10] has to provide to each individual thread in (CMT) system, the UltraSPARC T1 (Niagara). -
Ultrasparc-III Ultrasparc-III Vs Intel IA-64
UltraSparc-III UltraSparc-III vs Intel IA-64 vs • Introduction Intel IA-64 • Framework Definition Maria Celeste Marques Pinto • Architecture Comparition Departamento de Informática, Universidade do Minho • Future Trends 4710 - 057 Braga, Portugal [email protected] • Conclusions ICCA’03 ICCA’03 Introduction Framework Definition • UltraSparc-III (US-III) is the third generation from the • Reliability UltraSPARC family of Sun • Instruction level Parallelism (ILP) • Is a RISC processor and uses the 64-bit SPARC-V9 architecture – instructions per cycle • IA-64 is Intel’s extension into a 64-bit architecture • Branch Handling • IA-64 processor is based on a concept known as EPIC (Explicitly – Techniques: Parallel Instruction Computing) • branch delay slots • predication – Strategies: • static •dynamic ICCA’03 ICCA’03 Framework Definition Framework Definition • Memory Hierarchy • Pipeline – main memory and cache memory – increase the speed of CPU processing – cache levels location – several stages that performs part of the work necessary to execute an instruction – cache organization: • fully associative - every entry has a slot in the "cache directory" to indicate • Instruction Set (IS) where it came from in memory – is the hardware "language" in which the software tells the processor what to do • one-way set associative - only a single directory entry be searched – can be divided into four basic types of operations such as arithmetic, logical, • two-way set associative - two entries per slot to be searched (and is extended to program-control -
Sun Blade 1000 and 2000 Workstations
Sun BladeTM 1000 and 2000 Workstations Just the Facts Copyrights 2002 Sun Microsystems, Inc. All Rights Reserved. Sun, Sun Microsystems, the Sun logo, Sun Blade, PGX, Solaris, Ultra, Sun Enterprise, Starfire, SunPCi, Forte, VIS, XGL, XIL, Java, Java 3D, SunVideo, SunVideo Plus, Sun StorEdge, SunMicrophone, SunVTS, Solstice, Solstice AdminTools, Solstice Enterprise Agents, ShowMe, ShowMe How, ShowMe TV, Sun Workstation, StarOffice, iPlanet, Solaris Resource Manager, Java 2D, OpenWindows, SunCD, Sun Quad FastEthernet, SunFDDI, SunATM, SunCamera, SunForum, PGX32, SunSpectrum, SunSpectrum Platinum, SunSpectrum Gold, SunSpectrum Silver, SunSpectrum Bronze, SunSolve, SunSolve EarlyNotifier, and SunClient are trademarks, registered trademarks, or service marks of Sun Microsystems, Inc. in the United States and other countries. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. in the United States and other countries. Products bearing SPARC trademarks are based upon an architecture developed by Sun Microsystems, Inc. UNIX is a registered trademark in the United States and in other countries, exclusively licensed through X/Open Company, Ltd. FireWire is a registered trademark of Apple Computer, Inc., used under license. OpenGL is a trademark of Silicon Graphics, Inc., which may be registered in certain jurisdictions. Netscape is a trademark of Netscape Communications Corporation. PostScript and Display PostScript are trademarks of Adobe Systems, Inc., which may be registered in -
Sparc Enterprise T5440 Server Architecture
SPARC ENTERPRISE T5440 SERVER ARCHITECTURE Unleashing UltraSPARC T2 Plus Processors with Innovative Multi-core Multi-thread Technology White Paper July 2009 TABLE OF CONTENTS THE ULTRASPARC T2 PLUS PROCESSOR 0 THE WORLD'S FIRST MASSIVELY THREADED SYSTEM ON A CHIP (SOC) 0 TAKING CHIP MULTITHREADED DESIGN TO THE NEXT LEVEL 1 ULTRASPARC T2 PLUS PROCESSOR ARCHITECTURE 3 SERVER ARCHITECTURE 8 SYSTEM-LEVEL ARCHITECTURE 8 CHASSIS DESIGN INNOVATIONS 13 ENTERPRISE-CLASS MANAGEMENT AND SOFTWARE 19 SYSTEM MANAGEMENT TECHNOLOGY 19 SCALABILITY AND SUPPORT FOR INNOVATIVE MULTITHREADING TECHNOLOGY21 CONCLUSION 28 0 The UltraSPARC T2 Plus Processors Chapter 1 The UltraSPARC T2 Plus Processors The UltraSPARC T2 and UltraSPARC T2 Plus processors are the industry’s first system on a chip (SoC), supplying the most cores and threads of any general-purpose processor available, and integrating all key system functions. The World's First Massively Threaded System on a Chip (SoC) The UltraSPARC T2 Plus processor eliminates the need for expensive custom hardware and software development by integrating computing, security, and I/O on to a single chip. Binary compatible with earlier UltraSPARC processors, no other processor delivers so much performance in so little space and with such small power requirements letting organizations rapidly scale the delivery of new network services with maximum efficiency and predictability. The UltraSPARC T2 Plus processor is shown in Figure 1. Figure 1. The UltraSPARC T2 Plus processor with CoolThreads technology 1 The UltraSPARC -
Opensparc – an Open Platform for Hardware Reliability Experimentation
OpenSPARC – An Open Platform for Hardware Reliability Experimentation Ishwar Parulkar and Alan Wood Sun Microsystems, Inc. James C. Hoe and Babak Falsafi Carnegie Mellon University Sarita V. Adve and Josep Torrellas University of Illinois at Urbana- Champaign Subhasish Mitra Stanford University IEEE SELSE 4 - March 26, 2008 www.OpenSPARC.net Outline 1.Chip Multi-threading (CMT) 2.OpenSPARC T2 and T1 processors 3.Reliability in OpenSPARC processors 4.What is available in OpenSPARC 5.Current university research using OpenSPARC 6.Future research directions IEEE SELSE 4 – March 26, 2008 2 www.OpenSPARC.net World's First 64-bit Open Source Microprocessor OpenSPARC.net Governed by GPLv2 Complete processor architecture & implementation Register Transfer Level (RTL) Hypervisor API Verification suite and architectural models Simulation model for operating system bringup on s/w IEEE SELSE 4 – March 26, 2008 3 www.OpenSPARC.net Chip Multithreading (CMT) Instruction- Low Low Low Medium Low High level Parallelism Thread-level Parallelism High High High High High Instruction/Data Large Large Medium Large Large Working Set Data Sharing Low Medium High Medium High Medium IEEE SELSE 4 – March 26, 2008 4 www.OpenSPARC.net Memory Bottleneck Relative Performance 10000 CPU Frequency DRAM Speeds 1000 2 Years 100 Every Gap 2x -- CPU 6 10 -- 2x Every DRAM Years 1 1980 1985 1990 1995 2000 2005 Source: Sun World Wide Analyst Conference Feb. 25, 2003 IEEE SELSE 4 – March 26, 2008 5 www.OpenSPARC.net Single Threading HURRY Up to 85% Cycles Waiting for Memory -
Ultrasparc T1 Sparc History Sun + Sparc = Ultrasparc
ULTRASPARC T1 SUN + SPARC = ULTRASPARC THE PROCESSOR FORMERLY KNOWN AS “NIAGARA” Processor Cores Threads/Core Clock L1D L1I L2 Cache UltraSPARC IIi 1 1 550Mhz, 650Mhz 16KiB 16KiB 512KiB UltraSPARC IIIi 1 1 1.593Ghz I D 1MBa UltraSPARC III 1 1 1.05-1.2GHz 64KiB 32KiB 8MiBb UltraSPARC IV 2c 1 1.05-1.35Ghz 64KiB 32KiB 16MiBd UltraSPARC IV+ 1 2 1.5Ghz I D 2MiBe UltraSPARC T1 8 4 1.2Ghz 32KiB 16KiBf 3MiBg UltraSPARC T2h 16 (?) 8 2Ghz+ (?) ? ? ? Slide 1 Slide 3 aOn-chip bExternal, on chip tags cUltraSPARC III cores d8MiB per core e32MiB off chip L3 fI/D Cache per core g4 way banked hSecond-half 2007 This work supported by UNSW and HP through the Gelato Federation SPARC HISTORY INSTRUCTION SET ➜ Scalable Processor ARCHitecture ➜ RISC! ➜ 1985 – Sun Microsystems ➜ Berkeley RISC – 1980-1984 ➜ Load–store only through registers ➜ MIPS – 1981-1984 ➜ Fixed size instructions (32 bits) ➜ register + register Slide 2 Architecture v Implementation: Slide 4 ➜ register + 13 bit immediate ➜ SPARC Architecture ➜ Branch delay slot ➜ SPARC V7 – 1986 X Condition Codes ➜ SPARC Interntaional, Ltd – 1989 V (V9) CC and non-CC instructions ➜ SPARC V8 – 1990 V (V9) Compare on integer registers ➜ SPARC V9 – 1994 ➜ Synthesised instructions ➜ Privileged v Non-Privileged SUN + SPARC = ULTRASPARC 1 CODE EXAMPLE 2 CODE EXAMPLE V9 REGISTER WINDOWS void addr(void) { int i = 0xdeadbeef; } 00000054 <addr>: Slide 5 54: 9d e3 bf 90 save %sp, -112, %sp Slide 7 58: 03 37 ab 6f sethi %hi(0xdeadbc00), %g1 5c: 82 10 62 ef or %g1, 0x2ef, %g1 60: c2 27 bf f4 st %g1, [ %fp + -12 ] 64: -
Table of Contents
1 Copyright © 2013, Oracle and/or its affiliates. All rights reserved. Safe Harbor Statement The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle. 2 Copyright © 2013, Oracle and/or its affiliates. All rights reserved. Eine phatastische Reise ins Innere der Hardware Franz Haberhauer Stefan Hinker Oracle Hardware in 3D 5 Copyright © 2013, Oracle and/or its affiliates. All rights reserved. T5 and M5 PCIe Carrier Card . Supports standard low-profile PCIe cards Air Flow PCIe Retimer x16 Connector (x8 electrical) 6 Copyright © 2013, Oracle and/or its affiliates. All rights reserved. PCIe Data Paths: Full System . Two root complexes per T5 processor . Each PCIe port on a T5 processor controls a single PCIe slot 7 Copyright © 2013, Oracle and/or its affiliates. All rights reserved. T5-2 Block Diagram DIMM DIMM DIMM DIMM DIMM DIMM DIMM DIMM DIMM DIMM DIMM DIMM DIMM DIMM DIMM DIMM BoB BoB BoB BoB BoB BoB BoB BoB BoB BoB BoB BoB BoB BoB BoB BoB T5-0 T5-1 CPU CPU TPM Host & CPU PCIe Debug CPU PCIe Debug Data Flash DC/DCs 0 1 Port DC/DCs 0 1 Port x8 x8 FPGA x8 x4 x8 x1 HDD0 DBG SAS/SATA x1 HDD0 IO Controller x4 x4 PCIe PCIe SP Module HDD0 get rid of all inside x8 x8 SAS/SATA smallSwitch boxes 0 Switch 1 FRUID HDD0 IO Controller Sideband Mgmt DRAM HDD0 USB 1.1 Keyboard Mouse Service SPI x8 USB 3.0 x8 USB 2.0 Storage Flash HDD0 Host Processor SATA DVD NAND USB 2.0 Hub USB USB 3.0 USB Internal USB Hub VGA VGA REAR IO Board USB2 USB3 VGA USB0 USB1 VGA Serial Enet Quad 10Gig Enet DB15 Mgmt Mgmt Slot 2 (8) 2 Slot (8) 3 Slot (8) 4 Slot (8) 5 Slot (8) 6 Slot (8) 7 Slot (8) 8 Slot Slot 1 (8) 1 Slot 10/100 FAN BOARD REAR IO 8 Copyright © 2013, Oracle and/or its affiliates. -
Computer Architectures an Overview
Computer Architectures An Overview PDF generated using the open source mwlib toolkit. See http://code.pediapress.com/ for more information. PDF generated at: Sat, 25 Feb 2012 22:35:32 UTC Contents Articles Microarchitecture 1 x86 7 PowerPC 23 IBM POWER 33 MIPS architecture 39 SPARC 57 ARM architecture 65 DEC Alpha 80 AlphaStation 92 AlphaServer 95 Very long instruction word 103 Instruction-level parallelism 107 Explicitly parallel instruction computing 108 References Article Sources and Contributors 111 Image Sources, Licenses and Contributors 113 Article Licenses License 114 Microarchitecture 1 Microarchitecture In computer engineering, microarchitecture (sometimes abbreviated to µarch or uarch), also called computer organization, is the way a given instruction set architecture (ISA) is implemented on a processor. A given ISA may be implemented with different microarchitectures.[1] Implementations might vary due to different goals of a given design or due to shifts in technology.[2] Computer architecture is the combination of microarchitecture and instruction set design. Relation to instruction set architecture The ISA is roughly the same as the programming model of a processor as seen by an assembly language programmer or compiler writer. The ISA includes the execution model, processor registers, address and data formats among other things. The Intel Core microarchitecture microarchitecture includes the constituent parts of the processor and how these interconnect and interoperate to implement the ISA. The microarchitecture of a machine is usually represented as (more or less detailed) diagrams that describe the interconnections of the various microarchitectural elements of the machine, which may be everything from single gates and registers, to complete arithmetic logic units (ALU)s and even larger elements.