System-On-Chip Design and Test with Embedded Debug Capabilities

Total Page:16

File Type:pdf, Size:1020Kb

System-On-Chip Design and Test with Embedded Debug Capabilities University of Tennessee, Knoxville TRACE: Tennessee Research and Creative Exchange Masters Theses Graduate School 8-2006 System-on-Chip Design and Test with Embedded Debug Capabilities Tushti Marwah University of Tennessee - Knoxville Follow this and additional works at: https://trace.tennessee.edu/utk_gradthes Part of the Electrical and Computer Engineering Commons Recommended Citation Marwah, Tushti, "System-on-Chip Design and Test with Embedded Debug Capabilities. " Master's Thesis, University of Tennessee, 2006. https://trace.tennessee.edu/utk_gradthes/1734 This Thesis is brought to you for free and open access by the Graduate School at TRACE: Tennessee Research and Creative Exchange. It has been accepted for inclusion in Masters Theses by an authorized administrator of TRACE: Tennessee Research and Creative Exchange. For more information, please contact [email protected]. To the Graduate Council: I am submitting herewith a thesis written by Tushti Marwah entitled "System-on-Chip Design and Test with Embedded Debug Capabilities." I have examined the final electronic copy of this thesis for form and content and recommend that it be accepted in partial fulfillment of the requirements for the degree of Master of Science, with a major in Electrical Engineering. Donald W. Bouldin, Major Professor We have read this thesis and recommend its acceptance: Syed Islam, Mohammed Ferdjallah Accepted for the Council: Carolyn R. Hodges Vice Provost and Dean of the Graduate School (Original signatures are on file with official studentecor r ds.) To the Graduate Council: I am submitting herewith a thesis written by Tushti Marwah entitled “System-on- Chip Design and Test with Embedded Debug Capabilities.” I have examined the final electronic copy of this thesis for form and content and recommend that it be accepted in partial fulfillment of the requirement for the degree of Master of Science, with a major in Electrical Engineering. __Donald W. Bouldin______ Major Professor We have read this thesis and recommend its acceptance: Syed Islam Mohammed Ferdjallah Accepted for the Council: _____Anne Mayhew__________ Vice Chancellor and Dean of Graduate Studies (Original signatures are on file with official student records.) System-on-Chip Design and Test with Embedded Debug Capabilities A Thesis Presented for the Master of Science Degree The University of Tennessee, Knoxville Tushti Marwah August 2006 Acknowledgement First of all, I would like to thank my thesis advisor, Dr. Don Bouldin for his guidance, advice and support throughout my research. It was he who suggested me to pursue my research on the concept of using reconfigurable logic for debugging an SoC and I have greatly enjoyed my work. He was a constant source of encouragement for me throughout my research. I am also very grateful to Dr.Syed K.Islam and Dr.M. Ferdjallah for serving on my thesis committee. I would like to thank Wei Jiang, for his teamwork in this project and instrumentation of the LEON design with the pre-silicon tools that gave me a platform to start from. I would also like to thank DAFCA Inc., where I interned from June 2005-August 2005 for providing me with the pre and post-silicon tools and teaching me their usage. I am thankful to Paul Bradley, Rich Lundblad, Tat Ng and Daniel Hoggar at DAFCA for guiding me during the timing analysis and layout generation of the design. Last but not the least I would like to thank my parents, Dr. Narinder Singh and Mrs. Kamla Marwah and my brother Joban for their unrelenting support and encouragement throughout my education. ii Abstract In this project, I started with a System-on-Chip platform with embedded test structures. The baseline platform consisted of a Leon2 CPU, AMBA on-chip bus, and an Advanced Encryption Standard decryption module. The basic objective of this thesis was to use the embedded reconfigurable logic blocks for post-silicon debug and verification. The System-on-Chip platform was designed at the register transistor level and implemented in a 180-nm IBM process. Test logic instrumentation was done with DAFCA (Design Automation for Flexible Chip Architecture) Inc. pre-silicon tools. The design was then synthesized using the Synopsys Design Compiler and placed and routed using Cadence SOC Encounter. Total transistor count is about 3 million, including 1400K transistors for the debug module serving as on chip logic analyzer. Core size of the design is about 4.8mm x 4.8mm and the system is working at 151MHz. Design verification was done with Cadence NCSim. The controllability and observability of internal signals of the design is greatly increased with the help of pre-silicon tools which helps locate bugs and later fix them with the help of post-silicon tools. This helps prevent re-spins on several occasions thus saving millions of dollars. Post-silicon tools have been used to program assertions and triggers and inject numerous personalities into the reconfigurable fabric which has greatly increased the versatility of the circuit. iii Table of Contents Chapter 1 Overview ............................................................................................ 1 1.1 System-on-Chip (SoC).............................................................................. 1 1.2 Design Testing and Verification .............................................................. 1 1.3 Motivation .................................................................................................. 2 1.4 Thesis Goals.............................................................................................. 3 1.5 Thesis Outline ........................................................................................... 3 Chapter 2 Background ....................................................................................... 4 2.1 Challenges in System-on-Chip Design ................................................... 4 2.2 Types of Testing and their Problems ...................................................... 4 2.3 The DAFCA Solution................................................................................. 5 2.4 Debug Methods Supported by DAFCA.................................................... 6 Chapter 3 Volunteer SoC ................................................................................... 9 3.1 Overview .................................................................................................... 9 3.2 LEON CPU ................................................................................................. 9 3.3 AMBA Buses ........................................................................................... 11 3.4 AES Module ............................................................................................. 13 3.5 Artisan RAM ............................................................................................ 13 Chapter 4 Volunteer SoC Debug using DAFCA Tools................................... 16 4.1 Introduction to the DAFCA Tools .......................................................... 16 4.2 Pre - silicon Tools ................................................................................... 16 4.3 Instrumentation of the Volunteer SoC Using Pre-Silicon Tools ......... 19 4.4 Post-Silicon Tools................................................................................... 20 4.4.1 Personality Editor............................................................................. 22 Chapter 5 Implementation................................................................................ 27 5.1 Customizing the LEON2 Processor....................................................... 27 5.1.1 Configuration.................................................................................... 27 5.1.2 Artisan RAM...................................................................................... 28 5.2 The Integrated SoC ................................................................................. 29 5.3 Insertion of rMatrix to the Design.......................................................... 30 5.4 Synthesis................................................................................................. 31 5.5 Physical Place and Route....................................................................... 32 iv 5.5.1 Floorplanning, Powerplanning and Placement.............................. 32 5.5.2 Clock Tree Synthesis ....................................................................... 34 5.5.3 Routing.............................................................................................. 35 5.6 Simulation Result.................................................................................... 35 5.7 Sending the Design to MOSIS................................................................ 37 Chapter 6 Testing with Post-Silicon tools...................................................... 45 6.1 Tests Performed...................................................................................... 45 6.2 Discussion............................................................................................... 55 6.2.1 Area Comparison.............................................................................. 55 6.2.2 Timing Overhead .............................................................................. 55 6.2.3 General Discussion.......................................................................... 55 Chapter 7 Future Work and Conclusion ......................................................... 57 7.1 Future Work............................................................................................
Recommended publications
  • The Intro to GPGPU CPU Vs
    12/12/11! The Intro to GPGPU . Dr. Chokchai (Box) Leangsuksun, PhD! Louisiana Tech University. Ruston, LA! ! CPU vs. GPU • CPU – Fast caches – Branching adaptability – High performance • GPU – Multiple ALUs – Fast onboard memory – High throughput on parallel tasks • Executes program on each fragment/vertex • CPUs are great for task parallelism • GPUs are great for data parallelism Supercomputing 20082 Education Program 1! 12/12/11! CPU vs. GPU - Hardware • More transistors devoted to data processing CUDA programming guide 3.1 3 CPU vs. GPU – Computation Power CUDA programming guide 3.1! 2! 12/12/11! CPU vs. GPU – Memory Bandwidth CUDA programming guide 3.1! What is GPGPU ? • General Purpose computation using GPU in applications other than 3D graphics – GPU accelerates critical path of application • Data parallel algorithms leverage GPU attributes – Large data arrays, streaming throughput – Fine-grain SIMD parallelism – Low-latency floating point (FP) computation © David Kirk/NVIDIA and Wen-mei W. Hwu, 2007! ECE 498AL, University of Illinois, Urbana-Champaign! 3! 12/12/11! Why is GPGPU? • Large number of cores – – 100-1000 cores in a single card • Low cost – less than $100-$1500 • Green computing – Low power consumption – 135 watts/card – 135 w vs 30000 w (300 watts * 100) • 1 card can perform > 100 desktops 12/14/09!– $750 vs 50000 ($500 * 100) 7 Two major players 4! 12/12/11! Parallel Computing on a GPU • NVIDIA GPU Computing Architecture – Via a HW device interface – In laptops, desktops, workstations, servers • Tesla T10 1070 from 1-4 TFLOPS • AMD/ATI 5970 x2 3200 cores • NVIDIA Tegra is an all-in-one (system-on-a-chip) ATI 4850! processor architecture derived from the ARM family • GPU parallelism is better than Moore’s law, more doubling every year • GPGPU is a GPU that allows user to process both graphics and non-graphics applications.
    [Show full text]
  • System-On-A-Chip (Soc) & ARM Architecture
    System-on-a-Chip (SoC) & ARM Architecture EE2222 Computer Interfacing and Microprocessors Partially based on System-on-Chip Design by Hao Zheng 2020 EE2222 1 Overview • A system-on-a-chip (SoC): • a computing system on a single silicon substrate that integrates both hardware and software. • Hardware packages all necessary electronics for a particular application. • which implemented by SW running on HW. • Aim for low power and low cost. • Also more reliable than multi-component systems. 2020 EE2222 2 Driven by semiconductor advances 2020 EE2222 3 Basic SoC Model 2020 EE2222 4 2020 EE2222 5 SoC vs Processors System on a chip Processors on a chip processor multiple, simple, heterogeneous few, complex, homogeneous cache one level, small 2-3 levels, extensive memory embedded, on chip very large, off chip functionality special purpose general purpose interconnect wide, high bandwidth often through cache power, cost both low both high operation largely stand-alone need other chips 2020 EE2222 6 Embedded Systems • 98% processors sold annually are used in embedded applications. 2020 EE2222 7 Embedded Systems: Design Challenges • Power/energy efficient: • mobile & battery powered • Highly reliable: • Extreme environment (e.g. temperature) • Real-time operations: • predictable performance • Highly complex • A modern automobile with 55 electronic control units • Tightly coupled Software & Hardware • Rapid development at low price 2020 EE2222 8 EECS222A: SoC Description and Modeling Lecture 1 Design Complexity Challenge Design• Productivity Complexity
    [Show full text]
  • Comparative Study of Various Systems on Chips Embedded in Mobile Devices
    Innovative Systems Design and Engineering www.iiste.org ISSN 2222-1727 (Paper) ISSN 2222-2871 (Online) Vol.4, No.7, 2013 - National Conference on Emerging Trends in Electrical, Instrumentation & Communication Engineering Comparative Study of Various Systems on Chips Embedded in Mobile Devices Deepti Bansal(Assistant Professor) BVCOE, New Delhi Tel N: +919711341624 Email: [email protected] ABSTRACT Systems-on-chips (SoCs) are the latest incarnation of very large scale integration (VLSI) technology. A single integrated circuit can contain over 100 million transistors. Harnessing all this computing power requires designers to move beyond logic design into computer architecture, meet real-time deadlines, ensure low-power operation, and so on. These opportunities and challenges make SoC design an important field of research. So in the paper we will try to focus on the various aspects of SOC and the applications offered by it. Also the different parameters to be checked for functional verification like integration and complexity are described in brief. We will focus mainly on the applications of system on chip in mobile devices and then we will compare various mobile vendors in terms of different parameters like cost, memory, features, weight, and battery life, audio and video applications. A brief discussion on the upcoming technologies in SoC used in smart phones as announced by Intel, Microsoft, Texas etc. is also taken up. Keywords: System on Chip, Core Frame Architecture, Arm Processors, Smartphone. 1. Introduction: What Is SoC? We first need to define system-on-chip (SoC). A SoC is a complex integrated circuit that implements most or all of the functions of a complete electronic system.
    [Show full text]
  • System-On-Chip Design with Virtual Components
    past designs can a huge chip be com- pleted within a reasonable time. This FEATURE solution usually entails reusing designs from previous generations of products ARTICLE and often leverages design work done by other groups in the same company. Various forms of intercompany cross licensing and technology sharing Thomas Anderson can provide access to design technol- ogy that may be reused in new ways. Many large companies have estab- lished central organizations to pro- mote design reuse and sharing, and to System-on-Chip Design look for external IP sources. One challenge faced by IP acquisi- tion teams is that many designs aren’t well suited for reuse. Designing with with Virtual Components reuse in mind requires extra time and effort, and often more logic as well— requirements likely to be at odds with the time-to-market goals of a product design team. Therefore, a merchant semiconduc- tor IP industry has arisen to provide designs that were developed specifically for reuse in a wide range of applications. These designs are backed by documen- esign reuse for tation and support similar to that d semiconductor provided by a semiconductor supplier. Here in the Recycling projects has evolved The terms “virtual component” from an interesting con- and “core” commonly denote reusable Age, designing for cept to a requirement. Today’s huge semiconductor IP that is offered for system-on-a-chip (SOC) designs rou- license as a product. The latter term is reuse may sound like tinely require millions of transistors. promoted extensively by the Virtual Silicon geometry continues to shrink Socket Interface (VSI) Alliance, a joint a great idea.
    [Show full text]
  • By Erika Azabache Villar a Thesis Submitted in Partial Fulfillment of The
    SMALLER/FASTER DELTA-SIGMA DIGITAL PIXEL SENSORS by Erika Azabache Villar A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Integrated Circuits and Systems Department of Electrical and Computer Engineering University of Alberta c Erika Azabache Villar, 2016 Abstract A digital pixel sensor (DPS) array is an image sensor where each pixel has an analog-to-digital converter (ADC). Recently, a logarithmic delta-sigma (ΔΣ) DPS array, using first-order ΔΣ ADCs, achieved wide dynamic range and high signal-to-noise-and-distortion ratios at video rates, requirements that are difficult to meet using conventional image sensors. However, this state-of-the-art ΔΣ DPS design is either too large for some applications, such as optical imag- ing, or too slow for others, such as gamma imaging. Consequently, this master’s thesis investi- gates smaller or faster ΔΣ DPS designs, relative to the state of the art. All designs are validated through simulations. Commercial image sensors, for optical and gamma imaging, are used as targeted baselines to establish competitive specifications. To achieve a smaller pixel, process scaling is exploited. Three logarithmic ΔΣ DPS designs are presented for 180, 130, and 65 nm fabrication processes, demonstrating a path to competitiveness for the optical imaging market. Decimator and readout circuits are improved, compared to previous work, while reducing area, and capacitors in the modulator prove to be the limiting factor in deep-submicron processes. Area trends are used to construct a roadmap to even smaller pixels. To achieve a faster pixel, a higher-order ΔΣ architecture is exploited.
    [Show full text]
  • Lecture Notes
    Lecture #4-5: Computer Hardware (Overview and CPUs) CS106E Spring 2018, Young In these lectures, we begin our three-lecture exploration of Computer Hardware. We start by looking at the different types of computer components and how they interact during basic computer operations. Next, we focus specifically on the CPU (Central Processing Unit). We take a look at the Machine Language of the CPU and discover it’s really quite primitive. We explore how Compilers and Interpreters allow us to go from the High-Level Languages we are used to programming to the Low-Level machine language actually used by the CPU. Most modern CPUs are multicore. We take a look at when multicore provides big advantages and when it doesn’t. We also take a short look at Graphics Processing Units (GPUs) and what they might be used for. We end by taking a look at Reduced Instruction Set Computing (RISC) and Complex Instruction Set Computing (CISC). Stanford President John Hennessy won the Turing Award (Computer Science’s equivalent of the Nobel Prize) for his work on RISC computing. Hardware and Software: Hardware refers to the physical components of a computer. Software refers to the programs or instructions that run on the physical computer. - We can entirely change the software on a computer, without changing the hardware and it will transform how the computer works. I can take an Apple MacBook for example, remove the Apple Software and install Microsoft Windows, and I now have a Window’s computer. - In the next two lectures we will focus entirely on Hardware.
    [Show full text]
  • An Area Efficient Sub-Threshold Voltage Level Shifter Using a Modified Wilson Current Mirror for Low Power Applications
    An Area Efficient Sub-threshold Voltage Level Shifter using a Modified Wilson Current Mirror for Low Power Applications Biswarup Mukherjee & Aniruddha Ghosal Published online: 22 May 2019. Submit your article to this journal Article views: 12 View Crossmark data An Area Efficient Sub-threshold Voltage Level Shifter using a Modified Wilson Current Mirror for Low Power Applications Biswarup Mukherjee1 and Aniruddha Ghosal2 1Neotia Institute of Technology, Management and Science, Jhinga, Diamond Harbour Road, 24Pgs. (S), West Bengal, India; 2Institute of Radio Physics and Electronics, University of Calcutta, West Bengal, India ABSTRACT KEYWORDS In the present communication, a new technique has been introduced for implementing low-power Current mirror; CAD area efficient sub-threshold voltage level shifter (LS) circuit. The proposed LS circuit consists of only simulation; Level shifter (LS); nine transistors and can operate up to 100 MHz of input frequency successfully. The proposed LS Low power; Sub-threshold is made of single threshold voltage transistors which show least complexity in fabrication and bet- operation; Typical transistor (TT) ter performance in terms of delay analysis and power consumption compared to other available designs. CAD tool-based simulation at TSMC 180 nm technology and comparison between the pro- posed design and other available designs show that the proposed design performs better than other state-of-the-art designs for a similar range of voltage conversion with the most area efficiency. 1. INTRODUCTION are used to reduce the circuit activity or the capacitance value in order to reduce the switching power consump- With the increasing demand of portable devices in elec- tion.
    [Show full text]
  • Overview of CMOS Sensors for Future Tracking Detectors
    instruments Review Overview of CMOS Sensors for Future Tracking Detectors Ricardo Marco-Hernández Instituto de Física Corpuscular (CSIC-UV), 46980 Valencia, Spain; rmarco@ific.uv.es Received: 23 October 2020; Accepted: 27 November 2020; Published: 30 November 2020 Abstract: Depleted Complementary Metal-Oxide-Semiconductor (CMOS) sensors are emerging as one of the main candidate technologies for future tracking detectors in high luminosity colliders. Their capability of integrating the sensing diode into the CMOS wafer hosting the front-end electronics allows for reduced noise and higher signal sensitivity, due to the direct collection of the sensor signal by the readout electronics. They are suitable for high radiation environments due to the possibility of applying high depletion voltage and the availability of relatively high resistivity substrates. The use of a CMOS commercial fabrication process leads to their cost reduction and allows faster construction of large area detectors. In this contribution, a general perspective of the state of the art of CMOS detectors for High Energy Physics experiments is given. The main developments carried out with regard to these devices in the framework of the CERN RD50 collaboration are summarized. Keywords: DMAPS; CMOS; radiation sensors; electronics 1. Introduction Current large pixel detectors in High Energy Physics, such as the ones which will be part of the A Toroidal LHC (Large Hadron Collider) ApparatuS (ATLAS) Tracker Detector [1] or the Compact Muon Solenoid (CMS) Tracker Detector [2] upgrades for the High-Luminosity Large Hadron Collider (HL-LHC), mostly follow a hybrid approach. In a hybrid pixel detector the sensor and the readout electronics are independent devices connected by means of bump-bonding, as can be seen in Figure1.
    [Show full text]
  • EE Concentration: System-On-A-Chip (Soc)
    EE Concentration: System-on-a-Chip (SoC) Requirements: Complete ESE350, ESE370, CIS371, ESE532 Requirement Flow: Impact: The chip at the heart of your smartphone, tablet, or mp3 player (including the Apple A11, A12) is an SoC. The chips that run almost all of your gadgets today are SoCs. These are the current culmination of miniaturization and part count reduction that allows such systems to built inexpensively and from small part counts. These chips democratize innovation, by providing a platform for the deployment of novel ideas without requiring hundreds of millions of dollars to build new custom ICs. Description: Modern computational and control chips contain billions of transistors and run software that has millions of lines of code. They integrate complete systems including multiple, potentially heterogeneous, processing elements, sophisticated memory hierarchies, communications, and rich interfaces for inputs and outputs including sensing and actuations. To design these systems, engineers must understand IC technology, digital circuits, processor and accelerator architectures, networking, and composition and interfacing and be able to manage hardware/software trade-offs. This concentration prepares students both to participate in the design of these SoC architectures and to use SoC architectures as implementation vehicles for novel embedded computing tasks. Sample industries and companies: ● Integrated Circuit Design: ARM, IBM, Intel, Nvidia, Samsung, Qualcomm, Xilinx ● Consumer Electronics: Apple, Samsung, NEST, Hewlett Packard ● Systems: Amazon, CISCO, Google, Facebook, Microsoft ● Automotive and Aerospace: Boeing, Ford, Space-X, Tesla, Waymo ● Your startup Sample Job Titles: ● Hardware Engineer, Chip Designer, Chip Architect, Architect, Verification Engineer, Software Engineering, Embedded Software Engineer, Member of Technical Staff, VP Engineering, CTO Graduate research in: computer systems and architecture .
    [Show full text]
  • A Area VLSI Design (SCL)
    A Area VLSI Design (SCL) A1 Sub Area ASIC Design (SCL) A1.1 Design of Instrumentation Amplifier A High-Precision Instrumentation Amplifier with large CMRR is required for the Sensor Signal conditioning applications. Instrumentation amplifier is a versatile device used for amplification of small differential mode signals while rejecting large common mode signal at the same time. In a typical signal conditioning chain, the output of the sensor goes to the instrumentation amplifier. The instrumentation amplifier amplifies the small output signal of the sensor and gives it to the ADC for digitization. Target Specification: • Supply Voltage: 3.3V • Temperature Range: -40 degC to 125 degC • Programmable Gain: 1 to 1000 • Low noise < 0.3µV p-p at 0.1 Hz to 10 Hz • Low non linearity < 20ppm (Gain = 1) • High CMRR (Common Mode Rejection Ratio) > 90dB (Gain=1) • Low offset voltage < 100µV • 3 dB Bandwidth: 1MHz (at Gain = 1) The design of the proposed Instrumentation Amplifier is to be carried out in SCL 180 nm process. A1.2 Design of a Current Feedback Amplifier (CFA) Opamps come in two types: the voltage-feedback amplifier (VFA), for which the input error is a voltage; and the current-feedback amplifier (CFA), for which the input error is a current. The CFA noninverting amplifier is relatively free from a gain-bandwidth trade-off. Additional advantage of CFAs compared to VFAs is the absence of slew-rate limiting. CFA is primarily used where both high speed and low distortion signal processing is required. CFAs are mainly used in broadcast video systems, radar systems , IF and RF stages and other high speed circuits.
    [Show full text]
  • EP Activity Report 2015
    EUROPRACTICE IC SERVICE THE RIGHT COCKTAIL OF ASIC SERVICES EUROPRACTICE IC SERVICE OFFERS YOU A PROVEN ROUTE TO ASICS THAT FEATURES: · .QYEQUV#5+%RTQVQV[RKPI · (NGZKDNGCEEGUUVQUKNKEQPECRCEKV[HQTUOCNNCPFOGFKWOXQNWOGRTQFWEVKQPSWCPVKVKGU · 2CTVPGTUJKRUYKVJNGCFKPIYQTNFENCUUHQWPFTKGUCUUGODN[CPFVGUVJQWUGU · 9KFGEJQKEGQH+%VGEJPQNQIKGU · &KUVTKDWVKQPCPFHWNNUWRRQTVQHJKIJSWCNKV[EGNNNKDTCTKGUCPFFGUKIPMKVUHQTVJGOQUVRQRWNCT%#&VQQNU · 46.VQ.C[QWVUGTXKEGHQTFGGRUWDOKETQPVGEJPQNQIKGU · (TQPVGPF#5+%FGUKIPVJTQWIJ#NNKCPEG2CTVPGTU +PFWUVT[KUTCRKFN[FKUEQXGTKPIVJGDGPG«VUQHWUKPIVJG'74124#%6+%'+%UGTXKEGVQJGNRDTKPIPGYRTQFWEVFGUKIPUVQOCTMGV SWKEMN[CPFEQUVGHHGEVKXGN[6JG'74124#%6+%'#5+%TQWVGUWRRQTVUGURGEKCNN[VJQUGEQORCPKGUYJQFQP°VPGGFCNYC[UVJG HWNNTCPIGQHUGTXKEGUQTJKIJRTQFWEVKQPXQNWOGU6JQUGEQORCPKGUYKNNICKPHTQOVJG¬GZKDNGCEEGUUVQUKNKEQPRTQVQV[RGCPF RTQFWEVKQPECRCEKV[CVNGCFKPIHQWPFTKGUFGUKIPUGTXKEGUJKIJSWCNKV[UWRRQTVCPFOCPWHCEVWTKPIGZRGTVKUGVJCVKPENWFGU+% OCPWHCEVWTKPIRCEMCIKPICPFVGUV6JKU[QWECPIGVCNNHTQO'74124#%6+%'+%UGTXKEGCUGTXKEGVJCVKUCNTGCF[GUVCDNKUJGF HQT[GCTUKPVJGOCTMGV THE EUROPRACTICE IC SERVICES ARE OFFERED BY THE FOLLOWING CENTERS: · KOGE.GWXGP $GNIKWO · (TCWPJQHGT+PUVKVWVHWGT+PVGITKGTVG5EJCNVWPIGP (TCWPJQHGT++5 'TNCPIGP )GTOCP[ This project has received funding from the European Union’s Seventh Programme for research, technological development and demonstration under grant agreement N° 610018. This funding is exclusively used to support European universities and research laboratories. © imec FOREWORD Dear EUROPRACTICE customers, We are at the start of the
    [Show full text]
  • Threading SIMD and MIMD in the Multicore Context the Ultrasparc T2
    Overview SIMD and MIMD in the Multicore Context Single Instruction Multiple Instruction ● (note: Tute 02 this Weds - handouts) ● Flynn’s Taxonomy Single Data SISD MISD ● multicore architecture concepts Multiple Data SIMD MIMD ● for SIMD, the control unit and processor state (registers) can be shared ■ hardware threading ■ SIMD vs MIMD in the multicore context ● however, SIMD is limited to data parallelism (through multiple ALUs) ■ ● T2: design features for multicore algorithms need a regular structure, e.g. dense linear algebra, graphics ■ SSE2, Altivec, Cell SPE (128-bit registers); e.g. 4×32-bit add ■ system on a chip Rx: x x x x ■ 3 2 1 0 execution: (in-order) pipeline, instruction latency + ■ thread scheduling Ry: y3 y2 y1 y0 ■ caches: associativity, coherence, prefetch = ■ memory system: crossbar, memory controller Rz: z3 z2 z1 z0 (zi = xi + yi) ■ intermission ■ design requires massive effort; requires support from a commodity environment ■ speculation; power savings ■ massive parallelism (e.g. nVidia GPGPU) but memory is still a bottleneck ■ OpenSPARC ● multicore (CMT) is MIMD; hardware threading can be regarded as MIMD ● T2 performance (why the T2 is designed as it is) ■ higher hardware costs also includes larger shared resources (caches, TLBs) ● the Rock processor (slides by Andrew Over; ref: Tremblay, IEEE Micro 2009 ) needed ⇒ less parallelism than for SIMD COMP8320 Lecture 2: Multicore Architecture and the T2 2011 ◭◭◭ • ◮◮◮ × 1 COMP8320 Lecture 2: Multicore Architecture and the T2 2011 ◭◭◭ • ◮◮◮ × 3 Hardware (Multi)threading The UltraSPARC T2: System on a Chip ● recall concurrent execution on a single CPU: switch between threads (or ● OpenSparc Slide Cast Ch 5: p79–81,89 processes) requires the saving (in memory) of thread state (register values) ● aggressively multicore: 8 cores, each with 8-way hardware threading (64 virtual ■ motivation: utilize CPU better when thread stalled for I/O (6300 Lect O1, p9–10) CPUs) ■ what are the costs? do the same for smaller stalls? (e.g.
    [Show full text]