Snitch: a Tiny Pseudo Dual-Issue Processor for Area and Energy Efficient Execution of Floating-Point Intensive Workloads
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
ECE 546 Lecture -20 Power Distribution Networks
ECE 546 Lecture ‐20 Power Distribution Networks Spring 2020 Jose E. Schutt-Aine Electrical & Computer Engineering University of Illinois [email protected] ECE 546 – Jose Schutt‐Aine 1 IC on Package ECE 546 – Jose Schutt‐Aine 2 IC on Package ECE 546 – Jose Schutt‐Aine 3 Power-Supply Noise - Power-supply-level fluctuations - Delta-I noise - Simultaneous switching noise (SSN) - Ground bounce VOH Ideal Actual VOL Vout Vout Time 4 ECE 546 – Jose Schutt‐Aine 4 Voltage Fluctuations • Voltage fluctuations can cause the following Reduction in voltage across power supply terminals. May prevent devices from switching Increase in voltage across power supply terminalsreliability problems Leakage of the voltage fluctuation into transistors Timing errors, power supply noise, delta‐I noise, simultaneous switching noise (SSN) ECE 546 – Jose Schutt‐Aine 5 Power-Supply-Level Fluctuations • Total capacitive load associated with an IC increases as minimum feature size shrinks • Average current needed to charge capacitance increases • Rate of change of current (dI/dt) also increases • Total chip current may change by large amounts within short periods of time • Fluctuation at the power supply level due to self inductance in distribution lines ECE 546 – Jose Schutt‐Aine 6 Reducing Power-Supply-Level Fluctuations Minimize dI/dt noise • Decoupling capacitors • Multiple power & ground pins • Taylored driver turn-on characteristics Decoupling capacitors • Large capacitor charges up during steady state • Assumes role of power supply during current switching • Leads -
Introduction to ASIC Design
’14EC770 : ASIC DESIGN’ An Introduction Application - Specific Integrated Circuit Dr.K.Kalyani AP, ECE, TCE. 1 VLSI COMPANIES IN INDIA • Motorola India – IC design center • Texas Instruments – IC design center in Bangalore • VLSI India – ASIC design and FPGA services • VLSI Software – Design of electronic design automation tools • Microchip Technology – Offers VLSI CMOS semiconductor components for embedded systems • Delsoft – Electronic design automation, digital video technology and VLSI design services • Horizon Semiconductors – ASIC, VLSI and IC design training • Bit Mapper – Design, development & training • Calorex Institute of Technology – Courses in VLSI chip design, DSP and Verilog HDL • ControlNet India – VLSI design, network monitoring products and services • E Infochips – ASIC chip design, embedded systems and software development • EDAIndia – Resource on VLSI design centres and tutorials • Cypress Semiconductor – US semiconductor major Cypress has set up a VLSI development center in Bangalore • VDAT 2000 – Info on VLSI design and test workshops 2 VLSI COMPANIES IN INDIA • Sandeepani – VLSI design training courses • Sanyo LSI Technology – Semiconductor design centre of Sanyo Electronics • Semiconductor Complex – Manufacturer of microelectronics equipment like VLSIs & VLSI based systems & sub systems • Sequence Design – Provider of electronic design automation tools • Trident Techlabs – Power systems analysis software and electrical machine design services • VEDA IIT – Offers courses & training in VLSI design & development • Zensonet Technologies – VLSI IC design firm eg3.com – Useful links for the design engineer • Analog Devices India Product Development Center – Designs DSPs in Bangalore • CG-CoreEl Programmable Solutions – Design services in telecommunications, networking and DSP 3 Physical Design, CAD Tools. • SiCore Systems Pvt. Ltd. 161, Greams Road, ... • Silicon Automation Systems (India) Pvt. Ltd. ( SASI) ... • Tata Elxsi Ltd. -
Stuck-At Fault As a Logic Fault
Stuck-At Fault: Stuck-At Fault A Fault Model for the next Millennium? “I tell you, I get no respect!” -Rodney Dangerfield, Comedian Janak H. Patel Department of Electrical and Computer Engineering University of Illinois at Urbana-Champaign “The news of my death are highly exaggerated” -Mark Twain, Author © 2005 Janak H. Patel 2 Stuck-At Fault a Defect Model? Stuck-At Fault as a Logic Fault You can call it - z Stuck-at Fault is a Functional Fault on a Boolean Abstract (Logic) Function Implementation Logical z It is not a Physical Defect Model Boolean Stuck-at 1 does not mean line is shorted to VDD Functional Stuck-at 0 does not mean line is grounded! Symbolic z It is an abstract fault model or Behavioral …... Fault Model A logic stuck-at 1 means when the line is applied a logic 0, it produces a logical error But don’t call it a Defect Model! A logic error means 0 becomes 1 or vice versa 3 4 Propagate Error To Fault Excitation Primary Output Y 1 1 G ERROR A G s - a - 0 A 0 1 x X 1/0 D 1 D F ERROR F 0 Y 0 C 1/0 C Y 0 0 0 E E B H B H Test Vector A,B,C = 1,0,0 detects fault G s-a-0 Activates the fault s-a-0 on line G by applying a logic value 1 in line G 5 6 Unmodeled Defect Detection Defect Sites z Internal to a Logic Gate or Cell Other Logic Transistor Defects – Stuck-On, Stuck-Open, Leakage, Shorts between treminals ERROR Defect G ERROR z External to a Gate or Cell Interconnect Defects – Shorts and Opens D 1 F 0 Y ERROR C 0 0 0 E B H G I 7 8 Defects in Physical Cells Fault Modeling z Physical z Electrical Physical Cells such as NAND, NOR, XNOR, AOI, OAI, MUX2, etc. -
BDEC2 Poznan Japanese HPC Infrastructure Update
BDEC2 Poznan Japanese HPC Infrastructure Update Presenter: Masaaki Kondo, Riken R-CCS/Univ. of Tokyo (on behalf of Satoshi Matsuoka, Director, Riken R-CCS) 1 Post-K: The Game Changer (2020) 1. Heritage of the K-Computer, HP in simulation via extensive Co-Design • High performance: up to x100 performance of K in real applications • Multitudes of Scientific Breakthroughs via Post-K application programs • Simultaneous high performance and ease-of-programming 2. New Technology Innovations of Post-K Global leadership not just in High Performance, esp. via high memory BW • the machine & apps, but as Performance boost by “factors” c.f. mainstream CPUs in many HPC & Society5.0 apps via BW & Vector acceleration cutting edge IT • Very Green e.g. extreme power efficiency Ultra Power efficient design & various power control knobs • Arm Global Ecosystem & SVE contribution Top CPU in ARM Ecosystem of 21 billion chips/year, SVE co- design and world’s first implementation by Fujitsu High Perf. on Society5.0 apps incl. AI • ARM: Massive ecosystem Architectural features for high perf on Society 5.0 apps from embedded to HPC based on Big Data, AI/ML, CAE/EDA, Blockchain security, etc. Technology not just limited to Post-K, but into societal IT infrastructures e.g. Clouds 2 Arm64fx & Post-K (to be renamed) Fujitsu-Riken design A64fx ARM v8.2 (SVE), 48/52 core CPU HPC Optimized: Extremely high package high memory BW (1TByte/s), on-die Tofu-D network BW (~400Gbps), high SVE FLOPS (~3Teraflops), various AI support (FP16, INT8, etc.) Gen purpose CPU – Linux, -
Supercomputer Fugaku
Supercomputer Fugaku Toshiyuki Shimizu Feb. 18th, 2020 FUJITSU LIMITED Copyright 2020 FUJITSU LIMITED Outline ◼ Fugaku project overview ◼ Co-design ◼ Approach ◼ Design results ◼ Performance & energy consumption evaluation ◼ Green500 ◼ OSS apps ◼ Fugaku priority issues ◼ Summary 1 Copyright 2020 FUJITSU LIMITED Supercomputer “Fugaku”, formerly known as Post-K Focus Approach Application performance Co-design w/ application developers and Fujitsu-designed CPU core w/ high memory bandwidth utilizing HBM2 Leading-edge Si-technology, Fujitsu's proven low power & high Power efficiency performance logic design, and power-controlling knobs Arm®v8-A ISA with Scalable Vector Extension (“SVE”), and Arm standard Usability Linux 2 Copyright 2020 FUJITSU LIMITED Fugaku project schedule 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 Fugaku development & delivery Manufacturing, Apps Basic Detailed design & General Feasibility study Installation review design Implementation operation and Tuning Select Architecture & Co-Design w/ apps groups apps sizing 3 Copyright 2020 FUJITSU LIMITED Fugaku co-design ◼ Co-design goals ◼ Obtain the best performance, 100x apps performance than K computer, within power budget, 30-40MW • Design applications, compilers, libraries, and hardware ◼ Approach ◼ Estimate perf & power using apps info, performance counts of Fujitsu FX100, and cycle base simulator • Computation time: brief & precise estimation • Communication time: bandwidth and latency for communication w/ some attributes for communication patterns • I/O time: ◼ Then, optimize apps/compilers etc. and resolve bottlenecks ◼ Estimation of performance and power ◼ Precise performance estimation for primary kernels • Make & run Fugaku objects on the Fugaku cycle base simulator ◼ Brief performance estimation for other sections • Replace performance counts of FX100 w/ Fugaku params: # of inst. commit/cycle, wait cycles of barrier, inst. -
A Study on High-Frequency Performance in Mosfets Scaling
Doctorial Thesis A Study on High-Frequency Performance in MOSFETs Scaling Hiroshi Shimomura March, 2011 Supervisor: Prof. Hiroshi Iwai A Study on High-Frequency Performance in MOSFETs Scaling Contents Contents Ⅰ List of Figures Ⅵ List of Tables ⅩⅤ Chapter.1 Introduction – Background and Motivation of This Study 1.1 Field Effect Transistor Technology 2 1.2 Performance Trends of RF Transistors 5 1.3 Objective and Organization of This Study 11 References 15 Chapter.2 RF Measurement and Characterization 2.1 Introduction 31 2.2 AC S-parameter Measurements 2.2.1 De-embedding Techniques using The OPEN 31 and The SHORT Dummy Device I A Study on High-Frequency Performance in MOSFETs Scaling 2.2.2 Cutoff Frequency, fT 35 2.2.3 Maximum Oscillation Frequency, fmax 36 2.3 Noise Characterization 2.3.1 Noise De-embedding Procedure 39 2.3.2 Noise Parameter Extraction Theory 44 2.4 Large-Signal Measurements 49 2.5 Distortion 50 References 52 Chapter.3 A Mesh-Arrayed MOSFET (MA-MOS) for High-Frequency Analog Applications 3.1 Introduction 55 3.2 Device Configuration and Parasitic Components 55 3.3 AC Characteristics 62 3.4 Conclusion 65 References 66 II A Study on High-Frequency Performance in MOSFETs Scaling Chapter.4 Analysis of RF Characteristics as Scaling MOSFETs from 150 nm to 65 nm Nodes 4.1 Introduction 69 4.2 Methodology and Experimental Procedures 70 4.3 Small Signal and Noise Characteristics 4.3.1 Cut-off frequency :fT and Maximum Oscillation Frequency: fmax 74 4.3.2 Noise Figure: NFmin 79 4.3.3 Intrinsic Gain: gm/gds 84 4.4 Large Signal and Distortion Characteristics 4.4.1 Power-Added Efficiency: PAE and Associated Gain: Ga 86 4.4.2 1dB Compression Point: P1dB 89 4.4.3 Third Order Intercept Point: IP3 91 4.5 Conclusion 95 References 97 Chapter.5 Effect of High Frequency Noise Current Sources on Noise Figure for Sub-50 nm node MOSFETs 5.1 Introduction 103 III A Study on High-Frequency Performance in MOSFETs Scaling 5.2 Noise Theory 5.2.1. -
Download SC18 Final Program
Program November 11-16, 2018 The International Conference Exhibits November 12-15, 2018 for High Performance KAY BAILEY HUTCHISON CONVENTION CENTER Computing, Networking, DALLAS, TEXAS, USA Storage, and Analysis Sponsored by: MICHAEL S. RAWLINGS Mayor of Dallas November 11, 2018 Supercomputing – SC18 Conference: As Mayor of Dallas, it is my pleasure to welcome SC18 – the ultimate international conference for high performance computing, networking, storage, and analysis – to the Kay Bailey Hutchison Convention Center! I applaud SC18 for bringing together high-performance computing (HPC) professionals and students from across the globe to share ideas, attend and participate in technical presentations, papers and workshops, all while inspiring the next generation of HPC professionals. Dallas is the No. 1 visitor destination in Texas and one of the top cities in the nation for meetings and conventions. In addition to having the resources to host major events and conventions, Dallas is a greatly diverse American city and a melting pot of cultures. This important convergence of uniqueness and differences is reflected throughout the sights and sounds of our city. Dallas' authentic arts, music, food, historic landmarks and urban lifestyle all contribute to the city's rich cultural makeup. Let the bold, vibrant spirit of Dallas, with a touch of Texas charm, guide the way to your BIG Dallas moments. Our city is filled with the unique experiences one can only have in Dallas – from cuisine by celebrity chefs, to the cultural landscape in one of the nation’s largest arts districts, to first-class shopping - make the most of your time in our great city! From transportation to hotels, restaurants, retail and entertainment venues, Dallas is ready to roll out our red carpet to host SC18. -
Chapter 6 - Combinational Logic Systems GCSE Electronics – Component 1: Discovering Electronics
Chapter 6 - Combinational logic systems GCSE Electronics – Component 1: Discovering Electronics Combinational logic systems Learners should be able to: (a) recognise 1/0 as two-state logic levels (b) identify and use NOT gates and 2-input AND, OR, NAND and NOR gates, singly and in combination (c) produce a suitable truth table from a given system specification and for a given logic circuit (d) use truth tables to analyse a system of gates (e) use Boolean algebra to represent the output of truth tables or logic gates and use the basic Boolean identities A.B = A+B and A+B = A.B (f) design processing systems consisting of logic gates to solve problems (g) simplify logic circuits using NAND gate redundancy (h) analyse and design systems from a given truth table to solve a given problem (i) use data sheets to select a logic IC for given applications and to identify pin connections (j) design and use switches and pull-up or pull-down resistors to provide correct logic level/edge-triggered signals for logic gates and timing circuits 180 © WJEC 2017 Chapter 6 - Combinational logic systems GCSE Electronics – Component 1: Discovering Electronics Introduction In this chapter we will be concentrating on the basics of digital logic circuits which will then be extended in Component 2. We should start by ensuring that you understand the difference between a digital signal and an analogue signal. An analogue signal Voltage (V) Max This is a signal that can have any value between the zero and maximum of the power supply. Changes between values can occur slowly or rapidly depending on the system involved. -
Mixed Precision Block Fused Multiply-Add
Mixed Precision Block Fused Multiply-Add: Error Analysis and Application to GPU Tensor Cores Pierre Blanchard, Nicholas Higham, Florent Lopez, Théo Mary, Srikara Pranesh To cite this version: Pierre Blanchard, Nicholas Higham, Florent Lopez, Théo Mary, Srikara Pranesh. Mixed Precision Block Fused Multiply-Add: Error Analysis and Application to GPU Tensor Cores. SIAM Journal on Scientific Computing, Society for Industrial and Applied Mathematics, 2020, 42 (3), pp.C124-C141. 10.1137/19M1289546. hal-02491076v2 HAL Id: hal-02491076 https://hal.archives-ouvertes.fr/hal-02491076v2 Submitted on 28 May 2020 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. MIXED PRECISION BLOCK FUSED MULTIPLY-ADD: ERROR ANALYSIS AND APPLICATION TO GPU TENSOR CORES∗ PIERRE BLANCHARDy , NICHOLAS J. HIGHAMz , FLORENT LOPEZx , THEO MARY{, AND SRIKARA PRANESHk Abstract. Computing units that carry out a fused multiply-add (FMA) operation with matrix arguments, referred to as tensor units by some vendors, have great potential for use in scientific computing. However, these units are inherently mixed precision and existing rounding error analyses do not support them. We consider a mixed precision block FMA that generalizes both the usual scalar FMA and existing tensor units. -
HPC Systems in the Next Decade – What to Expect, When, Where
EPJ Web of Conferences 245, 11004 (2020) https://doi.org/10.1051/epjconf/202024511004 CHEP 2019 HPC Systems in the Next Decade – What to Expect, When, Where Dirk Pleiter1;∗ 1Jülich Supercomputing Centre, Forschungszentrum Jülich, 52425 Jülich, Germany Abstract. HPC systems have seen impressive growth in terms of performance over a period of many years. Soon the next milestone is expected to be reached with the deployment of exascale systems in 2021. In this paper, we provide an overview of the exascale challenges from a computer architecture’s perspec- tive and explore technological and other constraints. The analysis of upcoming architectural options and emerging technologies allow for setting expectations for application developers, which will have to cope with heterogeneous archi- tectures, increasingly diverse compute technologies as well as deeper memory and storage hierarchies. Finally, needs resulting from changing science and en- gineering workflows will be discussed, which need to be addressed by making HPC systems available as part of more open e-infrastructures that provide also other compute and storage services. 1 Introduction Over several decades, high-performance computing (HPC) has seen an exponential growth of performance when solving large-scale numerical problems involving dense-matrix opera- tions. This development is well documented by the Top500 list [1], which is published twice a year since 1993. In various regions around the world, significant efforts are being per- formed to reach the next milestone of performance with the deployment of exascale systems. For various reasons, improving on the performance of HPC systems has become increasingly more difficult. In this paper, we discuss some of the reasons. -
The Post-K Project and Fujitsu ARM-SVE Enabled A64FX Processor for Energy-Efficiency and Sustained Application Performance
CEA‐RIKEN Summer School, 13th June 2019 @ MDLS, Paris The post-K project and Fujitsu ARM-SVE enabled A64FX processor for energy-efficiency and sustained application performance Mitsuhisa Sato Team Leader of Architecture Development Team Deputy project leader, FLAGSHIP 2020 project Deputy Director, RIKEN Center for Computational Science (R-CCS) Professor (Cooperative Graduate School Program), University of Tsukuba The name of our system (a.k.a post‐K) was announced as “Fugaku” (May 23, 2019) 富岳 富岳 (Fugaku) = Mt. Fuji http://www.bestweb‐link.net/PD‐Museum‐of‐Art/ukiyoe/ukiyoe/fuga2 ku36/No.027.jpg FLAGSHIP2020 Project Missions • Building the Japanese national flagship supercomputer, “Fugaku” (a.k.a Post‐K), and • Developing wide range of HPC applications, running on Fugaku, in order to solve social and science issues in Japan Planned Budget (from 2014FY to 2020FY) • 110 billion JPY (about 1 billion US$ if 1US$=110JPY, total) includes: • Research and development, and manufacturing of the Fugakusystem • Development of applications Project organization Applications • The government selected 9 social & System development scientific priority issues and their • RIKEN is in charge of development R&D organizations. • Fujitsu is vendor partner. • Additional projects for Exploratory • International collaborations: DOE, JLESC, Issues were selected in Jun 2016 CEA .. 2019/5/17 Target science: 9 Priority Issues 重点課題① ⽣体分⼦システムの機能制御による ⾰新的創薬基盤の構築 重点課題② 個別化・予防医療を⽀援する 統合計算⽣命科学 重点課題③ 地震・津波による複合災害の ①Innovative Drug Discovery ②Personalized and Preventive ③Hazard統合的予測システムの構築 and Disaster induced by Medicine Earthquake and Tsunami Society with health and longevity Disaster prevention RIKEN Quant. Biology Center Inst. Medical Science, U. Tokyo Earthquakeand global climate Res. Inst., U. -
SC20-Final-Program-V2.Pdf
Table of Contents ACM Gordon Bell COVID Finalist Keynote ACM Gordon Bell Finalist More Than HPC Plenary ACM Student Research Competition: Panel Graduate Posters Paper ACM Student Research Competition: Research Posters Undergraduate Posters Scientific Visualization Awards Presentation & Data Analytics Showcase Birds of a Feather SCinet Booth Sessions State of the Practice Talk Doctoral Showcase Students@SC Early Career Program Test of Time Exhibitor Forum Tutorial Exhibits Virtual Student Cluster Competition Invited Talk Workshop Job Posting ACM Gordon Bell COVID Finalist (back to top) Thursday, November 19th 10:00 am - 12:00 pm Gordon Bell COVID-19 Prize Finalist Session 1 Session Description: Enabling Rapid COVID-19 Small Molecule Drug Design Through Scalable Deep Learning of Generative Models Sam Ade Jacobs (Lawrence Livermore National Laboratory), Tim Moon (Lawrence Livermore National Laboratory), Kevin McLoughlin (Lawrence Livermore National Laboratory), Derek Jones (Lawrence Livermore National Laboratory), David Hysom (Lawrence Livermore National Laboratory), Dong H. Ahn (Lawrence Livermore National Laboratory), John Gyllenhaal (Lawrence Livermore National Laboratory), Pythagoras Watson (Lawrence Livermore National Laboratory), Felice C. Lightsone (Lawrence Livermore National Laboratory), Jonathan E. Allen (Lawrence Livermore National Laboratory), Ian Karlin (Lawrence Livermore National Laboratory), Brian Van Essen (Lawrence Livermore National Laboratory) We improved the quality and reduced the time to produce machine-learned models for use in small molecule antiviral design. Our globally asynchronous multi-level parallel training approach strong scales to all of Sierra with up to 97.7% efficiency. We trained a novel, character-based Wasserstein autoencoder that produces a higher quality model trained on 1.613 billion compounds in 23 minutes while the previous state-of-the-art takes a day on 1 million compounds.