UNIVERSITY OF CALIFORNIA, SAN DIEGO

Configurable Energy-efficient Co-processors to Scale the Utilization Wall

A dissertation submitted in partial satisfaction of the requirements for the degree Doctor of Philosophy

in

Computer Science

by

Ganesh Venkatesh

Committee in charge:

Professor Steven Swanson, Co-Chair
Professor Michael Taylor, Co-Chair
Professor Pamela Cosman
Professor Rajesh Gupta
Professor Dean Tullsen

2011

Copyright
Ganesh Venkatesh, 2011
All rights reserved.

The dissertation of Ganesh Venkatesh is approved, and it is acceptable in quality and form for publication on microfilm and electronically:

Co-Chair

Co-Chair

University of California, San Diego

2011

DEDICATION

To my dear parents and my loving wife.

EPIGRAPH

The lurking suspicion that something could be simplified is the world’s richest source of rewarding challenges. —Edsger Dijkstra

TABLE OF CONTENTS

Signature Page
Dedication
Epigraph
Table of Contents
List of Figures
List of Tables
Acknowledgements
Vita and Publications
Abstract of the Dissertation

Chapter 1  Introduction
    1.1  Utilization Wall
    1.2  Specialization for converting transistors into performance
    1.3  Patchable Conservation Cores: Energy-efficient circuits with processor-like lifetimes
    1.4  Utilizing Conservation Cores to Design Mobile Application Processors
    1.5  Quasi-ASICs: Trading Area for Energy by Exploiting Similarity across Irregular Codes
    1.6  Organization

Chapter 2  Arsenal: Baseline Architecture and Tool Chain
    2.1  Arsenal: Massively Heterogeneous Multiprocessors
        2.1.1  Hardware Organization
        2.1.2  Arsenal Design Flow
        2.1.3  Execution Model
    2.2  System Overview
        2.2.1  Specialized Processor Hardware Design
        2.2.2  The CPU/Specialized-processor Interface
        2.2.3  The Runtime System
    2.3  Methodology
        2.3.1  Toolchain
        2.3.2  Simulation infrastructure
        2.3.3  Synthesis
        2.3.4  Power measurements

Chapter 3  Patchable Conservation Cores: Energy-efficient circuits with processor-like lifetimes
    3.1  Case for Reconfigurability in Application Specific Circuits
    3.2  Reconfigurability support in Conservation Cores
    3.3  Patching Algorithm
        3.3.1  Basic block mapping
        3.3.2  Control flow mapping
        3.3.3  Register mapping
        3.3.4  Patch generation
        3.3.5  Patched execution example
    3.4  Methodology
        3.4.1  Generation of Patchable Conservation Core Hardware
        3.4.2  Generating Configuration Patch
    3.5  Results
        3.5.1  Energy savings
        3.5.2  Longevity
    3.6  Patching Overhead Analysis and Optimizations
        3.6.1  Cost of Reconfigurability
        3.6.2  Optimizations to Reduce the Patching Overheads
    3.7  Backward Patching
        3.7.1  Methodology
        3.7.2  Results
    3.8  Conclusion

Chapter 4  Utilizing Conservation Cores to Design Mobile Application Processors
    4.1  Applicability of Conservation Cores to Android
        4.1.1  Android Platform Analysis
        4.1.2  Case for using c-cores to optimize mobile application processors
    4.2  Android Software Stack
    4.3  Designing Conservation Cores for Android
        4.3.1  Profiling the Android System
        4.3.2  Characterizing the hotspots in the Android system
    4.4  Results
        4.4.1  Area requirement
        4.4.2  Energy-Area Tradeoff
    4.5  Conclusion

Chapter 5  Quasi-ASICs: Trading Area for Energy by Exploiting Similarity across Irregular Codes
    5.1  Motivation
    5.2  Quasi-ASIC Design Flow
        5.2.1  Dependence Graph Generation
        5.2.2  Mining for Similar Code Patterns
        5.2.3  Merging Program Dependence Graphs with similar code structure
        5.2.4  Qasic Generation
        5.2.5  Modifying Application Code to utilize Qasics
    5.3  Qasic-selection heuristic
    5.4  Methodology
        5.4.1  Designing Qasics for the target application set
        5.4.2  Qasic Hardware Design
    5.5  Results
        5.5.1  Evaluating the Qasic-selection Heuristic
        5.5.2  Evaluating Qasic's Area and Energy Efficiency
    5.6  Conclusion

Chapter 6  Related Work
    6.1  Heterogeneous Architectures
    6.2  Automatically-designed Specialized Cores

Chapter 7  Summary

Bibliography

LIST OF FIGURES

Figure 2.1: The high-level structure of an Arsenal system
Figure 2.2: The high-level design flow of an Arsenal system
Figure 2.3: The high-level structure of the baseline architecture
Figure 2.4: Specialized Core Design Example
Figure 2.5: The C-to-hardware toolchain

Figure 3.1: Observed changes in the source code across application versions
Figure 3.2: An example showing handling of changes in control flow across versions
Figure 3.3: Basic block matching example
Figure 3.4: Patching Algorithm Toolchain
Figure 3.5: Conservation Core versus MIPS Processor
Figure 3.6: System-level Efficiency of a c-core-enabled Tile Architecture
Figure 3.7: Conservation core effectiveness over time
Figure 3.8: Area and power breakdown for patchable c-cores
Figure 3.9: Impact of patching optimizations on the area and energy requirements of patchable c-cores
Figure 3.10: Conservation Core design with improved backward compatibility support
Figure 3.11: The Backward Compatible Conservation Core Toolchain
Figure 3.12: Improvements in the backward compatibility of Conservation Cores
Figure 3.13: Effectiveness of Backward Compatible Conservation Core over time

Figure 4.1: Android Software Stack
Figure 4.2: Static Instruction Count vs Dynamic Instruction Count Coverage
Figure 4.3: Basic block count vs Application execution coverage
Figure 4.4: Breakdown of Application Runtime across Android Software Stack
Figure 4.5: Dynamic Execution Coverage vs Static Instructions Converted into Conservation cores
Figure 4.6: Android Execution Coverage vs Conservation core Area
Figure 4.7: Android Execution Coverage vs Conservation core Area

Figure 5.1: Qasic's ability to trade off between area and energy efficiency
Figure 5.2: Similar code patterns present across the hotspots of a Sat Solver tool
Figure 5.3: Similarity available across hotspots of diverse application set (Table 5.1)
Figure 5.4: Quantifying Similarity Present Within and Across Application Domains
Figure 5.5: Qasic Design Flow
Figure 5.6: Qasic Example
Figure 5.7: Expression Merging
Figure 5.8: Inferred Dependence Example
Figure 5.9: Greedy Clustering Algorithm for Designing Qasics
Figure 5.10: Qasic Toolchain
Figure 5.11: Micro-Benchmark Set: Eight simple loops
Figure 5.12: Coverage vs Qasic count
Figure 5.13: Qasic quality vs. Qasic Count
Figure 5.14: Impact of generalization on Qasic Area and Energy Efficiency for the Micro-benchmarks
Figure 5.15: Impact of generalization on the area and energy efficiency of Qasics targeting commonly used data structures
Figure 5.16: Scalability of Qasic's approach
Figure 5.17: Impact of generalization on Qasic Area and Energy Efficiency for the Benchmark Set
Figure 5.18: Impact of generalization on Qasic Performance for our Benchmark Set
Figure 5.19: Energy efficiency of a Qasic-enabled system

LIST OF TABLES

Table 1.1: The Utilization Wall

Table 3.1: Conservation core statistics
Table 3.2: Conservation core details
Table 3.3: Area costs of patchability

Table 5.1: A Diverse Application Set

ACKNOWLEDGEMENTS

This thesis would not have been possible without my advisors, Steven Swanson and Michael Taylor. My work would not have been this successful without their constant feedback and forward-looking approach to research. They taught me the value of patience, especially when working on an ambitious, work-intensive project. I also owe this thesis to my former advisor, Bradley Calder. His guidance during my initial Ph.D. years played a critical role in helping me mature as a researcher as well as value the importance of hard work and persistence. I would like to thank my committee members for their helpful comments and feedback. I must also thank Julie Conner for being the most awesome and patient graduate student advisor.

I would like to thank my UCSD friends (Dian, Anshuman, Joel, Ravi, ...) for the random chats in the hallway, long ping pong/foosball/tennis sessions, dinners, movies, escaping reality, etc. I also would like to thank my project mates for helping me through the highs and lows of my work and for all the long discussions during the initial stages of this work. Thanks to Nathan for his help with proofreading and tasty guacamole. Thanks to Jack for the many interesting conversations about "stuff" and for sharing his interesting culinary experiments. Thanks to Wei and Satish for all the late night hacking and tennis/foosball lessons. Thanks to Jeff for all the encouraging discussions and Ph.D. advice. Thanks to Jeremy for teaching me how to eat watermelon. Thanks to Robert for all the long tennis discussions. Thanks to ...

I must heartily thank my parents and my wife. The greatest thing to come out of my stay at UCSD was meeting my wife, Sravanthi. She is a great source of strength in my life and has helped me through the stressful times of job hunt and thesis defense. Finally, I would like to thank my parents for all their love, for being there for me all these years, for teaching me the value of education as a kid, for inspiring me to pursue postgraduate education, and many, many other things. Thank you!

Chapters 2 and 3 contain material from "Conservation cores: reducing the energy of mature computations", by Ganesh Venkatesh, Jack Sampson, Nathan Goulding, Saturnino Garcia, Vladyslav Bryksin, Jose Lugo-Martinez, Steven Swanson and Michael Bedford Taylor, which appears in ASPLOS '10: Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems. The dissertation author was the primary investigator and author of this paper. The material in these chapters is copyright © 2010 by the Association for Computing Machinery, Inc. (ACM). Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page in print or the first screen in digital media. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Publications Dept., ACM, Inc., fax +1 (212) 869-0481, or email [email protected].

Chapter 3 contains material from "Efficient complex operators for irregular code", by Jack Sampson, Ganesh Venkatesh, Nathan Goulding, Saturnino Garcia, Steven Swanson and Michael Bedford Taylor, which appears in HPCA '11: Proceedings of the International Symposium on High-Performance Computer Architecture (HPCA). The dissertation author was the secondary investigator and author of this paper. The material in this chapter is copyright © 2011 by the Institute of Electrical and Electronics Engineers (IEEE). Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Chapter 4 contains material from "The GreenDroid mobile application processor: An architecture for silicon's dark future", by Nathan Goulding, Jack Sampson, Ganesh Venkatesh, Saturnino Garcia, Joe Aurricchio, Po-Chao Huang, Manish Arora, Siddharth Nath, Vikram Bhatt, Jonathan Babb, Steven Swanson and Michael Bedford Taylor, which appears in IEEE Micro, March 2011. The dissertation author was a significant contributor and author of this paper. The material in this chapter is copyright © 2011 by the Institute of Electrical and Electronics Engineers (IEEE). Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

VITA AND PUBLICATIONS

2000       B. Tech. in Computer Science, Indian Institute of Technology, Madras

2004-2011  Research Assistant, University of California, San Diego

2005       Internship, Intel Labs, Santa Clara, California

2006       Internship, HP Labs, Palo Alto, California

2006       M. S. in Computer Science, University of California, San Diego

2011       Ph. D. in Computer Science, University of California, San Diego

PUBLICATIONS

Ganesh Venkatesh, Jack Sampson, Nathan Goulding, Saturnino Garcia, Vladyslav Bryksin, Jose Lugo-Martinez, Steven Swanson, Michael Bedford Taylor, "Conservation Cores: Reducing the Energy of Mature Computations", Proceedings of the Fifteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), March 2010.

Jack Sampson, Ganesh Venkatesh, Nathan Goulding, Saturnino Garcia, Steven Swanson, and Michael Bedford Taylor, "Efficient complex operators for irregular code", International Symposium on High-Performance Computer Architecture (HPCA), February 2011.

Nathan Goulding, Jack Sampson, Ganesh Venkatesh, Saturnino Garcia, Joe Aurricchio, Jonathan Babb, Michael Bedford Taylor, and Steven Swanson, "GreenDroid: A mobile application processor for a future of dark silicon", Symposium for High Performance Chips (HotChips), August 2010.

Nathan Goulding, Jack Sampson, Ganesh Venkatesh, Saturnino Garcia, Joe Aurricchio, Po-Chao Huang, Manish Arora, Siddharth Nath, Vikram Bhatt, Jonathan Babb, Steven Swanson, and Michael Bedford Taylor, "The GreenDroid mobile application processor: An architecture for silicon's dark future", IEEE Micro, March 2011.

Ganesh Venkatesh, Jack Sampson, Nathan Goulding, Steven Swanson, and Michael Bedford Taylor, "Quasi-ASICs: Trading area for energy by exploiting similarity in synthesized cores for irregular code", UCSD Technical Report CS2011-0964, March 2011.

ABSTRACT OF THE DISSERTATION

Configurable Energy-efficient Co-processors to Scale the Utilization Wall

by

Ganesh Venkatesh

Doctor of Philosophy in Computer Science

University of California, San Diego, 2011

Professor Steven Swanson, Co-Chair Professor Michael Taylor, Co-Chair

Transistor density continues to increase exponentially, but the power dissipation per transistor improves only slightly with each generation of Moore's law. Given constant chip-level power budgets, the fraction of transistors that can be active simultaneously decreases exponentially with each technology generation. Hence, while the area budget continues to increase exponentially, the power budget has become a first-order design constraint in current processors. In this regime, using transistors to design specialized cores that optimize energy-per-computation becomes an effective approach to improving system performance.

To pursue this goal, this thesis focuses on specialized processors that reduce energy and energy-delay product for general-purpose computing. The focus on energy makes these specialized cores an excellent match for many of the commonly used programs that would be poor candidates for SIMD-style hardware acceleration (e.g., compression, scheduling). However, many challenges, such as lack of flexibility and limited computational power, limit how effective these specialized cores are at targeting general-purpose computing. Without addressing these concerns, specialized cores would be limited in the scope of applications that they can effectively target.

This thesis addresses the challenges involved in making specialization a viable approach to optimizing general-purpose computing. To this end, it proposes Patchable Conservation Cores: flexible, energy-efficient co-processors that can be patched, enabling them to remain useful across versions of their target application. To demonstrate the effectiveness of these conservation cores in targeting a system workload, this thesis uses them to design a mobile application processor targeting the Android software stack. The results show that these specialized cores can cover a significant fraction of the system execution while staying within a modest area budget.

To further increase the fraction of the system execution that these specialized cores cover, this thesis proposes Qasics, specialized co-processors capable of executing multiple general-purpose computations. The Qasic design flow exploits the similar code patterns present within and across applications to reduce redundancy across specialized cores as well as improve their computational power.

Chapter 1

Introduction

Transistor density continues to scale, but per-transistor switching power is no longer scaling down with it. As a result, given fixed chip-level power budgets, the fraction of transistors that can be active at full frequency is decreasing exponentially with each generation of Moore's Law. This phenomenon is termed the Utilization Wall [VSG+10]. The utilization wall results in a dramatic increase in the amount of dark silicon, silicon that is underclocked or underused because of power concerns.

The utilization wall phenomenon is making it harder for designers to convert transistors into performance. Traditionally, increases in transistor counts were used to increase application performance by designing faster and more optimized superscalar pipelines. However, concerns about the microarchitectural scalability of these superscalar pipeline designs motivated the transition toward multi-core processors. Multi-core designs continued the scaling of system performance by allowing multiple computations to run in parallel. However, the utilization wall limits the effectiveness of these multi-core designs by constraining the fraction of the chip, and hence the number of computations, that can be simultaneously active.

Since utilizing the entire die simultaneously at full frequency is no longer possible, many recent proposals have focused on specialization to address the problem of scaling system performance with transistor density. Specialized circuits are generally faster, and almost always more energy-efficient, than their

general-purpose counterparts. To date, however, few of the recent efforts focus on designing specialized circuits for the general class of irregular integer programs. Also, most of the previous application-specific proposals design very narrowly defined co-processors that can only be used by their target application, and even for that application, they cannot support changes across application versions. This thesis proposes mechanisms to provide reconfigurability and generality in application-specific circuits, mechanisms that significantly enhance their longevity and enable them to support multiple applications with similar data/control flow.

1.1 Utilization Wall

This section examines the utilization wall in greater detail and demonstrates how the utilization wall is a consequence of CMOS scaling theory combined with modern technology constraints.

Scaling Theory Table 1.1 shows how the utilization wall emerges from the breakdown of classical CMOS scaling as set down by Dennard [DGR+74] in his 1974 paper. The equations in the "Classical Scaling" column governed scaling up until 130 nm, while the "Leakage Limited" equations govern scaling at 90 nm and below.

Table 1.1: The utilization wall. The utilization wall is a consequence of CMOS scaling theory and current-day technology constraints, assuming fixed power and chip area. The Classical Scaling column assumes that Vt can be lowered arbitrarily. In the Leakage Limited case, constraints on Vt, necessary to prevent unmanageable leakage currents, hinder scaling and create the utilization wall.

Param   Description                       Relation             Classical Scaling   Leakage Limited
B       power budget                                           1                   1
A       chip size                                              1                   1
Vt      threshold voltage                                      1/S                 1
Vdd     supply voltage                    ~ Vt × 3             1/S                 1
tox     oxide thickness                                        1/S                 1/S
W, L    transistor dimensions                                  1/S                 1/S
Isat    saturation current                W Vdd / tox          1/S                 1
p       device power at full frequency    Isat Vdd             1/S²                1
Cgate   capacitance                       W L / tox            1/S                 1/S
F       device frequency                  Isat / (Cgate Vdd)   S                   S
D       devices per chip                  A / (W L)            S²                  S²
P       full die, full frequency power    D × p                1                   S²
U       utilization at fixed power        B / P                1                   1/S²

CMOS scaling theory holds that transistor capacitances (and thus switching energy) decrease roughly by a factor of S (where S is the scaling factor, e.g., 1.4×) with each process shrink. At the same time, transistor switching frequency improves by S and the number of transistors on the die increases by S². In the Classical Scaling Regime, it was possible to scale the supply voltage by 1/S, leading to constant power consumption for a fixed-size chip running at full frequency, and consequently, no utilization wall. Scaling the supply voltage requires that the threshold voltage scale down proportionally. This was not an issue because leakage, although increasing exponentially, is not significant in this regime.

In the Leakage Limited Regime, it is no longer possible to scale the threshold voltage because leakage rises to unacceptable levels. Without the corresponding supply voltage scaling, reduced transistor capacitance is the only remaining counterbalance to increased transistor frequencies and increasing transistor counts. Consequently, the net change in full-die, full-frequency power rises as S². This trend, combined with fixed power budgets, means that the fraction of a chip that can be run at full speed, or the utilization, falls as 1/S². Thus, the utilization wall is getting exponentially worse, increasing the fraction of dark silicon roughly by a factor of two with each process generation.
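To make the arithmetic in the Leakage Limited column concrete, the short C program below (an illustrative sketch added here, not part of the original text) applies D ∝ S² and U ∝ 1/S² with S = 1.4 across a few process generations and reports the resulting dark-silicon fraction:

```c
#include <stdio.h>

/* Illustrative sketch of the Leakage Limited column of Table 1.1:
 * with a scaling factor S = 1.4 per generation, devices per chip
 * grow as S^2 while utilization at fixed power falls as 1/S^2,
 * so the dark-silicon fraction grows with every shrink. */
int main(void) {
    const double S = 1.4;
    double devices = 1.0;       /* D, normalized to generation 0 */
    double utilization = 1.0;   /* U, normalized to generation 0 */
    for (int gen = 0; gen <= 4; gen++) {
        printf("gen %d: devices %6.2fx  utilization %5.1f%%  dark %5.1f%%\n",
               gen, devices, 100.0 * utilization,
               100.0 * (1.0 - utilization));
        devices *= S * S;       /* D scales as S^2 per generation   */
        utilization /= S * S;   /* U scales as 1/S^2 per generation */
    }
    return 0;
}
```

After a single generation, only about half the normalized die can be powered at full frequency; after four, less than a tenth.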

1.2 Specialization for converting transistors into performance

One promising option for effectively utilizing the dark silicon is to design a set of specialized processing elements tailored for specific applications. Specialized processors are more efficient than general-purpose processors (by several orders of magnitude for highly specialized ASICs), and off-loading portions of a program to a specialized processor can realize large gains in energy efficiency and performance. In this manner, increases in transistor counts can be used to scale system performance by providing increased specialization as well as by increasing the percentage of the system execution that runs on specialized processors.

Since the fraction of dark silicon continues to increase exponentially with each technology generation, the area available for specialized processors will increase accordingly. To utilize this increasing transistor budget effectively, designers must provide an ever increasing amount of specialization with each process generation. To accomplish this, these specialized processors must be designed automatically, enabling designers to target a greater percentage of the workload execution as transistor counts increase.

Existing approaches for automatically designing specialized processors primarily seek to build accelerators for regular, streaming loops with predictable control-flow and memory access patterns [YGBT09, FKDM09, CHM08]. While these applications are important, they are significantly different from the general class of irregular integer applications that are important on the desktop. Irregular integer programs are poor candidates for acceleration via specialized hardware because they tend to have much larger hotspots with irregular, hard-to-predict control flow and memory-access patterns. This is the class of applications that this thesis focuses on and proposes specialized cores for. These specialized cores provide significant energy efficiency compared to general-purpose processors and greater configurability than fully specialized logic. They improve system performance with increasing transistor counts by optimizing energy-per-computation, and hence allowing more computations to run in parallel.

Next, we introduce the three main parts of this thesis: 1) Patchable Conservation Cores (c-cores) [VSG+10, SVG+11], energy-efficient application-specific circuits with targeted reconfigurability to support changes in source code across application versions; 2) the application of conservation cores to the design of mobile application processors [GSV+10, GSV+11]; and 3) Qasics [VSG+11], energy-efficient circuits that make the area-energy tradeoff scalable by exploiting similar code patterns across irregular codes.

1.3 Patchable Conservation Cores: Energy-efficient circuits with processor-like lifetimes

To effectively target the hotspots of general-purpose applications, specialized circuits must be able to support complex C constructs and must have lifetimes comparable to those of general-purpose processors. In order to have long lifetimes, the specialized circuits must remain useful across application versions by supporting code changes such as changes in expression constants, memory layout, and control flow. Traditionally, application-specific circuits are very narrowly defined and cannot support any change in the target source code. This inability to support code changes makes ASICs poor candidates for targeting any application that may have new version releases. This brittleness is one of the main obstacles to the adoption of ASICs by system designers, even for mature and very commonly used applications.

The key contribution of the Patchable Conservation Core work is to provide application-specific circuits with reconfigurability mechanisms that enable them to support the source code changes commonly seen across application versions. Conservation cores support changes in the control flow, expression constants, arithmetic operators, and memory layout of data structures. The conservation core tool chain automatically generates "configuration patches" for new application versions, and the conservation core is initialized at runtime with the configuration patch corresponding to the application version that is going to execute on it. This ensures that conservation cores do not bind system users to any particular application version.

1.4 Utilizing Conservation Cores to Design Mobile Application Processors

Specialization has played a major role in the rapid advances in mobile device capabilities in the recent past. Specialized hardware enables mobile devices to provide a rich user experience, enabling the user to stay connected, stream multimedia, play games, and navigate using GPS-powered maps. As a result of this improved functionality, the market for smartphones and other portable devices is growing rapidly, and these mobile devices are expected to outsell desktop PCs in the coming years [IDC]. To ensure that this growth continues, mobile devices will need to provide greater functionality with each generation without compromising battery life.

Traditionally, mobile platforms exploit manually designed specialized hardware to address power concerns and achieve better performance by integrating specialized cores on an SoC. However, the new generation of mobile devices, such as those based on Apple iOS and Google Android, runs an increasingly diverse collection of applications, straining the traditional model of manually designed specialized hardware. In order to support this increasing functionality, a new generation of mobile devices must rely on general-purpose application processors. However, the utilization wall threatens to limit the performance scaling of application processors, impeding the evolution of what is becoming the dominant computing platform for much of the world.

This thesis explores the use of conservation cores to design these energy-efficient mobile application processors. In particular, this project analyzes the potential for using conservation cores to design application processors for Android-based mobile devices. This work demonstrates that our conservation core-based approach is a good match for the Android platform. The results show that c-cores were able to cover a significant fraction of system execution and provide significant energy savings without exceeding a modest area budget.

1.5 Quasi-ASICs: Trading Area for Energy by Exploiting Similarity across Irregular Codes

Specialized circuits enable system designers to trade area for energy efficiency. However, for many applications in a system's workload, it is not scalable to trade silicon for a specialized co-processor that can only execute a hotspot of one application. Hence, system designers need to decide on the amount of specialization required based on the available area budget and the relative importance of applications in the system's workload.

Existing approaches for designing ASICs tend to produce specialized processors that only target a specific piece of code. Hence, to fit within a given area budget, system designers would need to remove specialization corresponding to some of the computations used by the system workload. However, this reduction in the fraction of system execution covered by specialized processors can significantly decrease the system's energy efficiency, since application-specific circuits tend to be more energy-efficient than general-purpose processors by a few orders of magnitude.

This thesis proposes a new class of specialized circuits, Quasi-ASICs (Qasics), that enable system designers to vary the amount of hardware generality based on the available area budget. The key contribution of the Qasic work is the insight that similar code patterns exist within and across applications, and that these similar code segments can be exploited to reduce area requirements without removing functionality. The Qasic tool chain mines for similar computations across the system workload and builds a configurable circuit that can execute all of them. As a result, our approach makes the area-energy tradeoff more scalable by designing specialized processors that support multiple general-purpose applications while providing energy efficiency comparable to fully specialized logic.

1.6 Organization

Chapter 2 presents the baseline architecture that this work builds on, as well as the methodology used to evaluate the performance and energy efficiency of the system. Chapter 3 describes Patchable Conservation Cores and shows how their flexibility enables them to support newer versions of their target application. This chapter also discusses optimizations to reduce the area and energy overheads of adding flexibility to specialized circuits as well as to improve the backward compatibility of conservation cores. Chapter 4 demonstrates that these patchable conservation cores can significantly improve the energy efficiency of mobile application processors. Chapter 5 describes Qasics and discusses how they can trade area for energy efficiency in a scalable manner. Chapter 6 presents previous work on designing heterogeneous architectures as well as work on high-level synthesis. Chapter 7 concludes.

Acknowledgments

Portions of this research were funded by the US National Science Foundation under NSF CAREER Awards 06483880 and 0846152, and under NSF CCF Award 0811794.

Chapter 2

Arsenal: Baseline Architecture and Tool Chain

This thesis builds on the Arsenal processor, a massively heterogeneous multiprocessor [Ars]. This chapter provides a high-level overview of the Arsenal system, the baseline heterogeneous tiled architecture used in this thesis for performance and energy analysis, as well as the toolchain for automating the design of specialized hardware from C source code.

2.1 Arsenal: Massively Heterogeneous Multiprocessors

This section provides an overview of the Arsenal processor, including its design goals, high-level hardware organization, and execution model. The main goal of the Arsenal processor design is to ensure that system performance scales with increases in transistor counts in spite of the utilization wall.

Arsenal designs are composed of 10s to 100s or even 1000s of heterogeneous specialized processing elements (SPEs), ranging from specialized processors dedicated to particular loop nests, to 8-way issue DSPs, graphics accelerators, and out-of-order superscalars. Although the utilization wall dictates that Arsenal systems may use only a small fraction of the die at once, they use that fraction very efficiently. The Arsenal system achieves this efficiency by dynamically varying which fraction of the chip is active based on the applications that are executing on it.

Figure 2.1: The high-level structure of an Arsenal system. An Arsenal system is made up of multiple SPE "complexes" (i.e., tiles).

2.1.1 Hardware Organization

Figure 2.1 depicts the high-level design of an Arsenal processor composed of twelve SPE complexes and four banks of shared L2 cache connected by a grid-based on-chip interconnect. Together, the complexes, cache banks, and network resemble recently proposed tiled processors such as Wavescalar [SMSO03], RAW [TLM+04], or TRIPS [SNL+03]. Instead of uniform tiles, however, the complexes (i.e., the "tiles") in an Arsenal processor contain many SPEs, and the mix of SPEs in each complex is different. Arsenal systems organize SPEs into complexes based on related functions to allow pipeline-sequential style communication: SPEs likely to be used in sequence are placed in the same complex, so that they can communicate efficiently through the local L1 cache.

Figure 2.2: The high-level design flow of an Arsenal system. An Arsenal system's design is customized for the applications that commonly run on it.


2.1.2 Arsenal Design Flow

Figure 2.2 depicts the generation of a many-core Arsenal processor. The process begins with the processor designer characterizing the workload by identifying codes that make up a significant fraction of the processor's target workload. The toolchain extracts the most frequently used (or hot) code regions and uses a high-level synthesis tool to design specialized cores corresponding to these hot regions. Finally, these specialized cores are integrated with a general-purpose processor to form a heterogeneous many-core system.

2.1.3 Execution Model

A program executing on an Arsenal processor migrates between SPEs as its behavior changes. The Arsenal toolchain and run-time environment combine to create the mapping between different sections of the program and the available SPEs.

Figure 2.3: The high-level structure of the baseline architecture. The baseline architecture (a) is made up of multiple individual tiles (b), each of which contains multiple application-specific circuits (S-cores) (c). These specialized circuits communicate with the rest of the system through a coherent memory system and a simple scan-chain-based interface. Not drawn to scale.

Programmers and compilers can use a range of tools (e.g., library interfaces, code similarity measurement technology, and performance prediction models) to identify which code segments will run most efficiently on different SPEs. The run-time schedules code onto the available SPEs, taking into account the physical location of the SPEs and other applications competing for the same SPEs. The mapping of programs to SPEs can also change at runtime to account for observed changes in program behavior.

2.2 System Overview

The particular instantiation of the Arsenal processor that this thesis studies as the baseline architecture is shown in Figure 2.3. It consists of an array of heterogeneous tiles. Each tile contains a general-purpose processor, an I-Cache, a D-Cache, a set of specialized circuits, and interconnect logic. A typical system includes 100s of specialized processors designed for key functions of the target workload set.

On each tile, the specialized circuits are connected to the general-purpose processor via scan chains. The scan chain interface is slow but scales well, enabling us to connect 10s of specialized circuits to the general-purpose processor. The specialized circuits share the D-Cache with the general-purpose processor, ensuring coherent memory between the two by construction.

This architecture achieves much of its energy efficiency, relative to a conventional tiled architecture, by offloading computations onto these specialized circuits. These circuits are automatically designed using a C-to-hardware compiler (Section 2.3) and achieve significant energy efficiency (up to 40×) compared to a general-purpose processor.

This section describes in detail the architecture of the specialized circuits, including their datapath, control unit, cache interface, and scan chain interface to the CPU.

Datapath and Control Unit By design, the datapath and control unit of a specialized circuit very closely resemble the data and control flow of the target source code in static single assignment (SSA) form [CFR+89]. The datapath contains functional units (adders, shifters, etc.) for the arithmetic operations, muxes to implement control decisions and phi [CFR+89] nodes, and registers to hold program values across clock cycles. The control unit implements a state machine that mimics the control flow of the code. It tracks branch outcomes to determine which state to enter on each cycle, and it sets the enable and select lines on the registers and muxes so that the correct basic block is active each cycle.

The close correspondence between the program's structure and the corresponding specialized circuit enables these circuits to support almost arbitrary source code, including structs, unions, pointers, and most control flow constructs. This design model fits well with the higher-level goal of this thesis: to provide energy-efficient execution for irregular, hard-to-parallelize integer applications.

Figure 2.4: Specialized Core Design Example. An example showing the translation from C code (a), to the compiler's internal representation (b), and finally to hardware (c). The hardware schematic and state machine correspond very closely to the data and control flow graphs of the C code.

Memory interface and ordering The specialized circuits execute memory operations sequentially, in program order, to ensure correctness. Each specialized circuit contains a load/store unit for each memory operation in the target source code, and these units connect to the processor's L1 data cache, guaranteeing a coherent memory system between the specialized circuit and the CPU by construction. To execute a memory operation, the load/store unit sends a "request" signal along with the memory address/value to the L1 data cache and stalls the execution of the specialized processor until it receives the "valid" signal from the cache signaling the completion of the memory operation.

Example Figure 2.4(a)-(c) shows sample source code, its control flow graph, and the corresponding hardware design. The hardware corresponds very closely to the CFG of the sample code. The datapath has muxes corresponding to the phi operators in the CFG. The control unit is almost identical to the CFG, with additional self-loops for memory operations (and other multi-cycle operations). The datapath has a load unit that accesses the memory hierarchy to read the array a.
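As a complement to Figure 2.4, the following behavioral C sketch models how the generated control unit steps through one state per basic block. It is an illustrative software analogue, not the toolchain's Verilog output, and the mem_load_issue/mem_load_done helpers are hypothetical stand-ins for the load unit's request/valid handshake with the L1 data cache:

```c
#include <stdbool.h>

/* Behavioral sketch of the Figure 2.4 control unit: one state per
 * basic block, a branch-driven state choice in S1, and a self-loop
 * in S2 while the load unit waits for the cache's "valid" signal. */
enum state { S_INIT, S1, S2, S3, S_RET };

extern void mem_load_issue(const int *addr);  /* raise "request"      */
extern bool mem_load_done(int *value);        /* poll the "valid" line */

int compute_array_sum(const int *a, int n) {
    enum state st = S_INIT;
    int i = 0, sum = 0, elem = 0;
    for (;;) {
        switch (st) {
        case S_INIT: i = 0; sum = 0; st = S1; break;
        case S1:     st = (i < n) ? S2 : S_RET; break;  /* cond mux   */
        case S2:     mem_load_issue(&a[i]);             /* ld a[i]    */
                     while (!mem_load_done(&elem)) ;    /* self-loop  */
                     st = S3; break;
        case S3:     sum += elem; i++; st = S1; break;
        case S_RET:  return sum;                        /* raise done */
        }
    }
}
```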

2.2.2 The CPU/Specialized-processor Interface

The specialized circuits are connected to the CPU via a set of scan chains. They receive the initial arguments as well as the "ready" signal from the CPU over these scan chains. The initial arguments are the live-in values for the computation being off-loaded to the specialized circuit, and the ready signal marks the beginning of execution. When a specialized circuit completes the off-loaded computation, it sends a "done" signal to the CPU.

2.2.3 The Runtime System

When compiling an application containing functions that can be off-loaded to a specialized processor, the compiler inserts stubs that enable the application to choose between using the specialized processor or the CPU at runtime. At runtime, when an application wants to run a function for which a specialized circuit is available, it queries the runtime to get access to the specialized processor. If the specialized processor is available, the application uses the scan chain interface to pass the initial arguments, starts the processor running, and then waits for execution to complete. When the done signal is raised by the specialized processor, control passes back to the stub code, which extracts the return value and passes it back to the application. If the specialized processor is not available, the application uses the CPU version of the function code to continue execution on the general-purpose processor.
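The C sketch below illustrates what such a stub could look like. All of the runtime and scan-chain calls are hypothetical names standing in for the interface described above, not an actual API from this work:

```c
/* Sketch of a compiler-inserted stub: try the specialized
 * processor, fall back to the CPU version if it is unavailable. */
struct ccore;
extern struct ccore *runtime_acquire_ccore(int which);      /* hypothetical */
extern void runtime_release_ccore(struct ccore *cc);
extern void scan_chain_write_args(struct ccore *cc, const int *a, int n);
extern void ccore_start(struct ccore *cc);                  /* raise "ready" */
extern void ccore_wait_done(struct ccore *cc);              /* block on "done" */
extern int  scan_chain_read_ret(struct ccore *cc);
extern int  compute_array_sum_sw(const int *a, int n);      /* CPU version */
#define CCORE_ARRAY_SUM 0

int array_sum(const int *a, int n) {
    struct ccore *cc = runtime_acquire_ccore(CCORE_ARRAY_SUM);
    if (cc == NULL)                        /* c-core busy or absent:   */
        return compute_array_sum_sw(a, n); /* run the software version */
    scan_chain_write_args(cc, a, n);       /* pass live-in arguments   */
    ccore_start(cc);
    ccore_wait_done(cc);
    int ret = scan_chain_read_ret(cc);     /* extract the return value */
    runtime_release_ccore(cc);
    return ret;
}
```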

2.3 Methodology

This section presents the details of the toolchain for automatically generating the hardware for specialized cores from application code, as well as the methodology for the performance and power measurements of the baseline system.

Figure 2.5: The C-to-hardware toolchain. The various stages of our toolchain involved in hardware generation, simulation, and power measurement are shown.

2.3.1 Toolchain

The toolchain for designing specialized cores takes C programs as input, splits them into datapath and control segments, and then uses a state-of-the-art EDA tool flow to generate a circuit fully realizable in silicon. The toolchain also generates a cycle-accurate system simulator for the new hardware. The simulator provides performance measurements and generates traces that drive Synopsys VCS and PrimeTime simulation of the placed-and-routed netlist. Figure 2.5 depicts the various stages of the toolchain. The toolchain is based on the OpenIMPACT (1.0rc4) [Ope], CodeSurfer (2.1p1) [Cod], and LLVM (2.4) [LA04] compiler infrastructures and accepts a large subset of the C language, including arbitrary pointer references, switch statements, and loops with complex conditions.

In the hotspot identification stage, functions or subregions of functions (e.g., key loops) are tagged for conversion into specialized cores based on profile information. The toolchain uses outlining to isolate the region and then uses exhaustive inlining to remove function calls. Global variables are passed by reference as additional input arguments.

The C-to-Verilog stage generates the control and dataflow graphs for the function in SSA [CFR+89] form. This stage then adds basic blocks and control states for each memory operation and multi-cycle instruction. The final step of the C-to-Verilog stage generates synthesizable Verilog for the specialized core. This requires converting φ operators into muxes, inserting registers at the definition of each value, and adding self-loops to the control flow graph for the multi-cycle operations. Then, it generates the control unit with a state machine that matches the control flow graph. This stage of the toolchain also generates a cycle-accurate module for our architectural simulator. Further details of this stage can be found in [VSG+10].
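As an illustration of the outlining step (a hypothetical example, not drawn from the benchmark set), the transformation below isolates a hot loop into its own function and converts the global it references into a by-reference argument, as the text describes:

```c
/* Hypothetical before/after view of hotspot outlining.  Exhaustive
 * inlining of callees, the other half of the step, is not shown. */

int histogram[256];                 /* global touched by the hot loop */

/* before: the hot loop is embedded in a larger function */
void process_before(const unsigned char *buf, int n) {
    for (int i = 0; i < n; i++)     /* tagged as a hotspot by profiling */
        histogram[buf[i]]++;
    /* ... cold code ... */
}

/* after: the loop is outlined, and the global arrives as a parameter,
 * so the region is a self-contained candidate for a specialized core */
static void hot_region(const unsigned char *buf, int n, int *hist) {
    for (int i = 0; i < n; i++)
        hist[buf[i]]++;
}

void process_after(const unsigned char *buf, int n) {
    hot_region(buf, n, histogram);  /* global passed by reference */
    /* ... cold code ... */
}
```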

2.3.2 Simulation infrastructure

Our cycle-accurate simulation infrastructure is based on btl, the Raw simula- tor [TLM+04]. The btl simulator was modified to model a cache-coherent memory 18 among multiple processors, to include a scan chain interface between the CPU and all of the local specialized cores, and to simulate the specialized logic itself.

2.3.3 Synthesis

For synthesis, the toolchain targets a TSMC 45 nm GS process using Synopsys Design Compiler (C-2009.06-SP2) and IC Compiler (C-2009.06-SP2). Our toolchain generates synthesizable Verilog and automatically processes the design in the Synopsys CAD tool flow, starting with netlist generation and continuing through placement, clock tree synthesis, and routing, before performing post-route optimizations.

2.3.4 Power measurements

In order to measure the power usage of specialized cores, the btl simulator periodically samples execution by storing traces of all inputs and outputs to the specialized logic. Each sample starts with a “snapshot” recording the entire register state of the specialized core and continues for 10,000 cycles. The current sampling policy is to sample 10,000 out of every 50,000 cycles, and we discard sampling periods corresponding to the initialization phase of the application. The power measurement stage feeds each trace sample into the Synopsys VCS (C-2009.06) logic simulator. Along with the Verilog code our toolchain also automatically generates a Verilog testbench module, which initiates the simulation of each sample by scanning in the register values from each trace snapshot. The VCS simulation generates a VCD activity file, which we pipe as input into Synopsys PrimeTime (C-2009.06-SP2). PrimeTime computes both the static and dynamic power for each sampling period. To model power for other system components, this stage uses processor and clock power values from specifications for a MIPS 24KE processor in TSMC 90 nm and 65 nm processes [MIP09], and component ratios for Raw reported in [KTMW03], scaled to a 45 nm process. For its measurements, this stage assumes a MIPS core frequency of 1.5 GHz with 0.077 mW/MHz for average CPU operation. 19

Finally, this stage uses CACTI 5.3 [TMAJ08] for I- and D-cache power.

Acknowledgments

Portions of this research were funded by the US National Science Foundation under NSF CAREER Awards 06483880 and 0846152, and under NSF CCF Award 0811794.

This chapter contains material from "Conservation cores: reducing the energy of mature computations", by Ganesh Venkatesh, Jack Sampson, Nathan Goulding, Saturnino Garcia, Vladyslav Bryksin, Jose Lugo-Martinez, Steven Swanson and Michael Bedford Taylor, which appears in ASPLOS '10: Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems. The dissertation author was the primary investigator and author of this paper. The material in this chapter is copyright © 2010 by the Association for Computing Machinery, Inc. (ACM). Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page in print or the first screen in digital media. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Publications Dept., ACM, Inc., fax +1 (212) 869-0481, or email [email protected].

Chapter 3

Patchable Conservation Cores: Energy-efficient circuits with processor-like lifetimes

Chapter 1 explains the utilization wall phenomenon and the challenges it presents in effectively using increasing transistor counts to scale performance. The previous chapter described the Arsenal system, a massively heterogeneous tiled architecture that seeks to scale performance in spite of the utilization wall by synthesizing application-specific circuits targeting the key functions of the system workload. These application-specific circuits provide energy-efficient execution of irregular integer codes and can improve energy efficiency by up to 30× compared to an efficient in-order MIPS processor [TLM+04]. They enable the Arsenal system to optimize energy-per-computation, which translates into better system performance by allowing more computations to run in parallel.

However, these application-specific circuits are very tightly coupled to the source code they target and cannot support any change in it. Hence, these circuits are tied to the version of the application they were designed for and cannot support older or newer application versions. This lack of flexibility makes them poor candidates for many of the applications in a system's workload. From a system designer's standpoint, designing co-processors that can only target a particular version of an application causes two main problems: the specialized circuits become unusable as soon as the application version is upgraded, and circuits targeting the latest version provide little benefit to users still running older application versions.

To address these issues, this thesis proposes Patchable Conservation Cores (c-cores), application-specific circuits that remain useful across application versions, providing them with lifetimes comparable to those of general-purpose processors. The main contribution of this work is the novel patching mechanism that provides these application-specific circuits with targeted reconfigurability, enabling them to support multiple application versions. This work extends the toolchain presented in the previous chapter to generate configuration "patches" corresponding to different application versions. At runtime, the c-core uses the configuration patch to adapt to the application version that wants to run on it. In this manner, c-cores remain useful across application versions, giving them lifetimes comparable to that of a general-purpose processor.

This chapter is organized as follows. Section 3.1 motivates the need for reconfigurability in application-specific circuits. Section 3.2 describes the reconfigurability mechanisms that this work proposes to improve the longevity of these specialized circuits. Section 3.3 explains the patching algorithm for mapping alternate application versions onto a patchable c-core. Section 3.4 presents the methodology for designing and configuring the patchable c-cores. Section 3.5 evaluates the energy efficiency and longevity of patchable c-cores. Section 3.6 analyzes the area and energy overheads of the reconfigurability mechanisms and proposes optimizations to mitigate them without affecting the c-cores' longevity. Section 3.7 proposes a mechanism to further improve the backward compatibility of patchable c-cores. Section 3.8 concludes this chapter.

3.1 Case for Reconfigurability in Application Specific Circuits

This work seeks to provide energy-efficient execution for mature applications, applications that have a relatively stable set of core functions. While the code base of these applications will change across versions, the source code corresponding to the core functionality changes very infrequently, and when it does change, the changes tend to be relatively minor. This section analyzes the changes in the core functions of mature applications across successive versions and uses that analysis to motivate the need for providing targeted reconfigurability in application-specific circuits.

To better understand how the core functions change in mature applications, this section analyzes the changes across successive versions of DJPEG, CJPEG, Libpng, Sat Solver, MCF, VPR, and Bzip2 and documents the code change patterns that were commonly seen. Figure 3.1 shows the commonly seen code change patterns. The analysis shows that supporting these changes would allow application-specific circuits to adapt to multiple versions of the applications listed above, potentially enabling these configurable circuits to remain useful for eight years on average. These commonly seen source code change patterns are summarized below.

Control flow changes A commonly seen change in the key functions is the addition or removal of certain computations. Figure 3.1(a) shows the changes in the function refresh_potential across the SPEC2000 and SPEC2006 versions of MCF. The main change between the two versions is the removal of a for loop at the beginning of the function; the rest of the function code remains the same.

Changes in datapath operators and constants The second kind of common change pattern includes changes in expression constants and changes in datapath operators, such as loop termination conditions (< replaced by ≤). Figure 3.1(b) shows the changes in the function sendMTFValues across versions 1.0.2 and 1.0.3 of Bzip2. The only change between the two versions is in the constant value used to ensure that a program variable lies within the valid range.

Changes in the memory layout Another commonly seen form of change is in the layout of the program data structures. These changes include addition, deletion, or rearrangement of structure fields. Figure 3.1(c) shows the changes in the data structure jpeg_decompress_struct across versions 5.0 and 6.0 of DJPEG. The data structure's member variables are reorganized and new variables are added (e.g., buffered_image). While the source of the hotspot functions in DJPEG does not change across versions, the change in the data structure layout changes the offsets used to calculate the memory addresses for load/store operations.

Figure 3.1: Commonly seen source code modification patterns across application versions. The figure shows changes across versions of a) MCF, b) Bzip2, and c) DJPEG. The changes are shown in bold.

The above analysis shows that while the source code for the key functions might change across application versions, the majority of the code in a key function tends to remain the same. Hence, across application versions, a hardware implementation is available for the majority of the code in a key function, and the new version should ideally be able to reuse the available hardware, albeit at a lower level of performance/energy efficiency.

3.2 Reconfigurability support in Conservation Cores

The analysis of successive application versions, presented in the previous section, revealed a number of common change patterns, all of which were relatively minor changes in source code. As discussed in Section 2.2.1, the c-core control unit and datapath very closely correspond to the program structure. Hence, changes in a c-core hardware design would mirror changes in the source code. Fortunately, c-cores can support many of these changes effectively with very modest amounts of reconfigurability. This thesis proposes the following three patching mechanisms for adjusting c-core behavior after the c-cores have been fabricated.

Configurable constants The first patching mechanism generalizes hard-coded immediate values into configurable registers. This mechanism supports changes to the values of compile-time constants and the insertion, deletion, or rearrangement of structure fields.
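For concreteness, the following C fragment gives a functional model of this mechanism; the names and the offset value are illustrative assumptions, not taken from the generated hardware. In the actual circuit, the offset is a register written through the configuration scan chain, so a patch can update it when a structure field moves, as in the jpeg_decompress_struct change of Figure 3.1(c).

    #include <stdint.h>

    /* Functional model of a configurable memory offset. In hardware,
       off_field is a 32-bit register loaded from the configuration
       patch; here it is an ordinary variable (illustrative only). */
    static uint32_t off_field = 8;   /* set by the configuration patch */

    static inline int32_t load_field(const char *base) {
        /* address = base + configurable offset, as in a patched load */
        return *(const int32_t *)(base + off_field);
    }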

Generalized single-cycle datapath operators To support the replacement of one operator with another, the second patching mechanism generalizes any addition or subtraction to an adder-subtractor, any comparison operation to a generalized comparator, and any bitwise operation to a bitwise ALU. A small configuration register is then added for each such operator, determining which operation is currently active.
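The following C sketch models the behavior of these generalized operators at a functional level; the enum encodings are illustrative stand-ins for the actual contents of the configuration registers.

    #include <stdint.h>

    /* Functional model of generalized single-cycle operators. Each
       cfg value lives in a small configuration register set once by
       the patch; encodings here are assumptions, not the hardware's. */
    typedef enum { OP_ADD, OP_SUB } addsub_cfg_t;
    typedef enum { CMP_EQ, CMP_NE, CMP_LT, CMP_LE, CMP_GT, CMP_GE } cmp_cfg_t;

    static int32_t addsub(addsub_cfg_t cfg, int32_t a, int32_t b) {
        return cfg == OP_ADD ? a + b : a - b;   /* adder-subtractor */
    }

    static int compare6(cmp_cfg_t cfg, int32_t a, int32_t b) {
        switch (cfg) {                          /* generalized comparator */
        case CMP_EQ: return a == b;
        case CMP_NE: return a != b;
        case CMP_LT: return a <  b;
        case CMP_LE: return a <= b;
        case CMP_GT: return a >  b;
        default:     return a >= b;             /* CMP_GE */
        }
    }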

Control flow changes In order to handle changes in the CFG's structure and changes to basic blocks that go beyond what the above mechanisms support, the third patching mechanism provides a flexible exception mechanism. The control path contains a bit for each state transition that determines whether the c-core should treat it as an exception. When the state machine makes an exceptional transition, the c-core stops executing and transfers control to the general-purpose core. The exception handler extracts current variable values from the c-core via the scan-chain-based interface, performs a portion of the patched execution, transfers new values back into the c-core, and resumes execution. The exception handler can restart c-core execution at any point in the CFG, so exceptions can arbitrarily alter control flow and/or replace arbitrary portions of the CFG. The next section describes the patch generation algorithm for mapping newer application versions onto a patchable c-core. The patching algorithm uses this mapping to determine the reconfiguration state necessary to allow a c-core to continue to run code even after it has been changed from the version used to generate that c-core.

3.3 Patching Algorithm

This section describes the patching algorithm this thesis proposes. The patching algorithm works directly on the program's dataflow and control flow graphs, a representation that can be generated from either source code or a compiled binary. This enables patch generation to happen at the assembly level, allowing new application versions to run on the specialized hardware without any source code modifications.

Figure 3.2: Handling changes in control flow across versions (a) The original and new source for sumArray() is shown. (b) The mapping between the new and the original version of sumArray's CFG covers most of the target version in two hardware regions. (c) Transfers of control between the hardware and software regions require an exception.

Figure 3.3: Basic block matching Target block B3 can be mapped onto the hardware for original block B1, but target block B2 does not match B1 and cannot be mapped: there is no consistent mapping between the register names in bold.

When a c-core-equipped processor ships, it can run the latest version, referred to as the original version, of the targeted applications without modification. When a new version of an application becomes available, the patching algorithm determines how to map the new version of the software, referred to as the target version, onto the existing c-core hardware. The goal of the patching process is to generate a patch for the original hardware that will let it run the target software version. The patching algorithm proceeds in four stages: basic block mapping, control flow mapping, register mapping, and patch generation.

3.3.1 Basic block mapping

The first stage of the algorithm identifies which basic blocks in the original hardware can run each basic block in the target application. Since the original hardware includes generalized arithmetic operators and configurable constant registers, there is significant flexibility in what it means for two basic blocks to match. Two basic blocks match if the following conditions are true.

• The basic blocks have the same number of instructions, not including unconditional jumps (which only affect the control path and, therefore, have no effect on the generated hardware).

• The data flow graphs of the two basic blocks are isomorphic up to operators at the nodes and constant values.

• For each instruction in the target basic block, there is a corresponding instruction in the original that it is compatible with. That is, the original instruction is either identical to the target, or can be patched to turn it into the target instruction.

Figure 3.3 shows one original block, B1, and two target blocks, B2 and B3. The mapping process will mark B3 as a possible match for B1, and B2 as not matching B1.
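The sketch below illustrates the matching test in C under a deliberately minimal, hypothetical IR; the real matcher runs on the compiler's dataflow graph, and the DFG-isomorphism check (the second condition) is elided here. Blocks are assumed to be given with unconditional jumps already stripped.

    #include <stddef.h>

    /* Hypothetical, minimal IR used only for this sketch. */
    enum iclass { I_ADD, I_SUB, I_LT, I_GE, I_AND, I_OR, I_LOAD };
    typedef struct { size_t n; enum iclass *ops; } block_t;

    /* A target op is compatible if it is identical to the original or
       the original's generalized operator can be configured into it. */
    static int op_compatible(enum iclass o, enum iclass t) {
        if (o == t) return 1;
        if ((o == I_ADD || o == I_SUB) && (t == I_ADD || t == I_SUB))
            return 1;                            /* AddSub unit */
        if ((o == I_LT || o == I_GE) && (t == I_LT || t == I_GE))
            return 1;                            /* generalized comparator */
        if ((o == I_AND || o == I_OR) && (t == I_AND || t == I_OR))
            return 1;                            /* bitwise ALU */
        return 0;
    }

    static int blocks_match(const block_t *orig, const block_t *tgt) {
        if (orig->n != tgt->n) return 0;         /* condition one */
        for (size_t i = 0; i < tgt->n; i++)      /* condition three */
            if (!op_compatible(orig->ops[i], tgt->ops[i]))
                return 0;
        return 1;
    }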

3.3.2 Control flow mapping

The next step of the patching algorithm is building a map between the control flow graphs of the original and target versions. This step identifies regions of the target control flow graph that map perfectly onto disjoint portions of the original hardware. These portions of the function are called hardware regions, and they will execute entirely in hardware under the patch. Ideally, all basic blocks in the target will map to basic blocks in the original, and there will be a single hardware region. In practice this will sometimes not be possible. The target version may have basic blocks inserted or deleted relative to the original, or one of the basic blocks may have changed enough that no matching basic block exists in the original. The exception mechanism executes the remaining, unmapped software regions on the general-purpose processor. To divide the control flow graph, the algorithm starts by matching the entry node of the target graph with the entry of the original graph. The algorithm proceeds with a breadth-first traversal of the target graph, greedily adding as many blocks to the hardware region as possible. When the hardware region can grow no larger, the region is complete.

A region stops growing for one of two reasons: It may reach the end of the function or run up against another hardware region. Alternatively, there may be no matching basic blocks available to add to the region because of a code modification. In that case, the patching algorithm marks the non-matching basic blocks as part of the software region and selects the lowest-depth matching basic block available to seed the creation of a new hardware region. This stage of the algorithm terminates when the entire function has been partitioned into hardware regions and software regions. Figure 3.2 illustrates this portion of the algorithm. Figure 3.2(a) shows the original software version of a function called sumArray() and its CFG. Figure 3.2(b) shows the target version of sumArray(), which has an extra operation. Most of the new sumArray() is mapped onto the original c-core in two hardware regions, but the new operation is mapped to a separate software region because the hardware for it does not exist in the original c-core. Any transition to this region will be marked as an exception.
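The following C sketch captures the greedy breadth-first growth of a single hardware region. The CFG structures and the find_matching_block() helper are hypothetical stand-ins for the compiler's internal representation, and re-seeding a new region from the lowest-depth matching block is left out for brevity.

    /* Greedy hardware-region growth (simplified sketch). */
    #define MAX_BLOCKS 256
    enum region { UNASSIGNED, HARDWARE, SOFTWARE };

    typedef struct {
        int nblocks;
        int nsucc[MAX_BLOCKS];
        int succ[MAX_BLOCKS][2];          /* CFG successors (<= 2 here)   */
        enum region region[MAX_BLOCKS];
        int queued[MAX_BLOCKS];
        int claimed[MAX_BLOCKS];          /* original blocks already used */
    } cfg_t;

    /* Section 3.3.1 matcher (assumed provided elsewhere): index of an
       unclaimed original block that target block b maps onto, or -1. */
    extern int find_matching_block(cfg_t *orig, const cfg_t *tgt, int b);

    static void grow_region(cfg_t *orig, cfg_t *tgt, int seed) {
        int queue[MAX_BLOCKS], head = 0, tail = 0;
        queue[tail++] = seed;
        tgt->queued[seed] = 1;
        while (head < tail) {                       /* breadth-first walk */
            int b = queue[head++];
            int m = find_matching_block(orig, tgt, b);
            if (m < 0) {                            /* no match: patch    */
                tgt->region[b] = SOFTWARE;          /* around this block  */
                continue;
            }
            orig->claimed[m] = 1;
            tgt->region[b] = HARDWARE;              /* greedily claim it  */
            for (int i = 0; i < tgt->nsucc[b]; i++) {
                int s = tgt->succ[b][i];
                if (!tgt->queued[s]) { tgt->queued[s] = 1; queue[tail++] = s; }
            }
        }
    }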

3.3.3 Register mapping

The next phase of the algorithm generates a consistent local mapping between registers in the original and target basic block for each matched basic block pair. In this mapping, the output of the first instruction in the original basic block corresponds to the output of the first instruction in the target basic block, and so on.

The next step is to combine these per-block maps to create a consistent register mapping for each hardware region. To construct the mapping, the patching algorithm analyzes the basic block mapping for each of the basic blocks in the region. This data yields a weighted bipartite graph, in which one set of nodes corresponds to the register names from the original code and the second set corresponds to register names from the target code. An edge exists between an original register, r1, and a target register, r2, if there exists a basic block pair that maps r2 onto r1. The weight of the edge is the number of basic block pairs that contain this register mapping.

Figure 3.4: The Patching Algorithm Toolchain The various stages of our toolchain involved in hardware generation, patching, simulation, and power measurement are shown. The bold box contains the patch generation infrastructure based on our patching-enabled compiler.

Next, the algorithm performs a maximum cardinality, maximum weight matching on the graph. The resulting matching is the register map for that hardware region. Finally, each pair of corresponding basic blocks is examined to see if their register names are consistent with the newly created global map. If they are not, the target basic block is removed from the hardware region and placed in its own software region.
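A minimal sketch of the weight accumulation underlying this graph is shown below; register counts and array types are illustrative, and for simplicity it counts each corresponding instruction output rather than deduplicating within a block pair, as the block-pair-counting definition above would.

    /* Weighted bipartite graph for register mapping (sketch).
       weight[r1][r2] accumulates evidence that target register r2
       should map onto original register r1; a maximum-cardinality,
       maximum-weight matching over this matrix then yields the
       region's register map. */
    #define NREGS 64

    static unsigned weight[NREGS][NREGS];   /* [original][target] */

    static void accumulate_block_map(const int *orig_out,
                                     const int *tgt_out, int n) {
        /* The i-th instruction outputs of a matched block pair
           correspond (Section 3.3.3's local mapping). */
        for (int i = 0; i < n; i++)
            weight[orig_out[i]][tgt_out[i]]++;
    }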

3.3.4 Patch generation

At this point, the patching algorithm has all the information required to generate a patch that will let the target code run on the original hardware. The patch itself consists of three parts:

• the configuration bits for each of the configurable components including the datapath elements and the configurable constant registers

• exception bits for each of the control flow edges that pass from a hardware region into a software region

• code to implement each of the software regions

The software region code is subdivided into three sections. First, the prologue uses the scan chain interface to retrieve values from the c-core's datapath into the processor core. Next, the patch code implements the software region. The region may have multiple exit points, each leading back to a different point in the datapath. At the exit, the epilogue uses the scan chain interface again to insert the results back into the datapath and return control to the c-core.

3.3.5 Patched execution example

Figure 3.2(c) shows how c-cores use the exception mechanism to patch around software regions generated during the control flow mapping stage. When the c-core execution reaches the false edge of the for loop condition, it makes an exceptional transition which freezes the c-core and transfers control to the CPU. The CPU retrieves the application’s software exception handler corresponding to the edge that raised the exception, and the handler executes the prologue, patch code, and epilogue before returning control to the c-core.
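A hypothetical handler for the software region of sumArray() might look like the following C sketch; the scan-chain accessors, register indices, restart state, and the patch computation itself are all assumptions, since the real handler is emitted by the patching toolchain.

    /* Generated software-region handler (illustrative sketch). */
    enum { REG_I = 3, REG_SUM = 7, STATE_AFTER_REGION = 12 };  /* assumed */

    extern unsigned ccore_scan_out(int reg);            /* read datapath reg  */
    extern void     ccore_scan_in(int reg, unsigned v); /* write datapath reg */
    extern void     ccore_resume(int state);            /* re-enter the CFG   */

    void sumarray_swregion_handler(void) {
        /* prologue: pull live values out of the frozen c-core */
        int i   = (int)ccore_scan_out(REG_I);
        int sum = (int)ccore_scan_out(REG_SUM);

        /* patch code: the extra operation of the target version,
           executed on the general-purpose core (stand-in body) */
        sum += i;

        /* epilogue: push results back and restart the hardware */
        ccore_scan_in(REG_SUM, (unsigned)sum);
        ccore_resume(STATE_AFTER_REGION);
    }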

3.4 Methodology

This section describes the methodology for designing patchable c-cores and generating configuration patches corresponding to different application versions. The bold box in Figure 3.4 shows how the patching system fits into the toolchain explained in Section 2.3.

3.4.1 Generation of Patchable Conservation Core Hardware

The hardware generation compiler extends the C-to-Verilog compiler (Section 2.3.1) to provide support for the reconfigurability mechanisms. The hardware compiler instantiates the datapath operators in the program as generalized ALUs, stores the expression constants and memory offsets in 32-bit registers, and extends each state transition edge in the control unit with a 1-bit exception register. Also, the hardware compiler adds two new groups of scan chains to c-cores. The first group of scan chains connects all of the reconfiguration state, including the exception registers, the registers storing configurable constants and memory offsets, and the configuration registers in each of the generalized datapath operators. This scan chain group is used to initialize the c-core with the configuration patch before the application execution begins. The second group of scan chains connects the registers storing the datapath state. These datapath scan chains allow the CPU to manipulate arbitrary state during an exception. Since each c-core can contain hundreds of registers, the datapath state registers are divided amongst up to 32 scan chains to ensure that the CPU can access the datapath state without significant overhead.

3.4.2 Generating Configuration Patch

The toolchain generates the application configuration patch to store the reconfiguration state. This includes the exception register values, expression constant and memory offset values, and operator index values in the ALU configuration registers. This stage stores the configuration patch in a binary file. At runtime, the application reads this configuration patch file to initialize the c-core using the scan chain interface explained in the previous section.

Figure 3.4 shows how the configuration patch is generated for new application releases. The toolchain generates the configuration patch by matching the new application version's binary to the c-core hardware design based on the patching algorithm described in Section 3.3.
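As a rough illustration, runtime patch loading could look like the following C sketch; the patch file layout and the ccore_config_scan_in() entry point are hypothetical, not part of the actual runtime.

    #include <stdio.h>
    #include <stdlib.h>

    /* Assumed entry point: shift configuration bits into the c-core's
       configuration scan chains (exception bits, constants/offsets,
       and ALU opcodes, serialized in chain order by the generator). */
    extern void ccore_config_scan_in(const unsigned char *bits, size_t n);

    int ccore_load_patch(const char *path) {
        FILE *f = fopen(path, "rb");
        if (!f) return -1;
        fseek(f, 0, SEEK_END);
        long n = ftell(f);
        rewind(f);
        unsigned char *bits = malloc((size_t)n);
        if (!bits || fread(bits, 1, (size_t)n, f) != (size_t)n) {
            fclose(f);
            free(bits);
            return -1;
        }
        fclose(f);
        ccore_config_scan_in(bits, (size_t)n);  /* before execution begins */
        free(bits);
        return 0;
    }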

3.5 Results

This section evaluates the performance and energy efficiency of the patchable c-core architecture and its impact on application performance and energy consumption. It also discusses the gains in longevity due to the patching mechanism.

3.5.1 Energy savings

This section presents the analysis of patchable c-cores for six versions of bzip2 (1.0.0–1.0.5), and two versions each of cjpeg (v1–v2), djpeg (v5–v6), mcf (2000–2006), and vpr (4.22–4.30). Table 3.1 summarizes the c-cores. The table shows that the hot spots of the above-mentioned benchmarks vary greatly in size as well as complexity. Also, the data shows that the same set of functions accounts for a sizable fraction of execution in different application versions, implying that the c-cores will remain useful across application version releases.

Figure 3.5 shows the relative energy efficiency, EDP improvement, and speedup of c-cores versus a MIPS processor executing the same code. For fairness, and to quantify the benefits of converting instructions into c-cores, we exclude cache power for both cases. The data show that patchable c-cores are up to 15.96× as energy-efficient as a MIPS core at executing the code they were built to execute. The data also show that the c-cores are able to support multiple versions effectively. On average, the c-cores are 8.68× more energy-efficient than a MIPS core at executing alternate versions of the code they were designed to execute. The non-patchable c-cores are even more energy-efficient, but their inability to adapt to software changes limits their useful lifetime.

Figure 3.6 shows the energy-delay improvements provided by a c-core-enabled architecture at the application level. This is a full system evaluation including the runtime and CPU/c-core interface overheads and the energy consumed by the memory hierarchy and interconnect. The data shows that the c-core-enabled architecture improves energy efficiency by 33% on average compared to the baseline tiled architecture. Moreover, in a c-core-enabled system, the dynamic energy of computation accounts for less than half of the total energy consumption, making the memory system and interconnect the dominant components of system energy consumption.

Table 3.1: Conservation core statistics The c-cores we generated vary greatly in size and complexity. In the “Key” column, the letters correspond to application versions and the Roman numerals denote specific functions from the application that a c-core targets. “LOC” is lines of C source code, and “% Exe.” is the percentage of execution that each function comprises in the application.

                                                  Area (mm²)         Freq. (MHz)
    C-core               Ver.   Key    LOC  % Exe.  Non-P.  Patch.   Non-P.  Patch.
    bzip2
      fallbackSort       1.0.0  A i    231   71.1   0.128   0.275    1345    1161
      fallbackSort       1.0.5  F i    231   71.1   0.128   0.275    1345    1161
    cjpeg
      extract_MCUs       v1     A i    266   49.3   0.108   0.205    1556     916
      get_rgb_ycc_rows   v1     A ii    39    5.1   0.020   0.044    1808    1039
      subsample          v1     A iii   40   17.7   0.023   0.039    1651    1568
      extract_MCUs       v2     B i    277   49.5   0.108   0.205    1556     916
      get_rgb_ycc_rows   v2     B ii    37    5.1   0.020   0.044    1808    1039
      subsample          v2     B iii   36   17.8   0.023   0.039    1651    1568
    djpeg
      jpeg_idct_islow    v5     A i    223   21.5   0.133   0.222    1336     932
      ycc_rgb_convert    v5     A ii    35   33.0   0.023   0.043    1663    1539
      jpeg_idct_islow    v6     B i    236   21.7   0.135   0.222    1390     932
      ycc_rgb_convert    v6     B ii    35   33.7   0.024   0.043    1676    1539
    mcf
      primal_bea_mpp     2000   A i     64   35.2   0.033   0.077    1628    1412
      refresh_potential  2000   A ii    44    8.8   0.017   0.033    1899    1647
      primal_bea_mpp     2006   B i     64   53.3   0.032   0.077    1568    1412
      refresh_potential  2006   B ii    41    1.3   0.015   0.028    1871    1639
    vpr
      try_swap           4.22   A i    858   61.1   0.181   0.326    1199     912
      try_swap           4.3    B i    861   27.0   0.181   0.326    1199     912

Figure 3.5: Conservation core energy efficiency Our patchable c-cores provide up to 15.96× improvement in energy efficiency compared to a general-purpose MIPS core for the portions of the programs that they implement. The gains are even larger for non-patchable c-cores, but their lack of flexibility limits their useful lifetime (see Figure 3.7). Each subgroup of bars represents a specific version of an application (see Table 3.1). Results are normalized to running completely in software on an in-order, power-efficient MIPS core (“SW”). “Unpatchable” denotes a c-core built for that version of the application but without patching support, while “patchable” includes patching facilities. Finally, “patched” bars represent alternate versions of an application running on a patched c-core. For all six versions of bzip2 (A–F), the c-core's performance is identical.

Figure 3.6: Full application system energy, EDP, and execution time for c-cores At the system level, the patchable c-cores reduce the application energy requirements by up to 2× compared to a general-purpose MIPS core. Each subgroup of bars represents a specific version of an application (see Table 3.1). Results are normalized to running completely in software on an in-order, power-efficient MIPS core (“SW”). “on HW.A” denotes a c-core built for that version of the application but without patching support, while “on HW.AP” includes patching facilities. Finally, “on HW.BP” bars represent alternate versions of an application running on a patched c-core. For all six versions of bzip2 (A–F), the c-core's performance is identical.

Figure 3.7: Conservation core effectiveness over time The patchable c-cores are able to support changes in code across versions of stable applications, enabling them to deliver efficiency gains over a very long period of time.


3.5.2 Longevity

Figure 3.7 quantifies the ability of the proposed patching mechanisms to extend the useful lifetime of c-cores. The horizontal axis measures time in years, and the vertical axis is energy efficiency normalized to software. The lines represent what the c-cores built for the earliest software version can deliver, both with and without patching support. For instance, vpr 4.22 was released in 1999, but when a new version appeared in 2000, the non-patchable hardware had to default to software, reducing the energy efficiency factor from 6.2× down to 1×. In contrast, patchable hardware built for 4.22 has a lower initial efficiency factor of 3.4×, but it remained at that level until March 2009. For djpeg and bzip2, the results are even more impressive: those c-cores deliver 10× and 6.4× energy efficiency improvements for covered execution over 15- and 9-year periods, respectively.

The above discussion shows that the patchable c-cores can provide lifetimes comparable to that of general-purpose processors. These improved lifetimes, combined with significant energy efficiency, make c-cores ideal candidates for many commonly used applications.

3.6 Patching Overhead Analysis and Optimizations

The previous sections demonstrate that the reconfigurability mechanisms can provide processor-like lifetimes for application-specific circuits. This section analyzes the cost of these reconfigurability mechanisms in terms of their impact on area and energy consumption and proposes optimizations to reduce these overheads without significantly affecting the lifetimes of patchable c-cores.

3.6.1 Cost of Reconfigurability

The patchable c-cores utilize reconfigurability to ensure longevity. Table 3.2 shows that the c-cores extensively use patching constructs. The data shows that more than half of the instructions use configurable registers, and 16% to 31% of instructions use configurable datapath operators. The impact of utilizing these patching mechanisms on area and energy consumption is examined in more detail below.

Area overhead Patching area overhead comes in four forms. The first is the increase in area caused by replacing simple, fixed-function datapath elements (e.g., adders, comparators) with configurable ones. The second form comes from the conversion of hard-wired constant values to configurable registers. Third, there is extra area in the scan chains that allow us to insert, remove, and modify arbitrary values in the c-core. Finally, patchable c-cores require additional area in the control path to store edge exception information.

Table 3.3 compares the area requirements for patchable and non-patchable structures. The costs on a per-element basis vary from 160 µm² per configurable constant register to 365 µm² per add/subtract unit. The standard cell libraries include scan chains in registers automatically, but they are not easily accessible in the tool flow. Instead, our toolchain implements the scan chains explicitly, which results in additional overhead that could be removed with proper support from the tool flow. Figure 3.8(a) shows the breakdown of area for the patchable c-cores for the earliest version of each of our five target applications. Patching support increases the area requirements of c-cores by 89% on average.

Figure 3.8: Area and power breakdown for patchable c-cores Adding patchability approximately doubles a c-core's area (a) and power (b) requirements.

Power overhead Patching support also incurs additional power overhead. Figure 3.8(b) shows the impact of adding each of the three components described above. Overall, patching support approximately doubles the power consumption of the c-cores, with the majority of the overhead coming from the configurable registers.

Performance overhead In the c-core toolchain, adding patchability does not change the structure of the datapath, but it does potentially increase the critical path length and, therefore, reduce the achievable clock speed. On average, patchable systems achieve 90% of the application performance of non-patchable systems.

3.6.2 Optimizations to Reduce the Patching Overheads

This section presents a detailed analysis of how the patching constructs are utilized by stable workloads to achieve longevity. Using this analysis, this thesis proposes improvements to the reconfigurability mechanisms that trade a small amount of flexibility (in terms of what code changes are patchable) for greatly reduced overhead (area and power). The analysis is based on the workloads described in Section 3.5 as well as other irregular programs from SPEC 2000 [SPE00] (Twolf), Sat Solver [TH04], and Splash [WOT+95] (Radix).

Reduced-width Configurable Constants Most programs use many compile-time constants such as program constants and structure offsets. To support changes in these across versions, the patching mechanism replaces every compile-time constant with a 32-bit register that is configurable at run-time.

This study analyzes the bit width requirements of the compile-time constants and how they change across versions. In our workloads, 87% of all compile-time constants can be represented by 8 or fewer bits. Moreover, even for compile-time constants longer than 8 bits, the software changes across versions only affected the low-order bits. In order to exploit this for reducing the patching overhead, 8-bit configurable registers were used to represent the lower bits of the program constants, leaving the original upper 24 bits fixed. In this manner, the reduced-width configurable constants can still handle most software changes and reduce the energy and area overheads by 51% and 45%, respectively.

Table 3.2: Conservation core details “States” is the number of states in the control path, “Ops” is the number of assembly-level instructions, and “Patching Constructs” gives a breakdown of the different types of patching facilities used in each conservation core.

                                                      Patching Constructs
    C-core              Ver.   States   Ops   Loads/   Add-   Cmp.   Bit.   Const.   Exc.
                                              Stores   Sub                  Reg.     Bit
    bzip2
      fallbackSort      1.0.0    285    647    66/38    138     78     33     323     363
      fallbackSort      1.0.5    285    647    66/38    138     78     33     323     363
    cjpeg
      extract_MCUs      v1       116    406    41/25    152     11      0     235     127
      get_rgb_ycc_rows  v1        23     68     14/3     16      2      0      39      25
      subsample         v1        32     85      9/1     16      8      1      34      40
      extract_MCUs      v2       116    406    41/25    152     11      0     235     127
      get_rgb_ycc_rows  v2        23     68     14/3     16      2      0      39      25
      subsample         v2        32     85      9/1     16      8      1      34      40
    djpeg
      jpeg_idct_islow   v5        97    432    39/32    180      4     21     238     101
      ycc_rgb_convert   v5        40     82     24/3     19      4      0      40      44
      jpeg_idct_islow   v6        97    432    39/32    180      4     21     238     101
      ycc_rgb_convert   v6        40     82     24/3     19      4      0      40      44
    mcf
      primal_bea_mpp    2000     101    144    36/16     22     21      0      94     122
      refresh_potential 2000      44     70     17/5      6     10      0      35      54
      primal_bea_mpp    2006     101    144    36/16     22     21      0      94     122
      refresh_potential 2006      39     60     16/4      3      8      0      29      47
    vpr
      try_swap          4.22     652   1095   123/86    108    149      0     367     801
      try_swap          4.3      652   1095   123/86    108    149      0     367     801

Table 3.3: Area costs of patchability The AddSub unit can perform addition or subtraction. Similarly, Compare6 replaces any single comparator (e.g., ≥) with any of (=, ≠, ≥, >, ≤, <). Constant values in non-patchable hardware contribute little or even “negative” area because they can enable many optimizations.

    Structure            Area (µm²)   Replaced by       Area (µm²)
    adder                    270      AddSub                365
    subtractor               270      AddSub                365
    comparator (GE)          133      Compare6              216
    bitwise AND, OR           34      Bitwise               191
    bitwise XOR               56      Bitwise               191
    constant value           ~0       32-bit register       160

Operation-specific Configurable Constants The programs use compile-time constants to store expression constants as well as memory offsets. The patching mechanism replaces the compile-time constants in all the arithmetic operations, as well as the memory offsets, with configurable registers.

This study analyzes the operation classes whose expression constants might change across versions. The analysis of the workload set shows that the compile-time constants representing memory offsets change often across versions, and the expression constants in branch operations might change as well. However, program constants in other arithmetic operations did not change across versions. To exploit this, the configurable constants target the memory offsets and branch operations and leave the program constants in other arithmetic operations fixed. This patching improvement still handles software changes in memory operations and branches in hardware, but if a value changes in another arithmetic operation, the c-core can use the exception mechanism to patch around the offending basic block. This operation-specific configurable constant optimization reduces the energy overhead by an additional 11% and the area overhead by 18%.

Configurable ALUs The patching mechanism replaces specific fixed-function operators in the datapath with a more general equivalent. For example, an adder becomes an add/subtract unit, and a less-than comparator generalizes to a 6-function comparator unit able to compute all six (in)equalities. This mechanism can be used to fix application bugs such as a less-than becoming a less-than-or-equal in a loop condition.

The analysis of the workload set shows that these datapath operator changes are less common than changes to constants, so including configurable ALUs more judiciously and falling back on the exception mechanism when necessary can reduce the area overhead by as much as an additional 18%. Alternatively, multiple datapath operators can share a configurable ALU. This approach would reduce the area overhead without sacrificing the reconfigurability of the c-cores.

Impact of patching optimizations

The patching mechanism optimizations reduce the overhead of patching by using reduced-width configurable constants and providing configurable constants only for selective operations. The optimizations can further reduce the area overhead by multiplexing configurable ALUs across multiple datapath operators or by eliminating the configurable ALUs and relying on the exception mechanism to support the software changes.

Figure 3.9 shows the impact of the patching optimizations on area and energy overheads. The bar labeled C-Core includes the full patching mechanisms. The remaining bars show the improvements for the patching optimizations described above: 8-bit Constant uses 8-bit constant registers; Op Select provides configurable constants only for selective operation classes; Alu Optimization reduces the number of configurable ALUs; and finally, No Patch removes all the patching support, making the c-cores unable to support any change in source across versions. All the bars are normalized to the last bar, and hence represent the overhead of the patching mechanisms.

3.7 Backward Patching

Section 3.2 presents reconfigurability mechanisms to improve a c-core's longevity. These reconfigurability mechanisms were based on the commonly seen source code modification patterns and were effective at “future proofing” the c-cores. This section explains how to improve the c-cores' support for legacy “in-use” versions by exploiting knowledge about the source code of older application versions. For exposition purposes, the patchable c-cores with improved backward compatibility support are called c-cores-bc in this section. The c-cores-bc support all the previously proposed patching mechanisms for supporting newer application releases. In addition, they also provide additional configurability in their control flow to support older “in-use” versions.

Figure 3.9: Impact of patching optimizations on the area and energy requirements of patchable c-cores The graph shows that the patching optimizations can significantly reduce the area and energy overheads of the reconfigurability mechanisms.

Figure 3.10: Conservation Core design with support for changes in control flow in older application versions The old and original source for sumArray() is shown (a). The changes between the old and the original version of sumArray's CFG are shown in dashed lines (b). The conservation core design that can execute the old and original versions of sumArray without requiring an exception (c).

The patchable c-cores use the exception mechanism to handle changes in the control flow, as shown in Figure 3.2. However, the c-cores can incur a high overhead when using the exception mechanism because of the slow CPU/c-core interface and the execution of the patch code on the general-purpose processor. The c-cores can avoid this exception overhead for the older application versions by leveraging the fact that these changes in control flow are known at the time of c-core design. The c-core design flow can exploit this knowledge to design c-cores-bc that are better equipped to handle these control flow changes, as shown in Figure 3.10. In the figure, the control flow of the c-cores-bc can, at runtime, adapt to support the control flow of both the original application version as well as the older version. This allows the c-cores-bc to support multiple versions with different control flows without requiring the exception mechanism. The c-cores-bc design flow merges the source code changes from the older application versions into the current application version and uses this merged source to design the hardware. The rest of this section describes the methodology for designing c-cores-bc and analyzes their energy efficiency as well as lifetime.

Figure 3.11: The Backward Compatible Conservation Core Toolchain The various stages of our toolchain involved in hardware generation, patching, simulation, and power measurement are shown. The bold box contains the compiler for designing backward compatible c-cores based on the source code of current and older “in-use” versions.

3.7.1 Methodology

Figure 3.11 shows the toolchain for designing the c-cores-bc. The bold box in the figure shows how the backward patching mechanism fits into the toolchain explained in Section 3.4. The c-cores-bc synthesis toolchain accepts the source code of the current and older “in-use” application versions as input. The c-core identification stage profiles the application and tags the key functions for conversion into c-cores.

The backward patching compiler accepts the source code of the key functions from the current and older application versions as input. The code merging stage merges the control flow changes from the older application versions with the current version and generates the new source for the c-cores-bc. This new source implements the functionality of both the current and older versions and hence can support all the involved versions without using the expensive exception mechanism at runtime. The details of this source code merging algorithm for designing c-cores-bc are presented in Chapter 5.

The configuration generator stage modifies the application source to pass the configuration state that the c-cores-bc utilize to adapt to the corresponding application version at runtime. The current implementation passes the application version number as an additional function argument to the c-cores-bc. The c-cores-bc use this version information to guard all the version-specific code segments, as shown in Figure 3.10(c). The c-cores-bc source is then converted into hardware and integrated with the general-purpose processor using the methodology presented in Section 3.4. The next section analyzes the energy efficiency and longevity of the resultant c-cores-bc.
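A minimal sketch of what merged, version-guarded c-core-bc source might look like is shown below; the function body, the guard value, and the extra operation are illustrative assumptions, not the actual sumArray code of Figure 3.10.

    /* Merged c-core-bc source for a function whose versions differ
       by one operation. The configuration generator passes `version`
       as an extra argument; the guard selects the matching control
       flow at runtime without raising an exception. */
    int sumArray_bc(const int *a, int n, int version) {
        int sum = 0;
        for (int i = 0; i < n; i++) {
            sum += a[i];
            if (version >= 2)        /* segment only in the newer version */
                sum &= 0xFFFF;       /* hypothetical extra operation */
        }
        return sum;
    }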

3.7.2 Results

Energy Savings Figure 3.12 presents the energy efficiency of c-cores-bc across application versions and compares it to that of c-cores. The results show that c-cores-bc provide significant energy improvements compared to the baseline processor for all application versions and can be up to 7× more energy-efficient than c-cores for the older versions. Also, for the latest application version, c-cores-bc are almost as energy-efficient as the c-cores (less than 10% difference on average).

Figure 3.12: Improvements in the backward compatibility of Conservation Cores The energy efficiency of the older versions improves significantly (up to 7×) with minimal impact on that of newer ones.

Longevity Figure 3.13 demonstrates the effects of the backward patching technique on the longevity of the c-cores. The graphs plot the energy efficiency of c-cores and c-cores-bc relative to that of a general-purpose processor over a period of time measured in years. The average energy improvements that c-cores and c-cores-bc provide are shown by the Average C-core and Average C-core-bc curves, respectively. The graph shows that c-cores-bc can maintain their energy efficiency over longer periods of time by effectively supporting the older versions. For example, c-cores-bc provide 14× energy efficiency for MCF for a span of ten years, whereas the c-cores can do so only for four years. This is because the c-cores' energy efficiency for the legacy versions of MCF dips to 8×. Moreover, these improvements in backward compatibility are obtained without significantly decreasing the energy efficiency of the c-cores-bc for the current and newer application versions (less than 10% difference on average).

Figure 3.13: Effectiveness of Backward Compatible Conservation Core over time The graph on the left shows the energy efficiency provided by c-cores-bc over time in years. The graph on the right shows the same for c-cores. The graphs show that, compared to c-cores, the c-cores-bc provide better energy efficiency over a longer period of time.

The above results show that the backward compatibility of the c-cores can be significantly improved by designing compound circuits that can adapt to the control flow of different application versions at runtime. The results also show that this additional configurability comes at little expense to the energy efficiency of the c-cores. These results are in keeping with the working assumption of this work: the key functions in mature applications do not change significantly across versions, and hence it is possible and desirable to provide reconfigurability mechanisms in application-specific circuits, enabling these circuits to have lifetimes comparable to those of general-purpose processors.

3.8 Conclusion

This chapter proposes patchable conservation cores, energy-efficient circuits with targeted reconfigurability that allows them to remain useful across application versions. The main contribution of this work is the patching mechanism that enables newer application releases to be mapped onto a patchable c-core without any source code modifications. The patching mechanism can provide enough flexibility to ensure that c-cores will remain useful for up to 15 years, far beyond the lifetime of most processors. This work also analyzes the area and energy overheads of these patching mechanisms and proposes optimizations to reduce them without significantly affecting the c-cores' longevity. Also, this work proposes techniques to improve the backward compatibility of the c-cores, allowing them to run legacy versions as efficiently as the latest version.

Specialization has emerged as an effective approach to scale system performance in spite of the utilization wall phenomenon. However, the lack of flexibility in application-specific circuits limits the scope of the applications that they can target. To address this applicability issue, this thesis proposes patching mechanisms that provide application-specific circuits with increased flexibility, making them a suitable candidate for targeting the mature applications in a system's workload.

Acknowledgments

Portions of this research were funded by the US National Science Foundation under NSF CAREER Awards 06483880 and 0846152, and under NSF CCF Award 0811794.

This chapter contains material from “Conservation cores: reducing the energy of mature computations”, by Ganesh Venkatesh, Jack Sampson, Nathan Goulding, Saturnino Garcia, Vladyslav Bryksin, Jose Lugo-Martinez, Steven Swanson and Michael Bedford Taylor, which appears in ASPLOS ’10: Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems. The dissertation author was the primary investigator and author of this paper. The material in this chapter is copyright © 2010 by the Association for Computing Machinery, Inc. (ACM). Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page in print or the first screen in digital media. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Publications Dept., ACM, Inc., fax +1 (212) 869-0481, or email [email protected].

This chapter contains material from “Efficient complex operators for irregular code”, by Jack Sampson, Ganesh Venkatesh, Nathan Goulding, Saturnino Garcia, Steven Swanson and Michael Bedford Taylor, which appears in HPCA ’11: Proceedings of the International Symposium on High-Performance Computer Architecture (HPCA). The dissertation author was the secondary investigator and author of this paper. The material in this chapter is copyright © 2011 by the Institute of Electrical and Electronics Engineers (IEEE). Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Chapter 4

Utilizing Conservation Cores to Design Mobile Application Processors

The market for smartphones and other portable devices continues to grow rapidly. These mobile devices provide a rich user experience, enabling people to stay connected, stream multimedia, play video games, and navigate using GPS-powered maps. Moreover, this trend of mobile devices providing greater functionality and supporting a more diverse application set is only expected to continue with the emergence of a new generation of mobile devices such as those based on Apple iOS [App] and Google Android [Gooc]. However, these increasingly diverse mobile applications strain the traditional model of manually designing specialized hardware to address the power concerns and achieve better performance. In order to provide this increasing functionality, new generations of mobile devices rely on general-purpose processors, called application processors.

At the same time, as explained in Chapter 1, the utilization wall phenomenon is limiting the fraction of a chip that can be active simultaneously at full frequency. While the utilization wall phenomenon is applicable across all computational domains, it is especially limiting for mobile devices because of restricted power budgets. This threatens to limit the performance scaling of application processors, impeding the evolution of what is becoming the dominant computing platform for much of the world.

This work explores the potential for designing energy-efficient mobile application processors by utilizing patchable conservation cores, explained in the previous chapter. In particular, the work primarily focuses on designing energy-efficient application processors for Android-based mobile devices. The main contribution of this work is to analyze the software stack of Android and use this analysis to design conservation cores that can cover a significant fraction of the Android execution while staying within modest area budgets. This work demonstrates that the majority of the Android application execution time is spent in the Android libraries and the Dalvik virtual machine [Goob], and leverages this fact to design c-cores that can improve the energy efficiency and performance across a wide range of current Android applications and, potentially, future applications as well.

This chapter is organized as follows. Section 4.1 explains the applicability of our c-core-based approach to the Android system. Section 4.2 presents Android's internal architecture. Section 4.3 explains our approach for designing c-cores targeting the hot code segments in the Android system. Section 4.4 examines the area requirements of the c-cores that our approach designs and the energy savings that these c-cores provide. Section 4.5 concludes this chapter.

4.1 Applicability of Conservation Cores to Android

This work proposes an energy-efficient application processor for Android-based mobile devices. This section first gives some background on the Android platform and then uses that background to demonstrate that c-cores are a good fit for the Android system.

4.1.1 Android Platform Analysis

The Android software stack is shown in Figure 4.1. The core of the Android platform comprises a collection of native libraries written in C and C++ that implement most of the widely used services such as compression, window compositing, 2-D and 3-D graphics, and HTML rendering. This software stack layer also contains the Dalvik virtual machine (DVM). The Dalvik virtual machine executes the user application code, compiled down to byte code, and provides access to the native libraries via Java Native Interface calls.

Figure 4.1: Android Software Stack The various layers of Android's internal architecture [Gooc].

Android's architecture seeks to concentrate the performance-critical “hot” code in the native libraries, which the Android applications can utilize to implement their key functionality. The remainder of the application code, much of which will be relatively “cold,” runs on the Dalvik virtual machine, making the DVM code “hot” as well. In this manner, most of the “hot” code that runs on the application processor belongs to the Android software stack. The next section discusses how this “library-centric” design of Android is a good match for the c-core-based approach proposed in this thesis.

4.1.2 Case for using c-cores to optimize mobile application processors

Android's internal architecture relies very heavily on a small number of shared software components, and there is a core set of commonly used applications (such as the web browser, video, music, and email programs) that represent common-case usage across a large number of users. Targeting these shared components will enable the c-cores to significantly reduce the system's energy consumption. A c-core-based approach can exploit Android's internal architecture in the following two ways. First, the c-cores can target key portions of the native libraries and Dalvik in order to provide energy reduction across the general class of applications that run on the phone. These are termed broad-based c-cores. In our studies, these broad-based c-cores can collectively cover an average of 72% of the execution of a typical Android mobile phone workload. Second, the c-cores can target specific applications that many Android users run. These targeted c-cores can, depending on the silicon area dedicated to them, cover from 80% to 90% to even 95% of the targeted workload. Although targeted c-cores may not achieve energy savings for new Android application releases, the rapid replacement cycle of smartphones suggests that it is reasonable to continually develop new c-cores for inclusion in future Android application processors as new applications achieve popularity.

The above discussion demonstrates that the c-cores can cover a significant fraction of the code execution on an application processor in Android-based mobile devices. Also, as more area becomes available for specialization with each technology generation, the c-core-based application processors can execute the Android system more efficiently and incorporate hardware support for an ever increasing number of desired functionalities.

4.2 Android Software Stack

This section describes the Android software platform, including the Android applications, the application framework, the shared libraries, and the Android runtime.

Android applications Android applications are written in Java. The application code has access to the application framework APIs, which applications can use to design their user interfaces, access system resources such as the notification bar, and access data from other applications. This application model facilitates the reuse of components across applications.

Android shared libraries Android shared libraries implement most of the widely used services, such as the system C library, window compositing, and the graphics libraries. These libraries contain many of the performance-critical functions, are highly optimized for execution on mobile devices, and are accessible to the application code via application framework APIs. These shared libraries enable applications, written in Java, to implement much of their functionality in native C code. By design, these shared libraries implement much of the "hot" code that executes in the Android system.

Dalvik virtual machine The Dalvik virtual machine (DVM) executes the application code that is compiled down to the Dalvik executable format. The Dalvik virtual machine is optimized for running on a slow CPU with limited main memory without draining too much battery power. However, given that it is a virtual machine executing the application code, executing a computation on the Dalvik virtual machine is not as energy-efficient as executing the computation directly on the CPU. Hence, given that all the application code written in Java executes on the Dalvik virtual machine, ensuring that the virtual machine itself executes energy-efficiently is critical as well.

The above discussion explains the high-level design of the Android software stack and shows that, as a result of this design, the shared libraries and the Android runtime account for much of the code that executes on the application processor. The next section profiles the Android system and provides further details about the hot code segments in the Android system.

4.3 Designing Conservation Cores for Android

The main goal of this section is to demonstrate the effectiveness of automatically generated application-specific circuits in optimizing the energy efficiency of application processors. Traditionally, mobile devices have used hardware accelerators that were manually designed for very specific computations, such as graphics operations. However, as the software stack of mobile devices becomes increasingly general-purpose, the fraction of the code that executes on the application processor increases considerably. This section demonstrates that a large fraction of this general-purpose code can be supported in hardware within very modest area budgets.

This section first profiles the Android system executing many of the commonly used Android applications. Next, this profile is used to characterize the hot code segments in the Android system. Finally, based on this hot-code analysis, this section presents our approach for designing c-core-based mobile application processors.

4.3.1 Profiling the Android System

This section describes the profiler tool used to trace the execution of the Android system. The profiler traces every instruction that executes on the CPU, including the kernel code. The next section post-processes this instruction trace to collect information about the hot code segments.

Android emulator The profiling step uses a QEMU-based Android emulator [Gooa] to run the complete Android system, including the Android shared libraries, the Dalvik virtual machine, and the kernel. The tracing records every instruction from every process that runs on the CPU. This includes all the code in the native libraries, the Dalvik virtual machine, as well as the Linux kernel, enabling analysis of the complete Android system so that the hot code segments can be accurately determined. The tracing also provides information about the memory system, including the memory operations executed and the cache hit/miss statistics. Moreover, this tracing does not require any modifications to the applications and occurs completely inside the emulator. This makes it easy to add more applications to our workload as they become popular.

Android workload The profiling is done on a workload consisting of commonly used Android applications in order to profile typical smartphone execution. The workload comprises a diverse set of user-level applications, including the Web Browser, Google Mail, Google Maps, Google Music, Google Video, Pandora, Photoshop Mobile, and RoboDefense.

4.3.2 Characterizing the hotspots in the Android system

This section analyzes the hot code segments in the Android system that were found using the profiling methodology described in the previous section. The analysis first examines the amount of static code that the hardware would need to support in order to cover a significant fraction of the application execution. Then, the analysis looks into how to provide this hardware support in a scalable manner such that these specialized processors can provide energy improvements for a wide range of commonly used applications.

Application Coverage

The fraction of the application execution spent on a c-core, or coverage, is one of the key factors in determining how much benefit c-cores can provide. For c-cores to achieve high coverage in a reasonable amount of area, a relatively small fraction of the application's instructions must account for a large fraction of the dynamically executed instructions as well as of the application execution time. Figure 4.2 plots the fraction of the dynamically executed ARM instructions (y-axis) covered by the number of static ARM instructions (x-axis) for our Android application workload. In the figure, the top graph shows the average data across all the applications, while the graphs below plot it for each application. The data shows that a relatively small number of static instructions account for a very large fraction of the dynamically executed instructions.


Figure 4.2: Static Instruction Count vs Dynamic Instruction Count Coverage The graphs plot the dynamic instruction count coverage against the number of static instructions required to achieve that coverage. The results show that a small number of static instructions account for a large fraction of the dynamic instructions that execute on the CPU. The top graph plots the results averaged across all the applications, while the graphs below show the results for the Android applications PhotoShop, Browser, Email, Gallery, Maps, Video, Pandora, and RoboDefense, in that order.


Figure 4.3: Basic block count vs Application execution coverage The graphs plot the application execution coverage against the number of basic blocks required to achieve that coverage. The results show that a small number of basic blocks account for a large fraction of the application execution time. The top graph plots the results averaged across all the applications, while the graphs below show the results for the Android applications PhotoShop, Browser, Email, Gallery, Maps, Video, Pandora, and RoboDefense, in that order.


Figure 4.4: Breakdown of Application Runtime across the Android Software Stack The graph plots the breakdown of application runtime across the different layers of the Android software stack. The graph shows that the majority of the application runtime is spent in the shared libraries.

On average, supporting 2000 static instructions was enough to cover 80% of the dynamic instructions.

Figure 4.3 plots the application execution coverage (y-axis) versus the number of basic blocks required (x-axis) for our Android application workload. This graph focuses on the fraction of total application execution time that the c-cores can support. The graph shows that the c-cores can cover a significant fraction of the application execution by supporting a relatively small number of basic blocks. In particular, the c-cores can cover 90% of application execution on average by supporting 2000 basic blocks in hardware, and this execution coverage can be as high as 98% for certain applications.

Code Reuse Across Applications

The above analysis shows that the c-cores can cover a significant fraction of an application's execution by supporting a relatively small number of basic blocks. In order to provide a good end-user experience, the c-cores should be able to cover a significant execution fraction across a wide range of commonly used applications. However, providing specialization specific to each application can limit the number of features that can be supported in hardware. The analysis below addresses this scalability issue by demonstrating that a majority of the application execution time is spent in the Android shared libraries, not in the application code itself. Moreover, the analysis shows that, across our workload set, there is significant reuse of the code that executes for each application. This suggests that supporting a small fraction of the shared libraries in hardware can provide benefits for a large number of applications.

Figure 4.4 presents the breakdown of the application execution time across the application native code (shown as .text), the different shared libraries in the Android system (shown as Non-prelinked Libraries, Prelinked App Libraries, Prelinked System Libraries), the linker, and finally the Linux kernel (shown as Kernel). The graph shows that the majority of the application execution time is spent in the shared libraries. The applications on average spend 78% of their execution time in the shared libraries. This implies that c-cores targeting these shared libraries can potentially optimize the execution of many applications; these c-cores would also be easily accessible to newer applications that use these libraries.

The next experiment quantifies the code reuse across our application set and proposes an approach for exploiting this code reuse to maximize the benefits that the c-cores provide. To measure the code reuse, the analysis identifies identical basic blocks across the execution traces of different applications, as sketched below. The results show that, across our applications, 72% of the dynamic execution occurs in code that is shared across more than one application. Such a high level of code reuse drives down the amount of static code that the c-cores need to cover in order to achieve good overall coverage of dynamic execution. The next section discusses how the code reuse data is incorporated into the algorithm for selecting the code segments that, when converted into c-cores, provide the maximum execution coverage.
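One way to compute this shared-execution fraction from the traces is sketched below; the Block record, its hash key, and the application bitmask are illustrative assumptions about the profile format, not the actual profiler's data structures.

#include <stdint.h>

/* Hypothetical per-block profile record: each static basic block is
 * keyed by a hash of its instruction bytes, so identical blocks from
 * different applications collapse to one entry. */
typedef struct {
    uint64_t hash;       /* hash of the block's instruction bytes        */
    uint64_t dyn_count;  /* dynamic instructions attributed to the block */
    uint32_t app_mask;   /* bit i is set if application i ran the block  */
} Block;

static int popcount32(uint32_t x) {
    int c = 0;
    while (x) { x &= x - 1; c++; }
    return c;
}

/* Fraction of all dynamic execution spent in blocks that appear in
 * more than one application's trace (72% for our workload). */
double shared_fraction(const Block *blocks, int n) {
    uint64_t shared = 0, total = 0;
    for (int i = 0; i < n; i++) {
        total += blocks[i].dyn_count;
        if (popcount32(blocks[i].app_mask) > 1)
            shared += blocks[i].dyn_count;
    }
    return total ? (double)shared / (double)total : 0.0;
}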

Conservation core Identification

This section presents the algorithm for selecting the code segments from the Android system and the application code that should be converted into c-cores. The main goal of this algorithm is to maximize the energy efficiency of the mobile application processor while minimizing the area requirement of the c-cores. The algorithm approximates the energy efficiency as the coverage that the c-cores provide, and approximates a c-core's area requirement as the number of static instructions it supports. The energy approximation holds because, as shown in the previous chapter, c-cores are significantly more energy-efficient than general-purpose processors, and offloading execution onto the c-cores improves the mobile application processor's energy efficiency. The area approximation is less precise because the area requirements of different instructions vary. However, this first-order approximation allows the algorithm to explore a wide range of energy-area tradeoff scenarios for designing mobile application processors. A more accurate area model, which addresses these inaccuracies, is proposed in the next section.

The algorithm accepts as input the profile information of the Android workload set. For each application in the workload set, the algorithm processes the profile information to find the hot basic blocks that account for much of the execution. At this point, for each application, the algorithm has a breakdown of the time it spends in various basic blocks. The next step involves determining the code reuse across applications and using it to quantify each basic block's global coverage, the coverage that the basic block provides for the Android system as a whole. The algorithm calculates the global coverage of each basic block as the arithmetic mean of the coverage that the basic block provides across all the applications. Since the global coverage metric captures the coverage that a basic block provides for the Android system, the algorithm can use it to compare the relative importance of each basic block. The algorithm defines the quality metric of each basic block b as globalCoverageb/areab, where globalCoverageb is the global coverage of the basic block and areab is its area requirement.

The algorithm sorts all the basic blocks in descending order of their quality metric and tags basic blocks for conversion to c-cores until the target coverage is achieved or the area budget runs out. In this manner, the c-core-identification algorithm explores the energy-area tradeoff space and selects a number of Pareto-optimal design points.

Next, this section analyzes how effective the code regions selected by the c-core identification algorithm are at covering significant portions of the Android system execution while staying within modest area budgets. Figure 4.5 shows a cumulative distribution of the percentage of dynamic coverage vs. the number of static instructions converted into c-cores. In the figure, the top graph plots the coverage that c-cores provide for the Android system. The dark-gray curve represents the coverage for the broad-based c-cores, and the light-gray curve above it shows the coverage for the targeted c-cores. The results show that the c-cores can support 90% of the system execution across our workload by implementing approximately 20,000 static instructions. The majority of the execution coverage that the c-cores provide belongs to the broad-based c-cores (64% coverage), implying that the majority of these c-cores would be useful across many of the existing applications and even future applications that utilize the shared components. Moreover, as the area available for specialization increases, the coverage that the c-cores provide can be as high as 95%.

In Figure 4.5, the bottom eight graphs show the coverage that the c-cores provide for each of the individual applications. The data shows that the c-cores provide significant execution coverage across all these applications. The c-cores can support at least 80% of the execution for each application if they support 20,000 static instructions. This implies that the broad-based and targeted c-cores designed above are effective at supporting a significant fraction of the execution across all the applications in our workload.

The above analysis shows that a small amount of static code accounts for a large fraction of execution across many applications. By supporting these hot code segments in hardware, the c-cores can optimize the execution of many of the commonly used Android applications. The analysis also shows that the majority of these c-cores target shared code segments that are used across many of the existing applications and hence can potentially provide benefits for future applications as well. Moreover, with each generation, the c-cores would be able to provide higher execution coverage, which enables the mobile application processors to continuously improve their support for the Android system.
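To make the selection step above concrete, the sketch below implements the sort-and-tag loop; the BasicBlock record and its fields are illustrative assumptions standing in for the toolchain's internal profile data.

#include <stdlib.h>

/* Hypothetical per-block record populated from the profiler. */
typedef struct {
    int    id;
    double global_coverage;  /* arithmetic mean of per-app coverage      */
    double area;             /* estimated from static instruction count  */
} BasicBlock;

static int by_quality(const void *a, const void *b) {
    const BasicBlock *x = a, *y = b;
    double qx = x->global_coverage / x->area;
    double qy = y->global_coverage / y->area;
    return (qy > qx) - (qy < qx);       /* descending order of quality */
}

/* Tag blocks for c-core conversion until the coverage target is hit
 * or the area budget runs out; returns the coverage achieved. */
double select_ccores(BasicBlock *blocks, int n,
                     double target_coverage, double area_budget,
                     int *selected, int *n_selected) {
    double coverage = 0.0, area = 0.0;
    *n_selected = 0;
    qsort(blocks, n, sizeof(BasicBlock), by_quality);
    for (int i = 0; i < n; i++) {
        if (coverage >= target_coverage) break;
        if (area + blocks[i].area > area_budget) continue;
        area += blocks[i].area;
        coverage += blocks[i].global_coverage;
        selected[(*n_selected)++] = blocks[i].id;
    }
    return coverage;
}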


Figure 4.5: Android Execution Coverage vs Static Instructions Converted into Conservation Cores The graph plots the Android execution coverage against the number of static instructions required to achieve that coverage. The results show that supporting a small number of static instructions in hardware can provide coverage for a large fraction of the Android execution. The top graph plots the coverage provided by broad-based c-cores and targeted c-cores for the Android system, while the graphs below show the coverage that these c-cores provide for each of the Android applications.

4.4 Results

The previous section analyzes the Android system execution and proposes a methodology for designing c-cores targeting a significant fraction of the Android execution. This section examines the amount of area that c-cores would need to provide this coverage and the possible energy-efficiency improvements they can provide. Based on the results presented in [VSG+10], this section builds an analytical model to approximate the area requirements of the c-cores for these Android hotspots. Also, this section analyzes how the average dynamic energy per instruction varies with the area available for specialization.

4.4.1 Area requirement

In order to estimate the area requirements of the c-cores, this section first builds an area model based on the instruction mix and the area requirements of the c-cores presented in Chapter 3. Designing the area model was aided by the fact that a c-core's design very closely resembles the structure of the code it targets and has generalized datapath operators and constant values to support the patching mechanisms. The resulting area model was able to estimate the area to within 10% of the actual area on average, with a standard deviation of 14.7%. Based on our area model, conservatively, 1 mm² of area can implement approximately 3000 static instructions. With more aggressive area optimization techniques, such as those presented in [SVG+11], c-cores can implement approximately 6000 static instructions in 1 mm². This chapter uses the conservative area model for all the analysis presented below.
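In code form, the conservative model reduces to a single conversion factor; the function below is a sketch (the name is ours), with the aggressive density taken from [SVG+11].

/* Conservative area model: ~3,000 static instructions per mm^2;
 * ~6,000 with the aggressive optimizations of [SVG+11]. */
double ccore_area_mm2(long static_insts, int aggressive) {
    const double insts_per_mm2 = aggressive ? 6000.0 : 3000.0;
    return (double)static_insts / insts_per_mm2;
}

/* Example: the ~20,000-instruction c-core set of Section 4.3.2 needs
 * roughly 20000 / 3000 = 6.7 mm^2 under the conservative model. */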


Figure 4.6: Android Execution Coverage vs Conservation core Area The graph plots the Android execution coverage against the c-cores' area requirement. The results show that the majority of the Android execution can be covered by c-cores within very modest area budgets. The top graph plots the coverage provided by broad-based c-cores and targeted c-cores for the Android system, while the graphs below show the coverage that these c-cores provide for each of the Android applications.

Figure 4.6 plots the c-cores' coverage of the Android execution as the area available for specialization increases. The graph shows that if 2.75 mm² is allocated for c-cores, they can cover 80% of the execution across all the applications. To put this area requirement in perspective, the die size of NVIDIA's latest mobile application processor, called Tegra 2 [Ana], is 49 mm². Hence, allocating a little less than 6% of this mobile application processor for c-cores would enable them to provide hardware support for the majority of the Android execution. Moreover, the graph shows that allocating 2.75 mm² for c-cores would support the majority of the execution (greater than 70%) for every application in the workload set, implying that this c-core-based approach for designing mobile application processors can be effective across a wide range of Android applications.

4.4.2 Energy-Area Tradeoff

The above analysis explains how the area available for the c-cores affects the fraction of the Android execution that runs on them. In this section, the analysis focuses on the improvements in energy efficiency that the c-cores provide. To estimate the energy savings, this work uses the energy model proposed in [GSV+10]. According to that model, considering dynamic power, the c-cores on average consume 8 pJ/instruction compared to 91 pJ/instruction for the baseline MIPS processor, an improvement of 11×. These values include the data cache access energy, which is shared between the c-cores and the MIPS processor; for non-D-cache operations, c-cores reduce dynamic energy by over 34×. This large gap in execution efficiency demonstrates the importance of achieving high c-core coverage: any code running on the MIPS processor is 11× more expensive.

Figure 4.7 shows the average dynamic energy per instruction vs. the area dedicated to the c-cores. The graph shows that with a mere 2.75 mm² for c-cores, the dynamic energy requirement of the Android system goes down by 73% relative to an already power-efficient MIPS processor. Also, these c-cores reduce the energy requirements of all the applications in the workload, with the benefits ranging from 63% to 83%. Furthermore, if the area budget is increased to around 6 mm², the dynamic energy savings increase to 5.5×, and this trend continues as the area budget increases.
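The relationship between coverage and average energy behind these numbers is a simple weighted average; a minimal sketch, using the per-instruction energies from [GSV+10]:

/* Average dynamic energy per instruction (pJ) as a function of the
 * fraction of execution covered by c-cores. */
double avg_energy_pj(double ccore_coverage) {
    const double E_CCORE = 8.0;   /* pJ/instruction on a c-core      */
    const double E_MIPS  = 91.0;  /* pJ/instruction on the MIPS core */
    return ccore_coverage * E_CCORE + (1.0 - ccore_coverage) * E_MIPS;
}

/* At 80% coverage: 0.8*8 + 0.2*91 = 24.6 pJ, i.e., a 73% reduction
 * relative to the 91 pJ baseline, consistent with Figure 4.7. */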


Figure 4.7: Dynamic energy per instruction (pJ) vs Conservation core Area The graph plots the average dynamic energy per instruction against the c-cores' area requirement. The results show that energy requirements can be reduced by over 70% with an area budget of just 2.75 mm².

This section has shown that our c-core-based approach provides significant energy savings and scalable amounts of specialization within a tight area budget, in an automated fashion. Thus, our approach for designing mobile application processors is an ideal fit for the Android system.

4.5 Conclusion

This chapter proposes a c-core-based approach for designing mobile application processors, in which the application processor comprises c-cores targeting hotspots across a wide range of common mobile applications. This approach enables increased specialization with each generation of mobile devices without requiring any software application re-implementations. The results show that a mobile application processor consisting of a selection of targeted and broad-based c-cores can reduce the processor energy consumption of a rich Android workload by 73%, and these improvements continue to increase as the area budget available for c-cores increases.

The traditional approach of manually designing specialized hardware components and optimizing mobile applications for the hardware system design is no longer scalable. The approach presented in this chapter enables mobile devices to become increasingly powerful and run a more diverse set of applications with each technology generation.

Acknowledgments

This chapter contains material from "The GreenDroid mobile application processor: An architecture for silicon's dark future", by Nathan Goulding, Jack Sampson, Ganesh Venkatesh, Saturnino Garcia, Joe Aurricchio, Po-Chao Huang, Manish Arora, Siddharth Nath, Vikram Bhatt, Jonathan Babb, Steven Swanson and Michael Bedford Taylor, which appears in IEEE Micro, March 2011. The dissertation author was a significant contributor and author of this paper. The material in this chapter is copyright © 2011 by the Institute of Electrical and Electronics Engineers (IEEE). Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Chapter 5

Quasi-ASICs: Trading Area for Energy by Exploiting Similarity across Irregular Codes

Chapter 1 explained the technological trends behind the current utilization wall phenomenon and used it to motivate specialization as an effective solution for achieving Moore's Law-style performance scaling. Chapter 1 also presented the two critical challenges, lack of flexibility and lack of generality, that system designers need to address in order to ensure that specialized cores can effectively target a wide range of commonly used general-purpose programs. Chapter 3 presented our approach for addressing the first challenge, namely the lack of flexibility in application-specific circuits. This rigidity of application-specific circuits limits how long they remain useful because these circuits cannot support any change in the source code they target, but most commonly used general-purpose programs are routinely updated with newer versions containing new features, additional optimizations, or essential bug fixes. To make application-specific circuits a suitable candidate for general-purpose programs, our approach proposed Patchable Conservation Cores (C-cores), energy-efficient co-processors that target a single task and contain targeted reconfigurability to support changes in the target program's control-flow, datapath operators, and memory layout. In this manner, our approach allows the energy-efficient co-processors to remain useful for up to 15 years and to improve the application's energy-delay product by up to 20× compared to a general-purpose processor.

This chapter addresses the second challenge that limits the wide applicability of application-specific circuits, namely their lack of generality. The lack of generality results in very narrowly defined specialized cores that can only target a very specific task. This makes these specialized cores a poor candidate for providing hardware support for a significant fraction of the execution of a varied application set. From a system designer's point of view, designing co-processors that can support only one specific task causes two main problems. First, for many applications in a system's workload, it is not profitable to trade silicon for specialized co-processors that can support only that application. Second, supporting a large number of tasks in hardware would necessitate designing a large number of specialized cores, which in turn would make it difficult to place all of them close to the general-purpose processor. This increases the overhead involved in offloading a computation from the CPU to a specialized core, limiting the benefits that these specialized cores can provide for short-running computations.

To address the lack-of-generality challenge, this chapter proposes Quasi-ASICs (Qasics), specialized cores that, unlike traditional ASICs targeting one specific task, can support multiple general-purpose computations. Qasics allow a system designer to trade between area and energy efficiency in a fine-grained manner. The Qasic design flow accomplishes this by varying the required number of Qasics as well as their computational power based on the relative importance of the applications and the area budget available to optimize them. While the increase in a Qasic's computational power comes with a marginal decrease in its energy efficiency, Qasics are still an order of magnitude more energy-efficient than general-purpose processors. In this manner, our approach can significantly reduce the number of specialized processors as well as the area budget required compared to fully-specialized logic, without compromising the fraction of system execution that the specialized logic supports.

Qasics achieve energy and area efficiency by leveraging code similarity within and across applications. The Qasic toolchain mines for similar computations across the system's workload and designs a Qasic that can execute all of these computations.


Figure 5.1: Trade-offs between area and energy efficiency The x-axis varies the area budget available for specialization, normalized to the area budget required to implement all our application hotspots (Table 5.1) using fully-specialized logic. The y-axis measures energy consumption relative to an in-order MIPS processor. As area budgets decrease, the Qasics' energy efficiency declines much more slowly than that of fully-specialized logic, because Qasics can save area by increasing their computational power rather than removing functionality.

This allows Qasics to reduce area requirements without reducing the fraction of the program that executes on specialized hardware. Figure 5.1 demonstrates that, compared to the Patchable Conservation Core approach, Qasics give up very little efficiency in return for substantial area savings. As the area budget decreases (left to right on the x-axis), the Qasic toolchain designs Qasics with greater computational power to ensure that they continue to support all the application hotspots in hardware. This increase in generality enables Qasics to remain significantly energy-efficient even as the area budget decreases, unlike c-cores, which see a 4× decrease in their energy efficiency.

This chapter addresses many of the challenges involved in designing a Qasic-enabled system. The first challenge lies in identifying similar code patterns across a wide range of general-purpose applications. The hotspots of a typical general-purpose application tend to have many hundreds of instructions, complex control-flow, and irregular memory-access patterns, making the task of finding similarity both algorithmically challenging and computationally intensive. The second challenge lies in exploiting the similar code patterns to reduce hardware redundancy by designing generalized code structures that can execute these code patterns. The third challenge involves making area-energy tradeoffs to ensure that these co-processors fit within the area budget. Addressing this challenge entails finding efficient heuristics that approximate an exhaustive search of the design space but avoid the exponential cost of that search. The final challenge involves modifying the application code/binary appropriately to enable the applications to offload computations onto the Qasics at runtime.

This chapter evaluates the proposed toolchain by designing Qasics for the find, insert, and delete operations of commonly used data structures, namely the linked list, binary tree, AA tree, and hash table. The results show that just four Qasics can support all these data structure operations and can provide 13.5× energy savings over a general-purpose processor. On a more diverse general-purpose workload consisting of twelve applications selected from different application domains (including SPECINT, Sat Solver, Vision, and EEMBC, among others), the results show that Qasics reduce the required number of application-specific circuits by over 50% and the area requirement by 23% compared to fully-specialized logic, while providing energy efficiency within 1.27× of that of fully-specialized logic.

The rest of this chapter is organized as follows. Section 5.1 motivates the Qasic approach in the context of other proposals. Sections 5.2, 5.3, and 5.4.2 describe the Qasic design and hardware generation flow. Section 5.5 evaluates the Qasic approach. Section 5.6 concludes.

5.1 Motivation

To effectively utilize the available transistor budget, this chapter proposes a design methodology that varies the Qasics' computational power based on the area budget available for specialization. The Qasic design methodology is based on the insight that similar code patterns exist within and across applications.


Figure 5.2: Similar code patterns present across the hotspots of a Sat Solver tool The figure highlights the similar code patterns that are present across the hotspots in the UBC Sat solver tool.

Table 5.1: A Diverse Application Set The table lists the applications that this chapter uses to evaluate the Qasic design flow.

Benchmark Type       Application            HotSpots
Spec CPU 2000-2006   Twolf                  new_dbox, new_dbox_a, newpos_a, newpos_b
                     Mcf                    refresh_potential, primal_bea_mpp
                     Bzip2                  fullGtU
                     LibQuantum             cnot, toffoli
EEMBC Consumer       RGB/CMYK               CMYfunction
                     RGB/YIQ                YIQfunction
Image Compression    CJPEG                  ycc_rgb, extractMCU
                     DJPEG                  jpeg_idct, rgb_ycc
Sat Solver           UBC Sat                BestLookAheadScore, FlipTrackChangesFCL
Splash               Radix                  slave_sort
SD-VBS               Image pre-processing   calc_dX, calc_dY, imageBlur
                     Disparity              finalSAD, findDisparity, integralImage2D

This section quantifies the available similarity and uses it to motivate the Qasic design methodology. To begin with, this section examines the hotspots in the Novelty+p SAT solver from the UBC project [TH04] to give some insight into the kinds of similarity available. Figure 5.2 shows the source code for the two hotspots and highlights the similar code segments. The example shows that, while the two hotspots as a whole do not have similar control-flow, similar code patterns are present across them. The Qasic design methodology seeks to exploit these similar code patterns to effectively trade off between energy efficiency and area efficiency (Figure 5.1).

Next, this section quantifies the similarity across a diverse set of applications selected from the SPEC 2000 [SPE00], Splash [WOT+95], EEMBC-consumer [Emb], UBC Sat [TH04], and SD-VBS [VAJ+09] benchmark suites (described in Table 5.1). The available similarity across these application hotspots is measured as follows. The first step is to profile the applications to find the "hotspots" where the applications spend most of their time (listed in Table 5.1). The second step builds the Program Dependence Graphs (PDGs) [FOW87] for these hotspot functions.


Figure 5.3: Similarity available across hotspots of the diverse application set (Table 5.1) The X axis shows the amount of similarity present (percentage of PDG nodes matched), and the Y axis shows the percentage of hotspot pairs that exhibit the merge percentage shown on the X axis.

The final step finds the similar code segments across these hotspots by searching for isomorphic subgraphs across their PDGs (Section 5.2.2 discusses the similarity algorithm). The similarity between two hotspots is quantified as the fraction of matching nodes between their PDGs.

The results, shown in Figures 5.3 and 5.4, demonstrate that significant similarity exists within and across applications. Figure 5.3 bins the hotspot pairs based on the amount of similarity present between them: the X axis shows the similarity bins and the Y axis shows the number of hotspot pairs present in each bin. The data shows that most of the hotspot pairs (more than 90%) had some similar code patterns (at least 50% of nodes matched) and, more importantly, at least 50% of the hotspot pairs had significant similarity (more than 80% of nodes matched). Also, Figure 5.4 shows that significant similarity exists within and across application domains as well.


Figure 5.4: Quantifying Similarity Present Within and Across Application Domains The graph quantifies the similarity present across different classes of applications.

To exploit the available similarity, the Qasic toolchain merges application hotspots with similar code structure and builds one Qasic for them. This design methodology provides the following benefits:

1. Fewer specialized circuits and reduced area requirements: This chapter realizes these reductions by designing Qasics that can support multiple similar computations. The Qasic toolchain found these similar computations across hotspots of the same application (such as term_newpos_a and term_newpos_b from Twolf), across different applications in the same application domain (such as different image conversion algorithms), and even across applications from different application domains (such as Bzip2 and Disparity).

2. Generality: Qasics tend to have more flexible control and data flow than fully-specialized hardware because they are designed to target multiple code segments. As a result, a Qasic's computational power can extend beyond the code segments for which it was designed. For example, in our benchmark set, the imageBlur kernel uses a 5-stage filter [VAJ+09] and the edgeCharacteristics kernel uses a 3-stage filter. However, the Qasic formed by merging imageBlur and edgeCharacteristics can execute both kernels with either image filter, providing this Qasic with additional computational power.

3. Better backward compatibility for application-specific hardware: In order to remain useful across software versions, application-specific hardware must be able to adapt to changes in the code it supports. The design methodology that this section proposes can be utilized to improve an application-specific circuit's support for legacy "in-use" versions, as shown in Section 3.7.


Figure 5.5: Qasic Design Flow The design flow from application hotspots to Qasic generation is shown.

5.2 Quasi-ASIC Design Flow

This section presents the details of the Qasic design flow (shown in Figure 5.5). The design flow accepts the target application set and the area budget as input, and generates as output a set of Qasics that fit within the available area budget and can support a significant fraction of the execution of the target applications.

The design flow starts with a set of dependence graphs, one for each of the application hotspots. At each stage, the toolchain selects a dependence graph pair with similar code patterns, merges the pair to build a new dependence graph, and replaces the pair with the new merged graph. This process continues until the toolchain is unable to find similar code segments across the dependence graphs or the area goals are met. These steps are described in greater detail below. Figure 5.6(a)-(d) shows the process of merging two similar hotspots, computeSum and computePower, to form a Qasic.

5.2.1 Dependence Graph Generation

The Qasic toolchain internally represents the application hotspots as Program Dependence Graphs (PDGs) [FOW87], where nodes represent statements and edges represent control and data dependencies. The PDG representation of the hotspots is better suited for finding matching code regions than control flow graphs or program text because it enables code pattern matching based on program semantics rather than program text. Using PDGs, the toolchain gets rid of all the false dependencies, preserving only the "real" control and data dependences. Figure 5.6(a) shows the PDG for a simple loop that computes the sum of the first n numbers. The solid edges represent control dependences and the dashed ones represent data dependences. Unlike in a control flow graph, there is no edge between the nodes sum = 0 and i = 0 because these statements are independent of each other. The Qasic design flow uses the CodeSurfer tool [Cod] to create PDGs for the input code segments. The output of this stage is a pool of PDGs of the application hotspots. The subsequent steps of the design flow increase the generality of these PDGs and reduce hardware redundancy in this PDG pool until the area budget is met.
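For reference, the two hotspots of the running example, reconstructed here from Figure 5.6(a), are simple enough to state in full:

/* The two input code segments of the running example (Figure 5.6(a)). */
int computeSum(int n) {        /* sum of the first n integers: 0+1+...+(n-1) */
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += i;
    return sum;
}

int computePower(int n) {      /* repeated squaring: computes 2^(2^n) */
    int sum = 2;
    for (int i = 0; i < n; i++)
        sum *= sum;
    return sum;
}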

5.2.2 Mining for Similar Code Patterns

This step seeks to find dependence graph pairs from the Qasic PDG pool that are similar to each other. The problem of finding similar code patterns across application hotspots can be reduced to finding similar subgraphs (subgraph isomorphism) across their PDGs. Subgraph isomorphism is a well-studied problem [Epp95, FK98, HWP03] in the field of graph algorithms. Our algorithm for mining similar code patterns is based on the FFSM algorithm proposed by Huan et al. [HWP03]. This section presents a brief description of the FFSM algorithm and the optimizations that were made to tailor this algorithm to the problem of finding similar code fragments.

The graph matching algorithm (FFSM) takes as input two graphs, G1 and G2, where every node in both graphs has a unique ID as well as a type label. The algorithm considers two nodes for matching only if they have the same type label.

The algorithm begins by selecting a node n1 randomly in G1 and finding a matching node n2 for n1 in G2. Then, it tries to grow the matched subgraph by comparing the neighbors of n1 and n2 as well as performing other join operations on the matched subgraphs (refer to [HWP03] for details). The algorithm returns when it cannot grow the subgraph any further.

The Qasic toolchain extends the FFSM matching algorithm in several ways to tailor it to the problem of finding similar code patterns in PDGs. First, instead of picking and matching nodes in a random order, our matching algorithm focuses on finding similar loop bodies. This behavior is desirable since most of the application execution time is spent in loops. Second, the PDG node type encodes the program structure so as to prune "illegal" matches and reduce the search space. For example, all the nodes within a loop body should have "similar" node types, different from the node types of any nested loop nodes within that loop. This node type definition ensures that two nodes match only if they perform similar arithmetic operations (for example, addition and branch operations are not similar), perform similar memory operations (such as array/pointer accesses), and have matching control/data edges. For example, when trying to find similar code patterns across computeSum and computePower, shown in Figure 5.6(a), this stage would map the sum+=i node in computeSum to the sum*=sum node of computePower, among others. All the node mappings that this stage produces are shown in Figure 5.6(b).

The output of this stage is a list of dependence graph pairs that have similar code patterns present across them. For each of these similar dependence graph pairs, this stage also produces a mapping of the similar code patterns across them.
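The sketch below illustrates the seed-and-grow core of such a matcher on a simplified PDG representation; it omits FFSM's join operations and edge-compatibility checks, and every data structure here is an illustrative assumption rather than the toolchain's.

#define MAX_NODES 1024
#define MAX_SUCC  8

/* Simplified PDG: type labels encode operation class, memory-access
 * class, and loop-nesting structure, as described above. */
typedef struct {
    int n;                          /* number of nodes          */
    int type[MAX_NODES];            /* node type label          */
    int deg[MAX_NODES];             /* successor count          */
    int adj[MAX_NODES][MAX_SUCC];   /* control/data successors  */
} PDG;

/* Grow a match from the seed pair (v1, v2); map[v] is the node of g2
 * matched to node v of g1, or -1 (caller initializes map to -1 and
 * used to 0). Seeds would be chosen from loop headers rather than at
 * random, per the loop-first heuristic. */
static void grow_match(const PDG *g1, const PDG *g2,
                       int v1, int v2, int map[], int used[]) {
    map[v1] = v2;
    used[v2] = 1;
    for (int i = 0; i < g1->deg[v1]; i++) {
        int s1 = g1->adj[v1][i];
        if (map[s1] != -1)
            continue;                       /* already matched */
        for (int j = 0; j < g2->deg[v2]; j++) {
            int s2 = g2->adj[v2][j];
            if (!used[s2] && g1->type[s1] == g2->type[s2]) {
                grow_match(g1, g2, s1, s2, map, used);
                break;
            }
        }
    }
}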

5.2.3 Merging Program Dependence Graphs with similar code structure

This stage of the Qasic design flow accepts as input the similar dependence graph pairs that the previous stage produces. For each dependence graph pair, this stage merges their mapped nodes to form a new Qasic dependence graph that is capable of supporting all the computations that either of its input dependence graphs can support.


Figure 5.6: Quasi-Architecture Example An example showing the different stages involved in the conversion of the source code segments computeSum and computePower (a) to Qasic hardware (f). The resultant Qasic can perform the functionality of the two input code segments as well as other functions like n! and c^(2^n). The solid lines represent control dependences and the dashed lines represent data dependences in the program dependence graphs ((a), (b), (c)).

The main challenge in this step is to ensure that the Qasic PDGs it produces are linearizable, in that there is a sequential ordering of the PDG nodes that respects all the data and control dependences. To preserve this linearizability, the Qasic's PDG must be reducible and have no circular data/control dependences. A PDG is reducible if each of its loop bodies has a single entry point. As the first step, this stage ensures that there are no circular dependences in the merged PDG, using the control, data, and inferred (explained later) dependence edges. This stage eliminates circular dependences by removing the smallest number of node matches that breaks all the dependence cycles. The next step is to merge the loop entry nodes and, in the process, ensure that each loop body has only one entry point. During this step, for each code region with multiple entry points, the toolchain adds dummy control nodes as entry points to maintain a reducible control dependence graph. Next, this stage builds a one-to-one map between the variables of the two PDGs based on the nodes that were matched. The toolchain uses this variable map to match the declaration and phi nodes that were ignored in Section 5.2.2 to reduce subgraph matching time. At this point, this stage merges the two PDGs to form the new PDG of a Qasic that can execute the computations corresponding to both merged PDGs.

Figure 5.6(b) shows the PDG of the Qasic that our toolchain designs by merging the dependence graphs of computeSum and computePower, shown in Figure 5.6(a). The Qasic PDG contains additional node and edge attributes to enable PDG linearization (Section 5.2.4).

1. Node Attributes: Each node in the PDG contains a list of variables defined and used by that node. This node attribute is extended to contain a list of conditionally defined and used variables. For example, in Figure 5.6(b), the sum+=i; sum*=sum node conditionally consumes the variable i because only one of its input code segments (computeSum) consumes i.

2. Edge Attributes: The PDG is augmented with conditional data dependences.

The conditional edges result from edges present in only one of the two PDGs being merged. For example, computeSum in Figure 5.6(b) has a data dependence between i++ and sum+=i, but there is no data dependence between i++ and sum*=sum. This leads to a conditional dependence from i++ to the sum+=i; sum*=sum node in the Qasic's PDG.

At this point, this stage has designed a Qasic PDG corresponding to each similar dependence graph pair that the previous stage produced. Each of these newly designed Qasics has greater computational power than the two dependence graphs it was formed from because it can support the computations that either input dependence graph can support. Moreover, each Qasic requires less area than its input dependence graphs because it eliminates hardware redundancy across them by merging similar computations.

The final step of this stage is to select the Qasic that, compared to the dependence graph pair it was formed from, provides the maximum benefit in terms of area saved and increase in computational power. Section 5.3 explains in detail our heuristic for performing this Qasic selection. Once the best Qasic candidate is chosen, this stage replaces the two input PDGs with the chosen Qasic's PDG in the PDG pool.

At the end of this stage, the toolchain loops back to the second stage (Section 5.2.2) to find other potential PDG pairs for merging. Eventually, the Qasic set becomes distinct enough that no substantial similarity can be found across its members. At that point, the toolchain proceeds to generate the Qasic specifications, as explained in the next section.

5.2.4 Qasic Generation

The fourth stage of the toolchain sequentializes the PDGs of the Qasic set to produce the Qasic specifications in C, which the backend of our toolchain uses to generate Verilog code. The two steps involved in this stage are merging the matched expressions present in each Qasic PDG node and sequentializing the Qasic's data and control dependences.


Figure 5.7: Expression Merging An example showing the merging of the expression trees of two input expressions to build the new merged expression tree and the corresponding expression.


Figure 5.8: Inferred Dependence Example The solid and dashed lines show true and inferred dependences. There is an inferred dependence edge from P to Q because no valid ordering of P, Q, R orders Q before P (Section 5.2.4).

Merging Expressions This step generates a valid C expression corresponding to each node of a Qasic. The Qasic PDGs that the previous step designs contain multiple C expressions in each node. For example, in Figure 5.6(b), a Qasic PDG node contains the expressions sum += i (from the computeSum PDG) and sum *= sum (from the computePower PDG). To design the merged C expression, this stage builds expression trees using ANTLR [Ter], which are then merged into a single expression tree. This merged tree directly translates to a valid C expression. This process is shown in Figure 5.7. This step achieves much of the area reduction seen in our results by reusing datapath operands and operators. For example, in Figure 5.6(c), the Qasic eliminates hardware redundancy by merging the i < n expression used in both computeSum and computePower. Figure 5.6(c) shows the result of merging expressions for the PDG shown in Figure 5.6(b).

Linearizing Qasic PDGs PDGs are an inherently parallel representation of a program. However, in order to produce the Qasic specification in C, this stage needs a valid ordering of the nodes consistent with the control and data dependence edges. Ordering the control edges in a Qasic PDG is straightforward, and the previously proposed techniques for reducible graphs [FOW87] work in our case as well. Sequentializing the data dependences is more challenging because of the conditional data flow edges as well as the conditionally defined and used variables in the PDG nodes. This stage uses the following technique to sequentialize the data dependences, including the conditional node and edge attributes of the Qasic PDG nodes.

The main goal of our technique is to order the nodes in the presence of conditional data dependences without employing backtracking (the computation time for backtracking-based techniques can get very expensive, exponential in the worst case, as the PDG's size and complexity increase). Our technique uses the data dependence edges and use/def analysis to build inferred dependences between children of the same control node. An inferred dependence is defined as follows. Let parent node A have child nodes P and Q. Child P has an inferred dependence on child Q if there exists a child R of A such that P produces some value b that R consumes, Q produces some value a that R consumes, and P also produces the value a. This implies that the only valid ordering is for child P to execute before child Q. An example is shown in Figure 5.8: subgraph P produces values a and b, and subgraph Q produces values a and c. There is no direct dependence between P and Q, but subgraph R consumes value a from Q and value b from P, causing an implicit ordering between P and Q.

This stage uses the data dependences, inferred dependences, and use/def analysis to linearize the data dependences. To handle the additional node and edge attributes of the Qasic PDG, the data structures used in the data dependence serialization algorithm are extended with these attributes as well. Our linearization algorithm, based on inferred dependences, readily supports conditional data dependences, and its computation time scales well with the size and complexity of the Qasic PDG.

The Qasic formed by merging computeSum and computePower is presented in Figure 5.6(d). Based on the value of CONTROL_SIGNALS, this Qasic can be configured to perform the computations of computeSum as well as computePower. In addition, this Qasic can also be configured to perform other operations, such as n! and c^(2^n), by configuring the control lines to the mux, the ALU, and the init value of sum.
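A C rendering of the generated specification of Figure 5.6(d) might look as follows; the single control argument and the mux/alu encodings below are simplifications of the Qasic's independent control lines.

enum { COMPUTE_SUM = 0, COMPUTE_POWER = 1 };

/* Generalized datapath operators (illustrative encodings). */
static int mux(int a, int b, int sel) { return sel ? b : a; }
static int alu(int a, int b, int op)  { return op ? a * b : a + b; }

int QASIC(int n, int control) {
    int sum = mux(0, 2, control);        /* init value: 0 or 2       */
    for (int i = 0; i < n; i++) {
        int in = mux(i, sum, control);   /* second operand: i or sum */
        sum = alu(sum, in, control);     /* operator: + or *         */
    }
    return sum;
}

With control set to COMPUTE_SUM this computes computeSum, and with COMPUTE_POWER it computes computePower; with independent control lines for the initial value, the mux select, and the ALU operation, as in the actual Qasic, additional configurations such as n! and c^(2^n) become reachable.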

5.2.5 Modifying Application Code to utilize Qasics

The Qasic toolchain also modifies the application code to allow it to offload computations onto the Qasic at runtime. The toolchain does this by determining a valid setting of the Qasic's CONTROL_SIGNALS input that allows the Qasic to execute the computation desired by the calling application. For example, Figure 5.6(d) shows how the application code is modified to use the Qasic: the computeSum function sends the function argument values as well as an additional argument that configures the ALU in the Qasic to perform addition. Moreover, the toolchain also inserts stubs in the application code to query the runtime for the availability of a matching Qasic (not shown in the figure for simplicity). If no matching Qasic is present or available at runtime, the application defaults to running the software version of the function on the general-purpose processor.
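The rewritten caller then reduces to a guarded offload, reusing the definitions from the previous sketch; qasic_available() is a hypothetical name for the runtime-query stub described above.

extern int qasic_available(void);   /* hypothetical runtime-query stub */

int computeSum(int n) {
    if (qasic_available())
        return QASIC(n, COMPUTE_SUM);   /* offload with ALU set to add */

    int sum = 0;                        /* software fallback on the CPU */
    for (int i = 0; i < n; i++)
        sum += i;
    return sum;
}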

5.3 Qasic-selection heuristic

The previous section describes the design methodology for merging application hotspots and building the appropriate Qasic set for them. Section 5.2.3 described the methodology for merging two hotspots with similar code structures. However, a hotspot can match well with multiple other hotspots. For example, the integralImage2D hotspot in Disparity matches well with multiple hotspots (slave_sort in Radix, findDisparity in Disparity) belonging to different application classes and having different code sizes. In general, there are exponentially many alternatives for merging the application hotspots to form the final Qasic set. This section presents the Qasic-selection heuristic that decides which hotspots to merge to make the best area-energy tradeoffs. Our toolchain's goal is to find the set of Qasics that will most significantly reduce the power consumption while fitting within the available area budget. The reduction in power consumption that a Qasic can deliver is a combination of its power efficiency and the fraction of programs that it executes. Formally, a Qasic b occupies area $A_b$, consumes power $P_b$, has speedup $S_b$, and has coverage $C_b$ (relative application importance determined by system-level profiling).

To evaluate b, this section defines a quality metric $Q_b = \frac{C_b S_b}{A_b P_b}$. To select a good set of Qasics to build, the Qasic toolchain will need to compute $Q_b$ for an enormous number of candidate Qasics. Computing precise values for $S_b$ and $P_b$ in each case is not tractable, since it requires full-fledged synthesis and simulation. To avoid this overhead, this stage makes the following approximations.

First, this stage conservatively assumes that the speedup, $S_b$, is always 1. As shown in Chapter 3, the specialized cores for integer programs are typically no more than twice as fast as a general-purpose processor. This stage estimates $A_b$ based on the datapath operators and register counts. Next, the Qasic-selection heuristic assumes that power consumption is proportional to area. This approximation is valid if we assume constant activity factors, constant clock frequencies, and circuit capacitance that grows linearly with circuit area. The $C_b$ value is estimated based on system-level profiling. Although there is invariably some error introduced due to these approximations, we believe that the loss of accuracy is more than compensated by the improvements in computational tractability. With these assumptions, this stage approximates $Q_b$ as $Q'_b = \frac{C_b}{A_b^2}$.

For expository purposes, $X \bowtie Y$ denotes the Qasic that results from merging the Qasics for X and Y (Section 5.2). While estimating $A_{X \bowtie Y}$ is straightforward, estimating $C_{X \bowtie Y}$ is more challenging because $X \bowtie Y$ can implement other code segments beyond X and Y. Currently, this stage sets $C_{X \bowtie Y} = C_X + C_Y$ as a conservative estimate. In the future, some form of cross-validation could be used to measure $C_{X \bowtie Y}$ more accurately and so promote generality.

To evaluate the quality of a set of Qasics, B, this stage sums the value of $Q'$ for each Qasic. The goal of the Qasic design flow is to maximize

$$Q'_B = \sum_{b \in B} Q'_b = \sum_{b \in B} \frac{C_b}{A_b^2} \qquad \text{subject to} \qquad \sum_{b \in B} A_b < A_{budget}. \tag{5.1}$$

1: while $|B| > 1$ do
2:   $(b_1, b_2) = \operatorname{argmax}_{b_1 \in B,\, b_2 \in B} \left( \frac{C_{b_1 \bowtie b_2}}{A_{b_1 \bowtie b_2}^2} - \frac{C_{b_1}}{A_{b_1}^2} - \frac{C_{b_2}}{A_{b_2}^2} \right)$
3:   $B = B \setminus \{b_1, b_2\}$
4:   $B = B \cup \{b_1 \bowtie b_2\}$
5:   Record the merging of $b_1$ and $b_2$ and the resulting values of $Q'_B$ and $\sum_{b \in B} A_b$
6: end while

Figure 5.9: Greedy Clustering Algorithm The algorithm for deciding which Qasics to build. B is initially the set of fully-specialized ASICs, one for each of the fragments selected by the profiler.

Algorithm 5.9 contains the pseudo-code for our Qasic-selection heuristic. It starts with a fully specialized ASIC for each fragment and merges them to create Qasics: it iteratively selects the Qasic pair that maximizes $Q'_B$ and merges it to form a more general Qasic that has greater computational power than either of its inputs.
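A compact software rendering of this greedy loop is sketched below. It is illustrative only: the real toolchain estimates a merged Qasic's area from its datapath operators and register counts, so the toy merged-area model here (the larger input's area plus a fraction of the smaller's), along with every name, is an assumption.

/* Illustrative implementation of the greedy clustering loop of
 * Figure 5.9. Q'_b = C_b / A_b^2; merged coverage is C_x + C_y;
 * the merged-area formula is a toy stand-in. */
#include <stdio.h>

typedef struct { double area, coverage; } qasic;

static double quality(qasic b)                 /* Q'_b = C_b / A_b^2 */
{
    return b.coverage / (b.area * b.area);
}

static qasic merge_estimate(qasic x, qasic y)
{
    qasic m;
    m.coverage = x.coverage + y.coverage;      /* conservative merged coverage */
    double hi = x.area > y.area ? x.area : y.area;
    double lo = x.area > y.area ? y.area : x.area;
    m.area = hi + 0.2 * lo;                    /* assumed area model */
    return m;
}

static void greedy_cluster(qasic *set, int count)
{
    while (count > 1) {
        int bi = 0, bj = 1;
        double best = -1e300;
        for (int i = 0; i < count; i++)            /* pick the merge that  */
            for (int j = i + 1; j < count; j++) {  /* raises Q'_B the most */
                qasic m = merge_estimate(set[i], set[j]);
                double gain = quality(m) - quality(set[i]) - quality(set[j]);
                if (gain > best) { best = gain; bi = i; bj = j; }
            }
        set[bi] = merge_estimate(set[bi], set[bj]);
        set[bj] = set[--count];                    /* drop the merged-away entry */
        double qB = 0.0, aB = 0.0;                 /* record Q'_B and total area */
        for (int k = 0; k < count; k++) {
            qB += quality(set[k]);
            aB += set[k].area;
        }
        printf("%d Qasics: Q'_B = %g, total area = %g\n", count, qB, aB);
    }
}

Each recorded step corresponds to one candidate Qasic set; the design flow later picks the step whose total area fits within the budget.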

5.4 Methodology

The Qasic toolchain is built around the OpenIMPACT (1.0rc4) [Ope], CodeSurfer (2.1p1) [Cod], and ANTLRWorks (1.3.1) [Ter] compiler infrastructures and accepts a large subset of the C language, including arbitrary pointer references, switch statements, and loops with complex conditions. Figure 5.10 shows how the Qasic design flow fits into the toolchain explained in Section 2.3.

[Figure 5.10 diagram: each application (App 1 ... App N) passes through SPE Identification; the resulting candidates flow through Find Similar Components, Merge Similar Components, and Linearize Merged Components into the C-to-HW compiler, whose output feeds the Synopsys CAD tools (VCS + PrimeTime) for power measurements and the btl simulator for performance measurements.]

Figure 5.10: Qasic Toolchain The various stages of the toolchain involved in designing Qasics, generating the hardware for Qasics, and measuring Qasic performance and power efficiency are shown.

5.4.1 Designing Qasics for the target application set

This section presents the different stages involved in designing Qasics that can execute the hotspots of the target application set. The first step profiles the target application set and identifies the code regions that account for a significant fraction of the application execution. Next, the Qasic design flow (Section 5.2) accepts these code regions as input and designs a set of Qasics that are expected to fit within a modest area budget and support most of the code regions that the profiling stage identified. The next section explains the generation of the Qasic hardware.
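As a rough illustration of the profiling step, hotspot selection can be modeled as keeping the hottest code regions until a coverage target is met. The data layout, threshold, and names below are assumptions for illustration, not the toolchain's actual interface.

/* Toy hotspot selection: keep the hottest regions until they cover
 * a target fraction of execution. All names are illustrative. */
#include <stdlib.h>

typedef struct { const char *name; double exec_fraction; } region;

static int by_fraction_desc(const void *a, const void *b)
{
    double d = ((const region *)b)->exec_fraction -
               ((const region *)a)->exec_fraction;
    return (d > 0) - (d < 0);
}

/* Sorts regions by execution fraction and returns how many of the
 * hottest ones are selected as Qasic candidates. */
static int select_hotspots(region *r, int n, double target_coverage)
{
    qsort(r, n, sizeof *r, by_fraction_desc);
    double covered = 0.0;
    int k = 0;
    while (k < n && covered < target_coverage)
        covered += r[k++].exec_fraction;
    return k;
}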

5.4.2 Qasic Hardware Design

The Qasic hardware compiler extends the C-to-Verilog compiler that Section 3.4 presents to provide support for the Qasic-specific operations, namely the ALU and the data-selector (shown as mux in Figure 5.6(d)). The generalized arithmetic operations, ALUs, in a Qasic's dataflow graph are instantiated as functional units in the hardware's datapath. The data-selectors in a Qasic's dataflow graph are instantiated as mux operators. To optimize the Qasic's energy efficiency, the computation of each data-selector's inputs is predicated on the data-selector's control signal. Hence, based on the control signal, only one of the inputs is computed. Figure 5.6(f) shows the hardware architecture of the Qasic shown in Figure 5.6(d). The datapath and control state machine of the Qasic hardware closely resemble the control/dataflow graph of the Qasic source code (Figure 5.6(e)). The synthesis flow to generate placed-and-routed Qasics, as well as the methodology for measuring the performance and energy efficiency of these Qasics, is similar to the flow explained in Section 2.3. The next section evaluates the Qasics that this flow generates.
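The effect of input predication can be seen in a small C analogy: only the selected input expression is evaluated, mirroring the way the unselected input's operators are kept idle in the generated hardware. This is a software analogy under assumed names, not compiler output.

/* C analogy for the predicated data-selector: the conditional
 * evaluates only the chosen input, just as the hardware activates
 * only the operators feeding the selected mux input. */
static int data_select(int ctrl,
                       int (*input0)(int), int (*input1)(int), int x)
{
    /* An unpredicated design would compute both inputs every cycle
     * and then select; the predicated form computes just one. */
    return ctrl ? input1(x) : input0(x);
}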

[Figure 5.11 listing: eight short C loops, each an initialize-accumulate pattern over a scalar "sum": (l1) the sum 1..n, (l2) n!, (l3) 2^n, (l4) a^n, (l5) the sum of the values in array "a", (l6) the product of the values in "a", (l7) the sum of the absolute values in "a", and (l8) a count of the powers of 2 in "a".]

Figure 5.11: Micro-Benchmark Set: Eight simple loops We use these eight loops to compare our design methodology against the optimal exponential solution.
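For concreteness, the first and sixth loops have roughly the following shape (a reconstruction from the figure's descriptions; variable names and bounds are illustrative):

/* l1: sum of 1..n (reconstructed) */
sum = 0;
for (i = 1; i <= n; i++)
    sum = sum + i;

/* l6: product of the values in array "a" (reconstructed) */
sum = 1;
for (i = 0; i < n; i++)
    sum = sum * a[i];

Both follow the same initialize-accumulate skeleton and differ only in the accumulator's initial value, the ALU operation, and the mux input, which is exactly the variation the merging pass absorbs into control signals.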


Figure 5.12: Coverage vs Qasic count As the number of Qasics drops, their combined coverage increases to include programs beyond those in Figure 5.11.


Figure 5.13: Qasic quality vs. Qasic Count As the number of Qasics drops, the computational power of each Qasic increases, but their power efficiency drops. As a result, Q declines.

[Figure 5.14 plot: total Qasic area in mm² (left axis) and total Qasic energy in joules (right axis) versus the number of merges (0-8); point labels give the area and energy ratios relative to fully-specialized circuits.]

Figure 5.14: Impact of generalization on Qasic Area and Energy Efficiency for the Micro-benchmarks The labels on the area and power curves show the ratio of area and energy requirements of the Qasic set compared to that of fully-specialized circuits. The total area decreases with minimal impact on the energy for the first 6 merges.

5.5 Results

This section first evaluates our Qasic-selection heuristic by comparing it to the optimal solution found via exhaustive search. The next experiment demonstrates that relatively few Qasics can support the commonly used operations on multiple data structures. Moreover, these Qasics provide an order of magnitude better energy efficiency than general-purpose processors. In addition, for a more diverse application set (Table 5.1), the data show that our methodology can significantly reduce the required number of specialized circuits as well as the area requirements compared to fully-specialized logic, while continuing to provide ASIC-like energy efficiency.

5.5.1 Evaluating the Qasic-selection Heuristic

This section evaluates the Qasic-selection heuristic by comparing the Qa- sics it designs to those that the exhaustive optimal algorithm designs, with the micro-benchmark set shown in Figure 5.11 as the target application set. Given the area budget and coverage for individual loops, knowledge of the optimal Qasic set is required to establish the upperbounds for energy efficiency and area efficiency. To this end, the first step of this experiment was to manually synthesize all the possible Qasic sets that the exhaustive search finds. The next step measures, for each of these Qasic sets, their additional computational power beyond the loops for which they were designed. The final step is to select the best (as measured by Equation 5.1) set of Qasics that, combined, can execute all eight loops and fit within the area budget. The next step is to run Algorithm 5.9 on the loops in Figure 5.11 to find the Qasic sets that our toolchain would design. This experiment uses these Qasic sets to compare the Qasic-selection heuristic to the optimal solution under differ- ent system design constraints. These design constraints include varying the area budget available for Qasics as well as different coverage scenarios where random weights are assigned to each loop. In all cases, the heuristic found the optimal solution, demonstrating that our heuristic performs well. 98

Figure 5.12 shows the increase in the computational power of the Qasic set as our toolchain reduces the number of Qasics required to support the target application set. The results show that reducing the Qasic count by 40%-50% improves the computational power significantly. This indicates that as the number of computations that a Qasic is designed to support increases, so does the ability of that Qasic to support other computations that are slight variations of its target computations. Figure 5.13 shows the effect of increases in the computational power and area efficiency of Qasics on their quality metric, Q. The figure shows that while the improvements in area efficiency and computational power are able to offset any increases in the Qasics' power consumption initially (until a 50%-60% decrease in the Qasic count), the metric value starts to degrade as the area budget, and hence the Qasic count, decreases significantly. This ability of the Qasics to trade off between different system resources in a fine-grained manner is crucial from a system designer's perspective when designing processor architectures. The low computational complexity and near optimality of our heuristic algorithm allow it to scale to handle large target application sets. This ability is critical because a system's target workload set tends to be very large in general.

5.5.2 Evaluating Qasic’s Area and Energy Efficiency

This section evaluates the ability of our toolchain to support a significant fraction of the target system execution in hardware using a relatively small number of Qasics that fit within a limited area budget. Figure 5.14 presents the area and power efficiency graphs of the Qasics for our micro-benchmarks (Figure 5.11). The X-axis plots the number of hotspot pairs that were merged to form Qasics. The results show that Qasics reduce the required number of specialized cores by 87.5% and reduce the area requirement by 56% compared to that of fully-specialized hardware. Moreover, the Qasics' energy consumption is within 1.26× of that of fully-specialized logic. These results are in keeping with the area savings that were achieved for the hand-designed Qasics used in the previous section.

[Figure 5.15 plot: total Qasic area in mm² (left axis) and energy relative to a CPU (right axis) versus the number of Qasics, from 13 down to 3; point labels give the area and energy ratios relative to fully-specialized circuits.]

Figure 5.15: Impact of generalization on the Area and Energy Efficiency of Qasics targeting commonly used data structures The labels on the area and power curves show the ratio of area and energy requirements of the Qasic set compared to that of fully-specialized circuits.

For the micro-benchmarks, the Qasic design flow and the Qasic-selection heuristic perform almost as well as our hand-designed Qasics and the optimal exponential solution.

Evaluating Qasic’s ability to target an application domain

This section designs Qasics for the find, insert, and delete operations of commonly used data structures, namely the linked list, binary tree, AA tree, and hash table. Figure 5.15 shows how the toolchain varies the Qasic design based on the area budget available for specialization. The X-axis plots the number of Qasics that the toolchain designs to fit within the available area budget. The left and right Y-axes show how our toolchain trades between area and energy efficiency. The results show that relatively few Qasics can support all these 12 data structure operations while improving the energy efficiency by more than 13.5× compared to our baseline general-purpose processor. Next, this section demonstrates that our toolchain can provide hardware support for an increasing number of the commonly used tasks in a system's workload with relatively few specialized cores. Figure 5.16 plots the number of features supported in hardware against the required number of distinct Qasics.

[Figure 5.16 plot: number of specialized cores (Qasic count versus c-core count, 0-12) against the number of features supported in hardware (0-14).]

Figure 5.16: Scalability of Qasic’s approach The graph shows that relatively few Qasics can support multiple commonly used data structures, while c-cores need a new specialized processor for every distinct functionality.

The data show that just four Qasics can support all these operations, while the c-cores approach would need eleven specialized cores. This 63% decrease in the required number of specialized cores allows us to closely integrate hardware support for a greater number of features with a processor pipeline. For example, for our scan-chain-based interconnect design, Qasics reduce the interconnect overhead (measured as the number of connections between the CPU and the specialized cores) by 54% compared to c-cores.

Evaluating Qasic’s ability to target a diverse workload

This section evaluates the ability of the Qasic design flow to support the hotspots belonging to the diverse workload listed in Table 5.1. The results are shown in Figure 5.17. The X-axis plots the number of Qasics required to cover all the application hotspots. The left-most point on the X-axis corresponds to fully-specialized logic, and hardware generality (i.e., the average number of computations that a Qasic supports) increases from left to right. The results show that our toolchain can reduce the number of specialized co-processors required to cover all application hotspots by over 50% (which in turn reduces the interconnect overhead by 1.57×). Also, Qasics reduce the total area requirements by 22% compared to that of fully-specialized hardware, while incurring a 27% increase in energy consumption.

[Figure 5.17 plot: total Qasic area in mm² (left axis) and total Qasic energy in joules (right axis) versus the number of Qasics, from 25 down to 11; point labels give the area and energy ratios relative to fully-specialized circuits.]

Figure 5.17: Impact of generalization on Qasic Area and Energy Efficiency for the Benchmark Set The labels on the area and energy curves show the ratio of area and energy requirements of the Qasic set compared to that of fully-specialized circuits.

[Figure 5.18 plot: performance slowdown relative to fully-specialized circuits (0.98-1.16) versus the number of Qasics, from 24 down to 12.]

Figure 5.18: Impact of generalization on Qasic Performance for our Benchmark Set The Y-axis shows the ratio of the execution time of Qasics to that of fully-specialized circuits, while the X-axis plots the Qasic count.

The first few merges result in area reduction without any impact on power consumption. This is because leakage energy goes down as the area is reduced, and that offsets any increase in dynamic energy. For the subsequent merges, our approach ensures that energy efficiency degrades gracefully with decreases in the total area budget. These results show that our toolchain can effectively reduce hardware redundancy while providing ASIC-like energy efficiency. Figure 5.17 also demonstrates how an increase in the transistor budget can be translated into improvements in energy efficiency by our toolchain. The area-energy tradeoff performed by our toolchain varies with the area budget. For example, if the area budget is increased by 10%, then the energy efficiency of the Qasics improves by 28%. Hence, with each process generation, our toolchain will effectively utilize the additional transistors to design more energy-efficient co-processors.

Generality As discussed in the previous section, for the micro-benchmarks, the computational power of the Qasics increases considerably (40% of the optimal Qasic design) within the first few iterations of Algorithm 5.9 without any effect on their energy efficiency. For the full-size benchmarks, the Qasics were able to support more general forms of image filters beyond the ones used in the original applications [VAJ+09]. In the future, this work can be extended to promote the Qasics' computational power more aggressively by exploring ways to accurately determine a Qasic's computational power. The biggest challenge here would be to automatically find interesting variants of the input code segments that can be used to evaluate the computational power of the Qasics.

Application-level energy savings Figure 5.19 shows that, compared to the baseline tiled system, a Qasic-enabled system consisting of just thirteen Qasics can provide significant energy-efficiency improvements for a diverse system workload. This experiment models the complete system, including the overheads involved in accessing the runtime system as well as the overheads for offloading computations onto the Qasics. The data show that, at the application level, Qasics save 45% of energy on average compared to a MIPS processor, and the savings can be as high as 79%.

[Figure 5.19 bar charts: normalized application energy (top) and normalized application EDP (bottom) for each benchmark (MCF, Bzip2, Twolf, LibQua.., SAT, Radix, Cjpeg, Djpeg, CMY, YIQ, Dispar.., Tracki..), each run in software and on a Qasic; every bar is broken into D-cache, core, and Qasic dynamic, clock, and leakage components.]

Figure 5.19: Quasi-ASIC enabled System Energy Efficiency The graphs show the energy and EDP reductions that Qasics provide compared to an in-order, power-efficient MIPS core ("SW"). Results are normalized to running on the MIPS core (lower is better). The Qasics can reduce the application energy consumption and energy-delay product by up to 79% and 83%, respectively.

Also, Qasics reduce the application energy-delay product by 46% on average. The energy savings for these Qasic-enabled systems are significant, and as shown in the previous section, the energy efficiency will only improve as the transistor budget increases.

5.6 Conclusion

Technology scaling trends will continue to increase the number of available transistors while reducing the fraction that can be used simultaneously. To effectively utilize the increasing transistor budgets, this chapter presents Qasics, specialized co-processors that can support multiple general-purpose computations and can provide significant energy-efficiency gains over a general-purpose processor. This chapter also presents the toolchain that designs these Qasics by leveraging the insight that similar code patterns exist within and across applications. Given a target application set and an area budget, the toolchain varies the computational power of these Qasics such that a significant fraction of the execution is supported in hardware without exceeding the area budget. The results show that just four Qasics can support the operations of multiple commonly used data structures; moreover, these Qasics provide 13.5× better energy efficiency than our general-purpose processor. On a more diverse workload, our approach reduces the required number of specialized cores by over 50% and occupies 23% less area compared to fully-specialized circuits, while providing ASIC-like energy efficiency (within 1.27× on average).

Specialization has emerged as an effective approach to combat the dark silicon phenomenon and enable Moore's Law-style system performance scaling without exceeding the power budget. Qasics enable a system designer to provide this specialization in a scalable manner, because relatively few of them, combined, can support a significant fraction of the execution of an application domain in hardware.

Acknowledgments

Portions of this research were funded by the US National Science Foundation under NSF CAREER Awards 06483880 and 0846152, and under NSF CCF Award 0811794.

Chapter 6

Related Work

This thesis seeks to address the power wall phenomenon by designing massively heterogeneous architectures. To provide this heterogeneity in a scalable manner for a wide range of general-purpose domains, this thesis proposes techniques for automatically designing configurable and flexible specialized cores targeting irregular integer codes. This chapter presents a detailed discussion of the prior approaches for designing heterogeneous architectures as well as prior techniques for automatically designing specialized cores.

6.1 Heterogeneous Architectures

Heterogeneous architectures allow a computation model that takes advantage of technology regimes where only a portion of the chip can be fully used at a time. The idea is that these chips are composed of "specialists" which can maximize the computation that can be performed given a fixed energy budget but a relatively large area. This section provides a two-dimensional taxonomy for heterogeneous multiprocessors based on their architectural and microarchitectural heterogeneity. A four-letter mnemonic, aAmM, identifies each processor class, where a is the number of instruction sets or programming models the processor supports and m is the number of microarchitectures present. In this taxonomy, "*" denotes "more than one."


1A1M These are conventional, homogeneous chip multiprocessors. They are commercially available and have received extensive attention from research and industry. Their uniformity prevents them from addressing the utilization wall.

1A*M Recent work on single-ISA, heterogeneous multiprocessors demonstrates that 1A*M machines can provide both power savings [KFJ+03, GRSW04] and performance improvements [KTR+04, KTJ06]. In these systems, all the cores in a processor support the same ISA, but the cores differ from each other in terms of the energy-performance tradeoff they provide. These processors realize their benefits by exploiting the fact that different applications, and even different phases of the same application, have different characteristics. They differ in how performance-critical they are, how CPU-intensive or memory-bound they are, and also in how much exploitable parallelism they contain. By ably exploiting these differences, these single-ISA heterogeneous machines provide some relief from the utilization wall by varying the types of cores in use at any one time. However, the use of a single ISA limits the level of heterogeneity they can exploit. This is because supporting the same ISA requires all the cores in a processor to perform, beyond the arithmetic operations and data manipulations that an application requires, additional transformations and state maintenance. This introduces significant "overheads" even for the most efficient processor designs, making them a few orders of magnitude less efficient than fully-specialized logic [HQW+10]. This thesis seeks to design processors with a much greater degree of heterogeneity, where the specialized cores can potentially provide ASIC-like efficiency by avoiding fetch, decode, and register file access for individual instructions.

*A1M Researchers have proposed several multi-ISA, single-microarchitecture processors [HW97, DGB+03, MPJ+00, YMHB00, RS94, CBC+05]. Smart Memories [MPJ+00] is an array of configurable tiles that can emulate a range of different architectures. However, supporting multiple styles of computation comes at a cost. For instance, Smart Memories can emulate a CMP and a streaming processor, but running time increases by between 10% and 80% relative to actual implementations of the same designs [MPJ+00]. Similarly, Transmeta [DGB+03] focuses on decoupling the underlying hardware ISA from the external ISA that is visible to the users and proposes several optimizations to reduce the overheads of this translation from external to internal ISA. In the current regime, the area budget is less of a concern than the power budget, and hence it might be more profitable to provide hardware support for the different styles of computation to ensure that each class of computation executes as efficiently as possible. However, much can be learnt from these prior approaches going forward. The approaches presented in these prior proposals can be adapted and extended by future heterogeneous architectures in order to present a uniform, easy-to-program abstraction to the programmer.

Chimaera [YMHB00], GARP [HW97], Tartan [MCC+06], and the work in [CBC+05] augment a general-purpose processor with reconfigurable logic to provide extensible ISAs. Tensilica's Xtensa [WKMR01] is an extensible processor that the system designer can extend with custom instructions and datapath units. Chimaera [YMHB00] and the work in [CBC+05] extend a general-purpose processor pipeline with configurable functional units. These configurable functional units execute the commonly occurring acyclic computation patterns that can be successfully mapped onto them. In this manner, these approaches improve the pipeline performance. However, since these approaches focus on short data computations (tens of cycles at best), they need the processor pipeline to remain active. This thesis focuses on computations that take hundreds of cycles or more, and switches off the processor pipeline (clock gating or power gating) when the computation is running on the specialized cores. This approach significantly reduces the number of transistors that need to be active in order to execute applications.

GARP [HW97] and Tartan [MCC+06] extend a general-purpose processor with a reconfigurable fabric. These approaches seek to support much of an application's execution on the reconfigurable logic and use the general-purpose processor only for control-intensive codes and for programming constructs that are not supported on the reconfigurable fabric. These approaches also re-architect the reconfigurable logic to improve its performance. This is because, traditionally, reconfigurable fabrics tend to execute at much slower frequencies than a general-purpose processor, and hence they cannot provide the same performance as general-purpose processors for applications with limited parallelism. These prior proposals differ significantly from the work presented in this thesis, both in their approach and in the challenges they address. While the previous work focused on specializing reconfigurable logic, this thesis focuses on providing targeted flexibility and configurability in fully-specialized logic.
Reconfigurable logic is typically an order of magnitude less energy-efficient than fully-specialized logic, limiting the energy benefits these approaches can realize. On the other hand, their approach retains a greater amount of post-fabrication flexibility compared to ours. There is much that these prior approaches and the work presented in this thesis can learn from each other. Moreover, future heterogeneous processors can benefit from exploiting both approaches when trying to trade off between reconfigurability and efficiency for different styles of computation.

*A*M Multi-ISA, multi-microarchitecture machines are the most aggressively heterogeneous multiprocessors, and they raise the most serious challenges. A range of commercial *A*M processors and research prototypes are available or have been proposed. These include Sony's Cell [Kah05], IRAM [PAC+97], and the hardware used in [WCC+07]. These machines augment a general-purpose core with vector co-processors to accelerate multimedia applications. In addition to the hardware design challenges, these aggressively heterogeneous multiprocessors also introduce challenges on the programmability front. To address some of these challenges, EXOCHI [WCC+07] and Merge [LCWM08] provide general frameworks for programming heterogeneous systems. However, these approaches provide heterogeneity targeting a few specific application domains, such as SIMD computations. Also, these approaches rely on the programmer to maximally utilize the underlying hardware by writing heterogeneity-aware programs. The Arsenal system in general, and this thesis in particular, focus on designing heterogeneous systems that provide benefits for general-purpose computing. The goal is to increase efficiency and performance for the vast majority of commonly used programs.

In addition to these heterogeneous designs, a vast number of specialized processors have been proposed. Specialized designs exist for applications and domains such as cryptography [WWA01], network protocols [Wil04], network security [TS05], signal processing [ECF96, GSM+99], vector processing [ADK+04, DLD+03], physical simulation [Age], and computer graphics [nVi, ATI, OLG+05]. Recent years have seen on-chip integration of an increasing number of these fixed-function units [HZH+11, Gwe10]. This thesis seeks to extend and accelerate this trend by automating the design of specialized processors as well as their integration with a general-purpose processor. The next section discusses the prior proposals on designing specialized co-processors.

6.2 Automatically-designed Specialized Cores

As explained previously, dark silicon is the portion of a chip that is underclocked and under-utilized because of power concerns. This thesis utilizes the dark silicon to build specialized cores. The fraction of dark silicon is increasing with each technology generation and hence can be used to support an increasing number of functionalities in hardware. For example, based on the c-cores' area requirements, many hundreds to thousands of them can fit within one-quarter to one-eighth of a 400mm² die. However, it is not scalable to design such a large number of specialized cores by hand. This section discusses the prior work on automatically designing specialized cores and compares their approaches as well as their design goals to the work presented in this thesis.

The prior work on automatically designing specialized cores has primarily focused on regular loops with predictable control-flow and memory access patterns [CHM08, FKDM09, YGBT09]. VEAL [CHM08] seeks to accelerate inner loops by designing loop-accelerators that exploit the available parallelism in these loops. These loop-accelerators are best suited for computation-intensive loops in the media and signal processing domains. While VEAL focused on designing loop-accelerators that can be widely used, the work presented in [FKDM09, YGBT09] focuses on customizing these loop-accelerators to more closely fit the applications they target. Yehia et al. [YGBT09] demonstrate techniques for automatically designing compound circuits that can support multiple regular loops. Fan et al. [FKDM09] seek to design loop-accelerators that target the inner loops of a particular application and contain limited programmability to ensure that they remain useful across application versions.

However, these previous approaches have limited applicability in the general-purpose domain because the control-flow and memory access patterns tend to be less predictable in these applications [CHM08]. This thesis focuses on supporting a much wider range of applications by being parallelism-agnostic and executing the memory operations in program order. While this enables our approach to be more widely applicable than the previous work, it also limits the performance benefits that can be attained. There is much scope for the work presented in this thesis to be extended using the optimizations presented in these prior proposals to achieve much greater performance efficiency.

Many industrial high-level synthesis tools exist, such as AutoESL's AutoPilot [Aut], Cadence's C-to-Silicon Compiler [C-t], and Synopsys' Synphony [Syn] (Coussy and Morawiec survey recent advances in this area [CM08]). These tools seek to provide significant performance improvements compared to a general-purpose core by inferring parallel execution from serial code. However, as a result, these tools suffer from the same limitations that parallelizing compilers suffer from, namely the difficulties of analyzing pointers in free-form code, extracting memory parallelism, and extracting and formulating efficient parallel schedules for the operations in critical loops. To address the parallelization challenges, these tools either limit the input language (for example, no pointers, no dynamic memory allocation, or no goto) or rely on user-transformed code or pragmas for guiding the tool in generating hardware that provides good speedups. The work presented in this thesis has different underlying goals compared to these tools.
This thesis seeks to reduce the energy requirements of a wide range of commonly used programs. The focus is on covering a significant fraction of the system execution in hardware rather than providing speedups for a certain class of applications. To this end, this thesis focuses on improving the flexibility and computational power of fully-specialized logic to make it a suitable candidate for general-purpose applications. Moreover, the focus is on near-complete automation, since this work seeks to cover a large fraction of a system's workload, and it would significantly stretch the development time if manual input or intervention were required to design these specialized cores.

Chapter 7

Summary

This thesis addresses the various challenges involved in making specialization a viable approach to optimizing general-purpose computing. To address these challenges, it proposes mechanisms to provide flexibility and generality in application-specific circuits, significantly enhancing their longevity and enabling them to support multiple computations with similar data/control flow. Firstly, this thesis proposes Patchable Conservation Cores, flexible application-specific circuits that are capable of adapting to changes in the applications they target. In addition, this thesis presents the related software tools that appropriately modify the application code/binary to allow it to offload computations onto the patchable c-cores at runtime. The results show that these c-cores improve energy efficiency by up to 16× compared to a general-purpose processor and are able to support their target applications for 10 years on average.

Next, this thesis demonstrates that these patchable c-cores are effective at covering a significant fraction of a system's execution while staying within a modest area budget. For this study, the thesis focuses on the Android software stack and designs a mobile application processor containing c-cores that target the hotspots in the Android system. The results showed that these c-cores provide significant energy efficiency across a wide range of commonly used Android applications while staying within modest area budgets.

To further improve the system execution coverage that the specialized cores provide, this thesis proposes Qasics, specialized co-processors capable of executing multiple general-purpose computations.

These specialized cores exploit the similar code patterns present within and across applications to reduce redundancy across specialized cores as well as to improve their computational power. The results show that Qasics reduce the required number of specialized cores by over 50% and the required area by over 23% compared to c-cores, without compromising on the computations that are supported in hardware. The power budget has become a first-order design constraint and will continue to shape the processor design field. In this regime, area becomes the relatively cheaper resource compared to the power budget, making it feasible to trade area for lower energy per computation. This thesis proposes techniques to perform this area-energy tradeoff in a manner that is scalable and effective at providing benefits for general-purpose computing. Our methodology ensures that, as dark silicon increases, so does the fraction of the system execution that is supported by specialized cores, enabling the system performance to scale with increases in the transistor count while staying within the power budget.

Acknowledgments

Portions of this research were funded by the US National Science Foundation under NSF CAREER Awards 06483880 and 0846152, and under NSF CCF Award 0811794.

Bibliography

[ADK+04] Jung Ho Ahn, William J. Dally, Brucek Khailany, Ujval J. Kapasi, and Abhishek Das. Evaluating the Imagine Stream Architecture. In ISCA '04: Proceedings of the 31st Annual International Symposium on Computer Architecture, pages 14–25. IEEE Computer Society, 2004.

[Age] Ageia Technologies. PhysX by Ageia. http://www.ageia.com/pdf/ds_product_overview.pdf.

[Ana] AnandTech. Nvidia introduces dual Cortex A9 based Tegra 2. http://www.anandtech.com/show/2911/2.

[App] Apple. Apple iOS. http://www.apple.com/iphone/ios4/.

[Ars] Arsenal Group, UCSD. Arsenal: Massively heterogeneous multiprocessors. http://cseweb.ucsd.edu/~gvenkatesh/ArsenalWebPage/ArsenalWebPage/Arsenal.html.

[ATI] ATI website. http://www.ati.com.

[Aut] AutoPilot by AutoESL Design Technologies, Inc. http://www.autoesl.com/products.html.

[C-t] C-to-Silicon Compiler by Cadence, Inc. http://www.cadence.com/products/sd/silicon_compiler/pages/default.aspx.

[CBC+05] Nathan Clark, Jason Blome, Michael Chu, Scott Mahlke, Stuart Biles, and Krisztian Flautner. An architecture framework for transparent instruction set customization in embedded processors. In ISCA '05: Proceedings of the 32nd Annual International Symposium on Computer Architecture, pages 272–283. IEEE Computer Society, 2005.

[CFR+89] R. Cytron, J. Ferrante, B. K. Rosen, M. N. Wegman, and F. K. Zadeck. An efficient method of computing static single assignment form. In POPL '89: Proceedings of the 16th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pages 25–35. ACM Press, 1989.

[CHM08] Nathan Clark, Amir Hormati, and Scott Mahlke. Veal: Virtualized execution accelerator for loops. In ISCA ’08: Proceedings of the 35th Annual International Symposium on Computer Architecture, pages 389–400, Washington, DC, USA, 2008. IEEE Computer Society.

[CM08] Philippe Coussy and Adam Morawiec. High-Level Synthesis: from Algorithm to Digital Circuit. Springer Publishing Company, 2008.

[Cod] CodeSurfer by GrammaTech, Inc. http://www.grammatech.com/products/codesurfer.

[DGB+03] James C. Dehnert, Brian K. Grant, John P. Banning, Richard Johnson, Thomas Kistler, Alexander Klaiber, and Jim Mattson. The Transmeta Code Morphing Software: using speculation, recovery, and adaptive retranslation to address real-life challenges. In Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization, CGO '03, pages 15–24, Washington, DC, USA, 2003. IEEE Computer Society.

[DGR+74] R. H. Dennard, F. H. Gaensslen, V. L. Rideout, E. Bassous, and A. R. LeBlanc. Design of Ion-Implanted MOSFET's with Very Small Physical Dimensions. In IEEE Journal of Solid-State Circuits, October 1974.

[DLD+03] William J. Dally, Francois Labonte, Abhishek Das, Patrick Hanrahan, Jung-Ho Ahn, Jayanth Gummaraju, Mattan Erez, Nuwan Jayasena, Ian Buck, Timothy J. Knight, and Ujval J. Kapasi. Merrimac: Supercomputing with streams. In SC '03: Proceedings of the 2003 ACM/IEEE conference on Supercomputing, page 35. IEEE Computer Society, 2003.

[ECF96] Carl Ebeling, Darren C. Cronquist, and Paul Franklin. RaPiD - reconfigurable pipelined datapath. In FPL '96: Proceedings of the 6th International Workshop on Field-Programmable Logic, Smart Applications, New Paradigms and Compilers, pages 126–135. Springer-Verlag, 1996.

[Emb] Embedded Microprocessor Benchmark Consortium. EEMBC benchmark suite. http://www.eembc.org.

[Epp95] Eppstein. Subgraph isomorphism in planar graphs and related problems. In SODA: ACM-SIAM Symposium on Discrete Algorithms (A Conference on Theoretical and Experimental Analysis of Discrete Algorithms), 1995.

[FK98] N. Funabiki and J. Kitamichi. A three-stage greedy and neural-network approach for the subgraph isomorphism problem. In 1998 IEEE International Conference on Systems, Man, and Cybernetics, volume 2, pages 1892–1897, Oct 1998.

[FKDM09] K. Fan, M. Kudlur, G. Dasika, and S. Mahlke. Bridging the computation gap between programmable processors and hardwired accelerators. In HPCA: High Performance Computer Architecture, pages 313–322, Feb. 2009.

[FOW87] Jeanne Ferrante, Karl J. Ottenstein, and Joe D. Warren. The program dependence graph and its use in optimization. ACM Trans. Program. Lang. Syst., 9(3):319–349, 1987.

[Gooa] Google. Android Emulator. http://developer.android.com/guide/developing/tools/emulator.html.

[Goob] Google. Dalvik Virtual Machine. http://developer.android.com/guide/basics/what-is-android.html.

[Gooc] Google. Google Android. http://www.android.com.

[GRSW04] Ed Grochowski, Ronny Ronen, John Shen, and Hong Wang. Best of both latency and throughput. In ICCD '04: Proceedings of the IEEE International Conference on Computer Design (ICCD'04), pages 236–243, Washington, DC, USA, 2004. IEEE Computer Society.

[GSM+99] Seth Copen Goldstein, Herman Schmit, Matthew Moe, Mihai Budiu, Srihari Cadambi, R. Reed Taylor, and Ronald Laufer. PipeRench: A Coprocessor for Streaming Multimedia Acceleration. In ISCA '99: Proceedings of the 26th Annual International Symposium on Computer Architecture, pages 28–39. IEEE Computer Society, 1999.

[GSV+10] Nathan Goulding, Jack Sampson, Ganesh Venkatesh, Michael Bedford Taylor, and Steven Swanson. GreenDroid: A mobile application processor for a future of dark silicon. In Symposium for High Performance Chips (HotChips), August 2010.

[GSV+11] Nathan Goulding, Jack Sampson, Ganesh Venkatesh, Saturnino Garcia, Joe Aurricchio, Po-Chao Huang, Manish Arora, Siddharth Nath, Jonathan Babb, Steven Swanson, and Michael Bedford Taylor. The GreenDroid mobile application processor: An architecture for silicon's dark future. To appear in IEEE Micro, March 2011.

[Gwe10] Linley Gwennap. Sandy bridge spans generations. Microprocessor Report, 2010.

[HQW+10] Rehan Hameed, Wajahat Qadeer, Megan Wachs, Omid Azizi, Alex Solomatnikov, Benjamin C. Lee, Stephen Richardson, Christos Kozyrakis, and Mark Horowitz. Understanding sources of inefficiency in general-purpose chips. SIGARCH Comput. Archit. News, 38:37–47, June 2010.

[HW97] John R. Hauser and John Wawrzynek. Garp: A MIPS Processor with a Reconfigurable Coprocessor. In Kenneth L. Pocek and Jeffrey Arnold, editors, FCCM ’97: IEEE Symposium on FPGAs for Custom Computing Machines, pages 12–21. IEEE Computer Society Press, 1997.

[HWP03] Jun Huan, Wei Wang, and Jan Prins. Efficient mining of frequent subgraphs in the presence of isomorphism. In ICDM ’03: Proceedings of the Third IEEE International Conference on Data Mining, page 549, Washington, DC, USA, 2003. IEEE Computer Society.

[HZH+11] Rui Hou, Lixin Zhang, Michael C. Huang, Kun Wang, Hubertus Franke, Yi Ge, and Xiaotao Chang. Efficient data streaming with on-chip accelerators: Opportunities and challenges. In HPCA 17: International Symposium on High Performance Computer Architecture, Feb 2011.

[IDC] IDC. Smartphones outsell PCs. http://www.readwriteweb.com/archives/smartphones_outsell_pcs.php.

[Kah05] Jim Kahle. The CELL processor architecture. In MICRO 38: Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture, page 3. IEEE Computer Society, 2005.

[KFJ+03] Rakesh Kumar, Keith I. Farkas, Norman P. Jouppi, Parthasarathy Ranganathan, and Dean M. Tullsen. Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction. In MICRO 36: Proceedings of the 36th Annual IEEE/ACM International Symposium on Microarchitecture, page 81. IEEE Computer Society, 2003.

[KTJ06] Rakesh Kumar, Dean M. Tullsen, and Norman P. Jouppi. Core architecture optimization for heterogeneous chip multiprocessors. In Proceedings of the 15th international conference on Parallel architectures and compilation techniques, PACT '06, pages 23–32, New York, NY, USA, 2006. ACM.

[KTMW03] Jason Sungtae Kim, Michael B. Taylor, Jason Miller, and David Wentzlaff. Energy characterization of a tiled architecture processor with on-chip networks. In International Symposium on Low Power Electronics and Design, San Diego, CA, USA, August 2003.

[KTR+04] Rakesh Kumar, Dean M. Tullsen, Parthasarathy Ranganathan, Norman P. Jouppi, and Keith I. Farkas. Single-ISA Heterogeneous Multi-Core Architectures for Multithreaded Workload Performance. In ISCA '04: Proceedings of the 31st Annual International Symposium on Computer Architecture, page 64. IEEE Computer Society, 2004.

[LA04] Chris Lattner and Vikram Adve. LLVM: A compilation framework for lifelong program analysis & transformation. In CGO ’04: Proceedings of the international symposium on Code generation and optimization, page 75. IEEE Computer Society, 2004.

[LCWM08] Michael D. Linderman, Jamison D. Collins, Hong Wang, and Teresa H. Meng. Merge: a programming model for heterogeneous multi-core systems. In ASPLOS XIII: Proceedings of the 13th international conference on Architectural support for programming languages and operating systems, pages 287–296, New York, NY, USA, 2008. ACM.

[MCC+06] Mahim Mishra, Timothy J. Callahan, Tiberiu Chelcea, Girish Venkataramani, Seth C. Goldstein, and Mihai Budiu. Tartan: evaluating spatial computation for whole program execution. SIGOPS Oper. Syst. Rev., 40(5):163–174, 2006.

[MIP09] MIPS Technologies. MIPS Technologies product page, 2008–2009. http://www.mips.com/products/processors/32-64-bit-cores/mips32-24ke.

[MPJ+00] Ken Mai, Tim Paaske, Nuwan Jayasena, Ron Ho, William J. Dally, and Mark Horowitz. Smart memories: a modular reconfigurable architecture. In ISCA '00: Proceedings of the 27th annual international symposium on Computer architecture, pages 161–171. ACM Press, 2000.

[nVi] nVidia website. http://www.nvidia.com.

[OLG+05] John D. Owens, David Luebke, Naga Govindaraju, Mark Harris, Jens Krüger, Aaron E. Lefohn, and Timothy J. Purcell. A survey of general-purpose computation on graphics hardware. In Eurographics 2005, State of the Art Reports, pages 21–51, August 2005.

[Ope] OpenImpact Website. http://gelato.uiuc.edu/.

[PAC+97] David Patterson, Thomas Anderson, Neal Cardwell, Richard Fromm, Kimberly Keeton, Christoforos Kozyrakis, Randi Thomas, and Katherine Yelick. A case for intelligent RAM. IEEE Micro, 17(2):34–44, April 1997.

[RS94] Rahul Razdan and Michael D. Smith. A high-performance microarchitecture with hardware-programmable functional units. In MICRO 27: Proceedings of the 27th annual international symposium on Microarchitecture, pages 172–180. ACM Press, 1994.

[SMSO03] Steven Swanson, Ken Michelson, Andrew Schwerin, and Mark Oskin. WaveScalar. In MICRO 36: Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture, page 291. IEEE Computer Society, 2003.

[SNL+03] Karthikeyan Sankaralingam, Ramadass Nagarajan, Haiming Liu, Changkyu Kim, Jaehyuk Huh, Doug Burger, Stephen W. Keckler, and Charles R. Moore. Exploiting ILP, TLP, and DLP with the Polymorphous TRIPS architecture. In ISCA '03: Proceedings of the 30th Annual International Symposium on Computer Architecture, pages 422–433. ACM Press, 2003.

[SPE00] SPEC. SPEC CPU 2000 benchmark specifications, 2000. SPEC2000 Benchmark Release.

[SVG+11] Jack Sampson, Ganesh Venkatesh, Nathan Goulding, Saturnino Garcia, Steven Swanson, and Michael Bedford Taylor. Efficient complex operators for irregular code. In International Symposium on High-Performance Computer Architecture (HPCA), February 2011.

[Syn] Synphony by Synopsys, Inc. http://www.synopsys.com/Systems/BlockDesign/HLS/Pages/default.aspx.

[Ter] Terence Parr. ANTLR, another tool for language recognition. http://www.antlr.org.

[TH04] Dave A. D. Tompkins and Holger H. Hoos. UBCSAT: An implementation and experimentation environment for SLS algorithms for SAT and MAX-SAT. In SAT, pages 37–46, 2004.

[TLM+04] Michael Bedford Taylor, Walter Lee, Jason Miller, David Wentzlaff, Ian Bratt, Ben Greenwald, Henry Hoffmann, Paul Johnson, Jason Kim, James Psota, Arvind Saraf, Nathan Shnidman, Volker Strumpen, Matt Frank, Saman Amarasinghe, and Anant Agarwal. Evaluation of the Raw Microprocessor: An Exposed-Wire-Delay Architecture for ILP and Streams. In ISCA '04: Proceedings of the 31st Annual International Symposium on Computer Architecture, page 2. IEEE Computer Society, 2004.

[TMAJ08] Shyamkumar Thoziyoor, Naveen Muralimanohar, Jung Ho Ahn, and Norman P. Jouppi. CACTI 5.1. Technical Report HPL-2008-20, HP Labs, Palo Alto, 2008.

[TS05] Lin Tan and Timothy Sherwood. A high throughput string matching architecture for intrusion detection and prevention. In ISCA '05: Proceedings of the 32nd Annual International Symposium on Computer Architecture, pages 112–122. IEEE Computer Society, 2005.

[VAJ+09] Sravanthi Kota Venkata, Ikkjin Ahn, Donghwan Jeon, Anshuman Gupta, Christopher Louie, Saturnino Garcia, Serge Belongie, and Michael Bedford Taylor. SD-VBS: The San Diego Vision Benchmark Suite. IISWC, 0:55–64, 2009.

[VSG+10] Ganesh Venkatesh, Jack Sampson, Nathan Goulding, Saturnino Garcia, Vladyslav Bryksin, Jose Lugo-Martinez, Steven Swanson, and Michael Bedford Taylor. Conservation cores: reducing the energy of mature computations. In ASPLOS '10: Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems, pages 205–218, New York, NY, USA, 2010. ACM.

[VSG+11] Ganesh Venkatesh, Jack Sampson, Nathan Goulding, Steven Swanson, and Michael Taylor. Quasi-ASICs: Trading area for energy by exploiting similarity in synthesized cores for irregular code. UCSD Technical Report CS2011-0964, March 2011.

[WCC+07] Perry H. Wang, Jamison D. Collins, Gautham N. Chinya, Hong Jiang, Xinmin Tian, Milind Girkar, Nick Y. Yang, Guei-Yuan Lueh, and Hong Wang. EXOCHI: architecture and programming environment for a heterogeneous multi-core multithreaded system. In PLDI '07: Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation, pages 156–166, New York, NY, USA, 2007. ACM.

[Wil04] Ron Wilson. Tehuti finds middle ground for network offloading. EETimes, December 2004. http://www.eetimes.com/showArticle.jhtml?articleID=55300813.

[WKMR01] Albert Wang, Earl Killian, Dror Maydan, and Chris Rowen. Hardware/software instruction set configurability for system-on-chip processors. In DAC '01: Proceedings of the 38th conference on Design automation, pages 184–188. ACM Press, 2001.

[WOT+95] Steven Cameron Woo, Moriyoshi Ohara, Evan Torrie, Jaswinder Pal Singh, and Anoop Gupta. The splash-2 programs: characterization and methodological considerations. In ISCA, pages 24–36, New York, NY, USA, 1995. ACM.

[WWA01] Lisa Wu, Chris Weaver, and Todd Austin. CryptoManiac: A fast flexible architecture for secure communication. In ISCA '01: Proceedings of the 28th Annual International Symposium on Computer Architecture, pages 110–119. ACM Press, 2001.

[YGBT09] S. Yehia, S. Girbal, H. Berry, and O. Temam. Reconciling specialization and flexibility through compound circuits. In HPCA 15: High Performance Computer Architecture, pages 277–288, Feb. 2009.

[YMHB00] Zhi Alex Ye, Andreas Moshovos, Scott Hauck, and Prithviraj Banerjee. CHIMAERA: A High-Performance Architecture with a Tightly-Coupled Reconfigurable Functional Unit. In ISCA '00: Proceedings of the 27th Annual International Symposium on Computer Architecture, pages 225–235. ACM Press, 2000.