RETUNES: Reliable and Energy-Efficient Network-on-Chip Architecture using Adaptive Routing and Approximate Communication

A thesis presented to the faculty of the Russ College of Engineering and Technology of Ohio University

In partial fulfillment of the requirements for the degree Master of Science

Padmaja Bhamidipati May 2019

© 2019 Padmaja Bhamidipati. All Rights Reserved. 2

This thesis titled RETUNES: Reliable and Energy-Efficient Network-on-Chip Architecture using Adaptive Routing and Approximate Communication

by PADMAJA BHAMIDIPATI

has been approved for the School of Electrical Engineering and Computer Science and the Russ College of Engineering and Technology by

Avinash Karanth Professor of Electrical Engineering and Computer Science

Dennis Irwin Dean, Russ College of Engineering and Technology 3 Abstract

BHAMIDIPATI, PADMAJA, M.S., May 2019, Electrical Engineering RETUNES: Reliable and Energy-Efficient Network-on-Chip Architecture using Adaptive Routing and Approximate Communication (92 pp.) Director of Thesis: Avinash Karanth As the number of processing cores are increasing in a chip multiprocessor (CMP), demand for an energy-efficient and reliable Network-on-Chip (NoC) architecture is increasing. However, energy consumption of NoC continues to increase with the exponential growth in CMPs. Voltage scaling techniques such as Dynamic Voltage and Frequency Scaling (DVFS) and Near Threshold Voltage (NTV) scaling have been proposed to reduce the energy consumption of NoC by scaling the operating voltage and frequency in proportion to the application demand. Apart from DVFS and NTV scaling, recently, approximate communication has been proposed to boost the power savings and reduce latency in NoC for the applications that are not sensitive to imprecise results within an acceptable variance. As transistor technology is scaling down to a few nanometers, aging effects such as Hot Carrier Injection (HCI) and Negative Bias Temperature Instability (NBTI) are increasing which worsens the reliability. Scaling down the transistor size along with the supply voltage increases the susceptibility of NoC to soft errors. Faults and disturbances due to aging and voltage scaling causes serious degradation in reliability of NoC. In this thesis, I propose RETUNES - Reliable and energy-efficient NoC design, where power efficient and fault tolerant architecture is modeled without compromising on the performance of NoC. The energy-efficient part of RETUNES is a five voltage/frequency design that includes NTV for high energy gains. The five voltage modes are switched according to the workload for high energy-efficiency and minimum network congestion in NoC. Energy efficiency of RETUNES is further improved by employing approximate 4 communication throughout the execution of application within tolerable error range. The reliability part of RETUNES introduces a hybrid error correction model to handle the faults observed due to aging, voltage scaling, and temperature. In addition to error correction and detection, RETUNES handles uneven aging in NoC which is caused by uneven distribution of traffic. Adaptive routing algorithm is modeled to even out the non-uniform device wear-out and thereby, minimize the impact of aging in NoC. RETUNES decreases power consumption and threshold voltage variation (∆Vth) during low network load with high reliability and increases the network performance during high network load with reduced reliability. Simulation results of RETUNES demonstrated nearly 2.5 × total power savings and 3 × improvement in Energy-Delay Product (EDP) of NoC for Splash-2 and PARSEC benchmarks on a 4 × 4 concentrated mesh architecture. Simulation results also showed 13% decrease in the energy consumption of NoC, 10% decrease in latency, and 19% EDP improvement by incorporating approximate communication technique. 5 Dedication

I would like to dedicate this thesis to my wonderful husband and my family, this would not have been possible without their support. 6 Acknowledgments

This work was partially supported by National Science Foundation (NSF) grants CCF- 1420718, CCF-1513606 and CCF-1703013. Firstly, I would like to thank Dr. Avinash Karanth for his support and direction. I would also like to thank my committee members Dr. Kaya, Dr. Stinaff, and Dr. Chenji for their feedback and information. 7 Table of Contents

Page

Abstract...... 3

Dedication...... 5

Acknowledgments...... 6

List of Tables...... 9

List of Figures...... 10

List of Acronyms...... 13

1 Introduction...... 15 1.1 Network-on-Chip...... 17 1.2 Energy Efficiency...... 18 1.2.1 Voltage Scaling...... 20 1.2.2 Approximate Computing...... 22 1.3 Reliability...... 24 1.3.1 Effects of Voltage Scaling and Temperature...... 24 1.3.2 Aging Effects...... 25 1.3.3 Error Mitigation...... 29 1.4 Major Contributions...... 32 1.5 Organization of Thesis...... 34

2 RETUNES: Reliable and Energy-Efficient Network-on-Chip...... 35 2.1 Prior Work...... 35 2.2 RETUNES Architecture...... 40 2.2.1 Energy Efficiency (EE-Layer)...... 41 2.2.1.1 Voltage Scaling...... 41 2.2.1.2 Approximate Communication...... 45 2.2.2 Reliability (R-Layer)...... 49 2.2.2.1 Unified Reliability Model...... 49 2.2.2.2 Encoding Framework...... 50 2.2.2.3 Adaptive Routing...... 55 2.3 Centralized Control Unit...... 57 8

3 Performance Evaluation...... 62 3.1 RETUNES Evaluation Approach...... 64 3.2 RETUNES Results...... 64 3.2.1 Power and Area Overhead Analysis...... 65 3.2.2 Packet Latency Analysis...... 66 3.2.3 Lifetime Evaluation...... 67 3.2.4 Reliability Analysis...... 70 3.2.5 Energy-Delay Product...... 71 3.3 Approximate Communication Evaluation...... 72 3.3.1 Packet Latency Analysis...... 73 3.3.2 Power and Energy Analysis...... 75 3.3.3 Energy-Delay Product (EDP) Analysis...... 76

4 Conclusions and Future Work...... 79

References...... 81 9 List of Tables

Table Page

2.1 Traffic load (Flits/cycle), temperature (Celsius) and delay overhead (cycles) calculated for the corresponding voltage modes of RETUNES...... 45

3.1 Applications used in the design...... 63 10 List of Figures

Figure Page

1.1 Microprocessor trend for the past four decades where frequency gains and single-thread performance no longer provide sufficient gains [Rup18]...... 16 1.2 Common NoC topologies...... 18 1.3 Router microarchitecture with cross bar and router pipeline stages (left) and Network Interface which serves as a connection between router and its cores (right)...... 19 1.4 Maximum Energy Point (MEP) that is observed in the NTV region[Yu]..... 22 1.5 Variation of delay and energy with operating voltage at super, near, and sub threshold voltage regions [Mit15]...... 23 1.6 HotSpot thermal map of the traffic flow in NoC where utilization is shown as temperature raise...... 26 1.7 Threshold voltage shift (∆Vth) due to NBTI and HCI effect at different temperatures (a) and different supply voltages (b)...... 28 1.8 Transmission and Re-transmission in communication network between source and destination...... 30

2.1 Reconfigurable NoC architecture based on the network traffic [CPK+13]..... 37 2.2 Control device architecture with router and layer controllers to switch NoC voltage levels [RJCR16]...... 39 2.3 Percentage of buffer utilization at different simulation time (cycles) for blacksholes (left) and LU (right) applications...... 42 2.4 Traffic pattern of blackscholes application at different epochs to determine epoch size for RETUNES...... 43 2.5 Figure shows the flow of original image read from the Memory Control Unit (MCU) and approximated JPEG image sent back to the Memory Control Unit (MCU)...... 47 2.6 Shows the approximation performed on 10bit data, where ’d’ represents the number of duplicates following a digit...... 48 2.7 JPEG encoder, Memory Control Unit (MCU) and approximating core mapped on NoC...... 48 2.8 Unified fault model showing error range separately for threshold voltage variation (∆Vth) and bit errors observed in RETUNES...... 51 2.9 Flowchart shows appropriate encoding layer (e2e or s2s) used in RETUNES for different error ranges (Ne,Fe,Me)...... 52 2.10 RETUNES switch-to-switch encoding layer microarchitecture showing encoder and decoder of R-layer along with the router pipeline stages...... 53 2.11 RETUNES end-to-end encoding layer microarchitecure showing encoder and decoder of R-layer at the Network Interface (NI)...... 54 11

2.12 map of the single router explaining five directions of the router: four links (x, -x, y, -y) of the router connecting adjacent routers and a link for the core.... 56 2.13 RETUNES routing algorithm (adaptive routing algorithm) to determine the path of the packet...... 57 2.14 Graph of threshold voltage change for different supply voltage which shows that lowering the supply voltage slows down the threshold voltage (Vth) change 58 2.15 RETUNES Centralized Control Unit (CCU) showing voltage regulator, CCU micro architecture, and control sequence between CCU and a core...... 59 2.16 Design of global on-chip voltage regulator for NoC in RETUNES...... 60 2.17 RETUNES mode control algorithm...... 61

3.1 Methodology for evaluating RETUNES performance. Showing evaluation flow of the approximate communication (orange), all others (blue), and end results (green and gray)...... 65 3.2 Total dynamic power cost for Splash-2 and PARSEC benchmarks of 64 core NoC when operated in four proposed schemes. Lower is better...... 66 3.3 Area overhead of the decoder, encoder and router for CRC and Hamming code used in s2s and e2e encoding designs...... 67 3.4 Normalized average packet latency (normalized to baseline model - Always- NTV (XY)) for Splash-2 and PARSEC benchmarks of 64 core NoC when operated in four proposed schemes. Blue shows latency cost without reliability. Orange shows reliability cost. Lower is better...... 68 3.5 Threshold voltage change (∆Vth) due to voltage scaling , elevated temperature, and aging at 5 different supply voltages. Lower is better...... 69 3.6 Comparing HotSpot thermal map of Always-STV under xy-routing and RETUNES (V5 scheme) under adaptive routing. RETUNES shows uniform and lower device temperatures when compared to Always-STV under XY- routing...... 70 3.7 Bit error rate observed in RETUNES due to voltage scaling and aging...... 71 3.8 Normalized Energy Delay Product (EDP) (normalized to baseline model - Always-NTV (XY)) for Splash-2 and PARSEC bench-marks for four proposed schemes. Blue shows EDP without reliability. Orange shows reliability cost. Lower is better...... 72 3.9 Comparing the original image (left) with compressed NN image at different error percentage (right)...... 74 3.10 Comparing the original image (left) with compressed NN image at different error percentage (right.)...... 75 3.11 Normalized average packet latency (normalized to baseline model - Always- NTV) for both original and approximated image for AxBench benchmarks of 64 core NoC when operated in all the proposed schemes. Lower is better.... 76 3.12 Normalized Dynamic power(normalized to - Always-STV (XY)) for both original and approximated image for AxBench benchmarks of 64 core NoC when operated in all the proposed schemes. Lower is better...... 77 12

3.13 Normalized Dynamic energy (normalized to Always-NTV scheme original image of 2.6% error rate) for both original and approximated image for AxBench benchmarks of 64 core NoC when operated in all the proposed schemes. Lower is better...... 78 3.14 Normalized EDP (normalized to Always-NTV scheme) for both original and approximated image for AxBench benchmarks of 64 core NoC when operated in all the proposed schemes.Lower is better...... 78 13 List of Acronyms

ADP - Adaptive routing BCH - Bose, Chaudhuri, and Hocquenghem BW - Buffer Write CHE - Channel Hot Electron CMesh - Concentrated Mesh CMPs - chip multiprocessors CRC-32 - 32-bit Cyclic Redundancy Check DAHC - Drain Avalanche Hot Carrier DBAR - Destination Based Adaptive Routing DCT - Discrete Cosine Transform DFT - Discrete Fourier Transform DLP - Data Level Parallelism DOR - Dimensional Order Routing DVFS - Dynamic Voltage and Frequency Scaling DVS - Dynamic Voltage Scaling e2e - end-to-end ECC - Error-Correcting Code EE-layer - Energy Efficiency layer FCS - Frame Check Sequence Fe - Few errors GPUs - Graphic Processor Units HCI - Hot Carrier Injection ILP - Instruction Level Parallelism MBUs - Multiple Bit Upsets MCU - Memory Control Unit Me - More errors MEP - Maximum Energy Point - Metal-Oxide-Semiconductor Field-Effect Transistors NBTI - Negative Bias Temperature Instability Ne - No errors NI - Network Interface NN - Neural Network NoC - Network-on-Chip NPUs - Neural Processing Units NTV - Near Threshold Voltage PTM - Predictive Technology Model RC - Routing Computation R-D - Reaction-Diffusion s2s - switch-to-switch SA - Switch Allocation 14

SBUs - Single Bit Upsets SECDED - Single Error Correction and Double Error Detection SEUs - Single Event Upsets SGHE - Secondary Generated Hot Electron SHE - Substrate Hot Electron SIMD - Single Instruction Multiple Data SIMT - Single Instruction Multiple Threads SMT - Simultaneous Multithreading ST - Switch Traversal STV - Super Threshold Voltage TLP - Thread Level Parallelism TMR - Triple Modular Redundancy VA - Virtual Channel Allocation WMS - Wear-out Monitoring System XY - XY-routing 15 1 Introduction

1 Growing need for faster and more power-efficient computing systems has increased the demand for chip multiprocessors (CMPs) [ONH+96][NO97]. Researchers have focused on improving the clock speed of the processor in order to improve the throughput and execution time [Pen17]. As clock speed is improved by increasing the switching speed of the transistor, supply voltage to the transistor is also scaled up proportionally. However, leakage power and excess heat due to higher supply voltage limited the clock speed of the processors to 2-4 GHz in early 2000s. To continue the improvements in performance without increasing supply voltage of the processor, parallel processing techniques such as Instruction Level Parallelism (ILP), Data Level Parallelism (DLP), Thread Level Parallelism (TLP), etc. have been proposed [YS06], [SDM10]. ILP executes independent instructions in parallel by increasing the depth of the pipeline to fit multiple instructions or by duplicating the components so that multiple instructions can be executed in a single cycle. In a basic block of 10 instructions long, parallelism is typically limited to 3 to 4 instructions on an average [Wal91]. Hence, ILP is limited by the amount of parallelism that can be exploited in any portion of the program. DLP performs similar functions/operations on different data simultaneously. Single Instruction Multiple Data (SIMD), Single Instruction Multiple Threads (SIMT) machines, and most recently Graphic Processor Units (GPUs) exploit DLP. TLP executes multiple threads or instructions concurrently such as Simultaneous Multithreading (SMT) [[LEL+97], [[EEL+97]. However, single core processor with multiple threads is limited by high energy consumption and low performance gains even with SMT. Figure 1.1 shows the microprocessor trends for the past four decades where frequency gains and single-thread performance no longer provide sufficient gains. Hence, chip multiprocessors (CMPs)

1 Some material including figures, sentences, and paragraphs are used verbatim from prior publications [BK18] with permission ©2018 IEEE 16

Figure 1.1: Microprocessor trend for the past four decades where frequency gains and single-thread performance no longer provide sufficient gains [Rup18].

have evolved to resolve energy-performance tradeoff of the uni-processor architecture. In CMPs, multiple processing cores are integrated on a single chip where all the cores work simultaneously to improve energy efficiency and execution speed of the application. With recent advances in silicon technology, multiple cores (with greater number of transistors) are being housed on a chip; for example, Intel Knight’s Landing has 72-cores [Sod15], PEZY Super Computer has 1024 cores [TKT+16] and TILE-Gx has 100-cores [Ram11]. To handle this continuous increase in the number of cores, a highly scalable and flexible interconnection network is crucial. Most common inter-core communication topologies such as buses, rings or crossbars have been proposed for multicore chips to connect its cores together as shown in Figure 1.2. Bus topology is the basic network topology which interconnects all its cores to share a single medium (bus) to broadcast all information as shown in the Figure 1.2.b . Although bus topology is simple and linearly scales with respect to the number of cores (N), the 17 performance (speed, bandwidth) of the topology decreases with increase in the number of cores due to serial access of the bus. In the ring topology all the cores are connected in the form of a ring network where each core is connected to its two neighboring cores as shown in the Figure 1.2.c. During high network load, the ring topology performs better than the bus topology due to several simultaneous communication occurring on several links. However, ring topologies suffer from high number of hops when the size of the network is scaled. Crossbar topology has grid of switching points where data from one core reaches the other by traversing a single switch point. Crossbar with N cores has N2 switching points, which allows N simultaneous communication. Cost and complexity of the crossbar is high as the number of switching points scales to O(N2).

1.1 Network-on-Chip

Network-on-Chip (NoC) is a modular switching fabric that interconnects multiple processing cores in CMPs using routers and links. Different topologies are determined in NoC depending on the connectivity pattern of the routers as shown in Figure 1.2. The most popular topology is the 2D mesh where routers are connected using links forming a grid as shown in 1.2.d. 2D mesh topology is power and area efficient due its low radix count and shorter links. 2D mesh has large number of link traversals that increases energy consumption in the large-scale systems. Figure 1.2.e shows the torus topology where each router has four links connected to it. Even though torus topology decreases the number of link traversals, the wrap around links consume extra power and area [BD14a]. Figure 1.2.f shows a concentrated mesh topology (CMesh) where each router has multiple cores connected to it (4 in this case) [BD14b]. CMesh is preferred over mesh topology as the former reduces the latency and average hop count due to higher concentration [BD14a]. With the increase in the concentration of CMesh topology, factors such as latency and contention might also increase. 18

Figure 1.2: Common NoC topologies.

NoC consists of 3 main components- (1) router- the component that handles the communication protocol, (2) link - a physical connection between routers for communication, and (3) Network Interface (NI) - the component that makes logical connections to the core. In NoC, data is communicated in the form of packets where each packet is divided into flits and is transmitted from one router to another using links. Figure 1.3.a shows router microarchitecture with cross bar and router pipeline stages where the 5 router pipeline stages are elaborated in chapter 2. Figure 1.3.b shows the Network Interface which serves as a connection between router and its cores.

1.2 Energy Efficiency

Dynamic and static power are the two main sources for power dissipation in NoC. Dynamic power is dissipated when the transistor is active and switches the bit from logic 1 to 0 or 0 to 1 whereas, static power is dissipated due to leakage current when the 19

Figure 1.3: Router microarchitecture with cross bar and router pipeline stages (left) and Network Interface which serves as a connection between router and its cores (right).

transistor is not active. Total system power is the sum of the dynamic power and the static power dissipated in a system. Links and routers of an interconnection network are the most power hungry components, where links alone consume nearly 50% of overall chip power on an average [SP03b][FAA08][AAKL+10]. Hence, researchers have focused on improving the energy efficiency of the interconnection networks using energy proportional computing designs. Energy proportionality is a technique that regulates the power consumption of the circuit proportional to the utilization. For example, in NoC when there is low communication demand, the overall power consumption of the network is minimum, whereas at high communication demand, the overall power consumption of the network is maximum. Recently, energy meters are introduced to monitor the power consumed by the subsystems such as processors and memory at the system-level 20

[RRS+14][WJK+12]. Intel Sandy Bridge processor uses Running Average Power Limit (RAPL) interface to limit the power consumption of the subsystems with a time resolution of 1ms [RRS+14]. RAPL allows users to set the power limit and time frame, which reports the performance impact according to the power limit set by the user. Hence, implementing a system that tracks utilization levels to trigger the resource availability might reduce the inefficiencies due to over availability of the resources. Voltage Scaling, Dynamic Voltage and Frequency Scaling (DVFS) [Iru15][BGL12][LND+05][MD+09a], Near Threshold Voltage (NTV) Scaling [MS+16][KJ13][Mit15], Data approximation [Mit16][MBJ14a][BHM+17], routing algorithms [BCR12b][BCR12a], data encoding and decoding techniques [PFAC09][JPKAK14], and power gating [BJS+14][NSB16] [CYZ13] are some of the advanced and popular energy proportionality designs.

1.2.1 Voltage Scaling

Technology scaling down to sub-nanometer combined with an exponential increase in the number of transistors that can be integrated at the on-chip level, has resulted in drastic increase in power density of multicore architectures. As supply voltage and operating frequency directly influence the dynamic power of the transistor according to the equation [1.1], low voltage operations achieve high energy-efficiency [EE11].

2 Pdynamic = CVdd f α (1.1)

Power management techniques such as Dynamic Voltage Scaling (DVS) and Dynamic Voltage and Frequency Scaling (DVFS) have been employed to improve energy efficiency of NoC by scaling down the operating voltage. In DVS, supply voltage (Vdd) of NoC is scaled at runtime while setting the operating frequency low enough to execute the application. During low NoC workload, supply voltage is decreased to eliminate the unnecessary power consumption and at high workload, supply voltage is increased to avoid 21 congestion [PLS01]. Although, DVS saves power by scaling the voltage, it uses single operating frequency that increases the latency of NoC at high workloads. DVFS is another approach to scale the supply voltage along with the operating frequency to improve latency in NoC. In DVFS, supply voltage and frequency are scaled at runtime while reducing power consumption of NoC up to 20-26% [MD+09a][EE11]. DVFS can be applied to different components of NoC individually at different levels of granularity making it more energy- efficient. However, voltage region close to transistor threshold voltage is unexplored as standard DVFS technique can scale the supply voltage only up to 70% of the normal voltage [Iru15][BGL12][LND+05][MD+09a]. Near Threshold Voltage (NTV) Scaling: Near Threshold Voltage (NTV) scaling is an advance power management technique that operates devices with supply voltage close to

the transistor threshold voltage (Vth). NTV region exhibits minimum energy consumption at tolerable latency as shown in the Figure 1.4[GT18]. The minimum energy point shown in the Figure 1.4 is in the optimal supply voltage range that is observed in the NTV region where static and dynamic energy consumption are low. As the supply voltage of NoC is reduced below NTV region, static power consumed by the network dominates the energy savings. Similarly, if supply voltage of NoC is above NTV region, dynamic power consumption increases which in turn increases the overall power consumption of NoC. NTV scaling increases the energy efficiency of the network by more than 5x when the operating voltage scaled by more than 25% of the normal supply voltage [DWB+10] [KJ13]. Supply voltage in DVFS is scaled less than 75% which makes it less power efficient than NTV [Mit15]. Figure 1.5 shows the variation of delay and energy with operating voltage. Clearly, as the supply voltage increases from subthreshold region to near threshold region, delay is reduced by ≈ 50 to 100× and energy consumption is increased by just ≈2×. As we move away from near threshold region to super threshold region, delay is reduced by ≈10× at the expense of huge energy consumption (≈10×). 22

Figure 1.4: Maximum Energy Point (MEP) that is observed in the NTV region[Yu].

Although highly energy-efficient, NTV scaling has several performance challenges. Lower supply voltages lead to low operating frequencies which stall the flow of traffic in NoC increasing the critical path delay. Increased delay develops congestion in the network which might eventually lead to packet loss reducing the performance of NoC. Recently, researchers have focused on mitigating the performance and delay penalties of NTV scaling while improving energy-efficiency for low-throughput applications.

1.2.2 Approximate Computing

Approximate computing is an approximation technique to improve the energy efficiency and performance of NoC which is gaining popularity among the industries. Recently, researches have implemented approximate computing technique in various fields such as machine learning, fluid dynamics, video processing, image recognition, financial analysis, database search, and many more where, applications can compromise on the quality of the computed result [BHM+17][MBJ14a][Mit16]. This technique 23

Figure 1.5: Variation of delay and energy with operating voltage at super, near, and sub threshold voltage regions [Mit15].

makes use of low-sensitive nature of the application while balancing power and latency tradeoff. Approximate computing allows applications to use real world data, producing imprecise results for high throughput and power savings. Previous works have explored approximate computing in designing approximate circuits, software and architectural modifications [Mit16] which are inherently resilient to output errors. More recently, approximate communication has been proposed in which data between two processing cores is approximated to further reduce the cost of communication [BKS+18a][Mit16]. There are two main methods by which the communication cost can be reduced: 24

• Reducing the size of the packet to be transmitted.

• Reducing the number of packets to be transmitted to communicate a message.

Previous works also showed that the communication efficiency can be improved by incorporating current approximate computing techniques that have a potential to reduce the communication overhead [BKS+18a].

1.3 Reliability

Reliability is a main concern in NoC apart from energy efficiency and is degraded when NoC is susceptible to faults and disturbances. A fault can be determined as a cause of deviation from the desired operation of the system (error). The faults are mainly categorized into two types - transient faults and permanent faults [PNK+06][FLJ+13] [MG13]. Transient faults are non-catastrophic and occur due to soft errors. A soft error is noise that is induced due to radiation, electromagnetic interference/noise corrupting the data bits. Soft errors such as timing errors are more prominent at low supply voltage and operating frequencies. Single Event Upsets (SEUs) such as Single Bit Upsets (SBUs) and Multiple Bit Upsets (MBUs) are the soft errors occurred due to corruption of single or multiple bits [LD+], disturbing logic by flipping bits (0 or 1). Permanent faults are catastrophic and occur due to hard errors. Hard errors effect the device functionality causing an irreversible damage. Aging effect is the main cause of hard errors that leads to failures in link and router of NoC. Reliability of NoC is affected by voltage scaling, elevated temperatures, and aging which will be explained in the subsections below.

1.3.1 Effects of Voltage Scaling and Temperature

Aggressively scaling the supply voltage with transistor technology increases Single Event Upsets (SEUs) in transistor. Supply voltage and operating frequency are the two important parameters effecting the probability of SEUs [LD+]. Single Event Upsets 25

(SEU) are soft errors occurred due to alpha particles, cosmic rays, and thermal neurons, flipping binary bits (0 or 1) that results in logic errors in the transmitting data [MT03a] [MT03b]. Operating transistor at low supply voltage further increases the probability of logic failures, making it unreliable. Memory cells are highly vulnerable to SEU due to their low voltage margins. As the capacitance inside the memory cells decreases with the transistor technology, the minimum capacitance charge necessary to hold/retain the information decreases. Therefore, the charge required to switch the bit (0 to 1 or 1 to 0) decreases causing soft errors. On the other hand, elevated temperatures also have adverse effects on reliability of NoC. Temperature and utilization are directly proportional to each other and uneven utilization caused due to elevated temperatures increases aging effects in NoC [SA+14] [KZBH13]. Uneven utilization of the router or link increases if more packets take the same route overusing links and routers in that path. Higher concentration of traffic for an extended period in an area on a chip generates hotspots (high temperature region) and eventually creates open circuits in devices. Figure [1.6] shows the thermal map of the packet transmissions where utilization is shown as temperature raise.

1.3.2 Aging Effects

Aging is a physical phenomenon where performance of the transistor is degraded over time due to high usage. In any network, certain transistors age faster than the others due to increase in their usage and eventually fail to work as intended. This uneven aging causes serious communication failures in the network reducing lifetime of the transistor. The most common factors effecting the lifetime of the transistor are fabrication methods, properties

of the materials, and parameters such as supply voltage (Vdd) and temperature. Aging cannot be mitigated but can be slowed down by controlling the device utilization. In order to increase the lifetime of the circuit, it is important to find the root cause of age degradation 26

Figure 1.6: HotSpot thermal map of the traffic flow in NoC where utilization is shown as temperature raise.

and to design and develop a model which can tune the parameters that cause aging such as operating voltage and temperature. Negative Bias Temperature Instability (NBTI) and Hot Carrier Injection (HCI) are the temporal unreliability issues causing circuit failures [YYHC11]. NBTI and HCI are time dependent effects caused due to conditions such as switching activity, operating voltage, and temperature [MG13][OS95][KVGS13]. Negative Bias Temperature Instability (NBTI): Metal-Oxide-Semiconductor Field- Effect Transistors (MOSFETs) suffer from reliability degradation due to Negative Bias Temperature Instability (NBTI) effect which changes the threshold voltage and decreases drain current. NBTI causes silicon-hydrogen bonds to dissociate giving raise to interface traps in PMOS transistors. Subatomic charge particles such as electrons and holes occupy these electrically active interface traps contributing to the change in threshold voltage. Interface traps decrease mobility and drain current which in turn increases the threshold voltage effecting critical path delay. Reaction-Diffusion (R-D) model is a technique that analytically models the effect of stress and relaxation phase in NBTI [OS95]. There are two 27 different phases in capturing the interface traps using R-D model: Phase1- In this phase, the reaction caused due to release of hydrogen because of the interface traps occurred at the SiO2 and Si interface is linearly dependent on the stress time. Phase2- in this diffusion phase, the hydrogen which is released from phase 1 diffuses for a short period time.This diffusion is time dependent as shown below:

tn , for neutral hydrogen n= 0.25 [BWV+06]

As the supply voltage increases, electric field across the junction of the transistor increases which in turn increases the negative bias of the PMOS transistor. This elevates the stress levels causing degradation in NBTI [OS95]. Similarly, high temperatures disassociate Si-Hydrogen bonds causing NBTI [KZBH13][YYHC11] citeaging-error- main. Hence, the transistor lifetime is interrelated with the operating voltage and device temperature. Researchers focused on implementing new techniques to reduce the timing errors caused due to NBTI [CSGK11][KBW+14]. Hot Carrier Injection (HCI): Hot Carrier Injection is a phenomenon caused due to the hot carriers in a transistor. Hot carriers are the subatomic particles that acquire very high kinetic energy due to high electric fields. These carriers get injected into an unintended region (gate oxide) due to change in their trajectory caused by the high electric fields. When these charged particles enter a region such as gate oxide, they get trapped creating defects [KVGS13]. HCI causes change in threshold voltage, current factor, and conductance. As the electrons are hotter than the holes, HCI effect is more prominent in NMOS transistors. Drain Avalanche Hot Carrier (DAHC), Substrate Hot Electron (SHE), Channel Hot Electron (CHE), and Secondary Generated Hot Electron (SGHE) are 4 different injection mechanisms where HCI effect is observed [MG13]. Traps are higher when gate voltage and drain voltage are approximately equal in CHE mechanism. SHE is caused due to high positive/negative bias of the transistor. DAHC and SGHE are 28 caused due to the ionization impact triggering SGHE after DAHC. Figure [1.7] shows the

(a) ∆Vth at different temperatures.

(b) ∆Vth at different supply voltages.

Figure 1.7: Threshold voltage shift (∆Vth) due to NBTI and HCI effect at different temperatures (a) and different supply voltages (b).

threshold voltage shift (∆Vth) due to NBTI and HCI effects for a 45nm technology node at different temperatures (Figure [1.7.a]) and operating voltages (Figure [1.7.b]) for over 10 years. Hence, from the Figure [1.7.b], we can observe that reducing the supply voltage reduces the threshold voltage shift. However, aggressively scaling down the supply voltage 29 close to sub-threshold region shows adverse effect on the performance and the switching energy in NoC. Non-uniform aging and generation of hotspots leads to faults effecting the performance of NoC. In order to handle the errors due to low supply voltage and aging, a good fault handling model is essential

1.3.3 Error Mitigation

According to ITRS, reliability has become a major concern as the lifetime of the transistor decreases with the increase in the temperature [ITR15]. As the transistor ages, change in threshold voltage increases due to interface traps and fluctuations in the charge density where, Vth variation above 10% leads to permanent faults in the circuit. In addition to uneven aging, faults due to voltage scaling is another concern in NoC. As proposed in [LD+], decrease in the supply voltage increases the bit error rate due to decrease in capacitance of the memory cell. As a result, low operating voltages increase the susceptibility of the device to faults which in turn compromises the reliability [LD+] [Mit15]. As NoCs are the only means for data transfer between the cores, faults and disturbances such as Single Bit Upsets (SEUs), Multiple Bit Upsets (MBUs), process variations and so on have become major challenges with NTV scaling. To prevent serious reliability degradation, strong error correction techniques and packet routing algorithms are crucial for NoC to continue to operate reliably. This increase in transient and permanent faults due to low supply voltage and uneven aging provides motivation to incorporate dynamic error handling techniques in NoC. In this subsection, fault handling techniques such as retransmission and error correction and detection schemes (for faults due to low supply voltage/frequency) and routing algorithms (for faults due to uneven aging) are discussed. Retransmission scheme: Every communication network has a source node and a destination node where source transmits data to the destination as shown in the Figure 1.8. 30

Figure 1.8: Transmission and Re-transmission in communication network between source and destination.

In case of error, destination node requests retransmission of data such that the source node then repeats the transmission. Various error detection schemes such as hamming code and Cyclic Redundancy Check (CRC) are used to detect error at the destination node [GR09]. Retransmission scheme is highly energy-efficient with low area overhead. However, this scheme is limited by increased delay and network load. In order to handle permanent faults in retransmission scheme, data should be rerouted to avoid failed nodes, which further increases the delay and the network load. Error correction scheme is another way of handling faults, with less latency and network load than that of retransmission scheme. Error Detection and Correction Scheme: Hamming distance is the difference in number of bits between two code-words in a block code. Minimum hamming distance is used to detect N errors if hamming distance between the code words is at least N+1. Error detection part of the hamming code is highly energy-efficient when compared to error correction part, as the minimum hamming distance of error detection capability is less than the error correction capability. 32-bit Cyclic Redundancy Check (CRC-32) is an advanced error detection scheme used in digital network and storage devices. Error detection in CRC is implemented by comparing the Frame Check Sequence (FCS) between the received data and the original data stored. FCS is an error detection check value obtained by performing division between the binary data to be the transmitted and the generator polynomial. The 31 reminder of this division is the error detection value (FCS). If the data is corrupted, FCS of received data and FCS of the original data are different. Similarly, if the data is not corrupted, FCS of received data and FCS of the original data are equal. Hamming code, BCH (Bose, Chaudhuri, and Hocquenghem) code, and Reed- Solomon code are the most common error-correcting codes (ECC). In NoC, Hamming code is the popular and energy-efficient error correcting code [SS05]. Hamming code can detect and correct one error bit from the received binary data if the hamming distance is 3, whereas, with hamming distance of 4, it can perform single error correction and double error detection. In order to avoid data loss in hamming code, extra bits called redundant bits are added to the data while transmission. Parity bit is another type of extra bit added to the data in hamming code that plays an important role in detecting error. If the parity value of the transmitted data changes, then it is considered to be corrupted. Bandwidth and power consumption increase while using hamming code for error correction due to the extra bits (parity bit and redundant bits) that are appended to the original binary message for error correction and detection. Routing: NoC experience traffic fluctuations with increase in the size of the chip to accommodate multiple processing cores. This traffic fluctuations causes uneven distribution of load which results in uneven aging of NoC. Uneven aging in the network is one of the main causes of rapid degradation in the lifetime of the network. Distributing stress/load across the network supports uniform aging and improves the lifetime of the transistor. In order to symmetrically distribute traffic in NoC, an efficient routing algorithm is crucial. A packet is routed from a source to its destination in a dedicated path by means of routing algorithm. There are two main categories of routing types, deterministic and adaptive routing. 
In deterministic routing, packets follow same path from a given source to its destination, for example, Dimensional Order Routing (DOR) algorithm. In a 2D mesh 32 topology, DOR is determined for X and Y coordinates where, packets are routed along one coordinate (x or y) first and then routed along the other coordinate. For example, packets are routed along x-axis until x-coordinate of the current router is equal to the x-coordinate of the destination router. Later, the packets are routed along y-axis until the y-coordinate current router is equal to the y-coordinate destination router. DOR XY- routing ensures smooth flow of traffic avoiding deadlock/live lock. In adaptive routing, packets change their routing paths depending on the traffic conditions of NoC, such as Destination Based Adaptive Routing (DBAR) [RL10] and Regional Congestion Awareness (RCA) routing [BCR12c]. Adaptive routing determines optimal transmission path for the packets from source to destination, decreasing latency of the network. Adaptive routing algorithm ensures uniform distribution of traffic throughout NoC while taking a different route avoiding congested paths.

1.4 Major Contributions

In this thesis, I propose RETUNES: Reliable and Energy-efficient NoC, a unified model that includes low-power design and fault tolerant techniques to explore and analyze critical factors such as reliability, energy efficiency, and latency of NoC. RETUNES uses five voltage modes which are carefully chosen such that the supply voltage is scaled according to the incoming traffic to reduce congestion and improve energy efficiency. Power-performance tradeoff due to multiple voltage modes in NoC is analyzed based on the buffer utilization and application traffic load. At low network load, voltage is

scaled down ensuring maximum power savings and minimum ∆Vth. At high network load, voltage is scaled up ensuring lower bit-error rate and minimum latency. In order to further improve energy efficiency of NoC approximate communication technique is implemented. RETUNES uses data approximation in its design to reduce number of packet transmissions. Data from the cores is captured and annotated to mark the region for approximation using 33

Loop Perforation and Neural Processing Units (NPUs) to enhance performance (latency, throughput) and energy savings. RETUNES enhances reliability by slowing down the wear out at the inter-router level. Adaptive routing algorithm is introduced to even out the link wear out by distributing the incoming traffic. The reliability model of RETUNES is a hybrid error correction and detection scheme with two-layered architecture to mitigate soft errors caused due to lower voltage modes and aging. When operating the NoC in high voltage/frequency modes and at lower ∆Vth, error rates are typically low, and therefore, end-to-end (e2e) error correction is enabled. Similarly, when operating under low voltage modes ( NTV) and at higher

∆Vth, error rates are higher, and therefore, switch-to-switch (s2s) error correction scheme is enabled. This multi-layered hybrid scheme handles SBUs and MBUs that are encountered at lower supply voltages, thus achieving a fine balance between power consumption and network performance. The following are the major contributions of this thesis:

• RETUNES uses multiple voltage modes and approximate communication to manage congestion and energy efficiency of the network while maintaining upper bound on energy-delay-product (EDP).

• RETUNES incorporates adaptive routing algorithm to symmetrically distribute workload on NoC and support uniform aging process. As faults manifest due to NTV scaling, RETUNES combines error handling schemes with voltage scaling to improve the reliability of NoC.

• RETUNES implements a hybrid error correction and detection scheme with two- layered architecture to handle all SBUs and MBUs that are encountered at different supply voltages., thus achieving a fine balance between power consumption and network performance. 34

1.5 Organization of Thesis

The thesis is organized as follows: In Chapter 2, RETUNES architecture is discussed while first section provides an idea of previous works on NoC energy efficiency and reliability. In section 2.2, RETUNES energy efficiency and reliability models are discussed. Energy efficiency techniques such as voltage scaling, NTV scaling, and approximate communication are discussed in the first part of the section 2.2. The second part of the section focuses on techniques to improve reliability of RETUNES which includes, adaptive routing technique in order to improve lifetime of NoC, unified reliability model which handles faults due to aging, voltage scaling, and temperature, and Centralized Control Unit (CCU) which handles critical decisions regarding error correction and voltage switching. In Chapter 3, NoC performance is evaluated beginning with four evaluation schemes used in RETUNES. Section 3.1 describes RETUNES evaluation approach. Next, in Chapter 3, section 3.2, RETUNES power, delay, lifetime, reliability, Energy-Delay product (EDP) is analyzed. Finally, conclusion to the thesis is provided in Chapter 4 which includes thesis contribution and future work. 35 2 RETUNES: Reliable and Energy-Efficient

Network-on-Chip

In this thesis, I propose RETUNES - a reliable and energy-efficient NoC. RETUNES evolves into a multi-layered hybrid model by combining energy-efficiency layer (also known as EE-layer) and reliability layer (also known as R-layer). The EE-layer of RETUNES uses voltage scaling and approximate communication to ensure maximum power savings. The most crucial aspect in voltage scaling is determining supply voltage selection and voltage mode switching. In approximate communication, calculating the tolerable error threshold for approximation is important. The R-layer in the architecture is a hybrid error correction and detection design that handles faults due to transistor aging and voltage scaling. The EE-layer and R-layer work together resolving critical timing errors efficiently at various stages in the design. This chapter is focused on the RETUNES architecture where in section 2.1 prior work on energy efficiency and reliability is explained, and in section 2.2 RETUNES architecture is elaborated.

2.1 Prior Work

Energy efficiency and reliability are the two important factors to be considered in industries such as automobile, chip manufacturing, health, telecommunication and so on. While using NTV scaling in NoC, researchers focused either on energy-efficiency techniques or on different reliability improvement methods but not on both. Survey on energy-efficient methods provided a good insight on state-of-art research in minimizing the power consumption in NoCs [AAF+14][Mit15][RSG03][Mit16]. Approximate computing is an attractive approach to improve the energy efficiency and performance at the cost of accuracy [SDF+11][MAMJ15]. Previous woks have proposed data approximation and compression techniques [DMN+08][JYK08] to improve 36 energy savings and reduce latency even at higher traffic loads. Frameworks such as Enerj performs data approximate in any part of the execution process by annotation [SDF+11][MAMJ15][DPL+14][ESCB12]. Approximate communication is a part of approximate computing where the data transmitting from source to destination is deliberately approximated by reducing the amount of transmissions between the sender and receiver [BKS+18b]. Approximation can be done at the hardware level (memory and computation) or the software level. In [CPK+13] the author proposed a reconfigurable NoC architecture based on the network traffic as shown in the Figure 2.1. This method reduced the amount of data that is transferred between source and destination using file compression and recovery techniques that improved the communication speed by more than 50%. However, this technique leads to high energy consumption due to the continuous compression and recovery performed at the flit level. Previous works also proposed several approximation techniques such as, lossy data compression [DMN+08][SLJ+13], data sampling [AWC+11b], loop perforation [SDMHR11], load value approximation [MBJ14b] and so on [ACV05][KKS15], to implement approximate communication technique. Recently, researchers focused on resolving the low throughput issue caused due to high network loads by identifying and approximating the non-critical data in the application improving bandwidth [AHY+16][MBJ14b][AYMC15]. Dynamic Voltage and Frequency Scaling (DVFS) is another effective technique to reduce the dynamic power consumption of NoC [BMM07][BMJG12][MD+09a][UKK13] [HJ15]. Typically, DVFS designers have two important decisions to make determine the circuit level (granularity) to apply DVFS and to determine the appropriate voltage mode to apply. In fine-grain voltage/frequency domains, NoC routers and links operate independent of each other using multiple supply voltages to improve performance. However, since the current is drawn from different domains, the supply voltage guard-bands increases which results in a decrease in the power-efficiency in NTV environment. In [JR+07] the 37

Figure 2.1: Reconfigurable NoC architecture based on the network traffic [CPK+13].

author discussed the similar problem of fine-grain domain approach for IBM processor. Instead of fine-grain voltage modes, coarse-grain on-chip multiple voltage mode approach in which NoC is controlled globally increases the power efficiency of the network, albeit at a cost of performance. The other crucial aspect of DVFS is the voltage mode selection. Operating voltage of the network can be scaled down if the buffer utilization/traffic within the NoC is low and similarly, supply voltage can be increased if the buffer utilization is high. Prior work has proposed several different metrics to measure traffic, such as buffer usage [MD+09b], predicted link usage algorithm [SP+03a], threshold-controlled algorithm [HM07], temperature aware voltage switching [KET16], congestion aware routing algorithm [EDL+12] and DVFS algorithm [BC+12]. NTV scaling has been proposed for processors, cores, and memory and is more recently applied to NoCs. From the past few decades NTV and NTV computing techniques have been proposing to improve the energy efficiency [AWC+11a][AFGM11]. In [RJCR16] the authors proposed a multi-layered NoC architecture that uses near threshold 38 voltage technique which improved the energy efficiency of the NoC. In their work, it was shown that based on application demand, switching the operating voltage between NTV and normal voltage improves the performance of the NoC. Figure 2.2 shows the proposed architecture for control device that monitors traffic and responds to traffic changes in NoC. The router controller responds to the traffic variations and transfers the information to traffic monitor where the controller makes decision on scaling the supply voltage. BoostNoC implemented a process of safe and efficient packet transfer and improved the system performance and energy efficiency. In this approach the authors did not discuss the congestion of the network due to low frequency of operation at low supply voltage (200MHz at 0.35v in this case). The main drawback of this design is the hardware overhead used to switch operating voltage that is supplied to the routers. Even though all the routers in BoostNoC is supplied with single operating voltage, routers are equipped with an individual control unit that switches the supply voltage increasing the hardware overhead. In [ZDB+07] the authors improved the performance lost due to NTV by application parallelism. Operating NoC at lower voltage will increase the susceptibility of devices to faults due to timing errors [LD+], whereas, operating NoC at high supply voltage accelerates aging [vSA+16]. Previous works on controlling the aging process proposed two different approaches- 1) by modifying architecture/hardware of the existing design. In [ACC+09][BHK+13][CYLA11], the authors modified the hardware by implementing a technique to disable the blocks that shows faults due to NTV scaling, in order to regain reliability and improve performance. 2) by modeling the routing algorithm. Routing algorithm in NoC is important to improve performance (throughput and latency) and to minimize aging effect by selecting most optimal path. Aging process in transistors cannot be stopped, however, controlled aging (voltage scaling) and symmetrical distribution of workload will increase the lifetime of the device [MVD11]. In [BC+12] the authors used aging-aware oblivious routing algorithm that dynamically chooses the routing 39

Figure 2.2: Control device architecture with router and layer controllers to switch NoC voltage levels [RJCR16].

scheme to distribute traffic symmetrically along NoC. However, voltage scaling technique to reduce the threshold voltage variation is not considered in this design. In [vSA+16] the authors proposed a model that interprets aging degradation by comparing the increase in the runtime-delay (threshold voltage change) with the analyzed offline delay. In this design, the authors showed that lower supply voltage slows down the threshold voltage variation due to aging. However, the need for offline calculation is the drawback of the design which loses the ability to dynamically tune parameters (voltage and frequency) according to the wear-out levels. Further there has been different proposed ways to handle wear out dynamically at the cost of hardware overhead [ACR13][WWM14]. In [ACR13] the author proposed Wear-out Monitoring System (WMS) that allows the algorithm to decide between the buffered or buffer-less routers depending on the packet type. The hardware overhead and complexity in this proposed design is due to the technique used for monitoring the aging process (WMS). In [WWM14] the author proposed a routing algorithm based on 40 dynamic programming (DP), where wear-out level of each router is communicated using a parallel network. The process to find and communicate the wear-out level is complex in this design which induces hardware overhead. Along with maintaining a uniform traffic it is also important to handle the stress observed in NoC due to huge data transmissions. Memory-based and video/data processing applications transfer significant amount of data that creates high traffic loads. Prior work on error recovery schemes have shown the impact of encoding techniques on reliability of the network at low operating voltages [LD+]. In those experiments it was demonstrated that Single Bit Upsets (SBUs), Multiple Bit Upsets (MBUs), and hard errors have a higher probability to occur at low supply voltage. As the capacitance inside the memory cells decreases with the transistor technology, the minimum capacitance charge necessary to hold/retain the information decreases. This decrease in ability to retain data leads to an increase in the susceptibility of the memory devices to SBUs. Fortunately, error correction and detection techniques can be proposed to handle soft errors. A 2- layered error management method for NoC to manage both permanent and transient error was proposed in [YA11]. Error-correcting code (ECC) techniques such as s2s and e2e error control mechanism are integrated in the data link layer, physical layer, and network layer depending on the noise conditions of NoC. However, none of the prior works have combined voltage scaling, reliability, and aging of NoC architecture. In what follows, I will describe the RETUNES architecture, voltage scaling (different voltage modes), adaptive routing (aging) and reliability (different error correcting codes) model.

2.2 RETUNES Architecture

RETUNES is an energy-efficient and reliable architecture evaluated on a 4 × 4 concentrated mesh topology with 64 cores using a 45nm transistor technology node. This section elaborates technologies used for energy efficiency and fault tolerance to improve 41 power savings and reliability in NoC. In the first part of this section I explain RETUNES energy efficiency layer (EE-layer) followed by RETUNES reliability layer (R-layer). In the later part of the section I explain the Centralized Control Unit (CCU) design which combines EE-layer and R-layer activity.

2.2.1 Energy Efficiency (EE-Layer)

Energy Efficiency layer (EE-layer) of RETUNES provides maximum energy savings using two prominent techniques voltage scaling combined with energy proportionality and approximate communication. RETUNES monitors and calculates the traffic load (Flits/cycle), temperature (Celsius) and delay overhead (cycles) of the proposed voltage modes to enable supply voltage switching mechanism as explained in the following subsections.

2.2.1.1 Voltage Scaling

The RETUNES EE-layer consists of five voltage modes which are carefully chosen to effectively capture traffic variations and to minimize rapid switching of the supply voltage. These five voltage modes are scaled from the nominal voltage/Super Threshold Voltage (STV) to Near Threshold Voltage (NTV), where STV and NTV are set to 1.0 volts and 0.35 volts respectively. The EE-layer closely monitors buffer utilization from all the routers for every chosen epoch size. The methodology used to choose an epoch size is explained later in this subsection. The buffer utilization captured at each epoch serves as a metric for switching between voltage modes to manage congestion in the network. Figure 2.3 shows the buffer utilization of the blackscholes and LU applications along with the proposed 5 voltage modes at different utilization levels. When buffer utilization is below 15%, the NTV voltage mode is activated. Similarly, between 15% and 45% of buffer utilization, the V1 voltage mode is activated; between 45% and 58-60% of buffer utilization, the V2 voltage mode is activated; between 58-60% and 75% of buffer utilization, the V3 voltage mode is activated; and at buffer

Figure 2.3: Percentage of buffer utilization at different simulation times (cycles) for blackscholes (left) and LU (right) applications.

utilization of 75% and above, the STV voltage mode is activated. These empirical values were determined by running Splash-2 and PARSEC suite benchmarks at different epoch sizes and under various traffic conditions. The RETUNES 5 voltage modes (NTV, V1, V2, V3, STV) and their corresponding frequencies are calculated using the voltage-frequency relation from [EE11]. Equation 2.1 shows the relation that is used to determine the operating frequencies for the proposed voltages from [EE11].

f ∝ (Vdd − Vth)^β / Vdd    (2.1)

where f is the operating frequency, Vdd is the supply voltage, Vth is the transistor threshold voltage, and β is a technology-dependent constant which is approximately equal to 1.5 in this case. Choosing the epoch is a critical task, as the energy consumed for switching the operating voltage dominates the energy savings of the network when the epoch size is small (epoch

Figure 2.4: Traffic pattern of blackscholes application at different epochs to determine epoch size for RETUNES.

≤ cycles) and congestion builds in the network during heavy traffic loads at low operating frequency if the epoch size is large (epoch ≥ 500 cycles). Figure 2.4 shows the blackscholes traffic pattern at 50, 100, 300, and 500 cycle epoch sizes. The RETUNES optimum epoch size is chosen to be 100 cycles by carefully monitoring the power and performance trade-off for several executions of application data. Determining Traffic Load and Device Temperature: Load values for the voltage modes are assigned based on buffer utilization and average link utilization patterns. Average link utilization is modeled to vary from 0.01 to 0.4 flits/cycle, considering 0.4 flits/cycle as the maximum utilization, as most networks saturate after that point [ACP11]. The link utilization model [DBKL16] is used to calculate the temperature range for the corresponding link utilization. Initially, the network is operated at 0.01 flits/cycle, where the temperature is considered to be 75 to 77 degrees Celsius. After every 30 to 35 epochs, the temperature for the corresponding link utilization is captured. Determining Overhead Delay: There are two different types of delays that are observed due to voltage scaling:

• On-chip communication delay: This delay is experienced by the flit due to a decrease in operating frequency; the delay is inversely proportional to the operating frequency.

• Wakeup delay/Overhead delay: This delay is the cost of switching the operating voltage (Vdd) of the transistor. As the transistor voltage/frequency is switched, the transistor demands a few cycles to wake up, or to switch to the new operating voltage/frequency.

The overhead delay and the on-chip communication delay together determine the overall packet delay. The temperature values obtained at every voltage mode or network load are used to calculate the overhead delay of the network using a bias generator from [AD+06]. The first input to calculate the overhead delay is the buffer utilization, and the range of buffer utilization determines the supply voltage. The next input parameter is link utilization, where an increase in link utilization indicates an increase in traffic intensity, which in turn leads to increased power consumption, thus raising the device temperature. Table 2.1 shows the traffic load (flits/cycle), temperature (Celsius), and delay overhead (cycles) calculated for the corresponding voltage modes of RETUNES. Routers operating in any voltage mode during a step-down quickly ramp down the frequency and then wait for an overhead delay before ramping down the supply voltage. However, for a voltage step-up, the voltage is increased initially and then the router waits for the overhead delay before increasing the frequency. On receiving a signal to change the voltage mode of NoC, all the routers are instructed to complete the buffer transfers before ramping up/down the voltage to prevent any loss in communication. In RETUNES, voltage/frequency changes affect all links and routers simultaneously and the entire NoC operates at the same voltage/frequency (coarse-grain).

Table 2.1: Traffic load (Flits/cycle), temperature (Celsius) and delay overhead (cycles) calculated for the corresponding voltage modes of RETUNES.

Mode   Volt (V)   Freq (GHz)   Load (flits/cycle)   Temp Range (Celsius)   Delay (cycles)
NTV    0.35       0.2          0.01                 75-77                  8
V1     0.55       0.8          0.1                  76-82                  5
V2     0.6        1.5          0.2                  80-93                  4
V3     0.8        2            0.3                  90-101                 2
STV    1          2.3          0.4                  97-104                 1
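
To make the mode-selection policy concrete, the following Python sketch combines the buffer-utilization thresholds described above with the voltage/frequency pairs of Table 2.1 and the relation of Equation 2.1. This is an illustrative model rather than the RETUNES hardware; the 60% boundary between V2 and V3, and the threshold voltage Vth = 0.3 V used to evaluate Equation 2.1, are assumptions of the sketch.

```python
# Minimal sketch (not the thesis RTL): selecting a RETUNES voltage mode from
# the average buffer utilization of an epoch, using the thresholds described
# in the text and the voltage/frequency pairs of Table 2.1. The V2/V3 boundary
# is quoted as 58-60% in the text; 60% is assumed here.

# (mode, supply voltage [V], frequency [GHz]) from Table 2.1
MODES = [
    ("NTV", 0.35, 0.2),
    ("V1",  0.55, 0.8),
    ("V2",  0.60, 1.5),
    ("V3",  0.80, 2.0),
    ("STV", 1.00, 2.3),
]

# Upper buffer-utilization bound (fraction) for each mode, in the same order.
UTIL_BOUNDS = [0.15, 0.45, 0.60, 0.75, 1.00]

def select_mode(buffer_utilization: float):
    """Return (mode, Vdd, freq) for the measured buffer utilization of an epoch."""
    for (mode, vdd, freq), bound in zip(MODES, UTIL_BOUNDS):
        if buffer_utilization < bound:
            return mode, vdd, freq
    return MODES[-1]

def relative_frequency(vdd: float, vth: float = 0.3, beta: float = 1.5) -> float:
    """Equation 2.1: f proportional to (Vdd - Vth)^beta / Vdd (vth is an assumed value)."""
    return (vdd - vth) ** beta / vdd

if __name__ == "__main__":
    print(select_mode(0.52))   # -> ('V2', 0.6, 1.5) for 52% buffer utilization
    print(relative_frequency(1.0) / relative_frequency(0.35))  # STV vs. NTV ratio
```

In this sketch the mode check runs once per epoch (100 cycles in RETUNES), mirroring the per-epoch buffer-utilization monitoring described above.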

2.2.1.2 Approximate Communication

Approximate communication is a technique to improve the energy efficiency and performance of NoC. The approximate communication technique makes use of the error-tolerant nature of the application, where identification of duplicate data, data accuracy, and error threshold are the three requirements for data approximation. Prior work has proposed three main techniques to approximate transmitted data in NoC - compression, relaxed synchronization, and value prediction [BKS+18a][MBJ14a]. In compression, data with repetitive patterns is compressed while being transmitted across NoC to decrease energy consumption and bandwidth usage. There are two different compression techniques based on the end result: loss-less compression and lossy compression. Loss-less compression assures full reconstruction of data at the destination core without compromising on the quality of the output. Lossy compression eliminates redundant data during transmission, achieving higher compression than the loss-less compression technique [BKS+18c]. In relaxed synchronization, irrelevant synchronization points are approximated to improve the scalability of parallel tasks/benchmarks [MRCB10][BMR+10]. The relaxed synchronization technique is limited by the selection of the synchronization points that are relevant for the execution of the benchmarks. In value prediction, the inputs of the dependent instructions are predicted to reduce the latency of the instructions in the pipeline of the processor [MBJ14b][TPE+14]. RETUNES Approximation Procedure: Figure 2.5 represents the proposed RETUNES approximation procedure, which consists of a JPEG encoder, data approximation stages, and a Memory Control Unit (MCU). The RETUNES approximation (encoding and decoding) of a JPEG image is performed in 3 stages. In the first stage of approximation, the compressed JPEG image is read pixel by pixel in order to detect the duplicate data. In this stage, frequent repetitive patterns or similar pixel values are observed in the data and marked as duplicates. In the second stage of approximation (encoding), the duplicate data or pixels that are marked as duplicates are encoded as shown in Figure 2.6. In this example, Nd represents the duplicates of a bit, where N is the number of duplicates observed in the data. This approximated data reduces the number of packet transmissions that are required to transmit the compressed JPEG image from a source router to a destination router (MCU in this case). In the final stage of approximation (decoding), the approximated data is decoded at the destination router (MCU) using the reference pattern +Nd that is transmitted along with the packet. The JPEG encoder stages shown in the RETUNES approximation procedure are taken from the approximate computing benchmark suite [YMEL17]. Figure 2.7 shows an NoC with

Figure 2.5: Flow of the original image read from the Memory Control Unit (MCU) and the approximated JPEG image sent back to the MCU.

3 types of cores - JPEG encoder cores (highlighted in green), a Memory Control Unit (MCU) core (highlighted in yellow), both adopted from AxBench [YMEL17], and an approximating core (highlighted in red). The JPEG encoder is a lossy compression method generally used for compressing digital images. In this encoder, operations such as level shifting, encoding, quantization, Discrete Fourier Transform (DFT), and Discrete Cosine Transform

Figure 2.6: Shows the approximation performed on 10-bit data, where 'd' represents the number of duplicates following a digit.

Figure 2.7: JPEG encoder, Memory Control Unit (MCU) and approximating core mapped on NoC.

(DCT) are performed to compress an image. The path from MCU to the approximating core (green arrows) shows the JPEG compression that is performed in AxBench, where this compressed JPEG image is transmitted to the RETUNES approximating core to approximate the JPEG image. As RETUNES focuses on approximating the compressed image from the JPEG encoder, the five cores of the JPEG operation are mapped on NoC to replicate the traffic flow from MCU to the approximating core. The path from the approximating core to MCU (red arrow) shows the RETUNES approximation, where the approximated image (encoded) is sent back to MCU. At the MCU, the decoded image and the approximated image (encoded) are compared to determine the power and quality trade-off of the RETUNES approximation procedure. RETUNES only deals with the approximation procedure, which includes reading a compressed JPEG image, observing duplicate data, eliminating duplicates by encoding the bits, decoding the data at MCU, and comparing the decoded image with the JPEG image. The JPEG application (JPEG image) and the image compression rate used in the RETUNES approximation are taken from AxBench, whereas errors due to lossy compression are not included in this work.
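
The following Python sketch illustrates the encoding and decoding steps described above. It is not the RETUNES hardware: identical (rather than merely similar) values are treated as duplicates, and the string-based "+Nd" token format is an assumption used only for illustration.

```python
# Minimal sketch of the duplicate-elimination step described above: runs of a
# repeated symbol in the compressed JPEG stream are replaced by the symbol
# followed by a "+Nd" marker, where N is the number of duplicates that follow.
# The on-wire packet format is an assumption of this sketch.

def approx_encode(symbols):
    """Encode e.g. [7, 7, 7, 7, 2, 5, 5] -> ['7', '+3d', '2', '5', '+1d']."""
    encoded = []
    i = 0
    while i < len(symbols):
        run = 1
        while i + run < len(symbols) and symbols[i + run] == symbols[i]:
            run += 1
        encoded.append(str(symbols[i]))
        if run > 1:
            encoded.append(f"+{run - 1}d")   # N duplicates follow the symbol
        i += run
    return encoded

def approx_decode(encoded):
    """Reverse the encoding at the destination (the MCU in the thesis)."""
    decoded = []
    for token in encoded:
        if token.startswith("+") and token.endswith("d"):
            decoded.extend([decoded[-1]] * int(token[1:-1]))
        else:
            decoded.append(int(token))
    return decoded

if __name__ == "__main__":
    pixels = [7, 7, 7, 7, 2, 5, 5]
    assert approx_decode(approx_encode(pixels)) == pixels
```

Because a run of N+1 identical values is carried as only two tokens, fewer flits are needed per packet, which is the source of the transmission savings reported later in the evaluation.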

2.2.2 Reliability (R-Layer)

The Reliability layer of RETUNES handles all Single Bit Upsets (SBUs) and Multiple Bit Upsets (MBUs) due to aging, temperature, and voltage scaling. In this subsection, the reliability model that monitors faults, the encoding framework to handle these faults, and the adaptive routing algorithm modeled to handle uneven aging are discussed.

2.2.2.1 Unified Reliability Model

The Unified reliability model of R-layer captures faults observed due to aging, voltage scaling, and temperature, ensuring maximum reliability of NoC. Figure 2.8 shows the unified reliability model that monitors bit error rate at every voltage mode and at every threshold voltage variation (∆Vth) range. Reliability degradation is measured as the sum of

∆Vth due to voltage scaling, aging, and temperature, as shown in equation 2.2.

Reliability degradation (Rd) = ∆Vth,Temperature + ∆Vth,Aging + ∆Vth,VoltageScaling    (2.2)

where ∆Vth,Temperature is the threshold voltage change due to temperature variations, ∆Vth,Aging is the threshold voltage change due to aging, and ∆Vth,VoltageScaling is the threshold voltage change due to voltage scaling.

The threshold voltage variation range (∆Vth range) is divided into three levels along with the error types (More errors (Me), Few errors (Fe), and No errors (Ne)), as shown in Figure

2.8. As the ∆Vth range increases, the error type shifts to a higher error type (Ne is the lowest and Me is the highest) depending on the change in threshold voltage. The ∆Vth range is less than or equal to 3.3% for the Ne and Fe (1 error) error types, greater than 3.3% and less than or equal to 6.6% for the Fe (2 errors) error type, and greater than 6.6% for the Me error type. If the variation of threshold voltage is greater than 10%, it is considered to be a permanent fault [WCF11]. Similarly, the fault model shown in Figure 2.8 keeps track of the error type for all the voltage modes used in the design. For example, if the supply voltage of the router is 0.65 volts and the probability of error (pe) is greater than 6.6%, then the error type is Fe (2 errors). The RETUNES unified reliability model checks the ∆Vth range for every epoch, where the error range at every ∆Vth range is integrated with the fault model to generate the unified fault model for the voltage mode (NTV, V1, V2, V3, STV) that is active in NoC.
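
The mapping from ∆Vth range to error type described above can be summarized with a small illustrative function; the boundary handling at exactly 3.3% and 6.6% and the treatment of variations above 10% as permanent faults follow the text, while everything else is a simplification of this sketch.

```python
# Minimal sketch of the error-type classification of the unified fault model:
# the threshold-voltage variation observed in an epoch (in percent) is mapped
# to one of the error types described above. Variations above 10% are treated
# as permanent faults [WCF11].

def classify_error_type(delta_vth_percent: float) -> str:
    if delta_vth_percent > 10.0:
        return "Permanent fault"
    if delta_vth_percent > 6.6:
        return "Me"             # more errors
    if delta_vth_percent > 3.3:
        return "Fe (2 errors)"  # few errors
    return "Ne / Fe (1 error)"  # no errors or a single error

if __name__ == "__main__":
    for v in (1.0, 4.5, 8.0, 12.0):
        print(f"delta Vth = {v}% -> {classify_error_type(v)}")
```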

2.2.2.2 Encoding Framework

In order to improve network performance while handling the faults observed by the unified reliability model, an error handling design that adjusts its error correction strength depending on the error range is effective. When the probability of bit error is high, fault coverage should be increased to ensure reliability; and when the probability of error is low, fault coverage should be reduced to save power. To improve error resilience, a two-layer encoding framework is proposed based on the error range of NoC. When NoC is at a high

Figure 2.8: Unified fault model showing the error range separately for threshold voltage variation (∆Vth) and bit errors observed in RETUNES.

error range, the switch-to-switch (s2s) encoding layer is activated. In this layer, every router uses strong ECCs to increase fault coverage at every router for the input traffic. Routers at a low error range employ a weak ECC, activating the end-to-end (e2e) encoding layer, where ECC is applied only at the source and destination routers as the probability of bit error is low. Flowchart 2.9 shows the error range and its appropriate encoding layer (e2e or s2s). The NoC reliability design switches to the e2e encoding layer if the error range is Ne or Fe (1 error) and switches to the s2s encoding layer if the error range is Fe (more than 1 error) or Me. Triple modular redundancy (TMR) control lines are used to signal routers to switch between encoding layers. Encoding Layer Microarchitecture: Figures 2.10 and 2.11 show the proposed microarchitecture for the e2e and s2s encoding layers. The proposed e2e encoding layer has a 256-bit

Figure 2.9: Flowchart shows appropriate encoding layer (e2e or s2s) used in RETUNES for different error ranges (Ne,Fe,Me).

CRC-32 encoded packet with 224 data bits and a 32-bit check value. Each packet is encoded while entering and decoded while exiting the core at the network interface, as shown in Figure 2.11. A router with 3 pipeline stages would add stall cycles and increase flit delay at the lower voltage/frequency modes (NTV, V1, V2). In order to decrease overall packet latency, the routers in the s2s encoding layer consist of five pipeline stages, as shown in Figure 2.10. The 5 router pipeline stages are Buffer Write (BW), Routing Computation (RC), Virtual Channel Allocation (VA), Switch Allocation (SA), and Switch Traversal (ST) [JKP17].

• BW stage: In this first stage of the router pipeline, the head flit is written to a virtual channel (buffer) after entering the router.

Figure 2.10: RETUNES switch-to-switch encoding layer microarchitecture showing encoder and decoder of R-layer along with the router pipeline stages.

• RC stage: In this second stage, destination information is read from the head flit to compute the output port. Routing protocol also plays an important role along with the destination information to determine the output port of the flit.

• VA stage: In this third stage, a Virtual Channel (VC) is allocated for the whole packet. If multiple packets (head, body, and tail flits) contend for the same VC, the head flits compete to use the VC at the downstream routers. The winner of the VA stage is allocated the VC, whereas the loser can compete during the next cycle.

• SA stage: In this fourth stage, packets compete to access the crossbar. Multiple VCs poll to gain access to the output port of the crossbar.

• ST stage: In this fifth stage, the winner flit from the SA stage is allowed to traverse the switch, whereas the loser can compete during the next cycle. Finally, flits are

Figure 2.11: RETUNES end-to-end encoding layer microarchitecture showing the encoder and decoder of the R-layer at the Network Interface (NI).

transmitted to either the downstream router or the destination router after the ST stage. If flits are transmitted to a downstream router, all these 5 pipeline stages are repeated at every router until the packet reaches the destination router.

The proposed s2s encoding layer is implemented using the Hamming code H(72,64), which is a Single Error Correction And Double Error Detection (SECDED) code. Using SECDED Hamming codes, all 1-bit errors are recovered and all 2-bit errors are detected. The received codeword vector and the transpose of the parity-check matrix of the Hamming code are multiplied to detect errors in the syndrome generator. The decoder design forwards the data bits along with the parity bits to the encoder in the upstream router. The syndrome generator compares the forwarded parity bits with the new parity bits in the encoder design to detect faults.

However, faults cannot be corrected in the encoder, as the correcting hardware is not present in the encoder design. Finally, the erroneous bit is corrected and transmitted to the next router/core. The ECC scheme of RETUNES detects all two-bit errors and corrects all single-bit errors obtained from the unified reliability model at each epoch. If a fault is detected and cannot be corrected, the flit is dropped and a request for retransmission is sent. When the number of faults in a packet increases, the entire packet is dropped and retransmitted to prevent communication loss.
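
As a software illustration of the e2e check described above, the following sketch appends a 32-bit check value to a 224-bit payload and verifies it at the destination. Python's standard zlib CRC-32 is used only as a stand-in; the thesis does not specify the generator polynomial of its CRC-32 hardware, and the Hamming H(72,64) s2s path is not modeled here.

```python
import zlib

# Minimal sketch of the end-to-end (e2e) check: a 224-bit payload is protected
# by a 32-bit check value computed at the source network interface and verified
# at the destination. A failed check would trigger a retransmission request.

PAYLOAD_BYTES = 224 // 8   # 224 data bits per 256-bit packet

def e2e_encode(payload: bytes) -> bytes:
    assert len(payload) == PAYLOAD_BYTES
    check = zlib.crc32(payload)
    return payload + check.to_bytes(4, "big")   # 256-bit packet

def e2e_check(packet: bytes) -> bool:
    payload, check = packet[:PAYLOAD_BYTES], packet[PAYLOAD_BYTES:]
    return zlib.crc32(payload) == int.from_bytes(check, "big")

if __name__ == "__main__":
    pkt = e2e_encode(bytes(range(PAYLOAD_BYTES)))
    print(e2e_check(pkt))                        # True
    corrupted = bytes([pkt[0] ^ 0x01]) + pkt[1:] # flip one payload bit
    print(e2e_check(corrupted))                  # False -> request retransmission
```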

2.2.2.3 Adaptive Routing

The lifetime of a device is a measure of the wear-out (aging) of the device over a period of time. The asymmetrical distribution of traffic leads to uneven aging, which in turn degrades the reliability of the network. RETUNES determines the aging process as a measure of threshold voltage change, which varies for all the proposed voltage modes, and constantly monitors link utilization dynamically to understand the stress levels of the links. In order to symmetrically distribute traffic in NoC, an efficient routing algorithm is crucial. RETUNES uses an adaptive routing algorithm to distribute its packets uniformly throughout NoC to improve the lifetime of the transistors. For every epoch, the routing algorithm collects the average link utilization for the current router at runtime. Figure 2.12 shows the map of a single router with all four links (x, -x, y, -y) of the router connecting adjacent routers and a link for the core. A packet is adaptively routed along the least utilized link among the available four links (directions) of the router. Algorithm 2.13 shows the proposed routing algorithm (adaptive routing algorithm), which determines the path of the packet. If the average link utilization along the x-axis is greater than that along the y-axis, provided the x/y-coordinates of the current and destination router are not equal, the packet is routed along the y-coordinate. Similarly, if the average link utilization along the x-axis is less than that along the y-axis, provided the x/y-coordinates of the current and destination router are not equal, the packet is routed along the

Figure 2.12: Map of a single router showing the five directions of the router: four links (x, -x, y, -y) connecting adjacent routers and a link for the core.

x-coordinate. If the x-coordinate of the current router is equal to that of the destination router, the routing algorithm routes the packet along the y-direction, and if the y-coordinate of the current router is equal to that of the destination router, the routing algorithm routes the packet along the x-direction, ignoring link utilization values. RETUNES effectively adapts to runtime changes and makes in-flight routing decisions, eliminating offline calculations and lookup tables to improve the performance (reduce the latency) of NoC. Temperature is another factor that leads to uneven aging. As the stress level of a link increases, the rate of aging increases with the elevated temperatures, eventually decreasing the lifetime of NoC. RETUNES experiences tolerable device temperatures as the supply voltage of NoC is not constantly high (STV). Figure 2.14 shows that lowering the supply voltage slows down the aging process due to reduced temperatures and slower threshold voltage variations; thus, RETUNES is reliable even at lower voltage modes.

Figure 2.13: RETUNES routing algorithm (adaptive routing algorithm) to determine the path of the packet
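
The routing decision of Figure 2.13 can be summarized by the small sketch below. The coordinate and link-utilization bookkeeping are assumptions of the sketch; only the decision rule follows the algorithm described above.

```python
# Minimal sketch of the adaptive routing decision: when both coordinates still
# differ from the destination, the packet takes the axis whose links are
# currently less utilized; otherwise it must follow the remaining axis.

def route(cur, dst, util_x, util_y):
    """Return 'x' or 'y': the axis along which the packet leaves this router.

    cur, dst -- (x, y) coordinates of the current and destination routers
    util_x   -- average utilization of this router's x/-x links (flits/cycle)
    util_y   -- average utilization of this router's y/-y links (flits/cycle)
    """
    if cur[0] == dst[0]:        # same column: only y progress remains
        return "y"
    if cur[1] == dst[1]:        # same row: only x progress remains
        return "x"
    return "y" if util_x > util_y else "x"

if __name__ == "__main__":
    print(route((1, 1), (3, 2), util_x=0.30, util_y=0.10))  # 'y' (x links busier)
    print(route((1, 2), (3, 2), util_x=0.30, util_y=0.10))  # 'x' (same row)
```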

2.3 Centralized Control Unit

The Centralized Control Unit (CCU) is the heart of RETUNES, which handles critical decisions such as voltage mode switching and Error-Correcting Code (ECC) strength. Figure 2.15 shows the CCU and the on-chip linear voltage regulator for a concentrated 4 × 4 2D mesh topology. The Mode Control Unit (MCU) is the part of the CCU that decides the operating voltage mode of NoC depending on the buffer utilization value at every epoch. The ECC strength of NoC is decided by the Layer Control Unit (LCU) part of the CCU. While it is possible to make switching decisions locally on a per-router basis, NoC is controlled globally to reduce the cost and complexity of the controller. Recently, voltage regulators and NoCs have been integrated on the same chip to minimize the power and area cost of off-chip voltage regulators [Gja08]. The linear on-chip voltage regulator used in the design is shown

Figure 2.14: Graph of threshold voltage change for different supply voltages, showing that lowering the supply voltage slows down the threshold voltage (Vth) change.

in Figure 2.15, which changes the voltage at a rate of 30 mV/ns with a minimum of 5% power loss. A single on-chip voltage regulator is used to regulate the voltage globally in NoC, as shown in Figure 2.16. All the routers are instructed to wait an additional cycle to settle in the new voltage mode to prevent communication loss. RETUNES selects the appropriate voltage mode of operation based on the average buffer utilization of NoC for every epoch. The lowest voltage mode is the most power-efficient mode, while the highest voltage mode boosts the performance of the network with minimum latency. I consider four stages of operation for the reliable and energy-efficient RETUNES architecture as follows: Step 1: Initially, all the packets are encoded with CRC-32. The buffer utilization of the active layer is constantly monitored, which serves as the information to the MCU. For every epoch, the MCU updates the network utilization level. Traffic information is gathered from the buffer utilization trends of the Splash-2 [WO+95], PARSEC [BL09], and AxBench [YMEL17] benchmarks, with the epoch size carefully chosen to be 100 cycles to avoid

Figure 2.15: RETUNES Centralized Control Unit (CCU) showing voltage regulator, CCU micro architecture, and control sequence between CCU and a core

power loss due to frequent voltage mode switching. Step 2: Once the voltage mode is switched, information regarding the current and previous voltage modes is passed to the LCU and the MCU. The MCU senses the local changes in the network in order to send a voltage change request to the voltage regulator. The voltage regulator responds to the request sent by the MCU and scales the supply voltage. The output voltage of the voltage regulator is used as the supply voltage to NoC. The corresponding overhead delay and frequency are applied to NoC according to the mode control algorithm, as shown in Figure 2.17. At the same time, all the routers are instructed to complete the in-flight flit transmissions to avoid data loss before switching the voltage modes. Step 3: The LCU plays a crucial role in deciding to switch between the encoding layers, e2e or s2s. The current voltage mode of the network (from the MCU) and the probability of error obtained from the unified reliability model are passed to the LCU, where the decision is made

Figure 2.16: Design of global on-chip voltage regulator for NoC in RETUNES

to upgrade or downgrade the ECC mode. The ECC of RETUNES uses a SECDED Hamming code at every switch to correct all 1-bit errors and detect all 2-bit errors. If an error cannot be corrected, a request for retransmission is sent to the source router. The counters at each router constantly keep track of the fault rate. Step 4: On receiving the retransmission signal, the source router resends the requested flit to the destination router. Data is stored in the retransmission buffers until an acknowledgement (ACK) is received. Data and ACK lines are assumed to be separate, and fault coverage is not considered for the retransmitted data. Once the retransmission is successful, an acknowledgement is sent to the source router to terminate the process.
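
The following sketch summarizes the per-epoch decision sequence of Steps 1-4: the ramp ordering for a voltage step-up versus step-down (as described in Section 2.2.1.1) and the LCU's choice of encoding layer from the error range (Figure 2.9). The `regulator` interface is hypothetical and only stands in for the on-chip voltage regulator; it is not an API defined by the thesis.

```python
# Minimal sketch of the CCU decision sequence. `old` and `new` are
# (Vdd [V], frequency [GHz], overhead_delay [cycles]) tuples for the previous
# and newly selected voltage modes (e.g. rows of Table 2.1).

def apply_mode_change(regulator, old, new):
    """Step-down: frequency first, wait, then voltage.
       Step-up:   voltage first, wait, then frequency."""
    if new[0] < old[0]:
        regulator.set_frequency(new[1])
        regulator.wait_cycles(new[2])   # overhead delay
        regulator.set_voltage(new[0])
    elif new[0] > old[0]:
        regulator.set_voltage(new[0])
        regulator.wait_cycles(new[2])
        regulator.set_frequency(new[1])

def select_encoding_layer(error_range: str) -> str:
    """LCU decision (Figure 2.9): weak e2e coverage vs. strong s2s coverage."""
    return "e2e" if error_range in ("Ne", "Fe (1 error)") else "s2s"
```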

Figure 2.17: RETUNES mode control algorithm

3 Performance Evaluation

RETUNES is evaluated on a 4 × 4 concentrated mesh topology with 64 cores and unidirectional links. Each router has four VCs for every input port and 4 buffer slots per VC. Each packet has 256 bits and is split into four equal 64-bit flits before being injected into the network. In this chapter I evaluate the performance of RETUNES to compare the V5 scheme with the other voltage schemes. The four evaluation schemes, Always-STV, Always-NTV, V2, and V5, are described below:

STV: In the STV/Always-STV scheme, NoC is operated in the nominal voltage mode (1 volt) where power consumption is maximum. NoC shows the best performance, with high application speedup and a low bit error rate (bit errors due to reduced frequency/voltage), in the Always-STV scheme. The performance of the Always-STV scheme using XY routing (Always-STV (XY)) under the Dimension Order Routing algorithm (DOR) and under the Adaptive Routing Algorithm (ADP) (Always-STV (ADP)) is also compared. NTV: In the NTV/Always-NTV scheme, NoC and its cores are operated at a low operating voltage (close to the threshold voltage). This scheme shows high energy efficiency at the cost of latency. NoC and its cores suffer from performance loss due to the increase in the error rate in the Always-NTV scheme. However, this low-voltage scheme (Always-NTV) has less impact

on ∆Vth, providing lower error probabilities when compared to the other schemes. The Always-NTV scheme under DOR-XY routing (Always-NTV (XY)) is considered the baseline model for RETUNES. V2 (2 voltage scheme): In the V2 scheme, the operating voltage applied to NoC and its cores is switched between 2 voltages (STV and NTV). In this scheme, NoC is operated in NTV mode for 25-30% buffer utilization depending on the traffic to avoid congestion, whereas STV mode is used for buffer utilization higher than 25-30%. V5 (5 voltage scheme): V5 is the proposed scheme of RETUNES with the 5-level voltage scaling design. This energy-efficient design includes the NTV, V1, V2, V3, and STV voltage modes. These voltage modes are switched based on the communication demand, providing better performance at the cost of chip area. The power consumption of the network is low when it uses the baseline model, whereas the bit error rate is low when it uses the STV voltage mode. Since the bit error rate is also lower in the Always-NTV scheme due to lower ∆Vth when compared to the STV scheme, the V5 scheme provides a fine balance between bit errors due to voltage scaling and aging. NoC optimizes energy consumption when it uses the NTV voltage mode and has lower latency when operated in the STV voltage mode. The energy efficiency and reliability of the four evaluation schemes are analyzed with real traffic traces from various applications in the Splash-2 [WO+95], PARSEC [BL09], and AxBench [YMEL17] workloads; the applications and their domains used in the design are listed in Table 3.1.

Table 3.1: Applications used in the design.

Applications Domains

JPEG Image processing

BLACKSCHOLES Financial Analysis

RAYTRACE Graphics

LU High-Performance Computing

RADIOSITY Graphics

FLUIDANIMATE Animation

FACESIM Animation

3.1 RETUNES Evaluation Approach

The evaluation model of the RETUNES architecture for the four evaluation schemes is explained in this section. Figure 3.1 shows the methodology for evaluating transistor lifetime (aging), latency (delay), area, and power. A network simulator (Netsim) is used to evaluate traffic patterns and performance in NoC using the real traffic traces obtained from Multi2Sim. The NoC link utilization used to find the device temperature with the HotSpot thermal model is obtained from Netsim. The average dynamic power from Netsim is provided as an input file to the HotSpot thermal model and the router fault model to calculate ∆Vth due to temperature variations in NoC. Synopsys HSPICE and Netsim are used to calculate ∆Vth due to aging and voltage scaling. The ∆Vth values due to supply voltage, temperature, and aging are used to calculate the reliability degradation (Rd) of the transistor. AxBench is a multiplatform benchmark suite for approximate computing. In RETUNES, JPEG, a lossy image compression application, is approximated by applying Neural Processing Units (NPUs) using AxBench. The Randomness Calculator calculates the randomness of the compressed original image and the compressed approximated image and generates trace files for both the original and approximated images. These trace files are used to evaluate the power, latency, and Energy-Delay Product (EDP) of RETUNES. The power and area cost of the network is obtained from the Synopsys Design Compiler tool using the TSMC 45nm technology libraries [PAM+07] and the DSENT NoC modeling tool [SCK+12].

3.2 RETUNES Results

In this section I discuss RETUNES performance (power, delay), reliability, area overhead, lifetime, and Energy-Delay Product, without considering the approximate communication technique.

Figure 3.1: Methodology for evaluating RETUNES performance. Showing evaluation flow of the approximate communication (orange), all others (blue), and end results (green and gray)

3.2.1 Power and Area Overhead Analysis

Figure 3.2 shows the dynamic power consumed (mW) by NoC to transmit packets from source to destination in all the evaluation schemes. The results include the total dynamic power consumed due to transmission and retransmission (MBUs) of the packets, due to the EE-layer hardware, and due to the R-layer hardware. With the unified reliability model, the V5 scheme shows an average of 60-61% savings in power across multiple applications when compared to the Always-STV scheme and 23% savings when compared to the V2 scheme. Since Always-NTV operates in the lowest mode irrespective of network load, it consumes the least power among the four schemes. Analyzing the area overhead of the fault handling hardware is crucial to design an efficient reliability model. An area-efficient fault model will decrease the overall area

Figure 3.2: Total dynamic power cost for Splash-2 and PARSEC benchmarks of 64 core NoC when operated in four proposed schemes. Lower is better.

consumed by the fault handling hardware in NoC. Figure 3.3 shows the area cost of the e2e (CRC-32) and s2s (Hamming) encoding layers proposed in RETUNES. The reliability design and the control unit occupy 2.4% and 0.39% of the overall chip area respectively.

3.2.2 Packet Latency Analysis

This subsection evaluates the average latency of NoC for different applications while operating in the proposed evaluation schemes of RETUNES. The overhead delay and on-chip communication delay due to voltage scaling, and the transistor delay due to aging, are included in the analysis to find the overall delay of NoC. Delay due to aging is calculated based on

∆Vth, where ∆Vth is directly proportional to the transistor gate delay. According to the alpha power law [SN90], ∆Vth for a given supply voltage results in a transistor delay as shown in equation 3.1.

dg = ϕ Vdd / (µ (Vdd − Vth)^α)    (3.1)
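
As a numerical illustration of Equation 3.1, the sketch below evaluates the relative increase in gate delay caused by an aging-induced ∆Vth at a fixed supply voltage; ϕ and µ cancel in the ratio, and the values of α and Vth used here are assumptions of this sketch, not parameters taken from the thesis.

```python
# Small numerical illustration of Equation 3.1: an increase in threshold
# voltage (aging) increases the gate delay. phi and mu cancel when comparing
# aged vs. fresh delay at the same supply voltage; alpha = 1.3 and
# Vth = 0.3 V are assumed values for illustration only.

def relative_gate_delay(vdd, vth_fresh, delta_vth, alpha=1.3):
    """Return d_aged / d_fresh from Eq. 3.1 at a fixed supply voltage."""
    fresh = vdd / (vdd - vth_fresh) ** alpha
    aged = vdd / (vdd - (vth_fresh + delta_vth)) ** alpha
    return aged / fresh

if __name__ == "__main__":
    # e.g. a 30 mV threshold-voltage shift at STV (1.0 V)
    print(relative_gate_delay(1.0, 0.3, 0.03))   # ~1.06 -> roughly 6% slower gates
```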

Figure 3.3: Area overhead of the decoder, encoder and router for CRC and Hamming code used in s2s and e2e encoding designs.

Figure 3.4 explicitly shows the breakdown of packet latency in the four evaluation schemes when no reliability model is considered (shown in blue) and when reliability costs are included (shown in orange). With no reliability model, the average packet latency is approximately 1.2× higher in the V2 scheme and 10.8× higher in the baseline model when compared to the V5 scheme. The average packet latency is 1.6× higher in V5 when compared to the Always-STV scheme, as the operating frequency of NoC in the Always-STV scheme is high (2.3 GHz in this case) throughout the application runtime. Similarly, when reliability delays are included, the average packet latency is 1.35× higher in V2 and approximately 12× higher in the baseline model when compared to the V5 scheme. Moreover, the retransmission delays vary with the operating frequency and therefore the results reflect the delay cost accordingly.

3.2.3 Lifetime Evaluation

Transistor aging caused by HCI and NBTI is modeled as the threshold voltage variation using the Synopsys HSPICE tool and the Predictive Technology Model (PTM) [ptm]

Figure 3.4: Normalized average packet latency (normalized to baseline model - Always-NTV (XY)) for Splash-2 and PARSEC benchmarks of 64 core NoC when operated in four proposed schemes. Blue shows latency cost without reliability. Orange shows reliability cost. Lower is better.

for a 45nm transistor technology node. According to the results shown in Figure 3.5 for the five voltage modes (NTV, V1, V2, V3, and STV) over a degradation period of 10 years, it is evident that the threshold voltage variations (∆Vth) decrease with the operating voltage, mitigating the aging process. The ∆Vth in the transistor due to elevated temperature (∆Vth,Temperature) is approximately 1.15× higher than the ∆Vth due to voltage scaling (∆Vth,VoltageScaling). Similarly, the ∆Vth in the transistor due to aging (∆Vth,Aging) is 1.52× higher when compared to ∆Vth,VoltageScaling. The unified reliability model of RETUNES collects errors due to aging and voltage scaling even though the bit errors observed due to ∆Vth,VoltageScaling are negligible when compared to the bit errors observed due to ∆Vth,Aging. RETUNES at low voltage modes

Figure 3.5: Threshold voltage change (∆Vth) due to voltage scaling, elevated temperature, and aging at 5 different supply voltages. Lower is better.

experiences low stress levels due to the HCI/NBTI effect, which in turn slows the rate of the aging process. Apart from aging, uneven wear-out also affects the lifetime of the device. As explained in previous chapters, uniform distribution of traffic using routing algorithms is one of the efficient ways to decrease uneven aging in NoC. Traffic in NoC is correlated with the temperature variations, which are modeled using the HotSpot thermal model. Figure 3.6 shows the HotSpot thermal map for the Always-STV scheme under XY-routing and the V5 scheme of RETUNES under the adaptive routing algorithm. The routing algorithm modeled in RETUNES showed a more uniform traffic distribution when compared to XY routing for the Always-STV scheme.

Figure 3.6: Comparing HotSpot thermal map of Always-STV under xy-routing and RETUNES (V5 scheme) under adaptive routing. RETUNES shows uniform and lower device temperatures when compared to Always-STV under XY-routing.

3.2.4 Reliability Analysis

The RETUNES hybrid encoding scheme consumes 6% power to improve the resiliency of NoC by tuning fault coverage. On average, the mean bit error rate of the V5 scheme is 0.45× that of the Always-NTV scheme and 2.5× that of the Always-STV scheme. As expected, Always-STV exhibited a lower error rate and Always-NTV exhibited a higher error rate when compared to the other schemes, whereas the V5 scheme of RETUNES displayed an error rate in between Always-STV and Always-NTV in order to balance the power-reliability tradeoff. Single Error Correction and Double Error Detection (SECDED) can prevent retransmissions by correcting all single-bit errors and detecting all two-bit errors. However, a full retransmission is needed if an error cannot be corrected. Figure 3.7 shows the bit error rate due to voltage scaling and aging, where the mean error rate due to voltage scaling is nearly 28% and the mean error rate due to aging is approximately 72% of

Figure 3.7: Bit error rate observed in RETUNES due to voltage scaling and aging.

the overall error rate observed in NoC. The reliability analysis in this thesis considers only soft errors (SBUs, MBUs); permanent faults and faults during retransmission are not considered.

3.2.5 Energy-Delay Product

In order to provide a meaningful insight into the performance of RETUNES, the energy and delay of NoC are combined into a single plot to analyze the advantages among the proposed schemes. When analyzing the Energy Delay Product (EDP), lower is considered better. Figure 3.8 shows the normalized EDP plot comparing all four proposed schemes (NTV, V2, V5, STV) under adaptive routing (ADP) and XY-routing (XY). NoC in the Always-STV scheme shows a decrease in packet latency and an increase in overall power consumption, whereas the Always-NTV scheme shows a decrease in overall power consumption and an increase in packet latency. RETUNES under the V5 scheme improved the EDP of NoC by

Figure 3.8: Normalized Energy Delay Product (EDP) (normalized to baseline model - Always-NTV (XY)) for Splash-2 and PARSEC benchmarks for four proposed schemes. Blue shows EDP without reliability. Orange shows reliability cost. Lower is better.

7.5×, 2×, 1.6×, and 1.3× when compared to baseline scheme (Always-NTV), Always-STV scheme under XY-routing (Always-STV(XY)), Always-STV scheme under ADP routing (Always-STV (ADP)), and V2 scheme under XY-routing, respectively.

3.3 Approximate Communication Evaluation

This section provides latency, power, and Energy-Delay Product analysis of the RETUNES evaluation schemes using the approximate communication technique. AxBench is a data approximation computing benchmark suite used to approximate communication data in four stages. Stage 1: In this stage, a block of code in the application (JPEG) is annotated to apply data approximation. Parrot transformation is used to annotate the required block of the code. Stage 2: In this stage, compilation parameters such as learning rate, number of epochs, sampling rate, test data fraction, maximum number of layers, and maximum number of neurons per layer are given as input to AxBench. Stage 3: In the first step of the third stage, the AxBench simulation is performed. During simulation, training data is collected and then the compilation parameters are taken as input. In the next step, AxBench explores different Neural Network (NN) topologies to find the topology that fits the application best. In the second step of this stage, the original code block is replaced with the NN code and then compiled. In the final step, the NN code is tested on different images at different approximation levels. Stage 4: The output of the simulation process is the original image and the approximated NN image along with the error rate (due to approximation). The randomness of the original image and the approximated image from the JPEG application is calculated using the Matlab programming platform [Mat96]. Observations showed that the original image is more random than the approximated image, so the original image requires a greater number of packet transmissions than the NN image. Routing algorithms such as XY-routing and adaptive routing are used to test the performance (power, latency) of the application. Figures 3.9 and 3.10 compare the compressed original image (shown on the left) with the compressed NN image and its error rate (shown on the right). Approximate communication further improves energy efficiency, as explained in the following subsections.

3.3.1 Packet Latency Analysis

Figure 3.11 shows the normalized latency of the JPEG encoder for the compressed original image and the compressed approximated image under two different routing algorithms. First, I applied the approximate communication technique from AxBench on the JPEG application to find the number of packets needed to transmit data from source to destination. After approximation, the JPEG application was observed to have a decrease in packet

Figure 3.9: Comparing the original image (left) with the compressed NN image at different error percentages (right).

count by 8.5%. This decrease in packet count drastically decreased the number of flit transmissions and retransmissions in NoC. The approximated image, routed using the adaptive routing algorithm, shows approximately a 50% decrease in packet latency when compared to the XY-routed original image. As expected, the Always-STV scheme experienced lower packet latency than the V5, V2, and Always-NTV schemes. The approximated image with adaptive routing in the V5 scheme showed 40% to 49% lower latency than the V2 scheme and 64% to 82% lower latency than the Always-NTV scheme. The approximate communication technique showed an additional 10% decrease in the packet latency of RETUNES.

Figure 3.10: Comparing the original image (left) with the compressed NN image at different error percentages (right).

3.3.2 Power and Energy Analysis

RETUNES shows additional dynamic power savings using the approximate communication technique apart from voltage scaling. The approximated image shows nearly a 9% decrease in dynamic power consumption when compared to the original image, with a 2.6% error rate, as shown in Figure 3.12. Similarly, RETUNES shows an additional 59% and 24% decrease in dynamic power consumption when compared to the Always-STV scheme and the V2 scheme respectively. Figure 3.13 shows the normalized dynamic energy of the original and the approximated image of the JPEG encoder application at different error rates (due to compression). Approximating an image in the V5 scheme additionally saves nearly 13% of energy. The energy consumption of NoC decreases drastically, by 32%, when the adaptive routing technique is combined with approximate communication. As the error rate

Figure 3.11: Normalized average packet latency (normalized to baseline model - Always-NTV) for both original and approximated image for AxBench benchmarks of 64 core NoC when operated in all the proposed schemes. Lower is better.

is increased, the energy savings increase and the quality of the image decreases. An approximated image at 9.96% error saves nearly 64% of the energy when compared to the original image with a 2.6% error. So, the compressed and approximated image with the maximum error rate (9.96% in this case) using the adaptive routing technique consumes the least energy compared to the others in Figure 3.13.

3.3.3 Energy-Delay Product (EDP) Analysis

Finally, I analyzed the EDP of all four evaluation schemes (Always-NTV, V2, V5, Always-STV), where the EDP values are normalized to the EDP of the original image for the Always-NTV scheme. Figure 3.14 shows the EDP of the approximated and original images under XY-routing and adaptive routing, where the EDP of the Always-NTV scheme is higher than the other schemes. The EDP of the V5 scheme is 20% less than that of the V2 scheme and approximately 52% and 80% less than that of the Always-STV and Always-NTV schemes

Figure 3.12: Normalized dynamic power (normalized to Always-STV (XY)) for both original and approximated image for AxBench benchmarks of 64 core NoC when operated in all the proposed schemes. Lower is better.

respectively. Applying the data approximation technique to RETUNES additionally decreased EDP by approximately 19%.

Figure 3.13: Normalized Dynamic energy (normalized to Always-NTV scheme original image of 2.6% error rate) for both original and approximated image for AxBench benchmarks of 64 core NoC when operated in all the proposed schemes. Lower is better.

Figure 3.14: Normalized EDP (normalized to Always-NTV scheme) for both original and approximated image for AxBench benchmarks of 64 core NoC when operated in all the proposed schemes. Lower is better.

4 Conclusions and Future Work

In this thesis, I proposed a reliable and energy-efficient Network-on-Chip architecture implementing a five voltage mode (V5) scheme. The V5 scheme of RETUNES showed promising results while implementing voltage scaling (including NTV), adaptive routing, and approximate communication techniques in the design. RETUNES showed power savings of nearly 2.5 × by carefully choosing the appropriate voltage mode (including NTV scaling) for the varying traffic in NoC. Symmetrical distribution of traffic using the dynamic adaptive routing algorithm showed balanced wear-out of links, thus increasing the lifetime of NoC and making it more reliable. I demonstrated that ∆Vth due to voltage scaling is less when

compared to ∆Vth due to elevated temperature and the aging effect in NoC. I then evaluated the combined effects of the five voltage mode design with adaptive routing, which decreased NoC latency by 10-12 ×, and improved EDP by 1.3-7.5 × (including reliability) when compared to traditional NTV designs. I also observed that the error rate increases as the operating voltage of NoC decreases. The hybrid encoding scheme of RETUNES handles all the bit errors due to low supply voltage and aging, with a minimum area overhead of 2.79% of chip area (reliability design and control unit) and a power cost of 6%. Results showed that the unified reliability model and the encoding scheme of RETUNES work together to improve NoC resiliency by tuning fault coverage. The approximate communication technique implemented in the design showed additional power savings of 13%, while further reducing latency and EDP by 10% and 19% respectively. For future work, RETUNES can be extended to predict the NoC workload and to proactively change voltage modes according to the incoming traffic using various machine learning techniques. It is an interesting idea to implement frequency islands at a single supply voltage in RETUNES, where the operating frequency of the links can be scaled according to the network workloads. RETUNES evaluated aging, energy efficiency, and reliability of the routers and links of NoC, leaving the performance of the cores unexplored.

This thesis can be extended by applying NTV and voltage scaling techniques to the cores (heterogeneous) of NoC to observe the power-performance tradeoff. The RETUNES reliability layer applies fault tolerant techniques to react to faults that have already occurred due to low supply voltage, high device temperature, and aging. I believe that using a reliability model that proactively predicts faults might improve the latency and reliability of RETUNES. Machine learning can be used to predict faults and to mitigate errors at runtime. As memory cells are more vulnerable to bit errors at low supply voltage, applying voltage scaling to the memory cells would be an interesting approach.

References

[AAF+14] Assad Abbas, Mazhar Ali, Ahmad Fayyaz, Ankan Ghosh, Anshul Kalra, Samee U Khan, Muhammad Usman Shahid Khan, Thiago De Menezes, Sayantica Pattanayak, Alarka Sanyal, et al. A survey on energy-efficient methodologies and architectures of network-on-chip. Computers & Electrical Engineering, 40(8):333–347, 2014.

[AAKL+10] Masud Al Aziz, Samee Ullah Khan, Thanasis Loukopoulos, Pascal Bouvry, Hongxiang Li, and Juan Li. An overview of achieving energy efficiency in on-chip networks. International Journal of Communication Networks and Distributed Systems, 5(4):444–458, 2010.

[ACC+09] Jaume Abella, Javier Carretero, Pedro Chaparro, Xavier Vera, and Antonio González. Low vccmin fault-tolerant cache with highly predictable performance. In Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, pages 111–121. ACM, 2009.

[ACP11] Konstantinos Aisopos, Chia-Hsin Owen Chen, and Li-Shiuan Peh. Enabling system-level modeling of variation-induced faults in networks-on-chips. In Proceedings of the 48th Design Automation Conference, pages 930–935. ACM, 2011.

[ACR13] Dean Michael Ancajas, Koushik Chakraborty, and Sanghamitra Roy. Proactive aging management in heterogeneous nocs through a criticality-driven routing approach. In Proceedings of the Conference on Design, Automation and Test in Europe, pages 1032–1037. EDA Consortium, 2013.

[ACV05] Carlos Alvarez, Jesus Corbal, and Mateo Valero. Fuzzy memoization for floating-point multimedia applications. IEEE Transactions on Computers, 54(7):922–927, 2005.

[AD+06] K. Agarwal, H. Deogun, et al. Power gating with multiple sleep modes. In 7th International Symposium on Quality Electronic Design (ISQED’06), 2006.

[AFGM11] Amin Ansari, Shuguang Feng, Shantanu Gupta, and Scott Mahlke. Archipelago: A polymorphic cache design for enabling robust near-threshold operation. 2011.

[AHY+16] Junwhan Ahn, Sungpack Hong, Sungjoo Yoo, Onur Mutlu, and Kiyoung Choi. A scalable processing-in-memory accelerator for parallel graph processing. ACM SIGARCH Computer Architecture News, 43(3):105–117, 2016.

[AWC+11a] Alaa R Alameldeen, Ilya Wagner, Zeshan Chishti, Wei Wu, Chris Wilkerson, and Shih-Lien Lu. Energy-efficient cache design using variable-strength

error-correcting codes. In ACM SIGARCH Computer Architecture News, volume 39, pages 461–472. ACM, 2011.

[AWC+11b] Jason Ansel, Yee Lok Wong, Cy Chan, Marek Olszewski, Alan Edelman, and Saman Amarasinghe. Language and compiler support for auto-tuning variable-accuracy algorithms. In Code Generation and Optimization (CGO), 2011 9th Annual IEEE/ACM International Symposium on, pages 85–96. IEEE, 2011.

[AYMC15] Junwhan Ahn, Sungjoo Yoo, Onur Mutlu, and Kiyoung Choi. Pim-enabled instructions: a low-overhead, locality-aware processing-in-memory architecture. In Computer Architecture (ISCA), 2015 ACM/IEEE 42nd Annual International Symposium on, pages 336–348. IEEE, 2015.

[BC+12] K. Bhardwaj, K. Chakraborty, et al. An milp-based aging-aware routing algorithm for nocs. In 2012 Design, Automation Test in Europe Conference Exhibition (DATE), 2012.

[BCR12a] Kshitij Bhardwaj, Koushik Chakraborty, and Sanghamitra Roy. An milp-based aging-aware routing algorithm for nocs. In Design, Automation & Test in Europe Conference & Exhibition (DATE), 2012, pages 326–331. IEEE, 2012.

[BCR12b] Kshitij Bhardwaj, Koushik Chakraborty, and Sanghamitra Roy. Towards graceful aging degradation in nocs through an adaptive routing algorithm. In Proceedings of the 49th Annual Design Automation Conference, pages 382– 391. ACM, 2012.

[BCR12c] Kshitij Bhardwaj, Koushik Chakraborty, and Sanghamitra Roy. Towards graceful aging degradation in nocs through an adaptive routing algorithm. In Proceedings of the 49th Annual Design Automation Conference, pages 382– 391. ACM, 2012.

[BD14a] James Balfour and William J Dally. Design tradeoffs for tiled cmp on- chip networks. In ACM International Conference on Supercomputing 25th Anniversary Volume, pages 390–401. ACM, 2014.

[BD14b] James Balfour and William J Dally. Design tradeoffs for tiled cmp on- chip networks. In ACM International Conference on Supercomputing 25th Anniversary Volume, pages 390–401. ACM, 2014.

[BGL12] Andrea Bianco, Paolo Giaccone, and Nanfang Li. Exploiting dynamic voltage and frequency scaling in networks on chip. In High Performance Switching and Routing (HPSR), 2012 IEEE 13th International Conference on, pages 229–234. IEEE, 2012.

[BHK+13] Abbas BanaiyanMofrad, Houman Homayoun, Vasileios Kontorinis, Dean Tullsen, and Nikil Dutt. Remediate: A scalable fault-tolerant architecture for low-power nuca cache in tiled cmps. In Green Computing Conference (IGCC), 2013 International, pages 1–10. IEEE, 2013.

[BHM+17] Rahul Boyapati, Jiayi Huang, Pritam Majumder, Ki Hwan Yum, and Eun Jung Kim. Approx-noc: A data approximation framework for network-on-chip architectures. In ACM SIGARCH Computer Architecture News, volume 45, pages 666–677. ACM, 2017.

[BJS+14] Haseeb Bokhari, Haris Javaid, Muhammad Shafique, Jörg Henkel, and Sri Parameswaran. darknoc: Designing energy-efficient network-on-chip with multi-vt cells for dark silicon. In Proceedings of the 51st Annual Design Automation Conference, pages 1–6. ACM, 2014.

[BK18] P. Bhamidipati and A. Karanth. Retunes: Reliable and energy- efficient network-on-chip architecture. In 2018 IEEE 36th International Conference on Computer Design (ICCD), pages 488–495, Oct 2018. doi:10.1109/ICCD.2018.00079

[BKS+18a] Filipe Betzel, Karen Khatamifard, Harini Suresh, David J Lilja, John Sartori, and Ulya Karpuzcu. Approximate communication: Techniques for reducing communication bottlenecks in large-scale parallel systems. ACM Computing Surveys (CSUR), 51(1):1, 2018.

[BKS+18b] Filipe Betzel, Karen Khatamifard, Harini Suresh, David J Lilja, John Sartori, and Ulya Karpuzcu. Approximate communication: Techniques for reducing communication bottlenecks in large-scale parallel systems. ACM Computing Surveys (CSUR), 51(1):1, 2018.

[BKS+18c] Filipe Betzel, Karen Khatamifard, Harini Suresh, David J Lilja, John Sartori, and Ulya Karpuzcu. Approximate communication: Techniques for reducing communication bottlenecks in large-scale parallel systems. ACM Computing Surveys (CSUR), 51(1):1, 2018.

[BL09] Christian Bienia and Kai Li. Parsec 2.0: A new benchmark suite for chip- multiprocessors. In Proceedings of the 5th Annual Workshop on Modeling, Benchmarking and Simulation, volume 2011, 2009.

[BMJG12] Paul Bogdan, Radu Marculescu, Siddharth Jain, and Rafael Tornero Gavila. An optimal control approach to power management for multi-voltage and frequency islands multiprocessor platforms under highly variable workloads. In Networks on Chip (NoCS), 2012 Sixth IEEE/ACM International Symposium on, pages 35–42. IEEE, 2012.

[BMM07] Arnab Banerjee, Robert Mullins, and Simon Moore. A power and energy exploration of network-on-chip architectures. In Networks-on-Chip, 2007. NOCS 2007. First International Symposium on, pages 163–172. IEEE, 2007.

[BMR+10] Surendra Byna, Jiayuan Meng, Anand Raghunathan, Srimat Chakradhar, and Srihari Cadambi. Best-effort semantic document search on gpus. In Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units, pages 86–93. ACM, 2010.

[BWV+06] Sarvesh Bhardwaj, Wenping Wang, Rakesh Vattikonda, Yu Cao, and Sarma Vrudhula. Predictive modeling of the nbti effect for reliable design. In Custom Integrated Circuits Conference, 2006. CICC’06. IEEE, pages 189–192. IEEE, 2006.

[CPK+13] Chia-Hsin Owen Chen, Sunghyun Park, Tushar Krishna, Suvinay Subramanian, Anantha P Chandrakasan, and Li-Shiuan Peh. Smart: a single-cycle reconfigurable noc for soc applications. In Design, Automation & Test in Europe Conference & Exhibition (DATE), 2013, pages 338–343. IEEE, 2013.

[CSGK11] Tuck-Boon Chan, John Sartori, Puneet Gupta, and Rakesh Kumar. On the efficacy of nbti mitigation techniques. In Design, Automation & Test in Europe Conference & Exhibition (DATE), 2011, pages 1–6. IEEE, 2011.

[CYLA11] Young Geun Choi, Sungjoo Yoo, Sunggu Lee, and Jung Ho Ahn. Matching cache access behavior and bit error pattern for high performance low vcc l1 cache. In Design Automation Conference (DAC), 2011 48th ACM/EDAC/IEEE, pages 978–983. IEEE, 2011.

[CYZ13] MR Casu, MK Yadav, and M Zamboni. Power-gating technique for network- on-chip buffers. Electronics Letters, 49(23):1438–1440, 2013.

[DBKL16] Dominic DiTomaso, Travis Boraten, Avinash Kodi, and Ahmed Louri. Dynamic error mitigation in nocs using intelligent prediction techniques. In The 49th Annual IEEE/ACM International Symposium on Microarchitecture, page 31. IEEE Press, 2016.

[DMN+08] Reetuparna Das, Asit K Mishra, Chrysostomos Nicopoulos, Dongkook Park, Vijaykrishnan Narayanan, Ravishankar Iyer, Mazin S Yousif, and Chita R Das. Performance and power optimization through data compression in network-on-chip architectures. In High Performance Computer Architecture, 2008. HPCA 2008. IEEE 14th International Symposium on, pages 215–225. IEEE, 2008.

[DPL+14] Zidong Du, Krishna Palem, Avinash Lingamneni, Olivier Temam, Yunji Chen, and Chengyong Wu. Leveraging the error resilience of machine-learning applications for designing highly energy efficient accelerators. In

2014 19th Asia and South Pacific Design Automation Conference (ASP-DAC), pages 201–206. IEEE, 2014.

[DWB+10] Ronald G Dreslinski, Michael Wieckowski, David Blaauw, Dennis Sylvester, and Trevor Mudge. Near-threshold computing: Reclaiming moore’s law through energy efficient integrated circuits. Proceedings of the IEEE, 98(2):253–266, 2010.

[EDL+12] Masoumeh Ebrahimi, Masoud Daneshtalab, Pasi Liljeberg, Juha Plosila, and Hannu Tenhunen. Lear-a low-weight and highly adaptive routing method for distributing congestions in on-chip networks. In Parallel, Distributed and Network-Based Processing (PDP), 2012 20th Euromicro International Conference on, pages 520–524. IEEE, 2012.

[EE11] Stijn Eyerman and Lieven Eeckhout. Fine-grained dvfs using on-chip regulators. ACM Transactions on Architecture and Code Optimization (TACO), 8(1):1, 2011.

[EEL+97] Susan J Eggers, Joel S Emer, Henry M Levy, Jack L Lo, Rebecca L Stamm, and Dean M Tullsen. Simultaneous multithreading: A platform for next-generation processors. IEEE micro, 17(5):12–19, 1997.

[ESCB12] Hadi Esmaeilzadeh, Adrian Sampson, Luis Ceze, and Doug Burger. Architecture support for disciplined approximate programming. In ACM SIGPLAN Notices, volume 47, pages 301–312. ACM, 2012.

[FAA08] Antonio Flores, Juan L Aragón, and Manuel E Acacio. An energy consumption characterization of on-chip interconnection networks for tiled cmp architectures. The Journal of Supercomputing, 45(3):341–364, 2008.

[FLJ+13] C. Feng, Z. Lu, A. Jantsch, M. Zhang, and Z. Xing. Addressing transient and permanent faults in noc with efficient fault-tolerant deflection router. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 21(6):1053– 1066, June 2013. doi:10.1109/TVLSI.2012.2204909

[Gja08] Juliana Gjanci. On-chip voltage regulation for power management in system- on-chip. 2008.

[GR09] Ahmed Garamoun and M Radetzki. Error correction techniques on noc protocol layers. In Haupt-Seminar on Reliable Network-on-Chip in the Many-Core Era, volume 23, 2009.

[GT18] Mohammad Saber Golanbari and Mehdi B Tahoori. Runtime adjustment of iot system-on-chips for minimum energy operation. In 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC), pages 1–6. IEEE, 2018.

[HJ15] Robert Hesse and Natalie Enright Jerger. Improving dvfs in nocs with coherence prediction. In Proceedings of the 9th International Symposium on Networks-on-Chip, page 24. ACM, 2015.

[HM07] S. Herbert and D. Marculescu. Analysis of dynamic voltage/frequency scaling in chip-multiprocessors. In Low Power Electronics and Design (ISLPED), 2007 ACM/IEEE International Symposium on, 2007.

[Iru15] Pratheep Joe Siluvai Iruthayaraj. Dynamic voltage and frequency scaling for wireless network-on-chip. 2015.

[ITR15] ITRS international technology roadmap for semiconductors 2.0. 2015.

[JKP17] Natalie Enright Jerger, Tushar Krishna, and Li-Shiuan Peh. On-chip networks. Synthesis Lectures on Computer Architecture, 12(3):1–210, 2017.

[JPKAK14] Nima Jafarzadeh, Maurizio Palesi, Ahmad Khademzadeh, and Ali Afzali-Kusha. Data encoding techniques for reducing energy consumption in network-on-chip. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 22(3):675–685, 2014.

[JR+07] N. James, P. Restle, et al. Comparison of split-versus connected-core supplies in the power6 microprocessor. In 2007 IEEE International Solid-State Circuits Conference. Digest of Technical Papers, 2007.

[JYK08] Yuho Jin, Ki Hwan Yum, and Eun Jung Kim. Adaptive data compression for high-performance low-power on-chip networks. In Microarchitecture, 2008. MICRO-41. 2008 41st IEEE/ACM International Symposium on, pages 354–363. IEEE, 2008.

[KBW+14] Veit B Kleeberger, Martin Barke, Christoph Werner, Doris Schmitt-Landsiedel, and Ulf Schlichtmann. A compact model for nbti degradation and recovery under use-profile variations and its application to aging analysis of digital integrated circuits. Microelectronics Reliability, 54(6-7):1083–1089, 2014.

[KET16] Saman Kiamehr, Mojtaba Ebrahimi, and Mehdi Tahoori. Temperature-aware dynamic voltage scaling for near-threshold computing. In Great Lakes Symposium on VLSI, 2016 International, pages 361–364. IEEE, 2016.

[KJ13] S. Khare and S. Jain. Prospects of near-threshold voltage design for green computing. In 2013 26th International Conference on VLSI Design and 2013 12th International Conference on Embedded Systems, 2013.

[KKS15] Georgios Keramidas, Chrysa Kokkala, and Iakovos Stamoulis. Clumsy value cache: An approximate memoization technique for mobile gpu fragment shaders. In Workshop on Approximate Computing (WAPCO15), 2015.

[KVGS13] Hyungjun Kim, Arseniy Vitkovskiy, Paul V Gratz, and Vassos Soteriou. Use it or lose it: Wear-out and lifetime in future chip multiprocessors. In Microarchitecture (MICRO), 2013 46th Annual IEEE/ACM International Symposium on, pages 136–147. IEEE, 2013.

[KZBH13] Megan A Kelly, Adam P Zieba, William A Buttemer, and Anthony J Hulbert. Effect of temperature on the rate of ageing: an experimental study of the blowfly calliphora stygia. PloS one, 8(9):e73781, 2013.

[LD+] Kyoungwoo Lee, Dutt, et al. Towards soft errors.

[LEL+97] Jack L Lo, Joel S Emer, Henry M Levy, Rebecca L Stamm, Dean M Tullsen, and Susan J Eggers. Converting thread-level parallelism to instruction-level parallelism via simultaneous multithreading. ACM Transactions on Computer Systems (TOCS), 15(3):322–354, 1997.

[LND+05] Ben Lee, Eriko Nurvitadhi, Reshma Dixit, Chansu Yu, and Myungchul Kim. Dynamic voltage scaling techniques for power efficient video decoding. Journal of Systems Architecture, 51(10-11):633–652, 2005.

[MAMJ15] J. S. Miguel, J. Albericio, A. Moshovos, and N. E. Jerger. Doppelgänger: A cache for approximate computing. In 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pages 50–61, Dec 2015.

[Mat96] MathWorks, Inc. MATLAB: Application program interface guide, volume 5. MathWorks, 1996.

[MBJ14a] Joshua San Miguel, Mario Badr, and Natalie Enright Jerger. Load value approximation. In Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, pages 127–139. IEEE Computer Society, 2014.

[MBJ14b] Joshua San Miguel, Mario Badr, and Natalie Enright Jerger. Load value approximation. In Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, pages 127–139. IEEE Computer Society, 2014.

[MD+09a] A. K. Mishra, R. Das, et al. A case for dynamic frequency tuning in on-chip networks. In 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2009.

[MD+09b] A. K. Mishra, R. Das, et al. A case for dynamic frequency tuning in on-chip networks. In 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2009.

[MG13] Elie Maricau and Georges Gielen. Cmos reliability overview. In Analog IC Reliability in Nanometer CMOS, pages 15–35. Springer, 2013.

[Mit15] Sparsh Mittal. A survey of architectural techniques for near-threshold computing. 2015.

[Mit16] Sparsh Mittal. A survey of techniques for approximate computing. ACM Computing Surveys (CSUR), 48(4):62, 2016.

[MRCB10] Jiayuan Meng, Anand Raghunathan, Srimat Chakradhar, and Surendra Byna. Exploiting the forgiving nature of applications for scalable parallel execution. In 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), pages 1–12. IEEE, 2010.

[MS+16] J. Myers, A. Savanth, et al. A subthreshold arm cortex-m0+ subsystem in 65 nm for wsn applications with 14 power domains, 10t sram, and integrated voltage regulator. IEEE Journal of Solid-State Circuits, 2016.

[MT03a] Kartik Mohanram and Nur A Touba. Cost-effective approach for reducing soft error failure rate in logic circuits. In Proceedings of the International Test Conference (ITC), page 893. IEEE, 2003.

[MT03b] Kartik Mohanram and Nur A Touba. Partial error masking to reduce soft error failure rate in logic circuits. In Defect and Fault Tolerance in VLSI Systems, 2003. Proceedings. 18th IEEE International Symposium on, pages 433–440. IEEE, 2003.

[MVD11] A. K. Mishra, N. Vijaykrishnan, and C. R. Das. A case for heterogeneous on-chip interconnects for cmps. In 2011 38th Annual International Symposium on Computer Architecture (ISCA), pages 389–399, June 2011.

[NO97] BA Nayfeh and K Olukotun. A single-chip multiprocessor. Computer, 30(9):79–85, 1997.

[NSB16] Nasim Nasirian, Reza Soosahabi, and Magdy Bayoumi. Traffic-aware power-gating scheme for network-on-chip routers. In Circuits and Systems Conference (DCAS), 2016 IEEE Dallas, pages 1–4. IEEE, 2016.

[ONH+96] Kunle Olukotun, Basem A Nayfeh, Lance Hammond, Ken Wilson, and Kunyung Chang. The case for a single-chip multiprocessor. In ACM Sigplan Notices, volume 31, pages 2–11. ACM, 1996.

[OS95] Shigeo Ogawa and Noboru Shiono. Generalized diffusion-reaction model for the low-field charge-buildup instability at the Si-SiO2 interface. Physical Review B, 51(7):4218, 1995.

[PAM+07] Antonio Pullini, Federico Angiolini, Paolo Meloni, David Atienza, Srinivasan Murali, Luigi Raffo, Giovanni De Micheli, and Luca Benini. Noc design and implementation in 65nm technology. In Proceedings of the First International Symposium on Networks-on-Chip, pages 273–282. IEEE Computer Society, 2007.

[Pen17] David R Penas. Optimization in computational systems biology via high performance computing techniques. 2017.

[PFAC09] Maurizio Palesi, Fabrizio Fazzino, Giuseppe Ascia, and Vincenzo Catania. Data encoding for low-power in wormhole-switched networks-on-chip. In Digital System Design, Architectures, Methods and Tools, 2009. DSD'09. 12th Euromicro Conference on, pages 119–126. IEEE, 2009.

[PLS01] Johan Pouwelse, Koen Langendoen, and Henk Sips. Dynamic voltage scaling on a low-power microprocessor. In Proceedings of the 7th annual international conference on Mobile computing and networking, pages 251–259. ACM, 2001.

[PNK+06] Dongkook Park, Chrysostomos Nicopoulos, Jongman Kim, Narayanan Vijaykrishnan, and Chita R Das. Exploring fault-tolerant network-on-chip architectures. In Dependable Systems and Networks, 2006. DSN 2006. International Conference on, pages 93–104. IEEE, 2006.

[ptm] Predictive technology model. URL: http://ptm.asu.edu/

[Ram11] Carl Ramey. Tile-gx100 manycore processor: Acceleration interfaces and architecture. In Hot Chips 23 Symposium (HCS), 2011 IEEE, pages 1–21. IEEE, 2011.

[RJCR16] Chidhambaranathan Rajamanikkam, Rajesh JS, Koushik Chakraborty, and Sanghamitra Roy. Boostnoc: power efficient network-on-chip architecture for near threshold computing. In Proceedings of the 35th International Conference on Computer-Aided Design, page 124. ACM, 2016.

[RL10] Rohit Sunkam Ramanujam and Bill Lin. Destination-based adaptive routing on 2d mesh networks. In Proceedings of the 6th ACM/IEEE Symposium on Architectures for Networking and Communications Systems, page 19. ACM, 2010.

[RRS+14] Thomas Rauber, Gudula Rünger, Michael Schwind, Haibin Xu, and Simon Melzner. Energy measurement, modeling, and prediction for processors with frequency scaling. The Journal of Supercomputing, 70(3):1451–1476, 2014.

[RSG03] Vijay Raghunathan, Mani B Srivastava, and Rajesh K Gupta. A survey of techniques for energy efficient on-chip communication. In Proceedings of the 40th annual Design Automation Conference, pages 900–905. ACM, 2003.

[Rup18] Karl Rupp. 42 years of microprocessor trend data. 2018. URL: https://www.karlrupp.net/2018/02/42-years-of-microprocessor-trend-data/

[SA+14] Mohamed M Sabry, David Atienza, et al. Temperature-aware design and management for 3d multi-core architectures. Foundations and Trends® in Electronic Design Automation, 8(2):117–197, 2014.

[SCK+12] Chen Sun, Chia-Hsin Owen Chen, George Kurian, Lan Wei, Jason Miller, Anant Agarwal, Li-Shiuan Peh, and Vladimir Stojanovic. Dsent - a tool connecting emerging photonics with electronics for opto-electronic networks-on-chip modeling. In Networks on Chip (NoCS), 2012 Sixth IEEE/ACM International Symposium on, pages 201–210. IEEE, 2012.

[SDF+11] Adrian Sampson, Werner Dietl, Emily Fortuna, Danushen Gnanapragasam, Luis Ceze, and Dan Grossman. Enerj: Approximate data types for safe and general low-power computation. In ACM SIGPLAN Notices, volume 46, pages 164–174. ACM, 2011.

[SDM10] John Shalf, Sudip Dosanjh, and John Morrison. Exascale computing technology challenges. In International Conference on High Performance Computing for Computational Science, pages 1–25. Springer, 2010.

[SDMHR11] Stelios Sidiroglou-Douskos, Sasa Misailovic, Henry Hoffmann, and Martin Rinard. Managing performance vs. accuracy trade-offs with loop perforation. In Proceedings of the 19th ACM SIGSOFT symposium and the 13th European conference on Foundations of software engineering, pages 124–134. ACM, 2011.

[SLJ+13] Mehrzad Samadi, Janghaeng Lee, D Anoushe Jamshidi, Amir Hormati, and Scott Mahlke. Sage: Self-tuning approximation for graphics engines. In Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, pages 13–24. ACM, 2013.

[SN90] T. Sakurai and A. R. Newton. Alpha-power law model and its applications to cmos inverter delay and other formulas. IEEE Journal of Solid-State Circuits, 25(2):584–594, April 1990.

[Sod15] Avinash Sodani. Knights landing (knl): 2nd generation intel® xeon phi processor. In Hot Chips 27 Symposium (HCS), 2015 IEEE, pages 1–24. IEEE, 2015.

[SP+03a] Li Shang, Li-Shiuan Peh, et al. Dynamic voltage scaling with links for power optimization of interconnection networks. In The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings., 2003.

[SP03b] Vassos Soteriou and Li-Shiuan Peh. Dynamic power management for power optimization of interconnection networks using on/off links. In High Performance Interconnects, 2003. Proceedings. 11th Symposium on, pages 15–20. IEEE, 2003.

[SS05] Srinivasa R Sridhara and Naresh R Shanbhag. Coding for system-on-chip networks: a unified framework. IEEE transactions on very large scale integration (VLSI) systems, 13(6):655–667, 2005.

[TKT+16] Akihiro Tabuchi, Yasuyuki Kimura, Sunao Torii, Hideo Matsufuru, Tadashi Ishikawa, Taisuke Boku, and Mitsuhisa Sato. Design and preliminary evaluation of omni openacc compiler for massive mimd processor pezy-sc. In International Workshop on OpenMP, pages 293–305. Springer, 2016.

[TPE+14] Bradley Thwaites, Gennady Pekhimenko, Hadi Esmaeilzadeh, Amir Yazdanbakhsh, Jongse Park, Girish Mururu, Onur Mutlu, and Todd Mowry. Rollback-free value prediction with approximate loads. In 2014 23rd International Conference on Parallel Architecture and Compilation Techniques (PACT), pages 493–494. IEEE, 2014.

[UKK13] Saeeda Usman, Samee U Khan, and Sikandar Khan. A comparative study of voltage/frequency scaling in noc. In Electro/Information Technology (EIT), 2013 IEEE International Conference on, pages 1–5. IEEE, 2013.

[vSA+16] V. M. van Santen, H. Amrouch, et al. Aging-aware voltage scaling. In 2016 Design, Automation Test in Europe Conference Exhibition (DATE), 2016.

[Wal91] David W Wall. Limits of instruction-level parallelism, volume 19. ACM, 1991.

[WCF11] Yao Wang, Sorin Cotofana, and Liang Fang. A unified aging model of nbti and hci degradation towards lifetime reliability management for nanoscale mosfet circuits. In Proceedings of the 2011 IEEE/ACM International Symposium on Nanoscale Architectures, pages 175–180. IEEE Computer Society, 2011.

[WJK+12] Vincent M Weaver, Matt Johnson, Kiran Kasichayanula, James Ralph, Piotr Luszczek, Dan Terpstra, and Shirley Moore. Measuring energy and power with papi. In Parallel Processing Workshops (ICPPW), 2012 41st International Conference on, pages 262–268. IEEE, 2012.

[WO+95] S.C. Woo, M. Ohara, et al. The SPLASH-2 Programs: Characterization and Methodological Considerations. In Proc. of the 22nd International Symposium on Computer Architecture, June 1995.

[WWM14] Liang Wang, Xiaohang Wang, and Terrence Mak. Dynamic programming- based lifetime aware adaptive routing algorithm for network-on-chip. In Very Large Scale Integration (VLSI-SoC), 2014 22nd International Conference on, pages 1–6. IEEE, 2014.

[YA11] Qiaoyan Yu and Paul Ampadu. A dual-layer method for transient and permanent error co-management in noc links. IEEE Transactions on Circuits and Systems II: Express Briefs, 58(1):36–40, 2011.

[YMEL17] A. Yazdanbakhsh, D. Mahajan, H. Esmaeilzadeh, and P. Lotfi-Kamran. Axbench: A multiplatform benchmark suite for approximate computing. IEEE Design Test, 34(2):60–68, April 2017.

[YS06] Ziad Youssfi and Michael Shanblatt. A new technique to exploit instruction- level parallelism for reducing microprocessor power consumption. In Electro/information Technology, 2006 IEEE International Conference on, pages 119–124. IEEE, 2006.

[Yu] Qian Yu. Opportunities and challenges for near-threshold technology in end-point socs for the internet of things. In Design And Reuse. URL: https://www.design-reuse.com/articles/39186/near-threshold-technology-end-point-socs-iot.html

[YYHC11] Hao-I Yang, Shyh-Chyi Yang, Wei Hwang, and Ching-Te Chuang. Impacts of nbti/pbti on timing control circuits and degradation tolerant design in nanoscale cmos sram. IEEE Transactions on Circuits and Systems I: Regular Papers, 58(6):1239–1251, 2011.

[ZDB+07] Bo Zhai, Ronald G Dreslinski, David Blaauw, Trevor Mudge, and Dennis Sylvester. Energy efficient near-threshold chip multi-processing. In Proceedings of the 2007 international symposium on Low power electronics and design, pages 32–37. ACM, 2007.
