A Universal Self-Calibrating Dynamic Voltage and Frequency Scaling (DVFS) Scheme with Thermal Compensation for Energy Savings in Fpgas

A Universal Self-Calibrating Dynamic Voltage and Frequency Scaling (DVFS) Scheme with Thermal Compensation for Energy Savings in FPGAs Shuze Zhao1, Ibrahim Ahmed1, Carl Lamoureux1, Ashraf Lotfi2, Vaughn Betz1 and Olivier Trescases1 1University of Toronto, Toronto, ON, Canada 10 King’s College Road, Toronto, ON, M5S 3G4, Canada 2Altera Corp., Hampton, NJ, 08827, USA Email: [email protected] Abstract— Field Programmable Gate Arrays (FPGAs) are Currently, FPGA designers operate each IC at its rated widely used in telecom, medical, military and cloud computing nominal voltage, and must choose a clock frequency at or applications. Unlike in microprocessors, the routing and critical below the limit predicted by the Computer-Aided Design path delay of FPGAs is user dependent. The design tool sug- gests a maximum operating frequency based on the worst-case (CAD) tool’s timing analysis. This timing analysis is extremely timing analysis of the critical paths at a fixed nominal voltage, conservative, using worst-case models for process corners, on- which usually means there is significant voltage or frequency chip voltage drop, temperature and aging. In the vast majority margin in a typical chip. This paper presents a universal of chips and systems, however, the supply voltage can be offline self-calibration scheme, which automatically finds the reduced significantly below nominal in order to obtain energy FPGA frequency and core voltage operating limit at different self-imposed temperatures by monitoring design-specific critical savings. Operating the IC at a lower voltage also reduces paths. These operating points are stored in a calibration table the impact of aging effects such as Bias-Threshold Instability and used to dynamically adjust the frequency and core voltage (BTI), and improves the chip lifetime [12], [13]. according to the FPGA temperature when the application circuit is running. The self-calibration process is demonstrated on an Altera Cyclone IV 65-nm FPGA with a digitally controlled dc-dc A. Prior Work converter, leading to 40% power savings in a typical digital filter In [14], a DVS scheme with a Complex Programmable application. Logic Device (CPLD) load is presented. A CPLD is similar to an FPGA but has a non-volatile configuration memory. I. INTRODUCTION This scheme does not monitor the logic error due to the Field Programmable Gate Arrays (FPGAs) can outper- timing violation. In [15], a Logic Delay Measurement Cir- form microprocessors and Digital Signal Processors (DSPs) cuit (LDMC) is used to determine the voltage at which the in many applications, thanks to their ability to implement application circuit has a timing failure, and adjusts the supply massively parallel algorithms [1]–[3]. Since FPGAs can be voltage accordingly. It assumes that the critical path in the reprogrammed to accommodate evolving standards, they e- application circuit can be exercised by randomly generated liminate the custom manufacturing and resulting high Non- inputs during calibration, which is not valid in modern FPGA Recurring Engineering (NRE) costs and development time of applications. Approaches in [14] and [15] also rely on a non- Application-Specific Digital ICs (ASICs). Thus FPGAs are valid assumption that the VCO/LDMC delay value perfectly widely used in telecom, medical, military and cloud computing tracks the delay variation in the application circuit critical applications. However, the flexibility of FPGAs comes at a paths with temperature and aging. significant cost; they typically consume ten times the dynamic In [16]–[18], online timing slack measurement is achieved power of an ASIC performing the same task [4], making power by using a phase-shifted clock and one shadow register for reduction techniques crucial for FPGAs. Dynamic Voltage each critical path to determine timing headroom in a circuit and Frequency Scaling (DVFS) has been widely deployed during operation. This approach has several notable shortcom- in microprocessor applications over the past decade [5]– ings: [11]. The fact that an FPGA can be programmed to perform 1) the timing slack measurement is dependent on the input any digital function gives rise to some unique challenges in data, which cannot be controlled during normal opera- designing a DVFS control system. Unlike microprocessors, tion, the speed-limiting paths of a specific FPGA IC are unknown 2) the technique is limited to FPGA components where a at manufacturing time; hence mimicking the critical path and second capture register can be added at the end of a setting the minimum core voltage for the DVFS control system critical path, which is not feasible for important ‘hard’ is a major challenge. blocks such as the on-chip RAM, FLASH Memory Application Calibration Application DVFS CT Code Config File Config File Step 1 Step 2 Startup Application Calibration Running with DVFS Fig. 1. Two step DVFS scheme. 3) the scheme requires extra logic elements (LEs) and clock • a digital dc-dc controller, resources, increasing circuit power and reducing the • a calibration controller. usable capacity of the FPGA. Each of the critical paths is exercised by toggling the source In addition, the past works [15]–[18] do not employ a high- register and checking that the sink register captures the correct frequency digital dc-dc converter to generate the variable core value. And for each critical path, a fast path (a buffer or an supply voltage. Many important practical issues, such as the inverter) that behaves as the critical path is synthesized to converter response time and quantization issues, are therefore identify what the correct value is. The output of this fast path ignored. and the critical path is compared together. The fast path is In this work, a new offline universal self-calibration scheme designed such that it does not fail the timing at the maximum that requires close interaction with the digital dc-dc converter applied frequency. is proposed to automatically characterize the exact relationship Heater cells are distributed across the entire chip, and are between the maximum operating frequency for each core used to create different die temperature conditions. The FPGA voltage and temperature corner. This information is saved in proceeds to run the calibration, using self-heating and automat- a calibration table that is used during normal operation for ic timing error checking, and populates the DVFS calibration DVFS. table (CT). An ideal calibration table is demonstrated in Fig. 2. The detailed calibration scheme is described in the following II. SELF-CALIBRATION CONCEPT WITH TWO-STEP session. CONFIGURATION The proposed universal self-calibration process is intended fsys to run on a system production line, or regularly during each power-up sequence of a system. It is therefore important that fmax TRoom T1 this process (1) be reasonably fast and (2) require the minimum ... T possible FPGA resource overhead. The self-calibration method n has three steps and requires the FPGA to be programmed twice, as shown in Fig. 1: Tmax 1) The user’s design is automatically analyzed by the augmented CAD tool to extract the logic paths having the most TRoom <T1 <Tn<Tmax critical timing. A design-specific self-calibration configuration fmin file is then created. The critical paths used in self-calibration Vmin V are exact replicas of the critical paths in the application; they max Vcore are placed and routed using the identical resources (routing Fig. 2. An ideal CT as the result of the self-calibration process. wires, LEs, etc.). All inputs along the critical path are set to non-controlling values to guarantee that the path is synthe- 3) Finally, when the self-calibration is complete, the FPGA sized. These non-controlling values are selected to mimic the is automatically programmed a second time with the user’s worst-case rising and falling pattern reported by the tool. The regular configuration file, as well as the DVFS control system CAD automation ensures that no additional designer effort is which relies on the extracted calibration table, as shown in required. Fig. 3(b). Based on the clock frequency requirements and chip 2) The FPGA is programmed once with the self-calibration temperature, the DVFS control core refers to the calibration configuration file, as shown in Fig. 3(a). The on-chip config- table and set the according core voltage, Vcore. uration contains: • the design-specific critical paths with error checking III. SYSTEM LEVEL ARCHITECTURE circuit, The system architecture is shown in Fig. 4 and includes • flip-flop chain based logic blocks configured as pro- a 65-nm CMOS Cyclone IV FPGA (EP4CE115F29C7N). grammable heaters for temperature control, The two-phase Buck converter has an input voltage of 5 V • a temperature sensing circuit, and regulates the FPGA core voltage, Vcore, between 0.85 • a frequency synthesizer, - 1.35 V. The main phase, which delivers the majority of FPGA: Self-Calibration Config. DC-DC Power Stage Vmin PWM aux UVP OVP CPLD L &03 UVP aux Startup_Flag Vmax &03 OVP OVP & UVP oC 5V Enpirion Power Stage (ET4040QI) Lmain clk_ref T IL(t) clk_heat Calibration Memory PWM main ADC 1.024V CLK Synth. I (t) Control I (t) ref PLL Calib. Table L DAC clk_sys critical paths (same placement V ref Iref [n] as real application) Vcore f T err [n] DC-DC &03 Control PWM reset Iref Vcore Heater Logic DC-DC Error Checker Power-Stage DE2-115 Critical path(s) Flash Memory Application Calibration DVFS CT (a) Config Config FPGA: Application Config. I/O Data & Address BUS Vcore (t) Full Self-calibration/ Cyclone IV 0.85-1.35V Application Application Circuit FPGA Circuit CO clk_sys ... oC PLL clk_ref clk_ref T clk_heat 50MHz CLK CLK Synth. DVFS Memory Generator PLL Control Calib. Table DC-DC DVFS clk_sys SPI1 Vref Control Control Vcore f T Temp DC-DC SPI2 Sensor Control Startup_Flag I/O UVP PWM aux OVP I Vcore ref PWM main PWM reset Critical path(s) DC-DC Non-critical paths Power-Stage Fig.

A Universal Self-Calibrating Dynamic Voltage and Frequency Scaling (DVFS) Scheme with Thermal Compensation for Energy Savings in Fpgas

Analysis and Optimization of Dynamic Voltage and Frequency Scaling for AVX Workloads Using a Software-Based Reimplementation

Drmos King of Power-Saving

Website Designing, Overclocking

A+ Guide to Managing and Maintaining Your PC, 7E

Power-Aware Load Balancing of Large Scale MPI Applications

Dynamic Management of Turbomode in Modern Multi-Core Chips David Lo and Christos Kozyrakis Stanford University {Davidlo, Kozyraki}@Stanford.Edu

2010 VISION Quick Reference Guide for Desktop

Latency-Aware DVFS for Efficient Power State Transitions on Many

Overclocking Matters For 3D Architectural Cad Design

Cost-Efficient Overclocking in Immersion-Cooled Datacenters

Dynamic Voltage and Frequency Scaling with Multi-Clock Distribution Systems on SPARC Core" (2009)

Overclocking Assistant (Performance Tuning Guide) for Intel® NUC Kit Nuc8i7hnk and Nuc8i7hvk Objective Memory Overclock Perfor