Opportunities for Energy Efficient Computing: a Study of Inexact General Purpose Processors for High-Performance and Big-Data Applications
Total Page:16
File Type:pdf, Size:1020Kb
Opportunities for Energy Efficient Computing: A Study of Inexact General Purpose Processors for High-Performance and Big-data Applications Peter Düben Jeremy Schlachter AOPP, University of Oxford EPFL Oxford, United Kingdom Lausanne, Switzerland [email protected] [email protected] Parishkrati, Sreelatha Yenugula, John Augustine Christian Enz Indian Institute of Technology Madras EPFL India Lausanne, Switzerland [email protected], [email protected] [email protected], [email protected] K. Palem T. N. Palmer Electrical Engineering and Computer Science AOPP, University of Oxford Rice University, Houston, USA Oxford, United Kingdom [email protected] [email protected] Abstract—In this paper, we demonstrate that disproportionate hardware artifacts such as adders, multiplier or FFT engines gains are possible through a simple devise for injecting inexactness [22], [23], [20], [21]. For additional references on inexact or approximation into the hardware architecture of a computing hardware, architectures and related topics, please also see [1], system with a general purpose template including a complete [32], [12], [39], [35], [2]. More recently, this work has also memory hierarchy. The focus of the study is on energy savings been extended to evaluations of large-scale applications—in possible through this approach in the context of large and challenging applications. We choose two such from different ends of the context of our own group, climate modeling and weather the computing spectrum—the IGCM model for weather and climate prediction served as the driving example in which the floating modeling which embodies significant features of a high-performance point operations were rendered inexact [9]. computing workload, and the ubiquitous PageRank algorithm used in It is however widely recognized that if we were to consider Internet search. In both cases, we are able to show in the affirmative general purpose computing systems—as opposed to that an inexact system outperforms its exact counterpart in terms of architectures with accelerators with a high degree of its efficiency quantified through the relative metric of operations per customization [5]—energy costs associated with the memory virtual Joule (OPVJ)—a relative metric that is not tied to particular portion of the computation dominates to a great extent. As a hardware technology. As one example, the IGCM application can be result, work that isolates and identifies opportunities in the used to achieve savings through inexactness of (almost) a factor of 3 in energy without compromising the quality of the forecast, computational part of the application by largely overlooking quantified through the forecast error metric, in a noticeable manner. the memory component, as opposed to treating it as a As another example finding, we show that in the case of PageRank, significantly dominant component, is justifiably criticized an inexact system is able to outperform its exact counterpart by close whenever a traditional architectural template involving a data to a factor of 1.5 using the OPVJ metric. path and a standard memory hierarchy is the platform of choice. In this paper, we remedy this situation by extending the reach of our studies to understand the limits to energy benefits that I. INTRODUCTION AND OVERVIEW OF FINDINGS can be gleaned at the application level by including Inexact or (approximate) computing has been presented as a inexactness both in the computation along the data path and significant and potentially unorthodox approach to realizing also in the memory system. We do this by considering two low energy computing systems. In terms of hardware applications, the Intermediate General Circulation Model manifestations and models, following the early foundational (IGCM) for weather and climate modeling [4] and the widely work that identified this opportunity (please see [8], [28], [29]; used PageRank algorithm [2],[26] at the heart of internet and [30], [31] for a more comprehensive overview), the most search. In terms of understanding the limits to opportunities, robust validations have been through construction of specific the IGCM model involves significant amounts of floating 978-3-9815370-4-8/DATE15/c 2015 EDAA 764 point computation whereas, in contrast, PageRank is which lowers the number of operations performed. significantly memory bound; thus, they lean towards two ends Surprisingly, with an energy of 0.89 of the exact case, the of the opportunity spectrum. average VD value jumped up to a whopping 100.3—almost In order to evaluate the opportunity due to inexactness from two orders of magnitude larger than the comparable inexact the perspective of energy savings, we introduce an energy configuration! model (cf. Section II) which is based on two input elements. In achieving energy savings through inexactness, it is The first element involves an application activity profile reasonable to consider simplifying the application by altering involving activity as the program executes in different parts of its program implementation either statically, or by lowering the architecture. Our activity profiles are relative with the some execution parameter such as the number of its iterations activity in the entire application adding up to 100%. These for example. Thus, the gains need not have been derived from activity profiles were gathered using Valgrind [36], which is a inherent system technology improvements but rather purely publicly available tool. Next, we define the normalized energy through algorithm or program changes only. To sharply isolate cost model where the simplest integer operation represents the the efficiencies that our approach gives us which can be cheapest operation quantified as 1 unit of energy attributed to inexactness from the architecture elements, we consumption—all of the other activities are then represented introduce a novel metric which we will refer to as operations as multiples of this value. The overall energy cost is then per virtual Joules (OPVJ). estimated by combining the activity profile with the We show that a reduction of precision can increase OPVJ in normalized energy cost in an obvious way once the total the atmosphere model by more than a factor of three at operation count is known. approximately the same “goodness” of the solution—to Continuing, we consider two scenarios for executing our two reiterate, inexactness is induced by lowering the number of applications. The first scenario involves architectural bits in the Mantissa. We see a very similar retention of elements—adders, multipliers, DRAM main memory, SRAM goodness in the solution in PageRank along with an increase caches—to be exact namely embodying full precision. In the in OPVJ value by a factor of 1.5 for the exact and inexact second scenario, both the computation elements on the data cases compared in the previous paragraph. path as well as the memory structures are inexact—we vary and increase degrees of inexactness. Of the mechanisms II. MODELLING PROCESSOR ENERGY CONSUMPTION possible, in this paper, we induce inexactness by reducing the With the intent of gaining insights that are oblivious to the number of bits in the mantissa of the double precision floating exact technology in use—be it current or future—our goal in point number or the width of integer operations. For example, this work is to show the relative gains offered by inexact we use between 8 and 12 bits for the significand or mantissa computing. Nevertheless, we wish to rigorously model energy when considering the atmosphere application and consider 14 across all major activities in both computation and memory and 16 bits when evaluating the PageRank application. In both systems. Towards this goal, against each type of activity that cases, the fully precise or exact case will have 52 bit we address, we assign an energy cost relative to the least mantissas. The quality of the solution as the inexactness is expensive activity, which, in our case is integer operations increased is modeled as follows: in the IGCM model, as the pegged at 1 virtual Joule (vJ). Every other activity, therefore, level of inexactness is increased, the forecast error of a is assigned a relative cost in virtual Joules; see Table 1. These meteorological quantity is used to quantify the “goodness” of relative costs have been derived as best estimates obtained by the solution. In PageRank, we use three different distance insights from source [19]. measures—variation distance or VD [24], the κஶdistance, We classify computational operations broadly into floating and Kendall’s ߬ distance [18]—from a baseline exact point operations, integer operations, and others (that would execution. include branches/control flow, etc.)1. In all cases, we estimate the energy savings as the application is executed using an exact model, compared to increasingly Category Relative Energy inexact versions. As a summary, when considering the IGCM (vJ) model (Section IV), with no detectable change in the quality Cache access 15 of the prediction quantified through the global mean error Main memory 30 metric, we could achieve over a factor of three in energy Floating point 2 savings. Similarly, for PageRank, the baseline is an execution Integer 1 using a parameter called tolerance of 0.01. With this value, the Branches (others) 1 page ranking computed on an exact system is deemed to be Table 1. List of operations along with associated relative the “golden value” computed using full precision floating energy costs in virtual Joules. point values with a 64 bit representation has the ideal average VD value of 0. Let us now consider an inexact version with 26 bits. The computation can be realized with a fraction of 0.68 1 of the virtual energy compared to the exact case and the Caches were designed to significantly reduce access times (sometimes up to a factor of 100), but their energy costs are closer to that of main memory – a associated VD value goes up to 1.005. If we tried to achieve factor of two in our accounting.