
Studying the Impact of CPU and Memory Controller Frequencies on Power Consumption of the Jetson TX1

Hazem A. Abdelhafez, Matei Ripeanu {hazem,matei}@ece.ubc.ca

Abstract—Nowadays, heterogeneous unified memory architecture platforms are becoming increasingly common. These platforms incorporate several co-processors on a single chip with a shared physical memory. The use cases for such platforms can vary dramatically. On the one hand, they can be used in the context of Edge computing, which cannot tolerate high latency and has strict energy/power constraints. On the other hand, motivated by their growing computing capabilities and their energy-efficiency, many have considered replacing traditional bulky servers with these platforms to deliver the same computing power but with a lower energy budget. This study is an exploratory step to understand the trade-off between power consumption, message latency, and throughput on a low-power heterogeneous platform for data stream workloads by characterizing several common computing kernels found in computer vision algorithms. Our preliminary experiments on the NVIDIA Jetson TX1 show that it is possible to reduce power consumption by up to 12%.

I. INTRODUCTION

Recently, heterogeneous computing has become the leading approach to raise the performance envelope and reduce the energy consumption of computing systems. Energy consumption not only poses a huge threat to our environment, but also contributes significantly to the cost of any computing-based service. For example, in cloud computing environments, power draw represents ∼15% of the Total Cost of Ownership (TCO) [1], while cooling and other hardware equipment account for the rest. Moreover, for each 1 W of power drawn by the computing devices, 0.5 to 1 W is drawn by the cooling system [2], which indirectly increases the contribution of the computing equipment to the TCO. In 2016, a study showed that data-centers in the USA consume approximately 70,000 GWh, which represents 1.8% of the total electric power produced in the USA and is projected to increase by 4% by 2020 [3].

Additionally, the growing number of IoT devices and their various use cases represent a challenge to the existing cloud computing paradigm. On the one hand, resource consolidation and flexible scaling features have made cloud computing the backbone of the majority of large-scale computing environments; on the other hand, it is not well-suited for latency-sensitive applications. Moreover, the large number of connected devices has led to higher bandwidth cost and congestion, with additional privacy and security concerns from the users' perspective. These shortcomings led to the emergence of a new computing paradigm known as Fog or Edge computing [4]. In this approach, instead of sending data to the cloud to process it and return the results to the user, IoT devices directly run computation algorithms on the Edge device or on geographically co-located cluster-like resources (sometimes dubbed cloudlets or elements of Fog computing). In all these scenarios energy efficiency is a major concern.

Solutions at various levels of the computing stack have been proposed to address the energy-efficiency of the computing infrastructure. For instance, the adoption of dedicated accelerators or highly-parallel architectures has increased drastically, especially heterogeneous platforms that incorporate several co-processors. GPUs' massively-parallel architecture allows them to be more energy efficient for throughput-oriented problems. Field Programmable Gate Arrays (FPGAs) and Application-Specific Integrated Circuits (ASICs) have also witnessed wide adoption (e.g. Microsoft Catapult [5], Google Tensor Processing Unit (TPU) [6]).

In data-center environments, resource consolidation and scheduling algorithms that aim at reducing the consumed power have also been proposed [7], [8], [9]; others developed algorithms that predict the future workload and turn servers off or on to reduce the power consumed during low-load cycles [10]. Moreover, there are efforts that focus on the energy complexity of computational algorithms [11], with the goal of understanding how algorithm behaviour affects energy consumption and subsequently making algorithms more energy efficient.

Recently, heterogeneous unified-memory platforms that incorporate low-power CPUs for running the operating system and low-load tasks, and GPUs for high-load tasks, have emerged. An example of such platforms is the NVIDIA Jetson embedded boards family. These platforms target Edge computing scenarios (e.g. self-driving cars, drones, robots, etc.). However, the compute capabilities offered by their heterogeneous resources have made them potential candidates to replace bulky servers in data-center environments as well, mainly because of their much lower idle base power compared to traditional servers.

Objective of this paper: The stream data processing model is common for a wide range of applications (e.g. multimedia stream processing, dynamic graph algorithms, inference tasks in interactive AI applications, and IoT applications). This study aims to answer two questions:

• In the context of streaming applications, can we save power by tuning the CPU and memory controller to low frequencies and at the same time maintain the system's stability?
• Are the default DVFS capabilities on the Jetson TX1 able to deliver the best performance-per-watt? If not, then how much more energy can we save?

Contributions: We show that on a low-power heterogeneous platform such as the Jetson TX1, we can tune the system to save from 2% to 12% of the consumed power by operating at lower frequencies of the CPU and the memory controller, while increasing the average response time up to a certain limit that still fulfills the QoS metrics (message processing rate) of a stream-based application.

This paper is organized as follows: Section II presents background information about data stream systems and energy-efficiency in a computing environment, along with an illustration of the main features of the Jetson platform. Section III details our methodology and describes the system components for our experiments and how they interact together. Section IV presents and discusses the results obtained and the main lessons learned from the experiments. Section V presents previous work done in the context of energy-efficiency of data stream processing. Finally, Section VI summarizes the key findings and limitations.

This work is supported in part by the Institute for Computing, Information and Cognitive Systems (ICICS) at UBC.

II. BACKGROUND

A. Data Stream Systems

In the context of this paper, a data stream system is a system in which data is generated by one or multiple producers as small batches of data (in the order of bytes to kilobytes) and consumed by one or multiple consumers. A producer can be: physical sensors, such as weather, medical, and surveillance sensors; or IoT devices, such as smart appliances, traffic cameras, wearables, etc. [12], [13].

A consumer in such a system can be any processing device that incorporates one or more processors that can read, interpret and transform the incoming data batches from the producer to output a specific final or intermediate result that can be further processed. A consumer in a data stream processing system typically applies a compute kernel on each batch of data or on a time window of several batches. Batches of data in a stream system can also be referred to as messages, records or elements. The compute kernel can be the same for all incoming messages, or different pipelined kernels can execute different tasks on the messages. In some cases, it may be required that a data stream system maintain a state that depends on the computations over previous messages; in this case the state data should ideally be persisted in memory for fast access, or on disk if memory is full.

To evaluate the performance of a data stream processing system, message processing time (the time spent by a user-defined compute kernel analyzing the message) and maximum attainable throughput (the maximum number of messages processed per second) are usually the most important metrics. For instance, in a video streaming application, it is essential that a consumer system sustains a minimum processing rate expressed in Frames/Sec, or else some frames are dropped, resulting in a degraded user experience. Generally, data stream processing systems can be deployed on several computing nodes where the nodes are connected together via a network interface, or deployed on edge devices closer to the data producers to save network bandwidth.

B. Energy Efficiency in Data Stream Context

Power is the amount of energy, expressed in Joules (J), consumed per time unit (one second), and is measured in Watts (W). Power consumption in a computing system can be calculated as the sum of dynamic power and static power. For a processing unit, static power is the power consumed by its hardware circuitry in the idle state, and it represents the baseline for the total power consumption. Dynamic power is the power consumed by applications utilizing the processing unit, and it depends on the application behaviour and the available hardware resources. It is also referred to as switching power, and it results from:

• Charging and discharging in the electronic circuits due to data flow changing the states of the transistors between ON/OFF states.
• Leakage current due to short circuit during the transition phases from 0 to 1 and from 1 to 0.

Dynamic power can be calculated as follows:

    P = αCV²f    (1)

where α ∈ [0, 1] is the activity factor: it represents the average number of times the CMOS transistors switch between the 1 and 0 states per clock cycle; in other words, it is application workload dependent. C is the capacitance, a physical property of the circuit components; V is the supply voltage; and f is the clock frequency that drives the transistors. The supply voltage is proportional to the operating frequency; this is to guarantee the stability of the system and its ability to maintain the sought operating frequency [14].
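To make Equation (1) concrete, the sketch below evaluates P = αCV²f at two hypothetical CPU operating points. The activity factor, capacitance and voltages are made-up illustrative values, not TX1 measurements; only the two frequency endpoints match the TX1 CPU range:

```python
# Illustrative evaluation of Equation (1): P = alpha * C * V^2 * f.
# alpha, C and V are made-up example values, not TX1 measurements; the
# two frequencies match the TX1 CPU range (102 MHz to 1.734 GHz).

def dynamic_power(alpha, capacitance, voltage, frequency):
    """Dynamic (switching) power in Watts."""
    return alpha * capacitance * voltage ** 2 * frequency

low = dynamic_power(alpha=0.2, capacitance=1e-9, voltage=0.8, frequency=102e6)
high = dynamic_power(alpha=0.2, capacitance=1e-9, voltage=1.1, frequency=1.734e9)
print(f"low setting:  {low:.3f} W")   # frequency AND voltage scaled down
print(f"high setting: {high:.3f} W")  # the V^2 term makes P super-linear in f
```

Because the supply voltage rises with the operating frequency, power grows super-linearly with frequency, which is why lowering the frequency can save more power than it costs in extra processing time.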
Most operating systems include functionality to control the operating voltage and frequency of the various devices. This functionality is referred to as Dynamic Voltage Frequency Scaling (DVFS) [15]. DVFS is a hybrid between Dynamic Voltage Scaling (DVS) and Dynamic Frequency Scaling (DFS). In DVFS, the operating system kernel scales the processor's operating frequency up or down based on the processor utilization level. In the context of a data stream processing system, the number of messages processed per Watt is often used to compare the energy efficiency of different stream processing systems.

C. The Jetson Platform

In 2014, NVIDIA introduced the TK1 board as the first low-power embedded board of the Jetson platform, followed by the TX1 in 2016, the TX2 in 2017 and the AGX Xavier in 2018. The four boards share two interesting features that made the Jetson platform suitable for our study. First, each board has an integrated ARM processor, an NVIDIA GPU that is powerful compared to the on-board ARM CPU, and a shared global memory. Second, each of those integrated components has a wide range of operating frequencies, with more than 10 values to select from for the CPU, GPU and the memory controller; besides, it is possible to activate and deactivate CPU cores. These two features allow us to study the performance and power consumption trade-offs on these boards thoroughly, compared to traditional computing devices that have a narrow range of operating points for their CPUs only. In this study we use the TX1 board as the host for a data stream processing application, but we exclude the GPU. Table I lists the main features of the TX1 board CPU and memory.

TABLE I: NVIDIA Jetson TX1 board characteristics.

  CPU   64-bit ARM Cortex-A57
        Architecture: ARMv8-a
        L1/L2 Sizes: 32KB/2MB
        Core Frequency Range: 102 MHz to 1.734 GHz
        Peak Theoretical FLOPs: 55.48 GFLOPs
  DRAM  4GB LPDDR4 RAM, 2 Channels
        Data Width: 64 bit
        Controller Frequency Range: 40.8 MHz to 1600 MHz
        Peak Theoretical Bandwidth: 25.6 GB/Sec

[Fig. 1: Overview of system components and connections — producer, consumer (TX1), network router, and serial interface to the power reading kit (WattsUp Pro).]

III. METHODOLOGY

Our goal is to quantify the power/performance trade-off when tuning the CPU and memory controller frequency of the Jetson TX1 board in a stream data processing application.

A. Experimental Setup

Our system comprises two compute nodes, a power logging kit and a network router for connectivity. One of the nodes acts as a producer that generates messages according to specific criteria, as we will discuss later in this section. It is also responsible for reading the measured power values from the power logging kit. The second node is a Jetson TX1 board that is responsible for processing the messages sent by the producer. It applies a specified compute kernel on each message individually. We only consider a stateless multi-threaded processing model. In this model a thread processes each message independently from previous messages and from other threads. While this might not be the case for all data stream processing applications, it helps simplify our study and focus on the trade-off between performance and power consumption instead of managing data inter-dependency or maintaining state in each thread.

B. Load Generation

The producer generates messages as double-precision byte arrays of different sizes for each experiment iteration. It sends those messages over a UDP socket to the consumer. We use ZMQ [16], a distributed messaging library used in several big-data systems to facilitate communication among the distributed system components. It supports both broker-based and broker-less messaging paradigms; in our experiments we used the ZMQ library for a broker-less communication style between the two nodes.

The message generation in the producer is a Poisson process where the inter-arrival time between the messages is drawn from an exponential distribution. We generate loads with different throughput values that represent the rate parameter of the Poisson process. The message sizes we selected are 128 and 1024 elements, where each element is an 8-byte double value. We chose those values to represent fit-in-cache and non-fit-in-cache cases. For the offered load, we chose 240 and 960 messages per second. These values represent low and high loads, above which many hardware settings cause the consumer to drop messages and below which the base power consumption prevails against the dynamic power.
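The Poisson arrival process used by the producer can be sketched as follows. This is a minimal stand-alone sketch: the actual producer sleeps for each drawn gap and then sends a message over the ZMQ/UDP path.

```python
import random

def inter_arrival_times(rate, n, seed=1):
    """Draw n inter-arrival gaps (seconds) for a Poisson process with
    the given rate (msgs/sec); gaps are exponentially distributed."""
    rng = random.Random(seed)
    return [rng.expovariate(rate) for _ in range(n)]

# Offered loads used in the experiments: 240 and 960 msgs/sec.
gaps = inter_arrival_times(rate=240, n=10_000)
mean_gap_ms = 1e3 * sum(gaps) / len(gaps)
print(f"mean gap: {mean_gap_ms:.2f} ms (expected ~{1e3/240:.2f} ms)")
```

The sample mean of the gaps converges to 1/rate, i.e. ~4.17 ms for the 240 msgs/sec load.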
C. Connectivity

The producer and the consumer are connected through a local network device with a link bandwidth of 1 Gbps. The power logging kit is connected to the producer node through a serial interface to read the power measurement data. The high-level system architecture is illustrated in Fig. 1.

D. Workload Kernels

The consumer is a multi-threaded application that applies a set of linear algebra routines (using the OpenBLAS library, optimized for the TX1 ARMv8-a processor architecture) on the incoming messages. Given an input vector X and an input/output vector Y, each routine works as follows:

SAXPY: scales vector X by a scalar factor A and adds it to vector Y, where Y = AX + Y.
DOT Product: computes the dot product of vector X and vector Y.
Euclidean Distance: calculates the Euclidean distance between two vectors X and Y, d(X, Y) = sqrt(Σ_{i=0..N−1} (X_i − Y_i)²).
Matrix Multiplication: applies the DGEMM routine which, given input matrices A and B, outputs matrix C such that C = αAB + βC, where α and β are scalar values.

We chose those BLAS routines as representative workload kernels because they are widely used in several application domains such as computer vision, video analysis, data stream clustering/classification algorithms, dynamic graph analysis and machine learning.
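The four routines can be sketched with NumPy (commonly built against OpenBLAS). The tiny vectors and matrices here are stand-ins for the 128- or 1024-element message payloads:

```python
import numpy as np

A, alpha, beta = 2.0, 1.5, 0.5
X = np.arange(4, dtype=np.float64)   # stand-in for a message payload
Y = np.ones(4, dtype=np.float64)

saxpy = A * X + Y                            # SAXPY: Y = A*X + Y
dot = float(X @ Y)                           # DOT product
dist = float(np.sqrt(((X - Y) ** 2).sum()))  # Euclidean distance
Ma = np.eye(2)                               # stand-in input matrices
Mb = np.full((2, 2), 3.0)
C = np.ones((2, 2))
gemm = alpha * (Ma @ Mb) + beta * C          # DGEMM: C = alpha*A*B + beta*C
```

In the real consumer these calls go through OpenBLAS with the internal thread count set per experiment; the NumPy expressions above only illustrate the arithmetic each kernel performs per message.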
E. Measurement Infrastructure for Power and Time

1) Power Measurement: The earlier versions of the Jetson TX1 don't include internal power sensors; hence, we use an external power measurement tool. We used a commercial WattsUp Pro kit [17] to measure the power consumed by the TX1 board during the experiments. The WattsUp kit has a serial interface that we used to read the power measurements with a sampling frequency of one sample/second. It has an accuracy of +/- 3% above 10 watts and +/- 5% below 10 watts. We separate the power logging from the TX1 to avoid any interference with the experiments; thus, we connect the WattsUp kit to the TK1 node that runs the producer. Although there are more accurate solutions to measure the power consumption, such as using the INA219 IC [18], these solutions require building a custom circuit to interface with this IC family. This approach is more accurate but time consuming.

2) Timing Measurement and Synchronization: One of the main challenges in the measurement setup is how to synchronize the power readings, the triggering of the producer to start sending messages, and the consumer statistics. Our goal is to make sure that the read power corresponds to the interval during which the TX1 is doing actual work on the incoming messages from the producer, and to avoid, as much as possible, interference from auxiliary computations before the experiments (allocating memory for timing measurements, launching threads for synchronization between the producer and the consumer, initializing the ZMQ socket and receiver buffer) and after them (calculating statistics at the end of each iteration, freeing allocated memory, destroying the ZMQ context and killing all active threads).

To avoid system time discrepancies, we use a monotonic clock, which is independent of the system's wall time and is not affected by changing the CPU's operating frequency. It also has a resolution in the nanoseconds range. To quantify the overhead of querying the clock, we ran 100K queries and found that the average time for each query is ∼72ns. This small overhead guarantees that our measurement resolution is in the sub-microsecond range.

To synchronize the statistics collected by the consumer with the power readings, we output the timestamp with µs accuracy for each power reading from the WattsUp kit, along with the timestamps at which the producer started/finished sending messages to the consumer. When the consumer executable runs, it prepares the data containers needed for statistics collection, creates the requested number of threads, initializes all data relevant to the experiment, and finally sends a "start sending" request to the producer that triggers sending messages from the producer side. At this moment, the producer saves the timestamp as START TIME and keeps running until the consumer issues a "stop sending" request at STOP TIME.

Moreover, the producer host and the TX1 board are connected over a local network. For 1K packets of size 64 Bytes, the percentage of packet loss is 0%, and the min/average/max/standard deviation of the Round-Trip Time (RTT) are 1.27/1.77/2.29/0.19 ms respectively. Therefore, statistically, the time it takes for the consumer to notify the producer to start issuing messages is less than 2.5ms. This means that the timestamps recorded by the producer represent the actual experiment start time with an accuracy of +/- 1.77ms. To make sure that the power readings represent the power consumed by the TX1 during the actual processing of the messages, we truncate any power reading with a time-stamp before START TIME or after STOP TIME.

F. Performance and Power Consumption Metrics

The main Key Performance Indicators we are interested in are: maximum sustainable consumer throughput (messages/second), after which the consumer starts to drop messages; average message processing time (ms); and power consumption (Watts). In our system, we don't allow message drops by the consumer due to overload, and hence we only report the results of the iterations in which the maximum sustainable consumer throughput is at least equal to the offered throughput.

We calculate the consumer throughput for a total number of processed messages n as follows:

    R = n / (P_n − T_1)    msgs/second

where P_n is the time at which message n was processed and T_1 is the time at which the first message was received. We calculate the average message processing time as:

    T_p = ( Σ_{i=1..n} (P_i − T_i) ) / n    seconds

where T_i is the time at which message i was received and P_i is the time at which message i was processed. We need T_p in order to quantify the impact of the different hardware settings on the raw performance.

G. Additional Experimental Details

Since our goal is to evaluate the impact of the different frequency settings and number of active cores for the ARM CPU, and of the frequency settings for the memory controller, on the energy-efficiency achievable by the system, we set the GPU to the lowest frequency, disable the HDMI ports and shut down any GUI-based service before we run our experiments. To control the operating settings of the Jetson TX1, we edit the kernel configuration through the exposed virtual file-system directory sysfs to specify the operating frequencies and active CPU cores before each experiment run.
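The two metrics defined above (R and T_p) can be computed directly from per-message timestamps. In the real harness the timestamps come from the monotonic clock; the trace below is synthetic:

```python
def consumer_metrics(received, processed):
    """Consumer throughput R = n / (P_n - T_1) and average message
    processing time Tp = mean(P_i - T_i); all times in seconds."""
    n = len(processed)
    rate = n / (processed[-1] - received[0])
    avg = sum(p - r for r, p in zip(received, processed)) / n
    return rate, avg

# Synthetic trace: 5 messages, one received every 10 ms, each taking 2 ms.
recv = [i * 0.010 for i in range(5)]
proc = [t + 0.002 for t in recv]
R, Tp = consumer_metrics(recv, proc)
print(f"R = {R:.1f} msgs/sec, Tp = {Tp * 1e3:.2f} ms")
```

Note that R counts elapsed time from the first receipt to the last completion, so for short traces it slightly exceeds the steady-state arrival rate.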
IV. EXPERIMENTAL RESULTS AND DISCUSSION

In this section we answer the two motivating questions for this study: 1) Can we save power by lowering the operating settings of the hardware, and how much savings can we achieve? 2) Which power control policy achieves the lowest power consumption while maintaining system stability? To answer these questions we introduce four main operational modes:

Brute-Force Search: In this mode, we run the producer and then launch the consumer in a loop where in each iteration we change the number of active CPU cores, the cores' frequency and the memory controller frequency. The goal is to find the best settings in terms of power consumption.

DVFS: In this mode, instead of manually selecting the operating settings for the TX1, we let the Linux Dynamic Voltage Frequency Scaling (DVFS) system, which is set to the Interactive policy by default, control the CPU and memory controller frequencies dynamically during runtime. On the TX1, the rail voltage that supplies the processor's core circuitry depends on the operating frequency. However, the mapping from frequency to voltage is fixed; therefore, in our experiments we can only set the operating frequency of the processor but not the rail voltage. Additionally, the TX1 provides a software agent (ACTMON) [19] which monitors the hardware activity of all board devices, including the processors and the memory. This central agent provides statistics about the utilization level of the hardware components to the DVFS agent. The DVFS agent then scales the CPU and GPU frequencies and rail voltage according to the load level, and it scales the memory controller frequency based on the memory access cycles that represent the amount of traffic to and from the memory.

Maximum Settings: In this mode we set the board to the maximum operating settings, where all the CPU cores are active and the CPU and memory controller frequencies are set to the maximum values for the best performance. However, we keep the GPU at the lowest frequency possible.

Race-to-Finish: This mode requires that once a message is received, we set the board to the maximum settings until the message processing is done. Then, we set the board to the DVFS Interactive mode to save power until the next message is received, and repeat the whole cycle again.

In our experiments we have variable workload characteristics expressed in terms of message size and offered throughput. We also have several operational settings, including the number of active CPU cores, the CPU frequency and the memory controller frequency.

A. Experiment-1

We start by evaluating the power consumption of the first three operational modes. In Fig. 2, we show the average power consumed on the Y-axis and the four BLAS routines on the X-axis. Each sub-figure represents a specific workload in terms of producer throughput, message size in bytes, and the number of threads. In Tables II and III, we show the corresponding average message processing times in µs for each operational mode. We report results for 3 threads only, which is the maximum number of threads we use in our experiments since the ARM CPU has only 4 cores; we leave one core for the OS kernel and the ZMQ receiver thread. We don't report the other thread-count results due to space limitations.

Finding-1: There is still room for saving power by extending the message processing time while still maintaining application stability (where the consumer throughput is at least equal to the producer throughput). Comparing the Brute-Force Search mode against the DVFS mode, we found operational settings that reduce the consumed power by 2% and up to 12%. Comparing the Brute-Force Search approach against the Maximum Settings mode, we found operational settings that can save from 15% to 25% of the power consumption.

Our explanation for why we found settings that save power compared to DVFS is that DVFS is known to be sensitive to hardware resource utilization levels [20]. This drives its selection of high operating frequencies for the CPU and the memory controller to fulfill the compute needs of the workloads. However, the DVFS software agent doesn't take into consideration that the message processing time could be more relaxed, and thus that it is possible to operate at lower frequencies to save power. In the Brute-Force Search mode, higher energy-efficiency was achieved by extending the message processing time by up to 3 times, as shown in Tables II and III. Compared to the Maximum Settings approach, the Brute-Force Search mode showed that by operating at lower frequencies we can definitely save power; but comparing the average message processing delay, we see that the Maximum Settings mode achieves up to 18 times lower delay than Brute-Force Search and 6 times lower than DVFS.

The main takeaway from this finding is that we need to consider the data stream application's Quality of Service (QoS) metrics while tuning the hardware settings to save power. While the Brute-Force Search mode in this context is meant to prove this point, we believe it is impractical for real application deployments; hence, we plan to work on building a scheduler that dynamically selects the best hardware settings during run-time to save power while maintaining the required workload performance constraints.

B. Experiment-2

To compare the Race-to-Finish (RTF) approach to the other operational modes, we needed a single point of control to set the hardware settings to the designated values before and after each message is processed; hence we set the number of worker threads to one. The idea is to have one worker thread that reads messages from the receiver buffer, then sets the OpenBLAS internal number of threads to the value we actually want to test. This worker sets the TX1 hardware settings to the maximum before processing each message, and once the BLAS routine is done, it sets the hardware back to the DVFS mode instead of the minimum settings. This is done to make sure that the rest of the tasks, such as fetching messages from the reader queue, networking functions and other OS kernel processes, are not deprived of the resources needed to complete their tasks in a reasonable time. In Fig. 3, we show the average consumed power of the four computing modes, including the RTF mode.

Finding-2: As illustrated in Fig. 3, the RTF results are significantly worse than both the DVFS and the Brute-Force Search results. For instance, the Brute-Force Search mode was able to tune the system to consume 27% to 30% less power compared to RTF. This finding aligns with a previous study on multimedia applications [21]. We also found that the overhead of tuning the CPU and memory controller frequencies before and after each message is significant. This is due to the way we set the frequencies, through writes to the exposed file-system interface.
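For reference, one operating-point change can be expressed as a set of sysfs writes. The cpufreq and CPU-hotplug paths below are the standard Linux ones; the memory-controller (EMC) path varies across L4T releases, so the one shown is an assumption, the example frequencies are illustrative, and all writes require root on the board:

```python
# Sketch of applying one operating point through sysfs, as done before each
# Brute-Force Search iteration. EMC_RATE is an assumed, L4T-specific path.
CPU_ONLINE = "/sys/devices/system/cpu/cpu{core}/online"
CPUFREQ = "/sys/devices/system/cpu/cpu0/cpufreq/scaling_{knob}"
EMC_RATE = "/sys/kernel/debug/clk/override.emc/clk_update_rate"  # assumption

def operating_point(active_cores, cpu_khz, emc_hz):
    """Return the ordered (path, value) writes for one iteration."""
    writes = []
    for core in range(1, 4):  # cpu0 stays online; the TX1 has 4 A57 cores
        writes.append((CPU_ONLINE.format(core=core),
                       "1" if core < active_cores else "0"))
    writes.append((CPUFREQ.format(knob="governor"), "userspace"))
    writes.append((CPUFREQ.format(knob="setspeed"), str(cpu_khz)))
    writes.append((EMC_RATE, str(emc_hz)))
    return writes

def apply(writes):  # requires root; not exercised here
    for path, value in writes:
        with open(path, "w") as f:
            f.write(value)

for path, value in operating_point(active_cores=2, cpu_khz=710400,
                                   emc_hz=665600000):
    print(path, "<-", value)
```

Each write is a file open/write/close through the VFS, which is consistent with the millisecond-scale switching overheads reported below for the RTF mode.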

[Fig. 2: Average power consumption for various workloads. Each panel compares the Brute Force, DVFS and MAX modes across the four BLAS routines (average power, Watts): (a) offered load of 960 msgs/sec, message size 1KB, 3 threads; (b) offered load of 960 msgs/sec, message size 8KB, 3 threads; (c) offered load of 240 msgs/sec, message size 1KB, 1 thread; (d) offered load of 240 msgs/sec, message size 8KB, 1 thread.]

TABLE II: Average message processing time in µs for R = 240 Msgs/sec and 1 Thread.

  BLAS Routine       |       Msg size = 1024 Bytes       |       Msg size = 8192 Bytes
                     |  DVFS  | Max Settings | Brute-force |  DVFS   | Max Settings | Brute-force
  SAXPY              |   9.50 |      0.57    |    4.18     |   23.61 |      1.40    |    4.62
  DOT                |   7.45 |      0.54    |    4.26     |   25.96 |      1.61    |   34.02
  Euclidean Distance |  16.81 |      0.98    |    7.91     |   60.87 |      3.70    |   36.50
  Matrix Multiply    | 353.82 |     21.07    |  490.64     | 2350.86 |   1622.83    | 3961.44

TABLE III: Average message processing time in µs for R = 960 Msgs/sec and 3 Threads.

  BLAS Routine       |       Msg size = 1024 Bytes       |       Msg size = 8192 Bytes
                     |  DVFS  | Max Settings | Brute-force |  DVFS   | Max Settings | Brute-force
  SAXPY              |   4.55 |      0.49    |    4.14     |    8.75 |      1.31    |   12.14
  DOT                |   3.58 |      0.44    |    4.44     |   10.29 |      1.53    |   14.02
  Euclidean Distance |   6.43 |      0.89    |   13.67     |   24.22 |      3.62    |   32.02
  Matrix Multiply    | 176.76 |     21.93    |  477.82     | 2026.42 |   1973.94    | 3168.23

We quantified the overhead of this tuning. On the one hand, setting the platform operational settings to the maximum takes around ∼0.16ms, setting the hardware to the maximum and then to the minimum operational settings (before and after the kernel, respectively) takes ∼9ms, and setting the hardware to the maximum and then back to the default governor while keeping the memory controller at the minimum takes ∼3ms. On the other hand, setting the hardware to the maximum and then back to DVFS while keeping the memory controller at the maximum takes ∼0.3ms. There is a lower-level interface for controlling the frequencies of the CPU and the memory controller by directly manipulating hardware registers. In the future, we plan to carry out a more thorough analysis of the RTF approach using this lower-level interface to evaluate it fairly against the other approaches.

[Fig. 3: Average power consumption for the four computing modes (Brute Force, DVFS, Max, RTF) across the four BLAS routines (average power, Watts).]

C. Experiment-3

To understand the impact of the memory frequency on the power consumption, we plot the average consumed power for our workloads against several memory controller frequencies in Fig. 4.

Finding-3: Figure 4 shows the impact of the memory controller frequency on power consumption. Table IV shows that increasing the memory controller frequency from 40.8MHz to 1.6GHz increases the power consumption, but the message processing time reduction is not proportional to the power increase percentage, except for two cases in the SAXPY kernel with a message size of 8K.

TABLE IV: Impact of the memory controller on power and performance. The CPU frequency is set to max. We compare the power and message processing time for each kernel for two cases: the lowest and highest memory controller frequencies. R is the incoming message rate, and S is the message size.

  BLAS Routine                           | R=240 Msgs/sec   | R=960 Msgs/sec
                                         | S=1KB  | S=8KB   | S=1KB  | S=8KB
  SAXPY               Power increase %   | 28.53  | 28.05   | 23.77  | 23.13
                      Time reduction %   |  9.24  | 38.56   | 11.57  | 28.66
  Euclidean Distance  Power increase %   | 27.97  | 26.86   | 24.36  | 22.29
                      Time reduction %   | 13.26  | 12.29   |  3.88  | 16.68

D. Experiment-4

To quantify the energy savings under a low offered load, we conduct an experiment with a message rate of R = 30 Msgs/sec and a message size of 1K.

Finding-4: We found that, because of the low activity factor in the dynamic power consumption of Equation (1), the static power is significant. In Table V we show the total consumed power in the DVFS mode with one worker thread, and the percentage of the base (idle) power in this total power. We measure the base power when the board is idle; however, for a fair comparison we turn off the HDMI port and turn on the fan so as to have the same settings for all experiments. We found the average base power to be 2.36W. This means that at low workload cycles the room for saving power consumption is limited, since we already operate close to the base power limits, which depend on the physical characteristics of the hardware.

TABLE V: Ratio of the base (idle) power contribution to the overall power consumption for a low workload of message rate (R = 30 Msgs/sec) and message size of 1K.

                  | SAXPY | DOT  | Euclidean Distance | Matrix Multiply
  DVFS Power (W)  | 2.45  | 2.43 | 2.44               | 2.46
  Base Power %    | 96%   | 97%  | 96%                | 95%

V. RELATED WORK

Dayarathna et al. [23] analyze several stream processing frameworks such as Apache Storm and Apache Spark. Their study indicates that data communication between different nodes in a cluster contributes significantly to the power consumption of a stream processing framework, and that building energy-efficient stream processing applications shouldn't be driven only by CPU usage levels but also by the communication patterns between nodes. However, the study did not investigate how tuning the hardware settings affects the energy consumption of these frameworks.

Stokke et al. [21] study the factors that contribute to power consumption on the TK1 board; the use case in their study is multimedia applications. A key finding of their study is that Race-to-Finish is not the best approach to save power under specific performance constraints.

De Matteis and Mencagli [22] present a latency-aware method to control the CPU frequency and the number of active CPU cores to achieve the best energy-efficiency in an elastic data stream processing application.

There are many previous research efforts that study various workload-aware DVFS-based strategies to reduce the energy consumption of computing resources [24], [25]. We believe that the Jetson platform brings unique features (e.g. a wide range of operating points) that require further exploration to understand its energy-performance trade-offs. Our work in this context is an exploration of the energy-efficiency of the Jetson TX1.

VI. SUMMARY AND LIMITATIONS

Summary. We study the trade-off between performance and power consumption in the context of a data stream application on the Jetson TX1. Our findings show that there is still room for improving power consumption, by up to 12%, by tuning the platform operational settings (number of active cores, CPU frequency and memory controller frequency) at the cost of increasing the average message processing time. However, it is possible to maintain application stability, where the consumer is able to maintain an adequate processing throughput to avoid dropping messages or overflowing the network buffers. We also found that the Race-to-Finish approach for streaming applications is not the best power-saving strategy, which aligns with previous findings in the similar application context of multimedia stream processing. Moreover, our experiments show that the memory controller frequency has a significant impact on the overall power consumption compared to the CPU frequency settings. Finally, the room for power saving in low-load application phases is small, because the dynamic power consumption becomes minor compared to the static power on the Jetson TX1.

Limitations. Although we position this study as an exploratory one, there are several limitations that we plan to address in the future to gain better insight.
We summarize There are several studies on energy efficiency and per- these limitations as follows: formance trade-offs of data stream applications. Many of Computation model: We focus only on stateless stream- these studies either focus on a narrow data stream applica- based computations. We believe that a stateful stream-based tion category [21], [22] or focus on evaluating the stream computation is more challenging and covers more interesting processing frameworks themselves without any evaluation of use cases such as dynamic graph algorithms and recurrent the underlying hardware settings impact [23]. neural networks. R=240, Size=1K R=960, Size=1K R=240, Size=1K R=960, Size=1K 3.4 R=240, Size=8K R=960, Size=8K 3.4 R=240, Size=8K R=960, Size=8K
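The stateless restriction can be made concrete with a small sketch. Assuming a toy message format and operator names of our own (this is illustrative, not the paper's implementation), a stateless kernel computes each output from the current message alone, while a stateful one must carry history across messages:

```python
# Illustrative sketch: stateless vs. stateful stream operators.
# The message format and names are hypothetical, not the paper's code.

def stateless_saxpy(msg, a=2.0):
    """Output depends only on the current message (a SAXPY over its payload)."""
    x, y = msg["x"], msg["y"]
    return [a * xi + yi for xi, yi in zip(x, y)]

class StatefulMovingAverage:
    """Output depends on a window of past messages: state must persist
    across messages, which complicates per-message tuning decisions."""
    def __init__(self, window=3):
        self.window = window
        self.history = []

    def process(self, value):
        self.history.append(value)
        self.history = self.history[-self.window:]
        return sum(self.history) / len(self.history)
```

A stateless operator can be restarted or migrated between operational settings with no carried state, whereas the stateful one would have to checkpoint its window first.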

[Figure omitted. Two panels; axes: average power (Watts) vs. memory controller frequency in MHz (0-1600); legend: R = 240 and 960 Msgs/sec, S = 1KB and 8KB.]
(a) SAXPY kernel with 3 worker threads. (b) Euclidean distance kernel with 3 worker threads.
Fig. 4: Impact of the memory controller frequency on power consumption. The CPU frequency is 1.734 GHz with 3 out of 4 cores active.
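The trade-off reported in Table IV, and the base-power shares in Table V, can be cross-checked with a short calculation. To first order, if raising the memory controller frequency increases average power by p percent and reduces per-message processing time by t percent, the energy per message scales by (1 + p/100)(1 - t/100); a ratio below 1 means the higher frequency is a net energy win. A sketch over the published numbers (the helper names are ours, and treating the reported average power as the draw during processing is an approximation):

```python
def energy_ratio(power_increase_pct, time_reduction_pct):
    """Energy-per-message ratio, highest vs. lowest memory frequency.
    A value below 1.0 means the frequency increase saves net energy."""
    return (1 + power_increase_pct / 100) * (1 - time_reduction_pct / 100)

# Table IV entries: (power increase %, time reduction %)
saxpy_8k_r240 = energy_ratio(28.05, 38.56)   # SAXPY, S=8KB, R=240
saxpy_1k_r240 = energy_ratio(28.53, 9.24)    # SAXPY, S=1KB, R=240
print(f"{saxpy_8k_r240:.2f}  {saxpy_1k_r240:.2f}")  # -> 0.79  1.17

# Table V cross-check: share of the 2.36 W base (idle) power in the
# total DVFS-mode power, truncated to whole percent as in the table.
BASE_W = 2.36
totals = {"SAXPY": 2.45, "DOT": 2.43, "Euclidean": 2.44, "MatMul": 2.46}
shares = {k: int(BASE_W / v * 100) for k, v in totals.items()}
print(shares)  # -> {'SAXPY': 96, 'DOT': 97, 'Euclidean': 96, 'MatMul': 95}
```

Running the ratio over all eight Table IV entries shows that only the two SAXPY cases with S = 8KB fall below 1, consistent with Finding-3; the base-power shares reproduce the Table V percentages.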

Heterogeneous computing: We only study the impact of frequency tuning of the CPU cores and the memory controller. However, the main computation engine on the Jetson platform is the GPU, which is responsible for most of the performance delivered by the platform. Hence, incorporating the GPU and the CPU in a heterogeneous use case, where the streamed workload is distributed between them, is more challenging and opens more opportunities for energy savings. Additionally, incorporating the GPU requires data sharing with the CPU, allowing us to explore the unified memory capabilities of the Jetson platform.

Workload: We study the performance-power trade-off for a set of low-level linear algebra kernels; incorporating real-world applications would allow us to better understand the practical limitations via a thorough evaluation on real-life data sets.

REFERENCES

[1] A. Greenberg, J. Hamilton, D. A. Maltz, and P. Patel, "The cost of a cloud: research problems in data center networks," ACM SIGCOMM Computer Communication Review, vol. 39, no. 1, pp. 68–73, 2008.
[2] S. Mittal, "Power management techniques for data centers: A survey," arXiv preprint arXiv:1404.6681, 2014.
[3] A. Shehabi, S. Smith, D. Sartor, R. Brown, M. Herrlin, J. Koomey, E. Masanet, N. Horner, I. Azevedo, and W. Lintner. (2016) United States data center energy usage report. [Online]. Available: cloudfront.escholarship.org/dist/prd/content/qt84p772fc/qt84p772fc.pdf
[4] W. Shi, J. Cao, Q. Zhang, Y. Li, and L. Xu, "Edge computing: Vision and challenges," IEEE Internet of Things Journal, vol. 3, no. 5, pp. 637–646, 2016.
[5] A. Putnam, A. Caulfield, E. Chung, D. Chiou, K. Constantinides, J. Demme, H. Esmaeilzadeh, J. Fowers, J. Gray, M. Haselman, S. Hauck, S. Heil, A. Hormati, J.-Y. Kim, S. Lanka, E. Peterson, A. Smith, J. Thong, P. Y. Xiao, D. Burger, J. Larus, G. P. Gopal, and S. Pope, "A reconfigurable fabric for accelerating large-scale datacenter services," in Proceedings of the 41st Annual International Symposium on Computer Architecture (ISCA). IEEE Press, June 2014, pp. 13–24. [Online]. Available: https://www.microsoft.com/en-us/research/publication/a-reconfigurable-fabric-for-accelerating-large-scale-datacenter-services/
[6] Google Cloud TPU. Accessed: Feb. 2019. [Online]. Available: https://cloud.google.com/tpu/
[7] R. Buyya, A. Beloglazov, and J. H. Abawajy, "Energy-efficient management of data center resources for cloud computing: A vision, architectural elements, and open challenges," CoRR, vol. abs/1006.0308, 2010. [Online]. Available: http://arxiv.org/abs/1006.0308
[8] A. Beloglazov and R. Buyya, "Optimal online deterministic algorithms and adaptive heuristics for energy and performance efficient dynamic consolidation of virtual machines in cloud data centers," Concurrency and Computation: Practice and Experience, vol. 24, no. 13, pp. 1397–1420, 2012. [Online]. Available: https://onlinelibrary.wiley.com/doi/abs/10.1002/cpe.1867
[9] C.-M. Wu, R.-S. Chang, and H.-Y. Chan, "A green energy-efficient scheduling algorithm using the DVFS technique for cloud datacenters," Future Generation Computer Systems, vol. 37, pp. 141–147, 2014. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0167739X13001234
[10] T. V. T. Duy, Y. Sato, and Y. Inoguchi, "Performance evaluation of a green scheduling algorithm for energy savings in cloud computing," in 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum (IPDPSW), April 2010, pp. 1–8.
[11] E. D. Demaine, J. Lynch, G. J. Mirano, and N. Tyagi, "Energy-efficient algorithms," CoRR, vol. abs/1605.08448, 2016. [Online]. Available: http://arxiv.org/abs/1605.08448
[12] B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom, "Models and issues in data stream systems," in Proceedings of the Twenty-first ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS '02). New York, NY, USA: ACM, 2002, pp. 1–16. [Online]. Available: http://doi.acm.org/10.1145/543613.543615
[13] L. Golab and M. T. Özsu, "Issues in data stream management," SIGMOD Record, vol. 32, no. 2, pp. 5–14, Jun. 2003. [Online]. Available: http://doi.acm.org/10.1145/776985.776986
[14] N. S. Kim, T. Austin, D. Baauw, T. Mudge, K. Flautner, J. S. Hu, M. J. Irwin, M. Kandemir, and V. Narayanan, "Leakage current: Moore's law meets static power," Computer, vol. 36, no. 12, pp. 68–75, Dec. 2003.
[15] M. Weiser, B. Welch, A. Demers, and S. Shenker, "Scheduling for reduced CPU energy," in Mobile Computing. Springer, 1994, pp. 449–471.
[16] ZeroMQ. Accessed: 22.04.2018. [Online]. Available: www.zeromq.org
[17] WattsUp PowerPro Kit. Accessed: 23.04.2018. [Online]. Available: www.powermeterstore.ca/p1206/watts_up_pro.php
[18] Texas Instruments. (2019) Bidirectional current/power monitor with I2C interface. [Online]. Available: http://www.ti.com/lit/ds/symlink/ina219.pdf
[19] NVIDIA, "NVIDIA Tegra X1," NVIDIA Corporation, Tech. Rep., Dec. 2016.
[20] K. R. Stokke, H. K. Stensland, C. Griwodz, and P. Halvorsen, "Load balancing of multimedia workloads for energy efficiency on the Tegra K1 multicore architecture," in Proceedings of the 8th ACM on Multimedia Systems Conference. ACM, 2017, pp. 124–135.
[21] K. R. Stokke, H. K. Stensland, P. Halvorsen, and C. Griwodz, "Why race-to-finish is energy-inefficient for continuous multimedia workloads," in 2015 IEEE 9th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC). IEEE, 2015, pp. 57–64.
[22] T. De Matteis and G. Mencagli, "Keep calm and react with foresight: Strategies for low-latency and energy-efficient elastic data stream processing," ACM SIGPLAN Notices, vol. 51, no. 8, p. 13, 2016.
[23] M. Dayarathna, Y. Li, Y. Wen, and R. Fan, "Energy consumption analysis of data stream processing: a benchmarking approach," Software: Practice and Experience, vol. 47, no. 10, pp. 1443–1462, 2017.
[24] K. R. Basireddy, E. W. Wachter, B. M. Al-Hashimi, and G. Merrett, "Workload-aware runtime energy management for HPC systems," in 2018 International Conference on High Performance Computing & Simulation (HPCS). IEEE, 2018, pp. 292–299.
[25] H. Cai, Q. Cao, F. Sheng, M. Zhang, C. Qi, J. Yao, and C. Xie, "Montgolfier: Latency-aware power management system for heterogeneous servers," in 2016 IEEE 35th International Performance Computing and Communications Conference (IPCCC). IEEE, 2016, pp. 1–8.