A New Temperature Distribution Measurement Method on GPU Architectures Using Thermocouples†

Aniruddha Dasgupta1, Sunpyo Hong1, Hyesoon Kim2 and Jinil Park3*

1 School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, USA
2 School of Computer Science, Georgia Institute of Technology, Atlanta, GA, USA
3 School of Mechanical Engineering, Ajou University, Suwon, 443-749, Korea

Abstract

In recent years, many-core architectures have seen a rapid increase in the number of on-chip cores with a much slower increase in die area. This has led to very high power densities in the chip. Hence, in addition to power, temperature has become a first-order design constraint for high-performance architectures. However, temperature measurement is largely limited to on-chip temperature sensors, which might not always be available to researchers. In this paper, we propose a new temperature-measurement system using thermocouples for many-core GPU architectures and devise a new method to control GPU scheduling. This system gives us a temperature-distribution heatmap of the chip. In addition to monitoring the temperature distribution, our system also monitors run-time power consumption. The results show that there is a strong correlation between the on-chip heatmap patterns and power consumption. Furthermore, we provide actual experimental results that show how TPC utilization and the locations of active TPCs can be chosen to reduce temperature and power consumption.

Keywords: Temperature Measurement; Many-core Architecture; GPU; Thermocouple

1. Introduction

The number of cores inside a chip is increasing dramatically in today's processors. For example, NVIDIA's GTX280 has 30 streaming multiprocessors with 240 CUDA cores, and NVIDIA Fermi GPUs have 512 CUDA cores. On the multi-core front, the latest AMD processors have 12 cores. This high number of cores puts a lot of pressure on designing effective power- and temperature-controlled architectures. Moreover, the work by Mesa-Martinez et al. [10] showed that temperature is becoming a dominant factor in determining the performance, reliability, and leakage power consumption of modern processors.

In this paper, we use GPUs as a form of many-core processor. With GPUs, it is possible to validate that temperature-aware thread scheduling can actually reduce power consumption. Unfortunately, unlike state-of-the-art multicores, current GPUs do not provide temperature sensors for each individual core. Usually, a board-level temperature sensor is provided. However, it cannot account for the rampant temperature variations across the chip due to hotspots. Hence, in this paper, we propose a new temperature-measurement system that allows us to measure the temperature map while also measuring the total power consumption.

Some efforts in academia have focused on measuring temperature using infrared (IR) cameras [11] (although industry has better ways of measuring temperature, that information is typically not disclosed to the public). IR cameras provide an entire temperature distribution, but the set-up cost is very high, and they require special oil cooling. In other words, a heatsink must be removed, which could interfere with the natural heat distribution from a heatsink. Also, measurements performed through such a setup typically require some adjustments to the measured data so as to accurately represent ideal measurements under the actual working conditions of the processor (i.e., with a heatsink cooling solution). Thus, there is an opportunity for inaccuracies to creep in due to the nature of the modeling.

Hence, we propose a new cost-effective temperature-measurement system that uses thermocouples, for the first time, for GPU architectures. We devised a method to install thermocouples between a chip and a heatsink. With this system, we successfully measured the on-chip temperature distribution of a GPU processor. Thermocouples provide two benefits over IR cameras. First, they are very low cost and relatively easy to install, even in academia, without special expensive equipment. Second, a heatsink can still be placed, so we can measure power and temperature simultaneously. Finally, we demonstrate the need for thermal-aware scheduling algorithms based on the correlation between the on-chip heatmap and power consumption.

2. Background and Related Work

In this section, we discuss previous chip temperature-measurement systems and provide a brief background on the evaluated GPU system.

2.1 Chip Temperature Characterization Methods

Chip temperature characterization methods can be classified into two main branches: 1) modeling methods and 2) measurement methods.

2.1.1 Modeling Methods

Temperature-modeling methods are mainly relevant to design-time thermal characterization. They provide designers with the freedom to try out new designs and perform simulations. Also, such thermal models can be plugged into microarchitecture simulators to see the effect of changing microarchitectural parameters on temperature, or the effect of running different benchmarks. One of the most popular thermal models is HotSpot [9]. Based on the duality of heat transfer and electricity, the authors modeled various microarchitecture components as equivalent thermal resistances and capacitances. HotSpot can also be used to model a particular thermal package for the chip and to observe its thermal characteristics. By plugging the HotSpot thermal model into a simulator, one can track the thermal properties of individual components under load, understand a program's thermal behavior, evaluate thermal management techniques, etc. The thermal model is portable and flexible, and it can be built upon to cater to particular requirements. However, verification of the model on absolute thermal values is still a challenge.

2.1.2 Measuring Methods

Temperature-measurement methods are mainly relevant to runtime thermal management techniques, which require temperature measurement to occur in real time. Also, though thermal simulation models aim to faithfully mirror the behavior of the system, they are based on the designer's understanding of what factors affect the thermal characteristics of the system. So, modeling methods need to be validated against actual measurements of some sort to ensure the accuracy of the model, and thus they require the existence of robust thermal-measurement methods. In the realm of performance, modern processors provide measurement instruments in the form of hardware performance counters. However, for temperature, processors, especially many-core processors, do not yet have a concrete built-in measurement system. Though the exact methods used in industry to measure temperature are not known, there are mainly two contemporary methods proposed in academia.

On-Chip Sensors: CMOS-based on-chip sensors are mainly used to measure temperature at various points. This type of temperature sensing has been well implemented in multi-core processors, with each core having its own thermal sensor. The IBM Power6 processor has 24 digital thermal sensors and three thermistors for monitoring temperature characteristics [5]. But in the case of many-core architectures like GPUs, so far there is just a board-level sensor [3] and one on-chip sensor whose location is unknown; the temperature of individual cores is not tracked. The advantage of using on-chip sensors is the accurate and real-time measurement of temperature across the chip, without the need for alternate cooling solutions, as in the case of IR. Thus, temperature monitoring can be performed in the actual working conditions of the chip running real workloads. This translates to more accurate handling of DTS techniques. On the downside, some problems exist due to the sensors being integrated into the chip. Due to variations in the lithographic process, a complicated sensor circuit is required to achieve accurate results. This establishes a trade-off between accuracy and the amount of die area taken up by the sensor circuitry. Also, since sensor locations are discrete in nature, sensing all the hotspots on the chip is not possible, which leads to a spatial gradient of error if a sensor is not at the exact location of a hotspot.

Fig. 1. Temperature- and power-measurement system.

IR-based Measurement: Infrared-based thermal imaging has gained popularity as a robust method of characterizing thermal behavior [11]. It provides good resolution and accuracy both in time and space. As such, it has been used in studying dynamic thermal management techniques. Its external nature also helps in making decisions regarding the placement of thermal sensors at temperature-critical portions of the chip. However, there are a few limitations of using IR imaging, some of which have already been pointed out by Huang et al. [7]. IR rays cannot pass through metal, and processors are generally encompassed by a metallic heat-dissipation solution like a heatsink. So, for IR imaging to work, the heatsink needs to be removed and an alternate cooling solution needs to be provided. One of the prevalent methods in this case is removing the heatsink and providing laminar oil cooling over a bare silicon die [11]. However, this results in different transient and steady-state thermal responses compared to a conventional cooling solution like a heatsink [7]. The other limitation is that the cooling capacity of oil is roughly proportional to the size of the oil tank and the velocity of the oil flow. In order to cool 100W-300W cores, the oil flow has to be fast, thereby easily producing more distorted images. Although this method has high merits when done correctly, it comes with a high set-up cost and time.

2.1.3 Pros and Cons of Using Thermocouples

The most notable advantage of using thermocouples is the cost and the ease of use for the measurement. Not only is a thermocouple suitable for measurements up to 750 degrees Celsius, but it has a very thin diameter and, being a wire, can easily be placed anywhere. However, placing this wire at a specific location is a challenge, as the pressure applied to it could affect the temperature readings. Furthermore, the resolution of the readings is much more limited than with IR measurement. However, with a well-designed thermo-spacer and some knowledge of the GPU processor layout, thermocouples provide the most cost-effective solution. Also, reconstructing a heatmap from thermocouple readings is much simpler than in the IR case, since the IR method involves high-velocity oil flow. To the best of our knowledge, actual temperature measurement on a GPU chip has not been done before, and unlike a CPU, a GPU has a very high number of cores and thus more opportunities for temperature and power reduction from this study.

3. Experimental Setup

Figure 1 shows a block diagram of the entire temperature- and power-measurement system. The AC power is intercepted by the EXTECH power analyzer, which then is connected to the test computer. The computer has an 8800GT GPU with the thermocouples and the spacer installed. Thermocouple readings are measured by another computer using Labview software.
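Because the power meter and the thermocouple logger run on different machines, the two time-stamped streams must be aligned before they can be correlated. A minimal sketch of nearest-timestamp alignment in Python (the log tuples are hypothetical simplifications, not the actual Extech or Labview output formats):

```python
from bisect import bisect_left

def align_nearest(power_log, thermo_log, max_skew=0.5):
    """For each power sample (t, watts), find the thermocouple sample
    (t, temp) with the nearest timestamp, discarding pairs whose
    clocks disagree by more than max_skew seconds."""
    times = [t for t, _ in thermo_log]
    pairs = []
    for t, watts in power_log:
        i = bisect_left(times, t)
        # candidates: the samples just before and just after t
        best = min(
            (j for j in (i - 1, i) if 0 <= j < len(times)),
            key=lambda j: abs(times[j] - t),
        )
        if abs(times[best] - t) <= max_skew:
            pairs.append((t, watts, thermo_log[best][1]))
    return pairs
```

With the power meter sampling every 0.5 seconds, `max_skew=0.5` tolerates at most one sample period of clock disagreement between the two machines.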

3.1 Temperature Measurement System

We propose a thermal-measurement method where thermocouples are used as temperature sensors. We have designed a thermal spacer with grooves cut into it to hold the thermocouples at desired locations. The thermocouples are embedded in these grooves. The spacer has raised edges and a shape such that it fits perfectly over the GPU chip, consequently establishing contact between the thermocouples and the chip surface.

Thermocouple: J-type thermocouples are used in our measurement system. They are suitable for measurements ranging from 0 to 750 degrees Celsius, which is more than enough to cover the spectrum of temperatures encountered in a working GPU chip. They have a high sensitivity of around 55 uV/degree Celsius. The J-type is one of the most popular thermocouple types because of its wide measurement range and superior voltage output, which translates to greater temperature resolution.

Thermal Spacer: The thermal spacer is made of copper, the same material as the heatsink on the GPU. Consequently, it transfers heat from the GPU to the heatsink very well. The thermal resistance of the spacer is so low that it can be ignored for all practical purposes. Thus, our temperature-measurement methodology does not affect the working of the GPU in any detrimental way.

The floor plan in Figure 3(a) is estimated based on the GTX280 [4], which has the same microarchitecture but a different number of SMs. The floor plan shows the locations of cores and TPCs. We estimate the core locations based on our one-core-active experiments in Section 5.1.2. Figure 3(d) shows a side view of the installed thermocouples and the spacer between the heatsink and the chip.

Data Logger: The thermocouples are connected to a data-logger unit, the NI FP-TC 120, consisting of three 8-channel thermocouple modules for FieldPoint [2]. We use a 10/100 Mbps Ethernet interface for FieldPoint to communicate the sensor data to the data-logging machine.

3.2 Power Measurement System

We use the Extech 380801 AC/DC Power Analyzer [1] to measure the overall system power consumption. The raw power data is sent to a data-logging machine every 0.5 seconds through an RS232 interface. Note that multiple computers are involved in recording the power and thermocouple readings, so their timing is synchronized.

3.3 Reconstructing Images

To reconstruct temperature images, Matlab is used. We have written a script that places each thermocouple channel at its correct spatial location. Furthermore, since the thermocouple data is accumulated over a period of time, we can examine any specific slice of time. To interpolate between the thermocouple readings, a contour function is used to reconstruct an overall thermal image.
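The reconstruction described in Section 3.3 uses Matlab's contour interpolation; the same idea can be sketched in Python with simple inverse-distance weighting (the sensor coordinates and grid are illustrative, not the actual spacer layout):

```python
import numpy as np

def idw_heatmap(sensor_xy, readings, grid_n=50, power=2.0):
    """Interpolate sparse thermocouple readings onto a grid_n x grid_n
    map of the chip surface using inverse-distance weighting."""
    xs = np.linspace(0.0, 1.0, grid_n)
    gx, gy = np.meshgrid(xs, xs)
    pts = np.asarray(sensor_xy)            # (k, 2) sensor coordinates
    vals = np.asarray(readings, float)     # (k,) temperatures
    # distance of every grid cell to every sensor: (grid_n, grid_n, k)
    d = np.sqrt((gx[..., None] - pts[:, 0]) ** 2 +
                (gy[..., None] - pts[:, 1]) ** 2)
    w = 1.0 / np.maximum(d, 1e-9) ** power
    return (w * vals).sum(-1) / w.sum(-1)
```

The interpolated map stays within the range of the measured values and reproduces each reading at its sensor location, which is the qualitative behavior needed for hotspot visualization.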

Fig. 2. Customized thermospacer for the 8800GT GPU.

Installation Methods Previously Attempted: Taping using heat-transfer tapes, soldering, and gluing using thermo-epoxy are other possible installation options, but we learned that they are not feasible. Soldering does not work because the surface of a chip cannot be soldered. Both taping and gluing allow installation of thermocouples, but they have two serious problems. First, the tape and glue materials themselves prevent heat transfer from the chip to the heatsink. Even with materials specifically designed for high temperatures, they are still not good enough to transfer all the heat from the chip. The second problem is that placing thermocouples exactly at the desired locations is not a trivial task.

Therefore, we used grooves in the thermal spacer to hold the thermocouples in place. The sensor placement pattern is uniform in nature, so as to take temperature measurements over a uniform grid on the GPU chip. A layer of thermal paste is applied on the GPU chip as well as on the thermal spacer to ensure smooth thermal contact throughout.

Figure 3 shows the thermal spacer, the locations of the thermocouples, a picture after the thermocouples are placed, and an estimated floor plan of the chip. The inner box indicates the actual chip size, and the outer box is the size of the heatsink. Figure 3(a) shows an estimated floor plan of the SMs.

4. Many-core Architecture

Figure 4 shows the high-level view of a heavily multi-threaded, many-core GPU architecture (NVidia's 8800GT is used). A series of streaming multiprocessors (SMs) is connected by an interconnection network to a DRAM system.

Fig. 4. High-level GPU architecture and workload execution.

The top of the figure shows a series of workloads that get scheduled by the work scheduler unit. Unlike in the CPU architecture, scheduling is done purely by hardware. As a result, the number of activated SMs and which workload gets assigned to which SM are non-deterministic. This is a potential problem for this study, as we need to know which cores to turn on and keep them running without interruption.

4.1 How to Control Which Core Executes

Currently, GPU vendors do not disclose information on how to control the scheduling and other essential details. Hence, to overcome this problem, we devise a new software technique to make sure that only a single workload gets assigned to each SM. Each workload is intentionally modified to use just the right amount of SM resources (i.e., by increasing shared-memory and register usage) so that only one workload gets assigned to an SM. Then, we intentionally invoke a number of workloads identical to the number of SMs in the GPU. Another modification is that we make each workload run for a sufficiently long time, as we do not want frequent context switching between workloads. For verification, when we added just one more workload, the execution time doubled, which shows that all SMs were activated just before the workload addition. Figure 5 shows that by carefully modifying the act value, we can control which active core is used for execution. Note that not all real GPU benchmarks are constructed in this manner, and currently, controlling a specific core with this technique is not possible for those benchmarks.

5. Results

5.1 Temperature Measurement System

5.1.1 Calibration Experiments

We designed a calibration experiment system as shown in Figure 6. Two plates were designed and manufactured, as shown in Figure 6. Plate 1 mimics the thermal behavior of a processor (heat source), and Plate 2 mimics the thermal behavior of a heatsink. One side of Plate 1 has the exact same shape as the chip, so we can place the spacer, on which the thermocouples are already installed, between the two plates. We uniformly increase the temperature of Plate 1. After that, we place Plate 1 in the ambient temperature and install the spacer and Plate 2 in order. Then, Plates 1 and 2 and the spacer reach the steady state, which is at room temperature. Figure 6 shows the calibration result: during the transient period, temperature differences occur, especially in the initial stage. We believe that these initial differences are primarily due to different physical pressures applied to some thermocouples when Plate 2 is placed on top of Plate 1. Once the weight of Plate 2 has stabilized on Plate 1, only minor temperature differences exist, especially in the calibration range (operation range). Hence, this shows that we can use thermocouples to measure the heat distribution on the surface of a processor.
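The resource-inflation trick of Section 4.1 rests on simple occupancy arithmetic: once a block's shared-memory (or register) demand exceeds half of an SM's budget, at most one block fits per SM. A hedged sketch of that arithmetic (the budgets are illustrative G80-era values, not the 8800GT's documented limits):

```python
def blocks_per_sm(smem_per_block, regs_per_thread, threads_per_block,
                  sm_smem=16384, sm_regs=8192, max_blocks=8):
    """Simplified occupancy model: how many thread blocks can be
    co-resident on one SM, given each block's resource usage.
    The budgets (16 KB shared memory, 8192 registers, 8 blocks)
    are assumptions for illustration only."""
    by_smem = sm_smem // smem_per_block if smem_per_block else max_blocks
    by_regs = sm_regs // (regs_per_thread * threads_per_block)
    return max(0, min(max_blocks, by_smem, by_regs))

# A modest kernel: several blocks share one SM.
assert blocks_per_sm(smem_per_block=2048, regs_per_thread=10,
                     threads_per_block=64) == 8
# Inflating shared-memory usage past half the budget leaves room for
# exactly one block per SM, so launching as many blocks as there are
# SMs pins one workload to each SM.
assert blocks_per_sm(smem_per_block=9000, regs_per_thread=10,
                     threads_per_block=64) == 1
```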

Fig. 5. Simplified view of code example.
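The host-side loop behind the benchmark in Figure 5 (re-launching the kernel until a fixed wall-clock budget elapses while logging board temperature) can be sketched as follows; the kernel launch and temperature readout are stubbed as hypothetical callables, and the clock is injectable for testing:

```python
import time

def run_fixed_time(launch_kernel, read_board_temp, budget_s=120.0,
                   clock=time.monotonic):
    """Re-launch the kernel until the wall-clock budget elapses,
    sampling the board temperature after each launch. Returns the
    running average and the maximum temperature observed."""
    total, n, peak = 0.0, 0, float("-inf")
    deadline = clock() + budget_s
    while clock() < deadline:
        launch_kernel()            # blocking kernel invocation (stub)
        t = read_board_temp()      # e.g. parsed from nvclock output (stub)
        total, n, peak = total + t, n + 1, max(peak, t)
    return (total / n if n else float("nan")), peak
```

With a fake clock and canned readings, three launches at 60, 62, and 61 degrees yield an average of 61.0 and a peak of 62.0.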

The high-level view of this specialized benchmark has a number of floating-point multiply-adds and coalesced memory loads inside a loop. We supply as parameters to the kernel all the SM numbers that should be active for the run. Figure 5 shows an example of activating four blocks. The benchmark is run for a fixed amount of time (120 seconds in the above case), during which the host code calls the kernel in a loop until the specified time has elapsed. We use the nvclock utility to record the GPU board temperature. Based on the benchmark output and the nvclock utility output, we calculate a running average of GPU board temperatures and also note the maximum temperature for each configuration run.

Fig. 6. Thermocouple calibration system (top) and the results (bottom).

5.1.2 One-core Activation

One of the important questions is whether there will be enough of a temperature difference between active cores and idle cores. To answer this question, we activate only one core at a time and vary the active core locations. We adjust the execution time such that the temperatures reflected by the thermocouples reach a saturated value. We take an average of 30 readings after saturation for each thermocouple location and plot the heatmap at the saturation point using this value. Figure 7 shows the heatmaps of two different active cores (Core 1 and Core 4) and the idle state. The results show that when a core is active, its temperature is higher than that of other areas by around 5 degrees. Please note that even though the rest of the cores are idle, because there is no power gating or clock gating, those cores are still on, consuming some power.

Fig. 7. Difference in heatmaps for idle and active cores (Left: no active core, middle: Core 1 active, right: Core 4 active).

On performing the one-core activation experiment, we observed that the heatmaps for Cores 0 and 7 were very similar. This was also true for the SM combinations (1,8), (2,9), (3,10), (4,11), (5,12), and (6,13). So, it is apparent that SMs 0 and 7 belong to the same TPC, and the same can be said about the other combinations. Figure 8 shows the similarity in thermal maps for the combinations (0,7) and (3,10). Note that neither the default GPU scheduling algorithm nor the exact core locations are disclosed by GPU vendors.

Fig. 8. SMs belonging to the same TPC (top left: Core 0 active, top right: Core 7 active, bottom left: Core 3 active, bottom right: Core 10 active).

5.1.3 Repeatability and Rotation Test

To test the stability of the thermocouple measurements, we performed a rotation test (the chip is isotropic). In this experiment, we insert the spacer after rotating it 90 degrees from its original position. If the temperature deltas that we observed in the original position were caused by the thermocouples themselves instead of the actual hotspot of the GPU, the hotspots would rotate together with the spacer. Please note that the thermocouples are already glued into the spacer, so when we rotate the spacer, the thermocouples are rotated with it. The default configuration is called 0 degrees, and we plotted the heatmap with the spacer at 90 degrees. Figure 9 shows the results of the 0- and 90-degree experiments (the 90-degree data is also drawn based on the core locations in the 0-degree data). The results show that the heatmap at 90 degrees is very similar to the 0-degree heatmap, and the hotspot is still found correctly. Hence, we can say that the thermocouples are laid out properly to detect hotspots irrespective of the orientation. Although we do not present the results in this paper, we also did a repeatability test. We rotated the spacer back to the original position and compared the results with the initial 0-degree experiment data. The repeatability test shows very similar results. These experiments point to the robustness of the temperature-measuring method using thermocouples with a custom-designed thermal spacer.

Fig. 9. The 0- and 90-degree heatmaps of the thermal spacer with Core 2 active.

5.2 Temperature and Power

5.2.1 Temperature-Aware Scheduling

To save energy, many temperature-aware thread-scheduling algorithms have been proposed. The advantage of certain core combinations being thermally optimal or generating lower power can be explained by considering the layout from a thermal perspective. As explained in detail in [8], interleaving high-power-density elements with lower-power ones creates virtual lateral heatsinks. Thus, scheduling work such that active cores are separated by low-power, cooler-running components gives such a combination of active cores an edge from the thermal and power point of view. Thus, with an idea of the layout, one can intelligently schedule work to minimize thermal stress and power consumption. Also, a more uniform power and thermal distribution leads to less hotspot formation.

5.2.2 Temperature and Power Measurement

Using our power-measurement system and the results of the on-chip sensor, we can find the delta in power as well as temperature for different combinations of active cores. We measure temperature and power together by activating one, two, four, and seven cores. For the one-core and two-core tests, power consumption is almost the same regardless of which cores are active. This is because one or two cores do not generate enough power to create severe hotspots. The seven-core test also shows similar power-consumption behavior. This is because more than half of the chip is activated, so the entire chip becomes hot (i.e., there is no temperature distribution). We observe that activating four cores provides a significant delta, depending on the core positions. Hence, we report the results of the four-core test.

5.2.3 Multiple-Core Tests on 8800GT

We tried different combinations of four active cores on the 8800GT and measured the power and temperature for each case. Table 1 summarizes the results, which show a strong correlation between temperature and power: higher temperature consumes more power. From the table we can see that the

core combination 0-7-1-8 consumes the least amount of power and produces the lowest temperature, while the combination 0-1-2-3 induces the maximum thermal stress and power. This is a very interesting phenomenon, as we are executing the same code on the same number of SMs (processors). This fact can be corroborated by looking at the heatmaps for the two cases, shown in Figure 10. For the 0-7-1-8 case, the heat is well spread, so the overall temperature is lower. However, for 0-1-2-3, the heat is concentrated in the center, so the overall temperature becomes higher. This is consistent with the average temperature measurement from the on-chip sensor. Furthermore, these results can be used to construct a predicted floor plan, which is shown in Figure 3(a). It is apparent that the 0-1-2-3 case activates four TPCs, while the 0-7-1-8 case activates only two TPCs. Based on these results, we can conclude that temperature-aware thread/core scheduling can actually change the power consumption. When few cores are active, depending on which cores are hot, the overall temperature can vary greatly, and so does the power consumption.

Fig. 10. Difference in heatmaps between high and low thermal stress (left: 0-7-1-8, right: 0-1-2-3).

Table 1: Four active cores – measured power vs. on-chip sensor.

5.2.4 Projection of the Thermal Effect onto a Higher Number of Cores

Section 5.2.3 showed that depending on the active core locations, temperature and power consumption can be severely affected, even though the same number of cores is used for execution. We project that this will become more apparent in architectures with many more cores. For example, the NVidia GTX280 has 30 SMs, compared to the 14 SMs of the 8800GT. This GPU will give us more room to choose the number of active cores and their locations. However, we do not have a thermo-spacer for it and leave this for future work. Nevertheless, we have successfully done a similar experiment using the on-chip temperature sensor and power meter. The plot of measured power and temperature is shown in Figure 11. It shows a significant delta of around 25 watts with a temperature differential of 4.3 degrees Celsius. This work adds one more dimension, controlling core locations, to the work in [6], which claims that not all cores need to be activated for some benchmarks.

Fig. 11. Variation of GPU power consumption vs. on-chip temperature.

6. Implications and Future Work

The results in Section 5.2.3 showed that the number of activated TPCs should be minimized to reduce power and temperature. To avoid confusion: minimizing the number of active TPCs is not the same as minimizing the number of active cores, and this is transparent to a programmer. Another implication is that the active TPCs should be as far apart as possible. This fact was actually considered in [8]; if hot and cold areas are placed alternately, they create a virtual heatsink effect. The difference is that those authors used a simulator, and the granularity of control was different (i.e., controlling CPU units vs. GPU cores). We actually controlled scheduling at core and TPC granularity and confirmed this effect in a real experiment.

To maximize the virtual heatsink effect, separating the active TPCs as far apart as possible is clearly the method to use. However, there is a complicated trade-off between (1) maximizing the number of active SMs in a single TPC and (2) minimizing the number of active SMs in a TPC, which results in more active TPCs. It would seem that activating all SMs within the same TPC is better, since those SMs share some texture and shared resources. Furthermore, activating another TPC unnecessarily could result in more energy use. But there could be another trade-off: for some types of applications with heavy memory use, SMs in the same TPC could compete with each other for memory load/store units, which could degrade performance. In this case, invoking multiple TPCs would result in better performance. This deeper level of investigation is future work. Nevertheless, we have managed to measure the temperature of a GPU processor and perform explicit work scheduling despite having no disclosed information from vendors. To the best of our knowledge, this is the first study to analyze the thermal behavior of a GPU processor using thermocouples, and it extends [6] by adding one more dimension of energy optimization: changing the active core locations, not just limiting the number of cores.

7. Conclusion

In this paper, we present a robust and reliable temperature-measurement system using thermocouples. Furthermore, we overcome the GPU scheduling problem despite the lack of documentation on scheduling. We discuss the importance and an application of such a system by describing its relevance to a thermal-aware scheduling scheme for many-core systems. With power and temperature having become primary-level design parameters and with the

advent of many cores, we believe that this field of research offers many opportunities for exploration and needs robust tools to achieve that exploration. To this effect, we feel that the system described in this paper will prove to be very beneficial.

Aniruddha Dasgupta earned his Bachelor of Engineering in Electronics and Telecommunications from Pune University, India, in 2006. Thereafter, he earned a master's degree in electrical and computer engineering from the Georgia Institute of Technology, USA, in 2011, during which he did research work on CUDA performance analysis. He currently works at AMD, Austin, in the Power and Performance Optimization Labs. His responsibilities at AMD include post-silicon tuning of the power and performance of AMD CPUs and APUs.

References

[1] Extech 380801. http://www.extech.com/instrument/products.
[2] National Instruments FP-TC 120 thermocouple data logger. http://sine.ni.com/nips/cds/view/p/lang/en/nid/2187.
[3] NVIDIA GeForce series GTX280, 8800GTX, 8800GT. http://www.nvidia.com/geforce.

[4] NVIDIA's GTX280 graphics processor. http://techreport.com/articles.x/14934/2.
[5] M. Floyd, S. Ghiasi, T. Keller, K. Rajamani, F. Rawson, J. Rubio, and M. Ware. System power management support in the IBM POWER6 microprocessor. IBM Journal of Research and Development, 2007.
[6] S. Hong and H. Kim. IPP: An integrated GPU power and performance model that predicts the optimal number of active cores to save energy at static time. In ISCA '10: Proc. of the 37th Annual Int'l. Symp. on Computer Architecture, 2010.
[7] W. Huang, K. Skadron, S. Gurumurthi, R. J. Ribando, and M. R. Stan. Differentiating the roles of IR measurement and simulation for power and temperature-aware design. In ISPASS '09: Proc. of the IEEE Int'l. Symp. on Performance Analysis of Systems and Software, April 2009.
[8] W. Huang, M. R. Stan, K. Sankaranarayanan, R. J. Ribando, and K. Skadron. Many-core design from a thermal perspective. In DAC '08: Proc. of the 45th Conference on Design Automation, 2008.
[9] LAVA research group. HotSpot. http://lava.cs.virginia.edu/HotSpot/index.htm.
[10] F. J. Mesa-Martinez, E. K. Ardestani, and J. Renau. Characterizing processor thermal behavior. In ASPLOS '10: Proc. of the 15th Int'l. Conf. on Architectural Support for Programming Languages and Operating Systems, 2010.

Sunpyo Hong received BA and MS degrees in electrical and computer engineering from the Georgia Institute of Technology, where he is currently working toward a doctoral degree in computer engineering. His research interests include energy-efficient many-core architectures and GPGPU computing, with a focus on analytical and empirical modeling of performance and power using architectural characteristics and compiler-architecture interaction. He is a student member of the IEEE and the ACM.

Hyesoon Kim: Dr. Kim is an assistant professor in the School of Computer Science at the Georgia Institute of Technology. Her research interests include high-performance, energy-efficient heterogeneous architectures and programmer-compiler-microarchitecture interaction. She received a B.S. in mechanical engineering from the Korea Advanced Institute of Science and Technology (KAIST), an M.S. in mechanical engineering from Seoul National University, and an M.S. and a Ph.D. in computer engineering from The University of Texas at Austin.

[11] F. J. Mesa-Martinez, J. Nayfach-Battilana, and J. Renau.

Power model validation through thermal measurements. In ISCA '07: Proc. of the 34th Annual Int'l. Symp. on Computer Architecture, 2007.

Jinil Park: Dr. Park is a professor in the School of Mechanical Engineering at Ajou University, Suwon, Korea. His research interests include automotive engineering and thermo-fluidic measurement. He received a B.S. and a Ph.D. from Seoul National University. He was a research fellow at Brown University from 2000 to 2003 and a research fellow at the University of Tokyo from 2003 to 2004 before joining Ajou University.