Summarizing CPU and GPU Design Trends with Product Data
Yifan Sun, Nicolas Bohm Agostini, Shi Dong, and David Kaeli
Northeastern University
Email: fyifansun, agostini, shidong, [email protected]
arXiv:1911.11313v2 [cs.DC] 13 Jul 2020

Abstract: Moore's Law and Dennard Scaling have guided the semiconductor industry for the past few decades. Recently, both laws have faced validity challenges as transistor sizes approach the practical limits of physics. We are interested in testing the validity of these laws and reflecting on the reasons responsible. In this work, we collect data on more than 4000 publicly available CPU and GPU products. We find that transistor scaling remains critical in keeping the laws valid. However, architectural solutions have become increasingly important and will play a larger role in the future. We observe that GPUs consistently deliver higher performance than CPUs. GPU performance continues to rise because of increases in GPU frequency, improvements in the thermal design power (TDP), and growth in die size. But we also see the ratio of GPU to CPU performance moving closer to parity, thanks to new SIMD extensions on CPUs and increased CPU core counts.

I. INTRODUCTION

Moore's Law [8] and Dennard Scaling [2] have guided the development of the semiconductor industry. Moore's Law predicted that the number of transistors on a chip would double every 18 months. Dennard Scaling stated that the energy consumption of a chip would stay in proportion to the size of the chip. Shrinking transistor sizes (i.e., reducing the process size) would allow us to increase the computing capabilities of the device without consuming more energy. Moore's Law and Dennard Scaling charted a promising future for computers: as we reduce the transistor size, the performance of computing devices would increase exponentially over time.

Recently, the semiconductor industry has faced unprecedented challenges. As transistor sizes approach the practical limits of physics, reducing transistor sizes can hardly improve performance. To guide future semiconductor research, we need to revisit these historical development trends and find new directions that can drive performance improvement.

The Central Processing Unit (CPU) and Graphics Processing Unit (GPU) are the two mainstream options for general-purpose computing. The former primarily serves as the processor for general tasks executed on mobile devices, personal computers, and servers. The latter takes on compute-intensive tasks for graphics rendering, big-data analytics, signal processing, artificial intelligence, and physics simulation [7]. To improve computing system performance, we need to better understand CPU and GPU design trends.

To study the trends, we collect and analyze CPU and GPU data from the public technical specifications of released products. Equipped with this data, we answer the following questions:
• Are Moore's Law and Dennard Scaling still valid? If so, what are the factors that keep the laws valid?
• Do GPUs still have computing power advantages over CPUs? Is the computing capability gap between CPUs and GPUs getting larger?
• What factors drive performance improvements in GPUs?

II. METHODOLOGY

We have collected data for all CPU and GPU products (to the best of our knowledge) that have been released by Intel, AMD (including the former ATI GPUs)¹, and NVIDIA since January 1st, 2000. In total, the dataset consists of 4031 products (2102 CPUs and 1929 GPUs). For each product, we record its release date and properties, such as process size, die size, transistor count, base frequency, and thermal design power (TDP).

A large number of GPU products (e.g., NVIDIA Tesla K80) have more than one GPU chip on a single PCB. For these products, we only consider the properties of one of their chips. As an exception, the emerging Multi-Chip-Module (MCM) packaging technology effectively incorporates multiple chips in one package. We consider the multiple chips in an MCM package (e.g., AMD Threadripper CPUs) as a single chip. Because MCM packages are usually programmer-transparent, a programmer can use multiple chips in an MCM package as if they were a single chip. The development of interposer technologies can also provide promising communication bandwidth between MCM chips. With recent technologies, MCM-CPUs and MCM-GPUs can be as efficient as single large chips [1], [4].

Another large group of products considered in this study consists of integrated CPU-GPU devices. We consider such devices as CPUs since the CPUs are the dominant components in these devices. We only consider the CPU on an integrated CPU-GPU chip when calculating its theoretical computing capability. We use Floating Point Operations Per Second (FLOPS) or Tera-FLOPS (TFLOPS) as the metrics to evaluate theoretical single-precision and double-precision computing capabilities.

For device frequency, we only consider the base clocks. We do not consider the boost clocks, as devices cannot continuously work at the boost clock frequencies.

¹ The data sources include www.techpowerup.com, wikichip.org, and vendor websites.

[Figure: transistor count (log scale) vs. release year, 2000 to 2020. Legend: process size, 250 nm down to 7 nm. (a) Transistor count of CPUs. (b) Transistor count of GPUs.]
Fig. 1. Moore's Law is still valid for both CPUs and GPUs.
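An exponential trend such as the one in Fig. 1 can be summarized by a doubling period. The following is a minimal sketch, not the authors' analysis code: it fits a straight line to log2(transistor count) against release year by ordinary least squares and reports the reciprocal of the slope. The function name doubling_period_years and the sample points are hypothetical. The samples are synthetic, generated to double every 18 months (the form of Moore's Law quoted in the Introduction), and are not drawn from the dataset.

```python
import math


def doubling_period_years(samples):
    """Estimate the doubling period from (year, transistor_count) pairs.

    Fits log2(count) = a + b * year by least squares; b is the number of
    doublings per year, so 1 / b is the number of years per doubling.
    """
    years = [year for year, _ in samples]
    logs = [math.log2(count) for _, count in samples]
    n = len(samples)
    mean_year = sum(years) / n
    mean_log = sum(logs) / n
    sxx = sum((y - mean_year) ** 2 for y in years)
    sxy = sum((y - mean_year) * (l - mean_log) for y, l in zip(years, logs))
    slope = sxy / sxx  # doublings per year
    return 1.0 / slope


# Synthetic samples (an assumption for illustration, NOT product data):
# 1e7 transistors in 2000, doubling every 1.5 years, sampled every 2 years.
synthetic = [(2000 + i, 1e7 * 2 ** (i / 1.5)) for i in range(0, 21, 2)]

period = doubling_period_years(synthetic)
print(f"estimated doubling period: {period:.2f} years")
```

Applied to real (release year, transistor count) pairs, a fitted period close to 18 months would match the form of Moore's Law quoted in the Introduction; a clearly longer period would indicate slower growth.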
We estimate energy consumption using the Thermal Design Power (TDP). We admit that TDP may not be the best metric to evaluate a device's power consumption. Chips can easily consume more power than their TDP specification for short periods of time. Chips may also keep energy consumption under the TDP by shutting down parts of the circuit. However, TDP can be a reasonable estimate of the average power consumption when a device runs regular workloads at its base clock.

Although we have most of the product data, we are missing critical data for CPUs produced by Intel since 2014. Intel stopped revealing its CPU transistor counts and die sizes starting with the 7th generation Core CPU. Therefore, when presenting detailed studies that are related to transistor counts and die sizes (Figures 2, 3, and 4), we focus on the GPU data. Also, in these analyses, we only show devices that are manufactured by the Taiwan Semiconductor Manufacturing Company (TSMC). We select TSMC because it has produced the highest number of chips in our dataset. We only consider chips from a single foundry since different foundries tend to use different measurements when computing their process sizes.

[Figure: density (transistors per mm^2, log scale) vs. process size.]
Fig. 2. Transistor Scaling.

[Figure: die size (mm^2) vs. release date.]
Fig. 3. GPU Die Sizes.

III. FINDINGS

A. Moore's Law

Researchers have been predicting the end of Moore's Law for a long time [5], [6], [10], [11]. However, Figure 1 suggests that the number of transistors on a chip is still increasing exponentially over time. For CPUs, we lack critical data from the years 2014 to 2017 because Intel stopped releasing transistor count and die size data starting with its 7th generation Core CPU. However, AMD has recently resumed producing high-end CPUs with large die sizes, and we can observe that the CPU transistor scaling trend continues to follow the pre-2014 trend. Also, Figure 1 suggests that vendors tend to use new CMOS technologies in high-end products first. Low-end products may continue to use an older version of the CMOS process when the high-end products switch to a new technology node.

The next question we need to ask is, "What drives increases in transistor count?" In Figure 2, we see that transistor scaling is playing an essential role in increasing the die density (number of transistors per area). For the devices produced within a specific process size, the variance in terms of transistor density

[Figure: (a) Watt per billion transistors vs. process size. (b) Watt per mm^2 vs. process size. (c) Base clock (MHz) vs. process size.]
Fig. 4. Process Size vs. (a) Energy consumption per billion transistors. (b) Energy consumption per area. (c) Base clock speed.

other nodes. The most recent 7nm technology has an unusually high energy density, exceeding the 98th percentile of the energy density of all other nodes. It is still too early to declare the end of Dennard Scaling at the 7nm node, since only eight 7nm GPUs had been produced at the time of this report. We expect chip manufacturers to develop solutions to improve energy efficiency.

Despite the challenges of continuing Dennard Scaling, frequency continues to increase with shrinking process sizes (as shown in Figure.