LogCA: A High-Level Performance Model for Hardware Accelerators Muhammad Shoaib Bin Altaf ∗ David A. Wood AMD Research Computer Sciences Department Advanced Micro Devices, Inc. University of Wisconsin-Madison
[email protected] [email protected] ABSTRACT 10 With the end of Dennard scaling, architects have increasingly turned Unaccelerated Accelerated 1 to special-purpose hardware accelerators to improve the performance and energy efficiency for some applications. Unfortunately, accel- 0.1 erators don’t always live up to their expectations and may under- perform in some situations. Understanding the factors which effect Time (ms) 0.01 Break-even point the performance of an accelerator is crucial for both architects and 0.001 programmers early in the design stage. Detailed models can be 16 64 highly accurate, but often require low-level details which are not 256 1K 4K 16K 64K available until late in the design cycle. In contrast, simple analytical Offloaded Data (Bytes) models can provide useful insights by abstracting away low-level system details. (a) Execution time on UltraSPARC T2. In this paper, we propose LogCA—a high-level performance 100 model for hardware accelerators. LogCA helps both programmers SPARC T4 UltraSPARC T2 GPU and architects identify performance bounds and design bottlenecks 10 early in the design cycle, and provide insight into which optimiza- tions may alleviate these bottlenecks. We validate our model across Speedup 1 Break-even point a variety of kernels, ranging from sub-linear to super-linear com- plexities on both on-chip and off-chip accelerators. We also describe the utility of LogCA using two retrospective case studies.