To appear at IEEE International Symposium on High Performance Computer Architecture (HPCA) 2021. Prodigy: Improving the Memory Latency of Data-Indirect Irregular Workloads Using Hardware-Software Co-Design Nishil Talati∗, Kyle May∗†, Armand Behroozi∗, Yichen Yang∗, Kuba Kaszykz, Christos Vasiladiotisz, Tarunesh Verma∗, Lu Liz, Brandon Nguyen∗, Jiawen Sunz, John Magnus Mortonz, Agreen Ahmadi∗, Todd Austin∗, Michael O’Boylez, Scott Mahlke∗, Trevor Mudge∗, Ronald Drelinski∗ ∗University of Michigan yUniversity of Wisconsin, Madison zUniversity of Edinburgh Email:
[email protected] Abstract—Irregular workloads are typically bottlenecked by System overview Contributions the memory system. These workloads often use sparse data Data Indirection Instrumented binary Software Compiler Graph (DIG) int bfs() { […] representations, e.g., compressed sparse row/column (CSR/CSC), analysis Instrumented populateDS(…) bfs.cc Application regNode(…) to conserve space at the cost of complicated, irregular traversals. application regTrigEdge(…) source code regTravEdge(…) Add DIG binary // perform trav Such traversals access large volumes of data and offer little […] locality for caches and conventional prefetchers to exploit. representation } This paper presents Prodigy, a low-cost hardware-software co- Hardware Run Programmable Generate Prefetcher Adjustable prefetch design solution for intelligent prefetching to improve the memory application Program the distance prefetch Adapts to core’s latency of several important irregular workloads. Prodigy targets prefetcher execution pace irregular workloads including graph analytics, sparse linear al- requests Low-cost gebra, and fluid mechanics that exhibit two specific types of data- dependent memory access patterns. Prodigy adopts a “best of Figure 1. Overview of our design and contributions. Prodigy software both worlds” approach by using static program information from efficiently communicates key data structures and algorithmic traversal patterns, software, and dynamic run-time information from hardware.