1 Detailed Cache Coherence Characterization for OpenMP Benchmarks Anita Nagarajan, Jaydeep Marathe, Frank Mueller Dept. of Computer Science, North Carolina State University, Raleigh, NC 27695-7534
[email protected], phone: (919) 515-7889 Abstract models in their implementation (e.g., [6], [13], [26], [24], [2], [7]), and they operate at different abstraction levels Past work on studying cache coherence in shared- ranging from cycle-accuracy over instruction-level to the memory symmetric multiprocessors (SMPs) concentrates on operating system interface. On the performance tuning end, studying aggregate events, often from an architecture point work mostly concentrates on program analysis to derive op- of view. However, this approach provides insufficient infor- timized code (e.g., [15], [27]). Recent processor support for mation about the exact sources of inefficiencies in paral- performance counters opens new opportunities to study the lel applications. For SMPs in contemporary clusters, ap- effect of applications on architectures with the potential to plication performance is impacted by the pattern of shared complement them with per-reference statistics obtained by memory usage, and it becomes essential to understand co- simulation. herence behavior in terms of the application program con- In this paper, we concentrate on cache coherence simu- structs — such as data structures and source code lines. lation without cycle accuracy or even instruction-level sim- The technical contributions of this work are as follows. ulation. We constrain ourselves to an SPMD programming We introduce ccSIM, a cache-coherent memory simulator paradigm on dedicated SMPs. Specifically, we assume the fed by data traces obtained through on-the-fly dynamic absence of workload sharing, i.e., only one application runs binary rewriting of OpenMP benchmarks executing on a on a node, and we enforce a one-to-one mapping between Power3 SMP node.