Cray System Monitoring: Successes, Requirements, and Priorities x Ville Ahlgreny, Stefan Anderssonx, Jim Brandt , Nicholas P. Cardoz, Sudheer Chunduri∗, x xi Jeremy Enos∗∗, Parks Fieldsk, Ann Gentile , Richard Gerberyy, Joe Greenseid , Annette Greineryy, Bilel Hadri{, Yun (Helen) Heyy, Dennis Hoppex, Urpo Kailay, Kaki Kellyk, Mark Kleinz, x x Alex Kristiansen∗, Steve Leakyy, Mike Masonk, Kevin Pedretti , Jean-Guillaume Piccinaliz, Jason Repik , Jim Rogerszz, Susanna Salmineny, Mike Showerman∗∗, Cary Whitneyyy, and Jim Williamsk ∗Argonne Leadership Computing Facility (ALCF), Argonne, IL USA 60439 Email: (sudheerjalexk)@anl.gov yCenter for Scientific Computing (CSC - IT Center for Science), (02101 Espoo j 87100 Kajaani), Finland Email: (ville.ahlgrenjurpo.kailajsusanna.salminen)@csc.fi zSwiss National Supercomputing Centre (CSCS), 6900 Lugano, Switzerland Email: (cardojkleinjjgp)@cscs.ch xHigh Performance Computing Center Stuttgart (HLRS), 70569 Stuttgart, Germany Email: (
[email protected],
[email protected]) {King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudia Arabia Email:
[email protected] kLos Alamos National Laboratory (LANL), Los Alamos, NM USA 87545 Email: (parksjkakjmmasonjjimwilliams)@lanl.gov ∗∗National Center for Supercomputing Applications (NCSA), Urbana, IL USA 61801 Email: (jenosjmshow)@ncsa.illinois.edu yyNational Energy Research Scientific Computing Center (NERSC), Berkeley, CA USA 94720 Email: (ragerberjamgreinerjsleakjclwhitney)@lbl.gov zzOak Ridge National Laboratory (ORNL), Oak Ridge, TN USA 37830 Email:
[email protected] x Sandia National Laboratories (SNL), Albuquerque, NM USA 87123 Email: (brandtjgentilejktpedrejjjrepik)@sandia.gov xi Cray Inc., Columbia, MD, USA 21045 Email:
[email protected] Abstract—Effective HPC system operations and utilization in the procurement of a platform tailored to support the require unprecedented insight into system state, applications’ performance demands of a site’s workload.