Cray System Monitoring: Successes, Requirements, and Priorities
Ville Ahlgren, Stefan Andersson, Jim Brandt, Nicholas P. Cardo, Sudheer Chunduri, Jeremy Enos, Parks Fields, Ann Gentile, Richard Gerber, Joe Greenseid, Annette Greiner, Bilel Hadri, Yun (Helen) He, Dennis Hoppe, Urpo Kaila, Kaki Kelly, Mark Klein, Alex Kristiansen, Steve Leak, Mike Mason, Kevin Pedretti, Jean-Guillaume Piccinali, Jason Repik, Jim Rogers, Susanna Salminen, Mike Showerman, Cary Whitney, and Jim Williams
∗Argonne Leadership Computing Facility (ALCF), Argonne, IL USA
†Center for Scientific Computing (CSC - IT Center for Science), Finland
‡Swiss National Supercomputing Centre (CSCS), Lugano, Switzerland
§High Performance Computing Center Stuttgart (HLRS), Stuttgart, Germany
¶King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia
‖Los Alamos National Laboratory (LANL), Los Alamos, NM USA
∗∗National Center for Supercomputing Applications (NCSA), Urbana, IL USA
††National Energy Research Scientific Computing Center (NERSC), Berkeley, CA USA
‡‡Oak Ridge National Laboratory (ORNL), Oak Ridge, TN USA
§§Sandia National Laboratories (SNL), Albuquerque, NM USA
¶¶Cray Inc., Columbia, MD, USA
Abstract—Effective HPC system operations and utilization require unprecedented insight into system state, applications’ [...]
[...] in the procurement of a platform tailored to support the performance demands of a site’s workload.