High Performance Computing Facility Operational Assessment 2016 Oak Ridge Leadership Computing Facility
ORNL/SPR-2017/486

US Department of Energy, Office of Science

High Performance Computing Facility Operational Assessment 2016
Oak Ridge Leadership Computing Facility

Ryan Adamson
Ashley D. Barker
Arthur S. Bland
James J. Hack
Jason Hill
Stephen T. McNally
Gordon Rhyne
Mallikarjun Shankar
T. P. Straatsma
Kevin G. Thach
Suzy Tichenor
Sudharshan S. Vazhkudai
Jack C. Wells

March 2017

Approved for public release. Distribution is unlimited.

DOCUMENT AVAILABILITY

Reports produced after January 1, 1996, are generally available free via US Department of Energy (DOE) SciTech Connect.

Website http://www.osti.gov/scitech/

Reports produced before January 1, 1996, may be purchased by members of the public from the following source:

National Technical Information Service
5285 Port Royal Road
Springfield, VA 22161
Telephone 703-605-6000 (1-800-553-6847)
TDD 703-487-4639
Fax 703-605-6900
E-mail [email protected]
Website http://classic.ntis.gov/

Reports are available to DOE employees, DOE contractors, Energy Technology Data Exchange representatives, and International Nuclear Information System representatives from the following source:

Office of Scientific and Technical Information
PO Box 62
Oak Ridge, TN 37831
Telephone 865-576-8401
Fax 865-576-5728
E-mail [email protected]
Website http://www.osti.gov/contact.html

This report was prepared as an account of work sponsored by an agency of the United States Government. Neither the United States Government nor any agency thereof, nor any of their employees, makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights.
Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof.

Date Published: March 2017

Prepared by
OAK RIDGE NATIONAL LABORATORY
Oak Ridge, TN 37831-6283
managed by
UT-BATTELLE, LLC
for the
US DEPARTMENT OF ENERGY
under contract DE-AC05-00OR22725

CONTENTS

LIST OF FIGURES
LIST OF TABLES
ACRONYMS
EXECUTIVE SUMMARY
1. USER RESULTS
    1.1 USER RESULTS SUMMARY
    1.2 USER SUPPORT METRICS
        1.2.1 Overall Satisfaction Rating for the Facility
        1.2.2 Average Rating across All User Support Questions
        1.2.3 Improvement on Past Year Unsatisfactory Ratings
        1.2.4 Assessing the Effectiveness of the OLCF User Survey
    1.3 PROBLEM RESOLUTION METRICS
        1.3.1 Problem Resolution Metric Summary
    1.4 USER SUPPORT AND OUTREACH
        1.4.1 User Support
        1.4.2 User Assistance and Outreach (UAO)
        1.4.3 Scientific Liaisons
        1.4.4 Data Liaisons
        1.4.5 Visualization Liaisons
        1.4.6 OLCF User Group and Executive Board
        1.4.7 Training, Education, and Workshops
        1.4.8 Training and Outreach Activities for the Future Members of the HPC Community and the General Public
        1.4.9 Outreach
    1.5 LOOKING FORWARD
        1.5.1 Application Portability
        1.5.2 Application Readiness and Early Science
        1.5.3 Computational Scientists for Energy, the Environment, and National Security (CSEEN) Postdoctoral Program
2. BUSINESS RESULTS
    2.1 BUSINESS RESULTS SUMMARY
    2.2 CRAY XK7 (TITAN) RESOURCE SUMMARY
    2.3 CRAY XC30 (EOS) RESOURCE SUMMARY
    2.4 LUSTRE FILE SYSTEMS (SPIDER II) RESOURCE SUMMARY
    2.5 DATA ANALYSIS AND VISUALIZATION CLUSTER (RHEA) RESOURCE SUMMARY
    2.6 HIGH PERFORMANCE STORAGE SYSTEM (HPSS) RESOURCE SUMMARY
    2.7 VISUALIZATION RESOURCE SUMMARY
    2.8 OLCF COMPUTATIONAL AND DATA RESOURCE SUMMARY
        2.8.1 OLCF HPC Resource Production Schedule
        2.8.2 Business Results Snapshot
    2.9 RESOURCE AVAILABILITY
        2.9.1 Scheduled Availability
        2.9.2 Overall Availability
        2.9.3 Mean Time to Interrupt (MTTI)
        2.9.4 Mean Time to Failure (MTTF)
    2.10 RESOURCE UTILIZATION
        2.10.1 Resource Utilization Snapshot
        2.10.2 Total System Utilization
    2.11 CAPABILITY UTILIZATION
    2.12 GPU USAGE
    2.13 FUNCTIONAL PARTITIONING MULTI-TASKING FRAMEWORK ACCELERATES SCIENTIFIC DISCOVERY
3. STRATEGIC RESULTS
    3.1 SCIENCE OUTPUT
        3.1.1 OLCF Publications Report
    3.2 SCIENTIFIC ACCOMPLISHMENTS
        3.2.1 Streamlining Accelerated Computing for Industry: Peter Vincent, Imperial College, Director’s Discretionary (DD) Program
        3.2.2 A Seismic Mapping Milestone: Jeroen Tromp, Princeton University, INCITE
        3.2.3 The Shape of Melting in Two Dimensions: Sharon Glotzer, University of Michigan, INCITE