Risk Management Techniques and Practice Workshop
WORKSHOP REPORT
December 2008

Compiled by Terri Quinn and Mary Zosel
Lawrence Livermore National Laboratory
LLNL-TR-409240

Workshop Steering Committee
Arthur Bland, ORNL; Vince Dattoria, SC/ASCR/DOE HQ; Bill Kramer, LBNL; Sander Lee, NNSA/ASC/DOE HQ; Terri Quinn, LLNL (workshop chair); Randal Rheinheimer, LANL; Mark Seager, LLNL; Yukiko Sekine, SC/ASCR/DOE HQ; Jeffery Sims, ANL; Jon Stearly, SNL; and Mary Zosel, LLNL (host organizer)

Workshop Group Chairs
Ann Baker, ORNL; Robert Ballance, SNL; Kathlyn Boudwin, ORNL; Susan Coghlan, ANL; James Craw, LBNL; Candace Culhane, DoD; Kimberly Cupps, LLNL; Brent Draney, LBNL; Ira Goldberg, ANL; Patricia Kovatch, UTenn/NICS; Robert Pennington, NCSA; Kevin Regimbal, PNNL; Randal Rheinheimer, LANL; Gary Skouson, PNNL; Jon Stearly, SNL; and Manuel Vigil, LANL

This work performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.

This document was prepared as an account of work sponsored by an agency of the United States government. Neither the United States government nor Lawrence Livermore National Security, LLC, nor any of their employees makes any warranty, expressed or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States government or Lawrence Livermore National Security, LLC.
The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States government or Lawrence Livermore National Security, LLC, and shall not be used for advertising or product endorsement purposes.

RMTAP Workshop Report Page ii December 2008

Contents

I. EXECUTIVE SUMMARY
II. INTRODUCTION
    Workshop Abstract
    Workshop Goals
III. WORKSHOP FORMAT AND PLENARY SESSIONS
    Summary of Plenary Session 1
    Summary of Plenary Session 2
    Workshop Breakout Tracks
    Final Workshop Session
IV. WORKSHOP FINDINGS
    Finding #1: Standard Risk Management Techniques and Tools Are, in the Aggregate, Applicable to HPCC Projects and Are Commonly Employed by the HPCC Community
    Finding #2: HPC Projects Have Characteristics that Necessitate a Tailoring of the Standard Risk Management Practices
    Finding #3: All HPCC Acquisition Projects Can Benefit by Employing Risk Management but the Specific Choice of Risk Management Processes and Tools is Less Important to the Success of the Project
    Finding #4: The Special Relationship between the HPCC and HPC Vendors Must Be Reflected in the Risk Management Strategy
    Finding #5: Best Practices Findings (Based on Questionnaire Voting)
        Develop a Prioritized Risk Register with Special Attention to the Top Risks
        Establish a Practice of Regular Meetings and Status Updates with the Vendor Partner
        Support Regular, Open Reviews that Engage the Interests and Expertise of a Wide Range of Staff and Stakeholders
        Document and Share the Acquisition/Build/Deployment Experience
    Finding #6: Top Risk Categories (Based on Questionnaire Voting)
        System Scaling Issues
        Request for Proposal/Contract and Acceptance Testing
        Vendor Technical or Business Problems
        Personnel Staffing and Interactions
        Project Schedule
        Sponsor Commitment
        Facilities and Operations
V. CONCLUSION
APPENDIX A. WORKSHOP AGENDA
APPENDIX B. BREAKOUT SESSIONS AND REPORTS
    Track 1: Tailoring Risk Management to HPCCs
        Session 1: Risk Ownership and Analysis: How do you know which risks will “bite” you?
        Session 2: Risk Identification and Analysis—the Classic Categories: Are we covering all the bases?
        Session 3: Risk Management—Tools, Tips, and Tricks: “Please sign the register” or “There's nothing up my sleeve.”
        Session 4: Risk Management—Mitigation and Contingency Planning: Know when to hold them and when to fold them
    Track 2: Real Life Risk Experience
        Session 1: From Vision to Contract: “So you want to buy a mega-WHAT?!”
        Session 2: Management of System R&D from Contract Award through the Build: “Moore's Law meets Murphy's Law”
        Session 3: Acceptance Testing and Integration: “How to get the system installed and stay out of jail.”
        Session 4: Managing HPC Business Risks: “Herding cats and dollars” or “Where DOES the buck stop?”
APPENDIX C. ANALYSIS OF WORKSHOP QUESTIONNAIRES
    Track 1 Best Practices Finding (Based on Voting)
    Track 2 – Top Risk Categories (Derived from Voting)
APPENDIX D. WORKSHOP ATTENDEES

I. Executive Summary

At the request of the Department of Energy (DOE) Office of Science (SC), Lawrence Livermore National Laboratory (LLNL) hosted a two-day Risk Management Techniques and Practice (RMTAP) workshop held September 18–19 at the Hotel Nikko in San Francisco. The purpose of the workshop, which was sponsored by the SC/Advanced Scientific Computing Research (ASCR) program and the National Nuclear Security Administration (NNSA)/Advanced Simulation and Computing (ASC) program, was to assess current and emerging techniques, practices, and lessons learned for effectively identifying, understanding, managing, and mitigating the risks associated with acquiring leading-edge computing systems at high-performance computing centers (HPCCs). Representatives from fifteen high-performance computing (HPC) organizations, four HPC vendor partners, and three