0 to 60 with Intel® HPC Orchestrator
HPC Advisory Council Stanford Conference — February 7 & 8, 2017
Steve Jones, Director, High Performance Computing Center, Stanford University
David N. Lombard, Chief Software Architect, Intel® HPC Orchestrator; Senior Principal Engineer, Extreme Scale Computing, Intel
Dellarontay Readus, HPC Software Engineer and CS Undergraduate, Stanford University

Legal Notices and Disclaimers
• Intel technologies' features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn more at intel.com, or from the OEM or retailer.
• No computer system can be absolutely secure.
• Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit http://www.intel.com/performance.
• Cost reduction scenarios described are intended as examples of how a given Intel-based product, in the specified circumstances and configurations, may affect future costs and provide cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction.
• No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
• Intel, the Intel logo and others are trademarks of Intel Corporation in the U.S. and/or other countries.
• *Other names and brands may be claimed as the property of others.
• © 2017 Intel Corporation.

Agenda
• Demo Outline
• Intel® HPC Orchestrator and OpenHPC
• Installation Walkthrough
• Intel® Parallel Studio XE
• Modules Support
• Management and Maintenance
• Workload Management
• 3rd-Party Packages
• Demo of Built Cluster
• Moving Forward

Intel® Scalable System Framework (Intel® SSF)
A systems approach for innovation:
• Faster Compute: continue the pace of Moore's Law (cores, μarchitecture, graphics, PCIe)
• Remove Bottlenecks: develop new technologies for memory, fabric, storage, and system software
• Tighter Integration: achieve full potential via better power, performance, density, scaling & cost

Execution of Vision
Intel® HPC Orchestrator products are the realization of the software portion of Intel® Scalable System Framework.

Intel® SSF, OpenHPC, and Intel® HPC Orchestrator
• Intel® Scalable System Framework (SSF): a holistic design solution for all HPC; small clusters through supercomputers; compute- and data-centric computing; standards-based programmability; on-premise and cloud-based.
• OpenHPC: open-source community under the Linux Foundation; ecosystem innovation building a consistent HPC SW platform; platform agnostic; 30 global members; multiple distributions.
• Intel® HPC Orchestrator: Intel's distribution of OpenHPC; Intel HW validation and SW alignment; exposes best performance on Intel HW; advanced testing, premium features & technical support; product technical support & updates.

What is OpenHPC?
OpenHPC is a community effort endeavoring to:
• provide collection(s) of pre-packaged components that can be used to help install and manage flexible HPC systems throughout their lifecycle
• leverage a standard Linux delivery model to retain admin familiarity (i.e., package repos)
• allow and promote multiple system configuration recipes that leverage community reference designs and best practices
• implement integration testing to gain validation confidence
• provide an additional distribution mechanism for groups releasing open-source software
• provide a stable platform for new R&D initiatives

Typical cluster architecture (from the OpenHPC Install Guide, CentOS 7.1 version, v1.0)
Figure 1: Overview of physical cluster architecture. The diagram shows a master (SMS) node connected to the data center network via eth0 and to the compute nodes via eth1, a per-node BMC management interface on the TCP network, a high-speed network, and a Lustre* storage system. Courtesy of OpenHPC*.
*Other names and brands may be claimed as the property of others.

In this architecture, the master host's eth0 interface is connected to the local data center network and eth1 is used to provision and manage the cluster backend (note that these interface names are examples and may be different depending on local settings and OS conventions). Two logical IP interfaces are expected for each compute node: the first is the standard Ethernet interface that will be used for provisioning and resource management. The second is used to connect to each host's BMC and is used for power management and remote console access. Physical connectivity for these two logical IP networks is often accommodated via separate cabling and switching infrastructure; however, an alternate configuration can also be accommodated via the use of a shared NIC, which runs a packet filter to divert management packets between the host and BMC. In addition to the IP networking, there is a high-speed network (InfiniBand in this recipe) that is also connected to each of the hosts. This high-speed network is used for application message passing and optionally for Lustre connectivity as well.

1.3 Bring your own license
OpenHPC provides a variety of integrated, pre-packaged elements from the Intel® Parallel Studio XE software suite. Portions of the runtimes provided by the included compilers and MPI components are freely usable. However, in order to compile new binaries using these tools (or to access other analysis tools like Intel® Trace Analyzer and Collector, Intel® Inspector, etc.), you will need to provide your own valid license, and OpenHPC adopts a bring-your-own-license model. Note that licenses are provided free of charge for many categories of use. In particular, licenses for compilers and development tools are provided at no cost to academic researchers or developers contributing to open-source software projects. More information on this program can be found at: https://software.intel.com/en-us/qualify-for-free-software

1.4 Inputs
As this recipe details installing a cluster starting from bare metal, there is a requirement to define IP addresses and gather hardware MAC addresses in order to support a controlled provisioning process. These values are necessarily unique to the hardware being used, and this document uses variable substitution (${variable}) in the command-line examples that follow to highlight where local site inputs are required. A summary of the required and optional variables used throughout this recipe is presented below.
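To make the ${variable} substitution concrete, here is a minimal sketch of the kind of site-specific inputs a provisioning recipe collects before any commands are run; every name and address below is an illustrative assumption, not a value taken from the guide.

    # Hypothetical site inputs for a two-node example cluster (bash syntax).
    sms_name=sms                      # hostname of the master (SMS) node
    sms_eth_internal=eth1             # SMS interface facing the compute nodes
    internal_netmask=255.255.255.0

    # Per-node provisioning inputs: hostname, internal IP, NIC MAC address, and BMC IP.
    c_name=( c1 c2 )
    c_ip=( 192.168.1.11 192.168.1.12 )
    c_mac=( 00:1a:2b:3c:4d:01 00:1a:2b:3c:4d:02 )
    c_bmc=( 192.168.2.11 192.168.2.12 )
    bmc_username=admin
    bmc_password=changeme

    # Later command-line examples reference these values via ${variable} substitution, e.g.:
    echo "registering ${c_name[0]} with IP ${c_ip[0]} and MAC ${c_mac[0]}"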
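As an illustration of the BMC network described above, the commands below sketch how out-of-band power management and remote console access are commonly exercised with ipmitool; the variables reuse the assumed inputs from the previous sketch.

    # Query and control a compute node's power state via its BMC (out-of-band).
    ipmitool -I lanplus -H ${c_bmc[0]} -U ${bmc_username} -P ${bmc_password} chassis power status
    ipmitool -I lanplus -H ${c_bmc[0]} -U ${bmc_username} -P ${bmc_password} chassis power cycle

    # Open a Serial-over-LAN remote console to the same node (exit with ~.).
    ipmitool -I lanplus -H ${c_bmc[0]} -U ${bmc_username} -P ${bmc_password} sol activate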
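For the bring-your-own-license model in section 1.3, a common pattern is to point the Intel® Parallel Studio XE tools at a site-provided license before compiling new binaries; the module name and license-server address below are assumptions for this sketch.

    # Point the Intel compilers and analysis tools at a site license server (port@host form).
    export INTEL_LICENSE_FILE=28518@license.example.edu

    # Load the Intel toolchain provided by the stack (module name may differ per release)...
    module load intel
    # ...and compile a new binary, which is the step that requires a valid license.
    icc -O2 -xHost hello.c -o hello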
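Finally, the "standard Linux delivery model" bullet above means components arrive as ordinary RPM packages plus environment modules. A minimal sketch of the resulting admin and user workflow on a CentOS-based master node follows; the package name, module names, and node counts are assumptions and vary by release.

    # Admin: pull a base meta-package from the distribution's yum repository.
    yum -y install ohpc-base

    # User: select a compiler/MPI toolchain via Lmod, then launch under the Slurm resource manager.
    module load gnu openmpi
    srun -N 2 -n 4 hostname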
Participation as of January '17
• Official Members: a mixture of academics, labs, OEMs, and ISVs/OSVs (courtesy of OpenHPC*).
• Project member participation interest? Please contact Kevlin Husser or Jeff ErnstFriedman <[email protected]>.
*Other names and brands may be claimed as the property of others.

Individuals
• Reese Baird, Intel (Maintainer)
• Pavan Balaji, Argonne National Laboratory (Maintainer)
• David Brayford, LRZ (Maintainer)
• Todd Gamblin, Lawrence Livermore National Labs (Maintainer)
• Craig Gardner, SUSE (Maintainer)
• Yiannis Georgiou, ATOS (Maintainer)
• Balazs Gerofi, RIKEN (Component Development Representative)
• Jennifer Green, Los Alamos National Laboratory (Maintainer)
• Eric Van Hensbergen, ARM (Maintainer, Testing Coordinator)
• Douglas Jacobsen, NERSC (End-User/Site Representative)
• Chulho Kim, Lenovo (Maintainer)
• Greg Kurtzer, Lawrence Berkeley National Labs (Component Development Representative)
• Thomas Moschny, ParTec (Maintainer)
• Karl W. Schulz, Intel (Project Lead, Testing Coordinator)
• Derek Simmel, Pittsburgh Supercomputing Center (End-User/Site Representative)
• Thomas Sterling, Indiana University (Component Development Representative)
• Craig Stewart, Indiana University (End-User/Site Representative)
• Scott Suchyta, Altair (Maintainer)
• Nirmala Sundararajan, Dell (Maintainer)
Courtesy of OpenHPC*. Governance overview: https://github.com/openhpc/ohpc/wiki/Governance-Overview
*Other names and brands may be claimed as the property of others.

Intel® HPC Orchestrator Product Family
• CUSTOM: highly modular, hierarchically configured stack. Target customer: high-end HPC users.
• ADVANCED (GA today): modular stack with a flat configuration. Target customer: technical & commercial users with vertical HPC applications.
• TURNKEY: turnkey stack with a flat configuration; a variety of stacks may be available as offerings. Target customer: HPC ISVs, SIs, and customers looking for a turnkey solution for on-prem and CSP access methodologies, appliances.

The OpenHPC community assembles a list of components from upstream communities as RRVs (*RRV: reliable and relevant version), including a GNU RRV, Linux RRV, file system RRV, and resource manager RRV, on an OSS cadence of roughly 6~12 months. Intel® HPC Orchestrator integrates and tests these stacks and makes them available as a base HPC stack and as OEM, University, and Community stacks.

List of Components: Warewulf, Ganglia, Lustre, MUNGE, EasyBuild, Slurm, Nagios, GNU, OpenMPI, NumPy, SciPy, R Project, Spack, Lmod, Tug, ADIOS, Boost, FFTW, HDF Group, Hypre, Latex, Conduse, Msweet, LosF, LuaJIT, mpiP, MUMPS, MVAPICH, NetCDF, OpenBLAS, PAPI, pdsh, PDT, PETSc, LAPACK, SuperLU, Trilinos, Valgrind.

Continuous Integration Environment
• Build Environment & Source Control
• Bug Tracking
• User & Dev Forums
• Collaboration tools
• Validation Environment

Intel® HPC Orchestrator Value Proposition
Intel® HPC