Cluster Computing with OpenHPC
http://openhpc.community

Cluster Computing with OpenHPC
Karl W. Schulz, Ph.D.
Technical Project Lead, OpenHPC Community
Scalable Datacenter Solutions Group, Intel
HPCSYSPROS16 Workshop, SC'16, November 14, Salt Lake City, Utah

Acknowledgements
• Co-Authors: Reese Baird, David Brayford, Yiannis Georgiou, Gregory Kurtzer, Derek Simmel, Thomas Sterling, Nirmala Sundararajan, Eric Van Hensbergen
• OpenHPC Technical Steering Committee (TSC)
• Linux Foundation and all of the project members
• Intel, Cavium, and Dell for hardware donations to support community testing efforts
• Texas Advanced Computing Center for hosting support

Outline
• Community project overview
  - mission/vision
  - members
  - governance
• Stack overview
• Infrastructure: build/test
• Summary

OpenHPC: Mission and Vision
• Mission: to provide a reference collection of open-source HPC software components and best practices, lowering barriers to deployment, advancement, and use of modern HPC methods and tools.
• Vision: OpenHPC components and best practices will enable and accelerate innovation and discoveries by broadening access to state-of-the-art, open-source HPC methods and tools in a consistent environment, supported by a collaborative, worldwide community of HPC users, developers, researchers, administrators, and vendors.

OpenHPC: Project Members
• Mixture of academics, labs, OEMs, and ISVs/OSVs (members include Argonne National Laboratory, CEA, and others)
• Interested in project member participation? Please contact Jeff ErnstFriedman ([email protected])

OpenHPC Technical Steering Committee (TSC): Role Overview
• Project Leader
• Maintainers
• Integration Testing Coordinator(s)
• Upstream Component Development Representative(s)
• End-User / Site Representative(s)
https://github.com/openhpc/ohpc/wiki/Governance-Overview

Stack Overview
• Packaging efforts have HPC in mind and include compatible modules (for use with Lmod) with development libraries/tools
• Endeavoring to provide a hierarchical development environment that is cognizant of different compiler and MPI families (a short sketch of this hierarchy in use follows the architecture summary below)
• Intent is to manage package dependencies so they can be used as building blocks (e.g. deployable with multiple provisioning systems)
• Includes common conventions for environment variables
• Development library install example:
  # yum install petsc-gnu-mvapich2-ohpc
• End-user interaction example with the above install (assume we are a user wanting to build a PETSc hello world in C):
  $ module load petsc
  $ mpicc -I$PETSC_INC petsc_hello.c -L$PETSC_LIB -lpetsc

Typical Cluster Architecture (from the Install Guide - CentOS 7.1 Version, v1.0)
• Install guides walk through a bare-metal install
• Leverages image-based provisioner (Warewulf)
  - PXE boot (stateless)
[Figure 1: Overview of physical cluster architecture — master (SMS) host on the data center network (eth0) with an internal provisioning/management network to the compute nodes (eth1), compute BMC interfaces for management, a high-speed network, and an optional connection to an external Lustre storage system.]
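To make the hierarchical development environment above more concrete, the following is a minimal sketch of a user session under Lmod; the module names (gnu, mvapich2, openmpi, petsc) mirror the families listed in this overview, but exact names and versions vary by OpenHPC release and site configuration.

  $ module load gnu                 # select the GNU compiler family
  $ module load mvapich2            # select an MPI family; MPI-dependent modules become visible
  $ module load petsc               # PETSc build matching the gnu/mvapich2 branch of the hierarchy
  $ module swap mvapich2 openmpi    # Lmod reloads dependent modules (e.g. petsc) for the new branch
  $ mpicc -I$PETSC_INC petsc_hello.c -L$PETSC_LIB -lpetsc

Because each compiler/MPI combination carries its own library builds, the same build line works unchanged after the swap.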
The master (SMS) host requires at least two Ethernet interfaces, with eth0 connected to the local data center network and eth1 used to provision and manage the cluster backend (note that these interface names are examples and may be different depending on local settings and OS conventions). Two logical IP interfaces are expected for each compute node: the first is the standard Ethernet interface that will be used for provisioning and resource management; the second connects to each host's BMC and is used for power management and remote console access. Physical connectivity for these two logical IP networks is often accommodated via separate cabling and switching infrastructure; however, an alternate configuration can also be accommodated via the use of a shared NIC, which runs a packet filter to divert management packets between the host and BMC. In addition to the IP networking, there is a high-speed network (InfiniBand in this recipe) that is also connected to each of the hosts. This high-speed network is used for application message passing and optionally for Lustre connectivity as well.

1.3 Bring your own license
OpenHPC provides a variety of integrated, pre-packaged elements from the Intel® Parallel Studio XE software suite. Portions of the runtimes provided by the included compilers and MPI components are freely usable. However, in order to compile new binaries using these tools (or to access other analysis tools like Intel® Trace Analyzer and Collector, Intel® Inspector, etc.), you will need to provide your own valid license, and OpenHPC adopts a bring-your-own-license model. Note that licenses are provided free of charge for many categories of use. In particular, licenses for compilers and development tools are provided at no cost to academic researchers or developers contributing to open-source software projects. More information on this program can be found at: https://software.intel.com/en-us/qualify-for-free-software

1.4 Inputs
As this recipe details installing a cluster starting from bare-metal, there is a requirement to define IP addresses and gather hardware MAC addresses in order to support a controlled provisioning process; hardware-specific information is needed to support (remote) bare-metal provisioning. These values are necessarily unique to the hardware being used, and this document uses variable substitution (${variable}) in the command-line examples that follow to highlight where local site inputs are required. A summary of the required and optional variables used throughout this recipe is presented below.

Required variables:
• ${ohpc_repo}                        # OpenHPC repo location
• ${sms_name}                         # Hostname for SMS server
• ${sms_ip}                           # Internal IP address on SMS server
• ${sms_eth_internal}                 # Internal Ethernet interface on SMS
• ${eth_provision}                    # Provisioning interface for computes
• ${internal_netmask}                 # Subnet netmask for internal network
• ${ntp_server}                       # Local ntp server for time synchronization
• ${bmc_username}                     # BMC username for use by IPMI
• ${bmc_password}                     # BMC password for use by IPMI
• ${c_ip[0]}, ${c_ip[1]}, ...         # Desired compute node addresses
• ${c_bmc[0]}, ${c_bmc[1]}, ...       # BMC addresses for computes
• ${c_mac[0]}, ${c_mac[1]}, ...       # MAC addresses for computes
• ${compute_regex}                    # Regex for matching compute node names (e.g. c*)

Optional variables:
• ${mgs_fs_name}                      # Lustre MGS mount name
• ${sms_ipoib}                        # IPoIB address for SMS server
• ${ipoib_netmask}                    # Subnet netmask for internal IPoIB
• ${c_ipoib[0]}, ${c_ipoib[1]}, ...   # IPoIB addresses for computes

Note that while the example definitions above correspond to a small 4-node compute subsystem, the compute parameters are defined in array format to accommodate logical extension to larger node counts. (One way to capture these inputs as shell variables is sketched after the BOS installation section below.)

2 Install Base Operating System (BOS)
In an external setting, installing the desired BOS on a master SMS host typically involves booting from a DVD ISO image on a new server. With this approach, insert the CentOS 7.1 DVD, power cycle the host, and follow the distro-provided directions to install the BOS on your chosen master host. Alternatively, if choosing to use a pre-installed server, please verify that it is provisioned with the required CentOS 7.1 distribution.
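Before running the recipe commands, the site-specific inputs summarized in section 1.4 are typically captured once so they can be substituted into later steps. The snippet below is a minimal sketch using bash variables and arrays; every value shown is a placeholder for a hypothetical 4-node system and must be replaced with settings appropriate to your own hardware and network.

  # Hypothetical example inputs for a 4-node system (all values are placeholders)
  sms_name=sms
  sms_ip=172.16.0.1
  sms_eth_internal=eth1
  eth_provision=eth0
  internal_netmask=255.255.0.0
  ntp_server=0.centos.pool.ntp.org
  bmc_username=admin
  bmc_password=admin
  c_ip=(172.16.1.1 172.16.1.2 172.16.1.3 172.16.1.4)       # compute node addresses
  c_bmc=(172.16.2.1 172.16.2.2 172.16.2.3 172.16.2.4)      # compute BMC addresses
  c_mac=(00:1a:2b:00:00:01 00:1a:2b:00:00:02 00:1a:2b:00:00:03 00:1a:2b:00:00:04)   # compute MACs
  compute_regex="c*"

The array form matches the ${c_ip[0]}, ${c_bmc[0]}, ... usage in the variable summary and extends naturally to larger node counts.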
Tip: While it is theoretically possible to provision a Warewulf cluster with SELinux enabled, doing so is beyond the scope of this document. Even the use of permissive mode can be problematic, and we therefore recommend disabling SELinux on the master SMS host. If SELinux components are installed locally, the selinuxenabled command can be used to determine if SELinux is currently enabled. If enabled, consult the distro documentation for information on how to disable it.

3 Install OpenHPC Components
With the BOS installed and booted, the next step is to add desired OpenHPC packages onto the master server in order to provide provisioning and resource management services for the rest of the cluster. The following subsections highlight this process.

3.1 Enable OpenHPC repository for local use
To begin, enable use of the OpenHPC repository by adding it to the local list of available package repositories. Note that this requires network access from your master server to the OpenHPC repository, or alternatively, a locally hosted mirror. (A brief sketch of interacting with the enabled repository follows the component summary and hierarchy overview below.)

OpenHPC v1.2 - Current S/W Components
• Base OS: CentOS 7.2, SLES 12 SP1
• Architecture: x86_64, aarch64 (Tech Preview; new with v1.2)
• Administrative Tools: Conman, Ganglia, Lmod, LosF, Nagios, pdsh, prun, EasyBuild, ClusterShell, mrsh, Genders, Shine, Spack
• Provisioning: Warewulf
• Resource Mgmt.: SLURM, Munge, PBS Professional
• Runtimes: OpenMP, OCR
• I/O Services: Lustre client (community version)
• Numerical/Scientific Libraries: Boost, GSL, FFTW, Metis, PETSc, Trilinos, Hypre, SuperLU, SuperLU_Dist, Mumps, OpenBLAS, ScaLAPACK
• I/O Libraries: HDF5 (pHDF5), NetCDF (including C++ and Fortran interfaces), Adios
• Compiler Families: GNU (gcc, g++, gfortran)
• MPI Families: MVAPICH2, OpenMPI, MPICH
• Development Tools: Autotools (autoconf, automake, libtool), Valgrind, R, SciPy/NumPy
• Performance Tools: PAPI, IMB, mpiP, pdtoolkit, TAU, Scalasca, Score-P, SIONLib
Notes:
  - Additional dependencies that are not provided by the BaseOS or community repos (e.g. EPEL) are also included
  - 3rd-party libraries are built for each compiler/MPI family (8 combinations typically)
  - Resulting repositories currently comprise ~300 RPMs

Hierarchical Overlay for OpenHPC software
• CentOS 7 Distro Repo
• OHPC Repo
  - General tools and system services: lmod, slurm, munge, losf, warewulf, lustre client, ohpc, prun, pdsh
  - Development environment
    - Compilers: gcc, Intel Composer
    - Serial apps/libs: hdf5-gnu, hdf5-intel
    - MPI families: MVAPICH2, IMPI, OpenMPI
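As a brief follow-on to section 3.1 and the component summary above, the sketch below shows the general flavor of working with the repository once it has been enabled on the master host. Apart from petsc-gnu-mvapich2-ohpc, which appears earlier in this document, the commands are illustrative; consult the install guide for your release for the exact repository-enablement step.

  # Confirm the OpenHPC repository is visible to the package manager
  # yum repolist | grep -i openhpc

  # Browse available packages; compiler/MPI-dependent builds follow the
  # <name>-<compiler>-<mpi>-ohpc naming convention
  # yum search ohpc

  # Install a development library for one compiler/MPI combination
  # yum install petsc-gnu-mvapich2-ohpc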