IBM High-Performance Computing Insights with IBM Power System AC922 Clustered Solution

Dino Quintero, Miguel Gomez Gonzalez, Ahmad Y Hussein, Jan-Frode Myklebust

IBM Redbooks, International Technical Support Organization
May 2019
SG24-8422-00

Note: Before using this information and the product it supports, read the information in “Notices” on page ix.

First Edition (May 2019)

This edition applies to:

  • IBM Power System AC922 Firmware OP9-V2.0-2.14
  • Extreme Cloud Administration Toolkit (xCAT) V2.14.3
  • Red Hat Enterprise Linux server 7.5 Alt (RHEL-ALT-7.5-20180315.0-Server-ppc64le-dvd1.iso)
  • NVIDIA Compute Unified Device Architecture (CUDA) V9.2 (cuda-repo-rhel7-9-2-local-9.2.148-1.ppc64le)
  • Mellanox MLNX_OFED_LINUX-4.3-4.0.5.1-rhel7.5alternate-ppc64le.iso
  • IBM XLC xlc-16.1.0-1-ppc64le.NEED_PRODUCT_PKGS.tar.bz2
  • IBM XLF xlf-16.1.0-1-ppc64le.NEED_PRODUCT_PKGS.tar.bz2
  • AT at12.0
  • IBM SMPI ibm_smpi-10.02.00.09rtm5-rh7_20181010.ppc64le.rpm
  • IBM Parallel Performance Toolkit ppt-2.4.0-2.tar.bz2
  • IBM Engineering and Scientific Subroutine Library (IBM ESSL) essl-6.1.0-1-ppc64le
  • IBM Parallel Engineering and Scientific Subroutine Library (IBM Parallel ESSL) pessl-5.4.0-0-ppc64le
  • IBM Spectrum Scale 5.0.1.-2
  • The Portland Group, Inc. (PGI) pgilinux-2018-187-ppc64le.tar.gz
  • IBM Spectrum Load Sharing Facility (IBM Spectrum LSF) lsf-10.1-build_number.ppc64le_csm.bin
  • IBM Cluster Systems Management V1.3.0
  • IBM Burst Buffer V1.3.0

© Copyright International Business Machines Corporation 2019. All rights reserved.
Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

Contents

Notices
  Trademarks
Preface
  Authors
  Now you can become a published author, too!
  Comments welcome
  Stay connected to IBM Redbooks

Part 1. Planning

Chapter 1. Introduction to IBM high-performance computing
  1.1 Overview of HPC
  1.2 Reasons for implementing HPC on IBM POWER9 processor-based systems
  1.3 Overview of the POWER9 processor-based CORAL Project

Chapter 2. IBM Power System AC922 server for HPC overview
  2.1 Power AC922 server for HPC
  2.2 Functional component description
    2.2.1 Power AC922 models
    2.2.2 POWER9 processor
    2.2.3 Memory subsystem
    2.2.4 PCI adapters
    2.2.5 IBM CAPI2
    2.2.6 NVLink 2.0
    2.2.7 Baseboard management controller

Chapter 3. Software stack
  3.1 Red Hat Enterprise Linux
  3.2 Device drivers
    3.2.1 Mellanox OpenFabrics Enterprise Distribution
    3.2.2 NVIDIA CUDA
  3.3 Development environment software
    3.3.1 IBM XL compilers
    3.3.2 LLVM
    3.3.3 IBM Engineering and Scientific Subroutine Library
    3.3.4 IBM Parallel Engineering and Scientific Subroutine Library
    3.3.5 IBM Spectrum MPI
    3.3.6 IBM Parallel Performance Toolkit
  3.4 Workload management
    3.4.1 IBM Spectrum Load Sharing Facility
  3.5 Cluster management software
    3.5.1 Extreme Cluster and Cloud Administration Toolkit
    3.5.2 Cluster Administration and Storage Tools
    3.5.3 Mellanox Unified Fabric Manager
  3.6 IBM Storage and file systems
    3.6.1 IBM Spectrum Scale
    3.6.2 IBM Elastic Storage Server

Chapter 4. Reference architecture
  4.1 Large HPC cluster architecture
  4.2 Medium HPC cluster architecture
  4.3 Software stack mapping
  4.4 Generic sizing of nodes
  4.5 Ethernet network layout
  4.6 InfiniBand network
    4.6.1 InfiniBand network topologies
    4.6.2 Fat tree topology
  4.7 Job launch overview

Part 2. Deployment

Chapter 5. Nodes and software deployment
  5.1 Deployment overview
  5.2 System management
    5.2.1 The Intelligent Platform Management Interface tool
    5.2.2 OpenBMC
    5.2.3 Firmware update
    5.2.4 Boot order configuration
  5.3 xCAT deployment overview
    5.3.1 xCAT database: Objects and tables
    5.3.2 xCAT node booting
    5.3.3 xCAT node discovery
    5.3.4 xCAT baseboard management controller discovery
    5.3.5 xCAT installation types: Disks and state
    5.3.6 xCAT network interfaces: Primary and additional
    5.3.7 xCAT software kits
    5.3.8 xCAT synchronizing files ...
