LiCO 5.2.1 Installation Guide (for SLES) Second Edition (January 2019)

© Copyright Lenovo 2018, 2019. LIMITED AND RESTRICTED RIGHTS NOTICE: If data or software is delivered pursuant to a General Services Administration (GSA) contract, use, reproduction, or disclosure is subject to restrictions set forth in Contract No. GS-35F-05925.

Reading instructions

• To ensure that you get correct command lines using the copy/paste function, open this Guide with Adobe Acrobat Reader, a free PDF viewer. You can download it from the official Web site https://get.adobe.com/reader/.
• Replace values in angle brackets with the actual values. For example, when you see <*_USERNAME> and <*_PASSWORD>, enter your actual username and password.
• In the command lines and configuration files, ignore all annotations starting with #; they are comments.

Contents

Reading instructions ...... ii

Chapter 1. Overview ...... 1
  Introduction to LiCO ...... 1
  Typical cluster deployment ...... 1
  Operating environment ...... 2
  Supported servers and chassis models ...... 3
  Prerequisites ...... 3

Chapter 2. Deploy the cluster environment ...... 5
  Install an OS on the management node ...... 5
  Deploy the OS on other nodes in the cluster ...... 5
  Configure environment variables ...... 5
  Create a local repository ...... 8
  Install Lenovo xCAT ...... 8
  Prepare OS mirrors for other nodes ...... 9
  Set xCAT node information ...... 9
  Add host resolution ...... 10
  Configure DHCP and DNS services ...... 10
  Install a node OS through the network ...... 10
  Create local repository for other nodes ...... 10
  Configure the memory for other nodes ...... 11
  Checkpoint A ...... 11
  Install infrastructure software for nodes ...... 12
  List of infrastructure software to be installed ...... 12
  Configure a local Zypper repository for the management node ...... 12
  Configure a local Zypper repository for login and compute nodes ...... 12
  Configure LiCO dependencies repositories ...... 13
  Install Slurm ...... 13
  Configure NFS ...... 14
  Configure NTP ...... 14
  GPU driver installation ...... 15
  CUDA installation ...... 16
  Configure Slurm ...... 17
  Install Ganglia ...... 19
  Install MPI ...... 19
  Install Singularity ...... 20
  Checkpoint B ...... 21

Chapter 3. Install LiCO dependencies ...... 23
  Cluster check ...... 23
  Check environment variables ...... 23
  Check the LiCO dependencies repository ...... 23
  Check the OS installation ...... 23
  Check NFS ...... 23
  Check Slurm ...... 24
  Check Ganglia ...... 24
  Check MPI ...... 25
  Check Singularity ...... 25
  Check OpenHPC installation ...... 25
  List of LiCO dependencies to be installed ...... 25
  Install RabbitMQ ...... 26
  Install PostgreSQL ...... 26
  Install InfluxDB ...... 26
  Install Confluent ...... 27
  Configure user authentication ...... 28
  Install OpenLDAP-server ...... 28
  Install libuser ...... 29
  Install OpenLDAP-client ...... 29
  Install nss-pam-ldapd ...... 29
  Install Gmond GPU plug-in ...... 31

Chapter 4. Install LiCO ...... 33
  List of LiCO components to be installed ...... 33
  Obtain the LiCO installation package ...... 34
  Configure the local Zypper repository for LiCO ...... 34
  Install LiCO on the management node ...... 34
  Install LiCO on the login node ...... 35
  Install LiCO on the compute nodes ...... 36

Chapter 5. Configure LiCO ...... 37
  Configure the service account ...... 37
  Configure cluster nodes ...... 37
  Room information ...... 37
  Logic group information ...... 37
  Room row information ...... 38
  Rack information ...... 38
  Chassis information ...... 38
  Node information ...... 39
  List of cluster services ...... 41
  Infrastructure configuration ...... 42
  Database configuration ...... 42
  Login configuration ...... 42
  Storage configuration ...... 42
  Scheduler configuration ...... 43
  Alert configuration ...... 43
  Cluster configuration ...... 43
  Functional configuration ...... 43
  User configuration ...... 44
  Configure LiCO components ...... 45
  lico-vnc-mond ...... 45
  lico-portal ...... 45
  lico-ganglia-mond ...... 46
  lico-icinga-mond ...... 46
  lico-icinga-plugin ...... 47
  lico-confluent-proxy ...... 47
  lico-confluent-mond ...... 48
  lico-wechat-agent ...... 48
  lico-tensorboard-proxy ...... 48
  Initialize the system ...... 49
  Initialize users ...... 49

Chapter 6. Start and log in to LiCO ...... 51
  Start LiCO ...... 51
  Log in to LiCO ...... 52
  Configure LiCO services ...... 52

Chapter 7. Appendix: Important information ...... 53
  Configure VNC ...... 53
  Standalone VNC installation ...... 53
  VNC batch installation ...... 53
  Configure the Confluent Web console ...... 53
  LiCO commands ...... 54
  Set the LDAP administrator password ...... 54
  Change a user's role ...... 54
  Resume a user ...... 54
  Import a user ...... 54
  Import AI images ...... 54
  Security improvement ...... 54
  Binding settings ...... 54
  Firewall settings ...... 58
  SSHD settings ...... 59
  Import system images ...... 59
  Image bootstrap files ...... 60
  Create images ...... 60
  Import images into LiCO as system-level images ...... 61
  Slurm issues troubleshooting ...... 61
  Node status check ...... 61
  Memory allocation error ...... 61
  Status setting error ...... 62
  InfiniBand issues troubleshooting ...... 62
  Installation issues troubleshooting ...... 62
  XCAT issues troubleshooting ...... 63
  Update OS packages ...... 63
  Edit nodes.csv from xCAT dumping data ...... 63

Notices and trademarks ...... 64

Chapter 1. Overview

Introduction to LiCO

Lenovo Intelligent Computing Orchestration (LiCO) is infrastructure management software for high-performance computing (HPC) and artificial intelligence (AI). It provides features such as cluster management and monitoring, job scheduling and management, cluster user management, account management, and file system management.

With LiCO, users can centralize resource allocation in one supercomputing cluster and carry out HPC and AI jobs simultaneously. Users can perform operations by logging in to the management system interface with a browser, or by using command lines after logging in to a cluster login node with another shell.

Typical cluster deployment

This Guide is based on the typical cluster deployment that contains management, login, and compute nodes.

Figure 1. Typical cluster deployment (diagram of the public network, management node, login node, compute nodes, and parallel file system, connected through the nodes' BMC and eth interfaces, TCP networking, and a high-speed network)

Elements in the cluster are described in the table below.

Table 1. Description of elements in the typical cluster

• Management node: Core of the HPC/AI cluster, undertaking primary functions such as cluster management, monitoring, scheduling, strategy management, and user & account management.
• Compute node: Completes computing tasks.
• Login node: Connects the cluster to the external network or cluster. Users must use the login node to log in and upload application data, develop compilers, and submit scheduled tasks.
• Parallel file system: Provides a shared storage function. It is connected to the cluster nodes through a high-speed network. Parallel file system setup is beyond the scope of this Guide; a simple NFS setup is used instead.
• Nodes BMC interface: Used to access the node's BMC system.
• Nodes eth interface: Used to manage nodes in the cluster. It can also be used to transfer computing data.
• High speed network interface: Optional. Used to support the parallel file system. It can also be used to transfer computing data.

Note: LiCO also supports the cluster deployment that only contains the management and compute nodes. In this case, all LiCO modules installed on the login node need to be installed on the management node.

Operating environment

Cluster server:
Lenovo ThinkSystem servers

Operating system:
SUSE Linux Enterprise Server (SLES) 12 SP3

Client requirements:
• Hardware: CPU of 2.0 GHz or above, memory of 8 GB or above
• Browser: Chrome (V 62.0 or higher) or Firefox (V 56.0 or higher) recommended
• Display resolution: 1280 x 800 or above

Supported servers and chassis models

LiCO can be installed on certain servers, as listed in the table below.

Table 2. Supported servers

• Product code sd530, machine type 7X21: Lenovo ThinkSystem SD530 (0.5U)
• Product code sr630, machine types 7X01, 7X02: Lenovo ThinkSystem SR630 (1U)
• Product code sr650, machine types 7X05, 7X06: Lenovo ThinkSystem SR650 (2U)
• Product code sd650, machine type 7X58: Lenovo ThinkSystem SD650 (2 nodes per 1U tray)
• Product code sr670, machine types 7Y36, 7Y37, 7Y38: Lenovo ThinkSystem SR670 (2U)

LiCO can be installed on certain chassis models, as listed in the table below.

Table 3. Supported chassis models

• Product code d2, machine type 7X20: D2 Enclosure (2U)
• Product code n1200, machine types 5456, 5468, 5469: NeXtScale n1200 (6U)

Prerequisites

• Refer to the LiCO best recipe to ensure that the cluster hardware uses proper firmware levels, drivers, and settings: https://support.lenovo.com/us/en/solutions/ht507011.
• Refer to the OS part of the LeSI 18C_SI best recipe to install the OS security patch: https://support.lenovo.com/us/en/solutions/ht507611.
• The installation described in this Guide is based on SLES 12 SP3.
• A SLES-12-SP3-Server or SLES-12-SP3-SDK local repository should be added on the management node.
• Unless otherwise stated in this Guide, all commands are executed on the management node.
• To enable the firewall, modify the firewall rules according to “Firewall settings” on page 58.
• It is important to regularly patch and update components and the OS to prevent security vulnerabilities. For details about how to update OS packages, see “Update OS packages” on page 63. Additionally, it is recommended that known updates available at the time of installation be applied during or immediately after the OS deployment to the managed nodes, and prior to the rest of the LiCO setup steps.

Chapter 2. Deploy the cluster environment

If the cluster environment already exists, you can skip this chapter.

Install an OS on the management node

Install an official version of SLES 12 SP3 on the management node. You can select the minimum installation.

Run the following commands to configure the memory and restart the OS:

echo '* soft memlock unlimited' >> /etc/security/limits.conf
echo '* hard memlock unlimited' >> /etc/security/limits.conf
reboot
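After the node comes back up, you can optionally confirm that the new limits took effect. This check is not part of the original procedure; ulimit is a standard shell builtin:

# Both commands should print "unlimited"
ulimit -S -l
ulimit -H -l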

Deploy the OS on other nodes in the cluster

Configure environment variables

Step 1. Log in to the management node.
Step 2. Run the following commands to configure environment variables for the entire installation process:

su root
cd ~
vi lico_env.local

Step 3. Edit the lico_env.local file as follows:

# Management node hostname
sms_name="head"

# Set the domain name
domain_name="hpc.com"

# Set OpenLDAP domain name
lico_ldap_domain_name="dc=hpc,dc=com"

# Set OpenLDAP domain component
lico_ldap_domain_component="hpc"

# IP address of management node in the cluster intranet
sms_ip="192.168.0.1"

# The network adapter name corresponding to the management node IP address
sms_eth_internal="eth0"

# Subnet mask in the cluster intranet. If all nodes in the cluster already have OS installed,
# retain the default configurations.
internal_netmask="255.255.0.0"

# BMC username and password
bmc_username=""
bmc_password=""

# OS mirror pathway for xCAT
iso_path="/isos"

# Local repository directory for OS
os_repo_dir="/install/custom/server"
sdk_repo_dir="/install/custom/sdk"

# Local repository directory for xCAT
xcat_repo_dir="/install/custom/xcat"

# Link name of repository directory for Lenovo OpenHPC
link_ohpc_repo_dir="/install/custom/ohpc"

# Link name of repository directory for LiCO
link_lico_repo_dir="/install/custom/lico"

# Link name of repository directory for LiCO-dep
link_lico_dep_repo_dir="/install/custom/lico-dep"

# Local repository directory for Lenovo OpenHPC; change it according to this version.
ohpc_repo_dir="/install/custom/ohpc-1.3.5"

# Local repository directory for LiCO; change it according to this version.
lico_repo_dir="/install/custom/lico-5.2.1"

# Local repository directory for LiCO-dep; change it according to this version.
lico_dep_repo_dir="/install/custom/lico-dep-5.2.1"

# Total compute nodes
num_computes="2"

# Prefix of compute node hostname. If OS has already been installed on all the nodes of the
# cluster, change the configuration according to actual conditions.
compute_prefix=""

# Compute node hostname list. If OS has already been installed on all the nodes of the
# cluster, change the configuration according to actual conditions.
c_name[0]=c1
c_name[1]=c2

# Compute node IP list. If OS has already been installed on all the nodes of the cluster,
# change the configuration according to actual conditions.
c_ip[0]=192.168.0.6
c_ip[1]=192.168.0.16

# Network interface card MAC address corresponding to the compute node IP. If OS has
# already been installed on all the nodes of the cluster, change the configuration
# according to actual conditions.
c_mac[0]=fa:16:3e:73:ec:50
c_mac[1]=fa:16:3e:27:32:c6

# Compute node BMC address list.
c_bmc[0]=192.168.1.6
c_bmc[1]=192.168.1.16

# Total login nodes. If there is no login node in the cluster, the number of logins must be "0",
# and the 'l_name', 'l_ip', 'l_mac', and 'l_bmc' lines need to be removed.
num_logins="1"

# Login node hostname list. If OS has already been installed on all nodes in the cluster,
# change the configuration according to actual conditions.
l_name[0]=l1

# Login node IP list. If OS has already been installed on all the nodes of the cluster,
# change the configuration according to actual conditions.
l_ip[0]=192.168.0.15

# Network interface card MAC address corresponding to the login node IP.
# If OS has already been installed on all nodes in the cluster, change the configuration
# according to actual conditions.
l_mac[0]=fa:16:3e:2c:7a:47

# Login node BMC address list.
l_bmc[0]=192.168.1.15

Step 4. Save the changes to lico_env.local. This Guide assumes that the nodes' BMC username and password are consistent. If they are inconsistent, they need to be modified during the installation.
Step 5. Run the following commands to make the configuration file take effect:

chmod 600 lico_env.local
source lico_env.local

After the cluster environment is set up, configure the IP address of the public network on the login or management node. In this way, you can log in to the LiCO Web portal from an external network.
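Optionally, you can sanity-check lico_env.local after sourcing it. The following sketch is not part of the original procedure; it only uses the variable names defined above and verifies that the node name lists match the declared counts:

# Optional sanity check of lico_env.local
echo "management node: ${sms_name} (${sms_ip})"
[ "${#c_name[@]}" -eq "${num_computes}" ] || echo "WARNING: c_name entries do not match num_computes"
[ "${#l_name[@]}" -eq "${num_logins}" ] || echo "WARNING: l_name entries do not match num_logins"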

Create a local repository

Step 1. Run the following command to create a directory for ISO storage:

mkdir -p ${iso_path}

Step 2. Download SLE-12-SP3-Server-DVD-x86_64-GM-DVD1.iso and SLE-12-SP3-SDK-DVD-x86_64-GM-DVD1.iso from the official Web site. Record the MD5SUM results from your download Web site.
Step 3. Copy the files to ${iso_path}.
Step 4. Run the following commands:

# Compare md5sum result with original to check if ISO file is valid.
cd ${iso_path}
md5sum SLE-12-SP3-Server-DVD-x86_64-GM-DVD1.iso
md5sum SLE-12-SP3-SDK-DVD-x86_64-GM-DVD1.iso
cd ~

# Mount images
mkdir -p ${os_repo_dir}
mkdir -p ${sdk_repo_dir}
mount -o loop ${iso_path}/SLE-12-SP3-Server-DVD-x86_64-GM-DVD1.iso ${os_repo_dir}
mount -o loop ${iso_path}/SLE-12-SP3-SDK-DVD-x86_64-GM-DVD1.iso ${sdk_repo_dir}

# Configure local repository
cat << eof > ${iso_path}/SLES12-SP3-12.3.repo
[SLES12-SP3-12.3-SERVER]
name=sle12-server
enabled=1
autorefresh=0
gpgcheck=0
baseurl=file://${os_repo_dir}
[SLES12-SP3-12.3-SDK]
name=sle12-sdk
enabled=1
autorefresh=0
gpgcheck=0
baseurl=file://${sdk_repo_dir}
eof
cp ${iso_path}/SLES12-SP3-12.3.repo /etc/zypp/repos.d
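As an optional check (not part of the original procedure), confirm that both repositories are now visible to Zypper:

# Both the SERVER and SDK repositories should be listed and enabled
zypper lr | grep SLES12-SP3-12.3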

Install Lenovo xCAT

Step 1. Download the package from https://hpc.lenovo.com/downloads/18c/xcat-2.14.3.lenovo5_confluent-2.1.0-sles12.tar.bz2.
Step 2. Upload the package to the /root directory on the management node.
Step 3. Run the following commands:

# Create xCAT local repository
mkdir -p $xcat_repo_dir
cd /root
tar -xvf xcat-2.14.3.lenovo5_confluent-2.1.0-sles12.tar.bz2 -C $xcat_repo_dir
cd $xcat_repo_dir/lenovo-hpc-sles12
./mklocalrepo.sh
cd ~

# Install xCAT
zypper --gpg-auto-import-keys install -y --force-resolution xCAT
source /etc/profile.d/xcat.sh

Prepare OS mirrors for other nodes

Step 1. Run the following command to prepare the OS image for the other nodes:

copycds ${iso_path}/SLE-12-SP3-Server-DVD-x86_64-GM-DVD1.iso -a x86_64

Step 2. Run the following command to confirm that the OS image has been copied:

lsdef -t osimage

# Output should be
sles12.3-x86_64-install-compute (osimage)
sles12.3-x86_64-install-service (osimage)
sles12.3-x86_64-netboot-compute (osimage)
sles12.3-x86_64-statelite-compute (osimage)

Step 3. Run the following command to disable the Nouveau module:

chdef -t osimage sles12.3-x86_64-install-compute addkcmdline=\
"rdblacklist=nouveau nouveau.modeset=0 R::modprobe.blacklist=nouveau"

Note: The Nouveau module is an open-source accelerated driver for NVIDIA cards. This module should be disabled before the CUDA driver is installed.

Set xCAT node information

Step 1. Run the following commands to import the compute node configuration in the lico_env.local file to xCAT:

for ((i=0; i<$num_computes; i++)); do
mkdef -t node ${c_name[$i]} groups=compute,all arch=x86_64 netboot=xnba mgt=ipmi \
bmcusername=${bmc_username} bmcpassword=${bmc_password} ip=${c_ip[$i]} \
mac=${c_mac[$i]} bmc=${c_bmc[$i]} serialport=0 serialspeed=115200;
done

Step 2. Run the following commands to import the login node configuration in the lico_env.local file to xCAT:

for ((i=0; i<$num_logins; i++)); do
mkdef -t node ${l_name[$i]} groups=login,all arch=x86_64 netboot=xnba mgt=ipmi \
bmcusername=${bmc_username} bmcpassword=${bmc_password} ip=${l_ip[$i]} \
mac=${l_mac[$i]} bmc=${l_bmc[$i]} serialport=0 serialspeed=115200;
done

Step 3. (Optional) If the BMC username and password of the nodes are inconsistent, run the following command to make them consistent:

tabedit ipmi

Step 4. Run the following command to configure the root account password for the nodes, replacing <ROOT_PASSWORD> with the actual password:

chtab key=system passwd.username=root passwd.password=<ROOT_PASSWORD>
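Optionally, you can verify that the node definitions were created as expected. This check is not part of the original procedure; lsdef is part of xCAT:

# List all defined nodes, then show the attributes of the first compute node
lsdef -t node
lsdef ${c_name[0]}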

Add host resolution

Note: If the cluster already has the OS installed and can resolve the IP address through the hostname, skip this section.

Run the following commands to add host resolution:

chdef -t site domain=${domain_name}
chdef -t site master=${sms_ip}
chdef -t site nameservers=${sms_ip}
sed -i "/^\s*${sms_ip}\s*.*$/d" /etc/hosts
sed -i "/\s*${sms_name}\s*/d" /etc/hosts
echo "${sms_ip} ${sms_name} ${sms_name}.${domain_name} " >> /etc/hosts
makehosts

Configure DHCP and DNS services

Note: If all nodes in the cluster have the OS installed, skip this step.

Run the following commands to configure DHCP and DNS services:

makenetworks
makedhcp -n
makedhcp -a
makedns -n
echo "search ${domain_name}" > /etc/resolv.conf
echo "nameserver ${sms_ip}" >> /etc/resolv.conf

Note: Refer to the following two links to make sure that the management node is pointing at the same DNS as the other nodes:
https://sourceforge.net/p/xcat/wiki/XCAT_iDataPlex_Cluster_Quick_Start/#install-xcat-on-the-management-node
https://sourceforge.net/p/xcat/wiki/Cluster_Name_Resolution/

Install a node OS through the network

Note: If all nodes in the cluster have the OS installed, skip this section.

Run the following commands to set and install the necessary OS mirror:

nodeset all osimage=sles12.3-x86_64-install-compute
rsetboot all net -u
rpower all reset

Note: It takes several minutes to complete the OS installation. You can use the following command to check the progress:

nodestat all

Create local repository for other nodes

Run the following commands:

cp /etc/zypp/repos.d/SLES12-SP3-12.3.repo /var/tmp
sed -i '/^baseurl=/d' /var/tmp/SLES12-SP3-12.3.repo
sed -i "/name=sle12-server/a\baseurl=http://${sms_name}/${os_repo_dir}" \
/var/tmp/SLES12-SP3-12.3.repo
sed -i "/name=sle12-sdk/a\baseurl=http://${sms_name}/${sdk_repo_dir}" \
/var/tmp/SLES12-SP3-12.3.repo
xdcp all /var/tmp/SLES12-SP3-12.3.repo /etc/zypp/repos.d

Configure the memory for other nodes

Run the following commands:

xdcp all /etc/security/limits.conf /etc/security/limits.conf
psh all reboot

Checkpoint A

Run the following command to check and ensure that the installation is complete:

psh all uptime

# Output should be
c1: 05:03am up 0:02, 0 users, load average: 0.20, 0.13, 0.05
c2: 05:03am up 0:02, 0 users, load average: 0.20, 0.14, 0.06
l1: 05:03am up 0:02, 0 users, load average: 0.17, 0.13, 0.05
......

Note: If you cannot pass this checkpoint, ensure that xCAT is successfully installed on the head node and that passwordless SSH is set up between the management node and the other nodes, for example by copying the id_rsa and id_rsa.pub files from the management node to the other nodes. Then try this checkpoint again.
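If passwordless SSH is not yet set up, one possible way to do it from the management node is sketched below. This is not part of the original procedure; it assumes that root password logins are still allowed on the other nodes and uses the node name arrays from lico_env.local:

# Generate a key pair for root if one does not exist yet
[ -f /root/.ssh/id_rsa ] || ssh-keygen -t rsa -N "" -f /root/.ssh/id_rsa

# Copy the public key to every compute and login node (you are prompted for each root password)
for node in "${c_name[@]}" "${l_name[@]}"; do
ssh-copy-id -i /root/.ssh/id_rsa.pub root@${node}
done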

Install infrastructure software for nodes

List of infrastructure software to be installed

Note: In the installation node information below, M stands for “Management node”, L stands for “Login node”, and C stands for “Compute node”.

Table 4. Infrastructure software to be installed

• nfs: component nfs-utils 1.3.0, service nfs-server, installed on M
• ntp: component ntp 4.2.6, service ntpd, installed on M
• slurm: component ohpc-slurm-server 1.3.5, services munge and slurmctld, installed on M
• slurm: component ohpc-slurm-client 1.3.5, services munge and slurmd, installed on C, L
• ganglia: component ganglia-gmond-ohpc 3.7.2, service gmond, installed on M, C, L
• singularity: component singularity-ohpc 2.5.2, installed on M
• cudnn: version 7, installed on C (required only on the GPU node)
• cuda: version 9.2, installed on C (required only on the GPU node)
• mpi: component openmpi3-gnu7-ohpc 3.1.0, installed on M (at least one MPI type is required)
• mpi: component mpich-gnu7-ohpc 3.2.1, installed on M
• mpi: component mvapich2-gnu7-ohpc 2.2, installed on M

Configure a local Zypper repository for the management node

Step 1. Download the package from https://hpc.lenovo.com/lico/downloads/5.2.1/Lenovo-OpenHPC-1.3.5.SLES.x86_64.tar.
Step 2. Upload the package to the /root directory on the management node.
Step 3. Run the following commands to configure the local Lenovo OpenHPC repository:

mkdir -p $ohpc_repo_dir
cd /root
tar xvf Lenovo-OpenHPC-1.3.5.SLES.x86_64.tar -C $ohpc_repo_dir
rm -rf $link_ohpc_repo_dir
ln -s $ohpc_repo_dir $link_ohpc_repo_dir
$link_ohpc_repo_dir/make_repo.sh
rpm --import $link_ohpc_repo_dir/SLE_12/repodata/repomd.xml.key

Configure a local Zypper repository for login and compute nodes

Run the following commands to configure a local repository for login and compute nodes:

cp /etc/zypp/repos.d/Lenovo.OpenHPC.local.repo /var/tmp
sed -i '/^baseurl=/d' /var/tmp/Lenovo.OpenHPC.local.repo
sed -i '/^gpgkey=/d' /var/tmp/Lenovo.OpenHPC.local.repo
echo "baseurl=http://${sms_name}/${link_ohpc_repo_dir}/SLE_12"\
>>/var/tmp/Lenovo.OpenHPC.local.repo
echo "gpgkey=http://${sms_name}/${link_ohpc_repo_dir}/SLE_12/repodata/repomd.xml.key"\
>>/var/tmp/Lenovo.OpenHPC.local.repo

# Distribute repo files
xdcp all /var/tmp/Lenovo.OpenHPC.local.repo /etc/zypp/repos.d/
psh all rpm --import http://${sms_name}/${link_ohpc_repo_dir}\
/SLE_12/repodata/repomd.xml.key
psh all echo -e %_excludedocs 1 \>\> ~/.rpmmacros

Configure LiCO dependencies repositories

Step 1. Download the package from https://hpc.lenovo.com/lico/downloads/5.2.1/lico-dep-5.2.1.bp123.x86_64.tgz.
Step 2. Upload the package to the /root directory on the management node.
Step 3. Run the following commands to configure the Zypper repository for the management node:

mkdir -p $lico_dep_repo_dir
cd /root
tar -xvf lico-dep-5.2.1.bp123.x86_64.tgz -C $lico_dep_repo_dir
rm -rf $link_lico_dep_repo_dir
ln -s $lico_dep_repo_dir $link_lico_dep_repo_dir
$link_lico_dep_repo_dir/mklocalrepo.sh
rpm --import $link_lico_dep_repo_dir/RPM-GPG-KEY-LICO-DEP-SLE12

Attention: Before running the commands, ensure that the management node has a local Zypper repository configured for the above and subsequent actions.

Step 4. (Optional) If the cluster already exists, check to ensure that your version is consistent with “List of LiCO dependencies to be installed” on page 25.
Step 5. Run the following commands to configure the Zypper repository for other nodes:

cp /etc/zypp/repos.d/lico-dep.repo /var/tmp
sed -i '/^baseurl=/d' /var/tmp/lico-dep.repo
sed -i '/^gpgkey=/d' /var/tmp/lico-dep.repo
echo "baseurl=http://${sms_name}/${link_lico_dep_repo_dir}"\
>>/var/tmp/lico-dep.repo
echo "gpgkey=http://${sms_name}/${link_lico_dep_repo_dir}/RPM-GPG-KEY\
-LICO-DEP-SLE12" >> /var/tmp/lico-dep.repo

# Distribute repo files
xdcp all /var/tmp/lico-dep.repo /etc/zypp/repos.d
psh all rpm --import http://${sms_name}/${link_lico_dep_repo_dir}/RPM-GPG\
-KEY-LICO-DEP-SLE12

Install Slurm

Step 1. Run the following command to install the base package:

zypper install lenovo-ohpc-base

Step 2. Run the following command to install Slurm:

zypper install ohpc-slurm-server

Step 3. Run the following command to install the Slurm client:

psh all zypper install -y --force-resolution ohpc-base-compute \
ohpc-slurm-client lmod-ohpc

Step 4. (Optional) Run the following command to prevent non-root logins to the compute nodes:

psh all echo "\""account required pam_slurm.so"\""\
\>\> /etc/pam.d/sshd

Note: Do not perform this step if you want to allow non-root logins to the compute nodes regardless of whether a Slurm job is running on those nodes. If this step is performed, non-root logins to the compute nodes will only be allowed if a Slurm job is already running on those nodes under a particular username. In this case, non-root SSH logins will work for that particular username for the duration that the Slurm job is running on those nodes.

Configure NFS

Step 1. Run the following commands to create the shared directory /opt/ohpc/pub:

# Management node sharing /opt/ohpc/pub for OpenHPC
zypper install nfs-kernel-server
echo "/opt/ohpc/pub *(ro,no_subtree_check,fsid=11)" >> /etc/exports
exportfs -a

# Configure shared directory for cluster nodes
psh all zypper install -y --force-resolution nfs-client
psh all mkdir -p /opt/ohpc/pub
psh all echo "\""${sms_ip}:/opt/ohpc/pub /opt/ohpc/pub nfs \
nfsvers=3,nodev,noatime 0 0"\"" \>\> /etc/fstab

# Mount shared directory
psh all mount /opt/ohpc/pub

Note: This directory is mandatory. If you have already shared this directory from the management node and mounted it on all other nodes, you can skip this step.

Step 2. Run the following commands to create the user shared directory (taking /home as an example):

# Management node sharing /home
echo "/home *(rw,no_subtree_check,fsid=10,no_root_squash)"\
>> /etc/exports
exportfs -a

# If /home is already mounted, unmount it first
psh all "sed -i '/ \/home /d' /etc/fstab"
psh all umount /home

# Configure shared directory for cluster nodes
psh all echo "\""${sms_ip}:/home /home nfs nfsvers=3,nodev,nosuid,noatime \
0 0"\"" \>\> /etc/fstab

# Mount shared directory
psh all mount /home
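As an optional check (not part of the original procedure), you can confirm from the other nodes that both exports are visible before relying on the mounts; showmount is provided by the NFS client packages installed above:

# Should list /opt/ohpc/pub and /home as exported by the management node
psh all showmount -e ${sms_ip}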

Configure NTP

Note: If the NTP service has already been configured for nodes in the cluster, skip this section.

Run the following commands to configure NTP:

echo "server 127.127.1.0" >> /etc/ntp.conf
echo "fudge 127.127.1.0 stratum 10" >> /etc/ntp.conf
systemctl enable ntpd
systemctl restart ntpd
psh all zypper install -y --force-resolution ntp
psh all echo "\""server ${sms_ip}"\"" \>\> /etc/ntp.conf
psh all systemctl enable ntpd
psh all systemctl restart ntpd

# Check the service
psh all "ntpq -p | tail -n 1"

GPU driver installation

You must install the GPU driver on all the GPU compute nodes. If only a subset of nodes have GPUs, replace the compute argument in psh commands with the node range corresponding to the GPU nodes.

Install the GPU driver

Step 1. Run the following commands to configure the operating system to start on the text console, and then restart the system:

psh compute systemctl set-default multi-user.target
psh compute reboot

Note: This step is required only when the operating system is configured to start on a graphical desktop.

Step 2. Download the NVIDIA driver from http://us.download.nvidia.com/tesla/396.26/nvidia-diag-driver-local-repo-sles123-396.26-1.0-1.x86_64.rpm and copy it to the shared directory /home.
Step 3. Run the following commands:

psh compute rpm -ivh /home/nvidia-diag-driver-local-repo-sles123-396.26-1.0-1.x86_64.rpm
psh compute zypper --gpg-auto-import-keys install -y --force-resolution cuda-drivers
psh compute perl -pi -e "s/NVreg_DeviceFileMode=0660/NVreg_DeviceFileMode=0666/" \
/etc/modprobe.d/50-nvidia-default.conf

Step 4. Run the following command to restart the compute nodes with GPUs:

psh compute reboot -h now

Step 5. Check to ensure that all nodes are restarted, referring to “Checkpoint A” on page 11.
Step 6. Run the following command on the GPU nodes to determine if the GPU can be identified:

psh compute nvidia-smi | xcoll

Configure automatic start for the GPU driver

Run the following commands to configure automatic start for the GPU driver:

psh compute zypper install -y pciutils

# Add configuration file
cat << eof > /usr/lib/systemd/system/nvidia-persistenced.service
[Unit]
Description=NVIDIA Persistence Daemon
Before=slurmd.service
Wants=syslog.target
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/sbin/modprobe nvidia-uvm
[Install]
WantedBy=multi-user.target
eof

# Distribute configuration file
xdcp compute /usr/lib/systemd/system/nvidia-persistenced.service /usr/lib/systemd/system/\
nvidia-persistenced.service

# Restart the service
psh compute systemctl daemon-reload
psh compute systemctl enable nvidia-persistenced
psh compute systemctl start nvidia-persistenced

CUDA installation

Actions in this section are optional. The system container images contain CUDA and cuDNN, and jobs run in this container environment.

You must install CUDA and cuDNN on all the GPU compute nodes. If only a subset of nodes have GPUs, replace the compute argument in psh commands with the node range corresponding to GPU nodes.

Install CUDA

If an updated kernel is installed on the system (for example, for the Spectre/Meltdown security issue, it can be obtained from https://download.suse.com/Download?buildid=zkkoONhUDd0~), and a kernel-devel package that matches the running kernel has been installed, skip the 'kernel-devel' package installation.

Step 1. Download the CUDA 9.2 installer cuda_9.2.148_396.37_linux.run and the CUDA 9.2 patch cuda_9.2.148.1_linux.run from https://developer.nvidia.com/cuda-downloads and copy them to the shared directory /home.

Note: If the file obtained does not contain the .run suffix, add the .run suffix manually.

Step 2. Run the following commands to install CUDA:

psh compute zypper install -y --force-resolution gcc gcc-c++

Note: If gcc and gcc-c++ have been installed, ignore the command above.

psh compute chmod +x /home/cuda_9.2.148_396.37_linux.run
psh compute /home/cuda_9.2.148_396.37_linux.run --silent --toolkit --samples \
--no-opengl-libs --verbose --override

Step 3. Run the following commands to install the CUDA patch:

psh compute chmod +x /home/cuda_9.2.148.1_linux.run
psh compute /home/cuda_9.2.148.1_linux.run --silent --accept-eula

Step 4. The following messages will be output when installing CUDA on a system without the X Window System components installed:

CUDA install missing libs:
• Missing recommended library: libGLU.so
• Missing recommended library: libX11.so
• Missing recommended library: libXi.so
• Missing recommended library: libXmu.so

Samples:
Installed in /root, but missing recommended libraries

These messages can be safely ignored.

Install cuDNN

Step 1. Download cuDNN 7.4.1 (cudnn-9.2-linux-x64-v7.4.1.5.tgz) from https://developer.nvidia.com/cudnn to the directory /root.
Step 2. Run the following commands to install cuDNN:

cd ~
tar -xvf cudnn-9.2-linux-x64-v7.4.1.5.tgz
xdcp compute cuda/include/cudnn.h /usr/local/cuda/include
xdcp compute cuda/lib64/libcudnn_static.a /usr/local/cuda/lib64
xdcp compute cuda/lib64/libcudnn.so.7.4.1 /usr/local/cuda/lib64
psh compute "ln -s /usr/local/cuda/lib64/libcudnn.so.7.4.1 /usr/local/\
cuda/lib64/libcudnn.so.7"
psh compute "ln -s /usr/local/cuda/lib64/libcudnn.so.7 /usr/local/cuda/lib64/libcudnn.so"
psh compute chmod a+r /usr/local/cuda/include/cudnn.h
psh compute "chmod a+r /usr/local/cuda/lib64/libcudnn*"

Configure CUDA environment variables

To ensure the proper operation of the CUDA package, set certain environment variables by modifying the configuration files described in the subsequent commands. To facilitate the deployment of these files and the CUDA environment variables across all compute nodes in the cluster that contain GPUs, execute these commands on the management node, even though CUDA is not installed on the management node.

Step 1. Run the following commands to set the CUDA environment variables:

echo "/usr/local/cuda/lib64" >> /etc/ld.so.conf.d/cuda.conf
echo "export CUDA_HOME=/usr/local/cuda" >> /etc/profile.d/cuda.sh
echo "export PATH=/usr/local/cuda/bin:\$PATH" >> /etc/profile.d/cuda.sh

Step 2. Run the following commands to distribute the configuration files:

xdcp compute /etc/ld.so.conf.d/cuda.conf /etc/ld.so.conf.d/cuda.conf
xdcp compute /etc/profile.d/cuda.sh /etc/profile.d/cuda.sh

Step 3. Run the following commands on the GPU nodes to determine if the GPU can be identified:

psh compute ldconfig
psh compute nvidia-smi | xcoll
psh compute "cd /root/NVIDIA_CUDA-9.2_Samples/1_Utilities/deviceQuery;\
make; ./deviceQuery" | xcoll

Configure Slurm

Step 1. Download slurm.conf from https://hpc.lenovo.com/lico/downloads/5.2.1/examples/conf/ to /etc/slurm on the management node, and modify this file according to the instructions in “slurm.conf” on page 18.
Step 2. Run the following commands to distribute the configuration:

xdcp all /etc/slurm/slurm.conf /etc/slurm/slurm.conf
xdcp all /etc/munge/munge.key /etc/munge/munge.key

Step 3. For GPU nodes only: Download gres.conf from https://hpc.lenovo.com/lico/downloads/5.2.1/examples/conf/ to /etc/slurm on the GPU nodes, and modify this file according to the instructions in “gres.conf” on page 18.
Step 4. Run the following commands to start the service:

# Start management node service
systemctl enable munge
systemctl enable slurmctld
systemctl restart munge
systemctl restart slurmctld

# Start other node service
psh all systemctl enable munge
psh all systemctl restart munge
psh all systemctl enable slurmd
psh all systemctl restart slurmd

slurm.conf

The following typical fields need to be configured:
• Cluster name: ClusterName=mycluster
• Management node name: ControlMachine=c031
• GPU scheduling: GresTypes=gpu

Note: This entry is used when the cluster includes a GPU node. If the cluster includes no GPU node, delete this entry.

• Cluster node definitions:
NodeName=c031 Gres=gpu:4 CPUs=28 RealMemory=200000 State=UNKNOWN
NodeName=c032 Gres=gpu:4 CPUs=28 RealMemory=200000 State=UNKNOWN

– Gres: Number of GPUs on a node
– CPUs: Number of CPUs on a node
– RealMemory: Memory size of a node (unit: MB)

• Partition definitions:
PartitionName=compute Nodes=c0[31-32] Default=YES MaxTime=INFINITE State=UP
PartitionName=compute1 Nodes=c0[31-32] Default=NO MaxTime=INFINITE State=UP

Notes:
– Default: identifies whether this partition is the default partition. When submitting a job, you can select a partition. If you do not select a partition, the default partition is used.
– Nodes: the NodeName list. If NodeName is irregular, Nodes=[nodename1,nodename2,...] is allowed.

• Enforced partition limit definitions: EnforcePartLimits=ALL

Attention: Use this configuration if you want an immediate error response when a job requests resources that exceed the cluster resource amount. Otherwise, the job remains in the queue. A consolidated example combining these fields is sketched after this list.
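Putting these fields together, an abbreviated sketch for the two-node example cluster used in this Guide is shown below. The host names, GPU and CPU counts, and memory sizes are illustrative only; merge the values into the slurm.conf template downloaded in Step 1 rather than using this fragment as a complete file.

# Illustrative fragment only; not a complete slurm.conf
ClusterName=mycluster
ControlMachine=head
GresTypes=gpu
EnforcePartLimits=ALL
NodeName=c1 Gres=gpu:4 CPUs=28 RealMemory=200000 State=UNKNOWN
NodeName=c2 Gres=gpu:4 CPUs=28 RealMemory=200000 State=UNKNOWN
PartitionName=compute Nodes=c[1-2] Default=YES MaxTime=INFINITE State=UP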

For more details about how to configure slurm.conf, refer to the official Slurm site: https://slurm.schedmd.com/slurm.conf.html.

gres.conf

This configuration file describes the GPUs installed on the GPU nodes and the GPU memory. The content of this file may vary based on the GPU node.

Modify the following content:

Name=gpu File=/dev/nvidia[0-3]

Note: In /dev/nvidia[0-3], [0-3] should be changed to your actual GPU configuration. For example, /dev/nvidia0 means one GPU card, whereas /dev/nvidia[0-1] means two GPU cards.

Install Ganglia

If the cluster monitoring is based on Ganglia, run the following commands on the management node to install Ganglia:

zypper install ganglia-gmond-ohpc

# Download gmond.conf from
# https://hpc.lenovo.com/lico/downloads/5.2.1/examples/conf/ganglia/management/
# and copy it to the /etc/ganglia/ directory on the management node, then modify
# the hostname in the /etc/ganglia/gmond.conf file to the management
# node's hostname for the udp_send_channel setting.
echo net.core.rmem_max=10485760 > /usr/lib/sysctl.d/gmond.conf
echo net.core.rmem_max=10485760 > /etc/sysctl.d/90-gmond.conf
/usr/lib/systemd/systemd-sysctl gmond.conf
sysctl -w net.core.rmem_max=10485760

# Install Ganglia on other nodes
psh all zypper install -y --force-resolution ganglia-gmond-ohpc

# Download gmond.conf from https://hpc.lenovo.com/lico/downloads/5.2.1/examples/conf/ganglia/,
# and copy it to the /var/tmp/ directory of the management node, then modify
# the hostname in the /var/tmp/gmond.conf file to the management
# node's hostname for the udp_send_channel setting.

# Distribute configuration
xdcp all /var/tmp/gmond.conf /etc/ganglia/gmond.conf

# Start management node service
systemctl enable gmond
systemctl start gmond

# Start other nodes service
psh all systemctl enable gmond
psh all systemctl start gmond

# Run the command to see if all the nodes are listed
gstat -a

Install MPI

Step 1. Run the following command to install three MPI modules (OpenMPI, MPICH, and MVAPICH) on the system:

zypper install openmpi3-gnu7-ohpc mpich-gnu7-ohpc mvapich2-gnu7-ohpc

Step 2. Set the default module by running one of the following commands.

Run the following command to set the OpenMPI module as the default:

zypper install lmod-defaults-gnu7-openmpi3-ohpc

Run the following command to set the MPICH module as the default:

zypper install lmod-defaults-gnu7-mpich-ohpc

Run the following command to set the MVAPICH module as the default:

zypper install lmod-defaults-gnu7-mvapich2-ohpc

Note: MVAPICH requires that InfiniBand or OPA is present and working correctly. The following packages should be installed to support InfiniBand or OPA:

zypper -n install libibmad5 librdmacm1 rdma infinipath-psm dapl-devel dapl-utils \
libibverbs-utils

Interconnect support between MPI types

Table 5. Interconnect support between MPI types

• MPICH: Ethernet (TCP)
• MVAPICH2: InfiniBand
• MVAPICH2 (psm2): Omni-Path
• OpenMPI: Ethernet (TCP), InfiniBand, Omni-Path
• OpenMPI (PMIx): Ethernet (TCP), InfiniBand, Omni-Path

Dependencies between MPI types

Observe the following dependencies for MPI installation:
• To use MVAPICH2 (psm2), install mvapich2-psm2-gnu7-ohpc.
• To use OpenMPI (PMIx), install openmpi3-pmix-slurm-gnu7-ohpc.
• openmpi3-gnu7-ohpc and openmpi3-pmix-slurm-gnu7-ohpc are incompatible.
• mvapich2-psm2-gnu7-ohpc and mvapich2-gnu7-ohpc are incompatible.

Install Singularity

Singularity is an HPC-facing lightweight container framework.

Step 1. Run the following command to install Singularity:

zypper install singularity-ohpc

Step 2. Edit the file /opt/ohpc/pub/modulefiles/ohpc by adding the following content to the end of the module try-add block:

module try-add singularity

Step 3. In the module del block, add the following content as the first line:

module del singularity

Step 4. Run the following command:

source /etc/profile.d/lmod.sh

Changes to /opt/ohpc/pub/modulefiles/ohpc may be lost when the default modules are changed with the installation of the lmod-defaults* package. In that case, either modify /opt/ohpc/pub/modulefiles/ohpc again, or add module try-add singularity to the end of /etc/profile.d/lmod.sh.
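An optional way to confirm that the module is picked up (not part of the original procedure):

# The singularity module should appear in the loaded module list
source /etc/profile.d/lmod.sh
module list
singularity --version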

Checkpoint B

Step 1. Run the following command to test if Slurm is properly installed:

sinfo

# Output should be
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
normal* up 1-00:00:00 2 idle c[1-2]
......

Note: The status of all nodes should be idle; idle* is unacceptable.

Step 2. Run the following command to add a test account:

psh all useradd test --uid 65530

Step 3. Log in with the test account and use the Slurm distributed test program, replacing <NODE_LIST> and <PARTITION> with actual values:

su - test
mpicc -O3 /opt/ohpc/pub/examples/mpi/hello.c
srun -n 8 -N 1 -w <NODE_LIST> -p <PARTITION> --pty /bin/bash
prun ./a.out

# Output should be
Master compute host = c1
Resource manager = slurm
Launch cmd = mpiexec.hydra -bootstrap slurm ./a.out
Hello, world (8 procs total)
--> Process # 0 of 8 is alive. -> c1
--> Process # 4 of 8 is alive. -> c2
--> Process # 1 of 8 is alive. -> c1
--> Process # 5 of 8 is alive. -> c2
--> Process # 2 of 8 is alive. -> c1
--> Process # 6 of 8 is alive. -> c2
--> Process # 3 of 8 is alive. -> c1
--> Process # 7 of 8 is alive. -> c2

Step 4. Run the following command to end the test:

exit

Note: If you need to leave the "test" user session, type "exit" again.

Step 5. Run the following command to remove the test user:

psh all userdel test -r

After the command is completed, you are switched to the root user of the management node.

Chapter 3. Install LiCO dependencies

Cluster check

If you skipped Chapter 2 “Deploy the cluster environment” on page 5, follow this section to make sure that the cluster environment is ready. Otherwise, proceed to “List of LiCO dependencies to be installed” on page 25.

Check environment variables

Run the following commands to check the environment variables ${sms_name}, ${lico_ldap_domain_name}, and ${lico_repo_dir}:

echo $sms_name; echo $lico_repo_dir; echo $lico_ldap_domain_name

# Output should be as below
head
/install/custom/lico
dc=hpc,dc=com

Note: If there is no output, see “Configure environment variables” on page 5.

Check the LiCO dependencies repository

Run the following command to check the LiCO dependencies repository:

zypper repos | grep lico-dep-local

# Output should be as below
5 | lico-dep-local | lico-dep-local(sle12) | Yes | (r ) Yes | No

Note: If there is no output, see “Configure LiCO dependencies repositories” on page 13.

Check the OS installation

Go to “Checkpoint A” on page 11 to check the OS installation for the cluster. If the OS installation check fails, reconfigure the cluster OS referring to “Deploy the OS on other nodes in the cluster” on page 5.

Check NFS

Note: If the cluster does not use NFS as the distributed file system, skip this section.

Run the following commands to check the NFS service on the management node:

systemctl status nfs | grep Active

# Output should be as below
Active: active (exited) since Tue 2018-07-24 19:02:52 CST; 1 months 20 days ago

exportfs -v | grep -E '/home|/opt/ohpc/pub'

# Output should be as below
/opt/ohpc/pub (ro,sync,wdelay,hide,no_subtree_check,fsid=11,sec=sys,secure,root_squash,no_all_squash)
/home (rw,sync,wdelay,hide,no_subtree_check,fsid=10,sec=sys,secure,no_root_squash,no_all_squash)

Note: If the status is not 'active (exited)' or there is no output for exportfs, go to “Configure NFS” on page 14.

Run the following command to check the mounting points on all other nodes:

psh all df | grep -E '/home|/opt/ohpc/pub'

# Output should be as below
c032: 10.1.1.31:/home 485642240 111060992 374581248 23% /home
c032: 10.1.1.31:/opt/ohpc/pub 485642240 111060992 374581248 23% /opt/ohpc/pub

Note: If there is no output, go to “Configure NFS” on page 14.

Check Slurm

Run the following command to check slurmctld on the management node:

systemctl status slurmctld | grep Active

# Output should be as below
Active: active (running) since Tue 2018-07-24 19:02:49 CST; 1 months 20 days ago

Note: If the status is not 'active (running)', go to “Install Slurm” on page 13 and “Configure Slurm” on page 17.

Run the following command to check slurmd on the compute nodes:

psh compute systemctl status slurmd | grep Active

# Output should be as below
c031: Active: active (running) since Tue 2018-07-24 19:02:49 CST; 1 months 20 days ago
c032: Active: active (running) since Sat 2018-07-21 17:16:59 CST; 1 months 23 days ago

Note: If the output does not contain all compute nodes, go to “Install Slurm” on page 13 and “Configure Slurm” on page 17.

Check Ganglia

Run the following commands to check gmond on all nodes:

systemctl status gmond | grep Active

# Output should be as below
Active: active (running) since Tue 2018-07-24 19:03:21 CST; 1 months 21 days ago

psh all systemctl status gmond | grep Active

# Output should be as below
c031: Active: active (running) since Tue 2018-07-24 19:03:21 CST; 1 months 21 days ago
c032: Active: active (running) since Sat 2018-07-21 17:17:31 CST; 1 months 24 days ago

Note: If gmond is not running on all nodes, go to “Install Ganglia” on page 19.

Check MPI

Run the following command to check MPI on the management node:

module list

# Output should be as below
Currently Loaded Modules:
1) prun/1.2 2) gnu7/7.2.0 3) mpich/3.2 4) singularity/2.5 5) ohpc

Note: If the output does not contain any of 'openmpi3', 'mpich', and 'mvapich2', go to “Install MPI” on page 19.

Check Singularity

Run the following command to check Singularity on the management node:

module list

# Output should be as below
Currently Loaded Modules:
1) prun/1.2 2) gnu7/7.2.0 3) mpich/3.2 4) singularity/2.5 5) ohpc

Note: If the output does not contain ‘singularity’, go to “Install Singularity” on page 20.

Check OpenHPC installation

Go to “Checkpoint B” on page 21 to check the OpenHPC installation for the cluster. If the OpenHPC installation check fails, reconfigure OpenHPC referring to “Install infrastructure software for nodes” on page 12.

List of LiCO dependencies to be installed

Note: In the installation node information below, M stands for “Management node”, L stands for “Login node”, and C stands for “Compute node”.

Table 6. LiCO dependencies to be installed

• rabbitmq: component rabbitmq-server 3.5.8, service rabbitmq-server, installed on M
• postgresql: component postgresql-server 9.6.3, service postgresql, installed on M
• influxdb: component influxdb 1.5.4, service influxdb, installed on M
• confluent: component confluent 2.0.2, service confluent, installed on M
• openldap: component slapd-ssl-config 1.0.0, service slapd, installed on M
• openldap: component nss-pam-ldapd 0.9.9, service nslcd, installed on M, C, L
• openldap: component libuser 0.62, installed on M
• openldap: component libuser-python 0.62, installed on M
• gmond gpu plugin: component gmond-ohpc-gpu-module 1.0.0, installed on C (required only on the GPU node)

Install RabbitMQ

LiCO uses RabbitMQ as a message broker.

Run the following commands to install RabbitMQ:

# Install RabbitMQ on the management node
zypper install -y rabbitmq-server

# Start RabbitMQ service
systemctl enable rabbitmq-server
systemctl start rabbitmq-server
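Optionally, confirm that the broker is running (not part of the original procedure):

# Should report the broker status without errors
rabbitmqctl status
systemctl status rabbitmq-server | grep Active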

Install PostgreSQL

LiCO uses PostgreSQL as an object-relational database for data storage.

Run the following commands to install PostgreSQL:

# Install PostgreSQL on the management node
zypper install -y postgresql-server

# Initialization; passwords can be changed as needed.
su - postgres
echo "<POSTGRES_PASSWORD>" > /var/tmp/pwfile
initdb -U postgres --pwfile /var/tmp/pwfile /var/lib/pgsql/data

# The value for the -U switch on the initdb command ("postgres") is the postgres username.
# Make sure to make note of this, as well as the other usernames and passwords used during this
# setup, as they will be needed during the lico-passwd-tool step later in the installation process.
rm /var/tmp/pwfile
exit

# Start PostgreSQL
systemctl enable postgresql
systemctl start postgresql

# Create LiCO database
export PGPASSWORD=<POSTGRES_PASSWORD>
psql -U postgres -c "CREATE DATABASE lico;"
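As an optional check, confirm that the lico database was created; the password is the one written to the pwfile above:

# Should list the newly created lico database
psql -U postgres -l | grep lico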

Install InfluxDB

LiCO uses InfluxDB as a time series database to store monitoring data.

Step 1. Run the following commands to install InfluxDB:

# Install InfluxDB
zypper install influxdb

# Start InfluxDB
systemctl enable influxdb
systemctl start influxdb

Step 2. Run the following commands to create InfluxDB users:

# Enter the InfluxDB shell

influx

# Create database

create database lico

# Use database

use lico

# Create an administrator user. Note that the password must be a string (quoted);
# otherwise, an error is reported.
create user <ADMIN_USERNAME> with password '<ADMIN_PASSWORD>' \
with all privileges

# Exit the influxDB shell

exit

# Configuration

sed -i '/auth-enabled = false/a\ auth-enabled = true' /etc/influxdb/config.toml

# Restart InfluxDB

systemctl restart influxdb
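As an optional check (not part of the original procedure), confirm that authentication works and the lico database exists; replace the placeholders with the administrator credentials created above:

influx -username <ADMIN_USERNAME> -password <ADMIN_PASSWORD> -execute 'SHOW DATABASES'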

Install Confluent

Run the following commands to install Confluent:

zypper install --force-resolution confluent_server
systemctl enable confluent
systemctl start confluent
source /etc/profile.d/confluent_env.sh
confetty create /users/<CONFLUENT_USERNAME> password=<CONFLUENT_PASSWORD>

Note: To use the Web console, see “Configure the Confluent Web console” on page 53.

Configure user authentication

Install OpenLDAP-server

Note: If you have already configured OpenLDAP for the cluster, or another authentication service is being used, skip this section.

OpenLDAP is an open-source implementation of the Lightweight Directory Access Protocol (LDAP). LiCO recommends using OpenLDAP to manage users; however, it also supports other authentication services compatible with Linux-PAM.

Run the following commands to install OpenLDAP:

# Install OpenLDAP
zypper install --force-resolution slapd-ssl-config

# Run the following commands to modify the configuration file
sed -i "s/dc=hpc,dc=com/${lico_ldap_domain_name}/" /usr/share/openldap-servers/lico.ldif
sed -i "/dc:/s/hpc/${lico_ldap_domain_component}/" /usr/share/openldap-servers/lico.ldif
sed -i "s/dc=hpc,dc=com/${lico_ldap_domain_name}/" /etc/openldap/slapd.conf.lico
install -g ldap -o ldap /etc/openldap/slapd.conf.lico /etc/openldap/slapd.conf
slapadd -v -l /usr/share/openldap-servers/lico.ldif -f /etc/openldap/slapd.conf \
-b ${lico_ldap_domain_name}

# Set password
# Get the key using the following command and enter the LDAP administrator password
# when prompted.
slappasswd

# Edit the file /etc/openldap/slapd.conf to replace the contents of the
# rootpw entry with the key obtained.
rootpw <KEY_OBTAINED_FROM_SLAPPASSWD>

chown -R ldap:ldap /var/lib/ldap

# Edit configuration files
vi /etc/sysconfig/openldap

# Modify to below
OPENLDAP_START_LDAPS="yes"
OPENLDAP_CONFIG_BACKEND="files"

# Start OpenLDAP service
systemctl enable slapd
systemctl start slapd

# Check service
systemctl status slapd
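Optionally, you can verify that the directory answers queries. The following sketch is not part of the original procedure; it assumes the OpenLDAP client utilities are installed on the management node and uses the administrator password set above:

# Should return the base entry for the configured domain
ldapsearch -x -H ldap://localhost -b ${lico_ldap_domain_name} \
-D uid=admin,${lico_ldap_domain_name} -W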

Install libuser

The libuser module is a recommended toolkit for OpenLDAP. The installation of this module is optional.

Step 1. Run the following command to install libuser:

zypper install libuser libuser-python

Step 2. Run the following commands to configure libuser:

vi /etc/libuser.conf

[import]
login_defs = /etc/login.defs
default_useradd = /etc/default/useradd
[defaults]
crypt_style = sha512
modules = ldap
create_modules = ldap
[userdefaults]
LU_USERNAME = %n
LU_GIDNUMBER = %u
LU_GECOS = %n
# Pay attention to modify option below
LU_HOMEDIRECTORY = /home/%n
LU_SHADOWNAME = %n
LU_SHADOWMIN = 0
LU_SHADOWMAX = 99999
[groupdefaults]
LU_GROUPNAME = %n
[files]
[shadow]
[ldap]
# Modify <MANAGEMENT_NODE_IP> to the management node IP
server = ldap://<MANAGEMENT_NODE_IP>
# Pay attention to modify option below
# Make sure <LDAP_DOMAIN_NAME> is the same as ${lico_ldap_domain_name} defined in lico_env.local
basedn = <LDAP_DOMAIN_NAME>
userBranch = ou=People
groupBranch = ou=Group
binddn = uid=admin,<LDAP_DOMAIN_NAME>
bindtype = simple
[sasl]

Install OpenLDAP-client

Run the following commands to install OpenLDAP-client:

echo "TLS_REQCERT never" >> /etc/openldap/ldap.conf
xdcp all /etc/openldap/ldap.conf /etc/openldap/ldap.conf

Install nss-pam-ldapd

nss-pam-ldapd is a name service switching module and pluggable authentication module. LiCO uses this module for user authentication.

Step 1. Run the following commands to install the YaST extension:

# Management node
zypper install yast2-auth-client

# Other nodes
psh all zypper install -y --force-resolution yast2-auth-client

Step 2. Change the IP address for the management node.
a. Run the following command to enter YaST2:

yast

b. Select Network Services ➙ LDAP and Kerberos on the management node. The LDAP and Kerberos page is displayed.
c. Highlight the Computer Name row, and click Change Settings on the lower right of the page. A page is displayed for you to change the IP address.
d. Change the IP address for the management node.
e. Click Test Connection and ensure that the IP address passes the test.
f. Click OK.

Step 3. Run the following commands to install nss-pam-ldapd:

# Management node
zypper install --force-resolution nss-pam-ldapd

# Other nodes
psh all zypper install -y --force-resolution nss-pam-ldapd

# Configure nslcd
# Modify <MANAGEMENT_NODE_IP> to the management node IP
vi /etc/nslcd.conf
uri ldap://<MANAGEMENT_NODE_IP>
# Make sure <LDAP_DOMAIN_NAME> is the same as ${lico_ldap_domain_name} defined in lico_env.local
base <LDAP_DOMAIN_NAME>
rootpwmoddn uid=admin,<LDAP_DOMAIN_NAME>

# Distribute OpenLDAP configuration
xdcp all /etc/nslcd.conf /etc/nslcd.conf
xdcp all /etc/nsswitch.conf /etc/nsswitch.conf
xdcp all /etc/krb5.conf /etc/krb5.conf
xdcp all /etc/pam.d/common-session-pc /etc/pam.d/common-session-pc
xdcp all /etc/pam.d/common-password-pc /etc/pam.d/common-password-pc
xdcp all /etc/pam.d/common-auth-pc /etc/pam.d/common-auth-pc
xdcp all /etc/pam.d/common-account-pc /etc/pam.d/common-account-pc

# Start management node service
systemctl start nslcd
systemctl enable nslcd

# Start other node services
psh all systemctl stop nscd
psh all systemctl disable nscd
psh all systemctl enable nslcd
psh all systemctl start nslcd
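After nslcd is running, an optional check (not part of the original procedure) is to confirm the service state on every node; once an LDAP user exists, it should also be resolvable through getent, where <LDAP_USERNAME> is a placeholder for any user in the directory:

psh all "systemctl status nslcd | grep Active"
psh all "getent passwd <LDAP_USERNAME>"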

Install Gmond GPU plug-in

On all GPU nodes, run the following commands to install the Gmond GPU plug-in:

psh compute zypper install -y --force-resolution gmond-ohpc-gpu-module
psh compute "ls /etc/ganglia/conf.d/*.pyconf|grep -v nvidia|xargs rm"

# Start gmond
psh compute systemctl restart gmond

# Run the command to see if all the nodes are listed
gstat -a

Chapter 4. Install LiCO

List of LiCO components to be installed

Note: In the installation node information below, M stands for “Management node”, L stands for “Login node”, and C stands for “Compute node”.

Table 7. List of LiCO components to be installed

• lico-core: component lico-core 5.2.1, service lico, installed on M
• lico-portal: component lico-portal 5.2.1, installed on M, L
• lico-core-extend: component lico-confluent-proxy 1.1.0, installed on M
• lico-core-extend: component lico-vnc-proxy 1.1.0, installed on M
• lico-core-extend: component lico-ai-image 1.1.1, installed on M
• lico-env: component lico-env 1.1.0, installed on M, C, L
• lico-env: component lico-ai-expert 1.2.0, installed on C (only for AI functions)
• lico monitor: component lico-ganglia-mond 1.1.0, service lico-ganglia-mond, installed on M (cannot be installed if you install lico-icinga-mond)
• lico monitor: component lico-icinga-mond 1.0.0, service lico-icinga-mond, installed on M (cannot be installed if you install lico-ganglia-mond)
• lico monitor: component lico-icinga-plugin 1.0.0, installed on M, C, L (required if you need to install lico-icinga-mond)
• lico monitor: component lico-confluent-mond 1.1.0, service lico-confluent-mond, installed on M
• lico monitor: component lico-vnc-mond 1.1.0, service lico-vnc-mond, installed on C (required if you need to run VNC)
• lico alarm notification: component lico-sms-agent 1.2.0, service lico-sms-agent, installed on L (required if you need to send alerts via SMS)
• lico alarm notification: component lico-wechat-agent 1.2.0, service lico-wechat-agent, installed on L (required if you need to send alerts via WeChat)
• lico alarm notification: component lico-mail-agent 1.3.0, service lico-mail-agent, installed on L (required if you need to send alerts via e-mail)
• lico tensorboard proxy: component lico-tensorboard-proxy 1.0.0, service lico-tensorboard-proxy, installed on M, L (required if you need to use TensorBoard on the job page)
• lico ai command: component lico-ai-cmd 1.0.0, installed on L (required if you need AI command line tools)

Obtain the LiCO installation package

Step 1. Obtain the LiCO release package from the Lenovo ESD Web site https://lenovoesd.flexnetoperations.com/control/lnvo/login. The LiCO 5.2.1 release package for SLES12 is lico-release-5.2.1.bp123.tar.gz.
Step 2. Contact your Lenovo salesperson for how to subscribe and get ESD authentication.
Step 3. Upload the release package to the management node.

Configure the local Zypper repository for LiCO Step 1. Run the following commands to configure a local Zypper repository for the management node: mkdir -p $lico_repo_dir tar zxvf lico-release-5.2.1.bp123.tar.gz -C $lico_repo_dir --strip-components 1 rm -rf $link_lico_repo_dir ln -s $lico_repo_dir $link_lico_repo_dir cd $link_lico_repo_dir ./Makerepo Step 2. Run the following commands to configure a local Zypper repository for other nodes: cp /etc/zypp/repos.d/lico-release.repo /var/tmp sed -i '/baseurl=/d' /var/tmp/lico-release.repo echo "baseurl=http://${sms_name}/${link_lico_repo_dir}"\ >> /var/tmp/lico-release.repo

# Distribute repo files

xdcp all /var/tmp/lico-release.repo /etc/zypp/repos.d
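Optionally, you can verify that the repository is registered on every node before installing LiCO packages. This is a hedged check, not part of the formal procedure:

# Confirm that the lico-release repository is visible on all nodes
psh all "zypper lr | grep -i lico" | xcoll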

Install LiCO on the management node
Step 1. Run the following command to install the LiCO module on the management node:
zypper install --force-resolution lico-core lico-confluent-mond lico-confluent-proxy \
lico-ai-expert lico-env lico-ai-image

Note: If there is an error regarding processes using files that have been deleted, restart those processes at this point.
Step 2. Perform the following optional steps as required:

• To use Ganglia for LiCO monitoring (provided that Ganglia is installed on the management node), run:
zypper install --force-resolution lico-ganglia-mond
Note: LiCO can be monitored by either Ganglia or Icinga2. Only one of Ganglia or Icinga2 can be installed.
• To use Icinga2 for LiCO monitoring (provided that Icinga2 is installed on the management node), run:
zypper install --force-resolution lico-icinga-mond lico-icinga-plugin
• To provide Web service on the management node, run:
zypper install --force-resolution lico-portal
• To use the VNC component, run:
zypper install --force-resolution lico-vnc-proxy
• To use the TensorBoard service on the management node, run:
zypper install --force-resolution lico-tensorboard-proxy

Step 3. Run the following command to restart services:
systemctl restart confluent
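Optionally, confirm that the mandatory management-node packages were installed. This is a hedged check using standard rpm queries; extend the list with any optional components you added in Step 2:

# Verify the management-node packages
rpm -q lico-core lico-confluent-mond lico-confluent-proxy lico-ai-expert lico-env lico-ai-image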

Install LiCO on the login node
Step 1. Run the following command to install the LiCO module on the login node:
psh login zypper install -y --force-resolution lico-env
Step 2. Perform the following steps as required:

• To provide Web service on the login node, run:
psh login zypper install -y --force-resolution lico-portal
• To use Icinga2 for LiCO monitoring (provided that Icinga2 is installed on the login node), run:
psh login zypper install -y --force-resolution lico-icinga-plugin
• To provide e-mail, SMS, and WeChat services on the login node, run:
# Install the e-mail module
psh login zypper install -y --force-resolution lico-mail-agent
# Install the SMS module
psh login zypper install -y --force-resolution lico-sms-agent
# Install the WeChat module
psh login zypper install -y --force-resolution lico-wechat-agent
• To provide AI command line tools on the login node, run:
psh login zypper install -y --force-resolution lico-ai-cmd
• To use the TensorBoard service on the login node, run:
psh login zypper install -y --force-resolution lico-tensorboard-proxy
• To provide a basic compiling environment on the login node, run:
psh login zypper install -y -t pattern SDK-C-C++
psh login zypper install -y glibc-devel
Note: This is an optional step. To install these packages successfully, you may need an internet-based repository. You can set up the repository according to your network condition.

Install LiCO on the compute nodes
Run the following command to install the LiCO module on the compute nodes:
psh compute zypper install -y --force-resolution lico-env lico-ai-expert

If Icinga2 is installed on the compute nodes, run the following command:
psh compute zypper install -y --force-resolution lico-icinga-plugin

Note: To use the VNC component, see “Configure VNC” on page 53.
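Before moving on to Chapter 5, you can optionally confirm that the compute-node packages installed cleanly. This is a hedged check using psh and xcoll as elsewhere in this Guide:

# Verify the LiCO packages on the compute nodes
psh compute "rpm -q lico-env lico-ai-expert" | xcoll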

Chapter 5. Configure LiCO

Configure the service account
Note: The username and password of PostgreSQL, InfluxDB, and Confluent are configured in this Guide.

On the management node, use the tool lico-passwd-tool.

If Icinga2 is not installed, follow the prompt below to enter the username and password for PostgreSQL, InfluxDB, and Confluent:
lico-passwd-tool

If Icinga2 is installed, follow the prompt below to enter the username and password for PostgreSQL, InfluxDB, Confluent, and the Icinga2 API:
lico-passwd-tool --icinga

Configure cluster nodes
Step 1. Run the following command to import the cluster information to the system:
cp /etc/lico/nodes.csv.example /etc/lico/nodes.csv
Step 2. Run the following command to edit the cluster information file:
vi /etc/lico/nodes.csv

Notes:
• It is recommended that you download this file to your local PC and edit it using Excel or other table editing software. After that, you can upload it to the management node and overwrite the original file.
• If you are facing a large cluster that has already been configured by xCAT, see “Edit nodes.csv from xCAT dumping data” on page 63.

Room information
Below is an example of the room information table.

Table 8. Room information table

name                    location_description
Shanghai Solution Room  Shanghai Zhangjiang

Enter one entry of information for the fields name and location_description.

Logic group information
Managers can use logic groups to divide nodes in the cluster into groups. The logic groups do not impact the use of computing resources or permission configurations.

Below is an example of the logic group information table.

Table 9. Logic group information table

name
login

You need to enter at least one logic group name in the name field.

Room row information
Room row refers to the rack order in the room. Enter the information about the rack row where the cluster node is located.

Below is an example of the room row information table.

Table 10. Room row information table

name  index  belonging_room
row1  1      Shanghai Solution Room

Enter at least one entry of row information in the fields below:
• name: row name (must be unique in the same room)
• index: row order (must be a positive integer and be unique in the same room)
• belonging_room: name of the room where the row belongs

Note: The belonging_room value must be a room name that is already configured in the room information table.

Rack information
Below is an example of the rack information table.

Table 11. Rack information table

name   column  belonging_row
rack1  1       row1

Enter at least one entry of rack information in the fields below:
• name: rack name (must be unique in the same room)
• column: rack location column (must be a positive integer and be unique in the same row)
• belonging_row: name of the row where the rack belongs

Note: The belonging_row value must be a row name that is already configured in the room row information table.

Chassis information
If there is a chassis in the cluster, enter the chassis information.

Below is an example of the chassis information table.

Table 12. Chassis information table

name      belonging_rack  location_u_in_rack  machine_type
chassis1  rack1           7                   7X20

The fields in this table are described as follows:
• name: chassis name (must be unique in the same room)
• belonging_rack: rack location name (must use the name configured in the rack information table)
• location_u_in_rack: location of the chassis base in the rack (Unit: U). In a standard cabinet, the value should be between 1 and 42. For example, a chassis base is located at 5U.
• machine_type: chassis type (see “Supported servers and chassis models” on page 3)

Node information
Enter the information about all nodes in the cluster into the node information table. Due to its width, the example node information table is displayed in two split parts.

Table 13. Node information table (Part 1)

name  nodetype  immip          hostip     machine_type  ipmi_user
head  head      10.240.212.13  127.0.0.1  7X58

Table 14. Node information table (Part 2)

ipmi_pwd  belonging_service_node  belonging_rack  belonging_chassis  location_u  width  height  groups

The fields are described as follows:
• name: node hostname (domain name not needed)
• nodetype: head means management node; login means login node; compute means compute node.
• immip: IP address of the node’s BMC system
• hostip: IP address of the node on the host network
• machine_type: product name for the node (see “Supported servers and chassis models” on page 3)
• ipmi_user: XCC (BMC) account for the node
• ipmi_pwd: XCC (BMC) password for the node
• belonging_service_node: large clusters require setting up a service node to which the node belongs. If there is no service node, leave the field blank.
• belonging_rack: name of the rack where the node is located (need to add the configured name to the rack information table). If the node belongs to a chassis, leave this field blank.
• belonging_chassis: name of the chassis where the node is located (need to add the configured name to the chassis information table). If the node belongs to a rack, leave this field blank.
• location_u: node location. If the node is located in a chassis, enter the slot in the chassis in which the node is located. If the node is located in a rack, enter the location of the node base in the rack (Unit: U).
• width: node width

• height: node height (Unit: U)
• groups: name of the logic group to which the node belongs. One node can belong to multiple logic groups. Group names should be separated by “;”. Configure the logic group names in the logic group information table. An illustrative node entry is sketched below.
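For illustration only, the following sketch shows how the example node from Tables 13 and 14 might look as a single CSV entry. The column order follows the field list above; the XCC credentials and dimensions are hypothetical placeholders, and the authoritative layout is always /etc/lico/nodes.csv.example:

# Hypothetical node row (columns: name, nodetype, immip, hostip, machine_type, ipmi_user,
# ipmi_pwd, belonging_service_node, belonging_rack, belonging_chassis, location_u, width, height, groups)
head,head,10.240.212.13,127.0.0.1,7X58,<XCC_USERNAME>,<XCC_PASSWORD>,,rack1,,2,1,1,login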

List of cluster services
Note: In the Installation node column, M stands for “Management node”, L stands for “Login node”, and C stands for “Compute node”.

Table 15. List of cluster services

Software name      Component name          Service name            Default port          Installation node
lico               lico-core               lico                    18080/tcp             M
                   lico-ganglia-mond       lico-ganglia-mond       8661/tcp, 8662/tcp    M
                   lico-confluent-proxy                            18081/tcp             M
                   lico-confluent-mond     lico-confluent-mond                           M
                   lico-vnc-proxy                                  18082/tcp, 18083/tcp  C
                   lico-vnc-mond           lico-vnc-mond                                 C
                   lico-sms-agent          lico-sms-agent          18092/tcp             L
                   lico-wechat-agent       lico-wechat-agent       18090/tcp             L
                   lico-mail-agent         lico-mail-agent         18091/tcp             L
                   lico-tensorboard-proxy  lico-tensorboard-proxy  18084/tcp             L, M
                   lico-ai-cmd             lico-ai-cmd                                   L
lico dependencies  nginx                   nginx                   80/tcp, 443/tcp       L, M
                   rabbitmq                rabbitmq-server         5672/tcp              M
                   postgresql              postgresql              5432/tcp              M
                   confluent               confluent               4005/tcp, 13001/tcp   M
                   influxdb                influxdb                8086/tcp, 8088/tcp    M
                   ldap                    slapd                   389/tcp, 636/tcp      M
                   ldap                    nslcd                                         M, C, L
cluster            nfs                     nfs                     2049/tcp              M
                   ntp                     ntpd                                          M
                   slurm                   munge                                         M, C
                   slurm                   slurmctld               6817/tcp              M
                   slurm                   slurmd                  6818/tcp              C
                   ganglia                 gmond                   8649/tcp, 8649/udp    M, C, L

Infrastructure configuration
The following part of the infrastructure configuration in the /etc/lico/lico.ini file should be modified:

# Cluster domain settings
domain = hpc.com

Database configuration
The following parts of the database configuration are modifiable:

# PostgreSQL address
db_host = 127.0.0.1

# PostgreSQL port
db_port = 5432

# PostgreSQL database name
db_name = lico

# InfluxDB address
influx_host = 127.0.0.1

# InfluxDB port
influx_port = 8086

# InfluxDB database name
influx_database = lico

Login configuration
The following part of the login configuration should be modified:

# Maximum number of login password error attempts
login_fail_max_chance = 3

Attention: If a user’s login failures exceed login_fail_max_chance, the system will suspend this user for 45 minutes. Suspended users cannot log in to the system even with valid authentication information. Administrators, however, can resume a suspended user from the command line or the Web portal. See “Resume a user” on page 54, or refer to the LiCO 5.2.1 Administrator Guide.

Storage configuration
The following part of the storage configuration should be modified:

# Shared storage directory
# If strictly adhering to the shared directory configurations in this document,
# change to: share_dir = /home
share_dir = /home

Scheduler configuration
The following part of the scheduler configuration should be modified:

# The scheduler configuration currently supports Slurm, LSF, and Torque. Slurm is the default.
scheduler_software = slurm

Alert configuration
Note: The configuration in this section is needed only when the WeChat, SMS, or e-mail proxy modules are installed for the cluster. You can obtain the WeChat template ID from https://mp.weixin.qq.com/wiki?t=resource/res_main&id=mp1445241432.
The following part of the alert configuration should be modified:

# WeChat proxy server address
wechat_agent_url = http://127.0.0.1:18090

# WeChat notification template ID
wechat_template_id =

# SMS proxy server address
sms_agent_url = http://127.0.0.1:18092

# Email proxy server address
mail_agent_url = http://127.0.0.1:18091

Cluster configuration
The following part of the cluster configuration should be modified:

# Confluent port
confluent_port = 4005

Functional configuration
The following part of the functional configuration should be modified:

[app:django]
# Modify the functional module based on the actual module purchased. If you only use the
# HPC module, change to “use = hpc”; if you only use the AI module, change to “use = ai”.
# After changing the configuration, “lico init” must be run to refresh the data table.
# This must be done before proceeding to the "User configuration" steps below.
use = hpc+ai

User configuration
Note: The configuration must be performed while LiCO is working.

The following part of the user configuration should be modified:

# user
use_libuser = false

The default configuration for use_libuser is “false”. If you change this value to “true”, run the following command to set the LDAP password:

# The command prompts you to enter the LDAP administrator password
# Use the LDAP_PASSWORD you configured in “Install OpenLDAP-server”.
lico setldappasswd
Please input your ldap password:
Please confirm the ldap password:

Configure LiCO components

lico-vnc-mond
Step 1. Create a file named /var/tmp/vnc-mond.ini with the following content:
[vnc]
url=http://127.0.0.1:18083/session
timeout=30

Note: Replace 127.0.0.1 with the actual IP address of the management node.
Step 2. Run the following command to distribute the configuration file:
xdcp compute /var/tmp/vnc-mond.ini /etc/lico/vnc-mond.ini

lico-portal
To prevent conflicts, you may need to modify some configuration files for nodes installed with the lico-portal module, which provides external Web services on different ports.

/etc/nginx/nginx.conf
You can edit /etc/nginx/nginx.conf by changing the port to 8080:
listen 8080 default_server;
listen [::]:8080 default_server;

If you want to hide the server version information, modify /etc/nginx/nginx.conf by turning off server_tokens:
http {
......
sendfile on;
server_tokens off;
......
}

/etc/nginx/conf.d/https.conf
You can edit /etc/nginx/conf.d/https.conf by changing the HTTPS default port 443 to another port:
listen <PORT> ssl http2;

Note: Ensure that the port is not used by other applications and is not blocked by the firewall.

/etc/nginx/conf.d/sites-available/antilles.conf
You can edit /etc/nginx/conf.d/sites-available/antilles.conf by replacing the first lines with the following content:
set $lico_host 127.0.0.1;
Note: If lico-portal does not run on the management node, change 127.0.0.1 to the IP address of the management node.
set $tensorboard_host 127.0.0.1;
Note: If lico-tensorboard-proxy does not run on the management node, change 127.0.0.1 to the IP address of the node where lico-tensorboard-proxy is installed.
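After changing any of the Nginx files above, it is worth validating the configuration before applying it. This is a generic Nginx step rather than a LiCO-specific one:

# Check the Nginx configuration syntax and apply the changes
nginx -t
systemctl reload nginx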

/etc/lico/portal.conf
You can edit /etc/lico/portal.conf by adding custom shortcut links. Refer to /etc/lico/portal.conf.example for the configuration format.

lico-ganglia-mond
Note: Skip this section if Ganglia is not deployed in the cluster.

On the management node, the default /etc/lico/ganglia-mond.conf file contains the following content:
influxdb {
  cfg_db_host 127.0.0.1
  cfg_db_port 5432
  cfg_db_name lico
}

Make the following changes to the file:
• Change cfg_db_host 127.0.0.1 and cfg_db_port 5432 to the actual IP address and port number of the PostgreSQL service.
• Change the host 127.0.0.1 and port 8086 to the actual IP address and port number of the InfluxDB service.

lico-icinga-mond
Note: Skip this section if Icinga2 is not deployed in the cluster.

Edit the file /etc/lico/icinga-mond.ini:
• Section [base]: change service = lico to the actual service name defined in Icinga2 for LiCO.
• Section [icinga]: change host = 127.0.0.1 and port = 5665 to the actual Icinga2 API service.
• Section [postgresql]: change host = 127.0.0.1 and port = 5432 to the actual PostgreSQL service.
• Section [influxdb]: change host = 127.0.0.1 and port = 8086 to the actual InfluxDB service.

See the example below:

[base]
service = lico
sample_interval = 15
domain_filter =
[icinga]
host = 127.0.0.1
port = 5665
timeout = 30
[postgresql]
host = 127.0.0.1
port = 5432
database = lico
[influxdb]
host = 127.0.0.1
port = 8086
database = lico
timeout = 30

lico-icinga-plugin
Note: Skip this section if Icinga2 is not installed in the cluster.

If lico-icinga-plugin is installed in the cluster, Icinga2 should be configured to enable the plugin. Below are examples for how to configure Icinga2:
• Define a new command in the command configuration file of Icinga2:
object CheckCommand "lico-monitor" {
  command = [ PluginDir + "/lico-icinga-plugin" ]
  arguments = {
    "-a" = ""
  }
}
• Define a new service in the service configuration file of Icinga2:
apply Service "lico" {
  display_name = "lico"
  check_command = "lico-monitor"
  assign where host.address
}

Notes:
• The installation of Icinga2 is not covered in this document.
• For details about how to define the Icinga2 command and service, refer to https://www.icinga.com/docs/icinga2/latest/.
• The “display_name” of the Icinga2 service must be the same as the “service” in the configuration file of lico-icinga-mond.

lico-confluent-proxy
On the management node, the /etc/lico/confluent-proxy.ini file contains the following content with the default port:

[DEFAULT]

# database
db_host = 127.0.0.1
db_port = 5432
db_name = lico

Make the following changes to the file:
• Change db_host 127.0.0.1 and db_port 5432 to the actual IP address and port number of the PostgreSQL service.
• Change the host 127.0.0.1 and port 8086 to the actual IP address and port number of the InfluxDB service.

If there are multiple Confluents in the cluster, you need to configure the [app:main] section as follows:

[app:main]
use = cluster-confluent-proxy

Note: If you need to change the information about the Confluent user, see “Install Confluent” on page 27. If you need to create or update the user information, see “Configure the service account” on page 37.

lico-confluent-mond
On the management node, the /etc/lico/confluent-mond.ini file contains the following content with the default port:

[database]
db_host = 127.0.0.1
db_port = 5432
db_name = lico
[influxdb]
host = 127.0.0.1
port = 8086
database = lico
timeout = 10

Make the following changes to the file:
• Change db_host 127.0.0.1 and db_port 5432 to the actual IP address and port number of the PostgreSQL service.
• Change the host 127.0.0.1 and port 8086 to the actual IP address and port number of the InfluxDB service.

lico-wechat-agent
Edit the file /etc/lico/wechat-agent as follows:

# The configurations below should be changed based on the specific environment
appid =
secret =

Note: For more information about appid and secret, refer to https://mp.weixin.qq.com/wiki?t=resource/res_main&id=mp1445241432.

lico-tensorboard-proxy
You can edit /etc/lico/tensorboard-proxy.ini as follows:
• Change the image to the actual TensorFlow CPU image path. tensorflow_cpu_image is one of the system container images. The default path is /home/lico/ai/container/tensorflow-1.9-cpu.image.
• To change the random port range of TensorBoard, uncomment the start_port and end_port lines and set the values. The default port range is from 25000 to 30000.
• To change the timeout setting for launching TensorBoard, uncomment the timeout line and set the value. The default timeout value is 300 seconds.

The following example shows the settings in /etc/lico/tensorboard-proxy.ini:

# tensorflow-cpu image path
image = /home/lico/ai/container/tensorflow-1.9-cpu.image

## port range

# start_port = 25000

# end_port = 30000

## time out, 300 sec

# timeout = 300

Initialize the system
Run the following command to initialize LiCO:
lico init

Initialize users
Step 1. (Optional) Run the following commands to add an LDAP user with a username and password:
luseradd <USERNAME> -P <PASSWORD>
psh all "su - <USERNAME> -c whoami" | xcoll

Use the LDAP_PASSWORD you configured in “Install OpenLDAP-server” on page 28.
Step 2. Run the following command to import the user to LiCO:

# Import the user into LiCO as an admin

lico user_import -u <USERNAME> -r admin

Chapter 6. Start and log in to LiCO

Start LiCO
Step 1. Run the corresponding commands to start LiCO.

If Ganglia is installed, run the following commands:

# If the management node has to provide web service, start Nginx.

systemctl enable nginx
systemctl start nginx

# If the login node has to provide web service, start Nginx.

psh login systemctl enable nginx
psh login systemctl start nginx

# Start LiCO-related services

systemctl start lico-ganglia-mond
systemctl start lico-confluent-mond

# Start LiCO

systemctl start lico

If Icinga2 is installed, run the following commands:

# If the management node has to provide web service, start Nginx.

systemctl enable nginx
systemctl start nginx

# If the login node has to provide web service, start Nginx.

psh login systemctl enable nginx
psh login systemctl start nginx

# Start LiCO-related services

systemctl start lico-icinga-mond
systemctl start lico-confluent-mond

# Start LiCO

systemctl start lico
Step 2. Run the following command to delete the file lico_env.local:
rm -rf /root/lico_env.local
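Optionally, before logging in, you can confirm that the core services are active. This is a hedged check; drop or add units according to the monitor and optional components you installed:

# Check the LiCO-related services (example for a Ganglia-based setup)
systemctl is-active lico lico-confluent-mond lico-ganglia-mond nginx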

Log in to LiCO
After the LiCO service is started, you can access LiCO by opening https://<IP_ADDRESS>:<PORT>/ in a Web browser.

Note: Replace <PORT> with the port number you set in /etc/nginx/conf.d/https.conf in “lico-portal” on page 45.

If the installation is correct, the LiCO login page opens. You can log in using the LDAP account set in “Initialize users” on page 49.

Configure LiCO services
The LiCO service configuration files are /etc/lico/lico.ini and /etc/lico/nodes.csv. These files control the operating parameters for the various LiCO background service components. You can modify them as needed.

If you have changed the configuration while LiCO is running, the following command must be executed to restart LiCO and make the configuration changes take effect:
systemctl restart lico

Note: Configurations not mentioned in the instructions in this section can be modified after consulting with service staff. Modifications made without a service consultation could result in a system failure.

Chapter 7. Appendix: Important information

Configure VNC
Install the VNC module only on compute nodes that require the VNC functionality and GUI.

Standalone VNC installation
To install VNC on a compute node, follow the steps in this section.

Step 1. Run the following commands on a compute node where you want to install the VNC function:
zypper install gdm tigervnc xorg-x11-Xvnc
zypper install lico-vnc-mond
Step 2. Edit /etc/gdm/custom.conf on the compute node, and make the following changes:
[xdmcp]
Enable=true
Step 3. Edit /etc/sysconfig/displaymanager on the compute node by making the following changes:
DISPLAYMANAGER_REMOTE_ACCESS="yes"
Step 4. Run the following commands on the compute node to start VNC:
systemctl start lico-vnc-mond
vncserver -query localhost -securitytypes=none

VNC batch installation
To install VNC on all compute nodes, run the following commands on the management node:

# Install
psh compute zypper install -y --force-resolution lico-vnc-mond
psh compute zypper install -y --force-resolution gdm tigervnc \
xorg-x11-Xvnc

# Distribution profile

# Edit /etc/gdm/custom.conf and make the following changes:

[xdmcp]
Enable=true

# Edit /etc/sysconfig/displaymanager and make the following changes:

DISPLAYMANAGER_REMOTE_ACCESS="yes"

xdcp compute /etc/gdm/custom.conf /etc/gdm/custom.conf
xdcp compute /etc/sysconfig/displaymanager /etc/sysconfig/displaymanager

# Start
psh compute systemctl start lico-vnc-mond
psh compute vncserver -query localhost -securitytypes=none

Configure the Confluent Web console
To open a node’s console from the LiCO Web portal, configure that node first. After the configurations are completed, restart the node for the configurations to take effect.

Step 1. Edit the file /etc/default/grub by appending the following content to the end of the GRUB_CMDLINE_LINUX line:
console=ttyS0,115200
Step 2. Start the UEFI mode or legacy mode.

To start the UEFI mode, run the following command:
grub2-mkconfig -o /boot/efi/EFI/sles/grub.cfg

To start the legacy mode, run the following command:
grub2-mkconfig -o /boot/grub2/grub.cfg
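After the node has been restarted, you can optionally confirm that the serial console parameter is active. This check simply inspects the kernel command line:

# The output should include console=ttyS0,115200
grep -o 'console=ttyS0[^ ]*' /proc/cmdline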

LiCO commands

Set the LDAP administrator password
Run the following command to set the LDAP administrator password:
lico setldappasswd
Please input your ldap password:
Please confirm the ldap password:

Note: This command works only when use_libuser equals “true” in the file lico.ini.

Change a user’s role
Run the following command to change a user’s role:
lico user_changerole -u <USERNAME> -r admin
Parameter interpretation:
-u: specify the username to modify
-r: specify the role to be set (admin/operator/user)

Resume a user
Run the following command to resume a user:
lico user_resume <USERNAME>
Parameter interpretation:
<USERNAME>: directly specify the user who needs to be resumed

Import a user
See “Initialize users” on page 49.

Import AI images
See “Import system images” on page 59.

Security improvement

Binding settings
If you have installed LiCO following this document, the listening ports of some components are bound on all addresses by default. To improve the system security level, we recommend that you retain or change the default settings as described below.

RabbitMQ
It is recommended to bind this component to the loopback address 127.0.0.1.

Edit /etc/rabbitmq/rabbitmq.config according to the following content:

[
  {rabbit, [
    {tcp_listeners, [{"127.0.0.1", 5672}]}
  ]}
].

PostgreSQL
It is bound to the loopback address 127.0.0.1 by default. It is recommended to retain the default setting.

Confluent
It is bound to the loopback address 127.0.0.1 by default. It is recommended to retain the default setting.

InfluxDB
It is recommended to bind this component to the loopback address 127.0.0.1.

Edit /etc/influxdb/config.toml. For example, uncomment the line #bind-address = ":8086" in the [http] section and change it to bind-address = "127.0.0.1:8086":

[http]

# Determines whether HTTP endpoint is enabled.

# enabled = true

# The bind address used by the HTTP service.
bind-address = "127.0.0.1:8086"

lico-core
It is recommended to bind this component to the internal address. If there is no login node in the cluster, and lico-portal is installed on the same node as lico-core, it is recommended to bind this component to the loopback address.

Edit /etc/lico/supervisor.d/antilles.ini by changing the parameter of command in the program Antilles. For example, change “--bind :18080” to “--bind <INTERNAL_IP>:18080”:

[program:confluent_proxy]
command=/usr/bin/gunicorn --paste /etc/lico/confluent-proxy.ini --bind 172.20.0.14:18080 --log-config /etc/lico/confluent-proxy.ini --workers 1 --threads 50 --timeout 3600 --worker-class gevent --keep-alive 65 --log-level info --access-logfile - --error-logfile - --capture-output

lico-confluent-proxy
It is recommended to bind this component to the internal address. If there is no login node in the cluster, and lico-portal is installed on the same node as lico-confluent-proxy, it is recommended to bind this component to the loopback address.

Edit /etc/lico/supervisor.d/confluent-proxy.ini by changing the parameter of command in the confluent-proxy program. For example, change “--bind :18081” to “--bind <INTERNAL_IP>:18081”:

[program:confluent_proxy]
command=/usr/bin/gunicorn --paste /etc/lico/confluent-proxy.ini --bind 172.20.0.14:18081 --log-config /etc/lico/confluent-proxy.ini --workers 1 --threads 50 --timeout 3600 --worker-class gevent --keep-alive 65 --log-level info --access-logfile - --error-logfile - --capture-output

Edit /etc/lico/supervisor.d/vnc-proxy.ini by changing the parameter of command in the vncproxy program. For example, change “--bind :18083” to “--bind <INTERNAL_IP>:18083”, and change the IP address in the websockify parameter “--token-source” to the internal IP address:

[program:vncproxy]
command=/usr/bin/gunicorn --paste /etc/lico/vnc-proxy.ini --bind 172.20.0.14:18083 --log-config /etc/lico/vnc-proxy.ini --workers 1 --timeout 3600 --worker-class gevent --keep-alive 65 --log-level info --access-logfile - --error-logfile - --capture-output
......
[program:websockify]
command=/usr/bin/websockify 18082 --token-plugin=JSONTokenApi --token-source='http://172.20.0.14:18083/lookup?token=%s'

Edit /etc/sysconfig/lico-wechat-agent by changing the GUNICORN_CMD_ARGS. For example, change “--bind :18090” to “--bind <INTERNAL_IP>:18090”:

# lico-wechat-agent environment file

GUNICORN_CMD_ARGS= \
--bind 172.20.0.14:18090 \
--log-config /etc/lico/wechat-agent.ini \
--workers 1 \
--threads 4 \
--worker-class gevent \
--timeout 3600 \
--keep-alive 65 \
--log-level info \
--access-logfile - \
--error-logfile - \
--capture-output True

lico-mail-agent
It is recommended to bind this component to the internal address. If there is no login node in the cluster, and lico-portal is installed on the same node as lico-mail-agent, it is recommended to bind this component to the loopback address.

Edit /etc/sysconfig/lico-mail-agent by changing the GUNICORN_CMD_ARGS. For example, change “--bind :18091” to “--bind <INTERNAL_IP>:18091”:

# lico-mail-agent environment file

GUNICORN_CMD_ARGS= \
--bind 172.20.0.14:18091 \
--log-config /etc/lico/mail-agent.ini \
--workers 1 \
--threads 4 \
--worker-class gevent \
--timeout 3600 \
--keep-alive 65 \
--log-level info \
--access-logfile - \
--error-logfile - \
--capture-output True

lico-sms-agent
It is recommended to bind this component to the internal address. If there is no login node in the cluster, and lico-portal is installed on the same node as lico-sms-agent, it is recommended to bind this component to the loopback address.

Edit /etc/sysconfig/lico-sms-agent by changing the GUNICORN_CMD_ARGS. For example, change “--bind :18092” to “--bind <INTERNAL_IP>:18092”:

# lico-sms-agent environment file

GUNICORN_CMD_ARGS= \
--bind 172.20.0.14:18092 \
--log-config /etc/lico/sms-agent.ini \
--workers 1 \
--timeout 3600 \
--keep-alive 65 \
--log-level info \
--access-logfile - \
--error-logfile - \
--capture-output True

lico-tensorboard-proxy
It is recommended to bind this component to the internal address. If there is no login node in the cluster, and lico-portal is installed on the same node as lico-tensorboard-proxy, it is recommended to bind this component to the loopback address.

Edit /etc/sysconfig/lico-tensorboard-proxy by changing the GUNICORN_CMD_ARGS. Change “--bind :18084” to “--bind <INTERNAL_IP>:18084”. See the following example:

# lico-tensorboard-proxy environment file

GUNICORN_CMD_ARGS= \
--bind 172.20.0.14:18084 \
--log-config /etc/lico/tensorboard-proxy.ini \
--workers 1 \
--threads 4 \
--worker-class gevent \
--timeout 3600 \
--keep-alive 65 \
--log-level info \
--access-logfile - \
--error-logfile - \
--capture-output True

Firewall settings
If you have set up the cluster and installed LiCO following this document, you can set up your firewall by referring to the official firewall setup document: https://www.suse.com/documentation/sles-12/book_security/data/sec_security_firewall_suse.html.

Considering the security of the system, we recommend that you enable the firewall on the management node and the login node.

Step 1. Run the following command to run YaST2:
yast
Step 2. Select Security and Users ➙ Firewall, and then follow the different steps for the management node and the login node.

Set firewall on the management node
Step 1. Click Allowed Services, and then set Allowed Services for Selected Zone to External Zone.
Step 2. Click Advanced. A dialog is displayed.
Step 3. In the displayed dialog, do the following settings:
• Set TCP Ports to 25000:30000 2049 22 80 389 636 6817 18080 18081 18084.
• Set UDP Ports to 8649.

Notes:
• TCP/22: SSH service port
• TCP/80: httpd service port
• TCP/2049: NFS service port
• TCP/6817: Slurm slurmctld port
• TCP/636, 389: OpenLDAP slapd ports
• TCP/18081: lico-confluent-proxy port
• TCP/18080: lico-core port
• TCP/18084: lico-tensorboard-proxy port
• TCP/25000:30000: TensorBoard random binding port range
• UDP/8649: Ganglia gmond port
Step 4. Select Interfaces, and set eth0 and eth1 to use External Zone.

Note: eth0 and eth1 should be your internal and external network interfaces.
Step 5. Select Start-Up, set Service Start to Enable Firewall Automatic Starting, and click Save Settings and Restart Firewall Now.
Step 6. Exit YaST.
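If you prefer a non-interactive setup, the same ports can usually be opened by editing /etc/sysconfig/SuSEfirewall2 directly. This is a hedged sketch for SLES 12; the variable names are standard SuSEfirewall2 settings, but verify them against your installed version, and remember to assign the interfaces to zones (FW_DEV_EXT/FW_DEV_INT) as in Step 4:

# /etc/sysconfig/SuSEfirewall2 (management node, external zone)
FW_SERVICES_EXT_TCP="22 80 389 636 2049 6817 18080 18081 18084 25000:30000"
FW_SERVICES_EXT_UDP="8649"

# Enable and apply the firewall settings
systemctl enable SuSEfirewall2
systemctl restart SuSEfirewall2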

Set firewall on the login node
Step 1. Click Allowed Services, and then set Allowed Services for Selected Zone to External Zone.
Step 2. Click Advanced. A dialog is displayed.
Step 3. In the displayed dialog, set TCP Ports to 25000:30000 8443 22 18084.

Notes:
• TCP/22: SSH service port
• TCP/8443: Nginx service port
• TCP/18084: lico-tensorboard-proxy port
• TCP/25000:30000: TensorBoard random binding port range
Step 4. Select Interfaces, and set eth0 and eth1 to use External Zone.

Note: eth0 and eth1 should be your internal and external network interfaces.
Step 5. Select Start-Up, set Service Start to Enable Firewall Automatic Starting, and click Save Settings and Restart Firewall Now.
Step 6. Exit YaST.

SSHD settings
To reduce the risk of SSHD security vulnerabilities, we recommend that you change the default settings.

Improve SSHD security
Step 1. Open /etc/ssh/sshd_config, and input the following:
Ciphers aes128-ctr,aes192-ctr,aes256-ctr
Step 2. Open /etc/ssh/sshd_config, and input the following:
MACs [email protected],[email protected],[email protected],hmac-sha2-512,hmac-sha2-256,hmac-ripemd160,hmac-sha1
Step 3. Run the following command to restart SSHD:
systemctl restart sshd
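Optionally, verify that the restricted cipher and MAC lists took effect. sshd -T prints the effective server configuration (run it as root):

# Inspect the effective SSHD cipher and MAC settings
sshd -T | grep -E '^(ciphers|macs)'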

Import system images
System-level container images are used by all users in the cluster. You can obtain images from your salesperson or import images into LiCO as system-level container images.

This section describes how to create and import system-level container images.

Image bootstrap files
LiCO is released with image bootstrap files for commonly-used AI frameworks. These files are located in /opt/lico/examples/image/ on the management node. Users can use these files to create images.

The table below lists the image bootstrap files.

Table 16. List of image bootstrap files

File name                  Framework    Framework version  CPU/GPU   Comments
caffe-1.0-cpu              Caffe        1.0                CPU
caffe-1.0-gpu-cuda91       Caffe        1.0                CUDA 9.1  Supports P100 and V100. Caffe does not support CUDA 9.0 officially, so we changed the make file of Caffe.
tensorflow-1.9-cpu         TensorFlow   1.9                CPU       TensorFlow official image.
tensorflow-1.9-gpu-cuda90  TensorFlow   1.9                CUDA 9.0  Supports P100 and V100. TensorFlow official image.
neon-2.6-cpu               Neon         2.6                CPU
intel-caffe-1.0.4-cpu      Intel-Caffe  1.0.4              CPU
mxnet-1.2.1-cpu            MXNet        1.2.1              CPU
mxnet-1.2.1-gpu-cuda90     MXNet        1.2.1              CUDA 9.0  Supports P100 and V100. MXNet does not support CUDA 9.1 officially, so we use CUDA 9.0.
letrain-1.2-gpu-cuda90     LeTrain      1.2                CUDA 9.0  Supports P100 and V100. Image for LeTrain training.

Create images
Step 1. Check to ensure that the image bootstrap files are under /opt/lico/examples/image/ on the management node.
Step 2. Prepare a build node with a minimum storage of 100 GB. This node should have the same OS version and Singularity version as the nodes in the cluster. If you want to create GPU images, this node should have the same GPU and GPU driver as the nodes in the cluster.
Step 3. Ensure that squashfs and libarchive are installed.
Step 4. Copy the image bootstrap files from the management node to the build node. For example, the image bootstrap files are copied to the new directory /opt/images. If this folder cannot be found, create it manually. Note that neither this new directory nor /var/tmp can be an NFS mount.
Step 5. Run the following commands to create images:
cd /opt/images/
singularity build caffe-1.0-cpu.image caffe/caffe-1.0-cpu
singularity build caffe-1.0-gpu-cuda91.image caffe/caffe-1.0-gpu-cuda91
singularity build tensorflow-1.9-cpu.image tensorflow/tensorflow-1.9-cpu
singularity build tensorflow-1.9-gpu-cuda90.image tensorflow/tensorflow-1.9-gpu-cuda90
singularity build mxnet-1.2.1-cpu.image mxnet/mxnet-1.2.1-cpu

singularity build mxnet-1.2.1-gpu-cuda90.image mxnet/mxnet-1.2.1-gpu-cuda90
singularity build intel-caffe-1.0.4-cpu.image intel-caffe/intel-caffe-1.0.4-cpu
singularity build neon-2.6-cpu.image neon/neon-2.6-cpu

# Building CUDA applications while creating an image is not supported by Singularity 2.5.2.
# Run the following command on CentOS/RHEL 7.4 (with the CUDA driver). This limitation
# will be fixed in the next release.
singularity build letrain-1.2-gpu-cuda90.image letrain/letrain-1.2-gpu-cuda90

Import images into LiCO as system-level images
Step 1. Copy the created images to the management node. For example, the images are copied to the directory /opt/images. Note that neither this directory nor /var/tmp can be an NFS mount.
Step 2. Run the following commands as the root user to import images into LiCO:
cd /opt/images
lico import_system_image caffe-cpu $PWD/caffe-1.0-cpu.image singularity caffe
lico import_system_image caffe-gpu $PWD/caffe-1.0-gpu-cuda91.image singularity caffe
lico import_system_image tensorflow-cpu $PWD/tensorflow-1.9-cpu.image \
singularity tensorflow
lico import_system_image tensorflow-gpu $PWD/tensorflow-1.9-gpu-cuda90.image \
singularity tensorflow
lico import_system_image mxnet-cpu $PWD/mxnet-1.2.1-cpu.image singularity mxnet
lico import_system_image mxnet-gpu $PWD/mxnet-1.2.1-gpu-cuda90.image \
singularity mxnet
lico import_system_image intel-caffe $PWD/intel-caffe-1.0.4-cpu.image \
singularity intel-caffe
lico import_system_image neon $PWD/neon-2.6-cpu.image singularity neon
lico import_system_image letrain $PWD/letrain-1.2-gpu-cuda90.image singularity letrain

Slurm issues troubleshooting
This section lists the solutions to some Slurm issues.

Node status check
You can use the Slurm command sinfo to check the node status.
• If the status is drain, you can use the following command to change the node status to normal:
scontrol update NodeName=host1 State=RESUME
• If the node status is down, we recommend that you do the following:
– Use the Slurm command scontrol show nodes to view the detailed node information, and check the reason in the output of this command.
– Check to ensure that all nodes have the same slurm.conf file under /etc/slurm.
– Check to ensure that the slurmd and munge services are active on all the nodes, and that the slurmctld service is active on the management node.
– Check to ensure that all nodes have the same date and that the ntpd service is active on all nodes.
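A quick hedged way to run the checks above across the whole cluster, using psh and xcoll as elsewhere in this Guide:

# Check munge and slurmd on all nodes, and slurmctld on the management node
psh all "systemctl is-active munge slurmd" | xcoll
systemctl is-active slurmctld

# Compare the node clocks
psh all date | xcoll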

Memory allocation error
When you use srun or prun to run an MPI program, you may encounter the following error message:

Failed to create a completion queue (CQ): ......
Error: Cannot allocate memory

In this case, check whether soft memlock and hard memlock are unlimited in the file /etc/security/limits.conf on the management node and the compute nodes.

If the memlocks are limited, run the following commands to set them as unlimited and then restart the nodes before the changes take effect:

* soft memlock unlimited * hard memlock unlimited
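A hedged way to check the current limit on every node and, after editing /etc/security/limits.conf on the management node, distribute the file (this assumes the file is kept identical across nodes, as elsewhere in this Guide):

# Should print "unlimited" on every node
psh all "ulimit -l" | xcoll

# Distribute the updated limits file
xdcp all /etc/security/limits.conf /etc/security/limits.conf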

Status setting error
If you set the Slurm queue node status to “DOWN”, but the status is automatically changed to “IDLE”, you can edit the slurm.conf file in /etc/slurm and set ReturnToService to 0:
ReturnToService=0

InfiniBand issues troubleshooting
To install the InfiniBand driver, run the following command on the management node to change the file /root/.rpmmacros. If necessary, back up this file first.
psh all "sed -i 's/%_excludedocs 1/%_excludedocs 0/' /root/.rpmmacros"

Installation issues troubleshooting
When you are installing the NVIDIA driver and CUDA, the following error might be reported:
nvidia-modeset: Loading NVIDIA Kernel Driver for UNIX platforms 396.26
Spectre V2 : System may be vulnerable to spectre v2
nvidia_drm: loading module not compiled with retpoline compiler

In this case, upgrade the kernel, kernel-devel, gcc, and gcc-c++ packages as described in “Update OS packages” on page 63 below. Then repeat the steps in “GPU driver installation” on page 15 and “CUDA installation” on page 16. Remove the RPMs and non-RPM-controlled files from the previous GPU driver and CUDA first.

When you are installing ganglia-gmond-ohpc, the following error might be reported:

(3/3) Installing: ganglia-gmond-ohpc-3.7.2-25.1.x86_64 ...... [done]
Additional rpm output:
/usr/sbin/systemd-sysv-convert: line 62: /var/lib/systemd/sysv-convert/database: No such file or directory

This will not cause any functional issue. You can ignore the error.

When you are installing Confluent, the following error might be reported:

The following package is going to change vendor: python-pyasn1 SUSE LLC obs://lico/lico:dep:SLE:Factory

This will not cause any functional issue. You can ignore the error.

When you are running makenetworks, the following error might be reported:

Warning: The network entry '10_16_0_0-255_255_0_0' already exists in xCAT networks table. Cannot create a definition for '10_16_0_0-255_255_0_0' ......

This will not cause any functional issue. You can ignore the error.

When you are running makedhcp -n, the following error might be reported:

Warning: No dynamic range specified for 10.16.0.0. If hardware discovery is being used, a dynamic range is required ……

This will not cause any functional issue. You can ignore the error.

xCAT issues troubleshooting
When you have problems using xCAT, go to: https://hpc.lenovo.com/users/documentation/

Update OS packages
Step 1. Subscribe to SUSE to obtain a package repository.
Step 2. Upload the repository to /install/custom on the management node.
Step 3. Run the following commands on the management node (assuming that the directory of this repository is update):
cat << eof > /etc/zypp/repos.d/sles12_3_update.repo
[sles12_3_update]
name=sles12_3_update
baseurl=http://${sms_name}/install/custom/update
gpgcheck=0
enabled=1
eof
xdcp all /etc/zypp/repos.d/sles12_3_update.repo /etc/zypp/repos.d/sles12_3_update.repo
zypper -y dup
psh all zypper -y dup

Edit nodes.csv from xCAT dumping data
When installing LiCO on a large cluster that has been configured in xCAT, you can run the following commands to help edit the nodes.csv file:

# Dump node information from xCAT
tabdump -f nodelist

# Dump BMC information from xCAT
tabdump -f ipmi

# Dump MAC information from xCAT
tabdump -f mac

Note: The dumping files are in CSV format. You can easily edit these files and merge them into your nodes.csv file using Excel or any other table editor.

Notices and trademarks

Notices Lenovo may not offer the products, services, or features discussed in this document in all countries. Consult your local Lenovo representative for information on the products and services currently available in your area. Any reference to a Lenovo product, program, or service is not intended to state or imply that only that Lenovo product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any Lenovo intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any other product, program, or service.

Lenovo may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to:
Lenovo (United States), Inc.
1009 Think Place - Building One
Morrisville, NC 27560
U.S.A.
Attention: Lenovo Director of Licensing

LENOVO PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some jurisdictions do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you.

Changes are made periodically to the information herein; these changes will be incorporated in new editions of the publication. To provide better service, Lenovo reserves the right to improve and/or modify the products and software programs described in the manuals included with your computer, and the content of the manual, at any time without additional notice.

The software interface and function and hardware configuration described in the manuals included with your computer might not match exactly the actual configuration of the computer that you purchase. For the configuration of the product, refer to the related contract (if any) or product packing list, or consult the distributor for the product sales. Lenovo may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you.

The products described in this document are not intended for use in implantation or other life support applications where malfunction may result in injury or death to persons. The information contained in this document does not affect or change Lenovo product specifications or warranties. Nothing in this document shall operate as an express or implied license or indemnity under the intellectual property rights of Lenovo or third parties. All information contained in this document was obtained in specific environments and is presented as an illustration. The result obtained in other operating environments may vary.


Any references in this publication to non-Lenovo Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this Lenovo product, and use of those Web sites is at your own risk.

Any performance data contained herein was determined in a controlled environment. Therefore, the result obtained in other operating environments may vary significantly. Some measurements may have been made

on development-level systems and there is no guarantee that these measurements will be the same on generally available systems. Furthermore, some measurements may have been estimated through extrapolation. Actual results may vary. Users of this document should verify the applicable data for their specific environment.

This document is copyrighted by Lenovo and is not covered by any open source license, including any Linux agreement(s) which may accompany software included with this product. Lenovo may update this document at any time without notice.

For the latest information or any questions or comments, contact or visit the Lenovo Web site: https://support.lenovo.com

Trademarks
LENOVO, LENOVO logo, THINKPAD, THINKPAD logo, TRACKPOINT, ULTRACONNECT, and Yoga are trademarks of Lenovo. Microsoft, Windows, BitLocker, and Cortana are trademarks of the Microsoft group of companies. Ubuntu is a registered trademark of Canonical Ltd. The terms HDMI and HDMI High-Definition Multimedia Interface are trademarks or registered trademarks of HDMI Licensing LLC in the United States and other countries. Wi-Fi, Wi-Fi Alliance, and Miracast are registered trademarks of Wi-Fi Alliance. USB-C is a trademark of USB Implementers Forum. All other trademarks are the property of their respective owners. © 2018 Lenovo.
