Xcat3 Documentation Release 2.11

xCAT3 Documentation Release 2.11 IBM Corporation Mar 09, 2017 Contents 1 Table of Contents 3 1.1 Overview.................................................3 1.2 Install Guides............................................... 15 1.3 Admin Guide............................................... 27 1.4 Advanced Topics............................................. 702 1.5 Developers................................................ 796 1.6 Need Help................................................ 811 i ii xCAT3 Documentation, Release 2.11 xCAT stands for Extreme Cloud/Cluster Administration Toolkit. xCAT offers complete management of clouds, clusters, HPC, grids, datacenters, renderfarms, online gaming infras- tructure, and whatever tomorrows next buzzword may be. xCAT enables the administrator to: 1. Discover the hardware servers 2. Execute remote system management 3. Provision operating systems on physical or virtual machines 4. Provision machines in Diskful (stateful) and Diskless (stateless) 5. Install and configure user applications 6. Parallel system management 7. Integrate xCAT in Cloud You’ve reached xCAT documentation site, The main page product page is http://xcat.org xCAT is an open source project hosted on GitHub. Go to GitHub to view the source, open issues, ask questions, and particpate in the project. Enjoy! Contents 1 xCAT3 Documentation, Release 2.11 2 Contents CHAPTER 1 Table of Contents Overview xCAT enables you to easily manage large number of servers for any type of technical computing workload. xCAT is known for exceptional scaling, wide variety of supported hardware and operating systems, virtualization platforms, and complete “day0” setup capabilities. Differentiators • xCAT Scales Beyond all IT budgets, up to 100,000s of nodes with distributed architecture. • Open Source Eclipse Public License. Support contracts are also available, please contact IBM. • Supports Multiple Operating Systems RHEL, SLES, Ubuntu, Debian, CentOS, Fedora, Scientific Linux, Oracle Linux, Windows, Esxi, RHEV, and more! • Support Multiple Hardware IBM Power, IBM Power LE, x86_64 • Support Multiple Virtulization Infrastructures IBM PowerKVM, KVM, IBM zVM, ESXI, XEN • Support Multiple Installation Options Diskful (Install to Hard Disk), Diskless (Runs in memory), Cloning • Built in Automatic discovery No need to power on one machine at a time for discovery. Nodes that fail can be replaced and back in action simply by powering the new one on. 3 xCAT3 Documentation, Release 2.11 • RestFUL API Provides a Rest API interface for the third-party software to integrate with Features 1. Discover the hardware servers • Manually define • MTMS-based discovery • Switch-based discovery • Sequential-based discovery 2. Execute remote system management against the discovered server • Remote power control • Remote console support • Remote inventory/vitals information query • Remote event log query 3. Provision Operating Systems on physical (Bare-metal) or virtual machines • RHEL • SLES • Ubuntu • Debian • Fedora • CentOS • Scientific Linux • Oracle Linux • PowerKVM • Esxi • RHEV • Windows • AIX 4. Provision machines in • Diskful (Scripted install, Clone) • Stateless 5. Install and configure user applications • During OS install • After the OS install • HPC products - GPFS, Parallel Environment, LSF, compilers ... • Big Data - Hadoop, Symphony 4 Chapter 1. Table of Contents xCAT3 Documentation, Release 2.11 • Cloud - Openstack, Chef 6. Parallel system management • Parallel shell (Run shell command against nodes in parallel) • Parallel copy • Parallel ping 7. Integrate xCAT in Cloud * Openstack * SoftLayer Operating System & Hardware Support Matrix Power Power LE zVM Power KVM x86_64 x86_64 KVM x86_64 Esxi RHEL yes yes yes yes yes yes yes SLES yes yes yes yes yes yes yes Ubuntu no yes no yes yes yes yes CentOS no no no no yes yes yes AIX yes no no no no no no Windows no no no no yes yes yes Architecture The following diagram shows the basic structure of xCAT: 1.1. Overview 5 xCAT3 Documentation, Release 2.11 xCAT Management Node (xCAT Mgmt Node): The server where xCAT software is installed and used as the single point to perform system management over the entire cluster. On this node, a database is configured to store the xCAT node definitions. Network services (dhcp, tftp, http, etc) are enabled to respond in Operating system deployment. Service Node: One or more defined “slave” servers operating under the Management Node to assist in system management to reduce the load (cpu, network badnwidth) when using a single Management Node. This concept is necessary when managing very large clusters. Compute Node: The compute nodes are the target servers which xCAT is managing. Network Services (dhcp, tftp, http,etc): The various network services necessary to perform Operating System deployment over the network. xCAT will bring up and configure the network services automatically without any intervention from the System Administrator. Service Processor (SP): A module embedded in the hardware server used to perform the out-of-band hardware control. (e.g. Integrated Management Module (IMM), Flexible Service Processor (FSP), Baseboard Management Controller (BMC), etc) Management network: The network used by the Management Node (or Service Node) to install operating systems and manage the nodes. The Management Node and in-band Network Interface Card (NIC) of the nodes are 6 Chapter 1. Table of Contents xCAT3 Documentation, Release 2.11 connected to this network. If you have a large cluster utilizing Service Nodes, sometimes this network is segregated into separate VLANs for each Service Node. Service network: The network used by the Management Node (or Service Node) to control the nodes using out-of- band management using the Service Processor. If the Service Processor is configured in shared mode (meaning the NIC of the Service process is used for the SP and the host), then this network can be combined with the management network. Application network: The network used by the applications on the Compute nodes to communicate among each other. Site (Public) network: The network used by users to access the Management Nodes or access the Compute Nodes directly. RestAPIs: The RestAPI interface can be used by the third-party application to integrate with xCAT. Quick Start Guide If xCAT looks suitable for your requirement, following steps are recommended procedure to set up an xCAT cluster. 1. Find a server as your xCAT management node The server can be a bare-metal server or a virtual machine. The major factor for selecting a server is the number of machines in your cluster. The bigger the cluster is, the performance of server need to be better. NOTE: The architecture of xCAT management node is recommended to be same as the target compute node in the cluster. 2. Install xCAT on your selected server The server which installed xCAT will be the xCAT Management Node. Refer to the doc: xCAT Install Guide to learn how to install xCAT on a server. 3. Start to use xCAT management node Refer to the doc: xCAT Admin Guide. 4. Discover target nodes in the cluster You have to define the target nodes in the xCAT database before managing them. For a small cluster (less than 5), you can collect the information of target nodes one by one and then define them manually through mkdef command. For a bigger cluster, you can use the automatic method to discover the target nodes. The discovered nodes will be defined to xCAT database. You can use lsdef to display them. Refer to the doc: xCAT discovery Guide to learn how to discover and define compute nodes. 5. Try to perform the hardware control against the target nodes Now you have the node definition. Verify the hardware control for defined nodes is working. e.g. rpower <node> stat. Refer to the doc: Hardware Management to learn how to perform the remote hardware control. 6. Deploy OS on the target nodes • Prepare the OS images • Customize the OS images (Optional) • Perform the OS deployment Refer to the doc: Diskful Install, Diskless Install to learn how to deploy OS for a target node. 1.1. Overview 7 xCAT3 Documentation, Release 2.11 7. Update the OS after the deployment You may require to update the OS of certain target nodes after the OS deployment, try the updatenode command. updatenode command can execute the following tasks for target nodes: • Install additional software/application for the target nodes • Sync some files to the target nodes • Run some postscript for the target nodes Refer to the doc: Updatenode to learn how to use updatenode command. 8. Run parallel commands When managing a cluster with hundreds or thousands of nodes, operating on many nodes in parallel might be necessary. xCAT has some parallel commands for that. • Parallel shell • Parallel copy • Parallel ping Refer to the Parallel Commands to learn how to use parallel commands. 1. Contribute to xCAT (Optional) While using xCAT, if you find something (code, documentation, ...) that can be improved and you want to contribute that to xCAT, please do that for your and other xCAT users benefit. And welcome to xCAT community! Refer to the Developers to learn how to contribute to xCAT community. xCAT2 Release Information The following table is a summary of the new operating system (OS), hardware, and features that are added to each xCAT release. The OS and hardware listed in the table have been fully tested with xCAT. For a more detailed list of new function, bug fixes, restrictions and known problems, refer to the individual release notes for a specific release. • RHEL - Red Hat Enterprise Linux • SLES - Suse Linux Enterprise Server • UBT - Ubuntu 8 Chapter 1. Table of Contents xCAT3 Documentation, Release 2.11 xCAT 2.11.x xCAT Version New OS New Hardware New Feature • RHEL 7.2 LE • S822LC(GCA) • NVIDIA GPU for xCAT 2.11 • UBT 14.4.3 LE • S822LC(GTA) OpenPOWER 2015/12/11 • UBT 15.10 LE • S812LC • Infiniband for 2.11 Release Notes • PowerKVM 3.1 • NeuCloud OP OpenPOWER • ZoomNet RP • SW KIT support for OpenPOWER • renergy command for OpenPOWER • rflash command for OpenPower • Add xCAT Trou- bleshooting Log • xCAT Log Classifi- cation • RAID Configura- tion • Accelerate genim- age process • Add bmcdiscover Command • Enhance xcatde- bugmode • new xCAT doc in ReadTheDocs 1.1.

Xcat3 Documentation Release 2.11

Loadleveler: Using and Administering

Installation Procedures for Clusters

Bare Metal Provisioning to Openstack Using Xcat

Xcat 2 Cookbook for Linux 8/7/2008 (Valid for Both Xcat 2.0.X and Pre-Release 2.1)

Torqueadminguide-5.1.0.Pdf

Magellan Final Report

Xcat3 Documentation Release 2.16.3

Xcat 2 Extreme Cloud Administration Toolkit

Lico 6.0.0 Installation Guide (For SLES) Seventh Edition (August 2020)

CSM to Xcat Transition On

Lico 6.1.0 Installation Guide (For SLES) Eighth Edition (Decmeber 2020)

HPC Solution on IBM POWER8