HPE Insight Cluster Management Utility v8.0 User Guide

Abstract

This guide describes how to install, configure, and use HPE Insight Cluster Management Utility (Insight CMU) v8.0 on Hewlett Packard Enterprise systems. Insight CMU is dedicated to the administration of HPC and large clusters. This guide is intended primarily for administrators who install and manage a large collection of systems. This document assumes you have access to the documentation that comes with the hardware platform where the Insight CMU cluster will be installed, and that you are familiar with installing and administering Linux operating systems.

Part Number: 5900-4408
Published: April 2016
Edition: 1

© Copyright 2016 Hewlett Packard Enterprise Development LP

The information contained herein is subject to change without notice. The only warranties for Hewlett Packard Enterprise products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. Hewlett Packard Enterprise shall not be liable for technical or editorial errors or omissions contained herein.

Confidential computer software. Valid license from Hewlett Packard Enterprise required for possession, use, or copying. Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under vendor's standard commercial license. Links to third-party websites take you outside the Hewlett Packard Enterprise website. Hewlett Packard Enterprise has no control over and is not responsible for information outside the Hewlett Packard Enterprise website.

Oracle and Java® are registered trademarks of Oracle and/or its affiliates. Microsoft® and Windows® are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries. Linux® is the registered trademark of Linus Torvalds in the U.S. and other countries. Red Hat® and RPM® are trademarks of Red Hat, Inc. in the United States and other countries. ARM® is a registered trademark of ARM Limited.

Contents

1 Overview......11
1.1 Features......11
1.1.1 Compute node monitoring......11
1.1.2 Insight CMU configuration......11
1.1.3 Compute node administration......12
1.1.4 System disk replication......12
2 Installing and upgrading Insight CMU......13
2.1 Installing Insight CMU......13
2.1.1 Management node hardware requirements......13
2.1.2 Disk space requirements......14
2.1.3 Support for non-Hewlett Packard Enterprise servers......14
2.1.4 Planning for compute node installation......15
2.1.5 Configuring the local Smart Array card......15
2.1.6 Configuring the management cards......15
2.1.7 Configuring the BIOS......16
2.1.7.1 DL3xx, DL5xx, DL7xx, Blades......16
2.1.7.1.1 OA IP address: Blades only......16
2.1.7.1.2 Configuring iLO cards from the OA: Blades only......17
2.1.7.1.3 Disabling automatic power on: Blades only......17
2.1.7.2 DL160 G5, DL165c G5, DL165c G6, and DL180 G5 Servers......17
2.1.7.3 DL160 G6 Servers......18
2.1.7.4 SL2x170z G6 and DL170h G6 Servers BIOS setting......19
2.2 Preparing for installation......20
2.2.1 Insight CMU kit delivery......20
2.2.2 Preinstallation limitations......20
2.2.3 support......21
2.2.3.1 RHEL 7 support......21
2.2.4 Insight CMU DVD directory structure......22
2.2.5 Insight CMU installation checklist......22
2.2.6 Login privileges......23
2.2.7 SELinux and Insight CMU......23
2.3 Installation procedures......23
2.4 Installing Insight CMU with high availability......28
2.4.1 HA hardware requirements......30
2.4.2 Software prerequisites......31
2.4.3 Installing Insight CMU under HA......31
2.4.3.1 Overview......31
2.4.3.2 Insight CMU HA service requirements......31
2.4.3.3 Installing and testing......31
2.4.4 Configuring HA control of Insight CMU......32
2.4.5 Insight CMU configuration considerations......35
2.4.6 Upgrading Insight CMU HA service......36
2.5 Upgrading Insight CMU......37
2.5.1 Upgrading to v8.0 important information......38
2.5.2 Dependencies......39
2.5.2.1 64-bit versions on management node......39
2.5.2.2 tftp......39
2.5.2.3 Java version dependency......39
2.5.2.4 Monitoring clients......39
2.5.3 Stopping the Insight CMU service......39
2.5.4 Upgrading Java Runtime Environment......39

2.5.5 Removing the previous Insight CMU package......39
2.5.6 Installing the Insight CMU v8.0 package......40
2.5.7 Restoring the previous Insight CMU configuration......41
2.5.8 Installing your Insight CMU license......41
2.5.9 Configuring the updated Insight CMU......42
2.5.10 Starting Insight CMU......43
2.5.11 Deploying the monitoring client......44
2.6 Saving the Insight CMU database......44
2.7 Restoring the Insight CMU database......44
3 Launching the Insight CMU GUI......46
3.1 Insight CMU GUI......46
3.2 Insight CMU main window......46
3.3 Administrator mode......47
3.4 Quitting administrator mode......47
3.5 Launching the Insight CMU GUI......47
3.5.1 Launching the Insight CMU GUI using a web browser......47
3.5.2 Configuring the GUI client on Linux......47
3.5.3 Launching the Insight CMU Time View GUI......48
4 Defining a cluster with Insight CMU......49
4.1 Insight CMU service status......49
4.2 High-level procedure for building an Insight CMU cluster......49
4.3 Cluster administration......49
4.3.1 Node management......50
4.3.1.1 Scanning nodes......51
4.3.1.2 Adding nodes......53
4.3.1.3 Modifying nodes......55
4.3.1.4 Importing nodes......55
4.3.1.5 Deleting nodes......55
4.3.1.6 Exporting nodes......55
4.3.1.7 Contextual menu......56
4.3.1.8 Get node static info......56
4.3.1.9 Rescan MAC......57
4.3.2 Network group management......57
4.3.2.1 Adding network groups......58
4.3.2.2 Deleting network groups......58
4.3.3 Uploading files to the Insight CMU server......58
5 Provisioning a cluster with Insight CMU......60
5.1 Image group management......60
5.1.1 Deleting image groups......63
5.1.2 Renaming image groups......63
5.2 Autoinstall......63
5.2.1 Autoinstall requirements......64
5.2.2 Autoinstall templates......64
5.2.3 Autoinstall calling methods......66
5.2.4 Using autoinstall from GUI......66
5.2.4.1 Creating an autoinstall image group......66
5.2.4.2 Registering compute nodes......67
5.2.4.3 Autoinstall compute nodes......67
5.2.5 Using autoinstall from CLI......68
5.2.5.1 Registering an autoinstall image group......68
5.2.5.2 Adding nodes to autoinstall image group......69
5.2.5.3 Autoinstall compute nodes......69
5.2.6 Customization......69

5.2.6.1 RHEL autoinstall customization for nodes configured with HPE Dynamic Smart Array RAID (B120i, B320i, B140i) using a driver update diskette image......70
5.2.6.2 Autoinstall RHEL 6.x and 7.x on nodes having multiple disks or LUNs......72
5.2.6.3 SLES autoinstall customization for nodes configured with HPE Dynamic Smart Array RAID (B120i, B320i, B140i) using a driver update diskette image......72
5.2.6.4 Autoinstall SLES 11 SP3 on Gen9 servers and certain HPE Moonshot server cartridges requires special kISO images in addition to the DVD ISO......73
5.2.6.5 Disable consistent NIC device names during RHEL 7.x autoinstall......75
5.2.6.6 Autoinstall the ARM-based HPE ProLiant m400 and m800 Moonshot servers with Ubuntu 14.04......75
5.2.7 Restrictions......76
5.3 Backing up a golden compute node......76
5.3.1 Backing up a disk from a compute node in an image group......77
5.3.1.1 Backup using GUI......77
5.3.1.2 Backup using CLI......83
5.4 Cloning......84
5.4.1 Performing the cloning operation using the GUI......84
5.4.2 Performing the cloning operation using the CLI......86
5.4.3 Cloning Windows images......86
5.4.4 Preconfiguration......87
5.4.5 Reconfiguration......87
5.5 Insight CMU image editor......88
5.5.1 Expanding an image......88
5.5.2 Modifying an image......89
5.5.3 Saving a modified cloning image......90
5.6 Insight CMU diskless environments......90
5.6.1 Overview......90
5.6.1.1 Enabling diskless support in Insight CMU......93
5.6.2 The CMURAM diskless method......93
5.6.2.1 Operating systems supported......94
5.6.2.2 Enabling CMURAM support......94
5.6.2.3 Preparing the Insight CMU management node......94
5.6.2.4 Preparing the golden node......94
5.6.2.5 Capturing and customizing a diskless CMURAM image......94
5.6.2.6 Adding nodes to the CMU diskless image group......96
5.6.2.7 Booting nodes into a diskless CMURAM image group......96
5.6.3 Scaling out a diskless CMURAM compute node cluster......97
5.6.3.1 Configuring multiple boot servers......97
5.6.3.2 Preparing the boot servers......97
5.6.4 The Insight CMU oneSIS diskless method......98
5.6.4.1 Operating systems supported......98
5.6.4.2 Enabling oneSIS support......98
5.6.4.3 Preparing the Insight CMU management node......98
5.6.4.4 Preparing the golden node......98
5.6.4.5 Capturing and customizing a oneSIS diskless image......99
5.6.4.5.1 Creating an Insight CMU oneSIS diskless image......100
5.6.4.5.2 Customizing an Insight CMU oneSIS diskless image......101
5.6.4.6 Manage the writeable memory usage by the oneSIS diskless clients......102
5.6.4.7 Adding nodes and booting the diskless compute nodes......103
5.6.5 Scaling out diskless NFS-root cluster with multiple NFS servers......103
5.6.5.1 Comments on High Availability (HA)......105
5.6.6 Configuring DNS and the default gateway in an Insight CMU diskless cluster......105
6 Monitoring a cluster with Insight CMU......106
6.1 Installing the Insight CMU monitoring client......106

6.2 Deploying the monitoring client......106
6.3 Monitoring the cluster......107
6.3.1 Node and group status......109
6.3.2 Selecting the central frame display......109
6.3.3 Global cluster view in the central frame......109
6.3.4 Resource view in the central frame......110
6.3.4.1 Resource view overview......110
6.3.4.2 Detail mode in resource view......111
6.3.5 Gauge widget......112
6.3.6 Node view in the central frame......113
6.3.7 Using Time View......114
6.3.7.1 Tagging nodes......115
6.3.7.2 Adaptive stacking......115
6.3.7.3 Bindings and options......116
6.3.7.3.1 Mouse control......116
6.3.7.3.2 Keyboard control......116
6.3.7.3.3 Custom cameras......116
6.3.7.3.4 Options......116
6.3.7.4 Technical dependencies......117
6.3.7.5 Troubleshooting......117
6.3.8 Archiving custom groups......117
6.3.8.1 Visualizing history data......118
6.3.8.2 Limitations......118
6.4 Stopping Insight CMU monitoring......119
6.5 Customizing Insight CMU monitoring, alerting, and reactions......119
6.5.1 Action and alert files......119
6.5.2 Actions......120
6.5.3 Alerts......121
6.5.4 Alert reactions......121
6.5.5 Modifying the sensors, alerts, and alert reactions monitored by Insight CMU......122
6.5.6 Using collectl for gathering monitoring data......123
6.5.6.1 Installing and starting collectl on compute nodes......123
6.5.6.2 Modifying the ActionAndAlerts.txt file......123
6.5.6.3 Installing and configuring colplot for plotting collectl data......125
6.5.6.3.1 Plotting data......126
6.5.7 Monitoring GPUs and coprocessors......128
6.5.7.1 Monitoring NVIDIA GPUs......128
6.5.7.2 Monitoring AMD GPUs......129
6.5.7.3 Monitoring Intel coprocessors......130
6.5.8 Monitoring Insight CMU alerts in HPE Systems Insight Manager......132
6.5.9 Extended metric support......132
6.5.9.1 Configuring iLO 4 AMS extended metric support......135
6.5.9.1.1 Configuring the HPE iLO SNMP port......136
6.5.9.1.2 Accessing and viewing the HPE iLO data via SNMP......137
6.5.9.1.3 Configuring HPE iLO SNMP metrics in Insight CMU......139
6.5.9.2 Configuring HPE Moonshot power and temperature monitoring......141
6.6 Customizing node static information......142
7 Managing a cluster with Insight CMU......143
7.1 Unprivileged user menu......143
7.2 Administrator menu......143
7.3 SSH connection......144
7.4 Management card connection......144
7.5 Virtual serial port connection......144
7.6 Shutdown......144

7.7 Power off......144
7.8 Boot......145
7.9 Reboot......145
7.10 Change UID LED status......145
7.11 Multiple windows broadcast......145
7.12 Single window pdsh......146
7.12.1 cmudiff examples......147
7.13 Parallel distributed copy (pdcp)......150
7.14 Custom group management......150
7.14.1 Adding custom groups......150
7.14.2 Deleting custom groups......151
7.15 HPE Insight management......151
7.15.1 Viewing and analyzing BIOS settings......151
7.15.2 Checking BIOS versions......152
7.15.3 Installing and upgrading firmware......152
7.16 Customizing the GUI menu......153
7.16.1 Saving user settings......153
7.17 Insight CMU CLI......153
7.17.1 Starting a CLI interactive session......153
7.17.2 Basic commands......153
7.17.3 Specifying nodes......156
7.17.4 Administration and cloning commands......157
7.17.5 Administration utilities pdcp and pdsh......164
7.17.6 Insight CMU Linux shell commands......165
8 Advanced topics......166
8.1 Enabling non-root users......166
8.1.1 Configuring non-root user access with the Insight CMU GUI......166
8.1.1.1 Non-root user support for custom menu options......168
8.1.1.2 The /opt/cmu/etc/admins file......168
8.1.2 Examples of sudo configurations......169
8.2 Modifying the management network configuration......170
8.3 Customizing Insight CMU netboot kernel arguments......170
8.3.1 PXE-boot configuration file keywords......171
8.4 Cloning mechanisms......172
8.5 Support for Intel Xeon Phi cards......174
8.5.1 Intel Xeon Phi card IP address and host name assignment algorithm......175
8.5.2 Cloning an image with Intel Xeon Phi cards configured with independent IP addresses......176
8.5.3 Insight CMU oneSIS diskless support for independent addressing of Intel Xeon Phi cards......176
8.6 Insight CMU remote hardware control API......176
8.7 Insight CMU diskless API......178
8.7.1 Build diskless image......179
8.7.2 Delete diskless image......179
8.7.3 Configure diskless node......179
8.7.4 Post node configuration......180
8.7.5 Unconfigure diskless node......180
8.7.6 Boot diskless node......180
8.7.7 Diskless check......181
8.8 Insight CMU REST API......181
8.8.1 Overview......181
8.8.1.1 Base path......181
8.8.1.2 Version information......181
8.8.1.3 Feature coverage......181
8.8.2 Technical considerations......182

8.8.2.1 API doctype......182
8.8.2.2 Authentication......182
8.8.2.3 Resource identifier......183
8.8.2.4 Field filtering......183
8.8.3 Definitions......184
8.8.3.1 Node......184
8.8.3.1.1 NetworkSettings......184
8.8.3.1.2 ImageSettings......184
8.8.3.1.3 PlatformSettings......185
8.8.3.1.4 ManagementSettings......185
8.8.3.2 CustomGroup......185
8.8.3.3 ImageGroup......185
8.8.3.4 NetworkGroup......185
8.8.3.5 MultipleIdentifierDto......186
8.8.4 Resources......186
8.8.4.1 Custom group operations......186
8.8.4.1.1 Gets a single group......186
8.8.4.1.2 Updates an existing custom group......186
8.8.4.1.3 Deletes or archives an existing custom group......187
8.8.4.1.4 Gets all features of a single group......187
8.8.4.1.5 Adds or modifies features of an existing group......188
8.8.4.1.6 Removes all features of an existing group......188
8.8.4.1.7 Gets one node of an existing group......188
8.8.4.1.8 Removes one node from an existing group......189
8.8.4.1.9 Gets all nodes of an existing group......189
8.8.4.1.10 Adds nodes to an existing group......190
8.8.4.1.11 Removes some or all nodes from an existing group......190
8.8.4.1.12 Lists all groups......191
8.8.4.1.13 Updates a set of existing custom groups......191
8.8.4.1.14 Creates one or multiple new custom groups......191
8.8.4.1.15 Deletes or archives a set of existing custom groups......192
8.8.4.2 Image group operations......192
8.8.4.2.1 Gets one candidate of an existing image group......192
8.8.4.2.2 Removes one candidate from an existing image group......193
8.8.4.2.3 Lists all groups......193
8.8.4.2.4 Updates a set of existing image groups......194
8.8.4.2.5 Creates one or multiple new image groups......194
8.8.4.2.6 Deletes a set of existing image groups......195
8.8.4.2.7 Gets a single group......195
8.8.4.2.8 Updates an existing image group......195
8.8.4.2.9 Deletes an existing image group......196
8.8.4.2.10 Gets all nodes of an existing group......196
8.8.4.2.11 Adds nodes to an existing group......197
8.8.4.2.12 Removes some or all nodes from an existing group......197
8.8.4.2.13 Lists all candidates of an existing image group......197
8.8.4.2.14 Adds candidates to an existing image group......198
8.8.4.2.15 Removes some or all candidates from an existing image group......198
8.8.4.2.16 Gets one node of an existing group......199
8.8.4.2.17 Removes one node from an existing group......199
8.8.4.2.18 Gets all features of a single group......199
8.8.4.2.19 Adds or modifies features of an existing group......200
8.8.4.2.20 Removes all features of an existing group......200
8.8.4.3 Network group operations......201
8.8.4.3.1 Gets all features of a single group......201
8.8.4.3.2 Adds or modifies features of an existing group......201

8.8.4.3.3 Removes all features of an existing group......201
8.8.4.3.4 Gets all nodes of an existing group......202
8.8.4.3.5 Adds nodes to an existing group......202
8.8.4.3.6 Removes some or all nodes from an existing group......202
8.8.4.3.7 Lists all groups......203
8.8.4.3.8 Updates a set of existing network groups......203
8.8.4.3.9 Creates one or multiple new network groups......204
8.8.4.3.10 Deletes a set of existing network groups......204
8.8.4.3.11 Gets one node of an existing group......205
8.8.4.3.12 Removes one node from an existing group......205
8.8.4.3.13 Gets a single group......205
8.8.4.3.14 Updates an existing network group......206
8.8.4.3.15 Deletes an existing network group......206
8.8.4.4 Node operations......207
8.8.4.4.1 Lists all nodes that are not in any image group......207
8.8.4.4.2 Removes a set of nodes from their current image group......207
8.8.4.4.3 Gets a single node......207
8.8.4.4.4 Updates an existing node......208
8.8.4.4.5 Deletes an existing node......208
8.8.4.4.6 Lists all nodes......209
8.8.4.4.7 Updates a set of existing nodes......209
8.8.4.4.8 Creates one or multiple new nodes......209
8.8.4.4.9 Deletes a set of existing nodes......210
8.8.4.4.10 Gets all features of a single node......210
8.8.4.4.11 Adds or modifies features of an existing node......210
8.8.4.4.12 Removes all features of an existing node......211
8.8.4.4.13 Lists all nodes that are not in any network group......211
8.8.4.4.14 Removes a set of nodes from their current network group......212
8.9 Support for ScaleMP......212
9 Support and other resources......213
9.1 Accessing Hewlett Packard Enterprise Support......213
9.2 Accessing updates......213
9.3 Websites......213
9.4 Customer self repair......214
9.5 Remote support......214
9.6 Documentation feedback......214
9.7 Related information......214
9.8 Support term......215
A Troubleshooting......216
A.1 Insight CMU logs......216
A.1.1 cmuserver log files......216
A.1.2 Cloning log files......216
A.1.3 Backup log files......216
A.1.4 Monitoring log files......216
A.2 Network boot issues......216
A.2.1 Troubleshooting network boot......217
A.2.2 The anatomy of the PXE boot process......217
A.3 Backup issues......219
A.4 Cloning issues......219
A.4.1 Error when cloning a Windows node while part of a Windows domain......219
A.5 Administration command problems......220
A.6 GUI problems......220
A.6.1 The Insight CMU GUI cannot be launched from browser......220

A.6.2 The GUI fails to launch when java.binfmt_misc service is running on SLES11 management nodes......223
A.6.3 Launching the Insight CMU GUI on nodes with RHEL5u11......223
Insight CMU manpages......224
cmu_boot(8)......226
cmu_show_nodes(8)......227
cmu_show_image_groups(8)......229
cmu_show_network_groups(8)......230
cmu_show_custom_groups(8)......231
cmu_show_archived_custom_groups(8)......232
cmu_add_node(8)......233
cmu_add_network_group(8)......236
cmu_add_image_group(8)......237
cmu_add_to_image_group_candidates(8)......239
cmu_add_custom_group(8)......240
cmu_add_to_custom_group(8)......241
cmu_change_active_image_group(8)......242
cmu_change_network_group(8)......243
cmu_del_from_image_group_candidates(8)......244
cmu_del_from_network_group(8)......245
cmu_del_archived_custom_groups(8)......246
cmu_del_from_custom_group(8)......247
cmu_del_image_group(8)......248
cmu_del_network_group(8)......249
cmu_del_node(8)......250
cmu_del_snapshots(8)......251
cmu_del_custom_group(8)......252
cmu_console(8)......253
cmu_power(8)......254
cmu_custom_run(8)......256
cmu_clone(8)......257
cmu_backup(8)......259
cmu_scan_macs(8)......261
cmu_rescan_mac(8)......265
cmu_mod_node(8)......266
cmu_monstat(8)......269
cmu_image_open(8)......272
cmu_image_commit(8)......273
cmu_config_nvidia(8)......274
cmu_config_amd(8)......275
cmu_config_intel(8)......276
cmu_mgt_config(8)......277
cmu_firmware_mgmt(8)......279
cmu_monitoring_dump(8)......280
cmu_rename_archived_custom_group(8)......281
cmu_rename_image_group(8)......282
cmu_autoinstall_node(8)......283
cmu_add_feature(8)......284
cmu_show_features(8)......285
cmu_del_feature(8)......286
cmu_set_bmc_access(8)......287
Glossary......288
Index......290

1 Overview

HPE Insight Cluster Management Utility (Insight CMU) is a collection of tools that manage and monitor a large group of compute nodes, specifically HPC and large Linux clusters. You can use Insight CMU to lower the total cost of ownership (TCO) of this architecture. Insight CMU helps you install, manage, and monitor the compute nodes of your cluster from a single interface. You can access this utility through a GUI or a CLI.

1.1 Features
Insight CMU is scalable and can be used for a cluster of any size. The Insight CMU GUI:
• Monitors all the nodes of your cluster at a glance.
• Configures Insight CMU according to your actual cluster.
• Manages your cluster by sending commands to any number of compute nodes.
• Replicates the disk of a compute node on any number of compute nodes.
The Insight CMU CLI:
• Manages your cluster by sending commands to any number of compute nodes.
• Replicates the disk of a compute node on any number of compute nodes.
• Saves and restores your Insight CMU database.

1.1.1 Compute node monitoring
You can monitor many nodes using a single window. Insight CMU provides the connectivity status of each node as well as sensor readings. Insight CMU provides a default set of sensors such as CPU load, memory usage, I/O performance, and network performance. You can customize this list or create your own sensors. You can display sensor values for any number of nodes. The information provided by Insight CMU is used to ensure optimum performance and for troubleshooting. You can set thresholds to trigger alerts, and configure reactions to those alerts. All information is transmitted across the network at regular time intervals, using a scalable protocol suited to real-time monitoring.

1.1.2 Insight CMU configuration
Insight CMU requires a dedicated management server running RHEL or SLES. CentOS and Scientific Linux are supported on the management node, but require active approval and verification from Hewlett Packard Enterprise. The management node can run a different OS from the compute nodes. However, Hewlett Packard Enterprise recommends running the same OS on the compute nodes and on the management node.

IMPORTANT: Insight CMU does not qualify management nodes. Any x86 64-bit server with a supported operating system can become an Insight CMU management node. For more information on the specific operating systems supported, see the Insight CMU release notes for your version of the product.

All cluster nodes must be connected to the management node through an Ethernet network. Each compute node must have a management card, and these management cards must be connected to an Ethernet network that is accessible from the management node.

Insight CMU is configured and customized using the Insight CMU GUI. Tasks include:
• Manually adding, removing, or modifying nodes in the Insight CMU database
• Invoking the scan node procedure to automatically add several nodes
• Adding, deleting, or customizing Insight CMU groups
• Managing the system images stored by Insight CMU
• Configuring actions performed when a node status changes, such as displaying a warning, executing a command, or sending an email
• Exporting the Insight CMU node list to a simple text file for reuse by other applications
• Importing nodes from a simple text file into the Insight CMU database

1.1.3 Compute node administration
The Insight CMU GUI and CLI enable you to perform actions on any number of selected compute nodes. Tasks include:
• Halting
• Rebooting
• Booting and powering off, using the compute node management card
• Broadcasting a command to selected compute nodes, using a secure shell connection or a management card connection
• Direct node connection by clicking a node to open a secure shell connection or a management card connection

1.1.4 System disk replication
The Insight CMU GUI and CLI enable you to replicate a system on any number of selected compute nodes. Tasks include:
• Creating a new image (while backing up a compute node system disk, you can dynamically select which partitions to back up)
• Replicating available images on any number of compute nodes in the cluster
• Managing as many different images as needed for different software stacks, operating systems, or hardware
• Cloning from one to many nodes at a time with a scalable algorithm that is reliable and does not stop the entire cloning process if any nodes are broken
• Customizing the reconfiguration scripts associated with each image to execute specific tasks on compute nodes after cloning

2 Installing and upgrading Insight CMU

2.1 Installing Insight CMU
A typical Insight CMU cluster contains three kinds of nodes. Figure 1 (page 13) shows a typical HPC cluster.
• The management node is the central point that connects all the compute nodes and the GUI clients. Installation, management, and monitoring are performed from the management node. The package cmu-v8.0-1.x86_64.rpm must be installed on the management node. All Insight CMU files are installed under the /opt/cmu directory.
• The compute nodes are dedicated to user applications. A small software application that provides a monitoring report is installed on the compute nodes.

IMPORTANT: All compute nodes must be connected to an Ethernet network.

• The client workstations are any PC systems running Linux or Windows operating systems that display the GUI. The administrator can install, manage, and monitor the entire cluster from a client workstation. Users can monitor the cluster and access compute nodes from their workstations.

A management card is required on each node to manage the cluster. These management cards must be connected to an Ethernet network, and the management node must have access to this network.

Figure 1 Typical HPC cluster
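Before going further, it can be worth confirming that the management node can actually reach the management-card network described above. The following sketch performs a simple ICMP reachability check; the iLO addresses are placeholders for illustration, not values from this guide.

```shell
# Check reachability of a few management cards from the management node.
# The addresses below are hypothetical examples.
for ip in 192.168.2.101 192.168.2.102 192.168.2.103; do
    if ping -c1 -W1 "$ip" >/dev/null 2>&1; then
        echo "$ip reachable"
    else
        echo "$ip NOT reachable"
    fi
done
```

A card that is not reachable here will also be unreachable for Insight CMU power control and console access, so fix the network or the card's IP configuration first.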

2.1.1 Management node hardware requirements
The Insight CMU management node needs access to the compute nodes, the compute node management cards (iLOs), and the Insight CMU GUI clients. Each of these components is typically on a separate network, though that is not strictly required. Using independent networks ensures good network performance and isolates problems if network failures occur. Hewlett Packard Enterprise recommends the following NIC/network configuration for the management node:
• Connect one NIC to a network established for compute node administration.
• Connect a second NIC to the network connecting the Insight CMU management node to the Insight CMU GUI clients.
• A third NIC is typically used to provide access to the network connecting all the compute node management cards (iLOs).

NOTE: The IP address of the NIC connected to the compute node administration network is needed during configuration of the Insight CMU management node.

2.1.2 Disk space requirements
A total of 3 GB of free disk space is needed to install all the subsets or packages required for Insight CMU. Additional space is needed to store each master disk image under the /opt/cmu/image directory. If /opt is on a separate partition, verify that it has enough space for the number of disk images required.

2.1.3 Support for non-Hewlett Packard Enterprise servers

IMPORTANT: You must obtain a valid license to run Insight CMU on non-Hewlett Packard Enterprise hardware. The following section describes how Insight CMU functions with non-Hewlett Packard Enterprise servers.

Provisioning
• Autoinstall works (assumes PXE-boot support).
• Diskless works (assumes PXE-boot support).
• Backup and cloning must be tested. These processes rely on the Insight CMU netboot kernel, which needs the network and disk drivers for the non-Hewlett Packard Enterprise hardware. If these drivers exist in the kernel.org source tree, then backup and cloning should work. If backup and cloning do not work on your specific hardware, contact Hewlett Packard Enterprise services.
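A quick heuristic for whether the netboot kernel is likely to support your hardware is to confirm that a mainline kernel driver claims the NIC and disk controller on a running node. The sketch below is only that heuristic; the grep pattern is an example, not an exhaustive list of controller classes.

```shell
# List network and storage controllers together with the kernel driver
# bound to each ("Kernel driver in use" line). A controller with no bound
# driver is a warning sign for Insight CMU backup and cloning.
if command -v lspci >/dev/null; then
    lspci -nnk | grep -A3 -Ei 'ethernet|raid|sas|sata' \
        || echo "no matching controllers found"
else
    echo "lspci not found; install the pciutils package first"
fi
```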

Monitoring
• All monitoring works, including the GUI.
• If provisioning is not used, monitoring requires password-less ssh to be configured for the root account on all nodes.

NOTE: Backup and cloning configures this automatically.
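If you do need to set up password-less root ssh by hand, the standard OpenSSH tools are sufficient. The following is a minimal sketch; the node names are hypothetical placeholders, and the ssh-copy-id commands are printed rather than executed so you can review them first.

```shell
# Generate a root key once (no passphrase), then push it to every node.
# Node names below are hypothetical; run this as root on the management node.
mkdir -p "${HOME}/.ssh"
[ -f "${HOME}/.ssh/id_rsa" ] || ssh-keygen -t rsa -N "" -f "${HOME}/.ssh/id_rsa"
for n in node1 node2 node3 node4; do
    echo "ssh-copy-id root@${n}"   # run each printed command; it asks for the root password once
done
```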

Remote management
• All core features work. For example:
◦ single|multi broadcast
◦ pdsh with cmudiff
◦ pdcp
• Power control and console access depend on the non-Hewlett Packard Enterprise hardware. Insight CMU supports IPMI. Otherwise, a new power interface can be configured; Insight CMU has an API for power control.
• BIOS and firmware management are Hewlett Packard Enterprise-specific.
• Custom menu support works.
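For non-HPE servers whose BMCs speak IPMI, commands of the following form are what a power interface typically wraps. This is a sketch: the BMC address and credentials are placeholders, and the commands are printed for review rather than executed.

```shell
# IPMI-over-LAN power control through a BMC.
# Address and credentials are placeholders, not defaults.
BMC=192.168.2.101
for sub in "power status" "power on" "power off" "chassis bootdev pxe"; do
    echo "ipmitool -I lanplus -H $BMC -U admin -P secret $sub"
done
```

`chassis bootdev pxe` requests a one-time PXE boot, which is the IPMI analogue of the boot next PXE behavior described later for HPE management cards.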

2.1.4 Planning for compute node installation
Two IP addresses are required for each compute node:
• Determine the IP address for the management card (iLO) on the management network.
• Determine the IP address for the NIC on the administration network.
Hewlett Packard Enterprise recommends assigning contiguous ranges of static addresses to nodes located in the same rack. This method eases the discovery of the nodes and makes cluster management more convenient. The management cards must be configured with static IP addresses, and all the compute node management cards must share a single login and password.

NOTE: Insight CMU uses DHCP and PXE. Do not run other DHCP or PXE servers on the Insight CMU management network in the range of HPE ProLiant MAC addresses belonging to the Insight CMU cluster.

NOTE: The settings described in this document assume that the administration network on each compute node is connected to, and will PXE boot from, NIC1. While this configuration enables all supported hardware to be imaged by Insight CMU, some operating systems might not configure NIC1 as eth0. For example, RHEL 5.4 on the HPE ProLiant DL385 G6 Server defaults eth0 to NIC3. To simplify the installation process for your operating system, Hewlett Packard Enterprise recommends wiring your administration network to the NIC that defaults to eth0, and setting that NIC, rather than NIC1, to PXE boot.
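To work out which Linux interface corresponds to which physical NIC label, listing the interfaces and blinking a port LED is usually enough; `ethtool -p` is the standard tool for port identification. The interface name below is an example, and the ethtool command is printed rather than run.

```shell
# List interfaces with their MAC addresses, then blink a port LED to match
# an interface name to the physical NIC1 label on the chassis.
ip -o link show 2>/dev/null || echo "run this on the compute node"
echo "ethtool -p eth0 10    # blink the eth0 port LED for 10 seconds"
```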

2.1.5 Configuring the local Smart Array card
Insight CMU does not configure local disks. If you have a hardware RAID controller, configure a logical drive to be used later by the operating system. The same logical drive must be created on each compute node. The Smart Array must be configured on each node of the cluster, and you can select any RAID level. If you have only one physical drive, configure it as a logical RAID 0 drive before performing the initial operating system installation. Otherwise, the disk is not detected during the Linux installation procedure or during the cloning procedure.

2.1.6 Configuring the management cards
To configure the management cards such as iLO:
1. Power on the server.
2. Access the management card.
3. Assign the same username and password to all management cards.

IMPORTANT: If the node is configured to boot other devices before PXE, ensure the account associated with this username has privileges to perform a one-time boot with PXE. For iLO, the account must have the Configure iLO Settings privilege.

4. Assign a fixed IP address to each management card.

NOTE: To configure the IP addresses on the iLO cards for Blade servers, you can use the EBIPA on the OA. For instructions, see “Configuring iLO cards from the OA: Blades only” (page 17).

NOTE: Blade servers do not use the Single Sign-On capability. You must configure each Blade individually and create the same username and password on each. For instructions, see “Disabling server automatic power on: Blades only” (page 17).
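Step 4 above can be scripted for management cards that implement standard IPMI (for example, LO100i); iLO cards are more commonly configured through the iLO web interface, RBSU, or EBIPA as described below. The following is a dry-run sketch in which the addresses, LAN channel number, user ID, and credentials are all assumptions; remove the leading "echo" (the RUN variable) to actually apply the settings from the OS running on the node.

```shell
# Dry-run sketch: RUN=echo only prints the commands. All values are examples.
RUN=echo
ADDR=10.1.0.101
$RUN ipmitool lan set 1 ipsrc static
$RUN ipmitool lan set 1 ipaddr "$ADDR"
$RUN ipmitool lan set 1 netmask 255.255.255.0
$RUN ipmitool lan set 1 defgw ipaddr 10.1.0.1
# Give every management card the same credentials (user ID 2 is typically
# the first regular account on IPMI-based cards)
$RUN ipmitool user set name 2 admin
$RUN ipmitool user set password 2 secret
```

Increment ADDR per node to keep the management-card addresses contiguous within a rack, as recommended above.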

2.1 Installing Insight CMU 15

2.1.7 Configuring the BIOS

Insight CMU supports PXE-booting servers in either Legacy BIOS mode or UEFI mode. In either mode, the following parameters affect Insight CMU:
• Boot order parameters. In general, network boot should have the highest priority for the management network NIC, but Insight CMU supports sending a boot next PXE request to the management card. For most of the newer Hewlett Packard Enterprise servers, including HPE Moonshot solutions, you can therefore leave the boot order alone or set it to disk boot before network boot. When Insight CMU needs to PXE-boot a server, it sends the boot next PXE request to the management card to change the boot order on the next power cycle. Older servers and non-Hewlett Packard Enterprise servers that do not support boot next PXE must have their boot order configured to network boot before disk boot.

IMPORTANT: If the server does not support boot next PXE and the boot order is not correctly set, then autoinstall, backup, and cloning will fail.
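To verify that a server honors boot next PXE before relying on it, the same request Insight CMU sends can be reproduced manually on IPMI-capable management cards. This is a dry-run sketch; the address and credentials are assumptions, and the leading "echo" must be removed to actually send the requests.

```shell
# Dry-run sketch: RUN=echo only prints the commands. ILO address and
# admin/secret credentials are hypothetical examples.
RUN=echo
ILO=10.1.0.101
# Request PXE as the boot device for the next power cycle only
$RUN ipmitool -I lanplus -H "$ILO" -U admin -P secret chassis bootdev pxe
# Power-cycle the node so the one-time boot device takes effect
$RUN ipmitool -I lanplus -H "$ILO" -U admin -P secret chassis power cycle
```

If the node PXE-boots once and then returns to its normal boot order, boot next PXE is supported and the permanent boot order can keep disk before network.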

• Virtual serial console parameters enable the BIOS boot process and the Linux boot process to be viewed via the Virtual Serial Port (VSP) in the iLO and iLOCM management cards. Typically this only involves setting the Virtual Serial Port to COM1, and the Embedded Serial Port automatically switches to COM2. • Parameters that affect the behavior of the local disk controller. Parameter names can differ from one server to another and cannot be documented exhaustively. Examples are provided in the following sections.

2.1.7.1 DL3xx, DL5xx, DL7xx, Blades

Parameters:
• Virtual serial port: COM1
• Embedded NIC
◦ NIC 1: PXE boot or PXE enabled
◦ NIC 2: Disabled
◦ NIC 3: Disabled (not always present)
◦ NIC 4: Disabled (not always present)
• Boot order
1. PXE
2. CD
3. DISK
• BIOS Serial console
◦ BIOS Serial Console: Auto
◦ Speed: 9600 Bd
◦ EMS Console: COM1
◦ Interface Mode: Auto

2.1.7.1.1 OA IP address: Blades only Assign the OA IP address in the same subnet as the administration network.

2.1.7.1.2 Configuring iLO cards from the OA: Blades only

Use the EBIPA to assign consecutive addresses to the iLOs:
• 16 addresses on the c7000 Enclosure
• 8 addresses on the c3000 Enclosure
To configure the iLO cards:
1. Open a browser to the OA.
2. In the right window, select Device Bays.
3. Select Bay 1.
4. In the left window, select the Enclosure Settings tab and then Enclosure Bay IP Addressing.
5. Enter the IP address of the first iLO card.
6. Click Auto Fill or the red arrow. Each iLO is reset and assigned an IP address by the OA.
7. From the Insight CMU management node, ping each iLO.
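Step 7 can be scripted. The following sketch assumes a hypothetical range of 16 consecutive iLO addresses starting at 10.1.0.101 (one fully populated c7000 enclosure); adjust FIRST, COUNT, and SUBNET for your addressing plan.

```shell
# Hypothetical values: 16 iLOs at 10.1.0.101 through 10.1.0.116
FIRST=101; COUNT=16; SUBNET=10.1.0
for i in $(seq 0 $((COUNT - 1))); do
  ip="$SUBNET.$((FIRST + i))"
  # One ping, one-second timeout, per iLO
  if ping -c 1 -W 1 "$ip" >/dev/null 2>&1; then
    echo "$ip responding"
  else
    echo "$ip NOT responding"
  fi
done
```

Any address reported as not responding should be rechecked in the EBIPA settings before continuing.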

2.1.7.1.3 Disabling server automatic power on: Blades only

On each Blade server:
1. Access the iLO card.
2. Create the username and password. Each server must have the same username and password.
3. Select the Power Management tab.
4. For Automatically Power On Server, select No.
5. Select Submit.

Figure 2 iLO server power controls

2.1.7.2 DL160 G5, DL165c G5, DL165c G6, and DL180 G5 Servers

• IDE
◦ ATA/IDE: Enhanced
◦ Configure SATA as: IDE

IMPORTANT: The embedded SATA RAID Controller option is not supported. Do not select this option.

NOTE: These IDE settings only apply to the DL160 G5 Server.

• IPMI
◦ Serial Port assigned to: System
◦ Serial Port Switching: Disabled
◦ Serial Port Connection Mode: Direct
• LAN
◦ Share NIC mode: Disabled
◦ DHCP: Disabled
• Remote Access
◦ Remote access: Enabled
◦ Redirection: Always
◦ Terminal: VT100
• Boot Configuration
◦ Boot Order
1. Embedded NIC
2. Disk or Smart Array
◦ Embedded NIC1: Enabled

2.1.7.3 DL160 G6 Servers

• IPMI
◦ Serial Port assigned to: System
◦ Serial Port Connection Mode: Direct
• PCI
◦ NIC1 control: Enabled
◦ NIC1 PXE: Enabled
• SATA
◦ SATA#1 Controller Mode: AHCI
• Boot Configuration
◦ Boot Order
1. NIC
2. CD
3. Disk

2.1.7.4 SL2x170z G6 and DL170h G6 Servers

BIOS settings:

IMPORTANT: To enable BIOS updates, you must restart the server. You can restart the server with Ctrl+Alt+Delete immediately after leaving the BIOS, or you can physically restart the server by using the power switch on the server.

Figure 3 NIC2 on the SL2x170z G6 Server

• BIOS settings
◦ Post speedup: Enabled
◦ Numlock: Enabled
◦ Restore after AC loss: Last state
◦ Post F1 prompt: Delayed
• CPU setup
◦ Proc hyper threading: Disabled
• IDE configuration
◦ SATA controller mode: AHCI
◦ Drive: Enabled
◦ IDE timeout: 35
• Chipset ACPI configuration
◦ High Performance Event timer: Enabled
• IPMI serial port configuration
◦ Serial port assignment: BMC
◦ Serial port switching: Enabled
◦ Serial port connection mode: Direct
• LAN configuration
If your node is wired with the LO100i management port shared with NIC2:
◦ BMC NIC Allocation: Shared
◦ LAN protocol (HTTP, telnet, ping): Enabled

Otherwise, if your node is wired with a dedicated management port for LO100i:
◦ BMC NIC Allocation: Dedicated
◦ LAN protocol (HTTP, telnet, ping): Enabled
• Remote Access
◦ BIOS Serial console: Enabled
◦ EMS console support: Enabled
◦ Flow control: None
◦ Redirection after BIOS POST: Enabled
◦ Serial port: 9600 8,n,1
• Boot device priority
◦ Network (0500)
◦ Removable device
◦ Hard Disk
• Enable PXE for the NIC that is connected to the administration network.

2.2 Preparing for installation

2.2.1 Insight CMU kit delivery

The Insight CMU kit is delivered on DVD in the appropriate format for your operating system, which enables the Insight CMU files to be installed directly from the DVD to your disk. The Linux versions of Insight CMU are in the Red Hat Package Manager (RPM) format.

2.2.2 Preinstallation limitations

• Insight CMU monitors only the compute nodes, not the infrastructure of the cluster.
• For cloning Linux images:
◦ Insight CMU requires that each partition of the golden image node is at least 50% free. Alternatively, if this condition cannot be satisfied, the largest partition of the golden node must be less than or equal to the compute node memory size.
◦ Insight CMU does not support software RAID on compute nodes.
◦ Insight CMU clones only one disk or logical drive per compute node.
• Limitations for backup and cloning of Windows images:

IMPORTANT: Windows is only available on specific Moonshot cartridges. For details, see the latest version of the Insight CMU Release Notes available from the Insight CMU website: http://www.hpe.com/info/icmu. Under Related Links, click Technical Support / Manuals→Manuals.

◦ The Windows backup and cloning feature is supported only on specific Moonshot cartridges. No other platforms are supported.
◦ Insight CMU can back up and clone only one disk per compute node.

◦ Windows dynamic disks are not supported. Only Windows basic disks are supported.
◦ When multiple (>1) primary and (>1) logical partitions are present in a Windows backup image, the drive letters (for example, D:, E:) assigned to the partitions on the cloned nodes are not consistent with the golden node.
◦ The local “Administrator” account and desktop folder are reset on the cloned nodes. Any content placed in the “Administrator” desktop directory is lost after cloning.
◦ Cloned nodes reboot twice after the first disk boot for host-specific customizations.
◦ The GPT partition table is not supported.

IMPORTANT: Insight CMU does not support RAID arrays created by B110i RAID controllers (for example, on the SL4545 G7). Any attempt to back up or clone such RAID arrays will fail.

NOTE: You can partially overcome some of these limitations by using a reconfiguration script after cloning. For more information about reconfiguration, see “Reconfiguration” (page 87).

2.2.3 Operating system support

IMPORTANT: All OS minor releases, updates, and service packs (for example, RHEL7 Update 2, SLES12 SP1) of a given OS major version (RHEL7 or SLES12) are supported. The Insight CMU team regularly qualifies and delivers patches (if needed) for new OS minor releases as they become available.

Insight CMU software is generally supported on Red Hat Enterprise Linux (RHEL) 5, 6, and 7, and SUSE Linux Enterprise Server (SLES) 11 and 12. The Insight CMU diskless environment is supported on RHEL 6 and 7, and SLES 11 and 12. Ubuntu 12.x, 13.x, and 14.x are supported on the compute nodes only, on HPE Ubuntu certified servers. Debian is supported on the compute nodes only, but requires active approval and verification from Hewlett Packard Enterprise. CentOS and Scientific Linux are supported on the compute nodes and the management nodes, but require active approval and verification from Hewlett Packard Enterprise. In either case, contact Hewlett Packard Enterprise for support. The following Windows operating systems are supported only on specific Moonshot server cartridges:
• Windows 7 SP1 64-bit
• Windows Server 2012 64-bit
• Windows Server 2012 R2 64-bit

IMPORTANT: For the most recent information on operating systems and hardware supported, see the latest version of the Insight CMU Release Notes available from the Insight CMU website: http://www.hpe.com/info/icmu. Under Related Links, click Technical Support / Manuals→Manuals.

2.2.3.1 RHEL 7 support

Insight CMU v8.0 supports RHEL 7 on the management node and compute nodes. Insight CMU continues to support a mix of operating systems. For example, RHEL 7 is not required on the management node even if RHEL 7 is installed on your compute nodes. As with all Insight CMU releases, backup images from previous Insight CMU versions can be used with v8.0.

HPE Smart Array warning with RHEL 7 and future Linux releases

Older Smart Array controllers (cciss driver), such as the HPE Smart Array P400, P800, and P700m, are not supported in RHEL 7. You cannot autoinstall or clone RHEL 7 images to compute nodes with these legacy controllers. However, you can continue to use RHEL 5.x, 6.x, SLES 11, and Ubuntu images for such nodes.

Network device naming in RHEL 7

If the backup node and the cloned nodes do not have homogeneous hardware configurations, RHEL 7 may assign different names to the network adapters (NICs) on the cloned nodes than on the backup node. This results in admin network NIC IP address configuration failures on the cloned nodes. To avoid this, verify that the backup node and cloned nodes are homogeneously configured, and use separate images for different groups of heterogeneous nodes. For example, on the backup node the admin network NIC is recognized as "eno1" by RHEL 7, while on another, heterogeneous compute node the admin network NIC may come up as "ens3f0" after cloning. Because the backup image contains only the "ifcfg-eno1" config file, the cloning process configures that file with an IP address instead of "ifcfg-ens3f0". The RHEL 7 naming scheme also affects some NIC-related monitoring metrics such as "eth0_MB/s_rx" and "eth1_MB/s_rx". Add the appropriate network interface ACTION to the /opt/cmu/etc/ActionAndAlertsFile.txt file and restart monitoring.

NOTE: For more information, see Disable consistent NIC device names during RHEL 7.x autoinstall (page 75). Also see the consistent network device naming in the RHEL 7 networking guide.

2.2.5 Insight CMU DVD directory structure

The directory structure of the Insight CMU DVD is organized as described in Table 1.

Table 1 Directory structure

Subdirectory Contents

Linux Insight CMU kit for X86_64:
• CMU-.x86_64.rpm (Insight CMU v8.0 for X86_64)
• cmu-windows-moonshot-addon-.noarch.rpm (add-on for installing Windows OS on specific x86_64 Moonshot cartridges)
• cmu-arm32-moonshot-addon-.noarch.rpm (add-on for supporting specific Moonshot cartridges based on ARM32)
• cmu-arm64-moonshot-addon-.noarch.rpm (add-on for supporting specific Moonshot cartridges based on ARM64)

Tools Useful tools that can be used in conjunction with Insight CMU

Sources Source code for Open-Source components used by Insight CMU

Documentation Documentation and release notes

Licenses Contains the following licenses: Apache_LICENSE-2_0.txt, gluegen_LICENSE.txt, jogl_LICENSE.txt, sha512_LICENSE.txt.

2.2.5 Insight CMU installation checklist The following list summarizes the steps needed to install Insight CMU on your HPC cluster:

Preparing the management node:
1. For hardware requirements, see “Management node hardware requirements” (page 13).
2. Perform a full installation of your base OS on your management node.
3. Install the required rpm files. For more information, see “Installation procedures” (page 23).
4. Install OpenJDK 7+ or Oracle Java Runtime Environment version 1.7u45 or later.
5. Install the Insight CMU rpm.
6. Install the Insight CMU license.
7. Configure Insight CMU.
8. Start Insight CMU.
9. Configure Insight CMU to start automatically.

Preparing the compute nodes: For instructions on how to prepare the compute nodes for installation, see “Planning for compute node installation” (page 15).

Preparing the GUI client workstation:
1. Install OpenJDK 7+ or Oracle Java Runtime Environment version 1.7u45 or later.
2. Optional: Copy cmugui_standalone.jar.

2.2.6 Login privileges

To install Insight CMU, you must be logged in as root and have administrator privileges on the installation node. If relevant for the cluster, you must know the password of the management cards on the compute nodes.

2.2.7 SELinux and Insight CMU

Hewlett Packard Enterprise recommends disabling SELinux on the management node and on the compute node used to create the golden image. To disable SELinux in RHEL versions, set SELINUX=disabled in the /etc/sysconfig/selinux file and restart the node. If you must use SELinux with Insight CMU, contact support.

2.3 Installation procedures

IMPORTANT: All steps in this section must be performed on the node designated as your Insight CMU management node.

1. Install a base operating system on your Insight CMU management node and perform any configuration steps that are necessary for the node to work within your local environment (for example, configure DNS, set up ntp time synchronization, and so on). For more information on which operating systems are supported on the Insight CMU management node, see “Operating system support” (page 21). The following rpms must be installed on the Insight CMU management node. Any missing rpms are flagged as dependencies when the Insight CMU rpm is installed and must be installed to continue the installation.
a. acl
b. attr
c. bc
d. dhcp
e. ed
f. expect
g. ipmitool

h. libcurl
i. net-tools
j. NFS
k. OpenJDK 7+ or Oracle Java Runtime Environment 1.7u45 or later
l. OpenSSL
m. perl-IO-Socket-SSL
n. perl-Net-SSLeay
o. psmisc
p. rsync
q. Samba (required only if you intend to use the Insight CMU Windows autoinstall functionality, which is supported only on specific Moonshot cartridges)
r. tcl-8
s. telnet
t. tftp client
u. tftp server
v. xinetd
If you are using firewalls on the Insight CMU management node or GUI client workstation, configure them to enable the following ports:
• On the Insight CMU management node:
◦ External network interface
– RMI registry traffic (tcp ports 1099, 49150)
– Webserver port (tcp 80)
– ssh server (tcp 22)

NOTE: If CMU_REST_HOST is not 127.0.0.1 (that is, localhost) in the /opt/cmu/etc/cmuserver.conf file, the value specified in CMU_REST_HTTPS_PORT must be added to the firewall.

◦ Internal (Admin/Compute) network interface
– Allow all incoming and outgoing traffic. The admin NIC should be a trusted interface or in an “Internal Zone”.
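On management nodes running firewalld, the rules above can be sketched as follows. This is a dry-run (the RUN=echo variable only prints the commands); the internal interface name eth0 is an assumption, so substitute your admin NIC and remove the leading "echo" to apply.

```shell
# Dry-run sketch for firewalld; remove RUN=echo to apply for real.
RUN=echo
# External interface: RMI registry (1099, 49150), webserver (80), ssh (22)
for p in 1099/tcp 49150/tcp 80/tcp 22/tcp; do
  $RUN firewall-cmd --permanent --add-port="$p"
done
# Internal (admin/compute) interface: trust it entirely (name is hypothetical)
$RUN firewall-cmd --permanent --zone=trusted --add-interface=eth0
$RUN firewall-cmd --reload
```

Sites using iptables directly or SuSEfirewall2 need the equivalent rules in those tools instead.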

NOTE: Modify CMU_THTTPD_PORT in /opt/cmu/etc/cmuserver.conf to start the Insight CMU webserver on a different port.
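The rpm dependency list in step 1 can be installed in one pass. The following is a dry-run sketch; the package names are assumptions for RHEL/CentOS (for example, "NFS" maps to nfs-utils and the tftp client/server map to tftp and tftp-server) and differ on SLES, so adjust the list and remove the leading "echo" before running.

```shell
# Dry-run sketch: RUN=echo only prints the command. Package names are
# assumed RHEL/CentOS equivalents of the dependency list in step 1.
RUN=echo
PKGS="acl attr bc dhcp ed expect ipmitool libcurl net-tools nfs-utils \
java-1.7.0-openjdk openssl perl-IO-Socket-SSL perl-Net-SSLeay psmisc rsync \
samba tcl telnet tftp tftp-server xinetd"
$RUN yum install -y $PKGS
```

Anything left out here is still caught at rpm install time, because missing packages are flagged as dependencies as noted above.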

2. Download and install OpenJDK 7+ or Oracle Java version 1.7u45 or later.
3. Install Insight CMU.
a. Import the Insight CMU rpm signing key.
# rpm --import /mnt/cmuteam-rpm-key.asc

NOTE: If you do not import the cmuteam-rpm-key, a warning message similar to the following is displayed when you install the Insight CMU rpm:

warning: REPOSITORY/cmu-8.0-1.x86_64.rpm: Header V4 DSA/SHA1 Signature, key ID b59742b4: NOKEY
b. Install the Insight CMU rpm.
# rpm -ivh REPOSITORY/cmu-8.0-1.x86_64.rpm
Preparing... ########################################### [100%]
1:cmu ########################################### [100%]

post-installation...

post-installation of x86_64 tree.....done ..done...

detailed log is /opt/cmu/log/cmu_install-Tue_Mar_22_19:48:35_IST_2016

******************************************************************************** * * * license enforcement has changed in cmu v8.0 * * older cmu v7.x license keys will not work in this version * * obtain new license keys for v8.0, otherwise a built-in one time evaluation * * lic key for 120 days will be automatically activated during service startup * * * * optional next steps: * * * * - install additional cmu packages (ex: cmu-windows-moonshot-addon) * * * * - restore a cluster configuration with /opt/cmu/tools/restoreConfig * * - complete the cmu management node setup: /opt/cmu/bin/cmu_mgt_config -c * * - setup CMU HA (more than one mgt node): /opt/cmu/tools/cmu_ha_postinstall * * * * after setup is finished, unset audit mode and start cmu : * * * * /etc/init.d/cmu unset_audit * * * * /etc/init.d/cmu start * *

NOTE: Insight CMU has dependencies on other rpms (for example, dhcp). If any missing dependencies are reported, install the required rpms and repeat this step.

c. Install the Insight CMU Windows Moonshot add-on rpm. Insight CMU v8.0 supports autoinstall, backup, and cloning of select Windows images for supported Moonshot cartridges. For more information, see “Preinstallation limitations” (page 20). If you intend to use Insight CMU Windows support, install cmu-windows-moonshot-addon-8.0-1.noarch.rpm.
# rpm -ivh cmu-windows-moonshot-addon-8.0-1.noarch.rpm
Preparing... ########################################### [100%]
1:cmu-windows-moonshot-ad########################################### [100%]
post-installation payload...
d. Install the Insight CMU ARM32 Moonshot add-on rpm. Insight CMU v8.0 supports provisioning of Moonshot cartridges based on the ARM32 architecture. If you intend to enable ARM32 support, install cmu-arm32-moonshot-addon-8.0-1.noarch.rpm.
# rpm -ivh cmu-arm32-moonshot-addon-8.0-1.noarch.rpm
Preparing... ########################################### [100%]
1:cmu-arm32-moonshot-addo########################################### [100%]
post-installation of armv7l tree.....done
e. Install the Insight CMU ARM64 Moonshot add-on rpm. Insight CMU v8.0 supports provisioning of Moonshot cartridges based on the ARM64 architecture. If you intend to enable ARM64 support, install cmu-arm64-moonshot-addon-8.0-1.noarch.rpm.
# rpm -ivh cmu-arm64-moonshot-addon-8.0-1.noarch.rpm
Preparing... ########################################### [100%]
1:cmu-arm64-moonshot-addo########################################### [100%]
post-installation of aarch64 tree.....done
4. Install your Insight CMU license. Insight CMU v8.0 requires a valid node license key for each rack/Blade/SL server registered in the cluster. A separate chassis license key is required for each Moonshot chassis, regardless of the number of cartridges inside a single chassis.

IMPORTANT: Customers upgrading from v7.3.2 or earlier versions of Insight CMU to v8.0 must obtain new license keys. The Insight CMU license key format has changed in v8.0. Keys for v7.3.2 or earlier versions do not work with Insight CMU v8.0 and later versions. For more information, see “Installing your Insight CMU license” (page 41).

If the license checks fail during Insight CMU service startup, a one-time evaluation license key (valid for 120 days) is automatically activated. This enables customers to proceed with Insight CMU installation and upgrade activities without waiting for the permanent license keys for v8.0. Insight CMU stops working if the permanent keys are not installed within 120 days after the installation or upgrade.

The procedure to obtain license keys is provided with the Entitlement Certificate. For more information, contact Hewlett Packard Enterprise support. Copy the content of all license key files to /opt/cmu/etc/cmu.lic.

NOTE: The newly added license keys will be effective only after starting cmu service. Example v8.0 node license key: ACTG C9MA H9PY KHU2 VKB5 HXWF 49JL KM7L B89H MZVU DXAU PRAD EEPG L762 SF32 FYB4 ALOK D5KM AFVW TT5J F73K 7U6K CM3L YFC2 CNBW XWLB VW89 Y55D N2SS L79Q XJUL LUQH TU8F 3DQC SAAN VEQW V886 NARE NWAT J8U3 2YT5 RNNV MHS3 QKTQ 688U CLEM ENTE Z2DX ABHI 4CP4 3F9N JQY5 "TESTCMU80322162 BD476A HPE_Insight_CMU_3yr_24x7_Flex_Lic 9D56AAUTHYD9" Example v8.0 Moonshot chassis license key: MD4Y A99A ANDY CHW3 VKB5 HWWN Y9JL KMPL B89H MZVU 8R4S LHWE 99QX X5T8 CMRG HPMR 4FVU A5K9 UHNK ACXK CHRI SWWP USD7 EPZY CNBW XWLB VW89 Y55D N2SS L79Q XJUL LUQH TU8F 3DQC SW7N VEQW NARE NE23 YWAT J8U3 2YT5 RNNV MHS3 QKTQ 688U F8A7 F8SE Z2DX SEBC 4CP4 3F9N JQY5 "TESTCMU80322162 D9Y34A HPE_Insight_CMU_Moonshot_3y24x7_Flex_Lic 93UUAAUT7Y3J" 5. Configure Insight CMU. To configure Insight CMU, run /opt/cmu/bin/cmu_mgt_config -c. The following is an example of executing the command on a management node running Red Hat Linux. In this example, the management node has the Insight CMU compute nodes connected to eth0 and has a second network on eth1 as a connection outside the cluster. # /opt/cmu/bin/cmu_mgt_config -c Checking that SELinux is not enforcing... [ OK ] Checking if firewall is down/disabled... [ OK ] Checking tftp for required configuration... Making required changes to tftp... [ OK ] Starting/restarting xinetd services... Stopping xinetd: [ OK ] Starting xinetd: [ OK ] Checking that NFS is configured... [ WARNING ]

The NFS config file doesn't specify a number of NFS threads. HPE Insight CMU recommends a minimum of 256 NFS threads on the management node regardless of cluster size.

Number of NFS threads to configure? [256] Configuring /etc/sysconfig/nfs ... [ OK ] Setting CMU_MIN_NFSD_THREADS in cmuserver.conf to 256 ... [ OK ] Checking NFS status... Starting NFS and configuring NFS to start on boot... Starting NFS services: [ OK ] Starting NFS quotas: [ OK ] Starting NFS mountd: [ OK ] Starting NFS daemon: [ OK ] Starting RPC idmapd: [ OK ] Checking dhcp for required configuration Locating dhcp file... [ OK ] Which eth should CMU use to access the compute node network? 0) eth0 10.117.20.78 1) eth1 16.117.234.140 :0 Checking if dhcpd interface is configured for "eth0"... Configuring dhcpd interface for "eth0"... [ OK ]

Checking existence of /etc/init.d/dhcpd... Setting dhcpd to start on (re)boot... Checking if CMU is configured to use 10.117.20.78 Setting CMU_CLUSTER_IP in cmuserver.conf to 10.117.20.78 [ OK ] Checking for competing dhcp servers... [ OK ] Checking for required sshd configuration... Making required changes to sshd... [ OK ] Restarting sshd Stopping sshd: [ OK ] Starting sshd: [ OK ] Checking/Configuring self-authorized key... [ OK ] Checking root passwordless ssh to this host (0.0.0.0)... [ OK ] Checking /etc/hosts for required configuration Status of /etc/hosts ... [ OK ] Checking if CMU supports the default java version... [ OK ]

The following files were modified by cmu_mgt_config

modified file: /etc/xinetd.d/tftp backup copy: /opt/cmu/etc/cmu_tftp_before_cmu_mgt_config

modified file: /etc/sysconfig/nfs backup copy: /etc/sysconfig/cmu_nfs_before_cmu_mgt_config

modified file: /opt/cmu/etc/cmuserver.conf backup copy: /opt/cmu/etc/cmuserver.conf_before_cmu_mgt_config

modified file: /etc/sysconfig/dhcpd backup copy: /etc/sysconfig/cmu_dhcpd_before_cmu_mgt_config

modified file: /etc/ssh/sshd_config backup copy: /etc/ssh/cmu_sshd_config_before_cmu_mgt_config

This command can be rerun at any time to change your configuration without adversely affecting previously configured steps. You can also verify your current configuration by running /opt/cmu/bin/cmu_mgt_config -ti. For additional options and details on this command, run /opt/cmu/bin/cmu_mgt_config -h.
6. Start Insight CMU. After the initial rpm installation, Insight CMU is configured in audit mode. To run Insight CMU, unset audit mode and start the Insight CMU service.
# /etc/init.d/cmu unset_audit
cmu service needs (re)start

NOTE: On RHEL7.x and SLES12 SPx management nodes, do not use the service cmu status or systemctl status cmu.service commands for displaying the status of the cmu service. Instead, use the /etc/init.d/cmu status command.

# /etc/init.d/cmu start
starting tftp server check ... done
creating a new history database
cmu:core(standard) configured
cmu:backend running: GUI (RMI *:1099 and *:49150), REST API (https://localhost:8080)
cmu:cmustatus running
cmu:monitoring not running
cmu:dynamic custom groups unconfigured (cf ${CMU_PATH}/etc/cmuserver.conf CMU_DYNAMIC_UG_INPUT_SCRIPTS)
cmu:web service running (HTTP *:80)
cmu:nfs server running
cmu:samba server not running
cmu:dhcpd.conf configured ( subnet X.X.X.X netmask Y.Y.Y.Y {})
cmu:high-availability unconfigured

Where X.X.X.X is the subnet IP address and Y.Y.Y.Y is the netmask of the subnet served by the dhcp server. The output lists all the daemons started by the service and their status. Verify that the daemons are in their expected state.
core: Indicates whether the core components of Insight CMU are configured.
backend: Indicates whether java is running, and the interface used by java to receive commands from GUI clients and to send status back to them.

cmustatus: Indicates the status of the utility that checks the state of all the compute nodes.
monitoring: Indicates the status of the monitoring daemon that gathers the information reported by the small monitoring agent installed on the compute nodes.

NOTE: Because compute nodes are not installed on the cluster at this time, the monitoring agent is not started after the installation. This behavior is normal. The cluster must be configured for monitoring to start.

dynamic custom groups: Indicates the configuration status of dynamic custom groups.
web service: Indicates the status of the web service on the Insight CMU management node. By default, the web service listens on port 80. The Insight CMU GUI can be launched from a web browser that is pointed to the home page provided by the web service.
nfs server: Indicates the status of the NFS server.
samba server: Indicates the status of the Samba server.
dhcpd.conf: Indicates the status of the DHCPD configuration.
high-availability: Indicates whether the Insight CMU management node has been configured for high availability.
7. Configure Insight CMU to start automatically.

IMPORTANT: This step depends on the installed operating system and might have to be adapted to your specific installation.

NOTE: The /etc/init.d/cmu file is available as a result of the Insight CMU installation.

a. Select one of the following options:
• If your distribution supports chkconfig:
# chkconfig --add cmu

• If your distribution does not support chkconfig, add start and kill links in the rc.d directory:
# ln -s /etc/init.d/cmu /etc/rc.d/rc5.d/S99cmu
# ln -s /etc/init.d/cmu /etc/rc.d/rc5.d/K01cmu
# ln -s /etc/init.d/cmu /etc/rc.d/rc3.d/S99cmu
# ln -s /etc/init.d/cmu /etc/rc.d/rc3.d/K01cmu
b. After a system reboot, verify that the /var/log/cmuservice_.log file does not contain errors.
8. Install Insight CMU on the GUI client workstation.

2.4 Installing Insight CMU with high availability

If you are not using Insight CMU with high availability (HA), skip this section and go to the instructions on configuring the cluster in “Defining a cluster with Insight CMU” (page 49). A “classic” Insight CMU cluster has a single management server. If that server fails, the Insight CMU cluster continues to work for customer applications, but you lose management

functions such as backup, cloning, booting a compute node, and ssh through the Insight CMU GUI. If the Insight CMU cluster uses a private network for management, you also lose the connection to the site network. Installing and configuring Insight CMU under the control of HA software provides redundancy to avoid this Insight CMU service degradation. The following figure describes a “classic” Insight CMU cluster connected to two networks: the site network and a private cluster network where compute nodes are connected. The Insight CMU management server is known by its IP0 address on the site network, and by the IP1 address on the cluster network.

The next figure shows the corresponding configuration, where two servers can run the Insight CMU software in active or standby mode under control of HA software. Mgt server 1 and mgt server 2 are connected to form an Insight CMU management cluster. The IP addresses IP0 and IP1 are attached to whichever Insight CMU management server is running the Insight CMU software at a given time. The Insight CMU management cluster is known on the site network by the address IP0, and on the compute network by the address IP1. IP0 and IP1 are the only addresses Insight CMU recognizes. If the active server fails, IP0 and IP1 migrate to the other server. The two servers each have one IP address per network (IP2, IP3, IP4, IP5). The two servers are connected to shared storage, which hosts the Insight CMU directory. From the compute nodes and from the user's point of view, this configuration is perceived as a “classic” Insight CMU configuration with a single management server.

The next figure shows a “classic” Insight CMU cluster with one Insight CMU management server and compute nodes connected directly to the site network. A unique IP address, IP0, is used for compute node management and site network access.

The next figure shows the corresponding configuration with two Insight CMU management servers running Insight CMU software in active or standby mode under control of HA software. The address IP0 is attached to the server running the Insight CMU software. This is the unique address Insight CMU recognizes. Each Insight CMU management server has its own IP address on the site network, IP2 and IP3 respectively, unknown to Insight CMU.

2.4.1 HA hardware requirements

Insight CMU under HA control has the following hardware requirements:
• Two or more management servers
• Shared storage accessed by all management servers

2.4.2 Software prerequisites

In addition to the prerequisites described in “Preparing for installation” (page 20), you must install and configure the HA software of your choice.

2.4.3 Installing Insight CMU under HA

2.4.3.1 Overview

NOTE: To avoid confusion in this section, review the glossary definitions in “Glossary” (page 288).

When installing Insight CMU as an HA cluster service, Hewlett Packard Enterprise recommends completing a normal Insight CMU installation on one management server, as described in “Preparing for installation” (page 20) and “Installation procedures” (page 23). During this phase, Insight CMU can be used as a normal standalone installation. Compute nodes can be installed, backed up, and cloned. In the second phase of installation, you must install and configure the HA software of your choice. This software controls the Insight CMU HA service. After testing the configuration, enable the Insight CMU HA service by running the /opt/cmu/tools/cmu_ha_postinstall script. This operation moves some /opt/cmu directories to a single shared file system. This shared file system must be configured as a resource of the Insight CMU HA service. After this procedure is completed on the first Insight CMU management server, a second server can be installed with Insight CMU and added to the management cluster. This procedure is repeated for additional servers connected to the shared storage.

2.4.3.2 Insight CMU HA service requirements

When you configure the HA software layer, configure the Insight CMU HA service with the following resources:
• A shared file system. The mount point of this file system must be /opt/cmu-store and must be created on all Insight CMU management servers.
• A shared IP address.
• If your Insight CMU cluster uses separate site and compute networks, an additional IP address resource configured and assigned to your Insight CMU HA service.
The Insight CMU HA service must be able to invoke the /etc/init.d/cmu script with start and stop parameters:
# /etc/init.d/cmu start
# /etc/init.d/cmu stop

NOTE: On RHEL7.x and SLES12 SPx management nodes, do not use the service or systemctl commands for actions such as start, stop, status, set_audit, and unset_audit. Instead, use the /etc/init.d/cmu command.
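As an illustration only, with Pacemaker as the HA software layer these resources might be declared along the following lines. This sketch is not part of the Insight CMU product documentation: the resource names, group name, volume device, file system type, and IP address are assumptions for this example, and the exact syntax depends on your HA software and version. Note that lsb:cmu maps to the /etc/init.d/cmu script, which satisfies the start/stop requirement described above.

```shell
# Hypothetical Pacemaker configuration sketch (adapt names and values):
# pcs resource create cmu-store-fs Filesystem device=/dev/vg_cmu/lv_cmu directory=/opt/cmu-store fstype=ext4 --group insight-cmu
# pcs resource create cmu-cluster-ip IPaddr2 ip=192.168.1.250 cidr_netmask=24 --group insight-cmu
# pcs resource create cmu-service lsb:cmu --group insight-cmu
```

Grouping the three resources makes them start and stop together on the same management server.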

2.4.3.3 Installing and testing

1. Install the Insight CMU rpm as described in “Installation procedures” (page 23). Be sure all prerequisites are fulfilled.
2. To test Insight CMU on the first cluster member, run the Insight CMU software. Verify that the Insight CMU software performs correctly.

2.4.4 Configuring HA control of Insight CMU

IMPORTANT: During the following procedure, results of the /etc/init.d/cmu script are saved to /var/log/cmuservice_hostname.log, where hostname is the host name of the Insight CMU management cluster member. Refer to these log files for troubleshooting.

1. If Insight CMU is running, set audit mode before you stop Insight CMU:
cmuadmin1# /etc/init.d/cmu set_audit
cmuadmin1# /etc/init.d/cmu stop
2. In /opt/cmu/etc/cmuserver.conf, the CMU_CLUSTER_IP variable defines the IP address used by compute nodes to reach the management cluster. This variable must be set to the same IP address defined as the shared IP address resource for the compute network on the Insight CMU HA service:
CMU_CLUSTER_IP=X.X.X.X
Where X.X.X.X is the address known by compute nodes to reach the management node.
3. Check the status of the following shared HA resources and ensure that they are available. To check the status, use the appropriate HA command, for example clustat.
• The shared file system mounted on /opt/cmu-store
• The shared cluster IP address
If the shared resources are not available, first start the Insight CMU cluster HA service using the appropriate command (for example clusvcadm or pcs) for that particular HA software.
4. Run cmu_ha_postinstall:
cmuadmin1# /opt/cmu/tools/cmu_ha_postinstall
*** starting setup procedure to operate CMU in HA (Highly Available) environment
*** note: this only affects the management nodes of the HPC cluster

********************************************************************** requirements to building an Highly Available cluster of cmu mgt nodes: ********************************************************************** * * * 1] a shared filesystem mounted at /opt/cmu/cmu-store * * * * it must support locking via flock() * * it must be mounted only by one (active) cmu mgt node at a time * * it must be NFS exportable (for autoinstall/diskless/backup/cloning)* * using version 3 of the NFS protocol * * * * 2] (at least) one alias IP address: * * * * this is the address used by the compute nodes to contact the mgt * * service, set CMU_CLUSTER_IP into /opt/cmu/etc/cmuserver.conf * * this address should follow the active cmu management machine * * * * [ optionally: a site alias IP address ] * * * * 3] a third-party HA software: * * * * this software is responsible for: * * * * - mounting/unmounting the /opt/cmu-store filesystem * * - activating/removing the alias IP address(es) * * - using /etc/init.d/cmu start|stop|status ( NOT /opt/cmu/cmuserver)* * * ********************************************************************** do you want to continue (at this stage only /opt/cmu-store is necessary) ?(y/N)y

setting cmu in audit mode

*** CMU is currently in 'audit' mode on cmuadmin1
*** use: '/etc/init.d/cmu unset_audit' to unset audit mode

copying the content of /opt/cmu/image to shared storage, please be patient, this could take time....
`/opt/cmu/image/spp' -> `/opt/cmu-store/image/spp'
copying the content of /opt/cmu/history to shared storage, please be patient, this could take time....
`/opt/cmu/history/snapshots' -> `/opt/cmu-store/history/snapshots'
`/opt/cmu/history/groups' -> `/opt/cmu-store/history/groups'
`/opt/cmu/history/db' -> `/opt/cmu-store/history/db'
`/opt/cmu/history/db/history.sqlite3' -> `/opt/cmu-store/history/db/history.sqlite3'

`/opt/cmu/history/archives' -> `/opt/cmu-store/history/archives'
cmu_ha: saving local cmu config in:/opt/cmu-store/etc/savedConfig/cmuconf0-a9e45a7bb2e1d58e5f5272c3a32a6680.sav
cmu_ha: saving clusterwide cmu config in:/opt/cmu-store/etc/savedConfig/cmuconf1-606d634b07d4f94a71206b2a4d1e1265.sav

***
*** run: "/opt/cmu/bin/cmu_mgt_config -c" now
*** in particular in case of a version upgrade
***

cmu_ha: all existing cmu data has been saved
cmu_ha: if required, restore it with:

/opt/cmu/tools/restoreConfig -f /opt/cmu/etc/savedConfig/

`/opt/cmu/etc/cmu.pd' -> `/opt/cmu-store/etc/cmu.pd'

***
*** the content of /opt/cmu/etc has been saved into /opt/cmu/etc-pre_HA_backup-oy557G
*** it will not be used by cmu from now on
***

*** successfully configured CMU for HA operation
*** install CMU similarly on the other management node(s) of the cluster

management node(s) configured so far: cmuadmin1

5. Start cmuserver and correct any possible errors. Because Insight CMU is still in audit mode, it is not actually started but will detect possible configuration errors:
# /etc/init.d/cmu start
For example:
cmuadmin1# /etc/init.d/cmu start
cmu:core(highavailability) configured (audit mode)
starting tftp server check ... done

*** error : CMU_JAVA_BIN can not be set to 'path' for HA setup
*** adjust CMU_JAVA_BIN into /opt/cmu/etc/cmuserver.conf

cmu:core(standard) failed configuring (audit mode)
cmu:backend not running (audit mode)
cmu:cmustatus not running (audit mode)
cmu:monitoring not running (audit mode)
cmu:dynamic custom groups unconfigured (cf ${CMU_PATH}/etc/cmuserver.conf CMU_DYNAMIC_UG_INPUT_SCRIPTS) (audit mode)
cmu:web service not running (audit mode)
cmu:nfs server running (audit mode)
cmu:dhcpd.conf (unconfigured)
cmu:cmu_ha_postinstall done (audit mode)
cmu:/opt/cmu-store mounted (audit mode)

6. Set CMU_JAVA_BIN to the appropriate Java binary path in /opt/cmu/etc/cmuserver.conf to fix the Java error. Start the cmu service again as described in the previous step and verify that all the configuration errors are fixed.
7. Unset audit mode:
cmuadmin1# /etc/init.d/cmu unset_audit
cmu ha:cmu service needs (re)start
This command does not actually start Insight CMU. It only clears the audit mode so that Insight CMU can be started by the HA tool.
8. Run the appropriate HA software command (for example clusvcadm or pcs) to restart the Insight CMU cluster HA service.
9. To verify that Insight CMU is still running correctly, review the /var/log/cmuservice_hostname.log file for errors.
10. Install and configure Insight CMU on additional management cluster members. Installing a new cluster member is basically the same as configuring the first member.
a. Using the same procedure as for the first member of the cluster, install the Insight CMU rpm.

b. If you started Insight CMU in standalone mode on the currently operating Insight CMU management server, put Insight CMU in audit mode and stop it before migrating the HA service:
# /etc/init.d/cmu set_audit
# /etc/init.d/cmu stop
c. Migrate the HA service to the server that will perform the post-installation procedure.
d. On the second member of the HA cluster, save the cluster-wide configuration to a local disk directory:
# /opt/cmu/tools/saveConfig.tcl -p /root/ -c /opt/cmu-store/
e. Run the post-installation procedure on the second member of the HA cluster:
cmuadmin2# /opt/cmu/tools/cmu_ha_postinstall
*** starting setup procedure to operate CMU in HA (Highly Available) environment
*** note: this only affects the management nodes of the HPC cluster

********************************************************************** requirements to building an Highly Available cluster of cmu mgt nodes: ********************************************************************** * * * 1] a shared filesystem mounted at /opt/cmu/cmu-store * * * * it must support locking via flock() * * it must be mounted only by one (active) cmu mgt node at a time * * it must be NFS exportable (for autoinstall/diskless/backup/cloning)* * using version 3 of the NFS protocol * * * * 2] (at least) one alias IP address: * * * * this is the address used by the compute nodes to contact the mgt * * service, set CMU_CLUSTER_IP into /opt/cmu/etc/cmuserver.conf * * this address should follow the active cmu management machine * * * * [ optionally: a site alias IP address ] * * * * 3] a third-party HA software: * * * * this software is responsible for: * * * * - mounting/unmounting the /opt/cmu-store filesystem * * - activating/removing the alias IP address(es) * * - using /etc/init.d/cmu start|stop|status ( NOT /opt/cmu/cmuserver)* * * ********************************************************************** do you want to continue (at this stage only /opt/cmu-store is necessary) ?(y/N)y

setting cmu in audit mode

*** CMU is currently in 'audit' mode on cmuadmin2
*** use: '/etc/init.d/cmu unset_audit' to unset audit mode

ls: cannot access /opt/cmu/image: No such file or directory
ls: cannot access /opt/cmu/history: No such file or directory
cmu_ha: saving local cmu config in:/opt/cmu-store/etc/savedConfig/cmuconf2-003c4e71038c92786ded4ecce76646bc.sav
cmu_ha: saving clusterwide cmu config in:/opt/cmu-store/etc/savedConfig/cmuconf3-c70121126ee8c630798ef5281e0a8e8d.sav
[INFO] copied existing cmu.lic file to /opt/cmu-store/etc/cmu.lic_before_restore_2016-02-24_11:38:06

before overwriting it
[INFO] if /opt/cmu-store/etc/cmu.lic does not have the latest lic keys after restoreConfig step, check the /opt/cmu-store/etc/cmu.lic_before_restore_2016-02-24_11:38:06 file and copy the appropriate keys
ls: cannot access /opt/cmu/image: No such file or directory

***
*** run: "/opt/cmu/bin/cmu_mgt_config -c" now
*** in particular in case of a version upgrade
***

cmu_ha: all existing cmu data has been saved
cmu_ha: if required, restore it with:

/opt/cmu/tools/restoreConfig -f /opt/cmu/etc/savedConfig/

`/opt/cmu/etc/cmu.pd' -> `/opt/cmu-store/etc/cmu.pd'

***
*** the content of /opt/cmu/etc has been saved into /opt/cmu/etc-pre_HA_backup-DanMAh

*** it will not be used by cmu from now on ***

*** successfully configured CMU for HA operation
*** install CMU similarly on the other management node(s) of the cluster

management node(s) configured so far: cmuadmin1 cmuadmin2

IMPORTANT: The following warnings might appear while running the cmu_ha_postinstall command. Fix these warnings before proceeding by following the instructions listed in the warnings.

*** warning: both /opt/cmu/image and /opt/cmu-store/image are non-empty.
1- identify which image directory is required for operating the CMU cluster
2- copy them (/opt/cmu/image -> /opt/cmu-store/)
3- backup the local image dir (mv /opt/cmu/image /opt/cmu/image_bak)
4- restart this script (/opt/cmu/tools/cmu_ha_postinstall)

*** warning: both /opt/cmu/history and /opt/cmu-store/history are non-empty.

1- identify which history directory is required for operating the CMU cluster
2- copy them (/opt/cmu/history -> /opt/cmu-store/)
3- backup the local history dir (mv /opt/cmu/history /opt/cmu/history_bak)
4- restart this script (/opt/cmu/tools/cmu_ha_postinstall)
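These warnings occur when an image or history directory exists non-empty both on the local disk and on the share. This condition can be checked for ahead of time. The following is an illustrative sketch only: the helper name is invented here, and the directory paths are taken from the warning text:

```shell
# Hedged sketch: report when a directory tree exists non-empty in both
# the local /opt/cmu location and the /opt/cmu-store share.
both_nonempty() {
    [ -n "$(ls -A "$1" 2>/dev/null)" ] && [ -n "$(ls -A "$2" 2>/dev/null)" ]
}

for dir in image history; do
    if both_nonempty "/opt/cmu/$dir" "/opt/cmu-store/$dir"; then
        echo "conflict: both /opt/cmu/$dir and /opt/cmu-store/$dir are non-empty"
    fi
done
```

If a conflict is reported, resolve it with the steps listed in the warning before rerunning cmu_ha_postinstall.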

f. Restore the cluster-wide configuration saved in Step d:
# /opt/cmu/tools/restoreConfig -f /root/cmuconf0-a7b7b2099e4456b7fe49c4ee10c9c5d6.sav
g. Unset the audit mode on the new member:
# /etc/init.d/cmu unset_audit
cmu ha:cmu service needs (re)start
h. Run the appropriate HA software command (for example clusvcadm or pcs) to restart the Insight CMU cluster HA service.
i. Use the appropriate HA software command to migrate the Insight CMU HA service to the first member of the HA cluster.
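Step 2 of this procedure set CMU_CLUSTER_IP in cmuserver.conf; after the HA service is configured, it is worth confirming that the value still matches the shared IP address resource. The following is a hedged sketch: the helper name is invented, and the sample file and address are placeholders (on a real management server, check /opt/cmu/etc/cmuserver.conf against your HA resource definition):

```shell
# Hedged sketch: compare the CMU_CLUSTER_IP value in a cmuserver.conf
# file against the address configured as the HA shared IP resource.
check_cluster_ip() {
    conf=$1 expected=$2
    actual=$(sed -n 's/^CMU_CLUSTER_IP=//p' "$conf" | tail -1)
    if [ "$actual" = "$expected" ]; then
        echo "OK: CMU_CLUSTER_IP=$actual"
    else
        echo "MISMATCH: CMU_CLUSTER_IP='$actual', expected '$expected'"
    fi
}

# Example against a sample file standing in for the real configuration:
printf 'CMU_CLUSTER_IP=192.168.1.250\n' > /tmp/cmuserver.conf.sample
check_cluster_ip /tmp/cmuserver.conf.sample 192.168.1.250
```

A MISMATCH result means compute nodes would try to reach the management cluster at an address the HA software does not float between servers.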

2.4.5 Insight CMU configuration considerations

The HA installation procedures described in “Installing Insight CMU under HA” (page 31) and “Configuring HA control of Insight CMU” (page 32) convert a standalone Insight CMU administration node into an Insight CMU HA administration cluster. During this procedure, the cmu_ha_postinstall script behaves as follows with respect to Insight CMU configurations:
• The cmu_ha_postinstall script saves the local Insight CMU configuration of individual servers. For the configuration example in “Configuring HA control of Insight CMU” (page 32), cmuconf0-xxxx.sav is the local configuration file for the first server, and cmuconf1-yyyy.sav is the local configuration file for the second server. The local configuration is restored to the shared file system /opt/cmu-store on the first server that runs cmu_ha_postinstall. This Insight CMU configuration is made available to the other Insight CMU management servers.
• When cmu_ha_postinstall runs on the first Insight CMU management server, it does not save any cluster-wide configuration. The following message appears:
cmu_ha: nothing to backup from the cmu HA share

• When cmu_ha_postinstall runs on the second Insight CMU management server, it saves a cluster-wide configuration. For the configuration example in “Configuring HA control of Insight CMU” (page 32), this configuration file is cmuconf2-zzzz.sav.
In this example, you do not have to restore any configuration manually: all necessary save and restore operations are performed by the cmu_ha_postinstall script. Other scenarios might require manual intervention. For example, using different members of a future Insight CMU administration cluster in standalone mode might result in different configurations on each Insight CMU management server. The resulting cluster-wide configuration reflects only one of these configurations; these different configurations cannot be merged automatically.

2.4.6 Upgrading Insight CMU HA service

To upgrade an Insight CMU management cluster under the control of an HA software layer:
1. Set the cmu service to audit mode on server 1, which is the server currently running the Insight CMU HA service.
2. Save the existing cluster configuration on the local disk of server 1. This can be useful if any issues are encountered during the upgrade procedure:
# /opt/cmu/tools/saveConfig.tcl -p /root -c /opt/cmu-store
3. Remove any additional Insight CMU add-on rpms (for example, Windows, ARM32, or ARM64) installed on server 1. (Windows is only available on specific Moonshot cartridges.)
4. Remove the previous Insight CMU rpm on server 1.
5. Install the new Insight CMU rpm on server 1.
6. If required, install the appropriate Insight CMU add-on rpms (for example, cmu-windows-moonshot-addon) on server 1.
7. Relocate the Insight CMU HA service to server 2.
8. Set the cmu service to audit mode on server 2.
9. Remove any additional Insight CMU add-on rpms (for example, Windows, ARM32, or ARM64) installed on server 2.
10. Remove the previous Insight CMU rpm on server 2.
Make a note of the saved cluster configuration file name for restoring it later. The saved configuration file name looks like cmuconfX-1234567890.sav. For example:
# rpm -e cmu
cmustatus: no process killed
cmu:monitoring waiting to stop
thttpd: no process killed
cmu_dynamic_custom_groups: no process killed
cmu:core(standard) unconfigured (audit mode)
cmu:core(highavailability) unconfigured (audit mode)
cmu:backend not running (audit mode)
cmu:cmustatus not running (audit mode)
cmu:monitoring not running (audit mode)
cmu:dynamic custom groups unconfigured (cf ${CMU_PATH}/etc/cmuserver.conf CMU_DYNAMIC_UG_INPUT_SCRIPTS) (audit mode)
cmu:web service not running (audit mode)
cmu:nfs server running (audit mode)
cmu:dhcpd.conf configured ( subnet 16.16.184.0 netmask 255.255.248.0 { } )
cmu:cmu_ha_postinstall done (audit mode)
cmu:/opt/cmu-store mounted (audit mode)
note: use '/etc/init.d/cmu unset_audit' to unset cmu 'audit mode'
cmuconf3-67f007c46ad7fe23c94268c0fe1a4e50.sav
11. Install the new Insight CMU rpm on server 2.
12. If required, install the appropriate Insight CMU add-on rpms (for example, cmu-windows-moonshot-addon) on server 2.
13. Run cmu_ha_postinstall on server 2.
14. Run /opt/cmu/bin/cmu_mgt_config -c on server 2. While running cmu_mgt_config, select the interface configured with the shared cluster IP address.
15. Unset the audit mode on server 2.
16. Relocate the Insight CMU HA service to server 1.

17. Run cmu_ha_postinstall on server 1.
18. Restore the cluster-wide configuration on server 1 using the appropriate configuration file saved in Step 10. For example:
# /opt/cmu/tools/restoreConfig -f /opt/cmu-store/etc/savedConfig/cmuconf3-67f007c46ad7fe23c94268c0fe1a4e50.sav
19. Remove the old license keys and place the new Insight CMU v8.0 license keys in the /opt/cmu/etc/cmu.lic file.
20. Run /opt/cmu/bin/cmu_mgt_config -c on server 1. While running cmu_mgt_config, select the interface configured with the shared cluster IP address.
21. Start the cmu service with /etc/init.d/cmu start and check for any errors. Because Insight CMU is in audit mode at this stage, the cmu service is not actually started but shows possible configuration errors. For example:
# /etc/init.d/cmu start
cmu:core(highavailability) configured (audit mode)
starting tftp server check ... done

*** error : CMU_JAVA_BIN can not be set to 'path' for HA setup
*** adjust CMU_JAVA_BIN into /opt/cmu/etc/cmuserver.conf

cmu:core(standard) failed configuring (audit mode)
cmu:backend not running (audit mode)
cmu:cmustatus not running (audit mode)
cmu:monitoring not running (audit mode)
cmu:dynamic custom groups unconfigured (cf ${CMU_PATH}/etc/cmuserver.conf CMU_DYNAMIC_UG_INPUT_SCRIPTS) (audit mode)
cmu:web service not running (audit mode)
cmu:nfs server running (audit mode)
cmu:dhcpd.conf (unconfigured)
cmu:cmu_ha_postinstall done (audit mode)
cmu:/opt/cmu-store mounted (audit mode)

If you see any Java version related errors, set CMU_JAVA_BIN to the appropriate Java binary path in /opt/cmu/etc/cmuserver.conf.
22. If errors were reported and corrected in the previous step, start the cmu service again and verify that all configuration errors are corrected. Do not proceed until all errors are fixed. For example:
# /etc/init.d/cmu start
cmu:core(highavailability) configured (audit mode)
starting tftp server check ... done

cmu:core(standard) configured (audit mode)
cmu:backend not running (audit mode)
cmu:cmustatus not running (audit mode)
cmu:monitoring not running (audit mode)
cmu:dynamic custom groups unconfigured (cf ${CMU_PATH}/etc/cmuserver.conf CMU_DYNAMIC_UG_INPUT_SCRIPTS) (audit mode)
cmu:web service not running (audit mode)
cmu:nfs server running (audit mode)
cmu:dhcpd.conf configured ( subnet 192.168.1.0 netmask 255.255.255.0 { } )
cmu:cmu_ha_postinstall done (audit mode)
cmu:/opt/cmu-store mounted (audit mode)

note: use '/etc/init.d/cmu unset_audit' to unset cmu 'audit mode'
23. Unset the audit mode on server 1.
24. Using the appropriate command for your HA software, restart the Insight CMU HA service.

2.5 Upgrading Insight CMU

Complete the steps in this section if you are upgrading an existing Insight CMU system from a previous Insight CMU version.

2.5.1 Upgrading to v8.0: important information

IMPORTANT: Customers upgrading from v7.3.2 or earlier versions of Insight CMU to v8.0 must obtain new license keys. The Insight CMU license key format has changed in v8.0. Keys for v7.3.2 or earlier versions do not work with Insight CMU v8.0 and later versions. For more information, see “Installing your Insight CMU license” (page 41).

IMPORTANT: If you intend to use the Insight CMU Windows autoinstall functionality, you must install Samba. (Windows is supported only on specific Moonshot cartridges.)

The following environment variables are now kit properties and will be overwritten while restoring the saved cluster configuration:
• CMU_VALID_ARCHITECTURE_TYPES
• CMU_VALID_HARDWARE_TYPES
• CMU_PLATFORM_MAPPING_TABLE
• CMU_BIOS_SETTINGS_TOOL
• CMU_BIOS_SETTINGS_FILE
• CMU_KS
• CMU_KS_KERNEL
• CMU_KS_INITRD
• CMU_KS_PARMS
• CMU_KS_KERNEL_PARMS
• CMU_AY_KERNEL
• CMU_AY_INITRD
• CMU_AY_PARMS
• CMU_AY_KERNEL_PARMS
• CMU_DI_KERNEL
• CMU_DI_INITRD
• CMU_DI_PARMS
• CMU_DI_KERNEL_PARMS
• CMU_UI_KERNEL
• CMU_UI_INITRD
• CMU_UI_PARMS
• CMU_UI_KERNEL_PARMS
• CMU_WI_KERNEL
• CMU_WI_INITRD
• CMU_WI_KERNEL_PARMS
• CMU_WAIT_FOR_DEV_TIMEOUT
• CMU_NETBOOT_TIMEOUT
• CMU_ILO_OS_SHUTDOWN_TIMEOUT

After upgrading to Insight CMU v8.0, you must verify the cmuserver setup by running:
cmu_mgt_config -c

An upgrade to Insight CMU v8.0 from older versions may add new node-specific fields in the internal cmu database. To reinstall a previous version of Insight CMU, you must restore the previous configuration from backup. Previous versions of Insight CMU are unable to restore v8.0 backups.

2.5.2 Dependencies

2.5.2.1 64-bit versions on management node

Insight CMU is an x86 64-bit kit only and no longer runs on x86 32-bit hardware.

2.5.2.2 tftp client

Insight CMU requires the /usr/bin/tftp client on the management node to support its improved sanity checks.

2.5.2.3 Java version dependency

Insight CMU v8.0 requires OpenJDK 7 or later, or Oracle Java version 1.7u45 or later. Hewlett Packard Enterprise strongly recommends upgrading the Java JVMs on both the management node and the endstations running the GUI to avoid security problems with the remote file browser (used by the cmu_pdcp and autoinstall GUI dialogs).
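A Java version string such as 1.7.0_45 can be compared against the documented minimum with ordinary shell tools. The helper below is an illustrative sketch only: the function name is invented here, and it assumes GNU coreutils (for the sort -V version ordering):

```shell
# Hedged sketch: check that a Java version string meets the documented
# minimum (1.7.0_45). Relies on GNU "sort -V" version ordering.
java_at_least() {
    min=$1 have=$2
    [ "$(printf '%s\n%s\n' "$min" "$have" | sort -V | head -n1)" = "$min" ]
}

if java_at_least 1.7.0_45 1.8.0_66; then
    echo "Java version OK"
fi
```

In practice, the second argument would come from parsing the output of java -version on the node being checked.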

2.5.2.4 Monitoring clients

Upgrading the management node to Insight CMU v8.0 also requires upgrading the monitoring clients on the compute nodes to v8.0. Monitoring fails with an Insight CMU v8.0 management node and v7.3.2 or earlier compute node clients.

2.5.3 Stopping the Insight CMU service

To stop the Insight CMU service on the management node:
# /etc/init.d/cmu stop

NOTE: On RHEL7.x and SLES12 SPx management nodes, do not use the service cmu stop or systemctl stop cmu.service commands for stopping the cmu service. Instead, use the /etc/init.d/cmu stop command.

2.5.4 Upgrading Java Runtime Environment

Insight CMU v8.0 requires OpenJDK 7 or later, or Oracle Java Runtime Environment 1.7u45 or later.

2.5.5 Removing the previous Insight CMU package

To remove the previous Insight CMU package:
# rpm -e cmu
The following message appears:
configuration saved into /opt/cmu/etc/savedConfig/cmuconf##.sav
Save this information. After installation, use it to restore the previous configuration.
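If several cmuconf archives have accumulated in the savedConfig directory, the most recently written one is usually the right restore candidate. The helper below is an illustrative sketch only (the function name is invented; the directory path in the example is the documented default location):

```shell
# Hedged sketch: print the newest cmuconf*.sav archive in a directory,
# ordered by modification time (newest first).
find_latest_save() {
    ls -t "$1"/cmuconf*.sav 2>/dev/null | head -n1
}

# Example against the documented location on a real management node:
find_latest_save /opt/cmu/etc/savedConfig
```

Always cross-check the file name printed here against the one reported when the rpm was removed.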

# /opt/cmu/tools/restoreConfig -f /opt/cmu/etc/savedConfig/cmuconf##.sav

2.5.6 Installing the Insight CMU v8.0 package

To install Insight CMU:
1. Install the Insight CMU rpm key:
# rpm --import /mnt/cmuteam-rpm-key.asc

NOTE: If you do not import the cmuteam-rpm-key, a warning message similar to the following appears when you install the Insight CMU rpm:

warning: REPOSITORY/cmu-8.0-1.x86_64.rpm: Header V4 DSA/SHA1 Signature, key ID b59742b4: NOKEY

2. Install the Insight CMU rpm:
# rpm -ivh REPOSITORY/cmu-8.0-1.x86_64.rpm
Preparing... ########################################### [100%]
1:cmu ########################################### [100%]

post-installation... post-installation of x86_64 tree.....done ..done...

detailed log is /opt/cmu/log/cmu_install-Tue_Mar_22_19:48:35_IST_2016

******************************************************************************** * * * license enforcement has changed in cmu v8.0 * * older cmu v7.x license keys will not work in this version * * obtain new license keys for v8.0, otherwise a built-in one time evaluation * * lic key for 120 days will be automatically activated during service startup * * * * optional next steps: * * * * - install additional cmu packages (ex: cmu-windows-moonshot-addon) * * * * - restore a cluster configuration with /opt/cmu/tools/restoreConfig * * - complete the cmu management node setup: /opt/cmu/bin/cmu_mgt_config -c * * - setup CMU HA (more than one mgt node): /opt/cmu/tools/cmu_ha_postinstall * * * * after setup is finished, unset audit mode and start cmu : * * * * /etc/init.d/cmu unset_audit * * * * /etc/init.d/cmu start * * * ********************************************************************************

NOTE: Insight CMU has dependencies on other rpms (for example, dhcp). If any missing dependencies are reported, install the required rpms and repeat this step.

3. Install the Insight CMU Windows Moonshot add-on rpm. Insight CMU supports autoinstall, backup, and cloning of select Windows images for supported Moonshot cartridges. For more information, see “Preinstallation limitations” (page 20). If you intend to use Insight CMU Windows support, install cmu-windows-moonshot-addon-8.0-1.noarch.rpm:
# rpm -ivh cmu-windows-moonshot-addon-8.0-1.noarch.rpm
Preparing... ########################################### [100%]
1:cmu-windows-moonshot-ad########################################### [100%]
post-installation microsoft windows payload...

IMPORTANT: Users must run /opt/cmu/bin/cmu_mgt_config -c after installing the Windows add-on rpm to ensure that Windows and Samba service-specific configuration changes are made.

4. Install the Insight CMU ARM32 Moonshot add-on rpm. Insight CMU v8.0 supports provisioning of Moonshot cartridges based on the ARM32 architecture. If you intend to enable ARM32 support, install cmu-arm32-moonshot-addon-8.0-1.noarch.rpm:
# rpm -ivh cmu-arm32-moonshot-addon-8.0-1.noarch.rpm
Preparing... ########################################### [100%]
1:cmu-arm32-moonshot-addo########################################### [100%]
post-installation of armv7l tree.....done
5. Install the Insight CMU ARM64 Moonshot add-on rpm. Insight CMU v8.0 supports provisioning of Moonshot cartridges based on the ARM64 architecture. If you intend to enable ARM64 support, install cmu-arm64-moonshot-addon-8.0-1.noarch.rpm:
# rpm -ivh cmu-arm64-moonshot-addon-8.0-1.noarch.rpm
Preparing... ########################################### [100%]
1:cmu-arm64-moonshot-addo########################################### [100%]
post-installation of aarch64 tree.....done

2.5.7 Restoring the previous Insight CMU configuration

IMPORTANT: Restoring the previous Insight CMU configuration may overwrite the /opt/cmu/etc/cmu.lic file with the license keys present in the old configuration. However, a backup copy of the overwritten content is available as /opt/cmu/etc/cmu.lic_before_restore_. In such scenarios, copy the appropriate v8.0 license keys from the backup file to the /opt/cmu/etc/cmu.lic file before attempting to start the Insight CMU service.

If you have a pre-existing Insight CMU installation, you must restore your Insight CMU cluster configuration:
# /opt/cmu/tools/restoreConfig -f /opt/cmu/etc/savedConfig/cmuconf##.sav

Insight CMU v8.0 provides new features in the monitoring file /opt/cmu/etc/ActionAndAlertsFile.txt. When you restore the Insight CMU configuration, the customized ActionAndAlertsFile.txt is copied to /opt/cmu/etc, and the original file from the Insight CMU v8.0 rpm is saved as /opt/cmu/etc/ActionAndAlertsFile.txt_before_restore. No automatic merge is performed. If you want to use the new features, you must merge the two files manually.

2.5.8 Installing your Insight CMU license

Insight CMU v8.0 requires a valid node license key for each rack/Blade/SL server registered in the cluster. A separate chassis license key is required for each Moonshot chassis, regardless of the number of cartridges inside a single chassis.

IMPORTANT: Customers upgrading from v7.3.2 or earlier versions of Insight CMU to v8.0 must obtain new license keys. The Insight CMU license key format has changed in v8.0. Keys for v7.3.2 or earlier versions do not work with Insight CMU v8.0 and later versions.

If the license checks fail during Insight CMU service startup, a one-time evaluation license key (valid for 120 days) is automatically activated. This enables customers to proceed with Insight CMU installation and upgrade activities without waiting for the permanent v8.0 license keys. Insight CMU stops working if the permanent keys are not installed within 120 days after the installation or upgrade.

The procedure to obtain license keys is provided with the Entitlement Certificate. For more information, contact Hewlett Packard Enterprise support. Copy the content of all license key files to /opt/cmu/etc/cmu.lic.
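After copying the keys, one way to double-check that nothing from a pre-restore backup copy was lost is a line-by-line comparison of the two files. This sketch is illustrative only: the helper name is invented, and the sample files stand in for /opt/cmu/etc/cmu.lic and its backup copy:

```shell
# Hedged sketch: list license-key lines present in a backup copy of
# cmu.lic but absent from the current file.
lost_keys() {
    backup=$1 current=$2
    # -F fixed strings, -x whole-line match, -v invert, -f patterns file
    grep -Fxv -f "$current" "$backup"
}

# Example with sample files:
printf 'KEY-AAAA\nKEY-BBBB\n' > /tmp/cmu.lic.backup
printf 'KEY-AAAA\n' > /tmp/cmu.lic.current
lost_keys /tmp/cmu.lic.backup /tmp/cmu.lic.current
```

Any line printed is present only in the backup; decide key by key whether it belongs in the current cmu.lic.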

NOTE: Newly added license keys become effective only after the cmu service is started.

Example v8.0 node license key:
ACTG C9MA H9PY KHU2 VKB5 HXWF 49JL KM7L B89H MZVU DXAU PRAD EEPG L762 SF32 FYB4 ALOK D5KM AFVW TT5J F73K 7U6K CM3L YFC2 CNBW XWLB VW89 Y55D N2SS L79Q XJUL LUQH TU8F 3DQC SAAN VEQW V886 NARE NWAT J8U3 2YT5 RNNV MHS3 QKTQ 688U CLEM ENTE Z2DX ABHI 4CP4 3F9N JQY5 "TESTCMU80322162 BD476A HPE_Insight_CMU_3yr_24x7_Flex_Lic 9D56AAUTHYD9"

Example v8.0 Moonshot chassis license key:
MD4Y A99A ANDY CHW3 VKB5 HWWN Y9JL KMPL B89H MZVU 8R4S LHWE 99QX X5T8 CMRG HPMR 4FVU A5K9 UHNK ACXK CHRI SWWP USD7 EPZY CNBW XWLB VW89 Y55D N2SS L79Q XJUL LUQH TU8F 3DQC SW7N VEQW NARE NE23 YWAT J8U3 2YT5 RNNV MHS3 QKTQ 688U F8A7 F8SE Z2DX SEBC 4CP4 3F9N JQY5 "TESTCMU80322162 D9Y34A HPE_Insight_CMU_Moonshot_3y24x7_Flex_Lic 93UUAAUT7Y3J"

2.5.9 Configuring the updated Insight CMU

To configure Insight CMU, run /opt/cmu/bin/cmu_mgt_config -c. The following is an example of executing the command on a management node running Red Hat Linux. In this example, the management node has the Insight CMU compute nodes connected to eth0 and a second network on eth1 as a connection outside the cluster.
# /opt/cmu/bin/cmu_mgt_config -c
Checking that SELinux is not enforcing... [ OK ]
Checking for required RPMs... [ OK ]
Checking existence of root ssh key... [ OK ]
Checking if firewall is down/disabled... [ OK ]
Checking tftp for required configuration... [ UNCONFIGURED ]
Making required changes to tftp... [ OK ]
Starting/restarting xinetd services...
Stopping xinetd: [ OK ]
Starting xinetd: [ OK ]
Checking that NFS is running... [ STOPPED ]
Configuring NFS to start on boot and starting NFS...
Starting NFS services: [ OK ]
Starting NFS quotas: [ OK ]
Starting NFS mountd: [ OK ]
Stopping RPC idmapd: [ OK ]
Starting RPC idmapd: [ OK ]
Starting NFS daemon: [ OK ]
Checking number of NFS threads >= 256...
[ WARNING ]
The management node is currently running 8 NFS threads.
Insight CMU recommends a minimum of 256 NFS threads on the management node regardless of cluster size.
How many NFS threads would you like me to configure? [256]
Configuring the number of NFS threads to 256 ... [ OK ]
Setting CMU_MIN_NFSD_THREADS in cmuserver.conf to 256 ... [ OK ]
Restarting NFS
Shutting down NFS daemon: [ OK ]
Shutting down NFS mountd: [ OK ]
Shutting down NFS quotas: [ OK ]
Shutting down NFS services: [ OK ]
Starting NFS services: [ OK ]
Starting NFS quotas: [ OK ]
Starting NFS mountd: [ OK ]
Stopping RPC idmapd: [ OK ]
Starting RPC idmapd: [ OK ]
Starting NFS daemon: [ OK ]
Checking dhcp for required configuration
Locating dhcp file... [ OK ]
Which eth should CMU use to access the compute node network?
0) eth0 10.117.20.78
1) eth1 16.117.234.140
:0
Checking if dhcpd interface is configured for "eth0"... [ UNCONFIGURED ]

Configuring dhcpd interface for "eth0"... [ OK ]
Setting dhcpd to start on (re)boot...
Checking if CMU is configured to use 10.117.21.37 [ OK ]
Checking for required sshd configuration... [ UNCONFIGURED ]
Making required changes to sshd... [ OK ]
Restarting sshd
Stopping sshd: [ OK ]
Starting sshd: [ OK ]
Checking if CMU supports the default java version... [ OK ]
Checking for valid CMU license... [ OK ]
The following files were modified by cmu_mgt_config
modified file: /etc/xinetd.d/tftp
backup copy: /etc/xinetd.d/cmu_tftp_before_cmu_mgt_config
modified file: /etc/sysconfig/nfs
backup copy: /etc/sysconfig/cmu_nfs_before_cmu_mgt_config
modified file: /opt/cmu/etc/cmuserver.conf
backup copy: /opt/cmu/etc/cmuserver.conf_before_cmu_mgt_config
modified file: /etc/sysconfig/dhcpd
backup copy: /etc/sysconfig/cmu_dhcpd_before_cmu_mgt_config
modified file: /etc/ssh/sshd_config
backup copy: /etc/ssh/cmu_sshd_config_before_cmu_mgt_config
#
This command can be rerun at any time to change your configuration without adversely affecting previously configured steps. You can also verify your current configuration by running /opt/cmu/bin/cmu_mgt_config -ti. For additional options and details on this command, run /opt/cmu/bin/cmu_mgt_config -h.

2.5.10 Starting Insight CMU

After the initial rpm installation, Insight CMU is configured in audit mode. To run Insight CMU, unset audit mode and start the Insight CMU service:
# /etc/init.d/cmu unset_audit
cmu service needs (re)start

NOTE: On RHEL7.x and SLES12 SPx management nodes, do not use the service cmu start or systemctl start cmu.service commands to start the cmu service. Instead, use the /etc/init.d/cmu start command.

# /etc/init.d/cmu start
starting tftp server check ... done
creating a new history database
cmu:core(standard) configured
cmu:backend running: GUI (RMI *:1099 and *:49150), REST API (https://localhost:8080)
cmu:cmustatus running
cmu:monitoring not running
cmu:dynamic custom groups unconfigured (cf ${CMU_PATH}/etc/cmuserver.conf CMU_DYNAMIC_UG_INPUT_SCRIPTS)
cmu:web service running (HTTP *:80)
cmu:nfs server running
cmu:samba server not running
cmu:dhcpd.conf configured ( subnet X.X.X.X netmask YYYY { } )
cmu:high-availability unconfigured

Where X.X.X.X is the subnet IP address, and YYYY is the netmask of the subnet served by the dhcp server. The output lists all the daemons started by the service and their status. Verify that the daemons are in their expected state.

core Indicates whether the core components of Insight CMU are configured.
backend Indicates whether java is running, and shows the interfaces that the java backend uses to receive commands from GUI clients and to send status back to them.

cmustatus Indicates the status of the utility that checks the state of all the compute nodes.
monitoring Indicates the status of the monitoring daemon that gathers the information reported by the small monitoring agent installed on the compute nodes.

NOTE: Because compute nodes are not installed on the cluster at this time, the monitoring agent is not started after the installation. This behavior is normal. The cluster must be configured for monitoring to start.

dynamic custom groups Indicates the configuration status of dynamic custom groups.
web service Indicates the status of the web service on the Insight CMU management node. By default, the web service listens on port 80. The Insight CMU GUI can be launched from a web browser that is pointed to the home page provided by the web service.
nfs server Indicates the status of the NFS server.
samba server Indicates the status of the Samba server.
dhcpd.conf Indicates the status of the DHCPD configuration.
high-availability Indicates whether the Insight CMU management node has been configured for high availability.

2.5.11 Deploying the monitoring client

If you use Insight CMU monitoring, upgrade the monitoring client on your Insight CMU client nodes. For more information about deploying the monitoring client, see “Deploying the monitoring client” (page 106).

2.6 Saving the Insight CMU database

The saveConfig script saves the Insight CMU configuration. The script creates an archive containing several files, for example /etc/hosts, /etc/dhcpd.conf, /etc/exports, and the Insight CMU configuration files cmuserver.conf and reconf.sh. Save the database with the following command:

# /opt/cmu/tools/saveConfig.tcl -p /my-path

Where my-path is the location where you want to save the configuration. The script creates the archive there. The format of the archive file name is:

cmuconf#-XXXX.sav

Where # is an incremental number and XXXX is a checksum number.
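The archive name therefore encodes both values. As an illustration only (using a hypothetical archive name, not one produced by a real save), the two parts can be recovered with standard shell parameter expansion:

```shell
# Hypothetical archive name following the documented cmuconf#-XXXX.sav format
f="cmuconf8-1a2b.sav"
base="${f%.sav}"                        # strip the .sav extension
num="${base%%-*}"; num="${num#cmuconf}" # incremental number
sum="${base#*-}"                        # checksum number
echo "increment=$num checksum=$sum"
# -> increment=8 checksum=1a2b
```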

NOTE: When you uninstall Insight CMU, the saveConfig script is automatically executed. Configuration settings are saved in the /opt/cmu/etc/savedConfig/ path.

2.7 Restoring the Insight CMU database

The restoreConfig script restores an Insight CMU configuration. You must provide a configuration archive file. The /etc/hosts and /etc/dhcpd.conf files, and the Insight CMU configuration files cmuserver.conf and reconf.sh, are replaced. Restore the database with the following command:

# /opt/cmu/tools/restoreConfig -f /opt/cmu/etc/savedConfig/cmuconf8-XXXX.sav

Where /opt/cmu/etc/savedConfig/cmuconf8-XXXX.sav is a previously saved configuration archive file.

3 Launching the Insight CMU GUI

3.1 Insight CMU GUI

The Insight CMU GUI can be used from any workstation connected through the network to the cluster management node. The Insight CMU GUI is composed of the following modules:
• A Java GUI running on the client Windows or Linux workstation
• A server module on the management node to run tasks on compute nodes

IMPORTANT: If the server module is not running on the management node, the client module cannot perform any tasks.

TIP: To close an unwanted dialog window, press ESC.

3.2 Insight CMU main window

If not already done, start the Insight CMU GUI on your workstation. Depending on your selected method of launching the GUI, the IP address of the management node might be requested. If your workstation has more than one network interface, then the Insight CMU GUI might also request the correct network interface to use for communication with the management node. The following figure represents the Insight CMU main window.

Figure 4 Insight CMU main window

Figure 4 (page 46) contains four main areas:
• The top bar allows you to perform configuration commands.
• The left frame lists resources such as Network Groups, Image Groups, Nodes Definitions, etc. The '+' expands a resource. If Insight CMU cluster configuration commands have not yet been entered, most resources are empty.

• A filter allows you to show specific resources.
• The central frame displays the global cluster view. In Figure 4 (page 46), the global cluster view is empty because the cluster is not yet configured.
• The bottom frame shows log information.

3.3 Administrator mode

Click Options→Enter Admin Mode. You must have administrator privileges to perform the cluster configuration tasks described in this chapter. If you do not have administrator privileges, then you can monitor the cluster status, but you cannot perform all the tasks described in this chapter.

IMPORTANT: Cluster configuration tasks can be performed on only one instance of the GUI at a time.

3.4 Quitting administrator mode

Click Options→Leave Admin Mode.

3.5 Launching the Insight CMU GUI

The Insight CMU GUI runs on the Insight CMU client and can be launched two ways:
• Through the client web browser by connecting to the Insight CMU management server
• By copying the Insight CMU GUI Java file onto the client

3.5.1 Launching the Insight CMU GUI using a web browser

The Insight CMU GUI is a Java application that can be downloaded from the web server running on the Insight CMU management node by using Webstart. Using the Insight CMU client web browser, the Insight CMU GUI can be accessed remotely and launched automatically on the client workstation. This capability enables access to the Insight CMU GUI from any workstation. Insight CMU automatically starts a minimal web server on port 80 of the management node that serves only the Insight CMU website. If an HTTP service is already running on this port on the management node, then the Insight CMU web service does not run. If you want to use a different port number, then edit the environment variable CMU_THTTPD_PORT in the /opt/cmu/etc/cmuserver.conf file.

To launch the Insight CMU GUI:
1. Start a web browser on the Insight CMU client and then enter: http://cmu-management-node-ip-addr
2. From the main menu of the Insight CMU v8.0 website, click Launch Cluster Management Utility GUI.

3.5.2 Configuring the GUI client on Linux workstations

On Linux workstations, you can use a secure ssh tunnel to communicate between the workstation running the Insight CMU GUI and the Insight CMU management server.

Using an ssh tunnel 1. To open the ssh tunnel, the following settings are required on the Insight CMU management server.

a. Put Xauth in the PATH. Xauth is typically at:
• /usr/bin/xauth on Red Hat
• /usr/bin/X11/xauth on SUSE
b. Install the following:
• On Red Hat: xorg-x11-xauth rpm
• On SUSE: xorg-x11-libs rpm
• On Debian: xbase-clients.X.X.deb

NOTE: If you did not select the X11 package during your Linux installation, then you must manually install it.

c. sshd_config
1) Edit /etc/ssh/sshd_config as follows:
X11Forwarding yes
PasswordAuthentication yes
2) Restart sshd.
# /etc/init.d/sshd restart
Stopping sshd: [ OK ]
Starting sshd: [ OK ]
d. Be sure localhost is resolved and pingable.
2. To verify the ssh tunnel is working correctly from the GUI workstation, open an ssh connection to the Insight CMU management server.
# ssh x.x.x.x -l root
Where x.x.x.x is the IP address of the Insight CMU management server.

3.5.3 Launching the Insight CMU Time View GUI

1. Select a group (in Network Group, Image Group, or Custom Group).
2. In the right panel, click the third tab labeled Time View.

Each selected metric is represented by a tube filled with rings. Each ring represents a snapshot of the metric value at a given time. A ring is composed of petals. Each petal represents a value for a given metric, at a given time, for a given node. Some Time View functions are inherited from 2D flowers. All node interaction is preserved from 2D to 3D. To interact with a node, right-click on it, or hover over a 3D petal with your mouse to make a tooltip appear. The tooltip displays detailed values for the petal.

4 Defining a cluster with Insight CMU

4.1 Insight CMU service status

Obtain the status of all Insight CMU service components with the following command on the management node:

# /etc/init.d/cmu status

NOTE: On RHEL7.x and SLES12 SPx management nodes, do not use the service cmu status or systemctl status cmu.service commands to display the status of the cmu service. Instead, use the /etc/init.d/cmu status command.

Insight CMU must be properly configured before using the GUI. Ensure that the core and backend services report configured.

4.2 High-level procedure for building an Insight CMU cluster

After Insight CMU is installed and running on the management node, the rest of the cluster can be configured as follows:
1. Start Insight CMU on the management node.
2. Start the GUI client on the GUI workstation.
3. Scan the compute nodes.
4. Create the network groups. For more information, see “Network group management” (page 57).
5. Perform a full Linux installation on the first compute node. This is referred to as the "golden node".
6. Deploy the management agent on the compute nodes.
a. Install the expect package.
b. Install the monitoring rpm.
7. Create the image groups.
8. Back up the golden node in its image group. This operation creates the "golden image" from which other compute nodes are cloned. You can have several golden images, each in its own image group.
9. Clone the compute nodes.

4.3 Cluster administration

Figure 5 Cluster administration menu

4.3.1 Node management

Figure 6 Node management window

In Figure 6 (page 50), the node list of the cluster will appear as the node database is populated by adding, scanning, or importing nodes. Each node is represented by a line containing the following attributes:
• Node name
• Node IP address
• Netmask of the node
• MAC address of the node
• Image group that cloned the node
• IP address of the management card of the node, or none if unused
• The type of management card on the node, which can be iLO, LO100i, iLOCM, or none
• The architecture of the compute node (x86_64 is the default)
• For Moonshot servers—the auxiliary (cartridge) Id and Node Id of the node
• The node platform type (used for cloning special architectures)
• The serial port for the node (where the kernel will direct all console output)
• The serial port speed for the node
• Any required vendor-related kernel arguments when PXE-booting the node
• Cloning block device information for the node
• BIOS boot mode setting for the node
• The management node IP address for the node (required for multiple networks)
• The default gateway address for the node

• The iSCSI root boot string for the node (reserved for future iSCSI support)

The Node Management window enables you to perform the following tasks:
• Add nodes to the Insight CMU database
• Delete nodes from the Insight CMU database
• Modify node attributes in the Insight CMU database
• Scan nodes to automatically add them to the Insight CMU database
• Import nodes
• Export nodes

4.3.1.1 Scanning nodes

Cluster Administration→Node Management→Scan Node

The Insight CMU Node Management component provides the capability to scan new nodes into the Insight CMU database. You can also manually add node information. Use this interface to scan nodes into the Insight CMU database to retrieve hardware addresses and configure IP addresses. The Insight CMU database is updated with the new nodes. Enter parameters in the initial Scan Node dialog box.

IMPORTANT: On compute nodes with LO100i management cards, scanning does not work if PXE is not activated in the BIOS. Only the hardware address of the PXE-enabled NIC is retrieved.

IMPORTANT: If the cluster has Moonshot cartridges based on ARM32 or ARM64 architecture, install the appropriate Insight CMU add-on packages before scanning those nodes.

Figure 7 Scan node dialog

For help on each parameter, click the ?.

IMPORTANT: Make sure you select the correct management card type for your compute nodes (LO100i/iLO/iLOCM).

To launch the scan node process:
1. Click OK.
2. In the confirmation window, click OK.
3. Enter the user name and password for the management card.

NOTE: This is necessary only for the first scan operation. For subsequent scans, the Management card password window will not be displayed.

Figure 8 Management card password window

4. The Scan Node Result window appears. See Figure 9 (page 53).
5. Select whether to add or replace the scanned nodes.

Figure 9 Scan nodes result

4.3.1.2 Adding nodes

Cluster Administration→Node Management→Add Node

Use this interface to add a new node to the Insight CMU database.

Figure 10 Add node dialog

At the Node Dialog box:
1. Click OK. A dialog box confirms that the node was added successfully.
2. Click OK. A dialog box asks if you want to add another node.

NOTE: When you add a node, include it in a network group using the Network Group Management utility.

NOTE: When adding Moonshot cartridges based on ARM32 or ARM64, select the appropriate architecture (armv7l or aarch64) in the Add Node Dialog.

The newly added nodes appear in the node list.

Figure 11 Populated database node management window

4.3.1.3 Modifying nodes

Cluster Administration→Node Management

To modify the attributes of a node, select the node in the Node Management list, and then select Modify Node. The same interface as Add Node appears.

NOTE: The node name cannot be changed.

4.3.1.4 Importing nodes

Cluster Administration→Node Management→Import Node

To import nodes from a flat text file, select an existing text file and then click Open to import all the nodes from this file into the Insight CMU database. The following is a sample import/export file:

cn01 10.50.249.1 255.255.0.0 f0-92-1c-b4-58-54 default 16.100.117.152 ILOCM x86_64 1 1 generic default default "default" default auto default default "none"
cn02 10.50.249.2 255.255.0.0 f0-92-1c-b4-52-90 default 16.100.117.152 ILOCM x86_64 2 1 generic default default "default" default auto default default "none"
cn03 10.50.249.3 255.255.0.0 f0-92-1c-b4-51-fc default 16.100.117.152 ILOCM x86_64 3 1 generic default default "default" default auto default default "none"
cn04 10.50.249.4 255.255.0.0 f0-92-1c-b4-54-dc default 16.100.117.152 ILOCM x86_64 4 1 generic default default "default" default auto default default "none"
cn05 10.50.249.5 255.255.0.0 f0-92-1c-b4-53-f4 default 16.100.117.152 ILOCM x86_64 5 1 generic default default "default" default auto default default "none"
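Because a hand-edited import file with inconsistent columns can break the import, a quick pre-check of the file can help. The following is a sketch only (not part of Insight CMU) that verifies every line has the same number of whitespace-separated fields as the first line; it assumes quoted fields contain no spaces:

```shell
# Create a two-line sample in the documented import format (hypothetical nodes)
cat > /tmp/cmu_nodes.txt <<'EOF'
cn01 10.50.249.1 255.255.0.0 f0-92-1c-b4-58-54 default 16.100.117.152 ILOCM x86_64 1 1 generic default default "default" default auto default default "none"
cn02 10.50.249.2 255.255.0.0 f0-92-1c-b4-52-90 default 16.100.117.152 ILOCM x86_64 2 1 generic default default "default" default auto default default "none"
EOF
# Every line must have the same field count as the first line
awk 'NR==1{n=NF} NF!=n{print "line "NR": "NF" fields, expected "n; bad=1} END{exit bad}' \
    /tmp/cmu_nodes.txt && echo "format OK"
# -> format OK
```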

NOTE: This file can be manually created and edited, but incorrect formatting can break the operation. All the imported nodes belong to the default image group.

4.3.1.5 Deleting nodes

Use this interface to delete a node from the Insight CMU database. Select any number of nodes in the nodes list, and then click Delete Node.

IMPORTANT: After deleting a node, you cannot recover its attributes.

4.3.1.6 Exporting nodes

To export node attributes to a flat text file, select one or several nodes in the Node Management window, click Export Node, and then save the file.

4.3.1.7 Contextual menu

Select one or several nodes in the Node Management window. To display a contextual menu window, right-click the selected node or nodes. The contextual menu has the following options:
• Delete Nodes invokes the delete node procedure for the selected nodes. This feature is equivalent to selecting the Delete Node option from the Node Management menu.
• Change Image Group changes the active group of all selected nodes. The node appears in the new group in the Monitoring by Image Group window.
• Modify Node invokes the modify node procedure for the selected node. This feature is equivalent to selecting the Modify Node option from the Node Management menu. This menu item is only active when one node is selected from the list.
• Add Node invokes the add node procedure. This is equivalent to selecting the Add Node option from the Node Management menu.

4.3.1.8 Get node static info

NOTE: Collecting node static info is not enabled on nodes running Windows OS. (Windows is only available on specific Moonshot cartridges.)

To collect static information such as system model, BIOS version, CPU model, speed, and memory size, right-click on a node in the node tree. From the contextual menu, select Update→Get Node Static Info. Upon completion, static info is available by clicking the Details tab.

Figure 12 Node static info

NOTE: Node static information is also gathered when the Insight CMU Monitoring Client is installed.

4.3.1.9 Rescan MAC

Use this command only if you must replace a failing node. This command enables retrieving the new MAC address of the node after node replacement. Right-click on a node in the node tree. From the contextual menu, select Update→Rescan MAC.

NOTE: The Rescan MAC option is only active when a single node is selected in the node tree.

Figure 13 Rescan MAC

4.3.2 Network group management

A network group in Insight CMU corresponds to a single network switch that connects to a group of nodes. Large clusters have more than one network switch. The cloning process attempts to maximize the available network bandwidth within each of the Ethernet switches used in the cluster. To accomplish this, a unique network group must be created for each group of nodes connected to a single Ethernet switch used within the cluster.

IMPORTANT: Cloning will not work for nodes that do not belong to a network group.

You can use the Network Group Management window to add and delete network groups. To perform tasks by using the Network Group Management option, click Cluster Administration and then select Network Group Management.

4.3.2.1 Adding network groups

NOTE: The cloning process does not clone nodes that are not assigned to a network group.

Figure 14 Network group management

1. Specify the name of the network group to create. The length is limited to 32 characters. Each network group can contain up to 255 nodes.
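As an illustration of the name limit only (Insight CMU itself enforces these limits in the GUI; "switch-rack-07" is a hypothetical name), a proposed group name can be checked from the shell before creating it:

```shell
# Check a proposed network group name against the documented 32-character limit
# ("switch-rack-07" is a hypothetical example name)
name="switch-rack-07"
if [ "${#name}" -le 32 ]; then
    echo "name OK (${#name} chars)"
else
    echo "name too long (${#name} chars, max 32)"
fi
# -> name OK (14 chars)
```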

NOTE: To minimize the cloning time, each network group must correspond to a single Ethernet switch. A network group must physically represent one switch in the cluster, and the nodes in that group must be connected to that switch.

2. Select any number of nodes from the “Nodes not in any Network Group” option on the left and use the arrows to move the nodes to the “Nodes in Network Group” option on the right.

4.3.2.2 Deleting network groups To delete a network group, select it from the network groups list in the Network Group Management window, and then click Delete.

IMPORTANT: A network group cannot be recovered after it is deleted.

4.3.3 Uploading files to the Insight CMU server

To upload files to the Insight CMU server, from the top menu bar:
• Click Cluster Administration→Upload file(s) to /tmp.
• A file browser window appears on the local machine running the GUI. Select one or more files or folders.
• Click Upload file(s).
• The selected files are uploaded to the CMU server, in the /tmp folder. A progress window appears:

• When the upload is complete, the progress window disappears and the following message is displayed in the bottom information panel: Uploading: - Operation completed

5 Provisioning a cluster with Insight CMU

5.1 Image group management

An image group in Insight CMU represents a disk image that has been captured (backed up). Each image group is associated with a single backup image. The image group must contain the nodes with valid hardware configurations that can be cloned with this image. The Image Group Management window is used to add, modify, delete, or rename image groups. After the Insight CMU rpm is initially installed, only the default image group exists. You cannot perform backup operations in the default image group. A new image group must be created.

In the Image Group Management window:
1. Click Cluster Administration→Image Group Management→Create an Image Group.

Figure 15 Create an image group

The following window appears.

Figure 16 Add image group

Figure 17 Edit backup disk name

2. Enter the group name. The associated backup disk device name is set to "sda" by default. If the OS disk on the backup node is recognized with a different name (for example "sdb" or "hda" or "cciss/c0d0"), click the Advanced button to edit the disk name. The disk name is only required for legacy "Partition Number" based backup methods.

IMPORTANT: For the latest Linux distros, the Smart Array disk name depends on the Smart Array controller version. Disks configured using recent controllers are recognized as "sdX" instead of legacy names like "cciss/cXdY".

IMPORTANT: For Windows image groups (supported only on specific Moonshot cartridges), the default backup disk name does not need to be edited. Users are advised to use the "Root Partition UUID" based backup method. For more details, see “Backup using GUI” (page 77).

3. Click OK.
4. To add nodes to the image group, on the top bar click Cluster Administration→Image Group Management→Manage image group. The following window appears.

Figure 18 Image group management

5. Select any number of nodes from the Nodes in Cluster list on the left and use the arrows to move the nodes to the Nodes in the Image Group list on the right. The nodes appear in the Nodes in the Image Group list with a "not active" notation. This indicates that the nodes have not yet been cloned, but are considered candidates for cloning. The notation changes to "active" after cloning is complete.

5.1.1 Deleting image groups

To delete an image group and its associated backup, select the group from the list in the Image Group Management window and then click Delete.

NOTE: Even after deleting the image group, the image group contents under /opt/cmu/image/ are preserved.

5.1.2 Renaming image groups

To rename an image group, select the group from the list in the Image Group Management window and then click Rename.

5.2 Autoinstall

Insight CMU provides automated compute node installation using the OS DVD images or repositories. The following distributions are supported:
• RHEL 5, 6, 7
• SLES 11, 12

IMPORTANT: Although users can autoinstall SLES 12 with the btrfs file system, Insight CMU v8.0 does not support the backup and cloning of any partitions formatted with the btrfs file system.

• Ubuntu 12.x, 13.x, 14.x • Windows 7 Enterprise 64-bit (on specific Moonshot cartridges only)

• Windows 2012 Server Standard (on specific Moonshot cartridges only)
• Windows 2012 R2 Server Standard (on specific Moonshot cartridges only)

5.2.1 Autoinstall requirements

• Autoinstall repository—The operating system distribution repository must be copied to the Insight CMU management node.

IMPORTANT: For Windows autoinstall only, Insight CMU uses Samba for exporting the repository. However, configuring Samba for Windows autoinstall is done automatically and does not require any intervention from Insight CMU users. Windows is available only on specific Moonshot cartridges.

• Autoinstall template file—The Insight CMU autoinstall utility requires an Insight CMU autoinstall template file. The layout of the autoinstall template file depends on the software being installed.
◦ Red Hat—A classic Red Hat kickstart file
◦ SLES—An autoyast xml file
◦ Debian—A preseed file
◦ Ubuntu—A preseed file
◦ Windows—A Windows unattended installation xml file

IMPORTANT: To autoinstall Windows systems, the rpm cmu-windows-moonshot-addon-8.0-1.noarch must be installed. This rpm is available on the Insight CMU CD.

Examples of autoinstall template files are available in the /opt/cmu/templates/autoinstall directory.

• Autoinstall image group—After the autoinstall repository and the autoinstall template file are available, you must create an autoinstall image group before autoinstalling a compute node.

5.2.2 Autoinstall templates

Insight CMU provides the following autoinstall templates in the directory /opt/cmu/templates/autoinstall:

NOTE: UEFI-enabled nodes require a separate FAT partition (/boot/efi) on the disk to boot into the OS. To autoinstall such nodes with RHEL/SLES distros, use the UEFI-specific autoinstall template files in the /opt/cmu/templates/autoinstall directory. Ubuntu autoinstall templates are the same for Legacy BIOS and UEFI modes. Table 2 Autoinstall templates

Autoinstall template name Description

autoinst_rh6_rh7.templ Kickstart template used to autoinstall RHEL 6.x and RHEL 7 distros on Legacy BIOS nodes

autoinst_rh6_rh7_uefi.templ Kickstart template used to autoinstall RHEL 6.x and RHEL 7 distros on UEFI-enabled nodes

autoinst_rh6_rh7_moonshot_m710.templ Kickstart template used to autoinstall RHEL 6.x and RHEL 7 on ProLiant m710 Moonshot server cartridges

Table 2 Autoinstall templates (continued)

Autoinstall template name Description

autoinst_sles12.templ Autoyast template used to autoinstall SLES 12 on Legacy BIOS nodes

autoinst_sles11.templ Autoyast template used to autoinstall SLES 11 distros on Legacy BIOS nodes

autoinst_sles12_uefi.templ Autoyast template used to autoinstall SLES 12 on UEFI-enabled nodes

autoinst_sles11_uefi.templ Autoyast template used to autoinstall SLES 11 SPx distros on UEFI-enabled nodes

autoinst_ubuntu_cd.templ Preseed template used to autoinstall Ubuntu 12.x, 13.x, and 14.x distros on Legacy BIOS nodes and UEFI nodes using a local (CD/DVD iso image) repository

autoinst_ubuntu_mini_arm.templ Preseed template used to autoinstall Ubuntu 14.x distros on ProLiant m400 and m800 Moonshot server cartridges from internet-based repositories

autoinst_ubuntu_mini_x86-64.templ Preseed template used to autoinstall Ubuntu 12.x, 13.x, and 14.x distros on x86_64 servers from internet-based repositories

autoinst_windows.templ Unattended installation template used to autoinstall Windows distros on the supported Moonshot cartridges; delivered with the rpm cmu-windows-moonshot-addon-8.0-1.noarch

In the templates provided by Insight CMU, special CMU keywords are automatically substituted by the autoinstall process. All Insight CMU keywords begin with CMU_. Insight CMU locates the correct values for these variables and makes the substitutions. The following example is from the autoinst_rh6_rh7.templ template:

nfs --server=CMU_CN_MGT_IP --dir=CMU_REPOSITORY_PATH

will be substituted as:

nfs --server=10.0.0.1 --dir=/data/repositories/rh6u4_x86_64

In the example above, Insight CMU substitutes the CMU_CN_MGT_IP value with the IP address of the management node. The CMU_REPOSITORY_PATH keyword is replaced with the value provided when the autoinstall image group is created.
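The effect of the substitution on this example line can be mimicked with sed (a sketch only; the values are the ones from the example above, and the real autoinstall engine derives them from the cluster configuration rather than taking them on the command line):

```shell
# Mimic the CMU_ keyword substitution on the example kickstart line
# (10.0.0.1 and the repository path are the example values from the text)
line='nfs --server=CMU_CN_MGT_IP --dir=CMU_REPOSITORY_PATH'
echo "$line" | sed -e 's|CMU_CN_MGT_IP|10.0.0.1|' \
                   -e 's|CMU_REPOSITORY_PATH|/data/repositories/rh6u4_x86_64|'
# -> nfs --server=10.0.0.1 --dir=/data/repositories/rh6u4_x86_64
```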

IMPORTANT: As of Insight CMU v7.3, the autoinstall template files contain new keywords not present in earlier versions. After upgrading from earlier versions of Insight CMU, the autoinstall process may fail for pre-existing autoinstall image groups. In such cases, create new autoinstall image groups using the appropriate autoinstall template files available in the /opt/cmu/templates/autoinstall directory, or update the autoinst.tmpl-orig file in the pre-existing autoinstall image group directory (/opt/cmu/image/[image group name]/autoinst.tmpl-orig).

TIP: The Insight CMU autoinstall engine performs keyword substitutions. For more information, see the autoinstall section of /opt/cmu/etc/cmuserver.conf.

In addition, any customer provided template file is supported, provided that: • It is compatible with the software release being autoinstalled. • The NFS server and repository information is correctly configured.

5.2.3 Autoinstall calling methods

Autoinstall commands for creating an autoinstall image group and autoinstalling a compute node are available from the GUI window and from the CLI interface. You can choose to autoinstall any number of nodes registered in Insight CMU. The number of nodes that can be autoinstalled simultaneously is determined by the variable CMU_AUTOINST_PIPELINE_SIZE in /opt/cmu/etc/cmuserver.conf. The default value is 16. However, this value can be modified by the user.

5.2.4 Using autoinstall from GUI

5.2.4.1 Creating an autoinstall image group

1. Log in to administrator mode.
2. Select Cluster Administration→Image Group Management→Create an Auto Install Image Group from the top bar.

Figure 19 Image group management autoinstall

3. Enter the required information in the popup window:
• Group name—The name of the autoinstall image group. This name becomes a directory in /opt/cmu/image.
• Autoinstall repository—The directory path where you copied the software distribution.

IMPORTANT: When creating a Windows image group, Insight CMU uses Samba for exporting the repository. However, this is done automatically and does not require any intervention from Insight CMU users. Exporting via NFS is unnecessary in this case. (Supported only on specific Moonshot cartridges.)

• Autoinstall template file—The path to a Red Hat kickstart file, SLES autoyast file, Ubuntu preseed file, or Windows unattended installation xml file. Information can be entered in the text box, or browsed by clicking on the right side of the text box.

After the autoinstall image group is created, the Insight CMU image directory contains a new directory with the name of the image group. This directory contains:
• autoinst.tmpl-orig—An exact copy of the autoinstall file
• repository—A logical link to the autoinstall repository
• README—More information on node-specific customization of autoinstall and pxeboot files

For example:
# ls /opt/cmu/image/rh6u4_autoinstall/
total 5
autoinst.tmpl-orig README repository -> /data/repositories/rh6u4_x86_64

Figure 20 New autoinstall image group

5.2.4.2 Registering compute nodes

To enable autoinstall, a compute node must be registered in the autoinstall image group. Registration is the same as registering a normal Insight CMU image group.

5.2.4.3 Autoinstall compute nodes

When you start autoinstall on a compute node, the following files are created:
• autoinst.tmpl-cmu—A copy of your autoinstall file with additional directives required by Insight CMU
• autoinst-[compute_node_hostname]—The autoinstall template with hard-coded node-specific information
• pxelinux_template—The pxelinux boot parameter file template for this image group
• pxelinux-[compute_node_hostname]—The pxelinux boot parameter file for a specific node
• uefilinux_template—The pxe boot parameter file template for UEFI-enabled nodes in this image group
• uefilinux-[compute_node_hostname]—The pxe boot parameter file for a UEFI-enabled node
• grub.cfg-uefilinux_template—The pxe boot parameter file template for rhel7 on arm64 UEFI-enabled nodes in this image group
• grub.cfg-uefilinux-[compute_node_hostname]—The pxe boot parameter file for rhel7 on an arm64 UEFI-enabled node

For example:
n12:~ # ls -l /opt/cmu/image/ai_rhel6u7/
total 44
-rw-r--r-- 1 root root 3496 Feb 29 07:37 autoinst-n17
-rw-r--r-- 1 root root 3654 Feb 29 07:37 autoinst.tmpl-cmu
-rw-r--r-- 1 root root 2094 Feb 29 07:37 autoinst.tmpl-orig
-rw-r--r-- 1 root root 1249 Feb 29 07:37 grub.cfg-uefilinux-n17
-rw-r--r-- 1 root root 1319 Feb 29 07:37 grub.cfg-uefilinux_template
-rw-r--r-- 1 root root 53 Feb 29 07:45 n17.log
-rw-r--r-- 1 root root 894 Feb 29 07:37 pxelinux-n17
-rw-r--r-- 1 root root 960 Feb 29 07:37 pxelinux_template

-rw-r--r-- 1 root root 1275 Feb 29 07:30 README
lrwxrwxrwx 1 root root   13 Feb 29 07:30 repository -> /mnt/rhel6u7/
-rw-r--r-- 1 root root  944 Feb 29 07:37 uefilinux-n17
-rw-r--r-- 1 root root 1010 Feb 29 07:37 uefilinux_template
n12:~ #
After creating the files previously described, Insight CMU network boots the requested compute nodes, and then autoinstall proceeds as with a normal Red Hat kickstart, SLES AutoYaST, or Ubuntu preseed operation, or an unattended Windows installation. During the operation, the autoinstall log is displayed on the terminal.

NOTE: Autoinstall also supports node-specific custom PXE files and custom autoinstall files in each autoinstall group. For example, to create a pxelinux file specific to node n1, create /opt/cmu/image/<image_group_name>/pxelinux-n1.custom and customize that file. For more information on node-specific custom autoinstall and pxeboot files, see the /opt/cmu/image/<image_group_name>/README file.
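As a sketch of the customization above, a node-specific custom PXE file is simply a copy of the group template that you then edit. In the sketch below, a scratch directory and a one-line APPEND entry stand in for the real /opt/cmu/image/<image group> directory and its generated template:

```shell
# Sketch only: a scratch directory stands in for /opt/cmu/image/<image group>,
# and the APPEND line is an illustrative placeholder, not a real CMU template.
demo_dir=$(mktemp -d)
cd "$demo_dir"
echo 'APPEND initrd=autoinst-initrd console=ttyS0' > pxelinux_template
# Node-specific custom file for node n1; edit this copy as needed
cp pxelinux_template pxelinux-n1.custom
```

At this point pxelinux-n1.custom can be edited freely without affecting the group-wide template.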

Figure 21 Autoinstall log

5.2.5 Using autoinstall from CLI

5.2.5.1 Registering an autoinstall image group
To register an autoinstall image group with the CLI, run the cmucli utility, and then enter the image group name, the repository path, and the autoinstall file path:
# /opt/cmu/cmucli
cmu> add_ai_image_group rh5u5_autoinst "/data/repositories/rh5u5_x86_64" "/data/repositories/rh5_x86_64.cfg"
repository registration tool
registration in progress...
--> creating image directory
--> copying config file
--> creating link to repository in CMU image directory
--> exporting CMU image directory via NFS
--> registering the cmu image in cmu.conf

==> registration finished

*** add nodes to this group
*** before using cmu_autoinstall_node...
*** press enter to exit

NOTE: The /opt/cmu/bin/cmu_add_image_group command can also be used to create a new autoinstall image group. For example:

# /opt/cmu/bin/cmu_add_image_group -n rh5u5_autoinst -d autoinstall -r /data/repositories/rh5u5_x86_64 -t /data/repositories/rh5_x86_64.cfg

5.2.5.2 Adding nodes to an autoinstall image group
To add nodes to the autoinstall image group, enter the following command at the cmucli prompt:
cmu> add_to_image_group node to image_group_name
For example:
cmu> add_to_image_group node1 to rh5u5_autoinst
selected nodes: node1
processing 1 node ...
cmu>
Or:
# /opt/cmu/bin/cmu_add_to_image_group_candidates -t rh5u5_autoinst node1 node2
processing 2 nodes...

5.2.5.3 Autoinstall compute nodes
To autoinstall a node, enter the following command at the cmucli prompt:
cmu> autoinstall "image" node1
For example:
cmu> autoinstall "rh6u4_autoinst" node1
or
# /opt/cmu/bin/cmu_autoinstall_node -l rh6u4_autoinst -f nodes.txt
Where nodes.txt is the list of nodes to autoinstall.

5.2.6 Customization
The configuration file /opt/cmu/etc/cmuserver.conf includes an autoinstall section with two sets of variables:
• Variables that affect the autoinstall process behavior:
◦ CMU_AUTOINST_INSTALL_TIMEOUT
◦ CMU_AUTOINST_PIPELINE_SIZE
For example, if autoinstall times out because disk formatting takes a long time, increase CMU_AUTOINST_INSTALL_TIMEOUT.

• Variables for keyword substitution into autoinstall templates:
◦ CMU_CN_OS_LANG
◦ CMU_CN_OS_TIMEZONE
◦ CMU_CN_OS_CRYPT_PWD
For example, when the cmuserver.conf variable is set to CMU_CN_OS_LANG=en_US, in the template file /opt/cmu/templates/autoinstall/autoinstall_rh6_rh7.template

the following line:
lang CMU_CN_OS_LANG
becomes:
lang en_US
The keyword CMU_CN_DEFAULT_GW in the autoinstall templates allows the user to specify a per-node default gateway value during autoinstall or cloning. The accepted values are default, cmumgt, or the actual IP address of the gateway.
default—The Insight CMU management node default gateway IP is used as the gateway for the autoinstalled node.
cmumgt—The Insight CMU management node admin IP is used as the gateway for the autoinstalled node.
The default gateway IP of a node can be changed using the GUI:
Cluster Administration→Node Management→Select a Node→Modify Node→Default Gateway IP Address
For a full list of customizable parameters, see the autoinstall section in /opt/cmu/etc/cmuserver.conf.
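The keyword substitution described above can be pictured with a small sketch. This illustrates the effect only and is not the actual Insight CMU implementation; the temporary file is a stand-in for the autoinstall template:

```shell
# Illustration only: substitute a cmuserver.conf value into a template line.
CMU_CN_OS_LANG=en_US                       # value as set in cmuserver.conf
tmpl=$(mktemp)                             # stand-in for the autoinstall template
echo 'lang CMU_CN_OS_LANG' > "$tmpl"
sed "s/CMU_CN_OS_LANG/${CMU_CN_OS_LANG}/g" "$tmpl"
```

The sed command prints lang en_US, matching the before/after shown in the text.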

5.2.6.1 RHEL autoinstall customization for nodes configured with HPE Dynamic Smart Array RAID (B120i, B320i, B140i) using a driver update diskette image
To autoinstall RHEL 6.x or RHEL 7.x on nodes with HPE Dynamic Smart Array RAID configured, the following additional steps are required to pass the "hpvsa" or "hpdsa" driver update diskette image to the kickstart environment.
The following customizations apply to all new autoinstall image groups that are subsequently created. If the customizations are required only for a specific autoinstall image group, create the autoinstall image group first and then customize the group-specific template files under the /opt/cmu/image/<image_group_name>/ directory, as explained in some of the following steps.
1. Download the appropriate driver diskette image for the corresponding RHEL OS version from the Hewlett Packard Enterprise Drivers & Support web site.
• The "hpvsa" driver update diskette is required for B120i and B320i controllers. For example, for RHEL 6.5 download hpvsa-<version>.rhel6u5.x86_64.dd.gz.
• The "hpdsa" driver update diskette is required for the B140i controller. For example, for RHEL 6.5 download hpdsa-<version>.rhel6u5.x86_64.dd.gz.
2. Uncompress (gunzip) the driver diskette image and rename the extracted driver diskette image (.dd) with a .iso extension. For example, hpdsa-<version>.rhel6u5.x86_64.iso.
3. Copy the renamed file to the autoinstall repository directory which contains the RHEL DVD ISO contents. This directory is automatically NFS exported by the autoinstall process.
4. Add a driverdisk line at the beginning of the RHEL autoinstall template file. This driverdisk line must point to the uncompressed driver update diskette ISO file prepared in the previous step. For example:
driverdisk --source=nfs:CMU_CN_MGT_IP:CMU_REPOSITORY_PATH/hpvsa-1.2.12-110.rhel6u6.x86_64.iso

NOTE: CMU_CN_MGT_IP and CMU_REPOSITORY_PATH are automatically substituted with the correct values during autoinstall. Optionally, these values can be hardcoded in the autoinstall template file.
NOTE: If the driver update diskette is only required for a specific autoinstall image group, create the autoinstall image group first and then add the driverdisk line to that group-specific template file /opt/cmu/image/<image_group_name>/autoinst.tmpl-orig.
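Steps 2 and 3 above can be sketched as shell commands. The .dd.gz file below is a dummy created on the spot and the repository directory is an example path, so adjust both for a real driver diskette:

```shell
# Sketch of steps 2-3: the .dd.gz below is a dummy stand-in for the real
# downloaded driver diskette, and repo/ stands in for the autoinstall repository.
workdir=$(mktemp -d)
cd "$workdir"
mkdir repo
printf 'dummy-dd-image' | gzip > hpvsa-1.2.12-110.rhel6u6.x86_64.dd.gz
gunzip hpvsa-1.2.12-110.rhel6u6.x86_64.dd.gz          # produces the .dd file
mv hpvsa-1.2.12-110.rhel6u6.x86_64.dd \
   hpvsa-1.2.12-110.rhel6u6.x86_64.iso                # rename .dd to .iso
cp hpvsa-1.2.12-110.rhel6u6.x86_64.iso repo/          # copy into the repository
```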

5. (This step is not required for B320i-based nodes.) For nodes with B120i- or B140i-based Dynamic Smart Array RAID, the "ahci" driver conflicts with the "hpvsa" and "hpdsa" drivers. To avoid this conflict, modify CMU_KS_KERNEL_PARMS in /opt/cmu/etc/cmuserver.conf to explicitly blacklist the "ahci" driver during OS autoinstall on the target nodes. The blacklisting parameter varies across OS versions. For the OS version-specific kernel command line parameters for blacklisting the "ahci" driver, see the following table.
CMU_KS_KERNEL_PARMS="lang=CMU_CN_OS_LANG devfs=nomount ramdisk_size=10240 console=CMU_CN_SERIAL_PORT ksdevice=CMU_CN_MAC_COLON initrd=autoinst-initrd-CMU_IMAGE_NAME blacklist=ahci"

Controller      OS        Kernel boot parameter for blacklisting ahci
B320i           RHEL 6.x  Not applicable
                RHEL 7.x  Not applicable
B120i, B140i    RHEL 6.x  blacklist=ahci
                RHEL 7.x  modprobe.blacklist=ahci

IMPORTANT: If a node is enabled with B120i- or B140i-based Dynamic Smart Array RAID mode, verify that the "hpvsa" or "hpdsa" driver diskette is inserted. You must explicitly blacklist the "ahci" module. Otherwise autoinstall may be successful, but the backup process may fail to get the image.

NOTE: If the blacklisting step is required only for a specific autoinstall image group, create that group using the GUI or CLI and add nodes to that group. Launch the autoinstall operation on one node and kill that process after a few minutes. This creates the PXE or UEFI boot templates under the /opt/cmu/image/<image_group_name>/ directory. For nodes in Legacy BIOS mode, edit the pxelinux_template file. For nodes in UEFI mode, edit the uefilinux_template file to add the appropriate "ahci" blacklisting kernel parameter. Node-specific PXE file customization is also possible. For more details, see “Autoinstall compute nodes” (page 67).
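Editing the generated template amounts to appending the parameter to its APPEND line. A sketch, using a made-up APPEND line rather than the exact file Insight CMU writes:

```shell
# Sketch: append blacklist=ahci to the APPEND line of a pxelinux template.
# The template content below is illustrative only.
tmpl=$(mktemp)
echo 'APPEND initrd=autoinst-initrd-rh6 ks=nfs:CMU_CN_MGT_IP:/ks.cfg' > "$tmpl"
sed -i '/^APPEND/ s/$/ blacklist=ahci/' "$tmpl"
cat "$tmpl"
```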

6. If a server contains extra Smart Array RAID disks (for example, P420i) in addition to the B120i-, B320i-, or B140i-driven Dynamic Smart Array RAID (for example, the SL4540 has a B120i and one or more P420i controllers), then autoinstall may not work as expected due to those extra disks. To avoid this, also blacklist the "hpsa" driver by editing CMU_KS_KERNEL_PARMS in /opt/cmu/etc/cmuserver.conf (as described in a previous step) to ensure that the OS or bootloader is always installed on the Dynamic Smart Array disks. For RHEL 6.x, the parameter is blacklist=hpsa and for RHEL 7.x it is modprobe.blacklist=hpsa. Also add the following lines to the %post section of the RHEL autoinstall template to ensure that the extra Smart Array disks are re-enabled after the OS installation.

For RHEL 6.x templates:
sed -i 's/blacklist=hpsa//g' /boot/grub/grub.conf
sed -i 's/blacklist=hpsa//g' /boot/efi/EFI/redhat/grub.conf
sed -i 's/blacklist hpsa//g' /etc/modprobe.d/anaconda.conf

For RHEL 7.x templates:
sed -i 's/modprobe.blacklist=hpsa//g' /boot/grub2/grub.cfg
sed -i 's/modprobe.blacklist=hpsa//g' /boot/efi/EFI/redhat/grub.cfg
sed -i 's/blacklist hpsa//g' /etc/modprobe.d/anaconda-blacklist.conf
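The effect of the %post cleanup can be checked on a sample kernel line. The grub line below is made up for illustration:

```shell
# Demonstration: the sed expression removes the blacklist parameter while
# leaving the rest of the (made-up) grub kernel line intact.
line='linux16 /vmlinuz ro root=UUID=1234 modprobe.blacklist=hpsa quiet'
echo "$line" | sed 's/modprobe.blacklist=hpsa//g'
```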

5.2.6.2 Autoinstall RHEL 6.x and 7.x on nodes having multiple disks or LUNs
While autoinstalling nodes containing multiple disks or LUNs with RHEL 6.x or 7.x using the default templates provided by Insight CMU, the RHEL installer environment spreads the OS install across multiple disks or LUNs. As a result, OS partitions such as /boot and / may spread across multiple disks or LUNs. Backup and cloning of such configurations is not supported. To confine the OS install to a single disk or LUN, pass the --ondisk= option to the partitioning commands in the /opt/cmu/image/<image_group_name>/autoinst.tmpl-orig file before starting the autoinstall operation. For example:
#Disk partitioning information
#USE THE APPROPRIATE DISK NAME
part /boot --fstype ext4 --size 1000 --asprimary --ondisk=sda
part swap --size 4096 --asprimary --ondisk=sda
part / --fstype ext4 --size 1 --grow --asprimary --ondisk=sda

NOTE: The alternate autoinstall command ignoredisk --only-use=sda can be used instead of specifying the --ondisk=sda option for every partition command.
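If the target disk varies between image groups, the partition lines can be generated from a single variable. A sketch, where DISK is a shell variable used for illustration, not kickstart syntax:

```shell
# Sketch: emit the example partitioning section confined to one disk.
DISK=sda
cat <<EOF
part /boot --fstype ext4 --size 1000 --asprimary --ondisk=${DISK}
part swap --size 4096 --asprimary --ondisk=${DISK}
part / --fstype ext4 --size 1 --grow --asprimary --ondisk=${DISK}
EOF
```

Redirect the output into the group-specific template to apply it.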

5.2.6.3 SLES autoinstall customization for nodes configured with HPE Dynamic Smart Array RAID (B120i, B320i, B140i) using a driver update diskette image
To autoinstall SLES 11 or SLES 12 on nodes with Dynamic Smart Array RAID configured, the following additional steps are required to pass the "hpvsa" or "hpdsa" driver update diskette image to the AutoYaST environment.
The following customizations apply to all new autoinstall image groups subsequently created. If the customizations are required only for a specific autoinstall image group, create the autoinstall image group first and then customize the group-specific template files under the /opt/cmu/image/<image_group_name>/ directory, as explained in some of the following individual steps.
1. Download the appropriate driver diskette image for the corresponding SLES OS version from the Hewlett Packard Enterprise Drivers & Support web site.
• The "hpvsa" driver update diskette is required for B120i and B320i controllers. For example, for SLES 11 SP3, download hpvsa-<version>.sles11sp3.x86_64.dd.gz.
• The "hpdsa" driver update diskette is required for the B140i controller. For example, for SLES 11 SP3, download hpdsa-<version>.sles11sp3.x86_64.dd.gz.
2. Uncompress (gunzip) the driver diskette image and copy that .dd image file to the autoinstall repository directory which contains the SLES DVD ISO contents. This directory is automatically NFS exported by the autoinstall process.
3. Modify CMU_AY_KERNEL_PARMS in /opt/cmu/etc/cmuserver.conf to append a Driver Update Disk dud parameter, which points to the .dd image file mentioned in the previous step. For example:
CMU_AY_KERNEL_PARMS="autoyast=nfs://CMU_CN_MGT_IP/opt/cmu/image/CMU_IMAGE_NAME/autoinst-CMU_CN_HOSTNAME install=nfs://CMU_CN_MGT_IP//CMU_REPOSITORY_PATH initrd=autoinst-initrd-CMU_IMAGE_NAME netwait=20 dud=nfs://CMU_CN_MGT_IP//CMU_REPOSITORY_PATH/hpvsa-1.2.0-185.sles11sp3.x86_64.dd"

NOTE: CMU_CN_MGT_IP and CMU_REPOSITORY_PATH are automatically substituted with the appropriate values during the autoinstall process. Optionally, these values can be hardcoded in the template file.
NOTE: If the Driver Update Disk dud is only required for a specific autoinstall image group, create that group using the GUI or CLI and add nodes to that group. Launch the autoinstall operation on one node and kill that process after a few minutes. This creates the PXE or UEFI boot templates under the /opt/cmu/image/<image_group_name>/ directory. For nodes in Legacy BIOS mode, edit the pxelinux_template file. For nodes in UEFI mode, edit the uefilinux_template file to add the appropriate "ahci" and "hpsa" blacklisting kernel parameters. Node-specific PXE file customization is also possible. For more details, see “Autoinstall compute nodes” (page 67).

4. (This step is not required for B320i-based nodes.) For nodes with B120i- or B140i-based Dynamic Smart Array RAID, the "ahci" driver conflicts with the "hpvsa" and "hpdsa" drivers. To avoid this conflict, append broken_modules=ahci to CMU_AY_KERNEL_PARMS in /opt/cmu/etc/cmuserver.conf. For example:
CMU_AY_KERNEL_PARMS="autoyast=nfs://CMU_CN_MGT_IP/opt/cmu/image/CMU_IMAGE_NAME/autoinst-CMU_CN_HOSTNAME install=nfs://CMU_CN_MGT_IP//CMU_REPOSITORY_PATH initrd=autoinst-initrd-CMU_IMAGE_NAME netwait=20 dud=nfs://CMU_CN_MGT_IP//CMU_REPOSITORY_PATH/hpvsa-1.2.0-185.sles11sp3.x86_64.dd broken_modules=ahci"
5. If a server also has additional Smart Array RAID disks (for example, P420i) in addition to the B120i-, B320i-, or B140i-driven Dynamic Smart Array RAID, then also blacklist the "hpsa" driver in the AutoYaST installation environment to avoid potential disk selection conflicts. For example:
CMU_AY_KERNEL_PARMS="autoyast=nfs://CMU_CN_MGT_IP/opt/cmu/image/CMU_IMAGE_NAME/autoinst-CMU_CN_HOSTNAME install=nfs://CMU_CN_MGT_IP//CMU_REPOSITORY_PATH initrd=autoinst-initrd-CMU_IMAGE_NAME netwait=20 dud=nfs://CMU_CN_MGT_IP//CMU_REPOSITORY_PATH/hpvsa-1.2.0-185.sles11sp3.x86_64.dd broken_modules=ahci,hpsa"

IMPORTANT: If a node is enabled with B120i- or B140i-based Dynamic Smart Array RAID mode, verify that the "hpvsa" or "hpdsa" driver diskette is inserted. You must explicitly blacklist the "ahci" module. Otherwise autoinstall may be successful, but the backup process may fail to get the image.

NOTE: If the blacklisting step is required only for a specific autoinstall image group, edit the appropriate PXE or UEFI boot templates under the /opt/cmu/image/<image_group_name>/ directory. For nodes in Legacy BIOS mode, edit the pxelinux_template file. For nodes in UEFI mode, edit the uefilinux_template file to add the appropriate "ahci" and "hpsa" blacklisting kernel parameters. Node-specific PXE file customization is also possible. For more details, see “Autoinstall compute nodes” (page 67).

NOTE: Autoinstall SLES 11 SP3 on Gen9 servers may also require a kISO in addition to the driver diskette image. For more details, see “Autoinstall SLES 11 SP3 on Gen9 servers and certain HPE Moonshot server cartridges requires special kISO images in addition to the DVD ISO” (page 73).

5.2.6.4 Autoinstall SLES 11 SP3 on Gen9 servers and certain HPE Moonshot server cartridges requires special kISO images in addition to the DVD ISO
Certain Moonshot server cartridges and Gen9 servers require a special kISO image to autoinstall SLES 11 SP3.

IMPORTANT: For details on the specific Moonshot server cartridges affected by this issue, see the latest version of the Insight CMU Release Notes available from the Insight CMU website: http://www.hpe.com/info/icmu. Under Related Links, click Technical Support / Manuals→Manuals.
The kISO image contains updated kernel, initrd, and driver packages required by the latest hardware.
1. Download the appropriate SLES 11 SP3 kISO based on the server model. The SUSE driver update site is http://drivers.suse.com.
2. Mount the downloaded SLES 11 SP3 kISO:
# mount -o loop <kISO_image>.iso /mnt/kiso
3. Copy the kISO contents to another directory:
# cp -r /mnt/kiso/* /media/KISO_REPO_DIR
The KISO_REPO_DIR directory must be NFS exported. Manually add it to /etc/exports.
4. Mount the SLES 11 SP3 DVD media ISO and copy all the contents to another directory:
# mount -o loop <DVD_image>.iso /mnt/dvdiso
# cp -r /mnt/dvdiso/* /media/SLES11SP3_AUTOINST_REPO_DIR/
Unmount the kISO and DVD ISO:
# umount /mnt/kiso
# umount /mnt/dvdiso
5. Copy initrd and kernel from the kISO repository to the SLES 11 SP3 autoinstall repository:
# cp /media/KISO_REPO_DIR/boot/x86_64/loader/initrd /media/SLES11SP3_AUTOINST_REPO_DIR/boot/x86_64/loader/initrd
# cp /media/KISO_REPO_DIR/boot/x86_64/loader/linux /media/SLES11SP3_AUTOINST_REPO_DIR/boot/x86_64/loader/linux
6. Modify the SLES 11 autoinstall template. Check the /opt/cmu/templates/autoinstall directory for default template files. Otherwise, if this kISO is required only for a specific autoinstall group, then edit the group-specific autoinstall template file /opt/cmu/image/<image_group_name>/autoinst.tmpl-orig.
• Add the kISO repository as an add-on inside the <add-on> section of the autoinstall template:
<add-on>
  <add_on_products config:type="list">
    <listentry>
      <media_url>nfs://CMU_CN_MGT_IP/PATH_TO_KISO_REPO_DIR/</media_url>
      <product_dir>/</product_dir>
    </listentry>
  </add_on_products>
</add-on>

• In the autoinstall template, add the <signature-handling> section inside the <general> section to ignore any signature verification failures during autoinstall:
<signature-handling>
  <accept_unsigned_file config:type="boolean">true</accept_unsigned_file>
  <accept_file_without_checksum config:type="boolean">true</accept_file_without_checksum>
  <accept_verification_failed config:type="boolean">true</accept_verification_failed>
  <accept_unknown_gpg_key config:type="boolean">true</accept_unknown_gpg_key>
  <import_gpg_key config:type="boolean">true</import_gpg_key>
</signature-handling>

• Add additional packages available in the kISO to the autoinstall template between the <packages> tags.
◦ Required packages for the ProLiant m300 cartridge:
<package>intel-igb</package>
<package>intel-igb-kmp-default</package>

◦ Required packages for UEFI-enabled Gen9 servers. (Add other packages based on requirements.)
<package>kernel-default</package>
<package>hpsa</package>

NOTE: For the ProLiant m710 Moonshot server cartridge, if the kISO contains any additional rpm packages, add those under the <packages> tag, based on the requirements.

NOTE: If the Gen9 servers are enabled with the B140i controller, you must add the "hpdsa" driver diskette along with the kISO. For details, see “SLES autoinstall customization for nodes configured with HPE Dynamic Smart Array RAID (B120i, B320i, B140i) using a driver update diskette image” (page 72).

5.2.6.5 Disable consistent NIC device names during RHEL 7.x autoinstall
To disable RHEL 7.x-specific consistent NIC device names ("eno1" or "ens1p1") during the autoinstall and fall back to legacy names ("eth0" or "eth1"):
1. Append the kernel command line parameter net.ifnames=0 to CMU_KS_KERNEL_PARMS in the /opt/cmu/etc/cmuserver.conf file. For example:
CMU_KS_KERNEL_PARMS="lang=CMU_CN_OS_LANG devfs=nomount ramdisk_size=10240 console=CMU_CN_SERIAL_PORT ksdevice=CMU_CN_MAC_COLON initrd=autoinst-initrd-CMU_IMAGE_NAME net.ifnames=0"

NOTE: Modifications made to /opt/cmu/etc/cmuserver.conf apply to all the RHEL autoinstall image groups created from that point forward. To disable consistent NIC names for a specific RHEL 7.x autoinstall group, edit the pxelinux_template, uefilinux_template, and grub.cfg-uefilinux_template files under the /opt/cmu/image/<image_group_name>/ directory and add net.ifnames=0 to the existing kernel parameters list.

2. Modify the autoinstall template to ensure that the net.ifnames=0 parameter is persistent during subsequent disk boots. To the "bootloader" line, append --append='net.ifnames=0'. For example:
#System bootloader configuration
bootloader --location=mbr --append='net.ifnames=0'

5.2.6.6 Autoinstall the ARM-based HPE ProLiant m400 and m800 Moonshot servers with Ubuntu 14.04
1. Download the Ubuntu 14.04 installation files for the ProLiant m400 and m800 Moonshot server cartridges:
• For the m400, download the uImage and uInitrd files from:
http://ports.ubuntu.com/dists/trusty-updates/main/installer-arm64/current/images/generic/netboot/xgene/
• For the m800, download the vmlinuz and initrd.gz files from:
http://ports.ubuntu.com/dists/trusty-updates/main/installer-armhf/current/images/keystone/netboot/

2. Create an Ubuntu 14.04 autoinstall repository directory and copy the downloaded files to that directory.
3. Create an autoinstall image group using the Ubuntu 14.04 autoinstall repository directory created in the previous step and the Ubuntu ARM autoinstall template (autoinst_ubuntu_mini_arm.templ) in the /opt/cmu/templates/autoinstall directory.
4. Add nodes to the autoinstall image group created in the previous step.
5. Start the autoinstall process.

NOTE: If the Insight CMU management node does not have any http_proxy defined, the autoinstall may fail to access the Ubuntu internet repositories. To fix this issue, define the CMU_WEB_PROXY in /opt/cmu/etc/cmuserver.conf.

5.2.7 Restrictions
This implementation contains the following restrictions:
• The repository must be on the local storage of the management node.
• The repository is automatically exported via NFS by the Insight CMU autoinstall process. Do not use HTTP or FTP.

IMPORTANT: For Windows autoinstall only, the repository is exported through Samba. However, this is automatically done by Insight CMU and does not require intervention by the user. (Supported only on specific Moonshot cartridges.)

• Updates must be applied through autoinstall post-installation scripts.
• Only qualified distributions and updates are supported by Hewlett Packard Enterprise.

5.3 Backing up a golden compute node
The backup operation captures the entire operating system of a golden compute node and stores it in an image archive on the Insight CMU administration node. This image can be used to clone other nodes of the cluster. Each physical backup image is associated with an image group. This functionality is available only to the administrator.
Before performing a backup, ensure that the golden node contains all the preferred services, such as NTP for synchronizing time across the cluster. Install any additional applications or libraries.

NOTE: Backup and cloning of the Logical Volume Manager (LVM) configuration on the system or OS disk is supported with some limitations:
• The OS install with LVM must be configured on a single physical disk/LUN and must not span multiple physical disks/LUNs. Any additional LVM volumes created on extra disks/LUNs are ignored by the backup operation.
• Backup and cloning of LVM snapshots, striped volumes, and thin volumes are not supported.
• Customized physical and logical volume extents are not supported.
• LVM backup is only supported by the "Autodetect partitions" or "UUID-based backup" methods.
• Before attempting to back up a SLES 12 golden node with an LVM configuration, a generic initrd file must first be created on the golden node. Otherwise, the nodes cloned with SLES 12 images will fail to disk boot. For more details, see the Compute nodes cloned with SLES12 LVM images may fail to disk boot section under the Limitations and workarounds chapter in the Insight CMU v8.0 Release Notes.
NOTE: Insight CMU does not support backup and cloning of btrfs file systems on compute nodes. Use alternatives such as ext3, ext4, xfs, and reiserfs file systems on the compute nodes. This limitation does not apply to the Insight CMU management node.

5.3.1 Backing up a disk from a compute node in an image group
Insight CMU provides three backup methods:
• Root partition number based backup—This method requires the root ('/') partition number of the golden node.
• Root partition UUID based backup—This method requires the UUID of the root ('/') partition of the golden node.
• Autodetect partitions based backup—This method works only on golden nodes running Linux and accessible via ssh.

NOTE: The Autodetect partitions and Root partition UUID backup methods improve the provisioning experience on nodes with multiple array controllers (SL4540) or running a Windows OS. (Windows is available only on specific Moonshot cartridges.)

5.3.1.1 Backup using GUI
To perform a backup:
1. Expand the node list in the left frame.
2. Select a node.
3. Right-click the selected node. The contextual menu appears.
4. Select the backup option.
5. A window displays the existing image groups list and the available backup methods. Select the image group to associate with the backup image. The backup node must be a member of that image group.

Figure 22 Backup dialog

6. Select one of the following backup methods:
• Root partition number based backup
Select the partition number on which the root ('/') directory exists. Insight CMU needs this to find the /etc/fstab file and grab the other partitions on the OS disk.

IMPORTANT: When backing up a Windows golden node (supported only on specific Moonshot cartridges), the root partition is the partition containing the main Windows system folder, for example C:\Windows. Verify by running the diskpart command at the Windows command prompt. In the following example, the backup node has only one partition, so the main Windows partition number is “1”.
c:\>diskpart

Microsoft DiskPart version 6.1.7601
Copyright (C) 1999-2008 Microsoft Corporation.
On computer: SMITH1

DISKPART> list disk

  Disk ###  Status  Size    Free  Dyn  Gpt
  --------  ------  ------  ----  ---  ---
  Disk 0    Online  465 GB  0 B

DISKPART> select disk 0

Disk 0 is now the selected disk.

DISKPART> list partition

  Partition ###  Type     Size    Offset
  -------------  -------  ------  -------
  Partition 1    Primary  465 GB  1024 KB

Figure 23 Root partition number based backup

• Root partition UUID based backup
Specify the UUID of the root partition. Based on the UUID of the Linux '/' partition or the Windows system partition of the golden node, the backup automatically backs up all the partitions on the OS disk of that node. On Linux backup nodes, run the blkid command to obtain the UUID of the '/' partition.
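Extracting the UUID from blkid output can be scripted. The sketch below parses sample blkid-style output (the device names and UUIDs are made up), assuming '/' is on /dev/sda2:

```shell
# Sample blkid-style output; sda2 is assumed to hold '/'.
blkid_output='/dev/sda1: UUID="11111111-aaaa-bbbb-cccc-222222222222" TYPE="ext4"
/dev/sda2: UUID="3c070824-01b2-400c-adeb-1f554b1e51b6" TYPE="ext4"'
# Print the quoted UUID field of the sda2 line
echo "$blkid_output" | awk -F'"' '/sda2/ {print $2}'
```

On a real golden node, run blkid directly and read the UUID of the device that holds '/'.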

IMPORTANT: On a Windows backup node, run the fsutil fsinfo ntfsinfo C: command as Administrator to obtain the UUID of the main Windows partition C:. The UUID is displayed as NTFS Volume Serial Number in the command output. In the following example, the UUID of the main Windows partition is 724c38564c381777 (ignore the 0x prefix). The UUID contains at least 16 hexadecimal characters.
C:\Windows\system32>fsutil fsinfo ntfsinfo C:
NTFS Volume Serial Number :       0x724c38564c381777
Version :                         3.1
Number Sectors :                  0x000000000c53ff7f
Total Clusters :                  0x00000000018a7fef
Free Clusters  :                  0x000000000090688b
Total Reserved :                  0x00000000000007f0
Bytes Per Sector  :               512
Bytes Per Cluster :               4096
Bytes Per FileRecord Segment    : 1024
Clusters Per FileRecord Segment : 0
Mft Valid Data Length :           0x000000001b0a8000
Mft Start Lcn  :                  0x0000000000600c8a
Mft2 Start Lcn :                  0x000000000000340e
Mft Zone Start :                  0x0000000000ff54a0
Mft Zone End   :                  0x00000000011ad280
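When scripting, the serial number can be pulled out of the fsutil output line and the 0x prefix stripped, as the text describes. The sample line is taken from the output above:

```shell
# Strip everything up to and including the 0x prefix of the serial number.
line='NTFS Volume Serial Number : 0x724c38564c381777'
echo "$line" | sed 's/.*0x//'
```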

Figure 24 Root partition UUID based backup

• Autodetect partitions backup
Insight CMU automatically backs up all the partitions on the OS disk of the node. Before starting the backup operation, the golden node must be up and running a Linux OS and reachable via ssh. This is a Linux-only backup method and does not work with a Windows OS.

Figure 25 Autodetect partitions backup

7. Click OK to launch the backup of the selected node with the chosen options.

IMPORTANT: Insight CMU only supports Ext2, Ext3, Ext4, FAT, XFS, and Reiserfs file systems on the Linux partitions for backup.
IMPORTANT: If any of the OS partitions on the backup node have less than 50% free space, warnings appear at the end of the backup operation. The appropriate steps required to fix the warnings are also displayed. For example:
*****************************************************************
*** WARNING: only 48% free space left on partition
*** cloning process -may- fail at UNCOMPRESSING step for partitions with
*** less than 50% of free space
- free some space in this partition and retry the backup process
- (or) if all compute nodes have enough RAM, set CMU_CLONING_USE_TMPFS=yes
  in /opt/cmu/etc/cmuserver.conf before cloning operation
- (or) use cmu_image_open/cmu_image_commit tools to free-up space under the
  appropriate partition mount point dir in the backup image
refer /opt/cmu/image/<image_group_name>/fstab-device.txt for the mapping of
partition names to mount points
*****************************************************************
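A quick pre-backup check for the 50% rule can be sketched with df. The mount point '/' is an example, and the warning text here is illustrative, not Insight CMU's own:

```shell
# Sketch: warn when a partition is 50% or more used (i.e. 50% or less free).
mnt=/
used=$(df -P "$mnt" | awk 'NR==2 {gsub(/%/,""); print $5}')
if [ "$used" -ge 50 ]; then
    echo "WARNING: only $((100 - used))% free space left on $mnt"
fi
```

Run the same check against each mount point of the golden node's OS disk before launching the backup.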

8. To start the backup process, click OK. If the node to be backed up is linked to a management card, you are prompted to enter the management card login and password. A window displays the backup status.

Figure 26 Backup status

While the backup is processing, you cannot use the compute node, but you can perform other tasks with the Insight CMU GUI. When the backup completes successfully, a Backup Success message appears in the status window.

5.3.1.2 Backup using CLI
To back up a node from the Insight CMU command-line interface:
1. Enter:
# /opt/cmu/cmucli

NOTE: For scripting the backup operation, use the /opt/cmu/bin/cmu_backup command directly instead of using cmucli backup commands.

2. Select one of the following backup methods:
• To back up using the Root partition number based method, specify the root partition device name followed by the other partition names on the OS disk of the golden node. For UEFI-enabled nodes, also specify the partition name of the /boot/efi mount.
cmu> backup "group name" "root partition,other partitions,..."
For example, if the node has root partition sda2 and boot partition sda1:
cmu> backup "rhel6u5_x86_64" "sda2,sda1" n13

• To back up using the Autodetect partitions based method:
cmu> backup "group name"
For example:
cmu> backup "rhel6u5_x86_64" n13

• To back up using the Root partition UUID based method:
cmu> backup "group name" uuid "root partition uuid"
For example:
cmu> backup "rhel6u5_x86_64" uuid "3c070824-01b2-400c-adeb-1f554b1e51b6" n13
3. After a successful backup, the /opt/cmu/image/<image_group_name> directory on the management node contains the image backup files.
For Root partition number based backup, the content of the image directory on the Insight CMU administration node is:
# ls -1 /opt/cmu/image/rhel6u5_x86_64
diskconfig.txt
disk-overrides.txt
fstab-device.txt
fstab-orig.txt
header.txt
partarchi-sda1.tar.bz2
partarchi-sda3.tar.bz2
parttbl-sda.raw
parttbl-sda.txt
pre_reconf.sh
reconf.sh
For Root partition UUID or Autodetect partitions based backup images, the content of the image directory is:
# ls -1 /opt/cmu/image/rc2_rhel6u7_a
diskconfig.txt
disk-overrides.txt
fstab-device.txt
fstab-orig.txt
header.txt
partarchi-cmu_pci0000:00_0000:00:03.0_0000:0a:00.0_0000:0b:08.0_smartarray0_c0d0-part1.tar.bz2
partarchi-cmu_pci0000:00_0000:00:03.0_0000:0a:00.0_0000:0b:08.0_smartarray0_c0d0-part2.tar.bz2
parttbl-cmu_pci0000:00_0000:00:03.0_0000:0a:00.0_0000:0b:08.0_smartarray0_c0d0.raw
parttbl-cmu_pci0000:00_0000:00:03.0_0000:0a:00.0_0000:0b:08.0_smartarray0_c0d0.txt
pre_reconf.sh
reconf.sh
Backup logs are available on the management node at:
• /opt/cmu/log/cmudolly-.log

• /opt/cmu/log/cmudolly--.log

5.4 Cloning

IMPORTANT: A backup image captured from a Legacy BIOS node cannot be used to clone UEFI-enabled compute nodes and vice versa.

IMPORTANT: During the cloning operation, Insight CMU v8.0 enforces strict matching of the system disk on the compute nodes to minimize the risk of inadvertent data loss. In strict mode, the clone operation expects that the golden compute node (backup node) and the nodes to be cloned have exactly the same hardware and disk or controller configuration. Otherwise, the cloning operation may fail to select a suitable disk on the compute nodes. For cloning heterogeneous nodes with the same image, set CMU_CLONE_DISK_SELECTION_MODE=FLEXIBLE in the /opt/cmu/etc/cmuserver.conf file. However, the flexible disk selection mode could select an incorrect disk, especially when cloning nodes with multiple disks or LUNs, resulting in data loss.

The Insight CMU cloning operation copies the complete contents of the golden image to other nodes. The copied image is the same except for the following changes:
• Insight CMU updates the host name of the node.
• Insight CMU updates the IP address of the network used for cloning.
• Insight CMU updates the compute node default gateway according to the node's Default gateway IP address value in the database. The accepted values are default, cmumgt, or the IP address of the gateway. The default gateway IP of a node can be changed from the GUI: Cluster Administration Menu→Node management→Select a node→Modify Node→Default gateway IP address
All other configurations remain the same. Node-specific configuration changes can be made with the Insight CMU reconf.sh script.
Before performing a cloning operation, you must satisfy the following prerequisites:
• Create a valid image group.
• Perform a backup into the image group.
• The nodes to be cloned must belong to the image group.
• The nodes to be cloned must belong to a network group.
• The image group must have an image that is compatible with the node hardware.
• Nodes must be ready to be powered on by the management card.
5.4.1 Performing the cloning operation using the GUI
To perform the cloning operation using the GUI:
1. Select the compute nodes to be cloned from the left panel tree.
2. Right-click the selected nodes.
3. Select the image group associated with the correct backup image.
4. Select Start Cloning.

84 Provisioning a cluster with Insight CMU
Figure 27 Cloning procedure

When cloning is in progress, the following terminal window is launched.

Figure 28 Cloning status

When cloning is complete, the final status of the cloning operation appears. The correctly cloned compute nodes appear in the chosen image group. The compute nodes that failed remain in the default image group.
The cloning feature duplicates the software installation configuration from an installed Linux system to systems with similar hardware configurations. This function eliminates the time-consuming task of system installation and configuration for each node in the cluster. The cloning procedure has the following limitations:
• The cloning procedure does not clone the Smart Array hard drive configuration. The Smart Array configuration must be set up manually before cloning.

• For nodes without iLO or iLOCM as BMC type, the cloning procedure cannot enable the PXE boot or Wake-On-LAN feature in BIOS settings. This configuration must be done by the system administrator during the hardware preparation phase.
For more information about cloning mechanisms, see “Cloning mechanisms” (page 172).
Use the following conditions to determine if cloning was successful:
• Successfully cloned nodes are added to the image group containing the image. The remaining nodes are added to the default image group.
• In the image group that contains the image, if the node name is suffixed with [non-active], then the cloning process failed on the node. If the node name is suffixed with [active], then the cloning process succeeded on the node.
• The list of successfully cloned nodes is also available in the /opt/cmu/log/cmucerbere-.log file.
5.4.2 Performing the cloning operation using the CLI
1. Enter: # /opt/cmu/cmucli

NOTE: For scripting the clone operations, use the /opt/cmu/bin/cmu_clone command directly instead of using cmucli clone commands.

2. To clone a group of nodes with an image: cmu> clone "image" node_1 - node_n For example, to clone nodes node01 to node99: cmu> clone "rhel6u5_x86_64" node01 - node99 For more information, use the help clone command. 5.4.3 Cloning Windows images

IMPORTANT: Windows is only available on specific Moonshot server cartridges. For more information, see the latest version of the Insight CMU Release Notes available from the Insight CMU website: http://www.hpe.com/info/icmu. Under Related Links, click Technical Support / Manuals→Manuals.
Observe the following for cloning Windows images:
• Windows dynamic disks are not supported. Only Windows basic disks are supported.
• Insight CMU can clone only one disk per compute node.
• When multiple primary and logical partitions are present in a Windows backup image, drive letters (for example, D:, E:) assigned to primary partitions on the cloned nodes are not consistent with the golden node. For more information, see http://support.microsoft.com/kb/93373. The following partitioning schemes are not affected by drive letter re-ordering:
◦ Only 1 primary partition (for OS) and multiple logical partitions in the golden image
◦ Only 4 primary partitions in the golden image

• The Local Administrator user account is reset on cloned nodes. Any content placed in the Administrator User desktop directory is lost after cloning. Other local user accounts are not affected. Always configure additional local user accounts with administrator privileges on the golden node before performing a backup.

NOTE: To set the Administrator user account password to use during cloning, modify the CMU_CN_WIN_PWD variable in /opt/cmu/etc/cmuserver.conf.

The default Insight CMU autoinstall template for golden nodes adds a “CMU” user account with administrator privileges.

• Cloned nodes reboot twice after the first disk-boot for host-specific customizations.
• GPT partition tables are not supported.
5.4.4 Preconfiguration
You can customize the actions to perform on each compute node before starting the cloning process. During cloning, after the node netboots and downloads the cloning image header files, an automatic preconfiguration script is launched on each node. This pre_reconf.sh script is unique for each image. When a new backup image is created, a default preconfiguration file is copied from /opt/cmu/etc/pre_reconf.sh to /opt/cmu/image/myimage/pre_reconf.sh. You can customize this template by editing the /opt/cmu/image/myimage/pre_reconf.sh file. The script is executed before hard drive partitioning and formatting begins on each of the nodes. The pre_reconf.sh script can be used, for example, to flash the hard drive firmware and enable the write cache with hdparm. The default content of pre_reconf.sh is: #!/bin/bash

#keep this version tag here CMU_PRE_RECONF_VERSION=1

#starting from cmu version 4.2 this script is dedicated to custom code #it is running at cloning time after netboot is done and before the #filesystems or even the partitioning is created.

exit 0

NOTE: The cloning process aborts if /opt/cmu/image/myimage/pre_reconf.sh exits with a nonzero error code.
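As an illustration of this kind of customization, the following is a minimal, hedged sketch of a pre_reconf.sh that enables the drive write cache with hdparm, the example use case mentioned above. The device name /dev/sda is an assumption, and the file is written to /tmp here only so the sketch is self-contained; a real customization edits /opt/cmu/image/myimage/pre_reconf.sh in place.

```shell
# Sketch: generate and run a customized pre_reconf.sh. /dev/sda is a
# hypothetical device name; adjust it for your hardware.
cat > /tmp/pre_reconf.sh <<'EOF'
#!/bin/bash

#keep this version tag here
CMU_PRE_RECONF_VERSION=1

# Enable the write cache on the system disk, if hdparm is available.
# Errors are deliberately ignored: a nonzero exit status aborts cloning.
if command -v hdparm >/dev/null 2>&1 && [ -b /dev/sda ]; then
    hdparm -W1 /dev/sda || true
fi

exit 0
EOF
chmod +x /tmp/pre_reconf.sh
/tmp/pre_reconf.sh && echo "pre_reconf.sh completed with status 0"
```

Because the script guards every step and ends with exit 0, it never aborts the clone even when hdparm is missing or fails.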

5.4.5 Reconfiguration
During cloning, automatic reconfiguration is performed on each node. The first network interface on the node is reconfigured using the IP address and the subnet mask available in the Insight CMU database. If a node has other network interface cards to reconfigure, then these interfaces must be reconfigured with the shell script reconf.sh. This shell script is dedicated to user customization. To perform reconfiguration of network interfaces other than the first network interface, you must insert the appropriate instructions into the reconf.sh script. The reconf.sh script is unique for each image. When a new backup image is created, a default reconfiguration file is copied from /opt/cmu/etc/reconf.sh to /opt/cmu/image/myimage/reconf.sh.

IMPORTANT: The script must end with a return code equal to 0; otherwise, cloning fails. The reconf.sh script receives the following environment variables: • CMU_RCFG_PATH contains the path where the "/" partition of the node being cloned is mounted on the network-booted system during the cloning operation.

• CMU_RCFG_HOSTNAME contains the host name of the node.
• CMU_RCFG_DOMAIN contains the domain name of the node.
• CMU_RCFG_IP contains the IP address of the node.
• CMU_RCFG_NTMSK contains the netmask of the node.

IMPORTANT: The reconf.sh file must have the execute permission set for users.

The default reconf.sh script is stored in the /opt/cmu/etc directory. If no reconf.sh script is associated with an image, then the script in the /opt/cmu/etc/reconf.sh file is copied into the image directory during backup. The default content of the reconf.sh file is: #!/bin/bash

#keep this version tag here CMU_RECONF_VERSION=1

# starting with cmu version 4.2 # this script is now dedicated to custom code and is invoked by: # # /opt/cmu/ntbt/rp/opt/cmu/tools/cmu_post_cloning # # all code below is therefore executed as the last step of the cloning process # into the netboot environnement. This will allow seamless upgrade to cmuv8.0+ # and will avoid support issues. # # environment variables available: # # CMU_RCFG_PATH = path where the root filesystem is currently mounted # CMU_RCFG_HOSTNAME = hostname of the compute node # CMU_RCFG_DOMAIN = dns domainname of the compute node # CMU_RCFG_IP = mgt network ip of this compute node # CMU_RCFG_NTMSK = net mask

exit 0
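The instructions the text asks you to insert for additional interfaces can be sketched as follows. This is a hedged example only: the RHEL-style ifcfg file layout and the 192.168.2.x addressing are assumptions, and the default values below exist only so the sketch runs standalone; during a real cloning operation Insight CMU supplies CMU_RCFG_PATH and CMU_RCFG_IP as described above.

```shell
# Sketch of custom reconf.sh code: configure a second interface (eth1)
# on each cloned node, relative to the cloned root mounted at
# CMU_RCFG_PATH. Demo defaults are used when the CMU variables are unset.
CMU_RCFG_PATH="${CMU_RCFG_PATH:-/tmp/cmu_reconf_demo}"
CMU_RCFG_IP="${CMU_RCFG_IP:-10.0.0.13}"

# Derive the eth1 address from the last octet of the management IP.
last_octet="${CMU_RCFG_IP##*.}"

mkdir -p "$CMU_RCFG_PATH/etc/sysconfig/network-scripts"
cat > "$CMU_RCFG_PATH/etc/sysconfig/network-scripts/ifcfg-eth1" <<EOF
DEVICE=eth1
BOOTPROTO=static
IPADDR=192.168.2.$last_octet
NETMASK=255.255.255.0
ONBOOT=yes
EOF

# The final command succeeds, so the script returns 0 as required.
echo "configured eth1 as 192.168.2.$last_octet"
```

Because cloning fails if reconf.sh returns a nonzero status, keep the last command one that succeeds, or end the script with exit 0.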

NOTE: The cloning process aborts if /opt/cmu/image/myimage/reconf.sh exits with a nonzero error code.
5.5 Insight CMU image editor
An existing Insight CMU backed-up image can be modified directly on the Insight CMU management node, without making the modifications on a golden node and backing up the system. To edit the image, do the following:
1. Use the cmu_image_open command to expand the image.
2. Modify the image.
3. Use the cmu_image_commit command to save the image.
5.5.1 Expanding an image
An Insight CMU cloning image is stored in /opt/cmu/image. The image is composed of several archives, one per partition. The cmu_image_open command analyzes the image directory content and expands all the archives into the image directory. Depending on the cloning image size, this script can take several minutes to complete. For example: # /opt/cmu/bin/cmu_image_open -i rh5u4_x86_64 image untared into

NOTE: The Insight CMU image editor commands attempt to preserve the ACL and XATTR attributes of the files while extracting golden image contents. If the mount point containing the /opt/cmu/image directory does not support ACL or XATTR attributes, then image opening results in the following warning messages:

/opt/cmu/bin/tar: acl_set_file_at: Cannot set POSIX ACLs for file '': Operation not supported /opt/cmu/bin/tar: acl_delete_def_file_at: Cannot drop default POSIX ACLs for file '': Operation not supported To prevent such messages, verify the relevant /opt mount point is mounted with the appropriate options to support ACL and XATTR. For example, use the following command on ext4 file systems to enable those mount-time options by default: # tune2fs -o user_xattr,acl /dev/sda1 An alternative solution is to specify the acl and user_xattr options for the relevant mount point in the /etc/fstab file.

When the cmu_image_open command completes, the subdirectory image_mountpoint in the image directory contains the expanded image: # ls /opt/cmu/image/rh5u4_x86_64/image_mountpoint/ .autorelabel bin data etc lib lost+found misc opt proc sbin srv tftpboot usr .open_image_finished boot dev home lib64 media mnt poweroff root selinux sys tmp var After editing the image, commit the changes: # /opt/cmu/bin/cmu_image_commit -i rh5u4_x86_64
5.5.2 Modifying an image
Modifications can consist of simple manual commands such as adding, removing, or modifying files. However, complex operations using chroot commands on the expanded image directory are also possible, such as installing a new rpm.

IMPORTANT: When using chroot, Hewlett Packard Enterprise recommends performing chroot mount /proc or chroot mount /sys in the image directory before executing other chroot commands. For example:

# chroot /opt/cmu/image/rh5u4_x86_64/image_mountpoint/ mount /sys # chroot /opt/cmu/image/rh5u4_x86_64/image_mountpoint/ mount /proc # cp /data/repositories/rh5u4_x86_64/Server/dhcp-3.0.5-21.el5.x86_64.rpm /opt/cmu/image/rh5u4_x86_64/image_mountpoint/tmp # chroot /opt/cmu/image/rh5u4_x86_64/image_mountpoint rpm -ivh /tmp/dhcp-3.0.5-21.el5.x86_64.rpm warning: /tmp/dhcp-3.0.5-21.el5.x86_64.rpm: Header V3 DSA signature: NOKEY, key ID 37017186 Preparing... ########################################### [100%] 1:dhcp ########################################### [100%] # chroot /opt/cmu/image/rh5u4_x86_64/image_mountpoint/ umount /sys # chroot /opt/cmu/image/rh5u4_x86_64/image_mountpoint/ umount /proc

IMPORTANT: When using chroot, some commands can alter the management node system: • mount -a—Mounts the partitions of the management node in the image. Any further modifications to the subdirectories modifies the management node tree and not the cloning image. • grub-install—Replaces the boot loader of the management node.

5.5.3 Saving a modified cloning image After modifications are complete, run the cmu_image_commit command to save the content of an expanded image into an Insight CMU cloning image. Depending on the cloning image size, this script can take several minutes to complete. The modified image can either replace the image itself or be saved as a new cloning image. To update cloning image rh5u4_x86_64 with the modifications: # /opt/cmu/bin/cmu_image_commit -i rh5u4_x86_64 The original archive files are renamed in the image directory and the new image content is compacted into one archive file per partition. The new image content replaces each original archive file. To save the modifications to cloning image rh5u4_x86_64 to a new cloning image rh5u4_mod: # /opt/cmu/bin/cmu_image_commit -i rh5u4_x86_64 -n rh5u4_mod

NOTE: The -n option registers a new image group to the Insight CMU database. No nodes are added into this new group.
5.6 Insight CMU diskless environments
5.6.1 Overview
Insight CMU provides two methods of provisioning a diskless OS to a set of compute nodes:
• An NFS-root file system method utilizing the open-source oneSIS software package
• An in-memory file system method called diskless CMURAM
Both methods extract a golden file system image from a server running the OS from a disk and convert the image into a diskless image. The diskless image is then deployed to a client compute

node. The client compute node may or may not contain a disk. Neither of these diskless methods requires a disk in the client compute node. The oneSIS NFS-root diskless method exports this image in read-only mode from a central NFS server to a set of diskless client compute nodes.

Figure 29 OneSIS Diskless compute nodes read from an NFS server

Since this NFS-root file system is shared among multiple diskless clients, the files and directories that are written must be pre-configured before the diskless image is created. Insight CMU maintains a standard list of the common writable files and directories for a Linux OS file system, and additional files and directories can be added to this list on a per-image basis as needed. When the diskless image is created, these writable files and directories are renamed and converted into soft-links that redirect their location to a master /ram directory. When a diskless client compute node boots up the oneSIS diskless image, the oneSIS software configures a small, read-write tmpfs /ram directory in memory and copies all of the preconfigured writable files and directories to this location. This is where all file system write activity occurs.

Figure 30 OneSIS diskless compute nodes write to memory

Since the diskless node write activity occurs in memory, this written data is lost when the node is powered off or rebooted. To preserve any write activity, the user can configure a separate read-write NFS location that can be mounted during boot-up. For more information about the oneSIS NFS-root diskless method, see “The Insight CMU oneSIS diskless method” (page 98).
The diskless CMURAM method deploys the entire diskless OS file system image into a tmpfs in-memory file system on each diskless client compute node. The size of the tmpfs file system is 4096MB by default, and is configurable for each diskless OS image that is created. Based on testing by the Insight CMU team, this default size is sufficient for a standard Linux OS file system image (for example, a basic RHEL 6.5 kickstart install image is ~1GB in size). Users must ensure that this setting is sufficient for their diskless OS image. Users are encouraged to optimize the diskless CMURAM image by deleting unnecessary files from the diskless image. This can be done manually before deploying the diskless image to the diskless client compute nodes, or it can be scripted to occur whenever a diskless file system image is created. For more information about the diskless CMURAM method, see “The CMURAM diskless method” (page 93).
There is no need to preconfigure the writable files and directories when using the diskless CMURAM method because all files are read-writable (similar to a disk-based file system). However, like the oneSIS NFS-root method, all write activity occurs in memory and is lost when the node is powered off or rebooted. If any write activity needs to be preserved, users can configure a separate read-write NFS location which can be mounted during boot-up. Typically for any diskless cluster, the only write activity that should be preserved is the syslog activity, which can be configured using the rsyslogd package on each diskless client compute node to be forwarded to a central log server.
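The syslog forwarding recommended above can be sketched with a one-line rsyslog rule. This is a hedged example: "loghost" is an assumed server name, and the file is written to /tmp only so the sketch is self-contained; in a real image the rule belongs in the diskless file system's rsyslog configuration (for example under /etc/rsyslog.d/).

```shell
# Sketch: forward all syslog traffic from a diskless node to a central
# log server so it survives reboots. @@ selects TCP transport; a single
# @ would select UDP. "loghost" and port 514 are assumptions.
cat > /tmp/cmu_remote_syslog.conf <<'EOF'
*.* @@loghost:514
EOF
cat /tmp/cmu_remote_syslog.conf
```

Baking this file into the diskless image (and restarting rsyslogd) makes every diskless client forward its logs, so nothing of value is lost when the tmpfs contents disappear at reboot.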
Each diskless method has its advantages. When deciding which diskless method to employ, consider the following factors:
• Boot times: The diskless CMURAM method takes longer to boot up than the oneSIS NFS-root method. This is because each diskless CMURAM client compute node needs to download and uncompress the entire diskless CMURAM file system image into memory. By contrast, the oneSIS NFS-root method only needs to download the kernel and initrd during boot-up, and then any files that are needed during startup are retrieved from the NFS server.

• Memory utilization: The oneSIS NFS-root method uses much less memory in each diskless client compute node than the diskless CMURAM method.
• Configuration tasks: The oneSIS NFS-root method may require several image builds and boot-ups to identify and correctly configure all of the writable files and directories that are required by the diskless OS image and its software. The diskless CMURAM method is easier to administer because the image behaves more like a disk-based file system. The only concerns with the diskless CMURAM method are the total size of the diskless OS file system and the memory usage.
• Network requirements: Both diskless methods require a network with good bandwidth between the server and client nodes for booting. Once booted, the oneSIS NFS-root diskless client compute nodes continue to rely on good network bandwidth to read files from their NFS server.
• Hardware at scale: The oneSIS NFS-root method requires additional NFS servers when scaling beyond 200 diskless client compute nodes. Depending on any boot time requirements, the diskless CMURAM method can benefit from additional boot servers.

5.6.1.1 Enabling diskless support in Insight CMU
Regardless of which diskless method you use, you must enable diskless support in Insight CMU first.
1. Edit /opt/cmu/etc/cmuserver.conf to activate CMU_DISKLESS: #cmu diskless feature true/false CMU_DISKLESS=true
2. Save and exit the file.
3. Restart the Insight CMU server: # /etc/init.d/cmu restart
Also restart the Insight CMU GUI.

5.6.2 The CMURAM diskless method
The default Insight CMU diskless solution is the diskless CMURAM method. This diskless CMURAM method loads the configured diskless image into a tmpfs memory file system when the diskless compute node is booted. The default size of the tmpfs file system is 4096M and is configured using the CMURAM parameter setting in the PXE-boot template file /opt/cmu/image/{image group name}/cmuram_pxeboot_template when the diskless image is first created. This setting can be changed as needed. A diskless CMURAM image is created on the Insight CMU management server when a diskless CMURAM image group is created in Insight CMU. Insight CMU creates the diskless CMURAM image by copying (via rsync) the root file system of the given golden node into a temporary directory. The copied image is cleaned up and configured as a diskless image, and details are displayed on the screen and logged to a file in the image group image directory. A custom initrd is created and copied into the /opt/cmu/image/{image group name}/cmuram_pxeboot/ directory, along with the given kernel. Then the reconf-cmuram-image.sh script is run to apply any user-scripted customizations to the image, such as adding NFS file system mountpoints to the new diskless /etc/fstab file or deleting all unneeded files. Finally, the image is compressed and ready for deployment, and the new image group is created in Insight CMU. If an error occurs during the diskless image creation process, then the diskless image group is not created. When a compute node is added to a diskless CMURAM image group, a custom PXE-boot file for that node is created in the /opt/cmu/image/{image group name}/cmuram_pxeboot/ directory.

When a compute node is booted into the diskless CMURAM image group, a DHCP entry is created that directs the target compute node to parse the PXE-boot file in /opt/cmu/image/{image group name}/cmuram_pxeboot/, which boots the kernel with the diskless initrd. The diskless CMURAM init script within the custom-built initrd creates the tmpfs file system, downloads the compressed image file system via tftp, and unpacks it into the tmpfs file system. At this point, the reconf-cmuram-snapshot.sh script is run to apply any per-node customizations to the image. Then the root file system is switched from the initrd to the tmpfs, and the normal initialization and startup process within the diskless image file system is invoked.

5.6.2.1 Operating systems supported Insight CMU has qualified the CMURAM diskless support on x86_64 servers with RHEL 6, RHEL 7, SLES 11, SLES 12, Ubuntu 12.04, and Ubuntu 14.04. Ubuntu 14.04 is qualified on ARM7 and AArch64 hardware.

5.6.2.2 Enabling CMURAM support
To ensure that the diskless CMURAM support is enabled in Insight CMU:
1. Edit the /opt/cmu/etc/cmuserver.conf file to confirm that cmuram is in the list of valid diskless toolkits, or add it: CMU_VALID_DISKLESS_TOOLKITS=cmuram:oneSIS
2. Verify that CMU_DISKLESS=true in the cmuserver.conf file.
3. Save and exit the cmuserver.conf file.
4. Restart the Insight CMU GUI.

5.6.2.3 Preparing the Insight CMU management node 1. Confirm that the diskless compute nodes are properly configured in the Insight CMU database. 2. If the number of diskless compute nodes is greater than 100, then you may want to either boot them in groups or configure additional tftp servers to assist with deploying the diskless image. For more information on scaling out a diskless CMURAM cluster, see “Scaling out a diskless CMURAM compute node cluster” (page 97).

5.6.2.4 Preparing the golden node Ensure that the following packages are installed on the golden node: • busybox (RHEL and SLES) / busybox-static (Ubuntu) • dhclient (RHEL) / dhcp-client (SLES) / isc-dhcp-client (Ubuntu)

5.6.2.5 Capturing and customizing a diskless CMURAM image To capture a diskless CMURAM image, ensure that the diskless CMURAM support is enabled in Insight CMU and that the golden node is running and has the required software packages installed.

Creating a new image group in CMU
1. In the Insight CMU GUI New Image Group window, enter the name of the Image Group to represent this new diskless image.
2. Select the diskless check box.
3. In the Diskless toolkit drop-down box, select cmuram.
4. Enter the golden node hostname or IP address and click Get Kernel List. Select the kernel to boot diskless.
5. Click OK to begin building the diskless CMURAM image.

Figure 31 CMURAM new image group

To create a diskless CMURAM image using the /opt/cmu/bin/cmu_add_image_group command, provide the following required options:
-n {image group name}
-d diskless-cmuram
-k {kernel version}|CURRENT ("CURRENT" means use the currently running kernel)
-I {golden node hostname or IP address}
A compressed copy of the diskless CMURAM file system image is created in a file named /opt/cmu/image/{image group name}/rootfs.tar.bz2. This is the file that is downloaded and uncompressed when a diskless client is booted with this image. You can make changes to this image. For example, you can delete content such as manpages and other documentation to save memory space, or edit the fstab file to add external file system mounting instructions. To unpack the image, run the following command: # /opt/cmu/bin/cmu_image_open -i {image group name} This unpacks the diskless CMURAM image into the /opt/cmu/image/{image group name}/image_mountpoint/ directory. After your changes to the diskless file system are complete, you must run the following command to produce a new compressed copy of the updated image: # /opt/cmu/bin/cmu_image_commit -i {image group name}

The compressed copy of the file system image is uploaded and installed when a compute node is booted with the diskless CMURAM image. Ideally, these changes should be scripted in the reconf-cmuram-image.sh file, where unpacking and repacking the image is unnecessary. This script is run after the diskless image is created and before the diskless image is compressed. The master copy of this file resides in /opt/cmu/etc/. If it does not exist in this location, then it is copied from the template file in /opt/cmu/diskless/cmuram/ the first time a diskless CMURAM image group is created. When you create a diskless CMURAM image group, the /opt/cmu/etc/reconf-cmuram-image.sh file is copied into the /opt/cmu/image/{image group name}/ directory and executed at the end of the diskless image build process. Thus, if you have diskless image changes that apply to all diskless images, then they should go in the /opt/cmu/etc/reconf-cmuram-image.sh file. If you have diskless image changes that apply to a specific diskless CMURAM image, then you can create the /opt/cmu/image/{image group name}/reconf-cmuram-image.sh file before you create the diskless CMURAM image group, and these changes are applied when the image is created.
Changes specific to each node, such as configuring additional networks, should be scripted in the reconf-cmuram-snapshot.sh file. Like the reconf-cmuram-image.sh file, the global copy of reconf-cmuram-snapshot.sh resides in /opt/cmu/etc/, and this global copy is copied into the /opt/cmu/image/{image group name}/ directory when a diskless CMURAM image group is created. The reconf-cmuram-snapshot.sh file is invoked each time a compute node is booted into the diskless CMURAM image group. It is invoked just before the diskless file system initializations take place and can be scripted to customize the diskless image for the given compute node.
Keep in mind that the reconf-cmuram-snapshot.sh script is run within the initrd environment, which is limited to the basic scripting tools. If you need to make changes that require the diskless image environment, you can script those in the diskless image rc.local file or in another startup script.
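The kind of image-wide cleanup that reconf-cmuram-image.sh can script, such as the documentation trimming suggested earlier, can be sketched as follows. How Insight CMU passes the expanded image path to the hook is not shown in this guide, so this standalone demo builds a throwaway tree first; the IMAGE_ROOT variable and the /tmp demo path are assumptions, not the Insight CMU interface.

```shell
# Sketch: delete manpages and package documentation from a diskless
# image tree to shrink its in-memory (tmpfs) footprint.
IMAGE_ROOT="${IMAGE_ROOT:-/tmp/cmuram_trim_demo}"

# Build a throwaway tree standing in for the expanded diskless image.
mkdir -p "$IMAGE_ROOT/usr/share/man/man1" "$IMAGE_ROOT/usr/share/doc/pkg"
touch "$IMAGE_ROOT/usr/share/man/man1/ls.1.gz" \
      "$IMAGE_ROOT/usr/share/doc/pkg/README"

# Remove documentation from the image tree.
rm -rf "$IMAGE_ROOT/usr/share/man" "$IMAGE_ROOT/usr/share/doc"

echo "documentation removed from $IMAGE_ROOT"
```

Scripting deletions here, rather than unpacking the image with cmu_image_open, means the trimming is reapplied automatically every time the diskless image is rebuilt.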

5.6.2.6 Adding nodes to the CMU diskless image group When your diskless CMURAM image is ready to deploy, then you can add nodes to the diskless CMURAM image group. This action adds the nodes as "inactive" members of the diskless CMURAM image group in the Insight CMU database.

NOTE: Nodes are only "active" after they are cloned or booted into the image group. Adding nodes to the diskless CMURAM image group also creates the appropriate PXE-boot files in /opt/cmu/image/{image group name}/cmuram_pxeboot/pxelinux.cfg/, based on the /opt/cmu/image/{image group name}/cmuram_pxeboot_template (Legacy BIOS) and /opt/cmu/image/{image group name}/cmuram_efiboot_template (UEFI) files, specifically tailored for each given compute node.

5.6.2.7 Booting nodes into a diskless CMURAM image group
When the diskless CMURAM compressed image is ready to deploy to a set of nodes that are already added to the diskless CMURAM image group, then you can boot those nodes.
1. In the Insight CMU GUI, select the set of compute nodes to boot from the left pane and right-click on this selection to bring up the remote management menu.
2. Select the Boot option.
3. In the Boot window, click network and select the Insight CMU diskless image group.
4. To boot the nodes, click OK.


Figure 32 CMURAM boot

To boot a selection of compute nodes into a diskless CMURAM image group using the /opt/cmu/bin/cmu_boot command, specify the diskless CMURAM image group using the -d option. Hewlett Packard Enterprise recommends monitoring the console output of a node when that node is booting a new diskless CMURAM image for the first time. This is the best method to watch for any errors and to ensure that your diskless image is configured and booting up as expected.
5.6.3 Scaling out a diskless CMURAM compute node cluster
When a compute node is booted into a diskless CMURAM image group, the compressed diskless image is downloaded via tftp. Booting a large number of nodes at the same time can stress the bandwidth capability of the management network and the tftp server. By default, the Insight CMU management server is the tftp server, but for scalability you can configure multiple tftp servers to serve the diskless image and help distribute that load.

5.6.3.1 Configuring multiple boot servers
The /opt/cmu/etc/cmu_diskless_boot_servers file allows you to configure additional boot servers and assign compute nodes to each boot server to distribute the load. The space-separated list of nodes in each entry can consist of node names and/or CMU Network Group names. Ideally, the /opt/cmu/etc/cmu_diskless_boot_servers file should be configured before the diskless CMURAM image group is created, because the file is parsed and the diskless image is copied to all of the configured boot servers. However, you can run the /opt/cmu/diskless/cmuram/cmu_cmuram_zip_image -l command to distribute the diskless image to the configured boot servers after the image is configured. The /opt/cmu/etc/cmu_diskless_boot_servers file must be configured before any nodes are added to the CMURAM image group because that is when the nodes are configured with their boot server. If nodes are added before the /opt/cmu/etc/cmu_diskless_boot_servers file is configured, you can remove the nodes from the group and re-add them to update their boot server configuration.

5.6.3.2 Preparing the boot servers The boot servers must be running Linux and passwordless ssh must be configured for the root user from the Insight CMU management server to each boot server. Hewlett Packard Enterprise

recommends configuring one boot server and then backing it up and cloning that image to the other boot servers. Each boot server must have the tftp server package installed and configured the same way it is configured for the Insight CMU management server. This means that in the /etc/xinetd.d/tftp file, the disable setting is set to no and the server_args setting includes the /opt/cmu/ntbt/tftp location. Restart the xinetd service after these tftp settings are configured.
5.6.4 The Insight CMU oneSIS diskless method
The Insight CMU oneSIS diskless method is based on the open-source oneSIS software available at http://onesis.org. The primary difference between the oneSIS implementation documented on the website and the oneSIS implementation included with Insight CMU is that the Insight CMU implementation does not require you to rebuild your kernel with NFS support. Instead, Insight CMU allows you to use the existing kernel from the golden node, and it rebuilds an initrd image containing the appropriate driver support plus NFS support for mounting the read-only root file system. Everything else is the same. The Insight CMU support for oneSIS includes scripts that adapt the oneSIS process to the Insight CMU diskless process. When you create a oneSIS diskless image group in Insight CMU, the oneSIS copy-rootfs command is run to copy the golden node image to the Insight CMU management node. Insight CMU also configures the writable files and directories specified in the given files and files.custom files in the oneSIS sysimage.conf file in the golden image, and the oneSIS mk-sysimage command is run to update the golden image. Insight CMU also prepares the golden image for diskless operation by cleaning up log directories and configuring an appropriate diskless fstab file.
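The xinetd tftp settings described in "Preparing the boot servers" above can be sketched as a configuration stanza. Only the disable = no setting and the /opt/cmu/ntbt/tftp location in server_args come from the text; the remaining fields and the -s flag are typical tftpd defaults and should be treated as assumptions. The sketch writes to /tmp so it can run without touching a live xinetd configuration.

```shell
# Sketch of an /etc/xinetd.d/tftp stanza for an Insight CMU boot server.
# Fields other than "disable" and "server_args" are assumed defaults.
cat > /tmp/tftp.xinetd <<'EOF'
service tftp
{
        socket_type     = dgram
        protocol        = udp
        wait            = yes
        user            = root
        server          = /usr/sbin/in.tftpd
        server_args     = -s /opt/cmu/ntbt/tftp
        disable         = no
}
EOF
grep -E 'disable|server_args' /tmp/tftp.xinetd
```

After placing an equivalent stanza in /etc/xinetd.d/tftp on each boot server, restart the xinetd service as the text instructs so the new settings take effect.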

5.6.4.1 Operating systems supported

Insight CMU has qualified the oneSIS diskless support with RHEL 6 and 7, and SLES 11 and 12, on x86 64-bit servers. Hewlett Packard Enterprise recommends using oneSIS 2.0.4-1, which is required with RHEL 6.5.

5.6.4.2 Enabling oneSIS support

To enable oneSIS diskless support in Insight CMU:
1. Edit /opt/cmu/etc/cmuserver.conf to add oneSIS to the list of valid diskless toolkits:
CMU_VALID_DISKLESS_TOOLKITS=oneSIS
2. Verify that CMU_DISKLESS=true in cmuserver.conf.
3. Save and exit the cmuserver.conf file.
4. Restart the Insight CMU GUI.
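The two settings from steps 1 and 2 can be checked with a short shell sketch. The sketch below writes the expected content to a temporary copy so it runs anywhere; on a real management server you would inspect /opt/cmu/etc/cmuserver.conf itself.

```shell
# Sketch: verify the two diskless settings in a copy of cmuserver.conf.
# Setting names are from this guide; the temp file stands in for
# /opt/cmu/etc/cmuserver.conf.
conf=$(mktemp)
printf 'CMU_DISKLESS=true\nCMU_VALID_DISKLESS_TOOLKITS=oneSIS\n' > "$conf"
diskless_ok=$(grep -c '^CMU_DISKLESS=true$' "$conf")
toolkit_ok=$(grep -c '^CMU_VALID_DISKLESS_TOOLKITS=.*oneSIS' "$conf")
rm -f "$conf"
echo "diskless=$diskless_ok toolkit=$toolkit_ok"
```

Both counters report 1 when the settings are present, which is the state the GUI expects before the diskless check box and the oneSIS toolkit entry appear.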

5.6.4.3 Preparing the Insight CMU management node

1. Install the oneSIS rpm on the Insight CMU management node. The oneSIS rpm qualified with Insight CMU is available on the Insight CMU ISO image in the Tools/oneSIS/ directory.
2. Verify that the diskless compute nodes are properly configured in the Insight CMU database.
3. If the number of diskless compute nodes is greater than 250, configure additional NFS servers before proceeding. For more information, see "Scaling out diskless NFS-root cluster with multiple NFS servers" (page 103).

5.6.4.4 Preparing the golden node

1. Install the following prerequisites on the golden node:

NOTE: Package names may vary depending on the OS distribution.

• busybox (RHEL and SLES)
• dhclient (RHEL) / dhcp-client (SLES)
• bind-utils (RHEL)

2. Install the oneSIS rpm on the golden node. Install the same oneSIS rpm that was installed on the Insight CMU management node.

3. Configure the DISTRO setting for oneSIS. The /etc/sysimage.conf file is present after the oneSIS rpm is installed on the golden node. This is the main oneSIS configuration file for this image. For now, the only setting that must be configured is DISTRO. This setting indicates which Linux distribution is running on this node, so that the oneSIS golden image capture process can apply the appropriate oneSIS Linux distribution patch. OneSIS comes with distribution-specific patches for many Linux distributions. This patch converts a disk-based golden image into a oneSIS diskless image. The oneSIS Linux distribution patches are in /usr/share/oneSIS/distro-patches/. The Insight CMU patches for oneSIS are in /opt/cmu/diskless/oneSIS/, and their file names end in .patch. The Insight CMU patches are copied to /usr/share/oneSIS/distro-patches/ when the first oneSIS diskless image is created. Find the patch that matches the Linux distribution on the golden node, and configure it as the DISTRO setting in /etc/sysimage.conf. In some cases, you can use a patch that is a close match. For example, if your golden node Linux distribution is CentOS 6.3, you can configure DISTRO: Redhat EL-6.2 in /etc/sysimage.conf. Verify that the syntax of the Linux distribution matches the name of the corresponding oneSIS Linux distribution patch.
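Step 3 amounts to writing a single DISTRO line. The sketch below does that against a temporary copy of the file; the "Redhat EL-6.2" value is the CentOS 6.3 example from this guide, and on a real golden node you would edit /etc/sysimage.conf directly.

```shell
# Sketch: set the DISTRO line in a copy of the oneSIS sysimage.conf.
# The distro value below is the CentOS 6.3 example from this guide; the
# temp file stands in for /etc/sysimage.conf on the golden node.
conf=$(mktemp)
distro='Redhat EL-6.2'
printf 'DISTRO: %s\n' "$distro" > "$conf"
distro_line=$(grep '^DISTRO:' "$conf")
rm -f "$conf"
echo "$distro_line"
```

Before committing a value, list /usr/share/oneSIS/distro-patches/ on the golden node and confirm the name you write matches an existing patch exactly.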

5.6.4.5 Capturing and customizing a oneSIS diskless image

After the above preparations are complete and the software on the golden node is ready to be deployed in a diskless image, capture a oneSIS diskless golden image. If this is your first time building a diskless image with Insight CMU, Hewlett Packard Enterprise recommends creating the diskless image group. Later, you might want to script some customizations to the image that require re-creating the image. Insight CMU supports this: you can create the image, add scripted changes to the image creation process, and then delete and recreate the image to confirm that your scripted changes work.

When you create the diskless image group, the oneSIS image creation process is initiated:
• The golden image is copied from the golden node to the /opt/cmu/image/{image group name}/onesis directory on the Insight CMU management node.
• The initial ramdisk image is created for diskless booting.
• All writable files and directories in the files and files.custom files are properly configured.
• The diskless fstab file is installed in the golden image.

To prepare and create the image group:
1. Create the Insight CMU diskless image group directory in /opt/cmu/image/.
2. Copy the /opt/cmu/diskless/oneSIS/reconf-onesis-image.sh file into the /opt/cmu/image/{image group name}/ directory.
3. Add any image customizations to that script before you create the group.
4. Copy the /opt/cmu/diskless/oneSIS/files.custom file to the image group directory.
5. Add any additional files and directories to be configured as writable in the golden image.
6. Create your image.
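Steps 1 through 5 above can be sketched as shell commands. The version below runs in a sandbox so it is safe to try anywhere: the two mktemp directories stand in for /opt/cmu/image and /opt/cmu/diskless/oneSIS, and "group1" is a hypothetical image group name.

```shell
# Sketch of steps 1-5 above in a sandbox (paths and group name are stand-ins).
image_root=$(mktemp -d)          # stands in for /opt/cmu/image
src=$(mktemp -d)                 # stands in for /opt/cmu/diskless/oneSIS
printf '#!/bin/sh\n' > "$src/reconf-onesis-image.sh"
printf '/var/log\n' > "$src/files.custom"
group=group1
mkdir -p "$image_root/$group"                               # step 1: image group directory
cp "$src/reconf-onesis-image.sh" "$image_root/$group/"      # step 2: customization script
cp "$src/files.custom" "$image_root/$group/"                # step 4: writable-files list
echo '/etc/ntp.conf' >> "$image_root/$group/files.custom"   # step 5: an extra writable entry
entries=$(ls "$image_root/$group" | wc -l)
rm -rf "$image_root" "$src"
```

On a real management node, run the same copies against /opt/cmu/image/ and then create the image group (step 6) from the GUI or the command line.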

To perform any user-defined image customizations, run the /opt/cmu/image/{image group name}/reconf-onesis-image.sh script.

NOTE: If this is the first time that this image group directory is created, then this script is installed as a blank template file. For more information on customizing the diskless golden image, see "Customizing an Insight CMU oneSIS diskless image" (page 101).

5.6.4.5.1 Creating an Insight CMU oneSIS diskless image

To create a oneSIS diskless image group using the Insight CMU GUI:
1. Log in with Administrator privileges.
2. Select Cluster Administration→Image Group Management→Create an Image Group.

3. In the New Image Group window, enter a name for this diskless image group.
4. Select the diskless check box.

NOTE: If the diskless box is not available, then add CMU_DISKLESS=true to /opt/cmu/etc/cmuserver.conf and restart the GUI.

5. In the Diskless toolkit drop-down box, select oneSIS.

NOTE: If oneSIS is not available, then add oneSIS to the CMU_VALID_DISKLESS_TOOLKITS variable in /opt/cmu/etc/cmuserver.conf and restart the GUI.

6. Enter the Golden node host name.
7. Click Get Kernel List to retrieve the list of existing kernels on the golden node.
8. Select the preferred kernel version string for PXE-booting the diskless image.
9. Click OK.

A terminal window displays the status of Insight CMU extracting the golden image from the golden node and preparing the image for diskless bootup.

To create the same oneSIS diskless image group using the Insight CMU command line:

# /opt/cmu/bin/cmu_add_image_group -n {image group name} -d diskless-oneSIS -k {kernel version} -I {golden node}

The -k option parameter must be the version of a kernel that exists on the golden node and that you want to boot diskless (for example, the output of uname -r on the golden node). To select the current running kernel as the kernel to boot diskless, provide the CURRENT keyword (for example, -k CURRENT). If any errors occur during the creation of the golden image, the image group is not created in Insight CMU. Correct the errors and recreate the diskless image group.
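A concrete invocation might look like the comment below; the group and node names are illustrative, not values from this guide. Locally, you can at least confirm that uname -r yields a usable kernel version string for the -k parameter.

```shell
# Hypothetical invocation (group and node names are illustrative):
#   /opt/cmu/bin/cmu_add_image_group -n centos63-onesis -d diskless-oneSIS \
#       -k "$(ssh goldennode uname -r)" -I goldennode
# Locally, confirm that uname -r produces a non-empty version string:
kernel=$(uname -r)
echo "kernel version for -k: $kernel"
```

Run uname -r on the golden node itself (not the management node) when the two machines run different kernels.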

5.6.4.5.2 Customizing an Insight CMU oneSIS diskless image

An Insight CMU oneSIS diskless image can be managed in two ways: manually and automatically. The system administrator can manually make changes to the golden image. This is a quick way to prepare a diskless image for deployment; the downside is that these changes are not preserved. The other way to manage an Insight CMU oneSIS diskless image is to script changes so they occur automatically when the image is created. Scripting these changes allows the system administrator to update the software on the golden node and extract new diskless golden images without manually repeating the previous customizations. The Insight CMU oneSIS diskless image creation process provides support for scripting any customizations to the golden image.

Before making any manual customizations to this golden image, Hewlett Packard Enterprise strongly recommends that you read and become familiar with the "oneSIS-HOWTO" documentation section on the http://onesis.org website. This documentation explains the file system layout of the read-only oneSIS golden image and how to configure oneSIS to manage per-node file changes in this golden image. You must not change any settings that specifically support the oneSIS diskless environment.

When an Insight CMU oneSIS diskless image group is created for the first time, the contents of the /opt/cmu/image/{image group name}/ directory are populated with the appropriate support infrastructure.
After creating an Insight CMU oneSIS diskless image for the first time, familiarize yourself with the Insight CMU oneSIS diskless image support:

[root@cmumaster ~]# cd /opt/cmu/image/centos63-onesis
[root@cmumaster centos63-onesis]# ls -l
total 28
-rw-r--r-- 1 root root  692 Sep 18 09:04 files
-rw-r--r-- 1 root root  250 Sep 18 09:04 files.custom
drwxr-xr-x 2 root root 4096 Sep 18 09:05 onesis
drwxr-xr-x 3 root root 4096 Sep 18 09:05 onesis_pxeboot
-rw-r--r-- 1 root root  268 Sep 18 09:04 onesis_pxeboot_template
-rwxr-xr-x 1 root root 2570 Sep 18 09:04 reconf-onesis-image.sh
-rwxr-xr-x 1 root root 2115 Sep 18 09:04 reconf-onesis-snapshot.sh
[root@cmumaster centos63-onesis]#

The files file contains the list of files and directories identified by Insight CMU as writable by default for all Linux distributions. This file may be updated with Insight CMU patches or updates.

It is overwritten automatically each time the diskless image group is re-created, so this file should not be modified. These files and directories are automatically configured in the onesis/etc/sysimage.conf file when the mk-sysimage /opt/cmu/image/{image group name}/onesis command is run during the diskless image creation process (when you add the image group to Insight CMU). An additional list of files and directories can be configured in the files.custom file to be made writable whenever the diskless image group is created. You can also add files and directories directly to the oneSIS image by manually modifying the onesis/etc/sysimage.conf file and then running the mk-sysimage /opt/cmu/image/{image group name}/onesis command, but the files.custom file is recommended for repeatability.

The onesis directory contains the root file system for the golden image. This directory is automatically configured to be exported read-only through NFS during the image creation process. You can update the contents of this directory, but you must respect the alterations made by the oneSIS process. For more information about these alterations, see the "oneSIS-HOWTO" documentation section of the http://onesis.org website. The alterations consist of soft links in place of files, and directories renamed to support the in-memory writable file system created at bootup.

The onesis_pxeboot directory contains the following components:
• The vmlinuz kernel
• The initrd.img initial ramdisk
• The pxelinux.0 PXE-boot loader
• The pxelinux.cfg/ directory, where the PXE-boot files for each node will be installed

These components are used during the PXE-boot process to boot the compute nodes into a diskless environment.

The onesis_pxeboot_template file is the PXE-boot template file. It contains keywords that are replaced by node-specific information to create the PXE-boot file for that compute node. This occurs when a node is added to this Insight CMU diskless image group.
To change the kernel arguments, for example to change the PXE-boot network from eth0 to eth1, edit this template file. However, the template file functions properly as is.

The reconf-onesis-image.sh file is run after the image creation process completes. System administrators can automate any preferred customizations to the image with this file; for example, adding mountpoints to the etc/fstab file in the golden image so that all of the diskless compute nodes mount a shared file system on bootup. That customization can be performed manually, or it can be done with this script. Another common customization example is configuring a second network on the diskless compute nodes, such as an InfiniBand network. This example is documented in the comments of this file and in the reconf-onesis-snapshot.sh file.

The reconf-onesis-snapshot.sh file is run after a node is added to this Insight CMU oneSIS diskless image group. System administrators can script any node-specific customizations to the golden image with this file. Be aware that some node-specific customizations require scripting in both the reconf-onesis-image.sh script and the reconf-onesis-snapshot.sh file. An example is configuring TCP/IP addresses over InfiniBand, which is documented in both files. The reconf-onesis-image.sh script handles configuring ifcfg-ib0 as a writable file in oneSIS and creating the template file, while the reconf-onesis-snapshot.sh script handles the per-node IP address configuration for each node.

5.6.4.6 Managing the writable memory usage of the oneSIS diskless clients

OneSIS allows you to configure the amount of memory allocated for the oneSIS writable file system that is created at bootup. The RAMSIZE setting in the onesis/etc/sysimage.conf file controls the amount of memory allocated for oneSIS. By default, Insight CMU configures this setting to 500m, which is 500 MB of memory. Change this setting as appropriate.
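Changing the limit is a one-line edit. The sketch below raises it from the 500 MB default to 1 GB in a temporary copy; the "RAMSIZE: 500m" syntax follows the colon style this guide shows for DISTRO, so verify the exact form against your own sysimage.conf.

```shell
# Sketch: raise the oneSIS writable-RAM limit from 500 MB to 1 GB in a copy
# of onesis/etc/sysimage.conf (the colon syntax is an assumption based on
# the DISTRO: example in this guide).
conf=$(mktemp)
echo 'RAMSIZE: 500m' > "$conf"                 # Insight CMU default
sed -i 's/^RAMSIZE: .*/RAMSIZE: 1g/' "$conf"
ramsize=$(awk '/^RAMSIZE:/ {print $2}' "$conf")
rm -f "$conf"
echo "RAMSIZE is now $ramsize"
```

Remember that this memory is taken from each compute node's RAM at bootup, so size it against the node's total memory and the workload's needs.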

5.6.4.7 Adding nodes and booting the diskless compute nodes

After the oneSIS diskless golden image is ready and the per-node configurations are set:
1. Add nodes to this Insight CMU diskless image group.
2. Boot the nodes over the network.
3. Select this image group name.

When the nodes boot up, the kernel and the initrd:
• Mount the read-only root file system.
• Configure the oneSIS /ram tmpfs file system.
• Copy the appropriate read-writable files into that file system.
• Proceed with the configured bootup sequence.

5.6.5 Scaling out diskless NFS-root cluster with multiple NFS servers

By default, the Insight CMU diskless support configures the Insight CMU management node as the NFS server that serves the diskless image, regardless of the diskless implementation method. Hewlett Packard Enterprise estimates that a single NFS server can support up to ~200 diskless clients over a 1Gb Ethernet management network, and that an NFS server with a 10Gb Ethernet management network can support up to ~400 diskless clients. These recommendations are based on bootup tests; factor in your expected usage of the diskless solution and adjust these numbers accordingly. In any case, scaling beyond these numbers requires more NFS servers to serve the additional diskless clients and to distribute the network traffic among multiple network switches. Insight CMU diskless support includes support for configuring additional NFS servers and assigning groups of compute nodes to these NFS servers.

To build a large-scale diskless cluster with Insight CMU:
1. Determine the number of nodes per NFS server and identify the NFS server nodes. Hewlett Packard Enterprise recommends:
• No more than 200 nodes per NFS server over a 1Gb network, and no more than 400 over a 10Gb network
• 4GB of (non-SATA) storage per compute node on each NFS server

These recommendations are sufficient for booting and serving the OS.
Also, these recommendations are for a cluster that includes a high-performance cluster-wide file system and/or a local scratch disk for the user workload. When choosing the NFS server nodes, factor in the network topology of the cluster. Make sure that the compute nodes have uncongested access to the NFS server. Ideally, each NFS server is on the same switch as all of the compute nodes it serves.

2. Install the NFS server nodes with the selected Linux distribution, and verify that the NFS server package is installed and configured to start at bootup.

On Red Hat:
# chkconfig nfs on

On SLES:
# chkconfig nfsserver on

3. Ensure that enough NFS daemons and threads are configured to handle the anticipated volume of NFS traffic.

On Red Hat:
Set RPCNFSDCOUNT in the /etc/sysconfig/nfs file to the requested number of NFS daemons. By default, RPCNFSDCOUNT=8.

On SLES:
Set USE_KERNEL_NFSD_NUMBER in the /etc/sysconfig/nfs file, which defaults to 4.

Hewlett Packard Enterprise recommends that the setting be at least half of the maximum number of compute nodes served by each NFS server, and typically a multiple of the total number of CPUs on the NFS server. For more information and tips on tuning NFS, see the NFS documentation.

4. Hewlett Packard Enterprise recommends that you install one of these nodes, use Insight CMU to take a backup of the node, and then clone this image to all of the NFS servers, including the node that was initially installed. This approach ensures that all of the NFS servers are consistent, with the same /etc/hosts file and with passwordless ssh configured for the root account. This is required for the rest of the setup to succeed.

5. Scalable diskless support in Insight CMU is based on the presence of a single configuration file: /opt/cmu/etc/cmu_diskless_nfs_servers. The existence and content of this file enables the scalable diskless support in Insight CMU. Edit this file and insert the NFS topology of the cluster. The syntax of this file supports Insight CMU node names and network groups. Remember that Insight CMU network groups represent groups of nodes on a common network switch. Each line of this file lists an NFS server followed by the nodes and network groups it serves, as in the following sample:

[root@head ~]# cat /opt/cmu/etc/cmu_diskless_nfs_servers
172.20.0.5 n06 n07 rack1
172.20.0.50 encl3 encl4
[root@head ~]#

In this sample, node n06, node n07, and all of the nodes in the 'rack1' network group obtain the NFS-root file system from node n05, which has IP address 172.20.0.5. All nodes in the 'encl3' and 'encl4' network groups obtain the NFS-root file system from node n50, which has IP address 172.20.0.50. Any diskless compute nodes that are not assigned to an NFS server in this file obtain the NFS-root file system from the Insight CMU management node.

6. Proceed with the diskless installation procedure in "The Insight CMU oneSIS diskless method" (page 98).

If the /opt/cmu/etc/cmu_diskless_nfs_servers file is detected, the following additional actions occur:

When a diskless image group is created:
• Each additional NFS server gets a copy of the root file system from /opt/cmu/image/{image group name}/ on the Insight CMU management node.
• Each additional NFS server is configured to export the /opt/cmu/image/{image group name}/onesis directory.

When a node is added to the diskless image group:
• A PXE-boot file is created in the tftp pxelinux.cfg directory that instructs the kernel to obtain its root file system from the assigned NFS server.
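Because a typo in cmu_diskless_nfs_servers silently sends nodes back to the management node's NFS export, a quick format check is worthwhile. The sketch below validates a temporary copy against the layout shown in the sample in step 5; the parsing rule is an assumption drawn from that sample, not from the cmu tools themselves.

```shell
# Sketch: check a cmu_diskless_nfs_servers file against the sample layout
# (one NFS-server IP per line, followed by the node names and/or network
# groups it serves). The validation rule is an assumption from the sample.
f=$(mktemp)
printf '172.20.0.5 n06 n07 rack1\n172.20.0.50 encl3 encl4\n' > "$f"
bad=0
while read -r server targets; do
  case "$server" in
    *.*.*.*) [ -n "$targets" ] || bad=1 ;;  # server needs at least one client
    *) bad=1 ;;                             # first field must look like an IP
  esac
done < "$f"
rm -f "$f"
echo "bad entries: $bad"
```

A nonzero count flags a line with no clients or a first field that is not dotted-quad, both of which would leave those nodes on the management node's NFS server.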

IMPORTANT: When booting the compute nodes in a large-scale diskless cluster, only one DHCP and tftp server are available for the cluster. Hewlett Packard Enterprise recommends booting no more than 256 nodes at a time to avoid DHCP and tftp timeouts.

5.6.5.1 Comments on High Availability (HA)

Configuring an HA solution for the additional NFS servers is beyond the scope of the procedure described in "Scaling out diskless NFS-root cluster with multiple NFS servers" (page 103). If HA NFS servers are needed, configure the HA solution on the NFS servers during step 2 of that procedure so the servers are ready for use by step 6. When configuring the NFS server IP addresses in step 5, use the alias IP address for the NFS server that is managed by the HA solution.

5.6.6 Configuring DNS and the default gateway in an Insight CMU diskless cluster

Insight CMU diskless compute nodes obtain their network information from the DHCP server running on the Insight CMU management server. To configure DNS servers and a default gateway, you must add the following information to the /opt/cmu/etc/cmu_dhcpd_header_addons file:

option domain-name-servers {DNS server IP addresses};
option routers {default gateway IP address};

After these changes are made, run the /opt/cmu/bin/cmu_test_dhcp command to confirm that these changes are valid and to reload the DHCP server with these changes. These edits are necessary to maintain these changes in the Insight CMU DHCP environment. For more information, see Modifying the management network configuration (page 170).
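A filled-in fragment might look like the sketch below. The option names are standard ISC dhcpd syntax; the IP addresses are placeholders, not values from this guide, and the temp file stands in for /opt/cmu/etc/cmu_dhcpd_header_addons.

```shell
# Sketch: a cmu_dhcpd_header_addons fragment with example addresses
# (addresses are placeholders; option syntax is standard ISC dhcpd).
addons=$(mktemp)
cat > "$addons" <<'EOF'
option domain-name-servers 192.168.0.10, 192.168.0.11;
option routers 192.168.0.1;
EOF
dns_ok=$(grep -c '^option domain-name-servers ' "$addons")
gw_ok=$(grep -c '^option routers ' "$addons")
rm -f "$addons"
echo "dns=$dns_ok gw=$gw_ok"
```

After placing the real file, always run /opt/cmu/bin/cmu_test_dhcp as described above rather than restarting dhcpd by hand, so Insight CMU validates and reloads the configuration itself.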

6 Monitoring a cluster with Insight CMU

NOTE: Monitoring support is not available for cartridges running Windows (available only on specific Moonshot cartridges). However, users can gather metrics for Windows cartridges from external sources and scripts, then use the Insight CMU extended metrics features to feed those metrics to the Insight CMU monitoring engine and GUI display.

6.1 Installing the Insight CMU monitoring client

You must install the Insight CMU monitoring client to properly monitor your cluster.
1. Select the compute nodes that need the rpm installation, and right-click to access the contextual menu.
2. Select Update. This displays a submenu.
3. On the submenu, click Install CMU monitoring client. A window appears with the status of the installation. A summary of the installation is provided.
4. When the installation is complete, press Enter to close the window.

Figure 33 Monitoring client installation

6.2 Deploying the monitoring client

If you intend to use Insight CMU monitoring, you must install it on the golden node before performing a backup. The expect package is mandatory and must be installed on the compute nodes. After a golden image node is created, the Insight CMU monitoring client can be deployed. Ensure that the required expect package is installed on the golden image node, then install the monitoring agent as follows using the Insight CMU GUI client:
1. Enable Administrator mode (Options→Administrator Mode).

2. In the left panel tree of the Insight CMU GUI client, right-click the node holding the golden image.
3. Click Update.
4. Select Install CMU monitoring client. The window displays the status of the rpm installation. A dialog box notifies you when the installation is complete.

NOTE: If you are upgrading from an older version of Insight CMU, you must reinstall the new Insight CMU monitoring agents from the Insight CMU v8.0 rpm, or Insight CMU monitoring will not start.

6.3 Monitoring the cluster

Launch the Insight CMU GUI.

Figure 34 Main window

In Figure 34 (page 107), the left frame lists the resources, such as Network Groups, Image Groups, and Nodes Definitions. The '+' sign expands a resource. Compute nodes can be displayed:
• By network group
• By image group
• By custom group
• By nodes definition

For example, to see nodes belonging to an image group, expand the Image Group resource list, then expand a selected image group. Figure 34 (page 107) displays nodes belonging to the image group "rh62_ks".

NOTE: When viewing by image group, nodes are listed in the active image group where they have been successfully cloned. The nodes which are not active in the image group are listed under the non-active candidate category. To change the classification, select a group from the drop-down menu.

NOTE: If no network group is defined, or if a node is not included in any network group, a default network group is created that contains unclassified nodes.

An icon represents the status of each node in the tree.

Figure 35 Node status

The status of this node is okay. Node values are correctly reported to the main monitoring daemon.

The node is pinging properly, and the monitoring is working properly, but an alert is currently reported for this node. One of the thresholds you defined has been exceeded. Click the node in the tree to view the detail of this alert.

The status of this node is "No Ping". User action is required to identify the problem.

The status of this node is unknown. This node is not being monitored because its monitoring daemon failed or is reporting late. This state changes when Insight CMU monitoring selects a new monitoring server for this node. No user action is required.

NOTE: In very large clusters (2000+ nodes), if all the nodes display this unknown state, the backend process that gathers node status is taking too long to complete. When this occurs, a pingStaleDelay timeout (10 seconds by default) is reached in the Insight CMU GUI, which causes this unknown state to display for all nodes. To increase this pingStaleDelay timeout to 30 seconds, add -Dcmu.monitoring.pingStaleDelay=30 to the CMU_JAVA_SERVER_ARGS variable in the /opt/cmu/etc/cmuserver.conf file and restart the cmu service.
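The cmuserver.conf edit in the note above can be sketched as follows; the existing -Xmx value in the sample line is illustrative, while the variable name and JVM property come from the note itself.

```shell
# Sketch: append the pingStaleDelay property to CMU_JAVA_SERVER_ARGS in a
# copy of cmuserver.conf (the -Xmx value is illustrative; the variable and
# property names are from this guide).
conf=$(mktemp)
echo 'CMU_JAVA_SERVER_ARGS="-Xmx2048m"' > "$conf"
sed -i 's/^CMU_JAVA_SERVER_ARGS="\(.*\)"$/CMU_JAVA_SERVER_ARGS="\1 -Dcmu.monitoring.pingStaleDelay=30"/' "$conf"
delay_set=$(grep -c 'pingStaleDelay=30' "$conf")
rm -f "$conf"
echo "pingStaleDelay lines: $delay_set"
```

Appending inside the existing quoted value, as done here, preserves whatever JVM arguments are already configured; remember to restart the cmu service afterward.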

6.3.1 Node and group status

For image groups, network groups, and custom groups, a status bar represents the proportion of nodes in the "OK" state (green) and the "no ping" state (red). In Figure 35 (page 108), the red/green status bar at the top shows the node status.
• Green represents the portion of nodes in an okay status.
• Grey represents the portion of nodes in an unknown state.
• Red represents the portion of nodes in a "no ping" or "monitoring error" state.

6.3.2 Selecting the central frame display

Information in the central frame appears according to the elements selected in the tree.
• When CMU Cluster is selected, the central frame displays the Global Cluster View.
• When a group is selected, the central frame displays the Group View.
• When a node is selected, the central frame displays the Node View.

In the central frame, the following tabs are available:
• Instant View
• Table View
• Time View
• Details
• Alerts

For a single node view, the following tabs are available:
• Monitoring
• Details
• Alerts

6.3.3 Global cluster view in the central frame

By default, the central frame displays the monitoring values of the whole cluster. You can return to this view at any time by clicking CMU Cluster at the root of the node tree. The global cluster view displays one or more pies representing the cluster monitoring sensor values. To select the sensors being monitored, right-click an item in the central frame. A metrics window appears. See Figure 36 (page 110). Select a metric and click OK.

Figure 36 Monitoring window

Pausing the mouse on a portion of the pie displays the name of the corresponding node, its status, and the value of the displayed sensor. For a given metric, the internal circle of a pie represents zero, and the external circle represents the maximum value. By default, the current value of the metric appears in blue. The default color can be changed by clicking Options→Properties in the top bar, then selecting the monitor options. The color of a specific petal can also be changed on the fly by clicking the petal. A grey colored pie means there is no activity on the node or a metric is not being updated correctly.

6.3.4 Resource view in the central frame

Monitoring values can be visualized by:
• Global cluster
• A specific image group
• A specific network group
• A specific custom group

Click the resource in the left-frame tree, and the title of the central frame displays the name of the selected resource.

NOTE: Resource- or node-specific monitoring metrics and alerts can be displayed in CLI mode using /opt/cmu/bin/cmu_monstat. For more information, see the cmu_monstat manpage.

6.3.4.1 Resource view overview

To see pies representing the monitored values in the resource view, click the Instant View tab. To change the displayed pies, right-click the central frame, select the metrics, and click OK.

Figure 37 Resource view overview

To view alerts raised for nodes in this group, select the Alerts tab in the central frame.

NOTE: You can define reactions to alerts in the /opt/cmu/etc/ActionAndAlertsFile.txt file. For more information, see “Customizing Insight CMU monitoring, alerting, and reactions” (page 119).

Figure 38 Alert messages

6.3.4.2 Detail mode in resource view

To display a table with sensor values, select the Table View tab in the central frame.
• The cell is green when the value is below 33% of the maximum value.
• The cell is orange when the value is between 33% and 66% of the maximum value.
• The cell is red when the value is above 66% of the maximum value.
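The three color bands above can be expressed as a small function. Where the 33% and 66% boundary values themselves fall is an assumption (the guide does not say which band owns the boundaries); the sketch places them in the middle band.

```shell
# Sketch of the three color bands above (boundary handling at exactly 33%
# and 66% is an assumption; the guide does not specify it).
cell_color() {
  if [ "$1" -lt 33 ]; then echo green
  elif [ "$1" -le 66 ]; then echo orange
  else echo red
  fi
}
low=$(cell_color 20); mid=$(cell_color 50); high=$(cell_color 90)
echo "$low $mid $high"
```

Reading the table this way, a mostly green column means the metric is far from its configured maximum, while red cells point at nodes approaching it.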

Figure 39 Resource view details

6.3.5 Gauge widget

The middle of the pie shows the average value for a sensor. Click the middle of a pie to toggle the widget on or off.

Figure 40 Memory used summary

The widget also displays marks for the average, maximum, and minimum values of a given metric during the last two minutes.

6.3.6 Node view in the central frame

To display the details of a node, select that node in the tree. The following tabs are available in the central frame:
• Monitoring—Shows monitoring metric values for that node.
• Details—Shows static data for the node. Some of the values are filled during the initial node discovery (scan node). Other values are filled by right-clicking the node in the tree to get the contextual menu, then selecting Update→Get Node Static Info.
• Alerts—Contains the alerts currently raised for this node.

Figure 41 Node details

The central frame title displays the name of the node. The title is colored according to the state of the node. The following tables appear:
• The Node Properties table contains the static information from Insight CMU monitoring.
• The Information Retrieved table contains the current values of the sensors retrieved for this node.
• The Alerts Raised table contains the alerts currently raised for this node.

6.3.7 Using Time View

Insight CMU v8.0 can be used to visualize the activity of your Insight CMU cluster over time and in a scalable manner. Assuming the GUI client has enough memory and OpenGL capabilities, Time View extends the 2D flowers visualization to provide a 3D view of your cluster, with the Z-axis representing time. For system requirements, see "Technical dependencies" (page 117). Time View visualizes the last 2 minutes at the finest 5-second resolution, and the previous 40 minutes at a 30-seconds-per-ring resolution. For more information, see "Adaptive stacking" (page 115). Detailed values are still available for the entire 42 minutes through the tooltip functionality. The long-standing 2D flowers are still available in the Instant View panel.

6.3.7.1 Tagging nodes

Nodes can be labeled with a color. This allows chosen nodes to be easily tracked through different views or partitions. This functionality is available from the Instant View and Time View tabs. Iterate through a predefined set of four colors by clicking a node. Colored nodes are shared between Instant View and Time View, which allows them to be efficiently located regardless of the chosen visualization.

6.3.7.2 Adaptive stacking

Adaptive stacking is an efficient way to monitor your cluster over a long period of time. It provides 42 minutes of data without sacrificing the finest 5-second granularity provided by the monitoring engine. The first 24 rings (representing 2 minutes of data at a 5-second granularity) progressively slide and consolidate into an intermediate ring, making room for the newest data. The intermediate ring is full when six rings are stacked in it, representing 30 seconds. Then the stacked rings slide and a new intermediate ring is created. The entire 42 minutes of history is displayed as 24 rings of 5 seconds (representing 2 minutes of data) and 80 rings of 30 seconds (representing 40 minutes of data). Stacked rings are displayed darker than single rings to differentiate them.
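The ring counts above account exactly for the 42-minute window:

```shell
# Arithmetic check of the adaptive-stacking history window described above.
fine=$((24 * 5))                      # 24 rings x 5 s  = 120 s (2 minutes)
coarse=$((80 * 30))                   # 80 rings x 30 s = 2400 s (40 minutes)
total_min=$(( (fine + coarse) / 60 ))
echo "history window: $total_min minutes"
```

120 s plus 2400 s is 2520 s, or 42 minutes, matching the window Time View advertises.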

Figure 42 Time view

6.3.7.3 Bindings and options

6.3.7.3.1 Mouse control

• Left-click on a node—Mark the node with one of a set of four predefined colors
• Right-click on a node—Open the interactive menu for this node
• Right-click elsewhere—Open the metrics selection menu

NOTE: Time View cannot display more than 10 metrics. For more information, see "Technical dependencies" (page 117).

• Navigating within the 3D scene:
◦ Left-click and drag—Translate the scene
◦ Right-click and drag—Rotate the scene
◦ Rotate the mousewheel—Rotate tubes on themselves
◦ Press the mousewheel and drag—Zoom

6.3.7.3.2 Keyboard control Keyboard shortcuts are available for some Time View options. All of the following shortcuts are also available in Options→Properties. • K, k—Increase or decrease space between the tube and petals (Radial offset option) • L, l—Increase or decrease space between rings (Z offset option) • M, m—Increase or decrease space between petals (Angular offset option) • +, - —Increase or decrease petal outline width (Petal outline width option)

6.3.7.3.3 Custom cameras To save a custom camera position, press Ctrl+1 to 5. Restore it later by pressing 1 to 5. (Custom camera position 1 ... 5 options.) • e—Set perspective view • z—Set history view • s—Set front view

6.3.7.3.4 Options The following options are also available in Options→Properties: • Anti-aliasing level—Set the smoothness of the line rendering. Higher levels look best, but not all graphics cards support them, and they can reduce performance. • Petal pop-out speed—The inflate speed for a new petal. When set to the maximum, petals appear fully inflated immediately. • Activate ring sliding—Enable or disable ring sliding along the tube. Deactivating this option can help under low-performance conditions. • Draw petal outline—Set to display the black outline surrounding each petal. This improves readability in most cases. • Display metrics skeleton/name/cylinder—Set to display the tube skeleton, name, or cylinder.

6.3.7.4 Technical dependencies Time View is a live history tool, meaning that the GUI stores the history data (42 minutes of data in a circular-buffer fashion) from the time it is started. The memory requirements of Time View on the end station running the Insight CMU GUI depend on the cluster size. A typical 500-node cluster can require 2GB to 3GB of RAM. Memory consumption does not impact the management node. For larger clusters, the memory consumption can exceed 4GB, requiring a 64-bit JVM on the GUI client side. Because of the high memory and CPU/GPU consumption, Time View is limited to displaying 10 metrics at a time. Hewlett Packard Enterprise recommends using OpenGL hardware acceleration for a higher quality experience, such as improved graphics and faster activation of anti-aliasing. Time View has been tested using the Oracle JVM in many environments, including Linux, Windows 7, and Windows Vista. OpenGL problems can occur with Windows Vista. For more information, see “Troubleshooting” (page 117). (Windows is only available on specific Moonshot cartridges.)

6.3.7.5 Troubleshooting If Time View prints an "OutOfMemory [...]" error, try increasing the maximum heap size of the GUI JVM. To specify the memory allowed for the JVM, set the -Xmx JVM argument when starting the CLI. For the GUI, edit CMU_GUI_MB (specified in MB) in cmuserver.conf.
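For example, assuming cmuserver.conf uses a simple KEY=VALUE layout (check your installation's file for its exact format before editing), the heap limit could be raised with sed. The sketch below works on a scratch copy for illustration:

```shell
# Sketch: raise the GUI heap limit to 4 GB by editing CMU_GUI_MB.
# The KEY=VALUE layout is an assumption; verify it against your
# cmuserver.conf. Done here on a scratch copy, not the real file.
conf=/tmp/cmuserver.conf.demo
printf 'CMU_GUI_MB=2048\n' > "$conf"                 # pretend current setting
sed -i 's/^CMU_GUI_MB=.*/CMU_GUI_MB=4096/' "$conf"
grep CMU_GUI_MB "$conf"                              # CMU_GUI_MB=4096
```

On a real system, restart the GUI after the change so the new limit takes effect.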

IMPORTANT: Setting this value too high may create "Unable to start JVM" messages on hosts with insufficient memory or on hosts running a 32-bit JVM. Hewlett Packard Enterprise recommends a 64-bit JVM and requires it for large clusters.

If Time View stops running, a restart button appears below the Time View panel. Some GPUs may not support an anti-aliasing level of 8. Symptoms are black strips on the left and right of Time View, or cylinders above the rings making the visualization inoperable. If this occurs, set anti-aliasing to a lower value such as 4, or to 0 if the problem persists. 6.3.8 Archiving custom groups Monitoring data for deleted custom groups can be archived and visualized later as “history data”. To archive monitoring data for deleted custom groups: 1. Delete a custom group. 2. Answer Yes to “Do you want to archive this Custom group?” See Figure 43 (page 118).

Figure 43 Archiving deleted custom groups

After the delete operation is complete, the custom group appears in the list of Archived Custom Groups in the left-frame tree. See Figure 44 (page 118).

Figure 44 Archived custom groups

NOTE: Custom groups can also be archived using the cmu_del_custom_group command. For more information, see the cmu_del_custom_group manpage.

6.3.8.1 Visualizing history data When you select an archived custom group in the left-frame tree, a static Time View picture appears in the central frame. The picture shows the activity view of the custom group during its existence. All options available with Time View are also available when visualizing archived custom groups.

6.3.8.2 Limitations To display an archived custom group, the following conditions must be satisfied: • The archived time period must not exceed 24 hours. • The number of nodes must not exceed 4096. • The number of metrics must not exceed 100. • The product of the three parameters above must not exceed 409600. Table 3 (page 119) displays examples of valid combinations of these three parameters.

Table 3 Valid archived custom group parameters

Nodes   Metrics   Hours   Nodes*Metrics*Hours
4096    10        10      409600
4096    5         20      409600
4096    100       1       409600
256     100       12      307200
2048    8         24      393216
1024    16        24      393216
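The three limits and the 409600 product can be checked with a small shell function before archiving a group. The archive_ok helper name below is ours, not a CMU command:

```shell
# Check whether an archived custom group can be displayed,
# using the limits documented above. archive_ok is our own
# helper for illustration, not part of Insight CMU.
archive_ok() {   # usage: archive_ok NODES METRICS HOURS
    nodes=$1; metrics=$2; hours=$3
    [ "$hours" -le 24 ] || return 1
    [ "$nodes" -le 4096 ] || return 1
    [ "$metrics" -le 100 ] || return 1
    [ $((nodes * metrics * hours)) -le 409600 ] || return 1
}
archive_ok 2048 8 24 && echo "displayable"     # 393216 <= 409600
archive_ok 4096 100 2 || echo "too large"      # 819200 >  409600
```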

IMPORTANT: If the above criteria are not met, the display fails with a warning message. 6.4 Stopping Insight CMU monitoring To stop the Insight CMU Monitoring GUI, click the X in the upper right corner of the main Insight CMU Monitoring window. Stopping the Monitoring GUI does not automatically stop the monitoring engine. To stop the monitoring engine on the cluster, click the Monitoring tab on the toolbar, and then select Stop Monitoring Engine. 6.5 Customizing Insight CMU monitoring, alerting, and reactions 6.5.1 Action and alert files Sensors, alerts, and alert reactions are described in the /opt/cmu/etc/ActionAndAlertsFile.txt file. The following is an example of the contents of the file:
#This is a CMU action and alerts description file
#======#
# ACTIONS #
#
#------KERNEL VERSION, RELEASE, BIOS VERSIONS------#
kernel_version "kernel version" 9999999 string Instantaneous release uname -r
#------CPU------#
#
#- Native
cpuload "% cpu load (raw)" 1 numerical MeanOverTime 100 % awk '/cpu / {printf"%d\n",$2+$3+$4}' /proc/
#- Collectl
#cpuload "% cpu load (normalized)" 1 numerical Instantaneous 100 % COLLECTL (cputotals.user) + (cputotals.nice) + (cputotals.sys)
#cpuload "% cpu load (normalized)" 1 numerical Instantaneous 100 % COLLECTL 100 - (cputotals.idle)
#
#------MEMORY------#
#
#- Native
#memory_used "% memory used" 1 numerical Instantaneous 100 % free | awk ' BEGIN { freemem=0; totalmemory=0; } /cache:/ { freemem=$4; } /Mem:/ { totalmemory=$2; } END { printf "%d\n", (((totalmemory-freemem)*100)/totalmemory); }'
#
# ALERTS #
#
#cpu_freq_alert "CPU frequency is not nominal" 1 24 100 < % sh -c "b=`cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq`;a=`cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq`;echo 100 \* \$b / \$a |bc"
login_alert "Someone is connected" 3 24 0 > login(s) w -h | wc -l
root_fs_used "The / filesystem is above 90% full" 4 24 90 > % df / | awk '{ if ($6=="/") print $5}' | cut -f 1 -d % -
#reboot_alert "Node rebooted" 4 24 5 < rebooted awk '{printf "%.1f\n",$1/60}' /proc/uptime
# The line below reports MCE errors; be careful of possible false positives
#mce_alert "The kernel has logged MCE errors; please check /var/log/mcelog" 5 60 1 > lines wc -l /var/log/mcelog |cut -f 1 -d ' '

#
# ALERT_REACTIONS #
#
#login_alert "Sending mail to root" ReactOnRaise echo -e "Alert 'CMU_ALERT_NAME' raised on node(s) CMU_ALERT_NODES. \n\nDetails:\n`/opt/cmu/bin/pdsh -w CMU_ALERT_NODES 'w -h'`" | mailx -s "CMU: Alert 'CMU_ALERT_NAME' raised." root
#
#root_fs_used "Sending mail to root" ReactOnRaise echo -e "Alert 'CMU_ALERT_NAME' raised on node(s) CMU_ALERT_NODES. \n\nDetails:\n`/opt/cmu/bin/pdsh -w CMU_ALERT_NODES 'df /'`" | mailx -s "CMU: Alert 'CMU_ALERT_NAME' raised!" root
#
#reboot_alert "Sending mail to root" ReactOnRaise echo -e "Alert 'CMU_ALERT_NAME' raised on node(s) CMU_ALERT_NODES. \n\nDetails:\n`/opt/cmu/bin/pdsh -w CMU_ALERT_NODES 'uptime'`" | mailx -s "CMU: Alert 'CMU_ALERT_NAME' raised." root
#

Lines prefixed with # are ignored. Lines must not begin with leading white space. Each line corresponds to a sensor, an alert, or an alert reaction. Sensors are placed at the beginning of the file, between the ACTIONS and ALERTS tags. Each alert is in the middle of the file between the ALERTS and ALERT_REACTIONS tags, and each alert reaction is at the end of the file below the ALERT_REACTIONS tag. Most sensors have both a “native” line and a commented “collectl” line. To use collectl for collecting monitoring data, enable it by removing the comment from the corresponding sensor line.
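For example, switching the cpuload sensor from native to collectl amounts to commenting one line and uncommenting the other. The sed sketch below demonstrates this on a scratch copy; on a real system, edit /opt/cmu/etc/ActionAndAlertsFile.txt and restart the monitoring daemons afterward:

```shell
# Toggle cpuload from native to collectl on a scratch copy of the
# file. The sed patterns are illustrative; adjust them if your
# sensor lines differ.
f=/tmp/AAF.demo
cat > "$f" <<'EOF'
cpuload "% cpu load (raw)" 1 numerical MeanOverTime 100 % awk '/cpu / {printf"%d\n",$2+$3+$4}' /proc/stat
#cpuload "% cpu load (normalized)" 1 numerical Instantaneous 100 % COLLECTL 100 - (cputotals.idle)
EOF
sed -i -e 's/^cpuload /#cpuload /' \
       -e 's/^#\(cpuload .*COLLECTL\)/\1/' "$f"
grep -c '^cpuload' "$f"    # 1 (only the collectl line is active now)
```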

NOTE: Using collectl requires additional steps described in “Using collectl for gathering monitoring data” (page 123).

6.5.2 Actions Each action contains the following fields: Name The name of the sensor as it appears in the Java GUI. It must consist of letters only. Description A quote-contained string that describes in a few words what the sensor is. This appears in the GUI. Time multiple An integer value that determines how often the sensor is monitored. If the monitoring has a default timer of 5 seconds: • A time multiple of 1 means the value is monitored every 5 seconds. • A time multiple of 2 means the value is monitored every 10 seconds. Data type This can be numerical or a string. A string sensor cannot be displayed in the pies by the interface. Measurement method This can be either Instantaneous or MeanOverTime. • Instantaneous returns the sensor value immediately. • MeanOverTime returns the difference between the current value and the previous value divided by the time interval. For example, if a sensor returns 1, 100, 50, 100 at 4 consecutive time steps of 5 seconds: • Insight CMU Monitoring with the Instantaneous option returns 1, 100, 50, 100. • Insight CMU Monitoring with the MeanOverTime option returns N/A, 19.8, -10, 10.
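The MeanOverTime computation in the example above can be reproduced with a short awk pipeline (dt is the 5-second interval):

```shell
# Reproduce the MeanOverTime example: four samples taken 5 seconds
# apart; each output value is (current - previous) / interval.
printf '1\n100\n50\n100\n' |
awk -v dt=5 'NR==1 {prev=$1; print "N/A"; next}
             {printf "%g\n", ($1-prev)/dt; prev=$1}'
# prints, one per line: N/A 19.8 -10 10
```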

Max value Used by the interface to create the pies at the beginning. If a greater value is returned by a sensor, the maximum value is automatically updated in the interface. Unit The unit of the sensor. The GUI uses this measurement. Command The command to be executed by the script. This can be an executable or a shell command. The executable and the shell command must be available on compute nodes. 6.5.3 Alerts Each alert contains the following fields: Name The name of the alert as it appears in the Java GUI. It must consist of letters only. Description A quote-contained string that describes in a few words what the alert is. This appears in the GUI. Severity An integer from 1 to 5, where 5 is fatal and 1 is minor. The severity is displayed by the interface. Time multiple An integer value that determines how often the alert is evaluated. If the monitoring has a default timer of 5 seconds: • A time multiple of 1 means the value is monitored every 5 seconds. • A time multiple of 2 means the value is monitored every 10 seconds. Threshold The threshold that the sensor value must not cross. Operator The comparison operator between the sensor value and the threshold (for example, > or <, as in the sample file). Unit The unit of the sensor. The GUI uses this measurement. Command The command to be executed by the script. This can be an executable or a shell command. The executable and the shell command must be available on compute nodes. 6.5.4 Alert reactions Each alert reaction contains the following fields: Name(s) The names of one or more alerts from the ALERTS section. The reaction is associated with each of the alerts. If an alert is specified in more than one reaction, then only the first reaction is taken. The list of alert names is white-space separated. Description A quote-contained brief description of the reaction. Condition The reaction is performed under this condition.
• ReactOnRaise—Execute the reaction whenever the alert shows as raised and the previous state of the alert was lowered.

• ReactAlways—Execute the reaction whenever the alert shows as raised, subject to the alert’s time multiple. For example, if the monitoring has a default timer of 5 seconds and the alert’s time multiple is 6, the reaction will trigger every 5x6=30 seconds as long as the alert is raised. Command The command to be executed. This can be a single-line shell command, a shell script, or an executable file. Scripts and executable files must be available on the management node. The following keywords are supported within the “Command”. Each keyword is substituted globally (throughout the command line) using the defined values: CMU_ALERT_NAME The name of the alert that caused the reaction. CMU_ALERT_LEVEL The level of the alert. CMU_REACT_MESSAGE The text of the “Description” for this reaction. CMU_ALERT_NODES The list of names of all the nodes that raised the alert during the current monitoring pass. The list is condensed in the form provided by cmu_condense_nodes. CMU_ALERT_NODES_EXPANDED The same as CMU_ALERT_NODES, only the list is expanded, ordered, and separated by commas. CMU_ALERT_VALUES The list of alert values. This list is comma separated and ordered like the names of CMU_ALERT_NODES_EXPANDED. CMU_ALERT_TIMES The time the alert was triggered on each node. This list is comma separated and ordered like the names of CMU_ALERT_NODES_EXPANDED. CMU_ALERT_SEQUENCE_FILE The path of the Insight CMU “sequence” file containing the alerts and alert values from the monitoring pass that triggered the reaction. Analyze this file with the /opt/cmu/bin/cmu_monstat command.
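As an illustration, the substitution Insight CMU performs on a reaction command can be simulated in a shell. The node names and values below are made up; CMU fills in the real values at trigger time:

```shell
# Simulate the keyword substitution CMU performs on a reaction
# command. The node names and values are made up for illustration.
CMU_ALERT_NAME="root_fs_used"
CMU_ALERT_NODES_EXPANDED="node001,node002"
CMU_ALERT_VALUES="92,95"
msg="Alert '$CMU_ALERT_NAME' raised on $CMU_ALERT_NODES_EXPANDED (values: $CMU_ALERT_VALUES)"
echo "$msg"
# Alert 'root_fs_used' raised on node001,node002 (values: 92,95)
```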

NOTE: To protect the management node from large numbers of concurrent reactions, a reaction will only launch on behalf of compute nodes that do not have previous instances of the reaction still running. Limit the command runtime of a reaction if the reaction is expected to be triggered frequently.
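One way to limit a reaction's runtime is to prefix its command with the coreutils timeout(1) wrapper, which kills the wrapped command after the given limit. A minimal demonstration:

```shell
# Bound a command's runtime with coreutils timeout(1): the wrapped
# command is killed after the limit and timeout exits with status 124.
st=0
timeout 2 sleep 10 || st=$?
echo "exit status: $st"    # exit status: 124
```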

6.5.5 Modifying the sensors, alerts, and alert reactions monitored by Insight CMU Several optional sensors, alerts, and alert reactions are commented out in the ActionAndAlertsFile.txt file. • Comment a sensor, alert, or alert reaction to stop monitoring it. • Uncomment a sensor, alert, or alert reaction to start monitoring it. • Modify a sensor, alert, or alert reaction line to change its parameters. • Add your own sensors, alerts, or alert reactions by adding a line to the ACTIONS, ALERTS, or ALERT_REACTIONS section. Modifications in the ActionAndAlertsFile.txt file are only taken into consideration when the monitoring daemons are restarted.
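Before uncommenting or adding an alert line, you can run its command portion by hand to check the value it returns. For example, the command portion of the root_fs_used alert from the sample file prints the root filesystem usage as a bare percentage:

```shell
# Run the root_fs_used alert command from the sample file by hand.
# It should print an integer between 0 and 100 (no % sign).
pct=$(df / | awk '{ if ($6=="/") print $5}' | cut -f 1 -d % -)
echo "root fs usage: $pct"
```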

To restart the monitoring daemons: 1. Change the ActionAndAlertsFile.txt file on the management node. 2. Stop the Java GUI. 3. Stop the daemons. # /etc/init.d/cmu stop 4. Restart the daemons. # /etc/init.d/cmu restart 5. Start the Java interface. 6.5.6 Using collectl for gathering monitoring data The default method for specifying commands to collect monitoring data is described in “Actions” (page 120). This default method is referred to as native mode in this section. Insight CMU provides an alternative method that uses the collectl tool for gathering monitoring data. Data appears using the same Insight CMU interface as native mode.

6.5.6.1 Installing and starting collectl on compute nodes 1. On compute nodes, install the package from the Insight CMU DVD: # mount /dev/cdrom /mnt # cd /mnt/tools/collectl # rpm -ivh collectl-3.x.x-x.noarch.rpm Preparing... ########################################### [100%] 1:collectl ########################################### [100%] 2. If you have not already done so, install the monitoring rpm on compute nodes as described in “Installing the Insight CMU monitoring client” (page 106). 3. Edit the /etc/collectl.conf file as follows: DaemonCommands = -s+dcmnNE --import misc --export lexpr -A server -i5

IMPORTANT: For Insight CMU diskless configurations, only use the DaemonCommands options provided in the example above. Do not use any option which causes disk I/O.

4. Start collectl: # /etc/init.d/collectl start Starting collectl: [ OK ] 5. Configure collectl to start automatically: # chkconfig --add collectl collectl 0:off 1:off 2:on 3:on 4:on 5:on 6:off

6.5.6.2 Modifying the ActionAndAlertsFile.txt file The ActionAndAlertsFile.txt file contains definitions for using collectl monitoring. These lines are commented out so that Insight CMU works in native mode by default, without collectl. To switch to collectl, comment out the native lines and uncomment the collectl lines in /opt/cmu/etc/ActionAndAlertsFile.txt. For example:
#------CPU------#
#
#- Native
#cpuload "% cpu load (raw)" 1 numerical MeanOverTime 100 % awk '/cpu / {printf"%d\n",$2+$3+$4}' /proc/stat
#- Collectl
cpuload "% cpu load (normalized)" 1 numerical Instantaneous 100 % COLLECTL (cputotals.user) + (cputotals.nice) + (cputotals.sys)

The command field must start with the string “COLLECTL” in capital letters. The line continues with a series of collectl variables enclosed in parentheses and connected with arithmetic operators. In this example, the cpuload metric reports the sum of cputotals.user, cputotals.nice, and cputotals.sys. For a full list of available collectl variables, run the collectl command interactively, as follows: # collectl -c 1 -s+C --export lexpr The -c 1 option takes a single sample. The command output is the list of collectl variables and their current values: waiting for 1 second sample...
sample.time 1217858718.002
cputotals.user 1
cputotals.nice 0
cputotals.sys 0
cputotals.wait 7
cputotals.irq 0
cputotals.soft 0
cputotals.steal 0
cputotals.idle 90
ctxint.ctx 239
ctxint.int 1073
ctxint.proc 4
ctxint.runq 152
disktotals.reads 0
disktotals.readkbs 0
disktotals.writes 11
disktotals.writekbs 80
nettotals.kbin 4
nettotals.pktin 49
nettotals.kbout 6
nettotals.pktout 17
cpuinfo.user.cpu0 0
cpuinfo.nice.cpu0 0
cpuinfo.sys.cpu0 0
cpuinfo.wait.cpu0 0
cpuinfo.irq.cpu0 0
cpuinfo.soft.cpu0 0
cpuinfo.steal.cpu0 0
cpuinfo.idle.cpu0 100
cpuinfo.intrpt.cpu0 0
cpuinfo.user.cpu1 0
cpuinfo.nice.cpu1 0
cpuinfo.sys.cpu1 0
cpuinfo.wait.cpu1 11
cpuinfo.irq.cpu1 0
cpuinfo.soft.cpu1 0
cpuinfo.steal.cpu1 0
cpuinfo.idle.cpu1 89
cpuinfo.intrpt.cpu1 0
cpuinfo.user.cpu2 4
cpuinfo.nice.cpu2 0
cpuinfo.sys.cpu2 2
cpuinfo.wait.cpu2 0
Create the monitoring lines by using these variables. Native Insight CMU lines and collectl lines can be mixed in the ActionAndAlertsFile.txt file. For more information about using and fine-tuning collectl, see http://collectl.sourceforge.net/.
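As a further illustration, a new collectl-backed sensor can be composed from any of the lexpr variables above. The netkb metric name below is our own invention, and the line is written to a scratch file rather than the live configuration:

```shell
# Hypothetical collectl-backed sensor composed from the lexpr
# variables above (nettotals.kbin + nettotals.kbout). The metric
# name "netkb" is ours; this writes to a scratch file, not the
# real /opt/cmu/etc/ActionAndAlertsFile.txt.
cat > /tmp/AAF.collectl.demo <<'EOF'
netkb "network kB/s" 1 numerical Instantaneous 1000 kB/s COLLECTL (nettotals.kbin) + (nettotals.kbout)
EOF
grep COLLECTL /tmp/AAF.collectl.demo
```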

6.5.6.3 Installing and configuring colplot for plotting collectl data

IMPORTANT: Do not use this option for Insight CMU diskless configurations. 1. On the Insight CMU administration server, create and NFS-export a directory to store collectl data from compute nodes: # mkdir /var/log/collectl # vi /etc/exports 2. Add the following line: /var/log/collectl *(rw,sync,no_all_squash,no_root_squash) 3. Refresh exports: # exportfs -r 4. Install the collectl-utils package from the Insight CMU DVD in tools/collectl:

NOTE: To install collectl-utils, you must first install the gnuplot rpm. # mount /dev/cdrom /mnt # cd /mnt/tools/collectl # rpm -ivh collectl-utils-x.x-x.noarch.rpm

Preparing... ########################################### [100%] 1:collectl-utils ########################################### [100%] apache found, enabling it to run colplot... creating /etc/httpd/conf.d/colplot.conf creating /var/www/html/colplot 5. Copy colplot HTML files to the Insight CMU web directory. Those files are located in different directories depending on whether your Insight CMU administration server runs a Red Hat or SUSE distribution. • For Red Hat: # cp -a /var/www/html/colplot /opt/cmu/www/colplot

• For SUSE: # cp -a /srv/www/htdocs/colplot /opt/cmu/www/colplot 6. Change the default colplot plot directory to point to the common collectl directory: # vi /etc/colplot.conf #PlotDir = /opt/hp/collectl/plotfiles PlotDir = /var/log/collectl 7. If not already done, install the collectl rpm on compute nodes: # mount /dev/cdrom /mnt # cd /mnt/tools/collectl # rpm -ivh collectl-3.x.x-x.noarch.rpm Preparing... ########################################### [100%] 1:collectl ########################################### [100%] 8. Otherwise, if the collectl rpm is already installed, ensure that collectl is stopped: # /etc/init.d/collectl stop 9. Import the common directory created on the administration server for collectl. # mkdir /var/log/collectl # vi /etc/fstab

X.X.X.X:/var/log/collectl /var/log/collectl nfs defaults 0 0

where X.X.X.X is the address of your Insight CMU administration server. 10. Modify the collectl configuration file to save data to be plotted in the common directory: # vi /etc/collectl.conf DaemonCommands = -s+dcmnNE --import misc --export lexpr -A server -i5 -f /var/log/collectl -P -oz -r 00:01,7 11. Restart collectl: # /etc/init.d/collectl restart

6.5.6.3.1 Plotting data You can use a web browser to display collectl data. Use the address of the Insight CMU administration server followed by /colplot: http://X.X.X.X/colplot

Figure 45 ColPlot window

Select plotting options, and then click Generate Plot.

Figure 46 ColPlot results

6.5.7 Monitoring GPUs and coprocessors

6.5.7.1 Monitoring NVIDIA GPUs If your client nodes contain NVIDIA GPUs and are running version 270.xx.xx or later of the NVIDIA GPU driver, you can monitor your GPUs with Insight CMU. If you have not done so already, install the NVIDIA GPU driver version 270.xx.xx or later on your client nodes. This can be done in one of two ways: 1. Install the NVIDIA GPU driver manually on one of the client nodes, back up the client image, and clone the remaining clients with this new image. 2. Use the script /opt/cmu/contrib/install_nvidia.pl to install the NVIDIA GPU driver on all running clients. For more information, see the file /opt/cmu/contrib/install_nvidia.README. To enable GPU monitoring, the /opt/cmu/etc/ActionAndAlertsFile.txt file must be updated with entries for Insight CMU GPU monitoring. This is done by running the script /opt/cmu/bin/cmu_config_nvidia. This script takes the number of GPUs on each client as an argument. The following example updates ActionAndAlertsFile.txt to monitor clients that have 3 GPUs each. Monitoring must be restarted for the updates to take effect. # cmu_config_nvidia 3 CMU GPU monitoring enables driver persistence mode on all GPUs and requires all GPU-enabled clients to be running NVIDIA driver 270.xx.xx or newer. Continue only if an appropriate driver is installed on the clients and persistence mode is permissible. Continue? [y/n] y Configuring GPU monitoring in CMU... GPU monitoring configured successfully. Copy of original /opt/cmu/etc/ActionAndAlertsFile.txt can be found in /opt/cmu/etc/ActionAndAlertsFile.txt_before_cmu_config_nvidia_config

Please restart CMU ('/etc/init.d/cmu restart') to enable these changes.

# /etc/init.d/cmu restart . . Running /opt/cmu/bin/cmu_config_nvidia adds a list of predefined GPU metrics to ActionAndAlertsFile.txt. To monitor these metrics using the GUI, select the metrics from the Monitoring sensors list as described in Figure 36 (page 110).
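Before running cmu_config_nvidia, you can confirm that the installed driver meets the 270.xx.xx requirement. The driver_ok helper below is ours, not a CMU tool; the nvidia-smi query is shown commented out because it requires a GPU node:

```shell
# Check that an NVIDIA driver version string is 270.xx.xx or later.
# driver_ok() is our own helper for illustration, not a CMU command.
driver_ok() {    # usage: driver_ok <driver-version-string>
    major=${1%%.*}              # keep the part before the first dot
    [ "$major" -ge 270 ]
}
# On a GPU node (requires nvidia-smi):
#   ver=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader)
#   driver_ok "$ver" && echo "driver OK for CMU GPU monitoring"
driver_ok 270.41.19 && echo ok        # ok
driver_ok 260.19.36 || echo too-old   # too-old
```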

NOTE: Not all metrics are supported by all NVIDIA GPUs and some lesser used metrics may be commented out within ActionAndAlertsFile.txt. To introduce/remove metrics from the Monitoring sensors list, you can uncomment/comment out the associated lines inside ActionAndAlertsFile.txt as described in “Action and alert files” (page 119). NOTE: Insight CMU dynamically determines if a client has working GPUs when monitoring is initially started after installation on the client. This monitoring process allows for configurations that have clients with GPUs and clients without GPUs. If the GPUs are not working when monitoring is started (or GPUs are added at a later date), redeploy monitoring to the client (see “Installing the Insight CMU monitoring client” (page 106)) and restart monitoring to ensure the GPUs are recognized.

6.5.7.2 Monitoring AMD GPUs If your client nodes contain AMD GPUs and are running version 8.83.5 or later of the AMD GPU driver, you can monitor your GPUs with Insight CMU. If you have not done so already, install the AMD GPU driver version 8.83.5 or later on your client nodes. This can be done in one of two ways: 1. Install the AMD GPU driver manually on one of the client nodes, back up the client image, and clone the remaining clients with this new image. 2. Use the script /opt/cmu/contrib/cmu_install_amd to install the AMD GPU driver on all running clients. For more information, see the file /opt/cmu/contrib/cmu_install_amd.README. To enable GPU monitoring, the /opt/cmu/etc/ActionAndAlertsFile.txt file must be updated with entries for Insight CMU GPU monitoring. This is done by running the script /opt/cmu/bin/cmu_config_amd. This script takes the number of GPUs on each client as an argument. The following example updates ActionAndAlertsFile.txt to monitor clients that have 2 GPUs each. Monitoring must be restarted for the updates to take effect. # cmu_config_amd 2 You are about to update the CMU ActionsAndAlerts file with metrics for monitoring AMD GPUs. Continue? [y/n] y Configuring GPU monitoring in CMU... GPU monitoring configured successfully. Copy of original /opt/cmu/etc/ActionAndAlertsFile.txt can be found in /opt/cmu/etc/ActionAndAlertsFile.txt_before_cmu_config_amd_config Please restart CMU ('/etc/init.d/cmu restart') to enable these changes. # /etc/init.d/cmu restart . . Running /opt/cmu/bin/cmu_config_amd adds a list of predefined GPU metrics to ActionAndAlertsFile.txt. To monitor these metrics using the GUI, select the metrics from the Monitoring sensors list as described in Figure 36 (page 110).

NOTE: Not all metrics are supported by all AMD GPUs and some metrics may be commented out within ActionAndAlertsFile.txt. To introduce/remove metrics from the Monitoring sensors list, you can uncomment/comment out the associated lines inside ActionAndAlertsFile.txt as described in “Action and alert files” (page 119). NOTE: Insight CMU dynamically determines if a client has working GPUs when monitoring is initially started after installation on the client. This monitoring process allows for configurations that have clients with GPUs and clients without GPUs. If the GPUs are not working when monitoring is started (or GPUs are added at a later date), redeploy monitoring to the client (see “Installing the Insight CMU monitoring client” (page 106)) and restart monitoring to ensure the GPUs are recognized.

6.5.7.3 Monitoring Intel coprocessors If your client nodes contain Intel coprocessors, you can monitor the coprocessors with Insight CMU.

IMPORTANT: If you currently monitor Intel coprocessors using Insight CMU, you must deploy an updated set of images. To deploy the images: 1. Redeploy the Insight CMU monitoring client to all nodes. It contains a new binary for collecting coprocessor metrics. 2. Remove the existing coprocessor metrics from the /opt/cmu/etc/ActionAndAlertsFile.txt file and install the updated metrics: a. Stop Insight CMU monitoring. b. Run /opt/cmu/bin/cmu_config_intel -r to remove the existing coprocessor metrics from the /opt/cmu/etc/ActionAndAlertsFile.txt file. c. Run /opt/cmu/bin/cmu_config_intel to install the new metrics into the /opt/cmu/etc/ActionAndAlertsFile.txt file. d. Start Insight CMU monitoring.

Install the selected coprocessor drivers on your client nodes and verify the coprocessors are working. Use one of the following procedures to install the drivers:

Install manually 1. Install the coprocessor driver manually on one of the client nodes. 2. Back up the client image. 3. Clone the remaining clients with this new image.

Deploy directly to running clients

NOTE: The following procedure is an example of installing the coprocessor driver. Other configuration steps may be necessary to configure the coprocessor hardware and environment. See the driver and coprocessor documentation to determine possible additional steps. 1. Distribute your coprocessor driver to your clients. Select your coprocessor-enabled clients from the list of clients in the Resources tree of the GUI. 2. Right-click on PDCP (Distributed Copy). 3. Enter the location of the coprocessor driver file (typically a .tgz file) into Source. 4. Enter /tmp into the Destination. 5. Click OK to copy the file to /tmp on the selected clients. 6. Select the same set of resources as in step 1 and right-click on Multiple Windows Broadcast. 7. When the windows appear, select the Console window so that typed commands are issued to all windows.

8. Install the driver as follows:

NOTE: The following steps are an example of installing the driver. See the driver readme.txt file for specifics on installing your driver.

a. # cd /tmp b. # tar xvzf driver-file.tgz c. # cd created-driver-directory d. For Red Hat Enterprise Linux: # sudo yum install --nogpgcheck --noplugins --disablerepo=* *.rpm e. For SUSE Linux Enterprise Server: # sudo zypper --no-gpg-checks install *.rpm f. If this is the first time installing the driver, initialize the MIC cards: # sudo micctrl initdefaults g. Restart the driver: # sudo micctrl -r h. Verify the driver is loaded and the coprocessors are initialized and ready: # sudo micctrl status mic0: ready mic1: ready i. Start up the coprocessors: # service mpss start j. Verify the cards are seen by the OS and are working: # /opt/intel/mic/bin/micinfo k. Review the results and verify no errors are reported. l. With the coprocessors working, enable coprocessor monitoring by updating the /opt/cmu/etc/ActionAndAlertsFile.txt file with metric entries for coprocessor monitoring. Do this by running the script /opt/cmu/bin/cmu_config_intel. This script takes the number of coprocessors on each client as an argument. The following example updates ActionAndAlertsFile.txt to monitor clients that have 3 coprocessors each. Monitoring must be restarted for the updates to take effect. # /opt/cmu/bin/cmu_config_intel 3 You are about to update the CMU ActionsAndAlerts.txt file with metrics for monitoring Intel coprocessors. Continue? [y/N] y Updating CMU monitoring... Monitoring updated successfully. Copy of original /opt/cmu/etc/ActionAndAlertsFile.txt can be found in /opt/cmu/etc/ActionAndAlertsFile.txt_before_cmu_config_intel

Please restart the CMU GUI and CMU monitoring to enable these changes.

# /etc/init.d/cmu restart . . To monitor the added metrics using the GUI, select them from the monitoring sensors list as described in “Global cluster view in the central frame” (page 109).

NOTE: Not all metrics are supported by all coprocessors and some lesser-used metrics may be commented out within ActionAndAlertsFile.txt. To introduce or remove metrics from the monitoring sensors list, you can uncomment/comment out the associated lines in ActionAndAlertsFile.txt as described in “Modifying the sensors, alerts, and alert reactions monitored by Insight CMU” (page 122).

NOTE: Insight CMU dynamically determines if a client has working coprocessors when monitoring is initially started after installation on the client. This monitoring process allows for configurations that have clients with coprocessors and clients without coprocessors. If the coprocessors are not working when monitoring is started (or coprocessors are added at a later date), redeploy monitoring to the client (see “Installing the Insight CMU monitoring client” (page 106)) and restart monitoring to ensure the coprocessors are recognized.

6.5.8 Monitoring Insight CMU alerts in HPE Systems Insight Manager

IMPORTANT: This section assumes you have knowledge of HPE Systems Insight Manager (SIM) and Simple Network Management Protocol (SNMP).

If you use SIM, you can create an environment to monitor Insight CMU alerts with SIM. This can be accomplished in many ways; this section offers one possible model that you can use as an outline for creating a model that works for your environment.

Alerts in Insight CMU are similar to events in SIM. However, alerts and events are defined and responded to differently in each product. You define alerts in Insight CMU in the ActionAndAlertsFile.txt file. For more information, see “Alerts” (page 121). An alert is raised when the result of the alert's command exceeds a defined threshold relative to its declared operator. When this occurs, the alert is displayed in the Insight CMU GUI.

To convey the result of an Insight CMU alert to the SIM Central Management Server (CMS), you can use the Insight CMU alert reaction feature and one of the SIM-supported event protocols, such as SNMP traps. Create an alert reaction for the Insight CMU alert you want to convey. For more information, see “Alert reactions” (page 121). For the alert command, provide a command or script that sends the selected SNMP trap to the SIM CMS. For more information, see Figure 47 (page 132). HPE OpenView NNM, the perl SNMP_util CPAN module, and the Net-SNMP Open Source package are commonly used utilities/libraries for sending SNMP traps. All Insight CMU client alerts are handled through the Insight CMU management node. Insight CMU alert reaction keywords such as CMU_ALERT_NODES can be used to convey the names of the nodes that raised the alert through the SNMP trap.
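As a sketch of such an alert-reaction command, the following builds an SNMPv2c trap invocation for Net-SNMP's snmptrap utility (one of the packages named above). The CMS host name, community string, and OIDs are placeholders, and the command is only printed here rather than executed, so you can inspect it before wiring it into an alert reaction:

```shell
# Hypothetical alert-reaction sketch: construct (and print, not send) an
# snmptrap command that forwards an Insight CMU alert to the SIM CMS.
# Host, community, and OIDs below are placeholders for illustration.
CMS_HOST=${CMS_HOST:-sim-cms.example.com}
TRAP_OID=1.3.6.1.4.1.8072.2.3.0.1      # placeholder trap OID
MSG="CMU alert on nodes: ${CMU_ALERT_NODES:-node10 node11}"
echo snmptrap -v 2c -c public "$CMS_HOST" '' "$TRAP_OID" \
    "$TRAP_OID.1" s "$MSG"
```

Remove the echo to send the trap for real; CMU_ALERT_NODES is substituted by Insight CMU when the alert reaction fires.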

Figure 47 Insight CMU alert converted to SIM event

To create a complete model for conveying Insight CMU alerts to SIM, you may choose to create your own SNMP Management Information Base (MIB) to handle the alerts you define. For information on how to configure SNMP with SIM, or how to compile and customize MIBs with SIM, see the Systems Insight Manager user guide.

6.5.9 Extended metric support

The default monitoring support in Insight CMU executes preconfigured commands on every compute node to extract metric values and then aggregates these values on the management node for display with the GUI. The extended metric support in Insight CMU allows users to gather

metrics on the management node with scripting or any other method and pass the data directly into the GUI for display. The extended metric support consists of a keyword, EXTENDED, that is configured in the ActionAndAlertsFile.txt file to identify each extended metric, and a command, cmu_submit_extended_metrics, in the /opt/cmu/bin/ directory.

An example of extended metric support is configuring Insight CMU to monitor workload scheduler information. Typically this information can be viewed by executing a single command that displays status information for all of the compute nodes. Using extended metric support, you can create a script that periodically executes this command, parses the data into a simple format, and passes it to the GUI for monitoring.

The following example shows how to monitor the number of nodes that are allocated to jobs. For this example, the list of allocated nodes is gathered from SLURM, an open-source workload scheduler. Use the SLURM command to list the nodes that are currently allocated:

[root@cmumaster ~]# sinfo -t alloc -o "%N" -h
node[10-12,14,20-21,33-39,41-48,50-55]
[root@cmumaster ~]#

Use the Insight CMU tool for expanding names to create a space-separated list of allocated nodes:

[root@cmumaster ~]# sinfo -t alloc -o "%N" -h | /opt/cmu/tools/cmu_expand_names -s " "
node10 node11 node12 node14 node20 node33 node34 node35 node36 node37 node38 node39 node41 node42 node43 node44 node45 node46 node47 node48 node50 node51 node52 node53 node54 node55
[root@cmumaster ~]#

To apply this example to your workload scheduler, replace this SLURM command with the appropriate command from your workload scheduler.
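For readers without access to cmu_expand_names, the bracket expansion it performs can be approximated with a short, self-contained shell function. This is an illustrative stand-in, not the Insight CMU implementation: it handles only a single bracket group and no zero padding:

```shell
# Simplified stand-in for cmu_expand_names: expand "node[10-12,14]" into
# "node10 node11 node12 node14". Single bracket group, no zero padding.
expand_names() {
    s=$1
    prefix=${s%%\[*}                 # text before the bracket
    body=${s#*\[}; body=${body%]}    # comma-separated ranges inside
    out=""
    old_ifs=$IFS; IFS=,
    for p in $body; do
        case $p in
            *-*)                     # numeric range, e.g. 10-12
                i=${p%-*}; end=${p#*-}
                while [ "$i" -le "$end" ]; do
                    out="$out $prefix$i"
                    i=$((i + 1))
                done ;;
            *)  out="$out $prefix$p" ;;
        esac
    done
    IFS=$old_ifs
    echo "${out# }"
}

expand_names "node[10-12,14]"    # prints: node10 node11 node12 node14
```

The real tool additionally accepts a separator option (-s " ") and handles the richer name patterns shown in the sinfo output above.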
To submit this data into Insight CMU, use /opt/cmu/bin/cmu_submit_extended_metrics. The -h option describes how to submit data into Insight CMU:

[root@cmumaster ~]# /opt/cmu/bin/cmu_submit_extended_metrics -h
Usage: /opt/cmu/bin/cmu_submit_extended_metrics -f <filename>

The filename must exist and contain per-node metric data in the following format:

BEGIN_NODE <nodelist>
metric1_name metric1_value
metric2_name metric2_value
...
metricN_name metricN_value
BEGIN_NODE <nodelist>
metric1_name metric1_value
metric2_name metric2_value
...
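A minimal input file in this format can be built and sanity-checked as follows. The node names and the "allocated" values are hypothetical examples:

```shell
# Build a minimal example of the per-node metric file format described
# above. Node names and metric values are hypothetical.
file=/tmp/alloc_nodes_example.txt
cat > "$file" <<'EOF'
BEGIN_NODE node10 node11
allocated 1
BEGIN_NODE node12
allocated 0
EOF
grep -c '^BEGIN_NODE' "$file"    # two node groups in this example
```

Such a file would then be passed to cmu_submit_extended_metrics with -f.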

The nodelist is typically one node name, but it can be a space-separated list of node names if the subsequent metrics and values apply to that list of nodes. To obtain and submit this data, write a bash script:

[root@cmumaster ~]# cat ./allocated_nodes.sh
#!/bin/bash

CMU_EXPAND=/opt/cmu/tools/cmu_expand_names
CMU_SUBMIT=/opt/cmu/bin/cmu_submit_extended_metrics
CMU_NODES=/opt/cmu/bin/cmu_show_nodes
file=/tmp/alloc_nodes.txt

alloc_nodes=`sinfo -t alloc -o "%N" -h | $CMU_EXPAND -s " "`

# find the list of nodes that are unallocated
all_nodes=`$CMU_NODES`

free_nodes=""
for n in $all_nodes; do
    found=0
    for a in $alloc_nodes; do
        if [ $a = $n ]; then
            found=1
            break
        fi
    done
    if [ $found = 0 ]; then
        free_nodes="$free_nodes $n"
    fi
done

# write the file and submit to CMU
rm -f $file
echo "BEGIN_NODE $alloc_nodes" > $file
echo "allocated 1" >> $file
echo "BEGIN_NODE $free_nodes" >> $file
echo "allocated 0" >> $file

$CMU_SUBMIT -f $file

[root@cmumaster ~]#

The script above obtains and submits the "allocated" metric to Insight CMU. The last step is to configure this new metric in the Insight CMU ActionAndAlertsFile.txt file:

allocated "nodes allocated to users" 2 numerical Instantaneous 1 alloc EXTENDED /root/allocated_nodes.sh

Table 4 explains each field in the example line above.

Table 4 Extended metric fields

• Name: The name of the extended metric. (Example: allocated)
• Description: A brief description of the extended metric. (Example: "nodes allocated to users")
• Time Multiple: A "time-to-live" setting. Multiply this number by 5 to determine the number of seconds that the extended metric data is considered valid after being received. If no new metric data is received before this time interval expires, the GUI marks the extended metric data as "Inactive Action". (Example: 2)
• Data Type: The format of the extended metric data, either numerical or string. (Example: numerical)
• Measurement Method: Either Instantaneous or MeanOverTime. Instantaneous displays the latest value. MeanOverTime displays the difference between the current value and the previous value divided by the time interval. (Example: Instantaneous)
• Max Value: Used by the GUI to initialize the metric pies. If this value is exceeded, the scale of the metric pie adjusts to the new maximum value. (Example: 1)
• Unit: The unit of the extended metric. (Example: alloc)
• "EXTENDED" keyword: Indicates that the metric is submitted by cmu_submit_extended_metrics. (Example: EXTENDED)
• script/command: The script or command that collects, formats, and submits the metric to CMU. (Example: /root/allocated_nodes.sh)

After you finish editing the ActionAndAlertsFile.txt file, you must restart Insight CMU monitoring and the GUI for the modifications to take effect. Insight CMU monitoring schedules the script to run every "time multiple × 5" seconds.

On a large cluster, your data-gathering script may require additional time to complete. If the script takes too long to gather, parse, and submit the data to Insight CMU, determine its run time and adjust the time multiple setting to ensure that enough time is allocated for the script to run to completion. Otherwise, Insight CMU may display the metric as an "Inactive action" in the GUI. Determine the run time of the script by executing it with the time command:

[root@cmumaster ~]# time ./allocated_nodes.sh
real 7.036s
[root@cmumaster ~]#

Then divide the run time by 5, rounding up, to get the time multiple. In this example: 7/5 = 2.

Your data-gathering script may obtain, parse, and submit more than one metric to Insight CMU. A typical example is gathering multiple temperature readings from a single source, such as through IPMI or from the Onboard Administrator of an HPE Blade enclosure. In this case, configure the script to run with only one metric in the ActionAndAlertsFile.txt file. The other metrics gathered by this script can be configured in the ActionAndAlertsFile.txt file without any command after the EXTENDED keyword.

Several preconfigured scripts in /opt/cmu/contrib/ can gather and submit metrics to Insight CMU with the extended monitoring support. These scripts come with README files that document how they work and how they can be configured in Insight CMU. Copy and/or modify these scripts freely to operate correctly on your Insight CMU cluster.

• The cmu_IPMI_monitoring script gathers IPMI metrics by querying the Management Card for this information.
• The cmu_OA_monitoring script gathers power and temperature readings from the Onboard Administrators of Blade enclosures.
• The cmu_get_ganglia_metrics script gathers metrics from the ganglia monitoring daemons.
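The time-multiple sizing described earlier (script run time divided by 5, rounded up) can be computed directly in shell. The 7-second runtime below is the figure measured in the example:

```shell
# Derive the ActionAndAlertsFile.txt time multiple from a measured script
# runtime, using the 5-second base interval described above. The runtime
# value is the 7 seconds from the worked example.
runtime_seconds=7
time_multiple=$(( (runtime_seconds + 4) / 5 ))   # ceiling division
echo "$time_multiple"
```

For the 7-second example this yields a time multiple of 2, matching the configured "allocated" metric line.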

6.5.9.1 Configuring iLO 4 AMS extended metric support

Insight CMU supports gathering server metrics from iLO 4 (or later) via the iLO Agentless Monitoring Support (AMS) and submitting them to Insight CMU via the EXTENDED monitoring support. To enable this support, run the /opt/cmu/bin/cmu_config_ams -c command.

Figure 48 cmu_config_ams command

The cmu_config_ams command configures an AMS submenu in the Insight CMU GUI remote management menu. Verify this by checking the /opt/cmu/etc/cmu_custom_menu file (see above), or by checking the Insight CMU GUI:

Figure 49 Verify AMS submenu

The Insight CMU AMS support for the HPE iLO includes the following three components:
1. Configuring the iLO on each server with a public SNMP read-only port and enabling AMS.
2. Requesting and displaying a full data report of all available iLO data.
3. Configuring iLO SNMP data as metrics to be monitored by Insight CMU.

NOTE: You must configure the iLO with AMS to get the data report and configure the monitoring.

6.5.9.1.1 Configuring the HPE iLO SNMP port

To configure the iLO SNMP port and enable AMS, select the servers in the Insight CMU GUI (on the left side) and right-click to open the remote management menu. Then select AMS→Configure iLO:

Figure 50 Configure iLO SNMP port

When this command completes, a summary is displayed showing which iLOs were successfully configured.

Figure 51 Configure iLO finished

To test which iLOs are configured with AMS, select the nodes. Then select AMS→Test iLO Config.

6.5.9.1.2 Accessing and viewing the HPE iLO data via SNMP

Enabling the AMS functionality in the iLO makes it possible to retrieve data from the iLO via an SNMP query:

Figure 52 SNMP query

The published HPE MIBs, which define the SNMP strings, are available on the internet. Insight CMU includes a subset of these MIBs in /opt/cmu/snmp_mibs/. Insight CMU uses these MIBs to translate the SNMP OID strings into human-readable strings when you gather and view the data via the Insight CMU GUI AMS menu options.

To request a complete set of SNMP data from one or more iLOs, select the nodes in the Insight CMU GUI. Then select AMS→Get/Refresh SNMP Data. This process takes several seconds to complete, after which the following window appears:

Figure 53 Get/Refresh SNMP data

Now you can view the data by selecting the nodes in the Insight CMU GUI and selecting AMS→View/Compare SNMP Data.

Figure 54 View/Compare SNMP data

The data is piped through the CMU_Diff filter before being displayed so that, if you selected more than one node, the data can be compared. The first column is the SNMP OID string, followed by the MIB definition for that OID string, and finally its value. In some cases the value is translated into a human-readable string based on the definitions provided in the MIB, making the data easier to read and understand.

6.5.9.1.3 Configuring HPE iLO SNMP metrics in Insight CMU

Many of the SNMP data values are static, but some are volatile and worth monitoring, such as current temperature and power usage. Insight CMU provides a tool to query a set of preconfigured SNMP OID strings and submit them to Insight CMU for display via the GUI. The preconfigured SNMP OID strings and their corresponding Insight CMU metric names are in /opt/cmu/etc/cmu_ams_metrics:

Figure 55 cmu_ams_metrics

If you want to configure additional SNMP OID strings for monitoring, add them to this file with their corresponding Insight CMU metric names. The Insight CMU command that gathers this data and submits it to CMU is /opt/cmu/bin/cmu_get_ams_metrics. This command needs a file (-f) containing the list of nodes whose iLOs are to be queried. Alternatively, you can request a query of all the iLOs on all of the nodes in the Insight CMU cluster (-a). Use the -d option of the cmu_get_ams_metrics command to display the data. This option is useful for confirming that the command retrieves the correct SNMP data from the given nodes. Otherwise, the default action is to submit the data to Insight CMU, and this may not work if the metrics are not yet configured in Insight CMU (see below).

Figure 56 cmu_get_ams_metrics

The last step is to configure the SNMP metrics in Insight CMU. Add the following lines to the /opt/cmu/etc/ActionAndAlertsFile.txt file:

amb1_temp "ambient temp" 4 numerical Instantaneous 60 Celsius EXTENDED /opt/cmu/bin/cmu_get_ams_metrics -f /opt/cmu/etc/cmu_gen8_nodes
cpu1_temp "CPU 1 temp" 4 numerical Instantaneous 60 Celsius EXTENDED
cpu2_temp "CPU 2 temp" 4 numerical Instantaneous 60 Celsius EXTENDED
power "power usage" 4 numerical Instantaneous 100 watts EXTENDED

The cmu_get_ams_metrics command is added only to the amb1_temp metric because it needs to be invoked only once per monitoring cycle, and it provides Insight CMU with all 4 preconfigured SNMP-based metrics. Also note that cmu_get_ams_metrics is invoked with the -f /opt/cmu/etc/cmu_gen8_nodes option in this example. If all of the nodes in your Insight CMU cluster have iLO 4 or later and are configured to support SNMP port queries, you can replace this option with -a. Otherwise, create a file (as this example shows) that contains a list of the nodes that support iLO SNMP queries, and provide that file to this command.

After you finish configuring the ActionAndAlertsFile.txt file, restart Insight CMU monitoring and the Insight CMU GUI. The new metrics will appear in the Insight CMU GUI display.

Figure 57 Instant view display

6.5.9.2 Configuring HPE Moonshot power and temperature monitoring

Insight CMU supports monitoring Moonshot server power and temperature. To gather and submit data to Insight CMU, use the following command:

/opt/cmu/tools/cmu_get_moonshot_metrics

By default, this command queries all the iLO Chassis Managers (ILOCMs) associated with the nodes currently in the Insight CMU database and submits the power and temperature data to Insight CMU. Hewlett Packard Enterprise recommends verifying that the data can be retrieved from your chassis by first running the command with the -d option. This gathers the data and displays it rather than submitting it to Insight CMU.

Figure 58 Metrics data window

After you verify that the data can be gathered, configure the power and temperature metrics in Insight CMU by adding the following lines to the /opt/cmu/etc/ActionAndAlertsFile.txt file under the ACTIONS section:

amb1_temp "ambient temperature" 6 numerical Instantaneous 20 Celsius EXTENDED
cpu1_temp "CPU temperature" 6 numerical Instantaneous 40 Celsius EXTENDED /opt/cmu/tools/cmu_get_moonshot_metrics
power "power usage" 6 numerical Instantaneous 10 watts EXTENDED

NOTE: The /opt/cmu/tools/cmu_get_moonshot_metrics command is added to only one of the metrics, cpu1_temp, because cmu_get_moonshot_metrics must be invoked only once per monitoring cycle to gather all three metrics. If AMS power and temperature metrics are configured (see “Configuring HPE iLO SNMP metrics in Insight CMU” (page 139)), one or more of these metrics may already be defined in the /opt/cmu/etc/ActionAndAlertsFile.txt file. In this case, add the /opt/cmu/tools/cmu_get_moonshot_metrics command to one of the metrics that does not contain a command following the EXTENDED keyword.

To begin gathering the metrics, restart Insight CMU monitoring. The new metrics can be viewed in the Insight CMU GUI by following the steps in “Global cluster view in the central frame” (page 109) for choosing sensors to monitor.

6.6 Customizing node static information

The cmu_add_feature, cmu_show_features, and cmu_del_feature commands allow you to add and manage additional node static information in Insight CMU. For the command syntax, see the manpages or run these commands with -h. The default output of cmu_show_features mimics the output of the pdsh command, which means that you can pipe it through the cmu_diff command to get a condensed display of all of the node static information:

7 Managing a cluster with Insight CMU

Cluster management tasks can be performed on one or more nodes with Insight CMU. The available tasks depend on your privileges and the number of selected nodes.

7.1 Unprivileged user menu

When the Insight CMU GUI is in normal mode, you can only monitor node status and view static data. You cannot perform any other action on the cluster nodes, because other actions are potentially destructive.

7.2 Administrator menu

When the Insight CMU GUI is in administrator mode, you can perform all available actions on the cluster nodes. For information about administrator mode, see “Administrator mode” (page 47). When one or more nodes are selected, right-click to access a contextual menu. The contextual menu enables you to execute actions on the selected nodes. This contextual menu is available in network group, image group, and custom group views. This menu is also accessible by right-clicking in the overview frame.

Figure 59 Contextual menu for administrator

7.3 SSH connection

This menu is available only when one node is selected. A secure shell session is launched to the selected node with ssh.

7.4 Management card connection

This menu is available only when one node is selected. A telnet or secure shell session is launched to the management card of the selected node. The management card must be properly configured when Insight CMU is installed. If the node does not have a management card, this menu is disabled.

7.5 Virtual serial port connection

This option is available only when one node is selected. A secure shell session is launched to the management card of the selected node. Then the appropriate command is automatically issued to open a virtual serial port on the management card.

7.6 Shutdown

This action is available when one or more nodes are selected. This task enables a system administrator to issue the shutdown command on the selected nodes. The shutdown is performed immediately by default, or delayed for a specified time between 1 and 60 minutes. The administrator can also send a message to the users on the selected nodes by filling in the message box.

Figure 60 Halt dialog

IMPORTANT: Several ProLiant and SMP servers do not support HPE APM. If the nodes are not linked to a management card, do not use the shutdown command. Otherwise, the nodes might hang and require a manual shutdown. Use the reboot command instead. The shutdown command is performed on nodes using rsh or ssh. The compute nodes must allow commands to be run as superuser (root) from the management node. Otherwise, the shutdown command does not work properly.

7.7 Power off

When one or more nodes are selected, this task enables you to power off the nodes that have a management card. Nodes to be powered off must have the same management card password.

IMPORTANT: You must use the shutdown command before powering off, or the file systems might be damaged.

Figure 61 Power off dialog box

7.8 Boot

When one or more nodes are selected, this task enables you to boot a collection of nodes from their own local disks or over the network. You must select the nodes to be booted before running this command. The boot procedure uses the management card of each node, so the password for the management card must be entered. Nodes to be booted must have the same management card password.

IMPORTANT: If the nodes are already booted, the boot procedure attempts a proper shutdown first. If the shutdown fails, the management card resets the node, which can damage the file system. To avoid this risk, perform a shutdown operation before issuing a boot command.

Figure 62 Boot dialog box

7.9 Reboot

When one or more nodes are selected, this task enables a system administrator to issue the reboot command on the selected nodes. The reboot is performed immediately by default, or delayed for a specified time between 1 and 60 minutes. The administrator can also send a message to the users logged on to the selected nodes by filling in the message edit box.

IMPORTANT: The reboot command is performed on nodes using rsh or ssh. The compute nodes must allow commands to be run as superuser (root) from the management node. Otherwise, the reboot command does not work properly.

Figure 63 Reboot dialog box

7.10 Change UID LED status

When one or more nodes are selected, this task changes the status of the locator (UID) LED on the selected nodes. If the switch in the dialog is set to on, the LED is turned on on the selected nodes.

NOTE: This menu is available only if the node has an iLO management card properly registered in the Insight CMU database and the system is equipped with a status LED.

7.11 Multiple windows broadcast

This task is available when one or more nodes are selected. The following connections are available for multiple windows broadcast:
• A secure shell connection through the network, when the network is up on the selected nodes.
• A connection through the management card, if the selected nodes have a management card.
• A connection to the virtual serial port through the management card.

The multiple windows broadcast command launches a master console window and concurrent mirrored secure shell sessions, each embedded in a CMU term, on all selected nodes. All input entered in the master console window is broadcast to the secure shell sessions on the selected nodes. This enables the system administrator to issue the same command on several nodes by entering it only once. To issue commands to a specific node, enter the input directly in the CMU term for that node.

To improve the appearance of the CMU term window display, every window can be shifted in x and y from the previous one to fit on the screen. By default, the shift values are computed so that the windows tile the screen and no window appears outside of the screen. To paste the content of your clipboard into all terminals, click Paste in the master console.

Figure 64 Multiple windows broadcast command

NOTE: Hewlett Packard Enterprise recommends limiting the multiple windows broadcast command to 64 nodes at a time.

7.12 Single window pdsh

The single window pdsh command uses a single terminal to execute commands on several nodes in parallel. The output of pdsh can be piped to a filter that formats the output from all the selected nodes in a synthetic form that omits repetitions of identical results. You can choose among three filtering options:
• cmudiff [interactive] (default)
• cmudiff [non-interactive]
• dshbak

Figure 65 pdsh window

You can toggle the two filters on and off using dshbak or cmudiff. These two filters are mutually exclusive, so you can do one of the following:
• Filter with cmudiff
• Filter with dshbak
• Use no filter

7.12.1 cmudiff examples

Example 1 date command

The cmudiff output consists of two sections separated by dotted lines. The header displays the following information:
• The number of responses, 4 in this example. (This means a response has been received from 4 compute nodes.)
• The reference node. (This is the node chosen by cmudiff as a reference; differences in output from this reference node are highlighted.)
• The number of ignored lines

• The number of output lines

The output lines appear below the header. In this example, the output is only 1 line:
• The "m" on the left indicates that the output from some compute node differs from the reference node.
• Details about the output processing results are provided on the right. Characters that differ from the reference node are highlighted in red. In this example, the time drift in the "seconds" field differs.

Depending on the output length, the output of cmudiff can be piped to the less pager to enable scrolling through the output with the arrow keys. Exit the pager by entering q.

Example 2 dmidecode command

This example uses cmudiff to detect BIOS firmware differences with the dmidecode command:

cmu_pdsh> dmidecode

NOTE: The window shows only 2% of the output. Use the arrows to scroll up and down.

A difference is found in the BIOS release date. The comment “2 populations, not displayed” on the right suggests that two groups of nodes are present with two different BIOS release dates. One of the two populations might be a single node without a firmware upgrade. Display the full list of cmudiff options:

cmu_pdsh> cmudiff -h

Narrow the search for the failing nodes with the -d option to display node populations:

cmu_pdsh> cmudiff -d
cmudiff filter is , with parameters -d
cmu_pdsh>
cmu_pdsh> dmidecode

The comment now shows “(2 populations) o185i[040,042] are 83% similar”. This comment suggests that those two compute nodes have a different BIOS release date than all other nodes.

NOTE: A nonresponsive node in the node selection for single window pdsh delays the answers from the other nodes until the nonresponsive node times out. You can reduce this delay by setting the ConnectTimeout value in .ssh/config. For example:

# vi /root/.ssh/config
Host *
    StrictHostKeyChecking no
    ConnectTimeout 1

7.13 Parallel distributed copy (pdcp)

The pdcp task enables you to copy a file from the Insight CMU administration server to multiple nodes simultaneously. To copy the file:
1. Right-click the nodes you want to copy to.
2. On the contextual menu, select pdcp (distributed copy). The following window opens:

Figure 66 Parallel distributed copy window

3. Complete the Source and Destination fields, and then click OK to execute the distributed copy.

7.14 Custom group management

Custom groups are not required for backup and cloning operations. However, you can use the Custom Group Management window to add, delete, or rename a custom group. A custom group is a set of nodes named by the Insight CMU administrator. Each node can belong to several custom groups. To perform tasks using the Custom Group Management option, click Cluster Administration and then select Custom Group Management.

7.14.1 Adding custom groups
1. In the Custom Group Management window, click Create.
2. Enter the new custom group name.
3. Click OK.

Figure 67 Custom group management

Select any number of nodes from the “Nodes in Cluster” list on the left and use the arrows to move them to the “Nodes in Custom Group” list on the right.

7.14.2 Deleting custom groups
1. In the Custom Group Management window, select the custom group to delete.
2. Click Delete.
3. Click OK.

7.15 HPE Insight firmware management

Insight CMU provides support for managing your firmware. You can view and compare BIOS settings and BIOS firmware versions across a set of chosen nodes. This functionality helps you confirm that your cluster is configured correctly and consistently. Insight CMU also provides the ability to run a firmware executable on a set of nodes to upgrade your firmware to the latest version available, which helps ensure that your cluster hardware performs efficiently and consistently. This firmware executable can be an online flash component or a firmware RPM.

7.15.1 Viewing and analyzing BIOS settings

Use the conrep tool to extract BIOS settings from each node to a file. The conrep tool is freely available from the HPE Support Center at www.hpe.com/support/hpesc. It can be found separately or packaged within the SmartStart Scripting Toolkit (SSSTK). Insight CMU provides the latest conrep kit available at release time. If a different version of conrep is required for the servers in your cluster (for example, if the current version of conrep is incompatible with RHEL 5.10):
1. Download the appropriate version of conrep for your environment from the Hewlett Packard Enterprise Support website. (The conrep binary might be packaged within the SmartStart Scripting Toolkit.)

2. Copy the conrep binary and the conrep.xml file to a location on your Insight CMU management server.
3. Configure the full path and file name of the new conrep binary.
   a. Edit the CMU_BIOS_SETTINGS_TOOL variable in /opt/cmu/etc/cmuserver.conf to point to the location of the new conrep binary.
   b. Change the CMU_BIOS_SETTINGS_FILE variable to point to the location of the new conrep.xml file.
4. Insight CMU is now ready to use the new conrep binary and conrep.xml file. Select one or more nodes in the GUI.
5. Click Show BIOS Settings.
   a. If an error occurs, the conrep binary might require additional software on your compute nodes. Log in to one of the selected nodes.
   b. Change the directory to /opt/cmu/tmp/conrep.
   c. To identify the missing library, run conrep -h.
   d. When the conrep -h command runs correctly, the Show BIOS Settings feature is enabled on this server.

The conrep tool also requires an XML file containing the information necessary to interpret the BIOS flash memory data on your server into human-readable text. Insight CMU is preconfigured with the most common XML file, but depending on your server type this common XML file might not be compatible with your servers. If your servers require a special XML file, set the CMU_BIOS_SETTINGS_FILE variable in /opt/cmu/etc/cmuserver.conf to the full path and file name of the correct XML file.

NOTE: The 64-bit conrep tool shipped with the Insight CMU kit requires the hp-health package to be installed on the compute nodes. However, the Show BIOS Settings feature works without any issues when the compute nodes are booted into the Insight CMU network image.

The BIOS settings are extracted to a local file on each node. Display the contents of those files using the cmu_dsh command with the CMU_Diff filter. This allows you to identify differing settings across the set of chosen nodes. The following items are located in /opt/cmu/tmp/conrep/ on each selected node:
• A copy of the conrep binary
• A copy of the conrep XML file
• The file containing the BIOS settings

7.15.2 Checking BIOS versions

To check the BIOS version on sets of selected nodes, Insight CMU extracts the BIOS Vendor, Version, and Release Date fields from the output of dmidecode and concatenates them with hyphens to form a single string. These strings are aggregated with the cmu_dsh command and filtered using dshbak to provide a condensed display of the sets of nodes running common BIOS versions.

7.15.3 Installing and upgrading firmware

Most servers provide the ability to upgrade the firmware while the server is running by invoking an online ROM flash executable, or firmware executable. This is a Linux executable that tests for the correct server type and installs a later version of the firmware. Newer firmware executables for Linux are packaged in RPM format. You can obtain this firmware executable or firmware RPM from the HPE Support Center at www.hpe.com/support/hpesc. Copy it to /opt/cmu/firmware/. Then you can use Insight CMU to select the nodes to upgrade, and these binaries are copied and executed in parallel. If

a firmware RPM is given to Insight CMU, it is unpacked, and the firmware executable is copied to the selected nodes and executed in parallel. By default, Insight CMU executes these binaries with the -s option, which tells the binary to run in script mode. If necessary, you can change the arguments by editing the CMU_FIRMWARE_EXECUTABLE_ARGUMENTS variable in /opt/cmu/etc/cmuserver.conf. Hewlett Packard Enterprise recommends installing the firmware executable on one node first, to test the operation. After the binary finishes executing, you must reboot the node for the new firmware to take effect. If this process is successful, then you can use Insight CMU to duplicate the process on a larger set of nodes.

7.16 Customizing the GUI menu
You can add your own menu options to the Insight CMU GUI. For more information, see the file /opt/cmu/etc/cmu_custom_menu. This file includes instructions on adding your own GUI menu option and provides commented, ready-to-use examples. When you add a custom GUI option, the corresponding command is also available from /opt/cmu/cmucli. For example:
1. In the /opt/cmu/etc/cmu_custom_menu file, uncomment the following line:
SERVER;audit|dmidecode;/opt/cmu/bin/cmu_dsh -f CMU_TEMP_NODE_FILE -c "dmidecode" -e "-b -n -v0 -R0"
2. Run the CLI.
3. List the available custom commands:
cmu> custom_run
Title Command
------|------
audit|dmidecode /opt/cmu/bin/cmu_dsh -f CMU_TEMP_NODE_FILE -c "dmidecode" -e "-b -n -v0 -R0"
cmu>
4. Run the dmidecode command on node10 from the CLI:
cmu> custom_run "audit|dmidecode" node10
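The dmidecode output gathered by such a custom command can be condensed into the hyphen-joined Vendor-Version-Release Date string that section 7.15.2 describes. The following is a minimal sketch of that string-building step, with sample dmidecode output inlined for illustration; on a real node you would pipe the output of dmidecode -t bios instead, and the sample values shown here are assumptions:

```shell
# Build the BIOS version string from dmidecode fields (sample output inlined).
sample='	Vendor: HPE
	Version: P89
	Release Date: 05/21/2018'
bios_string=$(printf '%s\n' "$sample" | awk -F': ' '
  /Vendor:/       {v = $2}
  /Version:/      {ver = $2}
  /Release Date:/ {d = $2}
  END {printf "%s-%s-%s\n", v, ver, d}')
echo "$bios_string"
```

Aggregating one such string per node with cmu_dsh and filtering with dshbak then groups nodes that report identical BIOS versions.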

7.16.1 Saving user settings
Users can save and restore preferences locally in the cmu_gui_local_settings file without cluster administrator privileges.

7.17 Insight CMU CLI

7.17.1 Starting a CLI interactive session
The cmucli command can be invoked at any time to start the CLI session in interactive mode.
# cmucli
The following output appears:
CMU - Cluster Management Utility (c)HP Competency Center

Start of CMU CLI Session Wed Jan 4 17:30:49 CET 2006

cmu>

7.17.2 Basic commands
These commands help you manage the cluster in command-line mode. The command-line mode can be used interactively, or in script mode by giving a file name when invoking the command. Each command is executed on a specified set of nodes. Use a complete list of nodes, or a regular expression, to specify the nodes. This interface includes the administration, backup, and cloning features of Insight CMU.

Exiting an Insight CMU CLI interactive session
cmu> exit
#

Starting a noninteractive Insight CMU CLI session
To start the CLI session and execute commands from a file:
# cmucli my_path/my-file
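For example, a small command file can be built and then passed to cmucli. The file path below is illustrative; groups, nodes, and exit are CLI commands described in this chapter:

```shell
# Create a hypothetical command file for a noninteractive session.
cat > /tmp/cmu-batch.txt <<'EOF'
groups
nodes
exit
EOF
# Then run it on the management server:
# cmucli /tmp/cmu-batch.txt
```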

NOTE: The file must contain a set of valid Insight CMU commands. The available commands and syntax are described in the following sections.

Management card naming
Management card naming differs between the Insight CMU GUI and the Insight CMU CLI. For example, the card named iLO in the Insight CMU GUI is named ILO in the Insight CMU CLI.

Help commands
To get help during a CLI session, use the help command. This command displays all available Insight CMU CLI commands.
cmu> help
HELP COMMANDS
help administration | help database
help configuration | help other
or help

DATABASE COMMANDS
group | node
image_group | network_group
custom_group

ADMINISTRATION COMMANDS
boot | broadcast
halt | shutdown
locate | power
reboot | modify_password
backup | clone
change_kernel | kickstart | autoinstall
pdcp | custom_run

CONFIGURATION COMMANDS
add_node | add_{ks|ai}_image_group
add_image_group | add_network_group
add_custom_group | delete_node
delete_image_group | delete_network_group
delete_custom_group | add_to_image_group
add_to_network_group | add_to_custom_group
del_from_image_group | del_from_network_group
del_from_custom_group | change_active_image_group
probe_kernel | scan_macs

OTHER COMMANDS
exit | cat
date | echo
sh | vi
#

Getting help for a command To get detailed information and the exact syntax of a command:

# help command_name
For example, to get more information about the halt command:
cmu> help halt
Delay can be anything in "now, 1, 5, 10, 15, 30, 60" minutes
halt delay "mesg" all                        halt all nodes
halt delay "mesg" node_1                     halt node node_1
halt delay "mesg" node_1 node_2              halt nodes node_1 node_2
halt delay "mesg" node_1 - node_n            halt all nodes between node_1 and node_n
halt delay "mesg" node_*                     halt all nodes starting with node_ word
halt delay "mesg" allbut node_1              halt all nodes except node_1
halt delay "mesg" all group_1                halt nodes of image group group_1
halt delay "mesg" all group_1 but node_exp   halt nodes of image group group_1 except node_exp
halt delay "mesg" all group_1 group_2        halt nodes of group_1 and group_2
cmu>

Displaying image groups of a cluster
The groups command displays the list of the image groups.
cmu> groups
list of group(s) with active nodes :
debian default nodevmap pfmon sfs2
list of available group(s) for backup and cloning :
default sfs2 suse10 pfmon testrh3u4 debian nathclontest nodevmap
cmu>
You can also call this command followed by a group name. The result displays all active nodes of the group.
cmu> groups default
active node list
selected: o185i194 o185i202 o185i216 o185i222 o185i233 o185i243 o185i252 o185i253 o185i254
cmu>

Displaying nodes of an image group
To view the nodes in a group and their attributes:
cmu> groups group_name
For example:
cmu> groups default
active node list
selected: o185i194 o185i202 o185i216 o185i222 o185i233 o185i243 o185i252 o185i253 o185i254
cmu>

Displaying nodes of the cluster
The nodes command displays the list of nodes with the attribute name.
cmu> nodes
Machines.node.n1 = 192.168.1.1,255.255.255.0,00-19-BB-3A-8A-60,rh6u7,192.168.1.101,ILO,x86_64,-1,-1,generic,default,default,default,default,auto,default,default,none

Machines.node.n2 = 192.168.1.2,255.255.255.0,00-19-BB-3A-A8-64,rh6u7,192.168.1.102,ILO,x86_64,-1,-1,generic,default,default,default,default,auto,default,default,none

Machines.node.n3 = 192.168.1.3,255.255.255.0,00-1A-4B-DE-19-54,default,192.168.1.103,ILO,x86_64,-1,-1,generic,default,default,default,default,auto,default,default,none
cmu>
You can also run this command on a subset of the nodes. The next section describes how to specify a subset of nodes in the Insight CMU CLI.

7.17.3 Specifying nodes

NOTE: The following commands are Insight CMU commands. You can run them only in the Insight CMU CLI.

Using the Insight CMU CLI, you can execute a command on any number of nodes. The nodes command:
• Displays node information
• Tests regular expressions for selecting nodes before executing a command

Executing a command on one node
To execute a command on only one node, specify the name of the node. The following command executes on node o185i222:
cmu> node o185i222
active node list
selected: o185i222
cmu>

Executing a command on a list of nodes
To execute a command on multiple nodes, specify the names of the nodes.
cmu> boot o185i222 o185i233 o185i243
active node list
selected: o185i222 o185i233 o185i243
cmu>

Executing a command on a range of nodes
To execute a command on a range of nodes, specify the first and last node names of the range. Commands are executed on all nodes within the range.
cmu> boot o185i222 - o185i225
active node list
selected: o185i222 o185i223 o185i224 o185i225
cmu>

IMPORTANT: Spaces between “-” and node names are mandatory.

Using wildcards
You can use * as a wildcard to select all nodes with matching node names. For example:
cmu> nodes o185i22*
active node list
selected: o185i220 o185i221 o185i222 o185i223 o185i224 o185i225 o185i226 o185i227 o185i228 o185i229
cmu>

Complex list of nodes A complex list of nodes can be specified using any combination of the above regular expressions.

NOTE: If a node is mentioned twice, the command is executed twice on this node.

To boot specific nodes:
cmu> boot o185i209 o185i217 o185i22* o185i232 - o185i235
active node list
selected: o185i209 o185i217 o185i220 o185i221 o185i222 o185i223 o185i224 o185i225 o185i226 o185i227 o185i228 o185i229 o185i232 o185i233 o185i234 o185i235

cmu>

Executing a command on all nodes A command followed by the all option is executed on all nodes.

cmu> boot all
active node list
selected: o185i192 o185i193 o185i194 o185i195 o185i196 o185i197 o185i198 o185i199 o185i200 o185i201 o185i202 o185i203 o185i204 o185i205 o185i206 o185i207 o185i208 o185i209 o185i210 o185i211 o185i212 o185i213 o185i214 o185i215 o185i216 o185i217 o185i218 o185i219 o185i220 o185i221 o185i222 o185i223 o185i224 o185i225 o185i226 o185i227 o185i228 o185i229 o185i230 o185i231 o185i232 o185i233 o185i234 o185i235 o185i236 o185i237 o185i238 o185i239 o185i240 o185i241 o185i242 o185i243 o185i244 o185i245 o185i246 o185i247 o185i248 o185i249 o185i250 o185i251 o185i252 o185i253 o185i254 o185i255

cmu>

Excluding specific nodes from a command
Use the allbut option to select all nodes except specific ones. The allbut option can be followed by any combination of the above regular expressions.
cmu> boot allbut o185i2*
active node list
selected: o185i192 o185i193 o185i194 o185i195 o185i196 o185i197 o185i198 o185i199
cmu>

Executing a command on all nodes of an image group
You can use the all option followed by a group name to select all active nodes of this group.
cmu> boot all default
active node list
selected: o185i194 o185i202 o185i216 o185i222 o185i233 o185i243 o185i252 o185i253 o185i254
cmu>

Executing a command on specific nodes of an image group
You can use the but option to exclude active nodes of a group from the selection. Nodes to exclude can be specified with any combination of regular expressions.
cmu> boot all default but o185i222 - o185i252
active node list
selected: o185i194 o185i202 o185i216 o185i253 o185i254
cmu>

7.17.4 Administration and cloning commands

Booting a set of nodes
You can boot any number of nodes in the cluster. The regular expressions for specifying a list of nodes described above are accepted. The following boot modes are available:
• Normal mode boots nodes on the default device, generally the hard drive.
• Network mode fills the dhcpd.conf file to boot nodes over the network.
cmu> boot [net] cmu_nodes_regular_expression
Booting node o185i192 in normal mode on the hard drive:
cmu> boot o185i192
active node list
selected: o185i192
Entering normal Boot
Booting node o185i192
Not using ssh/rsh shutdown because node o185i192 is not pinging
Doing a powercycle of node o185i192 via lo100i (OFF then ON)
Boot order has been sent to every node
Boot process finished !!
Please read /opt/cmu/log/Boot-lo100i-Tue-08-Aug-at-12h36m23s.log to check errors
cmu>
Booting node o185i192 in network mode:
cmu> boot net o185i192
active node list
selected: o185i192

Cleaning previous kickstart boot option
Copying /etc/hosts to /opt/cmu/ntbt/rp/etc/

Entering Network boot
Writing dhcpd.conf
dhcpd.conf written successfully

Booting node o185i192
Not using ssh/rsh shutdown because node o185i192 is not pinging
Doing a powercycle of node o185i192 via lo100i (OFF then ON)
Waiting 1 second (cooldown)...

Waiting 5 minutes before removing the dhcpd.conf

Boot process finished !!

Please read /opt/cmu/log/Boot-lo100i-Tue-08-Aug-at-12h40m36s.log to check errors
cmu>

Broadcasting commands to a set of nodes
The broadcast command launches the interactive single-window broadcast utility. You can use the regular expressions described above to specify nodes. The command uses telnet and secure shell.

NOTE: This command has limited functionality. For more information on this command, see “Multiple windows broadcast” (page 145).

cmu> broadcast cmu_nodes_regular_expression
To broadcast on all nodes of the cluster:
cmu> broadcast all
selected nodes: o185i192 o185i193 o185i194 o185i195 o185i196 o185i197 o185i198 o185i199 o185i200 o185i201 o185i202 o185i203 o185i204 o185i205 o185i206 o185i207 o185i208 o185i209 o185i210 o185i211 o185i212 o185i213 o185i214 o185i215 o185i216 o185i217 o185i218 o185i219 o185i220 o185i221 o185i222 o185i223 o185i224 o185i225 o185i226 o185i227 o185i228 o185i229 o185i230 o185i231 o185i232 o185i233 o185i234 o185i235 o185i236 o185i237 o185i238 o185i239 o185i240 o185i241 o185i242 o185i243 o185i244 o185i245 o185i246 o185i247 o185i248 o185i249 o185i250 o185i251 o185i252 o185i253 o185i254 o185i255

CMU pdsh interface
dshbak filter is help|h|? to get help
cmu_pdsh>

Rebooting a set of nodes
Reboots any number of nodes in the cluster. You can use all the regular expressions previously described.
cmu> reboot delay "message" cmu_nodes_regular_expression
where
delay Indicates the time period after which the specified nodes are rebooted. It can be “now” or after “1”, “5”, “10”, “15”, “30”, or “60” minutes.
message Is a string that appears on every rebooted node.
cmu> reboot 1 "Reboot for maintenance" o185i192
active node list
selected: o185i192

Rebooting the nodes
..Return-->Command too long to execute<--:wait_for_command: .Please read /opt/cmu/log/Reboot.log to check errors
cmu>

Halting a set of nodes
Halts any number of nodes in the cluster. You can use the regular expressions previously described.
cmu> halt delay "message" cmu_nodes_regular_expression

where
delay Indicates the time period after which the specified nodes are halted. It can be “now” or after “1”, “5”, “10”, “15”, “30”, or “60” minutes.
message Is a string that appears on every halted node.
cmu> halt now "Halt for maintenance" o185i192
active node list
selected: o185i192

Halting the nodes
..Please read /opt/cmu/log/Halt.log to check errors
cmu>

Powering off a set of nodes
Powers off any number of nodes in the cluster. You can use all the regular expressions previously described.
cmu> power off cmu_nodes_regular_expression
For example:
cmu> power off o185i192
active node list
selected: o185i192

Please read /opt/cmu/log/PowerOff.log for errors.
cmu>

Setting the locator LED on or off
Sets the locator LED of any number of nodes on or off. You can use the regular expressions previously described.
cmu> locate on|off cmu_nodes_regular_expression
For example:
cmu> locate on o185i192
active node list
selected: o185i192
cmu>
For example:
cmu> locate off o185i192
active node list
selected: o185i192
cmu>

Cloning a set of nodes
Clones an image to a node or a set of nodes. You can use all the regular expressions previously described. After cloning, successfully cloned nodes are active in the image group associated with the image. Failed nodes are active in the image group “default”.
cmu> clone "image_name" cmu_nodes_regular_expression
For example:
cmu> clone "cluster" o185i195
active node list
selected: o185i195
node list found in backup group cluster: o185i195
Save /opt/cmu/log/cmucerbere.log file
Cleaning boot directory
Configuring the system
[INFO] CMU does not seem to be running

Copying the ssh settings
Rebuilding network-boot image
/opt/cmu/tmp/GUI/config.txt was rewritten
Starting cloning
Every 2.0s: /opt/cmu/tools/logAnalyser.sh

Cloning started on 2006-08-08 at [17:07:55]
+--------------------------+----------+
| NET BOOTING              |          |
+--------------------------+----------+
| PARTITION & FORMAT DISKS |          |
+--------------------------+----------+
| GETTING DATA             |          |
+--------------------------+----------+
| CLONING ERROR            |          |
+--------------------------+----------+
| CLONED                   | o185i195 |
+--------------------------+----------+

Cloning process finished on 2006-08-08 at [17:10:51]

Database report:
      | cloned | error | unknown
ne1   |      1 |     0 |       0
sfs   |      0 |     0 |       0
ne2   |      0 |     0 |       0
ne3   |      0 |     0 |       0
ne4   |      0 |     0 |       0
test  |      0 |     0 |       0
Total |      1 |     0 |       0

Detailed logs are in /opt/cmu/log/cmucerbere.log and /opt/cmu/log/cmucerbere-*.log
[INFO] CMU does not seem to be running
/opt/cmu/tmp/GUI/config.txt was rewritten
cmu>

Adding a new image group
The add_image_group command creates a new image group. Parameters are specified on one line:
cmu> add_image_group groupname "device"
For example:
cmu> add_image_group my_image_group "sda"
processing 1 image group ...

Adding nodes to an image group
The add_to_image_group command adds nodes to an image group. Parameters are specified on one line:
cmu> add_to_image_group nodes to group_name
For example:
cmu> add_to_image_group o184i115 - o184i116 to my_image_group
selected: o184i115 o184i116 { 2 nodes }
processing 2 nodes...

Backing up a node
Backs up one node in an existing image group. Four syntaxes are available for this command. You can specify all the parameters on one line, or use the backup command alone to get an interactive backup menu. For more information about backup CLI commands, see “Backing up a golden compute node” (page 76).
cmu> backup

Or:
cmu> backup "image_name" "root_partition,other_partition" node_name

IMPORTANT: Before using the above backup syntax, you must know the complete names of all partitions to be backed up. If some partitions are omitted, then nodes cloned from the image created by the backup command might not function correctly. For example, an image of a system with separate / and /boot partitions cannot be cloned successfully if you omit /boot from the list of partitions to back up.

Or:
cmu> backup "image_name" node_name

IMPORTANT: The above method works only on golden nodes running Linux and accessible via ssh.

Or:
cmu> backup "image_name" uuid "root_partition_uuid" node_name

IMPORTANT: On Linux nodes, use the blkid tool to find the UUID of the root partition. On Windows, run fsutil fsinfo ntfsinfo from the command prompt in Administrator mode. (Windows is only available on specific Moonshot cartridges.)

The following example backs up three partitions on the hard disk of a node, where sda1 is the root partition, test_julien is the image group name, and o185i195 is the node name.
cmu> backup "test_julien" "sda1,sda5,sda9" o185i195
active node list
selected: o185i195
node list found in backup group test_julien: o185i195
[INFO] CMU does not seem to be running
Copying the ssh settings
hostname setup ok
logout ok
Booting the Node over the network
Waiting for node o185i195 to boot
/opt/cmu/tmp/GUI/config.txt was rewritten
[WARNING] Timeout have expired while waiting for the node to network boot
[WARNING] Backup may fail
Save /opt/cmu/log/cmudolly.log file
tail -f /opt/cmu/log/cmudolly.log
Starting retrieving fstab
[16:15:13] [Dolly] Running OS : "Linux/2.6.16-rc6-git1-4-bigsmp"
[16:15:13] OSTYPE:Linux-CMU
[16:15:13] [DollyClient] Starting to get fstab files
[16:15:13] [DollyClient] Getting "/opt/cmu/tmp/fstab.txt"
[16:15:14] [DollyClient] fstab of /dev/sda1 received and stored into /opt/cmu/tmp/fstab.txt
[16:15:14] [DollyClient] Executing: /bin/grep "LABEL" /opt/cmu/tmp/fstab.txt | /usr/bin/wc -l >/opt/cmu/tmp/number_of_label
[16:15:14] [DollyClient] No label in /opt/cmu/tmp/fstab.txt fstab file
[16:15:14] [DollyClient] Executing: /bin/rm -rf /opt/cmu/image/test_julien/fstab-label.txt
[16:15:14] [DollyClient] Executing: /bin/cp /opt/cmu/tmp/fstab.txt /opt/cmu/image/test_julien/fstab-device.txt
[16:15:14] Dolly Terminated...
Starting backup process
[16:15:17] [Dolly] Add partition "sda1"
[16:15:17] [Dolly] Add partition "sda5"
[16:15:17] [Dolly] Add partition "sda9"
[16:15:17] [Dolly] Running OS : "Linux/2.6.16-rc6-git1-4-bigsmp"
[16:15:17] OSTYPE:Linux-CMU
[16:15:17] [DollyClient] This is not SLES9 or above.
No specific work to do
[16:15:17] [DollyClient] Starting to get image
[16:15:17] [DollyClient] Asking for main devices list
[16:15:17] [DollyClient] Device is sda
[16:15:17] [DollyClient] Device is sda
[16:15:17] [DollyClient] Asking for partition table of "/dev/sda"
[16:15:17] [DollyClient] Getting /opt/cmu/image/test_julien/parttbl-sda.txt
[16:15:17] [DollyClient] Getting /opt/cmu/image/test_julien/parttbl-sda.raw
[16:15:17] [DollyClient] Getting /opt/cmu/image/test_julien/partarchi-sda1.tgz (1/3)
[16:15:27] [DollyClient] Getting /opt/cmu/image/test_julien/partarchi-sda5.tgz (2/3)
[16:15:49] [DollyClient] Getting /opt/cmu/image/test_julien/partarchi-sda9.tgz (3/3)
[16:15:49] [DollyClient] Image test_julien received
[16:15:49] Dolly Terminated...
[INFO] CMU does not seem to be running
backup success
The golden image node will be rebooted in 30 seconds
/opt/cmu/tmp/GUI/config.txt was rewritten
cmu>
The following example is an interactive backup.
cmu> backup
image name? > test_julien

partition list(root,part_1,part_2,...)? > sda1,sda5,sda6,sda8,sda9
node name? > o185i195
node list found in backup group test_julien: o185i195
[INFO] CMU does not seem to be running
Copying the ssh settings
/opt/cmu/tmp/GUI/config.txt was rewritten
--
Booting the Node over the network
Waiting for node o185i195 to boot
Save /opt/cmu/log/cmudolly.log file
tail -f /opt/cmu/log/cmudolly.log
Starting retrieving fstab
[16:25:03] [Dolly] Running OS : "Linux/2.6.16-rc6-git1-4-bigsmp"
[16:25:03] OSTYPE:Linux-CMU
[16:25:03] [DollyClient] Starting to get fstab files
[16:25:03] [DollyClient] Getting "/opt/cmu/tmp/fstab.txt"
[16:25:03] [DollyClient] fstab of /dev/sda1 received and stored into /opt/cmu/tmp/fstab.txt
[16:25:03] [DollyClient] Executing: /bin/grep "LABEL" /opt/cmu/tmp/fstab.txt | /usr/bin/wc -l >/opt/cmu/tmp/number_of_label
[16:25:03] [DollyClient] No label in /opt/cmu/tmp/fstab.txt fstab file
[16:25:03] [DollyClient] Executing: /bin/rm -rf /opt/cmu/image/test_julien/fstab-label.txt
[16:25:03] [DollyClient] Executing: /bin/cp /opt/cmu/tmp/fstab.txt /opt/cmu/image/test_julien/fstab-device.txt
[16:25:03] Dolly Terminated...
Starting backup process
[16:25:06] [Dolly] Add partition "sda1"
[16:25:06] [Dolly] Add partition "sda5"
[16:25:06] [Dolly] Add partition "sda6"
[16:25:06] [Dolly] Add partition "sda8"
[16:25:06] [Dolly] Add partition "sda9"
[16:25:06] [Dolly] Running OS : "Linux/2.6.16-rc6-git1-4-bigsmp"
[16:25:06] OSTYPE:Linux-CMU
[16:25:06] [DollyClient] This is not SLES9 or above.
No specific work to do
[16:25:06] [DollyClient] Starting to get image
[16:25:06] [DollyClient] Asking for main devices list
[16:25:06] [DollyClient] Device is sda
[16:25:06] [DollyClient] Device is sda
[16:25:06] [DollyClient] Device is sda
[16:25:06] [DollyClient] Device is sda
[16:25:06] [DollyClient] Asking for partition table of "/dev/sda"
[16:25:06] [DollyClient] Getting /opt/cmu/image/test_julien/parttbl-sda.txt
[16:25:07] [DollyClient] Getting /opt/cmu/image/test_julien/parttbl-sda.raw
[16:25:07] [DollyClient] Getting /opt/cmu/image/test_julien/partarchi-sda1.tgz (1/5)
[16:25:17] [DollyClient] Getting /opt/cmu/image/test_julien/partarchi-sda5.tgz (2/5)
[16:25:38] [DollyClient] Getting /opt/cmu/image/test_julien/partarchi-sda6.tgz (3/5)
[16:25:45] [DollyClient] Getting /opt/cmu/image/test_julien/partarchi-sda8.tgz (4/5)
[16:25:46] [DollyClient] Getting /opt/cmu/image/test_julien/partarchi-sda9.tgz (5/5)
[16:25:46] [DollyClient] Image test_julien received
[16:25:46] Dolly Terminated...
backup success
The golden image node will be rebooted in 30 seconds
/opt/cmu/tmp/GUI/config.txt was rewritten
cmu>
The following example backs up all partitions on the OS disk of a node, where rhel6u5-auto is the image group name and n3 is the node name. Before starting the backup operation, the golden node must be up and running with a Linux OS and reachable via ssh.
cmu> backup "rhel6u5-auto" n3
backup id is 16384
log file is /opt/cmu/log/cmudolly-16384.log
copying ssh settings
hostname setup ok
logout ok
root partition UUID = a0ff813c-26dc-4160-9c88-647db594703b
netbooting node
waiting 60 secs for node n3 to boot
netbooted correctly,removing dhcp entry
tail -f /opt/cmu/log/cmudolly-16384.log
Starting retrieving fstab
[25-Aug-2014_14:20:09] [Dolly] Running OS : "Linux-RedHat"
[25-Aug-2014_14:20:09] OSTYPE:Linux-CMU
[25-Aug-2014_14:20:09] [DollyClient] Starting to get fstab
[25-Aug-2014_14:20:09] [DollyClient] Getting "/opt/cmu/image/rhel6u5-auto/fstab-device.txt"
[25-Aug-2014_14:20:41] Dolly Terminated...
starting backup process...
partition list = disk/by-path/cmu_pci0000:00_0000:00:03.0_0000:0a:00.0_0000:0b:08.0_smartarray0_c0d0-part3, disk/by-path/cmu_pci0000:00_0000:00:03.0_0000:0a:00.0_0000:0b:08.0_smartarray0_c0d0-part1
[25-Aug-2014_14:20:41] [Dolly] Add partition "disk/by-path/cmu_pci0000:00_0000:00:03.0_0000:0a:00.0_0000:0b:08.0_smartarray0_c0d0-part3"
[25-Aug-2014_14:20:41] [Dolly] Add partition "disk/by-path/cmu_pci0000:00_0000:00:03.0_0000:0a:00.0_0000:0b:08.0_smartarray0_c0d0-part1"
[25-Aug-2014_14:20:41] [Dolly] Running OS : "Linux-RedHat"
[25-Aug-2014_14:20:41] OSTYPE:Linux-CMU
[25-Aug-2014_14:20:41] [DollyClient] This is not SLES9 or above. No specific work to do
[25-Aug-2014_14:20:41] [DollyClient] redhat like OS, launching the script to detect the nic configuration file name used for the cloning
[25-Aug-2014_14:20:41] [DollyClient] Starting to get image
[25-Aug-2014_14:20:41] [DollyClient] Asking for main devices list
[25-Aug-2014_14:20:41] [DollyClient] Device is
[25-Aug-2014_14:20:41] [DollyClient] Writing generic reconfiguration file "/opt/cmu/image/rhel6u5-auto/reconf.sh"

(info...currently-->No such file or directory<--)
[25-Aug-2014_14:20:41] [CMUTools ] about to run: -->cp -vf /opt/cmu/etc/reconf.sh /opt/cmu/image/rhel6u5-auto/reconf.sh &> /opt/cmu/tmp/CMU_TEMPO_FILE_EKvQu1<-- cmu_system_helper
[25-Aug-2014_14:20:41] [DollyClient] Writing generic reconfiguration file "/opt/cmu/image/rhel6u5-auto/pre_reconf.sh" (info...currently-->No such file or directory<--)
[25-Aug-2014_14:20:41] [CMUTools ] about to run: -->cp -vf /opt/cmu/etc/pre_reconf.sh /opt/cmu/image/rhel6u5-auto/pre_reconf.sh &> /opt/cmu/tmp/CMU_TEMPO_FILE_jAJkql<-- cmu_system_helper
[25-Aug-2014_14:20:41] [DollyClient] Writing disk overrides file "/opt/cmu/image/rhel6u5-auto/disk-overrides.txt" (info...currently-->No such file or directory<--)
[25-Aug-2014_14:20:41] [CMUTools ] about to run: -->cp -vf /opt/cmu/etc/disk-overrides.txt /opt/cmu/image/rhel6u5-auto/disk-overrides.txt &> /opt/cmu/tmp/CMU_TEMPO_FILE_aZxYmF<-- cmu_system_helper
[25-Aug-2014_14:20:41] [DollyClient] : clnt->curTxt="GETDISKMETADATA rootctrlvendorid:103c:3238;rootctrlbusid:0000:0b:08.0;"
[25-Aug-2014_14:20:41] [DollyClient] : diskmeta="rootctrlvendorid:103c:3238;rootctrlbusid:0000:0b:08.0;"
[25-Aug-2014_14:20:41] [DollyClient] Asking for partition table of "/dev/disk/by-path/cmu_pci0000:00_0000:00:03.0_0000:0a:00.0_0000:0b:08.0_smartarray0_c0d0"
[25-Aug-2014_14:20:41] [DollyClient] Getting /opt/cmu/image/rhel6u5-auto/parttbl-cmu_pci0000:00_0000:00:03.0_0000:0a:00.0_0000:0b:08.0_smartarray0_c0d0.txt
[25-Aug-2014_14:20:41] [DollyClient] Getting /opt/cmu/image/rhel6u5-auto/parttbl-cmu_pci0000:00_0000:00:03.0_0000:0a:00.0_0000:0b:08.0_smartarray0_c0d0.raw
[25-Aug-2014_14:20:41] [DollyClient] Getting /opt/cmu/image/rhel6u5-auto/partarchi-cmu_pci0000:00_0000:00:03.0_0000:0a:00.0_0000:0b:08.0_smartarray0_c0d0-part3.tar.bz2 (1/2)
[25-Aug-2014_14:23:56] [DollyClient] Getting
/opt/cmu/image/rhel6u5-auto/partarchi-cmu_pci0000:00_0000:00:03.0_0000:0a:00.0_0000:0b:08.0_smartarray0_c0d0-part1.tar.bz2 (2/2)
[25-Aug-2014_14:23:57] [DollyClient] Image rhel6u5-auto received
[25-Aug-2014_14:23:57] Dolly Terminated...
***
*** backup success
***
golden node reboot in 0s
The following example backs up all partitions on the OS disk of a node, where a0ff813c-26dc-4160-9c88-647db594703b is the root partition UUID, rhel6u5 is the image group name, and n3 is the node name.
cmu> backup "rhel6u5" uuid "a0ff813c-26dc-4160-9c88-647db594703b" n3
backup id is 7387
log file is /opt/cmu/log/cmudolly-7387.log
copying ssh settings
hostname setup ok
logout ok
netbooting node
waiting 60 secs for node n3 to boot
netbooted correctly,removing dhcp entry
tail -f /opt/cmu/log/cmudolly-7387.log
Starting retrieving fstab
[25-Aug-2014_14:09:54] [Dolly] Running OS : "Linux-RedHat"
[25-Aug-2014_14:09:54] OSTYPE:Linux-CMU
[25-Aug-2014_14:09:54] [DollyClient] Starting to get fstab
[25-Aug-2014_14:09:54] [DollyClient] Getting "/opt/cmu/image/rhel6u5/fstab-device.txt"
[25-Aug-2014_14:10:26] Dolly Terminated...
starting backup process...
partition list = disk/by-path/cmu_pci0000:00_0000:00:03.0_0000:0a:00.0_0000:0b:08.0_smartarray0_c0d0-part3, disk/by-path/cmu_pci0000:00_0000:00:03.0_0000:0a:00.0_0000:0b:08.0_smartarray0_c0d0-part1
[25-Aug-2014_14:10:26] [Dolly] Add partition "disk/by-path/cmu_pci0000:00_0000:00:03.0_0000:0a:00.0_0000:0b:08.0_smartarray0_c0d0-part3"
[25-Aug-2014_14:10:26] [Dolly] Add partition "disk/by-path/cmu_pci0000:00_0000:00:03.0_0000:0a:00.0_0000:0b:08.0_smartarray0_c0d0-part1"
[25-Aug-2014_14:10:26] [Dolly] Running OS : "Linux-RedHat"
[25-Aug-2014_14:10:26] OSTYPE:Linux-CMU
[25-Aug-2014_14:10:26] [DollyClient] This is not SLES9 or above.
No specific work to do
[25-Aug-2014_14:10:26] [DollyClient] redhat like OS, launching the script to detect the nic configuration file name used for the cloning
[25-Aug-2014_14:10:26] [DollyClient] Starting to get image
[25-Aug-2014_14:10:26] [DollyClient] Asking for main devices list
[25-Aug-2014_14:10:26] [DollyClient] Device is
[25-Aug-2014_14:10:26] [DollyClient] Writing generic reconfiguration file "/opt/cmu/image/rhel6u5/reconf.sh" (info...currently-->No such file or directory<--)
[25-Aug-2014_14:10:26] [CMUTools ] about to run: -->cp -vf /opt/cmu/etc/reconf.sh /opt/cmu/image/rhel6u5/reconf.sh &> /opt/cmu/tmp/CMU_TEMPO_FILE_e0zcSq<-- cmu_system_helper
[25-Aug-2014_14:10:26] [DollyClient] Writing generic reconfiguration file "/opt/cmu/image/rhel6u5/pre_reconf.sh" (info...currently-->No such file or directory<--)
[25-Aug-2014_14:10:26] [CMUTools ] about to run: -->cp -vf /opt/cmu/etc/pre_reconf.sh /opt/cmu/image/rhel6u5/pre_reconf.sh &> /opt/cmu/tmp/CMU_TEMPO_FILE_saR81c<-- cmu_system_helper
[25-Aug-2014_14:10:26] [DollyClient] Writing disk overrides file "/opt/cmu/image/rhel6u5/disk-overrides.txt" (info...currently-->No such file or directory<--)
[25-Aug-2014_14:10:26] [CMUTools ] about to run: -->cp -vf /opt/cmu/etc/disk-overrides.txt /opt/cmu/image/rhel6u5/disk-overrides.txt &> /opt/cmu/tmp/CMU_TEMPO_FILE_Dzi1cZ<-- cmu_system_helper
[25-Aug-2014_14:10:26] [DollyClient] : clnt->curTxt="GETDISKMETADATA rootctrlvendorid:103c:3238;rootctrlbusid:0000:0b:08.0;"
[25-Aug-2014_14:10:26] [DollyClient] : diskmeta="rootctrlvendorid:103c:3238;rootctrlbusid:0000:0b:08.0;"
[25-Aug-2014_14:10:26] [DollyClient] Asking for partition table of "/dev/disk/by-path/cmu_pci0000:00_0000:00:03.0_0000:0a:00.0_0000:0b:08.0_smartarray0_c0d0"
[25-Aug-2014_14:10:26] [DollyClient] Getting /opt/cmu/image/rhel6u5/parttbl-cmu_pci0000:00_0000:00:03.0_0000:0a:00.0_0000:0b:08.0_smartarray0_c0d0.txt
[25-Aug-2014_14:10:26] [DollyClient] Getting
/opt/cmu/image/rhel6u5/parttbl-cmu_pci0000:00_0000:00:03.0_0000:0a:00.0_0000:0b:08.0_smartarray0_c0d0.raw
[25-Aug-2014_14:10:26] [DollyClient] Getting

/opt/cmu/image/rhel6u5/partarchi-cmu_pci0000:00_0000:00:03.0_0000:0a:00.0_0000:0b:08.0_smartarray0_c0d0-part3.tar.bz2 (1/2)
[25-Aug-2014_14:13:41] [DollyClient] Getting /opt/cmu/image/rhel6u5/partarchi-cmu_pci0000:00_0000:00:03.0_0000:0a:00.0_0000:0b:08.0_smartarray0_c0d0-part1.tar.bz2 (2/2)
[25-Aug-2014_14:13:43] [DollyClient] Image rhel6u5 received
[25-Aug-2014_14:13:43] Dolly Terminated...
***
*** backup success
***
golden node reboot in 0s
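For the uuid form of the backup command, the root partition UUID can be taken from blkid output, as noted above for Linux nodes. The following is a minimal sketch of extracting the UUID field; the blkid line is inlined as a sample (it reuses values from the example above), and on the golden node you would instead run blkid against your actual root device, for example blkid /dev/sda1:

```shell
# Extract the UUID field from a blkid output line (sample inlined).
sample='/dev/sda1: UUID="a0ff813c-26dc-4160-9c88-647db594703b" TYPE="ext4"'
uuid=$(printf '%s\n' "$sample" | sed -n 's/.*UUID="\([^"]*\)".*/\1/p')
echo "$uuid"
```

The resulting value is what you pass as "root_partition_uuid" in the backup command.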

Modifying a management card password
Modifies the management card password in the Insight CMU database.
cmu> modify_password ILO|ILOCM|lo100i
For example:
cmu> modify_password lo100i
login> hpe
password> service
password successfully changed

cmu>

IMPORTANT: This command changes the password only in the Insight CMU database. It does not change the actual password of the card. The password is echoed during this command.

Discovering MAC addresses for new nodes
The scan_macs command enables discovering the MAC addresses for new nodes. Enter the parameters interactively. For example:
cmu> scan_macs
Enter the first nodename (ex. "n%i"): 1
Enter the nodename prefix: n
Enter the IP address of the first node: 192.168.1.1
Enter the netmask: 255.255.0.0
Enter the BMC type (ex. ILO, lo100i, ILOCM): ILO
Enter the compute node NIC number attached to the admin network [1]:
If you have a file of BMC IP addresses to scan enter the filename:
Enter one or more BMC IPs to scan separated by commas or enter a starting IP if you'd like CMU to generate addresses to scan: 192.168.1.101
Enter the total number of sequential addresses to scan from this IP [1]:
Do you want to modify other default scanning behaviors? (y/[n]): n
Discovered nodes are inserted into the CMU database by default. Enter a file name if you wish to write the node information to a file instead:
SUMMARY
First nodename is '1'
The node prefix is 'n'
The first IP address is '192.168.1.1'
The netmask is '255.255.0.0'
The BMC type is 'ILO'
The ILO IP address list is '192.168.1.101'
The compute node NIC attached to the admin network is '1'
The management node IP for the scanned nodes is 'default'
The gateway IP for the scanned nodes is 'default'
The BIOS boot mode is 'auto'
The iscsi root string for the scanned nodes is 'none'
Discovered nodes will be written to the CMU database.
Is this correct? ([y]/n/q): y
INFO: It looks like StrictHostKeyChecking is set to 'no' in /root/.ssh/config...
Make sure you can ssh to all client nodes without providing a password or answering (yes/no) to a registration question or various CMU commands/systems will fail to run.
Scanning complete. 1 nodes added, 0 nodes updated.

7.17.5 Administration utilities pdcp and pdsh
Insight CMU includes the open source utilities pdcp and pdsh.
Usage example of pdcp:
# /opt/cmu/bin/pdcp -w cn0001,cn0002 source /tmp/dest
where:

source is a file on the management node.
dest is the name of the destination file copied to compute nodes cn0001 and cn0002.

Usage example of pdsh:

# /opt/cmu/bin/pdsh -w cn0001,cn0002 ls
cn0001: bin
cn0001: inst-sys
cn0002: anaconda-ks.cfg
cn0002: CMU_CLONING_INFO
cn0002: install.log.syslog
cn0002: install.log

The ls command is executed on compute nodes cn0001 and cn0002.

7.17.6 Insight CMU Linux shell commands

Insight CMU provides a Linux shell API interface. Most functions provided by the GUI and CLI have their equivalents in the API interface, which can easily be called from a shell script. For more information on Insight CMU commands, see "Insight CMU manpages" (page 224), or the manpages.

8 Advanced topics

8.1 Enabling non-root users

All Insight CMU functionality must be run as the root user. To enable non-root users to run Insight CMU functionality from the CLI, Linux sudo support is recommended. For examples of configuring sudo support for various Insight CMU commands, see "Examples of sudo configurations" (page 169).

All Insight CMU CLI commands in /opt/cmu/bin/ and /opt/cmu/tools/ can only be run by the root user, except for the pdsh commands:
• pdsh
• pdcp
• rpdcp
• cmu_diff
• cmu_dsh (pdsh plus cmu_diff)

These pdsh commands require password-less ssh configuration. (Insight CMU configures this for the root user.) Also, the cmu_dsh and cmu_diff commands require setting CMU_DIFF_TMP_DIR=/tmp in the user's environment before being invoked.

The Insight CMU GUI provides another interface to the Insight CMU functionality. The GUI provides its own permissions control that enables non-root users to log into the GUI and controls which functionality is available to them. Some Insight CMU GUI features that only interact with the Insight CMU database do not require sudo support because they do not invoke any Insight CMU CLI commands. All other GUI features that do invoke Insight CMU CLI commands require the same sudo support as the CLI.

To enable users to log into the Insight CMU GUI, edit the /opt/cmu/etc/admins file appropriately. This file also provides fine-grained access control to specific Insight CMU database functions such as creating and deleting nodes and groups. For details, see "The /opt/cmu/etc/admins file" (page 168).

8.1.1 Configuring non-root user access with the Insight CMU GUI

To enable sudo support in the Insight CMU GUI for access to cluster operations, configure the CMU_SUDO setting in /opt/cmu/etc/cmuserver.conf with the path of the sudo binary on the Insight CMU management server.
After setting this variable, restart the cmu service and the Insight CMU GUI. Table 5 (page 166) and Table 6 (page 167) map each Insight CMU GUI feature to its corresponding CLI command (if any). Table 5 (page 166) also maps each GUI feature to its DB access keyword (if any).

Table 5 Operational Insight CMU features and controls
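For example, the setting might look like the following (the sudo path varies by distribution; verify it with `which sudo`):

```
# In /opt/cmu/etc/cmuserver.conf
CMU_SUDO=/usr/bin/sudo
```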

Insight CMU GUI feature (top menu bar) | Underlying command (default dir: /opt/cmu/bin/) | DB keyword
Monitoring→Start Monitoring Engine | /opt/cmu/tools/cmu_start_monitoring | (none)
Monitoring→Stop Monitoring Engine | /opt/cmu/tools/cmu_stop_monitoring | (none)
Cluster Administration→Node Management→Add Node | (none) | NODE_ADD
Cluster Administration→Node Management→Delete Node | (none) | NODE_DELETE
Cluster Administration→Node Management→Modify Node | (none) | NODE_MODIFY
Cluster Administration→Node Management→Scan Node | cmu_scan_macs | NODE_ADD
Cluster Administration→Node Management→Change (Active) Image Group | (none) | IMAGE_GROUP_MAKE_NODE_ACTIVE
Cluster Administration→Node Management→Import Nodes | (none) | NODE_ADD
Cluster Administration→Node Management→Export Nodes | (none) | (none)
Cluster Administration→Network Group Management→Create | (none) | NETWORK_GROUP_ADD
Cluster Administration→Network Group Management→Delete | (none) | NETWORK_GROUP_DELETE
Cluster Administration→Network Group Management→Manage Nodes | (none) | NETWORK_GROUP_MODIFY
Cluster Administration→Image Group Management→Create (disk-based) | (none) | IMAGE_GROUP_ADD
Cluster Administration→Image Group Management→Create Autoinstall | cmu_autoinstall_node | IMAGE_GROUP_ADD
Cluster Administration→Image Group Management→Create (diskless) | cmu_add_image_group | IMAGE_GROUP_ADD
Cluster Administration→Image Group Management→Rename | cmu_rename_image_group | (none)
Cluster Administration→Image Group Management→Delete | cmu_del_image_group | IMAGE_GROUP_DELETE
Cluster Administration→Image Group Management→Manage Nodes (disk-based and Autoinstall) | (none) | IMAGE_GROUP_MODIFY
Cluster Administration→Image Group Management→Manage Nodes (diskless) | cmu_add_to_image_group | IMAGE_GROUP_MODIFY
Cluster Administration→Image Group Management→Manage Nodes (diskless) | cmu_del_from_image_group | IMAGE_GROUP_MODIFY
Cluster Administration→Custom Group Management→Create | (none) | CUSTOM_GROUP_ADD
Cluster Administration→Custom Group Management→Delete | (none) | CUSTOM_GROUP_DELETE
Cluster Administration→Custom Group Management→Manage Nodes | (none) | CUSTOM_GROUP_MODIFY

Table 6 Operational Insight CMU GUI features and controls

Insight CMU GUI feature (right-click node selection) | Underlying command
ssh to CMU Mgmt Node/ssh Connection | ssh (node access assumes a user account on the target node)
Management Card Connection | /opt/cmu/bin/cmu_console
Shutdown | /opt/cmu/bin/cmu_halt
Power Off | /opt/cmu/bin/cmu_power
Boot | /opt/cmu/bin/cmu_boot
Reboot | /opt/cmu/bin/cmu_halt
Change UID LED Status | /opt/cmu/bin/cmu_power
Multi-Window Broadcast (Mgt Card/VSP) | /opt/cmu/bin/cmu_console
Backup (Capture Image) | /opt/cmu/bin/cmu_backup
Cloning (Deploy Image) | /opt/cmu/bin/cmu_clone
AutoInstall (kickstart|autoyast|preseed) | /opt/cmu/bin/cmu_autoinstall_node
Update→Get Nodes Static Info | /opt/cmu/tools/cmu_cn_install
Update→Install CMU Monitoring Client | /opt/cmu/tools/cmu_cn_install
Update→Rescan MAC | /opt/cmu/tools/cmu_rescan_mac
Insight→Show BIOS Settings | /opt/cmu/bin/cmu_firmware_mgmt
Insight→Show BIOS Version | /opt/cmu/bin/cmu_firmware_mgmt
Insight→Upgrade Firmware | /opt/cmu/bin/cmu_firmware_mgmt
[Archived Custom Group] Rename | /opt/cmu/bin/cmu_rename_archived_custom_group
[Archived Custom Group] Permanently delete | /opt/cmu/bin/cmu_del_archived_custom_groups

8.1.1.1 Non-root user support for custom menu options

All Insight CMU custom menu options configured in the /opt/cmu/etc/cmu_custom_menu file that execute on the Insight CMU management node run as the user who is logged into the Insight CMU GUI. If these commands require root privilege, then the command line must include sudo support. The custom menu keyword CMU_SUDO can be used in the /opt/cmu/etc/cmu_custom_menu file to apply sudo support to a command. If a custom menu command containing the CMU_SUDO keyword is executed by a non-root user, the Insight CMU GUI replaces this keyword with the value of the CMU_SUDO keyword in the /opt/cmu/etc/cmuserver.conf file. If such a command is executed by the root user, the keyword is removed from the command before it is executed.

8.1.1.2 The /opt/cmu/etc/admins file

The /opt/cmu/etc/admins file controls which non-root users can log into the Insight CMU GUI, and which Insight CMU database features those users can modify. The file expects one line per user. Spaces and lines beginning with '#' are ignored. The username that begins a line is allowed to log into the Insight CMU GUI. If no permissions keywords are listed next to the username, then this user has full, unrestricted access to modify the Insight CMU database settings. However, database operations that also require sudo permissions on associated CLI commands will only succeed if sudo is also granted. If one or more permissions keywords are listed next to the username, then this username is only permitted to make those changes to the Insight CMU database.

The following permissions keywords can be configured next to the username, separated by spaces:

• Node management keywords
  ◦ NODE_ADD—Permission to add nodes to the DB
  ◦ NODE_MODIFY—Permission to modify current node settings
  ◦ NODE_DELETE—Permission to delete nodes from the DB
• Network group management keywords
  ◦ NETWORK_GROUP_ADD—Permission to add new network groups
  ◦ NETWORK_GROUP_MODIFY—Permission to add/delete nodes to/from a network group
  ◦ NETWORK_GROUP_DELETE—Permission to delete network groups
• Image group management keywords
  ◦ IMAGE_GROUP_ADD—Permission to add new image groups
  ◦ IMAGE_GROUP_MODIFY—Permission to add/remove nodes to/from image groups
  ◦ IMAGE_GROUP_DELETE—Permission to delete image groups
  ◦ IMAGE_GROUP_MAKE_NODE_ACTIVE—Permission to change the active image group of a node
• Custom group management keywords
  ◦ CUSTOM_GROUP_ADD—Permission to add new custom groups
  ◦ CUSTOM_GROUP_MODIFY—Permission to add/remove nodes to/from custom groups
  ◦ CUSTOM_GROUP_DELETE—Permission to delete custom groups
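For illustration, an /opt/cmu/etc/admins file following the rules above might look like this (the usernames are placeholders):

```
# /opt/cmu/etc/admins
# jsmith can log in with full, unrestricted database access:
jsmith
# cjones can log in but may only add and modify nodes:
cjones NODE_ADD NODE_MODIFY
# bstevens may only manage image group membership:
bstevens IMAGE_GROUP_MODIFY IMAGE_GROUP_MAKE_NODE_ACTIVE
```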

8.1.2 Examples of sudo configurations

The /etc/sudoers file configures sudo control for non-root users. For safety, edit this file with the visudo command. To grant shutdown and reboot privileges to user 'jsmith' (ALL can be replaced with localhost or the host name of the Insight CMU management node):

jsmith ALL=(ALL) NOPASSWD: /opt/cmu/bin/cmu_halt

The /etc/sudoers file supports creating "command aliases" that refer to a group of commands. For example, the following lines configure useful Insight CMU aliases by grouping the commands into three categories:
• Power control
• Provisioning
• Remaining features

Cmnd_Alias CMU_POWER = /opt/cmu/bin/cmu_halt, /opt/cmu/bin/cmu_power, /opt/cmu/bin/cmu_boot
Cmnd_Alias CMU_IMAGE = /opt/cmu/bin/cmu_backup, /opt/cmu/bin/cmu_clone, /opt/cmu/bin/cmu_autoinstall_node
Cmnd_Alias CMU_ETC = /opt/cmu/bin/cmu_console, /opt/cmu/tools/cmu_cn_install, /opt/cmu/bin/cmu_firmware_mgmt

Now, you can grant power control to user 'cjones' as long as he provides a password:

cjones ALL=(ALL) CMU_POWER

To let user 'bstevens' control power and provisioning without a password:

bstevens ALL=(ALL) NOPASSWD: CMU_POWER, CMU_IMAGE

To grant user 'sbarney' full administrative control over these commands without a password:

sbarney ALL=(ALL) NOPASSWD: CMU_POWER, CMU_IMAGE, CMU_ETC

If a non-root user in the Insight CMU GUI tries to execute a command without sudo privileges, the sudo command prevents it from running and logs the incident. To avoid a tty error, verify that "Defaults requiretty" is commented out in the /etc/sudoers file.

8.2 Modifying the management network configuration

Insight CMU is typically configured with a standard, flat 1Gb Ethernet network with default settings for the Insight CMU management network. However, some administrators might want to configure this network for use by other software packages and change some of the parameters that affect Insight CMU. For example, an administrator might want to increase the network packet size from the default MTU of 1500 to 9000 to enable "Jumbo Frames". This change affects the ability of Insight CMU to PXE-boot unless it is also added to the Insight CMU DHCP server configuration. An administrator might also need to update the Insight CMU DHCP server configuration with a DNS server or a gateway server to properly support a diskless cluster setup, or want to split the Insight CMU management network into different subnets. This section discusses how to add changes to the Insight CMU DHCP server configuration to support these types of changes.

Insight CMU provides the self-documented /opt/cmu/etc/cmu_dhcpd_header_addons file for adding changes to the Insight CMU DHCP server configuration. This file can be configured with additional DHCP settings that are added to the /etc/dhcpd.conf file when the /opt/cmu/bin/cmu_test_dhcp command is run.
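As a hedged illustration, the addons file might carry standard ISC DHCP statements such as the following (the option values are examples for the scenarios above, not Insight CMU defaults):

```
# Example content for /opt/cmu/etc/cmu_dhcpd_header_addons
option interface-mtu 9000;                # advertise Jumbo Frames to clients
option domain-name-servers 192.168.0.10;  # DNS server for diskless nodes
option routers 192.168.0.254;             # default gateway for diskless nodes
```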
If any error occurs when /opt/cmu/bin/cmu_test_dhcp is run, correct the error in the /opt/cmu/etc/cmu_dhcpd_header_addons file and rerun the /opt/cmu/bin/cmu_test_dhcp command until it runs successfully. Changes that are added to the /opt/cmu/etc/cmu_dhcpd_header_addons file are added to the /etc/dhcpd.conf file on the Insight CMU management server, and to other Insight CMU DHCP server configurations during cloning. For more details on making DHCP server configuration changes, see the /opt/cmu/etc/cmu_dhcpd_header_addons file itself.

8.3 Customizing Insight CMU netboot kernel arguments

When backing up or cloning nodes, Insight CMU PXE-boots each node into an NFS-based diskless operating system provided by Insight CMU. Insight CMU invokes processes within this diskless operating system to perform the backup or cloning operation on the underlying disk. The PXE-boot process includes a standard bootloader (pxelinux.0 for Legacy BIOS, or grub.efi for UEFI) and a boot configuration file formatted to be parsed by either the pxelinux.0 or grub.efi binary.

Insight CMU provides two stock boot configuration files named default and efidefault, located in /opt/cmu/etc/bootopts/ on the Insight CMU management node. These boot configuration files contain the minimum Debian kernel boot arguments required to boot the Insight CMU netboot kernel in standard Insight CMU cluster configurations. Insight CMU configures the DHCPD configuration file to automatically detect whether a client compute node is configured in Legacy BIOS mode or in UEFI mode, and it provides the appropriate bootloader for either case. Insight CMU configures both boot configuration files for all netbooting scenarios, so that either bootloader can be used. This is particularly useful in "mixed" clusters consisting of older Legacy BIOS servers and newer UEFI servers.
These Insight CMU boot configuration files may require occasional modification to adapt to specific cluster configurations or specific hardware needs. If a cluster-wide change to /opt/cmu/etc/bootopts/default and/or /opt/cmu/etc/bootopts/efidefault is required, these files can be edited directly. Insight CMU also provides keywords used in these files that correspond to node-specific settings in the Insight CMU database, so you may be able to make your changes by updating the Insight CMU node configuration settings. For more information, see "PXE-boot configuration file keywords" (page 171).

If the boot configuration file needs to be modified for a subset of the nodes in the cluster, copy the default or efidefault file to another file in the same directory and apply the changes to the new file. The subset of nodes must all be in either Legacy BIOS mode or UEFI mode, and the administrator must replicate the format of the appropriate boot configuration file. The name of the new file determines the scope of the nodes to which the file applies. Insight CMU uses the following process to select a boot file for each node:
• If /opt/cmu/etc/bootopts/<nodename> exists, Insight CMU uses this file when netbooting <nodename>. Else;
• If /opt/cmu/etc/bootopts/<hex-IP-address> exists, Insight CMU uses this file when netbooting nodes that are configured with an IP address that falls within the range of <hex-IP-address>. The hex-IP-address can range from the full 8-character address representing a specific IP address, to a single character representing a broad subnet of IP addresses. Else;
• The /opt/cmu/etc/bootopts/default or /opt/cmu/etc/bootopts/efidefault file is used. This is the default behavior.

For example, to create a node-specific PXE-boot file for a Legacy BIOS server: An Insight CMU compute node (login1) is connected to the lab network (eth0) and the private cluster network (eth1). Depending on the hardware configuration and wiring for that node, the kernel might send the DHCP IP request over the lab network (eth0), causing the kernel to hang. To avoid this, the system administrator can:
1. Copy /opt/cmu/etc/bootopts/default to /opt/cmu/etc/bootopts/login1.
2. Edit the new login1 file by changing the existing ip=::::::bootp to ip=:::::eth1:bootp. This new kernel boot argument instructs the kernel to send the DHCP IP request over the private cluster network (eth1).
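The hex-IP-address file names used for subnet-scoped boot files (described in the selection process above) are the four IP address octets written as two-digit uppercase hexadecimal. A minimal sketch (the helper name is ours, not part of Insight CMU):

```python
def bootopts_hex_name(ip):
    """Convert a dotted-quad IP address to the 8-character hex file name
    used in /opt/cmu/etc/bootopts/ (e.g. 172.20.0.1 -> AC140001)."""
    return "".join("%02X" % int(octet) for octet in ip.split("."))

# Truncating the file name widens its scope: a file named "AC14000" matches
# any address whose hex name starts with AC14000, such as 172.20.0.1 - 172.20.0.15.
```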

NOTE: This change might not be applicable to all of the nodes in the cluster because compute nodes that do not have a lab network connection might have private cluster networks configured on eth0.

If three nodes (login[1-3]) require this boot file modification, copy /opt/cmu/etc/bootopts/login1 to /opt/cmu/etc/bootopts/login2 and /opt/cmu/etc/bootopts/login3. If a subnet of compute nodes requires this boot file modification, for example nodes with IP addresses 172.20.0.[1-15], then copy or rename /opt/cmu/etc/bootopts/login1 to /opt/cmu/etc/bootopts/AC14000. The hexadecimal IP address AC14000 covers IP addresses 172.20.0.1 - 172.20.0.15.

8.3.1 PXE-boot configuration file keywords

Insight CMU v8.0 supports the following keywords in the PXE-boot configuration files in /opt/cmu/etc/bootopts/:

Keyword | Default keyword text | Related Insight CMU database field
CMU_ARCH | x86_64 | Node architecture setting
CMU_PLATFORM | generic | Node platform setting
CMU_CONSOLE | console=ttyS0 console=tty1 | Node console and speed settings
CMU_VENDOR_ARGS | (none) | Node vendor args setting

The CMU_CONSOLE keyword attempts to replace console=ttyS0 console=tty1 in the PXE-boot configuration files when the Insight CMU v8.0 rpm is installed. If this attempt fails, a warning is displayed. The other keywords are additions to the file. These keywords are needed to support the new Moonshot servers. They are replaced by the corresponding setting in the Insight CMU database when the nodes are PXE-booted. If the database setting is default, then they are replaced by the 'Default keyword text' above.

These keywords and the corresponding fields in the Insight CMU database provide Insight CMU with a way to override the 'Default keyword text' settings when new servers that require different PXE-boot settings are detected. Insight CMU detects the new server types automatically during the node scanning process, or these fields are populated when the nodes are added to Insight CMU manually. During the manual process, the given architecture and platform settings determine the recommended Insight CMU database field settings.

The console and console speed fields in the Insight CMU database also provide a method for users to store customized node-specific console settings. For example, if the Virtual Serial Port of "nodeX" is connected to COM2, then you can configure ttyS1 in the console field for "nodeX". When "nodeX" is PXE-booted, the CMU_CONSOLE keyword is replaced with console=ttyS1. If the console speed field is set to "9600n8", then the CMU_CONSOLE keyword is replaced with console=ttyS1,9600n8.

8.4 Cloning mechanisms

The cloning process requires you to create image groups. An image group is a collection of nodes that share the same operating system image. Each image group is defined under the /opt/cmu/image directory on the image server. This directory contains the target system disk partition information and a compressed image of the selected partitions from a master disk.
The backup utility provides a mechanism to back up the master disk image of the image group of a node. An image group can contain nodes from one or more network groups. By default, Insight CMU puts all nodes in the default image group. Insight CMU supports multiple image groups for a cluster, which enables the cluster to be configured to meet the needs of several user types. An image group can only contain nodes of similar server machines because they share the same operating system image.

A cluster of nodes generally has several racks, each with one Ethernet switch. For the best network performance during cloning, all nodes belonging to a single rack must be configured in an Insight CMU network group. When you clone a group of nodes belonging to different network groups, one node per network group is elected as the secondary server. Only one secondary server exists in each network group. A copy of the tftp root directory and the cloning database is transferred to each secondary server. All secondary servers boot in a pipeline on the primary server and each one of these servers must boot between 1 (0) and 64 (63) compute nodes. The tftp root directory is stored in memory (ramdrive) on each secondary server, and on the hard drive for the primary server, which is usually the Insight CMU management node. The tftp root directory of the Insight CMU management node is shared among all the secondary servers. The tftp directory of each secondary server is shared among all nodes of that secondary server's network group.

When a node is booted, it mounts the local system disk and checks the partitions against the clone image partition information. If the descriptions differ, the node changes the local disk partition and rebuilds file systems. When the partitioning is correct, each node waits for the clone image to download. In the first part of the propagation, only secondary servers are enabled to receive the clone image.

When all secondary servers own a copy of the clone image, the primary server stops the propagation and each secondary server pushes its clone image to all nodes in its network group. The transfer uses TCP/IP sockets. The clone image is saved to the local disk. The node then asks the image server if any successors are waiting for upload. If any successors are waiting, the node starts to transfer the image to a group member, while the image server uploads to a third one. This process is called the tree propagation algorithm. After a node has received a complete image, it attempts to upload to another node within the entity. This mechanism speeds up the propagation process and takes advantage of the available network bandwidth. Each time a node receives a clone image, the node uncompresses the image on the local disk. This is designed to speed up the cloning process. The process is also performed on all secondary servers.

When the Insight CMU list of nodes to be cloned completes, each node within the group has an identical system disk image on the local system disk. To reboot on the local disk, some node-specific files must be adjusted on a node-by-node basis, such as the TCP/IP node address and host name. Insight CMU adjustment is limited to the TCP/IP name and address of the primary network interface. The TCP/IP host name and IP address set for compute nodes is defined by the DHCP configuration file on the image server, for example, /etc/dhcpd.conf.

During cloning, secondary servers boot from the primary server. Then, each compute node boots from its secondary server.

8.5 Support for Intel Xeon Phi cards

Insight CMU can be configured to support cloning the OS image when Intel Xeon Phi cards are present in the compute nodes and the Intel Xeon Phi software is installed. Insight CMU also supports booting a oneSIS diskless OS image to all compute nodes that have Intel Xeon Phi cards and the Intel Xeon Phi software installed. However, in both cases Insight CMU only provides the OS file system.

IMPORTANT: The user is responsible for flashing the Intel Xeon Phi cards with the firmware after the OS file system is deployed; this is required before booting the Intel Xeon Phi cards.

By default, the software for the Intel Xeon Phi cards is pre-configured with local network access to the Xeon Phi Linux OS environment from the host server only. This configuration can be cloned by Insight CMU without any tuning to the Insight CMU cloning process. This configuration can also be booted into a oneSIS diskless environment, and the only required customization to the image is configuring the portions of the single image file system where the Intel Xeon Phi software expects to have write access. These customizations can be extracted from the example scripts provided in the /opt/cmu/contrib/xeon_phi/ directory.

The Intel Xeon Phi cards can also be configured with independent IP addresses and bridged with the Ethernet network device on the local host, enabling direct access to each Intel Xeon Phi card from any other compute node or Intel Xeon Phi card on the cluster network. In this configuration, the Insight CMU post-cloning process must be scripted to configure the network bridging and the independent IP addresses for each Intel Xeon Phi card in the cluster. This configuration can also be booted into a oneSIS diskless environment, which involves the same network scripting requirements combined with configuring the writeable portions of the file system. See the following sections:

• "Intel Xeon Phi card IP address and host name assignment algorithm" (page 175) discusses the general host name and IP address algorithm recommended for the Intel Xeon Phi cards to support both of these Insight CMU OS provisioning methods.
• "Cloning an image with Intel Xeon Phi cards configured with independent IP addresses" (page 176) provides details on cloning this configuration.
• "Insight CMU oneSIS diskless file system support for independent addressing of Intel Xeon Phi cards" (page 176) provides details on booting a diskless image with this configuration.

To configure one or more Intel Xeon Phi cards with a host name and an independent IP address, and to bridge with the network device on the local host, the following settings must be implemented. These examples assume a RHEL OS on the host compute node:
• /etc/hosts must contain the host names and IP address(es) of the Xeon Phi cards.
• A network bridge must be configured with the IP address of the local compute node (/etc/sysconfig/network-scripts/ifcfg-br0).
• The network device must be associated with the bridge (/etc/sysconfig/network-scripts/ifcfg-ethX).
• Each host-side Xeon Phi card Ethernet device must be associated with the bridge (/etc/sysconfig/network-scripts/ifcfg-micX).
• Each Xeon Phi master configuration file must be configured with the Xeon Phi host name, IP address, and network configuration (/etc/mpss/micX.conf).
• Each Xeon Phi card host name must be configured in the Xeon Phi file system (/var/mpss/micX/etc/hostname).
• Each MIC-side Xeon Phi card Ethernet device must be configured with the Xeon Phi IP address (/var/mpss/micX/etc/network/interfaces).

The settings listed above must be scripted so that they are configured correctly on each compute node, except for the initial /etc/hosts configuration, which must be done by the user beforehand. However, Insight CMU must be scripted to copy the host compute node /etc/hosts file into each Xeon Phi file system.

8.5.1 Intel Xeon Phi card IP address and host name assignment algorithm

Insight CMU cloning and diskless deployment both configure the per-node host name and IP addresses for the cluster management network. When Intel Xeon Phi cards are present on each compute node and configured with independent host names and IP addresses, Insight CMU needs a way to determine the host name and IP address that will be assigned to each Intel Xeon Phi card on each compute node. There are several ways to accomplish this.
The recommended way, which is employed in the examples in the /opt/cmu/contrib/xeon_phi/ directory, is to assign <hostname>-mic0 as the host name of the first Intel Xeon Phi card on each compute node, <hostname>-mic1 as the host name of the second Intel Xeon Phi card, and so on, where <hostname> is the host name of the compute node. The examples only cover two Intel Xeon Phi cards per compute node, but this algorithm scales to any count. Because the Intel Xeon Phi card host name can be determined from the local compute node host name, you can easily search the /etc/hosts file (using grep) for the corresponding IP address assigned to that Intel Xeon Phi card. With this method, the host name and IP address of any Intel Xeon Phi card on any compute node can be determined programmatically.

The success of this algorithm depends on having the Intel Xeon Phi card host names and IP addresses pre-configured in the /etc/hosts file on the Insight CMU management node. This must be done before any of the examples provided in the /opt/cmu/contrib/xeon_phi/ directory are employed. This requirement has the following benefits:
• The hostname-to-IP address mapping in /etc/hosts ensures that any Intel Xeon Phi card host name can be resolved to its IP address from any node or Xeon Phi card within the cluster.
• The cloning and diskless provisioning examples in the /opt/cmu/contrib/xeon_phi/ directory will succeed in deploying an OS and configuring the Intel Xeon Phi cards with unique host names and IP addresses.
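The lookup described above can be sketched as a small helper (the function name and the sample /etc/hosts entries are ours, not part of Insight CMU):

```python
def xeon_phi_ip(node, mic_index, hosts_lines):
    """Return the IP address of the <node>-mic<N> entry from /etc/hosts-style
    lines, following the naming convention recommended above; None if absent."""
    target = "%s-mic%d" % (node, mic_index)
    for line in hosts_lines:
        fields = line.split("#", 1)[0].split()  # ignore comments
        if len(fields) >= 2 and target in fields[1:]:
            return fields[0]
    return None

# Illustrative /etc/hosts entries (addresses are placeholders):
hosts = [
    "172.20.1.1   cn0001",
    "172.20.2.1   cn0001-mic0",
    "172.20.2.2   cn0001-mic1",
]
```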

IMPORTANT: You must configure the host name (in the format recommended above) and IP address of your Intel Xeon Phi cards in /etc/hosts on the Insight CMU management node before attempting any cloning or diskless image building.

8.5.2 Cloning an image with Intel Xeon Phi cards configured with independent IP addresses

The Intel Xeon Phi host names and IP addresses must be determined and configured in the /etc/hosts file on the Insight CMU management node for a successful post-cloning process. For more information on this configuration, see "Intel Xeon Phi card IP address and host name assignment algorithm" (page 175).

The golden node targeted for backup must be configured with the network bridge, and with the Intel Xeon Phi cards already configured with their independent host names and IP addresses. The cloning process duplicates this image to the other nodes, and the post-cloning process updates the appropriate files with the unique Intel Xeon Phi host names and IP addresses. The /opt/cmu/contrib/xeon_phi/reconf.sh script is an example of an Insight CMU post-cloning script that would be installed in /opt/cmu/image/{IMAGE_GROUP_NAME}/ and would perform two activities:
1. Configure the TCP-over-IB IP address.
2. Configure each Xeon Phi co-processor with its own host name and IP address.

IMPORTANT: Be sure to read through the reconf.sh script thoroughly before using it. This script contains several variable settings that must be adjusted to your cluster environment to be successful. This script is provided as-is, and is only intended as an example.

8.5.3 Insight CMU oneSIS diskless file system support for independent addressing of Intel Xeon Phi cards

Insight CMU oneSIS diskless support provides two scripts for modifying the oneSIS single system image. The reconf-onesis-image.sh script runs as the last step after the oneSIS diskless image group is created and contains all of the general changes to make to the image. The reconf-onesis-snapshot.sh script runs each time a node is added to the oneSIS diskless image group and performs all node-specific changes to the image. Typically these scripts work together to prepare the image. For example, the reconf-onesis-image.sh script configures a file as being node-specific in oneSIS, and then the reconf-onesis-snapshot.sh script creates the node-specific version of that file for each node.

Both scripts are located in /opt/cmu/contrib/xeon_phi/. You might need to add scripting to support your specific environment. These examples only show one way of implementing Intel Xeon Phi support; they are not absolute requirements and may be subject to change depending on the Intel Xeon Phi software. The scripts were written to work with Intel MPSS software version 3.2.1 on a RHEL v6.2 OS distribution.

8.6 Insight CMU remote hardware control API

Insight CMU supports a remote hardware control API. This hardware API makes it possible to integrate Insight CMU power and UID control with any computer that has remote power control capability. The /opt/cmu/bin/cmu_power command interacts with this API to provide remote power and UID control for Insight CMU. The existing hardware APIs are:

ILO        The most common method of interacting with HPE BL/DL/SL servers.

lo100i   The legacy method of interacting with low-end servers.
None     The fallback method of interacting with workstations and other servers that do not provide remote power control.
IPMI     The industry standard method, useful with non-Hewlett Packard Enterprise (or non-iLO Hewlett Packard Enterprise) hardware.
ILOCM    The method for integration with ProLiant Moonshot chassis.

The Insight CMU hardware API consists of a collection of programs that reside in /opt/cmu/hardware/<type>/, where <type> refers to the name of the hardware API. For example, the ILO API programs reside in the /opt/cmu/hardware/ILO/ directory. The names of the programs in the hardware API directory must conform to the following format:

cmu_<type>_power_<action>

Where <action> is one of:

off      Remove power from the server.
on       Apply power to the server.
osoff    Attempt a graceful shutdown of the OS before removing power.
uid_off  Turn off the UID LED.
uid_on   Turn on the UID LED.

These are the five basic actions required by Insight CMU.

NOTE: A boot command from Insight CMU is composed of the osoff action followed by the on action.
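To make the five-action contract concrete, here is a hedged sketch of one action program for a hypothetical IPMI hardware API (cmu_IPMI_power_on), written as a function for clarity. It assumes ipmitool is installed, that the BMC password is supplied through the IPMI_PASSWORD environment variable (the ipmitool -E option), and that the management card user is admin; all of these are assumptions to adjust for your environment. The -n, -i, and -e arguments follow the convention described below.

```shell
# Hypothetical sketch of an action program for an IPMI hardware API.
# The IPMITOOL variable exists only so the command can be substituted
# (for example, by a stub during testing); it defaults to ipmitool.
cmu_ipmi_power() {
    action="$1"; shift       # e.g. "on", "off"
    node= card_ip= errlog=/dev/null
    OPTIND=1
    while getopts "n:i:e:" opt; do
        case "$opt" in
            n) node="$OPTARG" ;;      # host name of the target server
            i) card_ip="$OPTARG" ;;   # IP address of the management card
            e) errlog="$OPTARG" ;;    # file to log any errors
        esac
    done
    # Drive the BMC; any errors are appended to the Insight CMU error log.
    "${IPMITOOL:-ipmitool}" -I lanplus -H "$card_ip" -U admin -E \
        chassis power "$action" 2>>"$errlog"
}
```

A real /opt/cmu/hardware/IPMI/cmu_IPMI_power_on script would hard-code the action and exit non-zero on failure so that Insight CMU can report the error.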

The following additional actions are supported by the /opt/cmu/bin/cmu_power command but are not required by Insight CMU:

status   Provide a power status for the given node, either on or off.
press    Simulate a "momentary press" of the power button.

To recap, the following programs are required when implementing a new FOO hardware API:

/opt/cmu/hardware/FOO/cmu_FOO_power_off
/opt/cmu/hardware/FOO/cmu_FOO_power_on
/opt/cmu/hardware/FOO/cmu_FOO_power_osoff
/opt/cmu/hardware/FOO/cmu_FOO_power_uid_off
/opt/cmu/hardware/FOO/cmu_FOO_power_uid_on

All of these programs are invoked with the following arguments:

-n <hostname> -i <mgmt card IP> -e <error log file>

Where:

<hostname>         The host name of the target server.
<mgmt card IP>     The IP address of the management card for the target server.
<error log file>   The name of the file to log any errors.

After a new Insight CMU hardware API is developed and tested, it must be added to Insight CMU as a valid hardware type. To do this, add the name of the new hardware API to the CMU_VALID_HARDWARE_TYPES variable in /opt/cmu/etc/cmuserver.conf. The setting of this variable is:

CMU_VALID_HARDWARE_TYPES=ILO:lo100i:ILOCM

To add the IPMI hardware API, add IPMI to the list of valid hardware types:

CMU_VALID_HARDWARE_TYPES=ILO:lo100i:ILOCM:IPMI

After this is done, you can configure servers in the Insight CMU database with this new "management card type".

8.7 Insight CMU diskless API

Insight CMU supports a diskless API. This diskless API provides hooks into the creation, management, and booting of diskless image groups in Insight CMU, and enables the development and integration of different "diskless OS" implementations within Insight CMU. In this context, the term "diskless" refers to any OS image that can be created and prepared locally on the Insight CMU management server and then served over the network to a PXE-booted set of compute nodes. Implementations of "diskless" OS images include the following:
• stateful NFS-root—All reads and writes from the target compute nodes occur on the central NFS server.
• stateless NFS-root—Reads occur from the central NFS server, but writes occur in memory (in a tmpfs file system).
• stateless ramdisk—The entire OS image is transferred during the PXE-boot process and unpacked in memory.

The diskless integration programs are installed in the /opt/cmu/diskless/<type>/ directory. The diskless API requires that the following programs exist.

NOTE: The term <type> here refers to the name of the implementation "toolkit".

/opt/cmu/diskless/<type>/cmu_<type>_build_image
/opt/cmu/diskless/<type>/cmu_<type>_delete_image
/opt/cmu/diskless/<type>/cmu_<type>_register_node
/opt/cmu/diskless/<type>/cmu_<type>_unregister_node
/opt/cmu/diskless/<type>/cmu_<type>_configure_node
/opt/cmu/diskless/<type>/cmu_<type>_unconfigure_node
/opt/cmu/diskless/<type>/cmu_<type>_boot_node

The following programs are optional:

/opt/cmu/diskless/<type>/cmu_<type>_diskless_check
/opt/cmu/diskless/<type>/cmu_<type>_post_node_config

8.7.1 Build diskless image

The build_image program is called when an Insight CMU diskless image group of type <type> is created. This program is called with the following arguments:

-l <"image group name">   The name of the new image group.
-g <"golden node">        The host name/IP of the compute node from which to extract the diskless OS file system.
-k <"kernel version">     The version string of the kernel that resides on the "golden node" and is to be the diskless kernel.

The build_image program is expected to install and prepare the diskless OS file system in /opt/cmu/image/<image group name>/. This program may also prepare the kernel and any initrd required to PXE-boot this diskless image. If this program returns successfully (zero exit code), the new Insight CMU diskless image group is created in the Insight CMU database. If this program returns with a non-zero exit code, the new Insight CMU diskless image group is not created.

8.7.2 Delete diskless image

The delete_image program is called when an Insight CMU diskless image group of type <type> is deleted. This program is called with the following argument:

-l <"image group name">   The name of the image group to delete.

The delete_image program is expected to delete everything related to the diskless OS in /opt/cmu/image/<image group name>/.

8.7.3 Configure diskless node

The configure_node program is called when a compute node is added to an Insight CMU diskless image group of type <type>. This program is called with the following arguments:

-l <"image group name">   The name of the diskless image group.
-s <"NFS server IP">      The IP address of the NFS server.
-n <hostname>             The host name of the target node to configure.
-i <IP address>           The IP address of the target node to configure.
-m <MAC address>          The MAC address of the target node to configure.
-e <ethernet device>      The active Ethernet device of the target node to configure.

The configure_node program is expected to perform any node-specific tasks related to the diskless image. This may include calling /opt/cmu/tools/cmu_add_node_to_dhcp to configure the node to PXE-boot, and producing a PXE-boot file for the compute node.
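As an illustration of the kind of node-specific work a configure_node program might do, the following hedged sketch writes a minimal pxelinux configuration file named after the node's MAC address (pxelinux looks up a file named 01-<mac>, lower case with dashes). The file layout, label, and kernel arguments are illustrative assumptions for a stateful NFS-root image, not Insight CMU's actual output.

```shell
# Hypothetical fragment of a configure_node script: derive the pxelinux
# config file name from the node's MAC address and write a minimal
# NFS-root boot entry. Paths and kernel arguments are assumptions.
write_pxe_config() {
    mac="$1"; nfs_server="$2"; image="$3"; tftpdir="$4"
    # pxelinux expects "01-" followed by the MAC, lower case, dash-separated.
    fname="01-$(echo "$mac" | tr 'A-Z:' 'a-z-')"
    cat > "$tftpdir/$fname" <<EOF
default diskless
label diskless
    kernel vmlinuz-$image
    append initrd=initrd-$image root=/dev/nfs nfsroot=$nfs_server:/opt/cmu/image/$image rw
EOF
    # Print the generated path so the caller can log it.
    echo "$tftpdir/$fname"
}
```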

8.7.4 Post node configuration

The post_node_config program is called after all of the configure_node programs have run. This program is called with the following arguments:

-l <"image group name">   The name of the diskless image group.
-f <node list file>       A file containing the list of nodes that were just configured.

The original purpose of the post_node_config program was to synchronize the NFS servers with any node-specific changes made by the configure_node programs, but any global actions that must occur after the configure_node programs are run can be configured here.

8.7.5 Unconfigure diskless node

The unconfigure_node program is called when a compute node is removed from an Insight CMU diskless image group of type <type>. This program is called with the following arguments:

-l <"image group name">   The name of the diskless image group.
-n <hostname>             The host name of the target node to unconfigure.
-i <IP address>           The IP address of the target node to unconfigure.

The unconfigure_node program is expected to unconfigure and/or remove anything related to the given node. This may include calling /opt/cmu/tools/cmu_remove_node_from_dhcp to prevent the node from PXE-booting, and deleting any PXE-boot file related to the given compute node.

8.7.6 Boot diskless node

The boot_node program is called when a compute node is booted from Insight CMU into an Insight CMU diskless image group of type <type> (as opposed to a reboot command from the OS). This program is called with the following arguments:

-l <"image group name">   The name of the diskless image group.
-s <"NFS server IP">      The IP address of the NFS server.
-n <hostname>             The host name of the target node to boot.
-i <IP address>           The IP address of the target node to boot.
-m <MAC address>          The MAC address of the target node to boot.
-e <ethernet device>      The active Ethernet device of the target node to boot.

The boot_node program is typically a subset of the configure_node program, and ensures that the given node is ready to be PXE-booted. It may call /opt/cmu/tools/cmu_add_node_to_dhcp again, and may check that the correct PXE-boot file is in place for the given node.

8.7.7 Diskless check

The diskless_check program is called before the configure_node, unconfigure_node, or boot_node programs are called. This program is called with the following argument:

-l <"image group name">   The name of the diskless image group.

The diskless_check program is intended to contain any sanity checks that should be performed before the configure_node, unconfigure_node, or boot_node programs are called. It can also be called directly by the build_image program, to ensure that any prerequisites are met before the diskless image is created.

After these programs are created, tested, and installed in the /opt/cmu/diskless/<type>/ directory, you are ready to add this implementation "toolkit" to Insight CMU. To do this, add the implementation type to the CMU_VALID_DISKLESS_TOOLKITS variable in /opt/cmu/etc/cmuserver.conf.
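A hedged sketch of that registration step, written as a small helper that appends a toolkit name to CMU_VALID_DISKLESS_TOOLKITS. It assumes the variable occupies a single line of cmuserver.conf and, per the note below, uses a comma separator; always back up the file before editing it.

```shell
# Hypothetical helper: add a diskless toolkit name to
# CMU_VALID_DISKLESS_TOOLKITS in the given config file, preserving any
# toolkits already listed (comma-separated).
add_diskless_toolkit() {
    toolkit="$1"; conf="$2"
    if grep -q '^CMU_VALID_DISKLESS_TOOLKITS=' "$conf"; then
        # Variable already present: append with a comma separator.
        sed -i "s/^CMU_VALID_DISKLESS_TOOLKITS=.*/&,${toolkit}/" "$conf"
    else
        # Variable absent: create it with this toolkit as the only entry.
        echo "CMU_VALID_DISKLESS_TOOLKITS=${toolkit}" >> "$conf"
    fi
}
```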

NOTE: If more than one valid diskless type will be available in Insight CMU, this variable must be comma-separated.

8.8 Insight CMU REST API

8.8.1 Overview

This release introduces support for Insight CMU REST API v1. This section describes the Insight CMU implementation. Standard REST API concepts (such as HTTP verbs, return codes, and JSON) are not covered here.

8.8.1.1 Base path

The Insight CMU REST API is accessible at https://localhost:<port>/cmu/v?, where "v?" is the API version number. All paths in this section are relative to this base path. To get the current port value, run service cmu status.

8.8.1.2 Version information

Current version: v1
Supported versions: v1

8.8.1.3 Feature coverage

This implementation of the Insight CMU REST API covers node and group management operations:
• List/add/modify/delete nodes
• List/add/modify/delete groups
• Add/remove nodes from groups
• List/add/modify/delete resource features

Equivalent Insight CMU commands covered by the REST API:
• Nodes
◦ cmu_show_nodes
◦ cmu_add_node
◦ cmu_mod_node
◦ cmu_del_node

• Image groups
◦ cmu_show_image_group
◦ cmu_add_image_group
◦ cmu_rename_image_group
◦ cmu_del_image_group
◦ cmu_add_to_image_group_candidates
◦ cmu_del_from_image_group_candidates
◦ cmu_change_active_image_group
• Network groups
◦ cmu_show_network_group
◦ cmu_add_network_group
◦ cmu_del_network_group
◦ cmu_change_network_group
◦ cmu_del_from_network_group
• Custom groups
◦ cmu_show_custom_groups
◦ cmu_add_custom_group
◦ cmu_del_custom_group
◦ cmu_add_to_custom_group
◦ cmu_del_from_custom_group
• Resource features
◦ cmu_show_features
◦ cmu_add_feature
◦ cmu_del_feature

8.8.2 Technical considerations

8.8.2.1 API doctype

The Insight CMU REST API consumes and produces JSON payloads only (application/json).

8.8.2.2 Authentication

The user must be authenticated before issuing REST API calls. If the user is not authenticated, most of the REST API calls result in a "403 Forbidden" return code:

> curl -k -X GET "https://localhost:8080/cmu/v1/nodes"
{
  "status" : 403,
  "reason" : "Forbidden",

  "message" : "User Anonymous with role ANONYMOUS_ROLE is NOT allowed for NODE_GET"
}

To authenticate, the user submits a login/password to the /sessions entry point:

> curl -k -X POST --header 'Content-Type: application/json' -d '{ \
"login": "user", \
"password": "password" \
}' 'https://localhost:8080/cmu/v1/sessions'

Upon success, the server sends back authentication details, including a list of permissions granted for this user and a token code:

{
  "name" : "username",
  "role" : {
    "permissions" : [ "NODE_GET", "NODE_ADD" ]
  },
  "token" : {
    "code" : "ge26a1harsjscivp",
    "validity" : "2042-01-01T12:00:00.000+0000"
  }
}

This token code can now be specified in the X-Auth-Token header of subsequent queries to identify the user:

> curl -k -X GET --header "X-Auth-Token: ge26a1harsjscivp" "https://localhost:8080/cmu/v1/nodes"
[ {
  "name" : "node1",
  "id" : 1,
[…]
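When scripting this flow, the token code can be pulled out of the login response before being placed in the X-Auth-Token header. A small sketch, assuming python3 is available on the client (jq would work equally well); the curl call is commented out because it requires a live Insight CMU server:

```shell
# Extract the token code from a login response read on stdin.
parse_token() {
    python3 -c 'import json,sys; print(json.load(sys.stdin)["token"]["code"])'
}

# Assumed usage against a live server (host, port, and credentials are
# placeholders for your environment):
# TOKEN=$(curl -sk -X POST --header 'Content-Type: application/json' \
#     -d '{"login":"user","password":"password"}' \
#     'https://localhost:8080/cmu/v1/sessions' | parse_token)
# curl -sk -X GET --header "X-Auth-Token: $TOKEN" \
#     'https://localhost:8080/cmu/v1/nodes'
```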

8.8.2.3 Resource identifier

A resource identifier can be its name, ID, or UUID. The condensed name format is also supported by most calls. For example, GET /nodes/node[1-4] can be used to get the representation of node1, node2, node3, and node4 (if they all exist). For more information about MultipleIdentifierDto, see "MultipleIdentifierDto" (page 186).

8.8.2.4 Field filtering

All calls support a field filtering query parameter named fields, which allows retrieval of only some fields of a resource instead of its full representation. For example, GET /nodes/node[1-2]?fields=name,network.ipAddress can be used to get only the name and the IP address of node1 and node2:

> curl -k -X GET "https://localhost:8080/cmu/v1/nodes/node[1-2]?fields=name,network.ipAddress"
[ {
  "name" : "node1",
  "network" : {
    "ipAddress" : "1.1.1.1"
  }
}, {
  "name" : "node2",
  "network" : {
    "ipAddress" : "1.1.1.2"
  }
} ]

To filter out given fields, use "!":

GET /customgroups/group1?fields=!nodes

This returns the "group1" representation, excluding the nodes section.

TIP: Filtering out unneeded fields can significantly improve performance for a large resource set.

In particular, filtering out nodes in a group is good practice if node information is not needed. To get only the node names instead of their full representation, use ?fields=nodes.name.

8.8.3 Definitions

8.8.3.1 Node Table 7 Node

Name Required Schema Default

name true string

network false NetworkSettings

image false ImageSettings

platform false PlatformSettings

management false ManagementSettings

primaryId false integer

secondaryId false integer

biosBootMode false string AUTO

iscsiRoot false string none

features false Map

8.8.3.1.1 NetworkSettings Table 8 NetworkSettings

Name Required Schema Default

name true string

ipAddress true string

subnetMask true string

macAddress true string

mgmtServerIp false string default

defaultGateway false string default

8.8.3.1.2 ImageSettings Table 9 ImageSettings

Name Required Schema Default

name true string

cloningBlockDevice false string default

cloningDate false string (date-time)

8.8.3.1.3 PlatformSettings Table 10 PlatformSettings

Name Required Schema Default

name false string generic

architecture false string x86_64

serialPort false string default

serialPortSpeed false string default

vendorsArgs false string default

8.8.3.1.4 ManagementSettings Table 11 ManagementSettings

Name Required Schema Default

cardType false string none

cardIpAddress false string

8.8.3.2 CustomGroup Table 12 CustomGroup

Name Required Schema Default

name true string

nodes false Node array

features false Map

8.8.3.3 ImageGroup Table 13 ImageGroup

Name Required Schema Default

name true string

nodes false Node array

candidates false Node array

type false string

backupImageDevice false string

backupImageDate false string (date-time)

features false Map

8.8.3.4 NetworkGroup Table 14 NetworkGroup

Name Required Schema Default

name true string

nodes false Node array

features false Map

8.8.3.5 MultipleIdentifierDto Table 15 MultipleIdentifierDto

Name Required Schema Default

identifiers true string array

MultipleIdentifierDto is used when resource identifiers need to be transmitted in the payload of a query. It is not a resource representation by itself. For example:

{ "identifiers": [ "node1", "node2" ] }

8.8.4 Resources
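Several of the operations that follow accept a MultipleIdentifierDto body. A payload like the example above can be generated from a plain list of node names; this sketch assumes python3 is available on the client, using it for correct JSON quoting:

```shell
# Build a MultipleIdentifierDto JSON payload from the node names given
# as arguments, suitable for a curl -d body.
make_identifiers() {
    python3 -c 'import json,sys; print(json.dumps({"identifiers": sys.argv[1:]}))' "$@"
}
# Example: make_identifiers node1 node2
# → {"identifiers": ["node1", "node2"]}
```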

8.8.4.1 Custom group operations

8.8.4.1.1 Gets a single group GET /customgroups/{identifier}

8.8.4.1.1.1 Parameters Table 16 Parameters

Type Name Description Required Schema

PathParameter identifier Group identifier true string

QueryParameter fields Fields to display false multi string array

8.8.4.1.1.2 Responses Table 17

HTTP Code Description Schema

200 Successful operation CustomGroup

403 User is not allowed for CUSTOM_GROUP_GET No Content

404 No group with the given identifier No Content

8.8.4.1.2 Updates an existing custom group PUT /customgroups/{identifier}

8.8.4.1.2.1 Parameters Table 18 Parameters

Type Name Description Required Schema

PathParameter identifier Custom group identifier true string

BodyParameter body Updated custom group definition true CustomGroup

8.8.4.1.2.2 Responses Table 19

HTTP Code Description Schema

200 Successful operation CustomGroup

400 Invalid input supplied No Content

403 User is not allowed for CUSTOM_GROUP_MODIFY No Content

404 No custom group found with the given identifier No Content

8.8.4.1.3 Deletes or archives an existing custom group DELETE /customgroups/{identifier}

8.8.4.1.3.1 Parameters Table 20 Parameters

Type Name Description Required Schema Default

PathParameter identifier Custom group identifier true string

QueryParameter archive Archive instead of delete false boolean false

QueryParameter archivedName Archived group name to be used false string

QueryParameter minDuration Minimal duration to archive instead of delete false integer

8.8.4.1.3.2 Responses Table 21

HTTP Code Description Schema

403 User is not allowed for CUSTOM_GROUP_DELETE No Content

404 No custom group found with the given identifier No Content

8.8.4.1.4 Gets all features of a single group GET /customgroups/{identifier}/features

8.8.4.1.4.1 Parameters Table 22 Parameters

Type Name Description Required Schema

PathParameter identifier Group identifier true string

8.8.4.1.4.2 Responses Table 23

HTTP Code Description Schema

200 Successful operation object

403 User is not allowed for CUSTOM_GROUP_GET No Content

404 No group with the given identifier No Content

8.8.4.1.5 Adds or modifies features of an existing group PUT /customgroups/{identifier}/features

8.8.4.1.5.1 Parameters Table 24 Parameters

Type Name Description Required Schema

PathParameter identifier Group identifier true string

BodyParameter body Features to be added/modified true Map

8.8.4.1.5.2 Responses Table 25

HTTP Code Description Schema

200 Successful operation object

400 Invalid input supplied No Content

403 User is not allowed for CUSTOM_GROUP_MODIFY No Content

8.8.4.1.6 Remove all features of an existing group DELETE /customgroups/{identifier}/features

8.8.4.1.6.1 Parameters Table 26 Parameters

Type Name Description Required Schema

PathParameter identifier Group identifier true string

8.8.4.1.6.2 Responses Table 27

HTTP Code Description Schema

403 User is not allowed for CUSTOM_GROUP_MODIFY No Content

404 No group with the given identifier No Content

8.8.4.1.7 Gets one node of an existing group GET /customgroups/{identifier}/nodes/{node_id}

8.8.4.1.7.1 Parameters Table 28 Parameters

Type Name Description Required Schema

PathParameter identifier Group identifier true string

PathParameter node_id Node identifier true string

QueryParameter fields Fields to display false multi string array

8.8.4.1.7.2 Responses Table 29

HTTP Code Description Schema

200 Successful operation Node

403 User is not allowed for CUSTOM_GROUP_GET No Content

404 No group with the given identifier; given node not found in given group No Content

8.8.4.1.8 Removes one node from an existing group DELETE /customgroups/{identifier}/nodes/{node_id}

8.8.4.1.8.1 Parameters Table 30 Parameters

Type Name Description Required Schema

PathParameter identifier Group identifier true string

PathParameter node_id Node identifier true string

8.8.4.1.8.2 Responses Table 31

HTTP Code Description Schema

403 User is not allowed for CUSTOM_GROUP_MODIFY No Content

8.8.4.1.9 Gets all nodes of an existing group GET /customgroups/{identifier}/nodes

8.8.4.1.9.1 Parameters Table 32 Parameters

Type Name Description Required Schema

PathParameter identifier Group identifier true string

QueryParameter fields Fields to display false multi string array

8.8.4.1.9.2 Responses Table 33

HTTP Code Description Schema

200 Successful operation Node array

403 User is not allowed for CUSTOM_GROUP_GET No Content

404 No group with the given identifier No Content

8.8.4.1.10 Adds nodes to an existing group POST /customgroups/{identifier}/nodes

8.8.4.1.10.1 Parameters Table 34 Parameters

Type Name Description Required Schema

PathParameter identifier Group identifier true string

BodyParameter body Nodes identifier true MultipleIdentifierDto

8.8.4.1.10.2 Responses Table 35

HTTP Code Description Schema

200 Successful operation Node array

403 User is not allowed for CUSTOM_GROUP_MODIFY No Content

404 No group with the given identifier; at least one node not found No Content

8.8.4.1.11 Removes some or all nodes from an existing group DELETE /customgroups/{identifier}/nodes

8.8.4.1.11.1 Parameters Table 36 Parameters

Type Name Description Required Schema

PathParameter identifier Group identifier true string

BodyParameter body Nodes identifier, or empty to remove all nodes false MultipleIdentifierDto

8.8.4.1.11.2 Responses Table 37

HTTP Code Description Schema

403 User is not allowed for CUSTOM_GROUP_MODIFY No Content

404 No group with the given identifier; at least one node not found in given group No Content

8.8.4.1.12 Lists all groups GET /customgroups

8.8.4.1.12.1 Parameters Table 38 Parameters

Type Name Description Required Schema

QueryParameter fields Fields to display false multi string array

8.8.4.1.12.2 Responses Table 39

HTTP Code Description Schema

200 Successful operation CustomGroup array

403 User is not allowed for CUSTOM_GROUP_GET No Content

8.8.4.1.13 Updates a set of existing custom groups PUT /customgroups

8.8.4.1.13.1 Parameters Table 40 Parameters

Type Name Description Required Schema Default

BodyParameter body Updated custom groups definition true CustomGroup array

QueryParameter nameAsId Use name instead of UUID as identifier false boolean false

8.8.4.1.13.2 Responses Table 41

HTTP Code Description Schema

200 Successful operation CustomGroup array

400 Invalid input supplied No Content

403 User is not allowed for CUSTOM_GROUP_MODIFY No Content

404 At least one supplied custom group is not found No Content

8.8.4.1.14 Creates one or multiple new custom groups POST /customgroups

8.8.4.1.14.1 Parameters Table 42 Parameters

Type Name Description Required Schema Default

BodyParameter body Custom groups definition true CustomGroup array

QueryParameter skipUnkownNode Do not fail on unknown nodes false boolean false

8.8.4.1.14.2 Responses Table 43

HTTP Code Description Schema

201 Custom groups successfully added No Content

400 Invalid input supplied No Content

403 User is not allowed for CUSTOM_GROUP_ADD No Content

8.8.4.1.15 Deletes or archives a set of existing custom groups DELETE /customgroups

8.8.4.1.15.1 Parameters Table 44 Parameters

Type Name Description Required Schema Default

BodyParameter body Custom groups identifier true MultipleIdentifierDto

QueryParameter archive Archive instead of delete false boolean false

QueryParameter minDuration Minimal duration to archive instead of delete false integer

8.8.4.1.15.2 Responses Table 45

HTTP Code Description Schema

403 User is not allowed for CUSTOM_GROUP_DELETE No Content

404 At least one supplied custom group is not found No Content

8.8.4.2 Image group operations

8.8.4.2.1 Gets one candidate of an existing image group GET /imagegroups/{identifier}/candidates/{node_id}

8.8.4.2.1.1 Parameters Table 46 Parameters

Type Name Description Required Schema

PathParameter identifier Image group identifier true string

PathParameter node_id Candidate identifier true string

QueryParameter fields Fields to display false multi string array

8.8.4.2.1.2 Responses Table 47

HTTP Code Description Schema

200 Successful operation Node

403 User is not allowed for IMAGE_GROUP_GET No Content

404 No image group with the given identifier; given candidate not found in given group No Content

8.8.4.2.2 Removes one candidate from an existing image group DELETE /imagegroups/{identifier}/candidates/{node_id}

8.8.4.2.2.1 Parameters Table 48 Parameters

Type Name Description Required Schema

PathParameter identifier Image group identifier true string

PathParameter node_id Candidate identifier true string

8.8.4.2.2.2 Responses Table 49

HTTP Code Description Schema

403 User is not allowed for IMAGE_GROUP_MODIFY No Content

404 No image group with the given identifier; given candidate not found in given image group No Content

8.8.4.2.3 Lists all groups GET /imagegroups

8.8.4.2.3.1 Parameters Table 50 Parameters

Type Name Description Required Schema

QueryParameter fields Fields to display false multi string array

8.8.4.2.3.2 Responses Table 51

HTTP Code Description Schema

200 Successful operation ImageGroup array

403 User is not allowed for IMAGE_GROUP_GET No Content

8.8.4.2.4 Updates a set of existing image groups PUT /imagegroups

8.8.4.2.4.1 Parameters Table 52 Parameters

Type Name Description Required Schema Default

BodyParameter body Updated image groups definition true ImageGroup array

QueryParameter nameAsId Use name instead of UUID as identifier false boolean false

8.8.4.2.4.2 Responses Table 53

HTTP Code Description Schema

200 Successful operation ImageGroup array

400 Invalid input supplied No Content

403 User is not allowed for IMAGE_GROUP_MODIFY No Content

404 At least one supplied image group is not found No Content

8.8.4.2.5 Creates one or multiple new image groups POST /imagegroups

8.8.4.2.5.1 Parameters Table 54 Parameters

Type Name Description Required Schema

BodyParameter body Image group definitions true ImageGroup array

8.8.4.2.5.2 Responses Table 55

HTTP Code Description Schema

201 Image groups successfully added No Content

400 Invalid input supplied No Content

403 User is not allowed for IMAGE_GROUP_ADD No Content

8.8.4.2.6 Deletes a set of existing image groups DELETE /imagegroups

8.8.4.2.6.1 Parameters Table 56 Parameters

Type Name Description Required Schema

BodyParameter body false MultipleIdentifierDto

8.8.4.2.6.2 Responses Table 57

HTTP Code Description Schema

403 User is not allowed for IMAGE_GROUP_DELETE No Content

404 At least one supplied image group is not found No Content

8.8.4.2.7 Gets a single group GET /imagegroups/{identifier}

8.8.4.2.7.1 Parameters Table 58 Parameters

Type Name Description Required Schema

PathParameter identifier Group identifier true string

QueryParameter fields Fields to display false multi string array

8.8.4.2.7.2 Responses Table 59

HTTP Code Description Schema

200 Successful operation ImageGroup

403 User is not allowed for IMAGE_GROUP_GET No Content

404 No group with the given identifier No Content

8.8.4.2.8 Updates an existing image group PUT /imagegroups/{identifier}

8.8.4.2.8.1 Parameters Table 60 Parameters

Type Name Description Required Schema

PathParameter identifier Image group identifier true string

BodyParameter body Updated image group definition true ImageGroup

8.8.4.2.8.2 Responses Table 61

HTTP Code Description Schema

200 Successful operation ImageGroup

400 Invalid input supplied No Content

403 User is not allowed for IMAGE_GROUP_MODIFY No Content

404 No image group found with the given identifier No Content

8.8.4.2.9 Deletes an existing image group DELETE /imagegroups/{identifier}

8.8.4.2.9.1 Parameters Table 62 Parameters

Type Name Description Required Schema

PathParameter identifier Image group identifier true string

8.8.4.2.9.2 Responses Table 63

HTTP Code Description Schema

403 User is not allowed for IMAGE_GROUP_DELETE No Content

404 No image group found with the given identifier No Content

8.8.4.2.10 Gets all nodes of an existing group GET /imagegroups/{identifier}/nodes

8.8.4.2.10.1 Parameters Table 64 Parameters

Type Name Description Required Schema

PathParameter identifier Group identifier true string

QueryParameter fields Fields to display false multi string array

8.8.4.2.10.2 Responses Table 65

HTTP Code Description Schema

200 Successful operation Node array

403 User is not allowed for IMAGE_GROUP_GET No Content

404 No group with the given identifier No Content

196 Advanced topics 8.8.4.2.11 Adds nodes to an existing group POST /imagegroups/{identifier}/nodes

8.8.4.2.11.1 Parameters Table 66 Parameters

Type Name Description Required Schema

PathParameter identifier true string

BodyParameter body false MultipleIdentifierDto

8.8.4.2.11.2 Responses Table 67

HTTP Code Description Schema

403 User is not allowed for IMAGE_GROUP_MODIFY No Content

404 No group with the given identifier; at least one node not found No Content

8.8.4.2.12 Removes some or all nodes from an existing group DELETE /imagegroups/{identifier}/nodes

8.8.4.2.12.1 Parameters Table 68 Parameters

Type Name Description Required Schema

PathParameter identifier Group identifier true string

BodyParameter body Nodes identifier, or empty to remove all nodes false MultipleIdentifierDto

8.8.4.2.12.2 Responses Table 69

HTTP Code Description Schema

403 User is not allowed for IMAGE_GROUP_MODIFY No Content

404 No group with the given identifier; at least one node not found in given group No Content

8.8.4.2.13 Lists all candidates of an existing image group GET /imagegroups/{identifier}/candidates

8.8.4.2.13.1 Parameters Table 70 Parameters

Type Name Description Required Schema

PathParameter identifier Image group identifier true string

QueryParameter fields Fields to display false multi string array

8.8.4.2.13.2 Responses Table 71

HTTP Code Description Schema

200 Successful operation Node array

403 User is not allowed for IMAGE_GROUP_GET No Content

404 No image group found with the given identifier No Content

8.8.4.2.14 Adds candidates to an existing image group POST /imagegroups/{identifier}/candidates

8.8.4.2.14.1 Parameters Table 72 Parameters

Type Name Description Required Schema

PathParameter identifier true string

BodyParameter body Candidates identifier true MultipleIdentifierDto

8.8.4.2.14.2 Responses Table 73

HTTP Code Description Schema

403 User is not allowed for IMAGE_GROUP_MODIFY No Content

404 No image group with the given identifier; at least one node not found No Content

8.8.4.2.15 Removes some or all candidates from an existing image group DELETE /imagegroups/{identifier}/candidates

8.8.4.2.15.1 Parameters Table 74 Parameters

Type Name Description Required Schema

PathParameter identifier Image group identifier true string

BodyParameter body Candidates identifier, or empty to remove all candidates false MultipleIdentifierDto

8.8.4.2.15.2 Responses Table 75

HTTP Code Description Schema

403 User is not allowed for IMAGE_GROUP_MODIFY No Content

404 No image group with the given identifier; at least one candidate not found in given group No Content

8.8.4.2.16 Gets one node of an existing group GET /imagegroups/{identifier}/nodes/{node_id}

8.8.4.2.16.1 Parameters Table 76 Parameters

Type Name Description Required Schema

PathParameter identifier Group identifier true string

PathParameter node_id Node identifier true string

QueryParameter fields Fields to display false multi string array

8.8.4.2.16.2 Responses Table 77

HTTP Code Description Schema

200 Successful operation Node

403 User is not allowed for IMAGE_GROUP_GET No Content

404 No group with the given identifier; given node not found in given group No Content

8.8.4.2.17 Removes one node from an existing group DELETE /imagegroups/{identifier}/nodes/{node_id}

8.8.4.2.17.1 Parameters Table 78 Parameters

Type Name Description Required Schema

PathParameter identifier Group identifier true string

PathParameter node_id Node identifier true string

8.8.4.2.17.2 Responses Table 79

HTTP Code Description Schema

403 User is not allowed for IMAGE_GROUP_MODIFY No Content

404 No group with the given identifier; given node not found in given group No Content

8.8.4.2.18 Gets all features of a single group GET /imagegroups/{identifier}/features

8.8.4.2.18.1 Parameters Table 80 Parameters

Type Name Description Required Schema

PathParameter identifier Group identifier true string

8.8.4.2.18.2 Responses Table 81

HTTP Code Description Schema

200 Successful operation object

403 User is not allowed for IMAGE_GROUP_GET No Content

404 No group with the given identifier No Content

8.8.4.2.19 Adds or modifies features of an existing group PUT /imagegroups/{identifier}/features

8.8.4.2.19.1 Parameters Table 82 Parameters

Type Name Description Required Schema

PathParameter identifier Group identifier true string

BodyParameter body Features to be added/modified true Map

8.8.4.2.19.2 Responses Table 83

HTTP Code Description Schema

200 Successful operation object

400 Invalid input supplied No Content

403 User is not allowed for IMAGE_GROUP_MODIFY No Content

404 No group with the given identifier No Content
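The features body for the PUT call above is a plain JSON map. The sketch below builds such a request with the standard library; the base URL is a placeholder, and the feature keys shown are purely illustrative, not names defined by Insight CMU.

```python
import json
import urllib.request

# Hypothetical base URL -- adjust for your installation.
BASE_URL = "https://cmu-mgmt-node:8080/cmu/v1"

def put_features_request(group_id, features):
    """Build a PUT /imagegroups/{identifier}/features request.

    The body is the Map of features to add or modify; per Table 83,
    a 200 response returns the resulting feature object.
    """
    return urllib.request.Request(
        f"{BASE_URL}/imagegroups/{group_id}/features",
        data=json.dumps(features).encode(),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )

# Build (but do not send) a request setting two illustrative features.
req = put_features_request("grp-42", {"os": "rhel7", "role": "compute"})
```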

8.8.4.2.20 Removes all features of an existing group DELETE /imagegroups/{identifier}/features

8.8.4.2.20.1 Parameters Table 84 Parameters

Type Name Description Required Schema

PathParameter identifier Group identifier true string

8.8.4.2.20.2 Responses Table 85

HTTP Code Description Schema

403 User is not allowed for IMAGE_GROUP_MODIFY No Content

404 No group with the given identifier No Content

8.8.4.3 Network group operations

8.8.4.3.1 Gets all features of a single group GET /networkgroups/{identifier}/features

8.8.4.3.1.1 Parameters Table 86 Parameters

Type Name Description Required Schema

PathParameter identifier Group identifier true string

8.8.4.3.1.2 Responses Table 87

HTTP Code Description Schema

200 Successful operation object

403 User is not allowed for NETWORK_GROUP_GET No Content

404 No group with the given identifier No Content

8.8.4.3.2 Adds or modifies features of an existing group PUT /networkgroups/{identifier}/features

8.8.4.3.2.1 Parameters Table 88 Parameters

Type Name Description Required Schema

PathParameter identifier Group identifier true string

BodyParameter body Features to be added/modified true Map

8.8.4.3.2.2 Responses Table 89

HTTP Code Description Schema

200 Successful operation object

400 Invalid input supplied No Content

403 User is not allowed for NETWORK_GROUP_MODIFY No Content

404 No group with the given identifier No Content

8.8.4.3.3 Removes all features of an existing group DELETE /networkgroups/{identifier}/features

8.8.4.3.3.1 Parameters Table 90 Parameters

Type Name Description Required Schema

PathParameter identifier Group identifier true string

8.8.4.3.3.2 Responses Table 91

HTTP Code Description Schema

403 User is not allowed for NETWORK_GROUP_MODIFY No Content

404 No group with the given identifier No Content

8.8.4.3.4 Gets all nodes of an existing group GET /networkgroups/{identifier}/nodes

8.8.4.3.4.1 Parameters Table 92 Parameters

Type Name Description Required Schema

PathParameter identifier Group identifier true string

QueryParameter fields Fields to display false multi string array

8.8.4.3.4.2 Responses Table 93

HTTP Code Description Schema

200 Successful operation Node array

403 User is not allowed for NETWORK_GROUP_GET No Content

404 No group with the given identifier No Content

8.8.4.3.5 Adds nodes to an existing group POST /networkgroups/{identifier}/nodes

8.8.4.3.5.1 Parameters Table 94 Parameters

Type Name Description Required Schema

PathParameter identifier Group identifier true string

BodyParameter body Nodes identifier true MultipleIdentifierDto

8.8.4.3.5.2 Responses Table 95

HTTP Code Description Schema

200 Successful operation Node array

403 User is not allowed for NETWORK_GROUP_MODIFY No Content

404 No group with the given identifier; at least one node not found No Content

8.8.4.3.6 Removes some or all nodes from an existing group DELETE /networkgroups/{identifier}/nodes

8.8.4.3.6.1 Parameters Table 96 Parameters

Type Name Description Required Schema

PathParameter identifier Group identifier true string

BodyParameter body Nodes identifier, or empty to remove all nodes false MultipleIdentifierDto

8.8.4.3.6.2 Responses Table 97

HTTP Code Description Schema

403 User is not allowed for NETWORK_GROUP_MODIFY No Content

404 No group with the given identifier; at least one node not found in given group No Content

8.8.4.3.7 Lists all groups GET /networkgroups

8.8.4.3.7.1 Parameters Table 98 Parameters

Type Name Description Required Schema

QueryParameter fields Fields to display false multi string array

8.8.4.3.7.2 Responses Table 99

HTTP Code Description Schema

200 Successful operation NetworkGroup array

403 User is not allowed for NETWORK_GROUP_GET No Content

8.8.4.3.8 Updates a set of existing network groups PUT /networkgroups

8.8.4.3.8.1 Parameters Table 100 Parameters

Type Name Description Required Schema Default

BodyParameter body Updated network groups definition true NetworkGroup array

QueryParameter nameAsId Use name instead of UUID as identifier false boolean false

8.8.4.3.8.2 Responses Table 101

HTTP Code Description Schema

200 Successful operation NetworkGroup array

400 Invalid input supplied No Content

403 User is not allowed for NETWORK_GROUP_MODIFY No Content

404 At least one supplied network group is not found No Content
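The bulk update above accepts a NetworkGroup array body plus the optional nameAsId query parameter from Table 100. A request-building sketch, again with a placeholder base URL and an illustrative (not guide-confirmed) NetworkGroup field:

```python
import json
import urllib.parse
import urllib.request

# Hypothetical base URL -- adjust for your installation.
BASE_URL = "https://cmu-mgmt-node:8080/cmu/v1"

def update_networkgroups_request(groups, name_as_id=False):
    """Build a PUT /networkgroups request.

    The body is a NetworkGroup array; nameAsId=true asks the server to
    match groups by name instead of UUID (default false, per Table 100).
    """
    query = ""
    if name_as_id:
        query = "?" + urllib.parse.urlencode({"nameAsId": "true"})
    return urllib.request.Request(
        f"{BASE_URL}/networkgroups{query}",
        data=json.dumps(groups).encode(),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )

# Build (but do not send) an update matching the group by its name.
req = update_networkgroups_request([{"name": "rack1"}], name_as_id=True)
```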

8.8.4.3.9 Creates one or multiple new network groups POST /networkgroups

8.8.4.3.9.1 Parameters Table 102 Parameters

Type Name Description Required Schema

BodyParameter body Network group definitions true NetworkGroup array

8.8.4.3.9.2 Responses Table 103

HTTP Code Description Schema

201 Network groups successfully added No Content

400 Invalid input supplied No Content

403 User is not allowed for NETWORK_GROUP_ADD No Content

8.8.4.3.10 Deletes a set of existing network groups DELETE /networkgroups

8.8.4.3.10.1 Parameters Table 104 Parameters

Type Name Description Required Schema

BodyParameter body Network groups identifier true MultipleIdentifierDto

8.8.4.3.10.2 Responses Table 105

HTTP Code Description Schema

403 User is not allowed for NETWORK_GROUP_DELETE No Content

404 At least one supplied network group is not found No Content

8.8.4.3.11 Gets one node of an existing group GET /networkgroups/{identifier}/nodes/{node_id}

8.8.4.3.11.1 Parameters Table 106 Parameters

Type Name Description Required Schema

PathParameter identifier Group identifier true string

PathParameter node_id Node identifier true string

QueryParameter fields Fields to display false multi string array

8.8.4.3.11.2 Responses Table 107

HTTP Code Description Schema

200 Successful operation Node

403 User is not allowed for NETWORK_GROUP_GET No Content

404 No group with the given identifier; given node not found in given group No Content

8.8.4.3.12 Removes one node from an existing group DELETE /networkgroups/{identifier}/nodes/{node_id}

8.8.4.3.12.1 Parameters Table 108 Parameters

Type Name Description Required Schema

PathParameter identifier Group identifier true string

PathParameter node_id Node identifier true string

8.8.4.3.12.2 Responses Table 109

HTTP Code Description Schema

403 User is not allowed for NETWORK_GROUP_MODIFY No Content

404 No group with the given identifier; given node not found in given group No Content

8.8.4.3.13 Gets a single group GET /networkgroups/{identifier}

8.8.4.3.13.1 Parameters Table 110 Parameters

Type Name Description Required Schema

PathParameter identifier Group identifier true string

QueryParameter fields Fields to display false multi string array

8.8.4.3.13.2 Responses Table 111

HTTP Code Description Schema

200 Successful operation NetworkGroup

403 User is not allowed for NETWORK_GROUP_GET No Content

404 No group with the given identifier No Content

8.8.4.3.14 Updates an existing network group PUT /networkgroups/{identifier}

8.8.4.3.14.1 Parameters Table 112 Parameters

Type Name Description Required Schema

PathParameter identifier Network group identifier true string

BodyParameter body Updated network group definition true NetworkGroup

8.8.4.3.14.2 Responses Table 113

HTTP Code Description Schema

200 Successful operation NetworkGroup

400 Invalid input supplied No Content

403 User is not allowed for NETWORK_GROUP_MODIFY No Content

404 No network group found with the given identifier No Content

8.8.4.3.15 Deletes an existing network group DELETE /networkgroups/{identifier}

8.8.4.3.15.1 Parameters Table 114 Parameters

Type Name Description Required Schema

PathParameter identifier Network group identifier true string

8.8.4.3.15.2 Responses Table 115

HTTP Code Description Schema

403 User is not allowed for NETWORK_GROUP_DELETE No Content

404 No network group found with the given identifier No Content

8.8.4.4 Node operations

8.8.4.4.1 Lists all nodes that are not in any image group GET /nodes/no_image

8.8.4.4.1.1 Parameters Table 116 Parameters

Type Name Description Required Schema

QueryParameter fields Fields to display false multi string array

8.8.4.4.1.2 Responses Table 117

HTTP Code Description Schema

200 Successful operation Node array

403 User is not allowed for NODE_GET No Content

8.8.4.4.2 Removes a set of nodes from their current image group POST /nodes/no_image

8.8.4.4.2.1 Parameters Table 118 Parameters

Type Name Description Required Schema

BodyParameter body Nodes identifier true MultipleIdentifierDto

8.8.4.4.2.2 Responses Table 119

HTTP Code Description Schema

200 Successful operation Node array

403 User is not allowed for IMAGE_GROUP_MODIFY No Content

404 At least one supplied node is not found No Content

8.8.4.4.3 Gets a single node GET /nodes/{identifier}

8.8.4.4.3.1 Parameters Table 120 Parameters

Type Name Description Required Schema

PathParameter identifier Node identifier true string

8.8.4.4.3.2 Responses Table 121

HTTP Code Description Schema

200 Successful operation Node

403 User is not allowed for NODE_GET No Content

404 No node found with the given identifier No Content
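The node operations repeat the 200/403/404 pattern shown above. A client can centralize these descriptions; the sketch below takes its messages from Table 121, and only the error helper performs network I/O (against a server you would have to supply).

```python
import urllib.error
import urllib.request

# Status meanings copied from Table 121 (GET /nodes/{identifier}).
STATUS_MEANING = {
    200: "Successful operation",
    403: "User is not allowed for NODE_GET",
    404: "No node found with the given identifier",
}

def explain_status(code):
    """Translate an HTTP status code from the node API into the
    description given by the response table."""
    return STATUS_MEANING.get(code, f"Unexpected status {code}")

def fetch_node(url):
    """Fetch a node, raising a descriptive error on 403/404.
    (Performs a network call; requires a reachable Insight CMU REST server.)"""
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        raise RuntimeError(explain_status(err.code)) from err
```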

8.8.4.4.4 Updates an existing node PUT /nodes/{identifier}

8.8.4.4.4.1 Parameters Table 122 Parameters

Type Name Description Required Schema

PathParameter identifier Node identifier true string

BodyParameter body Updated node definition true Node

8.8.4.4.4.2 Responses Table 123

HTTP Code Description Schema

200 Successful operation Node

400 Invalid input supplied No Content

403 User is not allowed for NODE_MODIFY No Content

404 No node found with the given identifier No Content

8.8.4.4.5 Deletes an existing node DELETE /nodes/{identifier}

8.8.4.4.5.1 Parameters Table 124 Parameters

Type Name Description Required Schema

PathParameter identifier Node identifier true string

8.8.4.4.5.2 Responses Table 125

HTTP Code Description Schema

403 User is not allowed for NODE_DELETE No Content

404 No node found with the given identifier No Content

8.8.4.4.6 Lists all nodes GET /nodes

8.8.4.4.6.1 Parameters Table 126 Parameters

Type Name Description Required Schema

QueryParameter fields Fields to display false multi string array

8.8.4.4.6.2 Responses Table 127

HTTP Code Description Schema

200 Successful operation Node array

403 User is not allowed for NODE_GET No Content

404 At least one supplied node is not found No Content
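Several list endpoints, including GET /nodes above, accept a multi-valued fields query parameter (Table 126). A standard-library sketch; the base URL and the field names shown are assumptions, not attribute names confirmed by this guide.

```python
import urllib.parse
import urllib.request

# Hypothetical base URL -- adjust for your installation.
BASE_URL = "https://cmu-mgmt-node:8080/cmu/v1"

def list_nodes_request(fields=None):
    """Build a GET /nodes request.

    'fields' is a multi-valued query parameter restricting which node
    attributes appear in the response; it is repeated once per value.
    """
    query = ""
    if fields:
        query = "?" + urllib.parse.urlencode([("fields", f) for f in fields])
    return urllib.request.Request(f"{BASE_URL}/nodes{query}", method="GET")

# Build (but do not send) a listing limited to two illustrative fields.
req = list_nodes_request(["name", "ipAddress"])
```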

8.8.4.4.7 Updates a set of existing nodes PUT /nodes

8.8.4.4.7.1 Parameters Table 128 Parameters

Type Name Description Required Schema Default

BodyParameter body Updated nodes definition true Node array

QueryParameter nameAsId Use name instead of UUID as identifier false boolean false

8.8.4.4.7.2 Responses Table 129

HTTP Code Description Schema

200 Successful operation Node array

400 Invalid input supplied No Content

403 User is not allowed for NODE_MODIFY No Content

404 At least one supplied node is not found No Content

8.8.4.4.8 Creates one or multiple new nodes POST /nodes

8.8.4.4.8.1 Parameters Table 130 Parameters

Type Name Description Required Schema

BodyParameter body Node definitions true Node array

8.8.4.4.8.2 Responses Table 131

HTTP Code Description Schema

201 Nodes successfully added No Content

400 Invalid input supplied No Content

403 User is not allowed for NODE_ADD No Content
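Node creation above takes a Node array body and answers 201 on success (Table 131). A minimal request-building sketch under the same assumptions as the earlier examples (placeholder base URL; the "name" attribute is illustrative):

```python
import json
import urllib.request

# Hypothetical base URL -- adjust for your installation.
BASE_URL = "https://cmu-mgmt-node:8080/cmu/v1"

def create_nodes_request(nodes):
    """Build a POST /nodes request.

    The body is a Node array; per Table 131, a 201 response means the
    nodes were added, and 400 indicates invalid input.
    """
    return urllib.request.Request(
        f"{BASE_URL}/nodes",
        data=json.dumps(nodes).encode(),
        method="POST",
        headers={"Content-Type": "application/json"},
    )

# Build (but do not send) a request creating two illustrative nodes.
req = create_nodes_request([{"name": "node101"}, {"name": "node102"}])
```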

8.8.4.4.9 Deletes a set of existing nodes DELETE /nodes

8.8.4.4.9.1 Parameters Table 132 Parameters

Type Name Description Required Schema

BodyParameter body Nodes identifier true MultipleIdentifierDto

8.8.4.4.9.2 Responses Table 133

HTTP Code Description Schema

403 User is not allowed for NODE_DELETE No Content

404 At least one supplied node is not found No Content

8.8.4.4.10 Gets all features of a single node GET /nodes/{identifier}/features

8.8.4.4.10.1 Parameters Table 134 Parameters

Type Name Description Required Schema

PathParameter identifier Node identifier true string

8.8.4.4.10.2 Responses Table 135

HTTP Code Description Schema

200 Successful operation object

403 User is not allowed for NODE_GET No Content

404 No node found with the given identifier No Content

8.8.4.4.11 Adds or modifies features of an existing node PUT /nodes/{identifier}/features

8.8.4.4.11.1 Parameters Table 136 Parameters

Type Name Description Required Schema

PathParameter identifier Node identifier true string

BodyParameter body Features to be added/modified true Map

8.8.4.4.11.2 Responses Table 137

HTTP Code Description Schema

200 Successful operation object

400 Invalid input supplied No Content

403 User is not allowed for NODE_MODIFY No Content

404 No node found with the given identifier No Content

8.8.4.4.12 Removes all features of an existing node DELETE /nodes/{identifier}/features

8.8.4.4.12.1 Parameters Table 138 Parameters

Type Name Description Required Schema

PathParameter identifier Node identifier true string

8.8.4.4.12.2 Responses Table 139

HTTP Code Description Schema

403 User is not allowed for NODE_MODIFY No Content

404 No node found with the given identifier No Content

8.8.4.4.13 Lists all nodes that are not in any network group GET /nodes/no_network

8.8.4.4.13.1 Parameters Table 140 Parameters

Type Name Description Required Schema

QueryParameter fields Fields to display false multi string array

8.8.4.4.13.2 Responses Table 141

HTTP Code Description Schema

200 Successful operation Node array

403 User is not allowed for NODE_GET No Content

8.8.4.4.14 Removes a set of nodes from their current network group POST /nodes/no_network

8.8.4.4.14.1 Parameters Table 142 Parameters

Type Name Description Required Schema

BodyParameter body Nodes identifier true MultipleIdentifierDto

8.8.4.4.14.2 Responses Table 143

HTTP Code Description Schema

200 Successful operation Node array

403 User is not allowed for NETWORK_GROUP_MODIFY No Content

404 At least one supplied node is not found No Content

8.9 Support for ScaleMP Insight CMU can be integrated to work with ScaleMP. To enable support for ScaleMP, add the following variable and setting to the /opt/cmu/etc/cmuserver.conf file: CMU_vSMP_PREFIX=vSMP_ This setting configures the prefix that is used to identify Insight CMU image group nodes that can be pxe-booted into the virtual SMP environment. The images that are associated with these image groups can be created with normal Insight CMU methods (such as autoinstall and cloning), or made diskless. The ScaleMP support is activated when nodes that are active members of an image group named "vSMP_*" are pxe-booted. If an image group named "vSMP_*" is a diskless image, then those nodes pxe-boot the diskless image into the virtual SMP environment. If an image group named "vSMP_*" is a disk-based image, then those nodes must be actively pxe-booted by selecting network boot in the boot menu. The first node listed as "active" in the image group named "vSMP_*" is the primary node in the virtual SMP environment. The remaining nodes are the secondary nodes. For more information, see the ScaleMP documentation.

9 Support and other resources 9.1 Accessing Hewlett Packard Enterprise Support • For live assistance, go to the Contact Hewlett Packard Enterprise Worldwide website: www.hpe.com/assistance

• To access documentation and support services, go to the Hewlett Packard Enterprise Support Center website: www.hpe.com/support/hpesc

Information to collect • Technical support registration number (if applicable) • Product name, model or version, and serial number • Operating system name and version • Firmware version • Error messages • Product-specific reports and logs • Add-on products or components • Third-party products or components 9.2 Accessing updates • Some software products provide a mechanism for accessing software updates through the product interface. Review your product documentation to identify the recommended software update method. • To download product updates, go to either of the following: ◦ Hewlett Packard Enterprise Support Center Get connected with updates page: www.hpe.com/support/e-updates

◦ Software Depot website: www.hpe.com/support/softwaredepot

• To view and update your entitlements, and to link your contracts and warranties with your profile, go to the Hewlett Packard Enterprise Support Center More Information on Access to Support Materials page: www.hpe.com/support/AccessToSupportMaterials

IMPORTANT: Access to some updates might require product entitlement when accessed through the Hewlett Packard Enterprise Support Center. You must have an HP Passport set up with relevant entitlements.

9.3 Websites

Website Link

Hewlett Packard Enterprise Information Library www.hpe.com/info/enterprise/docs

Hewlett Packard Enterprise Support Center www.hpe.com/support/hpesc

Contact Hewlett Packard Enterprise Worldwide www.hpe.com/assistance

Subscription Service/Support Alerts www.hpe.com/support/e-updates

Software Depot www.hpe.com/support/softwaredepot

Customer Self Repair www.hpe.com/support/selfrepair

Insight Remote Support www.hpe.com/info/insightremotesupport/docs

9.4 Customer self repair Hewlett Packard Enterprise customer self repair (CSR) programs allow you to repair your product. If a CSR part needs to be replaced, it will be shipped directly to you so that you can install it at your convenience. Some parts do not qualify for CSR. Your Hewlett Packard Enterprise authorized service provider will determine whether a repair can be accomplished by CSR. For more information about CSR, contact your local service provider or go to the CSR website: www.hpe.com/support/selfrepair 9.5 Remote support Remote support is available with supported devices as part of your warranty or contractual support agreement. It provides intelligent event diagnosis, and automatic, secure submission of hardware event notifications to Hewlett Packard Enterprise, which will initiate a fast and accurate resolution based on your product’s service level. Hewlett Packard Enterprise strongly recommends that you register your device for remote support. For more information and device support details, go to the following website: www.hpe.com/info/insightremotesupport/docs 9.6 Documentation feedback Hewlett Packard Enterprise is committed to providing documentation that meets your needs. To help us improve the documentation, send any errors, suggestions, or comments to Documentation Feedback ([email protected]). When submitting your feedback, include the document title, part number, edition, and publication date located on the front cover of the document. For online help content, include the product name, product version, help edition, and publication date located on the legal notices page. 9.7 Related information • The Insight CMU website: http://www.hpe.com/info/icmu

• Installation and user guides for your specific operating system. All Insight CMU documents are available from the Insight CMU website: http://www.hpe.com/info/icmu. Under Related Links, click Technical Support / Manuals→Manuals. Documents are also available from http://www.hpe.com/info/linux-cluster-docs: click HP Cluster Management Utility Software→HP Insight Cluster Management Utility→Manuals. Insight CMU patches can be downloaded from https://h20566.www2.hpe.com/portal/site/hpsc/public (log in via HP Passport).

NOTE: Patches must be installed in the proper order (ascending order).

9.8 Support term For customers with support contracts for Insight CMU, Hewlett Packard Enterprise supports the current version and the previous version for a minimum of one year from the release of each version. For older versions of Insight CMU, customers must upgrade to obtain Hewlett Packard Enterprise support.

A Troubleshooting

Issues encountered while using Insight CMU can be classified as: • Network boot issues, which affect cloning and backup • Backup-specific issues • Cloning-specific issues • Administration command issues • GUI-specific issues A.1 Insight CMU logs Every Insight CMU command logs information in a dedicated log file. All log files are available in /opt/cmu/log. A.1.1 cmuserver log files When using the GUI, all actions executed on the cluster are sent to the cmuserver daemon running on the management node. The /opt/cmu/log/cmuserver-0.log file on the management node contains the current output of the cmuserver process. Results of the command /etc/init.d/cmu <option> are logged to /var/log/cmuservice_<hostname>.log, where <hostname> is the host name of the Insight CMU administrator node. A.1.2 Cloning log files All cloning log files are available on the management node. After cloning, the /opt/cmu/log/cmucerbere-.log file contains the output of the cloning daemon running on the management node. This file contains a summary of the entire cloning process. The /opt/cmu/log/cmucerbere--.log file contains detailed cloning output for each node. NOTE: The node-specific cloning logs cmucerbere--.log are copied to the management node ~3 minutes after cloning completes.

A.1.3 Backup log files All backup log files are available on the management node. After backup, the /opt/cmu/log/cmudolly-.log file contains the output of the backup client running on the management node. The /opt/cmu/log/cmudolly--.log file contains the output of the backup daemon running on the compute node. A.1.4 Monitoring log files On the management node, the /opt/cmu/log/MainMonitoringDaemon_MGTXXX.log file contains the output of the monitoring daemon running on the management node. On each compute node, the /opt/cmu/log/SmallMonitoringDaemon_NodeXXX.log file contains the output of the monitoring daemon running on the compute node. Designate one of the compute nodes as the secondary server for the network group. The /opt/cmu/log/SecondaryServerMonitoring_NodeXXX.log file contains the output of the Insight CMU Secondary Server process. A.2 Network boot issues Nodes might not boot in the network mode due to one of the following causes: • A hardware issue on the node • A network issue with the network cable, switch port, or the NIC

• An incorrect MAC address in the Insight CMU database • The Insight CMU configuration on the management node is lost. Troubleshooting switch issues 1. Verify that the management node pings the iLO and the nodes. 2. Verify that broadcast is enabled and is redirected to the switch. 3. Verify that the spanning tree is disabled on all ports connected to a node. 4. Verify that "multicast IGMP snoop loop" is disabled on the switch. During cloning, if the management node boots the secondary server in network mode but the secondary server cannot boot nodes in the network group: 1. Boot one node in network mode. 2. Log in using Cmuteam as the password. 3. Verify that the node can ping the management card of other nodes. 4. Verify that the node can ping other nodes. 5. If the network boot image is damaged, reinstall the Insight CMU rpm. A.2.1 Troubleshooting network boot 1. Open a console to the management card of the node. 2. Select the node in the Insight CMU main window and boot in network mode. 3. Does the node receive a DHCP response from the server? • If yes—Did the correct server respond? Check the IP address received by the node. If so, proceed to the next step. • If no—Shut down the other server. Verify the configuration of DHCP and PXE. Verify that spanning tree is not enabled on the switch connected to the node. NOTE: For more information on selecting which network sends the node its DHCP address, see “Customizing Insight CMU netboot kernel arguments” (page 170).

4. Can the node download its kernel? • If yes—Proceed to the next step. • If no—Verify the tftp daemon configuration. 5. Can the node mount / with NFS? • If yes—The network image might be corrupted. Reinstall the Insight CMU rpm. • If no—Verify the NFS server is started. 6. Verify that /opt/cmu/ntbt/rp is exported to all nodes with NFS. A.2.2 The anatomy of the PXE boot process Table 144 (page 218) provides the detail for the PXE boot process. Most of the activity for the Insight CMU server can be found in /var/log/messages. Most of the activity for the compute nodes can be seen in the iLO Virtual Serial Port (VSP). Time starts at the first row then continues down.

Table 144 PXE boot process

Process Insight CMU server Compute node Process

CMU DHCP server configuration and BMC restart Send boot next PXE to compute node BMC

Server BMC configures next boot action to PXE boot

Send power cycle

BIOS boots up, checks hardware, goes to boot menu BIOS PXE boot sends DHCPRequest with MAC address

dhcpd Receive DHCPRequest Match MAC address with DHCP entry Respond with IP, hostname

Receive DHCP response Send bootp request

Receive bootp request Respond with next-server, filename

tftpd Request filename from next-server through tftp

Send filename

exec() filename Bootloader Request PXE boot file through tftp

Send PXE boot file

Parse PXE boot file Request kernel through tftp

Send kernel

Request initrd through tftp (Optional—Insight CMU netboot has no initrd)

Send initrd

dhcpd exec()s kernel with initrd (or not) according to PXE boot file Kernel Kernel boots up, discovers hardware, enables NIC Send DHCPRequest with MAC address

Receive DHCPRequest Match MAC address with DHCP entry Respond with IP, hostname

Receive DHCP response Configure IP, and hostname (hostname may change later from filesystem config)

NOTE: The filename program is the bootloader executable: • pxelinux.0 for legacy PXE boot • grub.efi for UEFI PXE boot A.3 Backup issues If you get the Could not get fstab message: 1. Verify that you can boot in network mode. 2. Verify that the device associated with the image is correct by running: # /opt/cmu/bin/cmu_show_image_groups -d The devices are as follows: • "sda" for SCSI or SATA disks • cciss/c*d* for Smart Array. If you have RHEL 6, see “HPE Smart Array warning with RHEL 7 and future Linux releases” (page 22). 3. Verify that the correct partition was chosen while launching the backup. 4. Reboot the node and run df to check the root partition number. 5. If you get the Backup failure message, verify that the hard drive on the management node is not full. A.4 Cloning issues If only one node cannot be cloned: 1. Verify that you can boot in network mode. 2. Verify that the node has the same hardware as other nodes. 3. Verify that the node does not have a hardware problem. 4. Power off manually, then relaunch cloning. If no nodes in a network group can be cloned: 1. Clone all nodes except the first node in the network group again. 2. Verify that you can boot in network mode. If no node in the cluster can be cloned: 1. Verify that you can boot in network mode. 2. Verify that the image is not corrupted using the tar -jtvf --force-local command on the archive files located in the /opt/cmu/image/ directory. 3. Repeat the backup to replace the existing image. 4. Verify that the partition is not full. IMPORTANT: Prior to backup, half the partition must be free. Otherwise, the cloning fails. A /boot partition is mandatory to avoid grub-install errors on some Red Hat releases.

5. Verify that an error is not in the reconf.sh file associated with the image. A.4.1 Error when cloning a Windows node while part of a Windows domain After deploying a Windows node (available only on specific Moonshot cartridges), the network configuration might not be applied, rendering the node unreachable. At startup, Insight CMU uses a group policy to automatically run the sysprep.exe process to customize the node. The \Windows\System32\GroupPolicy\Machine\Scripts\Startup\ CMUHostConfig.bat script executes this task. The configuration of the target node is not applied if the golden node was part of a domain while the golden image was captured and the domain is enforcing group policies. To solve this issue, the node must be removed from the domain before creating the golden image.

To work around the issue of a domain-enabled golden image not being deployed to a node, execute the following command from the cmd channel of the SAC: %SYSTEMROOT%\System32\sysprep\sysprep.exe /reboot /oobe /generalize /unattend:%SYSTEMROOT%\System32\unattend.xml A.5 Administration command problems To fix problems associated with administration commands: 1. Verify that rsh, telnet, or ssh is configured on nodes. 2. Verify that rsh or ssh root access is enabled with the node password to all nodes of the cluster. 3. Verify that the Insight CMU database contains the correct IP address and host name. A.6 GUI problems A.6.1 The Insight CMU GUI cannot be launched from the browser 1. Clear the browser cache. 2. Clear the Java cache using the Java Control Panel applet. • On Windows GUI client nodes: Control panel→java→General Tab→Temporary internet files→Settings→Delete files • On Linux GUI client nodes, run the appropriate tool (for example, jcontrol) to access the Java Control Panel. 3. If you receive a "Certificate Validation" error while launching the GUI, check the network settings in the Java Control Panel applet. If it is set to use "Browser Settings", browser proxy settings may be blocking the Insight CMU GUI launch. Try using "Direct connection" in the Java Control Panel.

• On Windows GUI client nodes: Control panel→java→General Tab→Network settings • On Linux GUI client nodes, run the appropriate tool (for example, jcontrol) to access the Java Control Panel. If you receive a "certificate expired" error while launching the Insight CMU GUI, add the Insight CMU management node IP address to the exception list in the Java Control Panel applet and launch the GUI again. 1. Control panel→java→Security→Exception Site list 2. Enter the management node IP address in the location field. 3. Click Add. 4. A security warning for HTTP location will appear. Click OK.

5. The site IP address is added to the exception list. Click OK. If the GUI cannot contact the remote Insight CMU service: 1. Verify that the Insight CMU service is running properly on the management node: # /etc/init.d/cmu status If not, restart the Insight CMU service: # /etc/init.d/cmu restart 2. Verify that the Insight CMU GUI on the remote workstation is connected to the correct server. 3. Verify the GENERAL_RMI_HOST setting in the cmu_gui_local_settings file on the remote workstation. 4. If cmuserver is running properly on the management node, verify the firewall configuration on the management node and the GUI workstation. Verify that RMI connections are allowed between the two hosts. If the GUI is running but the monitoring sensors are not updated: 1. Verify that the Insight CMU service is running properly on the management node: # /etc/init.d/cmu status If not, restart the Insight CMU service: # /etc/init.d/cmu restart 2. Verify that the host file of the nodes is configured properly. Each node must have access to the IP address of all nodes in the cluster. 3. Verify that rsh or ssh is enabled between all nodes of the cluster and the management node. All nodes must be able to execute commands as root on any other node without needing a password. 4. Verify that the Insight CMU rpm is properly installed on all nodes. If the Insight CMU GUI is unable to start, the message "Failed to validate certificate" appears:

A.6 GUI problems
Figure 68 Certificate error

The detailed Java exception is:
java.security.cert.CertPathValidatorException: java.security.InvalidKeyException: Wrong key usage
Change the Enable online certificate validation Java setting. (The default value of this setting changed between Java 1.6u31 and Java 1.7u12.) If you are connected to the Internet, activate Enable online certificate validation. If you are not connected to the Internet, deactivate Enable online certificate validation.

Figure 69 Java control panel

On Windows GUI client nodes, go to Control Panel→Java→Advanced and select Enable online certificate validation. On Linux GUI client nodes, run javaws -viewer in a shell, click the Advanced tab, then select Enable online certificate validation.

TIP: If you still encounter problems, try toggling the setting.
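The RMI connectivity check described earlier in this section can be scripted from the GUI workstation. The following is a minimal sketch; the host and port values are placeholders (the default Java RMI registry port is 1099, but the port your Insight CMU installation uses may differ):

```shell
# check_rmi_port: report whether a TCP port on a host accepts connections.
# Uses the bash /dev/tcp pseudo-device, so it needs no extra tools.
check_rmi_port() {
    local host="$1" port="$2"
    if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
        echo "port ${port} on ${host} is reachable"
    else
        echo "port ${port} on ${host} is NOT reachable"
        return 1
    fi
}

# Example: probe a management node address (placeholder host and port).
check_rmi_port 127.0.0.1 1099 || true
```

If the port is reported as not reachable while the cmu service is running, revisit the firewall rules on both hosts.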

A.6.2 The GUI fails to launch when the java.binfmt_misc service is running on SLES11 management nodes
The java.binfmt_misc service is provided by the jpackage-utils package, which is a dependency of the IBM Java rpm. When the IBM Java rpm is installed on SLES 11 distributions, verify that the java.binfmt_misc service is not running. The java.binfmt_misc service must be disabled during boot-up on the management node; otherwise, the GUI may fail to launch on SLES11 management nodes with a Java exception.

A.6.3 Launching the Insight CMU GUI on nodes with RHEL5u11
On GUI clients running RHEL 5u11, the GUI launch from Firefox may fail silently. In this case, ensure that the following setting is enabled:
1. Run the appropriate tool (for example, jcontrol) to access the Java Control Panel.
2. Control Panel→Security
3. Enable Enable Java content in the browser.
4. Click OK.

Insight CMU manpages

IMPORTANT: The terms logical group, network entity, and user group have been replaced in both the CLI and GUI.
• Logical Group is now Image Group.
• Network Entity is now Network Group.
• User Group is now Custom Group.
In the CLI, the previously named routines still exist. However, they will be deprecated in a future release. Table 145 maps the previous routines to the renamed routines.

Table 145 Component renaming

Previous routine Renamed routine

cmu_add_network_entity cmu_add_network_group

cmu_del_network_entity cmu_del_network_group

cmu_change_network_entity cmu_change_network_group

cmu_show_network_entities cmu_show_network_groups

cmu_del_from_network_entity cmu_del_from_network_group

cmu_add_logical_group cmu_add_image_group

cmu_del_logical_group cmu_del_image_group

cmu_add_to_logical_group_candidates cmu_add_to_image_group_candidates

cmu_rename_logical_group cmu_rename_image_group

cmu_change_active_logical_group cmu_change_active_image_group

cmu_show_logical_groups cmu_show_image_groups

cmu_del_from_logical_group_candidates cmu_del_from_image_group_candidates

cmu_add_to_user_group cmu_add_to_custom_group

cmu_del_user_group cmu_del_custom_group

cmu_add_user_group cmu_add_custom_group

cmu_rename_archived_user_group cmu_rename_archived_custom_group

cmu_del_archived_user_groups cmu_del_archived_custom_groups

cmu_show_archived_user_groups cmu_show_archived_custom_groups

cmu_del_from_user_group cmu_del_from_custom_group

cmu_show_user_groups cmu_show_custom_groups
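Because the previous routine names will be deprecated, it is worth scanning existing automation for call sites now. A minimal sketch; the script created below is a made-up example, so point the grep at your own script files instead:

```shell
# Create a sample script that still calls two previously-named routines.
cat <<'EOF' > /tmp/old_script.sh
/opt/cmu/bin/cmu_show_logical_groups
/opt/cmu/bin/cmu_add_network_entity rack5
EOF

# Report any use of the old logical group / network entity / user group names.
grep -nE 'cmu_[a-z_]*(logical_group|network_entit|user_group)' /tmp/old_script.sh
```

Each reported line can then be updated to the renamed routine from Table 145.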

cmu_boot(8)
NAME
cmu_boot -- Boot nodes.
SYNOPSIS
# /opt/cmu/bin/cmu_boot -a | -n <nodes> | -f <filename> | -g <group> [-d <image>]
DESCRIPTION
Boot Insight CMU nodes.
OPTIONS
-a Boot all nodes. Nodes that are active in a diskless image group boot diskless by default. All other nodes boot to disk by default.
-n <nodes> Boot the given node or nodelist expression. cmu_expand_nodes must be able to parse the expression.
-f <filename> A file containing the list of nodes to boot.
-g <group> Boot all nodes in the given group. The given group can be an Insight CMU network group, an image group, or a custom group. Nodes that are active in a diskless image group boot diskless by default. All other nodes boot to disk by default. This is true even if the given group is a diskless image group.
-d <image> Optional. Boot the given nodes into the given diskless image. The given nodes must be members of the given image. The keyword "CMU_NETBOOT" means boot the nodes into the Insight CMU netboot image. The keyword "DISK" means boot the nodes to disk. Without this option, nodes that are active in a diskless image group boot diskless and all other nodes boot to disk.
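The -n option accepts nodelist expressions such as node[1-4], which cmu_expand_nodes must be able to parse. As an illustration of that syntax only, here is a simplified stand-in written in bash; it is not Insight CMU's own parser and handles only a single bracketed range:

```shell
# expand_nodes: expand "prefix[A-B]" into prefixA ... prefixB, one per line.
expand_nodes() {
    local expr="$1"
    if [[ "$expr" =~ ^(.+)\[([0-9]+)-([0-9]+)\]$ ]]; then
        local prefix="${BASH_REMATCH[1]}" i
        for ((i = BASH_REMATCH[2]; i <= BASH_REMATCH[3]; i++)); do
            echo "${prefix}${i}"
        done
    else
        # Not a range expression: pass the name through unchanged.
        echo "$expr"
    fi
}

expand_nodes 'node[1-4]'   # prints node1 node2 node3 node4, one per line
```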

cmu_show_nodes(8)
NAME
cmu_show_nodes -- Display a list of nodes and node attributes.
SYNOPSIS
# /opt/cmu/bin/cmu_show_nodes [-a | -n <nodes>] [-i] [-d] [-f <conffile>] [-o <format>]
DESCRIPTION
Display a list of Insight CMU nodes and node attributes.
OPTIONS
-a|--all Show info for all nodes (default behavior).
-n|--node=<nodes> Show info for a given node or set of nodes. To specify a range, use node[1-4].
-d|--display Display column headers.
-i|--info Same as -o "%n %i %k %m %l %b %t %r %c %N %p %s %S %v %d".
-f|--conffile <conffile> Display info from a specified Insight CMU database file. The default is /opt/cmu/etc/cmu.conf.
-o|--output=<format> Display info in printf format, where valid field specifications are:
%n nodename (this is the default format)
%l image group
%i IP address
%k netmask (address form)
%K netmask (bitmask form)
%m MAC address ('-' delimiter)
%M MAC address (':' delimiter)
%b BMC IP address
%t BMC type
%r architecture

%c (ILOCM only) cartridge number
%N (ILOCM only) node number
%p platform
%s serial port
%S serial port speed
%v vendor args
%d cloning block device
EXAMPLES
Default behavior:
# /opt/cmu/bin/cmu_show_nodes
cn0004
n01
n02
n03
n04
To show details for a specific node. In this example, the string "default" is added to the output in the position of the image group. This generates output suitable for reading into the Insight CMU database using cmu_add_node in file mode, with "default" as the image group.
# /opt/cmu/bin/cmu_show_nodes -n n01 -o "%n %i %k %m default %b %t %r %c %N %p"
n01 10.117.21.1 255.255.0.0 38-ea-a7-0f-2a-1a default 10.117.21.101 ILOCM x86_64 1 1 generic
To show significant details for all nodes:
# /opt/cmu/bin/cmu_show_nodes -a -o "%n %i %k %m %l %b %t %r %c %N %p"
n01 10.117.21.1 255.255.0.0 38-ea-a7-0f-2a-1a RHEL 10.117.21.101 ILOCM x86_64 1 1 generic
n02 10.117.21.2 255.255.0.0 38-ea-a7-0f-28-de RHEL 10.117.21.101 ILOCM x86_64 2 1 generic
n03 10.117.21.3 255.255.0.0 38-ea-a7-0f-30-f8 ubuntu 10.117.21.101 ILOCM x86_64 3 1 generic
n04 10.117.21.4 255.255.0.0 38-ea-a7-0f-48-2a SLES 10.117.21.101 ILOCM x86_64 4 1 generic
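Because -o produces a fixed, space-separated field order, its output is easy to post-process with standard tools. For example, to extract the hostname and MAC address (fields 1 and 4) from the sample output above:

```shell
# Sample "%n %i %k %m ..." output, copied from the example above.
cat <<'EOF' > /tmp/nodes.out
n01 10.117.21.1 255.255.0.0 38-ea-a7-0f-2a-1a RHEL 10.117.21.101 ILOCM x86_64 1 1 generic
n02 10.117.21.2 255.255.0.0 38-ea-a7-0f-28-de RHEL 10.117.21.101 ILOCM x86_64 2 1 generic
EOF

# Print the hostname and MAC address columns.
awk '{ print $1, $4 }' /tmp/nodes.out
```

In normal use the awk filter is fed directly from cmu_show_nodes rather than from a file.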

cmu_show_image_groups(8)
NAME
cmu_show_image_groups -- Display image groups, or the nodes or device associated with a specific image group.
SYNOPSIS
# /opt/cmu/bin/cmu_show_image_groups
# /opt/cmu/bin/cmu_show_image_groups <image group> [-d|-a|-t]
# /opt/cmu/bin/cmu_show_image_groups -h
DESCRIPTION
Without arguments, cmu_show_image_groups displays a list of all image groups defined in Insight CMU. If <image group> is provided, the list of all candidate nodes in that image group is displayed. Specifying -d along with <image group> displays the backup and cloning device associated with that image group (for example, "sda").
OPTIONS
-h|--help Show usage help.
-d Display the device associated with the image group (for example, "sda").
-a Display all active nodes for the image group.
-t Display the backup date for the image group.
EXAMPLES
To show all existing image groups in the Insight CMU database:
# /opt/cmu/bin/cmu_show_image_groups
default
RH6U5
SLES11SP1
rhel7
To show all the candidate nodes of the image group named "rhel7". These nodes are candidates to receive the image held in that image group. If the image group holds an OS disk image (rather than "autoinstall" or "diskless"), the node images can be captured (backed up) to replace the current image group image.
# /opt/cmu/bin/cmu_show_image_groups rhel7
node1
node2
node3
node4

cmu_show_network_groups(8)
NAME
cmu_show_network_groups -- Display a list of Insight CMU network groups or the nodes in a specified network group.
SYNOPSIS
# /opt/cmu/bin/cmu_show_network_groups
# /opt/cmu/bin/cmu_show_network_groups <network group>
# /opt/cmu/bin/cmu_show_network_groups -h
DESCRIPTION
Without any options, cmu_show_network_groups displays a list of currently defined Insight CMU network groups. If a network group name is specified, a list of all the nodes in that network group is displayed.
OPTIONS
-h Display usage help.
<network group> A network group name. If this is specified, a list of all the nodes in the network group is displayed.
EXAMPLES
To display a list of all the network groups defined in Insight CMU:
# /opt/cmu/bin/cmu_show_network_groups
To display a list of all the nodes in the network group rack1:
# /opt/cmu/bin/cmu_show_network_groups rack1

cmu_show_custom_groups(8)
NAME
cmu_show_custom_groups -- Display a list of Insight CMU custom groups or the nodes in a specified custom group.
SYNOPSIS
# /opt/cmu/bin/cmu_show_custom_groups
# /opt/cmu/bin/cmu_show_custom_groups <custom group>
# /opt/cmu/bin/cmu_show_custom_groups -h
DESCRIPTION
Without any options, cmu_show_custom_groups displays a list of currently defined Insight CMU custom groups. If a custom group name is specified, a list of all the nodes in that custom group is displayed.
OPTIONS
-h Display usage help.
<custom group> A custom group name. If this is specified, a list of all the nodes in that custom group is displayed.
EXAMPLES
To display a list of all the custom groups defined in Insight CMU:
# /opt/cmu/bin/cmu_show_custom_groups
To display a list of all the nodes in the custom group "group1":
# /opt/cmu/bin/cmu_show_custom_groups group1

cmu_show_archived_custom_groups(8)
NAME
cmu_show_archived_custom_groups -- Show archived custom groups.
SYNOPSIS
# /opt/cmu/bin/cmu_show_archived_custom_groups [-h] | [-p] [-H] [-c] [-s separator] [-f] [-w width]
DESCRIPTION
Show archived custom groups.
OPTIONS
-h Show help.
-p Pretty print (add blank lines for readability).
-H Print headers.
-c Print in column mode.
-s Change the separator (not available in column mode).
-f Dump all information, not just custom group names.
-w Column width for the displayed custom group names.
EXAMPLES
# /opt/cmu/bin/cmu_show_archived_custom_groups
job_allnodes
custom_group_1
custom_group_2
custom_group_3
# /opt/cmu/bin/cmu_show_archived_custom_groups -p -f
2;job_allnodes;2013-02-09 06:22:50;2013-02-09 06:31:25
4;custom_group_1;2013-02-11 15:40:10;2013-02-11 15:44:35
5;custom_group_2;2013-02-11 15:40:10;2013-02-11 15:44:35
6;custom_group_3;2013-02-11 15:42:20;2013-02-11 15:44:35
# /opt/cmu/bin/cmu_show_archived_custom_groups -f -s" "
2 job_allnodes 2013-02-09 06:22:50 2013-02-09 06:31:25
4 custom_group_1 2013-02-11 15:40:10 2013-02-11 15:44:35
5 custom_group_2 2013-02-11 15:40:10 2013-02-11 15:44:35
6 custom_group_3 2013-02-11 15:42:20 2013-02-11 15:44:35
The return code is 0 if the operation succeeds, or 1 if it fails.
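The semicolon-separated -f dump shown above is convenient for scripting. For example, to print just the custom group names (field 2) from sample lines copied from the output above:

```shell
# Parse "-f" dump records (id;name;timestamp;timestamp) and print the names.
awk -F';' '{ print $2 }' <<'EOF'
2;job_allnodes;2013-02-09 06:22:50;2013-02-09 06:31:25
4;custom_group_1;2013-02-11 15:40:10;2013-02-11 15:44:35
EOF
```

In normal use the awk filter is fed directly from cmu_show_archived_custom_groups -f rather than from a here-document.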

cmu_add_node(8)
NAME
cmu_add_node -- Add nodes to the Insight CMU database.
SYNOPSIS
# /opt/cmu/bin/cmu_add_node <-h | -s | -i | -f filename>
# /opt/cmu/bin/cmu_add_node -H|--hostname hostname -I|--ip ipaddress [-M|--mask netmask] [-A|--mac macaddress] [-L|--lg imagegroup] [-G|--mgt-ip mgtcardip] [-T|--mgt-card ILO|lo100i|ILOCM] [-R|--arch architecture] [-P|--platform platform] [-e|--serial-port serial_port] [-E|--serial-port-speed serial_port_speed] [-V|--vendor-args vendor_args] [-D|--cloning-block-device cloning_block_device] [-C|--cartridge num] [-N|--node-number num] [-B|--bios-boot-mode auto|pxe|uefi] [-Q|--cmumgt-node-ip ipaddress|default] [-g|--default-gw ipaddress|cmumgt|default] [-q|--iscsi-root iscsistring|none]
DESCRIPTION
Adds one or more nodes to the Insight CMU database. A node can be added interactively using the -i option or from the command line. Multiple nodes can be added using the -f filename option.
OPTIONS
-h|--help Show help.
-s|--help-file-syntax Show help for the file syntax.
-i|--interactive Specific node information is provided interactively.
-f|--filename inputfile Adds the nodes specified in inputfile to Insight CMU. To view the required file syntax, run /opt/cmu/bin/cmu_add_node with the -s option. Blank lines are not permitted in the file.
-H|--hostname hostname Compute node host name.
-I|--ip ipaddress Compute node IP address.
-M|--mask netmask Compute node netmask.
-A|--mac macaddress Compute node MAC address.
-L|--lg imagegroup Compute node image group.
-G|--mgt-ip mgtcardip Compute node management card IP address.
-T|--mgt-card ILO|lo100i|ILOCM Specifies the management card type: ILO, lo100i, or ILOCM.

-R|--arch architecture Node architecture (for example, "x86_64").
-P|--platform platform Node platform type (for example, "generic").
-e|--serial-port serial_port Node serial port.
-E|--serial-port-speed serial_port_speed Node serial port speed.
-V|--vendor-args vendor_args Node vendor args.
-D|--cloning-block-device cloning_block_device Node cloning block device.
-C|--cartridge num (ILOCM only) Cartridge number within the chassis.
-N|--node-number num (ILOCM only) Node number within the cartridge.
-B|--bios-boot-mode auto|pxe|uefi Indicates the node's boot mode: pxe, uefi, or auto. If auto is specified, Insight CMU determines the boot mode.
-Q|--cmumgt-node-ip ipaddress|default Indicates the IP address that this node should use to access the Insight CMU management node. If default is specified, the address defaults to the management node IP address.
-g|--default-gw ipaddress|cmumgt|default Indicates the IP address that this node should use as the default gateway. If cmumgt is specified, the value specified in -Q|--cmumgt-node-ip is used. If default is specified, Insight CMU sets the IP address to the gateway of the Insight CMU management node, if configured. Otherwise it uses the IP address of the management node.
-q|--iscsi-root iscsistring|none Indicates the iSCSI root boot string for this node. (Reserved for future iSCSI root support.)
EXAMPLES
Command-line mode:
# /opt/cmu/bin/cmu_add_node -H cn0006 -I 16.16.184.116 -M 255.255.254.0 -A 00-02-A5-52-EB-F8 -L default -G 192.168.0.1 -T ILO -R x86_64 -P generic
processing 1 node ...
Interactive mode: In interactive mode, you are prompted for node parameters:
# /opt/cmu/bin/cmu_add_node -i
hostname> n10
ip address> 10.10.184.116
netmask> 255.255.0.0
mac address []> 00-1C-C4-79-35-83
architecture [x86_64]>
platform [generic]>
serial port [default]>
serial port speed [default]>
vendor args [default]>
cloning block device [default]>
bios boot mode [auto]>
CMU server IP address [default]>

default gateway [default]>
ISCSI root [none]>
mgtcard> ILO
mgtcard ip address> 10.20.184.116
processing 1 node ...
Adding multiple nodes using a file: The input file must have one line per node.
#1 is the hostname
#2 is the IP address
#3 is the netmask
#4 is the ethernet mac address
#5 is the active image group (usually default)
#6 is the management IP address or 0.0.0.0 if None
#7 is the management card type (ex. ILO, lo100i, ILOCM, or None)
#8 is the architecture (ex. x86_64)
#9 is the cartridge/metadata number (ILOCM and user-provided BMCs only)
#10 is the node number (ILOCM and user-provided BMCs only)
#11 is the platform (ex. generic)
#12 is the serial port (ex. ttyS0)
#13 is the serial port speed
#14 is the vendor args. Enclose in double quotes if spaces or tabs are used
#15 is the cloning block device
#16 is the bios boot mode (auto|pxe|uefi)
#17 is the CMU server IP address as seen from the compute node (default|plain IP address)
#18 is the default gateway IP address (default|cmumgt|valid IP address)
#19 is the iscsi root device (none|ISCSI string)
# cat nodes.txt
cn0001 10.117.20.64 255.255.0.0 C8-CB-B8-CD-20-EA default 10.117.20.164 ILO x86_64 -1 -1 generic default default default default pxe default default none
cn0002 10.117.20.66 255.255.0.0 C8-CB-B8-CD-20-12 default 10.117.20.166 ILO x86_64 -1 -1 generic default default default default pxe default default none
cn0003 10.117.20.68 255.255.0.0 C8-CB-B8-CB-E0-EA default 10.117.20.168 ILO x86_64 -1 -1 generic default default default default pxe default default none
cn0004 10.117.20.70 255.255.0.0 C8-CB-B8-CD-20-6A default 10.117.20.170 ILO x86_64 -1 -1 generic default default default default pxe default default none
# /opt/cmu/bin/cmu_add_node -f nodes.txt
processing 4 nodes...
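For large clusters, the 19-field input file can be generated rather than written by hand. A minimal sketch; the hostnames, IP addresses, and MAC addresses below are illustrative only, and the numbering scheme must match your own site:

```shell
# Emit one 19-field cmu_add_node line per node: hostname, IP, netmask, MAC,
# image group, mgt IP, mgt card type, arch, cartridge, node number, platform,
# then serial/vendor/cloning defaults, boot mode, server IP, gateway, iscsi.
for i in 1 2 3 4; do
    printf 'cn%04d 10.117.20.%d 255.255.0.0 C8-CB-B8-CD-20-%02X default 10.117.20.%d ILO x86_64 -1 -1 generic default default default default pxe default default none\n' \
        "$i" $((62 + 2 * i)) $((0xE0 + i)) $((162 + 2 * i))
done > /tmp/nodes.txt

# Sanity check: every line must have exactly 19 fields.
awk 'NF != 19 { exit 1 }' /tmp/nodes.txt && echo "nodes.txt OK"
```

The generated file is then passed to cmu_add_node with -f, exactly as in the example above.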

cmu_add_network_group(8)
NAME
cmu_add_network_group -- Creates one or more empty network groups in Insight CMU.
SYNOPSIS
# /opt/cmu/bin/cmu_add_network_group <network group> [...]
# /opt/cmu/bin/cmu_add_network_group -f|--filename <filename>
# /opt/cmu/bin/cmu_add_network_group -h
DESCRIPTION
Creates one or more empty network groups in Insight CMU. Use cmu_change_network_group to add nodes to a network group. Network groups are used by Insight CMU to define the topology of the network connecting the compute nodes. All nodes attached to the same leaf switch are typically added to the same network group. This allows Insight CMU to efficiently manage node imaging and monitoring.
OPTIONS
-h Display usage help.
-f|--filename <filename> A file containing a list of network groups to create. The names must be one per line.
<network group> [...] A white-space separated list of network groups to create. Cannot be combined with the -f option.
EXAMPLES
To create a new network group named rack5:
# cmu_add_network_group rack5
To create the network groups rack1, rack2, rack3, and rack4:
# cat net-groups.txt
rack1
rack2
rack3
rack4
# /opt/cmu/bin/cmu_add_network_group -f net-groups.txt

cmu_add_image_group(8)
NAME
cmu_add_image_group -- Add a new image group to Insight CMU.
SYNOPSIS
# /opt/cmu/bin/cmu_add_image_group -n <image group> -d <device>
# /opt/cmu/bin/cmu_add_image_group -n <image group> -d diskless <--toolkit> -k <kernel>|CURRENT -l <...>
# /opt/cmu/bin/cmu_add_image_group -n <image group> -d autoinstall -r <...> -t <...>