Compute Node Linux
Redpaper

IBM System Blue Gene Solution: Compute Node Linux

Brant Knudson
Jeff Chauvin
Jeffrey Lien
Mark Megerian
Andrew Tauferner

Overview

This IBM® Redpaper publication describes the use of compute node Linux® on the IBM System Blue Gene® Solution. Readers of this paper need to be familiar with general Blue Gene/P™ system administration and application development concepts.

The normal boot process for a Blue Gene/P partition loads Linux on the I/O nodes and loads the Blue Gene/P Compute Node Kernel (CNK) on the compute nodes. This standard configuration provides the best performance and reliability for running applications on Blue Gene/P. The lightweight CNK provides a subset of system calls and has tight control over the threads and processes that run on the node. Thus, the CNK introduces very little interference with the applications that run on the compute nodes.

Blue Gene/P release V1R3 provides compute node Linux, a new feature that allows users to run Linux on the compute nodes. This feature provides a new means for research and experimentation by allowing all of the compute nodes in a Blue Gene/P partition to operate independently with a full Linux kernel. While this environment is not optimal for running high-performance applications or highly parallel applications that communicate using the Message Passing Interface (MPI), you might want Blue Gene/P to act like a large cluster of nodes that are each running Linux. The term High-Throughput Computing (HTC) describes applications that run in this environment. HTC was introduced as a software feature in V1R2 of Blue Gene/P. Using HTC allows all nodes to act independently, each capable of running a different job.

The new compute node Linux feature provides benefits that can make it easier to develop or port applications to Blue Gene/P. The biggest benefit that application developers might notice when running Linux on the compute nodes is that there is no restriction on the number of threads that run on a node. There is also no restriction on the system calls, because the compute nodes run Linux rather than the CNK. This feature opens Blue Gene/P to potential new applications that can take advantage of the full Linux kernel.

Because this feature is enabled for research and is not considered a core feature of the Blue Gene/P software stack, it has not been tested under an exhaustive set of circumstances. IBM has performed extensive testing of the compute node Linux functionality that is within the core Blue Gene/P software stack. However, because having a completely functional Linux kernel on the compute nodes allows many new types of applications to run on the compute nodes, we cannot claim to have tested every scenario. To be specific, IBM has not formally tested this environment with external software such as IBM General Parallel File System (GPFS™), XL compilers, Engineering Scientific Subroutine Library (ESSL), or the HPC Toolkit.

Because of the experimental nature of compute node Linux, it is disabled by default. You cannot boot a partition to run Linux on the compute nodes until you receive an activation key from IBM. To request an activation key, contact your assigned Blue Gene/P Technical Advocate. The Technical Advocate can help determine whether you qualify for the key and, upon qualification, can assist with the additional agreements that you need in place.
Access to this function is not restricted, and there is no fee, so each customer can contact IBM individually for a key.

This paper discusses the following topics:

- How compute node Linux works
- System administration
- Using compute node Linux
- Application development
- Job scheduler interfaces
- Performance results

How compute node Linux works

The same Linux image that is used on the I/O nodes is loaded onto the compute nodes when a block is booted in compute node Linux mode. This kernel is a 32-bit PPC SMP kernel at version 2.6.16.46 (later releases of Blue Gene/P might have a newer version of the kernel). Other than a runtime test that determines the node type in the network device driver that uses the collective hardware, the kernel behaves identically on I/O nodes and compute nodes. The same init scripts execute on both the compute nodes and I/O nodes when the node is booted or shut down.

To indicate to the Midplane Management Control System (MMCS) that the same kernel is running on the compute nodes as on the I/O nodes, the partition’s compute node images must be set to the Linux images. The Linux boot images are the cns, linux, and ramdisk files in /bgsys/drivers/ppcfloor/boot/. When booting the partition, you also need to tell MMCS to handle the boot differently than it does with CNK images. You indicate to MMCS to use compute node Linux by setting the partition’s options field to l (lower-case L). You can make both of these changes from either the MMCS console or the job scheduler interface, as we describe in “Job scheduler interfaces” on page 14.

When you boot a partition to use compute node Linux, the compute nodes and the I/O node in each pset have an IP interface on the collective network through which they can communicate. Because the compute nodes are each assigned a unique IP address, each can be addressed individually by hosts outside of the Blue Gene/P system. The number of IP addresses required can be very large, given that each rack of Blue Gene/P hardware contains between 1032 and 1088 nodes. To accommodate a high number of addresses, a private class A network is recommended, although a smaller network might suffice depending on the size of your Blue Gene/P system. When setting up the IP addresses, the compute nodes and I/O nodes must be on the same subnet.

Unlike I/O nodes, the compute nodes have physical access only to the collective network. To provide compute nodes with access to the functional network, the I/O nodes use proxy ARP to intercept IP packets that are destined for the compute nodes. Every I/O node acts as a proxy for the compute nodes in its pset, replying to ARP requests that are destined for compute nodes. This response automatically establishes proper routing to the pset for every compute node. With proxy ARP, both IP interfaces on the I/O nodes have the same IP address. The I/O node establishes a route for every compute node in its pset at boot time.

Compute node Linux uses the Blue Gene/P HTC mode when jobs run on a partition. Using HTC allows all nodes to act independently, where each node can potentially run a different executable. HTC is often contrasted with the long-accepted term High Performance Computing (HPC). HPC refers to booting partitions that run a single program on all of the compute nodes, primarily using MPI to communicate to do work in parallel.
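As an illustration of the proxy ARP behavior described above, a host on the functional network can address a compute node directly after the block is booted. The following sketch is illustrative only; it assumes a hypothetical compute node address of 172.17.100.65 and a front-end node on the same functional-network subnet as the I/O nodes:

$ ping -c 1 172.17.100.65     # the compute node answers, even though it has no functional network adapter
$ arp -n 172.17.100.65        # the hardware address listed is that of the I/O node serving the pset

The compute node itself never sees the ARP request; its I/O node answers on the compute node’s behalf and forwards the IP traffic over the collective network.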
Because the compute nodes are accessible from the functional network after the block is booted, and because each compute node runs the services that typically run on a Linux node, a user can ssh to a compute node and run programs from the command prompt. Users can also run jobs on the compute nodes using Blue Gene/P’s HTC infrastructure through the submit command.

System administration

This section describes the installation and configuration tasks for compute node Linux that system administrators must perform and the interfaces that are provided for these tasks.

Installation

To install compute node Linux, you (as the system administrator) must enter the activation key into the Blue Gene/P database properties file. The instructions that explain how to enter the key are provided with the activation key, which is available by contacting your Blue Gene Technical Advocate. If the database properties file does not contain a valid activation key, then blocks fail to boot in Linux mode with the following error message:

boot_block: linux on compute nodes not enabled

You also need to set the IP addresses of the compute nodes using the Blue Gene/P database populate script, commonly referred to as DB populate. If the IP addresses are not set, blocks fail to boot in Linux mode.

When setting the compute node IP addresses, you need to tell DB populate the IP address at which to start. You can calculate this value from the IP address of the last I/O node by adding 1 to both the second and fourth octets of that address (where the first octet is on the left). To get the IP address of the last I/O node, run the following commands on the service node:

$ . ~bgpsysdb/sqllib/db2profile
$ db2 connect to bgdb0 user bgpsysdb
(type in the password for the bgpsysdb user)
$ db2 "select ipaddress from bgpnode where location = (select max(location) from bgpnode where isionode = 'T')"

The output of the query looks similar to the following example:

IPADDRESS
--------------
172.16.100.64

So, if you add 1 to the second and fourth octets of this IP address, you have an IP address for DB populate of 172.17.100.65.

After you have the IP address, invoke the DB populate script using the following commands:

$ . ~bgpsysdb/sqllib/db2profile
$ cd /bgsys/drivers/ppcfloor/schema
$ ./dbPopulate.pl --input=BGP_config.txt --size=<size> --dbproperties /bgsys/local/etc/db.properties --cniponly --proceed --ipaddress <ipaddress>

In this command, replace <size> with the dimensions of your system in racks as <columns>x<rows>, and replace <ipaddress> with the IP address that you just calculated.
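As a quick check of the octet arithmetic described above, the following shell sketch (assuming a bash shell and reusing the example query output) computes the starting compute node IP address:

$ ionode_ip=172.16.100.64                      # last I/O node address from the query above
$ IFS=. read -r o1 o2 o3 o4 <<< "$ionode_ip"   # split the address into its four octets
$ echo "${o1}.$((o2 + 1)).${o3}.$((o4 + 1))"   # add 1 to the second and fourth octets
172.17.100.65

With that value, a hypothetical one-rack system (--size=1x1, used here only for illustration) would be populated with a command similar to the following:

$ ./dbPopulate.pl --input=BGP_config.txt --size=1x1 --dbproperties /bgsys/local/etc/db.properties --cniponly --proceed --ipaddress 172.17.100.65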