Sun™ HPC Software, Linux Edition 2.0.1

Release Notes

Sun Microsystems, Inc. www.sun.com Part No. 821-0376-10 July 2009

Copyright © 2009 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, California 95054, U.S.A. All rights reserved.

U.S. Government Rights - Commercial software. Government users are subject to the Sun Microsystems, Inc. standard license agreement and applicable provisions of the FAR and its supplements.

This distribution may include materials developed by third parties.

Sun, Sun Microsystems, and the Sun logo are trademarks or registered trademarks of Sun Microsystems, Inc. in the U.S. and other countries.

Products covered by and information contained in this service manual are controlled by U.S. Export Control laws and may be subject to the export or import laws in other countries. Nuclear, missile, chemical biological weapons or nuclear maritime end uses or end users, whether direct or indirect, are strictly prohibited. Export or reexport to countries subject to U.S. embargo or to entities identified on U.S. export exclusion lists, including, but not limited to, the denied persons and specially designated nationals lists is strictly prohibited.

DOCUMENTATION IS PROVIDED "AS IS" AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND WARRANTIES, INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT, ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID.

This product includes source code for the Berkeley Database, a product of Sleepycat Software, Inc. Your development of software that uses the Berkeley Database application programming interfaces is subject to additional licensing conditions and restrictions imposed by Sleepycat Software Inc.

Table of Contents

Overview
New in Release 2.0.1
New in Release 2.0
List of Contents
Tested Platforms
User Documentation
Support
Additional Information
Notes and Open Issues
Resolved Issues

Overview

Sun HPC Software, Linux Edition (“Sun HPC Software”), is an integrated open-source software solution for Linux-based HPC clusters running on Sun hardware. It provides a framework of software components to simplify the process of deploying and managing large-scale Linux HPC clusters. For more information, visit the Sun HPC Software, Linux Edition product page at www.sun.com/software/products/hpcsoftware.

New in Release 2.0.1

Package upgrades. The following packages have been upgraded in this release:

• perfctr 2.6.39

• ClusterTools 8.2

• OFED 1.4.1

• Sun Grid Engine 6.2u3

• PowerMan 2.3.5

New in Release 2.0

Support for SLES 10 SP2. SUSE Linux Enterprise Server (SLES) is now supported for use on the head node and for provisioning to the nodes in the cluster.

Sun Grid Engine available. Sun Grid Engine (SGE) is an open source batch-queuing system developed and supported by Sun Microsystems. SGE 6.2-2 is available on the Sun HPC Software ISO and through the Sun HPC Software repository.

Sun ClusterTools integration. Sun HPC ClusterTools 8.1 software is an integrated toolkit that allows developers to create and tune Message Passing Interface (MPI) applications that run on high-performance clusters and SMPs. The ClusterTools libraries have been pre-compiled for most of the commonly used compilers and are included in the Sun HPC Software distribution.

Sun Studio support. Sun Studio 12u1 software delivers high-performance C, C++, and Fortran compilers for Linux and other operating systems, including support for multi-core architectures. All MPI libraries included with the Sun HPC Software have been pre-compiled with the Sun Studio compilers for customer deployment convenience. The Sun Studio installation file is included in the Sun HPC Software ISO image, but it is not available through the Sun HPC Software online repository.

MPI libraries pre-built with Pathscale and Sun Studio x86 compilers. All MPI libraries included in Sun HPC Software have been pre-built with Pathscale and Sun Studio x86 compilers for those customers who use the Pathscale and Sun Studio products.

Sun HPC Software Management Database for easier configuration and management. A cluster management database and related tools have been added in this release. This database can be used to hold cluster configuration information. Tools have been developed to generate configuration files automatically for several key components in the Sun HPC Software. These include configuration files for Cfengine, Cobbler, ConMan, Genders, ntp, PowerMan, SLURM, and hosts files. More configuration files will be added in future releases.
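As an illustration of how several configuration files can be derived from a single set of cluster records, the following Python sketch renders hosts-file and Genders-style entries from an in-memory node list. It is not the gtdb tooling or its schema: the Node class, the function names, the node names, and the IP addresses are all hypothetical, and the Genders layout (node name followed by comma-separated attributes) is kept to its simplest form.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One cluster node record (hypothetical schema, not the gtdb schema)."""
    name: str
    ip: str
    # Genders-style attributes; a value of None means a bare flag.
    attrs: dict = field(default_factory=dict)


def render_hosts(nodes):
    """Render hosts-file lines of the form '<ip><TAB><hostname>'."""
    ordered = sorted(nodes, key=lambda n: n.name)
    return "\n".join(f"{n.ip}\t{n.name}" for n in ordered) + "\n"


def render_genders(nodes):
    """Render Genders-style lines: '<node> attr1,attr2=value'."""
    lines = []
    for n in sorted(nodes, key=lambda n: n.name):
        parts = [k if v is None else f"{k}={v}" for k, v in sorted(n.attrs.items())]
        lines.append(f"{n.name} {','.join(parts)}" if parts else n.name)
    return "\n".join(lines) + "\n"


# Example records (made-up names and addresses).
nodes = [
    Node("mgmt1", "192.168.202.214", {"mgmt": None, "role": "head"}),
    Node("cl10-0", "192.168.202.10", {"compute": None, "ib": None}),
]
hosts_text = render_hosts(nodes)
genders_text = render_genders(nodes)
print(hosts_text, end="")
print(genders_text, end="")
```

Keeping the records in one place and rendering each file from them is the design point: adding a node changes one record rather than several hand-edited files.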

Simplified installation and setup. Two new scripts, sunhpc_install and sunhpc_setup, have been added to eliminate the mundane aspects of deploying and configuring the management (head) node. These scripts can be used to prepare the cluster for both diskful and diskless compute node deployments.

Package upgrades. Many of the base packages included in the Sun HPC Software have been updated to the latest version. See the “List of Contents” in the next section for the specific version of each package included.

Lustre 1.8.0.1. The latest Lustre release has been included. This version of Lustre has added OST Pools support, server-side read caches, version-based recovery, and many performance improvements.

List of Contents

The Sun HPC Software, Linux Edition 2.0.1 release has been tested on Linux-based HPC clusters with a head node running RHEL 5.3, SLES 10 SP2, and CentOS 5.3.

Group                          Name                                                  Version
Operating Systems and Kernels  Lustre                                                1.8.0.1
                               perfctr                                               2.6.39
User Space and Library         Environment Switcher                                  1.0.13
                               fakeroot                                              1.6.4
                               genders                                               1.11
                               git                                                   1.6.1.3
                               Heartbeat                                             2.1.4-2.1
                               Sun HPC Software, Linux Management Database ("gtdb")  0.0.7
                               Modules                                               3.2.6
                               MVAPICH for gcc, PGI and                              1.1
                               MVAPICH2 for gcc, PGI and Intel compiler              1.2p1
                               OFED                                                  1.4.1
                               RRDtool                                               1.2.30
                               Sun ClusterTools                                      8.2
                               Sun Studio                                            12u1
Verification Modules           HPCC Bench Suite                                      1.3.1
                               IOKit (Lustre)                                        -
                               IOR                                                   2.10.2
                               LNET self test (Lustre)                               -
                               NetPIPE                                               3.7.1
Schedulers                     Slurm/MUNGE                                           1.3.13/0.5.8
                               Sun Grid Engine                                       6.2u3
Monitoring                     Ganglia                                               3.1.1
                               Nagios                                                3.1.0
Provisioning                   OneSIS                                                2.0.1
                               Cobbler                                               1.4.1
Management                     CFEngine                                              2.2.9
                               ConMan                                                0.2.3
                               FreeIPMI                                              0.7.5
                               IPMItool                                              1.8.10
                               lshw                                                  B.02.14
                               Mellanox Firmware Tools                               2.5.0
                               OpenSM                                                3.1.11
                               pdsh                                                  2.18
                               PowerMan                                              2.3.5

Compatible Tools

The table below lists software tools that will work with the Sun HPC Software, Linux Edition 2.0. These tools are not included in the software stack but can be used as alternatives to the tools provided in this release of the Sun HPC Software.

Group                          Name                Version
Operating Systems and Kernels  -                   -
User Space Library             Allinea DDT         2.2.1
                               Intel Compiler      10.1
                               PathScale Compiler  3.2
                               PGI Compiler        7.1-4
                               TotalView           8.6
Verification Modules           -                   -
Schedulers                     LSF                 -
                               MOAB                -
                               PBS                 -
Management                     IBSRM               -
                               xVM Ops Center      2.1

Tested Platforms

The Sun HPC Software, Linux Edition 2.0.x has been tested on the latest Sun high-performance platforms, including:

• Sun Fire V20z Server: www.sun.com/servers/entry/v20z/

• Sun Fire V40z Server: www.sun.com/servers/entry/v40z/

• Sun Fire X4150 Server: www.sun.com/servers/x64/x4150/

• Sun Fire X4200 M2 Server: www.sun.com/servers/entry/x4200/

• Sun Fire X4250 Server: www.sun.com/servers/x64/x4250/

• Sun Fire X4270 Server: www.sun.com/servers/x64/x4270/

• Sun Fire X4440 Server: www.sun.com/servers/x64/x4440/

• Sun Fire X4540 Storage Server: www.sun.com/servers/x64/x4540/

• Sun Fire X4600 Server: www.sun.com/servers/x64/x4600/

• Sun Storage J4400 Array: www.sun.com/storage/disk_systems/expansion/4400/

• Sun Storage J4500 Array: www.sun.com/storage/disk_systems/expansion/4500/

• Sun Blade 8000 Modular System: www.sun.com/servers/blades/8000/

• Sun Blade X6220 Server Module: www.sun.com/servers/blades/x6220/

• Sun Blade X6250 Server Module: www.sun.com/servers/blades/x6250/

• Sun Blade X6275 Server Module: www.sun.com/servers/blades/x6275/

• Sun Blade X6440 Server Module: www.sun.com/servers/blades/x6440/

• Sun Blade X6450 Server Module: www.sun.com/servers/blades/x6450

User Documentation

For installation instructions, see the Sun HPC Software, Linux Edition 2.0.1: Deployment and User Guide in the /docs directory. For updated versions of the Deployment and User Guide, go to http://docs.sun.com/app/docs/prod/hpc.grid#hic and select "Sun HPC Software".

Support

Standard Service is now available for the Sun HPC Software, Linux Edition 2.0.x product. This service is designed specifically to support several open source communication, management, and test tools and schedulers that are part of the Sun HPC Software. HPC customers can download the software and then contract with Sun Services for standard support for these products. Please contact your Sun representative for details and pricing for this service.

Community-based support through mailing lists and a defect tracking system is also still available. The mailing lists and defects are monitored by development teams within Sun to facilitate problem resolution and provide answers to product questions.

To subscribe to the community mailing list, go to http://lists.lustre.org/mailman/listinfo and complete the subscription form.

If you believe an issue is a defect in the Sun HPC Software product, go to https://bugzilla.lustre.org/ and enter a new defect against the Linux Software Stack. This website requires that you create a user account to enter a new defect or modify an existing defect.

Additional Information

Visit the following sites for more information about these Sun products.

• Sun Grid Engine: www.sun.com/software/gridware/index.xml

• Sun Studio 12: http://developers.sun.com/sunstudio/

• Sun HPC ClusterTools 8: www.sun.com/software/products/clustertools/index.xml

• Sun xVM Ops Center: www.sun.com/software/products/xvmopscenter/index.jsp

Notes and Open Issues

The following issues have been deferred until a future release of the Sun HPC Software. For details, enter the bug number shown below in https://bugzilla.lustre.org/.

• 17462: Disable NCQ and write cache for disks used for SW RAID

• 18461: Inclusion of syslog-ng package

• 18583: Add dstat to hpc stack

• 18952: sunhpc_setup needs to take ib interfaces from cluster database and automatically configure interfaces

• 19049: Add support of InfiniBand interface on gtdb

• 19050: Support IB and Ethernet on different subnets

• 19330: Add nagios support to gtdb

• 19463: Consider improving stacks interaction with iptables

• 19468: SLES diskless client failed to boot over IB

• 19619: PowerMan query does not appear to work on SLES Sun 4600 box

• 19632: Format of conman.pswd is unclear

• 19637: ConMan doesn't work on older nodes

• 20185: CT8.2 renaming

• 20196: oneSIS warning if apparmor is not installed

• 20250: CentOS 5.3 diskless provisioned clients do not have correct time

Bug 500965: Anaconda does not start network properly when IP was set with dhcp and hostname was set with a static name

Boot over InfiniBand. There are known issues with the gPXE implementation used in the Boot over IB (BoIB) solution. gPXE-over-IB is not robust enough on the Sun Blade X6275 (Vayu) hardware to boot groups of more than 50–100 nodes at the same time. The retry functionality also fails frequently, which exacerbates the original issue. This is a Mellanox issue and can be reproduced at a Sun customer facility. At the time this document was written, Mellanox was attempting to reproduce the issue in its lab.

SLES error when apparmor not installed. If apparmor, which should be installed by default, is not installed on the SLES headnode, oneSIS may generate the following error:

    Converting sles10sp2-onesis-lustre to oneSIS rootfs image ...
    oneSIS: Warning(1)! Path doesn't exist: /var/lib/oneSIS/image/sles10sp2-onesis-lustre/etc/apparmor
    * * * Error preparing ramdisk element: /etc/apparmor

Tests have shown that this error has no performance impact. An alternate approach to configuring the SLES image is under development and planned for a future release of the Sun HPC Software, Linux Edition.

Username/password needed to access Sun ILOM Service Processor. To access Sun ILOM Service Processors, the correct SP username/password must be added to /etc/ipmipower.conf after the Sun HPC Software has been installed. A valid set of credentials in this file enables ConMan and PowerMan to work correctly. Sample contents of ipmipower.conf are shown below:

    #hostname yournode0[2-3]-ilom    # client's hostname
    #workaround-flags "endianseq"    # work-around flags for ILOM
    #on-if-off enable                #
    #wait-until-on enable            #
    #wait-until-off enable           #
    username root                    # ILOM's username
    password changeme                # ILOM's password
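Before relying on ConMan or PowerMan, it can be useful to confirm that the credential lines are present and uncommented. The Python sketch below performs that check on text laid out like the sample above. The function names are hypothetical, and the parsing rule (everything after a # is a comment) is a simplifying assumption that would mishandle a password containing a # character.

```python
def active_settings(conf_text):
    """Return {key: value} for uncommented lines, dropping trailing '#' comments."""
    settings = {}
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # assumption: '#' always starts a comment
        if not line:
            continue
        parts = line.split(None, 1)          # key, then the rest of the line
        settings[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return settings


def missing_credentials(conf_text):
    """List which of 'username' and 'password' are absent or empty."""
    settings = active_settings(conf_text)
    return [key for key in ("username", "password") if not settings.get(key)]


sample = """\
#hostname yournode0[2-3]-ilom # client's hostname
#workaround-flags "endianseq" # work-around flags for ILOM
username root # ILOM's username
password changeme # ILOM's password
"""
print(missing_credentials(sample))
```

To check a real file, the same function could be given the text read from /etc/ipmipower.conf; an empty list means both credential lines are active.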

Manual setup required for ntpd for diskless clients. Automatic setup of ntpd for diskless clients is not currently provided, so the time on a diskless client may be incorrect after it boots. To complete a manual setup, copy /var/lib/sunhpc/cfengine/ntpd.conf into the diskless image directory /var/lib/oneSIS/image/. If the file does not define a server, define one that the clients can reach when booting to get the correct time. For more information, see the man pages for ntp and ntp.conf.
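The manual copy step can be scripted. The following Python sketch copies an NTP configuration file into an image directory and reports whether an uncommented server line is present. The destination path inside the image (etc/ntp.conf), the function names, the image name, and the example address are assumptions rather than values taken from the Sun HPC Software. The demonstration runs in a temporary directory, so it touches no real image.

```python
import shutil
import tempfile
from pathlib import Path


def has_server(conf_text):
    """True if any uncommented line is a 'server <address>' entry."""
    for raw in conf_text.splitlines():
        fields = raw.split("#", 1)[0].split()
        if len(fields) > 1 and fields[0] == "server":
            return True
    return False


def install_ntp_conf(src, image_dir, rel_dest="etc/ntp.conf"):
    """Copy src to image_dir/rel_dest (assumed location); report if a server is set."""
    dest = Path(image_dir) / rel_dest
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dest)
    return dest, has_server(dest.read_text())


# Demonstration with throwaway paths standing in for the real ones.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "ntpd.conf"
    src.write_text("# generated file\nserver 192.168.202.214\n")
    image_dir = Path(tmp) / "image" / "example-image"  # hypothetical image name
    dest, server_defined = install_ntp_conf(src, image_dir)
    demo_result = (dest.relative_to(image_dir).as_posix(), server_defined)
print(demo_result)
```

If the second value is False, a server entry still needs to be added to the copied file before the clients are booted.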

Resolved Issues

The following bugs were fixed in Release 2.0. For details, enter the bug number shown below in https://bugzilla.lustre.org/.

• 17012: OpenSM doesn't work with chkconfig

• 17013: Defaults for ipoib interface setup aren't ideal

• 18163: Please provide a checksum of some sort for the downloadable ISO

• 18299: Need openmpi 1.3.1 to have mpi use deadlock-free routes

• 18322: Sun HPC Software, Linux Edition Installation Fails

• 18330: Add a comment when autogenerating pxelinux config

• 18384: Support more Ethernet driver in onesis_setup

• 18407: Provisioning scalability investigation for cobbler

• 18523: Need to explain the procedure when there is no RHN license

• 18527: PowerMan does not report status correctly

• 18529: open-iscsi RPM cannot be removed

• 18642: SLURM built with support

• 19648: yum search is broken on SLES

• 20176: oneSIS upgrades clobber sysimage.conf

• 20182: load ClusterTools_gnu module by default

• 20197: add support for RHEL/CentOS kernel 2.6.18-128.2.1.el5

Sun Microsystems, Inc. 4150 Network Circle, Santa Clara, CA 95054 USA * Phone: 1-650-960-1300 or 1-800-555-9SUN (9786) Web: sun.com