Compute Node OS for the XT3
Jim Harrell
Cray Inc.

ABSTRACT: Cray is working on different kernels for the compute node operating system. This talk will describe the rationale, requirements, and progress.

KEYWORDS: Light Weight Kernel, Linux, XT

1. Introduction

The talk upon which this paper is based was designed to discuss the purpose and issues around a small project working on common Compute Node Operating Systems, CNOS, for several Cray XT products. The goal is to apprise the audience of the status of the work and next steps, but also to ensure that there is an understanding of the complexities that this project faces. For the purpose of this discussion we will use the CNOS term for the kernel that runs on a compute node. There are a variety of other terms, like Light Weight Kernel (LWK), that would be equally useful, but rather than use multiple terms in a single discussion we will use just the CNOS term to avoid confusion. This talk was broken into a number of sections. We will provide an overview of each section.

The basis for a discussion about Compute Node Operating Systems, CNOS, is the software architecture of MPP-style systems. The first segment of this paper will review the architecture to re-establish the importance of the CNOS in the system and help to define the functionality that is required of the CNOS.

The next section will describe a rationale and set of requirements for a CNOS. This project was started because there is a set of new hardware requirements in the future that will be difficult to meet with the current CNOS. There is also a set of software features that are requested to provide a broader scope of applications for XT. These software features are considered difficult to add to the current CNOS. Together these new requirements started a review of options for a CNOS in the near future.

A review of the current CNOS, Catamount, provides the basis for judging alternatives. Catamount is well understood and has set the standard for what a CNOS can be expected to do.

The current work on an alternative CNOS has focused on a Linux kernel. This has caused a lot of discussion, both positive and negative. The use of a Linux kernel for this project is not to be construed as an intention to run a full Linux on compute nodes. The Linux kernel is modified to only support application processes. This differs sharply from what a Linux user would expect on a standard server. In general, talking about CNOS and Linux appears to be less productive than talking about a different CNOS without explaining its derivation. We may actually pursue this approach in the future as it helps focus on the purpose of the CNOS.

There are some other alternative CNOSs that could be used. Several are either under development or currently more prototypes than a kernel Cray could use in a production environment. However, there are good reasons to track the research and development in this area. Promising developments might be useful in the future and should not be ignored. We will review a few of these CNOS possibilities.

Finally, we will discuss the current status of the project and the next steps that are being pursued. The project does have some commitments and these will be discussed.

CUG 2006 Proceedings 1 of 8

2. A Review of XT System Architecture

The basic organization of XT is very familiar to anyone who has worked on an MPP in the past decades. An MPP is a form of distributed system where the operating system services are provided from a set of nodes that have complete systems installed, including the servers for all the distributed services. These nodes are called Service Nodes in the XT Architecture. They are specialized to provide specific functions and to scale specific functions, such as the number of login nodes to scale the access for individual users or the number of I/O nodes to scale the access and quantity of disks for the system. The Compute Nodes, where all the user application processes are run, have a minimal kernel that provides a small set of functions and uses the Service Nodes for all the application processes' service requests, such as I/O.

3. Compute Node Definition

A definition of a Compute Node is often based on describing what the node does not do as opposed to what the node does. This is because a Compute Node is designed to provide support only for an application process and not for all the services that a normal operating system might provide. For instance, a Compute Node will support virtual memory because all processors today implement memory support using virtual memory techniques. However, Compute Node Operating Systems will not support Demand Paging because of the cost of the I/O and the performance a distributed application loses to the time involved in managing the demand paging.

A Compute Node has the following characteristics:

• Provides support for the machine dependent aspects of the node

• Supports the High Speed Network (HSN) connection – HSN is "allocated" to application usage

• Provides support for application processes allocated to the node

• Provides limited set of system call functions – most system calls forwarded to Service Nodes

• Small memory footprint and CPU utilization (when not servicing requests)

Overall, a Compute Node Operating System takes little of the processor time available and performs the minimum set of functions required to support the application process(es) assigned to the node. All of this ensures the forward progress of a distributed application, which is the purpose of the system.

4. Rationale for a Different Compute Node Operating System

There are three basic components of the rationale for a different CNOS: changes in processor/node technology, new features and functions required in the software, and the ability to leverage a different CNOS between multiple Cray products.

There are changes in socket technology that are adding more and more cores per socket. The ability to take advantage of these additional cores does factor into a decision about a new CNOS.

There are a number of applications that would benefit from some additional CNOS functionality. These include networking access, support for new or different file systems, support for Global Arrays, and support for OpenMP. Support for OpenMP is obviously related to the number of cores available in a socket. OpenMP support would be limited to within a node.

Cray has other products that use Compute Node style Operating Systems. It would be a benefit to Cray to be able to share the development of a CNOS across several products. It would also help move Cray software towards the longer-term goal of Adaptive SuperComputing if the CNOS could be operated and managed in the same way across products.

5. Cray Requirements for a New CNOS

The requirements for a new CNOS are based on the current CNOS, Catamount. Catamount has been successful at setting the standard by which a new CNOS will be judged. The areas Catamount has most influenced are boot time, application start time, and I/O performance. Boot time is an area of influence because Catamount can boot on several thousand nodes in seconds. The second major area of influence is application start time, because the start-up time of an application is also a matter of small numbers of seconds for thousands of nodes. The area of I/O performance is a bar set by Catamount that any new CNOS must clear. Catamount support for I/O is sufficient for most applications and scales well. Catamount has also had influence on the number of nodes a CNOS can be expected to support. Because Catamount supports more than 10 thousand nodes, the expectation is that a compute node operating system should be able to support 30 thousand nodes.

There are features that are required or desired by customers for their applications that add to the capabilities now supplied by Catamount. These are support for sockets or networking, support for SHMEM, UPC, CAF, and Global Arrays, signal support, and, as already mentioned, support for OpenMP. Each of these additional features brings some issues as well. Support for networking means that each Compute Node will have to have an IP address if the implementation adds a network stack to each compute node.

6. Catamount Review

The Catamount kernel is currently used on all XT3 systems. The basic structure of Catamount is similar to many operating systems. There is a low level machine dependent kernel which supports the hardware and system call functionality. There is a set of libraries and support services. In the case of Catamount the support services are reduced to just the Process Control Thread (PCT), which provides support for the application process. The following diagram shows the structure of Catamount.

[Diagram: structure of Catamount]

Catamount's memory footprint is minimal. Much of the space not left for user process(es) is buffer space for Portals. The following graph shows a 4GB node and the Catamount kernel with all components broken down to show space used. Obviously the majority of the space is left to the user process(es).

Catamount Memory Usage

  Size   Component
  1      page tables
  1.3    QK text and data
  1.7    alignment pool (page tables)
  1      I/O mapping
  10     Page tables (4GB node)
  10     Network buffer
  64     Shared Memory and Event Queues
  1      Alignment space
  2      PCT code and data
  2      Alignment space
  2      PCT Stack
  16     PCT Buffers
  3      minimum program and runtime library
  3885   User memory

Units are not given in the chart; the entries sum to 4000, which suggests MB.

7. Catamount Positives

Catamount obviously has positive attributes.
