Making Applications Mobile Under Linux

Making Applications Mobile Under Linux Cédric Le Goater, Daniel Lezcano, Clément Calmels IBM France {clg, dlezcano, clement.calmels}@fr.ibm.com Dave Hansen, Serge E. Hallyn Hubertus Franke IBM Linux Technology Center IBM T.J. Watson Research Center {haveblue, serue}@us.ibm.com [email protected] Abstract a failure, a system upgrade, or a need to rede- ploy hardware resources. This ability to checkpoint context is most common in what is re- Application mobility has been an operating sys- ferred to as the High Performance Computing tem research topic for many years. Many (HPC) environment, which is often composed approaches have been tried and solutions are of large numbers of computers working on a found across the industry. However, perfor- distributed, long running computation. The ap- mance remains the main issue and all the ef- plications often run for days or weeks at a time, forts are now focused on performant solutions. some even as long as a year. In this paper, we will discuss a prototype which minimizes the overhead at runtime and the Even outside the HPC arena, there are many amount of application state. We will examine applications which have long start up times, constraints and requirements to enhance perfor- long periods of processing configuration files, mance. Finally, we will discuss features and en- pre-computing information and so on. Histor- hancements in the Linux kernel needed to im- ically, emacs was built with a script which in- plement migration of applications. cluded undump—the ability to checkpoint the full state of emacs into a binary which could then be started much more quickly. Some enter- 1 Introduction and Motivation prise class applications have thirty minute start up times, and those applications continue to build complex context as they continue to run. Applications increasingly run for longer periods of time and build more context over time as Increasingly we as users tend to expect that our well. Recovering that context can be time con- applications will perform quickly, start quickly, suming, depending on the application, and usu- re-start quickly on a failure, and be always ally requires that the application be re-run from available. However, we also expect to be able the beginning to reconstruct its context. A few to upgrade our operating system, apply secu- applications now provide the ability to check- rity fixes, add components, memory, some- point their data or context to a file, enabling that times even processing power without losing all application to be restarted later in the case of of the context that our applications have ac- 348 • Making Applications Mobile Under Linux quired. Application virtualization is a means of ab- stracting, or virtualizing, the software resources This paper discusses a generic mechanism for of the system. These include such things as pro- saving the state of an application at any point, cess id’s, IPC ids, network connections, mem- with the ability to later restart that applica- ory mappings, etc. It is also a means to contain tion exactly where it left off. This ability to and isolate resources required by the applica- save status and restart an application is typi- tion to enable its mobility. Compared to the cally referred to as checkpoint/restart, abbre- virtual machine approach, application virtual- viated throughout as CPR. This paper focuses ization approach minimizes the state of the ap- on the key areas for allowing applications to plication to be transferred and also allows for a be virtualized, simplifying the ability to check- higher degree of resource sharing between ap- point and later restart an application. Further, plications. On the other hand, it has limited the technologies covered here would allow ap- fault containment, when compared to the vir- plications to potentially be restarted on a differ- tual machine approach. ent operating system image than the one from which it was checkpointed. This provides the We built a prototype, called MCR, by modify- ability to move an application (or even a set of ing the Linux kernel and creating such a layer applications) dynamically from one machine or of containment. They are various other projects virtual operating system image to another. with similar goals, for instance VServer [8] and OpenVZ [7] and dated Linux implementa- Once the fundamental mechanisms are in tion of BSD Jails [4]. In this paper, we will place, this technology can be used for such describe our experiences from implementing more advanced capabilities such as check- MCR and examine the many communalities of pointing a cluster wide application—in other these projects. words synchronizing and stopping a coordi- nated, distributed applications, and restarting them. Cluster wide CPR would allow a site 2 Related Work in CPR administrator to install a security update or perform scheduled maintainance on the entire cluster without impacting the application run- CPR is theoretically simple. Stop execution ning. of the task and store the state of all memory, registers, and other resources. To restart, Also, CPR would enable applications to be reload the executable image, load the state moved from host to host depending on sys- saved during the checkpoint, and restart exe- tem load. For instance, an overloaded machine cution at the location indicated by the instruc- could have its workload rebalanced by moving tion pointer register. In practice, complications an application set from one machine to another arise due to issues like inter-processes sharing, that is otherwise underutilized. Or, several sys- security implications, and the ways that the ker- tems which are underloaded could have their nel transparently manages resources. This sec- applications consolidated to a single machine. tion groups some of the existing solutions and CPR plus migration will henceforth be referred reviews their shortcomings. to as CPRM. Virtual machines control the entire system Most of the capabilities we’ve highlighted here state, making CPR easy to implement. The are best enabled via application virtualization. state of all memory and resources can simply be 2006 Linux Symposium, Volume One • 349 stored into a file, and recreated by the machine cess’ original process id. This method is also emulator or the operating system itself. In- potentially very inefficient as described in Sec- deed, the two most commonly mentioned VMs, tion 5.2. The user space approaches also fall VMware [9] and Xen [10], both enable live mi- short by requiring applications to be rewritten gration of their guest operating systems. The and by exhibiting poor resource sharing. drawbacks of CPRM of an entire virtual machine is the increased overhead of dealing with CPR becomes much easier given some help all resources defining the VM. This can make from the kernel. Kernel-based CPR solutions the approach unsuitable for load balancing ap- include zap [12], crak [2], and our MCR pro- plications, since a requirement to add the over- totype. We will be analyzing MCR in detail in head of a full VM and associated daemons to Section 4, followed by the requirements for a each migrateable application can have tremen- consolidated application virtualization and mi- dous performance implications. This issue is gration kernel approach. further explored in Section 6. A more lighter weight CPRM approach can be achieved by isolating applications, which 3 Concepts and principles is predicated on the safe and proper isolation and migration of its underlying resources. In A user application is a set of resources—tasks, general, we look at these isolated and migrate- files, memory, IPC objects, etc.—that are ag- able units as containers around the relevant pro- gregated to provide some features. The general cesses and resources. We distinguish conceptu- concept behind CPR is to freeze the application ally system containers, such as VServer [8] or and save all its state (in both kernel and user OpenVZ [7], and application containers, such spaces) so that it can be resumed later, possibly as Zap [12] and our own prototype MCR. Since on another host. Doing this transparently with- containers share a single OS instance, many re- out any modification to the application code is sources provided by the OS must be specially quite an easy task, as long as you maintain a isolated. These issues are discussed in detail single strong requirement: consistency. in Section 5. Common to both container approaches is their requirement to be able to CPR an isolated set of individual resources. 3.1 Consistency For many applications CPR can be completely achieved from user space. An example imple- The kernel ensures consistency for each re- mentation is ckpt [5]. Ckpt teaches applica- source’s internal state and also provides system tions to checkpoint themselves in response to identifiers to user space to manipulate these re- a signal by either preloading a library, or in- sources. These identifiers are part of the ap- jecting code after application startup. The new plication state, and as such are critical to the code, when triggered, writes out a new exe- application’s correct behavior. For example, a cutable file. This executable reloads the appli- process waiting for a child will use the known cation and resets its state before continuing exe- process id of that child. If you were to re- cution where it left off. Since this method is im- sume a checkpointed application, you would plemented with no help from the kernel, there recreate the child’s pid to ensure that the par- is state which cannot easily be stored, such as ent would wait on the correct process. Unfortu- pending signals, or recreated, such as a pro- nately, Linux does not provide such control on 350 • Making Applications Mobile Under Linux how pids, or many other system identifiers, are tualization space according to the implied con- associated with resources.

Making Applications Mobile Under Linux

Shorten Device Boot Time for Automotive IVI and Navigation Systems

Proceedings of the Linux Symposium Volume

Linux Kernel User Documentation V4.20.0

Optimization Techniques for Memory Virtualization-Based Resource Management

Power Management on DICE Desktops Chris Cooke [email protected] What I’Ll Talk About

Reducing the Boot Time of Embedded Linux Systems

Linux Kernel Power Management

Cloning L4linux

Openvz Forum

Swsusp and U-Boot

Linux Power Documentation

Kernel Configuration Option Reference