Multiple Instances of the Global Linux Namespaces
Eric W. Biederman
Linux Networx
[email protected]

2006 Linux Symposium, Volume One

Abstract

Currently Linux has the filesystem namespace for mounts, which is beginning to prove useful. By adding additional namespaces for process ids, SYS V IPC, the network stack, user ids, and probably others we can, at a trivial cost, extend the UNIX concept and make novel uses of Linux possible. Multiple instances of a namespace simply means that you can have two things with the same name.

For servers, the power of computers is growing, and it has become possible for a single server to easily fulfill the tasks that previously required multiple servers. Hypervisor solutions like Xen are nice, but they impose a performance penalty and they do not easily allow resources to be shared between multiple servers.

For clusters, application migration and preemption are interesting cases but almost impossibly hard, because you cannot restart the application once you have moved it to a new machine, as usually there are resource name conflicts.

For users, certain desktop applications interface with the outside world and are large and hard to secure. It would be nice if those applications could be run in their own little world to limit what a security breach could compromise.

Several implementations of this basic idea have been done successfully. Now the work is to create a clean implementation that can be merged into the Linux kernel. The discussion has begun on the linux-kernel list and things are slowly progressing.

1 Introduction

1.1 High Performance Computing

I have been working with high performance clusters for several years and the situation is painful. Each Linux box in the cluster is referred to as a node, and applications running or queued to run on a cluster are jobs.

Jobs are run by a batch scheduler and, once launched, each job runs to completion, typically consuming 99% of the resources on the nodes it is running on.

In practice a job cannot be suspended, swapped, or even moved to a different set of nodes once it is launched. This is the oldest and most primitive way of running a computer. Given the long runs and high computation overhead of HPC jobs it isn’t a bad fit for HPC environments, but it isn’t a really good fit either.

Linux has much more modern facilities. What prevents us from just using them?

HPC jobs currently can be suspended, but that just takes them off the cpu. If you have sufficient swap there is a chance the jobs can even be pushed to swap, but frequently these applications lock pages in memory so they can be ensured of low latency communication.

The key problem is simply having multiple machines and multiple kernels. In general, how to take an application running under one Linux kernel and move it completely to another kernel is an unsolved problem.

The problem is unsolved not because it is fundamentally hard, but simply because it has not been a problem in the UNIX environment. Most applications don’t need multiple machines or even big machines to run on (especially with Moore’s law exponentially increasing the power of small machines). For many of the rest, the large multiprocessor systems have been large enough.

What has changed is the economic observation that a cluster of small commodity machines is much cheaper than, and just as fast as, a large supercomputer.

The other reason this problem has not been solved (besides the fact that most people working on it are researchers) is that it is not immediately obvious what a general solution is. Nothing quite like it has been done before, so you can’t look into a text book or into the archives of history and know a solution, which in the broad strokes of operating system theory is a rarity.

The hard part of the problem also does not lie in the obvious place people first look: how to save all of the state of an application. Instead, the problem is how to restore a saved application so it runs successfully on another machine.

The problem with restoring a saved application is all of the global resources an application uses: process ids, SYS V IPC identifiers, filenames, and the like. When you restore an application on another machine there is no guarantee that it can reuse the same global identifiers, as another process on that machine may be using those identifiers.

There are two general approaches to solving this problem: modifying things so these machine-global identifiers are unique across all machines in a cluster, or modifying things so these machine-global identifiers can be repeated on a single machine. Many attempts have been made to scale global identifiers cluster-wide (Mosix, OpenSSI, and bproc, to name a few) and all of them have had to work hard to scale. So I chose to go with an implementation that will easily scale to the largest of clusters, with no communication needed to do so.

This has the added advantage that in a cluster it doesn’t change the programming model presented to the user. Some machines will simply now appear as multiple machines. As the rise of the internet has shown, building applications that utilize multiple machines is not foreign to the rest of the computing world either.

To make this happen I need to solve the challenging problem of how to refactor the UNIX/Linux API so that we can have multiple instances of the global Linux namespaces. The Plan 9 inspired mount/filesystem namespace has already proved how this can be done and is slowly proving useful.

1.2 Jails

Outside the realm of high performance computing, people have been restricting their server applications to chroot jails for years. The problems with chroot jails have become well understood and people have begun fixing them. First BSD jails, and then Solaris containers, are some of the better known examples.

Under Linux the open source community has not been idle. There is the linux-jail project, Vserver, Openvz, and related projects like SELinux, UML, and Xen.

Jails are a powerful general purpose tool useful for a great variety of things. In resource utilization jails are cheap: dynamically loading glibc is likely to consume more memory than the additional kernel state needed to track a jail. Jails allow applications to be run in simple stripped down environments, increasing security and decreasing maintenance costs, while leaving system administrators with the familiar UNIX environment.

The only problem with the current general purpose implementations of jails under Linux is that nothing has been merged into the mainline kernel, and the code from the various projects is not really mergeable as it stands. The closest I have seen is the code from the linux-jail project, and that is simply because it is less general purpose and implemented completely as a Linux security module.

Allowing multiple instances of global names, which is needed to restore a migrated application, is a more constrained problem than that of simply implementing a jail. But a jail that ensures you can have multiple instances of all of the global names is a powerful general purpose jail that you can run just about anything in. So the two problems can share a common kernel solution.

1.3 Future Directions

A cheap and always available jail mechanism is also potentially quite useful outside the realm of high performance computing and server applications. A general purpose checkpoint/restart mechanism can allow desktop users to preserve all of their running applications when they log out. Vulnerable or untrusted applications like a web browser or an irc client could be contained so that if they are attacked all that is gained is the ability to browse the web and draw pictures in an X window.

There would finally be an answer to the age old question: How do I preserve my user space while upgrading my kernel?

All this takes is an eye to designing the interfaces so they are general purpose and nestable. It should be possible to have a jail inside a jail inside a jail forever, or at least until the resources to support it run out.

2 Namespaces

How much work is this? Looking at the existing patches it appears that 10,000 to 20,000 lines of code will ultimately need to be touched. The core of Linux is about 130,000 lines of code, so we will need to touch between 7% and 15% of the core kernel code. This clearly indicates that one giant patch to do everything, even if it were perfect, would be rejected simply because it is too large to be reviewed.

In Linux there are multiple classes of global identifiers (e.g. process ids, SYS V IPC keys, user ids). Each class of identifier can be thought of as living in its own namespace.

This gives us a natural decomposition of the problem, allowing each namespace to be modified separately so we can support multiple instances of that namespace. Unfortunately this also increases the difficulty of the problem, as we need to modify the kernel’s reporting and configuration interfaces to support multiple instances of a namespace instead of having them tightly coupled.
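
To make the idea of multiple instances of one namespace concrete, here is a minimal user-space sketch in Python. It is a toy model only, not kernel code and not the implementation discussed in this paper. The class PidNamespace and the helper restore_job are hypothetical names introduced for illustration, and the task labels are made up. The sketch shows why restoring a checkpointed job fails when its process ids are already taken in the single global namespace, and why it succeeds in a fresh instance, which can itself be nested.

```python
from itertools import count


class PidNamespace:
    """Toy model of one instance of the process-id namespace (hypothetical)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent      # enclosing instance, None for the outermost
        self.tasks = {}           # pid -> task label, private to this instance

    def spawn(self, label):
        """Start a new task on the lowest pid that is free in this instance."""
        pid = next(p for p in count(1) if p not in self.tasks)
        self.tasks[pid] = label
        return pid

    def depth(self):
        """Number of enclosing instances (a jail inside a jail has depth 2)."""
        d, ns = 0, self
        while ns.parent is not None:
            d, ns = d + 1, ns.parent
        return d


def restore_job(saved, namespace):
    """Re-create saved tasks under their original pids, or fail on a conflict."""
    clashes = [pid for pid in saved if pid in namespace.tasks]
    if clashes:
        raise ValueError(f"pids {clashes} already in use in {namespace.name}")
    namespace.tasks.update(saved)


# The machine the job is migrated to already uses pids 1 to 3.
host = PidNamespace("host")
for label in ("init", "sshd", "batch-scheduler"):
    host.spawn(label)

# A checkpointed job that expects to keep pids 2 and 3.
saved_job = {2: "solver", 3: "solver-helper"}

try:
    restore_job(saved_job, host)      # single global namespace: name conflict
    host_conflict = False
except ValueError:
    host_conflict = True

jail = PidNamespace("job-jail", parent=host)
restore_job(saved_job, jail)          # fresh instance: the same pids are free

inner = PidNamespace("inner-jail", parent=jail)   # instances can nest
```

In this model pid 2 names sshd in the host instance and solver in the job's instance at the same time, which is all that "two things with the same name" means. No coordination between machines is needed, since each instance only has to track its own identifiers.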