Seminar HPC Trends, Winter Term 2017/2018

New Operating System Concepts for High Performance Computing

Fabian Dreer
Ludwig-Maximilians-Universität München
[email protected]
January 2018

Abstract

When running large-scale applications on clusters, the noise generated by the operating system can greatly impact the overall performance. In order to minimize overhead, new concepts for HPC OSs are needed as a response to increasing complexity while still considering existing API compatibility.

In this paper we study the design concepts of heterogeneous kernels using the example of mOS and the approach of library operating systems by exploring the architecture of Exokernel. We summarize architectural decisions, present a similar project in each case, Interface for Heterogeneous Kernels and Unikernels respectively, and show benchmark results where possible.

Our investigations show that both concepts have a high potential, reduce system noise and outperform a traditional Linux in tasks they are already able to do. However, the results are only proven by micro-benchmarks as most projects lack the maturity for comprehensive evaluations at the time of this writing.

1 The Impact of System Noise

When using a traditional operating system kernel in high performance computing applications, the cache and interrupt system are under heavy load from, e.g., system services for housekeeping tasks. This load is also referred to as noise, and it notably reduces the performance of the application.

Even small delays from cache misses or interrupts can affect the overall performance of a large-scale application. So-called jitter even influences collective communication with regard to synchronization, which can either absorb or propagate the noise. Even though asynchronous communication has a much higher probability of absorbing the noise, it is not completely unaffected. Collective operations suffer the most from propagation of jitter, especially when implemented linearly. But it is hard to analyse noise and its propagation for collective operations, even for simple algorithms. Hoefler et al. [5] also suggest that “at large-scale, faster networks are not able to improve the application speed significantly because noise propagation is becoming a bottleneck.” [5]

Hoefler et al. [5] also show that synchronization of point-to-point and collective communication and OS noise are tightly entangled and cannot be discussed in isolation. At full scale, when noise becomes the limiting factor, it eliminates all advantages of a faster network in collective operations as well as in full applications. This finding is crucial for the design of large-scale systems because the noise bottleneck must be considered in system design.

Yet very specialized systems like BlueGene/L [8] help to avoid most sources of noise [5]. Ferreira et al. [3] show that the impact depends on parameters of the system, that the already widely used concept of dedicated system nodes alone is not sufficient, and that the placement of noisy nodes does matter.

This work gives an overview of recent developments and new concepts in the field of operating systems for high performance computing. The approaches described in the following sections are, together with the traditional Full-Weight-Kernel approach, the most common ones.

The rest of this paper is structured as follows. In the next section we introduce the concept of running more than one kernel on a compute or service node and explore the details of that approach using the examples of mOS and the Interface for Heterogeneous Kernels. Section 3 investigates the idea of library operating systems by having a look at Exokernel, one of the first systems designed after that concept, as well as a newer approach called Unikernels. Section 4 investigates Hermit Core, which is a combination of the aforementioned designs. A comparison in Section 5 is followed by the conclusion in Section 6.

2 Heterogeneous Kernels

The idea of the heterogeneous kernel approach is to run multiple different kernels side-by-side. Each kernel has its spectrum of jobs to fulfill and its own dedicated resources. This makes it possible to have different operating environments on the partitioned hardware. Especially with a view to heterogeneous architectures with different kinds of memory and multiple memory controllers, like the recent Intel Xeon Phi architecture, or chips with different types of cores and coprocessors, specialized kernels might help to use the full potential available.

We will first take an in-depth look at mOS, as this example shows nicely which aspects have to be taken care of in order to run different kernels on the same node.

2.1 mOS

To get a light and specialized kernel there are two methods typically used: the first one is to take a generic Full-Weight-Kernel (FWK) and strip away as much as possible; the second one is to build a minimal kernel from scratch. Neither of these two approaches alone yields a fully Linux-compatible kernel, and without that compatibility the kernel cannot run generic Linux applications [4].

Thus the key design parameters of mOS are: full Linux compatibility, limited changes to Linux, and full Light-Weight-Kernel scalability and performance, where performance and scalability are prioritized.

To avoid the tedious maintenance of patches to the Linux kernel, an approach inspired by FUSE has been taken. Its goal is to provide internal APIs to coordinate resource management between Linux and Light-Weight-Kernels (LWK) while still allowing each kernel to handle its own resources independently.

“At any given time, a sharable resource is either private to Linux or the LWK, so that it can be managed directly by the current owner.” [11] The resources managed by the LWK must meet the following requirements: i) to benefit from caching and reduced TLB misses, memory must be in physically contiguous regions, ii) except for the ones of the applications no interrupts are to be generated, iii) full control over scheduling must be provided, iv) memory regions are to be shared among LWK processes, v) efficient access to hardware must

be provided in user space, which includes well-performing MPI and PGAS runtimes, vi) flexibility in allocated memory must be provided across cores (e.g. let rank0 have more memory than the other ranks) and, vii) system calls are to be sent to the Linux core or operating system node.

mOS consists of six components which will be introduced in one paragraph each:

The Linux running on the node. According to Wisniewski et al. [11], it can be any standard HPC Linux, configured for minimal memory usage and without disk paging. This component acts like a service providing Linux functionality, such as a TCP/IP stack, to the LWK. It takes over the bulk of the OS administration to keep the LWK streamlined; the most important aspects include boot and configuration of the hardware, distribution of the resources to the LWK and provision of a familiar administrative interface for the node (e.g. job monitoring).

The LWK, which runs (possibly in multiple instantiations) alongside the compute node Linux. The job of the LWK is to provide as much hardware as possible to the running applications, as well as to manage its assigned resources. As a consequence the LWK takes care of memory management and scheduling [11].

A transport mechanism that lets Linux and the LWK communicate with each other. This mechanism is explicit, labeled as function shipping, and comes in three different variations: via shared memory, messages or inter-processor interrupts. For shared memory to work without major modifications to Linux, the designers of mOS decided to separate the physical memory into Linux-managed and LWK-managed partitions, and to allow each kernel read access to the other's space. Messages and interrupts are inspired by a model generally used by device drivers: an interrupt is only sent if no messages are in the queue; otherwise the new system call request is just queued and will be handled on the next poll. This avoids floods of interrupts in bulk-synchronous programming. To avoid jitter on compute cores, communication is in all cases done on cores running Linux [11].

The capability to direct system calls to the correct implementor (referred to as triage). The idea behind this separation is that performance-critical system calls are serviced by the LWK to avoid jitter, less critical calls, like signaling or /proc requests, are handled by the local Linux kernel, and all operations on the file system are offloaded to the operating system node (OSN). This hierarchy of system call destinations does of course add complexity, not only to the triaging but also to the synchronization of the process context across the nodes [11].

An offloading mechanism to an OSN. Using a dedicated OSN to take care of I/O operations, in order to remove jitter from the compute node, avoid cache pollution and make better use of memory, is already an older concept. Even though the design of mOS would suggest handling file system operations on the local Linux, the offloading mechanism improves resource usage and client scaling [11].

The capability to partition resources, which is needed for running multiple kernels on the same node. Memory partitioning can be done either statically, by manipulating the memory maps at boot time and registering reserved regions, or dynamically, by making use of hotplugging. The same possibilities are valid for the assignment of cores. Physical devices will in general be assigned to the Linux kernel in order to keep the LWK simple [11].

The description of the mOS architecture has shown many of the considerations involved in running multiple kernels side-by-side. As the design of mOS keeps compatibility with Linux core data structures, most applications should be supported. The project is still in an early development stage, therefore an exhaustive performance evaluation is not feasible at the moment.

2.2 Interface for Heterogeneous Kernels

This project is a general framework with the goal of easing the development of hybrid kernels on many-core and accelerator architectures; therefore attached (coprocessor attached to multi-core host) and builtin (standalone many-core platform) configurations are possible. It follows the two design principles of keeping the interface minimal on the one hand, and providing a requisite utility library for kernels on the other hand.

Similar to mOS yet less strict, IHK defines the requirements for a hybrid kernel approach to be i) management of kernels and an interface to allocate resources, ii) resource partitioning and, iii) a communication mechanism among kernels. In IHK it is assumed that one kernel manages at most one processor.

The framework consists of the following components:

IHK-Master has the ability to boot other kernels. The mechanisms needed to do so are the same as discussed for partitioning resources in the architecture description of mOS. The master kernel also makes the user interface available.

IHK-Slave defines an interface for slave kernels to work with each other. Kernels of this type only run in their assigned space, retrieve that information from, and are booted by, the master kernel.

The IHK-IKC (communication model) provides rudimentary functions for the use of channels, where a channel is a pair of message queues. Master and slave kernels have an interface to use the IKC. This slightly differs from what we have seen in mOS, as IHK provides a library for using interrupts and queuing where the exact implementation is free. The included IKC library provides functions to set up a client-server layout among the kernels with a master channel to share control messages [10].

For easier development of LWKs, a bootstrap library is part of IHK. Currently an implementation for x86_64 is available. The delegation of system calls works in concept exactly as we have seen in mOS, with the difference that there is no operating system node to which file system operations can be sent.

“While IHK/McKernel is in a more advanced phase than mOS at this moment, both projects are too early in their development cycle for doing an exhaustive performance study.” [4]

To still have an idea of what can be expected from the heterogeneous kernel approach, Figure 1 shows benchmark results from FusedOS [9], which was the first prototype incorporating the idea. The work of Wisniewski et al. [11] is also based on FusedOS, therefore the overall tendency should be comparable.

The x-axis of Figure 1 shows time while the y-axis shows the number of iterations performed during a certain quantum of time. We see the performance of the FusedOS PECs (Power-Efficient-Cores), which can be thought of as the LWKs of mOS, in purple above the red Linux curve. High-frequency noise as well as occasional large spikes can be seen in the Linux curve. Especially these large spikes are detrimental to the performance on large-scale clusters. In comparison, the FusedOS PEC curve has the form of a straight line and displays no spikes; for that reason we would tend to believe that the behavior of the application running on FusedOS is deterministic [9].

To sum up, even though some prototypes of heterogeneous kernels are still in their early phases, the concept itself looks promising. Light-Weight-Kernels run almost completely deterministically and show superior noise properties.

3 Library Operating Systems

The job of a traditional OS is to abstract away the hardware and isolate different processes, owned by potentially multiple users, from one another, as well as from the kernel. This abstraction is commonly realized by the differentiation, and therefore separation, into kernel space and user space.

But as this fixes the abstraction, the concept can also limit performance and freedom of implementation. As a result applications are denied the possibility of domain-specific optimizations; this

presetting also discourages changes to the abstractions.

Conceptually a library operating system (libOS) is built around an absolutely minimalistic kernel, which exports all hardware resources directly via a secure interface. The operating system, which implements higher-level abstractions, uses this interface.

Therefore the (untrusted) OS lives entirely in user space, which effectively moves the whole resource management to user space. This has the advantage that parts like virtual memory are user-defined and offer more flexibility as well as specialization to the application. Another strong point of this design is the reduction of context switches for privileged operations the kernel would normally have to execute [2].

In contrast to the previously described heterogeneous kernel approach, library OS concepts work with exactly one kernel, which is then used by multiple libOSs. Similar to our investigation of the heterogeneous kernel approach, we will first discuss the concept of Exokernel in detail, then take a look at the younger variants named Unikernels.

3.1 Exokernel

The challenge for the exokernel approach is to give the libOSs maximal freedom while still secluding them in such a way that they do not affect each other. A low-level interface is used to separate protection from management. The kernel performs the important tasks of i) keeping track of resource ownership, ii) guarding all resource binding points and usage as well as, iii) revoking resource access.

In order to protect resources without managing them at all, the designers of Exokernel decided to design the interface in such a way that all hardware resources can be accessed as directly as possible, because the libOS knows best about its needs. This is achieved by exposing allocation, physical names, bookkeeping data structures and revocation. Additionally a policy is needed to handle competing requests of different libOSs. In the case of an exokernel the decisions to make are all about resource allocation and are handled in a traditional manner, e.g. with reservation schemes or quotas. In the following paragraphs we will have a closer look at some other mechanisms of the exokernel architecture [2].

Exokernel uses a technique referred to as secure bindings in order to multiplex resources so that they are protected against unintended use by different libOSs. The point in time where a libOS requests allocation of a resource is called bind time; subsequent use of that resource is known as access time. By doing authorization checks only at bind time this mechanism improves efficiency. This way of handling checks also strengthens the separation of management and protection, as the kernel does not need to know about the complex semantics of resources at bind time. Secure bindings are implemented with hardware mechanisms, caching in software and the download of application code into the kernel. As this downloading mechanism is not as common as the other two, an example can be found in the paragraph about network multiplexing in this section.

The multiplexing of physical memory is done with secure bindings as well. When the libOS requests a page of memory, the kernel creates a binding for that page with additional information on the capabilities of this page. The owner of a page is allowed to manipulate its capabilities, and these capabilities are used to determine the access rights for that memory page. Therefore applications can grant memory access to other applications, which makes resource sharing easier.

Multiplexing network resources efficiently is rather hard under a design philosophy that requires separating protection, which includes delivering packets to the correct libOS, from management, e.g. creating connections and sessions. To deliver the packets correctly it is necessary to understand the logic of their contents. This can be done either by querying each possible recipient (every libOS), or, more efficiently, with the use of downloaded application code in the packet filter to handle the packet.
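As a minimal sketch of this idea (not Exokernel's actual interface), packet demultiplexing with downloaded filters could look as follows in Python. The names `Demultiplexer`, `bind`, `deliver` and `port_filter`, the libOS identifiers, and the toy packet layout whose first two bytes hold the destination port are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# A filter is untrusted, libOS-supplied code: it inspects a packet and
# returns True if the packet belongs to that libOS.
PacketFilter = Callable[[bytes], bool]


@dataclass
class Binding:
    libos: str              # hypothetical libOS identifier
    accepts: PacketFilter   # downloaded filter code


class Demultiplexer:
    """Kernel-side table of downloaded filters (illustrative only)."""

    def __init__(self) -> None:
        self._bindings: List[Binding] = []

    def bind(self, libos: str, accepts: PacketFilter) -> None:
        # Bind time: a real kernel would authorize and sandbox the filter here.
        self._bindings.append(Binding(libos, accepts))

    def deliver(self, packet: bytes) -> Optional[str]:
        # Access time: run the filters directly instead of asking every libOS.
        for binding in self._bindings:
            try:
                if binding.accepts(packet):
                    return binding.libos
            except Exception:
                # A faulty untrusted filter must not take down the kernel.
                continue
        return None


def port_filter(port: int) -> PacketFilter:
    # Assumed toy layout: the first two bytes are the destination port.
    def accepts(packet: bytes) -> bool:
        return len(packet) >= 2 and int.from_bytes(packet[:2], "big") == port
    return accepts


demux = Demultiplexer()
demux.bind("libos-dns", port_filter(53))
demux.bind("libos-http", port_filter(80))
```

Each filter runs inside the delivery path, so no libOS has to be scheduled merely to reject a packet that is not addressed to it.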

Such downloaded code can be run with immediate execution on kernel events, which avoids costly context switches or the otherwise required scheduling of each application. As this code is untrusted, it should be combined with security mechanisms like sandboxing [2].

Finally there must be a way to reclaim allocated resources. For this a resource revocation protocol has been implemented. Typically a kernel does not inform the OS when physical memory is allocated or deallocated. But the design of exokernels strives to give the libOS the most direct access to hardware possible. Therefore the revocation is visible for most resources, which allows the libOS to react to a revocation request accordingly, e.g. by saving a certain state. In cases where the application becomes unresponsive there is an abort protocol, which can be understood as issuing orders instead of requests. If the libOS still cannot comply, secure bindings to allocated resources must be broken by force. To complement this behavior, the libOS actively releases resources that are no longer needed.

In sum we have seen that exokernel approaches do not provide the same functionality as a traditional OS, but offer a way of running specialized systems with high performance implemented mostly in user space. Engler et al. [2] already show that the concept of exokernels and their implementation can be very efficient, and that it is efficient as well to build traditional OS abstractions on application level. Yet, as the experiments are quite a few years old already, we will investigate more recent approaches based on this concept in the next subsection.

3.2 Unikernels

As it is difficult to support a wide range of real-world hardware with the exokernel approach, libOSs have never been widely deployed. This problem can be solved with the use of hypervisors, which are already very popular today, especially in cloud environments [7].

The key difference between the previously shown exokernel architecture and a unikernel is that unikernels are single-purpose, single-image and single-address-space applications that are, at compile time, specialized into standalone kernels. To make the deployment of unikernels easier, the configuration is integrated into the compilation process. In comparison, Linux distributions rely on, e.g., complex shell scripting to pack components into packages. When deployed to, e.g., a cloud platform, unikernels get sealed against modifications. In return they offer a significant reduction in image size, improved efficiency and security, and should also reduce operational costs [7].

In the following paragraphs we will have a closer look at the architecture of such a single-purpose application, often referred to as an appliance.

Application configurations are usually stored in dedicated files, one for each service. In the unikernel architecture, services are not viewed as independent applications but as libraries of one single application. As a result the configuration is done either at build time for static parameters or with library calls for dynamic ones. This concept eases the configuration of complex layouts and also makes configurations programmable, analyzable and explicit.

Another advantage of linking everything as a library, even functions that would normally be provided by an operating system, is that it results in very compact binary images. The system can be optimized as a whole without including unnecessary functionality, and the static evaluation of linked configurations helps to eliminate dead code segments. This compile-time specialization is also a security measure, especially effective in combination with the isolating hypervisor and possibly, as is the case for the work of Madhavapeddy et al. [7], type-safe languages.

A special protection mechanism made possible by the design of unikernels is sealing. When the appliance starts up, it allocates all the memory it needs in a set of page tables with the policy that no page is both writable and executable. After this a call to the hypervisor is used to seal these pages, which in turn makes the heap size fixed. The hypervisor has to be modified in order to provide the sealing mechanism. This action makes all code injection attacks ineffective. The use of this technique is optional. The second security mechanism for unikernels is possible because most of the time a reconfiguration requires recompilation. Therefore the address space layout can be randomized at compile time [7].

Figure 1: Fixed Time Quanta benchmark for Linux and FusedOS; adapted from Park et al. [9].

Figure 2: CDF of jitter for 10^6 parallel threads in Linux and Mirage OS (x-axis: jitter in ms; y-axis: cumulative frequency in %; curves: Mirage, Linux native, Linux PV); adapted from Madhavapeddy et al. [7].

Madhavapeddy et al. [7] show with their Mirage prototype that CPU-heavy applications are not affected by the virtualization “as the hypervisor architecture only affects memory and I/O.” [7]

According to Briggs et al. [1] the performance of Mirage OS, the prototype of Madhavapeddy et al. [7], is not easily evaluated, as e.g. the DNS server as well as the HTTP server are still example skeletons. They are missing important features or are unstable. But the evaluation of the Mirage DNS server nevertheless showed that it is able to handle much higher request rates than a regularly used one on Linux. Mirage OS might need some more time to mature but shows other advantageous results, such as lower and also more predictable latency, which can be seen in Figure 2 [7].

In conclusion, the drawback of the Mirage prototype is that it only runs specifically ported applications written in OCaml, like the system itself. But by this design it provides “type-safety and static analysis at compile-time.” [1]

4 Hermit Core

Hermit Core is the combination of the approaches we have seen above. It combines a Linux kernel with a unikernel and promises maximum performance and scalability. Common interfaces and non-performance-critical tasks are realized by Linux [6]. But as this project is focused on HPC programming models (e.g. MPI, OpenMP), performance has been improved at the expense of full POSIX compliance.

Hermit Core is extremely versatile. It can be run as a heterogeneous kernel, standalone like a pure unikernel, in a virtualized environment as well as directly on hardware as single- or multi-kernel. It can be booted without a Linux kernel directly by virtualization proxies, but in multi-kernel mode a special loader is required.

When running as a multi-kernel, one instance of Hermit Core runs on each NUMA node and abstracts this fact away, so that the application is presented with a traditional UMA architecture.

Figure 3: Scatter plots observing OS noise in different configurations: (a) Linux, (b) Hermit Core, (c) Hermit Core without IP thread. Each panel plots the gap in ticks (y-axis, 0 to 1·10^5) against the time in s (x-axis, 0 to 300). Adapted from Lankes et al. [6].
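Plots of this kind are typically obtained by reading a clock in a tight loop and recording every unusually large gap between two consecutive readings. The following Python sketch illustrates that measurement principle only; it is not the benchmark used by Lankes et al. [6]. The function names and the 50 µs threshold are assumptions, and the Python interpreter adds noise of its own, so the absolute numbers are not comparable to Figure 3.

```python
import time
from typing import List, Tuple


def sample_clock(n_samples: int) -> List[int]:
    """Read a monotonic clock in a tight loop; returns timestamps in ns."""
    read = time.perf_counter_ns
    return [read() for _ in range(n_samples)]


def find_gaps(timestamps_ns: List[int], threshold_ns: int) -> List[Tuple[int, int]]:
    """Return (start, length) of every gap between consecutive samples
    that is longer than the threshold."""
    gaps = []
    for prev, cur in zip(timestamps_ns, timestamps_ns[1:]):
        delta = cur - prev
        if delta > threshold_ns:
            # The loop was not running for `delta` ns: something else
            # (e.g. the OS) occupied the core during that time.
            gaps.append((prev, delta))
    return gaps


if __name__ == "__main__":
    samples = sample_clock(200_000)
    gaps = find_gaps(samples, threshold_ns=50_000)  # assumed threshold: 50 us
    largest = max((length for _, length in gaps), default=0)
    print(f"{len(gaps)} gaps above 50 us, largest {largest} ns")
```

Plotting the gap lengths against their start times yields a scatter plot of the same shape as the panels above: a quiet system produces few and small points, a noisy one many and large ones.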

The communication between the kernel instances is realized either by using a virtual IP device or via a message passing library. The communication with the Linux kernel in the heterogeneous setup happens over an IP connection as well.

Just as in the unikernel design previously seen, some parameters, like the number of system threads, are set at compile time. As a result internal data structures can be built as static arrays, which provides fast access and good cache usage. For tasks like garbage collectors in the runtime of managed languages it is still necessary for Hermit Core to provide a scheduler. Yet the scheduling overhead is reduced by the fact that the Hermit Core kernel does not interrupt computation threads. Computation threads run on certain cores and do not use a timer, which would be a source of OS noise.

For building applications a slightly modified version of the GNU binutils and gcc is used with the build target x86_64-hermit. By this design, Hermit Core applications can be built in any language supported by gcc. It is even possible to use a different C compiler by including the special Hermit Core header files instead of the Linux ones.

Figure 3 shows scatter plots from the Hourglass benchmark. The gaps in the execution time are used to indicate points in time where the operating system took time from an application process for maintenance tasks.

In comparison to a standard Linux, which can be seen in Figure 3a, Hermit Core with a networking thread, seen in Figure 3b, shows significantly smaller gaps. But Hermit Core is designed to spawn only one thread for handling IP packets, therefore all computation threads run with a profile similar to Figure 3c, which shows the smallest noise distribution. Lankes et al. [6] also show that their prototype is approximately twice as fast as Linux with regard to basic system services on the Intel Haswell architecture.

We have seen Hermit Core, which is a versatile, well-performing symbiosis of Linux and unikernels designed for HPC. The benchmark results still show OS noise to be present, but on a much smaller scale than on Linux.

5 Comparison

As the mOS project is still in a prototype phase, and the Interface for Heterogeneous Kernels is in an early stage as well, the stability of both projects has yet to be shown as they progress. As the concept of library operating systems is much older, the systems investigated in this work seem to be stable yet are facing different issues.

The development of applications to run on a heterogeneous kernel should not be much different than for traditional Full-Weight-Kernel systems, as both projects presented set Linux compatibility as one of their goals. Additionally the symbiotic system presents itself as a single system to applications.

Even with a stable exokernel already present, the implementation of an application together with a specialized OS suited for it involves much more manual work than with the heterogeneous kernel concept. Virtualization helps to cope with hardware abstractions for the exokernel, but the libOS has to be written for every application. Hermit Core provides the combination of the aforementioned concepts by uniting a libOS in the form of a unikernel with a Full-Weight-Kernel. It makes it possible to use high-level languages for application development and provides an adapted software collection with familiar build tools.

If we take the performance evaluation of FusedOS into account, as mOS as well as IHK are not ready yet for macro-benchmarks, the Light-Weight-Kernels run with much less jitter and show deterministic behavior. Results of the unikernel prototype Mirage OS benchmarks are all at the micro level, as most applications for it are still stubs or have limited functionality. Yet this approach also shows a clear reduction in jitter, and more predictable behavior can be expected. As can be seen in Figure 3, Hermit Core offers a considerable reduction in OS noise. Additionally the performance for basic system services and scheduling has been shown to be higher compared to Linux.

6 Conclusion

On the way to exascale computing there is a need for new concepts in HPC OS design in order to make full use of the hardware. One step in this direction is the elimination of jitter.

In this paper we introduced the two currently most popular concepts for new operating system designs focusing on high performance computing. To sum up, both approaches show great promise for performance and the reduction of OS noise. Even their combination is possible, and the prototype system constructed in this way performs equally well. Yet all projects are missing comprehensive evaluation results as a consequence of their youth. Heterogeneous kernel concepts seem to have high potential yet are not mature enough to be considered at the moment. Application development should be straightforward and compatibility with already existing applications should be provided. The concept of library operating systems has been around for a long time and it might be a good option if the performance boost compensates for the cost of virtualization layers, but more manual work is involved in order to write a tailored libOS in addition to the desired application. A combination of both concepts is possible and seems to have excellent properties.

References

[1] Ian Briggs, Matt Day, Yuankai Guo, Peter Marheine, and Eric Eide. A performance evaluation of unikernels. 2015.

[2] Dawson R Engler, M Frans Kaashoek, et al. Exokernel: An operating system architecture for application-level resource management, volume 29. ACM, 1995.

[3] Kurt B Ferreira, Patrick G Bridges, Ron Brightwell, and Kevin T Pedretti. The impact of system design parameters on application noise sensitivity. Cluster computing, 16(1):117–129, 2013.

[4] Balazs Gerofi, Masamichi Takagi, Yutaka Ishikawa, Rolf Riesen, Evan Powers, and Robert W Wisniewski. Exploring the design space of combining linux with lightweight kernels for extreme scale computing. In Proceedings of the 5th International Workshop on Runtime and Operating Systems for Supercomputers, page 5. ACM, 2015.

[5] Torsten Hoefler, Timo Schneider, and Andrew Lumsdaine. Characterizing the influence of system noise on large-scale applications by simulation. In Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, SC ’10, pages 1–11, Washington, DC, USA, 2010. IEEE Computer Society. ISBN 978-1-4244-7559-9. doi: 10.1109/SC.2010.12. URL https://doi.org/10.1109/SC.2010.12.

[6] Stefan Lankes, Simon Pickartz, and Jens Breitbart. Hermitcore: A unikernel for extreme scale computing. In Proceedings of the 6th International Workshop on Runtime and Operating Systems for Supercomputers, ROSS ’16, pages 4:1–4:8, New York, NY, USA, 2016. ACM. ISBN 978-1-4503-4387-9. doi: 10.1145/2931088.2931093. URL http://doi.acm.org/10.1145/2931088.2931093.

[7] Anil Madhavapeddy, Richard Mortier, Charalampos Rotsos, David Scott, Balraj Singh, Thomas Gazagnaire, Steven Smith, Steven Hand, and Jon Crowcroft. Unikernels: Library operating systems for the cloud. In ACM SIGPLAN Notices, volume 48, pages 461–472. ACM, 2013.

[8] José Moreira, Michael Brutman, Jose Castano, Thomas Engelsiepen, Mark Giampapa, Tom Gooding, Roger Haskin, Todd Inglett, Derek Lieber, Pat McCarthy, et al. Designing a highly-scalable operating system: The blue gene/l story. In SC 2006 Conference, Proceedings of the ACM/IEEE, pages 53–53. IEEE, 2006.

[9] Yoonho Park, Eric Van Hensbergen, Marius Hillenbrand, Todd Inglett, Bryan Rosenburg, Kyung Dong Ryu, and Robert W Wisniewski. Fusedos: Fusing lwk performance with fwk functionality in a heterogeneous environment. In Computer Architecture and High Performance Computing (SBAC-PAD), 2012 IEEE 24th International Symposium on, pages 211–218. IEEE, 2012.

[10] Taku Shimosawa, Balazs Gerofi, Masamichi Takagi, Gou Nakamura, Tomoki Shirasawa, Yuji Saeki, Masaaki Shimizu, Atsushi Hori, and Yutaka Ishikawa. Interface for heterogeneous kernels: A framework to enable hybrid os designs targeting high performance computing on manycore architectures. In High Performance Computing (HiPC), 2014 21st International Conference on, pages 1–10. IEEE, 2014.

[11] Robert W Wisniewski, Todd Inglett, Pardo Keppel, Ravi Murty, and Rolf Riesen. mos: An architecture for extreme-scale operating systems. In Proceedings of the 4th International Workshop on Runtime and Operating Systems for Supercomputers, page 2. ACM, 2014.
