Single System Image in a Linux-based Replicated Operating System Kernel

Akshay Giridhar Ravichandran

Thesis submitted to the Faculty of the Virginia Polytechnic Institute and State University in partial fulfillment of the requirements for the degree of

Master of Science in Engineering

Binoy Ravindran, Chair
Robert P. Broadwater
Antonio Barbalace

February 24, 2015 Blacksburg, Virginia

Keywords: Linux, Multikernel, Thread synchronization, Signals, Process Management

Copyright 2015, Akshay Giridhar Ravichandran

Single System Image in a Linux-based Replicated Operating System Kernel

Akshay Giridhar Ravichandran

(ABSTRACT)

Recent trends in the computer market suggest that emerging platforms will be increasingly parallel and heterogeneous, in order to satisfy user demand for improved performance and superior energy savings. Heterogeneity is a promising way to keep growing the number of cores per chip without breaking the power wall. However, existing system software was designed to cope with homogeneous architectures, not to run on heterogeneous ones; therefore, new system software designs are necessary.

One innovative design is the multikernel OS deployed by the Barrelfish operating system (OS), which partitions hardware resources among independent kernel instances that communicate exclusively by message passing, without exploiting the shared memory available amongst different CPUs in a multicore platform. Popcorn Linux implements an extension of the multikernel OS design, called a replicated-kernel OS, with the goal of providing a Linux-based single system image environment on top of multiple kernels, which can eventually run on different-ISA processors. A replicated-kernel OS replicates the state of various OS sub-systems amongst kernels that cooperate using message passing to distribute or access the various services uniquely available on each kernel.

In this thesis, we present mechanisms to distribute signals, namespaces, inter-thread synchronization, and socket state replication. These features are built on top of the existing messaging layer, process and thread migration, and address space consistency protocol to provide the application with the illusion of a single system image and developers with the SMP programming environment they are most familiar with.

The mechanisms developed were unit tested with micro benchmarks to validate their correctness and to measure the speed up gained or the overhead added. Real-world applications were also used to benchmark the developed mechanisms on homogeneous and on heterogeneous architectures. It is found that the contributed Popcorn synchronization mechanism exhibits overhead compared to vanilla Linux on multicore, as Linux's equivalent mechanism is tightly coupled with the underlying hardware cache coherency protocol and is therefore faster than software message passing. On heterogeneous platforms, the developed mechanisms allow each portion of the application to be transparently mapped to the processor in the platform on which its execution is faster. Optimizations are recommended to further improve the performance of the proposed synchronization mechanism. However, such optimizations may force the user-space application and libraries to be rewritten in order to decouple their synchronization mechanisms from shared memory, therefore losing transparency, which is one of the primary goals of this work.

Dedication

This work is dedicated to my parents.

Acknowledgments

This work is supported in part by ONR under grant N00014-13-1-0317. Any opinions, find- ings, and conclusions expressed in this thesis are those of the author and do not necessarily reflect the views of ONR.

Contents

1 Introduction
  1.1 Scalability Problem on Homogeneous ISA Platforms
  1.2 Scalability Problem on Heterogeneous ISA Platforms
  1.3 Research Contributions
  1.4 Thesis Scope
  1.5 Thesis Organization

2 Related Work
  2.1 Single System Image on Multikernels
    2.1.1 K2
    2.1.2 Cerberus
    2.1.3 Barrelfish Operating System
    2.1.4 Factored Operating System
  2.2 Single System Image on Multicellular Operating Systems
    2.2.1 Cellular IRIX
    2.2.2 Disco
    2.2.3 Hive
  2.3 Single System Image on Clusters
    2.3.1 Kerrighed
    2.3.2 MOSIX
    2.3.3 OpenSSI

    2.3.4 Bproc
    2.3.5 Helios

3 Popcorn Background
  3.1 Popcorn System Overview
  3.2 Multikernel Boot
  3.3 CPU Partitioning
  3.4 Memory Partitioning
  3.5 Message Passing
  3.6 Device Drivers
  3.7 Thread Migration
  3.8 Distributed File System

4 Signals
  4.1 Introduction
    4.1.1 Signals During Start of a Thread
    4.1.2 Signals Delivery
  4.2 Design Motivation
  4.3 Implementation
    4.3.1 Signals Migration
    4.3.2 Signals Delivery
  4.4 Evaluation

5 Futex
  5.1 Introduction
    5.1.1 Futex Wait
    5.1.2 Futex Wake
    5.1.3 Futex Requeue
  5.2 Design Motivation
  5.3 Implementation

    5.3.1 Global Ticketing Mechanism
    5.3.2 Granting Entry into Critical Section
    5.3.3 Process Exit
    5.3.4 Real-time (Priority Inheritance) Support
    5.3.5 Parallel Messaging Layer
    5.3.6 Issues in ...
    5.3.7 Process Synchronization
    5.3.8 Issue of False Sharing
  5.4 Evaluation
    5.4.1 Experimental Setup
    5.4.2 Microbenchmarks
    5.4.3 Macro Benchmarks

6 Process Management
  6.1 Introduction
  6.2 Design Motivation
  6.3 Implementation
    6.3.1 Kernel Discovery
    6.3.2 Single IP Address
    6.3.3 Unique Process ID
    6.3.4 Proc File System
    6.3.5 Namespaces

7 Conclusion
  7.1 Future Work
    7.1.1 Shared Memory Synchronization Variants
    7.1.2 Alternative Approach to Synchronization
    7.1.3 Process Management

Bibliography

List of Figures

3.1 Popcorn System Overview

4.1 Distributed Signals control flow

5.1 Kernel Space design of Futex in Linux. http://lwn.net/Articles/360699/ Used under fair use, 2015
5.2 Data structure organization on the server side
5.3 Distributed Wait Control Flow
5.4 Distributed Wake Control Flow
5.5 Distributed Requeue Control Flow
5.6 Spinlock Vs Linux Vs Popcorn
5.7 Critical Section Access
5.8 Relative comparison of costs in heterogeneous and homogeneous
5.9 Performance comparison for IS-BOMP
5.10 Performance comparison for CG-BOMP
5.11 Performance comparison for FT-BOMP

List of Tables

4.1 Test cases executed for evaluating signals
4.2 Cost of signals sent by kill
4.3 Cost of signals sent by killall

5.1 Futex's variable state
5.2 Test cases executed for evaluating Futex
5.3 Lock Scalability
5.4 Perf results for scalability
5.5 Barriers executed for 100 Rounds
5.6 Producer-Consumer: Number of access into critical section
5.7 Average syscalls and retries for producer-consumer
5.8 Perf results for mutex between two threads
5.9 Mutex between two threads serialized by barriers
5.10 Mutex between two threads serialized by barriers in Heterogeneous
5.11 Average syscalls and retries for Mutex between two threads
5.12 Syscall count comparison for IS-BOMP
5.13 Cost breakdown for IS-BOMP
5.14 Syscall count comparison for CG-BOMP
5.15 Cost breakdown for CG-BOMP
5.16 Cost breakdown for FT-BOMP
5.17 Perf analysis of Fluidanimate
5.18 Fluidanimate

5.19 Compression starts after entire file is read
5.20 Average syscalls and retries for PBZIP2
5.21 Compression starts when the file is dynamically read
5.22 Matrix multiplication

6.1 Average Response Time for ps command

Chapter 1

Introduction

To provide versatility in terms of performance, power, and chip area, heterogeneous ISA cores are increasingly consolidated into a single chip and share a single memory space. Traditional system software, which is mainly designed for symmetric multi-processing (SMP) hardware, should be redesigned for these systems. In fact, CPUs with different ISAs cannot run the same natively compiled binary, nor can they run a single monolithic kernel. Hence, traditional monolithic kernels cannot leverage such heterogeneous hardware. Processors with multiple-ISA cores are emerging in the mobile system-on-chip market. ARM big.LITTLE [1] is one such emerging architecture, and it includes cache coherent memory access amongst heterogeneous CPUs. big.LITTLE features a high-performance Cortex-A15 processor and an energy-efficient Cortex-A7 connected by a cache coherent interconnect, which enables any task to choose a compute-intensive or power-efficient core based on its workload characteristics. Heterogeneity is also exploited in desktop computers and server-class machines. Intel recently released a platform with a Xeon processor and a Xeon Phi accelerator that interconnect over PCIe and feature overlapping-ISA processors [2]. They use a separate stripped-down operating system kernel per ISA to run applications.

These emerging platforms challenge system software developers to provide a unified system view, achieving scalability without forsaking programmability and manageability. One step towards addressing scalability issues is the concept of the multikernel. The multikernel OS design [3], introduced by Baumann et al. of the Systems Group at ETH Zurich and Microsoft Research, is implemented by the Barrelfish operating system. The multikernel design provides an alternative to the traditional SMP operating system kernel design based on shared memory. In a multikernel operating system, each kernel manages a private memory space, is assigned a subset of hardware peripherals, and runs on a processor core. Kernels do not share any memory or data structure, but they coordinate via message passing. This keeps the data structures of each kernel independent, which is the key feature allowing a single environment to host different ISAs and to access resources without any kernel synchronization. To exploit the available hardware or share resources, the OS state is replicated and kept coherent by message passing.

The Popcorn Linux operating system is an extension of the multikernel model geared towards building scalable operating systems for homogeneous and heterogeneous cores based on Linux. The available cores are partitioned across different individual Linux kernels based on ISA, NUMA node, or simply one core per kernel. These kernels can be assigned various hardware resources such as physical RAM, file system, network interface and PCI devices. Each resource subset is isolated from the others. To share these isolated hardware entities, the kernels cooperate in order to give the illusion of a single operating system. In order to support existing applications and programming primitives, as well as to map to the correct hardware, the system should handle this cooperation in kernel space. In a distributed environment like Popcorn, there is a need for a single system image as it offers the user ease of programming, manageability, scalability and availability. As we can see, this setup inherently provides isolation and can be modified to cater for availability. But the key features for an operating system running on such diverse hardware to be robust are scalability and programmability. Scalability can be achieved by thread or task migration and an efficient messaging subsystem. Existing mechanisms for task and thread migration between kernels for effective workload balancing within the multikernel are described in Section 1.1 and Section 1.2.

1.1 Scalability Problem on Homogeneous ISA Plat- forms

A replicated-kernel OS reduces the contention on shared data structures by replicating the shared data on different kernels. By partitioning hardware resource management among different kernels, each OS subsystem service is split into multiple servers, each of them managing a slice of the OS subsystem, therefore reducing the contention on the single subsystem and improving system performance. Popcorn Linux for homogeneous platforms has been shown to alleviate the bottleneck caused by the existing contention in the Linux memory management subsystem [4]. In Popcorn Linux, compute- and memory-bound multi-threaded applications that continuously allocate memory migrate their threads to different kernels. The partitioned memory handling has been shown to reduce contention in the memory management subsystem, reducing application execution time by up to 40%. To complete the migration, an address space consistency protocol is provided. Though the inter-kernel communication can have substantial overhead due to the great quantity of messages flowing, benefits can be found in compute- and memory-intensive applications as shown in our previous work [5].

1.2 Scalability Problem on Heterogeneous ISA Plat- forms

The proliferation of accelerator processors, e.g., GPGPUs, GPUs, etc., enabled system developers to build heterogeneous platforms connecting the main CPUs with different-ISA processors. Non-shared memory systems can be built by plugging PCIe-based accelerators into server machines. To exploit this emerging hardware in the case of heterogeneous applications, thread migration has been supported in heterogeneous versions of Popcorn Linux [6]. This extends the homogeneous version by replicating address space pages. It implements a new page coherency protocol, similar to the MSI [7] cache-coherency protocol, that replicates memory page content to keep the distributed process address space consistent. Such user space distributed shared memory provides sequential consistency. Considerable work is underway on compiler-based modifications so that a program can interact with the OS scheduler to decide the best thread placement on the platform.

While the distributed nature of the model allows applying algorithms from the distributed computing literature, as it suffers from similar problems like synchronization, distributed locking and naming services, it also introduces new system-architecture-specific programming interfaces. This thesis focuses on a subsystem which helps in achieving synchronization between the distributed threads without breaking the SMP programming paradigm. This helps achieve ease of programmability in a distributed replicated system. Also, a unified process space and a distributed signal mechanism are provided to improve the manageability of the distributed threads.

1.3 Research Contributions

This thesis contributes the following features to Popcorn Linux:

• A mechanism is designed and implemented for distributed signaling for both normal and migrated tasks across all the kernels.

• A mechanism is designed and implemented for supporting multi-threaded synchronization of distributed threads across kernels. The implementation provides a POSIX-compliant, SMP-like distributed synchronization technique.

• An analysis of the scalability and the overhead incurred by the inter-kernel synchronization layer is performed. Additionally, such a layer is compared against SMP Linux both on homogeneous and on heterogeneous architectures.

• A mechanism is designed and implemented for process management features. This includes kernel discovery with architectural information and a unique process identification space, to provide the user space program with the usual single operating environment. Additionally, a single IP space is provided for sharing the same NIC among the kernels, together with socket state migration of server threads.

1.4 Thesis Scope

This thesis compares and evaluates the single system image features added to the Popcorn Linux kernel against a traditional SMP operating system. This thesis is strictly confined to features related to intra-process or thread groups. User space thread synchronization is provided only through POSIX (Portable Operating System Interface) constructs and not System V [8]. Implementing a distributed shared memory mechanism in kernel memory space will be our future work. This is needed for implementing traditional single system image features like pipes, shared memory and message queues. The mechanisms for the distributed file system and file descriptor migration are research topics of other students.

1.5 Thesis Organization

The rest of this thesis is organized as follows. Chapter 2 provides information about related work. Attention is given to single system image features that exist in cluster computing. Chapter 3 presents the concepts that are necessary for an understanding of Popcorn Linux. The details of existing Popcorn subsystems are listed. Chapter 4 details the design and implementation of distributed signals. Chapter 5 details the design and implementation of thread synchronization across kernels. Chapter 6 details the design and implementation of process management features. Chapter 7 includes the conclusion and future work.

Chapter 2

Related Work

One of the main contributions of this work is an inter-thread synchronization primitive, based on distributed user space shared memory and user-kernel interaction, for user space tasks only, without breaking the POSIX API, thus enabling backward compatibility. This work also provides process management and signaling features to stitch the kernels into a single system image. In this chapter, we examine existing techniques for single system image and we study how these techniques have been applied in clustered environments as well as multikernels.

2.1 Single System Image on Multikernels

2.1.1 K2

K2 is a mobile operating system for heterogeneous coherence domains introduced by Rice University in 2014 [9]. In platforms with multiple coherency domains, two main approaches have been developed: shared-nothing with isolated manycores and shared-everything with coherent multicores. Instead, their philosophy is called share-mostly: a middle-of-the-road approach between the existing approaches that shares only the necessary components. The kernels have both private and shared services. They provide a single system image with two kernels running on OMAP4: the main kernel in the ARM ISA running on the Cortex-A9 cores, and the shadow kernel in the Thumb-2 ISA running on a Cortex-M3 core. They provide global memory management, interrupt management, and global namespaces.

Synchronization is provided between processes and not within thread groups. They argue that threads share various sets of OS state, and running them on multiple heterogeneous domains may lead to extensive contention. The strategy is to delay a light-task execution thread in favor of a heavy-task execution thread when sharing resources. This is geared more towards the mobile environment and does not give application developers a generic framework to run different types of threading models. It also inhibits an application that performs a variety of tasks, each executed in multi-threaded mode, from exhibiting multi-domain parallelism. Interrupt management is done by directly hardwiring interrupt pins to the respective kernels. This will not be the case with multiple instances of different heterogeneous kernels, which may be found in servers.

2.1.2 Cerberus

Cerberus is a parallel operating system introduced by Fudan University in 2011 [10]. Cerberus proposes scaling UNIX-like operating systems by virtualization to provide backward compatibility for legacy applications, with the goal of reducing the contention on shared data structures by splitting them across virtual machines based on Xen. To share resources like memory, the file system and others, they use shared memory, state replication and message passing to coordinate access and reuse existing components. They have a single shared memory managed by a virtual machine monitor. For the operations that are shared between virtual machines, they have incorporated system call virtualization. They intercept system calls from all virtual machines and forward them to the virtual machine monitor to resolve consistency problems. Since it is a virtualized environment, a centralized layer can manage memory and consistency easily. They support synchronization across processes in different virtual machines by virtualizing POSIX system calls. Calls made on remote addresses are converted to the real addresses by the monitor and are stored in a common shared memory. Signals are supported by an underlying layer common to all virtual machines. Since the virtual machine monitor acts as a server, contention increases as the number of instances increases. Their target applications, including dbench and apache, are neither communication nor synchronization intensive, thus showing some performance benefits under specific configurations. This configuration is not suitable for a heterogeneous platform where you cannot have a dedicated server.

2.1.3 Barrelfish Operating System

Barrelfish introduced the notion of the multikernel in 2008. It is being jointly developed by the Systems Group at ETH Zurich and Microsoft Research [3]. The main idea is to map an OS node to a CPU core. The goal is to make these nodes communicate asynchronously through message passing and replicate OS state to create a unified userspace. They predicted this model would scale well for heterogeneous architectures as there will be no shared data structures across the various architectures. For dealing with heterogeneity, it employs a service known as the system knowledge base (SKB), which maintains all the information about the hardware topology. They have hardware and device discovery by which information about PCI, device drivers and CPUID is collected and the corresponding properties are stored. They also obtain the communication or interconnect topology during startup to construct topology-aware networks. Based on the architecture and interconnect available, it generates a suitable multicast tree for efficient communication between the cores.

Synchronization is implemented by the Octopus module, which is part of the system knowledge base [11]. It provides locking primitives based on a key-value data structure, called records, stored in the SKB. These are used for synchronization of a distributed application across kernels. Acquiring a lock creates a sequential record using a unique name agreed on by the clients. The client holding the lowest record number gets the lock. The other clients wait on the previous record. When the lock is released by the holder, the next client in the queue is triggered and therefore woken up.

Barriers are enforced using Octopus's sequential records. Every client entering the barrier creates a sequential record and checks whether the threshold number of records has been reached. If so, it creates a new record communicating to all the clients that they are ready to proceed.

2.1.4 Factored Operating System

The Factored Operating System (FOS) is designed by MIT Carbon Labs for multi-core platforms with thousands of cores [12]. Each core runs a microkernel hosting an application or system services. FOS distributes the services to different dedicated CPU cores and runs applications on the remaining cores; thus isolation and fault tolerance are achieved for high availability. Services can be replicated to exploit the spatial locality of the cores for an optimized interconnection between application and service cores.

They provide single system image services using Fleets [13], a scalable data structure. The name service, network stack, file system and distributed page allocation mechanisms use it for their services. Fleets are hosted on a single or a few dedicated cores. A fleet coordinator receives service requests from the application and then looks at the underlying service to map the request. This abstracts the underlying resource from the application in order to provide a single system image.

FOS proposes two methods to synchronize: lock servers and transaction servers. A synchronization lock can be managed by a server. In order to acquire the lock, messages are passed to the owner of the lock to provide access to critical sections. They realized this could add overhead and limit performance. Thus, they also provide transactional synchronization using transaction servers. This model lets the cores run the application in parallel and contact the server for transactions, to take advantage of optimistic concurrency. These synchronization mechanisms were introduced to keep access to kernel resources consistent. This is a significant move away from SMP to get scalability with a large number of cores. However, applications have to be rewritten to adapt to this architecture.

2.2 Single System Image on Multicellular Operating Systems

2.2.1 Cellular IRIX

Cellular IRIX was released by Silicon Graphics in 1998 [14]. Its main goal is scalability. This version utilizes multiple cells to scale up to thousands of CPUs. They offer a single system image for users and administrators, and a running application has access to all the resources available within the connected cells. They achieve this by employing isolation and peer-to-peer mechanisms. This is implemented by the following architecture:

1. Golden cells run some specific subsystems and no application is run on them. The networking subsystem runs in the golden cell, which has access to the network interface; thus, all network requests are redirected to it.

2. Subsystems like the file system and disk management are distributed by peer-to-peer mechanisms. They have a distributed file system called Cellular XFS which manages disk-related issues across cells.

This system is intended to run parallel applications across cells. They provide shared memory transparently across cells; the application is given direct access to this memory. So, without the intervention of the operating system, applications can share or transfer messages and data. Their target applications are shared memory based MPI programs. These will have better performance over clusters as they do not have to share state with message passing, and their performance is comparable to normal SMP. They have shown performance benefits by replacing the communication channels with shared memory. They support process synchronization, which also utilizes the shared memory to reach consensus between two processes across different cells. They do not support synchronization of threads written with pthreads, and intra-thread-group synchronization is confined to individual cells. This architecture suffers from high contention by having a single spinlock to protect the kernel resources.

2.2.2 Disco

Disco [15] is a scalable virtualization-based operating system to host client virtual machines on unsupported hardware. Disco introduces a software layer which virtualizes the underlying hardware. Disco can host many independent client virtual machines, and they are connected through a virtual Ethernet network to form a cluster. Disco virtualizes the CPUs and I/O peripherals and provides distributed shared memory.

For load balancing purposes, the virtualized CPUs are shared among different client instances for efficient utilization of the underlying hardware. As for the I/O devices, Disco intercepts the communication to and from them. To support efficient communication, the monitor virtualizes access to the networking devices of the underlying system. Each virtual machine is assigned a distinct address within a virtual subnet. Disco supports Ethernet devices without fragmentation for large data throughput. For the clients to access the outside world, one of them acts as a gateway, using the physical network interfaces of the machine to send and receive packets. This implementation is limited to Stanford's FLASH SMP machines. As with any virtualized approach, it is easier to implement SMP/NUMA support with a dedicated server for all the services in the underlying layer than to implement it at the OS level. But it is not suitable for a heterogeneous platform where the services have to be distributed to handle the different underlying hardware.

2.2.3 Hive

Hive is an OS designed for SMP and is implemented on Stanford's FLASH multiprocessor [16]. It is comprised of many kernels communicating through a network. Each kernel is assigned or partitioned a specific number of cores, memory and I/O devices. Process management is distributed so users and applications see a single system image. When a process forks, a new process is created in a remote kernel and the two interact with each other through RPC. For example, a kernel sending a signal to a process or process group may send messages to other cells in order to complete the operation. The single system image functionalities provided are limited. They include System V streams (which include interprocess pipes and network connections) and shared memory. It supports multi-threading by a mechanism called spanning tasks. A spanning task is a process with threads running on multiple kernels. They keep the address space coherent to support task migration. Synchronization is provided between kernels. However, differently from Popcorn (and Linux), they do not support POSIX-based multi-threading even though they run on SMP hardware.

2.3 Single System Image on Clusters

2.3.1 Kerrighed

Kerrighed offers a single system image, like that of an SMP machine, on top of a network of PCs using Linux containers (lxc). It provides process migration, checkpointing, global scheduling, a distributed file system, and distributed IPC and streams. A global process and resource space is inherently obtained by associating with the container. It employs a centralized master node to store process states and shared resources like memory and devices. Kerrighed starts on a master node on which a Kerrighed container is started; the container loads an ssh daemon listening on a common port. The slave machines also start the container and connect with the master.

The container gives the illusion of distributed shared memory (DSM). This is implemented in kernel space to achieve sequential consistency at page granularity. All the above-mentioned features are provided by the shared object space created by the Kernel Distributed Data Management layer started on the master. This acts as the directory for any service. Every service is recorded as an object in this layer.

Support for POSIX synchronization primitives is directly provided by the Elrond module [17]. This module provides distributed locks, barriers, semaphores and wait conditions. This is done by libgthread, which accesses the DSM to provide support for SMP locks. Details of how it is implemented and test results are not available.

Socket or stream migration is supported in the context of process migration. A user space abstraction called KerNet sockets [18] is responsible for socket migration between or within nodes. The application is associated with a KerNet socket, which then maintains the state and can perform actions like suspend and resume.

2.3.2 MOSIX

The MOSIX [19] operating system was created by Amnon Barak and Amnon Shiloh from the Hebrew University of Jerusalem to provide a single system image for clusters and clouds. Their main goal is to abstract away the configuration information and available resources of all the nodes and let users believe they are operating on a single system. They perform automatic resource discovery by continuously monitoring node states, and they use process migration and checkpointing for load balancing. They provide distributed shared memory only for process migration, through a patch [20]. They do not migrate threads. MOSIX employs a deputy mechanism for a migrated process with communication context: the main process on the original machine redirects packets or messages to the migrated task. This directly decreases performance.

2.3.3 OpenSSI

OpenSSI is based on the Linux operating system and was developed by Compaq in 2001 [21] based on LOCUS [22]; it is currently maintained by HP. Its contributions are to provide high availability, manageability, and scalability. For load balancing, they provide process migration extending MOSIX. They support process migration, migrate IPCs like sockets, pipes and shared memory, and reopen these files on the remote node. They have a distributed network file system using a cluster file system protocol. For manageability, they support a distributed virtual file system for both processes and devices, including cluster-wide process identifiers and signaling. For availability, they provide process checkpointing and restart.

2.3.4 Bproc

The goal of BProc [23] is to unify a cluster of networked PCs into a single machine using a master-slave architecture. BProc provides process migration mechanisms. The master node manages the pool of distributed process identifiers and allocates every node a portion of it. When a node wants to migrate a process, it informs the master node, which takes the process space from the slave node where the program is currently running and places it in the slave node to which it is to be migrated. After migration, when the new process is created in the slave node, it ignores the process ID that the slave node assigns, attaches the original process identifier, and all the PID system calls are changed to provide this secondary identifier. BProc also supports distributed signaling by means of a ghost process redirecting the signal to the target node where the process is currently running.

2.3.5 Helios

The Helios Parallel Operating System [24] was written by members of the Helios group at Perihelion Software Limited and released in 1988. It has been built to run on parallel networked machines. The design goal of Helios is to provide a unified operating system for parallel machines, independent of any specific hardware. It is built as a micro-kernel providing process management and migration using a distributed shared memory. The synchronization primitives provided are for inter-process communication and support migrated IPC.

Helios assigns each node a specific purpose: some nodes are for computational purposes and some are I/O or network nodes. Helios also isolates the underlying resources from the application to provide a single system image.

Chapter 3

Popcorn Background

We deployed Popcorn Linux on a homogeneous-ISA platform built with multicore AMD processors and on a heterogeneous-ISA platform built with Intel Xeon processors and an Intel Xeon Phi processor connected via PCIe.

3.1 Popcorn System Overview

Figure 3.1: Popcorn System Overview

Figure 3.1 shows the Popcorn system architecture and its sub-components. They will be briefly discussed in the following sections.

3.2 Multikernel Boot

In a homogeneous platform (e.g., a multicore processor), a single kernel is booted first with a single CPU or a cluster of CPUs. This acts as the primary kernel. The secondary kernel instances can be booted by the user at any time, in any order. In our heterogeneous-ISA setup, the Xeon is booted first and then the Xeon Phi.

3.3 CPU Partitioning

In a homogeneous setup, CPUs are assigned to specific kernel instances at boot time. This is achieved via kernel boot parameters. The classical way of assigning CPUs to kernels is to host a single kernel on a single CPU. A clustered configuration is possible by assigning more than one CPU to a single kernel and designating the first CPU in the list as the master that communicates with the other kernels. The cores can be assigned to kernels based on the platform topology, for example by loading a single kernel on all cores that share a single memory controller. In the heterogeneous setup, CPUs belonging to a single ISA are grouped and assigned to a kernel. We have the Xeon x86-64 cores in one kernel and the Xeon Phi cores in another kernel.

3.4 Memory Partitioning

In a homogeneous setup, each kernel instance is configured with its own region of main memory at boot time. Whether clustered or normal Popcorn, the regions are statically partitioned with the memmap Linux kernel boot option at load time. Currently, there is no global kernel space memory allocation available. Thread and process migration rely on the userspace distributed shared memory mechanism, so some physical memory has to be shared when we migrate threads; it is tracked by local data structures, in each kernel, pointing to the same physical memory. Coherence is guaranteed by the hardware. Part of the main memory is also used for message passing between kernels. On the heterogeneous platform, each kernel has its own memory island. Pages are replicated on the kernel to which a thread is migrated and are kept consistent by a software layer which ensures sequential consistency. Although in the Xeon-Xeon Phi setup the Xeon RAM can be mapped into the Xeon Phi's physical address space and vice versa, this was not exploited because remote memory access is slower than local RAM access.
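As an illustration of this static partitioning (not the actual configuration used in this work), two kernel instances on an 8 GB machine could be given disjoint memory regions with standard Linux boot parameters along the following lines; the values are hypothetical, and the real Popcorn boot scripts pass additional Popcorn-specific options for CPU and resource assignment.

# primary kernel: restrict it to the low 4 GB of RAM (illustrative values)
... mem=4G

# secondary kernel: use the 4 GB region starting at the 4 GB physical boundary
... memmap=4G@4G

Here memmap=nn@ss declares the region of size nn starting at physical offset ss as usable RAM for that kernel instance, matching the partitioning described above.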

3.5 Message Passing

The kernel instances share the OS state by explicit message passing. Every software module, like task migration, the address space migration and consistency algorithm, and the modules built for this thesis such as synchronization, namespaces and signals, utilizes this layer for all communication between kernels.

In homogeneous setups, we use a lock-free, shared memory based message passing module to communicate between kernels. The messaging layer was designed and implemented by Ben Shelton [25]. This layer exposes a set of APIs to communicate between kernels. Every message is associated with a type and a corresponding handler function. These must be registered during kernel boot. When a kernel boots, it registers a receive queue in a shared memory location visible to all the kernels.

Sending a message is done by atomically claiming a ticket on a slot of the registered queue. A ticket is active when the assigned transfer slot is marked free for use. When the ticket is active, the sender places the data in the slot, marks the slot full, and then generates an IPI to signal the receiver that a message is waiting for it.

Receiving a message involves two steps. First, a CPU IPI handler function receives the message and executes a small amount of code in a bottom-half softirq. This notifies the top half that messages are ready to be consumed. The top half, in work queue context, executes the handler function registered for the message type. Once that function has executed, the message is consumed and set free. The slot can then be marked empty for further reuse.

The messaging API allows the developer to send short messages and long messages. Long messages are sent in the form of several short messages. Delivery is faster for short messages than for long messages, as the destination kernel has to wait for all the chunks to arrive before delivering a long message.
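To make the registration flow concrete, the following kernel-side sketch shows how a subsystem might declare a message type and register its handler at boot. The pcn_kmsg_* names, types and signatures are assumptions modeled on the description above and on [25]; they are not verified declarations from the Popcorn sources.

/* Sketch only: pcn_kmsg_* names, types and signatures are assumed. */
struct futex_request_msg {
    struct pcn_kmsg_hdr header;      /* message type, originating kernel, ...   */
    int pid;                         /* requesting task                         */
    unsigned long uaddr;             /* user space futex address                */
};

/* Runs in work queue context once the IPI/softirq path has signalled
 * that a complete message is available in the receive slot. */
static int handle_futex_request(struct pcn_kmsg_message *msg)
{
    struct futex_request_msg *req = (struct futex_request_msg *)msg;
    /* ... perform the requested operation using req->pid and req->uaddr ... */
    (void)req;
    pcn_kmsg_free_msg(msg);          /* slot can now be marked empty for reuse  */
    return 0;
}

/* Registered during kernel boot, as described above. */
static int __init futex_msg_init(void)
{
    return pcn_kmsg_register_callback(PCN_KMSG_TYPE_FUTEX_REQUEST,
                                      handle_futex_request);
}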

In the heterogeneous platform, we use an Intel socket-based interface to communicate between the Xeon and the Xeon Phi through the PCIe interconnect.

3.6 Device Drivers

In a homogeneous setup, each device is controlled by a single kernel instance; on all the other kernel instances, the device is blocked at boot time. The kernel instance that owns the device driver performs all driver operations. In the heterogeneous setup, the Xeon and the Xeon Phi each have individual access to and control of their own set of devices. Shared access to devices is the subject of future research.

3.7 Thread Migration

Thread migration in Popcorn Linux for homogeneous platforms has been designed and implemented by David Katz [5]. When a thread is migrated between two kernels, a new thread is created in the remote kernel, or an available thread is picked from a pool. The thread on the origin kernel is paused; such a thread is now called a shadow thread. The process address space is also copied over to the destination kernel; on multicore, we decided to copy only the data structures that define the address space layout, and not the content. Address space migration can be on demand or up-front. Therefore, multiple kernels will have a page pointing to the same physical address. The state is kept coherent by the hardware cache coherency protocol.

In the heterogeneous setup, thread migration was designed and implemented by Marina Sadini [26]. When threads are migrated, a main thread-group user thread is created. This forks a new thread, and the new thread then returns to user space. The main thread never goes to user space and acts as a manager thread for subsequent migration requests. It is created as a user thread because it has to share the mm struct to clone a child thread that is part of the same thread group. The address space layout (VMA), including the stack and the heap, is also transferred similarly to the homogeneous case. The heterogeneous version differs from the homogeneous version due to the fact that in the latter memory pages are kept coherent by the cache coherency protocol, while this is not true in the former setup.

3.8 Distributed File System

We have a distributed file system built on top of NFS. The mount namespace relies on NFS. This module supports file descriptor migration and consistency.

Chapter 4

Signals

4.1 Introduction

In Linux, signals are the main asynchronous communication mechanism between threads and processes, and between user space and kernel space. Signals are raised by I/O and software exceptions, and are generated for process and resource control.

4.1.1 Signals During Start of a Thread

Linux takes care of signals raised during the life cycle of a thread. If there is any pending signal during the process of forking, then the creation of the child thread is stopped. The shared pending queue is shared within a thread group, but each thread has its own associated handler functions and blocked queue. Once the thread is forked, the initial signal disposition is set to ignore or to default for any thread. Also, the forked thread or the invoked process loses the registered signal actions that the parent may have: if a task bearing a signal handling function for a signal forks a new child, this function has no meaning in the new process, and the disposition of the signal is set to the default one in the new thread. If parent and child threads want to share signal handling, the thread should be cloned with CLONE_SIGHAND, and then the signal handlers are shared with its child.

4.1.2 Signals Delivery

Signals can be delivered synchronously or asynchronously. Synchronous signals occur during the thread's execution flow. Asynchronous signals are caused by external agents or interrupts coming from another process. If a signal occurs, either from user space or kernel space, the signal is packed with its signal information and user privileges as a struct sigqueue and queued in the pending or shared pending queue of the thread.

This is done by kill_pid_info if the signal comes from user space, and by do_send_specific or force_sig if it comes from kernel space. When the kernel context switches, it checks the rescheduling flags. During this check, the kernel verifies whether the thread has any pending signals. The function do_signal dequeues the signal from the queue and handle_signal takes care of its disposition. The thread has to configure the kernel with the appropriate action on how signals are to be handled:

1. The signals can be blocked. Signals are added to the blocked signal queue and are not added to the pending signal queue until the application unblocks them. This is used to not deliver the signals.

2. The signals can be ignored. Signals are added to the pending signal queue, but the default actions are ignored.

3. The signals can be caught. The signal handler functions are registered by the thread with the kernel. Such a function is called by the kernel when that signal occurs. The thread can choose to terminate the program gracefully or do any other operation if it is not a fatal signal.

4. The signals can be allowed to perform their default actions. Every signal has a default action. This could result in the thread being terminated, a core dump, or the thread being stopped.

Only SIGKILL and SIGSTOP cannot be ignored or blocked. This is because these two signals provide the mechanism to control or terminate the program under any circumstance. Signals are delivered only to a single thread in any process. Apart from the hardware exceptions or a timer expiry (which are delivered to the thread which caused the event), all the signals are passed to one arbitrary thread of the process. Every thread has its own private signal mask which is used to block certain signals, so each thread can define how a signal is handled. Care should be taken in writing a handler function. It should be a re-entrant function, as successive signals can schedule the handler again and again, preempting its execution before it finishes entirely. In practice, it should not hold static data over successive calls, it should not return a pointer to static data, and it should use the data provided by the caller or local data. If global data must be accessed, we need to ensure protection. Also, it should not call any non-reentrant functions.
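As a concrete user-space illustration of these dispositions (standard POSIX, not Popcorn-specific code), the following snippet blocks one signal with pthread_sigmask and catches another with sigaction, using a handler that only sets an async-signal-safe flag:

#include <signal.h>
#include <pthread.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t got_usr1;   /* async-signal-safe flag */

static void usr1_handler(int signo)
{
    (void)signo;
    got_usr1 = 1;                        /* re-entrant: no static state, no unsafe calls */
}

int main(void)
{
    sigset_t block;
    struct sigaction sa;

    /* Block SIGUSR2 in this thread: it stays pending until unblocked. */
    sigemptyset(&block);
    sigaddset(&block, SIGUSR2);
    pthread_sigmask(SIG_BLOCK, &block, NULL);

    /* Catch SIGUSR1 with a handler (disposition: caught). */
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = usr1_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR1, &sa, NULL);

    raise(SIGUSR1);                      /* delivered to this thread */
    while (!got_usr1)
        pause();
    return 0;
}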

4.2 Design Motivation

Signal delivery and the functions involving signals should work in a distributed environment such as Popcorn. The following are the three cases necessary to accomplish the implementation of distributed signals.

1. Signals should be redirected to either remote processes or migrated processes, generated from user or kernel space.

2. Migrated threads should have the same signal masks as the shadow threads.

3. Signal handling in Popcorn should behave as in vanilla Linux, including during the cloning of a process.

4.3 Implementation

Distributed signals in Popcorn are provided through mailboxes. Each kernel has a kernel thread which acts as an endpoint to redirect messages to the intended process, and it utilizes the existing messaging layer for communication. This thesis contributes signal state migration and signal delivery for normal and migrated tasks in remote kernels.

4.3.1 Signals Migration

Along with the virtual memory area (VMA) descriptors of the stack and heap, the signal parameters are also copied to facilitate signal delivery in the case of migrated threads. The following parameters are copied across kernels when the thread is migrated (a sketch of this state follows the list below).

1. The blocked queues which contain already blocked signals.

2. The saved signal mask queue which contains all the signals that are blocked.

3. The signal pending and shared pending queue.

4. The stack pointer of the handler function.

5. The stack size of the handler function.

6. The registered signal actions for all the available signals.
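A minimal sketch of the per-thread signal state that would travel with a migrating thread, based on the list above; the struct and field names are illustrative and do not reflect the actual Popcorn migration message layout.

#include <linux/signal.h>

/* Illustrative only: not the actual Popcorn migration message. */
struct migrated_signal_state {
    sigset_t blocked;                 /* 1: signals currently blocked              */
    sigset_t saved_sigmask;           /* 2: saved signal mask                      */
    sigset_t pending;                 /* 3: per-thread pending signals             */
    sigset_t shared_pending;          /* 3: thread-group (shared) pending signals  */
    unsigned long sas_ss_sp;          /* 4: alternate handler stack pointer        */
    size_t sas_ss_size;               /* 5: alternate handler stack size           */
    struct k_sigaction action[_NSIG]; /* 6: registered actions for all signals     */
};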

4.3.2 Signals Delivery

Figure 4.1 explains the control path of signal delivery under different scenarios. Since the process identification space is partitioned, identifying the kernel is done by unmasking the higher order bits of the process identifier. struct siginfo is modified to accommodate remote flags. This field is required to override the user privileges in the remote kernel once the signal is placed in the pending queue of the relevant process. When the thread is being migrated, a notification message updates the process identifier in the remote kernel to the shadow process. Whenever the migrated thread cascades to other kernels, it notifies both the shadow and the origin kernel. The origin kernel's mailbox acts as the server for redirecting the signal to the appropriate kernel where the thread is alive. In the case of heterogeneous replicated kernels, we create a shadow main user space thread which is used only to delegate work to the actual user threads and never actually goes to user space. All the signals, including signals involving group exit, are blocked from reaching this thread as it is a management thread. During a normal group exit, these threads are reaped by the thread exit protocol. Real-time signal delivery and timed signal delivery are left as future work.
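The kernel-identification step can be pictured with the small sketch below; the shift width and helper names are assumptions chosen for illustration, not the actual Popcorn PID encoding.

/* Assumed layout: the high-order bits of a Popcorn PID encode the kernel
 * that created the task. The shift value is illustrative only. */
#define POPCORN_KERNEL_SHIFT 24

static inline int pid_to_kernel(int pid)
{
    return pid >> POPCORN_KERNEL_SHIFT;              /* owning kernel id  */
}

static inline int pid_local_part(int pid)
{
    return pid & ((1 << POPCORN_KERNEL_SHIFT) - 1);  /* kernel-local PID  */
}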

Figure 4.1: Distributed Signals control flow

4.4 Evaluation

Popcorn's distributed signals exploit Popcorn's messaging layer for their delivery. The cost of the messaging layer is well documented in our previous work [25]. To test the functionality of distributed signals we used the kill(), raise() and alarm() APIs, targeting processes migrated or created on a remote kernel. The test cases listed in Table 4.1, from the glibc test suite [27] (release 2.13), were used to validate the signal functions in the case of migrated threads. The latency perceived by the user when issuing kill() or killall() depends on the underlying messaging layer. These signals are sent synchronously, with the signal info sent through the messaging layer and placed in the queue of the corresponding process. If there is any error placing it, the message is returned to the local kernel with the error. This is required for the user to get the same result as they would for a local process. The average user time taken for a response with the homogeneous messaging layer, for signal delivery within the local kernel and to remote kernels, is reported in Table 4.2 and Table 4.3 for kill() and killall() respectively. These are average times measured by delivering signals to 1 through 32 kernels for 100 iterations.

Test Case      Description
tst-signal5.c  Tests pthread_sigmask for a migrated thread.
tst-signal6.c  Tests the use of an alternate stack by the handler function for a migrated thread.

Table 4.1: Test cases executed for evaluating signals

Kernel   Time (seconds)
Local    0.001
Remote   0.007

Table 4.2: Cost of signals sent by kill

Kernel   Time (seconds)
Local    0.033
Remote   0.051

Table 4.3: Cost of signals sent by killall

Chapter 5

Futex

5.1 Introduction

In UNIX systems, System V IPC (inter-process communication) mechanisms such as semaphores, message queues and sockets are the basic means for two processes to communicate. System V IPCs require the user program to issue a system call, letting the kernel take care of the communication irrespective of the degree of contention. Many multi-threaded applications use POSIX primitives for synchronization between threads. The common synchronization mechanisms are provided by Pthreads, maintained by glibc. Amongst them, the futex-based lock does not resort to system calls except when the lock is contended; since arbitration between tasks is rare in real-world applications, these mechanisms provide low latency and high throughput. Pthreads provides APIs for:

1. Mutex locks

2. Condition variables

3. Read-write locks

The above APIs are constructed on top of the futex (fast userspace mutex) module in kernel space in Linux [28]. Futexes were introduced to gain performance benefits on SMP machines and as an alternative to semaphores and spinlocks. When using spinlocks for inter-thread locking in user space, a thread keeps the CPU busy without making any progress. On the other hand, when we inform the kernel to put the thread to sleep, it releases the CPU cycles to other processes, improving responsiveness. A futex uses a unique integer variable (32 or 64 bit), which is under the control of user space, for synchronization. The reference count on the mm struct is incremented, so it will exist as long as the process is alive. All the above locking primitives are implemented by the following syscall.


Value   State                  Syscall
0       Unlocked               No syscall
1       Locked, un-contended   No syscall
2       Locked, contended      Syscall

Table 5.1: Futex's user space variable state

Figure 5.1: Kernel Space design of Futex in Linux. http://lwn.net/Articles/360699/ Used under fair use, 2015.

long sys_futex(void *addr1, int op, int val1, struct timespec *timeout, void *addr2, int val3);

Futex operates on the value of the address in user space according to Table 5.1 [28].

When the syscall is made, it queues or dequeues the request in a hash table data structure called the futex hash bucket, as seen in Figure 5.1. There are three common kernel operations, Futex wait, Futex wake and Futex requeue, called by the glibc library to implement its synchronization primitives.
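To make the user/kernel split concrete, here is a minimal Drepper-style mutex built directly on the futex syscall. It follows the value protocol of Table 5.1 and enters the kernel only on contention; this is standard Linux futex usage, not Popcorn-specific code.

#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <stdatomic.h>

/* Futex word: 0 = unlocked, 1 = locked un-contended, 2 = locked contended
 * (the value protocol of Table 5.1). */

static long futex(atomic_int *uaddr, int op, int val)
{
    return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

static void mutex_lock(atomic_int *f)
{
    int expected = 0;

    /* Fast path: 0 -> 1 with a single atomic instruction, no syscall. */
    if (atomic_compare_exchange_strong(f, &expected, 1))
        return;

    /* Slow path: mark the lock contended and sleep in the kernel. */
    while (atomic_exchange(f, 2) != 0)
        futex(f, FUTEX_WAIT, 2);   /* sleeps only while *f is still 2 */
}

static void mutex_unlock(atomic_int *f)
{
    /* Fast path: 1 -> 0 and nobody is waiting; otherwise wake one waiter. */
    if (atomic_exchange(f, 0) == 2)
        futex(f, FUTEX_WAKE, 1);
}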

5.1.1 Futex Wait

This operation enables the thread to be suspended in kernel space. We describe here the steps involved. First, the task gets the hash bucket pointer corresponding to the user space address and acquires the lock. Second, the task reads the content of the address and checks the value. If it has been modified, it returns -EWOULDBLOCK to user space. Otherwise, it queues the thread and schedules it to sleep. When the wake signal is issued, the thread is dequeued and goes back to user space returning success. If there is any other interrupt or signal, it returns -ERESTARTSYS to user space.

5.1.2 Futex Wake

This operation is used to wake up one or more threads queued against the same user space address in the hash bucket. We describe here the steps involved. First, the task gets the hash bucket pointer corresponding to the user space address and acquires the lock. Then it queries the queue and wakes up the sleeping tasks, repeating this step until the required number of tasks, val1 (as described in Section 5.1), has been woken up. In the end, it releases the lock and returns the number of threads woken up to user space.

5.1.3 Futex Requeue

This is an optimized version of the wake operation, used to mitigate the thundering herd problem. When there are many threads waiting on a single lock, waking up everybody to check their address value would result in most of them being queued back again. So this operation is used to wake a certain number of threads queued on addr1, while the remaining ones are requeued to addr2, which is also passed as an argument as described in Section 5.1. We describe here the steps involved. First, the task gets the hash bucket pointers corresponding to both user space addresses and acquires a lock. Then it queries the queue corresponding to the first address and wakes up the sleeping tasks, repeating this step until the required number of tasks, val1 (as described in Section 5.1), has been woken up. Then it transfers the remaining threads queued on addr1 to addr2. In the end, it releases the lock and returns the number of threads woken up to user space.
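Seen from user space, this is roughly how glibc's pthread_cond_broadcast exploits requeueing: wake one waiter on the condition futex and move the rest onto the mutex futex so they wake only as the mutex becomes available. The snippet below is an invocation sketch of the plain FUTEX_REQUEUE operation; glibc's real implementation additionally guards against races using FUTEX_CMP_REQUEUE.

#include <linux/futex.h>
#include <sys/syscall.h>
#include <limits.h>
#include <unistd.h>

/* Wake one thread sleeping on cond_word and requeue up to INT_MAX of the
 * remaining waiters onto mutex_word (addr2), mirroring Section 5.1.3. */
static long broadcast_requeue(int *cond_word, int *mutex_word)
{
    return syscall(SYS_futex, cond_word, FUTEX_REQUEUE,
                   1 /* wake this many */,
                   (void *)(long)INT_MAX /* requeue up to this many */,
                   mutex_word, 0);
}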

5.2 Design Motivation

Thread migration across different kernels breaks the existing synchronization constructs of a monolithic kernel and calls for a distributed algorithm to tackle the problem.

Synchronization in non-shared memory systems is a well-researched area. One method is to use a centralized lock server; it is easier to implement but cannot be used for an architecture like Popcorn. The other method of synchronization is message passing without a server. It involves state replication and broadcast messages to all the nodes, or nodes connected in structures convenient for message passing, and uses logical timestamps to resolve contention when there is no global clock available. The algorithms are similar to Lamport's mutual exclusion algorithm and consume on the order of x * (N - 1) messages, where x can vary between 1 and N, to resolve a single contention; for example, with N = 4 kernels, resolving one contended acquisition may take anywhere from 3 to 12 messages. To maintain compatibility with existing applications and retain the distributed nature needed to cater to non-shared memory systems, we decided to use distributed shared memory to emulate SMP synchronization. We decided to support Pthreads by modifying the futex kernel module with the help of the following available infrastructure.

1. We have user space address consistency to make use of the fast path for non contended cases.

2. Almost all applications are based on glibc, which preserves the familiar programming model for the end user.

3. The normal Futex operation uses FIFO ordering, and we have a fast messaging layer which supports FIFO ordering.

5.3 Implementation

SMP Linux handles concurrent requests by holding a record of sleeping threads in a table of queues (Linux's Futex queues), protected by a kernel space spinlock. This mechanism ensures serialization on a monolithic kernel. Since our threads are distributed across multiple kernels, we need to preserve this serialization for every unique identifier the threads may use. The following design decisions were made to make Futex work in a distributed environment.

During kernel bootup, the memory is statically partitioned. When a kernel is booted, the memmap boot argument assigns it a range of memory. By parsing this argument, we obtain the start of the memory address (kernel_start_addr) allocated to the kernel. Whenever a new kernel boots, it broadcasts its range of page frame numbers (PFNs) derived from this information and receives the PFN ranges of all the other kernels. Every kernel does the same when it boots, so a global list of the memory ranges held by each kernel is shared.
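As a hedged illustration of this partitioning (the actual sizes, addresses and kexec setup of a Popcorn deployment will differ), two kernels could be assigned disjoint physical memory with boot arguments such as the following; memmap=nn@ss forces the kernel to use only the given region:

# primary kernel: use 4 GB of memory starting at physical address 0 (illustrative)
memmap=4G@0
# secondary kernel: use the next 4 GB, so its kernel_start_addr is the 4 GB boundary
memmap=4G@4G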

The kernel where the user space address is initialized has an associated struct page for it. This kernel becomes the server for all the wait and wake operations related to that address. Every distinct memory region of a process is managed by a kernel data structure called a virtual memory area (VMA), so the user space address can be found as part of a memory region held by the process. Local and migrated threads can determine the server kernel by checking the VMA flags and doing a page walk [29] to find the page frame number of the user space address. For client threads on the server kernel, the page frame number is in the valid range of the current kernel, with the page being a normal page or a copy-on-write (COW) page. For client threads on a remote kernel, the VMA flags of the migrated stack or heap are of type VM_PFNMAP or VM_MIXEDMAP, denoting that they contain special pages only and do not hold the original page for the address. When the page frame number is not valid or not in the range of the current kernel, the code falls back to checking the global list to locate the server kernel. When the memory regions come from different physical memory spaces, as in the heterogeneous system, this information is saved in the replicated page: an owner field is added to struct page. When the first thread accesses the user address, a page walk is done to acquire the latest copy of that page and the owner field is set to that kernel, which then acts as the server kernel. When another thread wants to lock on the user address, it accesses the latest page and learns whether it has an owner kernel; if so, the distributed Futex is kick-started.

On the client side, every address on which a thread calls the Futex syscall has its own local state, maintained by the data structure shown below. A local request queue is allocated per combination of thread group and user space address. Apart from pid and uaddr, it holds a request id, which is atomically incremented. The wait queue is used to put the current thread to sleep until the server responds with an event. The operation field refers to either a wait or a wake request. Once the server returns, errornum is updated with the return value. The wake_state field indicates that a wait request has become invalid because the waiter has already been woken up. Each client request can be in one of four states. Initially, when the Futex syscall is made, its state is set to IDLE. Then, while it waits for the ticket from the server, its status transitions from IDLE to INPROG. When the client receives the response from the server, the status is set to DONE. Finally, before invoking a call to schedule for sleeping, it changes its state to SLEEP.

struct local_request_queue {
    pid_t pid;                    /* thread group identifier                  */
    unsigned long uaddr;          /* user space futex address                 */
    atomic_t request_id;          /* atomically incremented ticket number     */
    int wake_state;               /* set when a pending wait becomes invalid  */
    wait_queue_head_t wait_queue; /* client sleeps here for the response      */
    enum { IDLE, INPROG, DONE, SLEEP } status;
    int errornum;                 /* return value from the server             */
    int operation;                /* wait or wake request                     */
    struct list_head list;
};
/* Field types are illustrative; the thesis lists only the field names. */

The server side has one global request handler thread assigned per thread group ID and per user space address, maintained in a hash table data structure as shown in Figure 5.2.

Figure 5.2: Data structure organization on the server side

The key to distributed synchronization is a global ticketing mechanism that orders the requests and grants the right to enter the critical section.

5.3.1 Global Ticketing Mechanism

In order for client threads on any kernel to wait or wake, they issue a call to the server, through the function global_spinlock, to acquire a global ticket (note that the function does not implement a spinlock). This call uses the Popcorn messaging layer, which enforces first-in-first-out ordering. The reason for coupling the implementation with the FIFO messaging layer is simplicity: no additional ordering mechanism is required in the Popcorn environment. Decoupling these two layers would require consensus among all participant kernels and more messages to determine the ordering. The client requests are packed with the parameters needed to execute the wait or wake operation.

The entire client operation, i.e. the normal Futex operation, is offloaded to the server. This is feasible because user space coherence is taken care of by previous work [5] and the logic reads the user space address in kernel space. The messaging thread queues the request in the request queue, as seen in Figure 5.2. If it is the first time a request is placed for this user address, a dedicated global request handler thread is allocated from the worker thread pool and its identifier is stored in worker_id for servicing further requests. The handler then services the items in the request queue, and a server function performs the corresponding operation with the help of wait_attr or wake_attr, based on the request type.
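A hypothetical sketch of this client-side offload is given below. The message layout, pcn_send(), my_kernel_id() and server_kernel_for() are illustrative placeholders rather than the actual Popcorn messaging API; the sketch only shows how a ticket request could be packed, sent over the FIFO messaging layer, and waited on through the local request queue.

#include <linux/sched.h>
#include <linux/wait.h>
#include <linux/atomic.h>

struct global_futex_request {
    int origin_kernel;            /* kernel issuing the request              */
    int tgid;                     /* thread group of the caller              */
    int rem_pid;                  /* caller's pid, used for remote wake-ups  */
    unsigned long uaddr;          /* user space futex address                */
    int op;                       /* wait or wake                            */
    int val;                      /* expected value / number to wake         */
    unsigned long request_id;     /* ticket slot, FIFO-ordered by messaging  */
};

static int global_spinlock_sketch(struct local_request_queue *q, int op, int val)
{
    struct global_futex_request req = {
        .origin_kernel = my_kernel_id(),              /* hypothetical */
        .tgid = current->tgid,
        .rem_pid = current->pid,
        .uaddr = q->uaddr,
        .op = op,
        .val = val,
        .request_id = atomic_inc_return(&q->request_id),
    };

    q->status = INPROG;
    pcn_send(server_kernel_for(q->uaddr), &req, sizeof(req)); /* hypothetical */

    /* Sleep until the response handler stores the result and sets DONE. */
    wait_event(q->wait_queue, q->status == DONE);
    return q->errornum;
}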

5.3.2 Granting Entry into Critical Section

The following explains the different operations and the control flow for granting access to the critical section.

5.3.2.1 Server

The server function caters to all the requests and grants access to the critical section. For wait requests, the futex queue in the server kernel acts as the global queue and reflects the state of all the kernels. If the wait request is from a local client, it is queued normally. If it is from a remote client, the entry is populated with the remote pid and has no task_struct. For wake requests, the server queries the global queue to wake the processes. If the target is a local process, it queries the local request queue to check whether any client thread has a prior wait request to which the server has responded and whose state is between DONE and SLEEP. If so, it sets the client's wake_state to make the wait operation invalid and avoid the race condition; otherwise, it wakes up the process. If the target is a remote process, it issues a remote wake message. The entire operation is performed with the server thread acting as a surrogate for the thread group leader, using its mm_struct; this is done to handle page faults that may occur during processing. Once it is done, it sends a response to the client with a return value.
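The race check on the server's wake path can be sketched as follows; the helper name is hypothetical and only the DONE-but-not-yet-SLEEP case described above is shown.

static void serve_wake_for_local(struct local_request_queue *q,
                                 struct task_struct *tsk)
{
    if (q->status == DONE) {
        /* The client was told to wait but has not slept yet:
         * invalidate the wait instead of losing the wake. */
        q->wake_state = 1;
    } else {
        wake_up_process(tsk);   /* client is queued and sleeping */
    }
}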

5.3.2.2 Futex Wait

Figure 5.3 explains the control flow of the wait operation. After getting the response from the server, a client checks whether the response contains an error; if so, it returns to user space. On a successful operation, if the client resides in the same kernel as the server it only needs to be put to sleep, but if the client resides in a remote kernel, it must first be queued in the local futex queue and then put to sleep. Before queuing and scheduling the thread, the variable wake_state is checked to see whether any wake request was issued by the server between the time the client state became DONE and the time the client sets its state to SLEEP. If it was set, the wait request is invalid and the client returns to user space.
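The matching client-side step might look like the following sketch; queue_me_locally() and the bucket lock passed in are hypothetical placeholders, and the point of the code is only the wake_state recheck before the thread commits to sleeping.

static int client_finish_wait(struct local_request_queue *q, spinlock_t *hb_lock)
{
    spin_lock(hb_lock);                    /* hypothetical local bucket lock */
    if (q->wake_state) {                   /* a wake already arrived         */
        spin_unlock(hb_lock);
        return 0;                          /* drop the wait and return       */
    }
    queue_me_locally(q);                   /* hypothetical local enqueue     */
    q->status = SLEEP;
    set_current_state(TASK_INTERRUPTIBLE);
    spin_unlock(hb_lock);
    schedule();                            /* sleep until woken              */
    __set_current_state(TASK_RUNNING);
    return 0;
}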

Figure 5.3: Distributed Wait Control Flow

5.3.2.3 Futex Wake

Figure 5.4 explains the control flow of the wake operation. After getting the response from the server, the client checks whether any error was returned from server processing. If the process to be resumed exists in the same kernel as the server, the server will already have woken it up, so the client can safely return to user space. If it exists in a remote kernel, the remote wake is triggered by the server. Once the message reaches the remote kernel, it checks whether the waiter is queued in the local futex queue. If found, it wakes up the process and marks the wake_state. If not found, it checks the state: if the state is not SLEEP, the client thread has not yet resumed or been scheduled by the local scheduler to reach that state, and the remote wake has raced ahead of it. In this scenario, the wake_state is marked and the process is woken up forcefully; the client thread then wakes up, checks the wake_state variable, and drops the request.

Figure 5.4: Distributed Wake Control Flow

5.3.2.4 Futex Requeue

Figure 5.5 explains the control flow of the requeue operation, which is similar to the wake operation with a minor difference on the server side. The server has to check whether the remote process has been requeued. If so, the original addr1 is sent by the server during the remote wake, so that the remote kernel can check either addr1 or addr2 to wake up the process.

5.3.3 Process Exit

In the event of a process or thread group crash or exit, a robust Futex list is maintained per thread in user space (maintained by glibc). At do_exit, the kernel checks this list to release any held locks and to wake up any waiters. In Popcorn, the main thread is notified about the crash and forced to go through group exit. This performs the cleanup activities mentioned above.

Figure 5.5: Distributed Requeue Control Flow

During thread group exit, the thread goes through do_exit, which is where it performs cleanup activities. These include the following:

1. When the thread group exits: (a) It checks for any running remote threads. If found, they are issued a signal to gracefully exit on the remote as well as the local kernel.

(b) It then frees the tasks assigned to the global worker thread before releasing it to the pool.

2. When the non-group-leader threads that were migrated at some point exit, they also check the Futex hash bucket for any pending remote Futex operations. If an entry is found, it is removed from the bucket.

5.3.4 Real-time (Priority Inheritance) Support

Futex supports real-time applications through priority inheritance support in kernel space. Currently, the Popcorn infrastructure does not have a global time stamp across all the kernels, which inhibits the implementation of real-time support. A possible solution is to implement a logical global time stamp across kernels using the Precision Time Protocol or a shared real-time clock.

5.3.5 Parallel Messaging Layer

Since the Futex layer is tightly coupled with the communication layer, a parallel messaging layer would warrant a slight modification of the server-side processing: there would be more than one message handler thread receiving requests from the clients. Taking an atomic timestamp to order the requests across handler threads would be one way to counter this.

5.3.6 Issues in Scheduling

Currently, thread migration is performed by the thread itself calling the sched_setaffinity syscall and is confined to the local scheduler. Because of the lack of global scheduling in the Popcorn infrastructure, the current implementation does not allow dynamic or scheduler-driven migration of threads while they are in the middle of synchronization calls; a thread can either be migrated or allowed to proceed with the futex syscall. To implement this, both the local and the remote thread maintain state indicating whether the thread is involved in the global synchronization procedure. The state variables futex_state and migration_state are stored in the task_struct and can be modified only while holding migration_lock, which is also part of the thread's task_struct. Whenever a syscall is made to the Futex subsystem, it sets the futex_state variable. When some other thread tries to schedule a migration on behalf of a thread, it checks this state before proceeding: if it is set, it cancels the migration and the thread continues executing the syscall. If the migration happened before the syscall, the flow is directed to sleep until the thread returns from the remote kernel.
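A minimal sketch of this mutual exclusion between migration and a futex syscall is shown below; the field types and the helper name are assumptions, since the thesis only names the fields added to task_struct.

static bool try_start_migration(struct task_struct *t)
{
    bool ok;

    spin_lock(&t->migration_lock);
    if (t->futex_state) {          /* thread is inside a futex syscall */
        ok = false;                /* cancel the migration             */
    } else {
        t->migration_state = 1;    /* commit to migrating              */
        ok = true;
    }
    spin_unlock(&t->migration_lock);
    return ok;
}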

5.3.7 Process Synchronization

Futex is also used for inter-process synchronization using a shared memory area. The key difference is the unique identifier the application uses for synchronization: in Linux, for this case, the unique identifier is the reference to the struct page pointer and the offset within that page of the shared memory mapped to a physical address. The distributed Futex algorithm can be extended to this scenario as well by implementing memory coherence through distributed shared memory for the file or shared memory objects. Moreover, the reference count of this page is incremented so that it is not swapped out.

5.3.8 Issue of False Sharing

In the heterogeneous architecture, the replicated pages are kept coherent by the consistency protocol [26]. If the Futex variables share a page with some other section of a program, the access pattern of that other section could hamper the performance of synchronization. To avoid this false sharing, we decided to change the glibc library and pad the common data structures for mutexes, condition variables and read-write locks to the size of a page (4096 bytes).
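The padding idea can be sketched as a page-sized wrapper around the glibc types; the thesis instead changed the structure sizes inside glibc itself, so the code below is only an illustration of the layout goal.

#include <pthread.h>

#define PG 4096

typedef union {
    pthread_mutex_t m;
    char pad[PG];               /* fill out the rest of the page */
} padded_mutex_t __attribute__((aligned(PG)));

typedef union {
    pthread_cond_t c;
    char pad[PG];
} padded_cond_t __attribute__((aligned(PG)));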

5.4 Evaluation

In this section, the performance overhead introduced by the distributed synchronization algorithm is analyzed and quantified for various applications. As one of the first works to implement distributed SMP synchronization, this work relies solely on comparisons against SMP Linux.

5.4.1 Experimental Setup

To evaluate our mechanism on a homogeneous setup we use a multicore x86 64-bit machine with 64 cores, i.e. 4x AMD Opteron 6274 running at 2.4GHz with 128GB of RAM. The evaluation on a heterogeneous setup exploited an x86 64-bit machine with an 8-core Intel(R) Xeon(R) CPU E5-2695 with 64GB of RAM and an Intel(R) Xeon Phi 3120A with 57 cores, 4-way hyper-threaded at 1.1GHz, with 6GB of RAM. All measurements are made with the Time Stamp Counter (TSC) via rdtsc(), which counts individual clock cycles, and are then converted into seconds. The perf tool is used to understand the application characteristics.

The evaluation is categorized into the following sections:

1. Verifying the functionality with micro-benchmarks and analyzing the cost breakdown.

2. Evaluating macro benchmarks and applications.

5.4.2 Microbenchmarks

5.4.2.1 GLIBC Test Suite

The functionality is verified by running the glibc test suite (release 2.13) provided with the gcc toolchain. The following are the tests used to verify the built mechanism, together with their descriptions. The other test cases found in the source branch [27] target inter-process synchronization and are thus excluded.

Test Case          Description
tst-mutex2.c       Tests a single pthread_mutex_lock and pthread_mutex_unlock with a barrier inside the critical section
tst-mutex3.c       Tests the above when there is an unprecedented signal
tst-mutex7.c       Tests the above for multiple rounds
tst-mutex8.c       Tests a single mutex with recursive, error-checking and normal mutex attributes
tst-condition2.c   Tests pthread_cond_wait and pthread_cond_signal
tst-condition3.c   Tests the above when there is an unprecedented signal
tst-condition7.c   Tests the above with thread cancellation
tst-condition10.c  Tests the above with multiple rounds
tst-condition14.c  Tests the above with a recursive mutex
tst-condition16.c  Tests condition wait and broadcast under high contention
tst-condition18.c  Tests producer-consumer scenarios
tst-condition22.c  Tests the above with cancellation and cleanup functions
tst-barrier3.c     Tests the execution of many threads using a barrier for many rounds
tst-barrier4.c     Tests the execution of many threads using multiple barriers for many rounds

Table 5.2: Test cases executed for evaluating Futex

Measuring Statistics The above-mentioned test cases vary in how synchronization-intensive they are. To understand the overhead compared to normal Linux, a breakdown of the costs and of the Futex syscalls is needed, even when the Futex functions are involved only minimally. Atomic counters were added to measure the execution time of the wait, wake, requeue and server functions in each kernel. Additional counters measure the page faults within the server function and the number of retries the user space program has to execute. Retries occur on a page fault, when the user address has been modified, and when the access to the address returns an invalid address. The total messaging cost is also measured to evaluate scalability.

Scalability Measurements Lock scalability is measured with a micro-benchmark that acquires and releases a lock or spinlock in a loop of 1,000,000 iterations while increasing the number of threads, to measure the worst-case scenario. This benchmark is locking-intensive, as it enters kernel space for arbitration as the number of threads increases. See Table 5.4 for the perf results of this application. As we increase the threads, there is increased access to kernel space data structures such as the raw spinlock used for synchronization. Table 5.3 shows the percentage of time spent in Futex in kernel space, categorizing this as a high-synchronization workload. Both Linux and Popcorn are better than spinlocks, as shown in Figure 5.6.
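The benchmark source is not reproduced in the thesis; the following is a reconstruction of the described behaviour, with each thread taking a shared pthread mutex 1,000,000 times.

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define ITERS 1000000

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static volatile long counter;

static void *worker(void *arg)
{
    for (long i = 0; i < ITERS; i++) {
        pthread_mutex_lock(&lock);
        counter++;                 /* trivial critical section */
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(int argc, char **argv)
{
    int n = argc > 1 ? atoi(argv[1]) : 2;   /* number of threads */
    pthread_t t[n];

    for (int i = 0; i < n; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < n; i++)
        pthread_join(t[i], NULL);
    printf("counter = %ld\n", counter);
    return 0;
}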

Cores  Spinlock    Linux     Popcorn   Futex Time   %
2      0.10509144  0.165089  0.129121  0.053545186  41.46904234
4      0.1520194   0.280834  0.095752  0.053807293  56.19431669
8      0.21927916  0.304467  0.11545   0.051785683  44.85550072
12     0.28205716  0.29645   0.118607  0.060059277  50.63715979
16     0.37060881  0.288999  0.184764  0.155711545  84.27578026
20     0.39470812  0.288999  0.17208   0.134738061  78.2995857
24     0.4273284   0.284718  0.202157  0.159301931  78.80126894
28     0.4505775   0.270279  0.212919  0.103474918  48.59814951
32     0.50947128  0.268902  0.217098  0.084525526  38.93428807

Table 5.3: Lock Scalability

Function/Threads   2      4      8      12     16     20     24     28     32
Pthread Userspace  49.65  9.83   10.82  7.31   7.04   6.89   5.33   4.97   4.08
Raw Spinlock       10.54  33.86  58.47  70.82  75.11  79.39  82     81     79.08
Others             39.81  56.31  30.71  21.87  17.85  13.72  12.67  14.03  16.84

Table 5.4: Perf results for scalability

Figure 5.6: Spinlock Vs Linux Vs Popcorn

Cores  Linux        Syscall  Retry  Popcorn      Messages  Syscall  Retry  Slow Down
2      0.003731052  3        1      0.049638918  1610      3556     496    13.30426966
4      0.010673967  3        1      0.108686277  4747      11038    1814   10.18236952
8      0.032425481  3        1      0.275907695  16136     37789    6511   8.508977752
12     0.063707161  3        1      0.510577165  33996     80145    13413  8.014439115
16     0.109968956  3        1      0.809866728  58537     138979   22102  7.364503179
20     0.170064448  3        1      1.157080392  89086     211995   33274  6.803775914
24     0.250649879  3        1      1.569899404  127091    302285   47177  6.263316022
28     0.32612      3        1      2.04921529   409279    171618   62256  6.283623482
32     0.428211     3        1      2.5936903    532762    222909   79435  6.057038002

Table 5.5: Barriers executed for 100 Rounds

Another micro-benchmark that stresses contention is tst-barrier3.c, where 100 percent of the execution time is spent in Futex syscalls: all the threads compete for a shared variable, increment it, and are asked to serialize on a barrier again and again for many rounds. In normal Linux this program spends most of its time in the Futex syscall waiting for all the other threads to finish, and the perf results show less than 20 percent of the time in kernel spinlocks. The average slowdown is around 14.29 percent, as can be seen in Table 5.5.

To stress the number of times the critical section is accessed under varying contention levels, we use the classic producer-consumer benchmark, tst-cond18.c. The results are taken with one producer and many consumer cores split across different kernels, competing for a shared resource for a period of 20 seconds. Table 5.6 suggests that Popcorn follows Linux in the reduced number of accesses to the critical section as we increase the threads.

Threads  Linux  Popcorn
2        20719  2645
4        6478   4092
8        2568   1681
12       1321   821
16       811    361
20       573    152
24       476    112
28       421    83
32       398    67

Table 5.6: Producer-Consumer: Number of access into critical section

Linux Syscalls  Retry  Popcorn Syscalls  Retry      Messages
5.67            1.33   1548249.33        632723.11  3468130.89

Table 5.7: Average syscalls and retries for producer-consumer

Low Contention We test the implementation with tst-mutex2.c, which exhibits low contention. It has two threads which are serialized by a barrier after entering the critical section. Table 5.8 suggests that only 4 percent of the time is spent in Futex for this application. When we measured the Popcorn performance, the additional cost introduced by the Futex layer was due to the server execution function and the messaging cost incurred by every client before it can be queued locally. On average, there were 13 additional messages. The response time is calculated as the sum of the server execution time and the total message cost; on average, 4.0641 percent of the total time is spent on this part, as seen in Table 5.9. There is also an increase in the number of syscalls and retries, as mentioned in Table 5.11. All the below-mentioned retries indicate that the user space variable was modified and the execution had to be repeated. The average slowdown of the application, including migration cost, messaging cost and synchronization cost, is 7.43 times.

Figure 5.7: Critical Section Access

The same set of evaluations is done on the heterogeneous system. On average, there were 12.8 additional messages, as mentioned in Table 5.11. The response time is calculated as the sum of the server execution time and the total message cost; on average, 14.8641 percent of the total time is spent on this part (see Table 5.10). There is also an increase in the number of syscalls and retries, as in the homogeneous case. The average slowdown of the application, including migration cost, messaging cost and synchronization cost, is 931 times. It is interesting to note that the breakdown cost for Futex is lower when no page fault occurs: the server execution function has to fix and retry the operation if there is a page fault, and the cost of resolving it is added to the execution function. See Figure 5.8.

Results We can infer from the high contention experiments that as we increase the number of threads, the Futex syscalls increase exponentially, compared to 3 syscalls in normal SMP Linux. When the application resolves the contention in user space, Linux performs better; but if it has to enter kernel space to resolve contention more often, shared data structures or locks become a bottleneck. This is where Popcorn's multikernel mechanism alleviates contention. The number of accesses to the critical section under high contention tapers down as the number of threads increases for both Linux and Popcorn.

%      Function
83.16  kmem_cache_free
7.82   update_rq_clock
4.12   futex_wake
4.9    others

Table 5.8: Perf results for mutex between two threads

Linux        Popcorn      Server Time  Total Response  Total Messages  Page Fault  %
0.000344759  0.003821698  0.000139628  0.000209373     12              0           5.47853
0.000434625  0.00184392   2.07367E-05  9.04817E-05     12              0           4.907027
0.000374691  0.002566341  3.43238E-05  0.000115693     14              0           4.50808
0.000399792  0.003416201  3.41963E-05  0.000115565     14              0           3.38286
0.000362608  0.002435951  2.06133E-05  9.03583E-05     12              0           3.70936
0.000390968  0.002478114  2.61138E-05  9.58588E-05     12              0           3.86821
0.000381179  0.003069868  2.97583E-05  0.000111128     14              0           3.61994
0.000331475  0.003075079  3.37813E-05  0.00011515      14              0           3.74463
0.000382518  0.002429496  2.09425E-05  9.06875E-05     12              0           3.73276
0.000360875  0.002477886  2.16896E-05  9.14346E-05     12              0           3.69002

Table 5.9: Mutex between two threads serialized by barriers

Linux        Popcorn      Server Time  Messages  Page Fault  Total Time   %
0.000141084  0.31936329   0.094137754  14        1           0.0941481    29.47993813
0.000146614  0.218115195  0.092423323  14        1           0.092433669  42.37837215
0.000111242  0.032259099  0.000779035  14        1           0.000789382  2.447004382
0.000189483  0.1300388    0.093297935  14        1           0.093308282  71.75418534
0.000164223  0.125540928  0.000702936  14        1           0.000713283  0.568167548
0.000111962  0.036472599  0.000013455  10        0           2.08454E-05  0.057153591
0.00014063   0.22725088   0.000012565  10        0           1.99554E-05  0.00878122
0.00015343   0.0248347    0.000006285  8         0           1.21973E-05  0.049114022
0.000159013  0.022698083  8.3275E-06   10        0           1.57179E-05  0.0692477
0.000141993  0.13200154   1.47888E-05  10        0           2.21792E-05  0.01680219

Table 5.10: Mutex between two threads serialized by barriers in Heterogeneous

Platform       Linux Syscalls  Retry  Page Fault  Popcorn Syscalls  Retry  Server Calls  Messages
Heterogeneous  5               1      9.4         2                 0.7    6.4           12.8
Homogeneous    6               1      0           9                 2.4    6.4           13

Table 5.11: Average syscalls and retries for Mutex between two threads

(a) Heterogeneous: With faults (b) Heterogeneous: No faults

(c) Breakdown of costs in homogeneous

Figure 5.8: Relative comparison of costs in heterogeneous and homogeneous

However, the number of accesses is high for fewer threads. This is where existing Linux has the advantage over all the other mechanisms, as everything is resolved in user space.

5.4.3 Macro Benchmarks

5.4.3.1 Homogeneous

We test the NAS Parallel Benchmark suite [30], which runs parallel applications on SMP machines, on the homogeneous setup. This has proven to give the best results with the Popcorn architecture. The workload characterization of the following benchmarks can be found in our previous work [5], where these applications were adapted to Popcorn to make use of the custom threading library used in the Popcorn user space, cthread. Cthread was developed by Antonio Barbalace to replace the pthread library. We compare the synchronization overheads introduced by this implementation against that baseline by running the Barrelfish OpenMP library (bomp), which depends on the pthread library. Our evaluations are limited to applications of class A, which require a medium-size input data set. The memory footprint of these applications increases with the other classes, requiring more memory to be allocated for each kernel.

IS-BOMP IS is a parallel integer sorting benchmark designed to test random memory access. IS-bomp is its OpenMP version, which runs on top of bomp; bomp is a minimal OpenMP implementation based on the pthread library. We test it against Linux and against is-pomp, a Popcorn adaptation of the is-bomp workload designed by NASA in the NAS Parallel Benchmark suite, which does not use the Futex synchronization. We use the perf tool to measure the breakdown in vanilla Linux, and we measure the additional messages, syscalls and timing overhead introduced by the Futex layer. The perf measurement shows that only 0.01 percent of the application time is spent in sys_futex operations. The syscalls mentioned in Table 5.12 are due to the call made during thread exit to unlock the pages used by the Futex variables. This application can be categorized as less synchronization-intensive: it uses the spinlock mechanisms bomp_lock and bomp_barrier_wait for synchronization, as the application's compute part is offloaded and the main thread waits for all the worker threads to report. The overhead shown in Figure 5.9 is due to the additional syscalls introduced by using the pthread library. This can be verified in Table 5.13, as the average time spent in the Futex layer is 0.026 percent of the entire compute time.

Linux Futex  Retry  Popcorn  Retry  Server Call  Messages
3            1      2        1      0            0
3            1      11       0      5            12
3            1      27       0      13           32
3            1      43       0      22           55
3            1      65       1      35           86
3            1      79       1      41           101
3            1      95       1      49           121
3            1      116      5      63           154
3            1      131      1      69           171

Table 5.12: Syscall count comparison for IS-BOMP

CG-BOMP CG is a parallel conjugate gradient benchmark designed to test irregular memory access. We test it against cg-pomp, a Popcorn adaptation of the cg-omp workload designed by NASA, and against normal cg-omp run on Linux. We measure the additional messages, syscalls and timing overhead introduced by the Futex layer. The average time spent in the Futex layer is 0.052 percent of the compute time, as mentioned in Table 5.15. This can also be categorized as less synchronization-intensive and has characteristics similar to is-bomp. All the syscalls made to Futex occur during thread exit, when the main thread coordinates with the worker threads. This minimal overhead does not affect performance, as can be seen in Figure 5.10.

Figure 5.9: Performance comparison for IS-BOMP

Cores  Linux        Popcorn-Pomp  Popcorn      Server       Messages     Futex Time   %
2      1.532725133  1.546854305   1.890884788  0            0            0            0
4      1.185839647  0.990274611   1.537808067  1.90583E-05  8.88033E-05  0.000107862  0.007013988
8      1.747790154  1.034489404   1.54740447   6.03579E-05  0.000246345  0.000306703  0.019820448
12     1.740106111  1.269885167   2.171910081  8.05142E-05  0.000400179  0.000480693  0.022132266
16     2.051327006  1.567500687   2.570329176  0.000128091  0.00062793   0.000756021  0.029413386
20     2.424451337  1.91004453    3.106192121  0.000150049  0.00073707   0.000887119  0.028559687
24     2.712741487  2.197539404   3.714065702  0.0002086    0.000911862  0.001120461  0.030168051
28     2.715581884  2.608048263   3.757911012  0.00023844   0.001133501  0.001371942  0.036508094
32     2.924940845  2.869483505   3.881062504  0.000293877  0.001287743  0.00158162   0.040752232

Table 5.13: Cost breakdown for IS-BOMP

FT-BOMP FT is a parallel discrete 3D fast Fourier Transform benchmark designed to test all-to-all communication between computing nodes. We test it against ft-pomp, a Popcorn adaptation of the ft-omp workload designed by NASA, and against normal ft-omp run on Linux. We measure the additional messages, syscalls and timing overhead introduced by the Futex layer.

Figure 5.10: Performance comparison for CG-BOMP

Linux Futex  Retry  Popcorn Futex  Retry  Server Call  Messages
3            1      2              1      0            0
3            1      25             1      10           22
3            1      53             2      18           43
3            1      77             1      22           54
3            1      124            4      37           88
3            1      139            3      44           106
3            1      167            2      52           128
3            1      190            2      55           136
3            1      211            3      66           162

Table 5.14: Syscall count comparison for CG-BOMP

Predictable characteristics are seen, similar to the previous two applications. Figure 5.11 shows that there is a small overhead introduced by the additional syscalls.

Figure 5.11: Performance comparison for FT-BOMP

Cores  Linux        Popcorn-Pomp  Popcorn      Server       Messages     Futex Time   %
2      10.21337779  55.17479004   55.24725629  0            0            0            0
4      10.20921362  42.60026399   43.02592286  8.60488E-05  0.000213915  0.000299963  0.000697169
8      10.11186525  25.5517271    26.75160635  7.93025E-05  0.000329222  0.000408525  0.001527103
12     2.873350273  21.13950376   21.13966332  9.05896E-05  0.000404442  0.000495032  0.00234172
16     0.679101381  16.77972644   16.98240446  0.000139651  0.000651114  0.000790765  0.004656378
20     0.579018216  14.41506703   15.03308584  0.00016279   0.000778871  0.000941662  0.006263928
24     0.647140029  13.2340893    13.3379751   0.000247297  0.000991244  0.001238541  0.009285824
28     0.475293745  12.70656919   12.42965933  0.000225487  0.00101593   0.001241418  0.009987542
32     0.495815115  12.0207125    12.32832854  0.0002722    0.001213758  0.001485958  0.012053195

Table 5.15: Cost breakdown for CG-BOMP

5.4.3.2 Heterogeneous

The following benchmarks were run on the heterogeneous platform. This is to show that the existing prototype works with different applications and to analyze the overhead introduced by the Futex layer. These applications were made to run without compiler refactoring [6]. The Futex usage metric can be used by the partitioner to analyze and make a better judgment on running or migrating threads among the kernels.

Cores  Linux        Popcorn Pomp  Popcorn Futex
2      18.36830111  23.60239775   28.01514059
4      13.24256832  16.72030099   20.99979221
8      12.2825577   12.27254147   17.99092069
12     11.52471369  11.6712044    18.0209019
16     11.05736385  11.26124741   17.30451123
20     9.909629858  11.36023189   18.26032205
24     9.190151118  11.27690894   18.46187288
28     8.637570373  11.3611029    18.75670231
32     8.594912838  12.18228082   19.04530218

Table 5.16: Cost breakdown for FT-BOMP

The following benchmarks were chosen for the following reasons:

• Fluidanimate: A fluid particle simulator in which threads are involved in moderate contention.

• Pbzip2: A data compression program in which different threads perform different tasks in parallel.

• Matrix multiplication: A matrix multiplication benchmark which is a data parallel application with low contention.

Function/Threads    2      4      8      16     32
Pthread User Space  14.79  26.53  40.56  56.45  61.86
Raw spin lock       0.07   0.12   0.49   3.35   20.47
Others              85.14  73.35  58.95  40.2   17.67

Table 5.17: Perf analysis of Fluidanimate

5.4.3.3 Fluidanimate

Fluidanimate is an application from PARSEC [31]. It simulates an incompressible fluid for interactive animation purposes with fine-grained parallelism, so it stresses synchronization and communication (see Table 2.2 in the cited paper). We use the small data set, around 5000 particles, with one frame to animate for our evaluation. The application requires the number of threads to be a power of two. We allocated one thread on the Xeon and the rest on the Xeon Phi, with a maximum of 32 threads.

Cores  Linux  Popcorn  Server      Messages     Futex Time   %
2      0.726  3.201    0.01068626  0.028676819  0.039363082  1.229712017
4      0.457  2.149    0.01418007  0.050082722  0.064262793  2.990357996
8      0.381  2.813    0.03250299  0.12473312   0.157236107  5.589623415
16     0.317  2.114    0.14266534  0.52576687   0.668432214  31.61931004
32     0.415  5.131    0.64291805  1.669770857  2.31268891   45.07286902

Table 5.18: Fluidanimate

The perf results show that increasing the number of threads increases the number of synchronization calls. Most of them are resolved in user space, and under maximum contention around 20 percent of the time is spent in kernel space. As we have seen with the micro-benchmarks, whenever contention is resolved in user space, Linux performs far better.

5.4.3.4 PBZIP2

PBZIP2 is a parallel implementation of the bzip2 block-sorting file compressor. This application uses pthread threading constructs to achieve a performance benefit on SMP machines. It has two I/O threads doing the file read and write operations and multiple compute threads doing the actual compression. We decided to compress a 1 GB file, keeping the I/O threads on the Xeon and migrating the compute threads to the Xeon Phi. We ran a test that reads the entire file first and splits it between threads; this reduces the synchronization between the I/O threads and the compression threads. If the file is split dynamically, synchronization increases and data transfers have to happen frequently. We tested the application with 2 compression threads, one on the Xeon and another on the Xeon Phi. As seen in Tables 5.19 and 5.21, the average Futex overhead is lower, 1.08% compared with 14.73%, when the entire file is read before compression rather than read dynamically, since the latter introduces synchronization communication between kernels. Since there are about 4 times more syscalls (Table 5.20), the application runs in kernel space more than when running in Linux.

Linux     Popcorn  Server       Messages     Futex Time   %
0.7075    40       0.430465918  5.35804E-05  0.430519498  1.076298745
0.751875  37       0.36372097   5.24718E-05  0.363773441  0.983171463
0.729911  38       0.300308858  4.39729E-05  0.30035283   0.790402185
0.771035  37       0.397726258  4.36034E-05  0.397769861  1.075053678
0.745471  40       0.603802793  4.58205E-05  0.603848613  1.509621532

Table 5.19: Compression starts after entire file is read

Mode              Linux Syscalls  Retry  Popcorn Syscalls  Retry  Server calls  Messages
Read Entirely     22              1.2    82.6              11.6   57            129.6
Read Dynamically  1379            4      4873.2            842.4  4682.4        10419.6

Table 5.20: Average syscalls and retries for PBZIP2

Linux     Popcorn  Server       Messages     Futex Time   %
0.220337  233      31.20154942  0.007046007  31.20859543  13.39424696
0.251032  240      35.58249733  0.00731354   35.58981087  14.82908786
0.235957  231      32.91228889  0.007165732  32.91945462  14.25084616
0.239929  225      35.91615938  0.007824216  35.9239836   15.96621493
0.281575  270      41.06310355  0.00915301   41.07225656  15.21194688

Table 5.21: Compression starts when the file is dynamically read

5.4.3.5 Matrix Multiplication

Matrix multiplication is a MapReduce application from Metis [32]. This workload allocates large amounts of memory to hold temporary tables, stressing the kernel memory data structures. It uses data-parallel parallelization and little synchronization. We tested the multiplication of 1024 rows and columns with 4 threads on the Xeon and 228 threads on the Xeon Phi. The results suggest that Popcorn is 5.78 times slower than Linux. As this application is not communication-intensive, only 0.0053 percent of the compute time is spent on Futex, and 5812 additional messages are generated by synchronization. As reported by the other benchmarks, the overhead added by the futex mechanism is proportional to the amount of synchronization required by the application.

Linux        Popcorn      Server      Messages
2.41719E+13  1.42656E+14  5530467005  5417
2.46322E+13  1.41344E+14  6842839646  5194
2.45082E+13  1.35413E+14  8164298171  5856
2.44721E+13  1.46662E+14  8258224704  6015
2.41876E+13  1.39875E+14  8776395194  6579

Table 5.22: Matrix multiplication

Chapter 6

Process Management

6.1 Introduction

This chapter introduces the management features developed to let user space programs see the resources available across all the kernels. These include kernel discovery, a unique process identification space, the process virtual file system, and namespaces for migration, which enable the benchmarks and applications written for Popcorn to use them during execution.

6.2 Design Motivation

Popcorn provides the illusion that all processes running amongst the different kernels are running in the same operating environment. The goal is to provide the user with process management tools (e.g., ps and kill on Unix-like systems) that operate on all processes across the kernels. The following utilities are implemented:

1. The existing /proc file system is modified to display details of the processes, memory and CPUs of the remote kernels, so that a user application can inspect the entire system for optimized execution.

2. A lookup dictionary is provided in order to establish communication through signals or synchronization.


6.3 Implementation

6.3.1 Kernel Discovery

In the Popcorn architecture, a primary kernel is booted first. During the boot process, it broadcasts messages with the current CPU's details to all the CPUs on the platform to check their availability. When a secondary kernel starts to boot, similar broadcast messages are sent. The kernels that are already alive process these discovery messages, register the discovered CPU's information in a local dictionary, and then send their own details back to the requesting kernel so it can record them. When a kernel dies, it again sends a broadcast message so the others wipe its details from their local dictionaries. Currently, architecture-related data such as the allocated page frame number range, CPU specifics and network interface details are shared among the kernels by this module.
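A hypothetical layout of the information exchanged during discovery is sketched below; the real Popcorn message format is not reproduced here.

struct kernel_descriptor {
    int           kernel_id;    /* identifier of the booting kernel  */
    unsigned long pfn_start;    /* first page frame owned            */
    unsigned long pfn_end;      /* last page frame owned             */
    int           cpu_count;    /* CPUs assigned to this kernel      */
    char          netdev[16];   /* network interface details         */
};
/* On boot: broadcast our descriptor, record every reply in the local
 * dictionary, and drop an entry again when a "kernel down" broadcast
 * is received. */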

6.3.2 Single IP Address

A distributed single IP address is essential for a migrated thread to share the network interface with the original kernel and continue accessing the same network interface.

This is done by replicating the device driver setup for the supported network interface. This proxy module has to be written specifically for the network device; we implemented support for Intel's 82576 GbE board. When a kernel boots up, the kernel discovery module starts the proxy modules known to the local kernel. Each proxy module queries the devices associated with the local generic socket namespace init_net and obtains the device and IP address details. It then broadcasts the network device details to all the other remote kernels, which do the same. When a proxy module receives device information from another kernel, it performs the following 3 steps to replicate the original kernel's setup:

1. Register a dummy device through pci_register_driver. This is done once for a particular device type and for a given remote kernel. These proxy devices act as slaves to the original device driver located in the owner kernel and cannot control the device. Then add this dummy device as a network device, copy the original device details, initialize it as an inet device, and set its status to indicate that the physical link is present.

2. Replicate the inet address onto the dummy device. This is done by inet_set_ifa with the help of the remote IP, mask and broadcast address.

3. Set the routing table. This is done by calling ip_rt_ioctl with the remote gateway, mask and destination.

The above steps are necessary to create kernel entries so that /proc/net/dev and /proc/net/route can report the proxy device name and its routing table. This enables any local or migrated thread to access and share the network interface from an application standpoint. For marshaling incoming packets, the server kernel has to employ interrupt routing to a given CPU/kernel.

6.3.2.1 Socket State Replication

This module uses the existing infrastructure built for thread migration. All sockets are associated with the process as files, so they are attached to the file descriptor table. The server thread can be migrated just after a file descriptor has been allocated, after it has been bound to an address, while it is still listening for connections, or while it is accepting live connections; the last case is termed live migration.

6.3.2.2 Socket File Descriptor Migration

The file descriptors associated with socket creation and accept need to be migrated. The original file descriptors, by which the application identifies a particular socket, are installed in the task_struct as surrogate file descriptors. Based on one of the above-mentioned states, the migrated thread establishes the same state as on the original kernel. When a socket is created, new file descriptors are installed and mapped to the surrogate descriptors; they are never exposed to the user. We modify the file descriptor lookup so that when the application issues a syscall with the old file descriptor, the new file descriptor is returned to the kernel functions.
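The lookup redirection can be sketched as a small translation table; the structure and its placement are assumptions used only to illustrate the surrogate-descriptor idea.

struct surrogate_fd {
    int app_fd;     /* descriptor the application was originally given */
    int local_fd;   /* descriptor installed after migration            */
};

static int resolve_fd(const struct surrogate_fd *map, int n, int app_fd)
{
    for (int i = 0; i < n; i++)
        if (map[i].app_fd == app_fd)
            return map[i].local_fd;   /* redirect to the local socket */
    return app_fd;                    /* not a migrated socket        */
}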

6.3.3 Unique Process ID

To identify remote processes, process IDs are not enumerated as in Linux but are given globally unique IDs that make them identifiable across all kernel instances. In Linux, PIDs are generated by an integer ID manager [33], also used for device names, IPC names and others, and are 32-bit integers. On top of this we create the global PID by combining a kernel ID with the actual PID.

In homogeneous systems, the kernel ID is currently assigned from the present_mask parameter given in the boot arguments. When a thread is created it gets a PID; the kernel ID is masked into the upper 8 bits and the remaining 24 bits are used for the actual PID. This static partitioning can support 256 kernel IDs and 2^24 PIDs per kernel. To address the core count explosion in the future, the 32-bit integer can be changed to a 64-bit integer to accommodate more cores. Any time a thread is migrated to a different kernel it gets a new PID, and the same scheme applies there.
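The homogeneous encoding described above can be sketched with a few macros (an illustration, not the actual Popcorn header):

#define PID_BITS   24
#define PID_MASK   ((1u << PID_BITS) - 1)

/* Upper 8 bits carry the kernel ID, lower 24 bits the local PID. */
#define MAKE_GLOBAL_PID(kernel_id, local_pid) \
        (((unsigned int)(kernel_id) << PID_BITS) | ((local_pid) & PID_MASK))
#define GLOBAL_PID_KERNEL(gpid)   ((gpid) >> PID_BITS)
#define GLOBAL_PID_LOCAL(gpid)    ((gpid) & PID_MASK)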

In heterogeneous systems, additional bits are reserved for an ISA or CPU group, so the global PID results from combining the ISA group, the kernel ID within the group, and the local PID. These can also be passed as boot arguments.

When threads are migrated they should not appear to have left the process space. The migrated thread is assigned a new local PID, but its task_struct holds the origin PID, i.e. the PID in the first kernel, and that kernel's shadow process (the master kernel for this PID) is updated with the latest remote kernel where the thread is running. Any PID-related syscall resolves to the origin PID, and if the task is not running in the current kernel, the request is directed to the master kernel, which in turn redirects it to the kernel where the thread is currently running.

This nullifies the overhead of allocating and de-allocating process space between the kernels, which would incur messages, and keeps the process space functionality local. The Popcorn architecture resembles a cluster with all its subsystems partitioned across multiple kernels. In this scenario, user applications on any kernel should have a global view of, and access to, all the resources, since the platform is originally a single machine, which is also better for scheduling reasons. This is the reason we did not adopt design principles like BProc or a distributed hash table mechanism like Chord [34]. Popcorn provides inherent isolation; similar to the process space, other resources can be made global when isolation is no longer needed.

6.3.4 Proc File System

This virtual file system is the control and information center of the kernel. The proc file system provides file-based access to components that exist in Linux. Its data is generated dynamically by querying the OS state and is displayed with the help of seq files. These are the essential files from which commands like ps and top fetch the information they display, and through which they access global data.

Currently, we have modified the proc file system to display the list of remote PIDs and the read-only files associated with each task, such as stat, cpuset, comm, cmdline and others. No data is populated initially; when a user accesses these files, the associated data is fetched on demand from the remote kernels, which are identified during the kernel discovery phase. Architecture-dependent information such as cpuinfo and meminfo has generic data structures to query and display the data. For clustering environments, the query is sent only to the master CPU, as the master CPU within a cluster has all the relevant data of the CPUs in the cluster.

All these data have to be accessed via the messaging layer.

Kernels  Response Time (seconds)
1        0.036
2        0.043
3        0.064
4        0.112
8        0.251

Table 6.1: Average Response Time for ps command

Accessing CPU info and other architectural data is local, since each kernel keeps a copy of this information from the time it boots. The response time of the ps command incurs the delays detailed in Table 6.1.

6.3.5 Namespaces

Namespaces were introduced into SMP Linux to create sub-environments, as a lightweight virtualization alternative implemented at the OS level. Kerrighed [35] uses namespaces (containers) to migrate applications in a cluster by reproducing the same contained environment on different kernels. Popcorn uses namespaces to provide a single environment shared between kernels. Each kernel's namespaces are kept consistent with the others through the messaging layer.

After the kernels are connected via the messaging layer, static Popcorn namespaces are created in each kernel. A global nsproxy structure is then made to point to Popcorn's uts, mount, IPC, PID, network and CPU namespace objects. Namespaces on each kernel are updated whenever a new kernel joins the replicated-kernel OS. Since they are updated only when a kernel starts or exits, the overhead is minimal and the lookup is similar to Linux.

Because Popcorn namespaces are created by an asynchronous event rather than by the creation of a task, instead of using the common /proc/PID/ns interface we added /proc/popcorn. Tasks join Popcorn by associating with its namespaces through the setns syscall, which has been updated to work with statically created namespaces. When a task is migrated to another kernel it starts executing only after being (automatically) associated with the Popcorn namespaces.
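A user space sketch of joining the Popcorn namespaces is shown below; the exact file layout under /proc/popcorn is an assumption, and only the use of setns on a statically created namespace is taken from the text.

#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>

int join_popcorn_ns(void)
{
    int fd = open("/proc/popcorn/ns/pid", O_RDONLY);  /* hypothetical path */
    if (fd < 0) {
        perror("open");
        return -1;
    }
    if (setns(fd, 0) < 0) {        /* 0 = accept any namespace type */
        perror("setns");
        return -1;
    }
    return 0;
}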

Chapter 7

Conclusion

One of the main goals of this thesis is to provide, on a replicated-kernel OS, the same level of programmability available on an SMP OS, through a familiar POSIX interface. In order to achieve this, the existing user space shared memory and communication layer needed to be stitched together with thread communication and process management features to provide a single system image to the application. Prior research in single system image has opted for dedicated servers to host these services, has deviated from the SMP programming model and forced the user to rewrite applications, or has been restricted to process synchronization and communication.

In this thesis, Popcorn Linux was extended to provide thread communication and process management services by enabling these services in all the kernels to support SMP programmability. The distributed signals service lets threads distributed across kernels communicate as in a normal SMP program. The distributed synchronization layer lets threads distributed across kernels maintain consistency. The namespaces and process management services let an application discover the resources available on other kernels for migration. These mechanisms have been successfully verified on a homogeneous x86 setup and on a heterogeneous-ISA platform, and were validated by modeling and exhaustive testing.

Though performance was not the driving goal of the project, we evaluated the synchronization overheads for different contention loads and for scalability, and tested different real-world applications. Popcorn can run different applications, as discussed in Section 5.4. The performance numbers were taken to understand the characteristics of the different applications and the overhead introduced by the layer. The average number of messages used to resolve a single synchronization syscall is around 1.5 to 2.5. The increase in the number of issued syscalls is due to the retries issued by the server while reading the user space value, which may have been modified, causing the operation to be invalid and forcing the client to retry. In a distributed setup, there is some delay for write-intensive operations, as the layer sits on top of the distributed cache coherency layer and the messaging layer.

In the heterogeneous environment, the server execution function incurs additional overhead for resolving page faults using the replicated cache coherency layer. Our scalability results show that for high-contention applications, Popcorn can remove bottlenecks caused by shared data structures and locks. But most applications try to minimize synchronization communication and resolve it in user space. This needs to be studied further to conclude whether a synchronization primitive based on a cache coherency layer is the right option for performance in multiple-kernel architectures.

7.1 Future Work

7.1.1 Shared Memory Synchronization Variants

Further research is needed to understand whether the SMP-based synchronization model is the way to go for distributed heterogeneous systems. It would be interesting to evaluate the existing implementation on shared memory with relaxed consistency. Also, following the integration trend of CPUs and GPUs, in which both share a common memory, we foresee that OS-capable CPUs may share the same memory, removing part of the overhead that we are currently paying for synchronization and therefore making the system faster.

7.1.2 Alternative Approach to Synchronization

Real-world applications should be profiled for threading patterns such as pipelining and helper models. This will help identify the different patterns in which locking is employed, and can help us redesign the algorithm with policy- or domain-based synchronization, letting the application characterization decide the policy. A parallel methodology would be to move away from SMP-based synchronization and decouple from the underlying layer at the cost of more messages, or to bring the layer inside kernel space at the cost of more context switching. Existing well-known distributed algorithms should be applied in order to evaluate the best fit and determine the trade-off between performance and programmability.

7.1.3 Process Management

To implement synchronization at the level of kernel subsystems, or process synchronization, distributed shared memory resource management should be incorporated in this layer. This would enable process communication through pipes and shared memory. Performance metrics such as message usage, contention between threads and memory subsystem parameters should be queried and stored to support the implementation of global scheduling, so that Popcorn can make well-formed decisions. The existing socket state layer can be exploited to remap interrupts across kernels, which would enable network-driven applications to execute across kernels.

Bibliography

[1] ARM. http://www.arm.com/files/downloads/big_LITTLE_Final_Final.pdf.

[2] R. Rahman, Intel Xeon Phi Coprocessor Architecture and Tools: The Guide for Appli- cation Developers. Berkely, CA, USA: Apress, 1st ed., 2013.

[3] A. Baumann, P. Barham, P.-E. Dagand, T. Harris, R. Isaacs, S. Peter, T. Roscoe, A. Schüpbach, and A. Singhania, "The multikernel: a new OS architecture for scalable multicore systems," in Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles, pp. 29–44, ACM, 2009.

[4] D. Katz, A. Barbalace, S. Ansary, A. Ravichandran, and B. Ravindran, "Thread Migration in a Replicated-Kernel OS," SSRG, Virginia Tech, under submission to ICDCS 2015.

[5] D. Katz, “Popcorn Linux: Cross Kernel Process and Thread Migration in a Linux-Based Multikernel,” tech. rep., Virginia Tech, Sept. 2014.

[6] A. Barbalace, M. Sadini, S. Ansary, C. Jelesnianski, A. Ravichandran, C. Kendir, A. Murray, and B. Ravindran, "Popcorn: Bridging the programmability gap in heterogeneous-ISA platforms," in Proceedings of the Tenth European Conference on Computer Systems, EuroSys '15, (New York, NY, USA), pp. 29:1–29:16, ACM, 2015.

[7] D. J. Sorin, M. D. Hill, and D. A. Wood, “A primer on memory consistency and cache coherence,” Synthesis Lectures on Computer Architecture, vol. 6, no. 3, pp. 1–212, 2011.

[8] SYSV. http://www.tldp.org/LDP/lpg-0.4.pdf/.

[9] F. X. Lin, Z. Wang, and L. Zhong, “K2: a mobile operating system for heterogeneous coherence domains,” in Proceedings of the 19th international conference on Architectural support for programming languages and operating systems, pp. 285–300, ACM, 2014.

[10] X. Song, H. Chen, R. Chen, Y. Wang, and B. Zang, “A case for scaling applications to many-core with os clustering,” in Proceedings of the sixth conference on Computer systems, pp. 61–76, ACM, 2011.


[11] G. Zellweger, A. Schüpbach, and T. Roscoe, "Unifying synchronization and events in a multicore OS," in Proceedings of the Asia-Pacific Workshop on Systems, p. 16, ACM, 2012.

[12] D. Wentzlaff and A. Agarwal, “Factored operating systems (fos): the case for a scalable operating system for multicores,” SIGOPS Oper. Syst. Rev., vol. 43, no. 2, pp. 76–85, 2009.

[13] D. Wentzlaff and A. Agarwal, “Factored operating systems (fos): the case for a scalable operating system for multicores,” ACM SIGOPS Operating Systems Review, vol. 43, no. 2, pp. 76–85, 2009.

[14] SG. https://cug.org/5-publications/proceedings_attendee_lists/1998CD/S98PROC/AUTHORS/BRONER/ACROBAT.PDF.

[15] E. Bugnion, S. Devine, K. Govil, and M. Rosenblum, “Disco: Running commodity operating systems on scalable multiprocessors,” ACM Trans. Comput. Syst., vol. 15, pp. 412–447, Nov. 1997.

[16] J. Chapin, M. Rosenblum, S. Devine, T. Lahiri, D. Teodosiu, and A. Gupta, Hive: Fault containment for shared-memory multiprocessors, vol. 29. ACM, 1995.

[17] D. Margery, G. Vallée, R. Lottiaux, C. Morin, and J.-Y. Berthou, "Kerrighed: a SSI cluster OS running OpenMP," 2003.

[18] P. Gallard and C. Morin, “Dynamic streams for efficient communications between mi- grating processes in a cluster,” Parallel processing Letters, vol. 13, no. 04, pp. 601–614, 2003.

[19] A. Barak, O. Laadan, and A. Shiloh, "Scalable cluster computing with MOSIX for Linux," Proc. 5th Annual Linux Expo, vol. 100, 1999.

[20] krisbuytaer. http://krisbuytaert.be/howto/MigSHM-openMosix/.

[21] Compaq. http://openssi.org/cgi-bin/view?page=openssi.html/.

[22] B. Walker, G. Popek, R. English, C. Kline, and G. Thiel, "The LOCUS distributed operating system," in ACM SIGOPS Operating Systems Review, vol. 17, pp. 49–70, ACM, 1983.

[23] BPROC. http://bproc.sourceforge.net/.

[24] E. B. Nightingale, O. Hodson, R. McIlroy, C. Hawblitzel, and G. Hunt, "Helios: heterogeneous multiprocessing with satellite kernels," in Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles, pp. 221–234, ACM, 2009.

[25] B. H. Shelton, "Popcorn Linux: enabling efficient inter-core communication in a Linux-based multikernel operating system," tech. rep., Virginia Tech, May 2013.

[26] M. Sadini, A. Barbalace, B. Ravindran, and F. Quaglia, “A page coherency protocol for popcorn replicated-kernel operating system,” MARC Symposium, 2013.

[27] git. https://sourceware.org/git/?p=glibc.git.

[28] U. Drepper, “Futexes are tricky,” 2005.

[29] PAGE. https://www.kernel.org/doc/gorman/html/understand/understand006.html.

[30] D. H. Bailey, E. Barszcz, J. T. Barton, D. S. Browning, R. L. Carter, L. Dagum, R. A. Fatoohi, P. O. Frederickson, T. A. Lasinski, R. S. Schreiber, H. D. Simon, V. Venkatakrishnan, and S. K. Weeratunga, "The NAS parallel benchmarks summary and preliminary results," in Supercomputing '91, 1991.

[31] C. Bienia, Benchmarking Modern Multiprocessors. PhD thesis, Princeton University, January 2011.

[32] Y. Mao, R. Morris, and M. F. Kaashoek, “Optimizing mapreduce for multicore archi- tectures,” in Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Tech. Rep, Citeseer, 2010.

[33] Corbet. https://lwn.net/Articles/103209/.

[34] I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, and H. Balakrishnan, “Chord: A scalable peer-to-peer lookup service for internet applications,” in Proceedings of the 2001 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, SIGCOMM ’01, (New York, NY, USA), pp. 149–160, ACM, 2001.

[35] C. Morin, R. Lottiaux, G. Vallée, P. Gallard, G. Utard, R. Badrinath, and L. Rilling, "Kerrighed: a single system image cluster operating system for high performance computing," in Euro-Par 2003 Parallel Processing, pp. 1291–1294, Springer, 2003.