[Figure: processes running on Host A and Host B, each host with its own OS, connected via an interconnect]
Migrating LinuX Containers Using CRIU
Simon Pickartz, Niklas Eiling, Stefan Lankes, Lukas Razik, and Antonello Monti
RWTH Aachen University, Aachen, Germany

Why do we need Migration?
  Resiliency
    Increasing hard- and software failures with growing cluster sizes
    Evacuation of faulty nodes ⇒ no whole job aborts
  Load balancing
    Applications’ scalability is usually limited by a single resource
    Co-scheduling can improve the overall cluster utilization
    Revocation of an exclusive node assignment
    Dynamic schedules necessary
2 of 17 Migrating LinuX Containers Using CRIU | Simon Pickartz et al. | ACS | 06/23/2016

Why should we care about containers?
  Virtual Machines
    Virtualization of hardware
    Multiple kernels
    Device emulation
    Requires special hardware support for nearly native performance
    Flexible in terms of kernel / OS choice
    [Figure: VM stack, top to bottom: Apps, Guest Kernels, Hypervisor, Host Kernel, Hardware]
  Containers
    Re-use host kernel
    User-space instances defined and separated by so-called cgroups and namespaces
    No special hardware support required
    Restricts kernel / OS choice
    [Figure: container stack, top to bottom: Apps, Abstraction Layer, Shared Kernel, Hardware]
⇒ Containers are fast and light-weight
Agenda
  Background
    Container-based Virtualization in terms of HPC
    Libvirt
    Checkpoint / Restore in Userspace (CRIU)
  Lxctools Driver
    Integration into Libvirt
    Migration Based on CRIU
  Evaluation
    lxctools Driver Overhead
    Migration Time
  Conclusion
Container-based Virtualization in terms of HPC
  Isolation at OS level, which is weaker than with hypervisors
    ⇒ A container may crash the whole system
  No overhead introduced by multiple kernels
  Flexible resource allocation
  Potentially better I/O and CPU performance than common hypervisor-based approaches
  ⇒ Complies with the goals of HPC
  [Figure: container stack, top to bottom: Apps, Abstraction Layer, Shared Kernel, Hardware]
Namespaces and Cgroups
  Namespaces and cgroups are features of the vanilla Linux kernel
  Namespaces separate processes by groups
  Cgroups control, limit, and observe the resource usage of processes

  Namespaces
    pid     Process IDs
    net     Network configuration
    ipc     Inter-process communication
    mnt     Mountpoints
    uts     Hostname
    user    User and group IDs

  cgroup subsystems
    cpu     CPU time and utilization
    cpuset  Set of usable CPU cores
    blkio   Limit / observe I/O accesses
    memory  Limit memory consumption / utilization
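To make the two kernel features more tangible, the following sketch reads what the kernel exposes about them under /proc. It relies only on the documented layout of /proc/<pid>/ns (one symbolic link per namespace type) and /proc/<pid>/cgroup (lines of the form hierarchy-ID:controllers:path). The function names and the proc_root parameter are illustrative assumptions and are not part of LXC, CRIU, or libvirt.

```python
import os


def list_namespaces(pid="self", proc_root="/proc"):
    """Map each namespace type (pid, net, ...) to its identifier.

    Every entry in <proc_root>/<pid>/ns is a symbolic link whose target
    looks like 'pid:[4026531836]'. Two processes share a namespace
    exactly when these targets are equal.
    """
    ns_dir = os.path.join(proc_root, str(pid), "ns")
    return {name: os.readlink(os.path.join(ns_dir, name))
            for name in sorted(os.listdir(ns_dir))}


def parse_cgroup_file(text):
    """Parse the content of /proc/<pid>/cgroup.

    Each non-empty line has the form 'hierarchy-ID:controllers:path'.
    The controller list is empty for the unified (v2) hierarchy.
    """
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        hierarchy, controllers, path = line.split(":", 2)
        entries.append({
            "hierarchy": int(hierarchy),
            "controllers": controllers.split(",") if controllers else [],
            "path": path,
        })
    return entries


if __name__ == "__main__":
    # Only meaningful on Linux, where /proc/self/ns exists.
    if os.path.isdir("/proc/self/ns"):
        for kind, ident in list_namespaces().items():
            print(kind, ident)
        with open("/proc/self/cgroup") as handle:
            for entry in parse_cgroup_file(handle.read()):
                print(entry)
```

Comparing the output for a process inside a container with the output for a process on the host shows which namespaces and cgroups actually separate the two.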
LinuX Containers
  Infrastructure for container projects (e.g., Docker)
  Linux Containers (LXC) use cgroups and namespaces
  User-space solution ⇒ needs no customized kernels
  Ships with
    C-API
    Command-Line Interface (CLI)
Libvirt
  Domain (VM etc.) management library written in C
  Support for a variety of virtualization solutions, e.g., QEMU, VMware ESX, OpenVZ
  Domain Independent Layer
    ⇒ Independence of changes to virtualization layer
    Implements common functionality (e.g., configuration via XML)
  Domain Dependent Layer
    Implemented in terms of drivers
    Remote driver enabling remote management of domains
  [Figure: libvirt architecture. User interactions enter via virsh or the API; the Domain Independent Layer dispatches to the libvirt drivers, each managing its domains, or to the remote driver, which forwards to the network; libvirtd receives requests from the network]
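The split between the domain-independent layer and the drivers can be illustrated with a small dispatch sketch. This is not libvirt code: the classes, the registry, and the "lxctools" URI scheme are hypothetical, and the only things taken from libvirt are the idea of selecting a driver by connection URI and an XML description with a domain root element containing a name element.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


class Driver:
    """Domain-dependent part: one subclass per virtualization solution."""

    def start(self, name):
        raise NotImplementedError


class RecordingDriver(Driver):
    """Hypothetical stand-in for a real driver; it only records calls."""

    def __init__(self, label):
        self.label = label
        self.calls = []

    def start(self, name):
        self.calls.append(("start", name))
        return f"{self.label}: started {name}"


# Registry of drivers, keyed by the scheme of the connection URI.
DRIVERS = {}


def register_driver(scheme, driver):
    DRIVERS[scheme] = driver


def parse_domain_name(xml_text):
    """Common functionality: read the domain name from the XML description."""
    name = ET.fromstring(xml_text).findtext("name")
    if not name:
        raise ValueError("domain XML has no <name> element")
    return name


def start_domain(uri, xml_text):
    """Domain-independent entry point: pick the driver, then delegate."""
    scheme = urlparse(uri).scheme
    if scheme not in DRIVERS:
        raise LookupError(f"no driver registered for scheme {scheme!r}")
    return DRIVERS[scheme].start(parse_domain_name(xml_text))


if __name__ == "__main__":
    register_driver("lxctools", RecordingDriver("lxctools"))  # assumed scheme
    xml = "<domain type='lxctools'><name>web01</name></domain>"
    print(start_domain("lxctools:///", xml))
```

Because the XML handling sits in front of the driver table, adding a new virtualization backend in this sketch only requires registering another Driver subclass.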
Checkpoint / Restore in Userspace (CRIU)
  Checkpoint / Restore (C/R) without kernel modifications
  C/R processes or groups thereof
  Supports incremental checkpoints
  Uses ptrace to freeze a process and inject parasite code
  Gathers file descriptors, memory maps, and register contents
  Requires root privileges
  Provides direct memory transfer to a page-server running on remote nodes
  [Figure: a process is dumped to the file system and restored from it]
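As a rough illustration of how a local checkpoint and restore looks from the command line, the sketch below assembles criu invocations as argument lists without executing them. The helper names, the example PID, and the image directory are assumptions; the options used (--tree, --images-dir, --leave-running) are standard CRIU options, but a real container usually needs further options that are omitted here.

```python
import shlex


def criu_dump_cmd(pid, images_dir, leave_running=False):
    """Build the command that checkpoints the process tree rooted at pid."""
    cmd = ["criu", "dump", "--tree", str(pid), "--images-dir", images_dir]
    if leave_running:
        # Keep the process running after the dump instead of killing it.
        cmd.append("--leave-running")
    return cmd


def criu_restore_cmd(images_dir):
    """Build the command that restores a process tree from images_dir."""
    return ["criu", "restore", "--images-dir", images_dir]


if __name__ == "__main__":
    # Printed only; running these requires root privileges and CRIU.
    print(shlex.join(criu_dump_cmd(1234, "/tmp/ckpt", leave_running=True)))
    print(shlex.join(criu_restore_cmd("/tmp/ckpt")))
```

Passing the lists to subprocess.run would execute them; this is left out on purpose because it requires root privileges and an installed CRIU.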
The lxctools Driver
  Makes LXC and CRIU usable from libvirt
  Minor libvirt source code modifications
    Make the driver public to libvirt (7 source files)
    Link against LXC’s C-API
  Remote management capabilities by libvirt
  Implemented features
    Start / Stop containers
    Checkpoint / Restore containers
    Migrate containers (cold- and live-migration)
    Configuration of LXC instances via XML
  lxctools driver not related to libvirt’s lxc driver
    libvirt’s lxc driver provides no migration support
Migration by lxctools Driver
  Implements skeleton of libvirt’s domain-independent layer
  Reduce file system overhead by using tmpfs
  Migration in four steps
    1. Create tmpfs on both nodes
    2. Start page-server on target node
    3. Dump process and transfer
         memory content via page-server
         other checkpointing data via file-transfer
    4. Restore process from dump on target system and remove tmpfs
  [Figure: the container's memory content goes from the source node directly to the destination node; metadata is written to the source file system, copied to the destination file system, and read from there on restore]
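The four steps can be written down as an ordered plan of commands per node. This is a sketch under stated assumptions: the driver itself links against LXC's C-API, so the plain criu, mount, and rsync command lines below are stand-ins chosen for illustration, and the function name, image directory, and port are hypothetical.

```python
def migration_plan(pid, target_host, images_dir="/tmp/criu-images", port=27000):
    """Return the migration as a list of (step, node, argv) tuples.

    Nothing is executed here; the plan only makes the ordering and the
    division of work between source and target node explicit.
    """
    tmpfs = ["mount", "-t", "tmpfs", "tmpfs", images_dir]
    return [
        # Step 1: create a tmpfs on both nodes to hold the image files.
        (1, "source", tmpfs),
        (1, "target", tmpfs),
        # Step 2: the page-server on the target receives the memory pages.
        (2, "target", ["criu", "page-server",
                       "--images-dir", images_dir, "--port", str(port)]),
        # Step 3: dump; memory goes to the page-server, metadata to tmpfs,
        # and the metadata files are then copied (rsync is an assumption).
        (3, "source", ["criu", "dump", "--tree", str(pid),
                       "--images-dir", images_dir, "--page-server",
                       "--address", target_host, "--port", str(port)]),
        (3, "source", ["rsync", "-a", images_dir + "/",
                       f"{target_host}:{images_dir}/"]),
        # Step 4: restore on the target, then remove the tmpfs again.
        (4, "target", ["criu", "restore", "--images-dir", images_dir]),
        (4, "source", ["umount", images_dir]),
        (4, "target", ["umount", images_dir]),
    ]


if __name__ == "__main__":
    for step, node, argv in migration_plan(1234, "hostB"):
        print(f"step {step} on {node}: {' '.join(argv)}")
```

Keeping the image files on tmpfs means that neither the dump nor the restore touches a disk, which is the file system overhead the slide refers to.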
Live-Migration by lxctools Driver
  Live-migration leverages incremental dumps
  Static and dynamic iteration counts are supported
  Pre-dumps only for memory content
  Common pre-copy live-migration approach
  [Figure: timeline of a live-migration: pre-dumps, final dump (stop domain), restore domain, resume domain]
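The effect of pre-dumps on the final dump can be estimated with a simple pre-copy model. The sketch assumes a constant dirty rate and a constant transfer bandwidth, counts only the transfer of the final dump as downtime (restore time is ignored), and uses a size threshold as the dynamic stop criterion. All of these are modelling assumptions and not the criterion implemented by the driver.

```python
def precopy_schedule(memory_mib, dirty_mib_per_s, bandwidth_mib_per_s,
                     max_iterations=2, stop_threshold_mib=None):
    """Return (pre_dump_sizes, final_dump_size, downtime_s).

    Static iteration count: leave stop_threshold_mib as None, so exactly
    max_iterations pre-dumps are taken.
    Dynamic iteration count: set stop_threshold_mib, so pre-dumping stops
    once the remaining dirty memory is small enough.
    """
    to_send = float(memory_mib)
    pre_dumps = []
    for _ in range(max_iterations):
        if stop_threshold_mib is not None and to_send <= stop_threshold_mib:
            break
        pre_dumps.append(to_send)
        transfer_time = to_send / bandwidth_mib_per_s
        # Memory dirtied while this pre-dump was in flight must be resent,
        # but never more than the whole memory of the container.
        to_send = min(float(memory_mib), dirty_mib_per_s * transfer_time)
    # The domain is stopped only while the final dump is transferred.
    downtime_s = to_send / bandwidth_mib_per_s
    return pre_dumps, to_send, downtime_s


if __name__ == "__main__":
    cold = precopy_schedule(1024, 10, 100, max_iterations=0)
    live = precopy_schedule(1024, 10, 100, max_iterations=2)
    print("cold:", cold)
    print("live:", live)
```

With max_iterations=0 the model degenerates to a cold migration, in which the whole memory is sent while the domain is stopped.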
lxctools Driver Overhead
  Systems: 2 NUMA nodes, each with
    2 sockets Intel IvyBridge CPUs (E5-2650 v2), 8 cores (16 threads) @ 2.6 GHz
    Gigabit Ethernet
    Fedora 23, Linux Kernel v4.4.4-301, libvirt v1.2.16
  Comparison: libvirt & lxctools vs. LXC CLI
    Averaged over 200 runs
    Save / restore with 1 GiB memory load
  Libvirt support at the cost of a few tens of milliseconds
  [Figure: execution time in s (0 to 1) of the commands start, shutdown, save, and restore for LXC and libvirt]
Comparison: Migration Time
  [Figure, left panel "Empty VM / Container": migration time in s (0 to 4) over domain size in GiB (0.25, 0.5, 1, 2, 4) for lxctools and QEMU / KVM]
  [Figure, right panel "Migration with Workload": migration time in s (0 to 40) over workload in GiB (Min, 0.25, 0.5, 1, 2, 4) for lxctools and QEMU / KVM]
Cold vs. Live Migration
  1 GiB memory load
  Averaged over 20 runs
  In this case: only two pre-dumps
  Live migration downtime of 1.96 s
  [Figure: execution time in seconds (0 to 12) of cold and live migration, broken down into pre-dump, dump, and restore]
Conclusion
  First libvirt driver with migration support for LXC containers
  Working prototype
    Support for container management via libvirt
    Minimal overhead compared to the LXC CLI
    Promising migration times
  Future work
    Reduction of downtimes for live-migration
    RDMA migration
  Open source: https://github.com/RWTH-OS/libvirt
Thank you for your kind attention!
Simon Pickartz et al. – [email protected]
Institute for Automation of Complex Power Systems
E.ON Energy Research Center, RWTH Aachen University
Mathieustraße 10
52074 Aachen
www.eonerc.rwth-aachen.de