[Figure: processes on Host A and Host B, each host with its own OS, connected by an interconnect]

Migrating LinuX Containers Using CRIU

Simon Pickartz, Niklas Eiling, Stefan Lankes, Lukas Razik, and Antonello Monti

RWTH Aachen University, Aachen, Germany

Why do we need Migration?

Resiliency
- Increasing hard- and software failures with growing cluster sizes
- Evacuation of faulty nodes ⇒ no whole-job aborts

Load balancing
- Applications' scalability is usually limited by a single resource
- Co-scheduling can improve the overall cluster utilization
- Revocation of an exclusive node assignment
- Dynamic schedules necessary

2 of 17 Migrating LinuX Containers Using CRIU | Simon Pickartz et al. | ACS | 06/23/2016

Why should we care about containers?

Virtual Machines
- […] of hardware
- Multiple kernels
- Device emulation
- Requires special hardware support for nearly native performance
- Flexible in terms of kernel / OS choice

[Figure: VM stack with apps on guest kernels, above the host kernel and the hardware]

Containers
- Re-use host kernel
- User-space instances defined and separated by so-called cgroups and namespaces
- No special hardware support required
- Restricts kernel / OS choice

[Figure: container stack with apps on an abstraction layer, above the shared kernel and the hardware]

⇒ Containers are fast and light-weight

Agenda

Background
- Container-based Virtualization in terms of HPC
- Libvirt
- Checkpoint / Restore in Userspace (CRIU)

lxctools Driver
- Integration into Libvirt
- Migration Based on CRIU

Evaluation
- lxctools Driver Overhead
- Migration Time

Conclusion

Container-based Virtualization in terms of HPC

- Isolation at OS level, which is lower than with […]
  ⇒ A container may crash the whole system
- No overhead introduced by multiple kernels
- Flexible resource allocation
- Potentially better I/O and CPU performance than common hypervisor-based approaches

⇒ Complies with the goals of HPC

[Figure: container stack with apps on an abstraction layer, above the shared kernel and the hardware]

Namespaces and Cgroups

- Namespaces and cgroups are features of the vanilla Linux kernel
- Namespaces separate processes into groups
- Cgroups control, limit, and observe the resource usage of processes

Namespaces
  pid     Process IDs
  net     Network configuration
  ipc     Inter-process communication
  mnt     Mountpoints
  uts     Hostname
  user    User and group IDs

cgroup subsystems
  cpu     CPU time and utilization
  cpuset  Amount of CPU cores
  blkio   Limit / observe I/O accesses
  memory  Limit memory consumption / utilization
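The kernel exposes both mechanisms per process under /proc, which gives a quick way to check which namespaces and cgroups a containerized process belongs to. The following is a minimal sketch, not part of the original slides: the helper names are made up for illustration, and the cgroup path `/lxc/web` in the usage note is a hypothetical example.

```python
import os
import re

# A /proc/<pid>/ns/* symlink target looks like "pid:[4026531836]".
NS_LINK = re.compile(r"^(?P<kind>\w+):\[(?P<inode>\d+)\]$")


def parse_ns_link(target):
    """Return (namespace kind, inode number) for one ns symlink target."""
    match = NS_LINK.match(target)
    if match is None:
        raise ValueError(f"unexpected namespace link: {target!r}")
    return match["kind"], int(match["inode"])


def same_namespace(link_a, link_b):
    """Two processes share a namespace if kind and inode number match."""
    return parse_ns_link(link_a) == parse_ns_link(link_b)


def parse_cgroup_line(line):
    """Parse one /proc/<pid>/cgroup line: 'hierarchy-ID:controllers:path'."""
    hierarchy, controllers, path = line.rstrip("\n").split(":", 2)
    return int(hierarchy), [c for c in controllers.split(",") if c], path


def namespaces_of(pid="self", proc="/proc"):
    """Read all namespace links of a process (Linux only)."""
    ns_dir = os.path.join(proc, str(pid), "ns")
    return {name: parse_ns_link(os.readlink(os.path.join(ns_dir, name)))
            for name in sorted(os.listdir(ns_dir))}
```

For example, `parse_cgroup_line("3:cpu,cpuacct:/lxc/web")` yields the hierarchy ID, the controllers attached to it, and the cgroup path, while comparing the `net` links of two PIDs with `same_namespace` tells whether they see the same network configuration.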

LinuX Containers

- Infrastructure for container projects (e.g., Docker)
- Linux Containers (LXC) use cgroups and namespaces
- User-space solution ⇒ needs no customized kernels
- Ships with
  - C-API
  - Command-Line Interface (CLI)

Libvirt

- Domain (VM etc.) management library written in C
- Support for a variety of virtualization solutions, e.g., QEMU, VMware ESX, OpenVZ
- Domain Independent Layer
  ⇒ Independence of changes to virtualization layer
  - Implements common functionality (e.g., configuration via XML)
- Domain Dependent Layer
  - Implemented in terms of drivers
  - Remote driver enabling remote management of domains

[Figure: libvirt architecture. User interactions reach libvirtd through virsh or the API; the domain-independent layer sits on top of the libvirt drivers that manage the domains, and the remote driver connects to and from the network]
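One place where the driver split becomes visible to users is the libvirt connection URI, whose general form is `driver[+transport]://[user@][host][:port]/path`. As a rough illustration (a sketch, not libvirt's own parsing code), the parts can be separated with the Python standard library. The host name `hostb` in the usage note is a hypothetical example.

```python
from urllib.parse import urlsplit


def split_libvirt_uri(uri):
    """Split a libvirt connection URI into driver, transport, host and path."""
    parts = urlsplit(uri)
    # The scheme carries the driver name and, optionally, a transport.
    driver, _, transport = parts.scheme.partition("+")
    return {
        "driver": driver,                # selects the domain-dependent driver
        "transport": transport or None,  # e.g. ssh, tls, tcp; None if omitted
        "host": parts.hostname,          # None means no remote host is named
        "path": parts.path,
    }
```

`split_libvirt_uri("qemu+ssh://root@hostb/system")` reports the `qemu` driver reached over `ssh` on `hostb`, whereas `split_libvirt_uri("lxc:///")` names only a driver and no remote host.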

Checkpoint / Restore in Userspace (CRIU)

- Checkpoint / Restore (C/R) without kernel modifications
- C/R processes or groups thereof
- Supports incremental checkpoints
- Uses ptrace to freeze a process and inject parasite code
- Gathers file descriptors, memory maps, and register contents
- Requires root privileges
- Provides direct memory transfer to a page-server running on remote nodes

[Figure: a process is dumped to, and restored from, the file system]
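On the command line, a checkpoint and a restore map to `criu dump` and `criu restore`. The sketch below only assembles the argument vectors and does not run anything; the PID and image directory in the usage note are hypothetical, and running the commands would require root privileges as noted above.

```python
import shlex


def criu_dump(pid, images_dir, leave_running=False):
    """argv for checkpointing the process tree rooted at pid."""
    argv = ["criu", "dump", "--tree", str(pid), "--images-dir", images_dir]
    if leave_running:
        # Keep the process alive after the dump instead of killing it.
        argv.append("--leave-running")
    return argv


def criu_restore(images_dir, detach=True):
    """argv for restoring a process tree from the images in images_dir."""
    argv = ["criu", "restore", "--images-dir", images_dir]
    if detach:
        # Return once the restored tree is running.
        argv.append("--restore-detached")
    return argv


print(shlex.join(criu_dump(4242, "/tmp/ckpt")))
print(shlex.join(criu_restore("/tmp/ckpt")))
```

Building the argument list separately from executing it keeps the sketch safe to run anywhere; passing the list to `subprocess.run` would perform the actual checkpoint or restore.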

The lxctools Driver

- Utilizes LXC and CRIU for usage by libvirt
- Minor libvirt source code modifications
  - Make the driver public to libvirt (7 source files)
  - Link against LXC's C-API
- Remote management capabilities by libvirt
- Implemented features
  - Start / Stop containers
  - Checkpoint / Restore containers
  - Migrate containers (cold- and live-migration)
  - Configuration of LXC instances via XML
- lxctools driver not related to libvirt's lxc driver
  - libvirt's lxc driver provides no migration support

Migration by lxctools Driver

- Implements skeleton of libvirt's domain-independent layer
- Reduce file system overhead by using tmpfs
- Migration in four steps
  1. Create tmpfs on both nodes
  2. Start page-server on target node
  3. Dump process and transfer memory content via page-server; transfer other checkpointing data via file-transfer
  4. Restore process from dump on target system and remove tmpfs

[Figure: the container's memory content is sent from the source node directly to the destination node, while the metadata passes through the file system on each node]
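The four steps can be written down as an ordered command plan. This is a sketch under stated assumptions, not the driver's implementation: it uses plain `criu` CLI calls for a process tree rather than the LXC C-API, picks `rsync` as a stand-in for the unspecified file transfer, and the working directory, port, and host name are hypothetical. The function only returns the plan and executes nothing.

```python
def migration_plan(pid, target, workdir="/tmp/criu-migrate", port=27000):
    """Return (node, argv) pairs for a cold migration in four steps."""
    mount = ["mount", "-t", "tmpfs", "tmpfs", workdir]
    port = str(port)
    return [
        # 1. Create tmpfs on both nodes.
        ("source", ["mkdir", "-p", workdir]),
        ("source", mount),
        ("target", ["mkdir", "-p", workdir]),
        ("target", mount),
        # 2. Start the page-server on the target node.
        ("target", ["criu", "page-server", "--images-dir", workdir,
                    "--port", port, "--daemon"]),
        # 3. Dump: memory pages go to the page-server ...
        ("source", ["criu", "dump", "--tree", str(pid),
                    "--images-dir", workdir, "--page-server",
                    "--address", target, "--port", port]),
        #    ... and the remaining image files go by file transfer
        #    (rsync is an assumption, the slides name no tool).
        ("source", ["rsync", "-a", workdir + "/", f"{target}:{workdir}/"]),
        # 4. Restore on the target and remove the tmpfs on both nodes.
        ("target", ["criu", "restore", "--images-dir", workdir,
                    "--restore-detached"]),
        ("source", ["umount", workdir]),
        ("target", ["umount", workdir]),
    ]


for node, argv in migration_plan(4242, "hostb"):
    print(f"{node:6} {' '.join(argv)}")
```

The ordering matters: the page-server has to listen before the dump starts, and the restore can only begin once both the memory pages and the remaining image files are present on the target.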

Live-Migration by lxctools Driver

- Live-migration leverages incremental dumps
- Static and dynamic iteration counts are supported
- Pre-dumps only for memory content
- Common pre-copy live-migration approach

[Figure: timeline of a live migration: pre-dumps → final dump (stop domain) → restore domain → resume domain]
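The effect of the iteration count is easiest to see in a small model of the pre-copy scheme: every pre-dump sends the memory that was dirtied while the previous one was in flight, and whatever is still pending at the end forms the final dump taken while the domain is stopped. The model below is an illustrative simplification with a constant dirty rate and link rate; all numbers in the usage note are hypothetical and unrelated to the measurements in these slides.

```python
def precopy_rounds(memory_mib, dirty_mib_per_s, link_mib_per_s,
                   max_rounds=None, stop_below_mib=None):
    """Simulate pre-copy; return (pre_dump_sizes, final_dump_size) in MiB.

    Static iteration count: set max_rounds.
    Dynamic iteration count: set stop_below_mib (stop once the pending
    data is at most this large).
    """
    if max_rounds is None and stop_below_mib is None:
        raise ValueError("set max_rounds and/or stop_below_mib")
    pending = float(memory_mib)  # the first pre-dump sends all memory
    pre_dumps = []
    while True:
        if max_rounds is not None and len(pre_dumps) >= max_rounds:
            break
        if stop_below_mib is not None and pending <= stop_below_mib:
            break
        pre_dumps.append(pending)
        transfer_s = pending / link_mib_per_s
        # Memory dirtied while this round was being transferred.
        dirtied = min(float(memory_mib), dirty_mib_per_s * transfer_s)
        if dirtied >= pending:
            # Rounds no longer shrink, so further pre-dumps would not help.
            pending = dirtied
            break
        pending = dirtied
    return pre_dumps, pending


static = precopy_rounds(1024, 32, 128, max_rounds=2)
dynamic = precopy_rounds(1024, 32, 128, stop_below_mib=16)
print(static)   # ([1024.0, 256.0], 64.0)
print(dynamic)  # ([1024.0, 256.0, 64.0], 16.0)
```

In this model the final dump shrinks by the ratio of dirty rate to link rate with every additional pre-dump, which is why a dynamic stop criterion can trade a longer total migration for a shorter downtime.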

lxctools Driver Overhead

Systems: 2 NUMA nodes, each with
- 2 sockets
- Intel Ivy Bridge CPUs (E5-2650 v2), 8 cores (16 threads) @ 2.6 GHz
- Gigabit Ethernet
- Fedora 23, kernel v4.4.4-301, libvirt v1.2.16

Comparison: Execution time of libvirt & lxctools vs. LXC CLI
- Averaged over 200 runs
- Save / restore with 1 GiB memory load
- Libvirt support at the cost of a few tens of milliseconds

[Figure: bar chart of the execution time in s (0 to 1) for the commands start, shutdown, save, and restore; series: LXC, libvirt]

Comparison: Migration Time

[Figure, two panels. "Empty VM / Container": migration time in s (0 to 4) over domain size in GiB (0.25, 0.5, 1, 2, 4). "Migration with Workload": migration time in s (0 to 40) over workload in GiB (Min, 0.25, 0.5, 1, 2, 4). Series: lxctools, QEMU / KVM]

Cold vs. Live Migration

- 1 GiB memory load
- Averaged over 20 runs
- In this case: only two pre-dumps
- Live Migration Downtime of 1.96 s

[Figure: bars for Cold and Live migration, execution time in seconds (0 to 12), split into Pre-dump, Dump, and Restore]
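As a point of reference for these times, the raw link rate gives a lower bound on how long it takes to move the memory load between the nodes. The sketch below is a back-of-the-envelope calculation that assumes the full 1 GiB crosses the Gigabit Ethernet link of the test systems and ignores protocol overhead; the function name is made up for illustration.

```python
GIB = 2**30  # bytes


def min_transfer_seconds(size_bytes, link_bits_per_s):
    """Lower bound: payload bits divided by the raw link rate."""
    return size_bytes * 8 / link_bits_per_s


bound = min_transfer_seconds(1 * GIB, 1e9)
print(f"{bound:.2f} s")  # 8.59 s
```

Sending 1 GiB during the stop phase would therefore take at least about 8.6 s on this link. The reported live-migration downtime of 1.96 s lies below that bound, which is consistent with the pre-dumps moving most of the memory while the container is still running.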

Conclusion

- First libvirt driver with migration support for LXC containers
  - Working prototype
  - Support for container management via libvirt
  - Minimal overhead compared to the LXC CLI
  - Promising migration times
- Future work
  - Reduction of downtimes for live-migration
  - RDMA migration
- Open source: https://github.com/RWTH-OS/libvirt

Thank you for your kind attention!

Simon Pickartz et al.

Institute for Automation of Complex Power Systems
E.ON Energy Research Center, RWTH Aachen University
Mathieustraße 10
52074 Aachen
www.eonerc.rwth-aachen.de