CRIU: Time and Space Travel Service for Applications

Pavel Emelyanov

LinuxCon NA, New Orleans, 2013

Agenda

What is CRIU?

Project history and state

Usage scenarios

 Live migration

 Reboot-less kernel upgrade

 Slow services startup

 Advanced debugging and testing

 and more...

What is CRIU?

Checkpoint Restore In Userspace

Checkpoint (or Dump) -> Full info about state -> Restore (or Restart)
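As a rough illustration of the dump/restore pair, the sketch below builds the two command lines. It assumes the present-day `criu` command name and its `--tree`, `--images-dir` and `--leave-running` options; the helper names `dump_argv` and `restore_argv`, the PID and the directory are hypothetical, and the options should be checked against the installed version.

```python
import shlex
from typing import List


def dump_argv(pid: int, images_dir: str, leave_running: bool = False) -> List[str]:
    """Build a command line that checkpoints the process tree rooted at pid."""
    argv = ["criu", "dump", "--tree", str(pid), "--images-dir", images_dir]
    if leave_running:
        # Keep the application running after its state has been saved.
        argv.append("--leave-running")
    return argv


def restore_argv(images_dir: str) -> List[str]:
    """Build a command line that restores the tree saved in images_dir."""
    return ["criu", "restore", "--images-dir", images_dir]


if __name__ == "__main__":
    # Hypothetical PID and directory, printed rather than executed.
    print(shlex.join(dump_argv(1234, "/tmp/ckpt")))
    print(shlex.join(restore_argv("/tmp/ckpt")))
```

The commands are only printed here; running them requires root privileges and a kernel with the needed checkpoint/restore support.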

Why in userspace?

[Figure: a process in user space above the kernel; labels: User-space, Kernel, C/R API, kmod]

Dump:
- ptrace
- /proc
- netlink
- syscalls

Restore:
- syscalls
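To give a feel for the /proc part of the dump path, here is a minimal sketch that reads a process's memory mappings from `/proc/<pid>/maps`. It is not CRIU's code; the `Mapping` type and the helper names are made up for illustration.

```python
from typing import List, NamedTuple


class Mapping(NamedTuple):
    start: int
    end: int
    perms: str
    offset: int
    path: str


def parse_maps_line(line: str) -> Mapping:
    """Parse one line of /proc/<pid>/maps: start-end perms offset dev inode [path]."""
    fields = line.split(maxsplit=5)
    start, end = (int(x, 16) for x in fields[0].split("-"))
    # Anonymous mappings have no path field.
    path = fields[5].strip() if len(fields) > 5 else ""
    return Mapping(start, end, fields[1], int(fields[2], 16), path)


def read_mappings(pid: str = "self") -> List[Mapping]:
    """Read all mappings of a process (Linux only)."""
    with open(f"/proc/{pid}/maps") as f:
        return [parse_maps_line(line) for line in f if line.strip()]
```

Calling `read_mappings()` on a Linux host lists the mappings of the Python interpreter itself; other state (registers, open files, sockets) would come from the other interfaces named on the slide.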

CRIU background

Project started ~2 years ago
– an RFC on kernel memory API extension
– small command line tool
– minimal dump of a process's internals
First release – 23 Jul 2012
– x86 and basic stuff
Since then
– 150+ kernel patches merged
– new kernel interfaces for reading and setting a process's state

Current project state

The latest release – v0.7
– supports x86 & ARM
– stuff typical applications use
Explicitly checked
– Apache, nginx, Oracle*, mysql, mongodb
– ssh/sshd, openvpn*, cron, sendmail
– Java, gcc, make
– VNC + { gimp, mplayer, blender, supertux }
– screen + { bash, top, tcpdump, tar/bz2 }

* some kernel tweaks required

Usage scenarios

Live migration
– Useful in cluster
Kernel upgrade w/o reboot
Slow services startup
Periodic snapshots
– HPC case
Advanced debugging and testing

Live migration

[Figure: Host A and Host B]

Live migration ++

[Figure: Host A, Host B, Shared FS]

Pre-migrate memory with memory tracker
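The idea behind pre-migrating memory can be shown with a toy model: copy every page while the application keeps running, then copy only the pages dirtied since the previous round, and freeze the application just for the last small batch. The class and function names and the page numbers below are invented for illustration; this models the idea and is not CRIU's implementation.

```python
from typing import Iterable, List, Set, Tuple


class TrackedMemory:
    """Toy memory tracker: remembers which pages changed since the last copy."""

    def __init__(self, n_pages: int) -> None:
        self.n_pages = n_pages
        # Before the first copy every page still has to be sent.
        self.dirty: Set[int] = set(range(n_pages))

    def write(self, page: int) -> None:
        if not 0 <= page < self.n_pages:
            raise IndexError(page)
        self.dirty.add(page)

    def take_dirty(self) -> Set[int]:
        """Return the pages changed since the last call and reset tracking."""
        dirty, self.dirty = self.dirty, set()
        return dirty


def migrate(mem: TrackedMemory, writes_per_round: Iterable[Iterable[int]]) -> Tuple[List[int], int]:
    """Return (pages sent in each live round, pages sent while frozen)."""
    sent_live = []
    for writes in writes_per_round:
        sent_live.append(len(mem.take_dirty()))  # copied while the app runs
        for page in writes:                      # the app keeps dirtying pages
            mem.write(page)
    sent_frozen = len(mem.take_dirty())          # app is frozen only for this part
    return sent_live, sent_frozen


if __name__ == "__main__":
    print(migrate(TrackedMemory(1000), [[1, 2, 3, 4], [2, 3]]))  # ([1000, 4], 2)
```

In this model the application is frozen while 2 pages are transferred instead of 1000, which is the reason for pre-migrating memory.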

Load balancing on cluster

[Figure: Host A, Host B, Host C]

Node maintenance

[Figure: Host A and Host B]

Kernel upgrade w/o reboot

[Figure: Host; kexec from Kernel A to Kernel B]

Slow services startup

# service foo start

[Chart: service readiness over time. Steps: spawn process, load config, top up caches, initialize resource pools. Ready (100%) at time T.]

Slow services startup

# service foo start

[Chart: service readiness over time. Step: spawn process. Ready (100%) at time t < T.]
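One way to wire this into a start script is sketched below: if checkpoint images of the fully initialized service exist, restore from them, otherwise fall back to the normal slow start. The function name and directory layout are hypothetical, and the `criu restore --images-dir ... --restore-detached` invocation assumes the options of current CRIU releases.

```python
import os
from typing import List


def start_command(service: str, images_dir: str) -> List[str]:
    """Pick a fast restore when images are available, else a normal cold start."""
    has_images = os.path.isdir(images_dir) and len(os.listdir(images_dir)) > 0
    if has_images:
        # Warm start: bring back the already initialized process tree.
        return ["criu", "restore", "--images-dir", images_dir, "--restore-detached"]
    # Cold start: spawn, load config, fill caches, initialize pools.
    return ["service", service, "start"]
```

A real script would also need to decide when the images are stale, for example after a configuration change, which this sketch ignores.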

Periodic snapshots

Memory tracker helps to keep images smaller

[Figure: time axis]
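The slide does not name the tracking mechanism. On mainline Linux one facility that can serve this purpose is the soft-dirty bit: each 64-bit entry of `/proc/<pid>/pagemap` carries it in bit 55, and writing `4` to `/proc/<pid>/clear_refs` resets it for the whole process. Assuming that interface, the sketch below picks out the pages changed since the last reset, so that only they need to go into the next snapshot. The helper name `dirty_pages` is made up for illustration.

```python
import struct
from typing import List

PAGEMAP_ENTRY = struct.Struct("=Q")  # one native-endian 64-bit entry per page
PRESENT = 1 << 63                    # page is present in RAM
SOFT_DIRTY = 1 << 55                 # page was written since the last reset


def dirty_pages(raw: bytes, first_page: int = 0) -> List[int]:
    """Return the numbers of soft-dirty pages in a chunk of pagemap data.

    raw holds consecutive pagemap entries; first_page is the virtual page
    number that the first entry describes.
    """
    return [
        first_page + i
        for i, (entry,) in enumerate(PAGEMAP_ENTRY.iter_unpack(raw))
        if entry & SOFT_DIRTY
    ]


if __name__ == "__main__":
    # Three synthetic entries: untouched, present and dirty, present and clean.
    sample = b"".join(PAGEMAP_ENTRY.pack(e) for e in (0, PRESENT | SOFT_DIRTY, PRESENT))
    print(dirty_pages(sample, first_page=10))  # [11]
```

Reading real pagemap data means seeking to `page_number * 8` in `/proc/<pid>/pagemap`, which needs suitable privileges.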

HPC

[Figure: time axis with progress marks 0%, 20%, 40%, 60%, 60%; power failure]

Advanced debugging

[Figure: application in trouble on the production host; debugger on the developer host]

Advanced testing

[Figure: Start App, T ~ 30 sec; then t ~ 0.1 sec, t ~ 0.1 sec, ...]

Advanced testing

New test or new hardware?

More (funny) use cases

Forgot to launch your program in screen
– Live-migrate it there
Playing a game without the save button
– Snapshot it
Suspend-to-RAM a VDI session

CRIU project resources

http://criu.org – project news and documentation
http://git.criu.org – git repo with tool sources
+CRIU page
criu@.org mailing list
[email protected] is me

Thank you!

Pavel Emelyanov

[email protected]

Parallels – Optimized Computing™ Confidential