Checkpoint Restore In Userspace
SONG, CHANGAN(Leo) APAC Technical Account Manager, Customer Experience & Engagement, Strategic Customer Engagement, Red Hat
1 RED HAT ENTERPRISE LINUX 7 CRIU
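Checkpoint/Restore In Userspace
• Saves the current state of a process
• Restores the process to a previous state (the state at the checkpoint)
• All information about a checkpointed process is stored in one or more image files (saved information: memory pages, file descriptors, inter-process communication, and so on)
• The process can be restored on the same system or on a different system
• Used for purposes such as container live migration
• The package is included in RHEL 7.3 (criu 2.3)
• Registered as a Technology Preview feature https://access.redhat.com/articles/2455211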
CRIU
How does it work?
[Diagram: criu converts between image files and kernel objects: process tree, files, sockets, pipes, namespaces]
CRIU
How does it work?
Kernel interfaces used by dump and restore:
• /proc/
• ptrace
• syscalls
CRIU
Dump
• Parasite code: receive file descriptors; dump memory content; prctl(), sigaction, pending signals, timers, etc.
• Ptrace: freeze processes; inject the parasite code
• Netlink: get information about sockets and netns
• Procfs: /proc/PID/maps, /proc/PID/map_files/, /proc/PID/status, /proc/PID/mountinfo
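To make the procfs part of the dump concrete, here is a minimal Python sketch that reads the memory mappings of a process from /proc/PID/maps, one of the files listed above. It only illustrates the kind of information available there; it is not how criu itself is implemented, and the names `Mapping`, `parse_maps_line` and `read_maps` are made up for this example.

```python
from typing import List, NamedTuple


class Mapping(NamedTuple):
    start: int   # first address of the mapping
    end: int     # first address past the mapping
    perms: str   # e.g. "r-xp"
    path: str    # backing file, "[heap]", "[stack]", or "" for anonymous


def parse_maps_line(line: str) -> Mapping:
    # A maps line has the fields: range, perms, offset, dev, inode, [pathname].
    # maxsplit=5 keeps a pathname containing spaces in one piece.
    fields = line.split(maxsplit=5)
    start, end = (int(x, 16) for x in fields[0].split("-"))
    path = fields[5].strip() if len(fields) > 5 else ""
    return Mapping(start, end, fields[1], path)


def read_maps(pid: str = "self") -> List[Mapping]:
    # Reading another process's maps file may require root privileges.
    with open(f"/proc/{pid}/maps") as f:
        return [parse_maps_line(line) for line in f]


if __name__ == "__main__":
    for m in read_maps()[:5]:
        print(f"{m.start:#x}-{m.end:#x} {m.perms} {m.path}")
```

The file only describes where the mappings are. The page contents themselves are dumped by the parasite code from inside the target process, as the slide notes.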
CRIU
Restore
1) Collect shared objects
2) Namespaces: restore namespaces
3) Processes: create a process tree; restore SID and PGID; restore objects that should be inherited
4) Files, sockets, pipes, ...
5) Restore per-task properties
6) Restore memory
7) Call sigreturn
Awesome
CRIU
Interesting moments
How to restore shared objects?
• Send file descriptors via unix sockets
• Map files from /proc/self/map_files/ for restoring anon shared mappings
How to restore memory mappings on the correct places?
• Map a new code block and a stack
• Unmap crtools' mappings
• Remap task's mappings on the correct places
How to resume a process?
• Create a signal frame
• Call sigreturn()
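The "send file descriptors via unix sockets" step relies on a standard kernel feature (SCM_RIGHTS ancillary data). The Python sketch below shows the mechanism in its simplest form, assuming Python 3.9 or later on a Unix system. Both ends of the socket pair live in one process here for brevity, whereas during a restore the two ends would belong to different processes. The function name `pass_fd_demo` is made up for this example.

```python
import os
import socket


def pass_fd_demo() -> bytes:
    """Pass the read end of a pipe over a unix socket pair and read through the copy."""
    left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    read_end, write_end = os.pipe()
    try:
        os.write(write_end, b"hello")
        # At least one byte of ordinary data must accompany the descriptors.
        socket.send_fds(left, [b"x"], [read_end])
        _data, fds, _flags, _addr = socket.recv_fds(right, 1, 1)
        received = fds[0]  # a new descriptor number referring to the same pipe
        try:
            return os.read(received, 5)
        finally:
            os.close(received)
    finally:
        os.close(read_end)
        os.close(write_end)
        left.close()
        right.close()


if __name__ == "__main__":
    print(pass_fd_demo())
```

The receiver gets a different descriptor number that refers to the same open file, which is what allows one restored task to share a file, socket or pipe with another.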
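CRIU
Birth of CR
• Developed for HPC environments
• Suited to environments where a single application runs distributed across hundreds or thousands of cores
• In particular, when such an application fails, all the CPU time already consumed is wasted and data is lost; CRIU addresses this weakness
• Compatibility with the application needs to be reviewed
• Drew little attention at first, then came into the spotlight with container migration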
CRIU
Limitations
• Checkpoint/restore operation is possible using inter-process communication (IPC).
• Checkpoint/restore always applies to a parent process together with all of its child processes.
• The PID must always stay the same; if the PID is already in use on the system, the process restore step with CRIU fails.
https://criu.org/What_cannot_be_checkpointed
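Because a restore fails when the original PID is already taken, it can be useful to check the target system before attempting one. The following Python sketch shows one common way to test whether a PID is in use, using signal 0. The helper name `pid_in_use` is made up for this example, and the check is only a pre-flight hint: the PID can still be taken between the check and the restore.

```python
import os


def pid_in_use(pid: int) -> bool:
    """Return True if a process with this PID currently exists."""
    if pid <= 0:
        # Zero and negative values address process groups, not a single process.
        raise ValueError("pid must be a positive integer")
    try:
        os.kill(pid, 0)  # signal 0 performs the existence check without sending a signal
    except ProcessLookupError:
        return False     # no such process
    except PermissionError:
        return True      # the process exists but belongs to another user
    return True


if __name__ == "__main__":
    print(pid_in_use(os.getpid()))
```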
CRIU
Live migration
[Diagram: Host A and Host B with a shared FS; memory is pre-migrated with the memory tracker]
http://criu.org/P.Haul
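Pre-migrating memory with the memory tracker can be pictured as a series of `criu pre-dump` runs followed by one final `criu dump`, each pointing at the previous image directory so that only changed pages are written. The Python sketch below only builds the command lines and does not run them. The directory layout (`iter0`, `iter1`, ...) and the function name `iterative_dump_commands` are assumptions for illustration, and the exact option set should be checked against the criu version in use.

```python
import os
from typing import List


def iterative_dump_commands(pid: int, base_dir: str, pre_dumps: int) -> List[List[str]]:
    """Build `pre_dumps` pre-dump commands followed by one final dump command."""
    commands = []
    previous = None
    for i in range(pre_dumps + 1):
        images = os.path.join(base_dir, f"iter{i}")  # criu expects this directory to exist
        final = i == pre_dumps
        cmd = ["criu", "dump" if final else "pre-dump",
               "--tree", str(pid), "--images-dir", images]
        if previous is not None:
            # The previous images are given relative to the current images directory.
            cmd += ["--prev-images-dir", os.path.relpath(previous, images)]
        if final:
            cmd.append("--track-mem")
        commands.append(cmd)
        previous = images
    return commands


if __name__ == "__main__":
    for cmd in iterative_dump_commands(1234, "/tmp/ckpt", 2):
        print(" ".join(cmd))
```

Between iterations the images would be copied to the destination host, or written to the shared FS from the diagram. The project at the URL above automates this kind of loop.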
CRIU
Load balancing on cluster
[Diagram: Host A, Host B, Host C]
CRIU
Power saving on cluster
[Diagram: Host A, Host B, Host C]
CRIU
Node maintenance
[Diagram: Host A, Host B]
CRIU
Kernel upgrade w/o reboot
[Diagram: one host switches from Kernel A to Kernel B via kexec]
CRIU
Slow services startup
# service foo start
[Chart: service readiness over time. Stages: spawn process, load config, top-up caches, initialize resource pools; Ready (100%) is reached at time T]
CRIU
Slow services startup
# service foo restore
[Chart: service readiness over time. After spawn process, Ready (100%) is reached at time t < T]
CRIU
Periodic snapshots
Memory tracker helps to keep images smaller
[Diagram: snapshots taken along a time axis]
CRIU
HPC
[Diagram: job progress over time: 0%, 20%, 40%, 60%; after a power failure the job continues from 60%]
CRIU
Advanced debugging
[Diagram: an application in trouble on the production host is moved to a developer host and examined with a debugger]
CRIU
Advanced testing
...
New test or new hardware?
CRIU
Installing the criu package on RHEL 7
# yum install criu -y
...
Dependencies Resolved
======
 Package      Arch     Version       Repository           Size
======
Installing:
 criu         x86_64   2.3-2.el7     rhel-7-server-rpms   349 k
Installing for dependencies:
 protobuf-c   x86_64   1.0.2-3.el7   rhel-7-server-rpms   28 k
…
# ldd `which criu`
        linux-vdso.so.1 => (0x00007ffed554d000)
        librt.so.1 => /lib64/librt.so.1 (0x00007f5fd0faf000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f5fd0d93000)
        libprotobuf-c.so.1 => /lib64/libprotobuf-c.so.1 (0x00007f5fd0b89000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f5fd0985000)
        libnl-3.so.200 => /lib64/libnl-3.so.200 (0x00007f5fd0764000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f5fd03a2000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f5fd11bd000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f5fd00a0000)
CRIU
How to Use
1) criu on the command line
   # criu --help
2) criu in runc - store a container checkpoint
   # runc checkpoint
http://rhelblog.redhat.com/2016/12/08/container-live-migration-using-runc-and-criu/
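The two usage paths above can be sketched as command lines for a basic dump and restore. The Python helpers below only assemble the argument lists, so they can be inspected or passed to `subprocess.run` on a host where criu or runc is installed. The helper names, the PID, the container ID and the directory paths are placeholders invented for this example. `--shell-job` is included on the assumption that the target process was started from an interactive shell.

```python
from typing import List


def criu_dump_cmd(pid: int, images_dir: str) -> List[str]:
    # Checkpoint the process tree rooted at `pid` into `images_dir`.
    return ["criu", "dump", "--tree", str(pid), "--images-dir", images_dir, "--shell-job"]


def criu_restore_cmd(images_dir: str) -> List[str]:
    # Restore the process tree from the images written by the dump.
    return ["criu", "restore", "--images-dir", images_dir, "--shell-job"]


def runc_checkpoint_cmd(container_id: str, image_path: str) -> List[str]:
    # runc drives criu internally; --image-path says where the images go.
    return ["runc", "checkpoint", "--image-path", image_path, container_id]


def runc_restore_cmd(container_id: str, image_path: str) -> List[str]:
    # Run from the container bundle directory, as with `runc run`.
    return ["runc", "restore", "--image-path", image_path, container_id]


if __name__ == "__main__":
    for cmd in (criu_dump_cmd(1234, "/tmp/ckpt"),
                criu_restore_cmd("/tmp/ckpt"),
                runc_checkpoint_cmd("web01", "/tmp/ckpt-web01"),
                runc_restore_cmd("web01", "/tmp/ckpt-web01")):
        print(" ".join(cmd))
```

Both criu and runc need root privileges for these operations, and the images directory has to exist before a criu dump.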
CRIU
Demo in runc
CRIU
Runc
criu can now be used for the following applications running in a Red Hat Enterprise Linux 7 runc container:
vsftpd, apache httpd, sendmail, postgresql, mongodb, mariadb, mysql, tomcat, dnsmasq
THANK YOU