LXC(Linux Container)
Lightweight virtual system mechanism Gao feng [email protected]
1 Outline
Introduction Namespace System API Libvirt LXC Comparison Problems Future work
2 Introduction
Container: Operation System Level virtualization method for Linux
Guest1 Guest2 Container Management P1 P2 Tools P1 P2
Namespace API/ABI Namespace Set 1 Set 2 Kernel
3 Introduction
Why Container Better Performance
Kvm Container App Guest-OS Emulator-Lay App Host-OS Host-OS
Easy to set up Multi-Tenancy environment Namespace
Namespace isolates the resources of system, currently there are 6 kinds of namespaces in linux kernel. Mount namespace UTS namespace IPC namespace Net namespace Pid namespace User namespace
5 Mount Namespace
Each mount namespace has its own filesystem layout. /proc/
/ /dev/sda1 / /dev/sda1 / /dev/sda3 /home /dev/sda2 /home /dev/sda2 /boot /dev/sda4
P1 P2 P3
Mount Mount Namespace1 Namespace2
6 UTS Namespace
Every uts namespace has its own uts related information.
ostype: Linux ostype: Linux Unalterable osrelease: 3.8.6 osrelease: 3.8.6 version: … version: …
hostname: uts1 hostname: uts2 alterable domainname : uts1 domainname : uts2
UTS namespace1 UTS namespace2
7 IPC Namespace
IPC namespce isolates the interprocess communication resource(shared memory, semaphore, message queue)
P1 P2 P3 P4
IPC IPC namespace1 namespace2
8 Net Namespace
Net namespace isolates the networking related resources
Net devices: eth0 Net devices: eth1 IP address: 1.1.1.1/24 IP address: 2.2.2.2/24 Route Route Firewall rule Firewall rule Sockets Sockets Proc Proc sysfs sysfs … … Net Namespace1 Net Namespace2
9 PID Namespace
PID namespace isolates the Process ID, implemented as a hierarchy.
PID namespace1 (Parent) ls /proc (Level 0) 1 2 3 4 pid:1 pid:4
P1 pid:2 pid:3 P4 pid:1 pid:1 ls /proc ls /proc 1 P2 P3 1
PID Namespace2 (Child) PID Namespace3 (Child) (Level 1) (Level 1)
10 User Namespace
kuid/kgid: Original uid/gid, Global uid/gid: user id in user namespace, will be translated to kuid/kgid finally Only parent User NS has rights to set map
kuid: kuid: 2000-2004 1000-1009 uid_map uid_map 10 2000 5 0 1000 10 uid: uid: 10-14 0-9
User namespace1 User namespace2 11 User Namespace
Create and stat file in User namesapce
root uid_map: root #touch 0 1000 10 #stat /file /file User namespace File : “/file” Access: uid (0/root)
Disk /file (kuid:1000)
12 System API/ABI
Proc /proc/
System Call clone unshare setns
13 Proc
/proc/
If the proc file of two processes is the same, these two processes must be in the same namespace.
14 System Call
clone int clone(int (*fn)(void *), void *child_stack, int flags, void *arg, …);
6 new flags: CLONE_NEWIPC,CLONE_NEWNET, CLONE_NEWNS,CLONE_NEWPID, CLONE_NEWUTS,CLONE_NEWUSER
15 System Call
clone create process2 and IPC namespace2
Mount1 Mount1
clone(,, CLONE_NEWIPC,) IPC2 IPC1 P1 P2 (new created)
Others1 Others1
16 System Call
unshare int unshare(int flags);
Namespace extends the system call unshare too. User space can use unshare to create new namespace and the caller will run in this new created namespace.
17 System Call
unshare create net namespace2
Mount1 Mount1
unshare(CLONE_NEWNET) Net2 Net1 P1 P1 (new created)
Others1 Others1
18 System Call
setns int setns(int fd, int nstype);
setns is a new added system call for namespace. Process can use setns to set which namespace the process will belong to.
@fd: file descriptor of namespace(/proc/
19 System Call
setns Change the PID namespace of P2
P1 PID1 PID1 P1
setns(open(/proc/p1/ns/pid,) , 0) P2 P2 PID2 PID2
20 Libvirt LXC
Libvirt LXC: userspace container management tool, Implemented as one type of libvirt driver. Manage containers Create namespace Create private filesystem layout for container Create devices for container Resources controller by cgroup
21 Comparison
The feature that host share the same kernel with guest makes container different from other virtualization method
Container KVM
performance Great Normal OS support Linux Only No Limit Security Normal Great Completeness Low Great
22 Problems
/proc/meminfo, cpuinfo… Kernel space (relate to cgroup) User space (poor efficiency)
New namespace Audit (assign to user namespace?) Syslog (do we really need it?)
23 Problems
Bandwidth control TC Qdisc On host (How to handle setting nic to container?) On container (user can change it) Netfilter How to control Ingress bandwidth Disk quota Uid/Gid Quota (Many users ) Project Quota (xfs only)
24 Future Work
Improve Libvirt LXC Unchanged systemd in Libvirt LXC Use interface of systemd to set cgroup Libvirt LXC based Docker Audit namespace
25 Thank you! Q&A
26