
“All problems in computer science can be solved by another level of indirection, except of course for the problem of too many indirections.” -- David Wheeler


What’s Kata Containers

Kata Containers is a Virtualized Container

Containerization vs. Virtualization

• Containerization: app centric; OS feature; package; runtime
• Virtualization: system view; OS supervisor; isolation; hardware aware*

View Point
• Containerization: from application to OS; the environment of an app
• Virtualization: from hardware to OS; intercept the instructions of the guest system

Design Goal
• Containerization: kernel ABI, including syscall interfaces, procfs/sysfs, etc.
• Virtualization: a machine like the host, which could run a kernel on top of it

Implementation
• Containerization: inside the kernel, a feature of the scheduler
• Virtualization: a supervisor of the kernel, a scheduler of the schedulers

VM is a type of container indeed

[Figure: processes in a Namespace on the Host Kernel, beside processes in a Virtual Machine with a Guest Kernel on the Host Kernel]

Industry Compatible

Enhanced Networking

Direct Device Support

Run Custom Kernel

More Secure


[Figure labels: Docker, Docker Image, runC, Image]

Intel® Clear Containers*

Timeline: May 2015; Dec 2017 (announced project); May 2018 (release 1.0)

*Other names and brands may be claimed as the property of others.

• Multi Architecture
• Direct Device Assignment
• Multi Hypervisor
• SRIOV
• Full Hotplug
• NVDIMM
• K8s Multi Tenancy
• Multi-OS
• VM templating
• KSM throttling
• Frakti native support
• CRI-O native support
• Traffic Controller net
• MacVTap, multi-queue net

Kata Containers inherits the features from both runV and Clear Containers


Video: Try CVE-2017-5123 in runC


Video: Try CVE-2017-5123 in Kata

KubeCon Sessions have "security" or "secure" in title

[Bar chart: sessions per event, y-axis 0 to 18. Events: KubeCon EU 16, KubeCon NA 16, KubeCon EU 17, KubeCon NA 17, KubeCon EU 18. Bar labels as extracted: 17, 11, 5, 4, 1]

Comparison of runC, runV, gVisor, and AliUK

• Self-contained: runC NA; runV Completed; gVisor Close to Completed; AliUK Completed. Note: some of the syscalls in gVisor depend on the host kernel.
• Performance: runC Very High; runV High; gVisor Low; AliUK Very High.
• Overhead: runC Very Low; runV Low; gVisor Low; AliUK Low.
• Compatibility: runC NA; runV Very Strong; gVisor Strong; AliUK Weak. Note: part of the ABI is not supported in gVisor.
• Security: runC Weak; runV Very Strong; gVisor Strong; AliUK Very Strong. Note: the attack surface of gVisor is a bit bigger than the others.

[Charts: Memcached net_rate (bigger is better); mysql oltp.lua latency (smaller is better)]

Source of the benchmark in this page: Alibaba (in Chinese)

gVisor Host

Data source: CloudNetEngine

[Figure: Kata Containers vs. gVisor. Kata Containers: Apps, Guest Kernel, Device Model, Vhost, Host Kernel (kvm, vhost). gVisor: App, user-space Guest Kernel (sentry), gofer, Host Kernel (kvm, ptrace). Criteria compared: isolation, performance, compatibility, overhead control, flexibility, device passthrough]


The Kubelet should not vendor a runtime implementation. (Unfortunately, these are not the original words verbatim.)


Picture from wikipedia

[Figure: CRI. kubelet (pod SyncLoop, SyncPod, GenericRuntime) talks CRI gRPC to a CRI shim (cri-o, frakti, etc.). CRI Spec: Sandbox (Create, Delete, List); Container (Create, Start, Exec); Image (Pull, List)]
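The CRI verbs named above (Sandbox: Create, Delete, List; Container: Create, Start, Exec; Image: Pull, List) can be pictured as one interface that the kubelet calls and a CRI shim implements. What follows is a minimal in-memory sketch in Python, not the real CRI gRPC API: the names `FakeShim` and `sync_pod`, every method name, and the ordering inside `sync_pod` are hypothetical and chosen only to mirror the verbs on the slide.

```python
from itertools import count
from typing import Dict, List


class FakeShim:
    """In-memory stand-in for a CRI shim (hypothetical, not the real gRPC API)."""

    def __init__(self) -> None:
        self._ids = count(1)
        self.images: List[str] = []
        self.sandboxes: Dict[str, str] = {}    # sandbox id -> pod name
        self.containers: Dict[str, dict] = {}  # container id -> state

    # Image verbs: Pull, List
    def pull_image(self, image: str) -> None:
        if image not in self.images:
            self.images.append(image)

    def list_images(self) -> List[str]:
        return list(self.images)

    # Sandbox verbs: Create, Delete, List
    def create_sandbox(self, pod_name: str) -> str:
        sandbox_id = f"sb-{next(self._ids)}"
        self.sandboxes[sandbox_id] = pod_name
        return sandbox_id

    def delete_sandbox(self, sandbox_id: str) -> None:
        del self.sandboxes[sandbox_id]
        # Assumption of this sketch: containers are removed with their sandbox.
        self.containers = {cid: c for cid, c in self.containers.items()
                           if c["sandbox"] != sandbox_id}

    def list_sandboxes(self) -> List[str]:
        return list(self.sandboxes)

    # Container verbs: Create, Start (Exec is omitted for brevity)
    def create_container(self, sandbox_id: str, image: str) -> str:
        if sandbox_id not in self.sandboxes:
            raise KeyError(f"unknown sandbox {sandbox_id}")
        if image not in self.images:
            raise ValueError(f"image {image} not pulled")
        container_id = f"ctr-{next(self._ids)}"
        self.containers[container_id] = {
            "sandbox": sandbox_id, "image": image, "running": False}
        return container_id

    def start_container(self, container_id: str) -> None:
        self.containers[container_id]["running"] = True


def sync_pod(shim: FakeShim, pod_name: str, images: List[str]) -> str:
    """Simplified, assumed ordering: pull images, create sandbox, create and start containers."""
    for image in images:
        shim.pull_image(image)
    sandbox_id = shim.create_sandbox(pod_name)
    for image in images:
        shim.start_container(shim.create_container(sandbox_id, image))
    return sandbox_id
```

Calling `sync_pod(FakeShim(), "web", ["nginx", "redis"])` creates one sandbox holding two running containers. The point of the sketch is only that the kubelet side needs nothing beyond these verbs, which is why the runtime behind the shim can be swapped.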

[Figure: containerd architecture. Clients: Kubelet over CRI/gRPC to the CRI service (cri plugin), which handles the Pod Sandbox; other containerd clients (ctr, standalone cri-containerd) over the containerd gRPC interface. Services: Meta services (images, containers…), Runtime service (tasks), Storage service (snapshot). Each Container sits behind a shim and a runtime (e.g. runc)]

[Figure: containerd + Kata. Components: Runtime (Kata cmdline; runc cmdline and OCI spec), Pod Sandbox, Image/rootfs storage, shim, proxy, Kata Agent, Container. Annotations: filesystem sharing; virt-serial or vsock; create container with translated OCI spec; no proxy when vsock enabled]

[Figure: frakti 2. Components: CRI/gRPC, CRI service, Kata Lib, streams, Pod Sandbox, Image/rootfs storage, Kata Agent, Container, unikernel. Annotations: virt-serial or vsock; create container with translated OCI spec; filesystem, block, network storages]
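The diagram's note that no proxy is needed when vsock is enabled can be read as a simple rule for which components sit between the shim and the container. Below is a minimal Python sketch of that reading. The function name `io_path` and the component strings are hypothetical labels taken from the slide, not Kata APIs, and the exact chain is an assumption.

```python
from typing import List


def io_path(transport: str) -> List[str]:
    """Return the assumed component chain from shim to container.

    transport: "virt-serial" or "vsock", the two channels named on the slide.
    """
    if transport not in ("virt-serial", "vsock"):
        raise ValueError(f"unsupported transport: {transport}")
    path = ["shim"]
    if transport == "virt-serial":
        # Over the serial channel, the proxy relays CMD and I/O to the agent.
        path.append("proxy")
    # With vsock enabled, the slide states that no proxy is used.
    path += ["kata-agent", "container"]
    return path
```

For example, `io_path("vsock")` yields a chain without `"proxy"`, while `io_path("virt-serial")` includes it.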

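[Figure: software stack under K8S. Host side: docker engine, docker image, containerd, runc, Kata-runtime, Qemu, Linux Kernel (namespace, cgroup, KVM, KSM), Hardware. Guest side: Virtual Machine containing a Pod with containers, Kata Agent, and a Linux Kernel, with storage over 9pfs / virtio-blk]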

[Figure: k8s deployments. Labels: k8s, Kata, Container, VM, IaaS, Kata Container]

1. Github organization: https://github.com/kata-containers/
2. Runtime: OCI spec compatible and runc cmdline compatible runtime; also provides virtcontainers libs for embedded use by Frakti
3. Agent: runs inside the VM as init or as a systemd service; a wrapper of libcontainer
4. Shim: one per container process, used for I/O transfer
5. Proxy: proxies the CMD and I/O to kata-agent via serial
6. Linux: 4.14, the minimum workable Guest Kernel
7. Qemu: 2.11, an optimized hypervisor with better performance and a smaller memory footprint
8. Others: osbuilder, ci, tests, packaging, community, documentation, ksm-throttler
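Items 6 and 7 name versions: Linux 4.14 as the minimum workable guest kernel, and Qemu 2.11. Below is a hedged Python sketch of checking a version string against such a floor. The helper names are hypothetical and not part of any Kata tool, and treating Qemu 2.11 as a minimum is an assumption, since the list states a minimum only for the kernel.

```python
import re
from typing import Tuple

MIN_GUEST_KERNEL = (4, 14)  # "minimum workable Guest Kernel" from the list
MIN_QEMU = (2, 11)          # assumption: treated as a floor for illustration


def parse_version(text: str) -> Tuple[int, int]:
    """Extract the leading major.minor pair from a version string."""
    match = re.match(r"\s*(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no major.minor version in {text!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(version: str, minimum: Tuple[int, int]) -> bool:
    """Compare numerically, so that 4.9 sorts below 4.14."""
    return parse_version(version) >= minimum
```

The numeric tuple comparison matters here: compared as strings, "4.9" would sort after "4.14" and a too-old kernel would pass the check.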
