How to Virtualize with KVM

Christian Bornträger

© 2018,2019 IBM Corporation

Trademarks

IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both.

If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries.

A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml

The following are trademarks or registered trademarks of other companies.

Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.

Other company, product, or service names may be trademarks or service marks of others.

Agenda

● Overview

● Components and management infrastructure

● Devices

● Performance measurement

● Problem determination

● Usability

What is KVM

● KVM is the abbreviation for Kernel-based Virtual Machine

● It is part of the Linux ecosystem

● KVM is used as a building block in the industry

– KVM components are used to build many public clouds
– The KVM Forum conference had many participants from Tencent, Alibaba, Yandex, AWS, Google and many more
– IBM Hyper Protect Services also use KVM components
– KVM is used to isolate containers (e.g. Kata Containers)
– KVM is used as a normal hypervisor instead of VMware, z/VM, and others

How to get KVM

● Red Hat

– RHEL 7.6-alt: Kernel 4.14, QEMU 2.12, Libvirt 4.5
– RHEL 8: Kernel 4.18, QEMU 2.12, Libvirt 4.5

● SLES

– SLES 12 SP4: Kernel 4.12, QEMU 2.11, Libvirt 4.0
– SLES 15 SP1: Kernel 4.12, QEMU 3.1, Libvirt 5.0

● Ubuntu

– 16.04 LTS: Kernel 4.4, QEMU 2.5, Libvirt 1.3.1
– 18.04 LTS: Kernel 4.15, QEMU 2.11, Libvirt 4.0

How to get KVM

● Red Hat

– yum install qemu-kvm libvirt virt-install
– Do a modprobe kvm (once)

● SLES

– zypper install qemu-kvm libvirt virt-manager virt-install

● Ubuntu

– apt install qemu-kvm libvirt-daemon libvirt-clients virt-manager

Components

● Base

– Linux base system
– Linux kernel: kvm module
– QEMU: device emulation, KVM exploitation, ...
– Libvirt: base management layer

● Management

– OpenStack: big management infrastructure
– virt-manager: small management tool
– virsh: libvirt-provided command line tool
– KubeVirt
– virt-install

● Others

– OpenVSwitch
– …

(Diagram: several QEMU processes and the management stack in user space, on top of the Linux kernel with KVM, which uses the SIE instruction of the hardware)

Being a Linux process

● Because KVM guests are normal Linux processes, KVM inherits a large number of features from Linux

– SMT
– Paging
– CPU scheduling
– ECKD, FCP, NVMe disks
– OSA, RoCE, HiperSockets
– Multiple subchannel sets
– Parallel Access Volumes (PAVs)
– Other storage protocols are also available (e.g. iSCSI, NFS, GFS2, OCFS2, GPFS)
– Encryption of guest disks via dm-crypt
– …

Being related to KVM

● As KVM on Z shares a lot of code with the x86 variant, KVM on Z inherits a large number of features from it

– Live guest migration via TCP/IP
– Installs can be automated via scripts (AutoYaST, Kickstart, Preseed)
– Support for both character and graphical consoles
– Network IPL
– Can emulate modern hardware such as DVDs via ISO files
– Supports live resize of CPUs, networking, and storage devices in both the KVM host and guest virtual servers
– Can take snapshots of running virtual machines (not just the disk storage)
– With copy-on-write storage, snapshots and cloning of virtual servers are near instantaneous
– Support for nested virtualization (KVM under KVM)
– ...

Being on Z provides unique features

● When running on Z, the mature hardware support for virtualization makes it easy to provide the following

– CPACF crypto hardware exploitable by host and guests
– Guarded Storage for pause-less Java garbage collection
– Crypto Express cards exploitable by host and guests
– Support for large pages in guest and host

Management by libvirt

Domain XML

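(Example domain XML; recoverable values: guest name vs1, memory 524288 KiB (512 MB), 2 vCPUs, OS type hvm, network interface eth0 on bond0, disk vda, CCW device numbers 0.0.1000, 0.0.2000 and 0.0.3000)

Don't worry! Most things can be handled by tools!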
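To make the domain XML concrete, here is a minimal sketch that builds a comparable s390x domain definition with Python's standard `xml.etree.ElementTree`. The disk path, the CCW device number 0x1000 and the macvtap attachment to bond0 are illustrative assumptions, not the exact XML from the slide; in practice virt-install generates this XML for you.

```python
import xml.etree.ElementTree as ET


def build_domain_xml(name, memory_kib, vcpus, disk_path, host_iface, disk_devno="0x1000"):
    """Return a minimal KVM-on-Z domain XML string (illustrative sketch)."""
    dom = ET.Element("domain", type="kvm")
    ET.SubElement(dom, "name").text = name
    ET.SubElement(dom, "memory", unit="KiB").text = str(memory_kib)
    ET.SubElement(dom, "vcpu").text = str(vcpus)

    os_el = ET.SubElement(dom, "os")
    ET.SubElement(os_el, "type", arch="s390x", machine="s390-ccw-virtio").text = "hvm"

    devices = ET.SubElement(dom, "devices")

    # virtio block device backed by a host block device (e.g. a DASD)
    disk = ET.SubElement(devices, "disk", type="block", device="disk")
    ET.SubElement(disk, "driver", name="qemu", type="raw", cache="none", io="native")
    ET.SubElement(disk, "source", dev=disk_path)
    ET.SubElement(disk, "target", dev="vda", bus="virtio")
    # virtio devices on s390x show up in the guest as CCW devices (0.0.xxxx)
    ET.SubElement(disk, "address", type="ccw", cssid="0xfe", ssid="0x0", devno=disk_devno)

    # virtio network interface attached to a host interface via macvtap
    iface = ET.SubElement(devices, "interface", type="direct")
    ET.SubElement(iface, "source", dev=host_iface, mode="bridge")
    ET.SubElement(iface, "model", type="virtio")

    # SCLP console on a host pseudo terminal
    console = ET.SubElement(devices, "console", type="pty")
    ET.SubElement(console, "target", type="sclp", port="0")

    return ET.tostring(dom, encoding="unicode")


print(build_domain_xml("vs1", 524288, 2, "/dev/disk/by-id/ccw-0XABCD", "bond0"))
```

Saved to a file, such XML could be loaded with `virsh define`, assuming a host on which that DASD path and bond0 exist.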

Disk storage

Disk Storage Options – Guest View

● KVM on Z provides no storage emulation

● Paravirtualized storage provided to the guest using virtio interfaces

– Virtual hard disks and CD/DVD drives

● Passthrough of host devices is in progress

Disk Storage Options – Host Backing

● Image files (raw, qcow2)

– Residing in the host filesystem
– Flexible and space efficient

● Full disks (ECKD, FCP, other SAN)

– Requires planning
– Best performance

● Network storage (NAS)

– Image files stored on NFS, CIFS, ...

Disk Storage (virtio)

(Diagram: host with guests vs1, vs2 and vs3, whose virtio disks are backed by the image files img00 and img01, a multipath device dm-0 on a SCSI LUN, and the ECKD volumes vola and volb (dasda, dasdb))

Networking

Networking Options

● KVM guest OS only sees virtio network interfaces

● Virtio interfaces are backed most commonly by

– Host interfaces directly, using macvtap
– Linux bridges
– OpenVSwitches

● Host interfaces can be

– OSA
– HiperSockets
– PCI (with limitations)

Think Switches

● Conceptually, all network attachments can be viewed as switches

● Differences in capabilities and characteristics exist, though

(Diagram, logical view: Virtual Server 1 (VNICs eth0, eth1) and Virtual Server 2 (VNIC eth0) connect to the switch ports p0, p1, p2, … of a virtual switch, on VLANs 42, 17 and 42; the switch uplink up0 connects to the bonding interface bond0 (for HA) over the host NICs eth0 and eth1)

Macvtap as a Switch

(Diagram, implementation view vs. logical view: the guest VNICs are attached to macvtap0, macvtap1 and macvtap2, which sit on the VLAN interfaces bond0.17 and bond0.42 of the bonding interface bond0 (for HA) over the host NICs eth0 and eth1; logically, the macvtap devices are the switch ports and the VLAN interfaces are the uplinks)

Macvtap: a More Common Setup

(Diagram, implementation view vs. logical view: the eth0 VNICs of Virtual Server 1 and Virtual Server 2 are attached to macvtap0@bond0 and macvtap1@bond0; bond0 (for HA) bonds the host NICs eth0 and eth1 and acts as the uplink of the virtual switch)

Macvtap Characteristics

● No extra setup required

● Fastest connectivity option

● Layer 2 Only

● Can use shared OSAs and HiperSockets

– But no sharing between migration hosts

● VLAN supported in access mode
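For illustration, a macvtap attachment corresponds to a libvirt `<interface type='direct'>` element. The hypothetical helper below builds one in bridge mode with the Python standard library; the VLAN variant assumes that a host VLAN device such as bond0.42 already exists, which is one way to realize access mode with macvtap.

```python
import xml.etree.ElementTree as ET


def macvtap_interface_xml(lower_dev, vlan_id=None, mode="bridge"):
    """Libvirt <interface type='direct'> (macvtap) element (hypothetical helper).

    With vlan_id, the guest is attached to the host VLAN device
    '<lower_dev>.<vlan_id>' (access mode: the guest sees untagged frames).
    That VLAN device must already exist on the host.
    """
    source_dev = lower_dev if vlan_id is None else f"{lower_dev}.{vlan_id}"
    iface = ET.Element("interface", type="direct")
    ET.SubElement(iface, "source", dev=source_dev, mode=mode)
    ET.SubElement(iface, "model", type="virtio")
    return ET.tostring(iface, encoding="unicode")


print(macvtap_interface_xml("bond0"))               # untagged, like the common setup
print(macvtap_interface_xml("bond0", vlan_id=42))   # guest in VLAN 42
```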

Linux Bridge as a Switch

(Diagram, implementation view vs. logical view: the guest VNICs are attached to the tap devices tap0, tap1 and tap2, which are ports of the Linux bridges br_vlan17 and br_vlan42; the bridges connect through the VLAN interfaces bond0.17 and bond0.42 to the bonding interface bond0 (for HA) over the host NICs eth0 and eth1)

Linux Bridge: a More Common Setup

(Diagram, implementation view vs. logical view: the eth0 VNICs of both virtual servers are attached to tap0 and tap1, which are ports of the Linux bridge virbr0; the bridge uplink goes to the bonding interface bond0 (for HA) over the host NICs eth0 and eth1)

Linux Bridge Characteristics

● Moderate setup requirements

● Layer 2 Only

● OSA must operate in bridgeport mode

– Not shareable between KVM hosts

● HiperSockets must enable VNICC learning and flooding

– Shareable between KVM hosts

● VLAN supported in access mode
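A minimal sketch of the one-bridge-per-VLAN layout from the Linux bridge slides, assuming iproute2 on the host. The hypothetical helper only generates the `ip` commands (it does not run them), so they can be reviewed before being applied as root; names follow the diagram (bond0, bond0.17, br_vlan17, ...).

```python
def vlan_bridge_commands(lower_dev="bond0", vlan_ids=(17, 42)):
    """Generate iproute2 commands for one Linux bridge per VLAN (not executed)."""
    cmds = []
    for vid in vlan_ids:
        vlan_dev = f"{lower_dev}.{vid}"   # e.g. bond0.42 tags/untags VLAN 42
        bridge = f"br_vlan{vid}"          # e.g. br_vlan42, one bridge per VLAN
        cmds += [
            f"ip link add link {lower_dev} name {vlan_dev} type vlan id {vid}",
            f"ip link add name {bridge} type bridge",
            f"ip link set {vlan_dev} master {bridge}",   # VLAN device is the uplink
            f"ip link set {vlan_dev} up",
            f"ip link set {bridge} up",
        ]
    return cmds


print("\n".join(vlan_bridge_commands()))
```

A guest could then be attached to one of the bridges, for example with `virt-install --network bridge=br_vlan42 ...`; the OSA and HiperSockets requirements listed above still apply.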

OpenVSwitch IS a Switch

(Diagram, implementation view vs. logical view: the guest VNICs are attached to the tap devices tap0, tap1 and tap2, which are ports of the OpenVSwitch bridge ovsbr0 on VLANs 42, 17 and 42; the bridge uplink up0 is the bonding interface bond0 (for HA) over the host NICs eth0 and eth1)

OpenVSwitch Characteristics

● Requires some understanding of switches in general and OVS specifically

● Most flexible

● Layer 2 Only

● OSA must operate in bridgeport mode

– Not shareable between KVM hosts

● HiperSockets must enable VNICC learning and flooding

– Shareable between KVM hosts

● VLAN supported in access and trunk mode
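Trunk mode is what sets OpenVSwitch apart here. As a sketch, assuming a host where `ovs-vsctl add-br ovsbr0` and `ovs-vsctl add-port ovsbr0 bond0` have been run, the libvirt interface XML differs only in its `<vlan>` element: a single tag gives an access port, while `trunk='yes'` with several tags passes tagged frames into the guest. The helper is hypothetical.

```python
import xml.etree.ElementTree as ET


def ovs_interface_xml(bridge, vlan_tags=(), trunk=False):
    """Libvirt <interface> on an OpenVSwitch bridge (hypothetical helper).

    One tag with trunk=False: access port, the guest sees untagged frames.
    trunk=True: the listed VLANs reach the guest tagged (trunk port).
    """
    iface = ET.Element("interface", type="bridge")
    ET.SubElement(iface, "source", bridge=bridge)
    ET.SubElement(iface, "virtualport", type="openvswitch")
    if vlan_tags:
        vlan = ET.SubElement(iface, "vlan")
        if trunk:
            vlan.set("trunk", "yes")
        for tag in vlan_tags:
            ET.SubElement(vlan, "tag", id=str(tag))
    ET.SubElement(iface, "model", type="virtio")
    return ET.tostring(iface, encoding="unicode")


print(ovs_interface_xml("ovsbr0", (42,)))                 # access port, VLAN 42
print(ovs_interface_xml("ovsbr0", (17, 42), trunk=True))  # trunk port
```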

Other devices

What else

● virtio-balloon: lightweight memory hotplug

● virtio-rng: share the host's hardware random numbers with the guest

● virtio-scsi: use a virtual SCSI HBA to talk to disks and other devices

● virtio-vsock: network-less communication between guest and host

● virtio-: file system passthrough

● virtio-gpu: provide a frame buffer device (useful for VNC to the guest)

● SCLP line-mode console

● ….
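Several of these devices are a single element of domain XML. The hypothetical helper below emits a `<devices>` fragment with a virtio balloon and a virtio RNG fed from the host's /dev/urandom; the RNG source is an assumption, and the fragment still has to be merged into a full domain definition.

```python
import xml.etree.ElementTree as ET


def virtio_extras_xml(rng_source="/dev/urandom"):
    """<devices> fragment with virtio-balloon and virtio-rng (hypothetical helper)."""
    devices = ET.Element("devices")
    # virtio-balloon: lets the host take memory from, or give it back to, the guest
    ET.SubElement(devices, "memballoon", model="virtio")
    # virtio-rng: feeds random data from a host source into the guest
    rng = ET.SubElement(devices, "rng", model="virtio")
    ET.SubElement(rng, "backend", model="random").text = rng_source
    return ET.tostring(devices, encoding="unicode")


print(virtio_extras_xml())
```

With the balloon device present, `virsh setmem` with `--live` can change the guest's memory size at run time.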

Performance Measurement and Auditing

Base Accounting Features

● Guest time accounting for tools like sar, top, etc.

● host # sar -u ALL 1 100

[...]
09:58:28 AM  CPU  %usr  %nice  %sys  %iowait  %steal  %irq  %soft  %guest  %gnice  %idle
09:58:29 AM  all  0.00  0.00   0.00  0.00     0.00    0.00  0.00   7.14    0.00    92.86
09:58:30 AM  all  0.00  0.00   0.07  0.00     0.00    0.00  0.00   7.14    0.00    92.79
09:58:31 AM  all  0.00  0.00   0.00  0.00     0.00    0.00  0.00   7.14    0.00    92.8
[...]

● Steal time accounting in guests

guest # sar -u 1 100

[...]
10:19:54  CPU  %user   %nice  %system  %iowait  %steal  %idle
10:19:55  all  100.00  0.00   0.00     0.00     0.00    0.00
10:19:56  all  100.00  0.00   0.00     0.00     0.00    0.00
10:19:57  all  100.00  0.00   0.00     0.00     0.00    0.00
10:19:58  all  99.01   0.00   0.00     0.00     0.99    0.00
10:19:59  all  100.00  0.00   0.00     0.00     0.00    0.00
[...]
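The %guest and %steal columns can also be evaluated programmatically. A minimal sketch, assuming the `sar -u` column layout shown above: it turns the output (12-hour or 24-hour timestamps) into one dictionary per data row, so that, for instance, the average steal time seen by a guest can be computed.

```python
from statistics import mean


def parse_sar_cpu(text):
    """Parse 'sar -u' output into a list of dicts, one per data row.

    Columns are matched from the right, so 12-hour ('09:58:29 AM') and
    24-hour ('10:19:55') timestamps both work; 'Average:' lines are skipped.
    """
    columns = None
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        if "CPU" in tokens:
            columns = tokens[tokens.index("CPU"):]   # header: CPU, %usr, ...
            continue
        if columns is None or len(tokens) < len(columns) or tokens[0].startswith("Average"):
            continue
        fields = tokens[-len(columns):]
        if fields[0] != "all" and not fields[0].isdigit():
            continue                                 # e.g. '[...]' or prompts
        row = {"CPU": fields[0]}
        row.update((name, float(value)) for name, value in zip(columns[1:], fields[1:]))
        rows.append(row)
    return rows


def average(rows, column):
    """Mean of one column, e.g. '%steal' in a guest or '%guest' in the host."""
    return mean(row[column] for row in rows)
```

Fed with the guest output above, `average(parse_sar_cpu(text), "%steal")` comes out at about 0.2%.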

kvm_stat

kvm statistics - summary

Event                            Total  %Total  CurAvg/s
halt_wakeup                     637549    26.0
exit_wait_state                 635803    25.9
deliver_ckc                     430489    17.6
exit_null                       253775    10.4
inject_io                       185154     7.6
deliver_io                      182803     7.5
exit_instruction                 70539     2.9
  instruction_diag_44            26898    38.1
  instruction_sigp_emergency     21592    30.6
  instruction_diag_500           18452    26.2
  instruction_essa                 860     1.2
  instruction_diag_9c              713     1.0
  instruction_io_other             447     0.6
  instruction_tsch                 258     0.4
  [...]
inject_emergency_signal          21592     0.9
deliver_emergency_signal         21524     0.9
halt_attempted_poll               5084     0.2
userspace_handled                 1900     0.1
exit_external_request             1578     0.1
[...]

● Many more exit counters added
● kvm_stat allows
– Grouping
– Filtering by guest
– Filtering by regex
● Default uses trace points
– Use option -d to get counters
● Logging mode
– Option -l
● Packages
– Ubuntu 18.04: linux-tools-host
– RHEL 7.5: qemu-kvm-tools-ma
– SLES12: kvm_stat
– From kernel tree
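kvm_stat -d reads the KVM counters from debugfs. As a rough sketch of that mode, assuming debugfs is mounted at /sys/kernel/debug and the script runs as root, the following reads the global counter files and ranks them, similar in spirit to the %Total column. The counters are cumulative, so a rate like CurAvg/s would need two samples taken some time apart.

```python
import os


def read_kvm_counters(debugfs_dir="/sys/kernel/debug/kvm"):
    """Read KVM's global event counters from debugfs (one integer per file).

    Subdirectories (per-VM statistics) are skipped. The real path needs root
    and a mounted debugfs.
    """
    counters = {}
    with os.scandir(debugfs_dir) as entries:
        for entry in entries:
            if not entry.is_file():
                continue
            try:
                with open(entry.path) as f:
                    counters[entry.name] = int(f.read().strip())
            except (OSError, ValueError):
                continue   # unreadable or non-numeric file
    return counters


def top_events(counters, n=10):
    """The n largest counters as (name, value, percent of all counters)."""
    total = sum(counters.values()) or 1
    ranked = sorted(counters.items(), key=lambda item: item[1], reverse=True)
    return [(name, value, round(100.0 * value / total, 1)) for name, value in ranked[:n]]
```

On a KVM host, `for name, value, pct in top_events(read_kvm_counters()): print(name, value, pct)` prints a simple ranking.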

Problem Determination

Kdump in Guest

● Self-dumping, like in z/VM or LPAR
– kdumpctl from kexec-tools
– systemctl enable kdump
● Several triggers
– Via dump on panic
– Via PSW restart (virsh inject-nmi)

Hypervisor Dump

● What if the guest is completely locked up or kdump was not configured?
● Save a guest dump in the host
– virsh dump --memory-only
● Developers can use the crash tool to debug
● Customers can use makedumpfile to compress and/or filter the dump (e.g. exclude all memory of user processes)
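The hypervisor dump steps can be scripted. A minimal sketch, with the guest and file names chosen only for illustration: it builds the `virsh dump --memory-only` and `makedumpfile` command lines and derives the makedumpfile dump level (`-d`) from the page classes to exclude. Excluding only user process data gives level 8; excluding all five classes gives 31. It assumes the dump contains the information makedumpfile needs for filtering.

```python
# makedumpfile -d bits (dump level): a page class whose bit is set is excluded
DUMP_LEVEL_BITS = {
    "zero": 1,           # pages filled with zeros
    "cache": 2,          # non-private page cache
    "cache_private": 4,  # private page cache
    "user": 8,           # user process data
    "free": 16,          # free pages
}


def dump_level(*exclude):
    """Combine the page classes to exclude into a makedumpfile dump level."""
    level = 0
    for name in exclude:
        level |= DUMP_LEVEL_BITS[name]
    return level


def hypervisor_dump_commands(domain, raw_dump, small_dump, exclude=("user",)):
    """Commands for a memory-only guest dump plus filtering/compression (-c)."""
    return [
        ["virsh", "dump", domain, raw_dump, "--memory-only"],
        ["makedumpfile", "-c", "-d", str(dump_level(*exclude)), raw_dump, small_dump],
    ]


for cmd in hypervisor_dump_commands("vs1", "/var/tmp/vs1.elf", "/var/tmp/vs1.kdump"):
    print(" ".join(cmd))
```

On the KVM host, each command list could be passed to `subprocess.run(cmd, check=True)`.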

Libvirt Log

● Permanent logging of libvirt, one log file per guest: /var/log/libvirt/qemu/xxxx.log
● Log rotation is in place

Usability

virt-install

● Normal installation path:

● Define a new guest domain with

– Disk
– Network connectivity
– Kernel/initrd
– CD/DVD or prepare network installation media

● Boot VM and install

● Switch boot device & reboot

virt-install

● virt-install takes care of all the previous steps
● No manual XML editing

root@host# virt-install --name vm-name \
    --disk size=10 \
    --memory=2000 \
    $install_approach

● Two install approaches:
– ISO image:
  --cdrom /var/lib/libvirt/images/bionic-server-s390x.iso
– Repository:
  --location http://ftp.debian.org/debian/dists/stretch/main/installer-s390x/

Distribution | ISO | Repository
Alpine Linux 3.8.0 | Yes | -
ClefOS 7.4 | No | http://download.sinenomine.net/clefos/7.4
Debian Stretch (9.4) | No | http://ftp.debian.org/debian/dists/stretch/main/installer-s390x
Fedora 29, 30 | No | https://download.fedoraproject.org/pub/fedora-secondary/releases/28/Server/s390x/os
openSUSE Factory | Yes | http://download.opensuse.org/ports/zsystems/factory/repo/oss
RHEL 7.5 and later | Yes | -
SLES 12 SP2 and later | Yes | -
Ubuntu Xenial (16.04), Bionic (18.04) | Yes | http://ports.ubuntu.com/ubuntu-ports/dists/xenial/main/installer-s390x, http://ports.ubuntu.com/ubuntu-ports/dists/bionic/main/installer-s390x
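The `$install_approach` placeholder can be made explicit with a small, hypothetical Python helper that assembles the virt-install command line for either the ISO (`--cdrom`) or the repository (`--location`) approach; the defaults simply mirror the disk and memory sizes used on this slide.

```python
import shlex


def virt_install_cmd(name, disk_gib=10, memory_mib=2000, iso=None, location=None):
    """Build a virt-install argv for an ISO or a repository install (sketch)."""
    if (iso is None) == (location is None):
        raise ValueError("pass exactly one of iso= or location=")
    cmd = ["virt-install", "--name", name,
           "--disk", f"size={disk_gib}",      # new image file, size in GiB
           f"--memory={memory_mib}"]          # guest memory in MiB
    if iso is not None:
        cmd += ["--cdrom", iso]               # install approach 1: ISO image
    else:
        cmd += ["--location", location]       # install approach 2: repository
    return cmd


print(shlex.join(virt_install_cmd(
    "vm-name",
    location="http://ftp.debian.org/debian/dists/stretch/main/installer-s390x/")))
```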

virt-install unattended install

● Optimize your installs by providing response files to installers:
– kickstart file (for Red Hat installs),
– autoyast (for SUSE), or
– preseed (Ubuntu, Debian, or any of their other derivatives).

f30.ks (Kickstart Response File):

text
install
ignoredisk --only-use=vda
clearpart --all
autopart
bootloader --location=mbr --driveorder=vda
rootpw super!secret:-O
timezone Europe/Athens
reboot
%packages
@Core
%end

root@host# virt-install --name fedora \
    --disk size=10 \
    --memory=2000 \
    --location https://download.fedoraproject.org/pub/fedora-secondary/releases/30/Server/s390x/os/ \
    --initrd-inject=f30.ks \
    --extra-args inst.ks=file:///f30.ks

virt-install special setups

● Use a DASD instead of an image file

root@host# virt-install --disk path=/dev/disk/by-id/ccw-0XABCD ...

● Apply suggested I/O parameters

root@host# virt-install --disk path=/dev/disk/by-id/ccw-0XABCD,cache=none,io=native ...

● Use macvtap

root@host# virt-install --network type=direct,source=enccw0.0.f500,source_mode=bridge ...

● Disable VNC and use a text console (if graphics are supported and the default)

root@host# virt-install --graphics=none
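These special setups combine freely. A hypothetical helper that collects the corresponding virt-install options, using the DASD path and OSA interface name from this slide as sample values (the guest name vs1 is made up, and an install source would still be needed):

```python
import shlex


def special_setup_args(dasd_path=None, macvtap_source=None, text_console=False):
    """Collect virt-install options for the special setups (hypothetical helper)."""
    args = []
    if dasd_path:
        # whole DASD instead of an image file, with the suggested I/O parameters
        args += ["--disk", f"path={dasd_path},cache=none,io=native"]
    if macvtap_source:
        # macvtap (type=direct) in bridge mode on the given host interface
        args += ["--network", f"type=direct,source={macvtap_source},source_mode=bridge"]
    if text_console:
        args.append("--graphics=none")  # no VNC, text console only
    return args


cmd = ["virt-install", "--name", "vs1", "--memory=2000"] + special_setup_args(
    "/dev/disk/by-id/ccw-0XABCD", "enccw0.0.f500", text_console=True)
print(shlex.join(cmd))
```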

http://kvmonz.blogspot.com/

https://www.ibm.com/support/knowledgecenter/en/linuxonibm/liaaf/lnz_r_kvm_base.html

Thank You!

Backup

Virtio

● Common data structure (virtio ring: buffers and queues) used for all device types

– Block, network, SCSI, console, RNG, ...

● Host can directly access the guest's buffers for data transfer

● Emulation only used for device detection, using native mechanisms (channel subsystem, PCI)

(Diagram: guest buffers and queues shared with QEMU for data transfer)
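To illustrate the idea of one common ring structure, here is a deliberately simplified Python model of a split virtqueue: the guest posts buffers through an available ring, the host works on those same buffers and hands them back through a used ring. It models only this ownership hand-off; the real virtio layout (descriptor chaining, ring indices in shared memory, notifications) is not represented.

```python
from collections import deque


class ToyVirtqueue:
    """Toy model of a virtio split virtqueue (not the real memory layout)."""

    def __init__(self, size):
        self.desc = [None] * size        # descriptor table (guest memory)
        self.free = deque(range(size))   # unused descriptor slots
        self.avail = deque()             # available ring: guest -> host
        self.used = deque()              # used ring: host -> guest

    # guest (driver) side
    def add_buf(self, data, device_writable=False):
        """Place a buffer in a free descriptor and publish it to the host."""
        if not self.free:
            raise RuntimeError("virtqueue full")
        idx = self.free.popleft()
        self.desc[idx] = (bytearray(data), device_writable)
        self.avail.append(idx)
        return idx

    def get_used(self):
        """Reclaim one completed buffer as (descriptor index, data), or None."""
        if not self.used:
            return None
        idx, length = self.used.popleft()
        buf, _ = self.desc[idx]
        self.desc[idx] = None
        self.free.append(idx)
        return idx, bytes(buf[:length])

    # host (device) side
    def process(self, handler):
        """Hand each available buffer to handler(buf, writable) -> length used."""
        while self.avail:
            idx = self.avail.popleft()
            buf, writable = self.desc[idx]
            # the handler gets the very bytearray the guest filled, standing in
            # for the host accessing guest memory directly (no copy)
            length = handler(buf, writable)
            self.used.append((idx, length))
```

For example, after `vq = ToyVirtqueue(4)`, `vq.add_buf(b"ping")` and `vq.process(lambda buf, writable: len(buf))`, the call `vq.get_used()` returns the buffer to the guest side.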

Virtual I/O Devices

● Emulated

– Guests can use the native device drivers
– Example: SCLP(*) and 3270 console types

● Para-virtualized

– Higher efficiency
– Guests need special device drivers
– Example: virtio devices

● Passthrough (in development)

– Potentially highest efficiency
– Usually inhibits live migration
– Guests can use the native device drivers
– Example: Crypto Express, CCW ECKD

HW Crypto

Overview – HW Crypto Support in IBM Z

(Diagram: CPC drawer, where each PU is capable of having the CPACF function; Crypto Express adapters in the PCIe I/O drawers; Trusted Key Entry (TKE) with smart cards)

Crypto Express

● KVM support was added in the latest (and upcoming) distributions

– SLES 15 SP1
– RHEL 8.0
– Ubuntu 18.04

● Only full pass-through of Adapter/Domain tuples

– Focus on secure key capabilities
– No sharing
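A small sketch of what passing through adapter/domain tuples means, assuming the Linux vfio-ap model (not described on this slide): a guest is assigned a set of adapters and a set of domains and receives every combination of the two as an APQN (adapter, domain). The check mirrors the "No sharing" rule by rejecting any APQN given to two guests. Guest names and numbers are made up.

```python
from itertools import product


def apqns(adapters, domains):
    """All (adapter, domain) tuples a guest gets: the cross product of both sets."""
    return set(product(adapters, domains))


def check_no_sharing(assignments):
    """Map each APQN to its guest; raise ValueError if one is assigned twice.

    assignments: {guest_name: (adapters, domains)}
    """
    owner = {}
    for guest, (adapters, domains) in assignments.items():
        for apqn in sorted(apqns(adapters, domains)):
            if apqn in owner:
                raise ValueError(f"APQN {apqn} assigned to {owner[apqn]} and {guest}")
            owner[apqn] = guest
    return owner


print(sorted(check_no_sharing({"vs1": ([3], [5, 6]), "vs2": ([4], [5])})))
```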

CPACF

● CPACF (CP assist for Cryptographic Functions)

– Several versions (MSA1 ... MSA8) depending on the HW level
– Encryption (AES, DES)
– Hashing (SHAx)
– True random number generation
– Clear key and protected key

● MSA1 to MSA8 (z14): clear key functions fully virtualized by KVM

● Protected key support is also fully virtualized, but key management requires a crypto adapter/domain in the guest

CPU Hotplug, Shares and Pinning

CPU Virtualization

● Guest CPUs are normal threads of the QEMU process
– Each thread is a scheduling entity handled by the Linux scheduler
– QEMU and the CPU threads are also handled by cgroups
● This allows
– Pinning to host CPUs
  ● VCPUs
  ● IO threads
  ● Other QEMU threads
– Providing relative shares between guests
  ● Shares are for the full guest and are not multiplied by the number of CPUs
– Bandwidth control

(Example domain XML fragment; recoverable values: 2 vCPUs, CPU shares 2048, and several period/quota pairs of 1000000 and -1)
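A sketch of the corresponding libvirt `<cputune>` settings, built with Python's standard library; the pinning layout and values are illustrative. Two helpers add the arithmetic: `quota / period` is the CPU time one vCPU may consume per period (a negative quota such as -1 means unlimited), and under full contention guests split CPU time in proportion to their shares, independent of how many vCPUs they have. That proportional split is an idealization of the scheduler's behavior.

```python
import xml.etree.ElementTree as ET


def cputune_xml(vcpu_pins, emulator_cpuset=None, shares=None, period=None, quota=None):
    """Build a libvirt <cputune> element.

    vcpu_pins maps a vCPU number to a host cpuset string such as "2" or "4-5".
    """
    tune = ET.Element("cputune")
    for vcpu, cpuset in sorted(vcpu_pins.items()):
        ET.SubElement(tune, "vcpupin", vcpu=str(vcpu), cpuset=cpuset)
    if emulator_cpuset:
        # other QEMU threads (not vCPUs, not I/O threads)
        ET.SubElement(tune, "emulatorpin", cpuset=emulator_cpuset)
    for tag, value in (("shares", shares), ("period", period), ("quota", quota)):
        if value is not None:
            ET.SubElement(tune, tag).text = str(value)
    return ET.tostring(tune, encoding="unicode")


def cpu_fraction(quota_us, period_us):
    """Host CPU time one vCPU may use per period; negative quota -> None (no limit)."""
    return None if quota_us < 0 else quota_us / period_us


def contended_share(shares):
    """Idealized split of contended CPU time by shares (vCPU count does not enter)."""
    total = sum(shares.values())
    return {guest: value / total for guest, value in shares.items()}


print(cputune_xml({0: "2", 1: "3"}, emulator_cpuset="4-5", shares=2048))
print(contended_share({"vs1": 2048, "vs2": 1024}))
```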

HA / DR

Elements of HA/DR

● Application-based replication (e.g. MongoDB)

● Network failover
– See the networking chapter
– Channel bonding for layer 2 networks
– Dynamic routing for layer 3 networks (e.g. to connect to z/OS)

● Disk failover
– Failover can be implemented transparently for the guest
– Image files inherit all host failover solutions
– Relies on typical Linux means for failover
– SCSI multipath
– ECKD with 2 or more paths

Inside the Guest: /proc/sysinfo

Provides the UUID, name and configuration of the machine, LPAR and guest

[root@zhyp137 ~]# cat /proc/sysinfo | tail -n 8
VM00 Control Program: KVM/Linux
VM00 Adjustment: 1000
VM00 CPUs Total: 4
VM00 CPUs Configured: 4
VM00 CPUs Standby: 0
VM00 CPUs Reserved: 0
VM00 Extended Name: zhyp137
VM00 UUID: 4c3ae636-529d-4d90-b203-c8d3d150f0d0

Inside the Guest: STHYI, qclib and ILMT

● Subcapacity pricing requires capacity information
● KVM now provides the STHYI instruction, similar to z/VM
● Flexible software pricing via direct usage, qclib and/or ILMT
● For more information on qclib, see http://www.ibm.com/developerworks/linux/linux390/qclib.html

[root@guest]# qc_test
[...]
===== Layer 3: KVM-hypervisor ======
qc_layer_type [n/a]: KVM-hypervisor
qc_layer_category [n/a]: HOST
qc_layer_type_num [n/a]: 6
qc_layer_category_num [n/a]: 2
qc_control_program_id [S ]: KVM/Linux
qc_adjustment [S ]: 1000
qc_num_cpu_total [ V]: 4
qc_num_cpu_dedicated [SHV]: 0
qc_num_cpu_shared [SHV]: 4
qc_num_cp_total [ HV]: 0
qc_num_cp_dedicated [ hV]: 0
qc_num_cp_shared [ hV]: 0
qc_num_ifl_total [SHV]: 4
qc_num_ifl_dedicated [ShV]: 0
qc_num_ifl_shared [ShV]: 4
======

===== Layer 4: KVM-guest ======
qc_layer_type [n/a]: KVM-guest
qc_layer_category [n/a]: GUEST
qc_layer_type_num [n/a]: 7
qc_layer_category_num [n/a]: 1
[...]

Perf Sampling

# perf kvm stat live

09:15:39.047705

Analyze events for all VMs, all VCPUs:

VM-EXIT                              Samples  Samples%   Time%  Min Time      Max Time  Avg time
Wait state                             99412    46.68%  99.60%    0.30us  4999968.11us  901.84us ( +- 10.33% )
DIAG (0x500) KVM virtio functions      66230    31.10%   0.22%    0.60us      167.39us    3.00us ( +- 0.25% )
SIGP emergency signal                  46627    21.89%   0.18%    0.55us      175.01us    3.44us ( +- 0.31% )
Partial-execution                        340     0.16%   0.00%    0.31us        5.66us    1.25us ( +- 5.38% )
External request                         186     0.09%   0.00%    0.22us        2.88us    0.34us ( +- 4.59% )
Host interruption                        132     0.06%   0.00%    0.21us        8.44us    0.91us ( +- 12.21% )
DIAG (0x9c) time slice end directed       37     0.02%   0.00%    0.69us        4.38us    1.33us ( +- 7.17% )

Total Samples:212964, Total events handled time:90013327.50us.
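As a consistency check on output like this, a minimal sketch that extracts the exit rows with a regular expression, assuming the column order shown above. Samples% is each row's share of the total sample count; for example, 99412 of 212964 samples is 46.68%.

```python
import re

# event name, samples, samples%, time% (the remaining columns are ignored)
ROW = re.compile(r"^(?P<event>.+?)\s+(?P<samples>\d+)\s+(?P<samples_pct>[\d.]+)%\s+(?P<time_pct>[\d.]+)%")


def parse_perf_kvm_stat(text):
    """Return (event, samples, samples_pct, time_pct) tuples from the exit table."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            rows.append((match["event"], int(match["samples"]),
                         float(match["samples_pct"]), float(match["time_pct"])))
    return rows


def check_sample_shares(rows, tolerance=0.01):
    """True if every printed Samples% matches samples / total within tolerance."""
    total = sum(samples for _, samples, _, _ in rows)
    return all(abs(100.0 * samples / total - pct) <= tolerance for _, samples, pct, _ in rows)
```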

virt-manager

● virt-manager can also install guests

– Uses virt-install under the covers

cockpit

● Web-based management console for Red Hat / Fedora

– cockpit-machines can manage KVM guests
– Uses virt-install under the covers

● How to enable:

– https://cockpit-project.org/running
– Also install cockpit-machines
