
Kunpeng BoostKit for Virtualization

Technical White Paper

Issue 06 Date 2021-03-23

HUAWEI TECHNOLOGIES CO., LTD.

Copyright © Huawei Technologies Co., Ltd. 2021. All rights reserved. No part of this document may be reproduced or transmitted in any form or by any means without prior written consent of Huawei Technologies Co., Ltd.

Trademarks and Permissions

and other Huawei trademarks are trademarks of Huawei Technologies Co., Ltd. All other trademarks and trade names mentioned in this document are the property of their respective holders.

Notice

The purchased products, services and features are stipulated by the contract made between Huawei and the customer. All or part of the products, services and features described in this document may not be within the purchase scope or the usage scope. Unless otherwise specified in the contract, all statements, information, and recommendations in this document are provided "AS IS" without warranties, guarantees or representations of any kind, either express or implied.

The information in this document is subject to change without notice. Every effort has been made in the preparation of this document to ensure accuracy of the contents, but all statements, information, and recommendations in this document do not constitute a warranty of any kind, express or implied.

Issue 06 (2021-03-23) Copyright © Huawei Technologies Co., Ltd. i Kunpeng BoostKit for Virtualization Technical White Paper Contents

Contents

1 Solution Overview ...... 1
2 Solution Architecture ...... 2
3 Virtualization Architecture ...... 4
3.1 Open-Source KVM Virtualization Solution ...... 4
3.1.1 Architecture ...... 4
3.1.2 Typical Configuration ...... 6
3.1.3 Component Principles ...... 6
3.2 Open-Source oVirt and KVM Solution ...... 8
3.2.1 Architecture ...... 8
3.2.2 Typical Configuration ...... 10
3.2.3 Component Principles ...... 12
3.3 Open-Source OpenStack and KVM Solution ...... 14
3.3.1 Architecture ...... 14
3.3.2 Typical Configuration ...... 16
3.3.3 Component Principles ...... 19
3.4 Open-Source Kubernetes and Docker Container Solution ...... 22
3.4.1 Architecture ...... 22
3.4.2 Typical Configuration ...... 24
3.4.3 Component Principles ...... 26
3.5 Open-Source Open vSwitch and Huawei-developed XPF Acceleration Solution ...... 28
3.5.1 Architecture ...... 28
3.5.2 Typical Configuration ...... 30
3.5.3 Component Principles ...... 31
3.6 Open-Source Open vSwitch SR-IOV Hardware Offload and Acceleration Solution ...... 33
3.6.1 Architecture ...... 33
3.6.2 Typical Configuration ...... 35
3.6.3 Component Principles ...... 35
4 Advantages ...... 36
5 Solution Networking ...... 38
6 Feature List ...... 42
7 Compatibility ...... 45


8 Process ...... 46
9 Reference ...... 47
A Change History ...... 48


1 Solution Overview

Nowadays, transforming into a digital enterprise by using digital technologies is a challenge for enterprises in all industries around the world. Application modernization is the core of digital transformation. It helps enterprises attract customers, enable employees, optimize operations, and improve products. As the IT infrastructure of digital transformation, cloud computing technologies have developed rapidly in recent years. Enterprises benefit greatly from the development of cloud computing technologies such as virtualization, cloud services, and containerization in their digital transformation. The continuous innovation of cloud computing relies largely on the rapid development of open-source technologies and ecosystems. Open-source cloud computing technologies such as QEMU-KVM, OpenStack, Docker, and Kubernetes have broken the siloed computing architecture that was once closed and inefficient, continuously enriching the IT infrastructure, enabling user applications to become more agile and efficient, and accelerating digital transformation.

The Kunpeng BoostKit for Virtualization provides the following features to accelerate the implementation of cloud computing:
● Unbinding from x86 servers to provide more computing platform options and reduce service continuity risks
● Multi-core processor architecture that features higher density, lower power consumption, higher cloud infrastructure computing capability, and lower total cost of ownership (TCO)
● Replacement of the cloud computing infrastructure platform without affecting user experience
● Hybrid deployment of x86 and Kunpeng for better flexibility and scalability

This document describes the architecture, application scope, software and hardware, and typical configuration of the Kunpeng virtualization solution.


2 Solution Architecture

The Kunpeng BoostKit for Virtualization consists of the hardware infrastructure, OS, cloud platform, and cloud management cluster platform. The cloud platform supports the HUAWEI CLOUD Stack (HCS) private cloud platform and the open-source QEMU-KVM and Docker container platforms. The cloud management cluster platform supports the open-source OpenStack, oVirt, and Kubernetes platforms.

Figure 2-1 shows the overall architecture of the Kunpeng BoostKit for Virtualization.

Figure 2-1 Overall architecture of the Kunpeng BoostKit for Virtualization

Table 2-1 Components of the Kunpeng BoostKit for Virtualization

Infrastructure: TaiShan 200 servers (model 2280 or 5280) powered by Kunpeng processors

OS:
● Open-source CentOS 7.6
● EulerOS 2.8 for HCS commercial use

Cloud platform: Open-source QEMU-KVM, Docker container platform, and HCS

Guest OS: CentOS 7.6, Ubuntu 16.04, SLES 15.1, and Kylin V7.6 on VMs

Cluster management platform: Open-source OpenStack and Kubernetes management platforms


3 Virtualization Architecture

3.1 Open-Source KVM Virtualization Solution
3.2 Open-Source oVirt and KVM Solution
3.3 Open-Source OpenStack and KVM Solution
3.4 Open-Source Kubernetes and Docker Container Solution
3.5 Open-Source Open vSwitch and Huawei-developed XPF Acceleration Solution
3.6 Open-Source Open vSwitch SR-IOV Hardware Offload and Acceleration Solution

3.1 Open-Source KVM Virtualization Solution

3.1.1 Architecture

The open-source KVM virtualization solution applies to offline virtualization scenarios, including single-node, two-node high availability (HA), and multi-node cluster scenarios. VM migration and HA are used to ensure service reliability. Typical applications include databases, web servers, and cache servers.

● Single-node scenario
In a single-node system, the QEMU-KVM open-source software is used on a single server. The virt-manager management software and virsh commands, both of which invoke libvirt APIs, are used for out-of-band VM management. VNC software is used for in-band guest OS management. A local LVM virtual storage pool is used for VM storage. The VM network uses bridges (bridge mode) or physical NICs (host-only mode).

● Two-node and cluster scenarios
The two-node and cluster scenarios are based on the single-node configuration. The computing virtualization, storage, and network configurations are the same as those of the single-node scenario. The difference is that HA or live migration technology can be used to ensure cluster robustness in the two-node and cluster scenarios. The following uses the Keepalived+LVS+MySQL two-node master-slave architecture as an example. Keepalived provides a floating IP address and periodically checks the health status of servers in the cluster. When a faulty node is detected, a switchover is triggered.

The architecture of the open-source KVM virtualization scenario is divided into three layers: the bottom layer is the TaiShan server hardware, the middle layer is the host Linux kernel with the KVM virtualization software, and the top layer is QEMU, which virtualizes I/O devices. Figure 3-1 shows the detailed system architecture.

Figure 3-1 Open-source KVM virtualization architecture
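The Keepalived-style master/backup switchover described above can be sketched as a small simulation. The node names, priorities, and floating IP below are illustrative, not Keepalived configuration or APIs:

```python
# Toy simulation of VRRP-style master election: the healthy node with the
# highest priority holds the floating IP; a failed health check triggers
# a switchover to the backup.

FLOATING_IP = "192.168.10.100"  # hypothetical virtual service IP

class Node:
    def __init__(self, name, priority):
        self.name = name
        self.priority = priority
        self.healthy = True

def elect_master(nodes):
    """Return the healthy node with the highest priority, or None."""
    candidates = [n for n in nodes if n.healthy]
    if not candidates:
        return None
    return max(candidates, key=lambda n: n.priority)

master = Node("mysql-01", priority=150)
backup = Node("mysql-02", priority=100)

holder = elect_master([master, backup])
print(f"{FLOATING_IP} held by {holder.name}")  # mysql-01

# A periodic health check detects a fault on the master; the IP moves.
master.healthy = False
holder = elect_master([master, backup])
print(f"{FLOATING_IP} held by {holder.name}")  # mysql-02
```

In the real deployment, Keepalived performs the election via the VRRP protocol and moves the floating IP by reassigning it on the winning node's interface.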

Table 3-1 Components in the open-source KVM virtualization scenario

KVM: KVM is a kernel feature of the host Linux OS. It supports simulation of the CPU, memory, and I/O. KVM is used as the hypervisor and works with QEMU to virtualize KVM VMs.

QEMU: QEMU runs in user mode on the host as a process. Based on KVM and kernel features, QEMU simulates hardware such as the CPU, memory, and I/O to support running a guest OS in a process.

libvirt: The libvirt library provides APIs for Linux virtualization functions. Virtualization management services, such as virt-manager, manage and monitor VMs by using libvirt.

Virtual machine: VMs are server resources provided for users. Guest OSs can be installed on VMs. The supported guest OSs include CentOS 7.6, SLES 15.1, Ubuntu 16.04, and Kylin 7.6. User applications run on guest OSs.

3.1.2 Typical Configuration

Table 3-2 describes the typical compute node configuration in the open-source KVM virtualization scenario.

Table 3-2 Typical compute node configuration for open-source KVM VMs

Compute node (quantity: calculated based on the number of VMs)
Dual-socket rack servers
● 2 x Huawei Kunpeng 920 processors (5250/7260 processors are recommended)
● 256 GB or larger memory
● 2 x 480 GB SSDs
● 6 x 600 GB SAS HDDs (six or more)
● Avago SAS3508 RAID controller cards
● 1 x 1822 NIC
● An independent power supply

3.1.3 Component Principles

KVM

KVM uses features of the Linux kernel to implement functions such as CPU virtualization, memory virtualization, peripheral virtualization, and VM management. It bridges software-based hardware simulation and CPU mode switching. Figure 3-2 shows the KVM architecture.


Figure 3-2 KVM architecture

KVM is a hypervisor that runs in the host OS kernel. It simulates the CPU, memory, and I/O, monitors VMs, and provides the underlying execution support for QEMU. KVM has the following advantages:
● High compatibility
● Easy memory management: a VM is a process
● High performance: the code resources of KVM and the OS kernel can be directly invoked
● Focus on virtualization: hardware is fully utilized to support virtualization
● Good scalability: the native memory management and multi-processor functions provided by the Linux kernel are used
● Simplicity: KVM I/O virtualization is implemented by QEMU, which significantly reduces the implementation workload
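KVM is exposed to user space through the /dev/kvm character device, which QEMU opens and then drives with ioctl() calls. A minimal availability check (the function name is ours, and this is a sketch rather than a complete capability probe):

```python
import os

def kvm_available(dev="/dev/kvm"):
    """Return True if the KVM character device is present and accessible.

    QEMU opens this device and controls it with ioctl() calls; when it is
    absent, QEMU falls back to pure software emulation (TCG).
    """
    return os.path.exists(dev) and os.access(dev, os.R_OK | os.W_OK)

print("KVM acceleration available:", kvm_available())
```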

QEMU

QEMU is an emulator that provides a hardware environment for running VMs. It runs in user mode on the host as a process. Based on KVM and kernel features, QEMU simulates hardware such as the CPU, memory, and I/O to support running a guest OS in a process. QEMU supports full system emulation, full virtualization, and paravirtualization. Full system emulation and full virtualization are pure software simulation: a program built for one CPU architecture can run on another CPU. The difference is that full system emulation simulates the entire system, including CPUs and peripheral devices, whereas in paravirtualization mode, QEMU and KVM are used together and the CPU is virtualized in KVM hardware-assisted virtualization mode. The paravirtualization mode is efficient and is commonly used. Figure 3-3 shows the QEMU architecture.

Figure 3-3 QEMU architecture

QEMU works with KVM. KVM runs in kernel mode to simulate CPUs and memory. QEMU runs in user mode to simulate I/O devices and present virtual devices to external systems. QEMU provides the following functions for KVM:
● QMP interfaces for interaction with the upper-layer libvirt
● Virtual device simulation
● The ioctl interface for interaction with the lower-layer KVM
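This division of labor is visible on a QEMU command line: accel=kvm hands CPU and memory virtualization to the in-kernel KVM module, while the disk and NIC are QEMU user-mode virtio devices. The sketch below shows how a management layer might assemble such an invocation; the flags are standard QEMU options, but the exact set, paths, and names are illustrative:

```python
def build_qemu_cmd(name, vcpus, mem_mb, disk, mac):
    """Assemble a qemu-system-aarch64 invocation: KVM handles CPU/memory,
    QEMU user space supplies the virtio disk and NIC models."""
    return [
        "qemu-system-aarch64",
        "-name", name,
        "-machine", "virt,accel=kvm",       # use the in-kernel KVM hypervisor
        "-cpu", "host",
        "-smp", str(vcpus),
        "-m", str(mem_mb),
        "-drive", f"file={disk},if=virtio,format=qcow2",
        "-netdev", "bridge,id=net0,br=br0",  # bridge-mode VM network
        "-device", f"virtio-net-pci,netdev=net0,mac={mac}",
    ]

cmd = build_qemu_cmd("vm01", 4, 8192,
                     "/var/lib/libvirt/images/vm01.qcow2",
                     "52:54:00:12:34:56")
print(" ".join(cmd))
```

In practice libvirt generates an equivalent (much longer) command line from the domain XML, so this is only a conceptual sketch.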

3.2 Open-Source oVirt and KVM Solution

3.2.1 Architecture

The oVirt and KVM solution is a combination of TaiShan 200 servers and open-source oVirt and KVM. This solution resolves the problem of adapting the oVirt virtualization software to TaiShan servers.

oVirt is an open-source virtualization management platform that allows centralized management of VMs and computing, storage, and network resources from a platform-independent front end that runs in a web browser. Main components include the oVirt engine, VDSM, KVM-based VMs, storage, and database. For details, see Table 3-3. Figure 3-4 shows the oVirt and KVM deployment architecture.


Figure 3-4 oVirt and KVM deployment architecture

Table 3-3 Component description

Browser: A browser can be used to log in to the administrator portal or user portal.
● The administrator portal is a website based on UI application programming on the engine and is used by the system administrator to perform advanced operations.
● The user portal is a website based on UI application programming and is used to manage simple scenarios.

oVirt Engine:
● Support for multiple virtual data centers and multi-cluster management
● Support for different storage architectures (FC-SAN, IP-SAN, local storage, and NFS)
● Hyper-converged deployment architecture
● Unified management of the virtual computing, storage, and VM networks
● VM live migration and storage live migration
● High availability upon physical host breakdown
● Load balancing resource scheduling policy

VDSM: The VDSM performs operations related to oVirt Engine requests. Its functions are as follows:
● Starting and registering nodes
● Managing VM operations and life cycles
● Managing the networks
● Managing the storage
● Monitoring and reporting the status of hosts and VMs
● Performing external operations on VMs
● Combining memory and storage and enabling memory and storage overcommitment
The VDSM is designed based on the principles of high availability, high scalability, cluster security, backup and restoration, and performance optimization.

Guest agent: A guest agent runs on a VM, provides the oVirt engine with VM resource usage information, and communicates with the oVirt engine over a virtual serial connection.

PostgreSQL: The oVirt engine uses PostgreSQL for persistent data storage.
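Besides the two portals, the oVirt engine exposes a REST API under /ovirt-engine/api, which the web UIs and SDKs build on. A small helper for composing resource URLs; the host name is a placeholder and no request is actually sent here:

```python
from urllib.parse import urljoin

def api_url(engine_host, resource):
    """Build a URL for an oVirt engine REST API resource collection
    (the API is served under /ovirt-engine/api)."""
    base = f"https://{engine_host}/ovirt-engine/api/"
    return urljoin(base, resource.lstrip("/"))

# e.g. listing VMs or hosts managed by the engine:
print(api_url("engine.example.com", "vms"))
print(api_url("engine.example.com", "hosts"))
```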

3.2.2 Typical Configuration

Table 3-4 shows the typical configuration of the open-source oVirt and KVM solution.

Table 3-4 Typical configuration of open-source oVirt and KVM

Management node (quantity: two or more; a management node can be self-hosted on a compute node)
TaiShan 200 servers (model 2280)
● Huawei Kunpeng 920 5250 or 7260 processors
● 256 GB or larger memory
● 2 x 480 GB SATA SSDs
● 3508/3516 RAID controller cards
● 1 x 1822 10GE NIC (There are four network ports, and every two of them are bonded together to form a logical port.)
● An independent power supply

Compute node (quantity: two or more; the cluster scale is calculated based on the VM scale)
TaiShan 200 servers (model 2280)
● Huawei Kunpeng 920 5250 or 7260 processors
● 256 GB or larger memory
● 2 x 480 GB SATA SSDs
● 3508/3516 RAID controller cards
● 1 x 1822 10GE NIC (There are four network ports, and every two of them are bonded together to form a logical port.)
● An independent power supply
Network planning: 2 x 10GE for the service and management networks; 2 x 10GE for the storage network

Storage node (NFS or local storage; quantity calculated based on the data volume)

Hot data: TaiShan 200 servers (model 2280)
● Huawei Kunpeng 920 5250 or 7260 processors
● 128 GB or larger memory
● 2 x 480 GB SATA SSDs
● 12 x 3.2 TB NVMe SSDs
● 2 x 25GE NICs (LOMs) (Every two network ports are bonded together to form a logical port, yielding two logical ports that are connected to the public network and cluster network respectively.)
● An independent power supply
Node count formula: Number of nodes = Planned data volume x 1.5 (data expansion rate) x 1 (data compression rate) x 3 (three copies)/0.8 (disk utilization)/0.9 (disk number system conversion)/(12 (number of disks) x 3.2 TB (disk capacity))

Warm data: TaiShan 200 servers (model 2280)
● Huawei Kunpeng 920 5250 or 7260 processors
● 192 GB or larger memory
● 2 x 480 GB SATA SSDs
● 2 x 3.2 TB NVMe SSDs for acceleration
● 12 x 8 TB HDDs
● 2 x 25GE/10GE NICs (LOMs) (bonded in pairs into two logical ports, connected to the public network and cluster network respectively)
● An independent power supply
Node count formula: same as above, with 12 (number of disks) x 8 TB (disk capacity)

Cold data: TaiShan 200 servers (model 5280)
● Huawei Kunpeng 920 5230 processors
● 128 GB or larger memory
● 2 x 480 GB SATA SSDs
● 36 x 8 TB HDDs
● 2 x 25GE/10GE NICs (LOMs) (bonded in pairs into two logical ports, connected to the public network and cluster network respectively)
● An independent power supply
Node count formula: same as above, with 36 (number of disks) x 8 TB (disk capacity)
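The storage node-count formula above is the same for the hot, warm, and cold tiers; only the per-node disk layout changes. It can be written as a small helper (the function and parameter names are ours):

```python
import math

def storage_nodes(planned_tb, disks_per_node, disk_tb,
                  expansion=1.5, compression=1.0, copies=3,
                  disk_utilization=0.8, unit_conversion=0.9):
    """Node count per the formula above: raw capacity required for the
    planned data volume divided by the usable capacity per node."""
    required = planned_tb * expansion * compression * copies
    usable_per_node = (disks_per_node * disk_tb
                       * disk_utilization * unit_conversion)
    return math.ceil(required / usable_per_node)

# Hot tier, 200 TB planned: 12 x 3.2 TB NVMe SSDs per node -> 33 nodes.
print(storage_nodes(200, disks_per_node=12, disk_tb=3.2))
# Cold tier, 200 TB planned: 36 x 8 TB HDDs per node -> 5 nodes.
print(storage_nodes(200, disks_per_node=36, disk_tb=8))
```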

3.2.3 Component Principles

oVirt Engine

The oVirt engine runs as a JBoss-based Java application. The service communicates with the VDSM on hosts to deploy, start, stop, migrate, and monitor VMs. In addition, the service can create new images on the storage by using templates.


The oVirt engine uses a scalable, secure, and high-performance architecture to provide centralized management for large-scale servers and VMs. oVirt provides the following functions:
● VM life cycle management
● Network management: adding logical networks to hosts
● Storage management: managing storage domains (NFS, iSCSI, and local storage) and VM disks
● High availability: automatically restarting the VMs of a faulty node on another node
● Live migration: migrating running VMs between nodes without interrupting services
● System scheduling: performing load balancing for VMs based on resource usage or policies
● Energy saving: consolidating VMs onto a few servers during off-peak hours
● Maintenance manager: ensuring that VMs are not stopped during the planned maintenance period
● Image management: template-based configuration, thin provisioning, and snapshots
● Monitoring: monitoring all objects in the system, such as VMs, nodes, networks, and storage
● Importing and exporting: importing and exporting a VM or template by using an OVF file

Figure 3-5 shows the oVirt engine component architecture.

Figure 3-5 oVirt engine component architecture

oVirt Node

An oVirt node is a compute node where VMs run. For details about how to install a common TaiShan 200 server as a compute node in the oVirt virtual environment, see the Kunpeng oVirt Lightweight Virtualization Management Platform Deployment Guide (CentOS 8.1).

The VDSM on an oVirt node functions as the agent of the oVirt engine. The VDSM manages all resources in the oVirt virtual environment and performs client operations. A VDSM daemon runs on each compute node. After receiving an instruction from the client, the VDSM invokes the underlying libvirt tool library to manage VMs and hardware devices. QEMU supports SPICE display drivers, so clients can use SPICE client software to access VMs in graphical mode.

The Storage Pool Manager (SPM) is a role given to one of the hosts in the data center, enabling it to manage the storage domains of the data center. Any host in the data center can function as the SPM; the system grants the role to one of the hosts. The SPM role does not affect the normal functions of the host: the host running as the SPM can still provide virtual resources for running VMs. Figure 3-6 shows the oVirt node architecture.

Figure 3-6 oVirt node architecture

3.3 Open-Source OpenStack and KVM Solution

3.3.1 Architecture

The Kunpeng virtualization OpenStack+KVM solution is a combination of TaiShan 200 servers and open-source OpenStack and KVM. This solution resolves the adaptation problem of open-source OpenStack and KVM virtualization software on TaiShan servers. In addition, this solution optimizes performance based on the open-source software and provides performance optimization guidance for customers.

OpenStack is a community-based open-source project. It provides an operation platform and tool set for cloud deployment, aiming to help organizations run clouds that provide virtual computing or storage services and to deliver scalable and flexible cloud computing for public and private clouds. OpenStack includes the following open-source components: Nova, Cinder, Neutron, Glance, Swift, Placement, Keystone, Horizon, Heat, and Ceilometer. Figure 3-7 shows the OpenStack system architecture.

Figure 3-7 OpenStack component architecture

Table 3-5 Components in the OpenStack cloud scenario

Nova: Nova is the core component of OpenStack and manages VM computing resources, including CPU and memory resources. When a VM creation request is received, Nova filters computing resources based on the computing resource requirements in the request, selects the resources that can be used to create VMs, sorts the resources based on a certain policy, and selects certain computing resources for VM creation.

Cinder: Cinder provides block storage resources for VMs. The OS operates drives by block: it divides a drive into blocks (clusters) to read and write the drive. Cinder uses drivers to manage, read, and write the physical storage media, and provides unified iSCSI storage (unified drive volumes) for VMs. VMs mount the unified drive volumes provided by Cinder.


Neutron: Neutron provides network resources for VMs. To implement communication and isolation between VMs and the external network, information such as IP addresses, routes, and VLANs needs to be configured so that packet forwarding channels supporting isolation, switching, and routing are set up on the network. Neutron configures interfaces, VLANs, and routes on hosts, switches, and routers to support VM data forwarding according to VM requirements.

Glance: Glance manages VM images and allows users to query, register, upload, obtain, and delete VM images.

Swift: Swift is the object storage component of OpenStack. It can interconnect with Ceph to provide object storage for OpenStack.

Horizon: Horizon provides a web UI for almost all components. OpenStack components natively support only CLI commands; Horizon provides the graphical encapsulation.

Keystone: Keystone provides the authentication function. In OpenStack, Keystone authenticates all operations of components that require authentication.

Ceilometer: Ceilometer provides resource metering and monitoring services.

Heat: Heat provides the service orchestration function.

3.3.2 Typical Configuration

Table 3-6 describes the configuration of each component in the OpenStack scenario.


Table 3-6 Typical configuration in the OpenStack scenario

Management node (quantity: three or more)
TaiShan 200 servers (model 2280)
● Huawei Kunpeng 920 5250 processors
● 256 GB or larger memory
● 2 x 480 GB SATA SSDs
● Avago SAS3508 RAID controller cards
● 1 x 1822 10GE NIC (four network ports)
● An independent power supply
Sizing (PM: physical machines; VM: virtual machines):
● PM ≤ 50 and VM ≤ 500: 3 servers
● PM ≤ 100 and VM ≤ 1000: 4 servers
● PM ≤ 200 and VM ≤ 2000: 7 servers
● PM ≤ 500 and VM ≤ 5000: 10 servers
● PM ≤ 1000 and VM ≤ 10,000: 12 servers

Network node (quantity: two or more; used only in the software SDN scenario; PM ≥ 200: four or more recommended)
TaiShan 200 servers (model 2280)
● Huawei Kunpeng 920 5230 processors
● 256 GB or larger memory
● 2 x 1.2 TB SATA SSDs
● Avago SAS3508 RAID controller cards
● 2 x 1822 10GE NICs (four network ports)
● 2 x GE LOMs
● An independent power supply
Network planning: 2 x 10GE for the service network; 2 x GE for the management network

Compute node (quantity: two or more; the cluster scale is calculated based on the VM scale)
TaiShan 200 servers (model 2280)
● Huawei Kunpeng 920 5250 or 7260 processors
● 256 GB or larger memory
● 2 x 480 GB SATA SSDs
● Avago SAS3508 RAID controller cards
● 1 x 1822 10GE NIC (There are four network ports, and every two of them are bonded together to form a logical port.)
● An independent power supply
Network planning: 2 x 10GE for the service and management networks; 2 x 10GE for the storage network

Distributed storage node (Ceph; quantity calculated based on the data volume)

Hot data: TaiShan 200 servers (model 2280)
● Huawei Kunpeng 920 5250 processors
● 128 GB or larger memory
● 2 x 480 GB SATA SSDs
● 12 x 3.2 TB NVMe SSDs
● 2 x 25GE NICs (LOMs) (Every two network ports are bonded together to form a logical port, yielding two logical ports that are connected to the public network and cluster network respectively.)
● An independent power supply
Node count formula: Number of nodes = Planned data volume x 1.5 (data expansion rate) x 1 (data compression rate) x 3 (three copies)/0.8 (disk utilization)/0.9 (disk number system conversion)/(12 (number of disks) x 3.2 TB (disk capacity))

Warm data: TaiShan 200 servers (model 2280)
● Huawei Kunpeng 920 5230 processors
● 192 GB or larger memory
● 2 x 480 GB SATA SSDs
● 2 x 3.2 TB NVMe SSDs for acceleration
● 12 x 8 TB HDDs
● 2 x 25GE/10GE NICs (LOMs) (bonded in pairs into two logical ports, connected to the public network and cluster network respectively)
● An independent power supply
Node count formula: same as above, with 12 (number of disks) x 8 TB (disk capacity)

Cold data: TaiShan 200 servers (model 5280)
● Huawei Kunpeng 920 5230 processors
● 128 GB or larger memory
● 2 x 480 GB SATA SSDs
● 36 x 8 TB HDDs
● 2 x 25GE/10GE NICs (LOMs) (bonded in pairs into two logical ports, connected to the public network and cluster network respectively)
● An independent power supply
Node count formula: same as above, with 36 (number of disks) x 8 TB (disk capacity)

3.3.3 Component Principles

Nova

Nova is a tool for deploying the cloud and provides functions including running instances, managing networks, controlling users, and controlling other projects' access to the cloud. Nova defines drivers that interact with the underlying virtualization mechanisms that run on the host operating system, and exposes its functionality over a web-based API. Nova provides API services for external systems, provides an endpoint for all API queries, initializes most deployment activities, and implements certain policies. Nova supports single-node, two-node, and multi-node deployment; multi-node deployment is recommended. Nova is a shared-nothing, message-based architecture, so each Nova service can be installed on an independent server, but the dashboard must be installed on the nova-api server. Figure 3-8 shows the position of Nova in the OpenStack framework.


Figure 3-8 Nova component architecture

The Nova services can be deployed on control nodes and compute nodes. The nova-api provides web REST API services. The nova-scheduler selects the nodes to run VMs based on the CPU and memory weights. The nova-database is the database operation agent. The nova-compute manages the VM life cycles.
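The filter-then-weigh selection performed by the nova-scheduler can be sketched as follows. The host inventory and the weighting function are made up for illustration; the real scheduler uses pluggable filter and weigher classes and configurable weight multipliers:

```python
# Toy sketch of nova-scheduler's two phases: filter out hosts that cannot
# satisfy the flavor's CPU/RAM request, then pick the best-weighted host.

hosts = [
    {"name": "compute-01", "free_vcpus": 8,  "free_ram_mb": 16384},
    {"name": "compute-02", "free_vcpus": 32, "free_ram_mb": 65536},
    {"name": "compute-03", "free_vcpus": 2,  "free_ram_mb": 4096},
]

def filter_hosts(hosts, vcpus, ram_mb):
    """Keep only hosts with enough free CPU and memory."""
    return [h for h in hosts
            if h["free_vcpus"] >= vcpus and h["free_ram_mb"] >= ram_mb]

def weigh(host, cpu_weight=1.0, ram_weight=1.0):
    """Higher free capacity -> higher weight (spread-style placement)."""
    return (cpu_weight * host["free_vcpus"]
            + ram_weight * host["free_ram_mb"] / 1024)

def schedule(hosts, vcpus, ram_mb):
    candidates = filter_hosts(hosts, vcpus, ram_mb)
    return max(candidates, key=weigh)["name"] if candidates else None

print(schedule(hosts, vcpus=4, ram_mb=8192))  # compute-02
```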

Cinder

Cinder is a block storage service that provides persistent block storage services for VMs. Figure 3-9 shows the architecture of Cinder.

● Cinder Client encapsulates the REST APIs provided by Cinder so that users can access them in CLI mode.
● Cinder API exposes REST APIs, parses operation requests, and routes them for handling. It provides functions such as adding, deleting, modifying, and querying volumes (including creating volumes from existing volumes, images, or snapshots); adding, deleting, modifying, querying, and backing up snapshots; managing volume types; and attaching and detaching volumes.
● Cinder Scheduler collects the capacity and capability information reported by the backends and schedules volumes to specific cinder-volume services based on preset algorithms.
● Cinder Volume services are deployed on multiple nodes. Different configuration files are used to connect to different backend devices. Storage vendors provide driver code that interacts with the storage devices to collect capacity and capability information and perform volume operations.
● Cinder Backup backs up volume data to other storage media.
● SQL DB stores data such as storage volumes, snapshots, backups, and services, and supports SQL databases including MySQL and PostgreSQL.


Figure 3-9 Cinder component architecture

Neutron Neutron is a virtual network management system that provides REST APIs, DHCP, and L2 network services. It uses a plug-in architecture to support network devices and network technologies from various vendors. Figure 3-10 shows the Neutron architecture.
● neutron-server: provides the REST API service and uses a relational database at the backend.
● Message queue: neutron-server uses message queues to exchange messages with other Neutron agents, but not with other OpenStack components such as Nova.
● L2 agent: connects ports and devices so that they are in a shared broadcast domain. L2 agents usually run on the hypervisor.
● DHCP agent: configures the network of VM instances. In some cases (for example, when a config drive is used to inject the network configuration), the DHCP agent may not be required.
● L3 agent: connects the tenant network to the data center network or the Internet. In real deployment environments, multiple L3 agents run at the same time.


Figure 3-10 Neutron component architecture

3.4 Open-Source Kubernetes and Docker Container Solution

3.4.1 Architecture

The Kubernetes+Docker solution combines TaiShan 200 servers with open-source Kubernetes and Docker. It resolves the adaptation problems of the open-source Kubernetes and Docker virtualization software on TaiShan servers. In addition, the solution tunes the performance of the open-source software and provides performance tuning guidance for customers.

LXC (Linux Containers) is an OS-level virtualization method for running multiple isolated containers on a control host using a single Linux kernel. Its control group (cgroups) resource-limiting functionality and namespace isolation functionality are used to achieve low host resource usage and fast startup. Docker is a Linux container engine technology that enables application packaging and quick deployment. Docker uses Linux container technology to turn applications into standardized, portable, and self-managed components, enabling applications to be built once and run everywhere. Features of the Docker technology include quick application release, easy application deployment and capacity expansion, high application density, and easy application management.

Kubernetes groups Docker container host machines into a cluster for unified resource scheduling, automatic container lifecycle management, and cross-node service discovery and load balancing. It provides better support for microservices and for drawing clear boundaries between services, through concepts such as labels and pods.

A Kubernetes cluster consists of at least one cluster master and multiple nodes. It features lightweight architecture, easy migration, quick deployment, plug-ins, and scalability. Figure 3-11 shows the architecture.


Figure 3-11 Kubernetes and Docker container architecture

Table 3-7 Nodes in the Kubernetes and Docker container scenario

Name Description

Pod Pods are the smallest deployable units that can be created, scheduled, and managed in Kubernetes. A pod is a group of one or more containers rather than a single application container. The containers in a pod are always co-located and co-scheduled, and run in a shared context: they share the same network namespace, IP address, port space, and volumes. Pods are relatively short-lived; a pod remains on the node where it is scheduled until it is deleted.
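The shared context described above can be illustrated with a minimal pod manifest; all names and images here are placeholders for illustration, not part of this solution.

```yaml
# Illustrative two-container pod: both containers share the pod's network
# namespace, so the sidecar reaches the web container over localhost.
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod
  labels:
    app: demo
spec:
  containers:
  - name: web
    image: nginx:1.21
    ports:
    - containerPort: 80
  - name: sidecar
    image: busybox:1.33
    command: ["sh", "-c",
              "while true; do wget -qO- http://127.0.0.1:80 >/dev/null; sleep 10; done"]
```

Because the two containers share one IP address and port space, the sidecar uses `127.0.0.1` rather than a service name to reach the web container.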

API server Exposes the Kubernetes APIs. Whether resources of the Kubernetes cluster are operated through kubectl or by invoking the API directly, all operations go through the interfaces provided by kube-apiserver.

kube-controller-manager Manages the entire Kubernetes cluster and ensures that all resources in the cluster are in the expected state. When the status of a resource in the cluster is abnormal, the controller manager triggers the corresponding scheduling operation. It consists of the following controllers:
● Node controller
● Replication controller
● Endpoints controller
● Namespace controller
● Service accounts controller


Scheduler Schedules workloads in the Kubernetes cluster. It receives scheduling requests triggered by kube-controller-manager, performs scheduling calculation based on the request specifications, scheduling constraints, and overall resource status, and sends scheduling tasks to the kubelet component on the target node.

etcd etcd is an efficient KV storage system used to share configurations and discover services. It features distribution and strong consistency. It is used for storing all data that needs to be persisted in Kubernetes.

kubelet kubelet is the most important core component on a node. It is responsible for the computing tasks of the Kubernetes cluster and performs the following functions:
● Monitors task assignment from kube-scheduler.
● Mounts volumes for pod containers.
● Downloads secrets for pod containers.
● Runs Docker containers by interacting with the Docker daemon.
● Periodically performs container health checks.
● Monitors and reports pod status to kube-controller-manager.
● Monitors and reports node status to kube-controller-manager.

kube-proxy Forwards requests for services to the backend pod instances and manages load balancing rules.

3.4.2 Typical Configuration Table 3-8 describes the configuration of each component in the Kubernetes and Docker container scenario.


Table 3-8 Typical configuration in the Kubernetes and Docker container scenario

Management node (quantity: three or more)
Typical configuration:
● TaiShan 200 servers (model 2280)
● Huawei Kunpeng 920 3226 processors
● 256 GB or larger memory
● 2 x 480 GB SATA SSDs
● Avago SAS3508 RAID controller cards
● 1 x 1822 10GE NIC
● An independent power supply
Remarks:
● Used to deploy core Kubernetes components, including kube-apiserver and kube-controller-manager.
● Used to deploy HA and load balancing components such as Keepalived, HAProxy, and Nginx.
● Supports deployment of etcd and SWR.
● Supports deployment of ElasticSearch and monitoring, log, and alarm components.

Container node (quantity: two or more)
Typical configuration:
● TaiShan 200 servers (model 2280)
● Huawei Kunpeng 920 4826 or 6426 processors
● 256 GB or larger memory
● 2 x 480 GB SATA SSDs
● Avago SAS3508 RAID controller cards
● 2 x 10GE NICs
● An independent power supply
Remarks:
● The cluster scale is calculated based on the container scale.
● 2 x 10GE: storage
● 2 x 10GE: management + service

Distributed storage node (Ceph), hot data (quantity: calculated based on the data volume)
Typical configuration:
● TaiShan 200 servers (model 2280)
● Huawei Kunpeng 920 4826 processors
● 128 GB or larger memory
● 2 x 480 GB SATA SSDs
● 12 x 3.2 TB NVMe SSDs
● 2 x 25GE NICs (LOMs). Every two network ports are bonded into one logical port, yielding two logical ports that connect to the public network and the cluster network respectively.
● An independent power supply
Remarks: Number of nodes = Planned data volume x 1.5 (data expansion rate) x 1 (data compression rate) x 3 (three copies)/0.8 (disk utilization)/0.9 (disk number system conversion)/(12 (number of disks) x 3.2 TB (disk capacity))

Distributed storage node (Ceph), warm data (quantity: calculated based on the data volume)
Typical configuration:
● TaiShan 200 servers (model 2280)
● Huawei Kunpeng 920 3226 processors
● 192 GB or larger memory
● 2 x 480 GB SATA SSDs
● 2 x 3.2 TB NVMe SSDs for acceleration
● 12 x 8 TB HDDs
● 2 x 25GE/10GE NICs (LOMs). Every two network ports are bonded into one logical port, yielding two logical ports that connect to the public network and the cluster network respectively.
● An independent power supply
Remarks: Number of nodes = Planned data volume x 1.5 (data expansion rate) x 1 (data compression rate) x 3 (three copies)/0.8 (disk utilization)/0.9 (disk number system conversion)/(12 (number of disks) x 8 TB (disk capacity))

Distributed storage node (Ceph), cold data (quantity: calculated based on the data volume)
Typical configuration:
● TaiShan 200 servers (model 5280)
● Huawei Kunpeng 920 3226 processors
● 128 GB or larger memory
● 2 x 480 GB SATA SSDs
● 36 x 8 TB HDDs
● 2 x 25GE/10GE NICs (LOMs). Every two network ports are bonded into one logical port, yielding two logical ports that connect to the public network and the cluster network respectively.
● An independent power supply
Remarks: Number of nodes = Planned data volume x 1.5 (data expansion rate) x 1 (data compression rate) x 3 (three copies)/0.8 (disk utilization)/0.9 (disk number system conversion)/(36 (number of disks) x 8 TB (disk capacity))
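The node-count formula used for the hot, warm, and cold configurations above can be captured in a small helper. This is a sketch: the function name, parameter names, and the choice to round up to whole nodes are my assumptions; the factors themselves come from the formula in the table.

```python
import math

def ceph_node_count(planned_tb, disks_per_node, disk_tb,
                    expansion=1.5, compression=1.0, copies=3,
                    disk_util=0.8, decimal_conv=0.9):
    """Apply the sizing formula:
    nodes = planned volume x expansion x compression x copies
            / disk utilization / number-system conversion
            / (disks per node x disk capacity),
    then round up to a whole number of nodes."""
    raw = (planned_tb * expansion * compression * copies
           / disk_util / decimal_conv / (disks_per_node * disk_tb))
    return math.ceil(raw)

# Hot-data example: 100 TB planned, 12 x 3.2 TB NVMe SSDs per node.
hot_nodes = ceph_node_count(100, disks_per_node=12, disk_tb=3.2)
# Cold-data example: 1000 TB planned, 36 x 8 TB HDDs per node.
cold_nodes = ceph_node_count(1000, disks_per_node=36, disk_tb=8)
```

Rounding up is the conservative choice; a fractional node count cannot be deployed, so the result is always at least the raw formula value.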

3.4.3 Component Principles

Kubernetes Kubernetes is an open-source container cluster deployment and management system and has become a de facto standard in the PaaS field. Built on container technology, it is the scheduling platform that provides resource scheduling, deployment, execution, service discovery, and scaling for containerized applications. Figure 3-12 shows the architecture. Kubernetes provides the following functions:
● Supports container-based application deployment, maintenance, and rolling upgrade.
● Supports load balancing and service discovery.
● Supports cluster scheduling across different machines and areas.
● Supports auto scaling.
● Supports stateless and stateful services.
● Supports a wide range of volumes.
● Ensures scalability through a plug-in mechanism.
Kubernetes develops rapidly and has become a leader in the field of container orchestration.

Figure 3-12 Kubernetes component architecture

Docker A Docker container is a process on the host OS. Docker uses namespaces and cgroups to isolate and restrict resources. Namespaces in the Linux kernel are used to implement lightweight virtualization: processes in the same namespace can sense changes of each other but are unaware of external processes. Namespaces can isolate users, hostnames, domain names, semaphores, networks, file systems, and processes. Cgroups limit and account for the physical resources (including CPU, memory, and I/O resources) used by task groups, providing a basis for container virtualization. Cgroups provide the following functions:
● Resource limiting: Cgroups can limit the total amount of resources used by tasks. For example, an upper limit can be set on the memory used at runtime.


● Priority allocation: Cgroups control the allocation of resource priorities through CPU time slices and I/O bandwidth.
● Resource statistics: Cgroups can collect statistics on system resource usage, such as CPU time and memory usage.
● Task control: Cgroups can suspend and resume tasks.

Figure 3-13 Docker container architecture

Docker uses the typical client/server (C/S) architecture. A user uses a Docker client to communicate with the Docker daemon and send requests to it. The Docker client, which can be the Docker command-line tool or a Docker API client, sends requests to the Docker daemon to perform container management operations. The Docker daemon is the core Docker background process: it responds to requests from the Docker client and translates them into the system calls that complete container management operations. The Docker image management module, Image Management, downloads images from a Docker Registry and stores them in the file system. The Docker network can be a host physical network, a virtual bridge network, or an overlay network.

3.5 Open-Source Open vSwitch and Huawei-developed XPF Acceleration Solution

3.5.1 Architecture Open vSwitch (OVS) combined with Huawei-developed XPF is a solution that applies to public and private cloud scenarios. Open vSwitch is an excellent open-source software switch that supports mainstream switch functions, such as Layer 2 switching, network isolation, QoS, and traffic monitoring. It supports OpenFlow, which defines flexible data packet processing specifications, and provides the L1–L4 packet processing capability. OVS supports multiple Linux virtualization technologies, including Xen, KVM, and VirtualBox. On top of the open-source software, Huawei-developed XPF combines multiple types of flow table action sets, which reduces the number of lookups and greatly improves packet forwarding performance in the connection tracking scenario. Figure 3-14 shows the architecture of the open-source Open vSwitch and Huawei-developed XPF acceleration solution. Table 3-9 describes the components.


Figure 3-14 Open-source Open vSwitch and Huawei-developed XPF solution architecture

Table 3-9 Components of the open-source Open vSwitch and Huawei-developed XPF solution Name Description

NIC The NICs send and receive packets.

QEMU QEMU runs in the user mode on the host as a process. Based on the KVM and kernel features, QEMU simulates hardware such as the CPU, memory, and I/O to support running of the guest OS in a process.

DPDK The Data Plane Development Kit (DPDK) provides a set of data plane libraries and network interface controller polling-mode drivers for offloading TCP packet processing from the OS kernel to processes running in user space. In other words, it is a software library used to accelerate packet data processing.

ovs-vswitchd It is an OVS daemon and a core component of OVS. It works with the Linux kernel compatibility module to implement flow- based switching. It communicates with the upper-layer controller through the OpenFlow protocol, communicates with the ovsdb-server through the OVSDB protocol, and communicates with the kernel module through Netlink.

OpenFlow OpenFlow separates the data layer from the control layer. OpenFlow switches forward data at the data layer, and controllers implement the functions of the control layer. The OpenFlow component in Figure 3-14 mainly implements the control layer.


XPF A Huawei-developed module that implements an intelligent offload engine in the OVS software. It traces all flow tables and connection tracking (CT) tables that data packets traverse in OVS, orchestrates the executed CT actions and all flow table actions into one combined action, and generates an integrated flow entry together with a unified match. When subsequent packets enter OVS and match the integrated flow entry, the combined action is executed directly. Compared with the open-source processing flow, this reduces the number of lookups and greatly improves performance.
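The benefit of such an integrated flow entry can be illustrated with a toy flow cache. This sketch is purely illustrative: the field names, the two-stage pipeline, and the cache structure are assumptions for demonstration, not the actual OVS or XPF data structures.

```python
# Toy model: first packet of a flow walks a two-stage pipeline (CT lookup plus
# flow table lookup) and installs one combined entry; later packets resolve
# everything in a single lookup, which is the XPF idea in miniature.

flow_table = {("10.0.0.1", "10.0.0.2", 80): "forward:port2"}  # illustrative rules
ct_table = {("10.0.0.1", "10.0.0.2", 80): "ct:established"}
integrated_cache = {}

def slow_path(key):
    """First packet: two lookups, then install an integrated entry."""
    action = (ct_table[key], flow_table[key])
    integrated_cache[key] = action
    return action

def fast_path(key):
    """Subsequent packets: a single lookup against the integrated cache."""
    return integrated_cache.get(key)

key = ("10.0.0.1", "10.0.0.2", 80)
first = slow_path(key)   # two lookups; installs the combined entry
later = fast_path(key)   # one lookup; hits the combined entry
```

The saving grows with pipeline depth: the real OVS pipeline can involve many flow tables per packet, so collapsing them into one match-and-action pair removes several lookups per packet, not just one.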

OVSDB It is a lightweight database developed for OVS. It stores various configuration information (such as bridge and port information) about OVS.

3.5.2 Typical Configuration Table 3-10 shows the typical configuration of the open-source Open vSwitch and Huawei-developed XPF acceleration solution.

Table 3-10 Typical compute node configuration for open-source KVM VMs

Compute node
Typical configuration:
● Dual-socket rack servers
● 2 x Huawei Kunpeng 920 processors (5250/7260 processors are recommended)
● 256 GB or larger memory
● 2 x 480 GB SSDs
● 6 x 600 GB SAS HDDs (six or more)
● Avago SAS3508 RAID controller cards
● 1 x 1822 NIC
● An independent power supply
Remarks: OVS is deployed on compute nodes.


3.5.3 Component Principles

DPDK DPDK is a data plane development tool set provided by Intel. It provides library functions and drivers for efficient data packet processing in the user space; in other words, it is a software library used to accelerate packet processing. DPDK focuses on high-performance processing of data packets in network applications: a DPDK application runs in the user space and uses the data plane libraries provided by DPDK to send and receive packets, bypassing the Linux kernel protocol stack. DPDK is not a complete product with which users can build applications directly, and it does not contain tools that interact with the control layer (including the kernel and protocol stack). Compared with native Linux, the Intel DPDK technology greatly improves IPv4 forwarding performance and gives users better cost and performance advantages when migrating packet processing applications. In addition, different services, such as application processing, control processing, and packet processing, can be deployed on a unified platform. Figure 3-15 shows the DPDK architecture.
● Poll Mode Driver (PMD): improves the efficiency of sending and receiving data frames by using a non-interrupt mechanism and zero-copy for data frames entering and leaving the application buffer.
● Flow classification: provides optimized search algorithms for N-tuple matching and longest prefix matching (LPM).
● Ring queue: provides a lock-free mechanism for the ingress and egress queues of single or multiple packet producers and a single packet consumer, effectively reducing system overheads.
● MBUF buffer management: allocates memory to create buffers, creates MBUF objects, and encapsulates actual data frames for applications to use.
● Environment Abstraction Layer (EAL): initializes the PMD, configures and binds CPU cores and DPDK threads, and sets up hugepage memory.
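The ring queue idea above (index-based, fixed-size, no locks for a single producer and single consumer) can be sketched as follows. This is an illustrative Python model of the concept, not DPDK's rte_ring implementation; Python's interpreter lock makes real lock-free behavior moot here, and the point is only the head/tail index arithmetic.

```python
# Illustrative single-producer/single-consumer ring queue with monotonically
# increasing head/tail indices masked into a power-of-two buffer, loosely
# modeled on the DPDK ring concept.

class Ring:
    def __init__(self, size):
        assert size & (size - 1) == 0 and size > 0, "size must be a power of two"
        self.buf = [None] * size
        self.mask = size - 1
        self.head = 0   # next slot to write (producer index)
        self.tail = 0   # next slot to read (consumer index)

    def enqueue(self, item):
        if self.head - self.tail == len(self.buf):
            return False                      # ring is full
        self.buf[self.head & self.mask] = item
        self.head += 1
        return True

    def dequeue(self):
        if self.head == self.tail:
            return None                       # ring is empty
        item = self.buf[self.tail & self.mask]
        self.tail += 1
        return item

r = Ring(4)
for pkt in ("p0", "p1", "p2"):
    r.enqueue(pkt)
first = r.dequeue()
```

Because only the producer writes `head` and only the consumer writes `tail`, the two sides never contend on the same index, which is what lets the real implementation avoid locks.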


Figure 3-15 DPDK architecture

OVS

OVS is a production-grade virtual switch that is widely used in production environments to support the running of entire data center virtual networks. Based on the software-defined networking (SDN) concept, OVS divides its core architecture into a control plane and a data plane: the data plane is responsible for data switching, and the control plane implements the switching policies that guide the data plane. OVS can be divided into three planes: management, data, and control. The data plane consists of the forwarding modules, mainly ovs-vswitchd in user mode and the datapath in kernel mode, together with the associated ovsdb-server database module. The control plane is mainly implemented by the ovs-ofctl module, which interacts with the data plane through the OpenFlow protocol. The management plane is implemented by the various tools provided with OVS, which facilitate control and management of the underlying modules and improve usability.

● ovs-ofctl: a control-plane module and management tool that monitors and manages OpenFlow switches based on the OpenFlow protocol. It displays the status of an OpenFlow switch, including capabilities, configurations, and table entries.
● ovs-dpctl: configures the datapath kernel module of a switch and can create, modify, and delete datapaths. Generally, a single machine supports up to 256 datapaths (numbered 0 to 255), and one datapath corresponds to one virtual network device. This tool can also collect statistics on the traffic passing through each datapath and print flow information.
● ovs-appctl: queries and controls the running OVS daemon processes, including ovs-vswitchd, the datapath, and the OpenFlow controller. It combines the functions of ovs-ofctl and ovs-dpctl. After ovs-vswitchd and similar processes are started, they run as daemons, and this command enables users to control them.
● ovs-vsctl: queries and updates the configuration of ovs-vswitchd, covering bridges, ports, and protocols. It also performs the related database operations against ovsdb-server.
● ovsdb-client: a client program that accesses ovsdb-server and performs database operations through it.
● ovsdb-tool: unlike ovsdb-client, which requires a running ovsdb-server, ovsdb-tool performs database operations directly.
Figure 3-16 shows the OVS component architecture.

Figure 3-16 OVS component architecture

3.6 Open-Source Open vSwitch SR-IOV Hardware Offload and Acceleration Solution

3.6.1 Architecture The Open vSwitch single-root I/O virtualization (SR-IOV) hardware offload solution applies to public and private cloud scenarios. Open vSwitch is an excellent open-source software switch that supports mainstream switch functions, such as Layer 2 switching, network isolation, QoS, and traffic monitoring. It supports OpenFlow, which defines flexible data packet processing specifications and provides the L1–L4 packet processing capability. OVS supports multiple Linux virtualization technologies, including Xen, KVM, and VirtualBox. With hardware offload in SR-IOV mode, flow tables can be offloaded from Open vSwitch to the NICs. The NICs search for and forward packets, and the forwarded packets are sent and received in the VMs directly. This greatly improves the packet search and forwarding speed and improves network performance.


Figure 3-17 shows the Open vSwitch SR-IOV hardware offload framework.

Figure 3-17 Framework of Open-source Open vSwitch hardware offload in SR-IOV mode

Table 3-11 Components used for open-source Open vSwitch hardware offload in SR-IOV mode Name Description

Intelligent NIC The intelligent NICs send, receive, search for, and forward packets. After the NICs search for and forward packets, the VMs can directly receive the packets, greatly improving the forwarding performance.

QEMU QEMU runs in the user mode on the host as a process. Based on the KVM and kernel features, QEMU simulates hardware such as the CPU, memory, and I/O to support running of the guest OS in a process.

ovs-vswitchd It is an OVS daemon and a core component of OVS. It works with the Linux kernel compatibility module to implement flow- based switching. It communicates with the upper-layer controller through the OpenFlow protocol, communicates with the ovsdb-server through the OVSDB protocol, and communicates with the kernel module through Netlink.

OpenFlow OpenFlow separates the data layer from the control layer. The OpenFlow switches forward data at the data layer, and the controllers implement the functions of the control layer. The OpenFlow component in Figure 3-17 mainly implements the control layer.

OVSDB It is a lightweight database developed for OVS. It stores various configuration information (such as bridge and port information) about OVS.


3.6.2 Typical Configuration Table 3-12 describes the typical configuration of the open-source Open vSwitch SR-IOV hardware offload and acceleration solution.

Table 3-12 Typical compute node configuration for open-source KVM VMs

Compute node
Typical configuration:
● Dual-socket rack servers
● 2 x Huawei Kunpeng 920 processors (5250/7260 processors are recommended)
● 256 GB or larger memory
● 2 x 480 GB SSDs
● 6 x 600 GB SAS HDDs (six or more)
● Avago SAS3508 RAID controller cards
● 1 x Mellanox CX5 NIC
● An independent power supply
Remarks: OVS is deployed on compute nodes.

3.6.3 Component Principles

Intelligent NICs In addition to receiving and sending packets, intelligent NICs also store and forward flow tables as follows:
● Store the flow entries offloaded from the software and provide the flow table management function.
● Search the flow tables and forward packets to the corresponding VMs based on the matching flow entries. The VMs receive the packets directly from the intelligent NICs for communication.
● Send packets to the kernel if no matching flow entry is found. After the kernel completes matching, the matched flow entries are offloaded to the NICs.
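The miss-then-offload behavior in the steps above can be modeled with a small sketch. This is illustrative Python, not the actual driver or TC-offload interface; the table names, flow keys, and actions are all assumptions for demonstration.

```python
# Toy model of SR-IOV flow offload: the first packet of a flow misses the NIC
# table and takes the software (kernel) path, which then installs the matched
# entry into the NIC table so later packets are forwarded in hardware.

nic_flow_table = {}                                   # entries offloaded to the NIC
kernel_flow_table = {("vm1", "vm2"): "forward:vf2"}   # software rules
stats = {"hardware": 0, "software": 0}

def forward(flow_key):
    action = nic_flow_table.get(flow_key)
    if action is not None:
        stats["hardware"] += 1            # fast path: NIC match, VM receives directly
        return action
    stats["software"] += 1                # miss: packet goes to the kernel path
    action = kernel_flow_table[flow_key]
    nic_flow_table[flow_key] = action     # offload the matched entry to the NIC
    return action

forward(("vm1", "vm2"))   # first packet: software path, installs the NIC entry
forward(("vm1", "vm2"))   # second packet: hardware path
```

Only the first packet of each flow pays the software-path cost; every subsequent packet of that flow is matched in NIC hardware, which is where the forwarding-performance gain comes from.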

OVS For details, see the OVS description in 3.5.3 Component Principles.


4 Advantages

Based on TaiShan 200 servers, the Kunpeng BoostKit for Virtualization provides a full stack of hardware, OSs, virtualization software, and OpenStack cloud management software, with in-depth performance optimization implemented for the Kunpeng architecture. The Kunpeng BoostKit for Virtualization provides better performance and VM density than competitors' solutions. Performance optimization includes hardware parameter optimization, operating system kernel optimization, and virtualization optimization. The solution delivers the following benefits:

High Performance The Kunpeng BoostKit for Virtualization achieves in-depth virtualization performance improvements by utilizing the multi-core advantages of Kunpeng processors.
● In full load scenarios, the TaiShan 200 server (2 x Huawei Kunpeng 920 5250 processors) delivers higher performance than mainstream high-end dual-socket servers in the industry.
● In comprehensive load scenarios, the TaiShan 200 server (2 x Huawei Kunpeng 920 5250 processors) delivers higher performance than mainstream high-end dual-socket servers in the industry.
The Kunpeng BoostKit for Virtualization optimizes performance in terms of hardware parameters, the OS kernel, and the virtualization platform. Figure 4-1 describes the optimization items.


Figure 4-1 Virtualization performance optimization items

Open Ecosystem Huawei Kunpeng BoostKit for Virtualization has an open software ecosystem: ● Open-source KVM, Docker components, and OpenStack and Kubernetes cloud management platforms ● HUAWEI CLOUD Stack (HCS) private cloud


5 Solution Networking

The system networking of the Kunpeng BoostKit for Virtualization consists of the management plane and service plane, as shown in Figure 5-1.

Figure 5-1 System networking of the Kunpeng BoostKit for Virtualization

The network of the Kunpeng BoostKit for Virtualization consists of the management plane and service plane. If big data services are deployed on the cloud, the big data service cluster is generally deployed independently. OpenStack components and BMC out-of-band management are deployed in the management zone. Compute nodes, storage nodes, bare metal server (BMS) nodes, and physical gateways are deployed in the service zone.


Table 5-1 BMC access network configuration
● GE electrical port (management network): Connected to the management top-of-rack (TOR) switch.

Table 5-2 Management node network configuration
● 2 x 10GE optical ports (management network): Connected to the management TOR switch. The two ports form a logical bond port in active/standby mode.
● 2 x 10GE optical ports (storage network): Connected to the storage TOR switch. The two ports form a logical bond port in active/standby mode.

The following table shows an alternative management node network configuration.

● 2 x 10GE optical ports (management network + storage network): Connected to the management TOR switch. The two ports form a logical bond port in active/standby mode.

Table 5-3 Network node network configuration
● 2 x GE optical ports (management network): Connected to the management TOR switch. The two ports form a logical bond port in active/standby mode.
● 2 x 10GE optical ports (service network): Connected to the integrated access TOR switch to transmit vRouter traffic. The two ports form a logical bond port in Link Aggregation Control Protocol (LACP) mode.
● 2 x 10GE optical ports (service network): Connected to the integrated access TOR switch to transmit Load Balancer as a Service (LBaaS) traffic. The two ports form a logical bond port in LACP mode.
● 2 x 10GE optical ports (service network): Connected to the integrated access TOR switch to transmit Linux virtual server (LVS) traffic. The two ports form a logical bond port in active/standby mode.
● 2 x 10GE optical ports (service network): Connected to the integrated access TOR switch to transmit Nginx traffic. The two ports form a logical bond port in active/standby mode.

Table 5-4 Compute node network configuration
● 2 x 10GE optical ports (management network + service network): Connected to the compute TOR switch. The two ports form a logical bond port in active/standby mode.
● 2 x 10GE optical ports (storage network): Connected to the compute TOR switch. The two ports form a logical bond port in active/standby mode.

Table 5-5 Storage node network configuration
● 2 x 10GE optical ports (management network + service network): Connected to the compute TOR switch. The two ports form a logical bond port in active/standby mode.


The following table shows an alternative storage node network configuration.

● 2 x 25GE optical ports (management network + service network): Connected to the compute TOR switch. The two ports form a logical bond port in active/standby mode.

Table 5-6 BMS node network configuration

● Port type: 2 x 10GE optical ports. Access network: Management network + service network. Description: Connected to the compute TOR switch. The two ports form a logical bond port in active/standby mode.

● Port type: 2 x 10GE optical ports. Access network: Storage network. Description: Connected to the compute TOR switch. The two ports form a logical bond port in active/standby mode.
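The bond modes in the tables above correspond to standard Linux bonding configurations. The following is a minimal sketch using NetworkManager's nmcli on CentOS; the interface names (enp1s0f0 and so on) and connection names are illustrative, not taken from this solution.

```shell
# LACP bond (IEEE 802.3ad) for service ports that carry vRouter
# and LBaaS traffic:
nmcli connection add type bond con-name bond0 ifname bond0 \
  bond.options "mode=802.3ad,miimon=100"
nmcli connection add type ethernet con-name bond0-p1 ifname enp1s0f0 master bond0
nmcli connection add type ethernet con-name bond0-p2 ifname enp1s0f1 master bond0

# Active/standby bond (active-backup) for the management, storage,
# LVS, and Nginx ports:
nmcli connection add type bond con-name bond1 ifname bond1 \
  bond.options "mode=active-backup,miimon=100"
nmcli connection add type ethernet con-name bond1-p1 ifname enp2s0f0 master bond1
nmcli connection add type ethernet con-name bond1-p2 ifname enp2s0f1 master bond1

# Verify the negotiated mode and active slave:
cat /proc/net/bonding/bond0
```

For an LACP bond, the peer ports on the TOR switch must be grouped into a matching LACP port channel; active-backup bonds need no switch-side configuration.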


6 Feature List

● Feature: Performance. Sub-feature: 2 x Kunpeng 920 5250 processors. Description: KVM VM and Docker container performance benchmarked against mainstream high-end dual-socket servers in the industry (20 cores, 2.4 GHz): 45% higher in the full load scenario and 9% higher in the comprehensive load scenario. Constraints: none.

● Feature: Performance. Sub-feature: 2 x Kunpeng 920 5250 processors. Description: KVM VM and Docker container performance benchmarked against mainstream high-end dual-socket servers in the industry (20 cores, 2.5 GHz): 35% higher in the full load scenario and the same in the comprehensive load scenario. Constraints: none.

● Feature: Performance. Sub-feature: 2 x Kunpeng 920 5250 processors. Description: KVM VM and Docker container performance benchmarked against mainstream high-end dual-socket servers in the industry (24 cores, 3.0 GHz): 8% higher in the full load scenario and 12% lower in the comprehensive load scenario. Constraints: none.


● Feature: OpenStack (Stein) hybrid deployment. Sub-feature: Hybrid deployment of BMSs and VMs. Description: Supports hybrid deployment of BMSs and KVM VMs in availability zones (AZs). Each AZ must contain only x86 or Kunpeng servers. For details, see OpenStack Stein VM and BMS Hybrid Deployment Guide (CentOS 7.6). Constraints: 1. The management and control nodes do not support hybrid deployment. 2. Kunpeng and x86 servers are deployed hybridly in AZs.

● Feature: Kubernetes (1.15.1) hybrid deployment. Sub-feature: Hybrid deployment of containers. Description: Supports hybrid deployment of Kunpeng and x86 servers for Docker containers. For details, see Hybrid Container Deployment Guide (CentOS 7.6). Constraints: The Kubernetes management nodes do not support hybrid deployment.

● Feature: VM lightweight management software. Sub-feature: oVirt (4.4.0) lightweight management. Description: Supports oVirt on the Kunpeng platform. For details, see Kunpeng oVirt Lightweight Virtualization Management Platform Deployment Guide (CentOS 8.1). Constraints: none.

● Feature: Virtualized network acceleration. Sub-feature: OVS (2.12) software offload. Description: Supports OVS (2.12) software offload and acceleration on the Kunpeng platform. For details, see XPF User Guide. Constraints: 1. It applies to the VM OVS scenario. 2. Only cluster type (CT) services are accelerated.
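The per-AZ hybrid deployment of Kunpeng and x86 servers described for OpenStack (Stein) above can be sketched with the standard OpenStack CLI; the aggregate, AZ, host, flavor, image, and network names below are illustrative.

```shell
# One host aggregate (and AZ) per CPU architecture, so that each AZ
# contains only Kunpeng or only x86 compute nodes:
openstack aggregate create --zone az-kunpeng agg-kunpeng
openstack aggregate add host agg-kunpeng compute-kunpeng-01

openstack aggregate create --zone az-x86 agg-x86
openstack aggregate add host agg-x86 compute-x86-01

# Place a VM explicitly into the Kunpeng AZ:
openstack server create --availability-zone az-kunpeng \
  --flavor m1.large --image centos-7.6-aarch64 --network net0 vm01
```

The management and control nodes stay on a single architecture, consistent with constraint 1 above.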
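For the Kubernetes hybrid deployment above, mixed Kunpeng (arm64) and x86 (amd64) worker nodes are typically handled with the built-in kubernetes.io/arch node label and a nodeSelector, so that each Docker image runs only on nodes of the matching architecture. The deployment and image names below are illustrative.

```shell
# Inspect the architecture label Kubernetes assigns to every node:
kubectl get nodes -L kubernetes.io/arch

# Pin a workload to Kunpeng (arm64) nodes via nodeSelector:
cat > nginx-arm64.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-arm64
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx-arm64
  template:
    metadata:
      labels:
        app: nginx-arm64
    spec:
      nodeSelector:
        kubernetes.io/arch: arm64   # schedule only on Kunpeng nodes
      containers:
      - name: nginx
        image: nginx:1.18           # multi-arch image; arm64 variant is pulled
EOF
kubectl apply -f nginx-arm64.yaml
```

Multi-arch (manifest list) images let the same image reference resolve to the correct architecture on each node.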


● Feature: Performance improvement of light-loaded VMs. Sub-feature: Light-loaded VM V-Turbo technology. Description: Supports virtual hyper-threading and virtual-physical collaborative scheduling. Constraints: 1. It applies to the light-loaded scenario in which the CPU usage of the physical machine is less than 50%. 2. This feature requires CPU overcommitment.

● Feature: Virtualization resource fragment reduction. Sub-features: Memory interleaving; guest NUMA. Description: Supports the memory interleaving and guest NUMA features. Constraints: The performance of VMs using these two features is lower than that of VMs using the NUMA feature.
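The memory interleaving and guest NUMA features above map to standard libvirt domain XML elements. The following is a minimal sketch for a VM with 8 vCPUs and 16 GiB of memory on a four-node host; all vCPU counts, sizes, and the host nodeset are illustrative. Edit the VM with "virsh edit <domain>" and add fragments like these:

```shell
cat <<'EOF'
<!-- Guest NUMA: expose two virtual NUMA nodes to the VM -->
<cpu>
  <numa>
    <cell id='0' cpus='0-3' memory='8' unit='GiB'/>
    <cell id='1' cpus='4-7' memory='8' unit='GiB'/>
  </numa>
</cpu>

<!-- Memory interleaving: spread guest memory across host NUMA nodes -->
<numatune>
  <memory mode='interleave' nodeset='0-3'/>
</numatune>
EOF
```

Interleaving trades some locality (hence the performance constraint above) for the ability to place VMs whose memory no single host NUMA node could hold, reducing resource fragments.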

● Feature: Hardware accelerator virtualization. Sub-feature: Function. Description: Supports the Kunpeng Accelerator Engine (KAE) on KVM VMs and Docker containers. Constraints: none.

● Feature: Hardware accelerator virtualization. Sub-feature: Component support. Description: Supports web Nginx applications and Ceph applications. Constraints: none.

● Feature: Hardware accelerator virtualization. Sub-feature: Encryption algorithms. Description: Supports RSA, AES, SM3, SM4, zlib, and gzip. Constraints: none.

● Feature: Hardware accelerator virtualization. Sub-feature: Performance. Description: Provides 20% higher performance than software encryption. Constraints: none.
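Because KAE exposes its accelerators through an OpenSSL engine, a quick way to exercise the offload from inside a VM or container is the openssl CLI. The engine ID "kae" is an assumption here; consult the KAE documentation for the exact engine ID and package names on your OS.

```shell
# Verify that the KAE engine loads and is usable (engine ID assumed):
openssl engine -t kae

# Compare accelerated throughput against the software baseline:
openssl speed -engine kae rsa2048            # offloaded RSA
openssl speed -engine kae -evp aes-128-cbc   # offloaded AES
openssl speed rsa2048                        # software-only reference
```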


7 Software Compatibility

Use the Compatibility Checker to obtain information about the software supported by Kunpeng BoostKit for Virtualization.


8 Process

Figure 8-1 shows the end-to-end process for porting, deploying, and tuning the Kunpeng virtualization solution.

Figure 8-1 End-to-end process for porting, deploying, and tuning the Kunpeng virtualization solution


9 Reference

● Information and figures on the official QEMU, libvirt, OpenStack, and Ceph websites are referenced in sections 3.1.1 Architecture and 3.1.2 Typical Configuration.
– QEMU website: https://www.qemu.org/
– libvirt website: https://libvirt.org/
– OpenStack website: https://www.openstack.org/
– Ceph website: https://ceph.io/
● Information and figures on the official Kubernetes and Docker websites are referenced in section 3.1.3 Component Principles.
– Kubernetes website: https://kubernetes.io/
– Docker website: https://www.docker.com/


A Change History

● 2021-03-23: This issue is the sixth official release, which incorporates the following change: Changed the solution name from "Kunpeng virtualization solution" to "Kunpeng BoostKit for Virtualization".

● 2021-01-07: This issue is the fifth official release, which incorporates the following changes:
– Added 3.6 Open-Source Open vSwitch SR-IOV Hardware Offload and Acceleration Solution.
– Added the open-source oVirt platform to the cloud cluster management platforms in 2 Solution Architecture.

● 2020-10-24: This issue is the fourth official release, which incorporates the following changes:
– Added 3.2 Open-Source oVirt and KVM Solution and 3.5 Open-Source Open vSwitch and Huawei-developed XPF Acceleration Solution.
– Added the following features in 6 Feature List: VM lightweight management software, virtualized network acceleration, improved light-loaded VM performance, and reduced virtualization resource fragments.

● 2020-09-21: This issue is the third official release, which changed the solution name from "Kunpeng cloud platform solution" to "Kunpeng virtualization solution".

● 2020-07-03: This issue is the second official release, which modified the hybrid deployment features in 6 Feature List.


● 2020-06-10: This issue is the first official release.
