Accelerating Virtual Server with OpenNPU
Gilad Ben-Yossef, Principal Software Architect
NetDev 1.2, October 2016

How it all started: Tsujigiri (辻斬り)

Tsujigiri (辻斬り or 辻斬, tsuji-giri, literally "crossroads killing") is a Japanese term for the practice in which a samurai, after receiving a new katana or developing a new fighting style or weapon, tests its effectiveness by attacking a human opponent, usually a random defenseless passer-by, often at night. https://en.wikipedia.org/wiki/Tsujigiri

Linux Virtual Server

What is LVS? LVS (Linux Virtual Server) implements transport-layer load balancing inside the Linux kernel, so-called Layer-4 switching.

LVS running on a host acts as a load balancer in front of a cluster of real servers: it directs requests for TCP/UDP-based services to the real servers and makes the services of the real servers appear as a single virtual service on one IP address.
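As a concrete (if simplified) picture of what Layer-4 switching means here, the sketch below shows the core decision: map a new connection arriving at the virtual IP to one of the real servers. This is plain C with made-up types and a plain round-robin pick; it is an illustration of the idea, not IPVS source code.

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical types for illustration; IPVS's real structures differ. */
    struct real_server {
        uint32_t ip;        /* real server IPv4 address */
        uint16_t port;      /* real service port */
    };

    struct virtual_service {
        uint32_t vip;               /* the single virtual IP clients see */
        uint16_t vport;
        struct real_server *dests;  /* the cluster of real servers behind the VIP */
        size_t   ndests;
        size_t   rr_next;           /* round-robin cursor */
    };

    /* Pick the real server that will handle a new connection to the VIP.
     * LVS makes this choice once per connection and remembers it in a
     * connection table, so later packets of the flow reach the same server. */
    static struct real_server *schedule_rr(struct virtual_service *svc)
    {
        if (svc->ndests == 0)
            return NULL;

        struct real_server *dest = &svc->dests[svc->rr_next % svc->ndests];
        svc->rr_next++;
        return dest;
    }

Real IPVS offers a number of such schedulers (round-robin, least-connections, source hashing and others); the point is only that the per-connection decision is small enough to live in the kernel's packet path.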

LVS has been in active use for 14 years:

“Yesterday at DockerCon Europe, Andrey Sibiryov, a senior engineer at Uber Technologies, demonstrated how to improve load-balancing performance using an open-source technology that’s been part of the Linux kernel for more than a decade — IPVS.”

“Wikimedia uses LVS for balancing traffic over multiple servers, see also load balancing architecture”

NPS-400: a Network Processor

• NPS-400 is a Network Processor
  - Think "GPU, but for networking"
  - An NPU lets you program your network by writing a program that processes packets at data center line rates (a generic sketch of this model follows below)
• NPUs used to be part of the secret sauce of carrier equipment
  - e.g. the NP-5, the NPS predecessor, is part of the Cisco ASR-9K service router
  - These programmable devices were "buried" inside proprietary silos
• We are bringing them into the open and into the data center
  - White box systems from Mellanox and ODMs
  - OpenNPU – an open source (GPL v2 / BSD) SDK
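To make "write a program that processes packets at line rate" concrete, here is a deliberately generic run-to-completion loop of the kind an NPU data-path thread executes. All names here (npu_rx_frame, npu_tx_frame, lookup_and_rewrite) are invented for illustration; they are not the OpenNPU/EZdp API.

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical frame descriptor and I/O hooks; a real NPU SDK provides
     * its own equivalents, backed by hardware receive/transmit engines. */
    struct frame {
        uint8_t *data;
        uint32_t len;
        uint32_t out_port;
    };

    extern bool npu_rx_frame(struct frame *f);        /* wait for HW to hand us a frame */
    extern void npu_tx_frame(const struct frame *f);  /* hand the frame back to HW for TX */
    extern void npu_drop_frame(struct frame *f);

    /* Invented per-packet logic: parse headers, look up a table, rewrite, forward. */
    extern bool lookup_and_rewrite(struct frame *f);

    /* Every hardware thread runs this same loop; the hardware spreads incoming
     * frames across threads, so throughput scales with thread count rather than
     * with single-thread speed. */
    void datapath_thread_main(void)
    {
        struct frame f;

        for (;;) {
            if (!npu_rx_frame(&f))
                continue;
            if (lookup_and_rewrite(&f))
                npu_tx_frame(&f);
            else
                npu_drop_frame(&f);
        }
    }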

Accelerated Linux Virtual Server

ALVS is LVS with the data path running on a network processor: the same program, at 400 Gbps.

[Diagram: accelerated load-balancer data path. KeepAliveD provides management, control and configuration from the host (IP-A); the LVS Linux kernel data path remains in place, while up to 400 Gbps of load-balancer request traffic for the VIP passes through the NPS-based load balancer (the ALVS NPS data path), which keeps flow state and counters on the NPS. The forwarding decision is taken once, at flow establishment, when a flow is assigned to a server (IP-B, IP-C); response traffic from the servers passes directly through the ToR router to the WAN, so it is not limited by NPS bandwidth. A sketch of this pattern follows below.]
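The key point of the diagram is that the forwarding decision is taken once, at flow establishment, and that server replies go straight out through the ToR rather than back through the NPS. The sketch below shows that pattern in plain C with invented names (conn_lookup, pick_server and so on); it is one plausible reading of the diagram, not the ALVS source.

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical 5-tuple flow key and connection entry, for illustration only. */
    struct flow_key { uint32_t saddr, daddr; uint16_t sport, dport; uint8_t proto; };
    struct conn     { struct flow_key key; uint32_t server_ip; };

    extern struct conn *conn_lookup(const struct flow_key *k);               /* invented */
    extern struct conn *conn_create(const struct flow_key *k, uint32_t srv); /* invented */
    extern uint32_t     pick_server(const struct flow_key *k);               /* scheduler */
    extern void         forward_to_server(struct conn *c);                   /* invented */

    void handle_client_packet(const struct flow_key *k)
    {
        struct conn *c = conn_lookup(k);

        if (!c) {
            /* Decision taken on flow establishment: choose a real server once
             * and remember it, so every later packet of the flow follows it. */
            c = conn_create(k, pick_server(k));
        }

        /* Forward toward the chosen server. With direct return, the server
         * answers the client itself (using the VIP as source), so response
         * traffic never crosses the NPS and is not limited by its bandwidth. */
        forward_to_server(c);
    }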

NPS-400 Main Features

• 400 Gbps line rate
  - 600 Mpps wire speed, with up to 960 Gbps oversubscription
• Hardware Traffic Manager
  - 1M queues, 5-level H-QoS
• 960 Gbps of network I/O
  - Including 10GE, 40GE, 100GE and 400G
• 256 CTOP cores – 4,096 CPUs (SMT threads) (see the quick arithmetic check below)
  - Specialized instruction set for network processing
  - Runs SMP Linux (we're upstream)
• Hardware acceleration engines
  - Crypto (180 Gbps of IPsec), buffer allocation, network order engines, DPI, TCAM
• Commodity DDR (96 GB)
  - Unlimited tables, states and counters at wire-speed performance
• C on Linux programmable
  - Not an ASIC controlled by Linux: it is a processor that runs Linux

[Block diagram: NPC core clusters surrounded by memory controllers (MC), DDR channels, TCAM, statistics engines, NDMA, MACs, PMU/IFU/ICU/BMU units, traffic managers (TM) and PCIe.]
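A quick back-of-the-envelope check of the headline numbers, treating the quoted maxima independently (the core clock is not given here, so nothing is implied about cycles per packet):

\[
\frac{600\ \text{Mpps}}{4096\ \text{threads}} \approx 1.46 \times 10^{5}\ \text{packets/s per thread}
\qquad\Rightarrow\qquad
\frac{1}{1.46 \times 10^{5}\ \text{s}^{-1}} \approx 6.8\ \mu\text{s per packet per thread}
\]

In other words, the 600 Mpps figure is carried by thread-level parallelism, with each thread having several microseconds to spend on each packet.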

LVS to ALVS Software Migration

[Diagram: LVS-to-ALVS software migration. On the left, standard LVS on x86: management tools and configuration on top of IPVS data-plane processing in the Linux kernel. On the right, ALVS: the same management tools and configuration run unmodified on x86, a Mellanox-supplied LVS Reflector Daemon mirrors their state through EZcp, and the IPVS data-plane processing moves onto the NPS NPU, which itself runs the Linux kernel.]

Detailed Software Architecture

[Diagram: detailed software architecture.
Linux side: KeepAliveD and the ALVS daemon run in user space. The ALVS daemon listens to IPVS and FDB/ARP NETLINK control and configuration messages and updates the NPS tables via the EZcp interface over PCIe. The Linux kernel data path shows the Netfilter/IPVS hooks: PREROUTING, routing, LOCAL_IN (ip_vs_in), FORWARD, LOCAL_OUT, POSTROUTING, plus ip_vs_forward_icmp (return, LVS-NAT) and ip_vs_out / ip_vs_post_routing (LVS-NAT only), together with the IPVS state, IPVS config and FDB tables. IPVS state is synchronized over Ethernet via IPVS HA SYNC messages.
NPS side: the ALVS data path built on EZdp and FrameLib, with classify, IPVS, IPv4 route, LAG and SFT stages and a punt path back to the host.]
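The "ALVS Daemon" box above is essentially a listener: it subscribes to IPVS, FDB and ARP netlink notifications on the host and mirrors the resulting state into NPS tables over EZcp/PCIe. IPVS events travel over a generic-netlink family and take more boilerplate to subscribe to, so as a smaller illustration of the listening side, here is a minimal rtnetlink monitor for neighbour (ARP/bridge FDB) changes; nps_update_neigh_table() is a hypothetical stub, not a real EZcp call.

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <linux/netlink.h>
    #include <linux/rtnetlink.h>

    int main(void)
    {
        int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
        if (fd < 0) {
            perror("socket");
            return 1;
        }

        struct sockaddr_nl sa = {
            .nl_family = AF_NETLINK,
            .nl_groups = RTMGRP_NEIGH,   /* ARP and bridge FDB change notifications */
        };
        if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
            perror("bind");
            return 1;
        }

        char buf[8192];
        for (;;) {
            ssize_t len = recv(fd, buf, sizeof(buf), 0);
            if (len < 0) {
                perror("recv");
                break;
            }
            for (struct nlmsghdr *nh = (struct nlmsghdr *)buf;
                 NLMSG_OK(nh, len); nh = NLMSG_NEXT(nh, len)) {
                if (nh->nlmsg_type == RTM_NEWNEIGH ||
                    nh->nlmsg_type == RTM_DELNEIGH) {
                    /* Here the real daemon would translate the change into an
                     * NPS table update over EZcp/PCIe, e.g.
                     * nps_update_neigh_table(NLMSG_DATA(nh));  (hypothetical) */
                    printf("neighbour/FDB table changed (nlmsg_type %u)\n",
                           nh->nlmsg_type);
                }
            }
        }
        close(fd);
        return 0;
    }

A production daemon would also dump the existing tables at startup, so the NPS starts from a consistent snapshot, and handle ENOBUFS overruns on the netlink socket; both are omitted here.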

Minimal Viable Product

• Minimal
  - Single forwarding mode out of 3
  - Three scheduling algorithms out of 10
  - TCP/IPv4 (SCTP, UDP and IPv6 to be added later)
• Viable
  - LVS look and feel: same API, same CLI, same log mechanism; integrates with an unmodified management plane
  - Robust and resilient: covers the corner cases (testing already revealed one bug in LVS itself…); supports active/passive failover
• Product
  - 400x performance
  - Scales with your ToR switch

ALVS Test Setup

[Test setup: an IXIA traffic generator drives the load balancer with a 100 Gb/s port on the client side and a 100 Gb/s port on the server side. Two services (VIPs) are configured, each with 5 servers; the IXIA simulates a large number of clients (a large range of IP addresses and ports).]

Test limited by testing equipment scale

Performance

Criteria               | Lab test (25% capacity) | Simulation (100% capacity)
Concurrent connections | 30 M                    | 128 M (200 M)
Connection setup rate  | 1 M/s                   | 3 M/s
Request bandwidth      | 75 Gbps                 | 400 Gbps

Connecting NPUs to Linux networking stack

• It's a useful thing to do
  - If you need an L4 load balancer and love LVS, running it at 400 Gbps / 200 M connections on an open source platform is useful
• We need to put the low-level NPU driver into the kernel
  - Since the NPU is a programmable entity, the remoteproc subsystem is possibly the right way (a hypothetical driver skeleton follows below)
• We need to figure out how to hook the NPU into the network stack
  - Switchdev? XDP? Something else?
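To make the remoteproc suggestion concrete: remoteproc already models "a processor the host loads firmware onto and starts/stops", which is roughly what an NPU looks like from the host's side. Below is a minimal, hypothetical driver skeleton showing the shape such a driver might take; the device name, firmware file and empty start/stop bodies are invented, and this is not an actual NPS driver.

    /* Hypothetical sketch only: not an actual NPS driver. Device and firmware
     * names are invented; .remove and error handling are omitted for brevity. */
    #include <linux/module.h>
    #include <linux/platform_device.h>
    #include <linux/remoteproc.h>

    static int nps_rproc_start(struct rproc *rproc)
    {
        /* Release the NPU data-path cores from reset and point them at the
         * firmware image remoteproc has already loaded into NPU memory. */
        return 0;
    }

    static int nps_rproc_stop(struct rproc *rproc)
    {
        /* Halt the data-path cores. */
        return 0;
    }

    static const struct rproc_ops nps_rproc_ops = {
        .start = nps_rproc_start,
        .stop  = nps_rproc_stop,
    };

    static int nps_rproc_probe(struct platform_device *pdev)
    {
        struct rproc *rproc;

        rproc = rproc_alloc(&pdev->dev, "nps-npu", &nps_rproc_ops,
                            "nps_datapath.fw", 0);
        if (!rproc)
            return -ENOMEM;

        platform_set_drvdata(pdev, rproc);
        return rproc_add(rproc);   /* register the NPU as a remote processor */
    }

    static struct platform_driver nps_rproc_driver = {
        .probe  = nps_rproc_probe,
        .driver = { .name = "nps-rproc" },
    };
    module_platform_driver(nps_rproc_driver);

    MODULE_LICENSE("GPL v2");
    MODULE_DESCRIPTION("Hypothetical NPU remoteproc skeleton (illustration only)");

Whether remoteproc alone is enough is exactly the open question: it covers firmware load and lifecycle, but not how the resulting data path shows up to the network stack, which is where the switchdev/XDP question comes in.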

Vision Architecture

[Diagram: vision architecture.
Host (x86) side: VNFs (layers 2-7) and remote controllers talk through the OPNFV g-API; a NOS and user-space layer 2-7 services sit on an Open Network Services Interface API; in the kernel, an Open NPU Control API ties the Linux network stack (switchdev) to the NPS driver.
NPS side: the OpenNPU API ("the CUDA of NPUs") exposes an Open NPU Data Plane API to Mellanox-provided middleware (stateful connection tracking, DPI, recognition, crypto), commercial third-party data planes, and custom data-plane applications, all running on an NPS that itself runs Linux.]

Connecting NPUs to Linux networking stack – cont.

• The ALVS data path program ended up very different from the IPVS code
  - The architecture of an NPU is very different from a CPU + NIC: hardware engines for packet scheduling and order restoration, and a memory architecture that does not rely on caches
  - The program ended up very similar in design to Google's Maglev, with hardware engines taking the place of some of the code blocks (a sketch of the Maglev lookup-table idea follows below)
  - This has implications for ideas such as using eBPF/XDP to bring the NPU into the kernel: yes, you can run the eBPF bytecode, but the program is written under different assumptions
• We ran into networking stack scaling issues when trying to synchronize state with the NPU
  - IPVS slowed to a crawl well before we reached 30 M flows
  - What does it mean when an NPU slave device can hold more state than the OS on the host?
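For readers who do not know Maglev: its central trick is to precompute a lookup table so that, per packet, the balancer does one hash and one table read, and so that adding or removing a backend moves only a small fraction of flows. The sketch below shows that published table-population algorithm in plain C with stubbed hash functions; the deck says only that the ALVS design ended up similar to Maglev, so treat this as background, not as ALVS code.

    #include <stdint.h>
    #include <string.h>

    #define TABLE_SIZE   65537   /* prime, as the Maglev paper requires */
    #define MAX_BACKENDS 16

    /* Stub hashes: any two independent hash functions of the backend identity work. */
    extern uint32_t hash1(int backend);
    extern uint32_t hash2(int backend);

    /* Populate the lookup table: each backend claims slots by walking its own
     * permutation of [0, TABLE_SIZE) until every slot is owned. The result gives
     * backends near-equal shares and moves few entries when the backend set changes.
     * Assumes 0 < nbackends <= MAX_BACKENDS. */
    void maglev_populate(int table[TABLE_SIZE], int nbackends)
    {
        uint32_t offset[MAX_BACKENDS], skip[MAX_BACKENDS], next[MAX_BACKENDS];
        int filled = 0;

        if (nbackends <= 0 || nbackends > MAX_BACKENDS)
            return;

        for (int i = 0; i < nbackends; i++) {
            offset[i] = hash1(i) % TABLE_SIZE;
            skip[i]   = hash2(i) % (TABLE_SIZE - 1) + 1;
            next[i]   = 0;
        }
        memset(table, -1, TABLE_SIZE * sizeof(int));   /* -1 means "unowned" */

        while (filled < TABLE_SIZE) {
            for (int i = 0; i < nbackends && filled < TABLE_SIZE; i++) {
                uint32_t c = (offset[i] + (uint64_t)next[i] * skip[i]) % TABLE_SIZE;
                while (table[c] >= 0) {              /* slot taken: next slot in permutation */
                    next[i]++;
                    c = (offset[i] + (uint64_t)next[i] * skip[i]) % TABLE_SIZE;
                }
                table[c] = i;
                next[i]++;
                filled++;
            }
        }
    }

    /* Per packet: hash the connection 5-tuple once and do a single table read. */
    int maglev_pick(const int table[TABLE_SIZE], uint32_t flow_hash)
    {
        return table[flow_hash % TABLE_SIZE];
    }

The per-packet side is then a single indexed read (maglev_pick), which is the kind of operation hardware lookup engines and DDR-backed tables handle at line rate.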

Thank you!


http://www.opennpu.org

ALVS: https://github.com/Mellanox/ALVS
