RDMA Device Isolation Dror Goldenberg, Parav Pandit - Mellanox Technologies

ISC Container Workshop Frankfurt, June 2019

© 2019 Mellanox Technologies 1 RDMA Device Access Paths

interface rdma ▪ RDMA Connection Manager (CM) rds tool ▪ Verbs (via uverbs char device) nvme ▪ Rdmatool (via netlink sockets) fabrics ▪ Resource, connection information rdma uverbs ▪ Sysfs file interface rdmacm device device ▪ Counters nfs- ▪ Network addresses (IP/GID/LID) rdma Kernel ▪ More.. RDMA umad, ▪ UMAD char device subsystem issm ▪ MAD packets ▪ Device, address information device rdma device

© 2019 Mellanox Technologies 2 RDMA Device - The Need for Isolation

▪ Device cgroup - char devices ACL sysfs ▪ Too coarse for network level interface rdma ▪ RDMA cgroup - # of resources rds tool ▪ Does not control the network access nvme ▪ RDMA is yet another network device fabrics rdma uverbs Need to protect RDMA devices device device rdmacm ▪ At the network level nfs- ▪ In a reliable, unified, deterministic way rdma Kernel ▪ Fit in existing orchestration frameworks RDMA umad, (CNI, device plugin…) subsystem issm ▪ Future proof for new apps, interfaces, device ▪ Backward compatible rdma device

© 2019 Mellanox Technologies 3 RDMA Devices in Net Namespace ▪ Isolation ring to access the RDMA device sysfs ▪ Use existing net namespace of kernel interface rdma rds tool ▪ RDMA network namespace modes ▪ Exclusive or shared nvme ▪ Via netlink fabrics rdma uverbs ▪ Default as shared mode (backward compatible) rdmacm device device nfs- rdma ▪ RDMA device associated with net namespace umad, ▪ New netlink command net namespace = foo issm device ▪ Integrates with CNI and device plugin of K8s, Docker network plugin extension

rdma device

net namespace = bar

© 2019 Mellanox Technologies 4 RDMA Devices in Network Namespaces

net namespace net namespace net namespace Kubernetes/ = A = B = C Docker Pod1/ Pod2/ Pod3/ Container1 Container2 Container3

SR-IOV SR-IOV Device SR-IOV CNI ibdev=mlx5_1 ibdev=mlx5_2 ibdev=mlx5_3 operator Plugin netdev= netdev= netdev= ib1/eth1 ib2/eth2 ib3/eth3

PF VF-1 VF-2 VF-3

Mellanox ConnectX Adapter Card with SR-IOV Enabled ▪ Every container/POD has an IB device (mlx5_1,2,3) and netdevice ▪ Isolation is done on the net namespace level © 2019 Mellanox Technologies 5 Additional Information…

▪ Examples ▪ Query, Change RDMA subsystem mode ▪ $ rdma system show ▪ $ rdma system set netns exclusive

▪ Move RDMA device to new network namespace ▪ $ ip netns add foo ▪ $ rdma dev set mlx5_1 netns foo

▪ Current status (6/15/2019) ▪ Merged to upcoming 5.2 and /rdma tool ▪ Merged to netlink golang

▪ Ahead of us ▪ Integrate to docker sr-iov plugin ▪ Integrate to SR-IOV operator and CNI plugin

© 2019 Mellanox Technologies 6 Thank You

© 2019 Mellanox Technologies 7