Securing RDMA for High-Performance Datacenter Storage Systems

Anna Kornfeld Simpson, Adriana Szekeres (Paul G. Allen School of Computer Science & Engineering, University of Washington), Jacob Nelson, Irene Zhang (Microsoft Research)

Remote Direct Memory Access (RDMA) does CPU-bypass over the datacenter network with only a few microseconds of latency

RDMA over Converged Ethernet (RoCEv2) packet: Ethernet Head. | IP Head. | UDP Head. | RDMA Head. & Data
Source: RoCEv2 spec, InfiniBand Trade Association, 2014

Abstracted RDMA portion of RoCEv2 packet: Queue Pair Info. | Remote Memory Addr. and r_key | Payload
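
The abstracted packet above can be sketched as a small data structure. This is a toy model only: the field widths (4-byte queue pair number, 8-byte remote address, 4-byte r_key) and the names `AbstractRdmaPacket`, `pack`, and `unpack` are assumptions made for illustration and do not reproduce the real RoCEv2 wire format.

```python
import struct
from dataclasses import dataclass

# Hypothetical toy layout, NOT the real RoCEv2 header:
# 4-byte queue pair number, 8-byte remote address, 4-byte r_key.
_HEADER = struct.Struct("!IQI")


@dataclass(frozen=True)
class AbstractRdmaPacket:
    queue_pair: int    # "Queue Pair Info." in the slide
    remote_addr: int   # "Remote Memory Addr."
    r_key: int         # key that authorizes access to the remote region
    payload: bytes     # data carried by the operation

    def pack(self) -> bytes:
        """Serialize the toy header followed by the payload."""
        header = _HEADER.pack(self.queue_pair, self.remote_addr, self.r_key)
        return header + self.payload

    @classmethod
    def unpack(cls, raw: bytes) -> "AbstractRdmaPacket":
        """Parse bytes produced by pack()."""
        queue_pair, remote_addr, r_key = _HEADER.unpack_from(raw)
        return cls(queue_pair, remote_addr, r_key, raw[_HEADER.size:])
```

Everything a receiver needs to act on the request (address, r_key, data) sits in these fields, which is why a sender who can fill them in correctly is treated as authorized.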

Example RDMA System: Pilaf (2013): Put (SEND)

[Figure: Clients and Server (CPU, Memory, NIC), with steps 1-5 of a Put marked]

Pilaf (2013): Unlike Put, Get is CPU-bypassing

RDMA not designed for datacenter security needs

Security weaknesses discovered over past 2 decades (see Section 2 of paper for citations):
• Confidentiality: packets in plaintext
• Integrity: no packet integrity check or authentication
• Availability: denial of service
• Side channels: non-random r_keys and more
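
To make the integrity bullet concrete, here is a minimal sketch of what a per-packet authentication tag adds. It assumes a key already shared between sender and receiver, and it uses HMAC-SHA256 purely as a stand-in chosen for this example; the function names `seal` and `open_sealed` are hypothetical. It covers integrity only, not confidentiality.

```python
import hashlib
import hmac

TAG_LEN = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def seal(key: bytes, packet: bytes) -> bytes:
    """Append an HMAC-SHA256 tag so the receiver can detect tampering."""
    tag = hmac.new(key, packet, hashlib.sha256).digest()
    return packet + tag


def open_sealed(key: bytes, sealed: bytes) -> bytes:
    """Return the packet if its tag verifies, otherwise raise ValueError."""
    if len(sealed) < TAG_LEN:
        raise ValueError("packet too short to carry a tag")
    packet, tag = sealed[:-TAG_LEN], sealed[-TAG_LEN:]
    expected = hmac.new(key, packet, hashlib.sha256).digest()
    # compare_digest avoids leaking the mismatch position through timing.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("integrity check failed")
    return packet
```

Without such a tag, a receiver has no way to tell a modified or forged packet from a genuine one.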

We analyzed recent distributed storage systems built on RDMA and discovered additional systems design challenges even after security fundamentals are fixed.

• Can RDMA-based storage systems provide security at least as good as pre-RDMA datacenter security best practices?

• We analyzed: Pilaf, FaRM, HERD, DrTM, FaSST, Octopus, Hyperloop, DrTM+H

Threat Model = Compromised Storage Client

[Figure: Clients and Server (CPU, Memory, NIC); one client is labeled Bad]

VLANs/virtualization do not help! Compromised client only needs to see its own network traffic to spoof RDMA.

Challenge 1: no auditability/logging on reads

[Figure: Clients and Server (CPU, Memory, NIC), steps 1-3 marked]

Adversary does CPU-bypassing READ. What data was exfiltrated?
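
A toy model of this challenge, under stated assumptions: `ToyServer` and `ToyNic` are hypothetical classes invented for this sketch, not part of any RDMA library. The server CPU can log requests it handles itself, while a one-sided READ served by the NIC never reaches that logging code.

```python
class ToyServer:
    """Server whose CPU handles RPCs and records them in an audit log."""

    def __init__(self, size: int) -> None:
        self.memory = bytearray(size)
        self.audit_log: list[tuple[str, int, int]] = []

    def rpc_get(self, client: str, offset: int, length: int) -> bytes:
        # The CPU is on the request path, so it can log before replying.
        self.audit_log.append((client, offset, length))
        return bytes(self.memory[offset:offset + length])


class ToyNic:
    """NIC that serves one-sided READs straight from server memory."""

    def __init__(self, server: ToyServer) -> None:
        self._memory = server.memory  # shared buffer, no CPU involvement

    def one_sided_read(self, offset: int, length: int) -> bytes:
        # No server code runs here, so nothing is written to the audit log.
        return bytes(self._memory[offset:offset + length])
```

In this model, after an adversary calls `one_sided_read`, `audit_log` is still empty, so the server cannot answer the question of what was exfiltrated.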

Challenge 2: Design Implications of Storage Logic Location: RPC and Concurrency

[Figure: DrTM (2016) Put, with steps 4-6 marked]

Challenge 3: Separating different users’ data

• Single big remote memory registration -> attacker access to all user data
• Vendor-suggested solution (protection domains) is a poor performance fit for storage systems with multiple storage clients that all want to access the same data
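
The first bullet can be illustrated with a toy registry that checks each access against the region an r_key was registered for. `ToyRegistry` and its methods are hypothetical names for this sketch; real NICs enforce registrations in hardware, and the random r_key generation here is an assumption added for the example.

```python
import secrets


class ToyRegistry:
    """Maps each r_key to the (start, length) region it may access."""

    def __init__(self, memory: bytearray) -> None:
        self.memory = memory
        self.regions: dict[int, tuple[int, int]] = {}

    def register(self, start: int, length: int) -> int:
        """Register a region and return a randomly chosen 32-bit r_key."""
        r_key = secrets.randbits(32)
        while r_key in self.regions:
            r_key = secrets.randbits(32)
        self.regions[r_key] = (start, length)
        return r_key

    def read(self, r_key: int, addr: int, length: int) -> bytes:
        """Serve a read only if it falls entirely inside the r_key's region."""
        if r_key not in self.regions:
            raise PermissionError("unknown r_key")
        start, size = self.regions[r_key]
        if length < 0 or addr < start or addr + length > start + size:
            raise PermissionError("access outside registered region")
        return bytes(self.memory[addr:addr + length])
```

With one registration covering all of memory, any holder of that r_key can read every user's data; with one registration per user, a read outside the holder's own region is rejected. The cost of managing many such registrations is the performance concern the second bullet raises.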

Ingredients for more secure CPU-bypass systems

Security Fundamentals
• High-performance AEAD cryptography for datapath (e.g. DTLS)
• Centralized key management

System Design Challenges
• Logging strategy that does not rely on client
• Alternatives to unreliable RPC
• Finer-grained permissions on remote data access

Source: Zookeeper Project

Lots of big open questions for future research!

• Wishlist for features to help support application security when building systems that use CPU-bypassing RDMA?
• Wishlist for securing non-user-facing datacenter tasks?
• How do we get these better features baked in? Changing the RDMA standard?

Thank you for watching! Questions? Email Anna: [email protected]
