Securing RDMA for High-Performance Datacenter Storage Systems
Anna Kornfeld Simpson, Adriana Szekeres (Paul G. Allen School of Computer Science & Engineering, University of Washington), Jacob Nelson, Irene Zhang (Microsoft Research)
Remote Direct Memory Access (RDMA) bypasses the remote CPU over the datacenter network, with only a few microseconds of latency
RDMA over Converged Ethernet (RoCEv2) packet: Ethernet header | IP header | UDP header | RDMA header and data. Source: RoCEv2 spec, InfiniBand Trade Association, 2014.
Abstracted RDMA portion of the RoCEv2 packet: queue pair info | remote memory address and r_key | payload.
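The abstracted layout can be sketched as a packed record. This is a minimal illustration and not the RoCEv2 wire format: the field widths (4-byte queue pair number, 8-byte remote address, 4-byte r_key) and the names `pack_rdma_request` and `unpack_rdma_request` are assumptions chosen for the example.

```python
import struct

# Hypothetical simplified layout (NOT the real RoCEv2 wire format):
# queue pair number (4 bytes), remote memory address (8 bytes), r_key (4 bytes),
# all in network byte order, followed by the payload.
HEADER = struct.Struct("!IQI")


def pack_rdma_request(qp_num: int, remote_addr: int, r_key: int, payload: bytes) -> bytes:
    """Serialize the abstracted RDMA fields followed by the payload."""
    return HEADER.pack(qp_num, remote_addr, r_key) + payload


def unpack_rdma_request(packet: bytes):
    """Parse the abstracted RDMA fields back out of a packet."""
    qp_num, remote_addr, r_key = HEADER.unpack_from(packet)
    return qp_num, remote_addr, r_key, packet[HEADER.size:]


packet = pack_rdma_request(qp_num=17, remote_addr=0x7F0000001000, r_key=0xCAFE, payload=b"value")
print(unpack_rdma_request(packet))
```

Nothing in this record is encrypted or authenticated, so in this model any party that knows or guesses the three header fields can produce a well-formed request.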
Example RDMA System: Pilaf (2013): Put (SEND)
[Figure: clients and a server (CPU, memory, NIC); numbered steps 1 to 5 trace the Put.]
Pilaf (2013): Unlike Put, Get is CPU-bypassing
RDMA was not designed for datacenter security needs
Security weaknesses discovered over the past two decades (see Section 2 of the paper for citations):
• Confidentiality: packets are sent in plaintext
• Integrity: no packet integrity check or authentication
• Availability: denial of service
• Side channels: non-random r_keys and more
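To make the integrity gap concrete, the sketch below shows what a per-packet authentication tag would add. It uses HMAC-SHA256 from the Python standard library as a stand-in: this provides integrity and authentication only, not confidentiality, and it is not a feature of RDMA NICs. The names `seal` and `open_sealed` and the pre-shared `key` are assumptions for illustration.

```python
import hashlib
import hmac
import secrets

TAG_LEN = 32  # HMAC-SHA256 output size in bytes


def seal(key: bytes, packet: bytes) -> bytes:
    """Append an authentication tag computed over the whole packet."""
    return packet + hmac.new(key, packet, hashlib.sha256).digest()


def open_sealed(key: bytes, sealed: bytes) -> bytes:
    """Return the packet if its tag verifies; otherwise raise ValueError."""
    packet, tag = sealed[:-TAG_LEN], sealed[-TAG_LEN:]
    expected = hmac.new(key, packet, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("packet failed integrity check")
    return packet


key = secrets.token_bytes(32)  # assumed to be shared out of band
sealed = seal(key, b"READ addr=0x1000 len=64")
print(open_sealed(key, sealed))

# Flip one bit of the request to simulate tampering in flight.
tampered = bytearray(sealed)
tampered[10] ^= 0x01
try:
    open_sealed(key, bytes(tampered))
except ValueError as err:
    print("rejected:", err)
```

A sender without the key cannot produce a valid tag, which is the property the plain RDMA packet lacks.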
We analyzed recent distributed storage systems built on RDMA and found additional system design challenges that remain even after the security fundamentals are fixed.
• Can RDMA-based storage systems provide security at least as good as pre-RDMA datacenter security best practices?
• We analyzed: Pilaf, FaRM, HERD, DrTM, FaSST, Octopus, Hyperloop, DrTM+H
Threat Model = Compromised Storage Client
[Figure: clients, one of them compromised (Bad()), and a server with CPU, memory, and NIC.]
VLANs and virtualization do not help! A compromised client only needs to see its own network traffic to spoof RDMA.
Challenge 1: no auditability/logging on reads
[Figure: an adversary issues a CPU-bypassing READ through the server NIC (steps 1 to 3), leaving the question "What data was exfiltrated?"]
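The auditing gap can be sketched with a toy model. Here `ToyServer`, `rpc_get`, and `one_sided_read` are hypothetical names, and the one-sided read is modeled as a plain memory copy; a real NIC performs it in hardware without involving any server code.

```python
class ToyServer:
    """Toy storage server: RPCs run on the CPU, one-sided reads do not."""

    def __init__(self, data: bytes):
        self.memory = bytearray(data)
        self.audit_log = []  # only server-side code can append here

    def rpc_get(self, client: str, offset: int, length: int) -> bytes:
        # Two-sided path: the server CPU handles the request, so it can log it.
        self.audit_log.append((client, "GET", offset, length))
        return bytes(self.memory[offset:offset + length])

    def one_sided_read(self, offset: int, length: int) -> bytes:
        # CPU-bypassing path: modeled as the NIC copying memory directly.
        # No server code runs, so nothing is written to the audit log.
        return bytes(self.memory[offset:offset + length])


server = ToyServer(b"alice-secret|bob-secret")
server.rpc_get("alice", 0, 12)           # logged
stolen = server.one_sided_read(13, 10)   # not logged
print(stolen)
print(server.audit_log)
```

After the adversary's read, the log holds only the RPC entry, so the server has no record from which to reconstruct what was exfiltrated.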
Challenge 2: Design Implications of Storage Logic Location: RPC and Concurrency
[Figure: DrTM (2016) Put, steps 4 to 6.]
Challenge 3: Separating different users’ data
• A single big remote memory registration gives an attacker access to all user data
• The vendor-suggested solution (protection domains) is a poor performance fit for storage systems with multiple storage clients that all want to access the same data
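The difference between one big registration and finer-grained registrations can be sketched as follows. `ToyNIC` is a hypothetical model of the r_key bounds check, not a real verbs API, and drawing r_keys from `secrets` is an assumption made to avoid predictable keys.

```python
import secrets


class ToyNIC:
    """Toy model of r_key checks; real NICs enforce registrations in hardware."""

    def __init__(self, memory: bytes):
        self.memory = memory
        self.regions = {}  # r_key -> (start, length)

    def register(self, start: int, length: int) -> int:
        """Register a memory region and return a fresh random r_key for it."""
        r_key = secrets.randbits(32)
        while r_key in self.regions:
            r_key = secrets.randbits(32)
        self.regions[r_key] = (start, length)
        return r_key

    def read(self, r_key: int, addr: int, length: int) -> bytes:
        """Serve a one-sided read only if it falls inside the keyed region."""
        region = self.regions.get(r_key)
        if region is None:
            raise PermissionError("unknown r_key")
        start, size = region
        if length < 0 or addr < start or addr + length > start + size:
            raise PermissionError("access outside registered region")
        return self.memory[addr:addr + length]


data = b"alice-data" + b"bob-data!!"

# One big registration: whoever holds the key can read every user's data.
coarse = ToyNIC(data)
big_key = coarse.register(0, len(data))
print(coarse.read(big_key, 10, 10))

# One registration per user: Alice's key no longer reaches Bob's bytes.
fine = ToyNIC(data)
alice_key = fine.register(0, 10)
bob_key = fine.register(10, 10)
try:
    fine.read(alice_key, 10, 10)
except PermissionError as err:
    print("denied:", err)
```

This sketch covers only the access check. It does not address the performance concern raised above, where many clients legitimately need access to the same data.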
Ingredients for more secure CPU-bypass systems
Security Fundamentals
• High-throughput AEAD cryptography for the datapath (e.g. DTLS)
• Centralized key management
System Design Challenges
• Logging strategy that does not rely on the client
• Alternatives to unreliable RPC
• Finer-grained permissions on remote data access
Source: Zookeeper Project
Lots of big open questions for future research!
• Wishlist for features to help support application security when building systems that use CPU-bypassing RDMA?
• Wishlist for securing non-user-facing datacenter tasks?
• How do we get these better features baked in? Changing the RDMA standard?
Thank you for watching! Questions? Email Anna: [email protected]