ReDMArk: Bypassing RDMA Security Mechanisms

Benjamin Rothenberger,∗ Konstantin Taranov,∗ Adrian Perrig, and Torsten Hoefler Department of Computer Science, ETH Zurich

Abstract Current RDMA technologies include multiple plaintext State-of-the-art remote (RDMA) tech- access tokens to enforce isolation and prevent unauthorized nologies such as InfiniBand (IB) or RDMA over Converged access to system memory. As these tokens are transmitted (RoCE) are becoming widely used in data center in plaintext, any entity that obtains or guesses them can read applications and are gaining traction in cloud environments. and write memory locations that have been exposed by using Hence, the security of RDMA architectures is crucial, yet po- RDMA on any machine in the network, compromising not tential security implications of using RDMA communication only secrecy but also integrity of applications. To avoid com- remain largely unstudied. ReDMArk shows that current se- promise of these access tokens, RDMA architectures rely on curity mechanisms of IB-based architectures are insufficient isolation and the assumption that the underlying network is a against both in-network attackers and attackers located on well-protected resource. Otherwise, an attacker that is located end hosts, thus affecting not only secrecy, but also integrity of on the path between two communicating parties (e.g., bugged RDMA applications. We demonstrate multiple vulnerabilities wire or malicious switch) can eavesdrop on access tokens of in the design of IB-based architectures and implementations bypassing packets. of RDMA-capable network interface cards (RNICs) and ex- Unfortunately, encryption and authentication of RDMA ploit those vulnerabilities to enable powerful attacks such as packets (e.g., as proposed by Taranov et al. [36]) is not part of packet injection using impersonation, unauthorized memory current RDMA specifications. While IPsec transport recently access, and Denial-of-Service (DoS) attacks. To thwart the dis- became available for RoCE traffic, the IPsec standard does covered attacks we propose multiple mitigation mechanisms not support InfiniBand traffic. Furthermore, application-level that are deployable in current RDMA networks. encryption (e.g., based on TLS) is not possible since RDMA operations can be handled without involvement of the CPU. 1 Introduction As TLS cannot support purely one-sided communication rou- tines, the applications would need to store packets in a buffer In recent years, numerous state-of-the-art systems started to before decryption, completely negating RDMA’s performance leverage remote direct memory access (RDMA) primitives as advantages. We discuss these potential mitigation techniques a communication mechanism that enables high performance to secure RDMA in more detail in §7.3. guarantees and resource utilization. Deployments in public In this work, we analyze current security mechanisms of clouds, such as Azure and IBM Cloud, are becom- RDMA architectures based on InfiniBand (IB) such as native ing available and an increasing number of systems make use InfiniBand and RDMA over converged Ethernet (RoCE) ver- of RDMA for high-performance communication [8,11,18,28]. sions 1 and 2. ReDMArk reveals multiple vulnerabilities and However, the design of RDMA architectures is mainly fo- flaws in the design of InfiniBand, but also in implementations cused on performance rather than security. Despite the trend of several RDMA-capable network interface cards (RNICs) of using RDMA, potential security implications and dangers by Mellanox and Broadcom. These vulnerabilities enable that might be involved with using RDMA communication in powerful attacks on RDMA networks, such as unauthorized upper layer protocols remain largely unstudied. For example, memory access or breaking of existing connections based on RFC 5042 [30] analyzes basic security issues and potential packet injection. To show the feasibility of the discovered at- attacks in RDMA-based implementations, but lacks an in- tacks in practice, we implemented an attack framework, that is depth analysis of state-of-the-art RDMA architectures and able to inject bogus packets into the network and impersonate implementations. other endpoints to corrupt the memory state of remote end- ∗These authors contributed equally to this work. points. For each of the discovered attacks, we discuss potential

1 Routing Base Transport RDMA long-term mitigation mechanisms. In addition, we propose Payload Checksums short-term mitigations that can be deployed in today’s RDMA Header Header Header networks before the long-term mitigations become available. Queue Pair Number Two integrity Finally, we assess the vulnerability of open-source systems Packet Sequence Number checksums InfiniBand: IB Routing Header Target virtual address that rely on RDMA for high-performance communication RoCEv1: Ethernet + IB RH Memory key (rkey) against the discovered attacks. RoCEv2: Ethernet + UDP/IP Data Length Figure 1: General format of an RDMA packet. 2 Remote Direct Memory Access RDMA enables direct data access on remote machines across a network. Memory accesses are offloaded to dedicated hard- invalid QPN or PSN are dropped without any notification to ware and can be processed without involvement of the CPU the receiving application. (and context switches). Using RDMA read and write requests To detect errors that may have been introduced during the application data is read from/written to a remote memory ad- transmission, each packet contains two checksums that are dress and directly delivered to the network, reducing latency checked by the receiving node. The checksum algorithms are and enabling fast message transfer. RDMA can also enable defined in the IB specification and use pre-defined seeds. one-sided operations, where the CPU at the target node is not In addition to packet integrity checks, IBA defines three notified of incoming RDMA requests. memory protection mechanisms to restrict unauthorized ac- Even though several network architectures support RDMA, cess to local memory by remote entities: Memory Regions, in this work we focus on the most widely used interconnects Memory Windows, and Protection Domains (PD) [37]. These for RDMA: InfiniBand (IB) [3] and RDMA over Converged mechanisms allow enforcing memory access restrictions (e.g., Ethernet (RoCE) [4]. InfiniBand is a network architecture the application allows reads, but no writes to a memory re- specifically designed to enable reliable RDMA and defines its gion). own hardware and protocol specification. RoCE is an exten- sion to Ethernet to enable RDMA over an Ethernet network Memory Regions. To access host memory, the RNIC first and exists in two versions. RoCEv1 uses the IB routing header, allocates the memory region, which involves copying page whereas RoCEv2 uses UDP/IP for routing. Even though this table entries of the corresponding memory to the memory work focuses on IBA and RoCE, the proposed attacks could management unit of the RNIC. Then, the RNIC creates a also be extended to other RDMA architectures. memory region to enforce access restrictions to the memory such as read-only, write-only, or local-only. Memory regions can also be reregistered to change its properties or deregis- 2.1 RDMA packet format tered to destroy its memory mappings. The RDMA packet header consists of a routing header and For each memory region RNIC generates two keys for local a base transport header (see Figure1). The routing header and remote access, namely lkey and rkey. To remotely access a contains the source and destination ports, that identify link memory location using RDMA read or write operations, each layer endpoints. The IB protocol uses the IB link layer proto- packet must include a virtual address and its associated rkey col as a data link, whereas RoCE relies on Ethernet. RoCEv1 as depicted in Figure1. The rkeys are not used in any form of encapsulates an IB packet, including its IB routing header, cryptographic computation, but used as access tokens that are into an Ethernet frame. RoCEv2 is designed as an Internet transmitted in plaintext. The lkeys are not part of the transport layer protocol and uses a UDP/IP header for routing. protocol, but used as a local authorization token allowing the All data communication in RDMA is based on queue pair channel adapter to access local memory of an application. (QP) connections between the two communicating parties. Memory Windows. To allow different access rights QPs are a bi-directional message transport mechanism used among remote QPs within a memory region or grant access to send and receive data in InfiniBand. Endpoints in RDMA to a part of the region, IBA makes use of Memory windows are identified by the combination of an adapter port address type 1. Memory windows type 2 further extend this protection and a queue pair number (QPN), a unique identifier of a QP mechanism by assigning a single QP to a memory window connection within destination port. For all QP endpoints at a and enforcing that only the assigned QP can access it. destination port, the RNIC generates a unique QPN. Protection Domain. IBA protection domains (PD) group 2.2 InfiniBand Architecture Security Model IB resources such as QP connections and memory regions, such that QP connections within a PD can only access mem- Processing of incoming packets is based on the base transport ory regions allocated in the same PD, providing protection header that contains the destination QPN and also a packet from unauthorized or inadvertent use of a memory area. All sequence number (PSN). The PSN is used to enforce in-order QPs and memory regions are always assigned to a specific delivery and detect duplicate or lost packets. Packets with PD and can only be a member of one PD.

2 3 Adversary Model Client A RNIC covert channel RNIC T4 In our adversary model we consider three parties (see Fig- T3 ure2): an RDMA service which hosts one or several RDMA T2 inject applications, a client who interacts with the service through T2 RNIC connect RNIC Service B sudo RDMA, and an adversary who can legitimately connect to the RDMA service, but tries to violate RDMA’s security mecha- victim connection T1T1 RNIC adversary connection nisms (e.g., access memory of other clients using RDMA). on-path adversary We assume that the adversary is located within the same network as the other parties and consider four different at- Figure 2: Illustration of the adversary model including poten- tacker models. tial adversary locations. Model T1 . First, we consider an adversary that is located at a different end host than the victim (off-path) and have these potential attack locations are much harder to achieve rightfully obtained these hosts (e.g., by renting an instance than obtaining or compromising an arbitrary end-host. in a public cloud). This attacker cannot conduct any network- based attacks such as packet injection, but can connect to 4 Security Analysis of IB Architectures RDMA services and issue RDMA messages over these con- nections. Given the aforementioned adversary model, we analyse exist- ing security mechanisms in IB-based architectures including Model T2 . Second, we consider attackers (potentially off-path) that can actively compromise end hosts and fab- memory protection key generation, QP number generation, ricate and inject messages. To successfully inject an arbitrary memory regions, memory windows, and protection domains. RDMA request the adversary must have root administrative We identify 10 vulnerabilities, labeled V1 – V10 . access. The adversary is required to know the host’s address (local identifier for IBA / IP address for RoCE), QP numbers, 4.1 Analysis Setup and the PSN to forge a valid RDMA packet. Additionally, to read or write a memory location on the remote host, the Our analysis setup includes multiple IB-based architectures adversary needs to include a valid virtual memory address such as native InfiniBand (IBA), RoCEv1, and RoCEv2. To and the corresponding memory protection key rkey. execute and evaluate our tests we use a server cluster with RNICs from Broadcom, Mellanox, and also run tests on Mi- Model T3 . Third, we consider network-based attackers where the attacker is located on the path between the victim crosoft Azure HPC instances that support RDMA (A8, A9, and the service. On-path attacks require the attacker to con- H16r). Additionally, we consider software-based RoCE (soft- trol routers or links between the victims (e.g., rogue cloud RoCE), a software implementation of RoCE that has been provider, rogue administrator, malicious bump-in-the-wire de- integrated into the kernel [21]. Table1 lists the analyzed vice). A network-based attacker can passively eavesdrop on devices and summarizes the discovered memory protection messages, but also actively tamper with the communication issues. between hosts by injecting, dropping, delaying, replaying, or altering messages. This includes altering and forging any in- 4.2 Memory Protection Keys formation in any packet header, including all IB and Ethernet headers. Since RDMA communication is in plaintext and the V1 Memory Protection Key Randomness. To protect re- IB protocol does not provide any mechanisms for authenticat- mote memory against unauthorized memory access, IBA re- ing a message to prevent on-path packet alteration, this only quires that RDMA read/write requests include a remote mem- requires recalculation of packet checksums, whose algorithms ory access key rkey, which is negotiated between communicat- and seeds are publicly available in the IBA specification. ing peers and is checked at the remote RNIC. Packets with an Model T4 . Finally, we consider an adversary that makes invalid rkey cause a connection error leading to disconnection. use of RDMA as a covert channel for exfiltrating data. For this The requirement of including an rkey is built into the silicon purpose, the adversary manipulates code or libraries executed and the driver code cannot be disabled by an attacker. Thus, by the victim (e.g., using malware) such that it establishes an to successfully circumvent this protection mechanism against RDMA connection to an RDMA-capable attacker machine in unauthorized memory access, an attacker needs to include a the same network as the victim (e.g., by renting an instance valid rkey in his requests. in a public cluster). This allows the adversary to exploit one- We analyze the randomness of the rkey generation process sided RDMA operation to "silently" access memory of the for different RNIC models and drivers. For all tested devices, victim process. rkey generation is independent of the address and length of Since both the network and control over a machine are the buffer to be registered. Changes in access flags have no well protected resources in cloud datacenters, we assume that influence on the generation of an rkey. The generated rkeys

3 Table 1: Summary of Memory protection issues across different IBA drivers.

Model Driver Arch. Static Shared Key Step QPNs QP limitd Init. Gen. Broadcom NetXtreme-E BCM57414 bnxt_re RoCEv2   0x100 sequential 32,707 Broadcom Stingray PS225 BCM58802 bnxt_re RoCEv2   0x100 sequential 61,438 Mellanox ConnectX-3 MT27500 mlx4 IB/RoCEv1   0x100a sequential 261,359 Mellanox ConnectX-4 MT27700 mlx5 IB/RoCEv2   randomb sequential 64,443 Mellanox ConnectX-5 MT27800 mlx5 IB/RoCEv2   randomb sequential 65,449 Mellanox ConnectX-6 Dx MT28841 mlx5 RoCEv2   randomb sequential 262,100 softRoCE rxe RoCEv2   0x100 + lfsr-8bitc sequential 32,707 a for a subsequent registrations b has low entropy c seed and states are known d bound by the OS limit on active file descriptors only depend on previous registration/deregistration operations. 0.3

Further, we investigate how registration/deregistration affects ) i

δ 0.2 memory registration. ( p 0.1 RNIC models from Broadcom (using the bnxt_re driver) 0.0 always increase the rkey value by 0x100 independent of the

previously mentioned factors. The exact algorithm can be 0x102 0x101 0x103 0x201 0x109 0x202 0x302 0x402 0x502 0x301 0x401 0x501 0x602 0x701 0x601 others found in AppendixA. Thus, assuming the attacker is able to obtain an rkey that is part of this series of increasing key Figure 3: Probabilities of differences for random rkey value values, predicting preceding or subsequent rkeys is trivial. generated using mlx5 device. For devices based on the mlx4 driver, the sequence of rkeys depends on registration/deregistration operations. For con- process for all tested drivers seems predictable, we further secutive registration operations each rkey gets incremented quantify the randomness of the key generation process by by 0x100. However, after a deregistration operation, the next calculating the min-entropy [5, 24], which denotes a measure rkey gets incremented by 0x80000 based on the rkey for the to describe the uncertainty associated with a random variable memory region that has been deregistered. In case of multiple by guessing the key until a correct key is found. Thus, the min- consecutive deregistration operations, the rkeys of the dereg- entropy measures the difficulty of guessing the most likely istered memory regions are queued and for each upcoming output of an entropy source. Following, the optimal strategy registration operation a key gets dequeued. The algorithm for successive guessing is to try all possible values in order for key generation can be found in AppendixA. All tested of decreasing probability. Azure HPC instances (A8, A9, H16r) use the mlx4 driver and If we consider this problem of guessing a discrete random allocate rkeys with the previously described algorithm. variable X on x ,x ,...,x with probability distribution P = The software implementation of RoCE, SoftRoCE, also in- 1 2 m (p , p ,..., p ), where p = P(X = x ), 1 ≤ i ≤ m. We assume creases the rkey by 0x100 for each registration operation, but 1 2 m i i that p ≥ p ≥ ... ≥ p . Then the min-entropy of X is defined additionally randomizes the last 8 bit using a linear-feedback 1 2 m as shift register (LFSR). However, since LFSRs are determin- istic and the initial seed is known, all subsequent states are H∞(X) = min (−log2 pi) 1≤i≤m easily computable. Moreover, the LFSR implementation used (1) = −log2 max pi = −log2 p1 by SoftRoCE generates only 15 distinct numbers, which does 1≤i≤m not increase the randomness of rkeys. The full algorithm for key generation can be found in AppendixA. Our observations showed that for all tested drivers the se- Devices based on the mlx5 driver do not use a fixed increase quence of generated rkeys was strictly increasing (modulo 32 between subsequent registrations, but still strictly increase the 2 ). Thus, we define the dependency of a newly generated values with a random value (modulo 232). An analysis of these key on the previous key as follows: values shows that with more than 60% probability either the x = x + δ (2) value 0x101 or 0x102 (see Figure3) is chosen. Thus, even i+1 i i though the key generation process of devices based on mlx5 where δi denotes the difference between key xi and xi+1 and driver contains higher entropy than other drivers, the sequence is further described using a discrete random variable ∆ with of generated keys is still predictable by an adversary with probability density function P(∆ = δi). Thus, given key xi the moderate effort. generation of key xi+1 is dependent on the randomness of ∆ Key Entropy Analysis. Given that the rkey generation and quantified by the min-entropy of ∆ (see Table2).

4 Table 2: Entropy of rkey generation for key differences RNIC, e.g., by encoding a bit-stream in the number of regis- trations they perform per a time unit. This is especially critical Driver H (∆) H(Y|X) ∞ if an adversary is located on the same physical host as the bnxt_re 0 0 victim (e.g., two VMs on the same physical host in a public mlx4 0.14 1 mlx5 2.16 2.85 cloud environment). rxe 2.04 0 4.3 Memory Allocation Randomness This dependency can further be generalized by calculating the conditional entropy [23] of subsequent key generations, V4 Consecutive Allocation of Memory Regions. In addi- which quantifies the amount of information needed to predict tion to the rkey associated to a memory location, the adversary a newly generated key xi+1 given the previous key xi (e.g., if is also required to predict the corresponding memory address. an attacker obtains a key legitimately by registering a memory Typically, techniques such as address space layout randomiza- region) and is defined as: tion (ASLR) randomly arrange the address space positions of a process. This prevents an attackers from directly referring p(x,y) to other objects in memory by randomizing their locations. H(Y|X) = − p(x,y) log (3) ∑ 2 However, subsequent objects in memory are allocated in con- x∈X,y∈Y p(x) secutive addresses with respect to a random address base [40]. where p(x,y) denotes the joint probability of x and y. For example, all objects allocated via the mmap() Linux sys- For bnxt_re and rxe we can always predict which rkey will tem call are placed side by side in the mmap area. Since RDMA-based applications run in a single process on be generated next, making the value of xi+1 completely deter- the target host, they are not protected by ASLR, but instead mined by xi, which results in the conditional entropy being equal to 0 [23]. For mlx4, the value of Y only depends on objects in memory are allocated side-by-side. Assuming an whether a region has been deregistered before the next rkey is attacker knows the address of a memory object on a target host, generated. Assuming that this occurs with probability 0.5, the predicting the memory address of other objects is possible. conditional entropy is 1 bit. Finally, for mlx5 the distribution Even though consecutive allocation of memory regions is not of key differences ∆ is illustrated in Figure3. If an attacker caused by RDMA protocols, it still affects the security of guesses that the next key is incremented by the difference RDMA applications. with the highest probability, his guess would be correct in one out of three guesses on average. The computation of the con- 4.4 QP Number Identifiers & Packet Se- ditional entropy of mlx5 results in 2.85 bits of entropy, which quence Numbers enables an attacker to guess the rkey of future registrations with high probability. V5 Linearly Increasing QP Numbers. Our evaluation V2 Static Initialization State for Key Generation. In ad- (see Table1) shows that for all tested devices and drivers dition to the limited number of rkeys, the RNICs based on the the QP numbers are allocated sequentially. Assuming that an bnxt_re and mlx4 drivers are initialized using static state and adversary registers a QP himself or observes a QP registration the same set of keys persists across different physical reboots request, predicting preceding or subsequent QP numbers is of the machine. Assuming that an adversary has observed the trivial. Furthermore, as IBA uses 24 bit QP numbers, it is not entire key set, the same keys will be reused even after the possible to establish more than 224 QP connections within an physical machine rebooted. RNIC. Since the IB network adapter on Azure instances is virtu- V6 Fixed Starting Packet Sequence Number (PSN). alized and a reboot of the instance does not lead to physical The implementation of RDMA offers two ways of establish- reboot of the machine, the tested Azure instances were not ing RDMA connections: a native RDMA connection interface affected by static initialization. or using the RDMA connection manager [20] to establish V3 Shared Key Generator. On all tested devices the key connections. Using the native connection interface, the con- generator is fully shared between applications using the same nection parameters, such as destination QP number, local network interface even if they use different protection do- and remote starting PSNs, are set by the application devel- mains. Thus, if multiple RDMA applications are running on opers. The RDMA connection manager moves this burden the same service the prediction of rkeys of other applications away from application developers and randomly generates a based on own rkeys is trivial as they have been generated starting PSN (using a cryptographic pseudorandom number using the same key generator. generator), thereby making the process of RDMA connection In addition to enabling memory key prediction across mul- establishment similar to TCP sockets. Our analysis (see §6) tiple application, this vulnerability can also be exploited to shows that many RDMA-based open-source applications opt open a side-channel between applications sharing the same for using the native interface and manually set the starting

5 PSNs. In case the starting PSNs are not randomized on a per attacks on RDMA networks, labeled A1 – A6 . For each of QP connection basis, predicting PSNs of established connec- the attacks we explain the experiments we conducted and tions becomes much simpler. discuss potential mitigation mechanisms, which are discussed in greater detail in §7. Table3 outlines the dependency on 4.5 Other Security Weaknesses in IBA/RoCE attacker locations and vulnerabilities, and potential mitigation mechanisms for each of the attacks. Furthermore, we describe four security weaknesses that greatly enable or facilitate attacks on RDMA applications. 5.1 A1: Packet Injection using Impersonation V7 Limited Attack Detection Capabilities. RDMA al- lows one machine to directly access data on remote machines As current RDMA systems enforce no source authentica- across the network. Due to network offloading of one-sided tion V8 , an adversary can impersonate any other endpoint RDMA operations, all memory accesses are performed using and inject packets that seem to belong to an established con- dedicated hardware on RNIC without any CPU interaction. nection by another client. To inject an RDMA packet that This makes memory accesses completely invisible to applica- is considered valid by the receiving endpoint, the adversary tions and limits their capabilities of detecting attacks. needs to know the QPN of the victim and the current PSN. V8 Missing Encryption and Authentication in RDMA Apart from obtaining these parameters by on-path eaves- Protocols. Existing RDMA network protocols do not pro- dropping or impersonation of end hosts, an attacker could vide any mechanisms for authentication nor encryption of try to predict them. Given that QP numbers are generated the header and the payload of RDMA packets. An adversary sequentially for each new client V5 , an attacker can obtain can spoof any field in the packet header or alter any byte in expected QPNs of clients by simply connecting to the RDMA- the packet payload of RDMA messages. In-network packet enabled service and decrement the QPN that gets assigned to alteration only requires recalculation of packet checksums, the attacker. Thus, a valid PSN remains the only protection whose algorithms and seeds are known and specified by the mechanism that prevents an attacker from injecting a packet. IBA. Potential solutions for encryption and authentication in If an application also does not generate starting PSNs ran- IBA are further discussed in §7.3. domly, the attacker can start bruteforcing PSN by exploiting V9 Single Protection Domain for all QPs. To reduce the the fixed starting PSN issue V6 , which significantly reduces state overhead on RNICs the RDMA connection manager the search space. Otherwise, even with random PSNs, the at- by default uses a single protection domain for all established tacker can bruteforce the PSN within a reasonable amount of 23 QPs and memory registrations within a single process. As time as only 2 packets are required on average to generate a result, all QPs of a single process can access memory of a valid PSN. Bruteforcing the current PSN of a QP connec- each other. Nonetheless, even developers using the native tion is related to enumerating the sequence number of a TCP connection interface seem to opt for using a single PD for all connection [41], with the main difference that packets with its QPs and memory registrations (see §6). invalid PSN simply get discarded by the RNIC and do not affect the established QP connection. V10 Implicit On-Demand Paging (ODP). Implicit On- Demand Paging (ODP) enables a process to register its com- In addition to regular RDMA packets, injection of RDMA plete memory address space for I/O accesses. This feature read and write packets additionally requires the attacker to is used for high-performance communication settings, where know a valid memory address and its corresponding memory the overhead of frequently registering communication buffers protection key rkey. This attack is further discussed in §5.3. leads to performance degradation. ODP removes the need to In the remainder of this work, attacks based on imperson- register memory as any memory address can be registered by ation are referenced to as A1 . RNIC on demand. If ODP is enabled, an attacker can remotely Experiments. To verify the feasibility of the attacks on access the entire memory space of a process resulting in high RDMA protocols, we implemented a spoofing tool for RoCE*. attack potential. While this feature is disabled by default, re- The tool can fabricate any custom RoCEv1 and RoCEv2 cent advances in high-performance communication systems packet including RDMA read and write operations and is lead to this feature gaining traction in IB deployments [19]. fully compatible with the IBA specification [3]. Our RoCEv1 implementation uses Linux raw Ethernet sockets, and Ro- 5 Attacks on IBA CEv2 uses IPv4 raw sockets and UDP as a transport layer. The tool can mimic any RDMA request initiator and inject Using the discovered vulnerabilities, an adversary could custom RDMA packets over any Ethernet links. RDMA over launch attacks in RDMA networks, e.g., by using unautho- IB link cannot be fabricated in software, as the IB protocol is rized access to memory regions or by disrupting communica- implemented fully in hardware. The tool has been tested on tion using DoS. Furthermore, vulnerabilities in RDMA could all RoCE devices listed in the Table1. also be misused as an attack vector for application-level at- tacks (e.g., malware). In this section, we describe six potential *https://github.com/spcl/redmark

6 Table 3: Overview of dependency on attacker location and vulnerabilities for attacks on RDMA combined with an overview of mitigation mechanisms that thwart the attack.

T1 T2 T3 T4 V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 M1 M2 M3 M4 M5 M6 M7 M8

A1        

A2        

A3        

A4        

A5        

A6        

weak static init. shared keyweak gen. mem. rand.lin. inc. QPNfixed startinglim. PSN attack detect.no enc./auth.single PD ODP enabledrand. QPN rand. HW countersmem. win. typemultiple 2 PDsenc./auth. inresource IB const.in-network filt.

rkey rkey

enables attack facilitates attack does not affect attack  mitigates attack  increases attack complexity  does not mitigate attack

In these experiments (see Table4), we measure the injec- Injection of multiple valid packets causes a connection tion of our RoCE spoofing tool, which allows an loss on the victim side. The QP of the victim who has been estimation of the time required to bruteforce a random PSN impersonated by the adversary experiences a "Transport Retry and a PSN that has been generated based on a known initializa- Counter Exceeded" error and transitions its connection to an tion value. The injection tool for RoCEv1 was able generate error state, when it tries to send packets. However, the other 1.30 millions packets per second (Mpps), whereas the tool for endpoint of the QP only transitions into error state if it tries RoCEv2 was able to generate 0.74 Mpps for Broadcom and to reply to the disconnected victim QP side. Otherwise, the 1.57 Mpps for ConnectX-5. Full enumeration of a random connection remains open. Moreover, since packet injection PSN thus took 13 s for RoCEv1. RoCEv2 enumeration tool increases the PSN counter on the receiver, the victim’s packets takes 10.60 s for Mellanox and 23 s for Broadcom. will get discarded due to a PSN mismatch. This effectively The performance of ReDMArk’s spoofing and packet in- prevents the victim from closing the connection and allows jection framework could not achieve line-rate for the tested the attacker to inject messages over an extended period. devices as: 1) the packet checksums are calculated by the Furthermore, our experiments showed that if the attacker CPU and not offloaded to the NIC. 2) our framework does injects exactly 224 packets (i.e., the size of the PSN counter), not bypass the OS, whereas native RDMA messages do. To then the injection remains completely unnoticed by the vic- improve the performance of packet injection and thus exhaust- tim and it can continue communication due to matching PSN ing this bruteforce search even faster, a hardware appliance counters. Our injection tool requires approximately 13 s to could be used. Furthermore, specific Mellanox NICs support inject 224 packets. Therefore, if the victim does not use its raw Ethernet programming with a kernel bypass [2]. connection for this amount of time, the attacker can success- 24 Table 4: Injection throughput of the RoCE spoofing tool. fully inject 2 packets without disrupting the connection of the victim. Model Link speed Protocol Throughput Practicality. Packet injection based on impersonation can Mellanox ConnectX-3 40 Gbps RoCEv1 1.31 Mpps be performed under the assumption of the T2 model, i.e., Mellanox ConnectX-5 100 Gbps RoCEv2 1.57 Mpps the attacker requires root access to any machine in the same Broadcom NetXtreme-E 25 Gbps RoCEv2 0.74 Mpps network as the victim. Similarly, an on-path attacker T3 Broadcom Stingray 25 Gbps RoCEv2 0.74 Mpps (e.g., bump-in-the-wire) also has packet injection capabilities. Since root access to machines in public cloud environments Interestingly, injection of a single correct packet does not is a well protected resource (and would require a sandbox cause a victim’s connection to break. A mismatch in the PSN escape [27]), this attack is more realistic in the setting of counter by 1 packet between sender and receiver is resolved small cloud providers or private RDMA cluster as used by by the protocol. The protocol treats the victim’s packet as a re- companies and research groups. peated packet and always acknowledges it without processing. Mitigation. To effectively mitigate attacks based on imper- The sender receives the acknowledgment about successful sonation, source authentication could be deployed. However, transmission, even though the packet has not been processed since this is on-going research and not yet available for IB- by the remote RNIC. As a result, the attacker is able to replace based RDMA deployments, we suggest the following miti- the victim’s packet with a forged one. gations to increase the complexity of packet injection by an

7 off-path attacker: each QP connection should be initialized reduce performance of the attack, as the attacker will be un- with a random starting PSN instead of using a per-device aware of the QPN of other clients, making enumeration in- starting PSN. As this only marginally increases the attack feasible. If QPNs are randomly generated, the attacker needs complexity also QPN should be randomly assigned to QPs. 224 to probe approximately o QPNs, where o is the number of We suggest a mechanism for randomizing QPNs for existing open connections on the victim’s machine. For example, if RDMA deployments and that can be deployed for all vendors the victim has 1024 open QPs, the attack tool can only break of RNICs in §7.1. As modern RNICs provide hardware coun- one connection per 48 h on average. ters that are accessible by the application (see §7.2), these counters should be used to detect bruteforcing attempts. Fur- thermore, operators of RDMA networks could also perform 5.3 A3: Unauthorized Memory Access ingress filtering for all end hosts (see §7.4). Unauthorized memory access effectively breaks secrecy of applications running on a victim host, but might also influence 5.2 A2: DoS Attack by Transiting QPs to an their behavior. Even worse, since RDMA operations can be performed purely one-sided V7 , the victim is unable to detect Error State such attacks. Attacks based on unauthorized memory access In IB-based architectures, connections based on the RC QPs are referred to as A3 . are sensitive to content of the header of requests. Protocol As illustrated in Figure4, an attacker establishes an RDMA errors, such as inconsistencies in the sequence number or QP connection with a service and tries to access memory of other number, are recoverable errors and resolved by the protocol. clients connected to this service. RDMA applications typi- However, memory errors, such as incorrect operation numbers cally share a PD for all RDMA resources V9 and allocate or an inconsistency between payload length and DMA length private RDMA-accessible buffers for each new user. These immediately leads to unrecoverable errors, which will cause buffers are allocated in close proximity to each other, e.g., the RNIC to transit the QP to the error state and the QP to as chunks of a larger continuous memory region, allowing disconnect [3]. We refer to this attack as A2 . an attacker to predict the virtual memory address of buffers An on-path attacker T3 can trivially modify the operation belonging to other clients V4 . In the example, the attacker numbers or payload lengths to drop connections. However, tries to access the memory region adjacent to its own memory even an off-path adversary can inject incorrect packets to- region. To gain access, the attacker is also required to guess wards a victim QP endpoint and effectively disrupt communi- the corresponding memory protection key rkey for a mem- cation of other entities (see §5.1). ory region. Given that rkeys are highly predictable V1 , the attacker can guess the keys of adjacent memory regions based Experiments. To conduct an attack that transits a QP to an on its rkey that he obtained after registering a memory region. error state, we use the packet injection tool with the goal to The attacker can also exploit other vulnerabilities of rkey inject invalid packets into the victim’s QP connection such generator such as static initialization of memory key gener- that it triggers an unrecoverable error, which results in the QP ator V2 and shared key generator V3 . V2 can be used to being forced to disconnect. These experiments showed that a single fabricated packet injected into the connection was guess rkey after a reboot of the machine. V3 can be employed sufficient to effectively break the victim’s connection. by an attacker sitting on the same physical machine. Finally, Given these insights, an attacker could try to repeatedly ODP allows accessing any virtual address by having a single drop the connection of a specific client or drop all connec- rkey V10 . If the attacker can successfully guess the rkey of tions of clients that are trying to connect to a service. The fact ODP registration, it can access the whole memory space of the process. that QPNs are generated sequentially V6 highly facilitates the realization of such attacks. In addition, if an application uses non-random starting PSNs V7 , the attacker needs to 0x7FF1234 Client A owner = A guess only small number of PSNs to break a connection. For r_key = 0x100 0x7FF2345 example, our injection tool for RoCEv2 can drop one con- RDMA Read 24 0x7FF1234 owner = M nection every 10.60 s by fully enumerating all possible 2 M 0x200 r_key = 0x200 PSNs for a single QP. Then it transits to the next QPN and 0x7FF3456 thus sequentially breaks all QP connections. Figure 4: Unauthorized memory access on the same host. Practicality. As this attack relies on packet injection, it assumes the same threat models as A1 . Due to the missing integrity protection V8 , an on-path Mitigation. As this attack relies on packet injection to attacker T3 would even be able to alter remote memory by successfully break a victim’s connection, similar mitigation spoofing valid RDMA write packets. Similarly, an off-path techniques should be applied to thwart such DoS attacks. In attacker T2 can perform this attack via impersonation by addition, our QPN randomization technique can significantly tricking the victim host into processing fabricated packets.

8 Experiments. Using our attack framework, an attacker con- per application: the results varied from 32,707 for Broadcom nects to an RDMA-enabled system to obtain a memory access to 261,359 for Mellanox. Thus, the attacker needs to keep key and memory address. Then, it tries to gain unauthorized alive a much smaller number of active connections than 224. access by trying combinations of rkeys and memory locations. The variation in the numbers comes from default settings of The attacker polls for RDMA completion events to receive the drivers and the OS. Drivers put a limit on the number of acknowledgment on the success of the unauthorized access. open QP connections per application. If the guess is incorrect, the attack framework reconnects and In addition, if an application uses the RDMA connection retries the attack. manager to establish connections, each RDMA connection Successful unauthorized access without sniffing requires gets a file descriptor assigned for receiving link events. The knowing the code of the system under attack to see the pat- underlying operating system usually enforces strict limits on terns in memory allocation and registration. This is required to the number of concurrently open file descriptors. Thus, by reduce the search space of potential virtual addresses. We fur- opening QP connections the attacker can exhaust the number ther analyzed open-source RDMA applications to see whether of available file descriptors (instead of QPs), which might they are vulnerable to these type of attacks (see §6). Unfortu- be much smaller. Experiments in our testbeds showed a file nately, almost all of the tested application were vulnerable to descriptor limit of 4096, whereas for instances deployed on unauthorized memory access (see Table5). Microsoft Azure we were able to open 65,535 file descriptors. Practicality. Unauthorized memory access is possible un- Practicality. Resource exhaustion of QP allocations is pos- der the assumption of the T1 model, as it can be performed sible under the assumption of the T1 model and does not by any client located on an RDMA-enabled service without require any special capabilities. requiring any special capabilities, and even in a trusted net- Mitigation. RDMA-capable devices should limit the num- work. If an RDMA-based system (e.g., see Table5) would be ber of open QP connections from the same remote endpoint. deployed in a public cloud environment with RDMA support, This could be realized based on the IB endpoint identifiers or any client could perform unauthorized memory access. the IP addresses for RoCE. Mitigation. To mitigate unauthorized memory access, each new RDMA client could be assigned to a different PD. How- 5.5 A5: Performance Degradation using Re- ever, this would increase the resource usage per client on source Exhaustion RNICs. Additionally, more modern RDMA devices can em- ploy memory windows type 2 to pin a memory region to a Since RDMA allows an attacker to target offloading resources specific QP, which prevents other clients from accessing it. to an RNIC V7 , an attacker might try to exhaust these re- Memory windows type 2 further allow applications to choose sources by issuing a large number of RDMA reads or writes. the 8 least significant bits of the rkey randomly. For example, an attacker might target computational resources Another measure to prevent unauthorized memory access of the RNIC’s packet processing units. This will cause an in- would be the randomization of memory addresses chosen for creased latency for other entities accessing the same end host, buffers (similar to ASLR [40] / PIC [29] for regular applica- but might eventually lead to disruption of service access. In- tions). terestingly, due to the one-sided nature of RDMA reads and Finally, RDMA applications should randomize rkey gener- writes, resource exhaustion attacks can be executed "silently", ation, especially, if the RDMA devices with low entropy are i.e., with minor detection possibilities on the victim host. We used. We propose a mechanism, working on all RDMA de- refer to performance degradation attacks based on resource vices, that randomizes the rkey generation process (see §7.1). exhaustion as A5 . Experiments. We analyze the influence of resource ex- 5.4 A4: DoS Attack based on Queue Pair Allo- haustion using a varying number of attackers on the latency cation Resource Exhaustion and bandwidth of RDMA read or write operations. Each at- tacker is located on a dedicated machine equipped with a Another exhaustion attack focuses on the number of QPs a Mellanox ConnectX-3 RNIC and connected to the victim ser- 24 device can handle. Theoretically, up to 2 connections could vice through a switch. Each attacker floods the victim service be opened on a device V5 . In reality, the tested devices were with RDMA write requests of maximum transmission unit able to handle much lower numbers. Thus, an attacker could size, (4 KB in the testing environment) with the intention of try to open as many QP connections as possible and keep exhausting packet processing resources of the victim’s RNIC. them open with minimal effort. Thus, if the attacker is able Since RDMA write are enabled by default, but RDMA read to saturate the limit for QP allocations of the victim service, operations must be explicitly enabled during connection es- he could effectively deny other benign clients from opening a tablishment, exhaustion attacks based on RDMA writes are QP connection, which is further referenced as A4 . more likely to occur. Experiments. According to our findings, tested devices To observe the effect of this attack, we measure the latency had different limit on the number of active QP connections and available bandwidth as observed by a client of the vic-

9 40 40 35 35 30 30 25 25 20 20 15 15

Latency (us) 10 Latency (us) 10 5 10x 5 0 0 16 32 64 128 256 512 1024 2048 4096 16 32 64 128 256 512 1024 2048 4096 Packet size (byte) Packet size (byte)

5000 5000 4000 4000 10x 3000 3000 2000 2000 1000 1000 Throughput(req/sec) Throughput(req/sec) 0 0 16 32 64 128 256 512 1024 2048 4096 16 32 64 128 256 512 1024 2048 4096 Packet size (byte) Packet size (byte) orig. 1 att. 2 att. 3 att. 4 att. 5 att. 6 att. 7 att. 8 att. orig. 1 att. 2 att. 3 att. 4 att. 5 att. 6 att. 7 att. 8 att.

Figure 5: Effect of an exhaustion attack using RDMA write Figure 6: Effect of an exhaustion attack using RDMA write on latency and bandwidth of RDMA read. on latency and bandwidth of RDMA write.

5.6 A6: Facilitating Attacks using RDMA tim service. Figures5 and6 illustrate the results of these In addition to attacks that use RDMA as an attack vector, experiments. During normal operation, the latencies for both RDMA can also facilitate attacks (e.g., for data extraction). RDMA read and write operations as observed by the client As RDMA read and write operations do not require any inter- remain largely unaffected by the size of the requests as the action by a remote host’s CPU V7 , they allow an attacker to requests fit in a single packet. However, given the presence of “silently” read and write data. only two attackers that flood the victim service with requests, For example, if an attacker has the privilege to preload a the latency for regular RDMA requests increases by factor library to a victim’s application, the attacker can misuse this 3. For each additional attacker the latency further increases ability to inject code that establishes an RDMA connection by 4.20 µs. In terms of throughput, the resource exhaustion to the attacker’s application. This RDMA connection can attack is even more severe. For two or more attackers, the then be used by the attacker to read memory from the victim throughput of a victim reduces by factor 8 – 10 for RDMA without involvement of the victim CPU or intervention with read requests. For RDMA writes, the attack should be per- applications executed by the victim. formed by at least five attackers to notably affect the write throughput of legitimate applications. Since the memory registered by an application must also be readable by the RNIC, the attacker can either preregis- ter a large chunk of memory or enable ODP access which Practicality. This resource exhaustion attack does not re- grants access to any valid virtual address without memory quire any special capabilities and is achievable under the registration V10 . Then, by continuously sweeping the read- assumption of the T1 model. However, collusion of several able memory, the attacker can eavesdrop on sensitive data of attackers is required to effectively disrupt a public service. applications. Experiments. To illustrate the feasibility of using RDMA Mitigation. Due to the nature of one-sided RDMA opera- as an attack vector, we implemented a proof-of-concept appli- tions, the misuse of RDMA for performance degradation is cation that preloads a malware library to a binary and allows almost undetectable. However, modern RNICs (e.g., based an attacker to intercept a secret passphrase entered by the on mlx5) support hardware counters on the device which are victim by reading memory using RDMA read operations. The accessible by the host. Thus, a host would be able to detect malware (see Listing1) preallocates a memory space and resource exhaustion attacks based on excessive issuance of register it for RDMA Read access. Then it sends the rkey and requests. Using this detection based on HW counters would the memory address of the memory region to the attacker, allow a host to mitigate these attacks. and deallocates the memory. Freed memory is still RDMA

10 Table 5: Summary on vulnerabilities of open-source RDMA- //Initialization rdma_connection* con= connect("Attacker 's IP and PORT") enabled systems, and how they establish connections: using //Size of adversarial memory native interface or connection manager. const uint32_t length= 4096; //Pre-allocate the adversarial memory void* buf= malloc(length); System Connection A1 A2 A3 A4 A5 //Register the pre-allocated buffer a a // with RDMA READ access Infiniswap [11] Manager      ibv_mr* mr= ibv_reg_mr(PD,buf,size,RDMA_READ); Octopus [22] Native a a    //Send the memory region to the attacker HERD [13] Native a a    con->send(mr->address,mr->rkey,mr->length); RamCloud [28] Native a a    //Free the buffer so that Dare [31] Native a a    // the victim can use it. a a free(buf); Crail [34, 35] Manager      a if deployed over RoCE Listing 1: Pseudocode of RDMA malware library  vulnerable to attack  resilient to attack

accessible, as it is not deregistered. Since the memory is deal- (difference of 0x40002000 bytes). Using A3 , an attacker located, the victim is able to use it for storing its passphrase. can connect to the Infiniswap service and get access to the The remote attacker can continuously read the memory using memory of other clients in the same PD. Since Infiniswap RDMA to gain the passphrase. ODP strengthens the attack by does not limit the number of connections and resources per allowing the attacker to read the whole memory space, and client, a single client can occupy all connections using A4 or not just the pre-registered region. Practicality. The attack is achievable under T4 model. execute performance degradation attacks using A5 . Both attacker and the victim needs to be in the same network Octopus [22] is an RDMA-enabled distributed persistent and be equipped with RDMA-capable NICs. In addition, the memory file system. As Octopus uses a hard-coded fixed start- attacker should be capable of replacing or modifying the ex- ing PSN, an attacker can trivially predict subsequent PSNs ecution binaries of the victim. Note that this attack can not and a perform packet injection attack A1 and A2 . Further- only be applied to RDMA applications, but can also be used more, all clients share a single buffer that can be accessed to obtain sensitive data from other applications. using a single rkey A3 . Thus, the Octopus system relies on Mitigation. A mitigation mechanism that prevents an ad- strict trust in all participating parties, i.e., clients must write versary from misusing RDMA as an attack vector would be remote procedure call (RPC) requests to predefined offsets, check to what libraries are preloaded with a binary. How- as otherwise the system would fail. Additionally, even though ever, to more generally prevent attacks that include RDMA Octopus does not use RDMA reads, all buffers are registered operations in code, the system should rely on remote code with read permissions enabled. Thus, a misbehaving client attestation (e.g., based on SGX [6]). can read and change RPCs of all other clients and force the system to execute a wrong RPC. Finally, Octopus also does 6 Vulnerability Assessment of Open-Source not limit the number connections and resources per client RDMA Systems A4 , and clients are able to obtain RDMA writable memory

We analyse whether recent open-source applications and sys- regions after establishing a connection A5 . tems that use RDMA as a communication mechanism are The HERD [13] system implements an RDMA-enabled vulnerable to the aforementioned attacks. Table5 lists the key-value store. Similar to Octopus, HERD uses a single analyzed systems and their security issues. memory buffer with a single registration for all RPC requests Infiniswap [11] is a remote memory paging system that uses by clients, does not limit the number of connections and re- remote memory as a swap block device and is specifically sources per client and is thus vulnerable to A3 , A4 , and designed to be used in an RDMA network. To “swap out” memory pages a local block is sent over RDMA to a remote A5 . However, unlike Octopus HERD generates PSNs ran- block using an RDMA write operation. Similarly, to “swap in” domly. Unfortunately, systems such as Hermes [14] and cc- NUMA [10] that are implemented using HERD inherit all its memory pages RDMA read operations are used. Using A1 , an vulnerabilities. attacker can inject a packet and modify the content of swapped pages. Infiniswap is also vulnerable to DoS attacks using RamCloud [28] is a distributed key-value store based on two-sided RDMA. As one-sided RDMA operations are not A2 which breaks connections of other clients. Furthermore, the Infiniswap daemon uses posix_memalign in a loop to enabled, performance degradation using A5 is not possible. allocate and register buffers of 1 GB, allowing an attacker However, unauthorized memory access A3 is still possible to predict the position and rkey of newly allocated buffers because RamCloud registers memory with enabled remote

11 accesses and the memory allocation is static. RamCloud starts //Initialization memory allocation at address 0x40000000 and all subsequent RandomPool pool; //Create a pool of QP connections allocations are offset by adding 1 GB to the previous address. // by skipping a random number of QP connections Furthermore, the number of clients is also not limited, and a for(int i=0; i

12 //Initialization based on TLS with client authentication [32]) of RDMA RandomPool pool; applications is not possible, because RDMA read and write //Allocate an anchor buffer void *anchor=mmap(0, PAGESIZE, PROT_READ|PROT_WRITE, operations can operate as purely one-sided communication MAP_PRIVATE|MAP_ANONYMOUS,-1,0); routines (without involvement of the other parties CPU). //Create a pool of memory registration An approach based on application-layer encryption would // register the anchor buffer many times. require a temporal buffer for the incoming encrypted for(int i=0; i

13 aims to provide a guideline for designing protocols based communication in RDMA networks. Given that InfiniBand is on RDMA, but is completely implementation agnostic and deployed in public infrastructure and more providers plan to only mentions potential vulnerabilities specific to RDMA adopt RDMA networking, weak RDMA security creates real- protocols. ReDMArk tests the applicability of vulnerabilities world vulnerabilities in RDMA-enabled systems. This work to specific implementations of RDMA (such as InfiniBand) shows the security implications of RDMA on cloud systems and shows that the security pitfalls of using RDMA remain and demonstrates the critical importance of security in the misunderstood. design of upcoming versions of InfiniBand and RoCE (e.g., Tsai et al. [39] discuss the threats and opportunities of by fully integrating header authentication and payload encryp- one-sided communication. They raise concerns about the pre- tion). In addition, developers of RDMA-enabled systems must dictability of hardware-managed memory protection key and be aware of the threats introduced by RDMA networking and the potential misuse of one-sided RDMA communication for should employ mitigations such as using type 2 memory win- DoS. Compared to previous work, ReDMArk provides an dows, a separate PD for each connection, and our proposed in-depth security analysis of RDMA networking (e.g., investi- algorithms to randomize the QPN and the rkey generation. gates the algorithms behind rkey generation in detail) covering not only vulnerabilities, but discussing the full chain of vul- Responsible Disclosure nerabilities, proposes specific attacks based on the discovered We have notified and responsibly disclosed the weaknesses to vulnerabilities, and mitigations for these attacks. Mellanox, Broadcom, and Microsoft prior to the submission Kornfeld Simpson et al. [33] summarize the security flaws of this work. in RDMA protocols (e.g., missing authentication and encryp- tion) and discuss security challenges of designing RDMA- Acknowledgments enabled storage systems. In addition to the attacks discussed by ReDMArk, they suggest to exploit priority flow control We would like to thank our shepherd, Haya Shulman, and (PFC) pause frames in RoCE [12] to flood buffers on switches. the anonymous reviewers for their constructive feedback. We However, they mention that the most recent version of RoCE thank and Broadcom Inc. for the hard- is not subject to this attack as it does not require PFC. ware donations as well as their feedback during the disclosure of this work. In addition, we thank Igor Zablotchi for assisting Furthermore, Tsai et al. [38] discovered that RNICs could with the evaluation of this work. We gratefully acknowledge be exploited for side-channel attacks. They implemented an support from ETH Zurich, and from the Zurich Information RDMA-based side channel attack that allows an attacker on Security and Privacy Center (ZISC). Furthermore, we thank one client machine to learn how victims on other client ma- the Microsoft Swiss Joint Research Centre for their support. chines. The attacker uses RDMA access latency and a trained classifier to statistically predict victim accesses. References Kurth et al. [15] have shown that the Intel DDIO [1] and RDMA features facilitate a side-channel attack named Net- [1] Intel® Data Direct I/O Technology Overview. CAT. Intel DDIO technology allows RDMA read and write https://www.intel.co.jp/content/dam/www/ accesses not only to a pinned memory region, but also parts public/us/en/documents/white-papers/data- of the lowest CPU cache. NetCAT remotely measures cache direct-i-o-technology-overview-paper.pdf, activity caused by a victim’s SSH connection to perform a 2019. [Online; accessed 19-Sep-2020]. keystroke timing analysis and recovers words typed in the [2] Raw Ethernet Programming: Basic Introduction - SSH session. Using this analysis, an attacker can recover Code Example. https://community.mellanox.com/ words typed in the SSH session on another computer. These s/article/raw-ethernet-programming--basic- works based on side-channel attacks using RDMA are com- introduction---code-example, 2019. [Online; plementary to ReDMArk. accessed 19-Sep-2020].

9 Conclusion [3] InfiniBand Trade Association. The InfiniBand archi- tecture specification. https://www.infinibandta.org/ibta- RDMA architectures such as RoCE and InfiniBand were de- specifications-download/, 2000. signed for HPC and private networks, and have neglected security in their design in favor of focusing on high perfor- [4] Infiniband Trade Association. Supplement to Infini- mance. As illustrated by ReDMArk, the design of IBA and Band architecture specification volume 1, release 1.2. 1: the implementation of IB-capable NICs contain multiple vul- Annex a16: RDMA over Converged Ethernet (RoCE), nerabilities and design flaws. These weaknesses allow an ad- 2010. versary to inject packets, gain unauthorized access to memory regions of other clients connected to an RDMA-based service [5] Christian Cachin. Entropy measures and unconditional with potentially drastic consequences, and effectively disrupt security in cryptography. PhD thesis, ETH Zurich, 1997.

14 [6] Victor Costan and Srinivas Devadas. Intel SGX ex- [17] Manhee Lee, Eun Jung Kim, and Mazin Yousif. Security plained. IACR Cryptology ePrint Archive, (086), 2016. enhancement in InfiniBand architecture. In Proceedings of the IEEE International Parallel and Distributed Pro- [7] Naganand Doraswamy and Dan Harkins. IPSec: the cessing Symposium, 2005. new security standard for the Internet, intranets, and virtual private networks. Prentice Hall Professional, [18] Bojie Li, Tianyi Cui, Zibo Wang, Wei Bai, and Lintao 2003. Zhang. Socksdirect: Datacenter sockets can be fast and compatible. In Proceedings of the ACM Special Interest [8] Aleksandar Dragojevic,´ Dushyanth Narayanan, Orion Group on Data Communication, pages 90–103. 2019. Hodson, and Miguel Castro. Farm: Fast remote memory. In Proceedings of USENIX Conference on Networked [19] Mingzhe Li, Xiaoyi Lu, Hari Subramoni, and Dha- Systems Design and Implementation (NSDI), pages 401– baleswar K Panda. Designing registration caching free 414, 2014. high-performance MPI library with implicit on-demand paging (ODP) of InfiniBand. In IEEE International [9] P. Ferguson and D. Senie. Network ingress filtering: Conference on High Performance Computing (HiPC), Defeating denial of service attacks which employ IP pages 62–71, 2017. source address spoofing. BCP 38, 2000. [20] Linux RDMA. RDMA core userspace libraries and [10] Vasilis Gavrielatos, Antonios Katsarakis, Arpit Joshi, daemons. https://github.com/linux-rdma/rdma- Nicolai Oswald, Boris Grot, and Vijay Nagarajan. Scale- core/, 2020. [Online; accessed 19-Sept-2020]. out ccnuma: Exploiting skew with strongly consistent caching. In Proceedings of the Thirteenth EuroSys Con- [21] Linux RDMA. Software RDMA over Converged Eth- ference, pages 1–15, 2018. ernet. https://github.com/SoftRoCE/rxe-dev/, 2020. [Online; accessed 19-Sept-2020]. [11] Juncheng Gu, Youngmoon Lee, Yiwen Zhang, Mosharaf Chowdhury, and Kang G. Shin. Efficient memory dis- [22] Youyou Lu, Jiwu Shu, Youmin Chen, and Tao Li. Octo- aggregation with INFINISWAP. In Proceedings of the pus: an RDMA-enabled distributed persistent memory USENIX Conference on Networked Systems Design and file system. In USENIX Annual Technical Conference Implementation (NSDI), pages 649–667, 2017. (ATC), pages 773–785, July 2017. [23] David JC MacKay. Information theory, inference and [12] Chuanxiong Guo, Haitao Wu, Zhong Deng, Gaurav Soni, learning algorithms. Cambridge university press, 2003. Jianxi Ye, Jitu Padhye, and Marina Lipshteyn. Rdma over commodity ethernet at scale. In Proceedings of the [24] James L Massey. Guessing and entropy. In Proceedings ACM SIGCOMM Conference, pages 202–215, 2016. of 1994 IEEE International Symposium on Information Theory, page 204. IEEE, 1994. [13] Anuj Kalia, Michael Kaminsky, and David G. Andersen. Using RDMA efficiently for key-value services. In [25] Mellanox. Mellanox ConnectX-6 DX. Proceedings of ACM SIGCOMM, pages 295–306, 2014. https://www.mellanox.com/files/doc-2020/ pb-connectx-6-dx-en-card.pdf, 2020. [Online; [14] Antonios Katsarakis, Vasilis Gavrielatos, MR Siavash accessed 19-Sept-2020]. Katebzadeh, Arpit Joshi, Aleksandar Dragojevic, Boris Grot, and Vijay Nagarajan. Hermes: a fast, fault-tolerant [26] Mellanox. Understanding mlx5 Linux Counters and Sta- and linearizable replication protocol. In Proceedings of tus Parameters. https://community.mellanox.com/ the International Conference on Architectural Support s/article/understanding-mlx5-linux- for Programming Languages and Operating Systems, counters-and-status-parameters, 2020. [Online; pages 201–217, 2020. accessed 19-Sept-2020].

[15] Michael Kurth, Ben Gras, Dennis Andriesse, Cristiano [27] Microsoft. Cve-2019-1372, azure stack re- Giuffrida, Herbert Bos, and Kaveh Razavi. NetCAT: mote code execution vulnerability. https: Practical cache attacks from the network. In IEEE Sym- //portal.msrc.microsoft.com/en-US/security- posium on Security and Privacy (S&P), 2020. guidance/advisory/CVE-2019-1372, 2020. [On- line; accessed 19-Sept-2020]. [16] Manhee Lee and Eun Jung Kim. A comprehensive framework for enhancing security in InfiniBand archi- [28] John Ousterhout, Arjun Gopalan, Ashish Gupta, Ankita tecture. IEEE Transactions on Parallel and Distributed Kejriwal, Collin Lee, Behnam Montazeri, Diego Ongaro, Systems, 18, 2007. Seo Jin Park, Henry Qin, Mendel Rosenblum, Stephen

15 Rumble, Ryan Stutsman, and Stephen Yang. The RAM- [40] Fernando Vano-Garcia and Hector Marco-Gisbert. Cloud storage system. ACM Trans. Comput. Syst., KASLR-MT: Kernel address space layout randomiza- 33(3):7:1–7:55, August 2015. tion for multi-tenant cloud systems. Journal of Parallel and Distributed Computing, 137:77–90, 2020. [29] R Kim Peterson. Position independent code location system, 1996. US Patent 5,504,901. [41] Michal Zalewski. Strange attractors and tcp/ip sequence number analysis. RAZOR/Bindview Corporation, 2001. [30] J. Pinkerton and E. Deleganes. Direct Data Placement Protocol (DDP) / Remote Direct Memory Access Proto- A Algorithms of rkey generators col (RDMAP) Security. RFC 5042, October 2007.

[31] Marius Poke and Torsten Hoefler. Dare: High- static uint32_t bnxt_get_key(void){ performance state machine replication on rdma net- static uint32_t key= 0x100; works. In Proceedings of the International Symposium key += 0x100 on High-Performance Parallel and Distributed Comput- return key; } ing (HPDC), pages 107–118, 2015.

[32] Eric Rescorla. The Transport Layer Security (TLS) Protocol Version 1.3. RFC 8446, 2018. Listing 4: rkey generation of bnxt_re

[33] Anna Kornfeld Simpson, Adriana Szekeres, Jacob Nel- son, and Irene Zhang. Securing RDMA for high- static uint32_t rxe_get_key(void){ performance datacenter storage systems. In USENIX static uint32_t base= 0x100; static unsigned key=1; Workshop on Hot Topics in Cloud Computing (Hot- base += 0x100; Cloud), 2020. key= key <<1; key |=(0 != (key&0x100))^(0 != (key&0x10)) [34] Patrick Stuedi, Animesh Trivedi, Jonas Pfefferle, Ana ^(0 != (key&0x80))^(0 != (key&0x40)); Klimovic, Adrian Schuepbach, and Bernard Metzler. key &= 0xff; Unification of temporary storage in the nodekernel ar- return base+(( uint8_t)key); } chitecture. In Proceedings of USENIX Conference on Usenix Annual Technical Conference (ATC), page 767–781, 2019. Listing 5: rkey generation of SoftRoCE [35] Patrick Stuedi, Animesh Trivedi, Jonas Pfefferle, Radu Stoica, Bernard Metzler, Nikolas Ioannou, and Ioannis Koltsidas. Crail: A high-performance I/O architecture static Queue key_queue;//queue for deregistered keys static uint32_t base= 0x100; // is device-specific for distributed data processing. IEEE Data Eng. Bull., static uint32_t MASK= 0xFFFFFFFF; // 24bit mask 40(1):38–49, 2017. static uint32_t mlx4_get_key(void){ static uint32_t key= 0x100; [36] Konstantin Taranov, Benjamin Rothenberger, Adrian if(key_queue.is_empty()){ Perrig, and Torsten Hoefler. sRDMA: Efficient nic- key+= 0x100; return base+ (key& MASK); based authentication and encryption for remote direct } memory access. In USENIX Annual Technical Confer- uint32_t old_key= key_queue.pop(); ence (ATC), 2020. return base+ (old_key& MASK); } [37] Mellanox Technologies. RDMA Aware Networks static void mlx4_dereg_key(uint32_t old_key){ https:// base += 0x8000000; Programming User Manual, Rev 1.7. key_queue.push(old_key); www.mellanox.com/related-docs/prod_software/ } RDMA_Aware_Programming_user_manual.pdf, 2015.

[38] Shin-Yeh Tsai, Mathias Payer, and Yiying Zhang. Listing 6: rkey generation of mlx4 Pythia: Remote oracles for the masses. In USENIX Security, pages 693–710, 2019.

[39] Shin-Yeh Tsai and Yiying Zhang. A double-edged sword: Security threats and opportunities in one-sided network communication. In USENIX Workshop on Hot Topics in Cloud Computing (HotCloud), 2019.

16