Detecting TCP/IP Connections Via IPID Hash Collisions

Proceedings on Privacy Enhancing Technologies ; 2019 (4):311–328 Geoffrey Alexander*, Antonio M. Espinoza, and Jedidiah R. Crandall Detecting TCP/IP Connections via IPID Hash Collisions Abstract: We present a novel attack for detecting the channels present in the Linux kernel’s implementation presence of an active TCP connection between a re- of global and per-connection IPv4 IPID values. The at- mote Linux server and an arbitrary client machine. The tack’s requirements are as follows: attack takes advantage of side-channels present in the – A server that is a Linux machine running kernel Linux kernel’s handling of the values used to populate version 4.0 or newer. an IPv4 packet’s IPID field and applies to kernel ver- – Access to multiple IPv4 addresses to use as attacker sions of 4.0 and higher. We implement and test this addresses. attack and evaluate its real world effectiveness and per- formance when used on active connections to popular The proposed attack makes use of the Linux behav- web servers. Our evaluation shows that the attack is ior of responding to “unsolicited” SYN/ACKs with a RST. capable of correctly detecting the IP-port 4-tuple repre- This is the default Linux kernel behavior and is also senting an active TCP connection in 84% of our mock described as the proper behavior for handling “unso- attacks. We also demonstrate how the attack can be licited” SYN/ACKs in RFC 793 [27]. An “unsolicited” used by the middle onion router in a Tor circuit to test SYN/ACK is a SYN/ACK for which no SYN was sent and whether a given client is connected to the guard entry so does not represent a potential connection. The num- node associated with a given circuit. ber of IPv4 addresses required for the attack to be reli- In addition we discuss the potential issues an attacker able is at least hundreds, with thousands increasing the would face when attempting to scale it to real world probability that the attack can be carried out. While attacks, as well as possible mitigations against the at- requiring this many may prove a hindrance to attack- tack. Our attack does not exhaust any global resource, ers with a small amount of resources it is well within and therefore challenges the notion that there is a direct the realm of possibility for large botnets or nation-state one-to-one connection between shared, limited resources attackers. and non-trivial network side-channels. This means that By detecting when the Linux kernels changes from simply enumerating global shared resources and consid- using one of its 2048 global IPID counters to using a per- ering the ways in which they can be exhausted will not connection TCP IPID counter, the attack we describe suffice for certifying a kernel TCP/IP network stack to is able to infer the IP-port 4-tuple that corresponds to be free of privacy risk side-channels. an active TCP connection without being an on-path ob- server. The IP-port 4-tuple representing an active TCP DOI 10.2478/popets-2019-0071 Received 2019-02-28; revised 2019-06-15; accepted 2019-06-16. connection is the source address, source port, destination address, and destination port used for TCP com- munication. Our major contributions are as follows: 1 Introduction – We describe a method for using a side-channel present in the Linux kernel’s implementation of We describe a novel attack for detecting the presence global and per-connection IPv4 packet IPIDs to in- of an active TCP connection between an arbitrary fer an active connection’s IP-port 4-tuple. Ours is client and a remote Linux server using information side- the first such attack to infer the existence of a connection completely off-path without exhausting any global resource. *Corresponding Author: Geoffrey Alexander: Univer- – We design and implement a proof-of-concept attack sity of New Mexico, E-mail: [email protected] for using these side-channels to detect the presence Antonio M. Espinoza: University of New Mexico, E-mail: of active connections between arbitrary Internet end [email protected] hosts. Jedidiah R. Crandall: University of New Mexico, E-mail: [email protected] Detecting TCP/IP Connections via IPID Hash Collisions 312 – We provide a detailed analysis and evaluation of the 2 Motivation attack, analyze possible sources of error, and discuss possible mitigations. One common assumption made by many privacy tools using the TCP protocol is that information about the A key novelty of the side-channel attack described in state of an existing connection does not leak outside of this paper, compared to past work, is that it does not ex- the connection itself. This includes information about haust any global resource. To the best of our knowledge, whether or not a connection exists. Many privacy and there are only two existing side-channel attacks in the censorship circumvention tools rely on this to ensure literature where the existence of a TCP/IP connection that this information could only be discovered by an could be inferred: Knockel and Crandall [21] where the on-path attacker. If an attacker were able to detect the global fragment cache was filled and Cao et al. [7] where existence of a connection between a client and a cir- a global challenge ACK rate limit was reached. Note cumvention tool off-path it could allow the attacker the that we exclude attacks that require malicious code on possibility of deanonymizing a client, detecting a hidden the victim machine or an attacker machine behind the service, or other attack vectors. same NAT as the victim [10, 18–20, 29]. Our attack uses One scenario where the ability to detect off-path a per-destination (i.e., not global) duplicate ACK limit connections is useful is the case of a user accessing a for one non-default corner case that we encountered, sensitive website via a Tor [12] bridge, which is a type but is otherwise based on inferring which resource is of relay that is supposed to be unknown to the censor. being used rather than exhausting a specific resource. The attacker may suspect that the user is connecting to This is a major conceptual difference that challenges a bridge and could try to confirm this suspicion. While the notion that there is a direct one-to-one connection there has been evidence of nation-states using active between shared, limited resources and non-trivial net- probing to identify such hidden machines [15], obfus- work side-channels. While past side-channels have had cation protocols such as obfs4 [3] can be used to im- the property of not exhausting global shared, limited pede such probing. Using an attack that could detect resources, such as Antirez’s idle scan [4] for detecting an off-path TCP connection an attacker could attempt open ports, to date such side-channels have been rel- to detect a TCP connection between a suspected Tor atively trivial and could not reveal information about bridge and a Tor directory server after a user opens active connections. What the attack presented in this a connection to the Tor bridge. Since this would de- paper demonstrates is that simply enumerating glob- tect the connection it would not require active probing ally shared resources (rate limits, buffers, caches, etc.) that could be impeded by obfs4 or similar mitigations. and then considering each in isolation is not sufficient Note that once the connection is open the distinction be- for enumerating all possible side-channels that can be tween client and server is interchangeable for the attack used to infer a connection. we present. Note also that six out of 10 Tor directory The rest of the paper is structured as follows: Sec- servers are dual stack [31] allowing an attacker to use tion 2 discusses scenarios that motivate our work. Sec- both IPv4 and IPv6 address when attempting to find tion 3 reviews what an IPID value is, how the Linux IPID hash collisions. As multiple IPv6 addresses are of- kernel generates IPIDs, what a challenge ACK is, and ten assigned to a single machine or network, compared how the Linux kernel handles challenge ACKs. Section 4 to IPv4 addresses, these additional IPv6 addresses pro- discusses the methods for using IPIDs and challenge vide attackers with a much large pool of addresses that ACKs as side-channels to detect the presence of an active could be used to find IPID hash collision. While the TCP connection. Section 5 describes our experimental attack we present focuses on a simple IPv4 only imple- methodology. Then, we discuss our results in testing the mentation, there are many different variations on the attack in Section 6. In Section 7 we discuss the applica- attack to make it practical for any given application. bility of the attack, the challenges it faces “in the wild”, Generally, the attack we describe in this paper pro- common sources of error, and possible mitigations. We vides the attacker with a primitive for inferring the ex- discuss related work in Section 8 and finish with our istence of connections off-path, which violates assump- conclusions in Section 9. tions often made by privacy tools. We focus our experimental methodology on understanding the base accu- racy and speed of the attack on one client/server pair in isolation, whereas a real attacker may have additional Detecting TCP/IP Connections via IPID Hash Collisions 313 flexibility in carrying out the attack and can use an im- relying on Path MTU (PMTU) Discovery to determine proved implementation and/or different tradeoffs.

Load more