When Tolerance Causes Weakness: The Case of Injection-Friendly Browsers

Yossi Gilad Amir Herzberg Bar-Ilan University, Israel Bar-Ilan University, Israel [email protected] [email protected]

ABSTRACT who are incapable of eavesdropping to communication: mod- We present a practical off-path TCP-injection attack for ern TCP implementations randomize not only the 32-bit se- connections between current, non-buggy browsers and web- quence number [14], but also the 16-bit client port [21]; in servers. The attack allows web-cache poisoning with ma- order to successfully inject data to the TCP stream, the licious objects; these objects can be cached for long time adversary must provide valid values to both fields. period, exposing any user of that cache to XSS, CSRF and This belief is even stated in RFCs and standards, e.g., phishing attacks. in RFC 4953, discussing on TCP spoofing attacks (see Sec- In contrast to previous TCP-injection attacks, we assume tion 2.2 of [33]). Indeed, since its early days, most Inter- neither vulnerabilities such as client-malware nor predictable net traffic is carried over TCP - and is not cryptographi- choice of client port or IP-ID. We only exploit subtle details cally protected, in spite of warnings, e.g., by Morris [23] and of HTTP and TCP specifications, and features of legitimate Bellovin [6, 7]. (and common) browser implementations. An empirical eval- We present an attack that allows an off-path adversary to uation of our techniques with current versions of browsers learn the (randomized) client port and sequence numbers, shows that connections with popular websites are vulnera- and thereby inject traffic to the TCP connection. The tech- ble. Our attack is modular, and its modules may improve nique exploits subtle properties of the TCP specification, other off-path attacks on TCP communication. as well as common - and legitimate - behavior of browsers, We present practical patches against the attack; however, which was introduced in the early versions of browsers and the best defense is surely adoption of TLS, that ensures secu- still exists in the modern browsers. Our TCP injection tech- rity even against the stronger Man-in-the-Middle attacker. nique is independent of the victim’s , and allows the attacker to bypass the browser’s same origin pol- icy (SOP) defense [5, 28, 36]. In particular, this allows in- Categories and Subject Descriptors jection of web-pages and scripts in the context of a third- C.2.2 [Computer Systems Organization]: Computer- party web-server, and can be exploited for cross-site script- Communication Networks—Network Protocols ing (XSS), cross-site request forgery (CSRF) and phishing attacks without relying on a vulnerability in the web-server. Keywords 1.1 Network Settings and Attack Outline Web and Network Security; Off-Path Attacks; Browser Se- curity Figure 1 illustrates our network model and outlines our attack. Mallory, the attacker that we consider, is an off-path (spoofing) attacker. Mallory cannot observe traffic sent to 1. INTRODUCTION others; specifically, she cannot observe the traffic between a TCP is the main transport protocol over the Internet, client C and a server S. However, Mallory can send spoofed ensuring reliable and efficient connections. TCP is trivially packets, i.e., packets with fake (spoofed) sender IP address. vulnerable to man-in-the-middle (MitM) attackers; they can Mostly due to ingress filtering [4, 11, 17], IP spoofing is intercept, modify and inject TCP traffic. However, it seems less commonly available than before, but it is still possible that MitM and eavesdropping attacks are relatively rare in with many ISPs1, see [1, 8, 10]. Mallory can use an ISP practice, since they require the attacker to control routers that allows IP-spoofing; hence, the spoofing attacker model or links along the path between the victims. Instead, many is (still) realistic. practical attacks involve malicious hosts, without MitM ca- Our attack requires that the user enters Mallory’s web-site. pabilities, i.e., the attackers are off-path. This allows Mallory to run a restricted script in the user’s There is a widespread belief that TCP communication is browser sandbox. Specifically, this script is restricted by reasonably immune to off-path attackers; i.e., that such ad- same origin policy [5, 28] and can only communicate via the versaries cannot inject traffic into a TCP connection. The browser, i.e., request (and receive) HTTP objects (no ac- reasoning is that TCP specifications and implementations cess to TCP/IP packet headers). Following [3], we refer to were enhanced to provide security against such adversaries, such attacker-controlled scripts as puppets. Puppets are usu-

Copyright is held by the International World Wide Web Conference 1 Committee (IW3C2). IW3C2 reserves the right to provide a hyperlink Apparently, there is still a significant number (16%-22%) to the author’s site if the Material is used in electronic media. of ISPs that do not perform ingress filtering and allow their WWW 2013, May 13–17, 2013, Rio de Janeiro, Brazil. clients to spoof an arbitrary, routable source address [1, 8]. ACM 978-1-4503-2035-1/13/05.

435  -+).  /                       !"##    $ %&               '&   (           ) (  "#&*  -%+'+,.  !+   $ ,&       "#&*  &           (   )

Figure 1: Network Model and Attack Outline. ally easier to obtain and control compared to zombies, since tacks use the malware to identify a ‘victim TCP connection’ browsers normally run scripts automatically upon opening by probing the client’s system variables (e.g., by executing a web-site, while zombies require installation (of malware). ‘netstat’); then, the attacker learns the server’s sequence The attack has five steps, which are illustrated in Figure 1. number by sending spoofed packets with various sequence In order to circumvent the browser’s same origin policy, the numbers, the malware identifies when a packet is accepted off-path attacker sends (in the final step of the attack) a by the client (has valid a sequence number) by reading sys- forged response for a request that the puppet sends to S. tem counters. A significant challenge in practice, that was This response may contain a malicious object, typically a not considered in [26, 27], is that many clients connect to the script, which we refer to as persistent XSS, since it may Internet via NAT devices; in this case, the external port (al- be cached, for a long time (theoretically forever). Persis- located by the NAT) is likely to differ from the one observed tent XSS can circumvent Content Security Policy (CSP) [30] by the malware, which runs on the client. Moreover, assum- and other defenses (e.g., [15]), and perform cross-site script- ing a local malware agent to perform web-spoofing (injection ing (XSS), cross-site request forgery (CSRF) [32], phishing, of false content) is a strong requirement. In fact, a malware defacement and more. can display false content to the user and trick him or her Organization. In the rest of the Introduction, we discuss to believing it is genuine (without complex TCP injections); related works and summarize our contributions. Section 2 this is a common attack vector. presents a modular overview of our attack, and compares In contrast to [26, 27], the attack that we presented in [13] to related attacks. Section 3 presents our client-port de- requires only a puppet running on the victim machine (sim- randomization technique. Section 4 shows how Mallory can ilar to this work). However, [13] as well as [20] exploit a spe- learn the server sequence number. Section 5 discusses ex- cific operating-system implementation of the TCP/IP stack. ploits, Section 6 presents defenses, and Section 7 concludes. Specifically, the use of globally-incrementing allocation mech- anisms for both ports and IP-IDs, as exists in the Win- 1.2 Related Works dows operating systems. Although Windows is very popu- lar, avoiding these requirements from the operating-system 1.2.1 Off-Path TCP Injection Attacks still significantly expands the base of vulnerable clients; fur- thermore, these specific weaknesses may be removed in fu- TCP injections are easy for implementations that use pre- ture versions of the operating system (as suggested in our dictable initial sequence numbers (ISNs). This was observed correspondence with Microsoft’s security team). already by Morris at 1985 [23] and abused by Mitnick [29]. Later, at 2001, Zalewski found that most implementations still used predictable ISNs [35]. However, by now, most or 1.2.2 Address-Based Authentication and SOP all major implementations ensure sufficiently-unpredictable TCP injection attacks were key to some of the most well ISNs, e.g., following [14]. known exploits, specifically, attacks against address-based Since the adoption of randomized initial sequence numbers client authentication, e.g., see [7]. However, as a result, and until recently, TCP was widely believed to be immune address-based client authentication has become essentially to off-path attacks. One exception was the off-path attacks obsolete, and mostly replaced with cryptographic alterna- on TCP of [34], which disconnected BGP connections that tives such as SSH and SSL/TLS. use constant client ports. However, this attack was consid- However, web security still relies, to large extent, on the ered as reflecting a specific vulnerability of BGP availability. Same Origin Policy (SOP) [5, 28], i.e., on domain/address- In particular, with the notable exception of Windows, most based server authentication. Many attacks are based on cir- current operating systems adopted algorithms to make it cumventing SOP; however, these attacks are usually based harder to predict the client port. This and other counter- on implementation bugs, mostly in the sites, and some in measures make this attack inapplicable today [9]. the browsers or middleboxes; see [36]. Especially related to The first ‘proof of concept’ showing that off-path attackers our attack is the HTTP response splitting attack [18], which may still be able to inject data to the TCP stream, even with exploits the loose separation between HTTP responses. randomly-chosen client ports and initial sequence numbers, Notice that using TCP injections to attack address-based was in [20]. This was recently improved to efficient off-path server authentication is more challenging than using it to TCP injection attacks [13, 26, 27]. However, in this work, we attack address-based client authentication. In attacks on significantly improve upon these works, as we now describe. address-based client authentication, the off-path attacker The attacks in [26, 27] require malware running on the sends the initial SYN to open a new connection; hence, she client machine (albeit with limited privileges). These at- knows the source and destination IP addresses and ports;

436 she ‘only’ needs to predict the server’s sequence number. In 2.1 Learn Connection 4-Tuple contrast, to attack address-based server authentication, the The first task is to identify a TCP connection to attack, i.e., off-path attacker must also identify the client’s port. a ‘victim-connection’. In [20], the adversary actively scans the client machine for an existing connection with a partic- 1.3 Contributions ular server. As indicated in [20], this technique is typically The basic contribution of this work is in showing that detected and blocked by firewalls. In [26, 27], the attacker TCP injections can be very practical, in terms of both effi- runs a rogue application (malware) on the client machine. ciency and of requirements: no dependency on malware or The malware monitors connections that the client has with non-recommended implementations of TCP/IP, as in previ- servers, e.g., by executing netstat. ous works. This has significant implications. In the short This work and [13] rely on a weaker assumption: that the term, patches should be deployed to prevent our techniques; user’s browser runs a puppet, i.e., a malicious script, down- we present such defenses in this paper. Most significantly, loaded and executed automatically from the attacker’s web- we hope that this work will help promote the use of crypto- site, e.g., www.mallory.com, to which the user innocently en- graphic defenses, providing strong security assuming MitM tered. This puppet establishes the victim connection (step 1 attackers, rather than assuming that attackers only have off- in Figure 1). Therefore, Mallory (attacker) knows the client path capabilities. and server IP addresses, as well as the server’s port. It is We identify the main challenges for off-path TCP injec- only left to identify the client port (step 2 in Figure 1). tions, and build our attack modularly, with independent In [13] the attack additionally assumes sequential port al- modules handling different phases and tasks. This allows location; this allows the attacker to guess the correct client- some of our modules to be used independently of others. As port of the connection that the puppet establishes. How- one important example, we present a technique for client ever, many operating systems try to avoid predictable port port de-randomization. Specifically, we show how to predict allocation, as recommended in RFC 6056 [21]. In this paper, the client port when the client’s operating system uses the we successfully attack the Simple Hash-Based Port Selection Simple Hash-Based Port Selection (SHPS) Algorithm recom- (SHPS) algorithm, recommended in [21] and implemented mended in [21]. Since SHPS is embedded into Linux, it is in Linux. SHPS applies to many clients, e.g., running An- extensively used, e.g., by Android and NAT devices, which droid or connect to the Internet through NATs, which often are often based on embedded Linux. run Linux (which uses SHPS). Our technique, described in Our attacks, while efficient and practical, are non-trivial Section 3, is based on an observation from the TCP specifi- and based on in-depth understanding of the operation of cation, i.e., is independent of the platform (cf. to [13, 20]). TCP and HTTP. In particular, we exploit the fact that (cur- rent) browsers process invalid HTTP responses, by handling 2.2 Learn Sequence Numbers them as payload with a default response header. This be- The next step after identifying the victim-connection is havior may have helped in debugging of early HTTP 1.1 learning one or both connection’s sequence numbers (step 3 implementations, but currently seems unnecessary and dan- in Figure 1); knowledge of the server’s (client’s) sequence gerous; browsers should be patched to avoid it. number allows her to inject data to the connection, imper- Lastly, our cache-poisoning exploit significantly extends sonating as the server (client). Observing the sequence num- compared to known exploits for TCP injections. bers directly from traffic requires an on-path attacker (i.e., eavesdropping capability). Off-path TCP injection tech- 2. A MODULAR ATTACK SCHEME niques use different methods to infer the sequence numbers. In this section we present a modular scheme for a TCP in- jection attack, breaking the attack into three separate tasks: 2.2.1 Operating-System Specific  Learn Connection 4-Tuple. The attacker learns the four pa- In the attacks of [13, 20], the adversary exploits the global rameters of a TCP connection between a client and a server, counter IP-ID implementation in Windows. The attacker i.e., their respective IP addresses and ports. observes the difference in the IP-ID field in packets that she  Learn Sequence Number(s). The attacker learns the cur- receives from the client to learn the number of packets that rent sequence number, for packets sent from the server to the client had sent to other destinations (since each packet the client. In some attacks, the attacker also learns the se- increments the IP-ID). quence number for packets from the client to the server. In these attacks, the attacker sends to the client spoofed  Exploit. A non-trivial task is to find how to successfully probe packets (that appear to be from the server). The exploit a TCP injection ability; this task may depend on the client responds to a probe only if it specifies an invalid server properties of the attack, e.g., required length of connection. sequence number, i.e., outside the client’s flow-control win- A modular scheme was not presented in previous off-path dow. The client sends the responses to the server and the injection attacks [13, 20, 26, 27], however, these attacks fol- attacker learns whether the client responded by observing low our scheme. By explicitly stating the scheme, it is easier the IP-ID field (in packets that she receives from the client to understand new attacks and identify cases where a new in a different connection). module, improving the solution to one task, can improve an After learning the server’s sequence number, the tech- earlier attack; and, on the other hand, protocols and systems niques in [13, 20] exploit Windows TCP implementation, should be designed to make each step (task) infeasible. which filters incoming packets according to their acknowl- The following subsections present the three tasks that edgment numbers (this mechanism is non-standard). This compose the scheme; for each task, we compare our im- implementation allows the attacker to learn which acknowl- plementation of a building-block achieving the task, to im- edgment number is valid (passes filtering) by again observing plementations in previous attacks. Table 1 summarizes our the IP-ID side channel; the valid acknowledgment number discussions below. equals the client’s sequence number.

437 Learn Connection 4-tuple Learn Sequence Numbers Exploit Exploit global IP-ID counter impl., Active probing for connection Lkm [20] both seq. # obtained None (Windows client, no firewall) (Windows client) Monitor connections, Read client system counters, Qian XSS, CSRF, phishing e.g., with netstat server’s seq. # obtained et al.[26, 27] (no TLS/SSL) (Malware) (Malware; in [26] also seq. # checking firewall) Establish connection, exploit Exploit global IP-ID counter impl., Gilad and XSS, CSRF, phishing sequential port allocation impl. both seq. # obtained Herzberg [13] (No TLS/SSL) (Puppet, Windows client) (Puppet, Windows client) Establish connection, Exploit browser behavior, As above plus This work client port de-randomization server’s seq. # obtained web-cache poisoning (Puppet, client behind firewall) (Puppet, no TLS/SSL) (No TLS/SSL)

Table 1: Off-Path TCP Injection Attacks: Building Blocks. In brackets: requirements.

2.2.2 Sequence Number Inference Attacks address and port2. In this section we describe a new tech- In the sequence number inference attacks [26, 27] the at- nique that allows Mallory to learn the fourth parameter of tacker sends spoofed packets to the client machine. Each the TCP four tuple: the client port. packet specifies a different sequence number. The observa- In Windows, learning of the client port is trivial, since tion in [26] is that if the sequence number is not close to the port numbers are assigned consecutively (for all destina- value that the client expects, then some network firewalls tions). However, it is widely accepted that this is insecure, will discard the packet. The observation in [27] is that the and that the client port should be ‘unpredictable’ to an off- client will respond to the packet only if its sequence number path attacker. RFC 6056 [21] presents five recommended is in the flow-control window. Both attacks use the mal- client port selection algorithms to secure against off-path ware to read system counters, which tell whether the client adversaries. We focus on their third suggestion: ‘Simple received the attacker’s packet ([26]) or responded to it ([27]). Hash-Based Port Selection’ (SHPS). SHPS is used by the Linux OS kernel in versions 2.6.15 and 2.2.3 Inject and Observe above, i.e., from the year 2006; it is embedded in all Android versions and many NAT devices. Extensive deployment at We use a different approach than the previous attacks; our the NAT level makes SHPS the de facto port selection algo- technique, called ‘Inject and Observe’, assumes a standard rithm for many clients, even if the client machine does not TCP/IP stack and does not rely on an operating system use this algorithm. specific leakage, e.g., via the IP-ID field (cf. to [13, 20]). SHPS chooses a pseudo-random initial port for each des- Additionally, since we assume only a puppet running on the tination (server) IP-address; a new connection between the client machine, Mallory cannot receive feedback from system client and that destination uses the current port which is files (cf. to [26, 27]). then incremented, i.e., a per destination port-counter. SHPS In the Inject and Observe technique, described in Sec- is expected to be secure against off-path adversaries, since tion 4, Mallory sends to the client data which is spoofed as these are not aware of the initial port. coming from the server in response to queries that the pup- However, we show a method allowing an off-path attacker pet sends; this phase relies on a very common browser be- (Mallory) to predict the next port assignment by SHPS. We havior that allows the puppet to retrieve the injected data begin, in Section 3.1, with port elimination and testing. This when buffered in the flow-control window (maintained by is a simple technique, where Mallory eliminates (or ‘marks’) the browser). The data contains the server’s sequence num- a port p, and the puppet tests if the next-assigned port was ber that Mallory guessed; hence, when read by the puppet, supposed to be p. By repeating this for many ports, even- Mallory learns a valid sequence number. tually a match happens, allowing the puppet to predict the 2.3 TCP Injection: Exploits next-assigned port. Then, in Section 3.2, we present a meet in the middle optimization method, which applies elimina- The final building block of the attack is an application of tion and testing concurrently to multiple ports, improving the injection (steps 4, 5 in Figure 1), typically to inject a the efficiency of the prediction technique. We complete this malicious object into the connection. The malicious object section with Subsection 3.3, which discusses practical chal- may be cached, and the attacker can easily make sure it stays lenges and presents an empirical evaluation. in cache (theoretically forever); we refer to such a malicious, long-lived object or script as a persistent XSS. 3.1 Port Elimination and Testing Web-cache poisoning with a persistent XSS allows the at- We now describe a method for eliminating a client port p, tacker long-term use of many exploits, including cross-site and then testing if p is the next port to be assigned by scripting, cross-site request forgery and phishing (suggested the client’s port-selection algorithm. Specifically, Mallory in previous works [13, 26, 27]), bypassing the state of the sends a spoofed SYN packet from the client’s IP address and art defenses such as CSP. See Section 5. port p, to the server (S). This causes S to open a (pending) connection with port p of the client. As a result, the server 3. CLIENT PORT DE-RANDOMIZATION will refuse additional SYN packets from port p of the client, namely, port p is eliminated. After port p was eliminated, The first step in performing a TCP injection is to identify the victim-connection. As described in Section 2.1, Mallory 2Our initial discussion assumes that the server has one IP uses the puppet to establish the victim-connection; there- address; in practice, large servers often have multiple ad- fore, she knows the client’s address as well as the server’s dresses, we refer to this issue in Subsection 3.3.

438 Figure 2: Port De-Randomization, Elimination and Testing. the puppet tries to establish a new connection with S; the a SYN packet, with the same source port (p), but, almost response time gives an indication if the port was eliminated always, with a different sequence number than that set by or not. We now provide the details to our technique, illus- Mallory in the first step. Therefore, S will discard this packet, trated in Figure 2. see TCP specification [25] page 69; the TCP connection will In the first step (see Figure 2), Mallory sends to S a spoofed not be established. The client operating system will retry to TCP SYN with the source address of C and from port p, establish the connection several times and return an answer which is the port that Mallory tests. When S receives this to the puppet only after several seconds. Often, this answer SYN, it creates a connection entry for will be due to exceeding the maximal number of retransmis- and assigns it the SYN-Received state. S then sends a sion attempts; alternatively, the connection may be estab- SYN+ACK response to C; a stateful firewall that connects lished, but again only after several seconds, when the server C to the network (see illustration in Figure 1) will discard closes the (spoofed) pending connection. In both cases, the the unsolicited SYN+ACK packet from S (as C did not send delay is much larger than in the case that the client used a a matching SYN); as a result, S stays at the SYN-Received port different from p. state for a relatively long time (in our experiments below, In order to ‘clean up’ after testing port p, Mallory sends this was typically 10 − 20 seconds for popular web-servers). a reset (RST) packet that corresponds to her spoofed SYN; In the second step (see Figure 2), the puppet establishes a this releases the server’s resources in case that these are still TCP connection with S, by requesting the browser to embed allocated to the connection. an image from S in the puppet’s web-page. The puppet If the port p was indeed the port that the client tried to requests an image from domain i.mallory.com, an attacker- use (in connecting to S), then the attacker can now predict controlled domain that is mapped to the IP address of S. that on the next connection-open by the puppet to S, port The prefix counter i ensures that each request uses a unique p + 1 will be used. Otherwise, the attacker can repeat the sub-domain; this prevents reuse of an existing connection. process, until eventually successful. This would work - but In the third step (see Figure 2), the puppet evaluates not efficiently, requiring approximately 215 iterations until whether p was the connection port and informs Mallory. success (since the port field is 16 bits long). Evaluation is easy, based on the response time, which is very different in the two cases - when the client tried to use 3.2 A Meet-in-Middle Optimization the ‘eliminated’ port p, and when it used a different port. In this subsection we present a meet-in-middle optimiza- If the client port selected by the operating system is not p, tion, that reduces dramatically the time and communica- then C and S will establish a TCP connection, over which tion involved in the port de-randomization process. In order the browser will request the image. Usually the servers will to improve de-randomization performance, Mallory uses the refuse the request immediately, since the browser specifies puppet to establish multiple connections to the server and in the a HTTP request’s ‘Host Header’ a sub-domain of eliminate ports simultaneously. mallory.com and not the server’s domain (e.g., s.com), due Let π denote the number of possible ports for a connec- to the DNS mapping in step 2. Hence, most servers will tion between C and S. Since the port field is 16-bits long, close the connection (others might return a HTTP not found π ≤ 216 (π is often significantly smaller than 216, see next message), and the puppet will receive an error feedback from subsection). In order to improve de-randomization run-time, the browser after roughly two C-S round-trip times (RTTs), Mallory uses the puppet to establish multiple connections to which is normally much less than one second. the server and eliminate ports simultaneously. In contrast, if the operating system selects p as the client In the first phase of the de-randomization process,√ Mallory port, then C will try to establish a connection, i.e., send performs port-elimination (described above) on π ports,

439      during the exhaustive search phase, i.e., overall at most 512 packets of 40 bytes each, in total 20KB.    The puppet requests at most 256 objects during the meet   in the middle phase; since the browser allows simultaneous requests for 15 objects, the number of ‘request-iterations’  256  during the meet in the middle phase is at most 15 = 18.       Each iteration takes roughly two C-S round-trip times (RTTs). In total, the iterations take 36 RTTs, which are 3-7 seconds (for typical Internet RTTs of 100-200 milliseconds). The ex- Figure 3: Port De-Randomization, Meet-in-the- haustive search phase has at most 8 iterations which perform Middle Optimization. At the top are ports allocated one after the other, i.e., requiring 16 RTTs, i.e., typically by the operating system, illustrated by the arrows; about 1.5-3 seconds. numbers with underscore mark the connection num- ber. At the bottom are ports that Mallory eliminates. 3.3 Real-World Challenges and Evaluation This subsection describes practical challenges in perform- ing client port de-randomization and presents an evaluation √ √ π−1 of our technique on connections with popular web-servers. specifically, the√ ports {di πe}i=0 . In this phase, the pup- pet establishes π connections to S, which we number by the order of establishment. See illustration in Figure 3. 3.3.1 Challenges In order to use the puppet to establish multiple connec- A. Multiple Server IP Addresses. Large web-sites often map tions to S, Mallory must circumvent the fact that browsers their domains to multiple IP addresses; this allows load dis- which support HTTP 1.1 would normally send multiple re- tribution on several server-machines and shorter round-trip quests to the same server using the same ‘persistent’ connec- time to the client, who connects to a physically close server. tion. Circumvention of this mechanism is performed by ma- However, this induces a difficulty on our attack since we wish to learn the port-counter associated with the specific server nipulating DNS mapping of attacker controlled√ domains: the π−1 IP address that the client uses. puppet requests objects from {i.mallory.com}i=0 ; Mallory controls the DNS records for these domains and maps them Usually, the attacker can identify a small set of possible IP to the IP address of S. Browsers use domain-names to iden- addresses just by the client’s physical location or ISP (e.g., our ISP provides six addresses for www.google.com). These tify servers and not IP addresses; hence, this technique,√ which we verified on Chrome and , opens π new possibilities are eliminated with a short validation phase at connections to S. the end of the port de-randomization process: after Mallory We assume that the user does not create an independent learns the value of the port-counter for some server IP, she connection with S during port de-randomization. According sends a spoofed SYN to the server using the next port; the to the SHPS algorithm, client port allocation is sequential; puppet tries to retrieve an object from the server’s domain therefore, one of these connections will use a port eliminated (cf. to attacker controlled domains as described in Subsec- by Mallory, the wait-time for feedback from that connection tion 3.1). If the puppet receives a feedback after a relatively is significantly longer than in other connections and this is long delay, then Mallory de-randomized the port counter for the correct IP address; otherwise, Mallory performs the de- identified by the puppet; let x denote the number√ of that connection. Since the puppet had established π−x connec- randomization process again for another IP address. B. SYN Flooding. Our port de-randomization technique tions after connection x, the current√ value of the operating√ √ system’s client-port counter is k π−x for some 0 ≤ k < π. requires sending π SYN packets during the meet in the This completes the meet in the middle phase. middle phase, i.e., create up to 256 ‘half-open’ connections. The following phase of the de-randomization process is an This might be identified by some web-servers as a SYN flood- ing attack [9], i.e., an attempt to clog the server’s connec- exhaustive search that is performed√ in iterations to identify the current port of the remaining π possibilities. In each tions backlog; we now discuss the defenses suggested in [9] exhaustive search iteration, Mallory performs the elimina- that might be triggered and influence our technique. tion process simultaneously on half of the remaining ports The first defense is to filter connections from the client’s and the puppet requests only a single object. If the port IP address. This defense blocks our attack, but fails to miti- allocated by the operating system is one of those tested by gate SYN flooding when the attacker can spoof her address. Mallory, then the feedback from S to the puppet is delayed. Moreover, this defense may be abused by such IP-spoofing Since each iteration eliminates half the possibilities, the ex- attackers to deny service from legitimate clients by sending √ spoofed SYNs using their addresses. haustive search requires dlog2 πe iterations to complete. The second defense is to use SYN-cookies, i.e., avoid state keeping at the web-server until the TCP handshake com- 3.2.1 Analysis pletes. In this case, the server will reply to the client’s SYN The maximal number of simultaneous connections that even if it uses a port that was ‘eliminated’ by Mallory. SYN- the puppet may open changes according the version of the cookies encode the connection state in the server’s sequence browser; this value is at least 15 in all modern browsers number, which is returned to the client in the SYN+ACK and typically increases with new releases. Based on this, we packet; this allows the server to reconstruct its state when estimate the amount of transmitted data and time required receiving the following ACK packet from the client. How- to perform port de-randomization.√ √ ever, SYN-cookies are not widely used, since they come ‘at Mallory sends π ≤ 216 = 256 spoofed SYNs during a high price’; they allow the server to specify only one of the meet in the middle phase and a similar number of SYNs four options for maximal segment size (MSS), which may

440 degrade service for some clients. Furthermore, SYN-cookies reduce the entropy in the server’s sequence number which 0.08 1. All Failures may allow an attacker to guess its value, see [16]. 0.07 2. ‘Filtering’ Server Failures Finally, the server may reduce its TCP timers; this will 0.06 3. ‘Stateless’ Server Failures release server resources faster, but may deny service from 0.05 clients with long response time. This defense does not pre- 0.04 vent our attack, but forces a tighter time constraint on it: 0.03 the puppet must perform all requests until timeout or Mallory 0.02 must ‘refresh’ her spoofed SYN packets. 0.01 All these defenses have disadvantages which may discour-

Port De-Rand. Failure Rates 0 age servers from deployment. A typical solution, suggested 8 16 32 64 128 256 512 1024 in [9] and [22], is a hybrid approach: the server keeps a small Number of Top Sites Tested (log-scale) state for each connection, e.g., using SYN cache [9], and em- ploys one of the defenses described above when it identifies a SYN flooding attack. Indeed, the majority of servers in Figure 4: Port de-randomization failure rates, as a our experiments did not employ IP filtering or SYN-cookies function of web-site popularity. Rates are the aver- even after we sent the spoofed SYNs; this allowed us to de- age of two runs: one when puppet runs on Firefox randomize the client port with high success rates (see next). and the other on Chrome. Error-bars mark stan- dard deviations.

3.3.2 Evaluation Setup. We evaluated our technique on connections with 4. LEARNING THE SEQUENCE NUMBER popular web-sites, specifically, the top 1024 sites in the Alexa We now proceed to the second building block (and attack ranking [2]. We used a Linux client (kernel version 3.2.0) phase), as in the design presented in Section 2.2.3. At the with a local IP-tables host level firewall (version 1.4.12). end of this phase Mallory learns the 32-bit server’s sequence The Linux kernel uses the range [32768, 61000] for choosing number; this allows her to send data to the client, imper- client ports; this is a significantly smaller range than all pos- sonating as the server. We assume that Mallory has the sibilities for the 16-bit port field. This observation helps to parameters of the victim-connection, in particular, that she improve the search run time. identified the client port; e.g., by executing the technique We placed the attacker and client machines in the same described in Section 3 (or other methods, see Table 1). network, which allowed the attacker to send packets to the Our attack exploits an under-specification of HTTP 1.1 [12]. Internet using the client’s IP address (in reality, the attacker Subsection 4.1 provides required background, explaining how would connect through an ISP that does not perform ingress browsers handle HTTP responses that they receive. Sub- filtering, see discussion in Section 1.1). The client and at- section 4.2 describes our search technique. Subsection 4.3 tacker connect through different physical interfaces of a net- presents the requirements of our search technique and presents work switch, this prevents the attacker from observing pack- an empirical evaluation in the real-world. ets to/from the client, i.e., attacker is off-path. The client and attacker connect through 10Mbps link to the Internet. We performed our experiments when the puppet runs in 4.1 HTTP Request/Response Handling Mozilla Firefox (version 16.0.2) and (version As of HTTP 1.1 [12], clients can send multiple requests to 23.0.1271.64). We verified our port prediction by executing the same server in a single (‘persistent’) HTTP connection; netstat on the client side and observing the selected client furthermore, clients can send these requests in pipeline, i.e., port in the following connection. without waiting for response to one request before sending Results. Figure 4 shows the failure rates as a function of the next request. In order to allow browsers to match be- web-site popularity. Port de-randomization failed for ap- tween each response and the corresponding request, the re- proximately 7% of the 1024 websites that we tested (see sponses are sent by the server, exactly in the order in which Figure 4 line 1); i.e., a 93% success rate. the client had sent the requests. We also measured the deployment and effect of the SYN More specifically, the browser (client) keeps a FIFO queue flooding defenses described above: when failed to de-randomize of pending HTTP requests for each connection, and handles the port, we tested whether the web-site allows new connec- them one by one, as follows. In order to handle the (oldest) tions from the client, i.e., whether the client’s IP address is request, the browser reads the bytes in TCP’s receive-buffer filtered; these ‘filtering’ servers were approximately 3% of (allocated per-connection) when they become available. The the servers (see Figure 4 line 2). If the client’s IP address browser expects to find the matching response in the begin- was not filtered, we tested whether the client can connect ning of TCP’s receive-buffer. Next, the browser parses the to the server using an ‘eliminated’ port; if it can, then the response as per [12], embedding it in the web page. This server either has a short timer, that had elapsed by the time process continues until there are no more requests awaiting we tested the correct port, or uses the SYN-cookies defense reply from the particular connection. (i.e., server does not ‘remember’ the spoofed SYN); these Unfortunately, the HTTP standard [12] does not specify ‘stateless’ servers were approximately 1% of the servers (see what the browser should do when the receive-buffer contains Figure 4 line 3). Port de-randomization for other servers data which is not a valid HTTP response. We tested the (approximately 3%) failed due to other errors; e.g., some- current versions of the three most popular browsers (Internet times we were not able to retreive the server’s IP address Explorer, Firefox and Chrome), and all of them handled this (probably due to DNS filtering at the network or ISP level). situation as follows: the browsers treat all available data in

441         % & '(( %%(()    *  +#% ,-)   ) +. $   /0 +   !!!  ! $ ! +1 ,),-!.+ 123  45*1 ,),6!.+ 45*1 ,),7!.+  "# $    9**: :  %   45*1 ,),8!.+      % %+ % #% 45*1!.+

Figure 5: Server Sequence Number Learning Technique. the receive-buffer as payload of a response with the following   ‘default’ HTTP header:

HTTP/1.1 200 OK           Content-Type: text/html; charset=us-ascii Content-Length: available-data-size Figure 6: The state of wnd after ‘inject’ step. The browser returns this ‘response’ to the requesting mod- ule, normally, the browser’s rendering engine or a script/applet. The browsers do not break the existing TCP connection, and the cyclic space of acknowledgment numbers), see TCP spec- continue processing responses to requests sent over it3. The ification [25] page 72. Therefore, if we select the acknowl- following subsection explains how we exploit this behavior edgment number randomly, there is a 50% chance that it to learn the server’s sequence number. would be ignored. The solution is simple: Mallory sends two packets for each sequence number, one specifies Ack = α for 4.2 Inject and Observe some α ∈ {0, 231}, and the other specifies Ack = α + 231; In this subsection we present the server sequence num- this ensures that the Ack number in one of the two pack- ber learning technique which is illustrated in Figure 5. The ets is valid. Hence, exactly one of the packets will contain technique has two steps: (1) Inject and (2) Observe. ‘good’ sequence and acknowledgment numbers, and its data (1) Inject step. In this step, Mallory injects data into the is saved in the receive-buffer (wnd). Namely, after this step, stream of HTTP responses sent from the server (S) to the C’s victim-connection wnd is as illustrated in Figure 6. client (C). This data is ‘observed’ (read) in the following During the ‘inject’ step, the puppet ensures that there is step, which allows Mallory to determine the server’s sequence always at least one request waiting for reply in the browser’s number. queue; this by generating two initial requests and sending a Let wnd denote the browser’s receive-buffer for the con- new request when a response arrives (these requests were nection and |wnd| denote its size. In order to inject the data, removed for readability from Figure 5). The reason that 32 one request must always be enqueued is that when there are Mallory sends to the browser 2 packets, spoofed to ap- |wnd| no pending requests, some browsers clear the receive-buffer pear to be from S (on its victim-connection with C). The ith (those will discard the injected data). packet has server sequence number i · |wnd|, and contains as (2) Observe step. In this step, the puppet makes preva- payload pad||page(i), where pad is an easily-removable ‘pad’4 lent requests to the server, until it reaches the data injected and page(i) is a simple web-page defined as follows: by Mallory in the previous step. Similarly to the previous ‘inject’ step, the puppet maintains at least one request en- queued until this phase completes (see Figure 5). Each re-