Lost in Network Address Translation: Lessons from Scaling the World’s Simplest Middlebox

Vladimir Olteanu, U. Politehnica of Bucharest
Felipe Huici, NEC Europe Ltd.
Costin Raiciu, U. Politehnica of Bucharest

Abstract

To understand whether the promise of Network Function Virtualization can be accomplished in practice, we set out to create a software version of the simplest middlebox that keeps per-flow state: the NAT.

While there is a lot of literature in the wide area of SDN in general and in scaling middleboxes, we find that creating a NAT good enough to compete with hardware appliances requires a lot more care than we had thought when we started our work. In particular, limitations of OpenFlow switches force us to rethink load balancing in a way that does not involve the centralized controller at all. The result is a solution that can sustain, on six low-end commodity boxes, a throughput of 40Gbps with 64B packets, on par with industrial offerings but at a third of the cost.

To reach this performance, we designed and implemented our NAT from scratch to be migration friendly and optimized for common cases (inbound traffic, many mappings). Our experience shows that OpenFlow-based load balancing is very limited in the context of NATs (and, by extension, NFV), and that scalability can only be ensured by keeping the controller out of the data plane.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
HotMiddlebox ’15, August 21, 2015, London, United Kingdom
© 2015 Copyright held by the owner/author(s). Publication rights licensed to ACM. ISBN 978-1-4503-3540-9/15/08. $15.00
DOI: http://dx.doi.org/10.1145/2785989.2785994

1. INTRODUCTION AND MOTIVATION

Hardware middleboxes are as numerous as switches and routers in enterprise networks [19] and can process tens of millions of packets per second. However, they come with steep prices and are very difficult to scale or upgrade: such operations require hardware purchases. Running network processing on commodity hardware is the obvious solution: software processing can sustain reasonable packet-level speeds for simple processing by bypassing the network stack [9,18], and its biggest selling point is the ease with which software can be scaled or upgraded. Scalability in particular is crucial, as it allows the network to dynamically dimension its resources in response to load, leading to energy savings and smaller up-front costs. Running middleboxes on commodity hardware has been termed Network Function Virtualisation (NFV) and is more than just hype: all of the major network operators have gotten together to specify an architecture for Network Function Virtualization [20] and to allow chaining different functions on the same traffic flow [10].

There is already a growing body of research on how we should approach NFV: the basic recipe is to use hardware load balancing (e.g., OpenFlow) to split traffic across a number of commodity servers, as proposed by Flowstream [7]. To implement middlebox functionality, the simplest choice is to use existing apps running over Linux. However, major gains can be made in both performance and ease of deployment if we restrict the programming language for middleboxes, as proposed by ClickOS [12] and FlowOS [3].

One basic question, though, remains unanswered: can we implement scalable network functions on commodity hardware that achieve packet rates similar to hardware appliances? In this paper we set out to answer this question for network address translators, or NATs. Our goal is to build a scalable software-based NAT with performance comparable to hardware middleboxes.

We are acutely aware that NATs are far from exciting. However, they are also the simplest and most popular form of middlebox that maintains per-flow state. NAT functionality is embedded in almost all network appliances, including routers, application security appliances, service gateways in cellular networks, etc. Scaling a NAT is therefore the lowest common denominator in scaling more complex appliances: if we can't build a fast, scalable NAT, there is little hope of building more complex functionality that performs well. Finally, the depletion of the IPv4 address space means these boxes are in high demand, as more network operators are contemplating deploying carrier-grade NATs.

Designing a scalable NAT seems an easy task at first sight, as there is a large body of literature on Software Defined Networking that we can rely upon, including [6, 16, 17, 21]. Existing works do not focus on specific network functions or hardware. As a result, they fail to take into account crucial limitations of OpenFlow switches or NAT requirements (e.g., that both directions of a flow must hit the same box). In fact, these restrictions narrow the set of feasible solutions to the point where existing works do not apply.

Instead, we took a clean-slate approach to designing a fast, scalable NAT for our specific OpenFlow switch. We have developed novel data structures that allow fine-grained locking to enable cheap migration, while offering high throughput. Finally, we present daisy migrate, a state migration scheme that keeps the centralized controller off the data path.

Our experimental results show it is possible to perform network address translation at 40Gbps with 64B packets on six low-end commodity servers coupled to a 10Gbps OpenFlow switch. Additionally, our NAT can seamlessly scale up and down as load fluctuates, and can be easily extended by adding more servers: e.g., adding 9 more servers would allow us to process 100Gbps with 64B packets. This implies that NFV does seem feasible: at around thirty thousand dollars, this software NAT is cheaper than existing CGN appliances¹.

This paper begins by discussing in §2 the design choices we made while building our NAT, most of which are heavily influenced by the hardware limitations of OpenFlow switches. Next, we describe our implementation in §3 and our evaluation in §4. We briefly survey existing literature in §5 and discuss the lessons learnt in §6.

[Figure 1: Software Carrier-Grade NAT: an OpenFlow Switch, six commodity servers and a controller. Diagram labels: Inside network, Outside network, 40Gbps, IBM G8264, NAT (x6), 10Gbps, 1Gbps, CONTROLLER.]

2. DESIGNING A CARRIER-GRADE NAT

Our carrier-grade NAT is presented in Figure 1. It consists of an IBM G8264 10Gbps OpenFlow switch connected to six commodity servers that perform the actual translation, and another server that controls both the switch and the machines. The controller connects to the servers and the switch via a standard Gigabit switch, not shown in the figure. To build a scalable software NAT, we need the following building blocks:

1. A fast NAT implementation for a single machine. Although arbitrary performance can be achieved by scaling out (at least in principle), having a fast NAT implementation is key to ensuring the resulting solution is economically feasible.
2. A load-balancing algorithm that spreads the traffic over the existing NAT instances.
3. A migration algorithm that allows the controller to add more machines to the NAT when load grows, and turn off machines when load shrinks to save energy and costs.

For the first part, there are plenty of NAT implementations that we can rely on: Linux supports NAT in its iptables suite, and the Click modular router [11] also has configurations for NATs. To decide which one to choose, we profiled both, finding that native Linux can only handle around 500K packets per second, which is rather low. If we run our NAT in Click [11] in user mode over netmap [18] we …

… box (say N0) to process the packet and install a rule that matches future packets from this connection, rewrites their destination MAC address and forwards them to the appropriate NAT box. N0 creates a new mapping for the flow and then translates the packet, changing the source address and port to IPn and Pn, and forwards the packet back to the switch. The new packet belongs to a flow not known by the switch, so this packet will also be sent to the controller. The controller now knows the translation and installs another rule that forwards traffic from the server to N0.

There are at least three problems with this strawman load-balancing algorithm:

• The OpenFlow switch has to maintain two rules per connection. Our IBM G8264 switch supports around 80 thousand rules. This would imply a maximum of 40 thousand connections, which is very small.
• The control plane of the OpenFlow switch can only process around 200 packets per second (i.e., at most 100 new connections per second, since each new connection reaches the control plane twice), which is insufficient.
• Even if a switch existed with no scalability limitations, our software controller would become a bottleneck since it sees every new connection twice.

To avoid stressing the control plane of the OpenFlow switch, we must proactively insert load-balancing rules into the switch to spread the traffic to the NAT boxes. At first sight, a hash-based load-balancing algorithm would be ideal to spread traffic coming from the inside network to the NAT boxes; indeed, hash-based load balancing is supported in OpenFlow 1.1.
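The per-flow state that a single NAT box such as N0 keeps when it "creates a new mapping for the flow" can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes one public address, hands out public ports sequentially, and keys mappings on the inside (IP, port) endpoint only, a simplification of real NAT mapping policies. The names `SimpleNat` and `Endpoint` and the address 198.51.100.1 (a documentation range) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class Endpoint:
    """An (IP address, port) pair; frozen so it can be used as a dict key."""
    ip: str
    port: int


@dataclass
class SimpleNat:
    """Hypothetical single-box NAT: one public IP, sequential port allocation."""
    public_ip: str
    next_port: int = 1024
    # inside endpoint -> public endpoint, consulted for outbound packets
    out_map: Dict[Endpoint, Endpoint] = field(default_factory=dict)
    # public endpoint -> inside endpoint, consulted for inbound replies
    in_map: Dict[Endpoint, Endpoint] = field(default_factory=dict)

    def translate_outbound(self, src: Endpoint) -> Endpoint:
        """Return the public source (IPn, Pn) for an inside source, creating the mapping on first use."""
        mapped = self.out_map.get(src)
        if mapped is None:
            if self.next_port > 65535:
                raise RuntimeError("public port space exhausted")
            mapped = Endpoint(self.public_ip, self.next_port)
            self.next_port += 1
            self.out_map[src] = mapped
            self.in_map[mapped] = src
        return mapped

    def translate_inbound(self, dst: Endpoint) -> Optional[Endpoint]:
        """Return the inside endpoint for a reply, or None if no mapping exists (drop it)."""
        return self.in_map.get(dst)


if __name__ == "__main__":
    nat = SimpleNat(public_ip="198.51.100.1")
    inside = Endpoint("10.0.0.5", 40000)
    public = nat.translate_outbound(inside)
    print(public)                         # Endpoint(ip='198.51.100.1', port=1024)
    print(nat.translate_inbound(public))  # Endpoint(ip='10.0.0.5', port=40000)
```

The two dictionaries are the point of the sketch. A reply addressed to (IPn, Pn) can only be translated by the box that holds `in_map`, which is why both directions of a flow must reach the same NAT box.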
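The first two limitations of the strawman load-balancing algorithm follow from simple arithmetic. Write R for the switch's rule capacity and P for the packet rate its control plane can absorb. Each connection costs two rules and sends two packets to the controller (the original first packet and N0's translated packet). Using the figures quoted for the IBM G8264:

```latex
C_{\max} = \frac{R}{2} = \frac{80\,000}{2} = 40\,000 \ \text{concurrent connections},
\qquad
\lambda_{\max} = \frac{P}{2} = \frac{200\ \text{packets/s}}{2} = 100\ \text{new connections/s}.
```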
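Hash-based spreading of inside-network traffic can be modelled in software. The model shows the property the design relies on: every packet of a flow maps to the same NAT box, while different flows spread across boxes, with no per-connection rules or controller involvement. This is a sketch under stated assumptions. OpenFlow 1.1 exposes hashing through "select" group buckets whose selection algorithm is left to the switch, so CRC32 over the 5-tuple here is a stand-in rather than what the IBM G8264 computes. `FiveTuple`, `flow_hash` and `pick_nat_box` are hypothetical names. The sketch also covers only the inside-to-outside direction. Returning traffic carries the translated address and port, so hashing it the same way would not, by itself, bring both directions to the same box.

```python
import zlib
from typing import NamedTuple


class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: int  # IP protocol number, e.g. 6 for TCP


def flow_hash(ft: FiveTuple) -> int:
    """Deterministic 32-bit hash of a 5-tuple (CRC32 is an illustrative choice)."""
    key = f"{ft.src_ip}|{ft.dst_ip}|{ft.src_port}|{ft.dst_port}|{ft.proto}"
    return zlib.crc32(key.encode())


def pick_nat_box(ft: FiveTuple, n_boxes: int) -> int:
    """Index of the NAT box that receives this inside-to-outside flow."""
    if n_boxes <= 0:
        raise ValueError("need at least one NAT box")
    return flow_hash(ft) % n_boxes


if __name__ == "__main__":
    flows = [FiveTuple(f"10.0.0.{i % 250 + 1}", "203.0.113.7", 1024 + i, 80, 6)
             for i in range(1000)]
    counts = [0] * 6
    for ft in flows:
        counts[pick_nat_box(ft, 6)] += 1
    print(counts)  # flows per box; exact values depend on CRC32
```

Python's built-in `hash()` is avoided because string hashing is randomized per process. With plain modulo hashing, changing `n_boxes` remaps most existing flows, so scaling in or out would need state migration of the kind listed among the building blocks above.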