Addressing the Scalability of Ethernet with MOOSE

Addressing the Scalability of Ethernet with MOOSE Malcolm Scott, Daniel Wagner-Hall, Andrew Moore and Jon Crowcroft University of Cambridge Computer Laboratory fMalcolm.Scott, daw63, Andrew.Moore, [email protected] Abstract necessity for virtualised infrastructure [3]. While IP Mo- bility [4] addresses the problem of maintaining higher- Ethernet does not scale well to large networks. The flat layer connections when roaming between subnets, it re- MAC address space, whilst having obvious benefits for quires client support that is neither ubiquitous or reli- the user and administrator, is the primary cause of this able. Common practice sees the provision of one phys- poor scalability; other recent efforts to improve upon ical Ethernet network covering an entire data centre, or Ethernet’s scalability have addressed symptoms, rather even an entire WAN of data centres. than this underlying cause. In this paper we present Our approach, Multi-level Origin-Organised Scalable MOOSE, Multi-level Origin-Organised Scalable Ether- Ethernet (MOOSE), provides all the advantages of an net, an Ethernet switch architecture that performs in- Ethernet network without the capital and running costs place rewriting of MAC addresses in order to impose and administrative overhead of a IP router-based ap- a hierarchy upon the address space without reconfigu- proach. MOOSE does this by providing a hierarchical ration or modification of connected devices. This re- addressing scheme without requiring host reconfigura- moves the need for switches to maintain large forward- tion or modification. ing databases, is of direct use in implementing improved Ethernet’s scalability is limited firstly by the forward- routing, and allows for a variety of other scalability ing database that every switch in an Ethernet network and security innovations. We also present a globally- must maintain [5, x7.8–7.9]. A switch’s forwarding scalable, distributed and resilient protocol for the auto- database contains one entry per source address seen in matic assignment of addresses to switches, and for de- any frame passing through that switch, and stores that tecting and cheaply resolving addressing conflicts. MAC address together with the learnt location of that address—the port on which packets from that address 1 Introduction were last seen. This is later used to determine on which Ethernet has lasted well since its inception in the ’70s [1] port to transmit frames destined for that address. De- with Ethernet frame-structure and addressing remaining vices frequently broadcast frames throughout the net- ubiquitous in the data centre environment as in many work (e.g. ARP queries) so active devices on the net- others. Alongside IP and IP-transported services such as work are listed in most switches’ forwarding databases iSCSI, it is now commonplace to see converged network most of the time. services such as physical disk interfaces and cluster in- In modern switches the capacity of this database is terconnects layered directly over Ethernet (e.g. ATA- generally of the order of 16,000 entries [6]. (Higher- over-Ethernet and variants of Infiniband). However, Eth- capacity forwarding databases exist but are currently ernet exhibits scalability issues on networks of more constrained to very high-end switches.) On a moderately than a few thousand devices, such as costly and energy- large network, full databases are a serious risk. If the dense address table logic and storms of broadcast traffic. database becomes full, entries will be discarded; frames Aside from more physical devices, virtualised infras- for unknown addresses are flooded to all ports and the tructure further increases the density of Ethernet ad- resulting traffic storm could cause major problems, es- dresses in data centres. Widely-used layer-2 virtualisa- pecially in the presence of low-capacity edge links. tion mandates a unique Ethernet address per virtual ma- Traditionally the forwarding database has been stored chine [2]. This means that each physical machine in a in a content-addressable memory (CAM) as lookups data centre may represent many tens of Ethernet devices. must be very fast, particularly as 10 Gbit/s Ethernet be- The traditional method of avoiding such problems is comes ubiquitous. As networks grow, the number of en- the artificial subdivision of a network, but this introduces tries in a switch’s forwarding database must naturally in- an administrative burden, requires significant routing crease; however, increasing the capacity of CAMs with- equipment and also precludes seamless migration—a out sacrificing speed whilst constraining energy con- sumption is proving to be challenging [7, 8]. Cheaper wireless LANs are the one remaining vestige of Ethernet switches use DRAM in place of a CAM, but this is likely operating over shared media, where one switch (access to remain slower especially for large tables. point) serves many hosts on the same radio channel. Secondly, Ethernet’s inability to handle networks con- Ethernet’s poor scalability arises in various guises, as taining loops also presents a scalability problem. The outlined in section 1. It would seem at first glance that Rapid Spanning Tree Protocol, RSTP [5, x17], must re- these are entirely distinct and unrelated. However, there move loops by disabling any redundant links. On a dense is a common underlying cause: that MAC addresses pro- mesh network, RSTP will disable a large proportion of vide no location information. links; this constrains frames to suboptimal routes and Globally-unique MAC addresses are structured such may introduce bottlenecks in the network, particularly that the first three bytes of a device’s address contain an around the root of the spanning tree. In a data centre en- organisationally unique identifier (OUI) allocated to the vironment, this potentially amounts to a very large pro- device’s manufacturer by the IEEE, with the remaining portion of capacity being wasted wherever redundant fi- three bytes allocated by the manufacturer. This hierar- bres are installed, e.g. between cabinet switches and be- chy exists solely for the purpose of allocating unique ad- tween data centres. dresses in a decentralised fashion, and is of no use to Thirdly, not only does Ethernet flood frames des- Ethernet switches, which must treat the unicast address tined for unknown hosts, but it also uses—and encour- space as flat. ages higher-layer protocols to use—broadcast for con- A flat address space has the advantage that no configu- trol messages. For example, ARP [9] performs address ration of devices is required; a device can use its unique, resolution via broadcast queries, and DHCP [10] uses manufacturer-assigned MAC address anywhere on any broadcast messages for automatic configuration. It is im- network. However, this leaves each switch with the task practical to replace these protocols entirely as this would of discovering and storing the location of every address- require software upgrades to every device, but it would able device. be desirable for the network to minimise the amount of If the MAC address space were not flat, but instead broadcast traffic required to be forwarded. contained enough information to locate the device pos- In this paper we identify the relevant underlying prob- sessing the address, several advantages would be gained. lems in the design of Ethernet (section 2), review pre- Firstly, large forwarding databases would no longer have vious work (section 3) and present the MOOSE switch to be maintained on every switch. This location infor- architecture, which addresses inadequacies in the funda- mation could instead be distributed across the network mental operation of Ethernet in a novel yet backwards- so that frames are directed towards their destinations ac- compatible way (section 4). By revisiting the address- cording to successive stages of a hierarchy. ing scheme itself, rather than simply addressing symp- Secondly, a hierarchical MAC address space would toms of the problem as many previous proposed solu- also make the addition of shortest-path routing consid- tions have done, we can go about solving all of the above erably easier. Shortest-path routing is clearly a desir- scalability problems and more. able property for a network, yet it is one that Ethernet A working high-level software implementation of does not provide. Flat addressing does not lend itself MOOSE is described and evaluated in section 6. to easy routing: any address can be located anywhere on This work expands on our previous paper presented the network, which means either advertising every host’s at the First Workshop on Data Center – Converged and MAC address via the routing protocol—which scales Virtual Ethernet Switching (DC CAVES) [11]. We have very poorly—or providing some other location lookup added a crucial piece of the architecture—an automatic service. The use of hierarchical addresses, with each addressing scheme with cheap conflict resolution—and switch handling a block of sequential addresses akin to have better addressed the key issue of compatibility with an IP subnet, would reduce the routing problem to the existing protocols. one that routing protocols were designed to solve. Thirdly, this would allow for reduction of broadcast 2 Ethernet’s Underlying Problem traffic in a variety of different ways. Hierarchical MAC The original Ethernet was a shared-medium network, addresses could, for example, be mapped directly and where every frame was broadcast and no switching took deterministically onto the IP address space, if appro- place. Modern-day wired Ethernet-based networks in- priate for the specific deployment. This would allow stead consist almost entirely of point-to-point links; as switches to respond directly and simply to DHCP and a result of this, the distinction between unicast, broad- ARP queries, avoiding the need to forward the most cast and multicast has become more important. 802.11 common sources of broadcast frames. Alternatively, a distributed directory service can be used, which is less sulating switches must obtain a new destination address limiting and is thus our preferred approach as detailed in for every frame using a lookup table that—like Ether- section 4.5.

Load more