Bachelor Informatica

Open-source network operating systems: feature evaluation of SONiC

Erik Puijk, 11017651

June 7, 2018

Informatica — Universiteit van Amsterdam

Supervisor(s): Dr. Paola Grosso, Łukasz Makowski MSc

Abstract

Open network switches are increasing in popularity and allow the deployment of different open-source network operating systems (NOS). In contrast to locked-in switches, open switches with an open-source NOS have been tested less extensively, as they are a relatively new phenomenon. This thesis examines whether open switches with an open-source network operating system, namely SONiC, can be deployed successfully to perform fundamental networking operations. Furthermore, it examines for which use cases SONiC is suitable and beneficial. We experiment with open switches in various topologies to examine the deployment of fundamental OSI Layer 2 and Layer 3 networking features. We conclude that all SONiC-supported features we tested can be deployed successfully. Moreover, we examine several use cases of open switches with SONiC and conclude that SONiC is most suitable for use cases in large-scale data centers and enterprise networks.

Contents

1 Introduction

2 Open networking background
  2.1 Open switches
  2.2 Network operating systems
    2.2.1 ASIC communication
    2.2.2 ASIC control module
  2.3 SONiC
    2.3.1 Quagga

3 Networking features
  3.1 Layer 2 features
  3.2 Layer 3 features

4 Experiments
  4.1 Preparatory phase
    4.1.1 Mellanox SN2100 and ONIE
    4.1.2 Arista 7050QX-32S and Aboot
  4.2 Feature tests
    4.2.1 Layer 2 features
    4.2.2 Layer 3 features
    4.2.3 Result overview

5 Discussion
  5.1 Ease of use

6 Use case scenarios for open switches
  6.1 Current use cases
  6.2 Example use cases

7 Conclusions
  7.1 Future research

8 Acknowledgements

9 Bibliography

10 Appendix

CHAPTER 1

Introduction

Network switches are essential in computer networks: they connect devices and forward frames between them on the OSI data link layer (Layer 2) [1]. Some switches are also capable of OSI Layer 3 features, such as routing packets among networks using IP. In this thesis, we discuss exactly this category of devices.

Traditionally, switches are sold as locked-in hardware with a pre-installed network operating system (NOS), without the possibility for a network administrator to install third-party NOS's or other software on them. An open switch, in contrast, does allow the user to install another operating system on the device. Open switches thus give network administrators more possibilities to customize the switch to their own needs, possibly explaining their rising popularity. Another advantage of this category of switches is the reduction of expenses, due to the possibility to install low-cost software. In the past, this cost reduction would be offset by an increase in operational costs, because using the switches would require hiring external Linux expertise to configure them [2]. However, as Linux expertise has grown over the past few years, this barrier to using open switches has become much lower. Also, large manufacturers such as Dell and HP have been developing open switches on which an NOS such as Cumulus Linux is already installed [3]. This removed another obstacle for companies using the switches, because no manual installation is required.

The increasing popularity raises questions about the suitability of open switches for use in real networks, their ease of use, and how their feature sets compare to those of traditional switches. Considering that open switches are relatively new compared to locked-in switches, there is still a need for testing and evaluation to assess whether open switches can replace traditional switches without a loss of functionality, performance or ease of use that the reduction in costs might not compensate for.

In this context, we examine the functioning of open switches running an open-source network operating system, namely SONiC, in a network. We study whether open switches with SONiC are able to deploy several fundamental networking features. In addition, we examine which use cases benefit from the open and flexible nature of open switches with SONiC. We therefore set out to answer the following two research questions:

1. Which networking features can be successfully deployed on open switches with SONiC?

2. Which use cases are (more) easily supported by open switches with SONiC?

Chapter 2 provides background information about open networking, open switches, network operating systems (SONiC specifically) and routing suites. Chapter 3 will briefly discuss several networking features that will later be used in our experiments. Chapter 4 contains the experiments we performed to answer our first research question. In chapter 5, we discuss the experimental methods and results and other findings obtained during this research. In chapter 6, we examine several use cases of open switches with SONiC. Lastly, in chapter 7 we return to our research questions, formulate a conclusion and suggest possibilities for future research.

CHAPTER 2

Open networking background

2.1 Open switches

Generally, a switch can be represented as a stack of four layered components. Figure 2.1 shows these components.

(Figure: four stacked layers, top to bottom: Control and management plane; Network operating system; Hardware; Silicon/ASIC.)

Figure 2.1: Layered component stack of a switch.

The silicon, or ASIC (application-specific integrated circuit), is a specialized hardware element designed for a specific task. In the case of switches, this task is to quickly send packets through the network [4]. The hardware layer includes all other physical components of the switch, like the interfaces, the input/output ports, the LEDs and the power supply [5]. The network operating system (NOS) controls the hardware and the underlying ASIC for networking purposes and allows control and management plane applications to use the hardware. Control plane and management plane applications provide particular features to the user of the switch, in addition to those of the underlying operating system [6].

To understand the difference between open switches and traditional switches, one needs to consider the manner in which the above components interact with each other. Switches in which the NOS and the underlying hardware are disintegrated, meaning that they can be changed independently of each other, are called open switches. In traditional (locked-in) switches, this is not possible, for the switch is delivered with pre-installed software that cannot be changed. Open switches thus give the user more choice in what NOS to run.

Open switches can be separated into subcategories. Bare metal switches provide only the hardware, and allow the user to load the NOS of choice. The manufacturers of bare metal switches are original design manufacturers (ODMs) for well-known switch vendors, which means that the ODMs' products are re-branded and sold by other companies. A boot loader on the switch allows the user to boot an NOS of choice on the device [7].

2.2 Network operating systems

A network operating system is a key component in the aforementioned component stack of open switches. The NOS controls the hardware of the device and provides applications on the switch with the hardware and software resources they need. These resources might for example be memory allocation or input and output resources.

2.2.1 ASIC communication

The aforementioned application-specific integrated circuits (ASICs) are designed to handle a specific task. In networking switches, the ASIC is designed and optimized for quickly processing incoming packets according to the routing table. In order for an NOS to be able to program the ASIC, several APIs have been developed to communicate with the ASIC. The Switch Abstraction Interface1 (SAI) is a well-known method. It is an open-source framework that aims to abstract away from the ASIC, which differs per vendor, so software can be programmed for use in multiple different switches without any changes. This allows for more freedom in the use of software independently from the hardware choice [8]. Another, less adopted method for ASIC communication is the use of SDKs developed by the ASIC vendor. In practice, this approach is not included in open-source software for open switches, because changes in the SDKs would require the applications to be modified. Examples of these SDKs are SwitchX SDK by Mellanox [9] and OpenNSL by Broadcom [10].

2.2.2 ASIC control module

On top of the ASIC API, the ASIC control module provides an interface for control plane applications to communicate with the hardware. It also presents the current state of the hardware to the user of the switch. Control plane applications can use the ASIC control module to read data from or write data to the hardware. These applications are therefore independent from the hardware in the machine they are running on. Figure 2.2 shows the role of the ASIC control module and the ASIC API in the component stack illustrated before.

(Figure: stacked layers, top to bottom: Control and management plane; Network operating system; ASIC control module; ASIC API; Hardware; Silicon/ASIC.)

Figure 2.2: ASIC control module and ASIC API in the layered component stack.

1https://github.com/opencomputeproject/SAI

2.3 SONiC

SONiC2 (Software for Open Networking in the Cloud) is an open-source network operating system that claims to include all features needed for a fully functional Layer 3 network device. It is under constant development by Microsoft. The latest release, SONiC.201803, supports features including BGP, LLDP, link aggregation/LACP and VLAN (trunking) [11]. SONiC is Linux-based and runs on Debian Jessie. The SONiC architecture is depicted in figure 2.3.

Figure 2.3: An overview of SONiC components [12].

SONiC uses SAI for programming the ASIC, which makes SONiC compatible with all ASICs that are supported by SAI. The SONiC Object Library3 allows external applications to interact with each other and with SONiC applications.

Switch State Service

The Switch State Service (SwSS) acts as the ASIC control module for SONiC, using a database to interface between the network applications running on the switch and the switch hardware. Figure 2.4 shows the SwSS architecture. Network applications use a database named APP DB for reading and writing. Orchestration agents are responsible for synchronization between APP DB and another database, namely ASIC DB. To provide universality, the databases are set up as key-value storage. SwSS allows network applications running on SONiC to be completely independent from the hardware they are running on.

2https://github.com/Azure/SONiC/wiki 3https://github.com/Azure/sonic-object-library

Figure 2.4: The design of Switch State Service, showing the relation between APP DB, ASIC DB and the orchestration agents [12].
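To make the key-value structure concrete, a route entry in APP DB can be inspected from the SONiC shell with redis-cli, roughly as below. The key and field names are an illustrative assumption based on SONiC's documented route table schema and may differ between versions:

admin@sonic-mellanox:~$ redis-cli -n 0 HGETALL "ROUTE_TABLE:10.0.4.0/24"
1) "nexthop"
2) "10.0.2.20"
3) "ifname"
4) "Ethernet32"

Here the key encodes the table and the prefix, and the hash fields describe the next hop; the orchestration agents translate such entries into ASIC DB objects that SAI can program into the hardware.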

Since SONiC is open source, the networking community can contribute to its improvement. In addition, SONiC can be extended with other software, be it third-party or proprietary. Figure 2.5 shows how SONiC is a part of the Open Compute Project (OCP), an organization that has as its main goal to design and deploy the most efficient and scalable hardware for use in data centers. OCP-certified switches can use SONiC through SAI to run a Layer 2 or Layer 3 switch, and SONiC in addition allows the user to extend the NOS with third-party or OCP software components [13].

Figure 2.5: SONiC as an open switching platform for OCP users [13].

2.3.1 Quagga

Routing suites are control plane software collections that offer a variety of routing protocols to run on top of a network operating system. A routing suite exchanges routing information with other routers and updates routing information in the kernel. Quagga4 is an example of a routing suite and provides BGP routing functionality for SONiC. It is open-source and currently supports GNU/Linux and BSD. There are two main Quagga processes present on a SONiC device: bgpd and zebra. Figure 2.6 shows how Quagga and SONiC interact when a BGP route advertisement is received. First, this advertisement is passed to bgpd, short for BGP daemon. The BGP daemon processes this route and determines whether it should be placed in the kernel routing table. If so, the route is passed to zebra, which after additional filtering updates the routing table in APP DB.

4https://www.quagga.net/

This is done with fpmsyncd, which is a Forwarding Plane Manager and programs the forwarding plane. Fpmsyncd uses the SONiC Object Library API. SONiC handles updating the kernel routing table. Zebra is also capable of selecting the best route across different routing protocols, but as SONiC supports only BGP this is not applicable to a SONiC device.

Figure 2.6: The interaction between Quagga and SONiC when a new BGP route is received [12]. Quagga determines whether a new route should be placed in the routing table, after which SONiC takes care of updating the kernel routing table.

CHAPTER 3

Networking features

This chapter briefly covers several networking features or protocols that are relevant to this research. We selected both OSI Layer 2 and 3 features that we consider fundamental to a switch.

3.1 Layer 2 features

Layer 2 of the OSI model is the data link layer. It provides mechanisms to allow communication between devices within the same local area network (LAN). Data elements transferred on the data link layer are called (link-layer) frames. A frame is transferred between two devices on a physical link, and MAC addresses are used to address link-layer frames. The Ethernet protocol is a well-known example of a Layer 2 protocol and is often used in local area networks and wide area networks. WiFi is an alternative technology, although it is used only in local area networks [14].

LLDP

The Link Layer Discovery Protocol (LLDP) allows networking devices to advertise information about their identity and capabilities. LLDP is useful for managing network resources and provides a vendor-independent mechanism for devices to exchange device information within a network. Each interface of a device sends out a frame consisting of an LLDP Data Unit (LLDPDU). An LLDPDU consists of a sequence of type-length-value (TLV) structures, each containing specific information. Each LLDPDU contains the following mandatory TLVs: Chassis ID, Port ID and Time To Live. Furthermore, optional TLVs can be included as well, such as system name, system capabilities and port description. LLDPDU frames are sent at a fixed interval, to keep the information up-to-date [15].
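As an illustration of this structure, the following minimal Python sketch (not part of our experimental setup) unpacks the 16-bit TLV header, a 7-bit type followed by a 9-bit length, that precedes every TLV value:

import struct

def parse_tlv(data):
    """Split an LLDP TLV into its 7-bit type, 9-bit length and value."""
    (header,) = struct.unpack("!H", data[:2])
    tlv_type = header >> 9        # upper 7 bits: TLV type
    tlv_len = header & 0x01FF     # lower 9 bits: value length in bytes
    return tlv_type, tlv_len, data[2:2 + tlv_len]

# A hand-built Time To Live TLV: type 3, length 2, value 120 seconds.
ttl_tlv = struct.pack("!HH", (3 << 9) | 2, 120)
print(parse_tlv(ttl_tlv))  # -> (3, 2, b'\x00x')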

LACP

The Link Aggregation Control Protocol (LACP) can be used to combine multiple physical ports to form a single logical channel. This is also called link aggregation. To higher-layer protocols, the aggregated links form a single channel. Link aggregation provides increased throughput due to the use of multiple physical links. It also adds redundancy: even if a physical link fails, traffic can still flow through the logical link by making use of the other (operative) physical links [16].

STP

The Spanning Tree Protocol (STP) aims at constructing and managing a loop-free topology in a bridged network. STP is supposed to prevent loops in a topology and guarantees that there are unique paths to all destinations within a topology. This is critical in local area networks, for loops can cause Ethernet frames to be switched around endlessly. For instance, when a switch receives an ARP packet, it broadcasts this packet through all of its interfaces (except the one the ARP packet was received on). If a loop exists in the network of the switch, it will eventually receive its own broadcast ARP packet, which will be broadcast again. This endless process is known as a broadcast storm and it is one of the phenomena STP should prevent [17]. It does so by selecting a Root Bridge, and then creates a loop-free topology by blocking particular interfaces such that all other bridges have a shortest path to the Root Bridge [18].

Figure 3.1: Example of how STP creates a loop-free topology with shortest paths to the Root Bridge [19].

Figure 3.1 shows an example of a topology with a loop. Switch A is selected as Root Bridge, thus all other switches should have a shortest path to switch A. By blocking the link between switch B and switch C, the loop in the topology is removed and all switches have a shortest path to switch A.

VLAN (trunking)

A Virtual Local Area Network (VLAN) can be described as an isolated and partitioned broadcast domain at OSI Layer 2. A Local Area Network (LAN) can be partitioned into several Virtual LANs to create multiple (logically) separate networks, despite the (possible) presence of a physical connection to other devices that are not in the VLAN. This is achieved by configuring VLANs on a switch and specifying which switch interface belongs to which VLAN. The ability to define isolated broadcast domains using software makes VLAN a well-known networking feature. It allows network administrators to create multiple separated broadcast domains without having to change the physical wiring of the network [20]. A trunk link can be used to transport frames from multiple VLANs, for instance to interconnect two switches that have the same VLANs configured. This is achieved by using tags in link-layer frames to specify what VLAN a frame belongs to. These trunk links are therefore called "tagged" links. Links that can only transport frames from one specific VLAN are called "untagged" links, and thus the frames transported on these links do not contain a VLAN tag. Figure 3.2 shows an example. The entirely blue or entirely red links are untagged links for VLAN 100 and 200 respectively, and the frames transported on these links do not contain a VLAN tag. The link between switch 1 and 2 is a tagged link, for it must transport frames from both VLAN 100 and 200, and thus a tag must be used to indicate the corresponding VLAN for a particular frame.

Figure 3.2: Example of a VLAN configuration with two switches and two VLANs [21]. The trunk link between switch 1 and 2 transports traffic from both VLAN 100 and VLAN 200.
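To illustrate the tagging itself, the following Python sketch uses the Scapy library (an assumption; Scapy was not part of our setup) to build an Ethernet frame carrying an 802.1Q tag for VLAN 100, as such a frame would travel over the trunk link between switch 1 and 2:

from scapy.all import Ether, Dot1Q, IP

# An 802.1Q-tagged frame as it would travel over a trunk link:
# the Dot1Q header carries the VLAN identifier (here VLAN 100).
frame = Ether(src="00:11:22:33:44:55", dst="66:77:88:99:aa:bb") \
        / Dot1Q(vlan=100) \
        / IP(dst="172.16.100.2")
frame.show()  # prints the layers, including the 802.1Q tag

On an untagged link the Dot1Q layer would simply be absent, which is exactly the difference between the colored access links and the trunk link in figure 3.2.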

3.2 Layer 3 features

Layer 3 of the OSI model, the network layer, is responsible for the routing of data between different local area networks (LANs). Network-layer packets are addressed using unique and hierarchical IP addresses [14]. Routers are devices that forward packets toward their destination by determining the route a packet should take to reach it. In a network, routers communicate with each other to exchange information about the availability of subnets. This is done by routing protocols, which in addition define policies or algorithms that determine the best route to a destination when multiple routes are available.

Inter-VLAN routing

Inter-VLAN routing is used when traffic must flow between different VLANs, which in some cases may be necessary. Typically, this is accomplished by specifying a VLAN interface for each relevant VLAN, allowing the switch to route the traffic between VLANs through these VLAN interfaces. Figure 3.3 shows an example of a router (R1) that routes between VLANs. When PC A sends traffic destined for PC B, the traffic is first tagged with VLAN 20. When this traffic reaches router R1, it routes this traffic to VLAN 30 by using the configured VLAN interfaces. Then, this traffic can proceed to PC B with the VLAN 30 tag.

Figure 3.3: Example of a topology in which inter-VLAN routing is needed [22]. Router R1 uses the VLAN interface addresses to route the traffic from VLAN 20 to VLAN 30.

BGP

An autonomous system (AS) is a collection of IP prefixes that is managed by a certain network administrator. Commonly, such autonomous systems are managed by internet providers or large companies. The Border Gateway Protocol (BGP) is a well-known routing protocol used for routing between different autonomous systems, but it can also be used to route within an AS. Routers running BGP can set up sessions with each other using TCP to exchange routing information, for instance subnet availability. BGP provides several mechanisms to control the propagation of routes, such as route-maps. Route-maps allow the network administrator to define rules for certain prefixes [23]. For instance, there may be rules to drop an incoming route or change it before placing it in the routing table. BGP is commonly used in large-scale data centers because it scales well compared to other routing protocols [24].
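As an illustration, a route-map in Quagga-style syntax (hypothetical names and addresses) that raises the local preference of one incoming prefix while letting everything else pass unchanged could look as follows:

! Match the prefix 10.0.4.0/24 in a prefix-list...
ip prefix-list PL-EXAMPLE seq 5 permit 10.0.4.0/24
! ...and prefer routes matching it; a second clause permits the rest.
route-map RM-IN permit 10
 match ip address prefix-list PL-EXAMPLE
 set local-preference 200
route-map RM-IN permit 20
!
router bgp 65000
 neighbor 10.0.2.20 remote-as 65100
 neighbor 10.0.2.20 route-map RM-IN in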

CHAPTER 4

Experiments

4.1 Preparatory phase

To examine whether open switches with SONiC can be deployed successfully in a network, we decided to perform experiments on the deployment of the fundamental networking features that were described in chapter 3. The SNE OpenLab has provided two open switches for this research.

• Mellanox SN21001
• Arista 7050QX-32S2

The ASICs in these switches are manufactured by different vendors: Mellanox and Broadcom, respectively. This allows us to test two different ASIC types at once, and shows whether SONiC can run on two different ASICs. Both switches are capable of operating on OSI Layer 2 and Layer 3. Figures 4.1 and 4.2 show photos of Mellanox SN2100 and Arista 7050QX-32S, respectively. More details of the switches can be found in Appendix A.

Figure 4.1: Mellanox SN2100.

Figure 4.2: Arista 7050QX-32S.

1http://www.mellanox.com/related-docs/prod_eth_switches/PB_SN2100.pdf 2https://www.arista.com/assets/data/pdf/Datasheets/7050QX-32_32S_Datasheet.pdf

The open-source NOS used in the experiments is SONiC. Section 2.3 briefly explained the architecture of SONiC and which features it supports. SONiC is under constant development, and new features are added regularly. Aside from SONiC, there are various other open-source network operating systems, such as OPX. SONiC, however, supports a wide variety of open switches, which made it an obvious NOS to experiment with and allows for the straightforward extension of this research to other supported devices. Details about the version of SONiC used in this research are provided in Appendix B1.

4.1.1 Mellanox SN2100 and ONIE

Mellanox SN2100 comes with Mellanox Onyx as its NOS. Other operating systems can be installed via ONIE3 (Open Network Install Environment), an open-source project by Cumulus Networks that allows creating an open networking environment and is present by default on various open switches [25]. Figure 4.3 shows how ONIE can be used to boot a network operating system. When a device boots for the first time, a low-level boot loader boots ONIE from flash memory. ONIE then fetches the image of the NOS supplied by the switch vendor and installs this image. The next time the device boots, ONIE is not used by default and the device goes straight into the NOS. ONIE can, however, still be used to uninstall the vendor NOS and install another (open-source) NOS from an image provided over the network or via USB, for instance [25].

Figure 4.3: Process of first-time boot using ONIE [25].

4.1.2 Arista 7050QX-32S and Aboot

Arista 7050QX-32S runs Arista EOS by default. A component of this operating system is Aboot4, which can be used to boot image files to run other operating systems. Boot parameters can be configured in files stored in flash memory. In addition, Aboot provides a command-line interface which can be used to modify the boot configuration [26].

3https://opencomputeproject.github.io/onie/ 4https://www.arista.com/ko/um-eos/eos-6-1-boot-loader--aboot

4.2 Feature tests

In the last chapter, we discussed several networking features we consider crucial to a network switch. The objective was to test whether each of the features worked correctly on Mellanox and Arista running SONiC. The results of the experiments provide an insight into whether the switches can be deployed in a data center network as a replacement for traditional switches. For most experiments, we used SONiC's config_db.json configuration file to set up our testing environment. Appendix C contains the most relevant configurations. Complete configuration files can be found in the GitHub repository5.
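To give a flavour of this configuration format, the abridged, illustrative snippet below declares a VLAN with one untagged and one tagged member, following the layout we used; the complete files are in Appendix C and the repository:

{
    "VLAN": {
        "Vlan100": { "vlanid": "100" }
    },
    "VLAN_MEMBER": {
        "Vlan100|Ethernet56": { "tagging_mode": "untagged" },
        "Vlan100|Ethernet32": { "tagging_mode": "tagged" }
    }
}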

4.2.1 Layer 2 features

LLDP

Methods

The first feature to be tested was LLDP. To verify whether LLDP correctly exchanged device information, we considered the topology depicted in figure 4.4. We wanted to verify whether all mandatory LLDP TLVs were exchanged between Mellanox and Arista. This mandatory information is as follows [15]:

• Chassis ID
• Port ID
• Time To Live

Furthermore, we examined whether any additional information, such as system information and device capabilities, is exchanged through LLDP as well.

(Figure: Mellanox SN2100 and Arista 7050QX-32S connected by two links: Ethernet32–Ethernet16 and Ethernet36–Ethernet20.)

Figure 4.4: Topology to analyze LLDP with Mellanox and Arista.

Results

SONiC provides the command show lldp neighbors to view the LLDP neighbor information registered by SONiC. Appendix D1 presents the complete output of this command. Both devices presented the mandatory LLDP information in their LLDP tables. The Chassis ID included by SONiC's LLDP implementation is taken from the MAC address of the eth0 interface on both switches. In addition, LLDP exchanged correct system name and system description information, and also included device capabilities and which of these capabilities were currently in use.

LACP

Methods

In addition, we tested LACP/link aggregation behaviour. With the topology depicted in figure 4.5, we tried to configure link aggregation combining the double links between Mellanox and Arista. We considered both Layer 2 link aggregation and Layer 3 link aggregation. The latter has an IP address configured for each link aggregation interface, while the former does not. A working port channel should provide redundancy in the sense that when one of the links fails, communication is still possible through the port channel by making use of the other link. Thus, to test an aggregated link, one can simulate a physical breakdown of one of the two links and examine whether communication is still possible between the two devices.

5https://github.com/erikpu/open-switch-configurations

(Figure: PortChannel1 aggregating two links between Mellanox SN2100 and Arista 7050QX-32S: Ethernet32–Ethernet16 and Ethernet36–Ethernet20.)

Figure 4.5: Topology to analyze LACP with a port channel consisting of two physical links between Mellanox and Arista.

Results

Appendix C1.1 shows the configuration we set for a Layer 2 port channel between Mellanox and Arista. We were unable to configure a successful Layer 2 port channel. That is, SONiC could not successfully set up a port channel with no interface addresses. PortChannel1 failed to go UP on both devices. Listing 4.1 shows that PortChannel1 was stuck in DOWN state.

admin@sonic-mellanox:~$ ip a show dev PortChannel1
4: PortChannel1: mtu 9100 qdisc noqueue state DOWN group default
    link/ether ec:0d:9a:8d:f1:c0 brd ff:ff:ff:ff:ff:ff

Listing 4.1: PortChannel1 failed to go UP.

Moreover, with teamdctl we were able to view the state of PortChannel1 within the SONiC teamd docker, which is responsible for link aggregation. On both Mellanox and Arista, no ports were present as slaves of PortChannel1. Listing 4.2 shows this for Mellanox. It was confirmed by a SONiC collaborator that port channel functionality is currently aimed at Layer 3 port channels and not at Layer 2 port channels6.

admin@sonic-mellanox:~$ docker exec -it 7e504bdc03e0 bash
root@sonic-mellanox:/# teamdctl PortChannel1 state view
setup:
  runner: lacp
runner:
  active: yes
  fast rate: no

Listing 4.2: No slave ports present for PortChannel1.

Appendix C1.2 shows the configuration of the Layer 3 port channel. Layer 3 link aggregation did work correctly. In this case, PortChannel1 did go UP and allowed us to communicate between the interface IP addresses configured on PortChannel1 (172.16.0.10/24 on the Mellanox side and 172.16.0.20/24 on the Arista side). Also, teamdctl now showed us the correct slave ports for PortChannel1. In addition, Layer 3 link aggregation passed our redundancy tests. Listing 4.3 shows an example of this (some output has been omitted for brevity). Appendix D2 shows more complete output of the example. In listing 4.3, it can be seen on lines 4-14 that when both links are operational, communication through the port channel is possible. In lines 20-21, one can see that one of the links is not operational anymore. Line 27 shows that despite the fact that one link broke down, PortChannel1 is still operational. Next, lines 32-36 show that communication is possible through the port channel even when one of the two links is not operational. This indicates that Layer 3 link aggregation provides a redundant connection between Mellanox and Arista.

6https://github.com/Azure/SONiC/issues/186

22 1 # (both links up) 2 3 #(...) 4 Ethernet32 32,33,34,35 N/A 9100 Ethernet32 up up 5 Ethernet36 36,37,38,39 N/A 9100 Ethernet36 up up 6 #(...) 7 8 # (communication is fine) 9 10 admin@sonic-mellanox:~$ sudo ping 172.16.0.20 11 PING 172.16.0.20 (172.16.0.20) 56(84) bytes of data. 12 64 bytes from 172.16.0.20: icmp_seq=1 ttl=64 time=0.362 ms 13 64 bytes from 172.16.0.20: icmp_seq=2 ttl=64 time=0.220 ms 14 64 bytes from 172.16.0.20: icmp_seq=3 ttl=64 time=0.320 ms 15 #(...) 16 17 # (link failure Ethernet32!) 18 19 #(...) 20 Ethernet32 32,33,34,35 N/A 9100 Ethernet32 down up 21 Ethernet36 36,37,38,39 N/A 9100 Ethernet36 up up 22 #(...) 23 24 # (PortChannel1 still UP) 25 26 #(...) 27 4: PortChannel1: mtu 9100 qdisc noqueue state UP group default 28 #(...) 29 30 # (communication still working) 31 32 admin@sonic-mellanox:~$ sudo ping 172.16.0.20 33 PING 172.16.0.20 (172.16.0.20) 56(84) bytes of data. 34 64 bytes from 172.16.0.20: icmp_seq=1 ttl=64 time=0.294 ms 35 64 bytes from 172.16.0.20: icmp_seq=2 ttl=64 time=0.310 ms 36 64 bytes from 172.16.0.20: icmp_seq=3 ttl=64 time=0.296 ms 37 #(...) Listing 4.3: Behaviour of PortChannel1 when link Ethernet32 (Mellanox)-Ethernet16 (Arista) fails (some output has been omitted for brevity).

STP

Methods

We also examined the Spanning Tree Protocol. Considering the topology in figure 4.6, if no STP is configured at all, a broadcast storm may occur because there is a network loop. Broadcast messages sent from Mellanox to Arista out of both Ethernet32 and Ethernet36 will be circulated, because Arista will broadcast these back to Mellanox through Ethernet20 and Ethernet16, respectively, leading to an endless broadcast storm. STP must prevent this by selecting a Root Bridge and blocking one of the links between Mellanox and Arista.

(Figure: Mellanox SN2100 and Arista 7050QX-32S connected by two links in VLAN 100: Ethernet32–Ethernet16 and Ethernet36–Ethernet20.)

Figure 4.6: Topology to analyze STP.

The current version of SONiC does not have support for STP, but we decided to try to configure STP nevertheless using brctl. We placed the relevant ports in VLAN 100 and we first set the priority of the Mellanox bridge to 100 and of the Arista bridge to 200, meaning that Mellanox should be selected as Root Bridge and thus one of the Arista ports should be set in blocking state (a lower priority indicates a higher chance to be selected as Root Bridge).

Also, we reversed the priorities so that Arista should be selected as Root Bridge and one of the Mellanox ports should be in blocking state. The relevant SONiC configurations (of the VLANs) can be found in Appendix C2.
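The brctl commands involved are along these lines (the bridge name Bridge matches the listings below):

# Enable STP on the Linux bridge and set the bridge priority
# (a lower priority means a higher chance of becoming Root Bridge).
sudo brctl stp Bridge on
sudo brctl setbridgeprio Bridge 100   # 200 on the other switch
sudo brctl showstp Bridge             # inspect the resulting port states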

Results

In both our tests the devices selected themselves as Root Bridge, meaning that all ports were set in forwarding state and the loop in our topology remained. Listings 4.4 and 4.5 show the results of the STP status on Mellanox and Arista (some output has been omitted for brevity). The full output can be found in Appendix D3. One can notice on lines 8 and 12 of both listings that all four ports are in forwarding state, and in both cases the devices selected themselves as Root Bridge, which can be concluded from lines 3 and 4 in both listings.

1 admin@sonic-mellanox:~$ sudo brctl showstp Bridge
2 Bridge
3 bridge id 0064.ec0d9a8df1c0
4 designated root 0064.ec0d9a8df1c0
5 #(...)
6
7 Ethernet32 (1)
8 portid 8001 state forwarding
9 #(...)
10
11 Ethernet36 (2)
12 portid 8002 state forwarding
13 #(...)

Listing 4.4: Mellanox selected itself as Root Bridge (some output was omitted for brevity).

1 admin@sonic-arista:~$ sudo brctl showstp Bridge
2 Bridge
3 bridge id 00c8.001c737bf75c
4 designated root 00c8.001c737bf75c
5 #(...)
6
7 Ethernet16 (1)
8 portid 8001 state forwarding
9 #(...)
10
11 Ethernet20 (2)
12 portid 8002 state forwarding
13 #(...)

Listing 4.5: Arista selected itself as Root Bridge (some output was omitted for brevity).

To investigate this behaviour, we captured the STP messages exchanged between Mellanox and Arista. We found that both devices were sending STP messages to each other, but incoming STP messages were not passed to the control plane. It was confirmed by a SONiC contributor that in the current version of SONiC, no interface trap is configured for STP messages, meaning that incoming STP messages are not passed to the control plane and thus STP is unable to operate correctly on SONiC7.

VLAN (trunking)

Methods

Additionally, we examined VLAN (trunking). We decided that for this experiment, and the experiments following it, it would be interesting to use several hosts in our testing topologies. To do so, we attached two physical servers to our topology and set up two virtual machines (VMs) to run on each server. We used the same operating system on all four VMs (Linux 4.9.82-1+deb9u3). We used Vagrant8 to build and manage our virtual machine environment, and VirtualBox9 for the virtual machines themselves. Appendix B2 and B3 specify the Vagrant and VirtualBox versions used in the experiments, respectively.

7https://github.com/Azure/SONiC/issues/184#issuecomment-387942526 8https://www.vagrantup.com/ 9https://www.virtualbox.org/

To allow the VMs to participate in the network, we configured a bridge between the physical interfaces of the servers and the virtual interfaces of the VMs using the Vagrantfiles. The virtual interfaces could then be configured as if they were physical interfaces on four separate machines. The Vagrantfiles used in this experiment can also be found in the previously mentioned GitHub repository. To verify whether the switches are able to perform correct VLAN (trunking) functionality, we used the topology depicted in figure 4.7. For the SONiC configuration, Appendix C3 can be consulted. The link between Mellanox and Arista should be configured as a "tagged" (trunk) link, for it must carry frames that can belong to either VLAN 100 or VLAN 200. We examined whether the open switches allowed us to configure such "tagged" ports and whether packets can be delivered correctly within the same VLANs using the trunk link. For instance, VM A1 must be able to communicate with VM B1, because they are configured to be in the same VLAN. VM A1 should not be able to communicate with VM B2, because they are in different VLANs and should therefore be completely isolated from each other.
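The bridge configuration in a Vagrantfile looks roughly as follows (the box name and server NIC name are placeholders; the actual Vagrantfiles are in the repository):

Vagrant.configure("2") do |config|
  config.vm.box = "debian/stretch64"
  # Bridge the VM's virtual interface onto the server NIC that is
  # cabled to the switch, with the address from the topology.
  config.vm.network "public_network",
                    bridge: "enp3s0f0",
                    ip: "172.16.100.1"
end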

(Figure: VM A1 (VLAN 100, 172.16.100.1/24) and VM A2 (VLAN 200, 172.16.200.1/24) on physical server A connect to Mellanox via Ethernet56 and Ethernet60; VM B1 (VLAN 100, 172.16.100.2/24) and VM B2 (VLAN 200, 172.16.200.2/24) on physical server B connect to Arista via Ethernet40 and Ethernet44. A VLAN trunk runs between Mellanox Ethernet32 and Arista Ethernet16. VLAN interfaces on Mellanox: 172.16.100.3/24 and 172.16.200.3/24; on Arista: 172.16.100.4/24 and 172.16.200.4/24.)

Figure 4.7: Topology to analyze VLAN (trunking) and inter-VLAN routing. Two VLANs are configured with two virtual machines in each VLAN. A trunk link is present between Mellanox and Arista.

Results

We were able to communicate within VLANs, but not between VLANs, which is the correct and expected behaviour. As an example, listing 4.6 shows with ping that communication is possible from VM A1 to VM B1, which are in the same VLAN.

vagrant@vmA1:~$ ping 172.16.100.2
PING 172.16.100.2 (172.16.100.2) 56(84) bytes of data.
64 bytes from 172.16.100.2: icmp_seq=1 ttl=64 time=0.602 ms
64 bytes from 172.16.100.2: icmp_seq=2 ttl=64 time=0.580 ms
64 bytes from 172.16.100.2: icmp_seq=3 ttl=64 time=0.591 ms

Listing 4.6: ping from VM A1 to VM B1.

In contrast, communication between devices configured in different VLANs is not possible. Listing 4.7 displays this behaviour. The example shows that we cannot communicate from VLAN 100 to VLAN 200.

vagrant@vmA1:~$ ping 172.16.200.2
PING 172.16.200.2 (172.16.200.2) 56(84) bytes of data.
From 172.16.100.1 icmp_seq=1 Destination Host Unreachable
From 172.16.100.1 icmp_seq=2 Destination Host Unreachable
From 172.16.100.1 icmp_seq=3 Destination Host Unreachable

Listing 4.7: ping from VM A1 to VM B2.

This indicates that the VLAN trunk succeeded in tagging frames with the correct VLAN number, allowing for the complete VLAN isolation shown in the above listings.

4.2.2 Layer 3 features

Inter-VLAN routing

Methods

Besides VLAN trunking functionality, we also tested inter-VLAN routing, using the same topology as depicted in figure 4.7. Figure 4.7 shows the VLAN interface addresses we used for both Mellanox and Arista. For the relevant sections of the SONiC configuration, Appendix C4 can be consulted. If inter-VLAN routing works correctly, the switches will route packets from one VLAN to another, and thus we should be able to communicate between VLANs using the VLAN interfaces. For instance, VM A1 should be able to ping VM B2 and vice versa.

Results

Using inter-VLAN routing we are able to communicate between different VLANs. Listing 4.8 shows an example of a traceroute from VM A1 to VM B2. It shows on line 3 that the VLAN interface of VLAN 100 on Mellanox (172.16.100.3) is used to route the packet to VLAN 200, after which Arista delivers the packet to VM B2.

1 vagrant@vmA1:~$ traceroute 172.16.200.2 2 traceroute to 172.16.200.2 (172.16.200.2), 30 hops max, 60 byte packets 3 1 172.16.100.3 (172.16.100.3) 0.399 ms 0.307 ms 0.227 ms 4 2 172.16.200.2 (172.16.200.2) 0.696 ms 0.619 ms 0.526 ms Listing 4.8: traceroute from VM A1 to VM B2.

BGP

Methods

To verify whether BGP operates correctly on the switches, we set up the topology in figure 4.8. SONiC's BGP configuration can be found in Appendix C5. We have configured the switches to be in different autonomous systems, so as to create a BGP session between Mellanox and Arista. We expect this BGP session to exchange routes between Mellanox and Arista. Specifically, Mellanox has knowledge of subnets 10.0.0.0/24 and 10.0.1.0/24 and Arista does not. BGP must propagate the routes to these subnets to Arista. Similarly, BGP must propagate routes to the subnets 10.0.4.0/24 and 10.0.5.0/24 to Mellanox. In the end, we want to be able to communicate between the virtual machines in hosts A and B using these propagated routes.
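An abridged, illustrative config_db.json fragment for the Mellanox side is sketched below; the table and field names follow the layout we used, and the complete configuration is in Appendix C5:

{
    "DEVICE_METADATA": {
        "localhost": { "hostname": "sonic-mellanox", "bgp_asn": "65000" }
    },
    "LOOPBACK_INTERFACE": {
        "Loopback0|10.1.0.10/32": {}
    },
    "BGP_NEIGHBOR": {
        "10.0.2.20": { "asn": "65100", "name": "sonic-arista" },
        "10.0.3.20": { "asn": "65100", "name": "sonic-arista" }
    }
}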

(Figure: Mellanox (AS 65000) and Arista (AS 65100) are connected by subnets 10.0.2.0/24 (10.0.2.10–10.0.2.20) and 10.0.3.0/24 (10.0.3.10–10.0.3.20). VM A1 (10.0.0.1) and VM A2 (10.0.1.1) on physical server A sit in subnets 10.0.0.0/24 and 10.0.1.0/24 behind Mellanox (10.0.0.10, 10.0.1.10); VM B1 (10.0.4.2) and VM B2 (10.0.5.2) on physical server B sit in subnets 10.0.4.0/24 and 10.0.5.0/24 behind Arista (10.0.4.20, 10.0.5.20).)

Figure 4.8: Topology to analyze BGP. Mellanox and Arista are placed in two separate autonomous systems, each containing two subnets.

Results

SONiC successfully set up the two BGP sessions we configured. For example, listing 4.9 shows the current BGP sessions Mellanox has with Arista on lines 8 and 9.

1 admin@sonic-mellanox:~$ show ip bgp summary
2 Command: sudo vtysh -c "show ip bgp summary"
3 BGP router identifier 10.1.0.10, local AS number 65000
4 RIB entries 21, using 2352 bytes of memory
5 Peers 2, using 9312 bytes of memory
6
7 Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
8 10.0.2.20 4 65100 34 37 0 0 0 00:30:30 8
9 10.0.3.20 4 65100 38 41 0 0 0 00:30:27 8
10
11 Total number of neighbors 2

Listing 4.9: BGP neighbors on Mellanox, showing the two sessions with Arista.

In addition, Mellanox and Arista successfully exchanged routing information regarding the subnets that were directly connected to them. Mellanox propagated the subnets 10.0.0.0/24 and 10.0.1.0/24 to Arista and Arista propagated the subnets 10.0.4.0/24 and 10.0.5.0/24 to Mellanox. For instance, listing 4.10 shows that subnets 10.0.4.0/24 (line 11) and 10.0.5.0/24 (line 13) are present in the routing table on Mellanox, that these subnets can be reached through both 10.0.2.20 (via interface Ethernet32) and 10.0.3.20 (via interface Ethernet36), and that this reachability was sourced from BGP. Similarly, in the Arista routing table, subnets 10.0.0.0/24 and 10.0.1.0/24 are present and can be reached through 10.0.2.10 (via interface Ethernet16) and 10.0.3.10 (via interface Ethernet20). Again, these routes were learned through BGP.

27 1 admin@sonic-mellanox:~$ show ip route 2 Command: sudo vtysh -c "show ip route" 3 Codes: K - kernel route, C - connected, S - static, R - RIP, 4 O - OSPF, I - IS-IS, B - BGP, P - PIM, A - Babel, 5 > - selected route, * - FIB route 6 7 C>* 10.0.0.0/24 is directly connected, Ethernet56 8 C>* 10.0.1.0/24 is directly connected, Ethernet60 9 C>* 10.0.2.0/24 is directly connected, Ethernet32 10 C>* 10.0.3.0/24 is directly connected, Ethernet36 11 B>* 10.0.4.0/24 [20/0] via 10.0.2.20, Ethernet32, src 10.1.0.10, 00:30:05 12 * via 10.0.3.20, Ethernet36, src 10.1.0.10, 00:30:05 13 B>* 10.0.5.0/24 [20/0] via 10.0.2.20, Ethernet32, src 10.1.0.10, 00:30:05 14 * via 10.0.3.20, Ethernet36, src 10.1.0.10, 00:30:05 15 C>* 10.1.0.10/32 is directly connected, lo 16 B>* 10.1.0.20/32 [20/0] via 10.0.2.20, Ethernet32, src 10.1.0.10, 00:30:05 17 * via 10.0.3.20, Ethernet36, src 10.1.0.10, 00:30:05 18 #(...) Listing 4.10: Routing table on Mellanox, showing the routes to the subnets we configured before (some output was omitted for brevity).

Ultimately, to complete the verification of BGP, we used traceroute between the virtual machines to determine whether communication is possible and, if so, what routes are taken. Listing 4.11 shows that VM A1 and VM B2 are able to communicate through Mellanox and Arista. First, a packet goes through the connected interface on the Mellanox device (10.0.0.10) (line 3). Then, it is routed to the Arista device (10.1.0.20) (line 4), which routes it via its interface in subnet 10.0.5.0/24. SONiC uses the loopback address configured in config_db.json as Router ID in the BGP sessions, which for Mellanox is 10.1.0.10/32 and for Arista 10.1.0.20/32.

1 vagrant@vmA1:~$ sudo traceroute 10.0.5.2
2 traceroute to 10.0.5.2 (10.0.5.2), 30 hops max, 60 byte packets
3 1 10.0.0.10 (10.0.0.10) 0.333 ms 0.229 ms 0.258 ms
4 2 10.1.0.20 (10.1.0.20) 0.381 ms 0.264 ms 0.250 ms
5 3 10.0.5.2 (10.0.5.2) 0.523 ms 0.416 ms 0.312 ms

Listing 4.11: traceroute from VM A1 to VM B2.

4.2.3 Result overview

Table 4.1 presents an overview of the obtained results, including several remarks.

Feature              Results  Comments
LLDP                 Pass     -
LACP                 Pass     L2 link aggregation not working
STP                  Fail     Not supported by SONiC; packets dropped
VLAN (trunk)         Pass     -
Inter-VLAN routing   Pass     -
BGP                  Pass     -

Table 4.1: An overview of the obtained results.

CHAPTER 5

Discussion

The performed experiments provided insights into the shortcomings of several fundamental networking features when deployed on open switches with SONiC. Aside from commenting on some of the obtained results, this chapter will provide general insights into the ease of use of open switches with SONiC.

LACP

The first notable observation in our LACP experiments was that none of our port channels were showing up when issuing the command show interfaces portchannel. This command should show all configured port channels in SONiC. It turns out SONiC takes the configuration for the command show interfaces portchannel1 from minigraph.xml, which is a deprecated configuration method of SONiC, instead of from the main configuration file config_db.json.

The Layer 3 port channel we set up in the experiments has shown that it is possible to deploy a Layer 3 port channel that provides redundant communication between the devices running SONiC. In order to get this working, aside from configuring the port channel in SONiC's configuration file, we had to edit SONiC's teamd template2. By default, SONiC sets the field min_ports, which is the minimum number of active ports required for a port channel to be operational, to ceil(0.75 * number of port channel members), which was 2 in our topology. In order to test the redundancy of our port channel, we set min_ports to 1.
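For reference, a teamd port channel configuration in the spirit of what the edited template renders could look like this (an illustrative sketch, not the literal rendered file):

{
    "device": "PortChannel1",
    "runner": {
        "name": "lacp",
        "active": true,
        "fast_rate": false,
        "min_ports": 1
    }
}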

BGP

Similarly, the experiments we performed demonstrated successful BGP functionality. Configuring BGP neighbors and AS numbers was done in config_db.json. Nonetheless, we had to edit the BGP configuration template3 of SONiC to specify that BGP should also advertise directly connected networks (which was the case in our topology) and not only networks obtained from other BGP neighbors.
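Conceptually, the rendered bgpd configuration then contains the equivalent of the following abridged Quagga-style sketch:

router bgp 65000
 bgp router-id 10.1.0.10
 ! advertise directly connected networks, not only BGP-learned ones
 redistribute connected
 neighbor 10.0.2.20 remote-as 65100
 neighbor 10.0.3.20 remote-as 65100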

5.1 Ease of use

SONiC allowed for straightforward configuration in the config_db.json configuration file. Several examples of configuration have been shown in this thesis already. Moreover, SONiC provided some CLI configuration commands, but not for all features we tested. For instance, configuration commands could be used to shut down interfaces and BGP sessions, but not to set interface addresses or configure BGP peers. Command-line configuration also allowed the user to set up VLANs and add or remove interfaces. In general, the command-line configuration provided a small subset of configuration possibilities compared to configuration in config_db.json.

1https://github.com/Azure/sonic-utilities/blob/master/scripts/teamshow 2https://github.com/Azure/sonic-buildimage/blob/master/dockers/docker-teamd/teamd.j2 3https://github.com/Azure/sonic-buildimage/blob/master/dockers/docker-fpm-quagga/bgpd.conf.j2

For users that prefer command-line configuration, this might make SONiC less straightforward to use. SONiC did provide a wide variety of commands to view the current configuration or device status. Several examples have been provided in this research. For all supported (thus not STP) features we tested, there existed one or more corresponding show commands to view configuration or state information. One exception, as mentioned before, was that show interfaces portchannel takes its information from a deprecated configuration method to show the configured port channels. In short, the ease of use of SONiC depends on the preferences of the user. Users that prefer working in a configuration file will probably experience SONiC as a user-friendly NOS, while users that prefer CLI configuration might be less comfortable with SONiC's configuration.

CHAPTER 6

Use case scenarios for open switches

The use of open switches with SONiC can be beneficial in various use cases. For example, SONiC allows the user to look into the source code and then use or extend it for a certain objective. In this chapter, we briefly examine a current use case that exploits the properties of open switches with SONiC, and we provide two example use cases ourselves.

6.1 Current use cases

Microsoft Global Cloud

Microsoft developed SONiC for use in their Microsoft Global Cloud. Since Microsoft runs one of the largest networks in the world, Microsoft sets strict requirements for the technology it deploys. One of these requirements is that new features can be implemented without having an impact on the end users. Since SONiC consists of several containers, each containing the resources to deploy a certain networking feature (e.g. BGP, LAG), only one container needs to be updated when there is an update or a bug fix for a certain feature, instead of replacing the whole switch image (which would result in data plane downtime). This makes SONiC suitable for use cases in which no downtime is permitted while deploying updates [13].

Another requirement is that SONiC can be used on the newest and most innovative hardware platforms. Because SONiC uses SAI, a data center can constantly innovate with newer switch hardware without having to change the software stack. Regardless of what switch hardware is used, as long as SAI is supported, all switches can be configured the same way, for their software stacks can be identical. Operators are thus able to preserve their software investments while keeping up with hardware innovation. Thus, SONiC is suitable for use cases in which there may be regular changes in hardware [13].

In addition, Microsoft states the requirement that cloud-scale deep telemetry and fully automated failure mitigation have to be utilized in their cloud. Since SONiC has innovations such as NetBouncer and Everflow available, SONiC meets these requirements. NetBouncer can be deployed to accurately and automatically detect faulty devices or links within a large data center [27]. Everflow debugs several network faults such as packet drops or loops. Furthermore, it can quickly identify devices that cause high latency in a network [28]. In short, SONiC is suitable for use cases in which large networks have to be monitored constantly to automatically respond to potential problems.

6.2 Example use cases

In addition to the above use case of SONiC in the Microsoft Global Cloud, we provide two example use cases in which the open nature of SONiC can be exploited for useful applications.

Plug-and-play VLAN

The first use case relates to companies that want to provide flexible working spots to their employees, who in turn own a laptop provided by the company. On different days the same employee might want to work at different physical locations within the company office. To gain access to the company's network, the employees can use Ethernet cables to connect their laptops to the network. Since a company generally consists of several departments with partitioned and isolated computer networks (VLANs), and an employee from any department can sit and connect anywhere, how does the network decide to which VLAN a newly connected laptop should be added? Employees want to start working right away after connecting to the network, so it is preferred that their laptops are added to the correct VLAN without intervention by the network administrator.

If the company network consists of open switches running SONiC, the network administrator could decide to develop an application that automatically places newly connected hosts in a particular VLAN, depending on the MAC address of the host, so that the VLAN functionality on the switch is essentially plug-and-play. That is, no manual configuration is needed every time a new host connects. This can be implemented as follows. Via SONiC's logs or LLDP entries, newly connected hosts (including their MAC address) can be discovered by the application. The network administrator can define a policy that states the relation between the host MAC address and the VLAN that host will be placed in, if at all. This is illustrated in the right side of figure 6.1. For instance, the application can decide to block hosts with unknown MAC addresses, or place hosts manufactured by the same vendor in the same VLAN (MAC addresses are vendor-specific). Next, the application can simply edit SONiC's config_db.json configuration file to change the VLAN settings.

We suppose that the network administrator has a list of all company laptops in use by the employees, including their MAC addresses. The network administrator can easily define a policy that maps each laptop to a certain department (VLAN). This way, no matter where an employee connects a laptop to the network, the SONiC switch will detect the MAC address of this laptop and determine in which VLAN to place the switch interface the laptop is connected to. Similarly, if a laptop not belonging to one of the employees uses one of the Ethernet cables, its MAC address is not recognized and the laptop can be blocked by the switch.
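A minimal Python sketch of the core step of such an application is given below. The policy table, MAC addresses and the host-detection trigger are hypothetical; the VLAN tables follow the config_db.json layout shown earlier, and reloading the configuration after the edit is left out.

import json

# Hypothetical policy table: company laptop MAC address -> department VLAN.
POLICY = {
    "52:54:00:aa:bb:01": 100,  # engineering
    "52:54:00:aa:bb:02": 200,  # finance
}

CONFIG_DB = "/etc/sonic/config_db.json"

def assign_vlan(mac, interface):
    """Place `interface` in the VLAN mapped to `mac`, or leave it blocked."""
    vlan = POLICY.get(mac.lower())
    if vlan is None:
        print("unknown MAC %s: leaving %s blocked" % (mac, interface))
        return
    with open(CONFIG_DB) as f:
        cfg = json.load(f)
    # Add the interface as an untagged member of the mapped VLAN.
    cfg.setdefault("VLAN", {}).setdefault("Vlan%d" % vlan, {"vlanid": str(vlan)})
    cfg.setdefault("VLAN_MEMBER", {})["Vlan%d|%s" % (vlan, interface)] = {
        "tagging_mode": "untagged"
    }
    with open(CONFIG_DB, "w") as f:
        json.dump(cfg, f, indent=4)
    print("%s added to Vlan%d for %s" % (interface, vlan, mac))

# Example: a known laptop appears on Ethernet40.
assign_vlan("52:54:00:aa:bb:01", "Ethernet40")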

(Figure: a laptop with MAC x in an office connects to a switch interface y; SONiC detects the host, consults the policy table mapping MAC x to VLAN z, and either adds interface y to VLAN z or blocks it.)

Figure 6.1: An overview of the plug-and-play VLAN application. The switch detects a new host with MAC address x on switch interface y. The policy table maps from MAC address x to VLAN number z, after which the application places interface y in VLAN z.

Automatic port channel

Our second use case focuses on data centers that need to provide a redundant network, because they handle important data that must reach its destination. The network administrator can decide to aggregate all connections between the switches in the data center to provide redundancy. Such a data center may have a large number of switches, and configuring link aggregation for the connections between these switches can be time-consuming. LACP provides a way to do this, because it is able to automatically negotiate between two devices to set up a port channel. However, before this can happen, the network administrator needs to specify which switch interfaces belong to which port channel, if any. The application proposed in the next paragraph provides a more flexible alternative for this LACP feature by allowing the network administrator to connect neighbors to any interface on a switch, without the need to specify that these interfaces will be used to create a port channel. The application dynamically determines whether a port channel should be set up, and if so, with which interfaces.

If the data center contains open switches with SONiC, the network administrator can develop an application that automatically sets up a port channel if it discovers that the switch has multiple physical links with the same neighbor (for instance by comparing MAC addresses or Chassis IDs of all connected devices). This way, when the network administrator connects two switches with multiple physical links, no further configuration is necessary to set up the port channel.

(Figure: a switch with Chassis ID x in a data center connects to switch interface y; SONiC detects the neighbor, consults neighbor and port channel information, and either creates a new port channel z, adds interface y to an existing port channel z, or makes no port channel modifications.)

Figure 6.2: An overview of the automatic port channel application. The switch detects a new neighbor with Chassis ID x on switch interface y. Using neighbor and port channel information, the application decides whether 1) there should be no port channel modifications because the new link is the only link with the neighbor device, or 2) a new port channel z should be created with interface y and the other interface already connected to the same neighbor, or 3) the new interface y should be added to port channel z because there already is a port channel with the neighbor.

When the switch detects a newly connected neighbor, the application uses neighbor and port channel information to determine whether there is already a port channel with this neighbor and, if not, whether there is already a single link with this neighbor, for instance by comparing the Chassis IDs of all connected neighbors. Figure 6.2 shows this determination process. There are three possible cases (a sketch of the decision logic follows after this list):

1. There is no other link with the neighbor device.

2. There is one other link with the neighbor device.

3. There is already a port channel with the neighbor device.

In case 1, no port channel needs to be created, since it would not provide redundancy. In case 2, a port channel needs to be created with the interface of the already present link and the interface of the new link. In case 3, the interface of the new link needs to be added to the port channel that is already present with the neighbor. Editing port channels can be done by editing SONiC's configuration file. To set up a working port channel, the neighbor device will also have to go through this same process to set up the port channel on the neighbor's side. Thus, the application must run on both devices.
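A minimal Python sketch of the decision logic, with hypothetical data structures standing in for the neighbor (LLDP) and port channel information:

def decide(neighbor_id, new_iface, links, port_channels):
    """links maps a neighbor Chassis ID to the set of local interfaces
    already connected to it; port_channels maps a port channel name to
    its set of member interfaces. Returns the action for the new link."""
    existing = links.get(neighbor_id, set())
    # Case 3: a port channel with this neighbor exists -> extend it.
    for pc, members in port_channels.items():
        if members & existing:
            return "add %s to existing %s" % (new_iface, pc)
    # Case 2: exactly one other link -> create a port channel from both.
    if len(existing) == 1:
        (other,) = existing
        return "create port channel with %s and %s" % (other, new_iface)
    # Case 1: this is the only link with the neighbor -> do nothing.
    return "no port channel modifications"

links = {"chassis-x": {"Ethernet32"}}
print(decide("chassis-x", "Ethernet36", links, {}))  # case 2: create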

CHAPTER 7

Conclusions

Open switches play an increasingly significant role in innovative networks. We have examined the deployment of several networking features that we believe are essential for a network switch, and did so using multiple topologies with open switches to simulate different scenarios. In the discussion, we provided remarks about some experimental results and shed some light on the ease of use of a SONiC switch. In the last chapter, we briefly examined a current use case of open switches with SONiC and provided two possible use cases ourselves.

We can conclude that the networking features that we tested and that SONiC claims to support (LLDP, LACP (Layer 3), VLAN trunking, inter-VLAN routing and BGP) can in fact be successfully deployed on open switches with SONiC. In the discussion, we noted that some of these features required additional configuration, but they could be deployed successfully nonetheless.

In addition, we can conclude that open switches with SONiC are suitable for use cases in large data centers as well as in enterprise networks, because SONiC allows the network administrator to update features without data plane downtime. In addition, the network administrator is able to keep up with hardware innovations while no changes to the software stack need to be made. Also, SONiC provides innovative applications to monitor the network and automatically detect network failures. The openness of SONiC gives the network administrator more possibilities to customize and extend software on the switch.

7.1 Future research

In future research, more and different open switches can be used for experiments; examples are several Dell and Edgecore devices that are supported by SONiC. This would allow setting up more interesting and realistic network topologies to test features with. In addition, since SONiC is under constant development, repeating the tests after some time could yield different results. Furthermore, other open-source network operating systems, such as OPX, can be tested as well, to allow for a comparison with SONiC. Interoperability between SONiC and traditional, closed network operating systems can also be examined, since a network generally consists of devices with different operating systems. Lastly, future research can be done on the use cases of SONiC: additional use cases can be examined, and the example use cases described here can be implemented and tested.

CHAPTER 8 Acknowledgements

I would like to thank my supervisors, Paola Grosso and Łukasz Makowski, for their support and guidance throughout this research and the writing of this thesis. Moreover, I would like to thank the SNE OpenLab for providing the devices that were required for this research.


CHAPTER 10 Appendix

A: Platform summary

Listings 10.1 and 10.2 show a platform summary of Mellanox SN2100 and Arista 7050QX-32S, respectively.

Mellanox

admin@sonic-mellanox:~$ show platform summary
Platform: x86_64-mlnx_msn2100-r0
HwSKU: ACS-MSN2100
ASIC: mellanox

Listing 10.1: Mellanox SN2100 device information.

Arista

admin@sonic-arista:~$ show platform summary
Platform: x86_64-arista_7050_qx32s
HwSKU: Arista-7050-QX-32S
ASIC: broadcom

Listing 10.2: Arista 7050QX-32S device information.

B: Software information

B1: SONiC version

Listings 10.3 and 10.4 show information about the versions of SONiC that were used on Mellanox SN2100 and Arista 7050QX-32S, respectively.

Mellanox

admin@sonic-mellanox:~$ show version
SONiC Software Version: SONiC.HEAD.574-ed915e3
Distribution: Debian 8.10
Kernel: 3.16.0-5-amd64
Build commit: ed915e3
Build date: Wed Apr 4 06:58:48 UTC 2018
Built by: johnar@jenkins-worker-3

Docker images:
REPOSITORY                 TAG                IMAGE ID      SIZE
docker-orchagent-mlnx      HEAD.574-ed915e3   32203598a5bc  287MB
docker-orchagent-mlnx      latest             32203598a5bc  287MB
docker-syncd-mlnx          HEAD.574-ed915e3   b7ab299d2758  350.9MB
docker-syncd-mlnx          latest             b7ab299d2758  350.9MB
docker-lldp-sv2            HEAD.574-ed915e3   d8f65d12b406  297.2MB
docker-lldp-sv2            latest             d8f65d12b406  297.2MB
docker-dhcp-relay          HEAD.574-ed915e3   46c2a839b5ba  280.1MB
docker-dhcp-relay          latest             46c2a839b5ba  280.1MB
docker-database            HEAD.574-ed915e3   622f0f354847  278.8MB
docker-database            latest             622f0f354847  278.8MB
docker-teamd               HEAD.574-ed915e3   9dd25e367798  284.1MB
docker-teamd               latest             9dd25e367798  284.1MB
docker-snmp-sv2            HEAD.574-ed915e3   4b4277d6cc32  319.3MB
docker-snmp-sv2            latest             4b4277d6cc32  319.3MB
docker-router-advertiser   HEAD.574-ed915e3   6c2adf7743ec  276.4MB
docker-router-advertiser   latest             6c2adf7743ec  276.4MB
docker-platform-monitor    HEAD.574-ed915e3   9b31194ff812  298.3MB
docker-platform-monitor    latest             9b31194ff812  298.3MB
docker-fpm-quagga          HEAD.574-ed915e3   67c77efc3d4a  290.6MB
docker-fpm-quagga          latest             67c77efc3d4a  290.6MB

Listing 10.3: SONiC version information on Mellanox SN2100.

Arista

admin@sonic-arista:~$ show version
SONiC Software Version: SONiC.HEAD.547-4754b43
Distribution: Debian 8.10
Kernel: 3.16.0-5-amd64
Build commit: 4754b43
Build date: Fri Apr 6 07:38:17 UTC 2018
Built by: johnar@jenkins-worker-4

Docker images:
REPOSITORY                 TAG                IMAGE ID      SIZE
docker-syncd-brcm          HEAD.547-4754b43   145a93bf2613  358.1MB
docker-syncd-brcm          latest             145a93bf2613  358.1MB
docker-orchagent-brcm      HEAD.547-4754b43   32fdab0a0a85  287MB
docker-orchagent-brcm      latest             32fdab0a0a85  287MB
docker-lldp-sv2            HEAD.547-4754b43   9ce7dc0f55f6  297.2MB
docker-lldp-sv2            latest             9ce7dc0f55f6  297.2MB
docker-dhcp-relay          HEAD.547-4754b43   17fd00cd2091  280.1MB
docker-dhcp-relay          latest             17fd00cd2091  280.1MB
docker-database            HEAD.547-4754b43   5af52a038baf  278.8MB
docker-database            latest             5af52a038baf  278.8MB
docker-teamd               HEAD.547-4754b43   24d8a02873a7  284.1MB
docker-teamd               latest             24d8a02873a7  284.1MB
docker-snmp-sv2            HEAD.547-4754b43   129e2b96b2c4  319.3MB
docker-snmp-sv2            latest             129e2b96b2c4  319.3MB
docker-router-advertiser   HEAD.547-4754b43   58be616fcdb1  276.4MB
docker-router-advertiser   latest             58be616fcdb1  276.4MB
docker-platform-monitor    HEAD.547-4754b43   6b68d76ae87b  298.3MB
docker-platform-monitor    latest             6b68d76ae87b  298.3MB
docker-fpm-quagga          HEAD.547-4754b43   0ff216021fea  290.6MB
docker-fpm-quagga          latest             0ff216021fea  290.6MB

Listing 10.4: SONiC version information on Arista 7050QX-32S.

B2: Vagrant version

Listing 10.5 specifies the version of Vagrant used in the experiments.

root@hosta:~/vagrantfiles# vagrant version
Installed Version: 2.0.4
Latest Version: 2.1.1

root@hostb:~/vagrantfiles# vagrant version
Installed Version: 2.0.4
Latest Version: 2.1.1

Listing 10.5: Vagrant version used on servers A and B.

B3: VirtualBox version

Listing 10.6 specifies the version of VirtualBox used in the experiments.

root@hosta:~/vagrantfiles# vboxmanage --version
5.2.10r122088

root@hostb:~/vagrantfiles# vboxmanage --version
5.2.10_Debianr121806

Listing 10.6: VirtualBox version used on servers A and B.

C: Configurations

This section lists the relevant configurations for each experiment in config_db.json. Complete configuration files can be found in the GitHub repository¹.
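Before loading a modified configuration, it can be useful to verify that the edited file still parses and contains the intended entries. The following minimal Python check is our illustration, not part of SONiC, and assumes the standard /etc/sonic/config_db.json location:

import json

# Assumed standard location of the SONiC configuration file.
with open("/etc/sonic/config_db.json") as f:
    config = json.load(f)  # raises ValueError if an edit broke the JSON

# List every configured port channel and its member interfaces.
for name, pc in sorted(config.get("PORTCHANNEL", {}).items()):
    print(name, "->", ", ".join(pc.get("members", [])))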

C1.1: LACP (L2)

Mellanox

"PORTCHANNEL_INTERFACE":{ "PortChannel1|": {} }, "PORTCHANNEL":{ "PortChannel1": { "members": [ "Ethernet32", "Ethernet36" ] } },

Arista

"PORTCHANNEL_INTERFACE":{ "PortChannel1|": {} }, "PORTCHANNEL":{ "PortChannel1": { "members": [ "Ethernet16", "Ethernet20" ] } },

¹ https://github.com/erikpu/white-label-configurations

C1.2: LACP (L3)

Mellanox

"PORTCHANNEL_INTERFACE":{ "PortChannel1|172.16.0.10/24": {} }, "PORTCHANNEL":{ "PortChannel1": { "members": [ "Ethernet32", "Ethernet36" ] } },

Arista

"PORTCHANNEL_INTERFACE":{ "PortChannel1|172.16.0.20/24": {} }, "PORTCHANNEL":{ "PortChannel1": { "members": [ "Ethernet16", "Ethernet20" ] } },

C2: STP

Mellanox

"VLAN":{ "Vlan100": { "members": [ "Ethernet32", "Ethernet36" ], "vlanid": "100" } }, "VLAN_MEMBER":{ "Vlan100|Ethernet32": { "tagging_mode": "untagged" }, "Vlan100|Ethernet36": { "tagging_mode": "untagged" } },

Arista

"VLAN":{ "Vlan100": { "members": [ "Ethernet16", "Ethernet20" ], "vlanid": "100" } }, "VLAN_MEMBER":{ "Vlan100|Ethernet16": { "tagging_mode": "untagged" }, "Vlan100|Ethernet20": { "tagging_mode": "untagged" } },

C3: VLAN (trunking)

Mellanox

"VLAN":{ "Vlan100": { "members": [ "Ethernet56", "Ethernet32" ], "vlanid": "100" }, "Vlan200": { "members": [ "Ethernet60", "Ethernet32" ], "vlanid": "200" } }, "VLAN_MEMBER":{ "Vlan100|Ethernet32": { "tagging_mode": "tagged" }, "Vlan100|Ethernet56": { "tagging_mode": "untagged" }, "Vlan200|Ethernet32": { "tagging_mode": "tagged" }, "Vlan200|Ethernet60": { "tagging_mode": "untagged" } },

Arista

"VLAN":{ "Vlan100": { "members": [ "Ethernet40", "Ethernet16" ], "vlanid": "100" }, "Vlan200": { "members": [ "Ethernet44", "Ethernet16" ], "vlanid": "200" } }, "VLAN_MEMBER":{ "Vlan100|Ethernet40": { "tagging_mode": "untagged" }, "Vlan100|Ethernet16": { "tagging_mode": "tagged" }, "Vlan200|Ethernet44": { "tagging_mode": "untagged" }, "Vlan200|Ethernet16": { "tagging_mode": "tagged" } },

C4: Inter-VLAN routing

Mellanox

"VLAN_INTERFACE":{ "Vlan100|172.16.100.3/24": {}, "Vlan200|172.16.200.3/24": {} },

Arista

"VLAN_INTERFACE":{ "Vlan100|172.16.100.4/24": {}, "Vlan200|172.16.200.4/24": {} },

C5: BGP

Mellanox

"BGP_NEIGHBOR":{ "10.0.2.20": { "name": "Arista", "local_addr": "10.0.2.10", "asn": "65100" }, "10.0.3.20": { "name": "Arista", "local_addr": "10.0.3.10", "asn": "65100" } },

Arista

"BGP_NEIGHBOR":{ "10.0.2.10": { "name": "Mellanox", "local_addr": "10.0.2.20", "asn": "65000" }, "10.0.3.10": { "name": "Mellanox", "local_addr": "10.0.3.20", "asn": "65000" } },

D: Output

This section contains the complete output of the commands used in the experiments, including output that was too extensive to be placed in the report itself.

D1: LLDP

Mellanox

admin@sonic-mellanox:~$ show lldp neighbors
Command: sudo lldpctl
-------------------------------------------------------------------------------
LLDP neighbors:
-------------------------------------------------------------------------------
Interface:    Ethernet32, via: LLDP, RID: 1, Time: 0 day, 00:04:48
  Chassis:
    ChassisID:    mac 00:1c:73:7b:f7:5c
    SysName:      sonic-arista
    SysDescr:     Debian GNU/Linux 8 (jessie) Linux 3.16.0-5-amd64 #1 SMP Debian 3.16.51-3+deb8u1 (2018-01-08) x86_64
    TTL:          120
    MgmtIP:       10.1.0.32
    MgmtIP:       fe80::21c:73ff:fe7b:f75c
    Capability:   Bridge, on
    Capability:   Router, on
    Capability:   Wlan, off
    Capability:   Station, off
  Port:
    PortID:       local Ethernet16
    PortDescr:    Ethernet16
-------------------------------------------------------------------------------
Interface:    Ethernet36, via: LLDP, RID: 1, Time: 0 day, 00:04:47
  Chassis:
    ChassisID:    mac 00:1c:73:7b:f7:5c
    SysName:      sonic-arista
    SysDescr:     Debian GNU/Linux 8 (jessie) Linux 3.16.0-5-amd64 #1 SMP Debian 3.16.51-3+deb8u1 (2018-01-08) x86_64
    TTL:          120
    MgmtIP:       10.1.0.32
    MgmtIP:       fe80::21c:73ff:fe7b:f75c
    Capability:   Bridge, on
    Capability:   Router, on
    Capability:   Wlan, off
    Capability:   Station, off
  Port:
    PortID:       local Ethernet20
    PortDescr:    Ethernet20
-------------------------------------------------------------------------------

LLDP neighbors on Mellanox.

Arista

admin@sonic-arista:~$ show lldp neighbors
Command: sudo lldpctl
-------------------------------------------------------------------------------
LLDP neighbors:
-------------------------------------------------------------------------------
Interface:    Ethernet20, via: LLDP, RID: 1, Time: 0 day, 00:05:42
  Chassis:
    ChassisID:    mac ec:0d:9a:8d:f1:e4
    SysName:      sonic-mellanox
    SysDescr:     Debian GNU/Linux 8 (jessie) Linux 3.16.0-5-amd64 #1 SMP Debian 3.16.51-3+deb8u1 (2018-01-08) x86_64
    TTL:          120
    MgmtIP:       10.1.0.32
    MgmtIP:       fe80::ee0d:9aff:fe8d:f1e4
    Capability:   Bridge, on
    Capability:   Router, on
    Capability:   Wlan, off
    Capability:   Station, off
  Port:
    PortID:       local Ethernet36
    PortDescr:    Ethernet36
-------------------------------------------------------------------------------
Interface:    Ethernet16, via: LLDP, RID: 1, Time: 0 day, 00:05:44
  Chassis:
    ChassisID:    mac ec:0d:9a:8d:f1:e4
    SysName:      sonic-mellanox
    SysDescr:     Debian GNU/Linux 8 (jessie) Linux 3.16.0-5-amd64 #1 SMP Debian 3.16.51-3+deb8u1 (2018-01-08) x86_64
    TTL:          120
    MgmtIP:       10.1.0.32
    MgmtIP:       fe80::ee0d:9aff:fe8d:f1e4
    Capability:   Bridge, on
    Capability:   Router, on
    Capability:   Wlan, off
    Capability:   Station, off
  Port:
    PortID:       local Ethernet32
    PortDescr:    Ethernet32
-------------------------------------------------------------------------------

LLDP neighbors on Arista.
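Output in this format is straightforward to consume programmatically. The following minimal Python sketch, our illustration rather than part of SONiC, parses the listings above into the interface-to-Chassis-ID mapping assumed by the automatic port channel sketch in Chapter 6:

import re

def parse_lldp_neighbors(text):
    # Parse 'show lldp neighbors' output into {interface: chassis_id}.
    neighbors = {}
    current = None
    for line in text.splitlines():
        m = re.match(r"\s*Interface:\s*([^,]+),", line)
        if m:
            current = m.group(1).strip()
            continue
        m = re.match(r"\s*ChassisID:\s*mac\s+(\S+)", line)
        if m and current is not None:
            neighbors[current] = m.group(1)
    return neighbors

Applied to the Mellanox listing above, this yields {'Ethernet32': '00:1c:73:7b:f7:5c', 'Ethernet36': '00:1c:73:7b:f7:5c'}: two interfaces with the same Chassis ID, i.e. case 2 of the decision procedure.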

D2: LACP (L3) redundancy

# (both links up)

admin@sonic-mellanox:~$ show interfaces status
Command: intfutil status
  Interface          Lanes    Speed    MTU       Alias    Oper    Admin
-----------  -------------  -------  -----  ----------  ------  -------
#(...)
 Ethernet32    32,33,34,35      N/A   9100  Ethernet32      up       up
 Ethernet36    36,37,38,39      N/A   9100  Ethernet36      up       up
#(...)

# (communication is fine)

admin@sonic-mellanox:~$ sudo ping 172.16.0.20
PING 172.16.0.20 (172.16.0.20) 56(84) bytes of data.
64 bytes from 172.16.0.20: icmp_seq=1 ttl=64 time=0.362 ms
64 bytes from 172.16.0.20: icmp_seq=2 ttl=64 time=0.220 ms
64 bytes from 172.16.0.20: icmp_seq=3 ttl=64 time=0.320 ms
^C
--- 172.16.0.20 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1998ms
rtt min/avg/max/mdev = 0.220/0.300/0.362/0.062 ms

# (link failure Ethernet32!)

admin@sonic-mellanox:~$ show interfaces status
Command: intfutil status
  Interface          Lanes    Speed    MTU       Alias    Oper    Admin
-----------  -------------  -------  -----  ----------  ------  -------
#(...)
 Ethernet32    32,33,34,35      N/A   9100  Ethernet32    down       up
 Ethernet36    36,37,38,39      N/A   9100  Ethernet36      up       up
#(...)

# (PortChannel1 still UP)

admin@sonic-mellanox:~$ ip a show dev PortChannel1
4: PortChannel1: mtu 9100 qdisc noqueue state UP group default
    link/ether ec:0d:9a:8d:f1:c0 brd ff:ff:ff:ff:ff:ff
    inet 172.16.0.10/24 brd 172.16.0.255 scope global PortChannel1
       valid_lft forever preferred_lft forever
    inet6 fe80::ee0d:9aff:fe8d:f1c0/64 scope link
       valid_lft forever preferred_lft forever

# (communication still working)

admin@sonic-mellanox:~$ sudo ping 172.16.0.20
PING 172.16.0.20 (172.16.0.20) 56(84) bytes of data.
64 bytes from 172.16.0.20: icmp_seq=1 ttl=64 time=0.294 ms
64 bytes from 172.16.0.20: icmp_seq=2 ttl=64 time=0.310 ms
64 bytes from 172.16.0.20: icmp_seq=3 ttl=64 time=0.296 ms
^C
--- 172.16.0.20 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1998ms
rtt min/avg/max/mdev = 0.294/0.300/0.310/0.007 ms

Behaviour of PortChannel1 when link Ethernet32 (Mellanox)-Ethernet16 (Arista) fails (comments are added with '#' for clarity). Some output was omitted for brevity.

D3: STP

Mellanox

admin@sonic-mellanox:~$ sudo brctl showstp Bridge
Bridge
 bridge id              0064.ec0d9a8df1c0
 designated root        0064.ec0d9a8df1c0
 root port                 0                     path cost                  0
 max age                  20.00                  bridge max age            20.00
 hello time                2.00                  bridge hello time          2.00
 forward delay            15.00                  bridge forward delay      15.00
 ageing time             300.00
 hello timer               0.98                  tcn timer                  0.00
 topology change timer     0.00                  gc timer                  57.79
 flags

Ethernet32 (1)
 port id                8001                     state             forwarding
 designated root        0064.ec0d9a8df1c0        path cost                100
 designated bridge      0064.ec0d9a8df1c0        message age timer          0.00
 designated port        8001                     forward delay timer        0.00
 designated cost           0                     hold timer                 0.00
 flags

Ethernet36 (2)
 port id                8002                     state             forwarding
 designated root        0064.ec0d9a8df1c0        path cost                100
 designated bridge      0064.ec0d9a8df1c0        message age timer          0.00
 designated port        8002                     forward delay timer        0.00
 designated cost           0                     hold timer                 0.00
 flags

STP state on Mellanox: both ports in forwarding state.

Arista

admin@sonic-arista:~$ sudo brctl showstp Bridge
Bridge
 bridge id              00c8.001c737bf75c
 designated root        00c8.001c737bf75c
 root port                 0                     path cost                  0
 max age                  20.00                  bridge max age            20.00
 hello time                2.00                  bridge hello time          2.00
 forward delay            15.00                  bridge forward delay      15.00
 ageing time             300.00
 hello timer               1.52                  tcn timer                  0.00
 topology change timer     0.00                  gc timer                 161.18
 flags

Ethernet16 (1)
 port id                8001                     state             forwarding
 designated root        00c8.001c737bf75c        path cost                100
 designated bridge      00c8.001c737bf75c        message age timer          0.00
 designated port        8001                     forward delay timer        0.00
 designated cost           0                     hold timer                 0.52
 flags

Ethernet20 (2)
 port id                8002                     state             forwarding
 designated root        00c8.001c737bf75c        path cost                100
 designated bridge      00c8.001c737bf75c        message age timer          0.00
 designated port        8002                     forward delay timer        0.00
 designated cost           0                     hold timer                 0.52
 flags

STP state on Arista: both ports in forwarding state.
