Documentation

Contents: Introduction | Installation Guide | User Guide | System Architecture | Test Infrastructure

Introduction:

Project Atrium develops an open SDN Distribution - a vertically integrated set of open source components which together form a complete SDN stack.

Motivation

The current state of SDN technology suffers from two significant gaps that interfere with the development of a vibrant ecosystem. First, there is a large gap in the integration of the elements that are needed to build an SDN stack. While there are multiple choices at each layer, there are missing pieces and poor or no integration.

Second, there is a gap in interoperability. This exists both at a product level, where existing products from different vendors have limited compatibility, and at a protocol level, where interfaces between the layers are either over- or under-specified. For example, differences in the implementations of OpenFlow v1.3 make it difficult to connect an arbitrary switch and controller. On the other hand, the interface for writing applications on top of a controller platform is mostly under-specified, making it difficult to write a portable application. Project Atrium attempts to address these challenges, not by working in the specification space, but by generating code that integrates a set of production-quality components. Their successful integration then allows alternative component instances (switches, controllers, applications) to be integrated into the stack.

Most importantly, we wish to work closely with network operators on deployable use cases, so that they can download near-production-quality code from one location and trial functioning software defined networks on real hardware. Atrium is the first fully open source SDN distribution. We believe that, with operator input, requirements and deployment scenarios in mind, Atrium can be useful as distributed, while also providing the basis for future extensions and alternative distributions which focus on requirements different from those in the original release.

Atrium Release 2015/A

In the first release (2015/A), Atrium is quite simply an open-source router that speaks BGP to other routers, and forwards packets received on one port/VLAN to another, based on the next-hop learnt via BGP peering.

Atrium creates a vertically-integrated stack to produce an SDN based router. This stack can have the forms shown below.

On the left, the stack includes a controller (ONOS) with a peering application (called BGP Router) integrated with an instance of Quagga BGP. The controller also includes a device-driver specifically written to control an OF-DPA based OpenFlow switch (more specifically, it is meant to be used with OF-DPA v2.0). The controller uses OpenFlow v1.3.4 to communicate with the hardware switch. The hardware switch can be a bare-metal switch from either Accton (5710) or Quanta (LY2). On the bare-metal switch we run an open switch operating system (ONL) and an open install environment (ONIE), both from the Open Compute Project. In addition, we run the Indigo OpenFlow Agent on top of OF-DPA, contributed by Big Switch Networks and Broadcom.

On the right, the control plane stack remains the same. The one change is that we use a different device-driver in the controller, depending on the vendor equipment we work with. Currently Atrium release 2015/A works with equipment from NoviFlow (1132), Centec (v350), Corsa (6410), Pica8 (P-3295), and Netronome. The vendor equipment exposes the underlying switch capabilities necessary for the peering application via an OpenFlow agent (typically OVS) to the control plane stack.

Details can be found in the Atrium 2015/A release contents.

Installation Guide:

Distribution VM

To get started with Atrium Release 2015/A, download the distribution VM (Atrium_2015_A.ova, size ~2GB) from here:

https://dl.orangedox.com/TfyGqd73qtcm3lhuaZ/Atrium_2015_A.ova

login: admin, password: bgprouter

NOTE: This distribution VM is NOT meant for development. Its sole purpose is to have a working system up and running for test/deployment as painlessly as possible. A developer guide using mechanisms other than this VM will be available shortly after the release.

The VM can be run on any desktop/laptop or server with virtualization software (VirtualBox, Parallels, VMWare Fusion, VMWare Player etc.). We recommend using VirtualBox for non-server uses. For running on a server, see the subsection below.

Get a recent version of VirtualBox to import and run the VM. We recommend the following:

1) Use 2 cores and at least 4GB of RAM.

2) For networking, you can "Disable" the 2nd Network Adapter. We only need the 1st network adapter for this release.

3) You can choose for the primary networking interface (Adapter 1) of the VM to be NATted or "bridged". If you choose NAT, you will need to create the following port-forwarding rules. The first rule allows you to ssh into your VM, with a command from a Linux or Mac terminal like this:

$ ssh -X -p 3022 admin@localhost

The second rule allows you to connect an external switch to the controller running within the VM (the guest machine) using the IP address of the host machine (in the example it is 10.1.9.140) on host port 6633.
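If you prefer the VirtualBox command line to the GUI, port-forwarding rules along these lines should work. This is a sketch: the VM name "Atrium_2015_A" and the rule names are assumptions, so adjust them to match your import, and run the commands while the VM is powered off.

$ VBoxManage modifyvm "Atrium_2015_A" --natpf1 "ssh,tcp,,3022,,22"

$ VBoxManage modifyvm "Atrium_2015_A" --natpf1 "openflow,tcp,,6633,,6633"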

If you chose to bridge (with DHCP) instead of NAT, then log in to the VM to see what IP address was assigned by your DHCP server (on the eth0 interface). Then use ssh to get into the VM from a terminal:

$ ssh -X admin@<vm-ip-address>

You can log in to the VM with the following credentials: login: admin, password: bgprouter

Once in, try to ping the outside world as a sanity check (ping www.cnn.com).

Running the Distribution VM on a Server

The Atrium_2015_A.ova file is simply a tar file containing the disk image (vmdk file) and some configuration (ovf file). Most server virtualization software can directly run the vmdk file. However, most people prefer the qcow2 format on servers. First, untar the ova file:

$ tar xvf Atrium_2015_A.ova

Use the following command to convert the vmdk file to qcow2. You can then use your server's virtualization software to create a VM using the qcow2 image.

$ qemu-img convert -f vmdk Atrium_2015_A-disk1.vmdk -O qcow2 Atrium_2015_A-disk1.qcow2
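As an illustration of the last step only - the VM name, memory/CPU sizing, network bridge and other options below are assumptions that depend on your server setup, and your virt-install version may also want an --os-variant flag - a libvirt-based host could create the VM from the converted image with something like:

$ virt-install --name atrium --ram 4096 --vcpus 2 --import --disk path=Atrium_2015_A-disk1.qcow2,format=qcow2 --network bridge=br0 --graphics none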

Running the Distribution VM on the Switch

While it should be possible to run the controller and other software that is part of the distribution VM directly on the switch CPU in a Linux-based switch OS, it is not recommended. This VM has not been optimized for such an installation, and it has not been tested in such a configuration.

Installation Steps

Once you have the VM up and running, the following steps will help you to bring up the system.

You have two choices:

A) You can bring up the Atrium Router completely in software, completely self-contained in this VM. In addition, you will get a complete test infrastructure (other routers to peer with, hosts to ping from, etc.) that you can play with (via the router-test.py script). Note that when using this setup, we emulate hardware pipelines using software switches. Head over to the "Running Manual Tests" section on the Test Infrastructure page.

B) Or you could bring up the Atrium Router in hardware, working with one of the seven OpenFlow switches we have certified to work for Project Atrium. Follow the directions below:

Basically, you need to configure the controller/app, bring up Quagga and connect it to ONOS (via the router-deploy.py script), and then configure the switch you are working with to connect it to the controller - 3 easy steps! The following pages will help you do just that (a rough command-level sketch follows the list below):

1. Configure and run ONOS

2. Configure and run Quagga

3. Configure and connect your Switch: Accton 5710, Centec v350, Corsa 6410, Netronome, NoviFlow 1132, Pica8 P-3295, Quanta LY2
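As a rough, command-level sketch only (the exact arguments, config files and order depend on your switch - follow the pages above for the authoritative steps - and the location of router-deploy.py is assumed to be the VM's top-level directory, like router-test.py), the control-plane side amounts to launching ONOS and then running the deploy script:

$ ok clean

$ sudo ./router-deploy.py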

User Guide:

Control Plane User Guide

In this section we will introduce some of the CLI commands available on ONOS and Quagga, at least the ones that are relevant to Atrium. We will use the following network example.

With reference to the picture above, the Atrium router comprises a dataplane OF switch (with dpid 0x1), ONOS, Quagga BGP, and a control-plane switch (OVS with dpid 0xaa) that shuttles BGP and ARP traffic between ONOS and Quagga. More details of the internals can be found in the System Architecture section. Normally the dataplane switch would be the hardware switch of your choice, but for the purposes of this guide we have chosen a software switch (also OVS) that we use to emulate a hardware pipeline.

The Atrium Router has AS number 65000. It peers with two other traditional routers (peers 1 and 2) which have their own AS numbers (65001 and 65002 respectively). Hosts 1 and 2 are reachable via peers 1 and 2 respectively, and these peers advertise those networks to our Atrium router. The traditional routers, peers 1 and 2, could be any regular router that speaks BGP - we have tried Vyatta and Cisco - but in this example they are Linux hosts behaving as routers, with Quagga running on them.

Here is a look at the BGP instances in the Atrium Router as well as in the peers. The BGP instance in the Atrium Router control plane is simply called "bgp1" (you can change it if you want to, in the router-deploy.py script). The "show ip bgp summary" command shows the peering session status for the Atrium Router's Quagga instance. We see that there are 3 peering sessions that are Up (for roughly 2 minutes when this screenshot was taken). The 1.1.1.1 peering session is with ONOS - there is a lightweight implementation of I-BGP within ONOS which ONOS uses to pull best-route information out of Quagga BGP. Both the Quagga BGP and the ONOS BGP are in the same AS 65000 (for I-BGP). The E-BGP sessions are with the peers (AS 65001 and 65002). We can see that Atrium Quagga BGP has received 1 route each from the neighbors.

The "show ip bgp" command gives details on the received routes (or networks). We see that the 1.0/16 network was received from peer1, and the AS path is simply [65001] (which is via the next-hop peer1). Similar information can be seen at the peers' BGP instances. In peer1, we see that there is only one peering session, which is with the Atrium router (actually its Quagga BGP instance). The 1.0/16 network is directly attached, and the 2.0/16 network is reachable via the AS path [65000, 65002], where the next hop is the Atrium router.
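If you want to poke at these yourself, the Quagga BGP CLI is reached over telnet to the bgpd vty from wherever the Quagga instance runs (a concrete login to peer2, including the vty port and password, is shown in the Test Infrastructure section; the prompt name below is illustrative):

$ telnet localhost 2605

bgp1> show ip bgp summary

bgp1> show ip bgp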

More info on Quagga BGP CLI can be found here: http://www.nongnu.org/quagga/docs/docs-info.html#BGP

Let's see what we can find on the ONOS Karaf CLI.

"summary" gives a summary of params as seen by ONOS. For example, we are running a single-instance of ONOS (nodes=1), there are two switches attached (devices=2, more on that later), there are no links as Link Discovery is turned off (Atrium is a single router, not a collection of switches), there are 4 host detected (more on that later), 15 flows, and we do not use ONOS intents (which are meant for network-wide intents, not switch-level intents)."apps - s" shows the apps that are active (with *).

"devices" shows switch information. There are two switches - one is the data plane switch (dpid:0x1) which is normally a hardware switch in Atrium, but in this example we are emulating the hardware switch with an OVS - note that the "device driver" we are using is the "softrouter" pipeline, which is also used by the Netronome and NoviFlow switches; the other "device" is the control-plane OVS (dpid:0xaa) which to ONOS also shows up as a dataplane switch (this is just a result of the system architecture). "ports" and "portstats" are self-explanatory.

"hosts" gives info on what endpoints ONOS has discovered as being attached to the devices. Of significance are the peer-routers that ONOS has discovered as being attached to the dataplane switch (0x1) on ports 1 and 2. Note that even though this information was configured (in sdnip.json), ONOS still needs to discover them and resolve their IP<->MAC address via ARP.

"routes" tells us the BGP routes learnt via the I-BGP session with Atrium Quagga BGP. "bgp-routes" gives more details - this is showing the same info you get with "show ip bgp" on the Quagga CLI. "bgp-neighbors" shows the only neighbor for the ONOS BGP implementation, which is the Quagga BGP speaker over the I-BGP session between them. "flows" shows the flows ONOS is trying to install in the dataplane switch (0x1) and in the control plane OVS (0xaa). The number of flows in the control- plane OVS will never change; they are statically put in as soon as the switch connects, but it is really important that all 3 flows are in there. The number of flows in the dataplane switch will be different depending on which hardware switch you are using and what pipeline it exposes (via the ONOS driver for that switch). In this example, since we are using the "softrouter" driver, the pipeline has only 2 tables (tableids 0 and 1). The number of flows in the dataplane switch will also vary over time, depending on the routes that are advertised by peers.

In ONOS terminology, "selector" corresponds to OF matches, and "treatment" corresponds to OF actions or instructions. Note that in the softrouter pipeline, Table 1 is the LPM table, and you can find rules which match on the learnt 1.0/16 and 2.0/16 dstIP addresses.

"flows" also shows a state: "ADDED" means that ONOS has verified that the flows are indeed in the data-plane hardware switch; "PENDING ADD" usually means that ONOS is trying to add the flows to the switch -- if the state remains that way for a while, it usually means there is some problem getting them in there -- check the logs.

"flows" also shows flow-stats (packets and bytes that have matched on the flows in the dataplane), but this is dependent on whether your hardware switch supports flowstats - many don't :(

Also the "softrouter" driver does not use OpenFlow groups, so "groups" on the ONOS cli shows no groups, but other piplines/drivers like centec, corsa and ofdpa will show the group entries.

Arbitrary networks that use BGP as the routing protocol can be created with Atrium routers that peer with each other and with traditional routers. Here is one we demonstrated at the recent Open Networking Summit.

Head over to the User Guide on Switches for assorted tips and tricks on using the switch CLI of your hardware switch vendor.

System Architecture:

Architecture of the Atrium Router

The Atrium Router uses an open-source, featureful BGP implementation like Quagga for peering with other routers using E-BGP. Note that the E-BGP arrows shown in the diagram above are merely conceptual - actual E-BGP communication paths are through the dataplane ports (in-band, as in any router). Atrium is currently built using ONOS as the controller. The ONOS controller itself peers with Quagga using I-BGP. As such, I-BGP is the primary northbound interface exposed by the controller's application layer. Other BGP implementations (like Bird or proprietary implementations) can potentially plug in to Atrium by replacing Quagga.

The routing application in ONOS performs the functions shown in the figure on the right. The routing application is actually a combination of several applications. ARP and ICMP handlers are self-explanatory. To support learning routes from Quagga or any other BGP speaker, the ONOS routing application supports a lightweight implementation of I-BGP (see the apps "routing" and "routing-api" in ONOS). These routes are resolved by the RIB component of the application and exported to other apps that are known as FIB Listeners. A FIB listener/Flow-Installer is responsible for installing flows in the underlying switch hardware. Atrium uses an application known as "BGP Router" as the FIB Listener/Flow Installer.

The BGP-OF encap/decap module requires more explanation. The Quagga BGP instance communicates with external peers using E-BGP. Traffic for these E-BGP sessions use the data-plane switch interfaces like any normal router (in-band). In the SDN based router, the E-BGP traffic has to be delivered from the data-plane to the external controller via some mechanism.

One way in which the E-BGP traffic can be delivered to Quagga is shown in the figure above - encapsulating the E-BGP traffic in OpenFlow packet-ins and packet-outs. The Quagga speaker expects the BGP traffic on Linux interfaces. The easiest way to deliver this traffic to Quagga is by using a control-plane vswitch (typically OVS) as the delivery mechanism. The dotted line in Figure 9 shows the path followed by the E-BGP traffic from an external BGP peer. The traffic enters the dataplane switch interface and is punted to the controller via an OpenFlow packet-in on the switch's out-of-band OpenFlow channel. This reaches the "BGP-OF encap/decap" module, which is part of the Routing Application in ONOS (see TunnelConnectivityManager in the BGPRouter app). This module redirects the E-BGP traffic to the control-plane OVS via an OpenFlow packet-out. Finally, the E-BGP traffic reaches the vEth interface of the Quagga BGP process. Similar encap/decap happens on the reverse path.
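To see this path on a running system, you can inspect the control-plane OVS directly. This is only a sketch - the control-plane bridge name is not specified in this guide, so look it up with the first command and substitute it in the second:

$ sudo ovs-vsctl show

$ sudo ovs-ofctl dump-flows <control-plane-bridge> -O OpenFlow13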

So far we have discussed the "upper" parts of the architecture. Equally important are the "southbound" parts. Atrium currently works on 7 different hardware switches that expose 5 different OF 1.3 pipelines. How can a controller manage the differences between these pipelines? How can an application work across all these differences? In Project Atrium, we have come up with a solution to these important questions that have in the past plagued the advancement of OF1.3 solutions.

Our architectural contributions include device-drivers and the Flow Objective API. Controllers manage the differences in OF hardware pipelines via device-drivers written specifically for those pipelines. Applications work across all the pipeline differences by mostly ignoring pipeline-level details. This is made possible by the Flow Objectives API and the implementation of the API in the drivers and the FlowObjective Manager. Basically, Flow Objectives offer a mechanism for applications to be written in pipeline-agnostic ways, so that they do not have to be re-written every time a new switch/pipeline comes along. The applications make Flow Objective calls, and it is then the job of the device driver to translate those calls into actual flow rules that are sent to the switch.

More details on Flow Objectives can be found here: https://community.opensourcesdn.org/wg/Atrium/document/15

The API is in the process of being documented. For now the code can be found here: https://github.com/opennetworkinglab/onos/tree/master/core/api/src/main/java/org/onosproject/net/flowobjective

Test Infrastructure

The test infrastructure for Atrium is currently completely in software and self-contained in the distribution VM. What that means is that we do not provide any test infrastructure for testing your hardware switch for Atrium. You could, however, duplicate our software-based setup in hardware to test your hardware switch. In future releases, we will look to add support for hardware testing. For now, we have two options: we can run manual tests with a test setup, or run more automated tests.

Running Manual Tests

A test script, router-test.py, has been provided in the top-level directory. Here are the steps to bring up the test:

1. Install and launch ONOS as described here, but with a couple of changes. First, the setup we will use is shown in the diagram below.

You will need to configure ONOS with the details it needs. We have provided such config files in the Applications/config folder. Run the following commands:

$ cd Applications/config

$ cp mn_addresses.json addresses.json

$ cp mn_sdnip.json sdnip.json

$ cd ~

2. Next go ahead and launch ONOS with 'ok clean'. From another shell, enter the following command:

$ onos-topo-cfg localhost ~/Applications/config/mnrouterconfig.json

The file looks like this:

{
    "devices": [
        {
            "alias": "s1",
            "uri": "of:0000000000000001",
            "annotations": { "driver": "softrouter" },
            "type": "SWITCH"
        }
    ]
}

What this does is tell ONOS to expect the switch 0x1 and to assign it the "softrouter" driver. Essentially, we are emulating a hardware pipeline with a software switch. We are going to launch the entire setup shown in the picture above. The switch 0x1 would normally be a hardware switch, but here it will be an OVS. Normally ONOS would treat OVS as a single-table software switch (the default driver), but by running the command above we tell ONOS to assign the "softrouter" driver instead. In essence we are emulating the NoviFlow and Netronome pipelines used in Atrium, because they both use the "softrouter" driver as well.
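If you later want to emulate one of the other certified pipelines (see step 6 below), the same file can be edited to point at a different driver. A variant might look like this - note that the driver name "corsa" is only an illustrative assumption; use a driver name that actually exists in your ONOS build:

{
    "devices": [
        {
            "alias": "s1",
            "uri": "of:0000000000000001",
            "annotations": { "driver": "corsa" },
            "type": "SWITCH"
        }
    ]
}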

3. Now we are ready to launch the rest of the setup. From another shell, run the following command:

$ sudo ./router-test.py &

4. The switches should have already connected to ONOS. On the ONOS CLI you can check:

onos> devices
id=of:0000000000000001, available=true, role=MASTER, type=SWITCH, mfr=Nicira, Inc., hw=Open vSwitch, sw=2.3.1, serial=None, driver=softrouter, protocol=OF_13, channelId=127.0.0.1:55944
id=of:00000000000000aa, available=true, role=MASTER, type=SWITCH, mfr=Nicira, Inc., hw=Open vSwitch, sw=2.3.1, serial=None, protocol=OF_13, channelId=127.0.0.1:55945

And if you run:

admin@atrium:~$ ps aux | grep mininet
root      3625  0.0  0.0  21140  2128 pts/2   Ss+  17:01   0:00 bash --norc -is mininet:c0
root      3631  0.0  0.0  21140  2128 pts/4   Ss+  17:01   0:00 bash --norc -is mininet:bgp1
root      3637  0.0  0.0  21140  2128 pts/5   Ss+  17:01   0:00 bash --norc -is mininet:host1
root      3639  0.0  0.0  21140  2132 pts/6   Ss+  17:01   0:00 bash --norc -is mininet:host2
root      3641  0.0  0.0  21140  2128 pts/7   Ss+  17:01   0:00 bash --norc -is mininet:peer1
root      3644  0.0  0.0  21140  2128 pts/8   Ss+  17:01   0:00 bash --norc -is mininet:peer2
root      3648  0.0  0.0  21140  2128 pts/9   Ss+  17:01   0:00 bash --norc -is mininet:root1
root      3653  0.0  0.0  21140  2128 pts/10  Ss+  17:01   0:00 bash --norc -is mininet:router
root      3656  0.0  0.0  21140  2128 pts/11  Ss+  17:01   0:00 bash --norc -is mininet:s1
admin     4012  0.0  0.0  11736   936 pts/1   S+   17:06   0:00 grep --color=auto mininet

you will see all the hosts and peers in their own Linux containers instantiated by Mininet.

5. Now you have the ability to poke around in any of them using the ./mininet/util/m utility.

For example, use

$ ./mininet/util/m host1

to enter host1 and ping 2.0.0.1 (host2) - it should work :)

Or go into peer2

$ ./mininet/util/m peer2

to enter the Quagga router peer2:

admin@atrium:~$ ./mininet/util/m peer2
root@atrium:~# telnet localhost 2605

Trying 127.0.0.1...

Connected to localhost.

Escape character is '^]'.

Hello, this is Quagga (version 0.99.22.4).

Copyright 1996-2005 Kunihiro Ishiguro, et al.

User Access Verification

Password: sdnip

peer2> sh ip bgp summary

BGP router identifier 192.168.20.1, local AS number 65002

RIB entries 3, using 336 bytes of memory

Peers 1, using 4560 bytes of memory

Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
192.168.20.101  4 65000     251     254        0    0    0 00:12:27        1

Total number of neighbors 1

peer2>

You can even run OVS commands to see the flow table in the dataplane switch:

admin@atrium:~$ sudo ovs-ofctl dump-flows router -O OpenFlow13

Note that the "router" is the OVS switch (dpid:0x1) pretending to be a NoviFlow or Netronome switch due to the use of the "softrouter" pipeline/driver.

6. Feel free to edit the script to create whatever topology you wish to test. Also change the driver configuration you feed into ONOS (in step 2 using onos-topo-cfg) to whichever pipeline you wish to test - you can choose between the 5 that have been certified to work for Atrium.

7. Remember to clean up after you are done: ./router-cleanup.sh

Running Automated Tests

The distribution VM also ships with experimental support for running automated tests for Atrium. Here is the topology we use:

Here are the Test cases we run:

CASE5 - Basic Route advertisement and connectivity in tagged network

Generates 10 routes in each AS (30 routes total across the three ASs). Checks that the ONOS controller receives all the route advertisements. Checks that the ONOS controller pushes the flow rules for all the routes, by sending ICMP packets among the three ASs. Finally, removes all the routes from the three ASs and checks that all routes are removed from the ONOS controller, and that connectivity is removed, using ICMP.

CASE7 - Scale test with 6000 routes

Generates 6000 routes in total, checks the routes in ONOS, and checks connectivity for 30 of them.

CASE8 - Flap a route 20 times with 100 msec interval

Adds and removes a route 20 times at a 100 msec interval, finally adding the route; then checks the route in the ONOS controller and the connectivity using ICMP. Next, adds and removes a route 20 times at a 100 msec interval, finally removing the route; then checks that there is no route in the ONOS controller and that connectivity is removed, using ICMP.

CASE9 - Flap a next-hop 20 times with 100 msec interval

Changes a route 20 times at a 100 msec interval, then checks the route in the ONOS controller and the connectivity using ICMP.

CASE31 - Route convergence due to bgp peering session flapping in tagged network

Changes a route, waits until the route converges, and checks the final route in ONOS and the connectivity using ICMP.

CASE32 - Basic Route advertisement and connectivity in tagged network with Route Server

Repeats CASE5, now using a Route Server.

Here is how to run these Tests:

1. $ cd TestON/tests/PeeringRouterTest

2. Edit PeeringRouterTest.params to select the test you wish to run from the choices above. Test case 1 must always be run, as it sets up the environment. Running all the tests above can currently take an hour :( This will be improved in subsequent releases - which is why the support is experimental :)

3. The tests themselves are in PeeringRouterTest.py, and the configuration files (for Quagga, ONOS, etc.) for each test case are in the "vlan" directory.

4. To run the tests:

$ cd ~/TestON/bin/

$ ./cli.py run PeeringRouterTest

5. Remember to clean up after you are done: ./router-cleanup.sh