Bachelor Informatica

Utilizing p4 switches in container networking

Ries van Zon - 11302836

June 15, 2020
Informatica — Universiteit van Amsterdam

Supervisor(s): MSc Sara Shakeri, dr. Paola Grosso

Abstract

Containers are very popular these days and there are several software tools with which a container can be created. The most popular method to connect containers is an overlay network, which is easy to deploy. The drawback is that an overlay network adds an extra layer of encapsulation to each packet. We try to find an alternative to the overlay solution by creating connectivity between containers using a p4 switch. A TCP connection could be established between containers on the same host. Currently, no TCP connection is possible between two containers on different hosts. We narrowed down where the problem could lie: our findings show that it occurs at L4. L3 and L4 (TCP) connections between Docker containers on the same host are possible. For containers on different hosts (after NAT is applied) L3 connectivity is possible, but the connection at L4 is refused.

Contents

1 Introduction
  1.1 Introduction

2 Theoretical background
  2.1 Process isolation
    2.1.1 Namespaces
  2.2 Containers
    2.2.1 Linux containers
    2.2.2 Docker containers
  2.3 p4 switches
    2.3.1 Definition of a simple p4 program
    2.3.2 Deployment of a p4 program
  2.4 Overlay network

3 Related work
  3.1 Connectivity of docker containers
  3.2 Usage of p4 in SDN

4 Method
  4.1 Establishing connectivity between containers using a p4 switch
    4.1.1 Connecting a Docker container to a p4 switch
  4.2 TCP connection between containers using a p4 switch
    4.2.1 Network topologies
    4.2.2 TCP connection between containers in a single host
    4.2.3 TCP connection between containers in multiple hosts

5 Discussion
  5.1 Multiple containers on the same host
  5.2 Single container, multiple hosts
  5.3 Firewall settings

6 Conclusions
  6.1 Conclusion
  6.2 Future work
    6.2.1 Ethics

Appendices

A Appendix A


CHAPTER 1 Introduction

1.1 Introduction

Containers are very popular these days and there are several software tools, such as Docker, with which a container can be created. A container is another form of virtualization. Unlike virtual machines, which virtualize the hardware and run a separate operating system on top of it, containers are virtualized at OS level and share the host's kernel. Therefore, compared with virtual machines (VMs), container technologies use far fewer resources [8]. Another advantage of containers is that they are quick to set up and easy to deploy [8]. Containers can also run on any distribution, independent of the underlying hardware.

The most popular method to connect containers is an overlay network. A container overlay network is easy to deploy. The drawback is that an overlay network adds an extra layer of encapsulation to each packet, which decreases performance. Since more and more data is sent through networks nowadays, it is important to minimize this overhead where possible. The challenge is to create a network of containers that keeps the benefits of containers but has no, or at least reduced, overhead.

We will try to accomplish this by implementing a p4 switch. A p4 switch is a switch whose data plane we can program. With a p4 switch we have more control over incoming packets and how to forward them, and p4 allows building more intelligent control mechanisms for packets. To implement the p4 switch we have to find a way to change the Docker default settings, making a container more flexible and giving us more control over it. In doing so we have to be careful not to lose the advantages that Docker provides. To answer the research question:

• How can we minimize the overhead in Docker container networks by using modern technologies such as p4 switches?

First, we will focus on establishing connectivity between containers. In chapter 2 (background) and chapter 3 (related work), we will show how a container can be connected to a p4 switch. In chapter 4 (method), we will examine several implementations, with two network settings for a Docker container, to set up a TCP connection between containers. The method chapter focuses on the question:

– How can we create connectivity between containers by utilizing a p4 switch?

In chapter 5 (discussion) we will discuss and evaluate the results, after which we will draw some conclusions and propose some topics for future work in chapter 6 (conclusion).


CHAPTER 2 Theoretical background

2.1 Process isolation

Around 1979 the chroot command was introduced in Unix systems. The chroot command on Unix operating systems changes the apparent root directory for the current running process and its children. When you change root to another directory you cannot access files and commands outside that directory. With the chroot command we could create the impression of an isolated process. Chroot simply modifies pathname lookups for a process and its children, prepending the new root path to any name starting with /. The current directory is not modified, and relative paths can still reach locations outside of the new root. This is seen as the start of process isolation, which has led to the containers of today.
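A minimal illustration of this behavior from a shell (the paths are hypothetical and busybox is assumed to be installed):

# Build a tiny root directory that contains only a statically linked shell.
mkdir -p /tmp/newroot/bin
cp /bin/busybox /tmp/newroot/bin/
ln -s busybox /tmp/newroot/bin/sh

# Start a shell whose apparent root is /tmp/newroot.
sudo chroot /tmp/newroot /bin/sh
# Inside this shell, "ls /" only shows /bin; the host's /etc, /home, etc. are unreachable.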

2.1.1 Namespaces

Namespaces are used to isolate different sets of global resources and to restrict an inner process's access to its own sandbox. There are seven types of namespaces [9], of which three are of special importance for a container:

• Network namespace (netns)
• Control group namespace (cgroup)
• Process ID namespace (pid)

Network namespace

Network namespaces virtualize the network stack. On creation, a network namespace contains only a loopback interface. Each network interface (physical or virtual) is present in exactly one namespace and can be moved between namespaces. A network namespace isolates network devices and ports and has its own private set of IP addresses, routing table, ARP table, and other network-related resources. Destroying a network namespace destroys any virtual interfaces within it and moves any physical interfaces within it back to the initial network namespace.

Initially, all processes share the default network namespace of the init process. By convention, when a named network namespace is created, an object appears at /var/run/netns/name that can be opened, and the namespaces can be listed with the command ip netns. At that point the network namespace does not belong to any process yet. By attaching it to a process we isolate the network resources of that process.
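A minimal sketch of these commands (the namespace, interface and address names are illustrative):

# Create a named network namespace; this creates the object /var/run/netns/demo_ns.
sudo ip netns add demo_ns
sudo ip netns list                         # lists demo_ns

# A new namespace only contains a loopback interface.
sudo ip netns exec demo_ns ip addr

# Create a veth pair and move one end into the namespace.
sudo ip link add veth_host type veth peer name veth_ns
sudo ip link set veth_ns netns demo_ns
sudo ip netns exec demo_ns ip addr add 10.0.0.2/24 dev veth_ns
sudo ip netns exec demo_ns ip link set veth_ns up

# Destroying the namespace also destroys the virtual interfaces inside it.
sudo ip netns del demo_ns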

Cgroups

A control group (cgroup) is a collection of processes that are bound by the same criteria and associated with a set of parameters or limits. These groups can be hierarchical, meaning that each group inherits limits from its parent group. The kernel provides access to multiple controllers (also called subsystems) through the cgroup interface. A cgroup namespace virtualizes the contents of the /proc/self/cgroup file. Processes inside a cgroup namespace are only able to view paths relative to their namespace root.

PID namespace

PID namespaces are nested, meaning that when a new process is created it gets a PID in each namespace from its current namespace up to the initial PID namespace. Hence the initial PID namespace can see all processes, but under different PIDs than those seen by the other namespaces.

2.2 Containers

2.2.1 Linux containers

Linux Containers (LXC) is a virtualization method for running multiple isolated Linux systems (containers) on a host using a single (Linux) kernel. The Linux kernel provides the functionality that allows limitation and prioritization of resources (CPU, memory, block I/O, network, etc.) without the need for starting any virtual machines, as well as namespace isolation functionality that allows complete isolation of an application's view of the operating environment, including process trees, networking, user IDs and mounted file systems. LXC combines the kernel's cgroups and support for isolated namespaces to provide an isolated environment for applications. A container is a process whose resources are isolated by attaching customized namespaces to the namespaces of the process.

Each process has its own idea of its root. By default, the root of a process is set to the system root. We can change the root of a process with the chroot command. When we create a container, we create it from the context of the process from which the command is executed. This means that by default the root of the container is set to the root of the process that created the container.

Today's containers make use of two features of the kernel: cgroups and namespaces. Before cgroups, the resources of every process were managed individually: applications that had more processes running were using more system resources. With cgroups we can create a subgroup and assign a certain amount of system resources to it. All processes belonging to that subgroup can only use the resources that were assigned to the subgroup they belong to. If we list the cgroups (with the command systemd-cgls, for example) we see that for each container a subgroup is created. To manage the resources manually we can go to the directory /sys/fs/cgroup/. In this directory we find the main resources such as cpu, memory and block I/O. Inside one of these directories we find user.slice, which manages the resources for user tasks, and system.slice, which manages the resources for system services. If Docker is installed we will also find a docker directory that manages the resources of Docker containers.
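A brief, hedged sketch of inspecting these cgroups from the host, assuming a cgroup v1 layout and Docker's cgroupfs driver (the container ID is hypothetical):

# Show the cgroup hierarchy; Docker containers appear under a docker subgroup.
systemd-cgls

# The main controllers are mounted under /sys/fs/cgroup (cgroup v1).
ls /sys/fs/cgroup/

# With Docker installed, each container has its own subdirectory per controller.
ls /sys/fs/cgroup/cpu/docker/
cat /sys/fs/cgroup/cpu/docker/containerID/cpu.shares    # CPU weight assigned to that container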

2.2.2 Docker containers

Docker containers look a lot like network namespaces, but when we install Docker a new cgroup is also created. All the containers created by Docker will belong to the docker subgroup/cgroup. Docker sets up some default configuration files, but we can find the ID of a container, go to /sys/fs/cgroup/cpu/docker/containerID, and manage the CPU resources for that container manually. We can do the same for the other system resources. Besides creating a cgroup, Docker also creates a network namespace each time we create a new container. Usually, when a new network namespace is created, its name appears when we list the network namespaces with the command ip netns. The network namespace of a Docker container, however, will not appear in this list. This is because Docker does not place the network namespaces of its containers in the /var/run/netns/ folder but manages them through the container's /proc/<pid>/ns/net file.

Figure 2.1: Pipeline of a p4 switch

By default, all containers have the PID namespace enabled. The PID namespace provides separation of processes: it removes the view of the system processes and allows process IDs to be reused, including PID 1. In certain cases you want your container to share the host's process namespace, basically allowing processes within the container to see all of the processes on the system. For example, you could build a container with debugging tools like strace or gdb, but want to use these tools when debugging processes within the container [5].
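A hedged example of sharing the host's PID namespace with a container; the image and container name are illustrative, and --pid=host is the Docker flag that enables this:

# Run a container that shares the host's PID namespace.
sudo docker run --rm -it --pid=host --name debug_ctr ubuntu:latest bash
# Inside the container, the host's processes are now visible, e.g. under /proc
# or via "ps -ef" if procps is installed in the image.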

2.3 p4 switches

In this section we will explain the basics of a p4 program. A p4 switch is a switch whose data plane we can program with the domain-specific language p4. The key elements of a p4 switch/program are: the parser/deparser, ingress/egress control, match tables, actions and the CLI. These are software elements, therefore for an element to exist it must be defined in code. P4 is the language used to define these elements. The elements are regular code constructs that can be interpreted by the p4 compiler. A programmer is bound to the set of elements that can be interpreted by the p4c-bmv2 compiler, just as in any other programming language. One main difference with other programming languages is that p4 has no repetition (loop) construct. Also, the if statement cannot be used everywhere. P4 does have the select construct, which can be thought of as the equivalent of an if statement.

A p4 switch can be considered as a pipeline (as shown in figure 2.1). The pipeline has three stages in which a packet can be controlled by the switch: the parser, ingress and egress. The parser defines which layers (headers) of a packet must be parsed. Notice that the switch can only access the fields of the headers that have been parsed. Traditionally an L2 device would only look at L2 of a packet; with p4 all layers can be accessed. When the parser is finished the packet is sent to ingress. The ingress control block specifies which match tables will be applied to the packets. A table definition states which packet characteristics it matches on and which action(s) it can apply. When the table sees a packet with this set of characteristics, the action(s) are applied to this packet.


The action is a construct in which we specify what has to happen to a packet once it has met the criteria of a table. Once the switch has applied the action(s), the packet is sent to egress. At egress, tables and their sets of actions can again be applied to packets. Once a packet is finished at egress, the deparser reassembles the packet and sends it out through a specified port. In p4, new header types can be specified as well as metadata objects. Metadata can be used to store information about a packet for as long as the packet is in the pipeline. The standard metadata object is defined by default and can be accessed by the switch without parsing the packet; it contains information such as the port at which the packet arrived. To forward a packet, an action must be specified that sets the egress field of the standard metadata object. Not setting the egress port will lead to packet loss. The whole standard metadata object can be found in Appendix A.1. The values on which a table has to match packets are set through rules added via the CLI. The format of a rule is:

table_add <table name> <action name> <match value(s)> => <action argument(s)>

2.3.1 Definition of a simple p4 program

A p4 program starts with the definitions of the headers we want to parse. After parsing, we have access to each header and its fields.

// Define headers which we can use in our program.
header_type ether_hdr_t {
    fields {
        dst : 48;
        src : 48;
        ethertype : 16;
    }
}
header ether_hdr_t ether_hdr;

header_type ipv4_hdr_t {
    fields {
        version : 4;
        ihl : 4;
        tos : 8;
        total_len : 16;
        id : 16;
        flags : 3;
        offset : 13;
        ttl : 8;
        proto : 8;
        checksum : 16;
        src : 32;
        dst : 32;
        // Note: IP option fields are not parsed
    }
}
header ipv4_hdr_t ipv4_hdr;

header_type ipv6_hdr_t {
    fields {
        version : 4;
        traffic_class : 8;
        flow_label : 20;
        payload_len : 16;
        next_hdr : 8;
        hop_limit : 8;
        src : 128;
        dst : 128;
    }
}
header ipv6_hdr_t ipv6_hdr;

Listing 2.1: Header definitions


The packet enters the pipeline in the parser start block. Here we have two options: we either extract the ether_hdr or we send the packet directly to the ingress control construct. If we don't specify anything, no_op (no operation) is applied, which results in the packet not being forwarded. In this example, we check the type of the next header and parse that header accordingly. After we have parsed a header we can access its fields in the code that follows.

// Here we parse the packet headers. We start with extracting the first header.
// Since we have defined the ether_hdr we can use its fields to perform a
// select in this case.

parser start {
    // After this extract we can access all the fields of the defined ether_hdr.
    extract(ether_hdr);
    return select(ether_hdr.ethertype) {
        0x0800 : ipv4;
        0x86dd : ipv6;
    }
}

parser ipv4 {
    // After this extract we can access all the fields of the defined ipv4_hdr.
    // We can use these fields in the ingress, table, action and egress constructs.
    extract(ipv4_hdr);
    return ingress;
}

parser ipv6 {
    extract(ipv6_hdr);
    return ingress;
}

Listing 2.2: Parser code block, represented by the Parser stage in the pipeline shown in figure 2.1

After a packet has been parsed it is sent to the ingress control block. In the ingress control block we have to define which table(s) we want to apply to a packet and in which order. If a packet does not match a table (a miss), the next table (if there is one) is applied. If none of the tables result in a hit, the packet is dropped by default.

control ingress {
    apply(forwarding_table);
    apply(forwarding_router_table);
}

Listing 2.3: Ingress control block

In the table construct we have to state which data we want to match on and how we want to match it. In the example of Listing 2.4 we match on the destination IP address and it has to be an exact match.

table forwarding_table {
    reads {
        ipv4_hdr.dst : exact;
    }
    actions {
        set_egress;
        drop_packet;
    }
}

Listing 2.4: Match-action table. In the pipeline a packet is matched against this table and (in case of a hit) sent to the buffer (see figure 2.1).

Other types of matches are lpm (longest prefix match) and ternary. The type of matching determines how we have to pass the argument to the CLI: in case of an exact match we pass an exact IP address, and in case of lpm/ternary we pass an IP address and its mask (ip address/mask). The actions defined in the actions construct of the table can be triggered by a match; only the stated actions can be applied.


In this example we have two actions. This means that (in case of a hit) we can only trigger set_egress or drop_packet.

action set_egress(port) {
    // Set the egress port through which the packet has to leave the switch.
    modify_field(standard_metadata.egress_spec, port);
}

action drop_packet() {
    // drop() is a built-in function which drops the packet.
    drop();
}

Listing 2.5: Definition of an action

Through the CLI we pass the arguments for the match tables and actions. For example, if we have a table forwarding_table that matches on the source IP with the action set_egress, in the CLI we would set the arguments like:

table_add forwarding_table set_egress ip_address => egress_port
# If we want to forward all traffic from ip address 192.168.174.106 to port 3:
table_add forwarding_table set_egress 192.168.174.106 => 3

Listing 2.6: Adding table rules through the CLI

In the above example, 3 will be the argument (port) for the set_egress action.


2.3.2 Deployment of a p4 program

The process of deploying a new p4 program is shown in figure 2.2. First, a p4 file must be written. Then, the file has to be compiled, which results in a JSON file that can be run by the switch. Depending on the topology, we have to move the file to the VM(s) where it must be deployed. CLI rules have to be written and the interfaces that have to be bound to the switch have to be found (the code showing how to find a Docker veth is given in Appendix A.2), after which the switch can be set up. The CLI rules (as discussed in the previous section) are implementation dependent. The arguments passed through the rules depend on the L2, L3 and L4 addresses used by the containers. If an address changes, all CLI rules that rely on that address have to be changed: the invalid rules must be deleted manually, after which the new rules can be added (again manually) through the CLI. Tshark/wireshark queries have to be written to analyze whether the implementation behaves as we expect. Once all queries are written the switch can be set up, after which the rules can be added through the CLI. The same steps have to be repeated for every host in the network. The experiments can start once all hosts are up. When the implementation has to change, the deployment process starts again from step one. This deployment graph does not consider the container configuration that is needed to bind a container to the switch.

Figure 2.2: Deployment process of a p4 switch
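A hedged sketch of these steps on one host, assuming the legacy p4c-bmv2 compiler and the bmv2 simple_switch target (file names, host names, interfaces and the example rule are illustrative):

# 1. Compile the P4_14 program into a JSON file that bmv2 can execute.
p4c-bmv2 --json switch2.json switch2.p4

# 2. Copy the JSON file to the VM that will run the switch.
scp switch2.json user@vm1:~/

# 3. On the VM: start the switch and bind the container veth and eth0 to its ports.
sudo simple_switch -i 0@veth_c1 -i 1@eth0 switch2.json &

# 4. Add the implementation-dependent rules through the CLI.
echo "table_add forwarding_table set_egress 192.168.174.106 => 1" | simple_switch_CLI

# 5. Verify the behavior with a capture on one of the bound interfaces.
sudo tshark -i veth_c1 -f "tcp port 5201"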


2.4 Overlay network

An overlay network runs on top of another network to build customized virtual links between nodes. Common forms of overlay networks include IPIP, virtual extensible LAN (VXLAN) and virtual private networks (VPN). There exist many overlay networks for Docker containers. Although they differ in implementation, the key idea is similar. Containers save the mapping between their private IP addresses and their host IP in a key-value (KV) store, which is accessible from all hosts. Containers use private IP addresses in a virtual subnet to communicate with each other. The overlay inserts an additional layer in the host network stack. When a packet is sent by a container, the overlay layer looks up the destination host IP address in the KV store using the private IP address of the destination container in the original packet. It then creates a new packet with the destination host IP address and uses the original packet sent by the container as the new packet's payload. This process is called packet encapsulation. Once the encapsulated packet arrives at the destination host, the host network stack decapsulates the wrapped packet to recover the original packet and delivers it to the destination container using the container's private IP address.
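As an illustration of the encapsulation layer described above, a VXLAN tunnel endpoint can also be created by hand on Linux; the interface names, VXLAN ID and addresses below are illustrative, and Docker's overlay driver performs an equivalent setup automatically:

# Create a VXLAN device that encapsulates traffic towards the remote host 10.0.0.2.
sudo ip link add vxlan0 type vxlan id 42 dev eth0 remote 10.0.0.2 dstport 4789
sudo ip addr add 172.18.0.1/24 dev vxlan0      # private overlay subnet
sudo ip link set vxlan0 up
# Packets sent to 172.18.0.0/24 now leave eth0 wrapped in an extra UDP/VXLAN header.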

CHAPTER 3 Related work

3.1 Connectivity of docker containers

In the research of [12] we see that we can set the network driver of a container. The driver determines the visibility and connectivity of a container. In the Docker documentation three options are listed: host, bridge and none. The host option means that a container shares the host's networking namespace and does not get its own IP address allocated [5]. A container created in bridge mode uses a bridge network. The bridge network uses a software bridge that allows containers connected to the same bridge network to communicate, while providing isolation from containers that are not connected to that bridge network [5]. Containers created with the none mode have no access to the external network or to other containers. This mode does not configure any IP for the container; it only contains the loopback address.

Another way to configure the connectivity of containers is to expose ports and add rules to the firewall. From [5] we can read that Docker adds three chains to the firewall. The documentation recommends only altering the DOCKER-USER chain in case there is a need to filter, change or redirect packets. To expose ports, a container can be started with the -p flag. This gives the user the opportunity to specify the port of the container which must be exposed to the host. For example, docker run -p 127.0.0.1:80:8080/tcp ubuntu bash will expose port 8080 of the container on port 80 of the host address 127.0.0.1 and accepts TCP traffic [5]. Ports can also be exposed by changing the Dockerfile.

Docker also provides an option to manually add connectivity by using the ip netns command. Ip netns finds the container by using namespace pseudo-files. The namespaces of a container are materialized under /proc/$pid/ns; for example, the network namespace of a container can be found at /proc/$pid/ns/net. When we run ip netns exec containerID <command>, it expects /var/run/netns/containerID to be one of those pseudo-files [5]. To be able to use the ip netns command we therefore have to find the PID of the container, which we can do with sudo docker inspect -f '{{.State.Pid}}' containerID. Then we have to link the pseudo-file found at /proc/$pid/ns/net to /var/run/netns/containerID [5]. From then on we can treat the network namespace of container containerID like a Linux network namespace created with the ip netns command.

The Docker overlay network causes extra overhead because the packets get an extra layer of encapsulation in the form of a VXLAN header. A script was written so that the above steps do not have to be repeated by hand each time a new container is created; it can be found in Appendix A.3. The changes to the containers are not persistent: if the host is restarted the containers still exist but have to be configured again, which is again done by the script in Appendix A.3.


3.2 Usage of p4 in SDN

P4 is a relatively new language and not much research has been done in this area in general. In particular, the performance of p4 has not been widely examined by the scientific community. Currently, p4 is mostly used to replace or improve SDN functionality implemented through predecessors of p4 such as OpenFlow/Open vSwitch, or to create functionality that was not possible with earlier techniques [3][11][6]. P4 was used to program the data plane of an SDN switch to achieve multiple routing configurations: in case a node in the network goes down, the switches in the network are updated and another routing configuration is deployed by the switches themselves [6]. The switches were incorporated in a mininet network. Although the paper does not state which type of switch (i.e. bmv2 or Tofino switch) was used, this does show that p4 can be used to program an SDN switch and forward internet traffic. On the GitHub repository of the p4-bmv2 development team [2] a bmv2 switch was implemented in a mininet network. In both studies [2][6] NAT was applied only at L2. Nodes in mininet are processes with their own network namespace, which isolates the network stack of the process to create the illusion of a separate node. In this research, the same kind of network namespaces are used to isolate the network stack of a container.

P4 gives the user a powerful tool to control packets, which must not be abused. The number of field writes and the number of tables in a p4 program increase the latency [4]. The number of tables can increase latency significantly, especially when the p4 program uses the bmv2 model [4]. While designing our program we should be aware of the negative impact header alterations, field writes and the number of tables have on the latency of the network. The performance of an SDN switch can also be improved by increasing the TCP window [1]. In this study, the private container addresses are translated into the public addresses of the container's host, which involves NAT at L4. Therefore, to find an alternative for the Docker overlay solution it is important to examine whether a connection can be made between Docker containers when NAT at L4 is applied. None of the studies we found combine a p4 switch with Docker containers, nor do they apply NAT at L4.

CHAPTER 4 Method

4.1 Establishing connectivity between containers using a p4 switch

In our experiments we used two virtual machines functioning as container hosts. Each runs Ubuntu 18.04 with Linux kernel 4.15.0, and for running containers we used Docker Community Edition 18.09. We used the p4 behavioral model (bmv2) for running p4 switches; p4 programs were compiled with the p4c-bmv2 compiler. In section 4.2 we present the files that were used in the experiments. We run a p4 switch in each host, connect the containers of each host to the p4 switch on the same host, and then check the connectivity between them.

4.1.1 Connecting a Docker container to a p4 switch

By default, a Docker container is connected to the docker0 bridge. To be able to connect a container to the switch instead of docker0, the container should be created with the --network flag set to none:

sudo docker run --name container_name --network=none ubuntu:latest /bin/sleep 1000000 &

Listing 4.1: Create a container without a network driver

The above command tells Docker to create a container but not to connect it to the bridge. In the next step, we create a veth pair and place one end in the network namespace of the container. The other end of the veth pair will be connected to the p4 switch when we bring the switch up. To add a veth to the container's network namespace, the container has to be visible to the ip netns command. Therefore, we have to add the network namespace of the container to the list of network namespaces of the host. This can be done by executing the commands of Listing 4.2:

pid=$(sudo docker inspect -f '{{.State.Pid}}' container_name)
sudo mkdir -p /var/run/netns/
sudo ln -sfT /proc/${pid}/ns/net /var/run/netns/container_name

Listing 4.2: Make a container visible in the ip netns directory.

From this point on, veths can be placed in the network namespace of the container, and any IP address can be assigned to the veth end that is placed inside the container's namespace. The other end of the veth pair can be bound to the switch with the flag

-i <port number>@<interface name>

Listing 4.3: Bind an interface to a port of the switch
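A minimal sketch of the veth steps described above, assuming the container has already been made visible to ip netns as in Listing 4.2 (interface names, the address and the port number are illustrative):

# Create a veth pair on the host.
sudo ip link add veth_c1 type veth peer name veth_sw1

# Move one end into the container's network namespace and configure it.
sudo ip link set veth_c1 netns container_name
sudo ip netns exec container_name ip addr add 10.0.1.2/24 dev veth_c1
sudo ip netns exec container_name ip link set veth_c1 up
sudo ip netns exec container_name ip link set lo up

# The other end stays in the host and is bound to the switch when it is started,
# e.g. sudo simple_switch -i 0@veth_sw1 switch1.json
sudo ip link set veth_sw1 up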


Now, it’s possible to connect a container to the switch. The next step is to set up a network and to establish L3 and L4 connectivity. This will be discussed in the next sections.

4.2 TCP connection between containers using a p4 switch

The two main types of internet traffic are traffic with a connection established before sending data (TCP) and traffic without such a connection (UDP). We focus on establishing a TCP connection because most traffic is TCP traffic. Since the main focus of this research is establishing a TCP connection, UDP will not be discussed further.

Three different p4 files were used during this research to establish a TCP connection between containers. From now on we will refer to these files as implementations. Below we briefly discuss their use and characteristics, supported by code snippets. The full implementations can be found in Appendix A.3.

• switch1.p4 makes a forwarding decision based on the ingress port. This implementation does not:
  – extract the headers
  – apply NAT
  – update the checksums

switch1.p4 contains one table, which matches on the ingress port. Based on the results of this implementation the problem can be narrowed down: no connectivity at L3 nor L4 implies that there might be a problem with the switch itself or that the problem is related to the container; connectivity at only L3 implies that the problem is L4 related. The switch1.p4 implementation relies on a network which has two containers on the same host, as shown in section 4.2.1.

• switch2.p4 parses each packet up to L4. NAT is applied at L2, L3 and L4 and the checksums at L3 and L4 are updated. To be able to make a forwarding decision we need four tables: local_v4_forwarding_table, map_ip_to_port, check_dst_port and check_src_port.

control ingress {
    if (valid(ipv4_hdr)) {
        apply(local_v4_forwarding_table);
    }
    if (valid(TCP_hdr)) {
        apply(map_ip_to_port);
        apply(check_dst_port);
        apply(check_src_port);
    }
}

Listing 4.4: Ingress

The table local_v4_forwarding_table only forwards packets to containers at the same host. If local_v4_forwarding_table results in a hit, no NAT has to be applied and the packet can be forwarded to the given egress port. The map_ip_to_port table matches and forwards packets that are destined for the containers on the remote host. The remaining two tables match and forward incoming packets from the remote host. When a packet matches one of the last three tables, NAT is applied to the packet: after a hit with one of the last three tables shown in Listing 4.4, the action nat (as shown in Listing 4.5) translates the current addresses into the addresses given through the CLI (see Listing 4.10):

action nat(port, mac_src, mac_dst, host_ip, dst_ip) {
    modify_field(ether_hdr.src, mac_src);
    modify_field(ether_hdr.dst, mac_dst);
    modify_field(ipv4_hdr.src, host_ip);
    modify_field(ipv4_hdr.dst, dst_ip);
    modify_field(standard_metadata.egress_spec, port);
}

Listing 4.5: NAT and forward


When the L3 or L4 addresses of a packet change, the checksum(s) of that packet become invalid. When the L3 checksum is invalid, the packet will be discarded by the underlay network. When the L4 checksum is invalid, the packet will be considered invalid by the L4 device receiving it and the packet is lost. To avoid packet loss, the checksum of each packet is updated. The checksums of a packet are updated in the deparser [10]. This is done by the switch itself; we only have to declare the update, as is shown in Listings 4.6 and 4.7:

field_list ipv4_checksum_list {
    ipv4_hdr.version;
    ipv4_hdr.ihl;
    ipv4_hdr.tos;
    ipv4_hdr.total_len;
    ipv4_hdr.id;
    ipv4_hdr.flags;
    ipv4_hdr.offset;
    ipv4_hdr.ttl;
    ipv4_hdr.proto;
    ipv4_hdr.src;
    ipv4_hdr.dst;
}

field_list_calculation ipv4_checksum {
    input {
        ipv4_checksum_list;
    }
    algorithm : csum16;
    output_width : 16;
}

calculated_field ipv4_hdr.checksum {
    update ipv4_checksum;
}

Listing 4.6: Update of the IPv4 checksum

For the L4 header the checksum is computed by:

field_list TCP_checksum_list {
    ipv4_hdr.src;
    ipv4_hdr.dst;
    8'0;
    ipv4_hdr.proto;
    meta.TCPLength;
    TCP_hdr.src;
    TCP_hdr.dst;
    TCP_hdr.seq;
    TCP_hdr.ack;
    TCP_hdr.offset;
    TCP_hdr.resrv;
    TCP_hdr.flags;
    TCP_hdr.window;
    TCP_hdr.urgent;
    payload;
}

field_list_calculation TCP_checksum {
    input {
        TCP_checksum_list;
    }
    algorithm : csum16;
    output_width : 16;
}

calculated_field TCP_hdr.checksum {
    update TCP_checksum if (valid(TCP_hdr));
}

Listing 4.7: Update of the TCP checksum

With the switch2.p4 implementation we can use the destination port as an identifier for a connection between two containers on different hosts. We need the last two tables because, in general, we don't know which host will act as a server and which host will act as a client. Thus, if we set the destination port to 5201 the server has to check the destination port; when the packet is returned by the server, the client will find 5201 as the source port of the packet.

• switch3.p4 is a simplified version of switch2.p4. The if statements are removed, as well as the local_v4_forwarding_table. The map_ip_to_port table forwards all traffic to eth0 of the VM. The NAT process is still the same as in switch2.p4, as is the computation and updating of the checksums. We use this implementation to check whether a connection at L3 level is possible between containers on two different hosts. To avoid too many alterations (which can lead to mistakes) all table and action names are kept the same. This implementation is suited for the two hosts, single container topology.

We separate TCP traffic into two categories. The first category is TCP traffic within the same host; the second is TCP traffic destined for remote containers. We differentiate these two types of TCP traffic because the second category involves NAT, as shown in Listing 4.5, whereas traffic belonging to the first category is only forwarded by the switch.


4.2.1 Network topologies

This subsection describes the topologies used during the experiments. Different topologies were used to test for connectivity in a local environment (multiple containers in one host) and a remote environment (a single container on each of two hosts). The different topologies are used to examine whether there is a difference in connectivity between the containers. If there is a difference in connectivity, the different implementations (as discussed in section 4.2) are used to analyze what could be the cause of the problem.

Topology 1 is the simplest. There are two containers connected to one switch. The switch is not connected to the interface of the VM (in this case the interface of the host). This is done for simplicity and to minimize mistakes, since we have written and applied the rules manually. In this topology we only have to deal with local traffic; as discussed in the previous section, we don't have to apply NAT to the packets.

Figure 4.1: Single node setup, with multiple containers in host


Topology 2 is an extension of the first topology. There are two hosts (VMs). On each host there is a container and a switch. The container is connected to the switch, and the switch is connected to the interface of the VM. The VMs are connected by the underlay.

Figure 4.2: Multi node setup, with one container per host

There are two options to switch between topologies. The first is to change the implementation. An implementation can be written to support one specific topology, e.g. switch1.p4: since switch1.p4 can only look at the ingress port, there can only exist one path between two containers, and since it does not apply NAT it can only support the topology shown in figure 4.1. The other method to switch between topologies is to write an implementation like switch2.p4 and connect two containers by adding new rules. With the second method the topology can be changed while the switch is up. The disadvantage is that the rules that have to be added are more complex, and the second method increases the number of tables. As discussed in chapter 3, an increase in tables affects the performance of the switch.


4.2.2 TCP connection between containers in a single host

• Setup 1: In this setup we use the switch1.p4 implementation and create Docker containers with the network driver set to none. We used the two containers, single host topology.

table_add forwarding_table set_egress 0 => 1
table_add forwarding_table set_egress 1 => 0

Listing 4.8: Matching rules for the switch1.p4 implementation

Figure 4.3: Results switch1.p4 implementation with –network=none for two containers single host

The results show that there is connectivity at L3 but no connectivity at L4.

• Setup 2: We use the switch1.p4 implementation and create the containers with the default network driver, so both containers are connected to docker0. We then pinged the containers, after which we started an iperf3 session from within the containers. At L3 there was connectivity, but at L4 there was no connection. The CLI rules were the same as in setup 1.
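A hedged sketch of the connectivity tests used in these setups; the container names and address are illustrative, and ping and iperf3 are assumed to be installed inside the containers:

# L3 test: ping the other container from inside container c1.
sudo docker exec c1 ping -c 3 10.0.1.3

# L4 test: start an iperf3 server in c2 and connect to it from c1 (default TCP port 5201).
sudo docker exec -d c2 iperf3 -s
sudo docker exec c1 iperf3 -c 10.0.1.3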


Figure 4.4: Results switch1.p4 implementation with –network=none for two containers single host

• Setup 3: For this setup we used the switch3.p4 implementation with the two containers, single host topology, and created the containers with the --network=none flag.

Figure 4.5: Results setup 3


From figure 4.5 we can see that there is connectivity between the containers at L3 and L4.

• Setup 4: We use the same implementation, but we create new containers with the default network option, so the containers are connected to the docker0 bridge. We remove the interface from docker0 and find its corresponding container (with find veth.sh, shown in Appendix A.2). The veths are connected to the switch, after which we run the commands to ping the other container and set up an iperf3 session. The results show that there is connectivity at L3 and L4:

Figure 4.6: Results setup 4


Figure 4.7: Packets inside server container.

Figure 4.8: Packet flow of a SYN packet before and after the switch.

The table below summarizes all our setups and their results.

setup  implementation  nat  checksum  --network  L3 connection  L4 connection
1      switch1.p4      no   no        none       yes            no
2      switch1.p4      no   no        default    yes            no
3      switch2.p4      yes  yes       none       yes            yes
4      switch3.p4      yes  yes       default    yes            yes

Table 4.1: Results for connectivity between containers on a single host, using the topology from figure 4.1


4.2.3 TCP connection between containers in multiple hosts

Figure 4.9 shows the topology that was used to set up a TCP connection between containers on multiple hosts. VMs were used to set up the container network. The hosts of the VMs are considered to be part of the underlay network; from the VM there is no or only limited access to the underlay. In figure 4.9 the underlay is represented by the black line which connects the two eth0 interfaces of the VMs. When a packet leaves a container, the addresses in the packet are private network addresses. The packet will be forwarded (in this scenario) to the public network. Thus, when the packet arrives at the first hop of the public network, this hop does not know the private network addresses in the packet and will not be able to forward it; the result is that the packet is dropped. Therefore, the addresses of a packet have to be translated (NAT) into the public addresses, as well as the address of the next hop. With the public addresses the packet can traverse the public network and reach its destination. At the remote host the public addresses have to be translated back into the private network addresses.

The destination port (dport) of a packet can be set by a p4 switch. Once the dport is set, the underlay network will not alter it anymore. Therefore the dport can be used as the identifier for a connection between two containers. The format of the rules used during this research is shown in Listing 4.9. Adding the rules from Listing 4.9 creates a mapping from the specified dport to the source (src) and destination (dst):

table_add map_ip_to_port set_test_TCP2 <match value(s)> => <src mac> <further action arguments>

table_add check_dst_port set_mac <match value(s)> => <action arguments>

table_add check_src_port set_mac <match value(s)> => <action arguments>

Listing 4.9: CLI rules for the switch3.p4 implementation

The last two rules of Listing 4.9 are almost the same. To trigger the nat action at the server, the first and second rule have to be added to the CLI; on the client, rules one and three have to be added to trigger the nat action (as shown in Listing 4.5). Iperf3 uses port 5201 by default. In the case of the single container, multiple hosts network topology there is no need to use the destination port as an identifier for a connection between two containers.


When there are multiple containers on a single host the rule from Listing 4.9 has to be extended as shown in Listing 4.10:

# Rule for the client
table_add map_ip_to_port set_test_TCP2 <match value(s)> => <action arguments> <egress port>

# Rule for the server
table_add map_ip_to_port set_test_TCP2 <match value(s)> => <action arguments> <egress port>

Listing 4.10: CLI rules for the switch3.p4 implementation

For now, the emphasis is on establishing a TCP connection in the single container, multiple hosts network as discussed in section 4.2.1. All the setups described below use the same topology.

• Setup 1: uses the switch2.p4 implementation and the --network=none flag to create the containers. Because of the implementation, it is only possible to test for L4 connectivity using the topology of figure 4.9. The iperf3 session timed out; thus, there is no connectivity at L3 (due to the implementation) nor at L4.

• Setup 2: uses the same setup as setup 1, only the containers were created with the default network driver. The iperf3 session timed out; thus, there is no connectivity at L3 (see setup 1) nor at L4.

• Setup 3: uses the switch3.p4 implementation. The containers were created with the network driver set to none. First, we checked whether we could ping a remote container. In this setup it is possible to ping the remote container from the client and to ping the client from the server. A connection at L3 level is therefore possible.

Figure 4.9: Results setup 3


• Setup 4: uses the same setup as setup 3, but the containers were created with the default network driver. The results were the same as with setup 3.

Figure 4.10: Results setup 4

The summarized results:

setup  implementation  nat  checksum  --network  L3 connection  L4 connection
1      switch2.p4      yes  yes       none       no             no
2      switch2.p4      yes  yes       default    no             no
3      switch3.p4      yes  yes       none       yes            no
4      switch3.p4      yes  yes       default    yes            no

Table 4.2: Results for connectivity between containers on multiple hosts. The topology that was used is shown in figure 4.9.

CHAPTER 5 Discussion

5.1 Multiple containers on the same host

The results show that, even for traffic without NAT, it is important to update the checksum to get connectivity at L4. The switch2.p4 implementation has connectivity at L3 and L4, and combining the default network driver with the switch2.p4 implementation also results in connectivity at L3 and L4. The switch1.p4 implementation only has connectivity at L3. The main difference between switch1.p4 and switch2.p4 is that switch2.p4 updates the TCP checksum and switch1.p4 does not. This implies that, even though there is no NAT involved, the checksum at L4 must be updated. We did not expect that updating the (TCP) checksum would be necessary to establish L4 connectivity between local containers, because the addresses are not translated. During testing we noticed that the switch considered the checksum of the TCP header invalid at ingress, as is shown in figure 5.1.

Figure 5.1: Two containers single host

At this point the packet headers have only been parsed; they have not been changed. The checksum of the IP header is always considered valid at ingress. If the TCP checksum of a packet is compared before and after it has been processed by the switch, we notice that the checksum has changed, as is shown in the packet trace in figure 5.2.

Figure 5.2: Packet before and after traversing the switch. The checksum is marked by the red rectangle.
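A hedged sketch of how such a before/after comparison can be made with tshark; the interface names are illustrative, and TCP checksum validation has to be enabled explicitly because it is off by default:

# Capture on the veth of the sending container and validate TCP checksums.
sudo tshark -i veth_c1 -o tcp.check_checksum:TRUE -Y "tcp.port == 5201" -V | grep -A 1 "Checksum:"

# Repeat on the interface behind the switch and compare the reported values.
sudo tshark -i veth_sw2 -o tcp.check_checksum:TRUE -Y "tcp.port == 5201" -V | grep -A 1 "Checksum:"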


Figure 5.3: Two containers single host

Figure 5.2 confirms the log of figure 5.1: the checksum is not correct when the packet arrives at the switch. This implies that the packet is sent by the container with an invalid checksum, which is by itself unexpected. The IP checksum does not change during the packet's traversal through the switch. Although we currently have no explanation for this behavior, it could explain why the checksum does not have to be updated to create L3 connectivity (switch1.p4 has L3 connectivity) but must be updated before we can create connectivity at L4. In the deparser the checksum is updated, as is shown in figure 5.3. Setups 3 and 4 both achieve connectivity at L4, which suggests that the checksum at L4 is correctly updated. We know from the logs that the switch (even though we did not expect it) updates the checksums. If the packets leaving the switch had an invalid TCP checksum, the container would have refused the connection. The connection at L4 was established; therefore we assume that the checksum at L4 was computed correctly.

5.2 Single container, multiple hosts

Setup 3 from table 4.2 shows that there is connectivity at L3. The implementation used in setup 3 applies NAT to packets. Therefore, we can assume that NAT is not the cause of the failing connection. NAT only changes parameters at L2 and L3; thus, if there were a problem in translating the addresses, it would have exposed itself in the form of a failed L3 connection. Therefore, we assume that the problem lies in layer 4. The switch touches the TCP layer at one point, which is the checksum. Because the addresses at L3 are changed, the checksum of the TCP layer also has to change to avoid an invalid checksum. To verify whether the TCP checksum is updated correctly, the checksum was examined when the packet left the client container and when the packet arrived at the server container on the remote host. The packet's TCP checksum had been changed. The previous section showed that the checksum may change even when all the parameters in the headers stay the same, while there still is L4 connectivity. Thus, a changing TCP checksum does not have to be a symptom of a failed connection. Currently it is unknown why the connection at L4 fails.

With setups 1 and 2 of table 4.2 there was no connectivity between the containers. Concerning L3, this was caused by the implementation of switch2.p4: the packets were never matched by the tables because the tables were applied at L4. The result was that no-op was applied to all packets, after which the packets were lost. If we applied the same tables at L3 level we expect there would be connectivity at L3; due to a lack of time we were not able to verify this. None of the above results explains why the switch updates the checksums and why the checksum is considered invalid at ingress. The results also do not explain why the switch updates the TCP checksum correctly in a local network but computes an invalid checksum in a single container, multiple hosts network. In both situations the checksum is updated in the same way; the only difference is that in the second scenario the addresses have been translated. At the beginning of this section we assumed that NAT could not be the cause of the problem. Follow-up research should be done to clarify this behavior.

5.3 Firewall settings

We also examined whether firewall settings could be the cause of the refused connection. The ufw status of the containers was inactive, all the default chain policies were set to ACCEPT, and the tables and chains did not contain any rules. Therefore it is highly unlikely that the firewall settings of a container were the cause of the refused connection. Docker adds some firewall rules to the firewall of the host (VM) during its installation. These firewall rules are applied to the containers that belong to the subnet of the docker0 bridge. Setups 3 and 4 of the multiple containers, single host topology both show connectivity at L3 and L4, and in setup 3 the containers do not belong to the subnet of the docker0 bridge. This suggests that a container does not have to be in the same network as the docker0 bridge. The firewall rules which Docker installs during its setup only apply to the containers within the docker0 subnet. It therefore seems unlikely that the firewall of the host is the cause of the refused connection between containers on multiple hosts.
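A hedged sketch of the checks described above (run inside a container and on the host respectively; ufw and iptables are assumed to be installed):

# Inside a container: firewall status and default chain policies.
sudo ufw status
sudo iptables -L -n -v          # chains should report policy ACCEPT and contain no rules

# On the host: the chains that Docker adds during its installation.
sudo iptables -L DOCKER -n -v
sudo iptables -L DOCKER-USER -n -v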


CHAPTER 6 Conclusions

6.1 Conclusion

The amount of data sent through networks is still increasing. At the same time, the speedup of hardware resources is limited by the power wall. New methods must be explored to find a more efficient way of using resources and to limit overhead. Containers use fewer resources than VMs [14], and setting up a network of containers with a p4 switch as the forwarding mechanism offers the possibility to reduce the overhead that is caused by the Docker overlay solution. Combining Docker containers and p4 switches could result in the advantages of an overlay network without its overhead. This research focused on connecting containers to a switch and establishing a TCP connection between containers in two different scenarios. In the first scenario, the containers resided on the same host and were connected to the same switch. In the second scenario, each container was connected to a p4 switch that was connected to the underlay, and the underlay connected the hosts.

The results show that packets can be sent by one container and arrive at the other container while the containers are connected by a p4 switch. From the results we can also conclude that there is connectivity at L3 and L4 between containers that are connected to the same p4 switch and share the same host. For containers connected by p4 switches on multiple hosts, we must conclude that there is connectivity at L3 but not at L4. With these conclusions, the question how can we create connectivity between containers by utilizing a p4 switch? is only partially answered. From the results we can also conclude that the problem lies in the updating of the L4 checksum. Another conclusion we can draw is that there is no difference in connectivity between the two container configurations (default and none), which gives more flexibility to adjust a container to one's own needs. The research question how can we minimize the overhead in Docker container networks by using modern technologies such as p4 switches? cannot yet be answered: no TCP connection is possible in a multiple nodes, one container per host network, and therefore a comparison between the two networks could not be made. The problem has been narrowed down to an invalid checksum update. To be able to answer the research question, the checksum must be updated correctly.

6.2 Future work

Several options qualify for follow-up research. First, future research should focus on why the TCP connection gets refused by the server container. Once a TCP connection is possible, the performance of a p4 switch network can be compared to the performance of different overlay networks.

Second, to make the research more organized, follow-up research could focus on automating the deployment of an implementation file. As shown in figure 2.2, there are many steps between creating the p4 file and deploying it to the switch. All the steps have to be executed manually, which is cumbersome and leads to many easily avoidable mistakes. Debugging the p4 switch is cumbersome because of the lack of breakpoints and print statements. Having to manually collect the arguments to create and execute the CLI rules on top of that makes it almost impossible not to make mistakes, especially when the network becomes more complex. To be able to scale up the number of nodes and the number of containers, the deployment process should be automated. The attention of the programmer is currently divided between writing a program and creating the CLI rules. This division is not desirable, can have a severe negative impact on the development time and, above all, could be avoided. When the implementation is written and the containers are known, the CLI rules are fixed. By parsing the p4 file, the tables, actions and the number of arguments are known; parsing the containers fills in the values of the arguments of a rule. How to parse the p4 file and combine the results with the arguments obtained by parsing the containers could be explored in follow-up research.
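As a rough starting point, table and action names could be pulled out of a P4_14 file with standard text tools to pre-generate CLI rule templates; the file name is illustrative and the patterns assume the declaration style used in the listings of chapter 2:

# List the table names declared in the program.
grep -oE 'table +[A-Za-z0-9_]+' switch2.p4 | awk '{print $2}' | sort -u

# List the action names together with their parameter lists.
grep -oE 'action +[A-Za-z0-9_]+\([^)]*\)' switch2.p4 | sort -u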

Third, although there is currently no L4 connectivity between two containers on different hosts, there is L4 connectivity between containers on the same host/VM. Follow-up research could compare the performance of such a local network with the Docker overlay network to get an impression of how a p4 switch influences the throughput. What should be taken into account is the performance of the simple switch: the performance of the bmv2 model is significantly lower than the performance of a production-grade Open vSwitch [13]. Also, to get the best performance the switch should be built with the following flags: ./configure 'CXXFLAGS=-g -O3' 'CFLAGS=-g -O3' --disable-logging-macros --disable-elogger. There is no difference in connectivity between containers created with the default network driver and containers created with the network driver set to none; after configuring both containers we achieved the same connectivity. This gives the user more tools to adjust a container to its needs. For example, in the research of [15] macvlan outperforms Linux bridges. A next step could be to set up a macvlan interface inside the container, connect it to the switch, and examine whether this improves the performance in a local network.

6.2.1 Ethics

SDN companies that maintain the network will get better means to control it. P4 switches give these companies finer control over the data sent through their network. In the past, it was the chip manufacturers who co-decided how packets were forwarded. They are bound to protocols, and deploying new protocols takes a lot of resources and time; these manufacturers also had no direct interest in the traffic itself, since their profit comes from selling hardware. Network companies, by contrast, do have a direct interest in the data, since it embodies their profit. Currently, a network company has limited means to distinguish traffic, but with p4 evolving to a more mature state it will become less expensive and less time-consuming to gain such control. This puts network neutrality under increasing pressure. A deterministic view of technology holds that if the technique exists, it will be used. Thus, since SDN and p4 switches provide the means to discriminate traffic, from this point of view it is only a matter of time before network companies are allowed to discriminate traffic and bill consumers accordingly.

The research of [7] discusses science with respect to ethics. It is difficult to place the technologies used in this research into one specific area, as this depends on the application they are used in. We can categorize our research using figure 6.1. All data used during this research was neutral, since we generated it ourselves. The research does not cross the line of human values; therefore, I would place this research in the safe area.


Figure 6.1: The different areas of science with respect to ethics


Bibliography

[1] Idris Zoher Bholebawa, Rakesh Kumar Jha, and Upena D Dalal. “Performance analysis of proposed OpenFlow-based network architecture using mininet”. In: Wireless Personal Communications 86.2 (2016), pp. 943–958.
[2] GitHub P4 BMV2 Dev Team. Incorporation of bmv2 switch in mininet. 2019. url: https://github.com/p4lang/behavioral-model/tree/master/mininet (accessed: 01.04.2020).
[3] Yan-Wei Chen et al. “P4-Enabled Bandwidth Management”. In: 2019 20th Asia-Pacific Network Operations and Management Symposium (APNOMS). IEEE. 2019, pp. 1–5.
[4] Huynh Tu Dang et al. “Whippersnapper: A p4 language benchmark suite”. In: Proceedings of the Symposium on SDN Research. 2017, pp. 95–101.
[5] Docker. Docker containers. url: https://docs.docker.com/engine/reference/run/ (accessed: 01.05.2020).
[6] Kouji Hirata and Takuji Tachibana. “Implementation of multiple routing configurations on software-defined networks with P4”. In: 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC). IEEE. 2019, pp. 13–16.
[7] Elektra Liangoridi, Anthony van Inge, and Mary Beth Key. “Ethics in Science: Creating a Conscious Ethical Ideology”. In: (2014).
[8] Sumit Maheshwari et al. “Comparative Study of Virtual Machines and Containers for DevOps Developers”. In: arXiv preprint arXiv:1808.08192 (2018).
[9] Manual. Linux Programmer’s Manual. url: https://www.man7.org/linux/man-pages/man7/namespaces.7.html (accessed: 01.04.2020).
[10] P4.org. P4: Language specification. url: https://p4.org/p4-spec/p4-14/v1.0.5/tex/p4.pdf (accessed: 01.04.2020).
[11] F Paolucci et al. “P4 Edge node enabling stateful traffic engineering and cyber security”. In: Journal of Optical Communications and Networking 11.1 (2019), A84–A95.
[12] Kun Suo et al. “An analysis and empirical study of container networks”. In: IEEE INFOCOM 2018 - IEEE Conference on Computer Communications. IEEE. 2018, pp. 189–197.
[13] P4 dev team. bmv2 model GitHub. url: https://github.com/p4lang/behavioral-model/blob/master/docs/performance.md (accessed: 01.04.2020).
[14] Qi Zhang et al. “A comparative study of containers and virtual machines in big data environment”. In: 2018 IEEE 11th International Conference on Cloud Computing (CLOUD). IEEE. 2018, pp. 178–185.
[15] Yang Zhao et al. “Performance of container networking technologies”. In: Proceedings of the Workshop on Hot Topics in Container Networking and Networked Systems. 2017, pp. 1–6.


Appendices


APPENDIX A Appendix A

A.1 P4 language information

Figure A.1: Parameters of standard_metadata


A.2 Support programs

# Find the host-side veth that is paired with eth0 of a container and detach
# it from the Docker bridge.
read conName

# The iflink of the container's eth0 identifies its host-side peer interface.
iflIndex=$(sudo ip netns exec $conName cat /sys/class/net/eth0/iflink)
x="$(($iflIndex-1))"
veths=$(ifconfig | grep veth | awk '{print $1}')

for veth in $veths
do
    iflnk=$(cat /sys/class/net/$veth/iflink)

    if [ $x -eq $iflnk ]
    then
        echo "veth connected to eth0 of $conName is:" $veth
        break
    fi
done

# Detach the veth from the bridge (bridge name assumed: docker0, Docker's default bridge).
sudo brctl delif docker0 $veth

Listing A.1: find veth.sh

#!/usr/bin/env bash

echo "Enter (even) container nr:"
read conNr
conName=container$conNr

# Create a veth pair for the container and one for the bridge/switch side.
sudo ip link add vethA-con$conNr type veth peer name vethB-con$conNr
sudo ip link add vethA-br0 type veth peer name vethB-br0
sudo ip link set up vethA-con$conNr
sudo ip link set up vethB-con$conNr

# Start a container without any Docker-managed network.
sudo docker run --name $conName --network=none ubuntu:latest /bin/sleep 1000000 &
sleep 5

sudo docker exec -ti $conName cat /proc/net/route
sudo nsenter -t `sudo docker inspect -f '{{.State.Pid}}' $conName` -n ifconfig

# Expose the container's network namespace under /var/run/netns.
c4pid=$(sudo docker inspect -f '{{.State.Pid}}' $conName)
sudo mkdir -p /var/run/netns/
sudo ln -sfT /proc/${c4pid}/ns/net /var/run/netns/$conName

# Move one end of the veth pair into the container and configure it.
sudo ip link set vethB-con$conNr netns $conName

sudo ip netns exec $conName ip link set up vethB-con$conNr
sudo ip netns exec $conName ip addr add 172.15.0.$conNr/24 dev vethB-con$conNr
sudo ip netns exec $conName ip route add default via 172.15.0.$conNr dev vethB-con$conNr

Listing A.2: create configure container.sh

#!/usr/bin/env bash

echo "insert container name"
read name

sudo docker exec -ti $name cat /proc/net/route
sudo nsenter -t `sudo docker inspect -f '{{.State.Pid}}' $name` -n ifconfig

# Expose the container's network namespace under /var/run/netns.
c4pid=$(sudo docker inspect -f '{{.State.Pid}}' $name)
sudo mkdir -p /var/run/netns/
sudo ln -sfT /proc/${c4pid}/ns/net /var/run/netns/$name

Listing A.3: configure container.sh


A.3 Implementations

parser start {
    return ingress;
}

control ingress {
    apply(forwarding_table);
}

table forwarding_table {
    reads {
        standard_metadata.ingress_port : exact;
    }
    actions {
        set_egress;
    }
}

action set_egress(port) {
    modify_field(standard_metadata.egress_spec, port);
}

Listing A.4: Switch1.p4

header_type ether_hdr_t {
    fields {
        dst : 48;
        src : 48;
        ethertype : 16;
    }
}
header ether_hdr_t ether_hdr;

header_type ipv4_hdr_t {
    fields {
        version : 4;
        ihl : 4;
        tos : 8;
        total_len : 16;
        id : 16;
        flags : 3;
        offset : 13;
        ttl : 8;
        proto : 8;
        checksum : 16;
        src : 32;
        dst : 32;
        // Note: IP option fields are not parsed
    }
}
header ipv4_hdr_t ipv4_hdr;

field_list ipv4_checksum_list {
    ipv4_hdr.version;
    ipv4_hdr.ihl;
    ipv4_hdr.tos;
    ipv4_hdr.total_len;
    ipv4_hdr.id;
    ipv4_hdr.flags;
    ipv4_hdr.offset;
    ipv4_hdr.ttl;
    ipv4_hdr.proto;
    ipv4_hdr.src;
    ipv4_hdr.dst;
}

field_list_calculation ipv4_checksum {
    input {
        ipv4_checksum_list;
    }
    algorithm : csum16;
    output_width : 16;
}

calculated_field ipv4_hdr.checksum {
    update ipv4_checksum;
}

header_type tcp_hdr_t {
    fields {
        src : 16;
        dst : 16;
        seq : 32;
        ack : 32;
        offset : 4;
        resrv : 6;
        flags : 6;
        window : 16;
        checksum : 16;
        urgent : 16;
    }
}
header tcp_hdr_t tcp_hdr;

field_list tcp_checksum_list {
    ipv4_hdr.src;
    ipv4_hdr.dst;
    8'0;
    ipv4_hdr.proto;
    meta.tcpLength;
    tcp_hdr.src;
    tcp_hdr.dst;
    tcp_hdr.seq;
    tcp_hdr.ack;
    tcp_hdr.offset;
    tcp_hdr.resrv;
    tcp_hdr.flags;
    tcp_hdr.window;
    tcp_hdr.urgent;
    payload;
}

field_list_calculation tcp_checksum {
    input {
        tcp_checksum_list;
    }
    algorithm : csum16;
    output_width : 16;
}

calculated_field tcp_hdr.checksum {
    update tcp_checksum if (valid(tcp_hdr));
}

header_type meta_t {
    fields {
        do_forward : 1;
        ipv4_sa : 32;
        ipv4_da : 32;
        tcp_sp : 16;
        tcp_dp : 16;
        nhop_ipv4 : 32;
        if_ipv4_addr : 32;
        if_mac_addr : 48;
        is_ext_if : 1;
        tcpLength : 16;
        if_index : 8;
    }
}

metadata meta_t meta;

parser start {
    extract(ether_hdr);
    return select(ether_hdr.ethertype) {
        0x0800 : ipv4;
    }
}

parser ipv4 {
    extract(ipv4_hdr);
    set_metadata(meta.tcpLength, ipv4_hdr.total_len - 20);
    return select(ipv4_hdr.proto) {
        0x06 : tcp;
    }
}

parser tcp {
    extract(tcp_hdr);
    set_metadata(meta_flags.tcp_flags, tcp_hdr.flags);
    return ingress;
}

control ingress {
    if (valid(ipv4_hdr)) {
        apply(local_v4_forwarding_table);
    }
    if (valid(tcp_hdr)) {
        apply(map_ip_to_port);
        apply(check_dst_port);
        apply(check_src_port);
    }
}

table local_v4_forwarding_table {
    reads {
        ipv4_hdr.dst : exact;
    }
    actions {
        set_egress;
        drop_packet;
    }
}

table check_dst_port {
    reads {
        ipv4_hdr.src : exact;
        standard_metadata.ingress_port : exact;
    }
    actions {
        set_mac;
        drop_packet;
    }
}

table check_src_port {
    reads {
        ipv4_hdr.src : exact;
        standard_metadata.ingress_port : exact;
    }
    actions {
        set_mac;
        drop_packet;
    }
}

table map_ip_to_port {
    reads {
        ipv4_hdr.src : exact;
        ipv4_hdr.dst : exact;
    }
    actions {
        set_test_tcp2;
        drop_packet;
    }
}

action set_test_tcp2(port, mac_src, mac_dst, host_ip, dst_ip) {
    modify_field(ether_hdr.src, mac_src);
    modify_field(ether_hdr.dst, mac_dst);
    modify_field(ipv4_hdr.src, host_ip);
    modify_field(ipv4_hdr.dst, dst_ip);
    modify_field(standard_metadata.egress_spec, port);
}

action set_mac(mac_src, mac_dst, src, dst, port) {
    modify_field(ether_hdr.src, mac_src);
    modify_field(ether_hdr.dst, mac_dst);
    modify_field(ipv4_hdr.src, src);
    modify_field(ipv4_hdr.dst, dst);
    modify_field(standard_metadata.egress_spec, port);
}

action drop_packet() {
    drop();
}

action set_egress(port) {
    modify_field(standard_metadata.egress_spec, port);
}

Listing A.5: switch2.p4

header_type ether_hdr_t {
    fields {
        dst : 48;
        src : 48;
        ethertype : 16;
    }
}
header ether_hdr_t ether_hdr;

header_type ipv4_hdr_t {
    fields {
        version : 4;
        ihl : 4;
        tos : 8;
        total_len : 16;
        id : 16;
        flags : 3;
        offset : 13;
        ttl : 8;
        proto : 8;
        checksum : 16;
        src : 32;
        dst : 32;
    }
}
header ipv4_hdr_t ipv4_hdr;

field_list ipv4_checksum_list {
    ipv4_hdr.version;
    ipv4_hdr.ihl;
    ipv4_hdr.tos;
    ipv4_hdr.total_len;
    ipv4_hdr.id;
    ipv4_hdr.flags;
    ipv4_hdr.offset;
    ipv4_hdr.ttl;
    ipv4_hdr.proto;
    ipv4_hdr.src;
    ipv4_hdr.dst;
}

field_list_calculation ipv4_checksum {
    input {
        ipv4_checksum_list;
    }
    algorithm : csum16;
    output_width : 16;
}

calculated_field ipv4_hdr.checksum {
    update ipv4_checksum;
}

header_type tcp_hdr_t {
    fields {
        src : 16;
        dst : 16;
        seq : 32;
        ack : 32;
        offset : 4;
        resrv : 6;
        flags : 6;
        window : 16;
        checksum : 16;
        urgent : 16;
    }
}
header tcp_hdr_t tcp_hdr;

field_list tcp_checksum_list {
    ipv4_hdr.src;
    ipv4_hdr.dst;
    8'0;
    ipv4_hdr.proto;
    meta.tcpLength;
    tcp_hdr.src;
    tcp_hdr.dst;
    tcp_hdr.seq;
    tcp_hdr.ack;
    tcp_hdr.offset;
    tcp_hdr.resrv;
    tcp_hdr.flags;
    tcp_hdr.window;
    tcp_hdr.urgent;
    payload;
}

field_list_calculation tcp_checksum {
    input {
        tcp_checksum_list;
    }
    algorithm : csum16;
    output_width : 16;
}

calculated_field tcp_hdr.checksum {
    update tcp_checksum if (valid(tcp_hdr));
}

header_type meta_t {
    fields {
        do_forward : 1;
        ipv4_sa : 32;
        ipv4_da : 32;
        tcp_sp : 16;
        tcp_dp : 16;
        nhop_ipv4 : 32;
        if_ipv4_addr : 32;
        if_mac_addr : 48;
        is_ext_if : 1;
        tcpLength : 16;
        if_index : 8;
    }
}

metadata meta_t meta;

parser start {
    extract(ether_hdr);
    return select(ether_hdr.ethertype) {
        0x0800 : ipv4;
    }
}

parser ipv4 {
    extract(ipv4_hdr);
    set_metadata(meta.tcpLength, ipv4_hdr.total_len - 20);
    return select(ipv4_hdr.proto) {
        0x06 : tcp;
    }
}

parser tcp {
    extract(tcp_hdr);
    set_metadata(meta_flags.tcp_flags, tcp_hdr.flags);
    return ingress;
}

control ingress {
    apply(map_ip_to_port);
    apply(check_dst_port);
    apply(check_src_port);
}

table local_v4_forwarding_table {
    reads {
        ipv4_hdr.dst : exact;
    }
    actions {
        set_egress;
        drop_packet;
    }
}

table check_dst_port {
    reads {
        ipv4_hdr.src : exact;
        standard_metadata.ingress_port : exact;
    }
    actions {
        set_mac;
        drop_packet;
    }
}

table check_src_port {
    reads {
        ipv4_hdr.src : exact;
        standard_metadata.ingress_port : exact;
    }
    actions {
        set_mac;
        drop_packet;
    }
}

table map_ip_to_port {
    reads {
        standard_metadata.ingress_port : exact;
    }
    actions {
        set_test_tcp2;
        drop_packet;
    }
}

action set_test_tcp2(port, mac_src, mac_dst, host_ip, dst_ip) {
    modify_field(ether_hdr.src, mac_src);
    modify_field(ether_hdr.dst, mac_dst);
    modify_field(ipv4_hdr.src, host_ip);
    modify_field(ipv4_hdr.dst, dst_ip);
    modify_field(standard_metadata.egress_spec, port);
}

action set_mac(mac_src, mac_dst, src, dst, port) {
    modify_field(ether_hdr.src, mac_src);
    modify_field(ether_hdr.dst, mac_dst);
    modify_field(ipv4_hdr.src, src);
    modify_field(ipv4_hdr.dst, dst);
    modify_field(standard_metadata.egress_spec, port);
}

action drop_packet() {
    drop();
}

action set_egress(port) {
    modify_field(standard_metadata.egress_spec, port);
}

Listing A.6: switch3.p4
