Security in broadband satellite systems for the aeronautical and other scenarios

Double degree: Ingénieur SUPAERO (ISAE) – Enginyeria de telecomunicacions (UPC)

Final-year project (Master Thesis Report)

by Dirk Gómez Depoorter

SUPAERO supervisor: José Radzik

TriaGnoSys supervisor: Eriza Hafid Fazli

2011 Munich, Germany


Table of Contents

TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
ABBREVIATIONS
1 INTRODUCTION
2 THE ESA REQUEST
  2.1 THE PROJECT
  2.2 OBJECTIVES
  2.3 PROJECT ORGANISATION
    2.3.1 Task 1
    2.3.2 Task 2
    2.3.3 Task 3
    2.3.4 Task 4
3 CONCEPTS
  3.1 SATELLITE LINKS
    3.1.1 Long delay
    3.1.2 Bandwidth-Delay Product
    3.1.3 High Bit Error Rate (BER)
  3.2 TRANSMISSION CONTROL PROTOCOL (TCP)
    3.2.1 The TCP header
    3.2.2 Segment transmission
    3.2.3 Flow control: The receive window
    3.2.4 TCP congestion avoidance mechanisms
  3.3 INTERNET PROTOCOL (IP)
    3.3.1 Functions
    3.3.2 IP versions
    3.3.3 IP header
    3.3.4 IP addressing
    3.3.5 IP Fragmentation
    3.3.6 IP NAT
  3.4 DIFFERENTIATED SERVICES FIELD AND CLASSES
  3.5 VPN TECHNOLOGIES
    3.5.1 Internet Protocol Security (IPsec)
    3.5.2 High Assurance Internet Protocol Encryptor (HAIPE)
    3.5.3 SSL/TLS/HTTPS
4 TECHNICAL ISSUES
  4.1 WORDING
  4.2 PROTOCOL ENHANCING PROXIES (PEP) & ENHANCED PROTOCOLS
    4.2.1 Definition
    4.2.2 Placement related to VPNs
    4.2.3 Bandwidth delay product
    4.2.4 TCP slow start
    4.2.5 Continuous acknowledgements
    4.2.6 Frequently revised content
    4.2.7 Redundancy
  4.3 IP FRAGMENTATION
    4.3.1 Fragmentation
    4.3.2 VPN issue
  4.4 OVERHEAD BANDWIDTH CONSUMPTION
  4.5 ISSUES WITH THE IPSEC ANTI-REPLAY SYSTEM
  4.6 MULTICAST
  4.7 MOBILITY
  4.8 QOS ENFORCEMENT
  4.9 NETWORK ADDRESS TRANSLATION
5 REFERENCE SCENARIOS
  5.1 DEFINITION OF THE SCENARIOS
  5.2 PUBLIC SAFETY COMMUNICATIONS
    5.2.1 Scenario description
    5.2.2 Types of communications
    5.2.3 Security choices
    5.2.4 VPN issues
  5.3 ISP SCENARIO
    5.3.1 Scenario description
    5.3.2 Types of communications
    5.3.3 Security choices
    5.3.4 VPN issues
  5.4 AEROPLANE SCENARIO
    5.4.1 Scenario description
    5.4.2 Types of communications
    5.4.3 Security choices
    5.4.4 VPN issues
  5.5 CONSUMER SCENARIO
    5.5.1 Scenario description
    5.5.2 Types of communications
    5.5.3 Security choices
    5.5.4 VPN issues
6 TECHNICAL SOLUTIONS
  6.1 PEP ISSUE
    6.1.1 Position of the PEP
    6.1.2 PEP solutions
    6.1.3 Choice of VPN depending on the PEPs
    6.1.4 Other VPN solutions that support the use of PEPs and enhanced protocols
  6.2 IP FRAGMENTATION
    6.2.1 Adapting the path
    6.2.2 Adapting to the path
  6.3 OVERHEAD
    6.3.1 Overview of the solution
    6.3.2 RObust Header Compression (ROHC)
    6.3.3 ROHCv2
    6.3.4 ROHC, ROHCv2 and ROHCoIPsec
  6.4 IPSEC ANTI-REPLAY ISSUE
    6.4.1 Disabling the protection
    6.4.2 Increasing the window size
    6.4.3 Multiple SA
    6.4.4 Shutting down QoS
  6.5 MULTICAST
  6.6 MOBILITY ISSUE
    6.6.1 IPsec and mobility
    6.6.2 Mobile IP
    6.6.3 NEtwork MObility (NEMO)
    6.6.4 IKEv2 Mobility and Multihoming Protocol (MOBIKE)
    6.6.5 Comparison between Mobile IP and MOBIKE
  6.7 QOS ENFORCEMENT
  6.8 NETWORK ADDRESS PORT TRANSLATION (NAPT) ISSUE
  6.9 TECHNICAL SOLUTIONS FOR THE AERONAUTICAL SCENARIO
    6.9.1 PEP issue
    6.9.2 IP fragmentation
    6.9.3 Overhead
    6.9.4 IPsec anti-replay issue
    6.9.5 Mobility
    6.9.6 Quality of service
    6.9.7 NAT
7 TESTBED DESIGN
  7.1 AERONAUTICAL SCENARIO TESTBED DESCRIPTION
  7.2 NODE FUNCTIONALITIES AND SOFTWARE
  7.3 AERONAUTICAL SCENARIO TESTBED ADDRESSING SCHEME
  7.4 IMPLEMENTATION ISSUES
    7.4.1 APC IPsec GW and PEP
    7.4.2 Bridging ROHC
    7.4.3 Policy routing using the original packet
8 BUILDING THE TESTBED
  8.1 VIRTUALISATION
  8.2 THE MASTER MACHINE
  8.3 SETTING UP AND TESTING THE TESTBED
9 CONCLUSIONS
1 SOFTWARE EXPLORATION
  1.1 IPTABLES
    1.1.1 Introduction
    1.1.2 Packet traversal through the Linux kernel
    1.1.3 Iptables matches and targets
    1.1.4 IPv4 testing
    1.1.5 IPv6 testing
  1.2 IPROUTE2
  1.3 IP
    1.3.1 Introduction
    1.3.2 IP link
    1.3.3 IP addresses
    1.3.4 IP route
    1.3.5 IP rule
    1.3.6 IP tunnel
    1.3.7 Example
    1.3.8 Example with iptables
  1.4 TC
    1.4.1 Introduction
    1.4.2 Packet tagging
    1.4.3 PHB definition
    1.4.4 Queueing discipline family
    1.4.5 Creating the queueing disciplines and classes
    1.4.6 Packet distribution into the queues
    1.4.7 (p or b) fifo
    1.4.8 Token Bucket Filter (tbf)
    1.4.9 Stochastic Fairness Queuing (sfq)
    1.4.10 PRIO
    1.4.11 Hierarchical Token Bucket (htb)
    1.4.12 Netem
  1.5 IP FRAGMENTATION
    1.5.1 Software
    1.5.2 PMTUD installation
    1.5.3 PLPMTUD installation
  1.6 HEADER COMPRESSION (ROHC)
    1.6.1 Software
    1.6.2 Installation
    1.6.3 Testing
  1.7 MOBILITY
    1.7.1 Software
    1.7.2 Tests
    1.7.3 Mobile IP and DSCP
    1.7.4 Mobile IP and handovers
  1.8 IPSEC (STRONGSWAN)
    1.8.1 Introduction
    1.8.2 Installation
    1.8.3 Newsky testbed
    1.8.4 Configuration
    1.8.5 IPv4-in-IPv6 Tunnel test
    1.8.6 IPv6-in-IPv6 configuration
    1.8.7 IPv4 in IPv4 configuration
    1.8.8 Authentication & Encryption
    1.8.9 DSCP / TOS test
    1.8.10 IPv6 fragmentation test (IPv4 in IPv6)
    1.8.11 IPv6 fragmentation test (IPv6 in IPv6)
    1.8.12 IPv6 announced MTU
  1.9 SANDRA TESTBED
  1.10 DON’T FRAGMENT BIT MANIPULATION
    1.10.1 Test1
    1.10.2 Test2
    1.10.3 Test3
    1.10.4 Test4
    1.10.5 Test5
    1.10.6 Test6
    1.10.7 Test7
  1.11 MODIFYING THE TCP STACK OF LINUX
2 REFERENCES


List of Tables

Table 1: TCP header fields
Table 2: IPv4 header fields
Table 3: Differentiated Services classes and values
Table 4: Security overhead for different VPNs
Table 5: Node functionalities and software
Table 6: Testbed test plan
Table 7: IP commands
Table 8: IP tunnel types
Table 9: Packet tagging using iptables
Table 10: Netfilter DSCP target
Table 11: pfifo / bfifo qdisc parameters
Table 12: tbf qdisc parameters
Table 13: sfq qdisc parameters
Table 14: PRIO qdisc parameters
Table 15: htb qdisc parameters
Table 16: htb class parameters
Table 17: netem qdisc parameters
Table 18: ROHC example (packet sizes)
Table 19: ROHC example (packet sizes)
Table 20: IP addresses of the packets in mobility test 1
Table 21: IP addresses of the packets in mobility test 2
Table 22: Correspondence between TOS field and DSCP value
Table 23: Transmitted data per packet for a 1400 data ping into an IPsec ESP IPv4 in IPv6 tunnel
Table 24: Packet size for the IPv6 fragmentation test
Table 25: Packet sequence for test 1 of the fragmentation header after IPv6 fragmentation
Table 26: Packet sequence for test 2 of the fragmentation header after IPv6 fragmentation
Table 27: Addresses of the IP tunnel and ROHC interfaces in the SANDRA testbed tests
Table 28: Window scaling parameters and commands
Table 29: Enable/disable SACK commands
Table 30: TCP congestion control algorithm commands


List of Figures

Figure 1: TCP header
Figure 2: TCP acknowledgment example
Figure 3: TCP retransmission example
Figure 4: TCP Tahoe congestion avoidance algorithm
Figure 5: TCP Reno congestion avoidance algorithm
Figure 6: Standard TCP simulation for different RTT values
Figure 7: TCP Hybla simulation for different RTT values
Figure 8: Sketch of the TCP CUBIC window growth function
Figure 9: IPv4 header structure
Figure 10: IPv6 header structure
Figure 11: IPv6 header fields
Figure 12: IPsec transport mode packet
Figure 13: IPsec tunnel mode packet
Figure 14: IPsec AH security header fields
Figure 15: IPsec ESP packet structure
Figure 16: IPsec anti-replay window
Figure 17: TLS packet layer structure
Figure 18: PEP implementation outside the VPN channel and use of enhanced protocols (Control case 1)
Figure 19: Viability of placing the PEP inside the VPN channel depending on the VPN (Control case 2)
Figure 20: PEP placement in control case 3
Figure 25: Public safety scenario network setup [Report]
Figure 26: ISP scenarios network setup [Report]
Figure 27: Aeroplane scenario
Figure 28: Consumer scenario
Figure 29: ROHC decompressor flow chart
Figure 30: Mobile IP data exchange
Figure 31: The aeronautical scenario testbed architecture
Figure 32: The two different satellite link paths in the testbed
Figure 33: Link layer addressing and bridging of the testbed nodes
Figure 34: Internet Protocol addressing of the testbed nodes
Figure 35: Testbed addresses and bridges
Figure 36: Ground APC IPsec gateway
Figure 37: Satellite terminal internal bridge positions
Figure 38: Iptables routing chains
Figure 39: Fictional network for the iptables example
Figure 40: iptables test, default priority
Figure 41: iptables test, Expedited Forwarding priority
Figure 42: iptables test, Assured Forwarding 11 priority
Figure 43: ip6tables test, Expedited Forwarding priority
Figure 44: ip6tables test, Assured Forwarding 11 priority
Figure 45: Queueing discipline family example
Figure 46: Queueing disciplines and classes numeration
Figure 47: qdisc/class parameters
Figure 48: TC bandwidth units syntax
Figure 49: TC data size units syntax
Figure 50: TC time units syntax
Figure 51: Wireshark capture of the ROHC example ping packets. Highlighted is the size of the first ping request plus the two encapsulation bytes (85 + 2 = 87 bytes)
Figure 52: Newsky testbed machine and node configuration
Figure 53: First mobility test
Figure 54: Second mobility test
Figure 55: Mobile IP network
Figure 56: Test configuration for Strongswan
Figure 57: Ping packet through the netfilter processing at the remote gateway
Figure 58: QoS test capture in AR1 using IPv4
Figure 59: Large echo request from 192.168.3.2 to 192.168.2.2 (capture on AR1 dev eth1)
Figure 60: First fragment of the encapsulated packet from 192.168.3.2 to 192.168.2.2 (capture on main dev eth2)
Figure 61: Second fragment of the encapsulated packet from 192.168.3.2 to 192.168.2.2 (capture on main dev eth2)
Figure 62: Reassembled "Echo request" packets in test1 dev eth1
Figure 63: Fragmentation in IPv4-in-IPv6 tunnel when DF bit is SET (capture at AR1 dev eth1)
Figure 64: IPv6 fragmentation behaviour
Figure 65: IPv6 fragmentation behaviour in Linux
Figure 66: MTU advertised in the "Datagram Too Big" with ESP null encryption, SHA1 authentication
Figure 67: SANDRA testbed as used for IPsec and ROHC testing


Abbreviations

ACK – Acknowledgement (signal)
AH – Authentication Header
BDP – Bandwidth Delay Product
BER – Bit Error Rate
CoA – Care of Address
DSCP – Differentiated Services Code Point
ESP – Encapsulating Security Payload
GW – Gateway
HA – Home Agent
HTTPS – HyperText Transfer Protocol Secure
ICMP – Internet Control Message Protocol
IKE – Internet Key Exchange protocol
IKEv2 – Internet Key Exchange protocol version 2
IPsec – Internet Protocol Security
IPv4 – Internet Protocol version Four
IPv6 – Internet Protocol version Six
MCoA – Multiple Care of Address
MR – Mobile Router
MTU – Maximum Transmission Unit
PEP – Protocol Enhancing Proxy
PMTU – Path Maximum Transmission Unit
PMTUD – PMTU discovery
QoS – Quality of Service
RTT – Round Trip Time
SSL – Secure Sockets Layer
TCP – Transmission Control Protocol
TLS – Transport Layer Security
ToS – Type of Service
UDP – User Datagram Protocol
VoIP – Voice over Internet Protocol
VPN – Virtual Private Network


1 Introduction

This thesis report is based on the work I carried out at TriaGnoSys on the ESA project “Security in broadband satellite systems for commercial and institutional scenarios”. The ESA project lasts 12 months; I took part in it for 6 months, from the 4th to the 9th month of the project (both included). Therefore, although I caught up with the initial work, the results and analysis are not included in this report.

The goal of the project is to analyse the impact of using security mechanisms over satellite links. Satellite links already have some drawbacks of their own; the presence of a virtual private network (VPN) worsens some of them and creates new ones. This thesis focuses on the analysis of the aeronautical scenario, but other scenarios are also explained.

The report starts with a review of the concepts required to understand the topic. Then, the issues of using VPNs over satellite links are identified and studied. These issues are only present in some situations, so different scenarios are defined as a reference for further study of both the issues and the proposed solutions. Next, solutions to the issues are proposed. Once the problem has been studied, the next step is to simulate, analyse and validate the solutions; a testbed has been built for that purpose. The testbed design and implementation are described in the last chapters of this thesis. Finally, some conclusions on the work done are presented.

The annex contains a chapter based on my experience of learning how to configure the software used in the testbed. It describes the characteristics of each piece of software, together with some tests and the bugs discovered while testing it.


2 The ESA request

2.1 The project

The project requested by ESA aims to provide the basis for future standardisation. The study investigates the impact of using VPN techniques over interactive broadband satellite systems. Different parameters are taken into account:

- VPN techniques (IPsec, TLS/SSL/HTTPS, HAIPE).
- Network topology (star and mesh).
- Satellite terminal mobility (fixed and mobile).
- Applications (consumer, SOHO, corporate, backhaul, SCADA, military and institutional).

Three different cases are to be studied, depending on where the performance enhancement functions are placed in relation to the VPN:

- Case 1: The VPN is controlled at both ends by the satellite operator or integrator, or the operator at least has some influence on the choice of VPN type or the installation of features. Therefore, the operator can decide or influence whether the performance enhancement functions are placed inside or outside the VPN.
- Case 2: The VPN operates end-to-end and the satellite operator or integrator has no control over it, effectively seeing only encrypted data. Therefore, the performance enhancement functions can only be placed inside the VPN.
- Case 3: The VPN operates end-to-end. The satellite operator or integrator has control of, or some influence over, one end. Therefore, the performance enhancement functions can only be placed at one of the ends, but either inside or outside the VPN.

2.2 Objectives

ESA expects the project to fulfil three objectives. The first objective is to identify the technical issues of using a VPN over an interactive broadband satellite system and the different scenarios in which they might arise. The second objective is to find different technical solutions that the satellite industry can apply to solve these issues. These solutions have to be assessed and validated using a test bed developed during the project. The third and last objective is to prepare a document containing the project results together with some guidelines and recommendations for the use of VPNs over interactive broadband satellite systems. This document will be presented and distributed to the satellite system industry.

2.3 Project organisation

The project is to be organised in four different tasks that should last, at most, 12 months.

2.3.1 Task 1

The first task of the project is to gain an understanding of the problem. For that, the different technical issues and possible scenarios should be identified. These scenarios have to cover the three cases and the applications presented previously. The risks that might apply to these applications have to be identified, and a type of VPN defined to counter each risk.

2.3.2 Task 2

In this task the different technical solutions to the identified issues have to be found. They will also be analysed independently for each scenario, and a review of the efficiency, complexity, cost and degree of maturity of each solution shall be included.

2.3.3 Task 3

The objective of this task is to validate the solutions proposed in task 2. First, the simulation and test bed requirements have to be identified. Then the test bed shall be implemented and validated. Finally, the simulations have to be defined and executed and their results analysed.

2.3.4 Task 4

The last task consists of fulfilling the last objective by writing the document containing the guidelines and recommendations and disseminating it.


3 Concepts

This section describes some concepts needed to understand the work done in this thesis.

3.1 Satellite links

Many applications and protocols are designed with wired communications in mind. While this is not a bad thing per se, they might suffer unexpected problems when used over satellite links. Even technologies that acknowledge the presence of a satellite link need to take into account some of the constraints imposed by such links. Although satellites offer a lower bandwidth than terrestrial means such as optical fibre, the bandwidth currently provided by satellite links is sufficient for most needs.

3.1.1 Long delay

Because of the large distance at which satellites are placed, the time it takes the data to travel from the Earth to the satellite and back again is significant. This time can be calculated as:

However, rather than the one-way delay, we are usually interested in the Round Trip Time (RTT), defined as the time elapsed between the moment a message is sent and the moment the acknowledgment for that message is received. Because in satellite systems the propagation delay is much larger than the processing delay, the RTT can be approximated as twice the one-way propagation delay.
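As a hedged sketch of the two delays discussed above, assuming that both terminals are at the same distance $d$ from the satellite, that $c$ is the speed of light and that processing delays are negligible (these symbols and assumptions are illustrative), the expressions can be written as:

```latex
t_{\text{one-way}} = \frac{2d}{c},
\qquad
RTT \approx 2\,t_{\text{one-way}} = \frac{4d}{c}
```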

To give some figures, the case of geostationary satellites is used. Geostationary satellites are widely used for telecommunication applications. They have the advantage of staying still in the sky and always being visible. However, they have to be placed at a high altitude, 35786 km above the equator. Because they are over the equator, the distance is greater if the Earth terminal is located at a higher latitude. This distance can be calculated with the following formula, obtained using the cosine rule:

$$d = \sqrt{R^2 + (R + a)^2 - 2R(R + a)\cos\varphi}$$

where d is the distance between the satellite and the Earth station, R is the Earth’s radius, a is the satellite altitude and φ is the latitude of the Earth terminal.

[Figure: geometry of the Earth terminal and the satellite, showing the distance d, the Earth’s radius R, the satellite altitude a and the latitude.]


If we take for example a communication done between two terminals placed at Munich (48°) using a geostationary satellite, then:

The one way delay:

And the RTT:
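These figures can be approximated with a short calculation. The sketch below is illustrative only: it assumes a mean Earth radius of 6371 km, a terminal located on the same meridian as the satellite, and a one-way path that goes up to the satellite and back down over the same distance. The function names are hypothetical.

```python
import math

C_KM_S = 299_792.458        # speed of light in vacuum, km/s
EARTH_RADIUS_KM = 6371.0    # assumed mean Earth radius (not given in the text)
GEO_ALTITUDE_KM = 35786.0   # geostationary altitude quoted in the text


def slant_range_km(latitude_deg, radius_km=EARTH_RADIUS_KM,
                   altitude_km=GEO_ALTITUDE_KM):
    """Terminal-to-satellite distance from the cosine rule."""
    lat = math.radians(latitude_deg)
    r_sat = radius_km + altitude_km  # distance from Earth's centre to satellite
    return math.sqrt(radius_km ** 2 + r_sat ** 2
                     - 2 * radius_km * r_sat * math.cos(lat))


def one_way_delay_s(latitude_deg):
    """Terminal -> satellite -> terminal, both terminals at the same latitude."""
    return 2 * slant_range_km(latitude_deg) / C_KM_S


def rtt_s(latitude_deg):
    """RTT approximated as twice the one-way propagation delay."""
    return 2 * one_way_delay_s(latitude_deg)


if __name__ == "__main__":
    lat = 48.0  # Munich, as in the example
    print(f"distance: {slant_range_km(lat):.0f} km")
    print(f"one-way delay: {one_way_delay_s(lat) * 1000:.0f} ms")
    print(f"RTT: {rtt_s(lat) * 1000:.0f} ms")
```

Under these assumptions the distance comes out at roughly 38,200 km, the one-way delay at about a quarter of a second and the RTT at about half a second.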

Such a big delay can be cumbersome for some applications and it can even prevent some real time applications from working at all. Another problem of a big delay is the bandwidth-delay product, which is explained in the next section.

3.1.2 Bandwidth-Delay Product

The bandwidth-delay product (BDP) is the minimum buffer size required to hold the amount of unacknowledged data for a given bandwidth (in bps) and RTT. If the buffer is smaller, data might be lost and so the effective bandwidth, or throughput, would decrease. The maximum throughput is achieved when the Retransmission Timeout (RTO) is equal to the RTT. Therefore, the limit of unacknowledged data is equal to the amount of data that can be sent in one RTT:

$$BDP = B \cdot RTT$$

where B is the bandwidth.

[Figure: exchange between transmitter and receiver, showing the DATA and ACK messages, the RTT and the RTO.]

The RTT is hard to change, because doing so would require a complete change of the medium through which the data travels. For satellite links this is a nuisance because the RTT is large, so a large buffer is required. If the buffer size is fixed, the bandwidth cannot be used above a certain value:

$$B_{max} = \frac{\text{buffer size}}{RTT}$$
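As a rough numeric illustration of both relations, the sketch below computes the buffer needed for a given link and the throughput ceiling imposed by a fixed buffer. The 10 Mbit/s link and the 0.51 s RTT are assumed example values, and the function names are hypothetical.

```python
def bdp_bytes(bandwidth_bps, rtt_s):
    """Bandwidth-delay product: data that can be in flight during one RTT."""
    return bandwidth_bps * rtt_s / 8  # bits -> bytes


def max_throughput_bps(buffer_bytes, rtt_s):
    """Highest usable bandwidth when at most buffer_bytes can be unacknowledged."""
    return buffer_bytes * 8 / rtt_s


if __name__ == "__main__":
    rtt = 0.51  # assumed geostationary RTT in seconds
    print(f"Buffer for 10 Mbit/s: {bdp_bytes(10_000_000, rtt):.0f} bytes")
    print(f"Ceiling with a 65535-byte buffer: "
          f"{max_throughput_bps(65535, rtt) / 1e6:.2f} Mbit/s")
```

With these example values a 65535-byte buffer limits the usable bandwidth to about 1 Mbit/s, regardless of the capacity of the link.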

3.1.3 High Bit Error Rate (BER)

Satellite links suffer from higher losses than their terrestrial counterparts. For instance, the long distance that the data has to travel greatly reduces the power of the signal, causing path losses of around 200 dB. The signal is also affected by other effects such as adverse weather, tropospheric and ionospheric effects, shadowing, interference from other satellites and the limited power available on board.

3.2 Transmission Control Protocol (TCP)

The Transmission Control Protocol (TCP) is a widely used transport layer protocol. It is part of the Internet Protocol Suite (often called the TCP/IP suite), together with other protocols such as IP and UDP. This section is meant to refresh some of the key TCP concepts needed for a better understanding of the issues treated in this thesis. It is by no means a full description of the protocol.

The job of TCP is to provide applications with a service that makes end-to-end communication between two hosts possible. When an application needs to send data to a remote application, it passes the data to the TCP layer to be sent to the remote host. The TCP layer at the remote host then hands the data stream to the remote application.

The stream of information provided by the application is broken into segments at the TCP layer. Each segment is formed from part of the application data and the TCP header. The segments are then passed to the network layer protocol to be transmitted.

TCP provides loss recovery, flow control, congestion control and ordered delivery of the information. TCP is also a connection-oriented protocol, which means that a connection has to be established before data is transmitted.

3.2.1 The TCP header

The structure of the TCP header is shown in the following figure. The numbers indicate the bit offset of each field. Each row of the representation contains 32 bits (4 bytes).

Bit offsets 0 to 31 (one row per 32-bit word):

| Source Port (bits 0-15) | Destination Port (bits 16-31) |
| Sequence Number (bits 0-31) |
| Acknowledgment Number (bits 0-31) |
| Data offset (bits 0-3) | Reserved (bits 4-6) | NS, CWR, ECE, URG, ACK, PSH, RST, SYN, FIN (bits 7-15, one bit each) | Window Size (bits 16-31) |
| Checksum (bits 0-15) | Urgent Pointer (bits 16-31) |
| Options | Padding |

Layout of the Options field:

| First Option kind | First Option length | First Option data |
| First Option data (continued) | Second Option kind | Second Option length |
| Second Option data | Padding |

Figure 1: TCP header

| Field | Size | Description |
|---|---|---|
| Source port | 16 bits | The source port. |
| Destination port | 16 bits | The destination port. |
| Sequence number | 32 bits | The initial sequence number if SYN is set, or the total accumulated data since the start of the connection. |
| Acknowledgment Number | 32 bits | The next sequence number expected. All previous data is acknowledged. Valid only if ACK is set. |
| Data offset | 4 bits | The size of the TCP header in words of 4 bytes (32 bits). The minimum header, without options, is 20 bytes (5 words) and the maximum is 15 words (60 bytes). |
| Reserved | 3 bits | For future use. It is set to 0. |
| NS | 1 bit | First control bit or flag. ECN-nonce. |
| CWR | 1 bit | Congestion Window Reduced (CWR). |
| ECE | 1 bit | ECN-echo. With the SYN flag set, it indicates that the host is ECN capable. With the SYN flag clear, it means that a packet with the Congestion Experienced flag (IP header) set was received. |
| URG | 1 bit | Some or all of the data in the segment is urgent. |
| ACK | 1 bit | When set, it indicates that the segment is carrying an acknowledgment. |
| PSH | 1 bit | Requests that the data in this segment be pushed immediately to the receiving application. |
| RST | 1 bit | The sender requests that the TCP connection be reset. |
| SYN | 1 bit | Request for the sequence number to be synchronised. |
| FIN | 1 bit | Request for closing the connection. |
| Window Size | 16 bits | The size in bytes that the host is willing to receive, that is, the receive window size. |
| Checksum | 16 bits | The checksum is calculated for integrity protection. In addition to the entire TCP segment, some fields from the IP header are used to calculate it. |
| Urgent Pointer | 16 bits | An offset from the sequence number that marks the end of the urgent data. Used only when the URG flag is set. |
| Options (shaded fields) | 0 to 320 bits (a multiple of 32 bits including padding) | This field is optional. It contains additional options. Each option is formed by the “Option kind” and “Option length” fields, 1 byte each, and the option data of variable length. |
| Padding | Variable | Used when there are options, to ensure that the header is a multiple of 4 bytes (32 bits). |

Table 1: TCP header fields

The TCP header is then at least 20 bytes long and at most, 60 bytes.
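The field layout described above can be illustrated by decoding the fixed part of a header with Python's standard `struct` module. This is a minimal sketch, not a full TCP implementation: it does not verify the checksum or decode individual options, and the names `TcpHeader` and `parse_tcp_header` are hypothetical.

```python
import struct
from dataclasses import dataclass

# Flag names ordered from the least significant bit (FIN) upwards.
FLAG_NAMES = ["FIN", "SYN", "RST", "PSH", "ACK", "URG", "ECE", "CWR", "NS"]


@dataclass
class TcpHeader:
    src_port: int
    dst_port: int
    seq: int
    ack: int
    data_offset: int   # header length in 32-bit words
    flags: set
    window: int
    checksum: int
    urgent: int
    options: bytes     # raw options and padding, not decoded here


def parse_tcp_header(segment: bytes) -> TcpHeader:
    if len(segment) < 20:
        raise ValueError("a TCP header is at least 20 bytes long")
    # Network byte order: two 16-bit ports, two 32-bit numbers, four 16-bit fields.
    (src, dst, seq, ack, off_flags,
     window, checksum, urgent) = struct.unpack("!HHIIHHHH", segment[:20])
    data_offset = off_flags >> 12          # top 4 bits of the 16-bit word
    if data_offset < 5:
        raise ValueError("data offset below the 5-word minimum")
    header_len = data_offset * 4           # 20 to 60 bytes
    if len(segment) < header_len:
        raise ValueError("segment shorter than its declared header")
    flags = {name for bit, name in enumerate(FLAG_NAMES)
             if off_flags & (1 << bit)}
    return TcpHeader(src, dst, seq, ack, data_offset, flags,
                     window, checksum, urgent, segment[20:header_len])


if __name__ == "__main__":
    # Hand-built SYN header: data offset 5, only the SYN bit (0x002) set.
    raw = struct.pack("!HHIIHHHH", 1234, 80, 1000, 0, (5 << 12) | 0x002,
                      65535, 0, 0)
    print(parse_tcp_header(raw))
```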

3.2.2 Segment transmission

When a TCP segment is received, the receiver checks whether it is the next expected segment (by sequence number). If it is not, the segment is stored for future delivery. If it is, the segment is delivered to the application layer together with any stored segments that immediately follow it in the numbering, up to the first missing segment. In this way TCP delivers the data stream to the application in order.

When segments are delivered to the application layer, the TCP receiver sends an acknowledgment (ACK) message back to the sender. This message contains the next expected sequence number, so the original sender knows that all data before that number has been correctly received.


[Figure: sequence diagram between transmitter and receiver. Segments SN = 1 to SN = 5 are sent; segment 2 is delayed, segments 3 and 5 are stored at the receiver until the missing segments arrive, and the receiver replies with ACK 2, ACK 4 and ACK 6.]

Figure 2: TCP acknowledgment example

To better illustrate this behaviour, consider the example in Figure 2. The first segment arrives at the receiver with sequence number (SN) 1. The receiver delivers the segment to the application and sends an ACK back to the sender with the next expected sequence number, 2. When the ACK is received, the transmitter knows that the segment with SN = 1 has been correctly delivered. The segment with SN = 2 takes longer than usual to reach the receiver. Segment 3 is received before segment 2, so it is stored because it is not the next expected segment. The same happens with the segment with SN = 5. When segment 2 is received, it is delivered to the application. All stored segments are also delivered, as long as there is no gap or missing segment in between. Segment 3 is passed to the application, but segment 5 is not because segment 4 has not yet been received. Therefore, the ACK message indicates to the transmitter that the next expected segment is segment 4. Finally, when segment 4 is received, both 4 and 5 are delivered to the application and acknowledged.

When the sender sends a TCP segment, it starts a timer. If that timer reaches a value known as the Retransmission Timeout (RTO) before the segment is acknowledged by the receiver, the segment is considered lost or corrupted. Because the receiver will not send any ACK before it receives the lost segment, even if it has received the following segments, the lost segment and all segments that come after it are retransmitted.
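The receiver behaviour of the acknowledgment example (in-order delivery with cumulative ACKs) can be sketched as follows. This is a simplified model under the same assumptions as the example: sequence numbers increase by one per segment and no ACK is sent for a segment that is only stored. The function name `receive` is hypothetical.

```python
def receive(arrival_order, first_expected=1):
    """Return (segment, ack) pairs; ack is None when the segment is only stored."""
    expected = first_expected
    stored = set()       # out-of-order segments waiting for the gap to close
    events = []
    for sn in arrival_order:
        if sn == expected:
            expected += 1
            # Deliver any stored segments that directly follow.
            while expected in stored:
                stored.remove(expected)
                expected += 1
            events.append((sn, expected))   # ACK carries the next expected SN
        elif sn > expected:
            stored.add(sn)                  # gap: keep for later, no ACK
            events.append((sn, None))
        else:
            events.append((sn, expected))   # already delivered: re-acknowledge
    return events


if __name__ == "__main__":
    # Arrival order of the example: segment 2 is delayed, then segment 4.
    for sn, ack in receive([1, 3, 5, 2, 4]):
        print(f"SN = {sn}: " + (f"ACK {ack}" if ack else "stored"))
```

Run on the arrival order of the example, the sketch produces ACK 2, two stored segments, ACK 4 and ACK 6, matching the sequence described in the text.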


[Figure: sequence diagram between transmitter and receiver. Segment 1 is lost and segments 2 to 4 are stored at the receiver. When the timer for segment 1 expires after RTO seconds, segments 1 to 4 are retransmitted and the timer starts again; the receiver replies with ACK 2 to ACK 5, and the timers stop when the ACKs are received.]

Figure 3: TCP retransmission example

The second example shows the behaviour of the retransmission timer. When the first segment is sent, a timer starts. The same happens for the rest of the segments. When the timer of segment 1 reaches RTO seconds without the ACK having been received, the segment is considered lost and is retransmitted, together with all following segments (even if they had been correctly received). When the segments are retransmitted, their timers are reset.

Note that in these examples the sequence number increases one by one. This has been done for clarity; in reality the SN increases by the length of the segment data in bytes. Also, an ACK might not be sent after each segment arrives: the receiver might wait for another segment to arrive shortly afterwards and acknowledge both at once. This is done to reduce the amount of signalling.

An alternative way of handling acknowledgments is to send ACK messages that specify which segments have been correctly received. This allows the sender to retransmit only the segments that were really lost instead of all segments sent after the loss. However, it increases the size of the ACK messages and, therefore, the signalling load.
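The difference between the two acknowledgment strategies can be made concrete with a small sketch. It is illustrative only: segments are numbered one by one as in the examples, and both function names are hypothetical.

```python
def retransmit_after_timeout(sent, next_expected):
    """Cumulative ACKs: resend the timed-out segment and everything sent after it."""
    return [sn for sn in sent if sn >= next_expected]


def retransmit_with_selective_acks(sent, received):
    """Selective ACKs: resend only the segments the receiver did not report."""
    return [sn for sn in sent if sn not in received]


if __name__ == "__main__":
    sent = [1, 2, 3, 4]   # segment 1 is lost, 2 to 4 arrive
    print(retransmit_after_timeout(sent, next_expected=1))
    print(retransmit_with_selective_acks(sent, received={2, 3, 4}))
```

In the retransmission example, where segment 1 is lost and segments 2 to 4 arrive, the first function returns all four segments while the second returns only segment 1.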

3.2.3 Flow control: The receive window

TCP has a flow control mechanism to ensure that the sender does not transmit faster than the receiver can process the data. For that, it uses the receive window, whose size is advertised back to the sender. The receiver is ready to receive an amount of data equal to the window size. Therefore, the sender sends segments after the last acknowledged one, but stops before the unacknowledged data sent exceeds the window size. When data is acknowledged, the sender can send more segments.

The window size is limited by the “Window size” field of the TCP header. This field is 16 bits long, so a maximum of 2^16 bytes (64 Kbytes) of unacknowledged data could be sent. However, a TCP option called Window Scaling allows the advertised value to be shifted left by up to 14 bits, which extends the window to as much as 2^30 bytes (1 Gbyte).
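To relate the receive window to the bandwidth-delay product, the sketch below finds the smallest window scale shift that lets the advertised window cover a required amount of in-flight data. It assumes the limit of 14 on the shift count defined for the TCP Window Scale option in RFC 7323; the function names are hypothetical.

```python
MAX_WINDOW_FIELD = 65535   # largest value of the 16-bit Window Size field
MAX_SHIFT = 14             # upper bound on the shift count (RFC 7323)


def max_window_bytes(shift):
    """Largest window that can be advertised with a given shift count."""
    return MAX_WINDOW_FIELD << shift


def window_scale_shift(required_window_bytes):
    """Smallest shift whose maximum window covers the required window."""
    shift = 0
    while max_window_bytes(shift) < required_window_bytes and shift < MAX_SHIFT:
        shift += 1
    return shift


if __name__ == "__main__":
    # Example: 637500 bytes in flight (10 Mbit/s over an assumed 0.51 s RTT).
    shift = window_scale_shift(637_500)
    print(shift, max_window_bytes(shift))
```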

3.2.4 TCP congestion avoidance mechanisms

TCP uses a congestion avoidance mechanism to prevent overloading the network. To do so, TCP limits the amount of unacknowledged data it sends to the size of the congestion window. At the start of a communication nothing is known about the state of the network. Therefore, the initial congestion window must be small to avoid sending too much data into a network that might be full. However, if this value is too small, the transmission will be slow at the start. As TCP sends segments, it gathers information on the network status, which is then used to enlarge the congestion window and thus increase the throughput. There are different algorithms to do so.

3.2.4.1 TCP Tahoe

TCP Tahoe’s congestion avoidance algorithm uses the “slow start” mechanism to grow the congestion window at the beginning of a connection. This mechanism prevents an initial burst of data from overwhelming the network and so keeping the connection from even starting [RFC 2001]. The transmission rate is limited by the congestion window. During the “slow start” phase, the window grows until either a packet is lost or its size exceeds a threshold, ssthresh. In this state the window size is increased by 1 MSS every time an ACK is received. Because the number of segments acknowledged in one RTT equals the size of the window, the window effectively doubles every RTT. When the threshold is reached, the algorithm enters the “congestion avoidance” state. There, the congestion window increases slowly, by at most 1 MSS per RTT. If an RTO expires, the mechanism goes back to “slow start” with a window size of 1 MSS. In both “slow start” and “congestion avoidance”, a packet loss is also detected when three duplicate ACKs arrive in a row. The segment requested in those ACKs is then retransmitted immediately (fast retransmit [RFC 2001]) and the algorithm goes back to “slow start”. This behaviour is summarised as a state diagram in the following figure:


Figure 4: TCP Tahoe congestion avoidance algorithm
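The state machine described above can be sketched as follows. This is a minimal model under stated assumptions, not an implementation of a real TCP stack: the window is counted in MSS units, the class and method names are hypothetical, and the halving of ssthresh on a loss (with a floor of 2 MSS) follows the usual description of Tahoe rather than the text above.

```python
class TahoeWindow:
    """Congestion window of a Tahoe-like sender, counted in MSS units."""

    def __init__(self, ssthresh: float = 16.0):
        self.cwnd = 1.0
        self.ssthresh = ssthresh

    def on_ack(self) -> None:
        if self.cwnd < self.ssthresh:
            self.cwnd += 1.0              # slow start: +1 MSS per ACK
        else:
            self.cwnd += 1.0 / self.cwnd  # congestion avoidance: about +1 MSS per RTT

    def on_loss(self) -> None:
        """RTO expiry or three duplicate ACKs: go back to slow start."""
        self.ssthresh = max(self.cwnd / 2.0, 2.0)  # assumed ssthresh update
        self.cwnd = 1.0
```

With `ssthresh=4`, three ACKs take the window from 1 to 4 MSS, and the fourth ACK only adds 0.25 MSS because the sender is already in congestion avoidance.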

The window size can be expressed as function of either the number of the ACK received or the time [TCP Hybla]:
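A form of these expressions that is consistent with the description above is sketched below. The symbols are assumptions made for this sketch: $W_i$ is the window (in MSS) after the i-th ACK, $\gamma$ stands for ssthresh and $t_\gamma$ for the time at which it is reached.

```latex
W_{i+1} =
\begin{cases}
W_i + 1, & \text{slow start} \\
W_i + 1/W_i, & \text{congestion avoidance}
\end{cases}
\qquad
W(t) =
\begin{cases}
2^{t/RTT}, & 0 \le t < t_\gamma \quad \text{(slow start)} \\
\dfrac{t - t_\gamma}{RTT} + \gamma, & t \ge t_\gamma \quad \text{(congestion avoidance)}
\end{cases}
\qquad
t_\gamma = RTT \log_2 \gamma
```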

Therefore, the bigger the delay, the more slowly the congestion window grows. In addition, TCP Tahoe waits for a timeout and empties the pipe when a packet is lost. This means that in scenarios with either high packet loss or high delay, the protocol suffers a significant degradation in performance [TCP comparison].

3.2.4.2 TCP Reno

TCP Reno is the most widely used TCP version. It uses a congestion avoidance algorithm similar to that of TCP Tahoe, but with an important improvement in how packet losses are handled. Both versions implement Fast Retransmit, but whereas Tahoe then goes back to “slow start” with a congestion window of 1 MSS, Reno performs Fast Recovery after Fast Retransmit. Fast Recovery consists of lowering ssthresh to one half of the current congestion window size and then setting the window size to the new ssthresh value plus 3 MSS. The reasoning behind this mechanism is that the receiver can only generate duplicate ACKs when further segments arrive, which means that the congestion is not so severe as to prevent data from getting through. Going back to the “slow start” state would therefore be excessive [RFC 2001].


Figure 5: TCP Reno congestion avoidance algorithm
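The window update performed by Fast Recovery can be sketched as below. The function names are hypothetical, the window is counted in MSS units, and the floor of 2 MSS on ssthresh is an assumption taken from the usual description of the algorithm.

```python
from typing import Tuple


def enter_fast_recovery(cwnd: float) -> Tuple[float, float]:
    """Return (ssthresh, cwnd) in MSS units after the third duplicate ACK."""
    ssthresh = max(cwnd / 2.0, 2.0)  # halve the window, assumed floor of 2 MSS
    return ssthresh, ssthresh + 3.0  # add the 3 segments that triggered the duplicate ACKs


def exit_fast_recovery(ssthresh: float) -> float:
    """A new ACK arrived: deflate the window and continue in congestion avoidance."""
    return ssthresh


print(enter_fast_recovery(20.0))  # (10.0, 13.0)
```

Compared with Tahoe, a sender with a window of 20 MSS resumes at 10 MSS instead of restarting from 1 MSS.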

While TCP Reno improves the congestion management of TCP Tahoe, it still suffers in high delay/packet loss scenarios [TCP comparison]. The reason is that it can only detect one packet loss in one window.

3.2.4.3 TCP New Reno

TCP New Reno, as the name indicates, is a modification of TCP Reno. The improvement is that New Reno is capable of recognising more than one packet loss per window. The difference with TCP Reno is that Fast Recovery is not exited until all the data that was unacknowledged when Fast Recovery was entered has been acknowledged. However, New Reno still suffers on links with high packet loss or high delay.

3.2.4.4 TCP Selective Acknowledgements (TCP SACK)

This version of TCP is a modification of TCP Reno. In addition to the cumulative acknowledgement, the receiver reports the individual blocks of data that it has received. Therefore, the sender knows exactly which packets have been correctly received and it doesn’t have to retransmit successfully transmitted packets. However, this solution poses a problem that the other solutions don’t have: the ACK message structure is modified, which means that both the sender and the receiver must support the feature.
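The benefit of selective acknowledgements can be illustrated with a small sketch. It numbers segments one by one, as the examples of this chapter do, instead of using byte ranges as real TCP does; the function name and the representation of a SACK block as an inclusive `(first, last)` pair are assumptions made for the illustration.

```python
from typing import List, Tuple


def segments_to_retransmit(cum_ack: int, sack_blocks: List[Tuple[int, int]],
                           highest_sent: int) -> List[int]:
    """Segments in [cum_ack, highest_sent] not covered by any SACK block.

    cum_ack is the next segment the receiver expects; each SACK block is an
    inclusive (first, last) range of segments received out of order.
    """
    received = set()
    for first, last in sack_blocks:
        received.update(range(first, last + 1))
    return [sn for sn in range(cum_ack, highest_sent + 1) if sn not in received]


# Segments 1..10 were sent; segments 3 and 7 were lost.
print(segments_to_retransmit(3, [(4, 6), (8, 10)], 10))  # [3, 7]
```

Without the SACK blocks, the sender only knows that segment 3 is missing and, after a timeout, would resend segments 3 to 10.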

3.2.4.5 TCP Hybla

The goal of TCP Hybla is to make the bandwidth independent of the RTT, avoiding the problems that appear on long-delay links [TCP Hybla]. This means that Hybla has to compensate for the RTT twice, because the bandwidth depends both on the RTT directly and on the window size, which in turn also depends on the RTT.

TCP Hybla introduces the idea of a reference round trip time, RTT0. The goal is to achieve the performance that would be obtained if the real RTT were equal to the reference. The normalised RTT is defined as:

$$\rho = \frac{RTT}{RTT_0}$$

The window has to be multiplied by ρ to compensate the RTT in the bandwidth calculation. Also, it must be included before any RTT in the window formula to make the window independent of the RTT.

Note that the value of ssthresh and the time to reach it also change:

The final value of the bandwidth is independent of RTT:
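A reconstruction of these expressions in the notation of [TCP Hybla] is sketched below. The symbols $\gamma$ (ssthresh of the reference connection, in segments) and $t_{\gamma,0}$ (the time at which slow start ends) are assumptions made for this sketch, and $\rho = RTT / RTT_0$ is the normalised RTT.

```latex
% Window: multiplied by \rho, and \rho placed before every RTT
W^H(t) =
\begin{cases}
\rho\, 2^{\rho t / RTT}, & 0 \le t < t_{\gamma,0} \quad \text{(slow start)} \\
\rho \left[ \rho\, \dfrac{t - t_{\gamma,0}}{RTT} + \gamma \right], & t \ge t_{\gamma,0} \quad \text{(congestion avoidance)}
\end{cases}

% The threshold becomes \rho\gamma and the time to reach it no longer depends on RTT
t_{\gamma,0} = RTT_0 \log_2 \gamma

% Bandwidth: only RTT_0 remains
B^H(t) = \frac{W^H(t)}{RTT} =
\begin{cases}
\dfrac{2^{t / RTT_0}}{RTT_0}, & 0 \le t < t_{\gamma,0} \\
\dfrac{1}{RTT_0} \left[ \dfrac{t - t_{\gamma,0}}{RTT_0} + \gamma \right], & t \ge t_{\gamma,0}
\end{cases}
```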

The effects can be easily seen from two figures found in [TCP Hybla]:

Figure 6: Standard TCP simulation for different RTT values

Figure 7: TCP Hybla simulation for different RTT values


In Figure 6 we see the use of standard TCP. The bigger the RTT is, the longer it takes for the congestion window to reach the ssthresh (32 Kb in this simulation). Figure 7 shows that the time to change from slow start (exponential curve) to congestion avoidance (linear curve) is the same no matter the value of RTT. However, the ssthresh value is different for each RTT. With TCP Hybla it is possible to establish a congestion avoidance algorithm that doesn’t suffer the negative effects of the RTT on the throughput. However, this comes at a cost. The increased size of the congestion window means that it is more likely to have multiple losses on the same window. It is for this reason that TCP Hybla makes the use of SACK mandatory rather than optional and it makes use of timestamps.

3.2.4.6 TCP Cubic

TCP Cubic, as its name indicates, increases the congestion window size using a cubic function [TCP CUBIC]. The window grows rapidly right after it has been reduced. Then, as it approaches the size estimated to give the best bandwidth utilisation, it grows slowly, so that it effectively stays around the estimated optimum for some time. After that, the growth rate goes up again to probe for more bandwidth. The evolution of the window over time can be defined as follows:

$$W(t) = C\,(t - K)^3 + W_{max}$$

Where C is a TCP CUBIC parameter.

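[Figure: cwnd W(t) plotted against time t. The curve starts at Wmax(1-β), follows the concave region up to Wmax at t = K, and continues into the convex region above Wmax.]

Figure 8: Sketch of the TCP CUBIC window growth function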

NOTE: Figure 8 isn’t a cubic function but an approximation.

t: Time elapsed since the last window reduction.
W(t): Congestion window size.
Wmax: The window size value estimated for maximum throughput.
Wmax(1-β): The initial window size after a window size reduction.
K: The time at which the window size reaches Wmax.

The value of K follows from the condition W(0) = Wmax(1-β):

$$K = \sqrt[3]{\frac{\beta\, W_{max}}{C}}$$
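The growth function can be checked numerically with the sketch below, which assumes the form W(t) = C(t - K)^3 + Wmax with K chosen so that W(0) = Wmax(1-β). The function names are hypothetical and the default values β = 0.2 and C = 0.4 are illustrative assumptions, not values given in this text.

```python
def cubic_k(w_max: float, beta: float, c: float) -> float:
    """Time K at which the window gets back to w_max."""
    return (beta * w_max / c) ** (1.0 / 3.0)


def cubic_window(t: float, w_max: float, beta: float = 0.2, c: float = 0.4) -> float:
    """W(t) = C (t - K)^3 + Wmax, with t measured from the last reduction."""
    k = cubic_k(w_max, beta, c)
    return c * (t - k) ** 3 + w_max


k = cubic_k(100.0, 0.2, 0.4)
print(round(cubic_window(0.0, 100.0), 6))    # 80.0, that is Wmax(1-β)
print(round(cubic_window(k, 100.0), 6))      # 100.0, that is Wmax
print(round(cubic_window(2 * k, 100.0), 6))  # 120.0, already in the convex region
```

The three printed values correspond to the start of the concave region, the plateau at t = K and a point in the convex region of Figure 8.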

The advantage of this system is that the window size increases quite fast after a window reduction, but once it reaches the estimated value, cwnd stays more or less constant. If the window continued to grow at the same pace, it would quickly reach the point where losses start to reoccur. When a packet is lost in the convex region, the loss has occurred at a higher window size than last time. This possibly means that more bandwidth is available. Therefore, Wmax is updated to the window size at the time of the loss:

$$W_{max} = cwnd$$

However, if the loss happens in the concave region, it has occurred at a lower window size, which possibly implies a reduction in the available bandwidth. Because of that, Wmax is reduced further:

$$W_{max} = cwnd \cdot \frac{2 - \beta}{2}$$

Another feature of TCP Cubic is that it is TCP friendly. The algorithm detects when the window size would be higher under the standard TCP congestion control mechanism and updates its cwnd to that value. According to [TCP CUBIC], the average window size of additive increase and multiplicative decrease (AIMD) with additive factor α, multiplicative factor β and packet loss rate p would be:

$$\bar{W}_{AIMD} = \sqrt{\frac{\alpha}{2} \cdot \frac{2 - \beta}{\beta} \cdot \frac{1}{p}}$$

TCP’s congestion avoidance (Reno’s) is based on AIMD: the window is increased by 1 MSS every RTT and halved upon a loss, so α = 1 and β = 0.5. The relation between α and β that gives the same average value is:

$$\alpha = \frac{3\beta}{2 - \beta}$$

And so the window size as a function of time is:

$$W_{tcp}(t) = W_{max}(1 - \beta) + \frac{3\beta}{2 - \beta} \cdot \frac{t}{RTT}$$

3.3 Internet Protocol (IP)

3.3.1 Functions

The Internet Protocol is the core of the TCP/IP suite and the main protocol at the network layer. IP is a vast topic, so I will only highlight the aspects that are important for the thesis.

If two hosts are connected to the same physical network, delivery through the link layer might be possible. If that is not the case, a protocol is needed to “navigate” through the different networks. IP is a set of protocols used for delivering packets from one host to another across different networks. Packets sent to a host located in a different network are forwarded by intermediary devices, called routers, from one network to the next. To do so, IP uses addressing: hosts are identified by their “IP address”, and routers connected to more than one physical network send each incoming packet to one network or another depending on its destination IP address.

IP is a “best effort” protocol, that is, it is unreliable in the sense that it doesn’t correct errors in the data it carries and it doesn’t provide features like congestion or flow control. Packets are not acknowledged when they reach the receiver, so the sender doesn’t know whether a packet has been delivered or not. This also means that IP doesn’t retransmit packets. Another function of IP is fragmentation and reassembly when a datagram is too big for the link Maximum Transmission Unit (MTU).

3.3.2 IP versions

The first deployed version of the Internet Protocol is version 4. The IP functions were originally integrated in TCP. With the fourth version of TCP, the protocol was split into the current TCP and IP (RFC 760, 791 and 793, published in 1980 and 1981). To keep the numbering coherent, IP was numbered as version 4. This is the version currently deployed in most devices. While IPv4 has lasted a remarkable number of years, it presents some problems that have been corrected in a newer version, whose development started in the mid 1990s. It was numbered version 6 to avoid confusion with another protocol, the Internet Stream Protocol, which had been assigned number 5.

3.3.3 IP header The IP header changes depending on the IP version. For instance, they differ in the way the options are presented. The IPv4 header has variable length and the IPv6 header uses header extensions.

3.3.3.1 IPv4 header

The IPv4 header is always a multiple of 4 bytes long and has a minimum size of 20 bytes. Optional fields may increase this value up to 60 bytes.

[Figure: IPv4 header layout, 32 bits (bit positions 0 to 31) per row.
Row 1: Version, Header length, Type of Service (TOS), IP datagram total length.
Row 2: Identification, flags (RSV, DF, MF), Fragment Offset.
Row 3: Time to live (TTL), Protocol, Header Checksum.
Row 4: Source IP address.
Row 5: Destination IP address.
Following rows: Options, Padding.]

Figure 9: IPv4 header structure

Field | Size | Description
--- | --- | ---
Version | 4 bits | The IP version number. For IPv4 it is 4 (0100).
Header Length | 4 bits | The length of the IP header in 4-byte words. The maximum header size is then 15 words or 60 bytes.
Type of Service (TOS) | 8 bits | This field has changed its role over the years. The 6 most significant bits contain the DiffServ field that establishes the Quality of Service required by the datagram and the rest are used for Explicit Congestion Notification (ECN) for end to end congestion notification.
IP datagram total length | 16 bits | The total length of the IP datagram, header and payload included. Therefore, the maximum IPv4 datagram size is $2^{16} - 1$ = 65,535 bytes.
Identification | 16 bits | This field contains the same value for all fragments belonging to the same message, so that fragments belonging to different messages are not reassembled together by accident. If the packet is not fragmented, this field is still set so that it can be used if fragmentation occurs.
Reserved | 1 bit | This bit is reserved. Must be set to 0.
Don’t fragment | 1 bit | When this flag is set, the packet cannot be fragmented.
More Fragments | 1 bit | This flag indicates that there are more fragments after this one. The last fragment will always have this flag cleared.
Fragment Offset | 13 bits | The offset, in units of 8 bytes, at which the data of this fragment begins within the original datagram.
Time to live | 8 bits | This value represents the “time” the packet is allowed to stay in the network. It is counted in router hops and each time it passes a router it is decreased by 1.
Protocol | 8 bits | Identifies the protocol encapsulated in the IP datagram.
Header checksum | 16 bits | A checksum of the different fields in the IP header. It is verified at each hop and the packet is discarded if it is incorrect.
Source IP address | 32 bits | The IP address of the source host.
Destination IP address | 32 bits | The IP address of the destination host.
Options | Variable | This field is optional. It is used for additional features that are not always required.
Padding | 0 to 31 bits | If the options field size is not a multiple of 4 bytes, padding is added to force the header to be a multiple.

Table 2: IPv4 header fields
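As an illustration of how the fields of Table 2 are packed, the sketch below decodes the fixed 20-byte part of an IPv4 header with Python's `struct` module. The function name and the dictionary keys are hypothetical, options are ignored, and the sample header is made up for the example (its checksum is left at zero).

```python
import struct
from typing import Dict


def parse_ipv4_header(data: bytes) -> Dict[str, int]:
    """Decode the fixed 20-byte part of an IPv4 header into its fields."""
    (ver_ihl, tos, total_len, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBHII", data[:20])
    return {
        "version": ver_ihl >> 4,
        "header_len_bytes": (ver_ihl & 0x0F) * 4,   # field counts 4-byte words
        "dscp": tos >> 2,                           # 6 most significant bits
        "ecn": tos & 0x03,                          # 2 least significant bits
        "total_length": total_len,
        "identification": ident,
        "dont_fragment": (flags_frag >> 14) & 1,
        "more_fragments": (flags_frag >> 13) & 1,
        "fragment_offset_bytes": (flags_frag & 0x1FFF) * 8,  # field counts 8-byte units
        "ttl": ttl,
        "protocol": proto,
        "header_checksum": checksum,
        "source": src,
        "destination": dst,
    }


sample = bytes.fromhex("45b8003c1c46400040060000c0a80164c0a80101")
fields = parse_ipv4_header(sample)
print(fields["version"], fields["header_len_bytes"], fields["dscp"], fields["ttl"])  # 4 20 46 64
```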

3.3.3.2 IPv6 header The IPv6 header is longer than the standard IPv4 header. However, the size is always fixed at 40 bytes. Most of these bytes are due to the addresses that have gone from 32 to 128 bits. In fact, the combined size of all the fields except for the addresses is smaller for the IPv6 header than the IPv4 header.

[Figure: IPv6 header layout, 32 bits (bit positions 0 to 31) per row.
Row 1: Version, Traffic Class, Flow Label.
Row 2: Payload Length, Next Header, Hop Limit.
Rows 3 to 6: Source Address.
Rows 7 to 10: Destination Address.]

Figure 10: IPv6 header structure

Field | Size | Description
--- | --- | ---
Version | 4 bits | The IP version number. For IPv6 it is 6 (0110).
Traffic Class | 8 bits | This field contains the DS field in the 6 most significant bits and the ECN field in the 2 least significant bits.
Flow Label | 20 bits | This field is meant to be used to give packets in the same flow the same treatment.
Payload Length | 16 bits | This field contains the size of the IPv6 optional headers and the payload. Because the IPv6 header size is fixed, it is not counted.
Next Header | 8 bits | The Next Header field contains the protocol number of the packet encapsulated in IPv6. If there are optional IP headers then it will contain the number of the next optional header.
Hop Limit | 8 bits | This field is the same as the “TTL” field in the IPv4 header. The only change is the name, which fits the current use of the field: limiting the number of hops rather than the time elapsed.
Source Address | 128 bits | The IP address of the source host.
Destination Address | 128 bits | The IP address of the destination host.

Figure 11: IPv6 header fields

3.3.4 IP addressing

IP addresses are used to identify hosts and route packets. All addresses are unique except for a range of addresses called “private addresses”. Private addresses are local and can be repeated in different local networks, so they cannot be used outside them. An IP address has two parts, the network ID and the host ID. The network ID is the same for all hosts that belong to a network, and the host ID is unique inside that network. IP addresses are given with a mask that separates both parts. When given as a single decimal number, the mask indicates the number of most significant bits that belong to the network ID. The addresses have a different format depending on whether they are IPv4 or IPv6 addresses.

3.3.4.1 IPv4 addressing

IPv4 addresses are 32 bits long (4 bytes). They are usually represented with the 4 bytes separated by dots, each byte written as a decimal number between 0 and 255. For example, the address 11000000 10101000 00000001 01100100 would be represented as 192.168.1.100 in the standard notation. If the 24 most significant bits belong to the network ID, then the mask would be 11111111 11111111 11111111 00000000 or 255.255.255.0. However, the mask is usually given after the address, separated from it by a slash: 192.168.1.100/24. This notation can only be used if the ones in the mask are contiguous, with no zeros between them.

3.3.4.2 IPv6 addressing

IPv6 addresses are four times longer than their version 4 counterparts, that is, 128 bits (16 bytes). This is one of the biggest changes in IPv6 and the main motivation to create a new version of IP, since the IPv4 address space is almost exhausted. While 64 bits (or even fewer) might have been enough to cover the future need for IP addresses, it was decided to make them this long not only to have enough, but also to make the organisation of the address space much easier. The addresses are usually represented in hexadecimal, with a colon after every four hexadecimal digits (representing 2 bytes). Finally, just like in IPv4, the mask is added using the slash and decimal notation. So, for example, an IPv6 address could be 2001:0001:0000:0000:0000:0000:0000:0001/64. To make addresses shorter and more readable, the leading zeros of each block of hexadecimal digits are omitted, so the address becomes 2001:1:0:0:0:0:0:1/64. This is still quite long. Since many addresses will contain long runs of zeros, the notation is simplified even further by allowing one run of all-zero blocks to be replaced by a double colon. The same address is then written as 2001:1::1/64. Because this can only be done once per address, there is no ambiguity: the number of zeros represented by the double colon can always be calculated.
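The IPv4 and IPv6 notations described in 3.3.4.1 and 3.3.4.2 can be checked with the `ipaddress` module of the Python standard library. This is only an illustration of the notation, using the example addresses of the text.

```python
import ipaddress

# IPv4: address given with a /24 mask
v4 = ipaddress.ip_interface("192.168.1.100/24")
print(v4.netmask)                  # 255.255.255.0
print(v4.network)                  # 192.168.1.0/24
print(format(int(v4.ip), "032b"))  # 11000000101010000000000101100100

# IPv6: full and shortened notation of the same address
v6 = ipaddress.ip_interface("2001:0001:0000:0000:0000:0000:0000:0001/64")
print(v6.ip.exploded)              # 2001:0001:0000:0000:0000:0000:0000:0001
print(v6.ip.compressed)            # 2001:1::1
print(v6.network)                  # 2001:1::/64
```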

3.3.5 IP Fragmentation

Sometimes an IP datagram will be too big to fit the link MTU. In that case, two possibilities exist: either the packet is dropped or it is fragmented into pieces that fit the link MTU. If the packet is dropped, an ICMP message can be sent back to the source so that it sends subsequent packets at a suitable size. Alternatively, the packet can be cut into smaller datagrams (fragments) and sent through the link. The fragments are then reassembled at the end host. In IPv4, a router that must forward a datagram bigger than the MTU decides what to do depending on the “Don’t Fragment” flag in the header. If the flag is set, the packet is dropped and an ICMP message is sent back to the source. If it is clear, the packet is fragmented and the fragments are forwarded. IPv6 works in a different way: intermediate routers never perform fragmentation, so they send an ICMPv6 message back to the source signalling the need to reduce the packet size.
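The way an IPv4 router splits a datagram can be sketched as below. The function name and the tuple layout are hypothetical; the sketch assumes a 20-byte header without options and uses the fact that the Fragment Offset field counts units of 8 bytes, so every fragment except the last carries a multiple of 8 bytes of data.

```python
from typing import List, Tuple


def fragment(payload_len: int, mtu: int, header_len: int = 20) -> List[Tuple[int, int, int]]:
    """Split a payload into (offset_in_8_byte_units, data_len, more_fragments) tuples."""
    max_data = (mtu - header_len) // 8 * 8  # data per fragment, multiple of 8 bytes
    if max_data <= 0:
        raise ValueError("MTU too small to carry any data")
    fragments = []
    offset = 0
    while offset < payload_len:
        data_len = min(max_data, payload_len - offset)
        more = 1 if offset + data_len < payload_len else 0
        fragments.append((offset // 8, data_len, more))
        offset += data_len
    return fragments


# A 4000-byte payload sent over a link with an MTU of 1500 bytes.
print(fragment(4000, 1500))  # [(0, 1480, 1), (185, 1480, 1), (370, 1040, 0)]
```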

3.3.6 IP NAT

Network Address Translation was a temporary solution to mitigate the IPv4 address space problem. IP NAT changes the source/destination addresses of the packets travelling between the public network and a private network. The solution consists in assigning the same public IP address to all packets leaving the private network, so that all packets destined to the private network are sent to that same public address. The NAT router uses the TCP or UDP port number to map each incoming packet back to the corresponding private address. This way, many nodes can access the public network using a single public IPv4 address. The cost is the use of some TCP or UDP ports, but most of them are unused anyway. This solution presents some problems, like not being able to reach a machine behind a NAT router unless that machine starts the communication or a mapping between its private IP address and a port already exists. The increased address space in IPv6 makes this solution unnecessary, and whether NAT should be standardised for IPv6 is still under debate. While not needed for the address space problem, it still provides a benefit in hiding the network architecture behind the NAT router.
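The port-based translation can be sketched with a toy table. The class `Napt`, its method names, the starting port 40000 and the addresses are hypothetical choices made for the illustration; a real NAT router also tracks the protocol, timeouts and connection state.

```python
from typing import Dict, Optional, Tuple


class Napt:
    """Toy port-based NAT: many private (ip, port) pairs share one public IP."""

    def __init__(self, public_ip: str, first_port: int = 40000):
        self.public_ip = public_ip
        self.next_port = first_port
        self.out: Dict[Tuple[str, int], int] = {}   # private endpoint -> public port
        self.back: Dict[int, Tuple[str, int]] = {}  # public port -> private endpoint

    def outbound(self, src_ip: str, src_port: int) -> Tuple[str, int]:
        """Translate the source of a packet leaving the private network."""
        key = (src_ip, src_port)
        if key not in self.out:
            self.out[key] = self.next_port
            self.back[self.next_port] = key
            self.next_port += 1
        return self.public_ip, self.out[key]

    def inbound(self, dst_port: int) -> Optional[Tuple[str, int]]:
        """Private endpoint for a packet sent to the public IP, or None if unknown."""
        return self.back.get(dst_port)
```

An inbound packet for a port without a mapping returns `None`, which reflects the limitation mentioned above: a host behind the NAT router cannot be reached unless it started the communication.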

3.4 Differentiated Services field and classes

The DS field consists of 6 bits of the ToS field of the IPv4 header (Traffic Class in IPv6). It is used by DiffServ to classify packets depending on their needs and provide QoS in the network. The DS field value (the so-called DiffServ Code Point, DSCP) determines the Per-Hop Behaviour (PHB) group that the packet is assigned to, that is, the treatment that the packet receives when traversing a router. The different PHB groups are: Best Effort (BE, the default treatment), Class Selector (to maintain compatibility with the IP precedence field), Assured Forwarding and Expedited Forwarding. However, the system administrator is free to use only those that fit their needs or even to define custom PHB groups.

BE traffic has the lowest priority, and other traffic will go through the router faster. However, a minimum bandwidth is assigned to this traffic to ensure that hosts that know nothing about DiffServ are not left without connection.

Traffic using Class Selector will still receive the proper per-hop forwarding treatment when going through routers that use the IP precedence field [DSCP].

Assured Forwarding (AF) traffic is divided into 4 classes. Each class is assigned a different amount of resources (buffer space and bandwidth), and the traffic of each class should be treated independently. Within a class, each packet is marked with one of three drop precedences, each associated with a drop probability. When the traffic exceeds the assigned resources, the router will drop packets, but in no case will it reorder them. The notation is AFxy, with x being the class and y the drop precedence.

Finally, Expedited Forwarding (EF) intends to give packets low loss, low delay and low jitter. These packets are limited to a fraction of the total bandwidth.

DSCP class | DSCP value (decimal) | DSCP value (hex) | DSCP value (binary)
--- | --- | --- | ---
BE | 0 | 0x00 | 000000
AF11 | 10 | 0x0a | 001010
AF12 | 12 | 0x0c | 001100
AF13 | 14 | 0x0e | 001110
AF21 | 18 | 0x12 | 010010
AF22 | 20 | 0x14 | 010100
AF23 | 22 | 0x16 | 010110
AF31 | 26 | 0x1a | 011010
AF32 | 28 | 0x1c | 011100
AF33 | 30 | 0x1e | 011110
AF41 | 34 | 0x22 | 100010
AF42 | 36 | 0x24 | 100100
AF43 | 38 | 0x26 | 100110
EF | 46 | 0x2e | 101110
CS0 | 0 | 0x00 | 000000
CS1 | 8 | 0x08 | 001000
CS2 | 16 | 0x10 | 010000
CS3 | 24 | 0x18 | 011000
CS4 | 32 | 0x20 | 100000
CS5 | 40 | 0x28 | 101000
CS6 | 48 | 0x30 | 110000
CS7 | 56 | 0x38 | 111000

Table 3: Differentiated Services classes and values
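The values of Table 3 follow a regular bit pattern, which the sketch below reproduces. The function names are hypothetical; the pattern itself (class in the three most significant bits, drop precedence in the next two) can be read directly from the binary column of the table.

```python
def af_dscp(af_class: int, drop_precedence: int) -> int:
    """DSCP of AFxy: class x in the 3 top bits, drop precedence y in the next 2."""
    if not (1 <= af_class <= 4 and 1 <= drop_precedence <= 3):
        raise ValueError("AF classes are 1..4 and drop precedences are 1..3")
    return (af_class << 3) | (drop_precedence << 1)


def cs_dscp(selector: int) -> int:
    """DSCP of Class Selector CSx: the IP precedence value in the 3 top bits."""
    if not 0 <= selector <= 7:
        raise ValueError("class selectors are 0..7")
    return selector << 3


def tos_byte(dscp: int, ecn: int = 0) -> int:
    """Place the 6-bit DSCP above the 2 ECN bits of the ToS / Traffic Class byte."""
    return (dscp << 2) | ecn


print(af_dscp(1, 1), af_dscp(4, 3), cs_dscp(6))  # 10 38 48
print(hex(tos_byte(46)))                         # 0xb8, the ToS byte of an EF packet
```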


3.5 VPN Technologies

3.5.1 Internet Protocol Security (IPsec)

IPsec is a security standard designed to provide security for IPv4 and IPv6. The use of IPsec can provide protection to the IP and upper layer protocols. The protection is given in the form of four different characteristics:
- Integrity: The data can’t be modified undetectably.
- Authentication: It is possible to verify that the members in the communication are who they claim to be.
- Confidentiality: Information can only be accessed by authorised elements of the communication.
- Anti-replay: Copies of previously sent packets that are injected again into the communication are detected and discarded.

3.5.1.1 IPsec modes

There are two modes of use that affect how the packet is formed: transport mode and tunnel mode.

Transport mode. In IPv4 the security header is located between the original IP header and the payload. In IPv6 the security header is located after the base IP header and extensions but might be located before or after destination options.

Original IP header | IPsec security header | IP payload

Figure 12: IPsec transport mode packet

Tunnel mode. In this mode the whole IP packet is encapsulated inside a new IP packet. This way the original IP header is protected. This mode will be used in the satellite link to protect the original header.

New IP header | IPsec security header | Original IP header | IP payload

Figure 13: IPsec tunnel mode packet

If a communication is made from end to end, it would make no sense to use tunnel mode. Encrypting the original header would be useless because the new IP packet's header would carry the same addresses. Therefore, for end to end secure communications, transport mode gives the same protection at a lower cost in overhead.

3.5.1.2 IPsec security protocols

There are two basic security protocols used by IPsec that can be used either alone or together. Both protocols support both modes.

Authentication Header (AH).
- Provides authentication and (optionally) anti-replay.
- Protects some parts of the packet by calculating the Integrity Check Value from them.
- Protects the IP payload, which contains the upper layers.
- Protects the fields of the IP header that do not change during transit (all but DSCP, ECN, flags, fragment offset, TTL, header checksum). Note that it gives integrity to both the inner and outer IP headers when using tunnel mode, and to the original IP header in transport mode.
- Uses the Internet Key Exchange protocol (IKE) for obtaining the keys.


In the next figure we can see the fields of the AH security header:

AH security header fields | Size
--- | ---
Next header | 1 byte
Payload length | 1 byte
Reserved | 2 bytes
Security Parameters Index (SPI) | 4 bytes
Sequence number (SN) | 4 bytes
Authentication data (Integrity Check Value ICV) | Multiple of 4 bytes

Figure 14: IPsec AH security header fields

The next header field indicates the protocol of the first header found in the AH payload. The SPI is used to identify the security association. The sequence number, as explained later, is part of the anti-replay mechanism. Finally, the ICV is a value calculated from the protected fields. Therefore, if any of those fields changes, the ICV will not match.

Encapsulating Security Payload (ESP).
- Provides authentication, confidentiality and anti-replay. Authentication and confidentiality can work alone or together. However, at least one must be present.
- Uses the Internet Key Exchange protocol (IKE) for obtaining the confidentiality keys.
- Adds a security trailer after the IP payload.

The following figure contains the structure of an ESP packet, with the ESP section that each field belongs to.

ESP section | ESP packet structure | Size
--- | --- | ---
 | Lower layer headers (e.g. IP and Ethernet) | Varies
Security header | Security Parameters Index (SPI) | 4 bytes
Security header | Sequence number (SN) | 4 bytes
 | ESP payload | Varies
TFC | Traffic Flow Confidentiality (TFC) | Varies
Security trailer | Padding | Varies
Security trailer | Pad length | 1 byte
Security trailer | Next header | 1 byte
Security trailer | Authentication data (Integrity Check Value ICV) | Multiple of 4 bytes

Figure 15: IPsec ESP packet structure

Note that the next protocol header (not the “Next header” field) has to be aligned to a multiple of 4 bytes in IPv4 and to a multiple of 8 bytes in IPv6. This means that the total size of the ESP encapsulation has to be chosen to meet this requirement. There are new fields included in ESP that are not present in AH. The TFC is an arbitrary payload added to mask the actual payload size from an attacker. The pad length field contains the length of the padding. The next header field indicates the protocol of the first header found in the ESP payload. Note that in ESP this field comes after the payload, while in AH it comes before it. When the encryption is done to protect the packet over only part of the path, like the satellite link, ESP tunnel mode is preferred over the other possibilities because it fully protects the original IP header. Note that while AH tunnel mode protects the header from being changed, it doesn’t encrypt it.
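The effect of the alignment requirement on the padding can be sketched as follows. The function name is hypothetical and the sketch only considers the payload, the padding and the two one-byte trailer fields (pad length and next header); the `align` parameter is an assumption that stands for the 4-byte or 8-byte alignment mentioned above, or for the block size of the cipher when that is larger.

```python
def esp_pad_length(payload_len: int, align: int = 4) -> int:
    """Padding bytes so that payload + padding + the 2 trailer bytes fill whole blocks."""
    return -(payload_len + 2) % align


print(esp_pad_length(10))      # 0: 10 + 0 + 2 = 12 bytes, a multiple of 4
print(esp_pad_length(11))      # 3: 11 + 3 + 2 = 16 bytes
print(esp_pad_length(13, 16))  # 1: 13 + 1 + 2 = 16 bytes
```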

3.5.1.3 IPsec anti-replay system

The anti-replay system protects against replay attacks. In a replay attack the attacker doesn’t create new packets but instead injects copies of previously captured packets into the network. These could be used, for example, to fake the identity of a user or to confuse a user with out-of-time packets. If a packet or a group of packets represents a bank operation, the attacker might be able to repeat that operation by sending the packets more than once.

All packets are assigned a sequence number (SN). The IPsec anti-replay system is based on a sliding window kept by the receiver and delimited by a lower and an upper limit. If an arriving packet has an SN lower than the window, the packet is considered a replay and is discarded. If the SN is within the window limits, the packet is checked for duplicates and discarded if another packet with the same SN has previously been received. If the SN is higher than the window, an integrity check is performed and, if it is successful, the window is updated. The upper limit of the window always corresponds to the highest SN successfully received, and the lower limit is the upper limit minus the window size.

The window size is a compromise. It has to be kept small to be effective, but not so small that packets arriving slightly out of order are already treated as replays. The recommended window size for 32-bit SNs is 64 packets. The best possible solution would be to keep track of all the received packets and check each new one for duplicates rather than having a window. However, this is infeasible due to memory constraints and, most likely, computational constraints.

[Figure: the anti-replay window over the sequence number space.
SN higher than the window: packets received here are checked for integrity and the window is adapted if the check succeeds.
Upper limit: highest SN received and validated.
SN within the window: packets received here are checked for duplicates.
Lower limit: upper limit minus window size.
SN lower than the window: packets received here are discarded.]

Figure 16: IPsec anti-replay window
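The three cases of the window can be sketched with a bitmap, which is one common way of tracking the SNs already received. The class and method names are hypothetical, the integrity check is reduced to a boolean argument, and the sketch also requires the integrity check for packets inside the window, which is an assumption beyond the description above.

```python
class AntiReplayWindow:
    """Receiver-side sliding window over sequence numbers (SN)."""

    def __init__(self, size: int = 64):
        self.size = size
        self.highest = 0  # upper limit: highest SN validated so far
        self.seen = 0     # bitmap; bit i set means SN (highest - i) was received

    def accept(self, sn: int, integrity_ok: bool = True) -> bool:
        """Return True if the packet is accepted, updating the window."""
        if sn > self.highest:                  # SN higher than the window
            if not integrity_ok:
                return False
            shift = sn - self.highest
            self.seen = ((self.seen << shift) | 1) & ((1 << self.size) - 1)
            self.highest = sn
            return True
        offset = self.highest - sn
        if offset >= self.size:                # SN lower than the window
            return False
        if self.seen & (1 << offset):          # duplicate inside the window
            return False
        if not integrity_ok:
            return False
        self.seen |= 1 << offset
        return True
```

With a window of 64, once SN 100 has been validated the window covers SNs 37 to 100, so a late packet with SN 37 is still accepted while one with SN 36 is discarded.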

3.5.2 High Assurance Internet Protocol Encryptor (HAIPE) HAIPE is a protocol developed by the U.S. National Security Agency. HAIPE uses ESPv3 to encapsulate IPv4 and IPv6 traffic. Therefore, it is similar to IPsec when it comes to protection, issues, etc.


3.5.3 SSL/TLS/HTTPS

Secure Sockets Layer (SSL) and its successor Transport Layer Security (TLS) are two protocols that provide transport layer security. These protocols are mainly used on top of TCP and protect the payload, but not the header. They are applied between a single server and a single client.

Application (protected by TLS)
TLS
Transport
Network
...

Figure 17: TLS packet layer structure

The TLS protocol provides security in the form of reliability, integrity, anti-replay and (optionally) confidentiality through symmetric cryptography. Notice that the IP and TCP header fields are left untouched and therefore, any protocol or mechanism that needs them is capable of using them. HTTPS (HyperText Transfer Protocol Secure) is the use of HTTP over an encrypted SSL or TLS connection.


4 Technical issues

There are different technical issues when using VPNs in interactive broadband satellite networks. The impact of each will be different depending on the scenario.

4.1 Wording

When reading the following sections, the following statements/definitions should be kept in mind:
- When a section title contains both PEP and enhanced protocols, they might both be referred to as “PEP” in the text.
- The term “IPsec VPN” refers to either an IPsec VPN or a VPN using similar protocols like HAIPE or SINA.
- The term “TLS VPN” refers to a TLS VPN / SSL VPN or TLS based VPNs (like HTTPS).
- Problem: a problem is a situation or fact that lowers the system's performance.
- Issue: an issue is when either
  o a feature is needed but the use of a VPN makes it difficult or impossible to implement,
  o VPN deployment causes a problem, or
  o specific characteristics of the scenario cause problems.

4.2 Protocol Enhancing Proxies (PEP) & Enhanced Protocols

4.2.1 Definition Some protocols, like TCP, might not work efficiently in some networks. In these cases it is interesting to use PEPs to improve the performance of the protocols, or use an enhanced version of such protocols. An enhanced version of a protocol differs from the standard protocol in the configuration of some parameters or behaviours. It is a substitute for the original protocol, while PEPs are new elements in the topology configuration that improve the performance of the protocol.

4.2.2 Placement related to VPNs

A note about the figures that follow: the figures are generic and show PEPs / enhanced protocols on both sides. However, if they are only needed on one side, the implementation is still valid. In control case 1, the satellite operator has control over where to place the PEP / enhanced protocols. Therefore, there are no issues.


Figure 18: PEP implementation outside the VPN channel and use of enhanced protocols (Control case 1)

Figure 19: Viability of placing the PEP inside the VPN channel depending on the VPN (Control case 2)

In control case 2, the satellite operator is placed inside the VPN. If the PEPs require access to a header that is protected by the VPN, the PEP cannot be implemented.

Figure 20: PEP placement in control case 3

Finally, in control case 3 the operator has control over only one side. So, unless the side it cannot influence already supports the solution, the satellite operator can only use PEPs / enhanced protocols that don’t require implementation on both sides.


4.2.3 Bandwidth delay product

4.2.3.1 Problem Given the formula for maximum bandwidth or throughput seen in 3.1.2, the maximum throughput for a TCP connection can be calculated.

This buffer in TCP is given by the sliding window mechanism and its size is 64 kBytes. Due to the high round trip delay of a geostationary satellite link (RTT around 550 ms), the throughput using the maximum window size will be limited to almost 1 Mbps (65,535 bytes × 8 / 0.55 s ≈ 0.95 Mbps).

If a higher throughput were needed, for example 2.5 Mbps, the buffer for unacknowledged data would have to be bigger than 64 kBytes (2.5 Mbps × 0.55 s / 8 ≈ 172 kBytes).

This problem can be solved using an enhanced version of TCP that widens the window (window scaling). However, this requires support from both sides of the communication. Another solution would be to use a TCP accelerating PEP, which splits the connection before it enters the satellite link. The PEP sends back an ACK message as soon as it receives a packet, so the transmitter sees a lower RTT. The communication through the satellite link is then carried by a protocol other than TCP. This solution also requires support from both sides, because the receiving side expects TCP packets and therefore a second PEP must be placed after the satellite link to convert the traffic back to TCP.
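The figures above can be checked with a short sketch. It assumes that the relation from section 3.1.2 is throughput = window / RTT, a 65,535-byte maximum window without scaling and an RTT of exactly 550 ms. The function names are illustrative only.

```python
import math

RTT_S = 0.55               # geostationary round trip time quoted in the text
MAX_WINDOW_BYTES = 65535   # largest window without window scaling (assumption: 16-bit field)

def max_throughput_bps(window_bytes, rtt_s):
    """Upper bound on throughput when one full window is sent per round trip."""
    return window_bytes * 8 / rtt_s

def required_window_bytes(target_bps, rtt_s):
    """Bytes that must be in flight (unacknowledged) to sustain target_bps."""
    return target_bps * rtt_s / 8

def min_window_scale_shift(window_bytes):
    """Smallest shift s such that 65535 * 2**s covers window_bytes."""
    return max(0, math.ceil(math.log2(window_bytes / MAX_WINDOW_BYTES)))

limit = max_throughput_bps(MAX_WINDOW_BYTES, RTT_S)
needed = required_window_bytes(2.5e6, RTT_S)
print(f"limit without scaling: {limit / 1e6:.2f} Mbps")
print(f"window for 2.5 Mbps: {needed / 1000:.0f} kBytes, "
      f"scale shift {min_window_scale_shift(needed)}")
```

Under these assumptions a scale shift of 2 (a window of up to about 262 kBytes) would be enough for the 2.5 Mbps example.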

4.2.3.2 Issue Implementing an enhanced TCP protocol requires access to the TCP layer. Therefore, window scaling cannot be implemented inside an IPsec VPN, because that layer is encrypted. In the case of a TLS VPN there is no issue, because the TCP header is accessible.
Let’s look at the IPsec VPN issue for each control case. In control case 1, the satellite operator controls the VPN and can decide whether to place the enhancement functions before or after it. Users could therefore configure their machines to use the solution, or the satellite operator could place a proxy implementing the enhanced TCP right before the VPN. In control case 2, the satellite operator is placed inside the IPsec VPN and has no control over it, so there is nothing it can do. In control case 3 there is one side that the operator cannot control, and both sides are required for this solution to work.
When splitting the connection, a TLS VPN is not a problem. However, splitting the connection inside an IPsec VPN is not possible. The proxy that generates the “fake” ACK packets would not be able to see the TCP headers if encryption is applied. Even without encryption, it would still need security credentials to create the IPsec headers. Therefore, this solution can only be implemented when the splitting is done outside the IPsec VPN (control case 1).


4.2.4 TCP slow start

4.2.4.1 Problem TCP is designed to adapt the traffic flow to the network status. For that, the sender uses the congestion window, which limits the transmission rate. When TCP segments are acknowledged the window increases and so does the transmission rate. When packets are lost, it is likely that the network is congested and so the window decreases. This assumption holds for wired terrestrial networks, for which TCP was originally designed. However, on satellite links the loss might be caused by the high BER, in which case the window is reduced without need. In addition, because satellite links have a high RTT, it takes a long time to receive the ACKs and the transmission rate increases slowly. An enhanced implementation of TCP could be deployed at the transmitter’s side that, for example, uses a congestion window with a large initial size. Another approach could be to install a proxy that splits the connection before it enters the satellite link, as seen for the BDP problem, but without necessarily changing the protocol if there is no BDP problem. The proxy sends back the ACK message as soon as it receives a packet, so the sender doesn’t suffer the satellite delay during start-up.
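The effect of a larger initial window can be estimated with a small sketch. It assumes an idealised slow start in which the congestion window doubles every round trip, with no losses and no delayed ACKs, and a segment size of 1460 bytes. The function name and the target of 45 segments (about 64 kBytes) are illustrative assumptions.

```python
RTT_S = 0.55          # geostationary round trip time quoted in the text
TARGET_SEGMENTS = 45  # about 64 kBytes of 1460-byte segments (assumed segment size)

def rtts_to_reach(target_segments, initial_segments=1):
    """Round trips of idealised slow start until cwnd >= target_segments."""
    cwnd, rounds = initial_segments, 0
    while cwnd < target_segments:
        cwnd *= 2      # each acknowledged segment adds one segment: cwnd doubles per RTT
        rounds += 1
    return rounds

for initial in (1, 10):
    rounds = rtts_to_reach(TARGET_SEGMENTS, initial)
    print(f"initial window {initial}: {rounds} round trips, "
          f"about {rounds * RTT_S:.2f} s")
```

In this simplified model, raising the initial window from 1 to 10 segments halves the number of round trips needed to fill a 64 kByte window.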

4.2.4.2 Issue Modifying the congestion window means changing a TCP parameter. Therefore, it is possible to do it in TLS VPNs. IPsec VPNs, though, will be an issue. When using an IPsec VPN, control case 1 will not be a problem: the satellite operator can influence the user to change the initial size of the window. In control case 2 there is no control whatsoever, so no solution can be implemented. Control case 3 depends on which side we control / influence. Because the solution only requires modifying the transmitter, it can be implemented if we control the transmitting side. When splitting the connection, the same issues as with the BDP apply. The only difference is that the splitting can be done on only one side, so it could also be implemented in control case 3.

4.2.5 Continuous acknowledgements

4.2.5.1 Problem The receiver sends back ACK messages continuously. Since the bandwidth on the return link is usually smaller, the ACK traffic consumes a significant share of it. Even worse, if the return link's bandwidth is very limited, it might be too small to transmit the TCP ACKs as well as all other packets, effectively limiting the forward link throughput. The use of selective acknowledgments would reduce this problem. The receiver reports which blocks of data have arrived, so the transmitter knows exactly which packets have to be retransmitted. Both ends must support selective acknowledgments.

4.2.5.2 Issue The issue is exactly the same as with the bandwidth delay product.


4.2.6 Frequently revised content

4.2.6.1 Problem Some applications are frequently used in the same way; for example, by a group of users with similar internet browsing behaviour. This leads to the same information being transmitted over and over through the satellite link, which is an inefficient use of a relatively scarce resource, the bandwidth. A solution to this problem could be a proxy that caches this information and stores it locally for future use.

4.2.6.2 Issue A cache proxy inspects the data at application level. The use of TLS or IPsec VPNs is an issue, because the data is protected in both cases. Therefore, the proxy should be placed outside the VPN. What’s more, the use of the proxy might require the user to change some configuration settings, so it has to be possible to influence the user. This means that only control cases 1 and 3 are suited for this solution. In some cases, the proxy can be implemented inside the VPN. Such proxies actually break the VPN into two VPNs, so the user would have to trust the proxy.

4.2.7 Redundancy

4.2.7.1 Problem Some parts of the packet, like protocol headers or application data, might contain redundancy. That means that we are consuming more bandwidth than strictly needed. Bandwidth on satellite links is expensive and its consumption should be reduced whenever possible. To reduce the amount of data, compression can be applied. This requires that both ends agree on using compression, since the decompression has to be performed at the receiver’s end.

4.2.7.2 Issue In order to apply compression, access to the data is required. That means that inside an IPsec VPN it is not possible to apply compression to the transport layer header and above. Inside TLS based VPNs no application data compression is possible.
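Why compression has to happen before the data is protected can be illustrated with a small sketch. It uses Python's standard zlib module and, as an assumption, stands in for encrypted data with deterministic high-entropy bytes derived from SHA-256; no real encryption is performed, and the sample HTTP request is invented for illustration.

```python
import hashlib
import zlib

def pseudo_ciphertext(n_bytes):
    """Deterministic high-entropy bytes standing in for encrypted data (not real encryption)."""
    out = b""
    counter = 0
    while len(out) < n_bytes:
        out += hashlib.sha256(counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n_bytes]

# Redundant application data: the same request repeated many times.
plaintext = b"GET /index.html HTTP/1.1\r\nHost: example.org\r\n\r\n" * 100
protected = pseudo_ciphertext(len(plaintext))

plain_ratio = len(zlib.compress(plaintext)) / len(plaintext)
protected_ratio = len(zlib.compress(protected)) / len(protected)
print(f"compressed/original, plaintext: {plain_ratio:.2f}")
print(f"compressed/original, protected: {protected_ratio:.2f}")
```

The redundant plaintext shrinks to a small fraction of its size, while the high-entropy stand-in for protected data barely shrinks, if at all. A compressor placed inside the VPN therefore gains little on the encrypted parts of the packet.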

4.3 IP fragmentation

4.3.1 Fragmentation The IP Maximum Transmission Unit (MTU) is the largest size that the IP header and payload together can take. It might have a different value on different links. A large MTU is desirable because it maximises the ratio of payload to header size. However, the MTU cannot be too large or else we increase the probability of a packet having an error, in which case the whole packet must be retransmitted. Also, large packets can occupy slow links for a long time. If a packet is larger than the MTU it has to be fragmented, that is, broken into new smaller packets. Fragmentation causes overhead; each new fragment has its own header. This increases the traffic load. Also, the computational load of fragmenting and reassembling is not negligible. If fragmentation and reassembly are to be performed, then it is better to avoid doing them at the routers, for they are the nodes with the highest traffic volume in the network. The Path MTU (PMTU) is the minimum MTU along the path between a transmitter and a receiver. If a packet is larger than the PMTU, it is certain to be fragmented somewhere along the way.

4.3.2 VPN issue The issue with VPNs is that they add overhead and therefore increase the packet size. If this isn’t taken into account and the user chooses a packet size close to the PMTU, then the packet plus the security overhead might exceed the PMTU. If that is the case, fragmentation will occur and cause the previously mentioned problems. If fragmentation is applied before the satellite operator’s network, then that network will suffer from the additional traffic load caused by the fragments. However, if it is the operator that applies the VPN and this causes the fragmentation, or if its network has a low MTU that forces fragmentation because of the overhead, then the operator will suffer both the traffic load and the computational effort.
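The effect of the security overhead on fragmentation can be sketched as follows. The example assumes IPv4 without options (20-byte header), a PMTU of 1500 bytes and the worst-case ESP tunnel overhead of 61 bytes; the function names are illustrative only.

```python
import math

IPV4_HEADER = 20  # bytes, assuming no IP options

def fragment_count(packet_len, mtu, header_len=IPV4_HEADER):
    """Number of IPv4 fragments needed for a packet of packet_len bytes (header included)."""
    if packet_len <= mtu:
        return 1
    payload = packet_len - header_len
    # Every fragment except the last carries a payload that is a multiple of 8 bytes.
    per_fragment = (mtu - header_len) // 8 * 8
    return math.ceil(payload / per_fragment)

def on_wire_bytes(packet_len, mtu, header_len=IPV4_HEADER):
    """Total bytes sent once every additional fragment has received its own header."""
    extra_fragments = fragment_count(packet_len, mtu, header_len) - 1
    return packet_len + extra_fragments * header_len

PMTU = 1500
ESP_TUNNEL_OVERHEAD = 61  # assumed worst case for ESP tunnel mode over IPv4

for inner in (1400, 1480):
    protected = inner + ESP_TUNNEL_OVERHEAD
    print(inner, protected, fragment_count(protected, PMTU), on_wire_bytes(protected, PMTU))
```

With these assumptions a 1400-byte packet still fits after protection, while a 1480-byte packet grows to 1541 bytes and is split into two fragments.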

4.4 Overhead bandwidth consumption As said before, the addition of overhead increases the traffic load. However, the impact of this overhead depends on the situation. For smaller payloads (like VoIP) the relative increase is significant. For larger payloads the problem isn't as bad, but it is still present. Therefore it seems reasonable to do something to decrease the overhead, like compression. Depending on the VPN, the overhead comes from:
- The additional IP header (IPsec tunnel mode).
- Security header.
- Security trailer. Includes padding on IPsec ESP.
- Encryption.
- Authentication data (integrity check value, ICV).
- Additional padding to mask the payload size (e.g. ESP's Traffic Flow Confidentiality, TFC).
Looking at the values in Table 4, if we consider that VoIP packets coded with G.729 have a payload size of 20 bytes (smallest payload size according to [Cisco on VoIP]), we can observe that the overhead is from one to five times the payload size. Note that if the satellite operator is inside the VPN, it can still compress / decompress the header at the satellite link ends.


Configuration | Additional IP header | Security header | Security trailer | Encryption | Authentication data | Additional padding | Total
--- | --- | --- | --- | --- | --- | --- | ---
ESP Transport mode | 0 | 8 bytes | 2 to 17 bytes | 16 bytes, AES in CBC mode (128-bit) | 0 | Depends on the implementation + TFC | 26-41 bytes
ESP Tunnel mode | 20 bytes (IPv4), 40 bytes (IPv6) | 8 bytes | 2 to 17 bytes | 16 bytes, AES in CBC mode (128-bit) | 0 | Depends on the implementation + TFC | 46-61 bytes (IPv4), 66-81 bytes (IPv6)
AH Transport mode | 0 | 12 bytes | 0 | 0 | 12 bytes, HMAC-SHA1-96 | 0 | 24 bytes
AH Tunnel mode | 20 bytes (IPv4), 40 bytes (IPv6) | 12 bytes | 0 | 0 | 12 bytes, HMAC-SHA1-96 | 0 | 44 bytes (IPv4), 64 bytes (IPv6)
ESP+AH Transport mode | 0 | 8+12 bytes | 2 to 17 bytes | 16 bytes, AES in CBC mode (128-bit) | 12 bytes, HMAC-SHA1-96 | Depends on the implementation + TFC | 50-65 bytes
ESP+AH Tunnel mode | 20 bytes (IPv4), 40 bytes (IPv6) | 8+12 bytes | 2 to 17 bytes | 16 bytes, AES in CBC mode (128-bit) | 12 bytes, HMAC-SHA1-96 | Depends on the implementation + TFC | 70-85 bytes (IPv4), 90-105 bytes (IPv6)
Authentication ESP Transport mode | 0 | 8 bytes | 2 to 17 bytes | 16 bytes, AES in CBC mode (128-bit) | 12 bytes, HMAC-SHA1-96 | Depends on the implementation + TFC | 38-53 bytes
Authentication ESP Tunnel mode | 20 bytes (IPv4), 40 bytes (IPv6) | 8 bytes | 2 to 17 bytes | 16 bytes, AES in CBC mode (128-bit) | 12 bytes, HMAC-SHA1-96 | Depends on the implementation + TFC | 58-73 bytes (IPv4), 78-93 bytes (IPv6)
TLS | 0 | 5 bytes | 0 | 16 bytes, AES in CBC mode (128-bit) | 20 bytes, SHA1 | 0 to 15 bytes | 41-56 bytes

Table 4: Security overhead for different VPNs
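The "one to five times" statement for VoIP can be reproduced from the totals of Table 4. The sketch below copies a few of those totals (TFC padding excluded) and divides them by the 20-byte G.729 payload; the dictionary keys and the function name are illustrative.

```python
PAYLOAD = 20  # bytes, G.729 VoIP payload quoted in the text

# (minimum, maximum) total overhead in bytes, copied from Table 4, TFC padding excluded
OVERHEAD = {
    "AH transport": (24, 24),
    "ESP transport": (26, 41),
    "TLS": (41, 56),
    "ESP tunnel, IPv4": (46, 61),
    "ESP+AH tunnel, IPv6": (90, 105),
}

def overhead_ratio(overhead_bytes, payload_bytes=PAYLOAD):
    """Overhead expressed as a multiple of the payload size."""
    return overhead_bytes / payload_bytes

for name, (low, high) in OVERHEAD.items():
    print(f"{name}: {overhead_ratio(low):.2f}x to {overhead_ratio(high):.2f}x")

smallest = min(overhead_ratio(low) for low, _ in OVERHEAD.values())
largest = max(overhead_ratio(high) for _, high in OVERHEAD.values())
```

The extremes are AH in transport mode (1.2 times the payload) and ESP+AH in tunnel mode over IPv6 (5.25 times the payload).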


4.5 Issues with the IPsec anti-replay system Both IPsec and TLS have an anti-replay system. However, the IPsec one causes some issues when different Quality of Service (QoS) priorities are applied. When using different QoS priorities, the packets are likely to be reordered. A low priority packet could wait a long time before being sent, allowing higher priority packets to be sent before it. This creates the risk of the packet arriving outside the anti-replay window: if too many high priority packets are sent, eventually one will have a sequence number (SN) high enough to move the window boundary above the SN of the low priority packet. A few conditions have to be met for this problem to take place:
- The packets with different priorities are in the same security association (SA); otherwise they don't affect each other’s anti-replay window.
- QoS is applied after the SN is assigned (inside the VPN); otherwise the packets are assigned their SN when they are already reordered.
- The user has set the ToS field and it is either visible (transport mode) or copied to the new header (tunnel mode).
- The packets have such a low priority, or the bandwidth becomes so scarce, that the delay before they are sent is higher than the window size + 1 (i.e. 64 + 1 = 65 packets).
It should be noted that if these conditions are met, the eventual loss of such low priority packets might not be too critical, for they were already marked as low priority.
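The interaction between reordering and the anti-replay window can be made concrete with a minimal sketch of a sliding-window check. It assumes a 64-packet window and sequence numbers starting at 1. The class and method names are illustrative and are not taken from any IPsec implementation.

```python
class AntiReplayWindow:
    """Simplified receiver-side anti-replay check for one security association."""

    def __init__(self, size=64):
        self.size = size
        self.top = 0        # highest sequence number accepted so far
        self.seen = set()   # accepted sequence numbers still inside the window

    def accept(self, sn):
        if sn > self.top:
            # A new highest SN moves the window forward.
            self.top = sn
            self.seen.add(sn)
            self.seen = {s for s in self.seen if s > self.top - self.size}
            return True
        if sn <= self.top - self.size:
            return False    # too old: falls below the window
        if sn in self.seen:
            return False    # duplicate: treated as a replay
        self.seen.add(sn)
        return True

# A low priority packet (SN 1) is held back while high priority packets overtake it.
window = AntiReplayWindow(64)
for sn in range(2, 66):     # SNs 2..65 arrive first
    window.accept(sn)
late_packet_accepted = window.accept(1)
print(late_packet_accepted)
```

In this model the delayed packet is still accepted while the highest received SN is 64 or lower, and it is dropped once SN 65 has been received.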

4.6 Multicast Multicasting and broadcasting are attractive when using satellites because the coverage tends to be large. However, because the packets can easily be intercepted by others (the propagation medium is space / air), the need to secure the transmissions can arise in certain situations. Because TLS VPNs are usually applied in a client / server model (2 users), there is no need for multicast. IPsec support of multicast depends on the mode of operation. In transport mode, the original IP addresses are kept and multicast can be implemented. In tunnel mode, however, the address should be copied from the inner to the outer header or the data might be sent along the wrong path. Also, the use of a secure multicast group presents additional problems related to key management, like members joining or leaving, large groups and safe re-keying.

4.7 Mobility Mobile applications are subject to different problems than static ones. For instance, changing from one place/context to another might result in a handover. For example, when using satellite telephony like Iridium, the satellite in use might move out of view and the connection is handed over to a new satellite. When changing from one network to another, the IP address will change. The issue with mobility is that VPNs require the same IP address during the whole session (static IP). The problem with re-establishing the SA with the new IP address is that it takes a long time (2 round trips for IKEv2, about 1 second with geostationary satellites) and that the user notices the handover because the application stops. The possible solutions are: keeping the same IP address during the whole connection, accepting multiple IP addresses for one SA, or re-establishing the security association in every new context. It seems logical to look for the solution among the first two options.

4.8 QoS enforcement Quality of Service (QoS) is applied to give different priorities to different types of traffic. For example, real-time applications like VoIP require a minimum bit rate at all times, while others like FTP aren’t affected if the resource assignment varies greatly. QoS policies usually classify traffic depending on the Differentiated Services Code Point field (DSCP or ToS, Type of Service) or on a group of parameters (traffic classifiers):
- Source IP addresses.
- Destination IP addresses.
- Protocol type.
- Source ports.
- Destination ports.
All these fields can be found in either the IP or the transport header. Therefore, TLS VPNs will not be an issue but IPsec VPNs in tunnel mode will. IPsec transport mode works because the 5-tuple and the ToS field (DSCP) are preserved. IPsec tunnel mode requires the DSCP to be copied into the outer IP header, as the original is stored in the inner header.
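The tunnel mode behaviour can be sketched with a toy model of the headers. It assumes that the classifier at the satellite link only sees the outermost IP header and that DSCP value 46 marks Expedited Forwarding traffic such as VoIP. The types, function names and addresses are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class IPHeader:
    src: str
    dst: str
    dscp: int = 0

def tunnel_encapsulate(inner, gw_src, gw_dst, copy_dscp):
    """Outer header that a gateway would prepend in tunnel mode (simplified sketch)."""
    return IPHeader(gw_src, gw_dst, inner.dscp if copy_dscp else 0)

def classify(visible_header):
    """Classifier at the satellite link: it only sees the outermost header."""
    return "expedited" if visible_header.dscp == 46 else "best-effort"

voip = IPHeader("10.0.0.5", "10.1.0.9", dscp=46)  # DSCP 46 = Expedited Forwarding

without_copy = tunnel_encapsulate(voip, "192.0.2.1", "198.51.100.1", copy_dscp=False)
with_copy = tunnel_encapsulate(voip, "192.0.2.1", "198.51.100.1", copy_dscp=True)
print(classify(without_copy), classify(with_copy))
```

Without the copy, the VoIP packet is treated as best-effort traffic on the satellite link even though its inner header requests a higher priority.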

4.9 Network Address Translation The IP Network Address Translation (NAT) mechanism was developed to overcome the problem of IPv4 address scarcity. The mechanism maps multiple private IP addresses onto a single public IP address. It makes use of the underutilised port number space to route the traffic to the different members of the private network. NAT requires being able to change the IP address as well as the TCP or UDP port numbers. Therefore there is no problem when using TLS VPNs, but with IPsec these fields are protected. In particular, there are a few incompatibilities with IPsec:
- IPsec AH is totally incompatible with NAT. The IP addresses are protected and cannot be changed.
- IPsec ESP: the TCP/UDP checksum is protected against manipulation and NAT cannot modify it. In tunnel mode, if the NAT only changes the IP address (and not the ports), then it is fine. However, only one host behind a NAT can be supported, because if both hosts are behind a NAT the connection cannot be initiated. A host with a public IP is required to initiate the communication.
- If IP addresses are used as identifiers in IKE, the packets will not be correctly identified once NAT changes them.
- If more than one user behind a NAT gateway initiates an SA to the same IPsec responder, the responder will see them as the same user (with the public IP address of the NAT gateway).
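The port-based mapping that NAT relies on can be sketched as a small translation table. This is a simplified model with hypothetical names and addresses, not the behaviour of any particular NAT product. It shows why the translator needs to read and rewrite the transport ports, which it cannot do when they are protected by IPsec.

```python
class Napt:
    """Toy network address and port translator for outgoing TCP/UDP flows."""

    def __init__(self, public_ip, first_port=40000):
        self.public_ip = public_ip
        self.next_port = first_port
        self.out = {}    # (private_ip, private_port) -> public_port
        self.back = {}   # public_port -> (private_ip, private_port)

    def translate_out(self, private_ip, private_port):
        """Rewrite the source of an outgoing packet, allocating a public port if needed."""
        key = (private_ip, private_port)
        if key not in self.out:
            self.out[key] = self.next_port
            self.back[self.next_port] = key
            self.next_port += 1
        return self.public_ip, self.out[key]

    def translate_in(self, public_port):
        """Find the private host for a returning packet, or None if there is no mapping."""
        return self.back.get(public_port)

nat = Napt("203.0.113.1")
first = nat.translate_out("192.168.0.2", 5000)
second = nat.translate_out("192.168.0.3", 5000)
print(first, second, nat.translate_in(second[1]))
```

Two private hosts using the same source port are told apart only by the public port assigned to each flow. When the ports are not visible to the translator, this distinction is lost.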


5 Reference scenarios

5.1 Definition of the scenarios The scenarios have to be chosen so that together they cover as many situations as possible. For that, a few requirements are given by ESA:
- Each type of VPN shall be addressed in at least one scenario.
- All control cases should be covered by the scenarios.
- At least one scenario should be based on a mesh satellite network and the rest will be based on star networks.
- Both fixed and mobile satellite terminals should be covered by the scenarios.
- Multicast and mobility should also be considered in at least one scenario.
Based on these requirements, a list of scenarios was developed as part of task 1:
- Reporter scenario
- Public safety communications
- Connection to embassies
- Interconnection of military networks
- Backhaul scenario
- Aeronautical and maritime communications
- SCADA scenario
- Corporate scenario
- ISP scenario
- Consumer scenario
As part of task 2, four scenarios were omitted due to their similarities with others (interconnection of military networks, corporate, SCADA and reporter). Then, of the remaining 6, 4 were finally chosen as they cover all the requirements (public safety, ISP, airplanes and consumer).

5.2 Public safety communications

5.2.1 Scenario description Public safety organisations provide services to the community that are essential in events that could endanger people. When working in emergency situations, they usually have priority over all other services. People expect these organisations to be well equipped, organised and ready to act whenever they are needed. Their communications have to be fast and reliable, even in adverse situations. Therefore, the network has to be available at all times, especially in emergencies or in isolated places. That is why a satellite connection is preferred. Some examples of public safety organisations are:
- Police
- Road traffic safety
- Fire fighters
- Civil protection
- Public health
- Food safety
The users of the network will be the different teams deployed by the organisation and the command centre or headquarters (HQ). These teams will likely have their own network in the field, comprising a variety of devices like computers, cameras, sensors, etc. Also, they will be required to move from one place to another and therefore mobility will be an issue. Communication between the teams and the HQ will be done through the satellite link and the internet. Interconnection between different teams will be done directly, without passing through the HQ. Otherwise, we would have to use the satellite link twice (Team 1 -> HQ and HQ -> Team 2), effectively doubling the delay (double hop). Therefore, this scenario uses a mesh network topology. It is likely that the same data has to be sent to different teams at once, so multicast support would be desirable.

5.2.2 Types of communications

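[Diagram: command control centre with databases and information servers, connected through the internet to the satellite hub; bidirectional satellite links from the hub to the gateways (GW) of on-site network 1 and on-site network 2, which contain a laptop, a tablet PC, a camera and sensors. The paths of Communication 1, Communication 2 and Communication 3 are marked.]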

Figure 21: Public safety scenario network setup [ESA Report]

As we can see in the figure, the team networks are interconnected using the satellite hub. Also, they can access the internet using the hub and, through it, the HQ. We can define three types of communications:
- Communication 1: communication between one of the teams and the HQ. The path will cross the internet. Because the satellite operator has some influence over both the HQ and the teams (they belong to the same organisation), this corresponds to control case 1.
- Communication 2: two teams exchange data. The path will not cross the internet. For the same reasons as communication 1, this is control case 1.
- Communication 3: one of the teams accesses the internet for information. Since the satellite operator has no influence over the internet services the team is accessing, this is control case 3.
Notice that if the HQ wants to access the internet, it will not use the satellite link, so this communication is out of the scope of the project.

5.2.3 Security choices This scenario requires encryption of the data to ensure that possible attackers are unable to read it (e.g. a criminal being followed by the police). Some sensitive personal data that an attacker might want to sniff could also be transferred. In addition, the data should be protected against manipulation and spoofing, therefore integrity and authentication are required. Finally, anti-replay is also required to prevent an attacker from sending a false alarm or suppressing an alarm. For communications 1 and 2, the data should be protected from the team network’s gateway to the HQ’s gateway and to the other network’s gateway, respectively. An IPsec VPN is deployed, using tunnel mode. Because encryption is needed, ESP is chosen over AH. For communication 3, IPsec might not be available on the internet side, so it is more suitable to deploy a TLS/HTTPS VPN.

5.2.4 VPN issues

5.2.4.1 PEP and enhanced protocols PEP functionality is useful in this scenario to address the previously seen problems, like the Bandwidth Delay Product. In communications 1 and 2, the VPN uses IPsec ESP in tunnel mode from gateway to gateway. In order for PEPs to work, they should be placed before the IPsec gateway. Because this is control case 1, the satellite operator can recommend the use of PEPs and enhanced versions of the protocols to the public safety organisation. Communication 3 uses TLS, so the use of PEPs inside the VPN, at the satellite link, is possible except for application data compression and caching (unless a dedicated proxy is installed).

5.2.4.2 IP fragmentation Communications 1 and 3 go through the internet, meaning that the PMTU is unknown. Therefore it is possible that the satellite operator has to perform fragmentation, suffering from the computational load. If fragmentation occurs elsewhere in the path, the operator will still have to support the additional traffic load. To avoid having to perform the fragmentation at the satellite gateways, the operator should ensure that the MTU of the link is at least as big as that of the team/HQ networks. If that is the case, packets that haven’t yet travelled through the internet will not cause fragmentation at the satellite gateways. Packets coming from the internet might already be fragmented, but at least the computational load of fragmentation is avoided. Communication 2 is different because the path doesn’t cross the internet. The satellite operator is in control of part of the path and the rest belongs to the client. Usually, devices are connected via either Ethernet (MTU = 1500 bytes) or WLAN (MTU = 2272 bytes), which means that, if the satellite MTU is at least equal to Ethernet’s, the PMTU will be 1500 bytes. If the overhead from the VPN is taken into account when choosing packet sizes so as not to exceed this value, then no fragmentation will take place.
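The packet size budget for communication 2 can be sketched as follows. The satellite link MTU of 1500 bytes and the 61-byte worst-case ESP tunnel overhead are assumptions made for this example, and the function names are illustrative.

```python
def path_mtu(link_mtus):
    """The PMTU is the smallest MTU among the links on the path."""
    return min(link_mtus)

def max_inner_packet(pmtu, vpn_overhead):
    """Largest packet a host can send so that the protected packet still fits the PMTU."""
    return pmtu - vpn_overhead

# MTUs in bytes; the satellite value is an assumption for this example.
links = {"WLAN": 2272, "Ethernet": 1500, "satellite": 1500}
pmtu = path_mtu(links.values())
budget = max_inner_packet(pmtu, vpn_overhead=61)
print(pmtu, budget)
```

Under these assumptions, hosts in the team networks would have to keep their packets at or below 1439 bytes to avoid fragmentation after the VPN overhead is added.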

5.2.4.3 Overhead Overhead from using VPN is inevitable. However, we can reduce it through the use of compression. In this scenario it can be important, for it is likely that some applications like VoIP generate small packets.

5.2.4.4 IPsec anti-replay Communication 3 is a TLS-based VPN so there are no anti-replay issues. However, the other two will suffer from this if the conditions mentioned in 4.5 are met.

5.2.4.5 Multicast Multicast might be a desired feature for communications 1 and 2. However, since they deploy an IPsec VPN in tunnel mode, the multicast IP destination will be protected inside the payload.

5.2.4.6 Mobility While the teams might be mobile, it is unlikely that they travel distances large enough to require a gateway or satellite handover. It is much more likely that they switch from one WAN connectivity technology to another (3G, DSL, satellite link). This will cause a change in the IP address and so a pause in the service while the VPN is re-established.

5.2.4.7 QoS enforcement Communications 1 and 2 are likely to differentiate the traffic according to its priority. However, the ToS field is found in the inner header of the IPsec packet (tunnel mode) and so, unless the field is copied to the outer header, there is no way for the satellite link to differentiate the packets. Therefore, packets with high priority might be dropped before lower priority ones. This field is accessible in TLS based VPNs, so communication 3 will have no QoS enforcement issue.

5.2.4.8 NAT The issue with NAT arises when NAT is found inside an IPsec VPN (communications 1 and 2). However, in this scenario the IPsec VPN goes from gateway to gateway and is therefore unaffected by the members of the team or HQ networks having private IPs. The problem comes when elements on the VPN path (IPsec gateways or satellite terminals) don’t have a public IP.


5.3 ISP scenario

5.3.1 Scenario description In this scenario, an Internet Service Provider (ISP) buys satellite bandwidth from the satellite operator. It seeks to link its terrestrial network to the rest of the internet, or to part of its own backbone for redundancy. The ISP sells internet services to other customers, like home users who want a high speed connection to the internet, companies or organisations. Note that these customers might be unaware that there is a satellite link in their path.

5.3.2 Types of communications

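[Diagram: customer of the terrestrial ISP (location 1) attached to the network of the terrestrial ISP, which is connected over bidirectional satellite links and the satellite hub to the internet backbone; servers and a customer of the terrestrial ISP (location 2) are reached through the backbone. The paths of Communication 1 and Communication 2 are marked.]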

Figure 22: ISP scenarios network setup [ESA Report]

As seen in Figure 22, a customer of the ISP might establish two kinds of secure communications. Because the satellite operator has no influence over the final customer, which will be the one setting up the VPN, this corresponds to control case 2.
- Communication 1: the customer is an organisation or company that wants to connect two of its branches (networks).
- Communication 2: the customer connects to a terminal that is on the internet but doesn’t belong to the ISP network.

5.3.3 Security choices The customer will set up its own VPN because it doesn’t necessarily trust the ISP. In communication 1, an IPsec VPN in tunnel mode will be established between the two customer networks. In communication 2, sensitive information exchanged with servers on the internet will be protected using a TLS/HTTPS VPN.


5.3.4 VPN issues

5.3.4.1 PEP and enhanced protocols PEPs and enhanced protocols would be useful in this scenario. If the ISP is selling high speed internet, its customers might be dissatisfied if they are limited by the Bandwidth Delay Product problem. The satellite operator is inside the VPN and has no influence over the ends of the VPN (control case 2). So, in communication 1 (IPsec VPN) it will not be possible to implement any kind of PEP. In communication 2, because it is a TLS VPN, the only PEPs that cannot be implemented are application data compression and cache proxies. The latter aren’t an option because the satellite operator doesn’t know the ISP’s clients and, even if it did, they might not trust a third party.

5.3.4.2 IP fragmentation In this scenario the satellite operator has no control over, or even knowledge of, the PMTU. Therefore, the MTU of the satellite link should be as big as possible, so that it is not the smallest MTU on the path and the operator is not forced to fragment.

5.3.4.3 Overhead In the ISP scenario, just like in the public safety one, there are different applications that generate a variety of packet sizes. For applications like VoIP that generate small packets, the overhead is significant. Therefore, it is recommended to apply header compression to reduce the problem.

5.3.4.4 IPsec anti-replay This issue affects only IPsec based VPNs (communication 1). Losing low priority packets will happen under the circumstances described in 4.5.

5.3.4.5 Multicast The communication using TLS will have no problem when multicasting. However, the IPsec based VPN will prevent the use of multicast unless the right IPsec extension to add multicast support is used. The satellite operator has no influence over the IPsec gateways and so, it can’t make any suggestions.

5.3.4.6 Mobility In case the end users are mobile, it is not a problem of the satellite operator.

5.3.4.7 QoS enforcement QoS can be requested in two different situations. First, the ISP wants to give priority to certain users (identified by their IP addresses). Then QoS enforcement can be applied as long as the outer header addresses are used. Second, the ISP wants to differentiate different kinds of traffic. Then the QoS enforcement cannot be applied in the IPsec tunnel if the ToS label is not copied to the outer header or if it uses the other 5 traffic classifiers as described in section 4.8.

5.3.4.8 NAT Just like multicast or mobility, the satellite operator is not involved in the NAT process and cannot recommend or deploy any solutions.


5.4 Aeroplane scenario

5.4.1 Scenario description The aeroplane scenario represents the communications when one of the networks is mobile, in this case an airplane. The crew needs to communicate with the ground to operate the aircraft. In addition, the passengers are provided with a service that includes connection to the internet and mobile communications (UMTS, etc). The communication service is provided by the Air Communication Service Provider (ACSP), which connects the aircraft with both the internet and the Air Navigation Service Provider (ANSP). This communication is established using the satellite link between the aircraft and the ACSP WAN and then connecting the ACSP WAN with the ANSP and the internet.

5.4.2 Types of communications

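[Diagram: on the ground, the ANSP WAN and the internet are connected to the ACSP WAN through an ATS/AOC IPSec GW and an APC IPSec GW, together with the HA; the ACSP WAN reaches the aircraft through the satellite hub and bidirectional satellite links. On board, the MR connects an ATS/AOC IPSec GW serving the cockpit end system and an APC IPSec GW serving the WLAN AP and passenger laptops. The paths of Communication 1 and Communication 2 are marked.]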

Figure 23: Aeroplane scenario [ESA Report]

The traffic coming from an airplane can be of two types, depending on the source. Safety communications consist of communications between pilots and air traffic controllers (Air Traffic Services ATS) and communications between the aircraft and its airline (Airline Operation Control, AOC). These are shown in the figure as “Communication 1”. Because the satellite operator has some influence on the ACSP, it is possible to suggest the deployment of PEPs. Hence, this is case 1.


The other type consists of the traffic that comes from the passengers and forms the non-safety communications, which are represented as “Communication 2”. Because there is no control over either end of the communication (passenger and internet), this is a case 2. Note that communication 1 has a higher priority than communication 2. It is much more important that the crew can communicate and get the plane safely to its destination than that the passengers can use their mobile phones. The current regulation forces the two types of communications to be physically separated. The trend, though, is to have a single Mobile Router that handles all the traffic and to separate the two connections using two different security associations. As can be seen in the figure, the two communications use different IPsec gateways but go through the same Mobile Router.

5.4.3 Security choices In this scenario the traffic is very sensitive. Manipulation of the communication has to be avoided to prevent a catastrophe from happening, like the airplane crashing. For this reason, authentication is required; the system has to prevent an attacker from either impersonating the aircraft or the air traffic controllers. Also, integrity and anti-replay are important to avoid manipulation of the data. The ATS traffic is currently sent unencrypted because of ANSP regulations. However, the other part of the safety communications (AOC) might contain sensitive data and so, confidentiality is a requirement. Note that even though an IPsec VPN is deployed, the passengers (communication 2) might establish their own VPN, for example, an IPsec VPN to connect to the office or using HTTPS to access internet banking systems.

5.4.4 VPN issues

5.4.4.1 PEP and enhanced protocols All kinds of PEPs can be interesting in this scenario. Communication 1 corresponds to a case 1 and so, the satellite operator can propose that the ACSP install PEPs or use enhanced protocols just outside the IPsec VPN gateways. Communication 2 is trickier because the passenger might open their own VPN. Then, being a case 2, only those PEPs that can be installed inside the VPN (IPsec or TLS, depending on the passenger) are useful.

5.4.4.2 IP fragmentation The hosts in the aircraft network are connected to the IPsec gateway via Ethernet (MTU = 1500 bytes) and the ACSP WAN could support an MTU higher than 1500 bytes. The satellite link MTU should not be smaller than either of these two values, so that it does not cause fragmentation. Traffic coming from the internet (communication 2) might cause fragmentation though. If fragmentation is needed and the satellite link MTU has been chosen at least equal to that of the ACSP WAN, then the packets will arrive at the satellite link already fragmented.

5.4.4.3 Overhead Like the two previous scenarios, there are applications that generate small and big payloads. Therefore, header compression is advised.


5.4.4.4 IPsec anti-replay Because the traffic in safety communications (communication 1) is given different priorities, packets might be lost under the conditions described in 4.5. Communication 2 might also be affected if those conditions apply. As a matter of fact, communication 2 is more likely to run out of bandwidth than communication 1, for the latter has a higher priority. Still, the other conditions are very unlikely to take place.

5.4.4.5 Multicast Safety communications could benefit from multicast. However, the multicast support in aircraft is not yet in a mature state.

5.4.4.6 Mobility Airplanes are by definition moving entities. Also, because they can travel quite long distances, they will have to switch from one link to another or change satellites. The VPN connections should not be interrupted while this happens and so, mobility solutions are required.

5.4.4.7 QoS enforcement IPsec by definition (RFC 4301) preserves the ToS field, so QoS enforcement is possible. It could be an issue in communication 2 if the passenger opens an IPsec VPN from end to end that doesn’t keep the ToS field.

5.4.4.8 NAT IPv6 is used by aircraft and other aeronautic services. Therefore, there is no issue with NAT. However, the passengers might use some piece of equipment that works with IPv4. NAT could take place in the APC IPsec gateway and then it would become an issue.

5.5 Consumer scenario

5.5.1 Scenario description In this scenario, a customer buys the connection through the satellite link directly from the operator, unlike the ISP scenario where the customer bought it from the ISP. Therefore, the user is fully aware that there is a satellite link in the path. Also, because the client is buying directly from the operator, the operator can provide recommendations like installing PEPs or modifying certain parameters. This won’t be the case though if the client is a private company, for it will likely not accept the satellite operator installing software on its workers’ laptops.


5.5.2 Types of communications

[Figure. Labels in the original diagram: Consumer’s office, Server (two instances), Internet, Satellite operator, Satellite Hub, Communication 1, Communication 2, Communication 3, Bidirectional satellite links, Modem, Laptop.]

Figure 24: Consumer scenario [ESA Report]

In this scenario, three types of communications can be differentiated, as seen in the preceding figure:
- Communication 1: This communication takes place between the client and some internet servers. Because the satellite operator has no control over the internet, this is case 3.
- Communication 2: The satellite operator might provide additional services to the client, like e-mail. These services are hosted at the satellite operator, which therefore has direct control over them. This is case 1.
- Communication 3: This communication takes place between the user and their company. The satellite operator has no control over them, so this corresponds to control case 2.

5.5.3 Security choices Both communications one and two are connecting to services which are likely to be using a TLS/HTTPS VPN to protect application data. These services require a degree of confidentiality because passwords and sensitive data are being transmitted. Authentication is also required to avoid someone impersonating the user to read its e-mails, access its bank account, etc. Integrity and anti-replay protection are also important to prevent an attack that would modify the data (like the destination account of a transaction or the repetition of it). Like in previous scenarios, communication 3 is done between a user and its company, so it is likely that an IPsec VPN is used.

5.5.4 VPN issues

5.5.4.1 PEP and enhanced protocols For communication 1 we have a TLS VPN that allows the satellite operator to use TCP acceleration. Also, header compression is possible.

Communication 2 is also a TLS based VPN and so, the same rule as for communication 1 applies. However, it corresponds to control case 1, so all the PEPs and enhanced protocols could be placed outside the VPN, making it possible to also use application layer compression and caching. In communication 3, PEPs cannot be placed outside the IPsec VPN, as they would need to be in order to work properly.

5.5.4.2 IP fragmentation Much like the ISP scenario, the internet part of the path is unknown. Therefore, it is hard to guess the size of the PMTU. Having a minimum MTU size of 1500 bytes will likely be enough to cover most of the MTUs found on the internet. Communication 2 is different because the path doesn’t go through the internet. In this case, having an MTU size of 1500 bytes is enough because the user’s device will be connected to the modem via Ethernet.

5.5.4.3 Overhead Just like the previous scenarios, we can assume all kinds of traffic and therefore, small and big payloads. Header compression is recommended.

5.5.4.4 IPsec anti-replay Communications 1 and 2 are unaffected by this problem because they use a TLS based VPN. Communication 3, in contrast, will be affected if the conditions described in 4.5 take place. However, it is highly unlikely that the ToS field is set by the user applications and so, this will not become an issue in this scenario.

5.5.4.5 Multicast Communications one and two will not have the issue of multicasting because it is possible to implement it in TLS VPNs. Communication three applications are usually not in need of multicast support, so it will not be an issue in this scenario.

5.5.4.6 Mobility The users of this scenario are fixed.

5.5.4.7 QoS enforcement QoS is done through the 5 traffic classifiers described in section 4.8. In IPsec VPN these are encrypted and so, QoS will not be possible in communication 3.

5.5.4.8 NAT The TCP or UDP header required by NAT will be visible in communications 1 and 2. However, in communication 3 the headers will be encrypted and therefore inaccessible.


6 Technical solutions

6.1 PEP issue

6.1.1 Position of the PEP

6.1.1.1 Distributed PEPs position Distributed PEPs have to be installed on both sides of the communication. The implementation of transport layer PEPs is possible when deploying a TLS VPN. However, IPsec protects the transport layer, which the PEPs need to access in order to work. Thus, the PEPs have to be installed outside the VPN. This requires control case 1. The PEP solution is offered in different forms:
- TCP acceleration software. This solution applies the TCP acceleration in software, before the VPN processing.
- TCP acceleration hardware. In this case, a dedicated device is placed between the end terminal and the VPN gateway.
- Integrated TCP acceleration and VPN hardware. One device does both the VPN processing and the TCP acceleration (done internally before the VPN).

6.1.1.2 Integrated PEPs position Integrated PEPs are installed on only one side of the communication. Integrated transport layer PEPs are usually based on splitting the connection. Another solution that has already been explained is modifying the behaviour of the congestion window. In any case, the PEP is installed on one side of the satellite link, effectively enhancing the transmission from that side to the other. These PEPs can only be deployed in TLS VPNs, or in IPsec VPNs in cases 1 and 3. Application layer PEPs also offer integrated solutions in the form of a proxy cache. They are placed close to the user to provide short transmission times. However, if the PEP doesn’t know the security keys, it will have to be placed outside the VPN channel. This requires either control case 1 or 3.

6.1.2 PEP solutions

6.1.2.1 TCP acceleration TCP acceleration is described in section 4.2.3. The splitting of the connection will only be doable when the VPN is TLS based or in the case of IPsec, outside the VPN.

6.1.2.2 Enhanced TCP Enhanced TCP versions can be either deployed in the end machine(s) or set through the use of an intermediary proxy that implements the enhanced version. Some of the features of these protocols are window scaling or selective acknowledgments. Support from both users is required to implement these features. The problem is that all the nodes have to be configured (and maybe upgraded) so control case 1 is needed but in a stronger way.


6.1.2.3 TLS aware proxies Some solutions accelerate applications that are protected by a TLS VPN. However, these proxies are not transparent to the user, and so they have to be enabled. Also, the TLS session is split at the proxy and so, it must be aware of the security keys to establish the connections. Therefore, the user has to trust the organisation controlling the proxy (or install it itself).

6.1.3 Choice of VPN depending on the PEPs As has been said, the use of application layer security (TLS) allows for the implementation of transport PEPs. It is then reasonable to prefer this kind of VPN. However, it is not always possible to replace IPsec with TLS. Some of the reasons are:
- IPsec offers a higher level of security because it protects the transport and application layers.
- IPsec has a single secure channel for all applications and is independent of them. TLS is not.
- TLS is based on the client-server model. IPsec is not restricted to it.

6.1.4 Other VPN solutions that support the use of PEPs and enhanced protocols Some other solutions are Selective Layer Encryption, Multi-Layer IPsec, using feedback from the VPNs or cross layer mechanisms. These solutions have been examined in the project but were discarded due to lack of standardisation or unclear advantages over the already proposed solutions.

6.2 IP fragmentation There are different solutions to the problem of IP fragmentation. Some are more general than others, that is, they can be applied in most situations. There are two approaches to solving this problem: we can either adapt the network to our needs or adapt the packet sizes to the path. In case fragmentation is required, it is important to prevent it from happening in the VPN channel. If fragmentation happens inside, the VPN gateway will have to wait for all the fragments before being able to decapsulate the packet. Also, the VPN gateway will be forced to make the reassembly effort.

6.2.1 Adapting the path One solution to the issue is adapting the MTU of the intermediate links to the MTUs of the links at the ends. For example, in the public safety scenario the rescue team networks might be connected using Ethernet (MTU = 1500 bytes). Then we could set the MTU of the satellite link to be equal to or greater than this value and so prevent any fragmentation at the VPN gateways. It would also be wise to set the packet size smaller than the MTU, so that the security overhead does not cause avoidable fragmentation. Note that this requires a high degree of control and influence. The satellite operator has to adapt its own MTU but should also recommend doing so to the members of the network.
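The sizing rule can be illustrated with simple arithmetic. This is only a minimal sketch: the function names are illustrative, and the 60-byte overhead used in the usage lines is a purely hypothetical figure, since the real value depends on the IPsec mode and algorithms in use.

```python
def max_inner_packet(link_mtu: int, security_overhead: int) -> int:
    """Largest packet that can enter the VPN so that the protected
    packet still fits in link_mtu."""
    return link_mtu - security_overhead


def fits_without_fragmentation(packet_size: int, security_overhead: int,
                               link_mtu: int) -> bool:
    """True if the packet plus the VPN overhead fits in one link-level packet."""
    return packet_size + security_overhead <= link_mtu


# Hypothetical figures: Ethernet-sized link and 60 bytes of VPN overhead.
print(max_inner_packet(1500, 60))
print(fits_without_fragmentation(1500, 60, 1500))
```

A full-size Ethernet packet no longer fits once the overhead is added, which is why the end hosts should send slightly smaller packets or the satellite link MTU should be raised.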

6.2.2 Adapting to the path

6.2.2.1 Path MTU Discovery Path MTU Discovery (PMTUD) is used to discover the path’s MTU so the transmitter adapts the packet size.


The transmitter starts sending packets with the “Do not fragment” (DF) bit set to 1 (IPv4). This is not needed in IPv6, as routers along the path cannot fragment. Then, if any node in the path detects that the packet is too big, it sends an ICMP message back to the source: “Destination Unreachable” with code 4 (IPv4) or “Packet Too Big” (IPv6). The message informs the sender and carries the PMTU information so that the sender can adjust the size. The size of the datagram is large at first, and decreases if the ICMP messages report that the datagram was too big. Different strategies can be chosen to adapt the PMTU estimation, like adapting the value to known typical values or using the “Next hop MTU” field in the ICMP packets [RFC 1191-5].

While any node in the path might initiate PMTUD, it should at least be used by the end users or the VPN gateways so that fragmentation doesn’t take place inside the VPN channel. A few conditions have to be met for PMTUD to work:
- The VPN gateway has to copy the DF bit from the inner to the outer header when working in IPv4 IPsec tunnel mode.
- ICMP “Packet Too Big” messages or their IPv4 equivalent shouldn’t be propagated back to the end user; instead, the VPN gateway should modify the MTU of the IPsec tunnel (Security Association, SA).

Implementing this solution requires a case 1, so that the satellite operator has access to the VPN gateways or, at least, the end users. Also, note that the PMTU changes over time, depending on where the packets are routed. New lower PMTU values will be easily detected because there will be an ICMP message stating that the packet is too big. On the other hand, discovering a larger PMTU takes a new run of the PMTUD algorithm.

PMTUD is not free of issues:
- ICMP packets can be faked to influence the PMTU estimation.
- Huge amounts of ICMP signalling can block the connection at some nodes.
- Some TCP issues can derive from the use of PMTUD. Their description and solutions can be found in [RFC 2923-2].
- Some routers do not send back the ICMP message and some firewalls might block the ICMP messages, causing TCP to retransmit the packets without knowing that they were dropped because of their size.
- Some TCP receivers generate the ACK for two full-sized segments (2xMSS). The PMTU might be much smaller than the MSS, and so, the ACKs arrive only from time to time. This generates bursts of traffic, which in turn will likely increase the packet drops, and it also creates a slow start problem on the congestion window. This is also described in detail in [RFC 2525-2.13].
- Using the PMTU to calculate the advertised MSS might force smaller than necessary packets to be sent. This is because the MSS is being calculated with the PMTU of the sending direction, while the return channel (for which the MSS is used) might be different.

PMTUD is currently supported by most operating systems.
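The discovery loop can be sketched as a small simulation. It is a minimal sketch under stated assumptions: the path is modelled as a list of per-link MTUs (hypothetical values), every router that drops a packet is assumed to return an ICMP message carrying its “Next hop MTU”, and no ICMP message is lost or filtered. The function name `discover_pmtu` is illustrative.

```python
def discover_pmtu(link_mtus, initial_size):
    """Simulate PMTUD with the DF bit set.

    link_mtus    -- MTU of each link along the path, in order
    initial_size -- size of the first packet sent (e.g. the first-hop MTU)
    Returns (pmtu_estimate, packets_sent).
    """
    size = initial_size
    packets_sent = 0
    while True:
        packets_sent += 1
        # The first link too small for the packet drops it (DF is set) and
        # its router reports that link's MTU in the ICMP "Next hop MTU" field.
        blocking_mtu = next((mtu for mtu in link_mtus if mtu < size), None)
        if blocking_mtu is None:
            return size, packets_sent      # packet reached the destination
        size = blocking_mtu                # shrink to the reported MTU


# Hypothetical path: Ethernet, a 1400-byte tunnel, a 1200-byte link, Ethernet.
print(discover_pmtu([1500, 1400, 1200, 1500], 1500))
```

Each bottleneck along the path costs one dropped packet, which is why a blocked or missing ICMP message stalls the whole procedure.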

6.2.2.2 Packetization-Layer PMTUD (PLPMTUD) PLPMTUD is an extension of PMTUD that doesn’t rely on ICMP messages to estimate the MTU. Instead, it uses a layer above IP, the packetization layer, which is responsible for choosing the packet size. The strategy consists in sending packets specifically created at the packetization layer to the destination, varying the packet size to detect whether they make it through or not. This effectively probes the path to estimate the PMTU. The three basic probing parameters are:
- Upper boundary. The largest possible packet. Chosen equal to the next hop MTU.
- Lower boundary. This value has to be chosen so it works in most environments (1024 bytes recommended in RFC 4821).
- Probe size. The initial probe size should lie between the boundaries. It corresponds to the MTU estimation.

If the probe packet is successfully acknowledged, then the probe size can be increased. If the probe fails to arrive and it is somehow marked as lost (ICMP message, not ACKed...) but the previous and next packets were sent correctly, then the probe likely failed because of its size. The upper boundary is set to the probe size and the probe size is updated. However, if the probe is lost as well as the surrounding packets, or if the probe is lost and there is no information on the following packets, then it cannot be assured that the probe was lost because of its size. In these cases, no action is taken.

Choosing a new probe packet size is left unspecified in RFC 4821, so it will depend on the implementation. However, some recommendations are given, for example on what to do when using TCP as the packetization layer. Because failed probes in TCP need to be retransmitted, it is recommended that the probe size starts relatively small and increases slowly.

Like PMTUD, PLPMTUD requires the operator to be in control case 1 in order to modify the end hosts. The level of implementation of PLPMTUD is low at the moment and it is currently only found in the Linux kernel from version 2.6.17 onwards.
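The probing rules can be sketched as a search between the two boundaries. This is a minimal sketch, not the RFC 4821 algorithm: since the choice of the next probe size is left to the implementation, a plain binary search is assumed here for brevity (a TCP implementation would rather start small and grow slowly, as noted above). The callback `send_probe`, the `step` granularity and the `max_probes` limit are all hypothetical.

```python
def plpmtud_search(send_probe, search_low=1024, search_high=1500,
                   step=8, max_probes=64):
    """Estimate the PMTU by probing.

    send_probe(size) must return:
      True  -- probe acknowledged
      False -- probe lost while neighbouring packets got through
               (so it was most likely lost because of its size)
      None  -- ambiguous loss; no conclusion can be drawn
    Returns the largest probe size known to work.
    """
    low, high = search_low, search_high
    for _ in range(max_probes):
        if high - low < step:
            break
        probe = (low + high) // 2
        outcome = send_probe(probe)
        if outcome is True:
            low = probe            # probe fits: raise the estimate
        elif outcome is False:
            high = probe           # too big: lower the upper boundary
        # outcome is None: no action, the same size is probed again
    return low


# Hypothetical path whose real PMTU is 1400 bytes.
print(plpmtud_search(lambda size: size <= 1400))
```

The result is always a size that was actually acknowledged, so the estimate errs on the small side by at most `step` bytes.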

6.2.2.3 Control case 3 If we are in control case 3, then we only have control over one end. One solution could be fragmenting the packets before the VPN processing, so they fit the PMTU of the VPN channel. Fragmentation inside the VPN channel would then be avoided, even though this solution doesn’t prevent the fragmentation overhead. To apply this solution it is necessary to run some PMTU discovery algorithm between the controlled VPN gateway and the uncontrolled one. In case the PMTU discovery can only be run from the gateway to the end user, the following algorithm could be used:
1. Use PMTUD to discover the PMTU between the VPN gateway and the end user.
2. If there is an ICMP echo reply, then there are no fragmentation issues.
3. If there is an ICMP destination unreachable, then check the original IP header contained in the ICMP message.
4. If the “Protocol” field corresponds to IPsec, then fragmentation is required inside the VPN channel. Therefore, apply fragmentation before the VPN gateway to avoid doing it inside.
5. If the field corresponds to any other protocol, fragmentation is required outside the VPN channel and so no action has to take place.

Note, however, that this requires working on IPv4. IPv6 doesn’t allow fragmentation by routers along the path.
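Steps 2 to 5 of the algorithm amount to a small decision function. The sketch below is illustrative only: the `IcmpReply` type and the returned strings are hypothetical, and “corresponds to IPsec” is taken to mean IP protocol number 50 (ESP) or 51 (AH), which leaves out UDP-encapsulated IPsec.

```python
from dataclasses import dataclass
from typing import Optional

ICMP_ECHO_REPLY = 0          # ICMPv4 type 0
ICMP_DEST_UNREACHABLE = 3    # ICMPv4 type 3
IPSEC_PROTOCOLS = {50, 51}   # ESP and AH IP protocol numbers


@dataclass
class IcmpReply:
    icmp_type: int
    # "Protocol" field of the original IP header quoted in the ICMP message.
    original_protocol: Optional[int] = None


def fragmentation_action(reply: IcmpReply) -> str:
    """Decide what the controlled VPN gateway should do (steps 2 to 5)."""
    if reply.icmp_type == ICMP_ECHO_REPLY:
        return "no action: no fragmentation issue"
    if reply.icmp_type == ICMP_DEST_UNREACHABLE:
        if reply.original_protocol in IPSEC_PROTOCOLS:
            # The dropped packet was already IPsec-protected, so the
            # bottleneck lies inside the VPN channel.
            return "fragment before the VPN gateway"
        return "no action: fragmentation needed outside the VPN channel"
    return "no action: unrelated ICMP message"


print(fragmentation_action(IcmpReply(ICMP_DEST_UNREACHABLE, 50)))
```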


6.3 Overhead

6.3.1 Overview of the solution As explained earlier, one of the issues of working with a VPN is the overhead added to each packet. This overhead increases bandwidth consumption, and bandwidth is a highly valuable resource in satellite communications. The proposed solution is to compress the security overhead. Note that the overhead caused by encryption is not compressible, for there are no identifiable patterns in it. What can be reduced is the load caused by redundant fields, that is, those that are static (they don’t change from one packet to another) or predictable. Therefore, the goal is to reduce the header size through compression. Several aspects are weighed when choosing the compression algorithm:
- The algorithm must be compression transparent. That is, the decompressed header should match the original header exactly. Mechanisms to detect decompression failures are recommended.
- Good compression efficiency, that is, how much the algorithm reduces the size of the header.
- Also interesting is the relative gain. It can be used to compare either two different algorithms or two different payload sizes. In both cases, the bigger the relative gain is, the better.

A couple of examples to illustrate the relative gain:

                           Algorithm 1 (better)   Algorithm 2 (worse)
Uncompressed header size   40 bytes               40 bytes
Compressed header size     1 byte                 5 bytes
Efficiency                 97.5 %                 87.5 %
Payload size               100 bytes              100 bytes
Relative gain              0.4                    0.08

                           Algorithm 1 with small payload    Algorithm 1 with big payload
                           (the relative result is better)   (the relative result is worse)
Uncompressed header size   40 bytes                          40 bytes
Compressed header size     1 byte                            1 byte
Efficiency                 97.5 %                            97.5 %
Payload size               100 bytes                         1000 bytes
Relative gain              0.4                               0.04

The algorithm should be robust, capable of recovering from packet losses and other eventualities like packet reordering. It should also be immune to some of the degrading satellite communication parameters like the RTT or the BER. Based on these criteria, RObust Header Compression (ROHC) has been chosen.
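The report does not state the formulas behind the “Efficiency” and “Relative gain” rows of the example tables. The sketch below assumes definitions that reproduce the tabulated values: efficiency as the fraction of header bytes removed, and relative gain as the header compression ratio divided by the payload size. The second definition in particular is an inference from the numbers, not a formula given in the source.

```python
def efficiency(uncompressed: int, compressed: int) -> float:
    """Fraction of the header removed by compression (assumed definition)."""
    return (uncompressed - compressed) / uncompressed


def relative_gain(uncompressed: int, compressed: int, payload: int) -> float:
    """Compression ratio divided by the payload size.

    Assumed definition, chosen only because it matches the table values.
    """
    return (uncompressed / compressed) / payload


# (uncompressed header, compressed header, payload), all in bytes
for case in [(40, 1, 100), (40, 5, 100), (40, 1, 1000)]:
    print(case, efficiency(case[0], case[1]), relative_gain(*case))
```

Under these definitions the gain shrinks both when the compressed header grows and when the payload grows, which matches the two comparisons made in the tables.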


6.3.2 RObust Header Compression (ROHC) The two main features of ROHC are high compression efficiency and robustness. It uses Cyclic Redundancy Check (CRC) and feedback from the decompressor to recover from errors. ROHC version 1 is unable to handle packet reordering. The two main drawbacks are that ROHC uses a window-based least significant bit (W-LSB) mechanism that makes it quite complex and that it only works on a hop-by-hop basis. To avoid multiple compressions/decompressions, it will only be performed on the most critical link: the satellite link. ROHC works for different combinations of protocols (profiles): TCP/IP, UDP/IP, IP/IP, ESP/IP, RTP/UDP/IP and UDP-lite/IP. Note that all the profiles include the IP header. Depending on the VPN type, the compression will be applied on:
- TLS: IP+TCP headers.
- IPsec with NAT traversal: IP+UDP headers.
- IPsec transport mode: IP+ESP or IP+AH headers.
- IPsec AH tunnel mode: IP+AH outer headers and IP+UDP/TCP inner headers.
- IPsec ESP tunnel mode: IP+ESP outer headers.

A control case two is enough to deploy ROHC and so, the satellite operator is always capable of performing compression.

6.3.2.1 ROHC modes ROHC has three different operating modes. They differ in robustness and return channel requirements. Unidirectional mode (U-Mode): In this mode there is no feedback from the decompressor to the compressor. The compressor will send a sufficiently large amount of uncompressed packets to the decompressor to assure that at least one arrives, so it can build a context. States will be determined by timeouts and irregularities in the header field pattern. Optimistic mode (O-Mode): In this mode feedback is used but rarely and it is not the only tool to determine the state. Reliable mode (R-Mode): This mode maximises robustness at the expense of compression efficiency. The feedback is used for NACK and ACK.

6.3.2.2 ROHC compressor states Initialisation and Refresh (IR): In this state the compressor sends all the header information to initialise the decompressor. First Order (FO): The header is partially or completely compressed but some static fields might still be updated. Second Order (SO): This state is reached when the compressed header is at the minimum size possible.

6.3.2.3 ROHC decompressor states No Context (NC): This is the initial state of the decompressor. No context has yet been created. Full Context (FC): A packet has been decompressed and so, the context has been created. The decompressor stays in this state unless there are too many failures.


Static Context (SC): The decompressor might move from FC to this state if too many packets have been lost (k1 out of the last n1, both parameters being configurable). In O-mode and R-mode a NACK might be sent to the compressor. A single successfully decompressed packet sent in the FO state will allow the decompressor to move back to FC. If k2 out of the last n2 decompressions fail while in SC, the decompressor will transition to NC and the cycle has to start again.

Figure 25: ROHC decompressor flow chart
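The decompressor transitions described above can be sketched as a small state machine. This is a simplified sketch, not an implementation of the ROHC specification: packet types are not modelled (a single `ok` flag stands for a successful decompression of a packet that carries enough information for the current state), feedback is omitted, and the default values of k1, n1, k2 and n2 are hypothetical. Because any success in SC leads back to FC in this simplification, the k2-out-of-n2 rule effectively reduces to k2 consecutive failures.

```python
from collections import deque
from enum import Enum


class State(Enum):
    NC = "No Context"
    SC = "Static Context"
    FC = "Full Context"


class Decompressor:
    def __init__(self, k1=3, n1=10, k2=3, n2=10):
        self.state = State.NC
        self.k1, self.k2 = k1, k2
        self.fc_failures = deque(maxlen=n1)  # last n1 outcomes seen in FC
        self.sc_failures = deque(maxlen=n2)  # last n2 outcomes seen in SC

    def _enter(self, state):
        self.state = state
        self.fc_failures.clear()
        self.sc_failures.clear()

    def on_packet(self, ok: bool) -> State:
        """Update the state after one decompression attempt."""
        if self.state is State.NC:
            if ok:                               # context created
                self._enter(State.FC)
        elif self.state is State.FC:
            self.fc_failures.append(not ok)
            if sum(self.fc_failures) >= self.k1:     # k1 out of last n1 failed
                self._enter(State.SC)
        else:  # State.SC
            if ok:                               # one success restores FC
                self._enter(State.FC)
            else:
                self.sc_failures.append(True)
                if sum(self.sc_failures) >= self.k2:  # k2 out of last n2 failed
                    self._enter(State.NC)
        return self.state


d = Decompressor(k1=2, n1=4, k2=2, n2=4)
print([d.on_packet(ok).name for ok in (True, False, False, True)])
```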

6.3.2.4 ROHC over IPsec (ROHCoIPsec) This extension of ROHC makes it possible to compress the inner headers in IPsec tunnel mode. However, the compression and decompression have to be performed in the same entity that performs IPsec (the VPN gateway). When using this extension, the restrictions are greater and a case 1 is needed. Because compression is done across more than one hop, ROHC has to rely on a non-link layer mechanism to negotiate the compression parameters. IKEv2 is recommended for the parameter exchange [RFC 5857]. ROHCoIPsec requires several extensions to IPsec processing. These can be found in RFC 5858.

6.3.3 ROHCv2 ROHCv2 aims at simplifying the specifications while maintaining the efficiency and robustness of ROHC. All the differences with ROHC are listed in RFC 5225. A few of them are:
- Unlike ROHC, the state machine and operating modes have been excluded from the compressor. The compressor chooses the packet type based on feedback, the characteristics of the channel and the fields it has to update.
- It tolerates a certain amount of reordering.
- It works in unidirectional mode until feedback is received. Then it will switch to bidirectional.
- The format of the packets is independent of the mode.

6.3.4 ROHC, ROHCv2 and ROHCoIPsec ROHCv2 is similar to ROHC but with added robustness and the ability to handle reordering. However, it is not backwards compatible; it cannot be used with other systems that use ROHC. ROHCoIPsec is relatively new and no implementation exists, though it might be an interesting option since the inner headers are also compressed.

6.4 IPsec anti-replay issue There are a few solutions to the IPsec anti-replay issue:

6.4.1 Disabling the protection If there is no anti-replay mechanism, there is no issue. Needless to say, this is unacceptable in most cases. ESP implementations, for instance, must support anti-replay, though the receiver may disable it. Disabling it therefore requires a case 1 or 3.

6.4.2 Increasing the window size The minimum window size is 32 packets and the recommended size is 64. In either case 1 or 3, the operator can modify the window size so that no packets are dropped. It is possible to increase the size without suffering an increase in computation and memory load. Cisco recommends a window size of 1024 packets. [Cisco window]
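The effect of the window size can be illustrated with a sliding-window check of the kind used for anti-replay. This is a minimal sketch of the general technique, not taken from the report; the class name is illustrative and the traffic pattern in the usage lines (one low priority packet overtaken by 39 others) is hypothetical.

```python
class AntiReplayWindow:
    """Sliding-window replay check over packet sequence numbers."""

    def __init__(self, size: int = 64):
        self.size = size
        self.highest = 0   # highest sequence number accepted so far
        self.bitmap = 0    # bit i set => (highest - i) was already received

    def accept(self, seq: int) -> bool:
        if seq <= 0:
            return False                      # sequence numbers start at 1
        if seq > self.highest:                # new packet: slide the window
            shift = seq - self.highest
            mask = (1 << self.size) - 1
            self.bitmap = ((self.bitmap << shift) | 1) & mask
            self.highest = seq
            return True
        offset = self.highest - seq
        if offset >= self.size:
            return False                      # too old: left of the window
        if self.bitmap & (1 << offset):
            return False                      # duplicate: replay
        self.bitmap |= 1 << offset            # late but inside the window
        return True


# Packet 1 is low priority and is overtaken by packets 2..40.
for size in (32, 64):
    window = AntiReplayWindow(size)
    for seq in range(2, 41):
        window.accept(seq)
    print(size, window.accept(1))
```

With the 32-packet window the delayed packet falls outside the window and is discarded, whereas the 64-packet window still accepts it once while continuing to reject a second copy.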

6.4.3 Multiple SA One of the conditions for the anti-replay issue is that different priority traffic goes through the same SA. Therefore, if one SA is created for each kind of traffic, then the issue is gone. This can be done by changing the SPI (which identifies the SA). This solution requires access to both VPN gateways and therefore is a control case 1.

6.4.4 Shutting down QoS Another solution is switching off QoS enforcement. While this has the advantage that the issue cannot arise and that it can even be done in control case 2, it is not the perfect solution. QoS is enforced to make sure high priority packets are the last to be dropped in case of overuse of the channel. If we switch off QoS, packets will be lost regardless of their priority, so high priority packets might be lost. If QoS is kept but there is an issue with anti-replay, it will be the low priority packets that are dropped.

6.5 Multicast Multicast security is currently still under standardisation (MSEC protocol). Also, there is no practical experience from operational deployment and therefore it is not studied in the project.


6.6 Mobility issue

6.6.1 IPsec and mobility A handover caused by mobility, or any other interruption of the network lasting a few minutes, can happen during different stages of the IPsec connection. If it happens while the keys are being exchanged, then the gateway will retry the connection a few times before giving up. The number of retries and the delay between them can be configured. If it happens when the SA has exceeded its valid time and a rekeying is in progress, then the behaviour is the same as during initialisation. If no rekeying is possible, the connection will be reinitialised. The last stage at which the connection might be interrupted is while sending IPsec protected data. In this case the connection can survive for quite a long time. In any case, the connection can be re-established at the cost of not being able to transmit data while the handover is taking place and while the connection is being re-established.

6.6.2 Mobile IP The first solution to the problem is using Mobile IP. While it doesn’t avoid the flow interruption during the handover, it prevents the system from needing to re-establish the IPsec connection. The approach of this solution is to present a static IP address when, in reality, the user is mobile and changes its IP address whenever it moves from network to network. Some specific terminology of Mobile IP:
- Mobile Node (MN): The mobile user.
- Care of Address (CoA): The IP address assigned to the MN by the network to which it is connected.
- Home Address (HoA): The static IP address that represents the MN.
- Foreign Agent (FA): A mobile capable modem that gives the CoA to the MN when it connects to its network.
- Home Agent (HA): The node in charge of representing the mobile user.
- Binding Update message (BU): This message is sent to the HA when the MN changes its CoA. It contains the HoA and the new CoA.
- Binding Acknowledgement (BA): The reply confirming that the BU was received correctly.
- Correspondent Node (CN): The user trying to communicate with the MN. It will only know the HoA, as it is the static address.


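[Figure: packet headers exchanged between the CN, the HA and the MN (at its CoA).
CN to MN: the CN sends a packet with destination HoA and source CN address. The HA forwards it to the MN inside an outer header with destination CoA and source HA address.
MN to CN: the MN sends the inner packet (destination CN address, source HoA) inside an outer header with destination HA address and source CoA. The HA forwards it to the CN with destination CN address and source HoA.]

Figure 26: Mobile IP data exchange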
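The header handling at the Home Agent can be sketched in a few lines. This is a toy model for illustration only: the `Packet` and `HomeAgent` types are hypothetical, the addresses are documentation addresses, and no authentication of the Binding Update is modelled.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    payload: Union[str, "Packet"]   # application data or a tunnelled packet


class HomeAgent:
    def __init__(self, address: str):
        self.address = address
        self.bindings = {}          # HoA -> current CoA

    def binding_update(self, hoa: str, coa: str) -> str:
        """Record the MN's new CoA and answer with an acknowledgement."""
        self.bindings[hoa] = coa
        return "BA"

    def to_mobile(self, packet: Packet) -> Packet:
        """CN -> MN: wrap the packet addressed to the HoA in an outer header."""
        coa = self.bindings[packet.dst]
        return Packet(src=self.address, dst=coa, payload=packet)

    def from_mobile(self, tunnelled: Packet) -> Packet:
        """MN -> CN: strip the outer header and forward the inner packet."""
        return tunnelled.payload


HOA, CN, HA = "192.0.2.10", "198.51.100.5", "192.0.2.1"   # hypothetical
ha = HomeAgent(HA)
ha.binding_update(HOA, "203.0.113.20")
print(ha.to_mobile(Packet(src=CN, dst=HOA, payload="hello")))
ha.binding_update(HOA, "203.0.113.99")                     # handover
print(ha.to_mobile(Packet(src=CN, dst=HOA, payload="hello")).dst)
```

The CN keeps addressing the HoA before and after the handover; only the binding stored at the HA changes.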

As can be seen in the figure, the CN doesn’t need to know the MN address (CoA) to send it a packet. This can actually be seen as a splitting of the connection at the HA, which acts as a proxy. Mobile IPv4 includes some protection (authentication and integrity protection) between the MN and the HA, but it is not complete (for example, no anti-replay). The MN might decide that it wants to set up its own VPN. The HoA can then be used, as it is static and will not change even if there is a handover. In Mobile IPv6 the anti-replay for BU, the data integrity protection and encryption are realised by IPsec. There are some differences between Mobile IPv4 and Mobile IPv6. Mobile IPv6 allows the MN to communicate its new CoA to the CN via a BU (note that the CN must support MIPv6). This means that there is no need for the traffic to go through the HA once the connection is established, effectively reducing the jitter and the latency. MIPv6 is already integrated as part of IPv6 and it constitutes just an extension header with the mobile parameters. While Mobile IP seems like a good solution, it has some maturity problems. It requires the availability of a mobility service provider (HA), of which none exists at the moment. Even so, the HA can become the weakest part of the link. Also, though there are a few implementations to integrate Mobile IP, no native support is provided in any operating system.

6.6.3 NEtwork MObility (NEMO)

While Mobile IP provides support for a single node (MN), NEMO does the same for a whole moving network, for example an airplane. The nodes forming the mobile network are known as Mobile Network Nodes (MNN). The network is managed by a single entity, the Mobile Router (MR). NEMO adds a new flag to the BU to indicate that the sender is an MR and not an MN. The same applies to the BA.

6.6.4 IKEv2 Mobility and Multihoming Protocol (MOBIKE)

The idea behind MOBIKE is to allow more than one IP address for each SA instead of a single static one. During the key exchange, the users indicate whether they support MOBIKE. If they do, more than one address can be added to the list of accepted addresses. If one of the users moves from one network to another and the new IP address was not included in the list, there is no problem: an “INFORMATIONAL” request containing an “UPDATE_SA_ADDRESSES” notification can be sent from the new IP address. This request is protected, so the users involved in the SA know that it does not come from an external user trying to fake it.

Note that MOBIKE does not support two mobile users; at least one of them has to be fixed. Also, because there is no fixed home address, the fixed user does not know the mobile user's address until the connection is established. This means that only the mobile user can start the connection. Only a few implementations of IKE support MOBIKE, and the process of detecting movement is not yet standardised.

6.6.5 Comparison between Mobile IP and MOBIKE

MOBIKE is restricted to fixed-to-mobile scenarios.
In MOBIKE only the mobile user can start the connection.
A Mobile IP mobility service does not yet exist.
Mobile IP adds latency to the connection because the packets have to go through the Home Agent.

6.7 QoS enforcement

It should be recalled that different types of traffic have different needs. For example, VoIP requires low one-way delay, low jitter and a minimum guaranteed bandwidth. FTP traffic is unaffected by delay and the transmission of data can be bursty or even interrupted at times, but it is heavily affected by packet losses. Each type of traffic can therefore be treated differently.

There are two main approaches to QoS in the internet: IntServ and DiffServ. IntServ makes use of the Resource Reservation Protocol (RSVP) to signal and reserve the QoS parameters for each flow. All the routers in the path are required to support RSVP, and each will either allocate or deny the requested resources. All routers have to store the information for each flow, so this solution is cumbersome in heavily loaded nodes.

DiffServ uses the DS field to manage the traffic. Instead of identifying a flow, DiffServ differentiates each packet depending on its DSCP value. This value implies a different treatment for each type of traffic. Unlike IntServ, the routers on the path do not need to store information; reading the DSCP field is enough. Therefore, this solution is easy to implement and scalable.

As explained in the issues, QoS can be applied using the 5 different traffic classifiers. Also, IPv4 provides the ToS field for QoS purposes. In IPv6 two fields are available, Traffic Class (TC) and Flow Label (FL). Both the ToS and the TC fields are used when deploying DiffServ. The issue arises when using IPsec tunnel mode, as these fields are protected. The solution is to copy the QoS fields into the outer header. This solution is not perfect, though. Copying the DSCP means that the traffic type can be seen by an attacker, so the type of application can be guessed (for example, VoIP). Also, this solution requires control case 1.
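The copy of the DSCP into the outer header can be sketched as follows. The DSCP occupies the six most significant bits of the IPv4 ToS byte and of the IPv6 Traffic Class byte; the two remaining bits carry ECN. The function names are hypothetical, and the sketch makes a simplifying assumption by always clearing the ECN bits of the outer header.

```python
DSCP_EF = 46  # Expedited Forwarding, a value commonly used for VoIP


def dscp_to_ds_byte(dscp: int, ecn: int = 0) -> int:
    """Build the ToS / Traffic Class byte from a DSCP value and ECN bits."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP is a 6-bit value")
    if not 0 <= ecn <= 3:
        raise ValueError("ECN is a 2-bit value")
    return (dscp << 2) | ecn


def ds_byte_to_dscp(ds_byte: int) -> int:
    """Extract the DSCP from a ToS / Traffic Class byte."""
    return (ds_byte >> 2) & 0x3F


def outer_ds_byte(inner_ds_byte: int, copy_dscp: bool) -> int:
    """DS byte of the outer tunnel header for a given inner header.

    With copy_dscp=False the outer header is marked best effort (DSCP 0),
    which is what a gateway that does not copy the field would produce.
    ECN handling is deliberately left out of this sketch.
    """
    if not copy_dscp:
        return 0
    return dscp_to_ds_byte(ds_byte_to_dscp(inner_ds_byte))


voip_inner = dscp_to_ds_byte(DSCP_EF)
print(hex(voip_inner), outer_ds_byte(voip_inner, True), outer_ds_byte(voip_inner, False))
```

The same value that lets the satellite link prioritise the packet is also what an observer of the outer header would see, which is the trade-off described above.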

6.8 Network Address Port Translation (NAPT) issue

NAPT modifies the IP and transport layer headers, so access to them is needed. In IPsec tunnel mode, a NAT traversal mechanism integrated in IPsec is used to solve the problem. A UDP header is inserted between the outer header and the security header to provide the ports needed for NAPT. In order to do the UDP encapsulation and decapsulation, both IPsec endpoints have to support NAT traversal. This means that this solution can only be implemented in control case 1.

The presence of NAT on the VPN path is detected when using IKEv2. During the initialisation phase, the two endpoints exchange a “NAT_DETECTION_SOURCE_IP” notify payload containing a hash of the SPIs, the source IP address and the source port, and a “NAT_DETECTION_DESTINATION_IP” notify payload containing a hash of the SPIs, the destination IP address and the destination port. If the received values do not match those computed from the IP/UDP header of the received packet, NAPT has been performed.

NAT traversal has some limitations, though:

No support for AH. IPsec AH protects the IP header from modification, so only ESP can be used.
The communication has to be started from the node behind the NAT device, as the peer has to be reachable at a static IP address to initiate the communication. Therefore, two nodes behind NAT cannot start a connection.
The NAT mapping is refreshed periodically and a signal is sent stating the update. However, if the communication is briefly interrupted, it might cause a mapping timeout, and the IPsec communication would have to be reinitialised.
If two gateways behind different private networks use the same private IP address and connect to a public-address gateway, then two SAs would be linked to the same IP address.
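The detection step can be sketched in a few lines. In the IKEv2 specification the data of each NAT detection notification is a SHA-1 digest of the two IKE SPIs, the IP address and the port. The helper names and the example addresses below are hypothetical; the sketch only reproduces the comparison, not the IKEv2 message exchange itself.

```python
import hashlib
import ipaddress
import struct


def nat_detection_hash(spi_i: bytes, spi_r: bytes, ip: str, port: int) -> bytes:
    """SHA-1 over SPIi | SPIr | IP address | port (port in network byte order)."""
    if len(spi_i) != 8 or len(spi_r) != 8:
        raise ValueError("IKE SPIs are 8 octets long")
    address = ipaddress.ip_address(ip).packed  # 4 octets for IPv4, 16 for IPv6
    return hashlib.sha1(spi_i + spi_r + address + struct.pack("!H", port)).digest()


def nat_detected(received_hash: bytes, spi_i: bytes, spi_r: bytes,
                 observed_ip: str, observed_port: int) -> bool:
    """Compare the peer's hash with one computed from the observed IP/UDP header."""
    return received_hash != nat_detection_hash(spi_i, spi_r, observed_ip, observed_port)


# Hypothetical values: the initiator sits behind a NAPT device.
spi_i, spi_r = bytes(range(8)), bytes(8)
sent = nat_detection_hash(spi_i, spi_r, "10.0.0.4", 500)      # what the initiator believes
print(nat_detected(sent, spi_i, spi_r, "192.0.2.10", 40500))  # header rewritten on the path
print(nat_detected(sent, spi_i, spi_r, "10.0.0.4", 500))      # header untouched
```

Because only digests are exchanged, the private address itself is not disclosed to the peer.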


6.9 Technical solutions for the Aeronautical Scenario

The aeroplane scenario has been described in section 5.4. As explained before, the communications in this scenario can be divided in two:

Communication 1: This includes all traffic between the cockpit and the ANSP WAN (ATC and AOC). This traffic is not encrypted due to regulations. Because of the influence of the satellite operator over the ACSP, it is considered control case 1.
Communication 2: Traffic between the passengers and the internet (APC). Since there is no control over the passenger, this is control case 2.

All other scenarios have been analysed as part of the project. The “Consumer” and ISP scenarios have not been implemented in the testbed because all the solutions they use are already present in either the Public Safety or the Aeronautical scenario. I will focus on the Aeronautical scenario because it is TriaGnoSys's main activity and because it is the field in which I would like to specialise.

6.9.1 PEP issue

Both communications go through the same satellite link but they are logically separated. Each communication goes through a different IPsec gateway before entering the satellite link, so the traffic flows belong to different security associations.

Communication 1, being control case 1, allows for the deployment of enhanced protocols or PEPs. According to the Newsky analysis [Newsky], standard TCP is not good enough for ATS/AOC traffic over satellite links. PEPs improve the performance but suffer from the slow start problem. They are best suited for long, sustained traffic. ATS/AOC is mainly non-sustained traffic consisting of short messages, so the solution is to use an enhanced version of TCP.

Communication 2, on the other hand, is more standard traffic. A distributed PEP is deployed outside the IPsec channel. However, it will only be effective when the passengers do not start their own IPsec VPN. If they do, the PEPs will be placed inside the passenger's IPsec channel and will not work. An improvement could be achieved if the passenger used an enhanced version of TCP. Being control case 2, there is nothing the satellite operator can do to enforce this.

6.9.2 IP fragmentation

IP fragmentation is not an issue for communication 1. The MTU of each link is controlled, so it can be chosen to avoid fragmentation.

Communication 2 presents more problems, because the traffic goes to and comes from the internet. It is not known a priori which networks the packets will cross, and therefore neither is the Path MTU. The solution in this case is to force Path MTU Discovery on the passenger traffic. Two solutions have been proposed to do so, PMTUD and PLPMTUD. PLPMTUD is not possible because it has to be deployed outside the VPN channel and communication 2 is control case 2.

PMTUD is desired at least between the IPsec gateways. If the end hosts send IPv6 packets, or IPv4 packets with the Don't Fragment (DF) flag set, then PMTUD is performed. However, if the end hosts send IPv4 packets with the DF flag clear, PMTUD has to be enforced at the IPsec gateways. When such packets arrive at the IPsec gateways, the DF flag is set during the routing decision. This way, if the packet would require fragmentation, an ICMP Destination Unreachable message (type 3, code 4: fragmentation needed and DF set) is sent back to the source host.
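The decision taken at the IPsec gateway can be modelled as below. This is a sketch, not the gateway implementation: the names (`gateway_check`, `encapsulation_overhead`) and the 60-byte overhead figure are assumptions made for the example, and the DF flag is treated as set only for the size check.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

ICMP_DEST_UNREACHABLE = 3   # ICMP type
ICMP_CODE_FRAG_NEEDED = 4   # "fragmentation needed and DF set"


@dataclass
class IPv4Packet:
    length: int   # total length in bytes
    df: bool      # Don't Fragment flag as sent by the end host


@dataclass
class IcmpError:
    icmp_type: int
    code: int
    next_hop_mtu: int


def gateway_check(packet: IPv4Packet, tunnel_mtu: int, encapsulation_overhead: int,
                  enforce_df: bool = True) -> Tuple[str, Optional[IcmpError]]:
    """Return the action taken by the gateway and the ICMP error, if any."""
    limit = tunnel_mtu - encapsulation_overhead   # largest inner packet that fits
    if packet.length <= limit:
        return "forward", None
    if packet.df or enforce_df:
        # The source learns the usable MTU and should retry with smaller packets.
        return "drop", IcmpError(ICMP_DEST_UNREACHABLE, ICMP_CODE_FRAG_NEEDED, limit)
    return "fragment", None


# Hypothetical figures: 1500-byte tunnel MTU, 60 bytes of encapsulation overhead.
print(gateway_check(IPv4Packet(1500, df=False), 1500, 60))
print(gateway_check(IPv4Packet(1500, df=False), 1500, 60, enforce_df=False))
```

With enforcement enabled, a host that sent the packet with DF clear still receives the ICMP error and can adapt its packet size, instead of having the gateway fragment on its behalf.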

6.9.3 Overhead

The problem of overhead will be mitigated using header compression. ROHC is implemented at both ends of the satellite link. Because it is a transparent solution, it can be applied to both communications 1 and 2.

ROHC takes place at the satellite link. The headers found on all packets of that link are an IPv6 header (mobility encapsulation), another IPv6 header (IPsec tunnel encapsulation), the ESP/AH header and then the IPsec payload. Therefore, the IPv6+IPv6+ESP/AH headers will be compressed. Since the aeronautical regulations forbid encryption for ATS/AOC traffic, the headers in the IPsec payload of these packets could also be compressed by ROHC. Therefore, additional compression could be achieved for communication 1.

6.9.4 IPsec anti-replay issue

This issue is not studied further because it rarely occurs: several conditions must be met at the same time for it to arise. Even then, the affected traffic would be the lowest priority traffic.

6.9.5 Mobility

Two solutions have been proposed for mobility, Mobile IP / NEMO and MOBIKE. For the aeronautical scenario, MOBIKE is unsuited because of its drawbacks, such as forcing the communication to be started from the airborne side. Therefore, Mobile IPv6 is used. Because of the static IP address (HoA) given by the mobility solution, the connection is not broken when a handover occurs. All traffic sent to the HoA will be routed to the Mobile Router in the airplane by the Home Agent in the ACSP WAN. This solution is transparent to the end hosts and is implemented at the satellite link, so it can be implemented even for control case 2. It is therefore useful for both communications 1 and 2.

6.9.6 Quality of service

DiffServ is an easy-to-implement solution and therefore it will be used for communication 1. The end hosts (cockpit and ATS end systems) will mark the packets with a different value in the DSCP field depending on a pre-configured policy. For this solution to work, the DSCP value has to be copied to the outer IP header when the packet is encapsulated. If the value is not copied, the packet cannot be treated at the satellite link according to the chosen priority. In communication 1 there are two tunnels: the IPsec tunnel and the mobility tunnel. Because communication 1 is control case 1, it is possible to recommend that the ACSP deploy IPsec and Mobile IP implementations that copy the value.

In communication 2, when users enable their own IPsec processing, it cannot be assured that the DSCP field will be copied to the outer header. Therefore, these packets will be treated as Best Effort (lowest priority) in favour of explicitly higher priority traffic.


6.9.7 NAT

Communication 1 is completely in IPv6. In theory, NAT is meant to disappear with IPv6. However, even if this did not happen, communication 1 is still control case 1, so NAT traversal could be employed.

NAT is used for communication 2, since most devices still work on IPv4. If the passenger establishes an IPsec VPN, NAT traversal must be enabled. Being control case 2, this cannot be recommended by the satellite operator. However, most implementations have NAT traversal enabled by default.


7 Testbed design

In order to analyse the effect of the proposed solutions, a testbed has to be designed and built. It has to be noted that at the time of the testbed design, there was a decision still pending from ESA about the type of PEP. Two options were being studied: hardware and software PEP.

7.1 Aeronautical scenario testbed description

The testbed is designed based on the proposed solutions and the scenario shown previously in Figure 23. The scenario nodes can be divided into two groups: those that belong to the mobile or airborne subnetwork and those that belong to the ACSP ground subnetwork. Additionally, in the testbed architecture there is another node between the subnetworks to represent the satellite link.

[Figure 27 diagram: the airborne subnetwork (ATS ES, ATS IPsec GW, APC device, web cache, TCP PEP, APC IPsec GW, MR, satellite terminal), the link emulator, and the ACSP ground subnetwork (satellite hub, HA, ATS IPsec GW, ATS ES in the ANSP WAN, APC IPsec GW, TCP PEP, Internet server).]

Figure 27: The aeronautical scenario testbed architecture

The airborne subnetwork nodes are:

Air Traffic Services End System (ATS ES). This node represents the cockpit and generates all safety communications, including both ATS and AOC.
Air Traffic Services IPsec Gateway (ATS IPsec GW). This IPsec gateway is used to encapsulate/decapsulate safety communications to logically separate them from APC traffic.
Air Passenger Communication Device (APC device). Similar to the ATS ES, this node generates all non-safety communications on the airborne subnetwork.
Web cache. This node implements web caching of HTTP and HTTPS traffic for the airborne subnetwork.
TCP PEP. This node implements a distributed TCP PEP solution as proposed.
Air Passenger Communication Gateway (APC IPsec Gateway). The non-safety counterpart of the ATS IPsec GW, this node applies IPsec processing to non-safety traffic.


Mobile Router (MR). The mobile router connects the airborne subnetwork to the rest of the world. It can do so through either of the two links that connect it to the Satellite Terminal. Because it implements MIPv6/NEMO, the IP connectivity of the airborne subnetwork is maintained even if there are link handovers.
Satellite Terminal. This node is used to simulate the access to the satellite links and to simulate link handovers. While in reality there would be one terminal per link, the testbed combines them into a single node, using a different pair of interfaces for each link. This is done to save resources.

The Link Emulator emulates two different links. Just like the Satellite Terminal, both links are integrated on the same node when testing. Still, they have different characteristics in terms of delay, loss, etc. To better illustrate which path the packets take over each link, the two links have been coloured in the following figure:

[Figure 28 diagram: the same architecture as Figure 27, with the two satellite link paths shown in different colours.]

Figure 28: The two different satellite link paths in the testbed

The ACSP ground subnetwork nodes are:

Satellite Hub. The hub serves a similar function to the Satellite Terminal, allowing the ground subnetwork to access the satellite links.
Air Traffic Services IPsec Gateway (ATS IPsec GW). Same as the airborne one.
Air Traffic Services End System (ATS ES). Same as the airborne one.
Home Agent (HA). The Home Agent redistributes all packets coming from the mobile router and serves as gateway for the ground IPsec gateways.
Air Passenger Communication Gateway (APC IPsec Gateway). Same as the airborne one.
TCP PEP. The other part of the distributed PEP deployed.
Internet Server. All services implemented in the APC device as clients have their server counterparts in the Internet Server.


7.2 Node functionalities and software

To fulfil their function, the nodes implement some services. The following table shows which services each node implements and the software used for each of them.

Nodes | Implement (software)
ATS ES (airborne and ground) | VoIP (Linphone). Message generation (iperf). TCP enhancement (Linux sysctl). DSCP tagging (iptables, ip6tables).
ATS IPsec Gateways (airborne and ground) | IPsec (Linux IPsec implementation, strongSwan).
APC device and Internet Server | VoIP (Linphone). Web browsing (Google Chrome). FTP-TLS (FileZilla for APC device and ProFTP for Internet server). E-mail (Linphone). Message generation (iperf).
Web cache | HTTP and HTTPS caching (Squid).
TCP PEP | TCP splitting and acceleration (Mentat SkyX if hardware solution is chosen).
APC IPsec Gateways (airborne and ground) | PMTUD (iptables). DSCP tagging (iptables, ip6tables). IPsec (Linux implementation, strongSwan). DHCP, airborne only (DHCP3-Server). NAT, ground only (iptables).
Mobile Router | MIPv6/NEMO (nautilus6).
Satellite terminal | QoS enforcement (ip6tables, tc). Header compression (ROHC).
Link Emulator | Link emulation (tc, netem).
Satellite Hub | QoS enforcement (ip6tables, tc). Header compression (ROHC). Router advertisement (radvd).
Home Agent | MIPv6/NEMO (nautilus6). Router advertisement (radvd).

Table 5: Node functionalities and software

7.3 Aeronautical scenario testbed addressing scheme

The testbed simulates all the nodes of the architecture as virtual machines. To interconnect them, the interfaces have been given a layer two hardware address and they have been bridged.


The hardware addresses have the form 00:16:3e:06:01:XX, where XX starts at 00 and increases by one for each interface. Other than the interfaces shown in the previous figures, each node has an additional interface (eth0) that is used to connect remotely to the machine using ssh. The address of this interface is shown in green in the figure. The bridges simulate the Ethernet links between the different nodes. All the eth0 interfaces are connected to the br0 bridge. Note that at the moment, the PEP implementation is still undecided between a software and a hardware solution. For that reason, its addresses have not yet been reserved.

The interfaces have also been assigned a name. The networks have an IP address that is version 4 or 6 depending on the subnetwork. The internet and the passenger devices are IPv4, so these networks use version 4 of the Internet Protocol. The rest of the subnetworks are IPv6.
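The numbering rule can be written down as a one-line generator. This sketch assumes, as the summary table of this section suggests, that the last field is a two-digit decimal counter (…:09 is followed by …:10); the function name is hypothetical.

```python
MAC_PREFIX = "00:16:3e:06:01"


def hardware_address(index: int) -> str:
    """Hardware address of the index-th testbed interface (counting from 0)."""
    if not 0 <= index <= 99:
        raise ValueError("the two-digit counter only covers 0..99")
    return f"{MAC_PREFIX}:{index:02d}"


# First and tenth interfaces of the testbed.
print(hardware_address(0), hardware_address(10))
```

Generating the addresses from a counter keeps them unique across the virtual machines and makes it easy to recognise testbed interfaces in packet captures.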


[Figure 29 diagram: the testbed nodes of Figure 27 annotated with the last field of each interface hardware address (prefix 00:16:3e:06:01:XX) and with the bridges esabr1 to esabr16 that interconnect them. The PEP interface addresses are shown as :XX because they are not yet assigned.]

Figure 29: Link layer addressing and bridging of the testbed nodes


[Figure 30 diagram: the testbed nodes of Figure 27 annotated with interface names (eth1 to eth4) and host addresses. Networks shown: 3ffe:ff00::/32, 3ffe:ff00:1::/48, 3ffe:ff00:1:1::/64, 3ffe:ff00:2::/48, 3ffe:ff01::/32, 3ffe:ff01:1::/48, 3ffe:ff02::/32, 3ffe:ff03::/32, 10.0.0.0/24 and 192.168.0.0/24.]

Figure 30: Internet Protocol addressing of the testbed nodes


The next table contains a summary of all the addresses, interface names, etc. The “a” in the node names indicates “airborne subnetwork” and the “g” indicates “ACSP ground subnetwork”.

Node | Parameter | Eth0 | Eth1 | Eth2 | Eth3 | Eth4
ATS ES a | IP address/network mask | 172.24.20.1/16 | 3ffe:ff00:1:1::2/64 | / | / | /
 | Hardware address | 00:16:3e:06:01:00 | 00:16:3e:06:01:01 | / | / | /
 | Bridge | esabr0 | esabr1 | / | / | /
 | Gateway | 172.24.0.254 | 3ffe:ff00:1:1::1 | / | / | /
ATS IPsec GW a | IP address/network mask | 172.24.20.2/16 | 3ffe:ff00:1:1::1/64 | 3ffe:ff00:1::2/48 | / | /
 | Hardware address | 00:16:3e:06:01:02 | 00:16:3e:06:01:03 | 00:16:3e:06:01:04 | / | /
 | Bridge | esabr0 | esabr1 | esabr4 | / | /
 | Gateway | 172.24.0.254 | / | 3ffe:ff00:1::1 | / | /
APC dev | IP address/network mask | 172.24.20.3/16 | 10.0.0.4/24 | / | / | /
 | Hardware address | 00:16:3e:06:01:05 | 00:16:3e:06:01:06 | / | / | /
 | Bridge | esabr0 | esabr2 | / | / | /
 | Gateway | / | 10.0.0.3 | / | / | /
Web | IP address/network mask | 172.24.20.4/16 | 10.0.0.5/24 | / | / | /
 | Hardware address | 00:16:3e:06:01:07 | 00:16:3e:06:01:08 | / | / | /
 | Bridge | esabr0 | esabr2 | / | / | /
 | Gateway | / | 10.0.0.3 | / | / | /
TCP PEP a | IP address/network mask | 172.24.20.5/16 | 10.0.0.3/24 | 10.0.0.2/24 | / | /
 | Hardware address | 00:16:3e:06:01:09 | TBD | TBD | / | /
 | Bridge | esabr0 | esabr2 | esabr3 | / | /
 | Gateway | / | / | 10.0.0.1 | / | /
APC IPsec GW a | IP address/network mask | 172.24.20.6/16 | 10.0.0.1/24 | 3ffe:ff00:2::2/48 | / | /
 | Hardware address | 00:16:3e:06:01:10 | 00:16:3e:06:01:11 | 00:16:3e:06:01:12 | / | /
 | Bridge | esabr0 | esabr3 | esabr5 | / | /
 | Gateway | 172.24.0.254 | / | 3ffe:ff00:2::1 | / | /
MR | IP address/network mask | 172.24.20.7/16 | 3ffe:ff00:1::1/48 | 3ffe:ff00:2::1/48 | Auto CoA | Auto CoA
 | Hardware address | 00:16:3e:06:01:13 | 00:16:3e:06:01:14 | 00:16:3e:06:01:15 | 00:16:3e:06:01:16 | 00:16:3e:06:01:17
 | Bridge | esabr0 | esabr4 | esabr5 | esabr6 | esabr7
 | Gateway | 172.24.0.254 | / | / | / | /
Sat term | IP address/network mask | 172.24.20.8/16 | None | None | None | None
 | Hardware address | 00:16:3e:06:01:18 | 00:16:3e:06:01:19 | 00:16:3e:06:01:20 | 00:16:3e:06:01:21 | 00:16:3e:06:01:22
 | Bridge | esabr0 | esabr6 | esabr7 | esabr8 | esabr9
 | Gateway | 172.24.0.254 | / | / | / | /
Link emul | IP address/network mask | 172.24.20.9/16 | None | None | None | None
 | Hardware address | 00:16:3e:06:01:23 | 00:16:3e:06:01:24 | 00:16:3e:06:01:25 | 00:16:3e:06:01:26 | 00:16:3e:06:01:27
 | Bridge | esabr0 | esabr8 | esabr9 | esabr10 | esabr11
 | Gateway | 172.24.0.254 | / | / | / | /
Sat hub | IP address/network mask | 172.24.20.10/16 | 3ffe:ff02::1/32 | 3ffe:ff03::1/32 | 3ffe:ff01::1/32 | /
 | Hardware address | 00:16:3e:06:01:28 | 00:16:3e:06:01:29 | 00:16:3e:06:01:30 | 00:16:3e:06:01:31 | /
 | Bridge | esabr0 | esabr10 | esabr11 | esabr12 | /
 | Gateway | 172.24.0.254 | / | / | / | /
HA | IP address/network mask | 172.24.20.11/16 | 3ffe:ff01::3/32 | / | / | /
 | Hardware address | 00:16:3e:06:01:32 | 00:16:3e:06:01:33 | / | / | /
 | Bridge | esabr0 | esabr12 | / | / | /
 | Gateway | 172.24.0.254 | 3ffe:ff01::1 | / | / | /
ATS IPsec GW g | IP address/network mask | 172.24.20.12/16 | 3ffe:ff01::2/32 | 3ffe:ff01:1::1/48 | / | /
 | Hardware address | 00:16:3e:06:01:34 | 00:16:3e:06:01:35 | 00:16:3e:06:01:36 | / | /
 | Bridge | esabr0 | esabr12 | esabr13 | / | /
 | Gateway | 172.24.0.254 | 3ffe:ff01::3 | / | / | /
ATS ES g | IP address/network mask | 172.24.20.13/16 | 3ffe:ff01:1::2/48 | / | / | /
 | Hardware address | 00:16:3e:06:01:37 | 00:16:3e:06:01:38 | / | / | /
 | Bridge | esabr0 | esabr13 | / | / | /
 | Gateway | 172.24.0.254 | 3ffe:ff01:1::1 | / | / | /
APC IPsec GW g | IP address/network mask | 172.24.20.14/16 | 3ffe:ff01::4/32 | / | / | 192.168.0.1/24
 | Hardware address | 00:16:3e:06:01:39 | 00:16:3e:06:01:40 | 00:16:3e:06:01:41 | 00:16:3e:06:01:42 | 00:16:3e:06:01:43
 | Bridge | esabr0 | esabr12 | esabr14 | esabr15 | esabr16
 | Gateway | 172.24.0.254 | 3ffe:ff01::3 | / | / | /
TCP PEP g | IP address/network mask | 172.24.20.15/16 | 192.168.1.1/24 | 192.168.1.2/24 | / | /
 | Hardware address | 00:16:3e:06:01:44 | TBD | TBD | / | /
 | Bridge | esabr0 | esabr14 | esabr15 | / | /
 | Gateway | 172.24.0.254 | / | / | / | /
Inet server | IP address/network mask | 172.24.20.16/16 | 192.168.0.2/24 | / | / | /
 | Hardware address | 00:16:3e:06:01:45 | 00:16:3e:06:01:46 | / | / | /
 | Bridge | esabr0 | esabr16 | / | / | /
 | Gateway | / | 192.168.0.1 | / | / | /

Figure 31: Testbed addresses and bridges
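An addressing plan of this size is easy to get wrong by one character, so it can be useful to check a few rows mechanically. The sketch below uses Python's standard `ipaddress` module to verify that each gateway lies in the network of the interface that uses it. Only a handful of rows from the table are included, and the helper name `gateway_on_link` is hypothetical.

```python
import ipaddress

# (interface description, interface address/mask, gateway) taken from the table.
ROWS = [
    ("ATS ES a eth1", "3ffe:ff00:1:1::2/64", "3ffe:ff00:1:1::1"),
    ("ATS IPsec GW a eth2", "3ffe:ff00:1::2/48", "3ffe:ff00:1::1"),
    ("APC dev eth1", "10.0.0.4/24", "10.0.0.3"),
    ("ATS ES g eth1", "3ffe:ff01:1::2/48", "3ffe:ff01:1::1"),
    ("Inet server eth1", "192.168.0.2/24", "192.168.0.1"),
]


def gateway_on_link(interface: str, gateway: str) -> bool:
    """True if both values parse and the gateway belongs to the interface's network."""
    try:
        iface = ipaddress.ip_interface(interface)
        gw = ipaddress.ip_address(gateway)
    except ValueError:
        return False  # malformed address or mask
    return gw.version == iface.version and gw in iface.network


results = {name: gateway_on_link(iface, gw) for name, iface, gw in ROWS}
for name, ok in results.items():
    print(f"{name}: {'ok' if ok else 'CHECK'}")
```

A malformed value, for instance an IPv6 address containing two "::" groups, fails to parse and is reported as a row to check.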


Note that the nodes have a default IPv4 gateway that is useful when content outside the network has to be accessed. Four nodes (APC device, web cache, airborne TCP PEP and Internet server) require a default IPv4 gateway determined by the testbed network architecture, so they cannot use this default gateway. As a result, machines outside the network could connect to them, but there would be no route for the reply. To allow SSH access from the TriaGnoSys local network, the following route is added on these four nodes after the interfaces are created:

ip route add 192.168.10.0/24 dev eth0 via 172.24.0.254

Another thing to note is that the Satellite Terminal, the Link Emulator and some Mobile Router interfaces do not have an assigned IP address. This is intentional. The interfaces eth3 and eth4 of the MR get their IP addresses from the router advertisements sent by the Satellite Hub. The Satellite Terminal and the Link Emulator do not need addresses, as they should simply pass all traffic from eth1 to eth3 and from eth2 to eth4. To do so, the interfaces are internally bridged.

7.4 Implementation issues

Before starting the design of the testbed, some tests were carried out on the existing testbed of the NewSky project. In that project, some issues such as PEP or mobility were already present, so it was possible to test some features quickly. The goal of these tests was to become familiar with the software that was likely to be used for the testbed and to detect whether any components were missing. For example, one of the tests verified whether fragmentation happens when encapsulating in an IPv6 tunnel using Linux. These tests and information on the software can be found in the Annex.

Once the testbed architecture was proposed, some more tests had to be run to verify that it was possible to implement such an architecture. Three possible conflicts were detected. They are explained in the following subsections.

7.4.1 APC IPsec GW and PEP

The order of packet processing from the ground to the airborne subnetwork would be first NAT, then PEP and then IPsec, among other steps. The APC IPsec Gateway is meant to implement both IPsec and NAT, to avoid separating them and using another machine. An issue could arise from this situation. The PEP function sits between NAT and IPsec, meaning that if the PEP is a hardware solution it has to be plugged in between the NAT and IPsec functions. However, both are implemented by the same machine, so traffic will leave and come back to the same machine. The configuration of the ground APC IPsec GW would therefore be:


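[Figure 32 diagram: the ground APC IPsec gateway. eth1 is the interface to the ACSP WAN, eth2 and eth3 are the interfaces to the PEP, and eth4 is the interface to the public Internet. The internal functions shown are IPsec, NAT, DSCP tagging and PMTUD, connected by arrows indicating the traffic flow.]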

Figure 32: Ground APC IPsec gateway

Note that the arrows indicate the traffic flow. The tricky part is to make the packets follow this flow without interference from the rest of the components of the machine. The configuration should be as follows:

Routing
o All packets coming from eth1 should be routed to eth2.
o All packets coming from eth2 should be routed to eth1.
o All packets coming from eth3 should be routed to eth4.
o All packets coming from eth4 should be routed to eth3.
NAT
o iptables should use the SNAT target (nat table, POSTROUTING chain [iptables tutorial]) on all traffic leaving through eth4.
PMTUD
o iptables should enforce PMTUD by setting the DF bit to 1 at the PREROUTING chain for all packets coming from eth2.
o The bit shall be restored to its original value after it is verified against the IPsec tunnel MTU.
DSCP tagging
o The DSCP field can be changed at the FORWARD chain of the mangle table for all packets coming from eth2.
IPsec
o Packets coming from the APC airborne subnetwork (remote network 10.0.0.0/24) and from the internet should be IPsec processed. Packets coming from the internet should only be processed after the PEP.

These conditions have to be configured with iptables, iproute2 and strongSwan.

Packets coming from the Home Agent and going to the internet will be routed to eth2 and decapsulated by IPsec. Because NAT will only be applied to packets going to eth4, IPsec is applied first.

For packets coming from the internet, PMTUD and DSCP tagging are applied when the packet comes from eth2, meaning that NAT and PEP will certainly have taken place. In turn, they will be done before IPsec, because PMTUD and DSCP tagging are done at the PREROUTING, INPUT and FORWARD chains, which come before the check against the xfrm transform that sends the packet for ESP encapsulation.

The problem comes with IPsec and PEP. The IPsec policy, as configured with strongSwan, does not differentiate by input or output interface but by address. Therefore, the packets coming from eth4 would be IPsec encapsulated before being sent through eth3. To solve this problem, the strongSwan configuration options were examined for a way to force packets coming from eth4 to go to eth3 without checking the policies.

The proposed solution is to differentiate traffic according to the protocol. Since only TCP traffic is required to go to the PEP, the rest can go directly from the NAT function to the IPsec function. Therefore, the solution is to devise an IPsec policy that excludes TCP packets. TCP packets will then not match the IPsec policy and will be sent to the PEP. These packets match the policy after they are changed from TCP to the proprietary XTP by the PEP.
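The effect of a policy that excludes TCP can be illustrated with a toy classifier. This is a sketch only: the function and constant names are hypothetical, protocols are represented as strings rather than IP protocol numbers, and it does not reproduce the strongSwan configuration syntax.

```python
import ipaddress
from dataclasses import dataclass

# Remote network protected by the IPsec policy (APC airborne subnetwork).
PROTECTED_REMOTE = ipaddress.ip_network("10.0.0.0/24")


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    protocol: str  # "tcp", "udp", "xtp", ... (strings used for readability)


def ipsec_policy_matches(packet: Packet) -> bool:
    """Policy selects traffic to the protected network, except TCP."""
    to_airborne = ipaddress.ip_address(packet.dst) in PROTECTED_REMOTE
    return to_airborne and packet.protocol != "tcp"


def next_step(packet: Packet) -> str:
    """Where the gateway sends the packet after the policy check."""
    if ipsec_policy_matches(packet):
        return "ipsec"
    if packet.protocol == "tcp":
        return "pep"
    return "forward"


def pep_convert(packet: Packet) -> Packet:
    """Stand-in for the PEP, which replaces TCP by its own transport protocol."""
    return Packet(packet.src, packet.dst, "xtp")


web_reply = Packet("192.168.0.2", "10.0.0.4", "tcp")
print(next_step(web_reply))               # first pass: sent to the PEP
print(next_step(pep_convert(web_reply)))  # second pass: matches the policy
```

The same packet therefore crosses the gateway twice: once as TCP on its way to the PEP, and once as PEP traffic on its way into the tunnel.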

7.4.2 Bridging ROHC

As mentioned before, the Satellite Terminal and the Link Emulator have their interfaces bridged in pairs. However, this is more complicated to do with the Satellite Terminal than with the Link Emulator. The reason is that the Satellite Terminal has to implement ROHC, so the internal bridge between the interfaces has to be done differently. ROHC creates an interface that we will call rohc1 for the eth1-eth3 link and rohc2 for the eth2-eth4 link. The ROHC tunnelling application takes care of transporting the packet between the rohc interfaces and the corresponding eth3 or eth4. The rohc interfaces then have to be bridged to eth1 and eth2. The problem comes when trying to set up this bridge, because the ROHC tunnelling application creates the rohc virtual interface as a TUN interface [TUN/TAP]. These interfaces do not have the Ethernet address needed to create the bridge.


[Figure 33 diagram: internals of the Satellite Terminal. For link #1, eth1 (interface to MR interface #1) is bridged towards ROHC, and rohctunnel with QoS enforcement connects to eth3 (interface to satellite link #1). The same arrangement is repeated for link #2 with eth2 and eth4.]

Figure 33: Satellite terminal internal bridge positions

To solve this problem, it was necessary to change the ROHC tunnelling application. Besides TUN, there is another type of interface that can be created and that has an Ethernet address, namely a TAP interface. While the rohctunnel application seemed to be ready to use it by just changing a flag in the source code, it was not so straightforward. The problem is that, when using the TAP interface [TUN/TAP], an Ethernet header is added to the packet. This header should not be present when the tunnelling application hands the packet to the ROHC compressor / decompressor, because the ROHC library expects to be given an IP packet and the field positions would not match. The source code was modified to store the Ethernet header temporarily while the packet is processed by the compressor / decompressor and to add it back to the packet afterwards.

Also note that ROHC packets are sent using a UDP socket application that requires an IPv4 address. This address has to be added when configuring ROHC at the Satellite Terminal and the Satellite Hub.
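The modification described above amounts to the following logic, shown here in Python rather than in the C of the actual tunnelling application. The sketch assumes an untagged Ethernet frame (14-byte header, no VLAN tag); `placeholder_transform` is a hypothetical stand-in for the ROHC compressor or decompressor call and simply returns its input.

```python
from typing import Callable

ETH_HEADER_LEN = 14  # destination MAC (6) + source MAC (6) + EtherType (2)


def placeholder_transform(ip_packet: bytes) -> bytes:
    """Stand-in for the ROHC compressor / decompressor; returns the input unchanged."""
    return ip_packet


def process_tap_frame(frame: bytes,
                      transform: Callable[[bytes], bytes] = placeholder_transform) -> bytes:
    """Set the Ethernet header aside, process the IP packet, then put the header back."""
    if len(frame) < ETH_HEADER_LEN:
        raise ValueError("frame shorter than an Ethernet header")
    eth_header, ip_packet = frame[:ETH_HEADER_LEN], frame[ETH_HEADER_LEN:]
    # The transform only ever sees the IP packet, so its field offsets are correct.
    return eth_header + transform(ip_packet)


# Hypothetical frame: zeroed MAC addresses, EtherType 0x86DD (IPv6), then a
# truncated IPv6 header whose first nibble is the version number 6.
frame = bytes(12) + b"\x86\xdd" + b"\x60\x00\x00\x00" + b"payload"
seen = []
out = process_tap_frame(frame, lambda pkt: (seen.append(pkt), pkt[::-1])[1])
print(seen[0][0] >> 4, out[:ETH_HEADER_LEN] == frame[:ETH_HEADER_LEN])
```

The usage lines pass a transform that records what it receives, to show that the library side starts reading at the IP version field and that the Ethernet header is restored unchanged.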

7.4.3 Policy routing using the original packet

Policy routing consists of sending packets through one link or another depending on some of the packet characteristics, for example the DiffServ field, the source and destination addresses or even the protocol. This ensures that, if there is more than one path, the traffic is distributed according to its needs. In this project, this is done through DiffServ. The packet is tagged with a DSCP value and QoS is then enforced based on this value. This was identified as a potential issue if the DSCP value was not copied after IPsec encapsulation. However, the implementations used do copy the value, so this should not be a problem.

There is another project that has a similar architecture but different needs. Based on some criteria that depend on the original packet prior to IPsec encapsulation, the mobile router has to decide whether to send the packet through one link or another. This is an issue because the packet has already been encapsulated when it is received at the mobile router. Two solutions have been proposed. One is to mark the packet depending on its characteristics using the flow label. The other is to integrate the IPsec gateway and the Mobile Router. Both solutions would benefit from the fact that the internal Netfilter mark added to a packet before encapsulation is kept after encapsulation. The mark is not attached to the packet as a header or part of a header, so it does not leave the kernel and is not transmitted to the next node.

7.4.3.1 Flow label solution

This solution implies creating categories differentiated by source address, destination address, DSCP, port number or any other field needed to distinguish packets that have different requirements. These categories are then associated with flow label values. The IPsec gateway decides the category to which the packet belongs before encapsulating it and assigns the corresponding flow label value to the encapsulated packet. The categories and their relation with the flow label values have to be known to both the IPsec gateway and the mobile router. The problem with this approach is that only a finite number of categories can be established. Also, if the policies change, as is the case in the project, the categories have to be updated at both the mobile router and the IPsec gateway. Since they change based on information available at the mobile router (link status), signalling is needed between the two nodes, which is an undesired effect.
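As an illustration of the category approach, the following Python sketch maps a category to a flow label and writes it into the first word of an IPv6 header (4 bits version, 8 bits traffic class, 20 bits flow label). The category table, its keys and the label values are hypothetical; only the header layout is taken from the IPv6 specification.

```python
import struct

FLOW_LABEL_BITS = 20                          # width of the IPv6 flow label field
MAX_FLOW_LABEL = (1 << FLOW_LABEL_BITS) - 1

# Hypothetical table shared by the IPsec gateway and the mobile router:
# (source prefix, DSCP class) -> flow label.
CATEGORIES = {
    ("2001:a:1::/64", "EF"): 0x00001,
    ("2001:a:1::/64", "BE"): 0x00002,
}
DEFAULT_LABEL = 0  # assumed label for packets outside every category


def flow_label_for(src_prefix: str, dscp_class: str) -> int:
    """Look up the flow label associated with a category."""
    return CATEGORIES.get((src_prefix, dscp_class), DEFAULT_LABEL)


def set_flow_label(ipv6_header: bytes, label: int) -> bytes:
    """Return a copy of the header with its 20-bit flow label replaced."""
    if not 0 <= label <= MAX_FLOW_LABEL:
        raise ValueError("flow label does not fit in 20 bits")
    (first_word,) = struct.unpack("!I", ipv6_header[:4])
    first_word = (first_word & ~MAX_FLOW_LABEL) | label
    return struct.pack("!I", first_word) + ipv6_header[4:]


# Outer header with version 6, traffic class 0xb8 (EF) and flow label 0.
outer = bytes([0x6B, 0x80, 0x00, 0x00]) + bytes(36)
tagged = set_flow_label(outer, flow_label_for("2001:a:1::/64", "EF"))
```

The 20-bit field bounds the number of categories, and both nodes need an identical copy of the table, which is the signalling burden mentioned above.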

7.4.3.2 IPsec gateway and Mobile router integration

This solution involves integrating the IPsec gateway and the mobile router in the same node. The advantage is clear: the mobile router has access to the packets before IPsec encapsulation. The disadvantages are that two functionalities, which might not be entirely dependent on the same organisation, are mixed in one node, and that implementation issues may arise. Whether the mixing is a problem in itself is arguable. Therefore, I focused on whether this is a feasible solution. The possible issues that were identified are:

- Will the multiple care-of address (MCoA) patch for the mobility software cause any trouble with the marking of the packet?
- Does IPsec happen before mobility? Can it be changed if not?

According to [MCoA], the MCoA patch is already prepared to apply policy routing. The packets still have to be marked, but they do not need to be redirected to one interface or another. This creates a new problem, because it requires the symmetric configuration to be executed at the Home Agent so that replies get routed correctly. This is a problem both for this solution and for the previous one. The Home Agent is placed between the IPsec gateways, meaning that it cannot execute policy routing based on the unencrypted packet. To solve this, the Home Agent would also have to integrate the IPsec functionality. This is acceptable only if the Home Agent is controlled by a trusted organisation.

Both the IPsec and mobility implementations are applied as transformations after the POSTROUTING chain using xfrm. These transformations are applied in an order that depends on the priority of each transformation. It was noted that the priority of the mobility transformation was higher than that of the IPsec implementation, meaning that one or the other had to be changed. After examining the source code of both, it was found that the priority of the mobility implementation could be changed quickly by just changing the values of some definitions. At the time of finishing this report, this topic is still “work in progress” and the viability of the solution is still being checked.


8 Building the testbed

8.1 Virtualisation

The testbed architecture contains 16 different nodes. If each node were implemented on a separate server, the testbed would become very expensive. That solution would also be inefficient, because some nodes barely need any resources. Servers currently on the market exceed the combined needs of all the nodes. Therefore, it makes sense to implement all the nodes on the same server. This can be done through virtualisation of the nodes and the network links. To save time, it was decided to build a master machine, a virtual machine that would be used as a base for all the nodes in the network. The virtualisation environment initially chosen for the testbed was XEN, a solid option and the most used in the other testbeds of the company. By the time of the implementation, though, all the virtualised systems in the company had moved from XEN to KVM, because it is a better solution and the transition does not require changing the virtual machine image, only the configuration file.

8.2 The master machine

The master machine was built using a long term support (LTS) distribution of Ubuntu Server. The LTS server distribution is supported for five years after the initial release, making it a viable option for the project. The latest LTS server version is Ubuntu 10.04 Server. The first step in setting the machine up is installing the operating system. After that, the machine would already be ready for cloning. However, some additional features that are used in more than one node are installed first, to avoid repeating the installation on several machines. The first thing to install is the support for PMTUD. This requires a kernel module that I coded to modify the Don’t Fragment field of the IPv4 header. To add it, both the kernel and iptables have to be recompiled. Once this process is completed, strongSwan is installed, as it will be used by the 4 IPsec gateways. Finally, ROHC is installed.

8.3 Setting up and testing the testbed

When the master machine was finished, it was cloned. Each clone was modified to match one of the nodes described in the testbed. The interfaces, host name, addresses and bridges are all configured in this step. Then, the additional software that each machine needs is installed and configured. Finally, a series of tests are carried out to verify that the testbed has been correctly implemented.


Test scenario | Involved nodes | Scenario ID | Conditions | Test method | Expected outcome
Link emulator delay | Satellite terminal <-> Link emulator <-> Satellite hub | Link-01 | netem configured with exemplary delay and delay variation parameters. | Ping from both ends of the link (from satellite terminal to the hub). | Packet round-trip time obtained from ping statistics shall be within the configured range.
Link emulator packet reordering | Satellite terminal <-> Link emulator <-> Satellite hub | Link-02 | netem configured with exemplary delay and delay variation parameters. Ping packets are sent with inter-arrival time less than twice netem’s link delay variation. | Ping from both ends of the link (from satellite terminal to the hub). | Some packets are displayed by the output of ping in the wrong order.
Link emulator packet loss | Satellite terminal <-> Link emulator <-> Satellite hub | Link-03 | netem configured with exemplary packet loss parameter. | Ping from both ends of the link (from satellite terminal to the hub). | Packet loss percentage obtained from ping statistics shall be within the configured range.
Link emulator packet loss correlation | Satellite terminal <-> Link emulator <-> Satellite hub | Link-04 | netem configured with exemplary packet loss and packet loss correlation parameter. | Ping from both ends of the link (from satellite terminal to the hub). | Some packets are lost in bursts, as observed from ping output.
Link emulator rate limitation | Satellite terminal <-> Link emulator <-> Satellite hub | Link-05 | htb or tbf configured to limit the bitrate on the link. Configure Iperf to send packets with bitrate higher than the configured link capacity. | Iperf from both ends of the link (from satellite terminal to the hub). | Iperf bandwidth statistics shall be within the configured netem limit.
ROHC | Satellite terminal <-> Link emulator <-> Satellite hub | ROHC-01 | Basic ROHC configuration for IP and IP/UDP compression, link emulator disabled. | Iperf using UDP packets and ping from both ends of the link (from satellite terminal to the hub). | tcpdump at the link emulator shows ROHC compressed packets.
QoS | Satellite terminal | QoS-01 | The satellite terminal’s interface to the satellite link configured with a strict priority scheduling. | | Packet delivered according to the priority order
NEMO registration | MR<->Satellite terminal<->Link emulator<->Satellite hub<->HA | Mob-01 | Mobile IPv6/NEMO software is started at MR and HA, ROHC enabled. All other components disabled (traffic prioritisation, link emulator). ROHC is needed due to its interaction with the NEMO software (the ROHC implementation uses a virtual network interface that is used by NEMO). | Observation of MR and HA binding cache. | MR registered at the HA
NEMO security | | Mob-02 | Same as Mob-01 | tcpdump at the satellite hub after ROHC decompression. | Encrypted BU and BA
NEMO routing | MR<->Satellite terminal<->Link emulator<->Satellite hub<->HA | Mob-03 | Same as Mob-01 | Ping and tcpdump at the link emulator. | Both request and reply packets observed on one link.
NEMO handover | MR<->Satellite terminal<->Link emulator<->Satellite hub<->HA | Mob-04 | Same as Mob-01 | - Turn off router advertisements from hub. - Bring down one of the NEMO network interfaces and bring up another. |
IPsec tunnel setup | Between IPsec GW | IPsec-01 | NEMO and ROHC enabled, all other components disabled. | IKE traces |
IPsec encryption and authentication | Between IPsec GW | IPsec-02 | | Ping | Encrypted/authenticated packet trace
Application data transfer | Between end hosts | App-01 | NEMO and ROHC enabled, all other components disabled. | Start the various applications, and attempt to perform data transfer accordingly (e.g. voice call for VoIP, file download for HTTP/FTP). |
Application with IPsec | | App-02 | | Perform data transfer, tcpdump at the link emulator | Encrypted/authenticated application layer data
Application and IPsec with network mobility | | App-03 | IPsec, NEMO, ROHC, and link emulator enabled. | Perform data transfer, tcpdump at the satellite hub, trigger handover | Application session and IPsec tunnel are not disrupted due to the handover.

Table 6: Testbed test plan
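The pass criterion of a delay test such as Link-01 could be checked automatically along the following lines. This is only a sketch: it assumes the summary line printed by the Linux iputils ping (`rtt min/avg/max/mdev = ...`), the sample output is invented for illustration, and the accepted range is supplied by the tester from the netem configuration.

```python
import re

# Summary line printed by iputils ping at the end of a run.
RTT_LINE = re.compile(
    r"rtt min/avg/max/mdev = ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms"
)


def parse_rtt(ping_output: str) -> dict:
    """Extract the min/avg/max/mdev round-trip times (in ms) from ping output."""
    match = RTT_LINE.search(ping_output)
    if match is None:
        raise ValueError("no rtt summary line found")
    keys = ("min", "avg", "max", "mdev")
    return dict(zip(keys, map(float, match.groups())))


def rtt_within(ping_output: str, low_ms: float, high_ms: float) -> bool:
    """True if every measured round-trip time lies inside [low_ms, high_ms]."""
    rtt = parse_rtt(ping_output)
    return low_ms <= rtt["min"] and rtt["max"] <= high_ms


# Invented sample output, for illustration only.
sample = (
    "5 packets transmitted, 5 received, 0% packet loss, time 4006ms\n"
    "rtt min/avg/max/mdev = 501.2/519.8/538.4/13.1 ms\n"
)
stats = parse_rtt(sample)
```

Checking the minimum and maximum rather than the average ensures that every sample, not just the mean, falls within the configured range.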


9 Conclusions

It is clear that some issues, such as overhead, arise from the use of VPNs. Others are present even without a VPN but can be solved or reduced, like the effect of the bandwidth-delay product on TCP, which can be mitigated using a PEP. In some cases the use of a VPN prevents the deployment of such solutions. All the solutions that have been found can be applied if the satellite operator controls (or can influence) all the elements of the path. If that is not the case, some solutions cannot be used. Even when the solutions are applied, the result is not always equivalent to the situation without a VPN. For example, header compression reduces the overhead but does not eliminate it completely. The simulation of these solutions on the testbed should provide some insight into how well they work. However, the testbed itself has some limitations. The software used is free software, which is sometimes not well maintained or has bugs. Fortunately, because it is open-source software, all the major bugs and missing capabilities could be patched.


Annex


1 Software exploration

Before defining the test requirements, I tested the different software that will be used for the simulation. Some of the solutions are already implemented in the testbed (like mobility support) and some others will be provided by IABG.

1.1 Iptables

1.1.1 Introduction

Iptables is free software used to filter packets (firewall) that is also capable of doing NAT. It also allows us to implement policy routing. The command for IPv4 is iptables and for IPv6 it is ip6tables.

1.1.2 Packet traversal through the Linux kernel

There are different chains implemented for iptables. Depending on where the packet comes from and where it goes, it will go through one or another chain [iptables tutorial].

- Prerouting: All the packets coming from another node go through this chain.
- Input: The traffic whose destination is the node.
- Output: The traffic generated by the node.
- Forward: Traffic coming from another node whose destination is not the local node. That is, the traffic only passes through the machine.
- Postrouting: All traffic leaving the node.

Iptables works with different tables.

- Filter: This is the default table if none is specified. It contains the INPUT, FORWARD and OUTPUT chains.
- Mangle: This table is used when altering packets. From kernel 2.4.18 it contains the five previously described chains.
- Nat: This table is consulted when a packet creates a new connection. It contains only the PREROUTING, OUTPUT and POSTROUTING chains. This table is not implemented for ip6tables.
- Raw: This table is used mainly for configuring exemptions from connection tracking in combination with the NOTRACK target. Packets go through this table before any other. It contains the PREROUTING and OUTPUT chains.

Figure 34 shows the chains and tables that the packets go through, depending on where they originate (yellow boxes) or where they go. A more detailed figure can be found in [Netfilter tables].


Figure 34: Iptables routing chains
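The traversal can also be summarised in a few lines of Python. This sketch only encodes the chain and table lists of this section; it ignores the order of the tables within a chain and the case of a node sending traffic to itself, and the function names are chosen for illustration.

```python
# Chains contained in each table, as listed in this section.
TABLE_CHAINS = {
    "filter": {"INPUT", "FORWARD", "OUTPUT"},
    "mangle": {"PREROUTING", "INPUT", "FORWARD", "OUTPUT", "POSTROUTING"},
    "nat": {"PREROUTING", "OUTPUT", "POSTROUTING"},
    "raw": {"PREROUTING", "OUTPUT"},
}


def chains_traversed(from_local: bool, to_local: bool) -> list[str]:
    """Chains a packet goes through, depending on its origin and destination."""
    if from_local:
        return ["OUTPUT", "POSTROUTING"]          # generated by the node
    if to_local:
        return ["PREROUTING", "INPUT"]            # addressed to the node
    return ["PREROUTING", "FORWARD", "POSTROUTING"]  # only passing through


def tables_at(chain: str) -> list[str]:
    """Tables that contain the given chain (alphabetical, not traversal order)."""
    return sorted(table for table, chains in TABLE_CHAINS.items() if chain in chains)


forwarded = chains_traversed(from_local=False, to_local=False)
```

For instance, a forwarded packet can only be filtered in FORWARD, since the filter table has no PREROUTING or POSTROUTING chain.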


1.1.3 Iptables matches and targets

Iptables can be used to manipulate the packets. From the man page of ip6tables, we get the different ways to call the command:

    ip6tables [-t table] -[AD] chain rule-specification [options]
    ip6tables [-t table] -I chain [rulenum] rule-specification [options]
    ip6tables [-t table] -R chain rulenum rule-specification [options]
    ip6tables [-t table] -D chain rulenum [options]
    ip6tables [-t table] -[LFZ] [chain] [options]
    ip6tables [-t table] -N chain
    ip6tables [-t table] -X [chain]
    ip6tables [-t table] -P chain target [options]
    ip6tables [-t table] -E old-chain-name new-chain-name

So, we can add, delete or replace a rule in a specified table and chain. We can also flush a chain/table or list the current rules. Ip6tables also offers the possibility to add new chains. When adding a new rule, other than specifying where it goes (table & chain), we can configure what will trigger the rule and what will happen then. This allows us to differentiate the traffic as much as we want. The criteria to differentiate the traffic are called matches. Some of the matches are (commands with [!] allow for negation):

Match | Command | Example
The protocol used. | -p, --protocol [!] protocol | -p icmpv6 (ICMPv6 packets)
The source address. | -s, --source [!] address[/mask] | --source ! 2001:a:1::3 (all packets NOT coming from 2001:a:1::3)
The destination address. | -d, --destination [!] address[/mask] | -d 2001:a:1::0/64 (all packets going to 2001:a:1::0/64)
Receiving interface. | -i, --in-interface [!] name | -i eth0 (all packets coming to interface eth0)
Exiting interface. | -o, --out-interface [!] name | --out-interface eth1 (all packets leaving the node through interface eth1)

More matches are available in the man page of iptables/ip6tables. After the matches have been specified, an action has to be taken, be it ACCEPT, DROP or something more sophisticated. These actions are called targets. To select the action, -j has to be written before the target. Some examples:

Target | Command | Example
Change the DSCP field. | DSCP --set-dscp val, DSCP --set-dscp-class class | -j DSCP --set-dscp-class EF (set the DSCP to Expedited Forwarding)
Change the netfilter mark. | MARK --set-mark mark[/mask] | -j MARK --set-mark 100 (sets the mark to 100)
Classify the packets for using the Queue Disciplines | CLASSIFY --set-class major:minor | -j CLASSIFY --set-class 20:10

Now, let’s imagine a few fictional situations to create examples. The network for the examples is the following:


Figure 35: Fictional network for the iptables example

Situation: Network 1 belongs to an ISP. The ISP controls N33 and uses it as gateway to Network 2 and Network 3. The ISP is selling a priority connection service to N11.
Requirements: All traffic coming from and going to N11 must have higher priority than that from N12.
Commands (executed in N33):
    # ip6tables -t mangle -A PREROUTING -s 2001:a:1::1 -j DSCP --set-dscp-class EF
    # ip6tables -t mangle -A POSTROUTING -d 2001:a:1::1 -j DSCP --set-dscp-class EF
    # ip6tables -t mangle -A PREROUTING -s 2001:a:1::2 -j DSCP --set-dscp-class BE
    # ip6tables -t mangle -A POSTROUTING -d 2001:a:1::2 -j DSCP --set-dscp-class BE

Situation: N41 is a mobile router and it is connected to N31 and N32, having two different CoAs. The link with N32 has a very high delay while that with N31 is fine. The binding id (BID) of the interface eth1 is 100. The BID of the interface eth2 is 200. Different applications are used by the MNNs.
Requirements: VoIP traffic (UDP) should not suffer the high delay and so, it should always go through the access router N31.
Commands (executed in N41):
    # ip6tables -t mangle -A PREROUTING -p udp -j MARK --set-mark 100

Situation: N42 wants to give VoIP traffic priority over all other traffic. QoS is deactivated, so there is no use for setting the DSCP field. Two queueing disciplines have been created using TC, with 100:1 having higher priority than 100:2.
Requirements: VoIP traffic (UDP) should go through the 100:1 queue. All other traffic should go to the other queue.
Commands (executed in N42):
    # ip6tables -t mangle -A POSTROUTING -p udp -j CLASSIFY --set-class 100:1
    # ip6tables -t mangle -A POSTROUTING -p ! udp -j CLASSIFY --set-class 100:2


Situation: N33 and networks 1 and 2 belong to a company. N33 is used as a gateway to the exterior. Network 2 is private and only traffic coming from / going to network 1 is allowed.
Requirements: Drop all traffic going to network 2 that does not come from network 1, and all traffic coming from network 2 that does not go to network 1.
Commands (executed in N33):
    # ip6tables -A FORWARD -s ! 2001:a:1::/64 -d 2001:a:2::/64 -j DROP
    # ip6tables -A FORWARD -d ! 2001:a:1::/64 -s 2001:a:2::/64 -j DROP

1.1.4 IPv4 testing

The test consists of sending three ping requests from one user (192.168.10.69) to another (192.168.10.23) with different traffic priorities: default, EF and AF11. Then, using wireshark, the packets are analysed to determine whether the DSCP field changed. The commands to set the EF and AFxy (xy = 11, 12, 13, 21, 22, 23, 31, 32, 33, 41, 42 or 43) priorities are:

    # iptables -t mangle -A POSTROUTING -d 192.168.10.23 -j DSCP --set-dscp-class EF
    # iptables -t mangle -A POSTROUTING -d 192.168.10.23 -j DSCP --set-dscp-class AFxy

These commands are executed on the 192.168.10.69 machine. The previous entry in the table should be erased before adding the new one. This can be done either by targeting the entry or, if no other filtering is done, by flushing the chain:

    # iptables -t mangle -F POSTROUTING

The wireshark captures are:

Figure 36: iptables test, default priority


Figure 37: iptables test, Expedited Forwarding priority

Figure 38: iptables test, Assured Forwarding 11 priority

As we can see in the figures, the DSCP value changed. Also, the default value is 0x00, that is, “best effort”.

1.1.5 IPv6 testing

IPv6 traffic can also be marked with different priorities. The IPv6 header has a field reserved for this purpose, the “Traffic class” or “Packet priority” field. It is also possible to change this priority using ip6tables. The test involves sending three packets from one host (fe80::219:99ff:fe94:be55) to another (fe80::219:99ff:fe94:be77) using the same priorities as before. The commands change slightly:

    # ip6tables -t mangle -A POSTROUTING -d fe80::219:99ff:fe94:be77 -j DSCP --set-dscp-class EF
    # ip6tables -t mangle -A POSTROUTING -d fe80::219:99ff:fe94:be77 -j DSCP --set-dscp-class AFxy

The Traffic class field changes according to the desired DSCP value. The value is stored in the first 6 bits of the 8 bits of the Traffic class field. The DSCP values are highlighted in the following figures: the EF value is 0x2e (10 1110) and the AF11 value is 0x0a (00 1010).
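The relation between the class names, the DSCP values and the full 8-bit field can be verified with a short Python sketch. The function names are chosen for illustration; the AFxy formula (8x + 2y) and the position of the DSCP in the upper six bits follow the DiffServ definitions.

```python
EF_DSCP = 0x2E  # Expedited Forwarding, binary 101110


def af_dscp(x: int, y: int) -> int:
    """DSCP value of Assured Forwarding class AFxy (x = class, y = drop precedence)."""
    if not (1 <= x <= 4 and 1 <= y <= 3):
        raise ValueError("AFxy requires x in 1..4 and y in 1..3")
    return 8 * x + 2 * y


def traffic_class_byte(dscp: int, ecn: int = 0) -> int:
    """Full 8-bit Traffic class / TOS byte: DSCP in the upper 6 bits, ECN below."""
    if not (0 <= dscp < 64 and 0 <= ecn < 4):
        raise ValueError("DSCP is 6 bits wide and ECN is 2 bits wide")
    return (dscp << 2) | ecn


af11 = af_dscp(1, 1)
ef_byte = traffic_class_byte(EF_DSCP)
af11_byte = traffic_class_byte(af11)
```

A capture tool that displays the whole byte therefore shows 0xb8 for EF and 0x28 for AF11, while the DSCP sub-field reads 0x2e and 0x0a.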


Figure 39: ip6tables test, Expedited Forwarding priority

Figure 40: ip6tables test, Assured Forwarding 11 priority

1.2 Iproute2

Iproute2 is a collection of tools for traffic management. The most important ones are IP and TC. They will be examined separately.

1.3 IP

1.3.1 Introduction

The ip command provides different functions to control the IPv4 and IPv6 configuration. The command accepts different options, as shown in the manual pages. The most useful will be “-6”, which selects the IPv6 protocol. “-4” can be used for IPv4 but this is also the default, so there is no need to add it. After the options comes the aspect to configure:

Command | Description
ip link | Manages the network devices.
ip address | Manages the IP addresses attached to the network devices.
ip route | Manages the routing tables.
ip rule | Manages the rules that indicate what routing table should be looked at.
ip tunnel | Manages tunnels to encapsulate IP over IP.

Table 7: IP commands

Now, let’s take a peek at each command. More detailed information can be found on the manual pages.

1.3.2 IP link

This command is useful for showing all the devices (ip link show) or for setting one device up or down (ip link set device up|down).

1.3.3 IP addresses

Using ip address we can check or modify the IP addresses of the configured devices.

1.3.4 IP route

ip route allows us to modify the settings in the different routing tables. The tables created by default are main (ID 254), default (ID 253) and local (ID 255). To show the contents of a table, use ip route show table tableID (if no ID is specified, main is shown). The table names are listed in the file /etc/iproute2/rt_tables. So, to add a new table table1 (ID 100), a line is appended to this file (note the >>, since a single > would overwrite the existing entries):

    # echo 100 table1 >> /etc/iproute2/rt_tables

New routes can be added to the tables. For example, if we want to send all packets that come with TOS 0x8 (maximise throughput) through 192.24.1.1 and the rest through 192.24.1.2 (and both through eth0):

    ip route add tos 0x8 table 100 metric 256 via 192.24.1.1 dev eth0
    ip route add table 100 metric 1 via 192.24.1.2 dev eth0

The metric value indicates, when more than one entry in the table matches equally well, which one is preferred: the entry with the lower metric has the higher priority. In this example the two entries differ in their TOS key, so it is the TOS value of the packet, not the metric, that selects between them.
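Registering a table name amounts to appending one line to the rt_tables file while keeping the entries that are already there. The following Python sketch shows the idea on a temporary copy of the file; the helper name `add_table` is hypothetical and the demo deliberately does not touch /etc/iproute2/rt_tables.

```python
import os
import tempfile


def add_table(path: str, table_id: int, name: str) -> bool:
    """Append 'id name' to an rt_tables-style file unless id or name already exists."""
    with open(path, encoding="ascii") as fh:
        content = fh.read()
    for line in content.splitlines():
        fields = line.split("#", 1)[0].split()  # ignore comments
        if len(fields) >= 2 and (fields[0] == str(table_id) or fields[1] == name):
            return False
    with open(path, "a", encoding="ascii") as fh:  # "a" appends, never truncates
        if content and not content.endswith("\n"):
            fh.write("\n")
        fh.write(f"{table_id}\t{name}\n")
    return True


# Demo on a temporary file seeded with the default entries.
demo_path = os.path.join(tempfile.mkdtemp(), "rt_tables")
with open(demo_path, "w", encoding="ascii") as fh:
    fh.write("255\tlocal\n254\tmain\n253\tdefault\n0\tunspec\n")

first = add_table(demo_path, 100, "table1")
second = add_table(demo_path, 200, "table2")
again = add_table(demo_path, 100, "table1")  # duplicate, left untouched

with open(demo_path, encoding="ascii") as fh:
    registered = fh.read().split()
```

Opening the file in append mode is the equivalent of the shell's >> redirection: the default entries (local, main, default) survive each addition.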

1.3.5 IP rule

With ip rule we can decide which routing table applies to different traffic. We can show the rules using ip rule show. The default rules are:

    0: from all lookup 255
    32766: from all lookup main
    32767: from all lookup default

New rules should be added between priority 0 and priority 32766. For example, if we want to make all traffic coming from 192.24.1.0/24 use a new table called table1, mark as unreachable all traffic going to 172.1.3.0/24 and make traffic marked with the maximum throughput TOS check table1 first, then:

    ip rule add from 192.24.1.0 prio 100 table table1
    ip rule add to 172.1.3.0 unreachable prio 200
    ip rule add tos 0x8 prio 300 table table1

After adding the rules, the rule list looks like:

    0: from all lookup 255
    100: from 192.24.1.0 lookup table1
    200: from all to 172.1.3.0 lookup main unreachable
    300: from all tos throughput lookup table1
    32766: from all lookup main
    32767: from all lookup default

As typed, the selectors carry no prefix length and therefore match only the single addresses 192.24.1.0 and 172.1.3.0; to cover the whole networks, /24 has to be appended to them.

There is no 172.1.3.0 in my network. Therefore, when I ping it, the request times out. However, after adding the rule, the following message is displayed instead:

    connect: network is unreachable

Note that the rule list can be flushed. When this is done, all the rules except the one for the local table (prio 0) are deleted. To delete a single entry, use the del option instead of add and select the rule using prio xxx. Let’s delete rule 300:

    ip rule del prio 300
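How the rule list is walked can be mimicked with a small Python model. It is a simplification under stated assumptions: the rules use the /24 prefixes that the text intends, every routing table is assumed to contain no matching route (so the lookup always continues with the next rule), and the names `Rule` and `tables_consulted` are chosen for illustration.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    prio: int
    action: str                 # routing table name, or "unreachable"
    src: Optional[str] = None   # source prefix selector
    dst: Optional[str] = None   # destination prefix selector
    tos: Optional[int] = None   # TOS selector

    def matches(self, src: str, dst: str, tos: int) -> bool:
        if self.src and ipaddress.ip_address(src) not in ipaddress.ip_network(self.src):
            return False
        if self.dst and ipaddress.ip_address(dst) not in ipaddress.ip_network(self.dst):
            return False
        if self.tos is not None and tos != self.tos:
            return False
        return True


RULES = [
    Rule(0, "local"),
    Rule(100, "table1", src="192.24.1.0/24"),
    Rule(200, "unreachable", dst="172.1.3.0/24"),
    Rule(300, "table1", tos=0x8),
    Rule(32766, "main"),
    Rule(32767, "default"),
]


def tables_consulted(src: str, dst: str, tos: int = 0) -> list[str]:
    """Actions visited in priority order; an unreachable rule ends the lookup."""
    visited = []
    for rule in sorted(RULES, key=lambda r: r.prio):
        if rule.matches(src, dst, tos):
            visited.append(rule.action)
            if rule.action == "unreachable":
                break
    return visited


blocked = tables_consulted("10.0.0.5", "172.1.3.9")
```

The model shows why the ping reports an unreachable network immediately instead of timing out: the lookup stops at priority 200 and never reaches the main table.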

1.3.6 IP tunnel

ip tunnel is used to create a tunnel between two hosts, encapsulating the IP packets inside another IP packet. A tunnel involves three addresses. The local and remote addresses are the addresses of the end points of the tunnel: packets entering the tunnel travel from the local address to the remote address. These addresses are the ones in the outer header of the tunnelled packet. The third one, the tunnel address, is assigned to the virtual interface of the tunnel. Each kind of tunnel requires a different mode in the ip command, and the address assigned to the tunnel may belong to a different protocol than the end points. Remember that if the command line includes an IPv6 address, “-6” has to be added after “ip”:

    ip tunnel add TUNNELNAME mode MODE remote REMOTE local LOCAL
    ip link set TUNNELNAME up
    ip addr add TUNNELADDR dev TUNNELNAME

Tunnel type | MODE | REMOTE | LOCAL | TUNNELADDR
IPv4 in IPv4 | ipip | IPv4 | IPv4 | IPv4
IPv4 in IPv6 | ip4ip6 | IPv6 | IPv6 | IPv4
IPv6 in IPv6 | ip6ip6 | IPv6 | IPv6 | IPv6

Table 8: IP tunnel types
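Since every tunnelled packet carries an additional outer header, each tunnel type reduces the room left for the inner packet. The Python sketch below estimates this from the base header sizes only (20 bytes for IPv4, 40 bytes for IPv6). It is an approximation: IP options and IPv6 extension headers that an implementation may add are ignored, and the 1500-byte link MTU is an assumed default.

```python
# Size of the outer header added by each tunnel mode of Table 8 (base header only).
OUTER_HEADER_BYTES = {
    "ipip": 20,     # IPv4 in IPv4
    "ip4ip6": 40,   # IPv4 in IPv6
    "ip6ip6": 40,   # IPv6 in IPv6
}


def inner_mtu(mode: str, link_mtu: int = 1500) -> int:
    """Largest inner packet that fits in one outer packet without fragmentation."""
    return link_mtu - OUTER_HEADER_BYTES[mode]


def overhead_ratio(mode: str, inner_packet_bytes: int) -> float:
    """Fraction of the transmitted bytes that belongs to the outer header."""
    outer = OUTER_HEADER_BYTES[mode]
    return outer / (inner_packet_bytes + outer)


ipip_mtu = inner_mtu("ipip")
ip6ip6_mtu = inner_mtu("ip6ip6")
small_packet_overhead = overhead_ratio("ip6ip6", 60)
```

The relative cost is highest for small packets such as VoIP frames, which is consistent with the overhead concerns raised for the VPN scenarios.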

1.3.7 Example

We want to solve with ip the example previously treated with iptables, in which network 2 had to be isolated from anything but network 1. First, we add a table for each network, called table1 and table2:

    # echo 100 table1 >> /etc/iproute2/rt_tables
    # echo 200 table2 >> /etc/iproute2/rt_tables

Now, let’s add the rules so that the traffic coming from each network is handled by its table (note that it is IPv6 traffic):

    # ip -6 rule add from 2001:a:1::/64 prio 1000 table 100
    # ip -6 rule add from 2001:a:2::/64 prio 1001 table 200

Now, we block all traffic going to network 2. It should be unreachable.

    # ip -6 route add unreachable 2001:a:2::/64 metric 256 table main

However, traffic coming from network 1 must be routed correctly and allowed to go to network 2 (through the device eth2). Note that if a packet from network 1 is sent somewhere else, table1 is checked first and, because there is no match, the main table is the next table to be searched.

    # ip -6 route add to 2001:a:2::/64 metric 256 dev eth2 table table1

Finally, only packets going to network 1 should be routed when coming from network 2.

    # ip -6 route add to 2001:a:1::/64 metric 256 dev eth1 table table2
    # ip -6 route add unreachable default metric 1 table table2


1.3.8 Example with iptables

For these examples, the fictional network used in the iptables section will be used (Figure 35). Let’s imagine that there are two IPsec ESP tunnels between N33 and N41. Each tunnel should go through a different “access router”, N31 or N32. Each tunnel consists of two SAs, one in each direction. The SPIs of the tunnel going through N31 are 333141 (from N33 to N41) and 413133 (from N41 to N33). The SPIs of the tunnel going through N32 are 333241 (from N33 to N41) and 413233 (from N41 to N33).

First of all, we have to mark the packets to differentiate them. The mark has to be set in the mangle table to affect routing.

(executed in N33)
    # ip6tables -t mangle -A OUTPUT -p esp -m esp --espspi 333141 -j MARK --set-mark 31
    # ip6tables -t mangle -A OUTPUT -p esp -m esp --espspi 333241 -j MARK --set-mark 32
(executed in N41)
    # ip6tables -t mangle -A OUTPUT -p esp -m esp --espspi 413133 -j MARK --set-mark 31
    # ip6tables -t mangle -A OUTPUT -p esp -m esp --espspi 413233 -j MARK --set-mark 32

Now that the packets are marked, we can route them the right way with ip route. We create two new tables, tunnel31 and tunnel32 (in both N33 and N41):

    # echo 131 tunnel31 >> /etc/iproute2/rt_tables
    # echo 132 tunnel32 >> /etc/iproute2/rt_tables

Next, the rules are added (in both N33 and N41):

    # ip -6 rule add fwmark 31 prio 1000 table 131
    # ip -6 rule add fwmark 32 prio 1000 table 132

This forces the packets of the different tunnels to check different tables. Finally, in N33 the gateway (N31 or N32) has to be specified and in N41, the device (or link) that will be used.

(executed in N33)
    # ip -6 route add via 2001:a:3::1 metric 256 dev eth1 table tunnel31
    # ip -6 route add via 2001:a:3::2 metric 256 dev eth1 table tunnel32
(executed in N41)
    # ip -6 route add default metric 256 dev eth1 table tunnel31
    # ip -6 route add default metric 256 dev eth2 table tunnel32

1.4 TC

1.4.1 Introduction

Traffic Control (TC) is a useful tool that can be used for traffic shaping and control. It provides the means to distribute the bandwidth among the different flows as well as to limit it. We can also use it to simulate high RTT / lossy channels. A guide can be found in [LARTC], chapter 9.

1.4.2 Packet tagging

The DS field value (the so-called DiffServ Code Point, DSCP) determines the Per-Hop Behaviour (PHB) group that the packet is assigned to, that is, the treatment that the packet receives when traversing a router. The packets can be classified with iptables using the criteria listed in the following table:

Criteria | iptables syntax | Description
Source | -s or --source [!] address[/mask] | It can be either a source host or network address.
Destination | -d or --destination [!] address[/mask] | It can be either a destination host or network address.
Receiving interface | -i or --in-interface [!] name | The interface from where the packet is received.
Sending interface | -o or --out-interface [!] name | The interface to where the packet is routed when leaving the host.
Protocol | -p or --protocol [!] protocol | The protocol of the packet. It can be either tcp, udp, icmp, all or the numeric value that corresponds to the desired protocol.
DSCP field | -m dscp --dscp value, -m dscp --dscp-class DiffServClass | The DSCP value / class.
ESP SPI | --espspi [!] spi[:spi] | The ESP Security Parameters Index.

Table 9: Packet tagging using iptables

The exclamation mark (!) denotes the negation of the criterion. The square brackets ([]) denote optional arguments. Additional criteria and information can be found on the iptables man page. Based on the above packet classification, the packets can then be tagged using the following options:

Option | iptables syntax | Description
DSCP value | -j DSCP --set-dscp value | Changes the DSCP value of the packet to the specified value.
DSCP class | -j DSCP --set-dscp-class class | Changes the DSCP value of the packet to match that of one of the defined DSCP classes.

Table 10: Netfilter DSCP target

iptables rules are stored in different tables and chains that are called at different times along the packet’s travel inside the kernel protocol stack. The packets have to be tagged in the “mangle” table. The chain will depend on the classifying criteria (see the manual page). The iptables command will then be:

    iptables -t mangle -A CHAIN criteria -j DSCP --set-dscp value
    iptables -t mangle -A CHAIN criteria -j DSCP --set-dscp-class class

If tagging is done for IPv6 packets, ip6tables has to be used instead of iptables. For example, if all IPv6 traffic coming to interface eth0 has to be tagged as BE, and that entering eth1 has to be tagged with a DSCP value of 0x1a, then:

    ip6tables -t mangle -A PREROUTING -i eth0 -j DSCP --set-dscp-class BE
    ip6tables -t mangle -A PREROUTING -i eth1 -j DSCP --set-dscp 0x1a

More than one criterion can be used. If all forwarded TCP traffic from 192.168.10.23 has to be classified as EF, then:

    iptables -t mangle -A FORWARD -p tcp -s 192.168.10.23 -j DSCP --set-dscp-class EF

To verify that the entries have been entered correctly, the following commands are used to list all configured iptables/ip6tables rules:

    iptables -t mangle -L -v
    ip6tables -t mangle -L -v

Any of the entries can be easily deleted by typing the same command used to add it but replacing the -A option with -D.

1.4.3 PHB definition

The Per Hop Behaviour can be controlled in Linux using different queueing disciplines. Queueing disciplines apply rate control and scheduling to the traffic. When a packet is sent to an interface, the kernel queues it to the queueing discipline configured for that interface. Immediately afterwards, the kernel tries to get as many packets as possible from the queue to give them to the network adaptor driver so they can be delivered.

There are two types of queueing disciplines: classless and classful. Classless queueing disciplines have no configurable internal subdivisions, while classful queueing disciplines might contain multiple subdivisions, called classes. Additional classes or queueing disciplines can be attached to the classes, and so they are called children. The original queueing discipline is, in turn, the parent of the children.

The queueing disciplines can also be classified depending on their role. Some are shapers. They modify the traffic shape, for example by delaying or dropping packets if the maximum transmission rate is reached. Others are schedulers. They reorder packets, that is, they decide which packets are dequeued earlier than others.

Some queueing disciplines are described in the following sections. For more information on them or other queueing disciplines, check the tc man page. The qdiscs currently in use can be checked using the command:

    # tc qdisc show
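The scheduler role can be illustrated with a minimal strict priority scheduler in Python. This is a conceptual sketch, not the kernel implementation of any particular qdisc: the class name and the number of bands are assumptions, and band 0 is taken as the highest priority.

```python
from collections import deque


class StrictPriorityScheduler:
    """Always dequeues from the highest-priority non-empty band (band 0 first)."""

    def __init__(self, bands: int = 3):
        self.queues = [deque() for _ in range(bands)]

    def enqueue(self, packet, band: int) -> None:
        self.queues[band].append(packet)

    def dequeue(self):
        for queue in self.queues:
            if queue:
                return queue.popleft()  # FIFO order inside a band
        return None                     # nothing left to send


scheduler = StrictPriorityScheduler()
scheduler.enqueue("bulk-1", band=2)
scheduler.enqueue("voip-1", band=0)
scheduler.enqueue("bulk-2", band=2)
scheduler.enqueue("voip-2", band=0)

order = []
while (packet := scheduler.dequeue()) is not None:
    order.append(packet)
```

The packets leave in a different order than they arrived, which is what distinguishes a scheduler from a shaper: nothing is delayed by a rate limit or dropped, only reordered.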

1.4.4 Queueing discipline family

The queueing disciplines can be chained in a tree-like fashion. The branches of the tree keep growing as more classful queueing disciplines and classes are added. Note that a classless queueing discipline cannot have further queueing disciplines attached, so once one is used the branch cannot grow any more. For example:

Figure 41: Queueing discipline family example

As can be seen in the figure, one (green) or more (orange and purple) classes can be attached to a classful queueing discipline. A class may in turn have additional classes (green) or queueing disciplines (blue, yellow and pink) attached to it. Note that once a classless queueing discipline is added, the branch stops growing. The classes also implement some kind of queueing discipline; its effect depends on the classful queueing discipline they belong to.

The base of the tree is the root, and this is where packets are sent first. The root may queue the packet into a child class, which in turn may queue the packet in one of its own children. When dequeuing, the kernel asks the root for packets. The root asks its children, and these do exactly the same with theirs. The packets are dequeued through all intermediary queues, which means that the rules that apply to the parents also apply to their children. For example, a packet queued into the classless queueing discipline attached to the orange class would be dequeued through itself, orange, “two”, yellow, green and “one”.

Parents and children are identified by a pair of numbers, the major and the minor, written major:minor. The root is usually named 1: (which is equivalent to 1:0). Its child classes will then be 1:1, 1:2, etc.

Each queueing discipline has a unique major number that it shares with its child classes. The minor numbers have to be unique within a queueing discipline and its classes. Following the previous example, a possible numbering could be:

Figure 42: Queueing disciplines and classes numeration
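The major:minor numbering can be illustrated with a small parser. This is a sketch under two assumptions: the helper names are hypothetical, and the numbers are read as hexadecimal, which is how tc interprets handles and class identifiers.

```python
def parse_handle(text: str) -> tuple[int, int]:
    """Split 'major:minor' into two integers; a missing minor means 0 (the qdisc itself)."""
    major, _, minor = text.partition(":")
    # tc reads both parts as hexadecimal numbers (assumption stated in the lead-in)
    return int(major, 16), int(minor, 16) if minor else 0


def owning_qdisc(classid: str) -> str:
    """Return the handle of the queueing discipline that shares the class's major number."""
    major, _ = parse_handle(classid)
    return f"{major:x}:"


if __name__ == "__main__":
    print(parse_handle("1:"))     # the root of the examples in this section
    print(owning_qdisc("1:10"))   # class 1:10 belongs to queueing discipline 1:
```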

1.4.5 Creating the queueing disciplines and classes

The command to create, replace or delete a queueing discipline is:

    tc qdisc [ add | change | replace | del ] dev device [ parent qdisc-id | root ] [ handle qdisc-id ] qdisc [ qdisc specific parameters ]

To create, replace or delete a class:

    tc class [ add | change | replace | del ] dev device parent qdisc-id [ classid qdisc-id ] qdisc [ qdisc specific parameters ]

The parameters are:

Parameter | Effect
add, change, replace or del | Specifies whether the queueing discipline or class is added (add), deleted (del) or replaced (change or replace). If change is used, neither the parent nor the handle/classid parameters can be changed.
dev device | The network interface to which the queueing discipline or class is attached.
parent qdisc-id or root | If the queueing discipline is at the top of the tree, use root. If not, qdisc-id is the identification number of the queueing discipline or class to which it is attached.
handle qdisc-id | The identification number of the queueing discipline according to the tree structure.
classid qdisc-id | The identification number of the class according to the tree structure.
qdisc [ qdisc specific parameters ] | Specifies which type of queueing discipline is applied.

Figure 43: qdisc/class parameters

To verify that the queueing disciplines and classes were created correctly, use the following commands:

    tc -d qdisc show dev device
    tc -d class show dev device

It is also possible to see statistics, such as the number of packets that have gone through each queueing discipline or class, by replacing the -d option with -s.

When specifying the parameters of a queueing discipline, numbers can be given with units. Bandwidths or rates can be specified with the following units:

Unit | tc syntax
Kilobytes per second | kbps
Megabytes per second | mbps
Kilobits per second | kbit
Megabits per second | mbit
Bytes per second | bps or bare number

Figure 44: TC bandwidth units syntax
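Because kbps and mbps denote bytes while kbit and mbit denote bits, rates are easy to misread. The following sketch converts the suffixed forms of the table into bits per second. The function name is hypothetical, decimal multipliers (1 kbit = 1000 bit) are assumed, and bare numbers are deliberately not handled.

```python
import re

# Bits per second represented by one unit of each suffix (decimal multipliers assumed).
_BITS_PER_UNIT = {
    "bps": 8,            # bytes per second
    "kbps": 8_000,       # kilobytes per second
    "mbps": 8_000_000,   # megabytes per second
    "kbit": 1_000,       # kilobits per second
    "mbit": 1_000_000,   # megabits per second
}


def rate_to_bits_per_second(text: str) -> float:
    """Convert a tc-style rate such as '6mbit' or '750kbps' to bits per second."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([a-z]+)", text.strip().lower())
    if not match or match.group(2) not in _BITS_PER_UNIT:
        raise ValueError(f"unsupported rate: {text!r}")
    return float(match.group(1)) * _BITS_PER_UNIT[match.group(2)]


if __name__ == "__main__":
    for rate in ("6mbit", "6mbps", "750kbps"):
        print(rate, "=", rate_to_bits_per_second(rate), "bit/s")
```

Note that 750kbps and 6mbit describe the same rate, which is why mixing the two families of suffixes is a common source of configuration mistakes.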

Amounts of data:

Unit | tc syntax
Kilobytes | kb or k
Megabytes | mb or m
Kilobits | kbit
Megabits | mbit
Bytes | b or bare number

Figure 45: TC data size units syntax

And time durations:

Unit | tc syntax
Seconds | s, sec or secs
Milliseconds | ms, msec or msecs
Microseconds | us, usec, usecs or bare number

Figure 46: TC time units syntax

1.4.6 Packet distribution into the queues

When packets are queued, each queueing discipline or class looks at its children to decide in which one the packet is queued. If there is more than one child, the criteria depend on the classful queueing discipline chosen. However, packets can also be assigned directly to one part of the tree using the tc filter or iptables commands, depending on packet characteristics such as the protocol, the address or the DS field. Since the syntax of iptables has already been presented, it is the choice here for distributing the packets into the queues. To classify a packet into a queue, use the following command:

    iptables -t mangle -A POSTROUTING criteria -j CLASSIFY --set-class major:minor

The criteria syntax is the same as the one already presented for iptables. The major and minor parameters correspond to the classid of the class (and not of a standard queueing discipline) that the packet is assigned to. When the packet is assigned to a class, the class checks its children to queue it. These will, in turn, check their children, and so on until the packet is queued at the end of a branch. Following the previous example, if the packet is classified into 1:10 it will be queued at its child (10:0). The ip6tables command can be used instead of iptables for IPv6 packets.

1.4.7 (p or b) fifo

Fifo is a classless queueing discipline that sends all data out in the order it arrives. With pfifo the queue is measured in packets; with bfifo, in bytes.

Parameter | Effect
limit limit | Maximum number of bytes (bfifo) or packets (pfifo) in the queue.

Table 11: pfifo / bfifo qdisc parameters

For example, to add a bfifo with a limit of 5 kilobytes at the root of eth0:

    tc qdisc add dev eth0 root handle 1:0 bfifo limit 5kb

1.4.8 Token Bucket Filter (tbf)

This classless queueing discipline provides an easy way to limit the rate at which packets are sent while allowing for short bursts. It shapes the traffic. TBF has a buffer for the outgoing packets and a “bucket” into which tokens are “thrown”. Both are limited in size, and the tokens arrive at a constant rate. Each packet leaving the buffer consumes tokens from the bucket, so the transmission rate is limited to the token arrival rate, which is configurable. However, packets may vary in size. That is why current implementations of TBF relate tokens to bytes and not to packets. As of kernel 2.6.1 this queueing discipline was changed to classful and it has a single band. More classes and queueing disciplines can be attached to it. Some configurable parameters are:

Parameter | Effect
limit bytes or latency ms | Limit is the maximum number of bytes that can wait in the buffer for tokens to arrive. Latency is how long a packet can wait before there are enough tokens for it to leave. They are mutually exclusive.
burst bytes | The size of the bucket. No more than this number of bytes can be sent at once. The keywords buffer and maxburst can also be used. This value should be at least rate / HZ. Since kernel 2.6.13 the HZ constant is set to 250 by default.
mpu bytes | The minimum packet unit determines the minimum number of tokens that a packet must use to be sent. Default is zero.
rate rate | The token rate and also the maximum throughput rate (without bursts).

Table 12: tbf qdisc parameters

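For example, to add a token bucket filter that limits the rate to 6 megabits per second with a latency of 30 milliseconds at the root of eth0 (tc also expects the burst parameter; the 3 kilobytes used here are an illustrative value just above the rate / HZ minimum of 3000 bytes for HZ = 250):

    tc qdisc add dev eth0 root handle 1:0 tbf rate 6mbit burst 3kb latency 30ms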
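The token bucket mechanism and the rate / HZ rule for the bucket size can be sketched as follows. This is a simplified model for illustration only, not the kernel implementation; the class and function names are hypothetical, and tokens are counted in bytes as described above.

```python
class TokenBucket:
    """Byte-based token bucket: tokens arrive at a constant rate up to the bucket size."""

    def __init__(self, rate_bytes_per_s: float, burst_bytes: float):
        self.rate = rate_bytes_per_s
        self.burst = burst_bytes
        self.tokens = burst_bytes   # the bucket starts full
        self.last = 0.0             # time of the last refill, in seconds

    def try_send(self, size_bytes: int, now: float) -> bool:
        """Refill for the elapsed time, then send the packet only if enough tokens exist."""
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if size_bytes <= self.tokens:
            self.tokens -= size_bytes
            return True
        return False                # the packet has to wait in the buffer


def min_burst_bytes(rate_bits_per_s: float, hz: int = 250) -> float:
    """Smallest bucket that sustains the rate when tokens are added once per timer tick."""
    return rate_bits_per_s / 8 / hz


if __name__ == "__main__":
    print(min_burst_bytes(6_000_000))   # bucket size needed for 6 Mbit/s at HZ = 250
    bucket = TokenBucket(rate_bytes_per_s=1000, burst_bytes=3000)
    print([bucket.try_send(1500, now=0.0) for _ in range(3)])
```

In the usage example the first two 1500-byte packets drain the initial burst and the third one has to wait for new tokens.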

1.4.9 Stochastic Fairness Queuing (sfq)

SFQ is a classless queueing discipline that tries to give the same amount of resources to all the flows entering the queue. It is therefore a scheduler. The resources are distributed by creating a large number of FIFO queues and dividing the traffic among them using a hashing algorithm. Because of the hashing, multiple sessions might end up in the same bucket, halving each other’s speed. To prevent this effect from being noticeable, it is recommended that the hashing is renewed periodically (every 10 seconds is recommended). SFQ has two different options:

Parameter | Effect
perturb seconds | The period of reconfiguration of the hashing. When unspecified, the value is 0 (no reconfiguration).
quantum bytes | Number of bytes a stream is allowed to dequeue before the next one gets a turn. Defaults to one MTU-sized packet. It should not be set below the MTU, or else there is a risk of blocking the traffic.

Table 13: sfq qdisc parameters

Note that this queue has a limit of 128 packets. This value is hardcoded and it cannot be changed with any option.
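The idea of hashing flows into FIFO queues and serving the queues in turn can be sketched as follows. This is an illustrative model only: CRC32 stands in for the kernel's hash function, the number of buckets (1024) is an assumption, the quantum is simplified to one packet per turn, and all names are hypothetical.

```python
import zlib
from collections import deque


def sfq_bucket(flow: tuple, perturbation: int, buckets: int = 1024) -> int:
    """Map a flow to a bucket; changing the perturbation reshuffles the mapping."""
    key = repr((flow, perturbation)).encode()
    return zlib.crc32(key) % buckets   # CRC32 is a stand-in, not the kernel's hash


def enqueue_all(packets, perturbation: int, buckets: int = 1024) -> dict:
    """Place (flow, payload) pairs into one FIFO queue per bucket."""
    queues = {}
    for flow, payload in packets:
        index = sfq_bucket(flow, perturbation, buckets)
        queues.setdefault(index, deque()).append(payload)
    return queues


def dequeue_round_robin(queues: dict) -> list:
    """Take one packet from each non-empty queue in turn until all are empty."""
    order = []
    while any(queues.values()):
        for queue in queues.values():
            if queue:
                order.append(queue.popleft())
    return order


if __name__ == "__main__":
    flow_a = ("10.0.0.1", "10.0.0.2", 5000, 80)
    flow_b = ("10.0.0.3", "10.0.0.2", 5001, 80)
    packets = [(flow_a, "a1"), (flow_a, "a2"), (flow_a, "a3"), (flow_b, "b1")]
    print(dequeue_round_robin(enqueue_all(packets, perturbation=1)))
```

When the two flows land in different buckets, the single packet of the second flow is served right after the first packet of the busy flow instead of waiting behind all of them.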

1.4.10 PRIO

This classful queueing discipline has several bands, each containing one class that by default is a fifo queueing discipline. The packets in the first band are always sent first. If there are no packets in the first band, the kernel looks for packets in the second band, and so on. It is possible to substitute other queueing disciplines for the fifo classes. Also, queueing disciplines can be attached after each band. The packets are distributed into the different queues using a priority map that relates the DTR bits of the TOS field to the band to which the packet is assigned [LARTC].

Parameter | Effect
bands | The number of bands to create. The default value is 3.
priomap | The mapping between the DTR fields and the bands used.

Table 14: PRIO qdisc parameters

Note that if you don’t use the default number of bands, the Priomap must be specified. The packets can also be distributed into the bands using other means like iptables or tc filter. This option is recommended due to the DTR fields having been deprecated in favour of the Differentiated Services field [DSCP].
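The strict priority behaviour of PRIO can be sketched as follows. This is a minimal model for illustration; the class name is hypothetical and the band of each packet is given explicitly, as it would be when iptables or tc filter selects the band.

```python
from collections import deque


class PrioScheduler:
    """Strict priority scheduler: a band is served only when all lower-numbered bands are empty."""

    def __init__(self, bands: int = 3):
        self.bands = [deque() for _ in range(bands)]

    def enqueue(self, packet, band: int) -> None:
        self.bands[band].append(packet)

    def dequeue(self):
        """Return the next packet, scanning the bands from the first to the last."""
        for queue in self.bands:
            if queue:
                return queue.popleft()
        return None   # nothing to send


if __name__ == "__main__":
    prio = PrioScheduler()
    prio.enqueue("bulk-1", band=2)
    prio.enqueue("voice-1", band=0)
    prio.enqueue("bulk-2", band=2)
    print([prio.dequeue() for _ in range(4)])
```

The example also shows the main risk of strict priority: traffic in the last band is only served while the earlier bands are empty.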

1.4.11 Hierarchical Token Bucket (htb)

HTB is a classful token bucket filter. Unlike PRIO, it is not necessary to define the bands when the queueing discipline is created; they can be added later on using tc class. Like TBF, it provides shaping capabilities. HTB is configured with tc qdisc when the queueing discipline is created and with tc class when a class is created or replaced. The parameters differ depending on the command. For tc qdisc:

Parameter | Effect
default minor-id | The class below HTB that the packets are sent to by default.

Table 15: htb qdisc parameters

For tc class:

Parameter | Effect
prio priority | In the round-robin process, classes with the lowest priority field are tried for packets first.
rate rate | Maximum rate this class and all its children are guaranteed. Mandatory.
ceil rate | Maximum rate at which a class can send, if its parent has bandwidth to spare. Defaults to the configured rate, which implies that no bandwidth is borrowed.
burst bytes | Number of bytes that can be burst at ceil speed, in excess of the configured rate. Should be at least as high as the highest burst of all children.
cburst bytes | Number of bytes that can be burst at 'infinite' speed, in other words, as fast as the interface can transmit them. For perfect evening out, should be equal to at most one average packet. Should be at least as high as the highest cburst of all children.

Table 16: htb class parameters

1.4.12 Netem

Netem is a classful queueing discipline that can be used to emulate different channel conditions such as delay, loss, packet reordering, duplication and bit errors (packet corruption). When the Netem queueing discipline is created, it has one band with a single class. Some configurable conditions are (optional parameters are within brackets):

Condition | Syntax | Description
Delay | delay mean [sigma] [correlation] | Adds delay to the communication. If no distribution is specified, the delay is uniformly distributed between mean-sigma and mean+sigma. correlation makes the delay depend on the previous value. It is given as a percentage.
Delay distribution | distribution | The delay distribution. It can be either normal, pareto or paretonormal.
Loss | loss value [correlation] | value is the probability of a packet being randomly dropped. correlation makes the loss depend on the previous value. Both parameters are given as a percentage.
Duplication | duplicate value [correlation] | value is the probability of a packet being duplicated. correlation makes the duplication probability depend on the previous value. Both parameters are given as a percentage.
Bit error | corrupt value | value is the probability of a packet being corrupted by a bit error.
Reordering | reorder value [correlation] | value is the probability of a packet being reordered. correlation makes the reordering probability depend on the previous value. Both parameters are given as a percentage.

Table 17: netem qdisc parameters

An alternate way to implement reordering is by adjusting netem’s delay and delay variance/standard deviation.
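The default delay and loss behaviour described in Table 17 can be modelled in a few lines. This is only a sketch of the statistical behaviour, without correlation; the function names and the numeric values in the usage example are illustrative assumptions.

```python
import random


def uniform_delay_ms(mean: float, sigma: float, rng: random.Random) -> float:
    """Delay drawn uniformly between mean - sigma and mean + sigma (no distribution given)."""
    return rng.uniform(mean - sigma, mean + sigma)


def emulate(packets, loss_percent: float, mean: float, sigma: float, rng: random.Random):
    """Drop each packet with the given probability and delay the survivors."""
    delivered = []
    for packet in packets:
        if rng.random() * 100 < loss_percent:
            continue                      # packet randomly dropped
        delivered.append((packet, uniform_delay_ms(mean, sigma, rng)))
    return delivered


if __name__ == "__main__":
    rng = random.Random(1)                # fixed seed so the run is repeatable
    for packet, delay in emulate(range(5), loss_percent=20, mean=300, sigma=20, rng=rng):
        print(f"packet {packet}: {delay:.1f} ms")
```

Because each packet draws its own delay, a large sigma relative to the packet spacing lets later packets overtake earlier ones, which is the reordering effect mentioned above.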

1.5 IP fragmentation

1.5.1 Software

There are two solutions to be tested for IP fragmentation: PMTUD and PLPMTUD. Both are already implemented in Linux.

1.5.2 PMTUD installation

PMTUD is enabled by default in Linux. To disable it, use the following command (as root):

    # echo 1 > /proc/sys/net/ipv4/ip_no_pmtu_disc

To enable it again, use:

    # echo 0 > /proc/sys/net/ipv4/ip_no_pmtu_disc

However, these changes are not permanent and the default setting is restored on reboot. To make the change permanent, add one of the following lines to the /etc/sysctl.conf file (0 to enable PMTUD, 1 to disable it):

    net.ipv4.ip_no_pmtu_disc = 0

or

    net.ipv4.ip_no_pmtu_disc = 1

1.5.3 PLPMTUD installation

Linux has an implementation of PLPMTUD that is disabled by default. It uses the TCP protocol to perform the “packetisation”. PLPMTUD is controlled in Linux by the tcp_mtu_probing flag, which can take three values. 0 disables PLPMTUD (default). 1 enables PLPMTUD only when a TCP black hole is detected, i.e. when there is a router that does not send back, or blocks, the ICMP message signalling that the packet is too big. 2 always enables PLPMTUD. These values can be set using one of the following commands:

    # echo 2 > /proc/sys/net/ipv4/tcp_mtu_probing
    # echo 1 > /proc/sys/net/ipv4/tcp_mtu_probing
    # echo 0 > /proc/sys/net/ipv4/tcp_mtu_probing

Just like with PMTUD, a change to this flag is not permanent. However, /etc/sysctl.conf can be modified to make the change permanent by adding one of the following lines:

    net.ipv4.tcp_mtu_probing = 2
    net.ipv4.tcp_mtu_probing = 1
    net.ipv4.tcp_mtu_probing = 0
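The principle of packetisation-layer probing, finding the largest packet size that gets through without relying on ICMP messages, can be sketched as a binary search. This is an illustration of the idea only: the search strategy of the Linux implementation is not described here, and `path_accepts` is a hypothetical callback that stands for sending a probe of the given size and seeing whether it is acknowledged.

```python
from typing import Callable


def probe_path_mtu(path_accepts: Callable[[int], bool],
                   search_low: int, search_high: int) -> int:
    """Return the largest size in [search_low, search_high] that the path accepts.

    Assumes that search_low is known to work and that acceptance is monotonic:
    if a size is accepted, every smaller size is accepted too.
    """
    low, high = search_low, search_high
    while low < high:
        probe = (low + high + 1) // 2   # round up so the interval always shrinks
        if path_accepts(probe):
            low = probe                 # probe acknowledged: try larger sizes
        else:
            high = probe - 1            # probe lost: the limit is below this size
    return low


if __name__ == "__main__":
    # Hypothetical path with a 1400-byte bottleneck (for example, a tunnel).
    print(probe_path_mtu(lambda size: size <= 1400, 512, 1500))
```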

1.6 Header compression (ROHC)

1.6.1 Software

The solution for the overhead problem is to implement header compression. The ROHC algorithm has been chosen for this task. The ROHC library is available online for free at [ROHC website].

1.6.2 Installation

The ROHC installation guide can be found in [ROHC installation].

1.6.3 Testing

I have tested the ROHC libraries using the testbed. I have created a ROHC tunnel between two computers, main and test1. The tools used for analysing the traffic are Wireshark and the ROHC traces on the terminal.

Since the IP header is compressed, a workaround is needed to be able to send the packets. In a real deployment they would be sent over a single hop using a lower-layer header; for this test a UDP tunnel is created to route them instead. The encapsulation adds uncompressed UDP and IP headers that would not be present in a real case.

The test has been adapted from the ROHC website [ROHC example]. The example consists of sending 6 ping requests between two nodes. The requests are sent from 172.24.0.1 (main) to 172.24.0.2 (test1). The uncompressed packet is formed by an IPv4 header (20 bytes), an ICMP header (8 bytes) and an ICMP payload (56 bytes). Therefore, the total size of the uncompressed packets is 84 bytes. The ICMP header and payload are not modified by the compressor.

According to the ROHC traces, the first request sent is 85 bytes long: 21 bytes for the header and 64 for the ICMP header and payload. The first packet is slightly longer than the uncompressed one, but later packets are smaller once the compressor advances in state. The first reply is even longer, because it carries 4 bytes of feedback information and 1 byte indicating the feedback size.

The formulas seen in the solution section are used to calculate efficiency and relative gain. They are slightly modified to count the feedback as part of the overhead caused by the compressed header:

Ping request | ROHC header | ICMP header + payload | Total size (including feedback) | Efficiency (including feedback) | Relative gain (including feedback)
1 | 21 | 64 | 85 | -5.00 % | 0.015
2 | 21 | 64 | 90 | -30.00 % | 0.012
3 | 21 | 64 | 85 | -5.00 % | 0.015
4 | 11 | 64 | 75 | 45.00 % | 0.028
5 | 11 | 64 | 75 | 45.00 % | 0.028
6 | 11 | 64 | 75 | 45.00 % | 0.028

Table 18: ROHC example (packet sizes)

Ping reply | ROHC header | ICMP header + payload | Total size (including feedback) | Efficiency (including feedback) | Relative gain (including feedback)
1 | 21 | 64 | 90 | -30.00 % | 0.012
2 | 21 | 64 | 85 | -5.00 % | 0.015
3 | 21 | 64 | 85 | -5.00 % | 0.015
4 | 1 | 64 | 65 | 95.00 % | 0.313
5 | 1 | 64 | 65 | 95.00 % | 0.313
6 | 1 | 64 | 65 | 95.00 % | 0.313

Table 19: ROHC example (packet sizes)
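The efficiency column of the two tables can be checked with a short calculation. The formulas of the solution section are not restated in this section, so the expression below is an assumption: efficiency is taken as the fraction of the 20-byte IPv4 header saved by compression, with feedback bytes counted as part of the compressed header. It reproduces the tabulated efficiency values; the relative gain is not recomputed here. All names are hypothetical.

```python
UNCOMPRESSED_HEADER = 20   # IPv4 header, in bytes
ICMP_BYTES = 64            # ICMP header (8) + payload (56), left untouched by the compressor


def total_size(rohc_header: int, feedback: int = 0) -> int:
    """Size of the ROHC packet: compressed header, optional feedback and ICMP part."""
    return rohc_header + feedback + ICMP_BYTES


def efficiency(rohc_header: int, feedback: int = 0) -> float:
    """Fraction of the uncompressed header saved (assumed definition, see lead-in)."""
    return 1 - (rohc_header + feedback) / UNCOMPRESSED_HEADER


if __name__ == "__main__":
    # (ROHC header, feedback bytes) for the distinct cases of Tables 18 and 19
    for header, feedback in [(21, 0), (21, 5), (11, 0), (1, 0)]:
        print(header, feedback, total_size(header, feedback),
              f"{efficiency(header, feedback) * 100:.2f} %")
```

Negative values correspond to the first packets of the flow, where the compressor still sends more header bytes than the uncompressed 20.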

Because the ROHC compressed packet is encapsulated in a UDP tunnel, the UDP payload size should correspond to the size of the ROHC compressed packet. According to Wireshark, however, it is 2 bytes bigger than the size given by the ROHC traces. These two bytes do not belong to the ROHC packet but to the tunnel encapsulation. They store the tunnel sequence number, which in turn is used to detect lost packets [ROHC UDP tunnel].

Figure 47: Wireshark capture of the ROHC example ping packets. Highlighted is the size of the first ping request plus the two encapsulation bytes (85 + 2 = 87 bytes).

1.7 Mobility

1.7.1 Software

I have tested the mobility software using the Newsky testbed. The testbed consists of three machines that simulate a mobility network, each machine simulating different nodes using virtual machines. The configuration is shown in Figure 48.

1.7.2 Tests

Two tests have been carried out. In the first test, I sent a ping from the IPsec GW in the Ubuntu2 machine (equivalent to pinging from the MNN) to the IPsec GW in the zuse machine (equivalent to pinging the corresponding node). For the second test I sent the ping the other way around.

1.7.2.1 Ping from MNN to CN

    root@ubuntu2:~# ping6 2001:a:1::3 -c 1 -s 1000
    PING 2001:a:1::3(2001:a:1::3) 1000 data bytes
    1008 bytes from 2001:a:1::3: icmp_seq=1 ttl=62 time=4.62 ms

    --- 2001:a:1::3 ping statistics ---
    1 packets transmitted, 1 received, 0% packet loss, time 0ms
    rtt min/avg/max/mdev = 4.627/4.627/4.627/0.000 ms

MNN traces:

    listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
    11:08:01.226241 IP6 2001:5c0:1505:6101::3 > 2001:a:1::3: ICMP6, echo request, seq 1, length 1008
    11:08:01.230843 IP6 2001:a:1::3 > 2001:5c0:1505:6101::3: ICMP6, echo reply, seq 1, length 1008

MR traces:

    listening on eth3, link-type EN10MB (Ethernet), capture size 96 bytes
    09:57:48.553163 IP6 2001:5c0:1505:6101::3 > 2001:a:1::3: ICMP6, echo request, seq 1, length 1008
    09:57:48.556631 IP6 2001:a:1::3 > 2001:5c0:1505:6101::3: ICMP6, echo reply, seq 1, length 1008

AR1 traces:

    listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
    09:57:48.954637 IP6 2001:3::216:3eff:fe25:66f4 > 2001:5c0:1505:6100::1: IP6 2001:5c0:1505:6101::3 > 2001:a:1::3: [|icmp6]
    09:57:48.957066 IP6 2001:5c0:1505:6100::1 > 2001:3::216:3eff:fe25:66f4: IP6 2001:a:1::3 > 2001:5c0:1505:6101::3: [|icmp6]

HA traces:

    listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
    11:57:48.291783 IP6 2001:5c0:1505:6101::3 > 2001:a:1::3: ICMP6, echo request, seq 1, length 1008
    11:57:48.292134 IP6 2001:a:1::3 > 2001:5c0:1505:6101::3: ICMP6, echo reply, seq 1, length 1008

CN traces:

    listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
    10:50:02.150011 IP6 2001:5c0:1505:6101::3 > 2001:a:1::3: ICMP6, echo request, seq 1, length 1008
    10:50:02.150060 IP6 2001:a:1::3 > 2001:5c0:1505:6101::3: ICMP6, echo reply, seq 1, length 1008

1.7.2.2 Ping from CN to MNN

    root@zuse:~# ping6 2001:5c0:1505:6101::3 -c 1 -s 1000
    PING 2001:5c0:1505:6101::3(2001:5c0:1505:6101::3) 1000 data bytes
    1008 bytes from 2001:5c0:1505:6101::3: icmp_seq=1 ttl=62 time=7.09 ms

    --- 2001:5c0:1505:6101::3 ping statistics ---
    1 packets transmitted, 1 received, 0% packet loss, time 0ms
    rtt min/avg/max/mdev = 7.092/7.092/7.092/0.000 ms

MNN traces:

    listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
    11:40:32.961336 IP6 2001:a:1::3 > 2001:5c0:1505:6101::3: ICMP6, echo request, seq 1, length 1008
    11:40:32.961453 IP6 2001:5c0:1505:6101::3 > 2001:a:1::3: ICMP6, echo reply, seq 1, length 1008

MR traces:

    listening on eth3, link-type EN10MB (Ethernet), capture size 96 bytes
    10:30:20.276114 IP6 2001:a:1::3 > 2001:5c0:1505:6101::3: ICMP6, echo request, seq 1, length 1008
    10:30:20.277694 IP6 2001:5c0:1505:6101::3 > 2001:a:1::3: ICMP6, echo reply, seq 1, length 1008

AR1 traces:

    listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
    10:30:20.726583 IP6 2001:5c0:1505:6100::1 > 2001:3::216:3eff:fe25:66f4: IP6 2001:a:1::3 > 2001:5c0:1505:6101::3: [|icmp6]
    10:30:20.727160 IP6 2001:a:1::3 > 2001:5c0:1505:6101::3: ICMP6, echo request, seq 1, length 1008
    10:30:20.730538 IP6 2001:3::216:3eff:fe25:66f4 > 2001:5c0:1505:6100::1: IP6 2001:5c0:1505:6101::3 > 2001:a:1::3: [|icmp6]

HA traces:

    listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
    12:30:19.963927 IP6 2001:a:1::3 > 2001:5c0:1505:6101::3: ICMP6, echo request, seq 1, length 1008
    12:30:19.964176 IP6 2001:5c0:1505:6100::1 > 2001:3::216:3eff:fe25:66f4: [|ip6]
    12:30:19.965804 IP6 2001:a:1::3 > 2001:5c0:1505:6101::3: ICMP6, echo request, seq 1, length 1008
    12:30:19.966674 IP6 2001:3::216:3eff:fe25:66f4 > 2001:5c0:1505:6100::1: [|ip6]
    12:30:19.970292 IP6 2001:5c0:1505:6101::3 > 2001:a:1::3: ICMP6, echo reply, seq 1, length 1008

CN traces:

    listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
    11:22:33.797871 IP6 2001:a:1::3 > 2001:5c0:1505:6101::3: ICMP6, echo request, seq 1, length 1008
    11:22:33.804938 IP6 2001:5c0:1505:6101::3 > 2001:a:1::3: ICMP6, echo reply, seq 1, length 1008

Figure 48: Newsky testbed machine and node configuration

Figure 49: First mobility test

Green line: Echo request from MNN to CN (executed at the IPsec gateways).
Blue line: Echo reply.
The numbers correspond to different links. The addresses in the headers of these packets can be found in the following pages.

Figure 50: Second mobility test

Green line: Echo request from CN to MNN (executed at the IPsec gateways).
Blue line: Echo reply.
The numbers correspond to different links. The addresses in the headers of these packets can be found in the following pages.

It can be seen in Figure 49 and Figure 50 that the packets go through the Home Agent before being delivered to their destination. The following tables contain the source and destination addresses of the packets.

Link | Type | Source address | Destination address
1 | Echo request | 2001:5c0:1505:6101::3 | 2001:a:1::3
2 | Echo request | 2001:3::216:3eff:fe25:66f4 | 2001:5c0:1505:6100::1
3 | Echo request | 2001:3::216:3eff:fe25:66f4 | 2001:5c0:1505:6100::1
4 | Echo request | 2001:5c0:1505:6101::3 | 2001:a:1::3
5 | Echo reply | 2001:a:1::3 | 2001:5c0:1505:6101::3
6 | Echo reply | 2001:5c0:1505:6100::1 | 2001:3::216:3eff:fe25:66f4
7 | Echo reply | 2001:5c0:1505:6100::1 | 2001:3::216:3eff:fe25:66f4
8 | Echo reply | 2001:a:1::3 | 2001:5c0:1505:6101::3

Table 20: IP addresses of the packets in mobility test 1

Link | Type | Source address | Destination address
1 | Echo request | 2001:a:1::3 | 2001:5c0:1505:6101::3
2 | Echo request | 2001:5c0:1505:6100::1 | 2001:3::216:3eff:fe25:66f4
3 | Echo request | 2001:5c0:1505:6100::1 | 2001:3::216:3eff:fe25:66f4
4 | Echo request | 2001:a:1::3 | 2001:5c0:1505:6101::3
5 | Echo reply | 2001:5c0:1505:6101::3 | 2001:a:1::3
6 | Echo reply | 2001:3::216:3eff:fe25:66f4 | 2001:5c0:1505:6100::1
7 | Echo reply | 2001:3::216:3eff:fe25:66f4 | 2001:5c0:1505:6100::1
8 | Echo reply | 2001:5c0:1505:6101::3 | 2001:a:1::3

Table 21: IP addresses of the packets in mobility test 2

Note that the packets on links 2, 3, 6 and 7 go through a Mobile IP tunnel, so their outer addresses differ from the original ones.

1.7.3 Mobile IP and DSCP

To avoid the QoS issue, the packets must keep their DSCP value after encapsulation. This feature is disabled by default in the mobility software, so a patch is applied to enable it. The patch has been tested and it does indeed copy the DSCP value of the original packet into the Mobile IP encapsulated packet. The pcap files of the test can be found in My Work -> Images and Captures -> mobileIP

1.7.4 Mobile IP and handovers

To verify that handovers could be simulated, the Mobile IP software has been tested. The mobile router was connected, through different devices, to a virtual machine that represented the access routers.

Figure 51: Mobile IP network

For the test to be successful, when the link between the eth1 devices is down (Satellite Network 2001:3::/64 in the figure), the packets have to be transmitted through eth2, and vice versa. If the working link is brought down and the other one is brought up, the flow should switch automatically from one to the other.

During the test it was noticed that once the link where the traffic was flowing was brought down, the flow did not resume until the same link was brought up again. The reason was that the Mobile IPv6 (MIP6) daemon creates a default entry in the route table to send the packets through a working link, but it does not delete it once the link is no longer working. Because this route entry had the same priority as the other default entries created by the MIP6 daemon and came first in the routing table, it was always used. The software has been corrected by adding code that deletes the route entry when the connection to the access router is lost.

1.8 IPsec (strongSwan)

1.8.1 Introduction

The IPsec implementation for Linux that will be used in the project is strongSwan. This free software replaces the old openSwan and includes IKEv2 support. It also includes a User-Mode-Linux (UML) testing environment formed by 8 virtual machines that can be installed on the local system, and it comes with a large number of premade tests to verify strongSwan’s functionality.

1.8.2 Installation

StrongSwan can be downloaded (along with the UML testing environment) from its website [strongSwan].

1.8.3 Newsky testbed

Figure 52: Test configuration for Strongswan

The two IPsec gateways, main and test1, are added to the Newsky testbed to test IPsec. All nodes have a device eth0 connected to a network that allows us to control them through secure shell (ssh).

1.8.4 Configuration

The main configuration file is /etc/ipsec.conf. This file contains the different encryption and authentication options as well as the parameters that define the IPsec connection properties. For testing we have configured no encryption (plain text) and no authentication. These features will be tested later on.

1.8.4.1 Main ipsec.conf

    # ipsec.conf - strongSwan IPsec configuration file

    # basic configuration
    config setup
        # plutodebug=all
        #crlcheckinterval=180
        #strictcrlpolicy=no
        # cachecrls=yes
        # nat_traversal=yes
        #charonstart=yes
        plutostart=no

Charon (IKEv2) and Pluto (IKEv1) are set to yes by default. If we want to use IKEv2, we have to disable Pluto.

    # Add connections here.
    conn %default
        ikelifetime=60m
        keylife=20m
        rekeymargin=3m
        keyingtries=1
        keyexchange=ikev2
        mobike=no
        authby=secret
        esp=null-null

    # Uncomment for IPv6 in IPv?
    #conn net-net
    #   also=host-host
    #   leftsubnet=2001:3::/64
    #   rightsubnet=2001:2::/64

    # Uncomment for IPv4 in IPv?
    conn net-net
        also=host-host
        leftsubnet=192.168.3.0/24
        rightsubnet=192.168.2.0/24

    # Uncomment for IPv? in IPv6
    conn host-host
        left=fec0::1
        right=fec0::2
        auto=start

    # Uncomment for IPv? in IPv4
    #conn host-host
    #   left=10.0.0.1
    #   right=10.0.0.2
    #   auto=start

1.8.4.2 Test1 ipsec.conf

    # ipsec.conf - strongSwan IPsec configuration file

    # basic configuration
    config setup
        # plutodebug=all
        #crlcheckinterval=180
        #strictcrlpolicy=no
        # cachecrls=yes
        # nat_traversal=yes
        #charonstart=yes
        plutostart=no

Charon (IKEv2) and Pluto (IKEv1) are set to yes by default. If we want to use IKEv2, we have to disable Pluto.

    # Add connections here.
    conn %default
        ikelifetime=60m
        keylife=20m
        rekeymargin=3m
        keyingtries=1
        keyexchange=ikev2
        mobike=no
        authby=secret
        esp=null-null

    # Uncomment for IPv6 in IPv?
    #conn net-net
    #   also=host-host
    #   leftsubnet=2001:3::/64
    #   rightsubnet=2001:2::/64

    # Uncomment for IPv4 in IPv?
    conn net-net
        also=host-host
        leftsubnet=192.168.3.0/24
        rightsubnet=192.168.2.0/24

    # Uncomment for IPv? in IPv6
    conn host-host
        left=fec0::1
        right=fec0::2
        auto=start

    # Uncomment for IPv? in IPv4
    #conn host-host
    #   left=10.0.0.1
    #   right=10.0.0.2
    #   auto=start

1.8.4.3 ipsec.secrets

The file ipsec.secrets contains the keys, credentials and PINs. The file is the same for both main and test1. The line with the secret should be uncommented depending on the tunnel IP version.

    # This file holds shared secrets or RSA private keys for inter-Pluto
    # authentication. See ipsec_pluto(8) manpage, and HTML documentation.

    # RSA private key for this host, authenticating it to any other host
    # which knows the public part. Suitable public keys, for ipsec.conf, DNS,
    # or configuration of other implementations, can be extracted conveniently
    # with "ipsec showhostkey".

    # this file is managed with debconf and will contain the automatically created private key
    #fec0::1 fec0::2 : PSK "a little secret between us"
    10.0.0.1 10.0.0.2 : PSK "a little secret between us"

    #include /var/lib/strongswan/ipsec.secrets.inc

1.8.5 IPv4-in-IPv6 tunnel test

First we want to configure an IPv4-in-IPv6 tunnel. The routes are as follows:

AR1:

    192.168.3.0/24 dev eth1 proto kernel scope link src 192.168.3.2
    192.168.2.0/24 via 192.168.3.3 dev eth1
    169.254.0.0/16 dev eth1 scope link
    172.21.0.0/16 dev eth0 proto kernel scope link src 172.21.0.7
    default via 172.21.0.254 dev eth0

Main:

    192.168.3.0/24 dev eth1 proto kernel scope link src 192.168.3.3
    172.21.0.0/24 dev eth0 proto kernel scope link src 172.21.0.21
    default via 172.21.0.1 dev eth0 metric 100
    fe80::/64 dev eth0 proto kernel metric 256 mtu 1500 advmss 1440 hoplimit 0
    fe80::/64 dev eth2 proto kernel metric 256 mtu 1500 advmss 1440 hoplimit 0
    fe80::/64 dev eth1 metric 1024 mtu 1500 advmss 1440 hoplimit 0
    fec0::/64 dev eth2 proto kernel metric 256 mtu 1500 advmss 1440 hoplimit 0

Test1:

    192.168.2.0/24 dev eth1 proto kernel scope link src 192.168.2.3
    172.21.0.0/24 dev eth0 proto kernel scope link src 172.21.0.22
    default via 172.21.0.254 dev eth0 metric 100
    fe80::/64 dev eth1 proto kernel metric 256 mtu 1500 advmss 1440 hoplimit 0
    fe80::/64 dev eth0 proto kernel metric 256 mtu 1500 advmss 1440 hoplimit 0
    fe80::/64 dev eth2 proto kernel metric 256 mtu 1500 advmss 1440 hoplimit 0
    fec0::/64 dev eth2 proto kernel metric 256 mtu 1500 advmss 1440 hoplimit 0

AR2:

    192.168.3.0/24 via 192.168.2.3 dev eth1
    192.168.2.0/24 dev eth1 proto kernel scope link src 192.168.2.2
    169.254.0.0/16 dev eth1 scope link
    172.21.0.0/16 dev eth0 proto kernel scope link src 172.21.0.8
    default via 172.21.0.254 dev eth0

The first thing we want to verify is that the tunnel is created and that it works. The results were negative. Since strongSwan has already been tested for IPv4-in-IPv6 and the results are shown on its website, we deduce that we have a misconfiguration.

According to our setup, a packet sent from AR1 to AR2 should not be delivered in the absence of the tunnel, because there are no routes in main and test1 allowing it. However, when the IPsec tunnel is up, the packet is encapsulated in IPv6 at the gateway and transmitted through the fec0::/64 link. Then, upon reaching the other end of the tunnel, it should be decapsulated and delivered to the local network. This did not happen. While the following text only explains the case of a ping from AR1 to AR2, the ping from AR2 to AR1 has been tested with symmetric results.

The packet was encapsulated and transmitted. We followed the steps of a ping sent from AR1 to AR2 and found that it was lost at test1 after being decapsulated. It was never delivered to the IPv4 network. By using iptables to mark the packets, we know which steps the packets took in the netfilter IPsec processing:

Figure 53: Ping packet through the netfilter processing at the remote gateway

Figure 53 shows the theoretical packet path. The packet arrives as an IPv6 packet that encapsulates the IPv4 packet (green line). Because its destination is the test1 gateway, it enters the INPUT chain. The policies determine that the packet has to be decapsulated. The inner IPv4 packet (blue line) is sent to the PREROUTING chain. The packet is lost at the point where it should be forwarded (red cross). The route exists and it points in the right direction: the packet has source 192.168.3.2 and destination 192.168.2.2, and the following entry is in the route table:

192.168.2.0/24 dev eth1 proto kernel scope link src 192.168.2.3

Using this route, test1 should know that it has to forward the packet through the device eth1 and so send it to the FORWARD chain. After installing the UML test environment and checking the example that does the same kind of tunnelling, we came to realise that the IPsec implementation requires a route to be added:

Main: 192.168.2.0/24 via 10.0.0.1 dev eth2
Test1: 192.168.3.0/24 via 10.0.0.2 dev eth2

At first sight, these routes don’t seem to add anything. The packets are also delivered if we add the following routes instead:

Main: 192.168.2.0/24 dev eth2 scope link src 192.168.3.3
Test1: 192.168.3.0/24 dev eth2 scope link src 192.168.2.3

The common factor is that we have to add a route for the remote network that names the device through which it is reached. We also tried adding the route on only one of the gateways. If we remove the route from the remote gateway, the echo request is lost there. If we remove the route from the local gateway, the echo request is delivered but the echo reply is lost at the local gateway. While we have found no article or manual explaining why, the router seems to apply ingress filtering. That is, it drops a packet if no route exists to its source address, even though that route is not needed for routing the packet. It seems to require a route that links the source network to the device from which the packet is coming, but the actual next hop is unimportant (since the route is not used for routing but for verifying that the packet is legitimate).

1.8.6 IPv6-in-IPv6 configuration

IPv6-in-IPv6 tunnels also require some analogous routes to be added:

Main: 2001:2::/64 dev eth2 metric 1024 mtu 1500 advmss 1440 hoplimit 0
Test1: 2001:3::/64 dev eth2 metric 1024 mtu 1500 advmss 1440 hoplimit 0

AR1:
  2001:2::/64 via 2001:3::50 dev eth1 metric 1024 expires 21098035sec mtu 1500 advmss 1440 hoplimit 4294967295
  2001:3::/64 dev eth1 metric 256 expires 20513386sec mtu 1500 advmss 1440 hoplimit 4294967295
  fe80::/64 dev eth0 metric 256 expires 20513377sec mtu 1500 advmss 1440 hoplimit 4294967295
  fe80::/64 dev eth1 metric 256 expires 20513382sec mtu 1500 advmss 1440 hoplimit 4294967295

Main:
  2001:2::/64 dev eth2 metric 1024 mtu 1500 advmss 1440 hoplimit 0
  2001:3::/64 dev eth1 proto kernel metric 256 mtu 1500 advmss 1440 hoplimit 0
  fe80::/64 dev eth0 proto kernel metric 256 mtu 1500 advmss 1440 hoplimit 0
  fe80::/64 dev eth2 proto kernel metric 256 mtu 1500 advmss 1440 hoplimit 0
  fe80::/64 dev eth1 metric 1024 mtu 1500 advmss 1440 hoplimit 0
  fec0::/64 dev eth2 proto kernel metric 256 mtu 1500 advmss 1440 hoplimit 0

Test1:
  2001:3::/64 dev eth2 metric 1024 mtu 1500 advmss 1440 hoplimit 0
  2001:2::/64 dev eth1 proto kernel metric 256 mtu 1500 advmss 1440 hoplimit 0
  fe80::/64 dev eth1 proto kernel metric 256 mtu 1500 advmss 1440 hoplimit 0
  fe80::/64 dev eth0 proto kernel metric 256 mtu 1500 advmss 1440 hoplimit 0
  fe80::/64 dev eth2 proto kernel metric 256 mtu 1500 advmss 1440 hoplimit 0
  fec0::/64 dev eth2 proto kernel metric 256 mtu 1500 advmss 1440 hoplimit 0

AR2:
  2001:2::/64 dev eth1 metric 256 expires 20513360sec mtu 1500 advmss 1440 hoplimit 4294967295
  2001:3::/64 via 2001:2::50 dev eth1 metric 1024 expires 21098021sec mtu 1500 advmss 1440 hoplimit 4294967295
  fe80::/64 dev eth0 metric 256 expires 20513351sec mtu 1500 advmss 1440 hoplimit 4294967295
  fe80::/64 dev eth1 metric 256 expires 20513356sec mtu 1500 advmss 1440 hoplimit 4294967295

1.8.7 IPv4 in IPv4 configuration

AR1:
  192.168.3.0/24 dev eth1 proto kernel scope link src 192.168.3.2
  192.168.2.0/24 via 192.168.3.3 dev eth1
  169.254.0.0/16 dev eth1 scope link
  172.21.0.0/16 dev eth0 proto kernel scope link src 172.21.0.7
  default via 172.21.0.254 dev eth0

Main:
  10.0.0.0/24 dev eth2 proto kernel scope link src 10.0.0.1
  192.168.2.0/24 dev eth2 scope link src 192.168.3.3
  192.168.3.0/24 dev eth1 proto kernel scope link src 192.168.3.3
  172.21.0.0/24 dev eth0 proto kernel scope link src 172.21.0.21
  default via 172.21.0.1 dev eth0 metric 100

Test1:
  10.0.0.0/24 dev eth2 proto kernel scope link src 10.0.0.2
  192.168.3.0/24 dev eth2 scope link src 192.168.2.3
  192.168.2.0/24 dev eth1 proto kernel scope link src 192.168.2.3
  172.21.0.0/24 dev eth0 proto kernel scope link src 172.21.0.22
  default via 172.21.0.254 dev eth0 metric 100

AR2:
  192.168.3.0/24 via 192.168.2.3 dev eth1
  192.168.2.0/24 dev eth1 proto kernel scope link src 192.168.2.2
  169.254.0.0/16 dev eth1 scope link
  172.21.0.0/16 dev eth0 proto kernel scope link src 172.21.0.8
  default via 172.21.0.254 dev eth0

1.8.8 Authentication & Encryption

To add authentication or encryption we have to select the algorithms in ipsec.conf under:

conn %default
    ikelifetime=60m
    keylife=20m
    rekeymargin=3m
    keyingtries=1
    keyexchange=ikev2
    mobike=no
    authby=secret

If ESP is required, the entry “esp” is added. AH is set using “ah”. However, AH is not supported when using IKEv2 and it is barely supported with IKEv1 (it might be buggy). When using ESP, the encryption and authentication algorithms have to be chosen. The entry consists of the encryption algorithm followed by the authentication algorithm, separated by a “-”. If one of the two is not wanted, write “null” in its place. For example, if we want AES 128-bit encryption with SHA-1:

esp=aes128-sha1

If no encryption is desired:

esp=null-sha1

More information on the supported algorithms can be found in [Algorithms].

1.8.9 DSCP / TOS test

The goal of this test is to verify whether the original packet’s QoS requirement is maintained after being encapsulated in IPsec. For that, three echo requests are sent with different values in the TOS field:

- The first is best-effort traffic with no special requirements. Therefore, the TOS field is 0x00.
- The second ping has to minimise the delay. The TOS field is 0x10.
- The last ping requests maximum throughput, so the TOS field is 0x08.

The traces are captured using tcpdump and shown with Wireshark. The QoS is shown as the DSCP value:

TOS field | DSCP value | Comment
0x00 | 0x00 | First echo request
0x10 | 0x04 | Second echo request
0x08 | 0x02 | Last echo request

Table 22: Correspondence between TOS field and DSCP value
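The correspondence in Table 22 follows from the DSCP occupying the six most significant bits of the former TOS byte. The short Python sketch below reproduces the three rows; the function name tos_to_dscp is only illustrative and is not part of any tool used in the tests.

```python
def tos_to_dscp(tos: int) -> int:
    """Return the DSCP value carried in the upper six bits of the TOS byte."""
    if not 0 <= tos <= 0xFF:
        raise ValueError("the TOS field is a single byte")
    # The two least significant bits are not part of the DSCP.
    return tos >> 2


# The three TOS values used for the echo requests in this test.
for tos in (0x00, 0x10, 0x08):
    print(f"TOS 0x{tos:02x} -> DSCP 0x{tos_to_dscp(tos):02x}")
```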

The value is set using iptables. Since we ping 192.168.2.2:

# iptables -t mangle -A OUTPUT -d 192.168.2.2 -j TOS --set-tos 0x00
# ping 192.168.2.2 -c 1
# iptables -t mangle -R OUTPUT 1 -d 192.168.2.2 -j TOS --set-tos 0x10
# ping 192.168.2.2 -c 1
# iptables -t mangle -R OUTPUT 1 -d 192.168.2.2 -j TOS --set-tos 0x08
# ping 192.168.2.2 -c 1
# iptables -t mangle -D OUTPUT -d 192.168.2.2 -j TOS --set-tos 0x08

Figure 54: QoS test capture in AR1 using IPv4

As can be observed in the encapsulated packets, the outer header copies the QoS value to the Traffic Class field of the IPv6 header. The captures on the four nodes can be found in my folder (My work -> Report -> Images and Captures -> Strongswan -> TOS test). The IPv4-in-IPv4 and IPv6-in-IPv6 cases also work.

1.8.10 IPv6 fragmentation test (IPv4 in IPv6)

According to the IPv6 standards, only the end hosts are allowed to fragment a packet. Let’s consider an example in which we use the networks 192.168.3.0/24, 192.168.2.0/24 and fec0::/64, and we ping from 192.168.3.2 to 192.168.2.2. These two addresses are the end hosts, so the IPsec gateways, which use IPv6 between them, shouldn’t fragment. However, because they are creating a tunnel, they become the end hosts of that tunnel, so it might be possible that they fragment. In this test we want to analyse this behaviour and verify whether it happens. To avoid interference, PMTUD has been disabled at the access routers. The MTU of the middle link is set to 1300 bytes using the following ip link command in main and test1:

# ip -6 link set mtu 1300 dev eth2

Then, a ping has been launched from the access routers, with a size small enough not to be fragmented on the link between the access routers and the gateways, but too large to go through the IPv6 tunnel in one piece. The payload of the ping packet is 1400 bytes, so the packet size after the headers are added is 1400 + 8 (ICMP header) + 20 (IPv4 header) = 1428 bytes. The command is:

# ping 192.168.2.2 -s 1400

Figure 55: Large echo request from 192.168.3.2 to 192.168.2.2 (capture on AR1 dev eth1)

As seen in Figure 55, the total frame size is 1442 bytes. This includes the previously calculated size (1428 bytes) and the Ethernet header (+14 bytes). It must be remembered that the MTU is specified without taking into account the Ethernet header (so an MTU of 1500 bytes means a maximum of 1514 bytes counting the Ethernet header). If there were no fragmentation at main, the packet size would be the 1428 bytes plus the new IPv6 header (+40 bytes), plus the ESP header (+8 bytes) and the ESP trailer.

Figure 56: First fragment of the encapsulated packet from 192.168.3.2 to 192.168.2.2 (capture on main dev eth2)

Figure 57: Second fragment of the encapsulated packet from 192.168.3.2 to 192.168.2.2 (capture on main dev eth2)

Ten packets were sent and all of them were fragmented at the tunnel ends. According to [RFC 2473], section 7.2, this is the expected behaviour because the “Don’t Fragment” bit is clear. It can be seen in Figure 56 and Figure 57 that the packet is sent as two IPv6 fragments that already contain the ESP header. Therefore, fragmentation occurs at the tunnel ends and it occurs after encapsulation. This is undesirable because:

- Fragmentation adds overhead.
- The receiving IPsec gateway must wait for all fragments before it can decapsulate the packet.

The final packet size after encryption but before fragmentation is 1440 bytes. This is the sum of the original packet and the ESP header and trailer (with two bytes of padding):

ESP header | IPv4 header | ICMP header | Payload | ESP trailer
8 bytes | 20 bytes | 8 bytes | 1400 bytes | 4 bytes

An IPv6 header (40 bytes) and the IPv6 fragmentation extension header (8 bytes) will be added to each fragment. Each packet is cut into fragments to fit the 1300-byte MTU. Also, all fragments but the last must have a size that is a multiple of 8 bytes. Therefore, the fragments will have a maximum size of 1296 bytes, of which 40 bytes are IPv6 header, 8 bytes fragmentation header and 1248 bytes data from the original packet. This size is verified in Figure 56. So, in the end, two fragments of 1296 bytes and 240 bytes are sent (1536 bytes in total). This is 108 bytes more than the original packet. If no fragmentation had occurred, the overhead would have been 52 bytes (IPv6 header + ESP header + ESP trailer). The difference of 56 bytes comes from the additional IPv6 header of the second fragment (40) and the two fragmentation extension headers, one on each fragment (8+8).

Case | Data size | Overhead
Original packet size | 1428 bytes | -
Encapsulated IPsec packet without fragmentation (MTU 1500) | 1480 bytes | 52 bytes (+3.64%)
Data transmitted after IPsec processing with fragmentation (MTU 1300) | 1536 bytes | 108 bytes (+7.56%)

Table 23: Transmitted data per packet for a ping with 1400 bytes of data into an IPsec ESP IPv4-in-IPv6 tunnel
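The arithmetic behind Table 23 can be checked with a few lines of Python. This is only a sketch of the size calculation, assuming the header sizes stated above; the helper name ipv6_fragment_sizes is hypothetical, and the function is meant to be called only for packets that actually need fragmentation.

```python
from typing import List

IPV6_HEADER = 40   # bytes
FRAG_HEADER = 8    # IPv6 fragmentation extension header, bytes


def ipv6_fragment_sizes(fragmentable_len: int, mtu: int) -> List[int]:
    """Sizes (IPv6 header included) of the fragments that carry
    fragmentable_len bytes over a link with the given MTU."""
    # Every fragment but the last carries a multiple of 8 bytes of data.
    per_fragment = (mtu - IPV6_HEADER - FRAG_HEADER) // 8 * 8
    sizes = []
    remaining = fragmentable_len
    while remaining > per_fragment:
        sizes.append(IPV6_HEADER + FRAG_HEADER + per_fragment)
        remaining -= per_fragment
    sizes.append(IPV6_HEADER + FRAG_HEADER + remaining)
    return sizes


original = 20 + 8 + 1400            # IPv4 header + ICMP header + ping data
esp_packet = 8 + original + 4       # ESP header + packet + ESP trailer (2 bytes padding)
unfragmented = IPV6_HEADER + esp_packet
fragments = ipv6_fragment_sizes(esp_packet, mtu=1300)

for label, total in (("no fragmentation", unfragmented),
                     ("fragmentation", sum(fragments))):
    extra = total - original
    print(f"{label}: {total} bytes, overhead {extra} bytes "
          f"(+{100 * extra / original:.2f}%)")
print("fragment sizes:", fragments)
```

With the values of this test the sketch yields fragments of 1296 and 240 bytes, matching the captures in Figure 56 and Figure 57.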

An IPv4 packet cannot be decapsulated until the two IPv6 fragments have arrived and so, computational load is also added in the receiving IPsec gateway. The reassembled and decapsulated echo request packets can be seen in the eth1 device of test1:

Figure 58: Reassembled "Echo request" packets in test1 dev eth1

If the DF bit is set, the packets should be dropped and an ICMP message sent back with the advised MTU. The test is repeated with the DF bit set using:

# ping 192.168.2.2 -s 1400 -M want

With the “want” option the DF bit is set to 1 to use PMTUD. If we had set it to “do”, no fragmentation would occur, not even at AR1.

Figure 59: Fragmentation in IPv4-in-IPv6 tunnel when DF bit is SET (capture at AR1 dev eth1).

So, when the DF bit is set, the IPsec gateway sends back an ICMP message so the end host fragments the packet before it arrives at the gateway. This solution, PMTUD, is better because even though we aren’t avoiding fragmentation, it is not the gateways that perform it. So, they don’t suffer from additional computational load and they can decapsulate any packet without needing to wait for another fragment.

1.8.11 IPv6 fragmentation test (IPv6 in IPv6)

Before entering into the test, it is important to understand how fragmentation works in IPv6 tunnelling. The behaviour is described in [RFC 2473], sections 7 and 7.1. In a few words, if the packet does not fit the tunnel, one of two things happens. If the packet is bigger than the IPv6 minimum MTU (1280 bytes), it is dropped and a “Datagram Too Big” ICMPv6 message is sent back. If the packet is smaller than or equal to the minimum MTU, the tunnel gateway accepts the packet and fragments it. Let us explain what behaviour we expect in the test. The packets sent from AR1 to AR2 are:

Packet number | Packet size (bytes) | IPv6 header (bytes) | ICMPv6 header (bytes) | Ping data size (bytes)
1 | 1148 | 40 | 8 | 1100
2 | 1248 | 40 | 8 | 1200
3 | 1348 | 40 | 8 | 1300
4 | 1448 | 40 | 8 | 1400

Table 24: Packet size for the IPv6 fragmentation test

The Link MTU between the two IPsec gateways is set to 1300 bytes. IPsec is tunnelling using ESP without encryption or authentication. This adds a minimum overhead of 50 bytes (40 bytes of IPv6 header, 8 bytes of ESP header and 2 bytes for the pad length and next header fields of the trailer) when there is no padding. Therefore, the Tunnel MTU is 1250 bytes (Link MTU minus overhead). Any incoming packet larger than this value will exceed the Link MTU after being encapsulated. RFC 2401 describes this Tunnel MTU as the MTU that should be advertised when sending ICMP messages [RFC 2401].

The first two pings have a packet size lower than the Tunnel MTU, so they should be delivered correctly. Packet 3 has a size of 1348 bytes. The packet arrives at main and is checked against the Tunnel MTU. According to [RFC 2473] section 7.1.a, the packet is dropped and a “Datagram Too Big” ICMP message is sent back with the advised MTU being the maximum of the Tunnel MTU and the IPv6 minimum MTU, that is, max(1250, 1280) = 1280 bytes.

Then, AR1 receives the ICMP message. The packet has been dropped, but now AR1 knows the Path MTU (the MTU advised in the ICMP message). Therefore, all packets bigger than that will be fragmented. Packet 3 is never retransmitted; instead we continue with packet 4. Packet 4 is cut into two fragments. The first will have a size of 1280 bytes and the second 224 bytes. Note that each fragment has its own IPv6 and fragmentation extension headers, but the ping data and ICMP header are distributed between the two fragments. When the first fragment arrives at the main node, it is again checked against the Tunnel MTU. Because 1280 is bigger than 1250, the packet doesn’t fit. However, unlike in the previous situation, the packet is smaller than or equal to the IPv6 minimum MTU, so it will not be dropped. Instead, the “fragment” will be fragmented in two to fit the Tunnel MTU of 1250 bytes. The second original fragment fits the tunnel and will be encapsulated and sent without a problem.
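The expected gateway decision described above can be summarised in a small Python sketch. It only encodes the rules as they are paraphrased in this section, not the full text of [RFC 2473]; the function name and the returned strings are illustrative assumptions.

```python
IPV6_MIN_MTU = 1280  # bytes


def tunnel_entry_action(packet_size: int, tunnel_mtu: int) -> str:
    """Expected action of the tunnel entry point for an incoming IPv6 packet."""
    if packet_size <= tunnel_mtu:
        return "encapsulate"
    if packet_size > IPV6_MIN_MTU:
        # Too big for the tunnel and above the IPv6 minimum MTU:
        # drop and advise max(Tunnel MTU, IPv6 minimum MTU).
        advised = max(tunnel_mtu, IPV6_MIN_MTU)
        return f"drop, Datagram Too Big (MTU {advised})"
    # Too big for the tunnel but within the IPv6 minimum MTU:
    # the gateway accepts the packet and fragments it.
    return "accept and fragment"


# Packets 1 to 3 of Table 24, then the two fragments of packet 4.
for size in (1148, 1248, 1348, 1280, 224):
    print(size, "->", tunnel_entry_action(size, tunnel_mtu=1250))
```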

Figure 60: IPv6 fragmentation behaviour

For the practical test, pings have been sent using the ping6 command. The ping data size is the value that has to be entered with the “-s value” option of ping6. The first two packets are sent without being fragmented. Note that the packet with 1200 data bytes is not fragmented but it is close to the limit. The packet sent is 1248 bytes (data + ICMPv6 header + IPv6 header), which is smaller than the Tunnel MTU. The traces show that the packet is, as expected, not fragmented. The encapsulation adds the minimum 50 bytes and 2 bytes of padding, making a total of 1300 bytes.

The first packet that exceeds the Tunnel MTU is the 1348-byte packet. A “Datagram Too Big” ICMPv6 packet is sent back to AR1 and the packet is dropped. The advertised MTU in the ICMPv6 message is 1250, which is lower than the Link MTU. This makes sense because this value corresponds to the Tunnel MTU (Link MTU minus encapsulating headers). The IPsec gateway adds an overhead of at least 50 bytes: 40 for tunnelling (IPv6 header), 8 for the ESP header and at least 2 for the ESP trailer (no padding). However, this behaviour is non-compliant according to [RFC 2473] section 7.1.a: the gateway should advertise the maximum of the Tunnel MTU (1250) and the IPv6 minimum link MTU (1280).

Note that the packets are fragmented at AR1. The fragment size should correspond to the advertised MTU. However, in this case the MTU advertised in the “Datagram Too Big” packet is wrong. The observed behaviour is that the fragmentation process ignores the advertised MTU and uses the minimum IPv6 MTU of 1280 bytes. The first fragment is composed of 1232 bytes of data (not to be confused with the ping data), 40 bytes of IPv6 header and 8 bytes of fragmentation header. Because its size is bigger than the Tunnel MTU, the first fragment is dropped when it arrives at main. The second fragment is encapsulated and sent to test1 and finally to AR2. So, the packets are not correctly delivered and no reply is received.

To verify this behaviour, a new Link MTU of 1310 bytes is set. With this value, the expected Tunnel MTU is 1260. However, the announced MTU value is 1258.
This is because, after adding the 50 bytes at the gateway, the size not only has to be smaller than or equal to the Link MTU but also a multiple of 4 bytes (ESP requirement). If the Linux implementation ignores MTU values smaller than 1280, then a first fragment size of 1280 is expected. This is the case.

Now, let’s try with a Link MTU large enough for the Tunnel MTU to exceed 1280 bytes after subtracting the minimum 50 bytes required for ESP and tunnelling. That value would be 1330, but it is not a multiple of 4, so the MTU has to be at least 1332. The MTU is set to 1348. If the hypothesis is right, the announced MTU should be 1298 bytes and the first fragment should fit this size while remaining a multiple of 8 bytes (IPv6 fragmentation requirement). Therefore, the first fragment size should be 1296 bytes. The observed size is 1296.

To verify that the behaviour is the same when the packet is small enough to fit the Link MTU but not the Tunnel MTU, another test is done with the MTU set to 1400 bytes. The ping packet is 1378 bytes including all headers. When it arrives at main, a “Packet Too Big” message is immediately sent back with an announced MTU of 1350 bytes.

In conclusion, there seem to be two bugs here. First, the MTU announced in the “Datagram Too Big” message should never be smaller than 1280 bytes. It seems that the implementation skips taking the maximum of the Tunnel MTU and 1280 bytes. Second, fragments are never sized smaller than the IPv6 minimum MTU, even when the announced MTU is smaller. This shouldn’t be considered entirely a bug, for in theory all IPv6 links should support at least the minimum MTU of 1280 bytes. So, if we want to avoid this problem and still use the Linux implementation, choosing a Tunnel MTU bigger than 1280 bytes is a must. To do so, the Link MTU should be at least the first multiple of 4 that is bigger than the minimum overhead plus 1280 bytes (1332 bytes in this setup).

Figure 61: IPv6 fragmentation behaviour in Linux

A strange behaviour was detected while testing IPv6 in IPv6. After fragmentation had occurred, the end host AR1 added the fragmentation extension header even to packets small enough to fit the announced MTU. Two tests were carried out. In the first test the Link MTU is set to 1300 bytes (Tunnel MTU 1250) and in the second test it is set to 1400 bytes (Tunnel MTU 1350). The packet sequence for the first test is:

Packet number | Packet size (bytes) | Result
1 | 1248 | Delivered.
2 | 1348 | Packet Too Big. Announced MTU = 1250.
3 | 1248 | Fragmentation header added. Size becomes 1256 bytes. Packet Too Big. Announced MTU = 1250.
4 | 248 | Fragmentation header added. Size becomes 256 bytes. Delivered.
5 | 1248 | Fragmentation header added. Size becomes 1256 bytes. Packet Too Big. Announced MTU = 1250.
Cache table flushed
6 | 1248 | Delivered.
7 | 1348 | Packet Too Big. Announced MTU = 1250.
8 | 248 | Fragmentation header added. Size becomes 256 bytes. Delivered.
9 | 1248 | Fragmentation header added. Size becomes 1256 bytes. Packet Too Big. Announced MTU = 1250.

Table 25: Packet sequence for test 1 of the fragmentation header after IPv6 fragmentation

And for test 2:

Packet number | Packet size (bytes) | Result
1 | 1348 | Delivered.
2 | 1448 | Packet Too Big. Announced MTU = 1350.
3 | 1348 | Delivered.
4 | 248 | Delivered.
5 | 1348 | Delivered.
Cache table flushed
6 | 1348 | Delivered.
7 | 1448 | Packet Too Big. Announced MTU = 1350.
8 | 248 | Delivered.
9 | 1348 | Delivered.

Table 26: Packet sequence for test 2 of the fragmentation header after IPv6 fragmentation

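Therefore, it seems that the fragmentation header is added to packets that would normally not need to be fragmented when the announced MTU is smaller than the minimum IPv6 MTU. This is consistent with RFC 2460, section 5: a node that receives a Packet Too Big message reporting an MTU below 1280 bytes is not required to reduce its packets below 1280 bytes, but must include a Fragment header in subsequent packets to that destination.

Note: To flush the cache, use:

ip -6 route flush table cache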

1.8.12 IPv6 announced MTU

Apart from the fact that the announced MTU can be lower than the minimum IPv6 MTU (1280 bytes) when it shouldn’t, we want to know whether it correctly reflects the tunnelling overhead. From the previous test we can see that if the Link MTU changes, the Tunnel and announced MTUs are updated. To verify this, we make two more tests: one without tunnelling, to check that the announced MTU corresponds to the Link MTU, and one with additional overhead from authentication. In the first test, we have created a route for packets from AR1 (2001:3::2) to test1 (fec0::2) that needs no encryption or tunnelling. The MTU of the link between main and test1 is set to 1300 bytes. The packet size is 1448 bytes (1408 bytes of IPv6 payload).

As we can see in the following traces, the first ping is dropped and answered with a “Datagram Too Big” message. The following packets are fragmented by AR1 before being sent.

11:41:22.571753 IP6 2001:3::2 > fec0::2: ICMP6, echo request, seq 0, length 1408
11:41:22.571821 IP6 2001:3::50 > 2001:3::2: ICMP6, packet too big, mtu 1300, length 1240
11:41:23.572560 IP6 2001:3::2 > fec0::2: frag (0|1248) ICMP6, echo request, seq 1, length 1248
11:41:23.572654 IP6 2001:3::2 > fec0::2: frag (1248|160)
11:41:23.572777 IP6 fec0::2 > 2001:3::2: frag (0|1248) ICMP6, echo reply, seq 1, length 1248
11:41:23.572840 IP6 fec0::2 > 2001:3::2: frag (1248|160)
11:41:24.572700 IP6 2001:3::2 > fec0::2: frag (0|1248) ICMP6, echo request, seq 2, length 1248
11:41:24.572795 IP6 2001:3::2 > fec0::2: frag (1248|160)
11:41:24.572947 IP6 fec0::2 > 2001:3::2: frag (0|1248) ICMP6, echo reply, seq 2, length 1248
11:41:24.573010 IP6 fec0::2 > 2001:3::2: frag (1248|160)

The announced MTU is 1300, exactly the value of the Link MTU, just as expected. To check the case with authentication, we add SHA1 to the configuration file. It adds 96 bits (12 bytes). Therefore, with a Link MTU of 1500 bytes, the Tunnel MTU should be 1438 bytes: the 1450 we had before minus the additional overhead from the authentication.

Figure 62: MTU advertised in the "Datagram Too Big" with ESP null encryption, SHA1 authentication

The MTU is indeed 1438. Therefore, it seems that the advertised MTU has adapted to the tunnel requirements.
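All the announced MTU values observed in this test and the previous one (1250, 1258, 1298, 1350 and 1438 bytes) are reproduced by a simple model in which the ESP payload plus its 2-byte trailer is aligned to 4 bytes. The Python sketch below implements that model. It is a hypothesis that fits these measurements, not a description of the kernel code, and the function names are illustrative.

```python
IPV6_HEADER = 40       # outer tunnel header, bytes
ESP_HEADER = 8         # bytes
ESP_TRAILER_MIN = 2    # pad length + next header fields, bytes
IPV6_MIN_MTU = 1280    # bytes


def announced_mtu(link_mtu: int, icv_len: int = 0) -> int:
    """Largest inner packet that still fits the link after ESP tunnelling.

    icv_len is the size of the authentication data (12 bytes with SHA1).
    """
    room = link_mtu - IPV6_HEADER - ESP_HEADER - icv_len
    aligned = room // 4 * 4      # inner packet + trailer is a multiple of 4
    return aligned - ESP_TRAILER_MIN


def smallest_safe_link_mtu(icv_len: int = 0) -> int:
    """Smallest Link MTU whose announced MTU is not below 1280 bytes."""
    mtu = IPV6_MIN_MTU
    while announced_mtu(mtu, icv_len) < IPV6_MIN_MTU:
        mtu += 1
    return mtu


for link in (1300, 1310, 1348, 1400):
    print(f"Link MTU {link} -> announced MTU {announced_mtu(link)}")
print("Link MTU 1500 with SHA1 ->", announced_mtu(1500, icv_len=12))
print("Smallest safe Link MTU without authentication:", smallest_safe_link_mtu())
```

Under this model the smallest Link MTU that keeps the announced MTU at or above 1280 bytes without authentication is 1332 bytes, the value derived by hand in the previous test.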

1.9 Sandra Testbed

The following tests use a different testbed. Its configuration is the following:

Figure 63: SANDRA testbed as used for IPsec and ROHC testing

Using the Sandra testbed, four different scenarios have been tested to make sure that the configuration parameters are known. All four scenarios have been tested with IPv4-in-IPv4, IPv4-in-IPv6 and IPv6-in-IPv6. The scenarios are:

- IPsec ESP tunnelling between main and test1.
- IP tunnel inside IPsec ESP.
- ROHC inside IPsec ESP.
- ROHC inside an IP tunnel which in turn is inside an IPsec ESP tunnel.

So, for example, in the last scenario a packet going from MN1 to CN would first be encapsulated with the ESP and IP headers. Then it would enter the IP tunnel, which would add an additional IP header. Finally it would enter the ROHC compressor before being sent to the link 192.168.200.0/24. When the packet reaches test1 the inverse process happens: the packet is first decompressed using ROHC, then the IP tunnel header is processed and finally the IP and ESP headers added by IPsec. The addresses assigned to the different virtual interfaces are:

Interface | Main (IPv4) | Main (IPv6) | Test1 (IPv4) | Test1 (IPv6)
IP tunnel | 10.3.0.1/24 | 2001:3::1/64 | 10.3.0.2/24 | 2001:3::2/64
ROHC | 10.0.0.1/24 | 2001:eeee::1/64 | 10.0.0.2/24 | 2001:eeee::2/64

Table 27: Addresses of the IP tunnel and ROHC interfaces in the SANDRA testbed tests

1.10 Don’t Fragment Bit manipulation

In the case of IPv4 in IPv6 and IPv4 in IPv4 we don’t want the IPsec gateway to perform fragmentation. However, if the original packet doesn’t have the DF bit set, the packet will be fragmented at the gateway. A solution to avoid this is to force the DF bit to SET when the packet arrives at the gateway, before any decision is made about it. This means manipulating the bit in the PREROUTING chain. The problem is that after decapsulation the DF bit would still be set to 1, and this might not be what the source host originally intended. For that reason, the DF bit is restored to its original value at the FORWARD chain. All packets bigger than the IPsec tunnel MTU are thus forced to be fragmented at the source node. Because the original DF value is kept after the tunnel, the packet might be fragmented between the output IPsec gateway and the destination host, but that would not affect the IPsec gateways and so wouldn’t be a problem. The packets are therefore guaranteed not to be fragmented in the IPsec tunnel. The packet follows these steps:

1. PREROUTING chain: The packet is internally marked if the original DF bit is CLEAR. The DF bit is forced to SET.
2. ROUTING: Because the packet has the DF bit SET, it will be dropped if it exceeds the tunnel MTU.
3. FORWARD chain: If the packet goes through, the DF bit is cleared if the packet was marked (i.e. originally CLEAR). If the packet isn’t marked then it was originally SET, so no action is required.
4. INPUT chain: If the packet goes through to the INPUT chain, it receives the same treatment as in FORWARD.

Note that this supposes that the packet size is verified against the tunnel MTU at the routing step after the PREROUTING chain. This statement has to be verified with a test.

The first thing that comes to mind is to use iptables to change the bit. However, there currently exists no target capable of doing so, and no other simple solution has been found. For that reason, I have written a new target for the Don’t Fragment bit based on the DSCP target source code. The netfilter targets are kernel modules, so the module had to be added and the kernel recompiled. The new target allows the user to choose either to set or clear the DF bit. If necessary, it updates the packet with the new value and recalculates the IP header checksum. A series of tests is run to verify first the DF target and then its use in the scenario previously described.
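The four steps listed above can be modelled in a few lines of Python to make the intended behaviour explicit. This is a toy model of the packet path, not netfilter code; the Packet class, the mark value and the returned strings are assumptions made for illustration, and it takes for granted that the size check happens at the routing step, which is the statement still to be verified.

```python
from dataclasses import dataclass

MARK_DF_WAS_CLEAR = 100   # arbitrary mark value, chosen for this sketch


@dataclass
class Packet:
    size: int          # bytes
    df: bool           # Don't Fragment bit
    mark: int = 0      # internal netfilter-style mark, not sent on the wire


def gateway_process(pkt: Packet, tunnel_mtu: int) -> str:
    """Apply the PREROUTING / ROUTING / FORWARD steps to one packet."""
    # 1. PREROUTING: remember an originally clear DF bit, then force DF to SET.
    if not pkt.df:
        pkt.mark = MARK_DF_WAS_CLEAR
    pkt.df = True
    # 2. ROUTING: with DF set, an oversized packet is dropped and reported.
    if pkt.size > tunnel_mtu:
        return "dropped, ICMP too big sent to source"
    # 3. FORWARD (or INPUT): restore the original DF value for marked packets.
    if pkt.mark == MARK_DF_WAS_CLEAR:
        pkt.df = False
    return "forwarded"


# Example values only: a small and a large packet, both with DF clear.
for size in (1000, 1478):
    pkt = Packet(size=size, df=False)
    print(size, gateway_process(pkt, tunnel_mtu=1450), "DF =", pkt.df)
```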

1.10.1 Test1

The first test (DFbitTest1.pcap) changes the DF bit for the request packets. Six pings are sent and for some, a DF bit value is enforced at the POSTROUTING chain.

1. DF = clear and no action.
2. DF = set and no action.
3. DF = clear and enforcing DF bit to set using iptables.
4. DF = set and enforcing DF bit to set using iptables.
5. DF = clear and enforcing DF bit to clear using iptables.
6. DF = set and enforcing DF bit to clear using iptables.

In the first two cases the original choice is maintained. In the rest, the value enforced by iptables is found on the packets. The checksum is correct. Therefore, the DF target works correctly. To repeat the test, use the following commands at main:

# iptables -t mangle -F
# ping 192.168.3.2 -M dont -c 1
# ping 192.168.3.2 -M do -c 1
# iptables -t mangle -A POSTROUTING -j DF --set
# ping 192.168.3.2 -M dont -c 1
# ping 192.168.3.2 -M do -c 1
# iptables -t mangle -F
# iptables -t mangle -A POSTROUTING -j DF --clear
# ping 192.168.3.2 -M dont -c 1
# ping 192.168.3.2 -M do -c 1
# iptables -t mangle -F

1.10.2 Test2

The second test (DFbitTest2.pcap) aims at verifying that only the DF bit is changed. Because of the IP header struct used in Linux, there is a risk that the fragment offset and the More Fragments flag are modified if the target is not well coded. To test that, big packets (slightly more than 1700 bytes) are sent so that they are fragmented. Originally, the first packet had the DF bit clear and the second had it set. The DF bit is enforced to clear. The test results show that, other than the DF flag, no fields are modified.
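The risk mentioned above comes from the fact that the DF flag, the More Fragments flag and the fragment offset share one 16-bit field of the IPv4 header. The following Python sketch shows the kind of operation the target has to perform: change only the DF bit and recompute the header checksum. It is an illustration written for this report, not the source code of the kernel module, and the sample header values are made up.

```python
import struct

IP_DF = 0x4000   # Don't Fragment bit inside the 16-bit flags/offset field


def ipv4_checksum(header: bytes) -> int:
    """One's complement sum of the header taken as 16-bit big-endian words."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def set_df(header: bytes, df: bool) -> bytes:
    """Return a copy of the header with only the DF bit changed."""
    hdr = bytearray(header)
    (frag_off,) = struct.unpack_from("!H", hdr, 6)
    new = (frag_off | IP_DF) if df else (frag_off & ~IP_DF & 0xFFFF)
    if new == frag_off:
        return bytes(hdr)                      # nothing to update
    struct.pack_into("!H", hdr, 6, new)
    struct.pack_into("!H", hdr, 10, 0)         # zero the checksum field first
    struct.pack_into("!H", hdr, 10, ipv4_checksum(bytes(hdr)))
    return bytes(hdr)


# Sample 20-byte header: first fragment (More Fragments set, offset 0), DF clear.
sample = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 1500, 0x1234, 0x2000, 64, 1, 0,
                     bytes([192, 168, 3, 2]), bytes([192, 168, 2, 2]))
changed = set_df(sample, True)
print(hex(struct.unpack_from("!H", changed, 6)[0]))
```

A header whose checksum is valid sums to zero when passed through ipv4_checksum again, which gives a quick way to check the result.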

1.10.3 Test3

This test verifies what happens if the DF bit of a packet is forced to SET at the POSTROUTING chain while the packet is so big that it requires fragmentation. The first packet is sent with the DF flag already set. The packet is considered too big and is not even sent. An error message is shown stating that the packet is bigger than the MTU (1500 bytes) and the DF bit is set. The second packet (shown in DFbitTest3.pcap) originally has the DF bit clear. It is then sent with the DF bit CLEAR even though iptables sets the bit at the POSTROUTING chain.

1.10.4 Test4

All previous tests manipulated the DF bit on the same machine where the packet was generated. The next test verifies the situation in which a router on the path manipulates the DF bit. This is the case that really interests us. The test consists of sending two packets with the DF bit originally clear. The first one is a small packet. It can be observed (DFbitTest4.pcap) that the packet arrives at the destination with the DF bit SET. The second packet is larger than the MTU of the link before the router. Therefore, the sending host fragments the packet in two. The router receives the two fragments, changes the DF bit on both, and then forwards them to the receiver. The capture of these fragments at the receiver is shown in the pcap file. The DF bit is set correctly. Note that the pings are sent from Test1 to AR1 in the Newsky testbed, adding the necessary routes to make that possible.

1.10.5 Test5

For this test we use the Newsky testbed as configured previously, with IPsec ESP in IPv4-in-IPv6 tunnel mode. The change is that the PREROUTING chain of main has an entry that sets the DF bit. Packets are sent from AR1 to AR2:

1. DF bit clear. Packet size smaller than tunnel MTU.
2. DF bit clear. Packet size bigger than tunnel MTU but smaller than the other links’ MTUs.
3. DF bit clear. Packet size bigger than tunnel MTU but smaller than the other links’ MTUs.
4. DF bit clear. Packet size bigger than tunnel MTU but smaller than the other links’ MTUs.

Note: The size of the big packets is 20 bytes for the IPv4 header, 8 for the ICMP header and 1450 bytes for the payload. The tunnel MTU is 1450 bytes, so it is definitely exceeded. The results can be observed at both AR1 and AR2 in the files DFtest5AR1.pcap and DFtest5AR2.pcap. The first packet goes through without trouble. The second has its DF bit set at main and, because it is too big, main sends the ICMP “too big” message back to AR1. The rest of the packets are fragmented at AR1, just as intended. This way, the IPsec gateways don’t fragment the packets.

1.10.6 Test6

Test 6 aims at verifying the previous statement that the packet size is checked against the tunnel MTU at the routing step just after the PREROUTING chain. For that, the DF bit is set at main in the PREROUTING chain and then cleared in the FORWARD chain. Therefore, the only point on the whole path where the DF bit is set to 1 is the routing step. If the statement is true, main will send an ICMP message back to AR1 saying the packet is too big. If not, it will be main that fragments the packets. Also, the packets at AR2 should have the DF bit cleared. The same packets as in test 5 are sent. It can be seen both that the DF bit is clear at AR2 and that fragmentation occurs at AR1.

1.10.7 Test7

This final test aims at verifying the functionality of the DF bit target in the specific case that we want it for. The test is performed in the Newsky testbed with IPsec running with an IPv6 tunnel between main and test1. We want main to enforce PMTUD, so the DF bit has to be set for all incoming packets. However, once the packet is encapsulated, it should keep the original value of the flag. The following commands have to be executed to enable this behaviour. Note that the mark number is set to 100 for the example, but it can be changed if it conflicts with other rules.

# iptables -t mangle -I PREROUTING 1 -m u32 --u32 "3&0x40=0x00" -j MARK --set-mark 100
# iptables -t mangle -I PREROUTING 2 -j DF --set
# iptables -t mangle -A FORWARD -m mark --mark 100 -j DF --clear
# iptables -t mangle -A INPUT -m mark --mark 100 -j DF --clear

Note that the rules at PREROUTING have been added in order. This is important: otherwise, the packet would not be checked against the original value of the flag. The following packets are sent:

1. DF bit clear. Packet size smaller than the tunnel MTU.
2. DF bit set. Packet size smaller than the tunnel MTU.
3. DF bit clear. Packet size bigger than the tunnel MTU but smaller than the MTUs of the other links.
4. DF bit clear. Packet size bigger than the tunnel MTU but smaller than the MTUs of the other links.

Packets 1 and 2 go through, and it can be seen that the flag is preserved after going through the IPsec tunnel (at AR2). Packet 3 is dropped at main, so AR1 receives "Packet too big". Because of that, packet 4 is sent fragmented from AR1. In conclusion, PMTUD is enforced at the IPsec gateways, but the behaviour for the rest of the path is left to the end hosts to decide.

Note that in the IPv4-in-IPv4 case another entry has to be added to the tables. When the encapsulated reply packet arrives at main, it enters the PREROUTING chain and the DF bit of the outer header is checked. Then the packet goes to INPUT and is decapsulated. The packet goes through PREROUTING again, but this time it is the inner header that is checked. Then it goes to the FORWARD chain, and so on. If the outer header had the DF bit at 0 and the inner header had it at 1, the packet is marked at PREROUTING (outer header) and, as a consequence, the DF bit of the inner header is cleared at FORWARD. To avoid this effect, we want to mark only packets coming from interface eth1.

# iptables -t mangle -I PREROUTING 1 -i eth1 -m u32 --u32 "3&0x40=0x00" -j MARK --set-mark 100
# iptables -t mangle -I PREROUTING 2 -i eth1 -j DF --set
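To make the u32 expression and the mark-and-restore logic of the rules above more concrete, the following Python sketch emulates them on a hand-built IPv4 header. It is an illustration under stated assumptions: the helper names (`ipv4_header`, `u32_df_clear`, `set_df`, `gateway_rules`) and the addresses are hypothetical, and the header checksum is left at zero. The expression "3&0x40=0x00" is read here as: take the four bytes starting at offset 3 as one 32-bit value, mask it with 0x40, and compare the result with 0. The masked bit falls in byte 6 of the header, which holds the DF flag.

```python
import struct


def ipv4_header(df, total_len=84):
    """Build a 20-byte IPv4 header (checksum left at 0 for brevity)."""
    flags_frag = 0x4000 if df else 0x0000   # DF is bit 14 of this 16-bit field
    return struct.pack("!BBHHHBBH4s4s", 0x45, 0, total_len, 0, flags_frag,
                       64, 1, 0, bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]))


def u32_df_clear(header):
    """Emulate the u32 expression "3&0x40=0x00" (true when DF is clear)."""
    word = int.from_bytes(header[3:7], "big")   # bytes 3..6 as one 32-bit value
    return (word & 0x40) == 0x00                # lowest byte of the word is byte 6


def set_df(header, value):
    """Return a copy of the header with the DF flag set or cleared."""
    flags = (header[6] | 0x40) if value else (header[6] & 0xBF)
    return header[:6] + bytes([flags]) + header[7:]


def gateway_rules(header):
    """PREROUTING: mark if DF was clear, then set DF. FORWARD: clear DF if marked."""
    marked = u32_df_clear(header)        # rule 1: MARK --set-mark 100
    header = set_df(header, True)        # rule 2: DF --set, so PMTUD is enforced
    seen_by_routing = header
    if marked:                           # FORWARD rule: restore the original value
        header = set_df(header, False)
    return seen_by_routing, header


for original_df in (False, True):
    seen, out = gateway_rules(ipv4_header(original_df))
    print(original_df, not u32_df_clear(seen), not u32_df_clear(out))
```

In both cases the routing step sees the DF bit set, while the packet leaves with its original flag value. This also shows why rule order matters: if the DF bit were set before the u32 match ran, no packet would ever be marked.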

1.11 Modifying the TCP stack of Linux

The three TCP parameters that are changed when using enhanced TCP are window scaling, selective acknowledgements (SACK) and the congestion avoidance algorithm. All of these parameters are configurable in Linux. To modify window scaling:

| Action | Command |
|---|---|
| Verify current status of window scaling (1 = enabled, 0 = disabled) | sysctl net.ipv4.tcp_window_scaling |
| Verify current status of the receive window buffer | sysctl net.ipv4.tcp_rmem |
| Verify current status of the transmit window buffer | sysctl net.ipv4.tcp_wmem |
| Enable window scaling | sysctl -w net.ipv4.tcp_window_scaling=1 |
| Disable window scaling | sysctl -w net.ipv4.tcp_window_scaling=0 |
| Change the size of the receive window buffer | sysctl -w net.ipv4.tcp_rmem="wmin wdef wmax" |
| Change the size of the transmit window buffer | sysctl -w net.ipv4.tcp_wmem="wmin wdef wmax" |

(wmin = minimum window size in bytes, wdef = default window size in bytes, wmax = maximum window size in bytes)

Table 28: Window scaling parameters and commands
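The values chosen for wmax are normally derived from the bandwidth-delay product of the link. The following Python sketch shows the arithmetic. The 10 Mbit/s rate and the 600 ms round-trip time are assumed example values, not measurements from the testbed, and the function names are illustrative.

```python
MAX_UNSCALED_WINDOW = 65535   # bytes, limit of the 16-bit TCP window field


def bdp_bytes(rate_bps, rtt_ms):
    """Bandwidth-delay product: bytes in flight needed to fill the link."""
    return rate_bps * rtt_ms // 8000


def max_throughput_bps(window_bytes, rtt_ms):
    """Upper bound on throughput when at most one window is sent per RTT."""
    return window_bytes * 8000 / rtt_ms


def window_scale_shift(target_window):
    """Smallest window scale shift (0..14) whose maximum window covers the target."""
    shift = 0
    while (MAX_UNSCALED_WINDOW << shift) < target_window and shift < 14:
        shift += 1
    return shift


rate_bps, rtt_ms = 10_000_000, 600          # assumed example link
bdp = bdp_bytes(rate_bps, rtt_ms)
print("BDP:", bdp, "bytes")
print("Cap without scaling:", max_throughput_bps(MAX_UNSCALED_WINDOW, rtt_ms), "bit/s")
print("Scale shift needed:", window_scale_shift(bdp))
```

For this assumed link, the unscaled window limits a single connection to well under 1 Mbit/s, so window scaling has to be enabled and wmax raised to at least the bandwidth-delay product. In practice the buffer usually has to be somewhat larger than the window itself, because the kernel reserves part of it for bookkeeping.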

Note that the maximum TCP receive window size without Window Scaling is 64 Kbytes. The initial size of the congestion control window can also be increased. Rather than changing this parameter for all traffic with sysctl, it has to be changed on a per-route basis with the ip command. To select the initial window size of a route, add "initcwnd value" at the end of the command used to create or change the route. For example, to route packets going to 192.168.0.0/24 through interface eth0 while forcing an initial window size of 10 MSS:

ip route add 192.168.0.0/24 dev eth0 initcwnd 10

Similar commands are used to enable the use of SACK:

| Action | Command |
|---|---|
| Verify current status (1 = enabled, 0 = disabled) | sysctl net.ipv4.tcp_sack |
| Enable selective acknowledgments | sysctl -w net.ipv4.tcp_sack=1 |
| Disable selective acknowledgments | sysctl -w net.ipv4.tcp_sack=0 |

Table 29: Enable/disable SACK commands
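Returning to the initial congestion window set with initcwnd, its effect on a long-delay link can be estimated with an idealised slow-start model. The sketch below assumes that the window doubles every round trip, with no losses and no delayed acknowledgements. The transfer size of 100 segments and the 600 ms round-trip time are assumed example values, and the function name is illustrative.

```python
def slow_start_rounds(segments, initcwnd):
    """Round trips needed to send `segments` segments in idealised slow start."""
    sent, cwnd, rounds = 0, initcwnd, 0
    while sent < segments:
        sent += cwnd      # one full window is sent per round trip
        cwnd *= 2         # the window doubles after each round trip
        rounds += 1
    return rounds


RTT_S = 0.6   # assumed satellite round-trip time in seconds
for iw in (3, 10):
    rounds = slow_start_rounds(100, iw)
    print(f"initcwnd={iw}: {rounds} round trips, about {rounds * RTT_S:.1f} s")
```

Under these assumptions a larger initial window removes a few round trips from every short transfer, which is a noticeable gain when each round trip costs several hundred milliseconds.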

The congestion control algorithm can be selected using:

| Action | Command |
|---|---|
| Verify current algorithm | sysctl net.ipv4.tcp_congestion_control |
| Check the available algorithms | sysctl net.ipv4.tcp_available_congestion_control |
| Change the algorithm to name | sysctl -w net.ipv4.tcp_congestion_control=name |

Table 30: TCP congestion control algorithm commands

Some examples of congestion control algorithms are Reno and Cubic.


2 References

[Algorithms] http://www.strongswan.org/docs/readme4.htm#section_14.1
[Cisco on VoIP] http://www.cisco.com/en/US/tech/tk652/tk698/technologies_tech_note09186a0080094ae2.shtml
[Cisco window] http://www.cisco.com/en/US/docs/ios/12_3t/12_3t14/feature/guide/gt_iarwe.html
[ESA Report] IABG, TriaGnoSys. "Security in broadband satellite systems for commercial and institutional scenarios", September 2011 (version 5.4).
[iptables] http://www.netfilter.org/projects/iptables/index.html
[iptables tutorial] O. Andreasson. "Iptables tutorial 1.2.2" http://www.frozentux.net/iptables-tutorial/iptables-tutorial.html
[LARTC] B. Hubert. "Linux Advanced Routing and Traffic Control" http://lartc.org/
[ROHC example] https://answers.launchpad.net/rohc/+faq/639
[ROHC UDP tunnel] http://bazaar.launchpad.net/~didier-barvaux/rohc/main/annotate/head%3A/app/tunnel/tunnel.c
[ROHC website] https://launchpad.net/rohc
[MCoA] http://www.nautilus6.org/doc/tc-nepl-howto-20060209-KuntzR/nepl-howto.html#6f
[netem] http://swik.net/netem/Examples+of+Use
[netfilter] http://www.netfilter.org/index.html
[Netfilter tables] http://en.wikipedia.org/wiki/File:Netfilter-packet-flow.svg
[Newsky] C. Kissling, F. Hoffmann, E. Hafid Fazli, C. Baudoin, D. Niddam, F. Arnal and T. Gräupl. "Efficient Resource Management Techniques", October 2009.
[RFC 1191-5] RFC 1191 (Path MTU Discovery) section 5.
[RFC 2001] RFC 2001 (TCP Slow Start, Congestion Avoidance, Fast Retransmit, and Fast Recovery Algorithms)
[RFC 2401] RFC 2401 (Security Architecture for the Internet Protocol) section 6.1.2.2
[RFC 2473] RFC 2473 (Generic Packet Tunneling in IPv6 Specification)
[RFC 2525-2.13] RFC 2525 (Known TCP Implementation Problems) section 2.13.
[RFC 2597] RFC 2597 (Assured Forwarding PHB Group)
[RFC 2923-2] RFC 2923 (TCP Problems with Path MTU Discovery) section 2.
[RFC 5857] RFC 5857 (IKEv2 Extensions to Support Robust Header Compression over IPsec).
[strongSwan] http://www.strongswan.org/
[TCP comparison] "A Comparative Analysis of TCP Tahoe, Reno, New-Reno, SACK and Vegas" http://inst.eecs.berkeley.edu/~ee122/fa05/projects/Project2/SACKRENEVEGAS.pdf
[TCP CUBIC] S. Ha, I. Rhee and L. Xu. "CUBIC: A New TCP-Friendly High-Speed TCP Variant" http://netsrv.csc.ncsu.edu/export/cubic_a_new_tcp_2008.pdf
[TCP Hybla] C. Caini and R. Firrincieli. "TCP Hybla: a TCP enhancement for heterogeneous networks" http://citeseerx.ist.edu/viewdoc/download?doi=10.1.1.100.2600&rep=rep1&type=pdf
[TUN/TAP] http://en.wikipedia.org/wiki/TUN/TAP
