NEAT: A New, Evolutive API and Transport-Layer Architecture for the Internet

H2020-ICT-05-2014 Project number: 644334

Deliverable D3.3 Extended Transport System and Transparent Support of Non-NEAT Applications

Editor(s): Karl-Johan Grinnemo
Contributor(s): Zdravko Bozakov, Anna Brunstrom, Maria Isabel Sanchez Bueno, Thomas Dreibholz, Kristian Evensen, Gorry Fairhurst, Karl-Johan Grinnemo, Audun Fosselie Hansen, David Hayes, Per Hurtig, Mohammad Rajiullah, Tom Jones, David Ros, Tomasz Rozensztrauch, Michael Tüxen, Eric Vyncke

Work Package: 3 / Extended Transport System
Revision: 1.0
Date: November 30, 2017
Deliverable type: R (Report)
Dissemination level: Confidential, only for members of the consortium (including the Commission Services)

D3.3 Confidential Extended Transport System and Transparent Support of Non-NEAT Applications Rev. 1.0/ November 30, 2017

Abstract

This deliverable summarises and concludes our work in Work Package 3 (WP3) to extend the transport services provided by the NEAT System developed in Work Package 2, and to enable non-NEAT applications to harness the transport services offered by NEAT. We have demonstrated how a policy- and information-based selection of transport protocol by NEAT could provide a more efficient transport service for web applications. The information on which NEAT bases its transport selection decisions resides in the Characteristics Information Base (CIB). The CIB is populated by various CIB sources, and in WP3 we have designed, implemented, and evaluated several of them, including metadata from mobile broadband networks, passive measurements, IPv6 Provisioning Domain protocols, and the Happy Eyeballs mechanism, which caches the outcome of its connection attempts. A key property of NEAT is that it decouples applications from transport protocols not only “vertically” but also “horizontally”. In particular, it enables applications to harness information about resource availability and policies from Software-Defined Networking (SDN) controllers in managed networks, without these applications actually being SDN-aware. To extend the use of NEAT to non-NEAT applications, we have implemented a BSD-compatible sockets API on top of NEAT and a NEAT proxy that intercepts standard TCP connections and replaces them with NEAT flows, i.e., with the transport solutions deemed most appropriate by NEAT. We have also proposed a way for non-NEAT applications to make use of NEAT through the deployment of NEAT-enabled virtual appliances in SDN-controlled networks: connections from these applications are routed via an SDN-controlled proxy that terminates the original connection and replaces it with a NEAT-selected connection.

Participant organisation name                         Short name
Simula Research Laboratory AS (Coordinator)           SRL
Celerway Communication AS                             Celerway
EMC Information Systems International                 EMC
MZ Denmark APS                                        Mozilla
Karlstads Universitet                                 KaU
Fachhochschule Münster                                FHM
The University Court of the University of Aberdeen    UoA
Universitetet i Oslo                                  UiO
Cisco Systems France SARL                             Cisco

2 of 141 Project no. 644334 D3.3 Confidential Extended Transport System and Transparent Support of Non-NEAT Applications Rev. 1.0/ November 30, 2017

Contents

List of Abbreviations 5

1 Introduction 9

2 Extensions to the transport system 10
2.1 New transports for web browsing 11
2.1.1 Multi-streaming for Web traffic 11
2.1.2 The QUIC protocol 18
2.1.3 Role in NEAT and next steps 19
2.2 Extended policy system and transport selection 19
2.2.1 CIB sources 20
2.2.2 Transport selection and configuration 25
2.2.3 Role in NEAT and next steps 28
2.3 SDN controller integration 29
2.3.1 Integration strategies 29
2.3.2 NEAT external interfaces 31
2.3.3 Selected implementation scenario 33
2.3.4 SDN controller integration 35
2.3.5 Role in NEAT and next steps 36
2.4 PvD integration 36
2.4.1 Detailed description 37
2.4.2 Getting PvD information into NEAT 38
2.4.3 PvD JSON format and properties 38
2.4.4 Deployment scenarios 39
2.4.5 Role in NEAT and next steps 39

3 Transparent support of non-NEAT applications 41
3.1 NEAT proxy solutions 41
3.1.1 Traffic identification 43
3.2 SDN middleware 45
3.2.1 Network Hypervisor Integration 45
3.2.2 Next Steps 46
3.3 NEAT Sockets API 46
3.3.1 Implementation 47
3.3.2 Usage examples 48
3.3.3 with_neat 49

4 Conclusions 50

References 57

A NEAT Terminology 58

B Paper: Evaluating the Impact of Transport Mechanisms on Web Performance 61


C Paper: Raising the Datagram API to Support Transport Protocol Evolution 73

D Paper: A Datagram API for Evolving Networks Beyond 5G 80

E Paper: A NEAT Approach to Mobile Communication 83

F Paper: A NEAT Framework for Enhanced End-Host Integration in SDN Environments 90

G Demo: A NEAT framework for application-awareness in SDN environments 98

H NEAT Sockets API: list of API function calls 101

I Internet Draft: NEAT Sockets API 104


List of abbreviations

AAA Authentication, Authorisation and Accounting

AAAA Authentication, Authorisation, Accounting and Auditing

API Application Programming Interface

BE Best Effort

BLEST Blocking Estimation-based MPTCP

CC Congestion Control

CCC Coupled Congestion Controller

CDG CAIA Delay Gradient

CIB Characteristics Information Base

CM Congestion Manager

DA-LBE Deadline Aware Less than Best Effort

DAPS Delay-Aware Packet Scheduling

DCCP Datagram Congestion Control Protocol

DNS Domain Name System

DNSSEC Domain Name System Security Extensions

DPI Deep Packet Inspection

DSCP Differentiated Services Code Point

DTLS Datagram Transport Layer Security

ECMP Equal Cost Multi-Path

EFCM Ensemble Flow Congestion Manager

ECN Explicit Congestion Notification

ENUM Electronic Telephone Number Mapping

E-TCP Ensemble-TCP

FEC Forward Error Correction

FLOWER Fuzzy Lower than Best Effort

FSE Flow State Exchange

FSN Fragment Sequence Number

GUE Generic UDP Encapsulation


H1 HTTP/1

H2 HTTP/2

HE Happy Eyeballs

HoLB Head of Line Blocking

HTTP HyperText Transfer Protocol

IAB Internet Architecture Board

ICE Interactive Connectivity Establishment

ICMP Internet Control Message Protocol

IETF Internet Engineering Task Force

IF Interface

IGD-PCP Internet Gateway Device – Port Control Protocol

IoT Internet of Things

IP Internet Protocol

IRTF Internet Research Task Force

IW Initial Window

IW10 Initial Window of 10 segments

JSON JavaScript Object Notation

KPI Kernel Programming Interface

LAG Link Aggregation

LAN Local Area Network

LBE Less than Best Effort

LEDBAT Low Extra Delay Background Transport

LRF Lowest RTT First

MBB Mobile Broadband

MBC Model Based Control

MID Message Identifier

MIF Multiple Interfaces

MPTCP Multipath Transmission Control Protocol

MPT-BM Multipath Transport-Bufferbloat Mitigation

MTU Maximum Transmission Unit


NAT Network Address (and Port) Translation

NEAT New, Evolutive API and Transport-Layer Architecture

NIC Network Interface Card

NUM Network Utility Maximization

OF OpenFlow

OS Operating System

OTIAS Out-of-order Transmission for In-order Arrival Scheduling

OVSDB Open vSwitch Database

PCP Port Control Protocol

PDU Protocol Data Unit

PHB Per-Hop Behaviour

PI Policy Interface

PIB Policy Information Base

PID Proportional-Integral-Derivative

PLUS Path Layer UDP Substrate

PM Policy Manager

PMTU Path MTU

POSIX Portable Operating System Interface

PPID Payload Protocol Identifier

PRR Proportional Rate Reduction

PvD Provisioning Domain

QoS Quality of Service

QUIC Quick UDP Internet Connections

RACK Recent Acknowledgement

RFC Request for Comments

RSerPool Reliable Server Pooling

RTT Round Trip Time

RTP Real-time Transport Protocol

RTSP Real-time Streaming Protocol

SCTP Stream Control Transmission Protocol


SCTP-CMT Stream Control Transmission Protocol – Concurrent Multipath Transport

SCTP-PF Stream Control Transmission Protocol – Potentially Failed

SCTP-PR Stream Control Transmission Protocol – Partial Reliability

SDN Software-Defined Networking

SDT Secure Datagram Transport

SIMD Single Instruction Multiple Data

SPUD Session Protocol for User Datagrams

SRTT Smoothed RTT

STTF Shortest Transfer Time First

SDP Session Description Protocol

SIP Session Initiation Protocol

SLA Service Level Agreement


STUN Session Traversal Utilities for NAT

TCB Transmission Control Block

TCP Transmission Control Protocol

TCPINC TCP Increased Security

TLS Transport Layer Security

TSN Transmission Sequence Number

TTL Time to Live

TURN Traversal Using Relays around NAT

UDP User Datagram Protocol

UPnP Universal Plug and Play

URI Uniform Resource Identifier

VoIP Voice over IP

VM Virtual Machine

VPN Virtual Private Network

WAN Wide Area Network

WWAN Wireless Wide Area Network


1 Introduction

Introducing new native transport protocols in the Internet has become more or less impossible. This ossification of the transport layer appears to affect all standard transport protocols other than TCP and UDP. To challenge this state of affairs, the NEAT System provides a transport API that is oblivious to specific transport protocols and instead focuses on the requested transport services. Applications provide the NEAT System with information about their traffic requirements; based on this information, together with pre-specified policies and measured network conditions, the NEAT System establishes and configures appropriate connections. Through its design, the NEAT System enables new network and transport functions and protocols to be added incrementally and transparently.

Figure 1 provides an overview of the architecture of the NEAT System. The NEAT User Module comprises five groups of components: NEAT Framework, NEAT Selection, NEAT Policy, NEAT Transport, and NEAT Signalling. The core functionality of the first four component groups was developed in Work Package 2 (WP2). Work Package 3 (WP3), which largely ran in parallel with WP2, has enhanced and extended this core NEAT System. Deliverable D3.2 wrapped up our work on transport-protocol enhancements; this report complements D3.2 by documenting the extensions and the transparent support for non-NEAT applications that we have added to the NEAT System in WP3. In particular, this document provides a final summary of our work on the following Task 3.2 and Task 3.3 activities:

• Task 3.2, “Transport system with extended functionalities”:

1. New transports for web browsing (Section 2.1): The work in this activity was carried out in two threads: (1) Web traffic over existing alternative transports, and (2) Web traffic over a new UDP transport component. In (1), the benefits of using SCTP and multi-streaming for web transport have been evaluated in different scenarios, considering several network and bottleneck buffer configurations. In (2), our previous work on a new UDP-based protocol, SDT, has been refocused and used as a starting point to implement the emerging QUIC protocol. Mozilla’s QUIC implementation for Firefox supports almost all active Internet Drafts of the IETF QUIC working group.
2. Extended policy system and transport selection (Section 2.2): An extended policy system has been designed. It includes a passive network-path bandwidth estimation scheme that is also able to estimate the level of congestion. The extended policy system has been evaluated in several mobile use cases [32], which demonstrate how applications with different transport-service requirements could employ the NEAT System in a multi-access WLAN and 4G/LTE environment and, in so doing, obtain a significantly better service than they would otherwise have received.
3. SDN controller integration (Section 2.3): Work on integrating NEAT with Software-Defined Networking (SDN) controllers has been completed in WP3. This work enables external CIB sources and allows NEAT applications to make use of network status information from SDN-enabled switches.
4. PvD integration (Section 2.4): The concept of a Provisioning Domain (PvD) was defined in RFC 7556 [2] as a set of network configuration information that hosts can use to access the network. As part of our work on developing CIB sources, NEAT has been extended so that it can make use of PvD information in its transport selection process.


[Figure 1: The architecture of the NEAT System.]

• Task 3.3, “Transparent support of non-NEAT enabled applications”:

1. NEAT proxy solutions (Section 3.1): A NEAT Proxy solution was completed in Task 3.3. The NEAT Proxy gives non-NEAT applications a way to harness features of the NEAT System: it is a local proxy that intercepts TCP connections and forwards them as NEAT flows.
2. SDN middleware (Section 3.2): In WP3, we have developed a middleware approach to integrating NEAT with SDN. The work has resulted in an SDN-controlled VMware virtual app (vApp) that embeds NEAT in a virtualised environment and thereby enables legacy network applications to make use of NEAT in an SDN context.
3. The NEAT Sockets API (Section 3.3): The BSD Sockets API is the de facto standard API for networking, spanning a wide range of operating systems. Although the NEAT project promotes a callback-based API [81], as part of WP3 we have implemented a sockets API shim layer on top of NEAT that is compliant with the BSD sockets API [18]. The NEAT Sockets API makes it fairly straightforward for a large share of existing socket-based applications to leverage parts of the NEAT functionality.

The document concludes in Section 4. Appendix A provides a detailed explanation of the specific terminology used in NEAT, and Appendices B through G collect NEAT research publications directly related to the above-mentioned activities. Finally, Appendices H and I give a detailed description of the functions comprising the NEAT Sockets API presented in Section 3.3.

2 Extensions to the transport system

This section reports on the activities that have been carried out during WP3 to extend the functionalities of the NEAT transport system. It provides an updated view of the work in Task 3.2, first reported in Section 3 of Deliverable D3.1 [29].

2.1 New transports for web browsing

The NEAT User API enables a web browser to select a transport and choose alternative transport protocols/mechanisms. NEAT WP3 therefore investigated mechanisms (such as multi-streaming) that support web protocols, and how such transport mechanisms influence web browsing latency. This section summarises the activities conducted in WP3 to evaluate these mechanisms across a range of transport protocols available in NEAT, and their suitability for supporting the web use case. By exploring the root causes of latency and the benefit of individual mechanisms, this work can inform the design of new transport protocols that NEAT could support in the future.

2.1.1 Multi-streaming for Web traffic

Over the last decade, web pages have evolved from simple compositions of images and text to highly complex structures embedding interactive content and web applications [22]. To maintain consistent performance with modern rich web content, several transport protocols and application-level performance improvement mechanisms have been introduced. The primary motivation for many of these techniques was to bypass the limitations introduced by deploying HTTP/1.1 [34] over a traditional TCP service [71]. Because HTTP/1.1 cannot multiplex multiple requests or responses over a single connection, browser vendors are encouraged to open multiple TCP sessions in parallel. In practice, most modern browsers (such as Mozilla Firefox) open up to six connections per host. In addition, servers often distribute web resources across multiple domains, a practice known as sharding, implicitly allowing browsers to use even higher levels of parallelism.

Although parallelism has benefits, there is no free lunch. First, the client-server session may experience a large number of under-utilised connections (e.g., a connection may transfer only a small amount of data), which reduces efficiency due to the overhead required to open and maintain each connection. Second, breaking the transmission flow into many independent connections reduces the ability to provide adequate congestion control, making web traffic more aggressive towards other competing traffic [21, 44, 72]. Even so, it is still common for HTTP/1.1 clients to use multiple parallel connections to the same web server [28]. As a multi-streaming transport, SCTP [74] provides an alternative way to realise parallelism in the transport layer. Using a single SCTP association, a multi-streaming approach can identify sub-streams and relate these to the objects being transported. Although not widely supported, SCTP has been presented as a viable alternative for the web in [57].
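To make the effect of HTTP/1.1's missing multiplexing concrete, the following Python sketch uses a deliberately crude, hypothetical model that is not taken from the study: requests on a connection are strictly serialised, so fetching n objects over k connections needs ceil(n/k) request/response round trips, plus the time to push all page bytes through a shared bottleneck. Handshakes, slow start, losses and object dependencies are ignored, and the function name is illustrative only.

```python
import math


def http11_plt_estimate(n_objects, total_bytes, n_conns, rtt_s, bottleneck_bps):
    """Crude, hypothetical HTTP/1.1 page-load-time estimate in seconds.

    Each connection carries one request at a time, so the page needs
    ceil(n_objects / n_conns) request/response round trips. All bytes also
    have to cross a bottleneck shared by every connection. Handshakes, slow
    start, losses and dependencies between objects are ignored.
    """
    if n_objects < 1 or n_conns < 1:
        raise ValueError("need at least one object and one connection")
    round_trips = math.ceil(n_objects / n_conns)
    transfer_s = total_bytes * 8 / bottleneck_bps
    return round_trips * rtt_s + transfer_s


if __name__ == "__main__":
    # amazon-like page: 53 objects, about 977 KB (1 KB = 1000 B assumed)
    for conns in (1, 6, 18):
        plt = http11_plt_estimate(53, 977_000, conns, rtt_s=0.1, bottleneck_bps=10e6)
        print(f"{conns:2d} connection(s): {plt:.2f} s")
```

With a 100 ms RTT and a 10 Mbps bottleneck, the sketch gives about 6.08 s, 1.68 s and 1.08 s for 1, 6 and 18 connections. The transfer term is identical in all three cases; only the number of serialised round trips changes, which is the limitation that parallel connections and multi-streaming try to work around.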
For the web use case, we compared multi-streaming in SCTP against multiple TCP connections, extending the earlier analysis in [57]. As part of the WP3 activities, we built a custom web client, pReplay¹, to generate a workload, and modified a web server² to support multi-streaming. We have also contributed a web workload model derived from traces of HTTP requests/responses collected in a well-known measurement campaign [80]. Our evaluation considers different networking scenarios, including a range of RTTs and capacities. Starting from a simple network with artificial loss and FIFO queuing, we extend our evaluation to congestion-based network scenarios with several modern Active Queue Management (AQM) mechanisms.

¹ https://github.com/mrajiullah/pReplay
² https://github.com/nplab/thttpd/


Table 1: Summary of web page size and number of objects at the 5th, 50th and 95th percentile rank.

Group   Size rank (KB)   At 5%: size KB (# objects)   At 50%: size KB (# objects)   At 95%: size KB (# objects)
A       0.05-118         0.05 (1)                     23 (6)                        109 (39)
B       119-565          129 (3)                      325 (21)                      532 (67)
C       566-873          567 (6)                      690 (25)                      846 (69)
D       874-1242         878 (6)                      964 (45)                      1183 (82)
E       1243-1945        1286 (24)                    1546 (55)                     1901 (119)
F       1946-3315        2070 (49)                    2454 (127)                    3309 (228)
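The grouping behind Table 1 can be expressed as a simple lookup. The Python sketch below assigns a page to a size rank A to F from its total size in KB, using the upper bound of each range in Table 1. Treating each upper bound as inclusive, and placing any non-integer size that falls between two listed ranges (e.g., 118.5 KB) in the higher rank, are assumptions made here; the table itself only lists integer boundaries.

```python
import bisect

# Inclusive upper bounds (KB) of size ranks A-F, taken from Table 1.
RANK_UPPER_KB = [118, 565, 873, 1242, 1945, 3315]
RANK_NAMES = "ABCDEF"


def size_rank(total_kb):
    """Return the Table 1 size rank (A-F) for a page of total_kb kilobytes."""
    if total_kb <= 0 or total_kb > RANK_UPPER_KB[-1]:
        raise ValueError(f"{total_kb} KB is outside the ranges in Table 1")
    # bisect_left finds the first range whose upper bound is >= total_kb.
    return RANK_NAMES[bisect.bisect_left(RANK_UPPER_KB, total_kb)]


if __name__ == "__main__":
    # Total page sizes of the six sites listed in Table 2
    pages = {"google": 23, "dmm": 330, "siteadvisor": 701,
             "amazon": 977, "pinterest": 1548, "mediafire": 2474}
    for name, kb in pages.items():
        print(name, size_rank(kb))
```

Applied to the page sizes in Table 2, the sketch places google, dmm, siteadvisor, amazon, pinterest and mediafire in ranks A to F respectively, i.e., one representative site per group.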

Table 2: Statistics for the web pages in the experiment.

Page          Resource count   Page size (KB)   Average resource size (KB, approximate)
google        8                23               9
dmm           21               330              15
siteadvisor   40               701              17
amazon        53               977              18
pinterest     6                1548             258
mediafire     75               2474             33

2.1.1.1 Experiment setup

An experimental analysis of transport mechanism performance requires a traffic workload model representative of actual web usage. Rather than developing a new model, we used a dataset [80] that has previously been used for web performance analysis. The dataset provides the number and size of HTTP resources (objects) from 170 recorded web pages. It also includes graphs representing the dependencies between resource requests and their processing times at the client. To build the workload model, we categorised the web pages according to the total size of all resources in a page. This total was used to put each page into one of six groups (size ranks), labelled A to F, organised so that each group contained at least 28 samples, enough for statistical significance (see Table 1). In the following, we only show results for the websites at the 50th percentile of each group. The statistics of these sites are shown in Table 2.

Our performance analysis used three computers emulating a web client, a network, and a web server, as shown in Figure 2. We modelled a range of path RTTs representative of both desktop and mobile users, drawn from a distribution derived from an empirical study by Mozilla (see Appendix B). We considered a range of symmetric paths at 2, 10 and 100 Mbps. We used two different bottleneck scenarios: (1) a simple one with no competing traffic and predefined patterns of loss (explored in our work in [65]); and (2) one with competing traffic through a network bottleneck. In Scenario 1, the network was emulated by the Dummynet traffic shaper [11], configured with a given bottleneck capacity, delay, buffer size, and packet loss rate. Scenario 2 considered a bottleneck with either the default Linux FIFO queuing or AQM, configured via Traffic Control (tc) commands. In Scenario 2, competing bulk TCP flows using CUBIC congestion control saturated the bottleneck buffer for the entire duration of each experiment. In our setup, both client and server supported TCP (Linux v. 4.2.0-42 and BSD) and SCTP (under BSD).

[Figure 2: Experiment setup. A web client (pReplay) and a web server (thttpd) connected through an emulated network (Scenario 1/Scenario 2) on the experiment network, plus a separate control network.]

The multi-streaming web server is a modified version of thttpd (tiny HTTP daemon) [78], a lightweight web server supporting HTTP/1.1. Our custom-made client, pReplay, emulates an HTTP/1.1 browser, issuing requests over either a number of parallel TCP connections³ (1, 6 and 18) or a single SCTP connection with a number of streams. The number of streams is not a significant factor when using multi-streaming in SCTP, so we allowed up to 100 parallel streams; in our experiments, the number of streams actually used was much lower. A key benefit of multi-streaming is the low cost of additional streams [58], which allows flows to open as many streams as they need. In the results, nTCPs denotes the use of multiple TCP connections, and multi-streamed SCTP denotes a single SCTP connection with multiple streams.

The same Initial Window (IW) was used for both TCP and SCTP. The client used an IW of three packets, as recommended by the IETF and common for Windows users. The server used an IW of 10 packets, which is common for Linux-based servers and is described in an experimental IETF specification. The experiment parameters are summarised in Table 3.

The following two sections present a systematic study of web page load time (PLT) using HTTP/1.1 over a single TCP connection (1 TCP), nTCPs, and multi-streamed SCTP in two different bottleneck scenarios. Our goal is to understand the conditions under which multi-streamed SCTP outperforms nTCPs. pReplay was used to measure the PLT, defined as the time between making the first web request and the time when either the last response is received or the last computation is completed. The results show the average of 30 runs, plotted with 95% confidence intervals. Cookies were not found to influence the PLT, so we omit results with different cookie sizes. The processing times in our dataset [80] were assumed to be an upper bound.
Since this data was collected, advances in client platforms and in the way resources are parsed and processed have reduced this bound. We therefore also plot the PLTs with no additional processing time, to present a lower bound. In the following, we only include a subset of the results; the draft paper in Appendix B contains the full set.
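The upper and lower PLT bounds described above can be illustrated with a small critical-path calculation over a dependency graph. The Python sketch below is a simplification built on stated assumptions, not pReplay's implementation: it assumes unlimited parallelism, an acyclic graph, and a hypothetical input format in which each resource has a fetch time, a client processing time and a list of resources it depends on. Setting include_processing=False gives the no-processing-time bound.

```python
def critical_path_plt(fetch_s, proc_s, deps, include_processing=True):
    """Page load time (s) as the longest path through a dependency graph.

    fetch_s[r]: network time to fetch resource r.
    proc_s[r]:  client processing time for r (missing entries count as 0).
    deps[r]:    resources that must be fetched and processed before r is requested.
    Assumes unlimited parallelism and no cycles (hypothetical model).
    """
    finish = {}

    def finish_time(r):
        if r not in finish:
            # A resource can start only after all its dependencies have finished.
            start = max((finish_time(d) for d in deps.get(r, ())), default=0.0)
            proc = proc_s.get(r, 0.0) if include_processing else 0.0
            finish[r] = start + fetch_s[r] + proc
        return finish[r]

    return max(finish_time(r) for r in fetch_s)


if __name__ == "__main__":
    # Hypothetical four-resource page
    fetch = {"html": 0.10, "css": 0.08, "js": 0.12, "img": 0.05}
    proc = {"html": 0.05, "css": 0.02, "js": 0.10}
    deps = {"css": ["html"], "js": ["html"], "img": ["css"]}
    print(round(critical_path_plt(fetch, proc, deps), 3))         # 0.37
    print(round(critical_path_plt(fetch, proc, deps, False), 3))  # 0.23
```

In this toy page, the critical path changes from html, js (with processing) to html, css, img (without processing), which shows why the two bounds can differ in both magnitude and in which resources dominate.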

2.1.1.2 Scenario 1

Figure 3 shows the PLT with different numbers of parallel TCP connections compared to a single SCTP connection supporting multiple streams (100 in this case). We first discuss the case with no processing time dependency and no emulated loss (tail-drop loss from router buffers was observed in some experiments). Parallelism allows multiple transport connections to each send a resource simultaneously, reducing the PLT when multiple resources can be sent. Our experiments considered two ways to obtain parallelism: using parallel TCP connections (each independently managing congestion control) or using multiple streams (all sharing a single congestion controller) in a single SCTP association. Figure 3 shows the benefit of increased parallelism. In general, the PLT improved with increasing parallelism, except for pinterest in Figure 3e. In this case, six and 18 TCP connections have a similar PLT because pinterest has only six objects (see Table 2). In most cases (except for the google site in Figure 3a), a multi-streaming approach provided a smaller PLT than nTCPs, which suffer overhead from setting up multiple connections and self-induced congestion from concurrency. Since google is a small page (see Table 2), the combined IW provided by nTCPs requires fewer RTTs to fetch the page than multi-streamed SCTP with a single IW. Web page structure also had an impact on the PLT. When there is no parallelism, the number of resources influences the PLT more than the overall page size. This can be seen in Figure 3d for 1 TCP, where the amazon page (with a larger number of smaller resources) completes much later than the pinterest page in Figure 3e (with fewer but larger resources, see Table 2). Therefore, the number of resources and the average object size have more impact on overall web performance than the total web page size. Parallelism alleviates this by reducing the delay from head-of-line blocking (HoLB) for pages with many resources (e.g., the PLT for amazon is lower than that for pinterest when either multi-streamed SCTP or nTCPs are used). Our evaluations also consider packet loss.

³ Common browsers open up to six connections to a single domain, but sharding content across multiple web servers is also common.

Table 3: Experimental parameters.

Category   Factor                 Range/value
Network    RTTs                   20, 50, 100, 200, 800 ms
           Bottleneck capacity    2, 10, 100 Mbps
           Packet loss            No loss, 1.5%, 3%
TCP/SCTP   IW                     client (IW 3), server (IW 10)
           CWND validation        no
           # parallel TCP flows   1, 6, 18
           # streams in SCTP      100
Requests   Cookie size            NULL, 512 B, 2 KB
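The google result can be checked with back-of-the-envelope slow-start arithmetic. The Python sketch below counts the loss-free round trips needed to deliver a given number of bytes when each of n_flows flows starts with an initial window of iw_segments segments and doubles it every RTT. The 1460-byte MSS, 1 KB = 1000 bytes, and spreading the page evenly over the flows (so that their windows simply add up) are assumptions, not parameters reported for the experiments.

```python
def slow_start_rounds(page_bytes, iw_segments, mss_bytes=1460, n_flows=1):
    """Loss-free round trips needed to send page_bytes in slow start.

    The aggregate first-round window is n_flows * iw_segments * mss_bytes and
    doubles every RTT. The page is assumed to be spread evenly over the flows.
    """
    window = n_flows * iw_segments * mss_bytes
    sent = rounds = 0
    while sent < page_bytes:
        sent += window
        window *= 2  # slow start doubles the window every RTT
        rounds += 1
    return rounds


if __name__ == "__main__":
    google = 23_000  # about 23 KB (Table 2)
    print(slow_start_rounds(google, iw_segments=10))             # 2: one flow, IW10
    print(slow_start_rounds(google, iw_segments=10, n_flows=6))  # 1: six flows, IW10 each
```

Under these assumptions, a single IW10 (about 14.6 KB) leaves part of the 23 KB google page for a second round trip, whereas six IW10 windows cover it in the first, in line with the nTCPs advantage observed for google. The same function gives five rounds for a single 258 KB pinterest-sized object, which illustrates why IW10 cannot multiplex several such objects in one window.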
[Figure 3: PLT for 10 Mbps capacity, no loss, without processing time. Panels (a) google, (b) dmm, (c) siteadvisor, (d) amazon, (e) pinterest and (f) mediafire each plot PLT [s] against RTT [ms] for 1 TCP, 6 TCPs, 18 TCPs and 100s SCTP.]

Figure 4 shows the impact of a simple loss model on the PLT. For a single TCP flow (1 TCP), loss results in head-of-line retransmission delay and a reduced congestion window. The PLT is lower in the nTCPs case: only the TCP connection(s) that experience loss are affected by loss recovery, while the throughput of the other parallel flows is unchanged. In contrast, with multi-streamed SCTP, head-of-line blocking only affects the stream that experiences loss, but any loss reduces the congestion window shared by all streams in the association. This more conservative congestion control results in a higher PLT.

Next, we examine the impact of processing times on the PLT (see Figure 5). The additional processing time does not significantly increase the PLT of a single connection (1 TCP), because the request overhead for each resource dominates. Parallelism eliminates this overhead, so the processing delay has a direct impact on the PLT.
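The different reactions to loss described above can be illustrated by comparing the aggregate congestion window just after a single loss event. In the Python sketch below, n independent connections each hold per_flow_cwnd segments, whereas the shared case models one congestion controller holding the same aggregate window, as in an SCTP association. The equal aggregate window and the multiplicative decrease factor beta = 0.5 (Reno-style halving; CUBIC backs off more mildly, with 0.7) are illustrative assumptions.

```python
def window_after_single_loss(n_flows, per_flow_cwnd, shared, beta=0.5):
    """Aggregate congestion window (segments) right after one loss event.

    shared=True:  one controller owns the whole aggregate window, so the
                  entire window is reduced by the factor beta.
    shared=False: n_flows independent connections; only the connection that
                  saw the loss reduces its window.
    """
    total = n_flows * per_flow_cwnd
    if shared:
        return total * beta
    return total - per_flow_cwnd * (1 - beta)


if __name__ == "__main__":
    # Six flows of 20 segments each (120 segments in total)
    print(window_after_single_loss(6, 20, shared=True))   # 60.0
    print(window_after_single_loss(6, 20, shared=False))  # 110.0
```

The independent connections keep most of their aggregate sending rate after the loss, which is consistent with the lower PLT observed for nTCPs, while the shared controller halves the window seen by every stream.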

2.1.1.3 Scenario 2

This section provides a deeper understanding of the impact on the PLT of a network bottleneck under load (using two parallel competing flows). It considers a bottleneck with a drop-tail (FIFO) buffer, and a bottleneck that uses AQM, either CoDel [60] or flow-queuing CoDel (FQ-CoDel) [31]. The results in the previous section suggest that the number of web resources and their average size affect transport performance more significantly than the total page size. The pinterest page (a few large objects) and the mediafire page (many smaller objects) are well suited to test this claim; we therefore chose these two pages for the experiments with AQM schemes.

FIFO In our experiments, FIFO operates as a simple drop-tail queue. In an unmanaged buffer, the induced queuing delay adds to the path latency. At 10 Mbps, assuming a 1500-B packet size, a 127-packet FIFO buffer requires 152.4 ms to drain completely. This extra delay negatively impacts the PLT. The difference between the PLTs of the pinterest web page in Figures 3e and 6a clearly depicts the impact of the induced delay with an unmanaged buffer at the bottleneck. Figure 6a shows the comparison of PLTs for pinterest. Since pinterest carries only a few relatively large objects (six objects with an average size of 258 KB, see Table 2), a large number of TCP connections is not needed; at most three were used. nTCPs fail to provide any benefit in this scenario because more flows create more losses from the increased congestion, as revealed by the trace analyses. Multi-streamed SCTP provides no benefit here either. The initial congestion window (init_cwnd) of 10 segments, IW10, is the crucial parameter here. Given the size of each object in pinterest, it is not feasible to use IW10 to multiplex multiple objects in a single connection; thus, no benefit is obtained from parallelism. Unlike the pinterest page, for the mediafire web page multi-streamed SCTP performs better than 1 TCP, see Figure 7a. Here the objects are smaller (33 KB, see Table 2), so multiplexing multiple objects in the same connection is more likely. However, since all the streams still share a single congestion window, losses on the long-RTT paths degrade the performance compared with nTCPs. The poorest performance, with 1 TCP connection, stems from the fact that HTTP/1.1 requests are serialised over just one connection; a new request is made only once the response to the previous request has been received. In the case of mediafire the number of objects is 75, so nTCPs show benefits. 18 TCPs perform better at all RTTs for the mediafire web page under FIFO. This could be seen as a positive impact of individual congestion control (CC) for each connection.

15 of 141 Project no. 644334 D3.3 Confidential Extended Transport System and Transparent Support of Non-NEAT Applications Rev. 1.0/ November 30, 2017

[Figure 4 plots: PLT [s] versus RTT [ms] for 1 TCP, 6 TCPs, 18 TCPs, and 100s SCTP; panels (a) google, (b) dmm, (c) siteadvisor, (d) amazon, (e) pinterest, (f) mediafire.]

Figure 4: PLT for 10 Mbps capacity, 1.5% packet loss, without processing time.

[Figure 5 plots: PLT [s] versus RTT [ms] for 1 TCP, 6 TCPs, 18 TCPs, and 100s SCTP; panels (a) google, (b) amazon, (c) mediafire.]

Figure 5: PLT for 10 Mbps capacity, no loss, with processing time.
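Two quantities behind the FIFO discussion, the time a full drop-tail buffer needs to drain and the number of slow-start rounds one object needs under IW10, can be checked with a little arithmetic. This is a minimal sketch under stated assumptions (a 1460-byte MSS, 1 KB = 1000 bytes, no losses, and a congestion window that doubles every RTT); the function names are illustrative only.

```python
import math

MSS_BYTES = 1460   # assumed maximum segment size
IW_SEGMENTS = 10   # IW10: initial congestion window of 10 segments


def drain_time_ms(packets: int, packet_bytes: int, rate_bps: int) -> float:
    """Time for a full drop-tail buffer to empty at the bottleneck rate."""
    return packets * packet_bytes * 8 * 1000 / rate_bps


def slow_start_rounds(object_bytes: int, mss: int = MSS_BYTES,
                      iw: int = IW_SEGMENTS) -> int:
    """Round trips needed to deliver one object in loss-free slow start,
    assuming the window starts at `iw` segments and doubles every RTT."""
    segments = math.ceil(object_bytes / mss)
    rounds, sent, cwnd = 0, 0, iw
    while sent < segments:
        sent += cwnd
        cwnd *= 2
        rounds += 1
    return rounds


print(drain_time_ms(127, 1500, 10_000_000))  # 152.4
print(slow_start_rounds(258_000))            # 5 (average pinterest object)
print(slow_start_rounds(33_000))             # 2 (average mediafire object)
print(IW_SEGMENTS * MSS_BYTES)               # 14600 bytes in the first window
```

Under these assumptions the first window (about 14.6 KB) cannot hold even one average pinterest object, which is consistent with the observation that IW10 leaves no room to multiplex several of them.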

CoDel The Controlled Delay (CoDel) [60] algorithm reduces the queuing delay by comparing the time packets spend in the queue (the sojourn time) with a target value over a preset interval. If the minimum sojourn time stays above the target for the whole interval, packets are dropped at the head of the queue, on dequeue, at a rate that increases with the number of drops until the queuing delay falls below the target. PLT improves significantly for both TCP and SCTP under CoDel compared with FIFO for both the pinterest and mediafire web pages, see Figures 6b and 7b. This is mainly due to the active management of the queuing delay. For RTTs ≤ 200 ms, PLTs for pinterest are largely similar for all types of connections, including multi-streaming. nTCPs and multi-streamed SCTP perform similarly for the mediafire web page under CoDel (Figure 7b). 1 TCP performs much worse than the rest for the mediafire page, even though the queuing delay is controlled under CoDel. This is because HTTP/1.1 requests and responses are serialised over one connection, and the aggressive dropping of CoDel under load causes TCP SYN segments to be lost, so the requests need to be re-serialised. The results confirm that parallelism is needed to improve the performance of web pages with many objects.
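The CoDel behaviour described above can be sketched as a queue that timestamps packets on enqueue and decides on dequeue whether to drop. This is a simplified sketch after RFC 8289, assuming its default target (5 ms) and interval (100 ms); it omits the RFC's reuse of the previous drop count when re-entering the dropping state, approximates the "queue nearly empty" test by "no packet left", and the class and method names are illustrative.

```python
import math
from collections import deque


class CoDelQueue:
    """Simplified CoDel: drop at the head of the queue, on dequeue, once the
    sojourn time has stayed above `target` for at least `interval`."""

    def __init__(self, target: float = 0.005, interval: float = 0.100):
        self.target = target          # acceptable standing queue delay (s)
        self.interval = interval      # window the delay must persist for (s)
        self.q = deque()              # entries: (enqueue_time, packet)
        self.first_above_time = 0.0   # 0.0 means "delay not above target"
        self.dropping = False
        self.drop_next = 0.0
        self.count = 0
        self.dropped = []

    def enqueue(self, packet, now: float) -> None:
        self.q.append((now, packet))  # timestamp for sojourn-time tracking

    def _pop(self, now: float):
        """Pop the head packet and report whether CoDel may drop it."""
        if not self.q:
            self.first_above_time = 0.0
            return None, False
        enq_time, packet = self.q.popleft()
        if now - enq_time < self.target or not self.q:
            self.first_above_time = 0.0   # delay acceptable or queue drained
            return packet, False
        if self.first_above_time == 0.0:
            self.first_above_time = now + self.interval  # start the clock
            return packet, False
        return packet, now >= self.first_above_time

    def _control_law(self, t: float) -> float:
        # Drops become more frequent as count grows: interval / sqrt(count).
        return t + self.interval / math.sqrt(self.count)

    def dequeue(self, now: float):
        packet, ok_to_drop = self._pop(now)
        if self.dropping:
            if not ok_to_drop:
                self.dropping = False         # delay back under control
            while self.dropping and now >= self.drop_next:
                self.dropped.append(packet)   # head drop
                self.count += 1
                packet, ok_to_drop = self._pop(now)
                if not ok_to_drop:
                    self.dropping = False
                else:
                    self.drop_next = self._control_law(self.drop_next)
        elif ok_to_drop:
            self.dropped.append(packet)       # first drop of a new episode
            packet, _ = self._pop(now)
            self.dropping = True
            self.count = 1
            self.drop_next = self._control_law(now)
        return packet
```

Dropping on dequeue means the decision is based on how long the packet actually waited, and the square-root control law gives TCP senders time to react before the next drop.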

FQ_CoDel FQ_CoDel [31] is a hybrid algorithm that applies the CoDel algorithm to each sub-queue of a flow-queuing scheduler. The scheduler uses five-tuple hashing to enqueue packets into sub-queues and a deficit round robin (DRR) scheme to dequeue packets from them. FQ_CoDel thereby ensures flow segregation and byte-based fairness when dequeuing packets from the sub-queues. Figures 6c and 7c show that the PLT performance of the mediafire and pinterest web pages is similar for nTCPs and multi-streamed SCTP. A single TCP flow achieves worse results, largely because head-of-line blocking dominates performance. A single SCTP association, in contrast, does derive benefit. This could in some cases be due to the lower RTT under load and the absence of collateral damage from the traffic with which it shares the bottleneck. Our results show that CoDel performs similarly to FQ_CoDel for web traffic. This indicates that flow queuing may not be essential to improve PLT performance, a conclusion also reached in previous research [33].
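The flow-queuing part of FQ_CoDel can be illustrated with a deficit round robin scheduler over hashed sub-queues. The sketch below is a simplification under stated assumptions: sub-queues are plain FIFOs (FQ_CoDel would run CoDel on each), FQ_CoDel's separate list for new flows is omitted, CRC-32 stands in for the kernel's hash, and the default quantum of 1514 bytes and all names are illustrative.

```python
import zlib
from collections import deque


class FlowQueueDRR:
    """Hashed per-flow sub-queues served by deficit round robin (DRR)."""

    def __init__(self, n_queues: int = 1024, quantum: int = 1514):
        self.queues = [deque() for _ in range(n_queues)]
        self.deficit = [0] * n_queues
        self.in_active = [False] * n_queues
        self.active = deque()   # round-robin order of backlogged sub-queues
        self.quantum = quantum  # bytes credited to a sub-queue per round

    def index(self, five_tuple) -> int:
        """Map (src, dst, sport, dport, proto) to a sub-queue."""
        key = "|".join(map(str, five_tuple)).encode()
        return zlib.crc32(key) % len(self.queues)

    def enqueue(self, five_tuple, size: int) -> None:
        i = self.index(five_tuple)
        self.queues[i].append((five_tuple, size))
        if not self.in_active[i]:
            self.in_active[i] = True
            self.deficit[i] = self.quantum
            self.active.append(i)

    def dequeue(self):
        """Return the next (five_tuple, size) to send, or None if idle."""
        while self.active:
            i = self.active[0]
            if self.deficit[i] <= 0:
                # Credit exhausted: top up and move to the back of the round.
                self.deficit[i] += self.quantum
                self.active.rotate(-1)
                continue
            if not self.queues[i]:
                # Sub-queue drained: remove it from the round.
                self.active.popleft()
                self.in_active[i] = False
                continue
            five_tuple, size = self.queues[i].popleft()
            self.deficit[i] -= size  # byte-based accounting
            return five_tuple, size
        return None
```

Because credit is counted in bytes, a flow sending 500-byte packets is served three packets for every 1500-byte packet of a competing flow, so both obtain the same byte rate.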


[Figure 6 plots: PLT [s] versus RTT [ms] for 1 TCP, 6 TCP, 18 TCP, and SCTP; panels (a) FIFO, (b) CoDel, (c) FQ-CoDel.]

Figure 6: PLT for 10 Mbps capacity, Pinterest website, congested bottleneck.

[Figure 7 plots: PLT [s] versus RTT [ms] for 1 TCP, 6 TCP, 18 TCP, and SCTP; panels (a) FIFO, (b) CoDel, (c) FQ-CoDel.]

Figure 7: PLT for 10 Mbps capacity, Mediafire website, congested bottleneck.

2.1.1.4 Summary

Our evaluation has focused on the TCP and SCTP protocols; however, the results are applicable to other transports that also need to work across an Internet path. In particular, the results are presented at a time when the IETF is developing the base mechanisms for a new web transport, QUIC [67]. Our results allow a better understanding of some of the mechanisms that will be utilised by QUIC. Specifically, the new protocol is expected to favour a multi-streamed approach, rather than a single stream (as originally proposed for TCP) or multiple parallel transport sessions (now the norm). The protocol will share one congestion state (as has been implemented for SCTP) with persistent reuse of open connections (as in SCTP, but also emerging in SPDY [27] and standardised in HTTP/2.0 [6]), and loss recovery will be designed to eliminate head-of-line blocking and to integrate closely with the requirements for supporting HTTP/2.0. At the time of writing, almost 1/8 of web servers have introduced HTTP/2 support.

2.1.2 The QUIC protocol

Google has recently proposed a new experimental UDP-based transport called QUIC [27, 51]. At the 96th IETF meeting, the QUIC Working Group [64] was formed to drive standardisation of the QUIC protocol. This process has transformed QUIC from a proprietary protocol into an open standard protocol and has led to a number of changes to the original protocol. The protocol’s focus on HTTP traffic has been broadened to include additional use cases, e.g., DNS. For better integration into the open ecosystem, QUIC-crypto [51] has been replaced by TLS 1.3 [66] as the security protocol. Many core features have been changed to make the protocol more robust and resilient against ossification. This process is still ongoing, and at the time of writing the protocol is described in four drafts [8, 35, 36, 77]. Early work in WP3 by Mozilla focused on a UDP-based protocol, SDT (Secure Datagram Transport), sharing some goals and features with QUIC. This work has been phased out in favour of refocusing on QUIC, keeping track of and contributing to the QUIC standardisation activities at the IETF. We have implemented an almost feature-complete QUIC protocol in accordance with the drafts cited above. The implementation covers most of the features the current QUIC drafts foresee, including the protocol handshake, version negotiation, encrypted streams, server stateless retry, and flow control [36, 77]. However, the standardisation process is still in progress, and the final architecture and the interface to the application are still in flux.

2.1.3 Role in NEAT and next steps

We have completed research that sought to better understand the impact of the transport system on the web performance experienced by users. This research evaluated a range of transport mechanisms and provided guidance on the design of a transport stack for web browsing. The results inform the design of policies for transport selection (e.g., understanding when to select a multi-streaming transport). Knowledge about the impact of network conditions, the features available in the NEAT System, and the benefits of specific algorithms in certain network conditions can further improve performance. To gain experience, a version of Firefox using the NEAT System was implemented. Testbed experiments in WP4 can aid understanding of the requirements and challenges in supporting web browser clients. Moving a transport protocol implementation into user space facilitates faster development and deployment and makes enhancements accessible on older operating systems. This has provided an opportunity to reflect upon and incorporate experience collected through observation of a variety of transport mechanisms, and it suggests suitable mechanisms to be incorporated in a new user-space transport implementation (such as SDT or QUIC) that could in the future be provided below the NEAT User API. Although QUIC is well suited to the NEAT System, its development is at too early a stage for it to be incorporated into the NEAT software at the moment. On the other hand, the analysis described in this section is expected to provide useful input to the QUIC standardisation. Some key mechanisms have been identified as benefiting from further practical experimentation with the NEAT System, and these will be further explored in WP4.

2.2 Extended policy system and transport selection

The NEAT System is developed to make an optimal transport selection based on the services requested by the application. Building the transport selection features includes developing policies that choose transport options in terms of protocols, parameters, and interface(s). The result is based on matching application service requirements, path and interface characteristics, and policies, as described in Section 5.4.3 of deliverable D1.1 [24] and in Section 3.4 of deliverable D2.3 [45]. The following NEAT components are involved:

• NEAT Policy components: Collecting path and interface statistics and developing policies.


• NEAT Selection components: Selection of transport candidates based on above-mentioned matching.

• NEAT Transport components: Configuring transport protocol parameters and extensions.

Transport selection is one of the features common to all use cases presented in deliverable D1.1 [24]. The Mozilla and EMC use cases employ transport selection to select the best supported transport options for different flows over the available paths for Mozilla services and EMC data centre services, respectively. The Celerway use case uses transport selection to provide optimal quality for different flows and optimal network utilisation. Finally, the Cisco use case uses transport selection to ensure prioritisation and low latency for particular applications. To facilitate optimal transport selection, characteristics of interfaces, paths, and end-hosts should be collected. As described in D1.1 [24], the collected information will be stored in Characteristics Information Bases (CIBs), and the tools collecting such information are named CIB sources. Policy Information Bases (PIBs) will store policies that are matched with application requirements and CIB entries to find the optimal protocol, parameters, and interface(s) for a certain NEAT Flow. Section 2.2.1 presents different CIB sources that have been implemented and that will be used to enhance application performance:

• A CIB source collecting meta-data about mobile broadband connections will help select a suitable interface and transport protocol based on physical properties.

• A CIB source doing passive estimation of the congestion and capacity of interfaces will help select interfaces with enough capacity to serve the application.

• A CIB source caching Happy Eyeballs (HE) results will save connection setup time, as HE can be avoided for subsequent flows over a similar path.

Section 2.2.2 presents an example of how the meta-data CIB source is used with policies to improve performance by selecting the correct interface and transport protocol (TCP vs MPTCP). In WP4, we will combine the CIB sources and validate the performance gains in more detail.
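The matching of application requirements, CIB entries, and PIB policies can be pictured as filtering and ranking (interface, transport) candidates. The sketch below is a hypothetical illustration, not the NEAT Policy Manager: `rank_candidates`, the property names, and the dictionary layout are assumptions. Required properties act as hard policy constraints that discard candidates, while desired properties only raise a candidate's score.

```python
from itertools import product


def rank_candidates(required, desired, cib, transports):
    """Return (interface, transport) pairs, best first.

    required   -- properties a candidate must match (hard policy constraints)
    desired    -- properties that only raise a candidate's score
    cib        -- interface name -> dict of collected characteristics
    transports -- transport protocols available on this host
    """
    scored = []
    for (iface, props), transport in product(cib.items(), transports):
        cand = dict(props, transport=transport)
        if any(cand.get(k) != v for k, v in required.items()):
            continue  # violates a hard requirement: discard
        score = sum(cand.get(k) == v for k, v in desired.items())
        scored.append((score, iface, transport))
    scored.sort(key=lambda c: c[0], reverse=True)  # stable: ties keep order
    return [(iface, transport) for _, iface, transport in scored]
```

For example, with a Wi-Fi interface and a metered mobile broadband interface in the CIB, requiring `{"metered": False}` and desiring `{"transport": "TCP"}` yields the Wi-Fi/TCP candidate first, followed by Wi-Fi/MPTCP, and no mobile broadband candidates.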

2.2.1 CIB sources

In this section, we present a selection of different CIB sources that improve interface and transport selection in NEAT.

2.2.1.1 Mobile broadband meta-data

Meta-data about the device and its interfaces can be useful when selecting transport options and interfaces for a certain application. We have developed tools that collect and process relevant meta-data from devices and interfaces, as described below. The meta-data includes current values as well as statistics and trending behaviour that can be used by the Policy Manager to make the best possible selection of interface and transport options for the applications. Currently, it is used and evaluated in the MONROE testbed 4.

• Type of device can be the vendor and device model, which might indicate, for instance, whether the use is mobile, stationary, or IoT-like.

4 www.monroe-project.eu


• Battery level and knowledge of the power-saving mode can be used to select the interface with the lowest energy consumption.

• Type of interface can be the vendor, the model, and the type, such as mobile broadband, Ethernet, or Wi-Fi.

• Technology-limited latency can often be inferred from the type of interface and the protocol standard. A mobile broadband modem, an Ethernet port, and a Wi-Fi radio provide different levels of latency.

• Technology-limited bandwidth can often be inferred from the type of interface and the protocol standard. A mobile broadband modem, an Ethernet port, and a Wi-Fi radio provide different levels of bandwidth.

• Frequency used by radio interfaces such as Wi-Fi and mobile broadband can result in different levels of bandwidth, latency, stability, and coverage.

• Protocol standard means the supported standard of an interface modem. For Wi-Fi it could be 802.11g, 802.11n, or 802.11ac, while for mobile broadband it can be CDMA or GSM. These standards give very different network characteristics and hence might affect transport selection.

• Protocol mode in mobile broadband can be 2G, 3G, LTE, etc.

• Sub mode for 3G mobile broadband can for instance be HSPA+.

• Signal strength is reported by radio modems like Wi-Fi and mobile broadband, and different levels may affect link quality.

• Cell ID is reported by mobile broadband modems, and statistics about it might hint at the stability of the interface quality.

• ISP is often reported by mobile broadband modems, and statistics and coverage of the ISP can be used to predict interface quality.

• GPS position is given by many devices, and this information can be used to predict stability and quality of certain interfaces.
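A meta-data CIB source of this kind could keep raw samples and derive trend statistics from them. The sketch below is a hypothetical illustration only: the `MbbSample` schema, its field names, and both statistics (cell-ID changes per minute as a stability hint, and the share of samples in a given protocol mode) are assumptions, not the output format of the actual tools.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MbbSample:
    """One mobile broadband meta-data sample (hypothetical schema)."""
    timestamp: float   # seconds
    mode: str          # protocol mode, e.g. "3G" or "LTE"
    cell_id: int
    signal_dbm: float


def cell_changes_per_minute(samples: List[MbbSample]) -> float:
    """Cell-ID changes per minute: a crude hint of interface stability."""
    if len(samples) < 2:
        return 0.0
    changes = sum(a.cell_id != b.cell_id for a, b in zip(samples, samples[1:]))
    minutes = (samples[-1].timestamp - samples[0].timestamp) / 60
    return changes / minutes if minutes > 0 else 0.0


def mode_share(samples: List[MbbSample], mode: str) -> float:
    """Fraction of samples in which the modem reported the given mode."""
    if not samples:
        return 0.0
    return sum(s.mode == mode for s in samples) / len(samples)
```

A policy could then, for instance, treat an interface that frequently changes cell or spends much of its time outside LTE as less suitable for demanding flows.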

2.2.1.2 Passive bandwidth estimation

In order to select the most suitable interface or path for a flow on multi-homed devices, knowledge about the capacity of each interface is important. We therefore developed a passive (zero-overhead) estimation CIB source to be used for better interface selection. In a multi-homed device, being able to identify bottlenecks and the actual capacity helps improve application performance, as congested interfaces can be avoided. Many of the previously existing tools for capacity and available bandwidth estimation are based on active measurements [13, 37, 52, 76], that is, they inject probe traffic into the network in order to perform the estimation, thereby affecting the current state of the network. In contrast, passive measurement techniques [5, 23, 41, 43, 49, 69] base their estimates on the traffic currently observed in the network, using a monitoring or analysis tool. While passive measurements are less intrusive, they are often less accurate, as they lack control over the probing traffic. In NEAT, we should not rely on active probing, which consumes data quota on mobile subscriptions; hence, we develop passive techniques that try to overcome some of the known challenges.


Figure 8: Scenario for the downlink access capacity and bandwidth estimation

Most bottleneck detection and available bandwidth estimation techniques rely on the time between consecutive packets, or “packet pairs”. These techniques can be categorised as either cooperative, with access to information from the sender side (typically active tools), or uncooperative, with information from only one side of the communication (typically passive estimation techniques that only monitor the traffic flows):

• Cooperative:

– Receiver Based Packet Pair (RBPP) [63]: Both transmission times and reception times are available. It is the most accurate technique, but requires deploying software at both the sender and the receiver.

• Uncooperative:

– Sender Based Packet Pair (SBPP) [63]: Measures the bandwidth from the measurement host to any other host, using the arrival times of the acknowledgements at the transport or application level. However, it is susceptible to the acknowledgement policy of TCP, which makes it the least accurate technique. One proposed remedy is to add active probing that causes large packets to flow in both directions and thereby triggers prompt acknowledgements (similar to sting [70]).

– Receiver Only Packet Pair (ROPP) [48]: Measures the bandwidth from any other host to the measurement host. When there is little cross-traffic, it is close in accuracy to RBPP, but the level of cross-traffic is hard to predict.

We designed a scenario, such as the one depicted in Figure 8, where the receiving side of an end-to-end communication aims to estimate the level of congestion and the available bandwidth on its access links. We have limited information about the paths and must assume an uncooperative approach, with no support from or access to the sender side. Therefore, focusing the evaluation on the receiver side, we have designed a fully passive mechanism to measure the capacity and estimate the level of congestion or the available bandwidth in the downlink, from the receiver’s point of view. The accuracy of the algorithm depends heavily on the accuracy of the packet reception timestamps. It has been tested on Linux platforms, which provide access to the timestamp at the receiving interface. If other platforms can grant access to this information with enough (per-packet) accuracy, the algorithm could also be applied there.

[Figure 9 diagram: frames F1, F2, F3, F4, … arriving at times t1, t2, t3, t4, …, separated by idle periods Tidle, within an interval T.]

Figure 9: Packet inter-arrival times and inter-frame spaces

Considering the arrival of packets at the receiving interface as illustrated in Figure 9, our algorithm uses the packet size, $L_i$, and the packet inter-arrival time, $\tau_i$, measured as the time between the arrival of the first bit of two consecutive packets ($\tau_i = a_{i+1} - a_i$). The packets arriving at the receiving interface have traversed a variable number of hops with different bandwidth and latency characteristics, but the maximum capacity that we are able to measure is that of the last hop, our access link. If previous hops have higher capacity, the aggregated traffic will have to queue and be transmitted at the pace given by the last-hop capacity. Otherwise, if the previous hops have a lower capacity, the chances of the aggregated packets queuing and being transmitted at the rate given by the last hop are lower. Nevertheless, the packet inter-arrival times at the receiver allow the load on the access link to be monitored and make it possible to detect whether this link becomes congested, compared with the previous hops in the ongoing connections. Therefore, for every packet, the capacity estimate is $C_i = L_i / \tau_i$. Then, the capacity of the link is the maximum capacity that we can measure, following eq. (1).

$$\hat{C} = \max_i C_i = \max_i \frac{L_i}{\tau_i} \qquad (1)$$

Moreover, with the inter-arrival times and packet sizes we are able to use four different metrics to estimate the actual usage of the immediate downlink over a time interval $T$:

• Average rate ($R_{avg}$): total received data over the time interval: $R_{avg} = \sum_i L_i / T$.

• Equivalent per-packet rate ($R_{pkt}$): $R_{pkt} = \sum_i C_i / N$, where $N$ is the number of packets received in the interval; this is the per-packet utilisation of the link. If all the packets are transmitted back-to-back, $R_{pkt}$ equals the maximum capacity; it gets closer to the maximum capacity value (and to the average rate) as more packets queue. Therefore, the average in time and the average over samples (packets) do not match unless all the packets are transmitted at the highest rate.

• Number of bursts: consecutive packets are part of the same burst if the gap between them is not big enough to accommodate the second packet (transmitted at the maximum capacity of the link, once estimated), that is, if it is not possible to fit the next packet into the idle gap: $t_{idle,i} \le L_{i+1} / \max_j C_j$, or equivalently $\tau_i \le (L_i + L_{i+1}) / \max_j C_j$. Normalised by the total number of packets received during the interval, this metric is bounded between 0 and 1. A value close to 1 means that most “bursts” consist of just one packet; a value close to 0 means that most packets are received in bursts of more than one packet, so the queue at the access-link router is mostly full. The burstiness of the traffic determines how much of the available capacity (idle time) is actually useful (usable) and can translate into capacity for new connections without compromising our access link.

• Burst length (in bytes or packets): we use the mean or the median of the burst length in the observed interval. We can also consider the burst length in time units ($B_t$) as the total time during which packets are received in a burst, over the total time interval under consideration: $B_t = \sum t_{burst} / T$.

Figure 10: Example of the metrics for available bandwidth estimation on an Ethernet network.
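The capacity estimate of eq. (1) and the four usage metrics can be computed from a list of packet sizes and first-bit arrival times. This is a minimal sketch, not the deployed CIB source: the function name and result keys are illustrative, $R_{pkt}$ is averaged over the $N-1$ packets that have an inter-arrival time, arrival times must be strictly increasing, and the duration of a burst is assumed to run from its first arrival until its last packet has been received at the estimated capacity.

```python
from statistics import mean


def link_metrics(sizes_bits, arrivals_s, interval_s):
    """Passive receiver-side estimates over an observation interval T.

    sizes_bits[i] -- L_i, size of packet i in bits
    arrivals_s[i] -- a_i, first-bit arrival time of packet i (s, increasing)
    interval_s    -- T, length of the observation interval (s)
    """
    n = len(sizes_bits)
    # tau_i = a_{i+1} - a_i and C_i = L_i / tau_i, defined for i < n - 1
    taus = [arrivals_s[i + 1] - arrivals_s[i] for i in range(n - 1)]
    caps = [sizes_bits[i] / taus[i] for i in range(n - 1)]
    c_hat = max(caps)  # eq. (1)

    # Packet i+1 joins the current burst if it could not fit in the idle
    # gap after packet i: tau_i <= (L_i + L_{i+1}) / C_hat.
    bursts = [[0]]
    for i, tau in enumerate(taus):
        if tau <= (sizes_bits[i] + sizes_bits[i + 1]) / c_hat:
            bursts[-1].append(i + 1)
        else:
            bursts.append([i + 1])

    # Assumed burst duration: first arrival until the last packet of the
    # burst has been fully received at rate C_hat.
    burst_time = sum(
        arrivals_s[b[-1]] - arrivals_s[b[0]] + sizes_bits[b[-1]] / c_hat
        for b in bursts
    )
    return {
        "capacity_bps": c_hat,
        "r_avg_bps": sum(sizes_bits) / interval_s,       # R_avg
        "r_pkt_bps": mean(caps),                         # R_pkt
        "bursts_per_packet": len(bursts) / n,            # normalised count
        "mean_burst_len_pkts": mean(len(b) for b in bursts),
        "burst_time_fraction": burst_time / interval_s,  # B_t
    }
```

For instance, three 1500-byte packets arriving back-to-back at 1 Gbps (12 µs apart), followed after a pause by two more, give a capacity estimate of about 1 Gbps, two bursts for five packets (0.4), and a mean burst length of 2.5 packets.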

Considering these metrics, we can estimate the usage level of our access link and, under some circumstances, the proximity of a bottleneck (and its approximate value) and the capability of the access link to accommodate more traffic given the current traffic load in the downlink direction. For instance, Figure 10 shows an example of the measurements from a receiver downloading traffic from different servers in an Ethernet network. The capacity of the access link, measured following eq. (1), is close to 1 Gbps. However, the actual utilisation of the link and the equivalent per-packet rates are much lower than that. In addition, by measuring the burstiness of the downloaded traffic, we can offer an estimate of the available resources and of the traffic pattern that would be most suitable for opening new connections. In the case of mobile broadband network interfaces, further experiments are needed to avoid erroneous measurements due to the scheduling and transmission intervals inherent to these kinds of networks. Next steps include further experimentation to tune the thresholds for more accurate estimation. Preliminary tests in the emulation platform as well as in wired and Wi-Fi setups have provided promising results. Our objective for future work in WP4 is to make the algorithm more robust and accurate. To do so, we plan to extend the experiments to a wider set of traffic patterns and to adapt the algorithm to mobile broadband network interfaces, using the MONROE testbed 5 for evaluation.

5 www.monroe-project.eu


2.2.1.3 Happy Eyeballs caching

NEAT uses HE to efficiently select the most appropriate transport solution. As pointed out in RFC 6555 [83], an HE algorithm should not waste networking resources by routinely making simultaneous connection attempts. In NEAT, we therefore cache the outcome of previous connection attempts to the same peer. This information is stored in a CIB to be used by future flows; in this sense, the HE mechanism in NEAT can be regarded as a CIB source. The cache lifetime is system dependent and should be set on a case-by-case basis. The impact and efficiency of the HE algorithm used in NEAT have been evaluated by Papastergiou et al. [62]. Their results suggest that caching significantly reduces the CPU load imposed by an HE mechanism, and indicate that the internal-memory footprint of an HE mechanism is essentially the same as for single-flow establishment.
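The caching behaviour described above amounts to a per-peer lookup with an expiry time placed in front of the HE race. The sketch below is illustrative only: `HECache`, `open_flow`, the `race` callback, and the injectable clock are hypothetical names, and the lifetime is a constructor parameter because the section leaves it system dependent.

```python
import time
from typing import Callable, Dict, Hashable, Tuple


class HECache:
    """Remembers the winner of earlier Happy Eyeballs races per peer."""

    def __init__(self, lifetime_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self.lifetime_s = lifetime_s   # system dependent, set case by case
        self.clock = clock
        self._entries: Dict[Hashable, Tuple[Hashable, float]] = {}

    def store(self, peer, winner) -> None:
        self._entries[peer] = (winner, self.clock() + self.lifetime_s)

    def lookup(self, peer):
        entry = self._entries.get(peer)
        if entry is None:
            return None
        winner, expires = entry
        if self.clock() >= expires:
            del self._entries[peer]    # stale: race again next time
            return None
        return winner


def open_flow(peer, candidates, cache, race):
    """Use a fresh cached winner if there is one; otherwise race."""
    winner = cache.lookup(peer)
    if winner is None:
        winner = race(candidates)      # simultaneous attempts, first wins
        cache.store(peer, winner)
    return winner
```

Only the first flow to a peer within the cache lifetime pays for simultaneous connection attempts; later flows reuse the cached winner directly.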

2.2.2 Transport selection and configuration

The extended policy system described above enables transport selection and configuration based on detailed network information (e.g., link quality). Thus, the extension makes it possible to map application requirements and knowledge to transport services tailored for specific operating environments. To evaluate NEAT’s extended policy system in this context, we performed a number of experiments in the multi-access environment shown in the right part of Figure 11. The details of how these experiments were conducted and their outcome can be found in a paper presented at the ACM SIGCOMM 2017 MobiArch workshop [32], included in Appendix E. The setup contained a client that accessed a web server using both WLAN and 3G/4G. The server was under our control, and the mobile node was a NEAT-enabled MONROE mobile broadband (MBB) measurement node [1]. The MONROE node was equipped with dedicated software for measuring and experimenting in both WLAN and MBB networks, and was able to continuously collect information about, e.g., the signal quality (RSSI, RSCP, RSRP, RSRQ). For the WLAN interface, information about link quality, signal level, coding/modulation, and link rate was collected. All this meta-data was continuously fed to the NEAT CIB. To make full use of both available interfaces (WLAN and 3G/4G), both the client and the server were equipped with MPTCP-enabled Linux kernels. The left part of Figure 11 shows how the extended policy system was used at the client to create a transport service. The application provided, via the NEAT User API, a set of desired communication properties. The NEAT PM used this information, together with the meta-data stored in the CIB and the policy information in the PIB, to create a suitable transport service, in this example using TCP as transport.
Next, we will showcase two sets of real experiments using this setup, i.e., using NEAT to compose suitable transport services using application knowledge, local hardware information, and run-time link-quality metrics.

2.2.2.1 Selecting transport

The amount of data to be sent often plays an important role. For example, interactive applications transmitting small amounts of data are often more sensitive to latency than bulk-traffic applications. Simply put, applications have different requirements on the underlying transport. While a longer transfer can benefit from capacity aggregation through, e.g., MPTCP, shorter transfers are typically hindered by such approaches, as their performance is dominated by the worst path.


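[Figure 11 diagram. Left: the application passes properties (PX = 100KiB) through the NEAT User API to the Policy Manager in the NEAT User Module, which combines them with a PIB policy (“if PX <= 100KiB or CY < 4G: ...”) and CIB meta-data (CY = 4G) to create a new transport (TCP or MPTCP). Right: a client with WLAN and LTE (3G/4G) interfaces connected over the Internet (IPv4/IPv6) to the server, with UDP, TCP, and MPTCP transports.]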

Figure 11: Multi-access scenario.

To determine whether NEAT can compose suitable transport services for different transfer sizes, we transmitted objects of different sizes using three transport-layer configurations: TCP, MPTCP, and NEAT. For the TCP and MPTCP transfers, the respective protocol was used to download the objects. For NEAT, we created a policy that favoured TCP over MPTCP when the amount of data to transfer was less than or equal to 100 KiB, and preferred MPTCP over TCP when the data size was greater than 100 KiB. The limit of 100 KiB was taken from previous work [12] on MPTCP over WLAN and LTE. The results from this experiment are shown in the left graph of Figure 12, where the y-axis shows the download time of each protocol relative to TCP and the x-axis represents the object size. Each experiment was repeated 30 times, and the variation is represented by 95% confidence intervals. The results confirm that MPTCP is a poor choice for transfers smaller than or equal to 100 KiB in this scenario. For instance, when transmitting 10 KiB, MPTCP needs approximately 18% more time than TCP to finish the transfer. MPTCP performs worse because data is sent over the slower LTE path, whereas staying exclusively on the faster WLAN path is clearly faster. However, as the amount of data to send increases, the benefit of using MPTCP, which load-balances the traffic over both paths, becomes evident. For the 1,000 KiB and 10,000 KiB transfers, using MPTCP instead of TCP reduces the transfer time by 50-55%. NEAT, on the other hand, selected the transport depending on the actual object size, and was therefore able to match the performance of the most suitable transport protocol in all experiments. Given the results, one might think that TCP is always preferable to MPTCP for small data transfers, and vice versa for longer ones.
This is not the case, as the quality of the respective connections, WLAN and cellular, plays a central role. MPTCP does not work well if paths are highly asymmetric, e.g., in terms of capacity and delay. While LTE connections are often roughly symmetric to WLAN connections, this is not always the case. When cellular coverage is poor, a fallback to 3G or 2G might be necessary, resulting in increased asymmetry, which makes MPTCP a poor choice. This problem is illustrated by another set of experiments, shown in the right graph of Figure 12, where the mobile broadband modem was fixed to 3G. Compared with the previous results, the effect of switching from LTE to 3G is significant. For the shorter transfers, the effect of using MPTCP is similar to the LTE scenario. However, as the object size increases, so does MPTCP’s relative download time. For the largest objects, MPTCP actually requires 150% more time to complete. The reason for the increase is MPTCP’s mode of operation, which causes data to be sent over 3G as soon as it cannot be sent over WLAN. This situation occurs frequently during a transfer, as MPTCP deems the WLAN path to be unavailable whenever the congestion window of that subflow is full. NEAT is able to circumvent this performance problem, first, by not choosing MPTCP for transfers shorter than or equal to 100 KiB, owing to the aforementioned policy stating that TCP should be used for short transfers. Furthermore, NEAT continuously collected quality metrics from the MONROE framework. This meta-data contained run-time information on, e.g., what technology was used by the mobile broadband modem. In this scenario, when the LTE interface was restricted to 3G, the CIB information in conjunction with a corresponding system policy indicated that the mobile interface was not suitable for MPTCP sessions, causing the PM to generate a transport service candidate based on TCP instead.

[Figure 12 plots: relative download time versus object size [KiB] (1 to 10000) for TCP, MPTCP, and NEAT; left panel WLAN and LTE, right panel WLAN and 3G.]

Figure 12: Relative download performance using TCP, MPTCP, and NEAT over WLAN and LTE/3G.
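The two rules exercised in these experiments, TCP for transfers of at most 100 KiB and TCP whenever the cellular modem reports a technology below 4G, mirror the policy shown in Figure 11 (“if PX <= 100KiB or CY < 4G”). A minimal sketch of that decision follows; the function name and the mapping from reported mode to generation are hypothetical, and in NEAT the rule is expressed as a PIB policy evaluated by the Policy Manager rather than as application code.

```python
KIB = 1024
SIZE_LIMIT_BYTES = 100 * KIB   # threshold used in the experiments
GENERATION = {"2G": 2, "3G": 3, "4G": 4, "LTE": 4}  # assumed mode mapping


def select_transport(expected_bytes: int, cellular_mode: str) -> str:
    """Return "TCP" or "MPTCP" for a flow over WLAN plus cellular."""
    if expected_bytes <= SIZE_LIMIT_BYTES:
        return "TCP"    # short transfer: stay on the single best path
    if GENERATION.get(cellular_mode, 0) < 4:
        return "TCP"    # 3G/2G fallback (or unknown): paths too asymmetric
    return "MPTCP"      # long transfer over roughly symmetric paths
```

Treating an unknown cellular mode as below 4G is a conservative design choice in this sketch: it falls back to single-path TCP rather than risking MPTCP over a highly asymmetric path.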

2.2.2.2 Transport Configuration

In the previous experiments, the application requirements, policies, and characteristics employed by NEAT led to different transport protocol choices. In many scenarios, however, the set of protocols to choose from is limited. For instance, a protocol might only be available at one of the peers, or the chosen protocol might need to be configured properly to meet application requirements. Let us consider the same scenario: a client with both WLAN and LTE capabilities using MPTCP to access a remote web server. This time, instead of selecting a transport protocol, the extended policy system of NEAT is instructed to support low-latency traffic by configuring MPTCP to use the STTF algorithm (reported in D3.1 [29] and D3.2 [30]) to schedule data among different subflows. For the experiments, we chose five websites of different sizes (listed in increasing order): google.com, wikipedia.com, instagram.com, amazon.com, and theguardian.com. We used a set of applications on the client and server to perform the downloads. On the server, we used the nghttp2 web server, which supports HTTP/2 with, e.g., HPACK header compression. To model the download process as realistically as possible, we used data and dependency graphs from Epload [79] together with a custom-made client.

The results of the experiments are shown in Figure 13. The graphs show the average page load times (left) and the average object download times (right), with 95% confidence intervals. Each site was downloaded 30 times with each of three schedulers: the standard MPTCP scheduler (LRF), the BLEST [26] scheduler, and NEAT. Given the low-latency policy used for the experiment, NEAT configured MPTCP to use the STTF scheduler. We define the object download time as the average time required to download a particular web object. The download time of an object is a relevant metric, as users' browsing experience sometimes relies more on individual objects than on an entire page. As both graphs show, LRF exhibits the worst performance of the three schedulers. BLEST can decrease latency compared to LRF, and NEAT (STTF) further reduces latency for object downloads. The exception is the Wikipedia site, where the amount of data (132 Kbyte) was simply not enough for the schedulers to make significantly different decisions. For some sites, the performance improvement given by NEAT (STTF) is very significant. For instance, STTF completes web object transfers up to 51% faster than LRF for google.com. The difference in performance arises because BLEST and NEAT (STTF) use the WLAN path more than LRF does. The LRF scheduler sends approximately 60% of the data over the best path (WLAN), while BLEST and STTF use this path for almost 80% and 90% of the traffic, respectively. More details on STTF can be found in D3.1 [29] and D3.2 [30].

Figure 13: Page and object load times using two static MPTCP configurations (LRF, BLEST) and a dynamic NEAT configuration. Left panel: page load times; right panel: object load times; y-axis: average transfer time [ms]; schedulers: LRF, BLEST, NEAT (STTF).
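The following Python sketch makes the scheduling difference concrete. It contrasts a lowest-RTT-first choice, which is our reading of LRF, with a simplified shortest-transmission-time estimate in the spirit of STTF. The delivery-time model is an assumption made for illustration: each full congestion window already in flight costs one RTT, and the new segment then needs half an RTT. This is not the STTF algorithm specified in D3.1 and D3.2.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Subflow:
    name: str
    srtt_ms: float  # smoothed round-trip time
    cwnd: int       # congestion window, in segments (assumed >= 1)
    inflight: int   # segments sent but not yet acknowledged

    def has_window_space(self) -> bool:
        return self.inflight < self.cwnd


def lrf_pick(subflows: List[Subflow]) -> Optional[Subflow]:
    """Lowest-RTT-first: lowest-RTT subflow that still has window space."""
    available = [s for s in subflows if s.has_window_space()]
    return min(available, key=lambda s: s.srtt_ms) if available else None


def estimated_delivery_ms(s: Subflow) -> float:
    """Crude estimate of when a new segment sent on s reaches the receiver."""
    windows_ahead = s.inflight // s.cwnd  # full windows that must drain first
    return windows_ahead * s.srtt_ms + s.srtt_ms / 2


def sttf_like_pick(subflows: List[Subflow]) -> Subflow:
    """Pick the subflow with the shortest estimated transmission time."""
    return min(subflows, key=estimated_delivery_ms)


if __name__ == "__main__":
    wlan = Subflow("WLAN", srtt_ms=20, cwnd=10, inflight=10)  # window full
    umts = Subflow("3G", srtt_ms=200, cwnd=10, inflight=2)    # window open
    print(lrf_pick([wlan, umts]).name)        # 3G
    print(sttf_like_pick([wlan, umts]).name)  # WLAN
```

In this example the WLAN window is full, so `lrf_pick` falls back to 3G. `sttf_like_pick` waits for WLAN because it estimates 30 ms for WLAN versus 100 ms for 3G. This is consistent with the higher WLAN share reported above for STTF.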

2.2.3 Role in NEAT and next steps

This section has presented different CIB sources that can be used to optimise interface and transport selection for NEAT applications. Moreover, we have demonstrated how transport configuration and interface selection based on CIB information and policies improve performance. In further WP4 work, we will combine multiple CIB sources and demonstrate how NEAT can improve application quality and bandwidth utilisation. In this work, we will use the MONROE testbed with the NEAT library installed.


2.3 SDN controller integration

Traditionally, data centre networks have been built with proprietary, closed switches and routers that forward traffic based on local decisions made by each device, supported by a large number of complex networking protocols. This networking model cannot provide the flexibility required by modern applications that deal with huge amounts of data, which must often be transferred in real time. Data centre networks built this way are expensive to maintain, expensive to scale, and lack the capability for fast innovation. To address these problems, enterprises are turning to Software-Defined Networking (SDN). SDN solutions are typically controlled and monitored from a logically centralised controller, which maintains a global view of the entire data centre network. With SDN, network traffic is managed by software applications, which are more dynamic and easier to optimise and tune to specific application requirements. Data centre applications are increasingly deployed on virtualised infrastructures, where data is exchanged mainly between virtual machines through virtual switches and routers. In such highly virtualised environments, SDN software controllers are becoming a crucial component of the virtual network layer connecting applications. EMC's use case focuses on traffic in a data centre network managed by an SDN controller. The integration between the NEAT System and a network controller enables a flexible and direct influence on network policies. The global knowledge of the data centre network state provided by a network controller enables better transport optimisations to meet application requirements. The SDN controller that monitors and provisions the data centre network interacts with the NEAT Policy components and pushes additional information into the NEAT System on which policies can act.
More generally, the NEAT System can leverage any external device capable of providing useful network information or measurements to complement locally made decisions with broader feedback. In this sense, the SDN integration demonstrates the extensibility of the NEAT architecture. The following NEAT components are involved in the realisation of the NEAT-SDN integration:

• NEAT Framework components: integrating statistics and measurements provided by an SDN controller into the NEAT System, to be consumed by a diagnostic application and by user requests.

• NEAT Policy components: collecting statistics and policies from an SDN controller and informing the SDN controller about existing policies.

• NEAT Selection components: selecting transport candidates based on the additional information provided by an SDN controller.

• NEAT Transport components: configuring transport protocol parameters and extensions according to policies installed by an SDN controller or triggered by additional information coming from the network.

• NEAT Signalling components: exchanging statistics and meta data between the NEAT System and an SDN controller.

2.3.1 Integration strategies

Activities within the NEAT-SDN integration topic can be classified into two optimisation areas:

Application optimisation: this involves approaches that aim to improve the performance of specific NEAT-enabled applications. In SDN environments with centralised control, this can be achieved by supplying applications with detailed information about the network state or by provisioning network resources based on specific application requirements.


Network optimisation: this involves approaches that aim to centrally optimise the overall performance of data-centre networks by evaluating application requirements expressed through the NEAT interface, and by matching these with currently available physical resources in the network.

In the EMC use case, which consists of a data centre network with OpenFlow-enabled switches and NEAT-enabled end-hosts, we further consider the approaches outlined in Table 4 to achieve the aforementioned goals of network and application optimisation.

Table 4: NEAT-SDN goals.

Application optimisation

• Dissemination of network metrics: An SDN controller pushes information about the network topology and associated path characteristics to some set of NEAT-enabled hosts. This information is ingested by the Policy Manager's CIB component. Through the Policy Manager, the CIB information is made available to NEAT applications running on the host, which can then select the most suitable network interfaces and transport protocols.

• Targeted application tuning: For a known type of application, an SDN controller installs policies into the PIB of the corresponding NEAT hosts. These policies enforce the use of a transport protocol or network path known to be suitable for that application. Examples of such policies are the use of MPTCP for elephant flows [50, 61] generated by data replication services, or the use of pre-provisioned low-latency paths for VoIP applications.

• Active application requests: To fulfil their requirements, NEAT applications may use a northbound interface of the SDN controller to actively notify it about the application's desired resources and path properties during the connection establishment phase. The controller may choose to respond to such a request by provisioning resources that meet the given requirements.

Network optimisation

• Global application view: An SDN controller actively queries known NEAT hosts, e.g., by randomly polling information from the PIB/CIB, to determine the type of applications running on the hosts attached to the network (e.g., VoIP or backup clients). This information is used to augment the controller's global network view, enabling the classification and prioritisation of application traffic, as well as the provisioning of network paths. As a result, the controller can ensure an efficient utilisation of the available network resources.

• Targeted application handling: An SDN controller targets specific NEAT applications which have previously registered with the controller through a northbound SDN interface. In this scenario, the Policy Manager acts as an agent which transmits the application properties and requirements (e.g., “database client” or “bulk file transfer”) to the controller prior to opening a connection.

These approaches illustrate that the optimisation goals can be achieved by exploiting information available at the NEAT hosts, at the controller, or a combination of the two. Hence, we further characterise the NEAT-SDN integration strategies with respect to the NEAT host interaction with the controller:

• Controller initiated: the SDN controller injects policies and information about resource availability in the managed network into the CIB/PIB of the relevant NEAT-enabled end-hosts. Conversely, the SDN controller may query NEAT hosts to collect information about flows originating from the host and their respective requirements.

• NEAT initiated: NEAT applications explicitly notify the SDN controller about their requirements using a northbound SDN interface. NEAT initiated communication may consist of informational notifications to the controller containing application requirements.

The controller initiated approach has been developed and implemented as the main mechanism for the integration between NEAT and SDN. The current implementation of the Policy Manager can be adapted to support the NEAT initiated approach with minor modifications.

2.3.2 NEAT external interfaces

The communication between NEAT end-hosts and an SDN controller can be classified according to the three dimensions presented in Table 5. The features targeted by the NEAT-SDN integration activities that are implemented in the controller initiated approach are shown in the table in bold italics.

Table 5: NEAT-SDN communication.

Direction of communication

• NEAT to SDN: from the end-host NEAT stack to the SDN controller.

• SDN to NEAT: from the SDN controller to the end-host NEAT stack; needed for the controller initiated approach.

• Bidirectional: bidirectional communication between the NEAT host and the SDN controller.

Communication channel

• In-band: information exchange between the SDN controller and the end-host NEAT stack occurs in the data plane, by encoding information in data packet headers (e.g., using DSCP values).

• Out of band: information exchange using a dedicated connection and custom payload, such as standard or extended OpenFlow control messages, or any other communication mechanism separated from the data plane.

Communication API

• NEAT policy interface: interface providing CIB/PIB access to external entities; enables the controller initiated approach.

• Controller API: SDN northbound interface (e.g., REST), or implicitly leveraging the standard OpenFlow packet handling and custom headers or meta data inside packets.

• Bidirectional: exposed interfaces at both the end-hosts and the SDN controller to implement a duplex interaction.

The CIB is the most appropriate entry point in the NEAT architecture for granting external devices a way of providing network statistics and measurement summaries. In the SDN scenario, we assume that the SDN controller knows how to reach NEAT hosts and can inject feedback into the corresponding CIBs using an out-of-band mechanism. In this scenario, the SDN controller acts as a CIB source. Similarly, the controller may be granted access to the PIB repository to install policies aimed at improving the overall performance of the network. In the current implementation, the SDN controller installs JSON-encoded CIB nodes containing relevant statistics directly into the CIB repository using a dedicated REST API. A CIB node can augment existing entries in the CIB or insert new ones as appropriate. The SDN controller periodically pushes and updates CIB nodes. These nodes contain properties obtained from network measurements, as well as topology information derived from the global network view.

Table 6: Network measurements provided by the SDN controller.

• Path capacity (JSON property key: path_capacity_mbps; possible values: single integer, in Mbps): maximum bandwidth of a physical path between the source and destination host, defined by the link providing the minimum bandwidth in the path.

• Path available bandwidth (JSON property key: path_abw_mbps; possible values: single integer, in Mbps): available bandwidth on a physical path between the source and destination host.

• Number of concurrent flows (JSON property key: concurrent_flows; possible values: single integer): the number of established and running flows (identified by the OpenFlow 5-tuple) in the SDN-managed network.

• Available paths to a destination (JSON property key: available_paths; possible values: single integer): number of distinct physical paths between the local source and the destination.

SDN controllers can directly obtain a number of statistics from OpenFlow-enabled switches⁶, which may be directly incorporated into CIB nodes:

Flow statistics: duration, priority, timeouts, packet count, byte count.

Table statistics: active entries, packet look-up count, matched packets count.

Port statistics: received/transmitted/dropped packets, received/transmitted bytes, received/transmitted errors.

Port description: port state, current/supported speeds, current/advertised/supported features.

Queue statistics: transmitted bytes/packets/errors.

A key advantage of the SDN integration with NEAT is the ability to provide NEAT hosts with end-to-end information about the network, such as topology, path utilisation, queue lengths, and delays. Such information is not readily available today. Because SDN controllers possess a global view of the network, these characteristics can be derived by aggregating and correlating individual flow and port statistics. A set of relevant network statistics is listed in Table 6. Each statistic is mapped to a property managed by the policy components. The property is defined by a key in the JSON-formatted CIB file and by the possible values that it can take. Listing 1 shows an example of a CIB file containing the measurements and statistics provided by an SDN controller. In this example, a network path exists between a local interface on the end-host and a previously known remote endpoint, with 1 Gbps capacity and 345 Mbps currently available.

⁶ https://www.opennetworking.org/images/stories/downloads/sdn-resources/onf-specifications/openflow/openflow-switch-v1.5.1.pdf


    {
      "uid": "path_8",
      "description": "SDN controller generated path information",
      "match": [{"uid": {"value": "local_23"}}, {"local_ip": {"value": ["10.2.1.1", "10.2.0.0"]}}],
      "properties": {
        "path_abw_mbps": {"value": 345},
        "path_capacity_mbps": {"value": 1000}
      }
    }

Listing 1: CIB node generated by SDN controller.
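The aggregation described above can be sketched as follows. Path capacity is the bottleneck link capacity, and available bandwidth is the smallest spare capacity along the path. Together they yield a node shaped like Listing 1. The Python sketch below uses hypothetical inputs: per-hop `(capacity_mbps, used_mbps)` pairs, such as a controller could derive from OpenFlow port statistics. The REST endpoint is also hypothetical. The URL path `/cib/<uid>`, the PUT method, and the host address are placeholders, not the documented Policy Manager API. The request is only built, not sent.

```python
import json
import urllib.request


def path_metrics(hops):
    """Derive (capacity, available bandwidth) in Mbps for one physical path.

    hops: list of (capacity_mbps, used_mbps) pairs, one per link on the path.
    Capacity is set by the bottleneck link; available bandwidth is the
    smallest spare capacity found on any link of the path.
    """
    capacity = min(cap for cap, _ in hops)
    available = min(cap - used for cap, used in hops)
    return capacity, max(available, 0)


def make_cib_node(uid, local_uid, local_ips, hops):
    """Build a CIB node with the same shape as Listing 1."""
    capacity, abw = path_metrics(hops)
    return {
        "uid": uid,
        "description": "SDN controller generated path information",
        "match": [{"uid": {"value": local_uid}},
                  {"local_ip": {"value": local_ips}}],
        "properties": {
            "path_abw_mbps": {"value": abw},
            "path_capacity_mbps": {"value": capacity},
        },
    }


def build_push_request(api_base, node):
    """Prepare (but do not send) an HTTP request carrying the node as JSON.

    The "/cib/<uid>" path and the PUT method are hypothetical placeholders.
    """
    return urllib.request.Request(
        f"{api_base}/cib/{node['uid']}",
        data=json.dumps(node).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    node = make_cib_node("path_8", "local_23", ["10.2.1.1", "10.2.0.0"],
                         [(1000, 655), (10000, 2000)])
    # Hypothetical host address for the NEAT host's REST API.
    req = build_push_request("http://neat-host.example:8080", node)
    print(req.get_method(), req.full_url)
    print(node["properties"])
```

With hops of (1000, 655) and (10000, 2000) Mbps, the node reports 1000 Mbps capacity and 345 Mbps available, as in Listing 1.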

2.3.3 Selected implementation scenario

The above subsections described the general integration strategies and capabilities enabled by the SDN-NEAT integration. Below we further describe the particular data centre scenario and implemen- tation options chosen to illustrate these capabilities.

2.3.3.1 Data-centre environment scenario

The implementation of an effective NEAT-SDN integration (with respect to EMC's use case) is driven by the following scenario within a data-centre environment. An SDN controller manages the data-centre network and has already provisioned several distinct physical paths between two NEAT-enabled end-hosts. An application, such as storage replication or disaster recovery, transfers large amounts of data from one host to the other through a standard TCP connection. The main objective is to optimise such transfers by leveraging the pre-provisioned physical paths and transparently using multiple connections with MPTCP. This scenario can be divided into three consecutive implementation phases:

1. SDN controller interaction: how the controller can feed CIB and PIB information into the NEAT System of the hosts.

2. Elephant flow detection: the data transfer application must be detected as a generator of elephant flows. This allows those flows to be handled more carefully, so that they do not impact other, possibly latency-sensitive, flows running in the network.

3. MPTCP connection management: how the elephant flows can be mapped to MPTCP subflows and then to distinct physical paths in the network.

Phase #1: SDN interaction

The Policy Manager CIB and PIB repositories have been implemented in the NEAT System. The OpenDaylight framework⁷ is used as the SDN controller because of its popularity in the SDN community and among enterprises deploying SDN environments. Furthermore, consortium partners have used it in other research activities. The integration between the NEAT stack and OpenDaylight is implemented through a REST API which exposes each NEAT host's Policy Manager (and the associated PIB/CIB repositories) for external access. As a result, the controller is able to inject CIB entries, e.g., for advertising network paths, as well as policies, e.g., to govern the handling of elephant flows. The integration of the REST API with the workflow of the Policy Manager is illustrated in Figure 14. At any point during the workflow, the SDN controller is granted read and write access to the PIB and CIB repositories. This access enables it to obtain a list of installed policies and the system's current view of the attached networks, and to augment these if deemed necessary. The application, in turn, can communicate with the controller in two ways. It can do so implicitly, by adding entries to the PIB/CIB. It can also do so explicitly, by instructing the NEAT System to generate a message to the controller containing the requested properties. The latter is implemented by including a special to_controller property within a request.

⁷ https://www.opendaylight.org/

Figure 14: NEAT Policy Manager workflow. Applications submit their flow requirements as a set of NEAT properties through the NEAT transport API. (1) Profiles expand high-level properties into concrete ones. (2) The result is filtered against the properties collected in the local CIB, generating a set of feasible candidates. (3) Policies are applied to each generated candidate. (4) The ranked candidate list is used to initialise connections. The SDN controller interacts with the PM through the REST API.
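The four workflow steps in Figure 14 can be illustrated with a toy Python sketch. Everything in it is an assumption made for illustration. Properties are plain key/value pairs, whereas NEAT uses richer JSON property objects with precedence. The profile, CIB entries, and policy are made up. Ranking by capacity stands in for the Policy Manager's actual scoring.

```python
# Toy data, all hypothetical: one profile, two CIB entries, one policy.
PROFILES = {
    "low_latency": {"transport": "TCP", "latency_class": "low"},
}

CIB = [
    {"interface": "eth0", "latency_class": "low", "capacity_mbps": 1000},
    {"interface": "eth1", "latency_class": "high", "capacity_mbps": 10000},
]

POLICIES = [
    # (match, properties to add), e.g. installed by an SDN controller via REST
    ({"interface": "eth0"}, {"dscp": 46}),
]


def expand(request):
    """Step 1: replace high-level profile properties by concrete ones."""
    props = {}
    for key, value in request.items():
        if key in PROFILES and value is True:
            props.update(PROFILES[key])
        else:
            props[key] = value
    return props


def filter_with_cib(props):
    """Step 2: keep CIB entries that do not contradict the requested properties.

    A property missing from a CIB entry is treated as compatible.
    """
    return [{**props, **entry} for entry in CIB
            if all(entry.get(k, v) == v for k, v in props.items())]


def apply_policies(candidate):
    """Step 3: add the properties of every policy whose match is satisfied."""
    for match, extra in POLICIES:
        if all(candidate.get(k) == v for k, v in match.items()):
            candidate = {**candidate, **extra}
    return candidate


def rank(candidates):
    """Step 4: order candidates; capacity is a stand-in for the real scoring."""
    return sorted(candidates, key=lambda c: c.get("capacity_mbps", 0),
                  reverse=True)


def policy_manager(request):
    candidates = filter_with_cib(expand(request))
    return rank([apply_policies(c) for c in candidates])
```

Calling `policy_manager({"low_latency": True})` yields a single candidate on `eth0` with the policy-added `dscp` property. A plain `{"transport": "TCP"}` request yields both interfaces, with `eth1` ranked first.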

Phase #2: Elephant flow handling

Traffic flows are often classified as elephants and mice, depending on the amount of data transferred and the lifetime of the connection. Mice flows are characterised by their short duration and are often latency-sensitive. Elephant flows, by contrast, are long-lasting flows (e.g., file transfers, VM backups, or data synchronisation) that are responsible for a large share of bandwidth consumption. In an SDN-managed network, it is critical to identify elephant flows and to handle them by taking into account their characteristics as well as the requirements of the user/application. After the detection of an elephant flow, the SDN controller can make decisions that may impact both the flow itself and previously installed flows. As possible decision-making actions, the controller may install new flow rules matching elephant flows (e.g., to use a high-capacity path), manage QoS configurations in the switches, or remotely optimise network protocol configurations at the end-hosts. As discussed in Deliverable D3.1 [29], several methods are possible to detect elephant flows. Within the context of EMC's main use case involving integration of the NEAT System with SDN networks, we have focused on elephant flow detection in the NEAT end-hosts. Our implementation and evaluation use policies that map specific properties to elephant flows. Listing 2 depicts a NEAT policy which appends four new properties to any flow request specifying that the application intends to transmit more than 10 Gb of data, classifying this flow as elephant_flow.

    {"uid": "bulk_flow_policy",
     "priority": 14,
     "match": {
       "data_volume_Gb": {
         "value": {"start": 10, "end": "Inf"}}
     },
     "properties": {
       "transport": {"value": "MPTCP"},
       "wired_interface": {"value": true, "precedence": 1},
       "elephant_flow": {"value": true},
       "SO/IPPROTO_IP/IP_TOS": {"value": 40, "precedence": 2}
     },
     "replace_matched": false,
     "expire": 1493307808.030206}

Listing 2: NEAT policy setting threshold for elephant flows.
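The matching behaviour of Listing 2 can be sketched in Python as below. The sketch makes several simplifying assumptions. Range bounds are treated as inclusive, and "Inf" is read as unbounded. Request properties are flattened to plain values, and precedence, priority, and expiry are ignored. It illustrates the idea rather than the Policy Manager's implementation.

```python
import math

# Python rendering of the match/properties part of Listing 2.
BULK_FLOW_POLICY = {
    "match": {"data_volume_Gb": {"value": {"start": 10, "end": "Inf"}}},
    "properties": {
        "transport": {"value": "MPTCP"},
        "wired_interface": {"value": True, "precedence": 1},
        "elephant_flow": {"value": True},
        "SO/IPPROTO_IP/IP_TOS": {"value": 40, "precedence": 2},
    },
}


def in_range(value, spec):
    """Check value against {"start": a, "end": b}; bounds assumed inclusive."""
    start = float(spec["start"])
    end = math.inf if spec["end"] == "Inf" else float(spec["end"])
    return start <= value <= end


def apply_policy(request, policy):
    """Return a copy of the request, extended with the policy's properties on a match."""
    for key, cond in policy["match"].items():
        requested = request.get(key)
        if requested is None or not in_range(requested, cond["value"]):
            return dict(request)  # no match: request unchanged
    extended = dict(request)
    for key, prop in policy["properties"].items():
        extended[key] = prop["value"]  # precedence ignored in this sketch
    return extended
```

A request announcing 25 Gb is extended with MPTCP, the elephant_flow tag, and the TOS socket option. A 2 Gb request, or one that states no data volume, passes through unchanged.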

Phase #3: MPTCP association management

Given an SDN architecture, there are two generic MPTCP management strategies: (a) NEAT-based, SDN assisted and (b) SDN-based, NEAT assisted. For our scenario, we have chosen the SDN-based, NEAT assisted strategy. With an SDN-based approach assisted by the NEAT System, an SDN controller can manage the selection of MPTCP and the creation of MPTCP subflows for bulk file transfers in the end-hosts. It can base these decisions on local network knowledge or on the output of CIB queries. By pushing a policy such as the one in Listing 2, the controller can instruct the NEAT System to use MPTCP and set other relevant properties for elephant flows. The controller can then use these properties to map the MPTCP subflows to suitable paths in the network.
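One possible controller-side mapping of MPTCP subflows onto pre-provisioned paths spreads the subflows of one connection over distinct paths. Ties are broken by the number of elephant subflows already mapped to each path. The greedy Python sketch below is hypothetical, as are the identifiers it uses. Installing the corresponding OpenFlow rules is outside its scope.

```python
from collections import defaultdict


def assign_subflows(subflows, paths, mapped_count):
    """Greedily map MPTCP subflows to pre-provisioned paths.

    subflows:     list of (connection_id, five_tuple) pairs
    paths:        path identifiers known to the controller
    mapped_count: dict path -> number of elephant subflows already on it;
                  updated in place as subflows are assigned
    Subflows of the same connection prefer paths that connection does not
    use yet; ties are broken by the smallest mapped_count.
    """
    paths_of_conn = defaultdict(set)
    mapping = {}
    for conn, five_tuple in subflows:
        choice = min(paths,
                     key=lambda p: (p in paths_of_conn[conn], mapped_count[p]))
        mapping[five_tuple] = choice
        paths_of_conn[conn].add(choice)
        mapped_count[choice] += 1
    return mapping
```

Preferring unused paths per connection keeps one bulk transfer from stacking its subflows on a single link. The per-path counter gives a crude balance across connections.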

2.3.4 SDN controller integration

Reference [9] (available in Appendix F), produced by project participants, summarises the key findings outlined above and describes the current implementation of the NEAT Policy Manager API used for interactions with the SDN controller. The paper uses the handling of elephant flows in a managed data centre network as its primary scenario. In addition, it details the NEAT Policy Manager workflow and the primitives used to implement arbitrary policies in NEAT-enabled hosts, e.g., for handling elephant flows. A demonstrator of the presented concepts was shown in [68] (available in Appendix G), where it received the Best Demo Award.

Specifically, in these works the OpenDaylight controller is extended with a Northbound API to enable communication with a REST API exposed on each NEAT-enabled host. Through this interface, the controller can inject entries into the PM's PIB and CIB repositories. A network topology consisting of multiple paths connecting a NEAT-enabled data transfer application and a corresponding server was implemented using the CORE emulator (Figure 15). As CORE is based on container technology running real software, the emulated setup translates easily to physical network deployments.

In the main scenario considered, the controller publishes a policy to the relevant NEAT hosts that establishes the handling of elephant flows. This is achieved by defining a threshold above which data transfers are labelled as such. In the chosen scenario, we assume that the NEAT application specifies the amount of data that it intends to transmit as a property during the connection establishment phase. In addition, the controller installs a PIB entry specifying that any flow tagged as an elephant flow should be forwarded along a pre-provisioned path associated with a specific DSCP value. To this end, the deployed policy contains a socket option which is subsequently applied by the NEAT System, as needed. The SDN controller monitors the overall utilisation of the network. It continuously updates the thresholds and paths for forwarding elephant flows to achieve a fair utilisation of the disjoint network paths.

Figure 15: Topology for NEAT-SDN integration scenario. (Diagram components: an SDN controller exposing a Northbound API, with NEAT Connector, Path Calculator, Flow Scheduler, and Latency Monitor modules; NEAT policies and path information sent to the NEAT host; OpenFlow rules installed in the network; a low-latency path and a bulk transfer path between the hosts.)
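The socket option carried in Listing 2 ("SO/IPPROTO_IP/IP_TOS" with value 40) corresponds to DSCP 10 (AF11), since the DSCP occupies the upper six bits of the former TOS byte. The Python sketch below shows how an end-host could apply such a property. We assume that the "SO/<level>/<option>" key names a socket level and option. This is our reading of the key format, not NEAT's code. The sketch also relies on `socket.IP_TOS` being available on the platform (e.g., Linux).

```python
import socket


def apply_socket_properties(sock, properties):
    """Apply properties named "SO/<level>/<option>" as socket options.

    Interpreting the key as a socket level and option name is an assumption
    of this sketch. Returns the values read back from the socket.
    """
    applied = {}
    for key, value in properties.items():
        if not key.startswith("SO/"):
            continue  # not a socket option property
        _, level_name, opt_name = key.split("/")
        level = getattr(socket, level_name)
        option = getattr(socket, opt_name)
        sock.setsockopt(level, option, value)
        applied[key] = sock.getsockopt(level, option)
    return applied


def dscp_from_tos(tos):
    """The DSCP is carried in the upper six bits of the former IPv4 TOS byte."""
    return tos >> 2


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        print(apply_socket_properties(s, {"SO/IPPROTO_IP/IP_TOS": 40,
                                          "transport": "MPTCP"}))
    print(dscp_from_tos(40))  # 10
```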

2.3.5 Role in NEAT and next steps

This section has presented the integration of NEAT with SDN. This integration not only opens the door for fine-grained, application-aware resource optimisation strategies in SDNs, but also illustrates the flexibility of the NEAT framework. The provided REST API can be made available to any external device capable of providing useful network information. In further WP4 work, we will evaluate the performance benefits offered by the framework for the selected data-centre scenario.

2.4 PvD integration

The Sockets API offers a minimal set of information to applications and provides no mechanisms for making choices that support the most efficient use of network services. This traditional API has no information available for choosing the interface that best suits the requirements of the application. Instead, applications must implement more and more features to make the best use of the available network information. The INTAREA Working Group of the IETF is finalizing the specification of a network-to-host signaling system around provisioning domains (PvD) [2]. This work has support from all client operating systems as well as some network vendors. PvD information is transmitted in IPv6 Router Advertisement messages, so it works mainly in networks that support IPv6. The information provided by the network goes beyond node addresses or DNS servers. It also includes the presence of a captive portal, the first-hop network bandwidth, and whether the use of the link is free or metered (e.g., a hotspot or a mobile network). Applications can then choose to use one specific PvD with full knowledge of the network services. NEAT imports PvD information into the CIB: the characteristics and configuration provided by PvD are converted into CIB format and stored as an extension to the per-interface information. Applications can use NEAT to signal desired flow properties. This allows them to take advantage of new network technology as it emerges, without having to be rebuilt. NEAT uses the CIB to select interfaces based on the application class indicated in the NEAT User API. Figure 16 depicts an example. The Tactile application connects over an ultra-low-latency interface, and the Conference application is given a highly reliable cellular interface that matches its robustness requirements. The Download application is given higher-throughput interfaces, but might not connect at all on metered networks. (See Appendices C and D for further details on API-related issues.)

Figure 16: PvD integration with a NEAT host system with multiple available network interfaces. (Diagram components: Tactile, Conference, and Download applications on top of the NEAT API and NEAT stack; cellular and WiFi interfaces attached to a cellular provider network, a private WiFi AP, and a public WiFi provider network leading to the Internet; cellular and WiFi PvD signalling feeding CIB sources, the CIB, and the Policy Manager.)

2.4.1 Detailed description

The following NEAT components are involved in the realisation of PvD integration:

• NEAT Selection components: selecting network interfaces based on the additional information provided by a PvD daemon.

• NEAT Signalling components: collecting characteristics and meta-data from a PvD daemon into the NEAT System.

PvD information is made available to applications through a JSON file containing the signalled network properties. The location of the property information can be directly configured in the network stack or automatically discovered using IPv6 Router Advertisements. PvD signal information in Router Advertisements can be proxied by routers on the path, and the property configuration URL can be proxied by a series of HTTP servers to propagate the information. NEAT leverages this proxying capability to provide integration with the available PvD daemon pvdd.

Figure 17: NEAT integration with PvD using an HTTPS proxy. (Diagram components: provider network, Router Advertisement, HTTPS, PvD daemon (PvDD), PvD monitor, applications, NEAT API, and NEAT stack with Policy Manager, CIB, transport, UDP/IP, the PvD RA option, and IP interfaces.)

2.4.2 Getting PvD information into NEAT

PvD information can be brought into a system either by an IPv6 Router Advertisement parsing daemon such as pvdd or by directly configuring the location of the PvD JSON URI. NEAT supports receiving this information from a directly configured URI, which is passed to the NEAT Policy Manager as an argument. IPv6 Router Advertisements (RAs) are a multicast automatic network configuration mechanism commonly used to help bring hosts onto a network. PvD currently uses an experimental RA option. This new option carries a Fully Qualified Domain Name (FQDN) encoding the location of the PvD configuration JSON. The system integrating with PvD uses the provided FQDN to request the PvD JSON configuration from a well-known location on a web server over TLS. The HTTPS architecture allows support to be brought into applications that are unable to parse Router Advertisements. It also enables HTTPS web servers to proxy the upstream network's configuration information, as shown in Figure 17.
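The retrieval step can be sketched in Python as follows. The sketch assumes the well-known HTTPS path used in the IETF PvD specification (`/.well-known/pvd` under the PvD's FQDN). `fetch_pvd_info` performs a real HTTPS request and is shown for completeness only. Error handling and certificate policy are left out.

```python
import json
import urllib.request


def pvd_json_url(pvd_fqdn: str) -> str:
    """Map the FQDN carried in the PvD RA option to the PvD JSON URL.

    Assumes the "/.well-known/pvd" path of the IETF PvD specification.
    """
    fqdn = pvd_fqdn.strip().rstrip(".").lower()
    if not fqdn or "/" in fqdn or " " in fqdn:
        raise ValueError(f"not a usable PvD FQDN: {pvd_fqdn!r}")
    return f"https://{fqdn}/.well-known/pvd"


def fetch_pvd_info(pvd_fqdn: str, timeout: float = 5.0) -> dict:
    """Retrieve and decode the PvD JSON over HTTPS (performs a network request)."""
    with urllib.request.urlopen(pvd_json_url(pvd_fqdn), timeout=timeout) as resp:
        return json.load(resp)
```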

2.4.3 PvD JSON format and properties

PvD uses JSON-formatted information to carry the signalled network properties and characteristics. PvD information can carry many different properties; a key set of them, with example values, is listed in Table 7.


Table 7: PvD JSON Keys and example values.

| JSON key | Description | Type | Example |
|---|---|---|---|
| name | Human-readable service name | UTF-8 string | "Awesome Wifi" |
| expires | Date after which this object is not valid | see [47] | "2017-07-23T06:00:00Z" |
| prefixes | Array of IPv6 prefixes valid for this PvD | Array of strings | ["2001:db8:1::/48", "2001:db8:4::/48"] |
| localizedName | Localized user-visible service name; the language can be selected based on the HTTP Accept-Language header in the request | UTF-8 string | "Wifi Genial" |
| noInternet | No Internet; set when the PvD only provides restricted access to a set of services | boolean | true |
| metered | Use of the link is metered; access volume is limited | boolean | false |

PvD JSON files can contain entries for a number of networks if the upstream provider can offer addresses with multiple different source prefixes. Listing 3 shows an example PvD JSON file carrying information about two different networks. The NEAT CIB offers a method for adding such extension information to an existing CIB entry.
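Because one PvD JSON file may describe several networks, each with its own prefixes, a consumer must decide which entry applies to a given source address. A minimal sketch of that check follows, assuming prefixes are given as strings in the `"2001:db8:1::/48"` notation of Table 7; `pvd_prefix_contains()` is a hypothetical helper, not part of the NEAT CIB API.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: returns 1 if addr lies inside prefix
 * ("2001:db8:1::/48"), 0 if not, -1 if either string is malformed. */
int pvd_prefix_contains(const char *prefix, const char *addr)
{
    char net_str[INET6_ADDRSTRLEN];
    struct in6_addr net, host;
    const char *slash = strchr(prefix, '/');
    char *end;
    long plen;
    size_t n;
    int full_bytes, rem_bits;

    if (slash == NULL)
        return -1;
    n = (size_t)(slash - prefix);
    if (n == 0 || n >= sizeof net_str)
        return -1;
    memcpy(net_str, prefix, n);
    net_str[n] = '\0';

    plen = strtol(slash + 1, &end, 10);
    if (end == slash + 1 || *end != '\0' || plen < 0 || plen > 128)
        return -1;
    if (inet_pton(AF_INET6, net_str, &net) != 1 ||
        inet_pton(AF_INET6, addr, &host) != 1)
        return -1;

    /* Compare whole bytes first, then the remaining high-order bits. */
    full_bytes = (int)(plen / 8);
    rem_bits = (int)(plen % 8);
    if (memcmp(net.s6_addr, host.s6_addr, (size_t)full_bytes) != 0)
        return 0;
    if (rem_bits != 0) {
        unsigned char mask = (unsigned char)(0xFFu << (8 - rem_bits));
        if ((net.s6_addr[full_bytes] & mask) !=
            (host.s6_addr[full_bytes] & mask))
            return 0;
    }
    return 1;
}
```

An interface's source address would be tested against every `prefixes` entry of every network in the file; the first match identifies the PvD whose properties apply.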

2.4.4 Deployment scenarios

PvD network information can be used by the NEAT System to make a new set of selection criteria accessible to applications, i.e., NEAT can select a network based on application-specified PvD properties. To evaluate the integration of PvD with NEAT, we have created a set of scenarios that give example environments in which a NEAT application can use PvD information to make a selection. The scenarios combine a set of exemplar applications with network environments in which PvD information could be signalled, and for each combination they specify the selection that the NEAT System should make. Four example applications have been specified, each with a description of the network properties it requests from the NEAT System. Table 8 shows the four applications, the properties they request, and whether each request is mandatory or only preferred. Each application will be tested in a number of network configurations, each intended to be representative of a network encountered in the real world; four such network configurations are described in Table 9. To test the selection functionality, a scenario must allow a selection preference to be specified and must make it possible to verify that the correct network is chosen based on the selection properties. Table 10 shows the evaluation scenarios that we consider.
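To make the mandatory/preferred distinction concrete, the sketch below discards candidate networks that lack any mandatory property and ranks the rest by how many preferred properties they offer. The property flags are loosely modelled on the attributes of Table 9; the structure, flag names and scoring rule are hypothetical simplifications, not the NEAT Policy Manager's actual algorithm.

```c
#include <stddef.h>

/* Hypothetical property flags a network may offer. */
enum {
    PROP_HIGH_CAPACITY = 1u << 0,
    PROP_LOW_LATENCY   = 1u << 1,
    PROP_FREE          = 1u << 2, /* no cost */
    PROP_CORPORATE     = 1u << 3  /* reaches the corporate domain */
};

struct net_candidate {
    const char *name;
    unsigned props; /* bitwise OR of PROP_* flags */
};

/* Counts the set bits, i.e., the number of properties offered. */
static int count_props(unsigned x)
{
    int n = 0;
    for (; x != 0; x &= x - 1)
        n++;
    return n;
}

/* Returns the index of the best candidate, or -1 if none offers every
 * mandatory property. Ties are broken by list order. */
int select_network(const struct net_candidate *c, size_t n,
                   unsigned mandatory, unsigned preferred)
{
    int best = -1, best_score = -1;

    for (size_t i = 0; i < n; i++) {
        if ((c[i].props & mandatory) != mandatory)
            continue; /* fails a mandatory requirement */
        int score = count_props(c[i].props & preferred);
        if (score > best_score) {
            best_score = score;
            best = (int)i;
        }
    }
    return best;
}
```

Returning -1 models the case where no network can satisfy a mandatory request, which a scenario evaluation should also cover.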

2.4.5 Role in NEAT and next steps

This section has presented PvD as a CIB source that can provide information which cannot be measured by the NEAT stack itself. The information provided by PvD makes it possible to base policy decisions on higher-level information.


```json
{
  "name": "Foo Wireless",
  "localizedName": "Foo-France Wifi",
  "expires": "2017-07-23T06:00:00Z",
  "prefixes": ["2001:db8:1::/48", "2001:db8:4::/48"],
  "characteristics": {
    "maxThroughput": { "down": 200000, "up": 50000 },
    "minLatency": { "down": 0.1, "up": 1 }
  }
}

{
  "name": "Bar4G",
  "localizedName": "Bar US4G",
  "expires": "2017-07-23T06:00:00Z",
  "prefixes": ["2001:db8:1::/48", "2001:db8:4::/48"],
  "metered": true,
  "characteristics": {
    "maxThroughput": { "down": 80000, "up": 20000 }
  }
}
```

Listing 3: Example of PvD JSON format.

Table 8: Applications used in PvD scenarios.

1 2 3 4 Description Audio conference at Download movie Web browser IoT (work, home) work Medium capacity High capacity Free cost JSON request Low latency Low cost Corporate domain

Legend Preferred Mandatory


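Table 9: Attributes of Scenario Networks.

| Attribute | Corporate (Int) | Corporate (Ext) | Coffee shop | Home | IoT |
|---|---|---|---|---|---|
| Capacity (Mbit/s) | 100 | 100 | 100 | 1 | 1 |
| Latency (ms) | 10 | 10 | 10 | 500 | N/A |
| Cost | High | High | 0 | 0 | 0 |
| Allowed domains | Lim. | * | * | * | Lim. |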

Table 10: Evaluation Scenarios.

Corporate Coffee Home IoT Int Ext shop Public (No PvD) 2 3 Office 1 2 3 4 Office/Public 1 2 3 4 Home 1 2 3 4

In further work in WP4, we will evaluate NEAT on PvD-enabled networks with multiple possible network links and demonstrate how NEAT can improve application quality and bandwidth utilisation. In this work, which is related to Cisco's use case, we will use the testbed at the University of Aberdeen. The scenario evaluation network will be configured with a number of different characteristics to emulate the test scenarios presented above, and a NEAT-based application will be run over it. This application will make a number of policy requests to select between the networks of the different test scenarios.

3 Transparent support of non-NEAT applications

Some existing applications can be difficult to migrate to NEAT. However, they could still benefit from NEAT features, such as the policy and selection system, to obtain better performance. Section 3.1 describes a NEAT proxy that provides NEAT functionality to non-NEAT TCP-based applications. Section 3.2 introduces a middleware approach for integrating NEAT functionality into a virtualised SDN infrastructure; this enables legacy applications to make use of the enhanced transport mechanisms offered by NEAT in a managed environment orchestrated by an SDN framework. Finally, Section 3.3 presents the NEAT Sockets API, which can benefit legacy applications that do not use a callback API.

3.1 NEAT proxy solutions

Existing and new applications that are not built on NEAT can still benefit from the optimised interface and transport selection provided by NEAT. This section introduces the NEAT proxy: a local proxy that provides basic NEAT support to unmodified TCP-based applications. The proxy intercepts TCP connections transparently to the applications: a TCP connection is terminated at the proxy, and a NEAT-based connection to the original destination is created on behalf of the application. Data is then relayed between the application and the remote host until one of the sides closes the connection. If the remote host supports MPTCP, SCTP or QUIC, the proxy can also use one of those protocols to improve performance. Figure 18 illustrates a case where a TCP flow uses neither NEAT nor the NEAT proxy, and where the default route (interface) is not working. Figure 19 illustrates a case where the NEAT proxy is used to select the best interface for a TCP flow. Figure 20 illustrates a case where the NEAT proxy uses multi-homed SCTP for a TCP flow.

Figure 18: TCP-based application tries to connect directly via the default route. No NEAT proxy is used.

Figure 19: TCP-based application connects via a NEAT proxy, with a TCP connection to the server.

The main targets of the NEAT proxy are multi-homed devices, i.e., routers or computers connected simultaneously to multiple network providers. The major advantage of this solution is that even applications that are unaware of both multi-homing and NEAT can benefit from NEAT, particularly from Happy Eyeballs and intelligent path selection.

Happy Eyeballs (HE) checks which network supports which type of traffic. It tests all available connections for connectivity with the desired destination on the chosen port and with the specified protocol. This makes it possible to eliminate networks that block certain traffic or simply do not work. For example, if the SSH port is blocked on the default route, HE will choose a different interface that lets SSH traffic through.

Non-multihoming-aware applications usually have no means of knowing which of the available networks is best in terms of capacity or reliability, but this knowledge may be available on the system. Celerway routers collect a range of information about the parameters and performance of the available networks. A meta-data exporter, implemented by Celerway, exports this information to NEAT's Policy Manager as CIB entries. Combined with the policies (PIBs) deployed on the system, these CIB entries make it possible to perform intelligent path selection on behalf of the application.

The NEAT proxy has been implemented as a C application that runs as a daemon on the machine where the non-multihoming-aware application runs. It is also possible to run the proxy on another machine (e.g., a router) and intercept the application's traffic there. Our proxy design relies on a feature available in the standard Linux kernel, namely tproxy, which adds transparent proxy support.
This feature requires the socket match and the TPROXY target to be enabled in the kernel configuration, and it also needs policy routing. To make the NEAT proxy work, routing and iptables must be configured accordingly; the required steps can be found in the TPROXY documentation8. Strictly speaking, only the routing and the TPROXY iptables target are required; the DIVERT target is an optimisation. Selected packets are delivered locally to our proxy with the destination address unmodified. The proxy accepts packets with a non-local IP address as the destination address and then forwards packets with a non-local IP address as the source.

Figure 20: TCP-based application connects via a NEAT proxy, with an SCTP multi-path connection to the server.
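On the proxy side, accepting connections addressed to non-local IP addresses relies on the Linux IP_TRANSPARENT socket option that accompanies the TPROXY target. The sketch below assumes Linux and the iptables/routing setup from the kernel documentation; it opens such a listener and recovers the original destination of an accepted connection with getsockname(). The function names are hypothetical and this is not the NEAT proxy's source; setting IP_TRANSPARENT requires CAP_NET_ADMIN or CAP_NET_RAW.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: opens a TCP listener that may accept connections
 * addressed to non-local IP addresses (Linux TPROXY).
 * Returns the fd, or -1 with errno set. */
int open_transparent_listener(uint16_t port)
{
    int one = 1, saved;
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof one) < 0 ||
        /* Allow receiving for, and sending from, non-local addresses. */
        setsockopt(fd, IPPROTO_IP, IP_TRANSPARENT, &one, sizeof one) < 0 ||
        bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, SOMAXCONN) < 0) {
        saved = errno;
        close(fd);
        errno = saved; /* report the original failure, not close()'s */
        return -1;
    }
    return fd;
}

/* With TPROXY, the local address of an accepted socket is the address the
 * client originally tried to reach, so the proxy knows where to connect. */
int original_destination(int conn_fd, struct sockaddr_in *dst)
{
    socklen_t len = sizeof *dst;
    return getsockname(conn_fd, (struct sockaddr *)dst, &len);
}
```

After accept(), the proxy would pass the recovered destination to NEAT, which then selects the interface and transport for the outgoing leg.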

3.1.1 Traffic identification

In scenarios where the NEAT proxy is not aware of the application type, we developed a tool that analyses the traffic type, so that the proxy can make application-specific interface and protocol selections. A typical use case is a Celerway router with traffic-splitting functionality. The traffic identification tool is also a CIB source as described in Section 2.2.1, but we present it here since it is tightly connected to the proxy implementation.

Deep analysis of TCP flows enables smarter interface selection and can leverage the flexibility provided by the NEAT framework. The basic information about a TCP flow is the tuple made up of the source and destination ports and IP addresses. While port numbers are associated with the type of application, over the years they have become less accurate for traffic classification, due to the use of dynamic ports, port manipulation to pass through proxies, and the fact that most applications use HTTP. Therefore, alternatives such as Deep Packet Inspection (DPI) have been developed, which look into the headers and payload of the packets flowing through the network to identify the applications or services generating the traffic. Some DPI solutions9 try to find a pattern that uniquely identifies the application and match it against a set of identified "application signatures" [42, 46]. This method can be extremely accurate once the signature has been found, which may not be an easy task. In addition, the validity of the signatures has to be monitored constantly. More recently, other approaches have been proposed that feed flow features into more or less complex machine learning algorithms [3, 56, 59, 82]. The challenges in this case are, first, to find the key set of flow features that accurately identifies the applications and, second, to find the most appropriate algorithm, with reasonable accuracy and without a very high computational cost. In addition to these challenges, the increasing use of encryption makes it even more difficult to analyse packet payloads.

Although many previous works extract information from flow traces and classify the type of traffic and application with reasonable accuracy, we cannot leverage such solutions, because we aim to make a decision before the data traffic is actually sent, for optimal interface selection. Even if just a few packets might be enough to identify the type of application [7, 55], we need to select the most adequate transport option before any traffic is exchanged. Taking into account the advantages and disadvantages of each approach, we designed a hybrid method for traffic classification that is composed of a knowledge base, an offline learning stage and a real-time flow classifier, as shown in Figure 21. The flow classifier is built in a sequential way, and it can only combine the information available at connection establishment to determine the type of application generating the traffic.

Figure 21: Block diagram of the flow classification system.

8https://www.kernel.org/doc/Documentation/networking/tproxy.txt
9https://www.ntop.org/products/deep-packet-inspection/ndpi/

Identifying the application or the traffic to be sent helps the router or end host make a decision, namely the choice of the best interface to route the traffic, given the characteristics of the system and the foreseen traffic. The classification of flows will also assist in prioritising traffic with specific needs and in blocking traffic that is detected as a potential risk. Moreover, we will monitor the characteristics of the classified flows and collect statistics that will increase the knowledge of the offline learning phase and improve or update the knowledge base.

The flow features block is built in two steps. Since the flow classification needs to be applied before the connection is established, not all information about the flows is available. The first step relies on the information available when the application requests to open a connection: port numbers, IP addresses and transport protocol.
After the connection is established, even though the routing decision has already been made, the flow is monitored and we collect the statistics needed to feed the offline learning block. This block can access the full set of flow features after the transmission is complete and offer an enhanced classification the next time a flow targeting the same destination address and/or port is detected. Note that, in order to be able to identify encrypted traffic as well, the classifier needs to select and collect flow features that are not "hidden" by encryption, such as the number of packets in each direction of the flow, the average packet rate or the inter-packet arrival times.

When the connection request targets port 80 or 443 (HTTP or HTTPS), a further iteration is needed to discover the specific service. To do so, the flow classifier accesses the knowledge base, which maps IP addresses and hosts to the services they offer. This information is collected from public databases, open data, and projects that monitor and match servers with the kind of service they host, as well as from the experience of our own classification. With more information, we could apply further knowledge about the server being accessed or the type of request, or look for a pattern match between the payload of the first packets of the connection and the signature base. However, that information is not available when the decision needs to be made, so we can only base our decision on previous experience or on services already known to be associated with the given IP addresses and ports. The statistics gathered while monitoring established flows will be used by the offline learning block to improve the accuracy of future classifications. In WP4, we will use the MONROE testbed to evaluate the NEAT proxy and the associated traffic identification method.
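The connection-time decision described above can be sketched as follows: a well-known-port table gives a first guess from information available before any data is sent, and for ports 80 and 443 the destination address is looked up in the knowledge base. The traffic classes, the port table and the in-memory knowledge base (using documentation addresses) are hypothetical placeholders; the real knowledge base is populated from external sources and from the offline learning stage.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum traffic_class { TC_UNKNOWN, TC_WEB, TC_SSH, TC_VIDEO, TC_BACKUP };

struct port_rule { uint8_t proto; uint16_t port; enum traffic_class cls; };
struct kb_entry  { const char *ip; enum traffic_class cls; };

/* Hypothetical port table; proto uses IANA numbers (6 = TCP). */
static const struct port_rule port_rules[] = {
    { 6, 22, TC_SSH }, { 6, 80, TC_WEB }, { 6, 443, TC_WEB },
};

/* Hypothetical knowledge base: destination address -> hosted service. */
static const struct kb_entry kb[] = {
    { "192.0.2.10", TC_VIDEO },
    { "198.51.100.7", TC_BACKUP },
};

/* Classifies a flow using only what is known when the connection is
 * requested: transport protocol, destination port and address. */
enum traffic_class classify_flow(uint8_t proto, uint16_t dport,
                                 const char *dst_ip)
{
    enum traffic_class cls = TC_UNKNOWN;

    /* Port-based first guess. */
    for (size_t i = 0; i < sizeof port_rules / sizeof port_rules[0]; i++)
        if (port_rules[i].proto == proto && port_rules[i].port == dport)
            cls = port_rules[i].cls;

    /* HTTP/HTTPS hides the actual service behind the port, so refine the
     * guess with the knowledge base when the destination is known. */
    if (cls == TC_WEB) {
        for (size_t i = 0; i < sizeof kb / sizeof kb[0]; i++)
            if (strcmp(kb[i].ip, dst_ip) == 0)
                return kb[i].cls;
    }
    return cls;
}
```

Statistics collected after the connection is established would feed the offline stage, which in turn would add or correct knowledge-base entries.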

3.2 SDN middleware

This section outlines a middleware approach for integrating NEAT functionality into a virtualised SDN infrastructure. The motivation for the approach is to enable legacy applications to make use of the enhanced transport mechanisms offered by NEAT in a managed environment orchestrated by an SDN framework. The approach embeds NEAT into a virtualisation framework, such as VMware vSphere/NSX, transparently exposing NEAT's functionality to legacy applications running on the platform. Specifically, we implement a lightweight, NEAT-enabled virtual appliance which is deployed at key locations in the infrastructure and managed by a centralised SDN controller. The appliance acts as a front-end for network services deployed in the data centre, such as database servers, backup repositories, or message bus brokers. Instead of connecting directly to the server, a legacy application establishes a TCP connection to the NEAT appliance, where the connection is terminated and the payload is redirected to the destination server using a transport protocol selected by NEAT. Thus, the appliance functions as a NEAT proxy that uses NEAT's policy system to select the most suitable transport protocol. In addition, the Policy Manager's external REST API can be used to communicate with the infrastructure's controller component. The design is illustrated in Figure 22.

3.2.1 Network Hypervisor Integration

In order to integrate the proxy into the virtualisation layer, we make use of application bundles, called vApps in VMware environments. vApps facilitate deployment and management by packaging multiple virtual machines into a single entity. Thus, we can include one or more NEAT SDN proxies together with a specific virtualised application, e.g., an SQL database server. The vApp framework is then used to configure the location of the proxies in relation to the server application. For example, an administrator may specify that the server should be deployed on a resource pool located in data centre A, while two proxies should be instantiated in data centres B and C, respectively, which are located in different geographical regions. Subsequently, the SDN controller will set up routing such that client applications located in data centres B and C connect to the nearest NEAT SDN proxy. Further, the controller may configure the NEAT SDN proxies to use an efficient transport protocol to connect to the database server over the respective WAN paths. Alternatively, the NEAT SDN proxy functionality may be incorporated directly into the hypervisor layer. This direction may be explored as a future evolution of the approach, as it involves significant modifications to the hypervisor.

Figure 22: SDN middleware architecture. (Diagram: an SDN controller performs the vApp setup and provides flow parameters and policies via the SDN API; a client (C) uses a legacy connection (e.g., TCP) to the NEAT proxy (P), which uses SDN-optimised connection(s) (e.g., MPTCP, SCTP, DCTCP) over the network path to the server (S).)

3.2.2 Next Steps

In WP4, the current implementation of this NEAT proxy will be evaluated in parallel with the EMC industrial use case. The goal is to compare the performance of the middleware solution with that of the modified Rsync version adapted to support NEAT [10, § 2.2]. Additional work is needed to enhance the integration of the proxy's Policy Manager interface with an external controller. Given the closed-source nature of the VMware Network Hypervisor, efforts carried out in NEAT must rely on open implementations such as OpenDaylight. To reduce the time needed to establish a connection, the NEAT proxy pre-provisions connections to the destination server. Further optimisation of this mechanism is needed to ensure that the proxies can handle a large number of incoming connection requests.

3.3 NEAT Sockets API

While the NEAT User API [25] provides a powerful, state-of-the-art API for programming new networking applications, it is also necessary to take care of existing applications. These applications may be ported to NEAT in order to make full use of its rich feature set. However, many existing applications do not use a callback API. Classically, applications on Unix-like systems (Linux, FreeBSD, NetBSD, MacOS X, ...) are based on the BSD Sockets API [73]. This API provides blocking (and optionally non-blocking) function calls. Using functions like select() and poll(), it is possible to wait for events. However, the necessary event loop has to be realised as part of the program itself; there are no callbacks. Over the years, the basic BSD Sockets API, in slightly different variants for each operating system, has also been extended with additional function calls to support, e.g., multi-homing with SCTP [75]. This heterogeneity, as well as protocol dependency, requires careful programming in order to remain as platform-independent as possible. Due to the large user base, i.e., all existing networking applications based on the BSD Sockets API, it is useful for NEAT to offer legacy support for this API. The idea is therefore to provide a BSD-Sockets-like API on top of the callback-based NEAT User API, in order to make porting applications as easy as possible. The goals of this NEAT Sockets API [18] are therefore:

• Remain as compatible as possible with the BSD Sockets API [73] (although there is not one single BSD Sockets API; each operating system's variant is slightly different),

• Provide a small set of additional, NEAT-specific functions to easily make use of additional NEAT features.

While the NEAT Sockets API is not intended to provide 100% of the NEAT functionality, it aims to provide some key features while keeping the porting effort as small as possible. For example, the name resolution and transport-layer security features provided by the NEAT library can be leveraged by non-NEAT applications (more on this below).
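For reference, the kind of hand-written event loop that BSD Sockets applications contain, and that the NEAT Sockets API has to keep working, looks roughly like the sketch below. It uses plain POSIX poll() on ordinary descriptors (sockets, pipes and files alike); a program ported to NEAT would call nsa_poll() with NEAT descriptors instead, whose exact prototype is listed in Appendix H and not reproduced here. `read_from_any()` is a hypothetical helper.

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* One iteration of a hand-written event loop: wait until one of the n
 * descriptors is readable, then read from it. Returns the number of bytes
 * read (0 = end of file) and stores the descriptor's index in *which;
 * returns -1 on timeout or error. There are no callbacks: dispatching on
 * *which is left entirely to the caller. */
ssize_t read_from_any(const int *fds, size_t n, void *buf, size_t len,
                      int timeout_ms, size_t *which)
{
    struct pollfd pfd[16];
    int rc;

    if (n == 0 || n > sizeof pfd / sizeof pfd[0])
        return -1;
    for (size_t i = 0; i < n; i++) {
        pfd[i].fd = fds[i];
        pfd[i].events = POLLIN;
        pfd[i].revents = 0;
    }
    do {
        rc = poll(pfd, (nfds_t)n, timeout_ms);
    } while (rc < 0 && errno == EINTR); /* simplification: timeout restarts */
    if (rc <= 0)
        return -1; /* timeout (0) or poll() failure (-1) */

    for (size_t i = 0; i < n; i++) {
        if (pfd[i].revents & (POLLIN | POLLHUP | POLLERR)) {
            *which = i;
            return read(pfd[i].fd, buf, len);
        }
    }
    return -1;
}
```

Because the loop mixes socket and file descriptors freely, any replacement such as nsa_poll() must accept both kinds, which is the descriptor-mapping issue discussed below.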

3.3.1 Implementation

The NEAT Sockets API is implemented as a shim layer on top of the NEAT User API. The functions provided by the NEAT Sockets API [18] can be categorised into eight groups:

1. Initialisation and Clean-Up (Appendix H, Listing 4).

2. Connection Establishment and Teardown (Appendix H, Listing 5).

3. Options Handling (Appendix H, Listing 6).

4. Input/Output Handling (Appendix H, Listing 7).

5. Poll and Select (Appendix H, Listing 8).

6. Address Handling (Appendix H, Listing 9).

7. Miscellaneous (Appendix H, Listing 10).

8. Security (Appendix H, Listing 11).

The functions provided are as compatible as possible with the BSD Sockets API [73], including the multi-homing SCTP extensions [75]. The prefix "nsa_" (NEAT Sockets API) is added to differentiate the NEAT Sockets API functions from the BSD Sockets API functions. A programmer therefore has to call the NEAT Sockets API functions explicitly to make use of NEAT. Since the BSD Sockets API functions are well documented elsewhere [73, 75], and the NEAT Sockets API itself is documented in detail in [18], the reader is referred to those documents for a full explanation of the different API calls. Appendix H lists all function call prototypes, and the Internet-Draft [18] is provided in Appendix I. From the implementation perspective10, an additional thread handles the callbacks. Mutexes and condition variables coordinate the synchronisation between the program's thread(s) and the NEAT Sockets API main-loop thread. More details on the implementation can be found in [15], which describes a very similar API for rsplib, an implementation of Reliable Server Pooling (RSerPool). Another similar API can be found in sctplib, a callback-based, user-space SCTP implementation [40].
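The thread-and-condition-variable pattern described above can be illustrated with a minimal sketch: a callback running in the event-loop thread appends received data to a per-socket buffer and signals a condition variable, while a blocking read in the application thread waits on that condition. All names here (`shim_socket`, `on_readable_cb`, `shim_recv`, `fake_event_loop`) are hypothetical; the actual NEAT Sockets API code differs and also handles writes, errors and connection state.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

struct shim_socket {
    pthread_mutex_t lock;
    pthread_cond_t  readable;
    char   buf[256];
    size_t used;
    int    closed;
};

void shim_init(struct shim_socket *s)
{
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->readable, NULL);
    s->used = 0;
    s->closed = 0;
}

/* Runs in the event-loop thread when data arrives (len == 0 means the peer
 * closed the flow). Data that does not fit is dropped in this sketch. */
void on_readable_cb(struct shim_socket *s, const char *data, size_t len)
{
    pthread_mutex_lock(&s->lock);
    if (len == 0) {
        s->closed = 1;
    } else {
        size_t room = sizeof s->buf - s->used;
        size_t n = len < room ? len : room;
        memcpy(s->buf + s->used, data, n);
        s->used += n;
    }
    pthread_cond_broadcast(&s->readable); /* wake any blocked reader */
    pthread_mutex_unlock(&s->lock);
}

/* BSD-style blocking recv() for the application thread. Returns the number
 * of bytes copied, or 0 once the peer has closed and the buffer is empty. */
size_t shim_recv(struct shim_socket *s, char *out, size_t len)
{
    size_t n;

    pthread_mutex_lock(&s->lock);
    while (s->used == 0 && !s->closed)
        pthread_cond_wait(&s->readable, &s->lock); /* sleep until a callback */
    n = len < s->used ? len : s->used;
    memcpy(out, s->buf, n);
    memmove(s->buf, s->buf + n, s->used - n);
    s->used -= n;
    pthread_mutex_unlock(&s->lock);
    return n;
}

/* Stand-in for the NEAT event-loop thread, so the sketch runs on its own. */
void *fake_event_loop(void *arg)
{
    on_readable_cb(arg, "hello", 5);
    on_readable_cb(arg, NULL, 0);
    return NULL;
}
```

The while loop around pthread_cond_wait() matters: condition variables can wake spuriously, so the reader must re-check the buffer state each time it wakes.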

10The code is available at: https://github.com/NEAT-project/neat/tree/master/socketapi.


While the NEAT User API [25], and therefore also the NEAT Sockets API [18], is an API for network sockets, it is important to mention that the BSD Sockets API is tightly integrated into the Unix way of handling files. The idea of Unix is that "everything is a file", so a socket descriptor behaves like a file descriptor. Many Unix API calls therefore work with both socket descriptors (e.g., send over a socket, receive from a socket) and file descriptors (e.g., write to a file, read from a file, read from a pipe). In particular, the functions to wait for events (select(), poll(), etc.) accept socket descriptors and file descriptors simultaneously. The corresponding NEAT Sockets API functions (nsa_select(), nsa_poll(), etc.) therefore need to work with file descriptors as well. Thus, the NEAT Sockets API also has to contain wrapper code around file and pipe functions, in order to handle the mapping between the operating system's descriptor space and the NEAT descriptor space.

Unlike the BSD Sockets API, NEAT also takes care of the DNS resolution of host names. In programs based on the BSD Sockets API, this has to be realised by calls to functions like getaddrinfo(). The program then has to create a socket of the right family (IPv4 or IPv6), handle addressing according to this family, and call connect() with an address. Furthermore, if the DNS provides multiple addresses, the program may have to try different addresses until a connection can eventually be established. NEAT simplifies this procedure significantly: the NEAT Sockets API function nsa_connectn() allows a program to establish a connection to a peer by just specifying its DNS name, and NEAT then takes care of address handling and connection establishment. A program ported to NEAT may therefore be significantly simplified by using nsa_connectn(). On the server side, NEAT also takes care of handling the addresses of server sockets (IPv4, IPv6, transport protocols). Therefore, binding a NEAT socket only requires specifying the port number; the NEAT Sockets API function nsa_bindn() provides this functionality. A program ported to NEAT therefore does not need to take care of the IP protocol version or the transport protocols.

Another highly useful feature provided by NEAT is the handling of TLS security. Since TLS support is still work in progress in the callback API, security handling in the NEAT Sockets API is also still work in progress. The idea is to attach a certificate to a socket (client or server) and to allow configuring a list of trusted root Certificate Authorities (CAs). A program ported to the NEAT Sockets API can therefore be made TLS-capable by just adding a few function calls.
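For comparison, the boilerplate that nsa_connectn() removes is the classic getaddrinfo()/connect() loop below, which a BSD Sockets client has to implement itself. It uses only standard POSIX calls; `connect_by_name()` is a hypothetical helper name. Note that it tries candidate addresses strictly one after another, whereas NEAT can instead rely on its Happy Eyeballs mechanism.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Resolves host and connects to the first address that accepts a TCP
 * connection, trying IPv6 and IPv4 results in resolver order.
 * Returns a connected socket, or -1 if every address failed. */
int connect_by_name(const char *host, unsigned short port)
{
    struct addrinfo hints, *res, *ai;
    char service[6]; /* "65535" plus terminator */
    int fd = -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;     /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM; /* TCP */
    snprintf(service, sizeof service, "%u", (unsigned)port);

    if (getaddrinfo(host, service, &hints, &res) != 0)
        return -1;
    for (ai = res; ai != NULL; ai = ai->ai_next) {
        /* The socket family must match each candidate address. */
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break; /* connected */
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}
```

With the NEAT Sockets API, this whole function collapses into a single nsa_connectn() call on a NEAT socket.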

3.3.2 Usage examples

A few programs based on BSD Sockets are being ported to the NEAT Sockets API as examples and for experimental purposes.

3.3.2.1 NetPerfMeter

NetPerfMeter [16, 17, 20] is an open-source, multi-platform transport protocol performance evaluation tool. It currently supports the Linux, FreeBSD and MacOS platforms (and can easily be extended to further platforms), and the transport protocols SCTP, TCP (including MPTCP, if supported by the operating system), UDP, and DCCP (Datagram Congestion Control Protocol, if supported by the operating system). Since NetPerfMeter makes use of these protocol-specific APIs, a port to NEAT using the NEAT Sockets API provides a good testing environment for the implementation. An experimental NEAT branch of NetPerfMeter is already available11.

11https://github.com/dreibh/netperfmeter/tree/with-neat.


3.3.2.2 RTP Audio

RTP Audio [14, 19] is an open-source, multi-platform audio streaming software. It was initially written for UDP-based, layered transport of audio streams over different DiffServ classes, defined by DiffServ Code Points (DSCP) [4]. RTP Audio also supports the use of SCTP instead of UDP. An experimental NEAT branch of RTP Audio is already available12. However, since the NEAT Sockets API does not currently implement 1-to-many-style communication, this branch is not yet fully operational at the time of writing.
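At the socket level, sending a stream in a particular DiffServ class amounts to writing the DSCP into the upper six bits of the IPv4 TOS byte (the lower two bits carry ECN). A minimal sketch using the standard IP_TOS socket option follows; `set_dscp()` is a hypothetical helper, not RTP Audio code, and IPv6 sockets would use IPV6_TCLASS instead.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: marks outgoing IPv4 packets of fd with the given
 * DSCP (0-63), e.g., 46 for Expedited Forwarding. The two ECN bits are
 * left at zero. Returns 0 on success, -1 on error. */
int set_dscp(int fd, int dscp)
{
    int tos;

    if (dscp < 0 || dscp > 63)
        return -1;
    tos = dscp << 2; /* DSCP occupies the six most significant bits */
    return setsockopt(fd, IPPROTO_IP, IP_TOS, &tos, sizeof tos);
}
```

A layered stream would open one socket per layer and mark each with a different DSCP, so that the network can treat the layers differently.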

3.3.2.3 SRS

In countries such as China, IPv6 deployment is still rather new, despite the lack of available IPv4 addresses. Therefore, there is a large deployment of middleboxes performing NAPT with IPv4, which lack support for modern transport protocols. Such a scenario, i.e., networks that may or may not support modern protocols and features, is a highly interesting use case for NEAT. SRS 3.0 [53] is an open-source, real-time RTMP video streaming server with a significant pool of users and contributors from China. IPv6 support has recently been added to this server by a NEAT project participant13, and NEAT support is being added based on the NEAT Sockets API. An experimental branch, not yet fully functional at the time of writing, is already available14.

3.3.3 with_neat

The name with_neat alludes to the with_sctp software, which is provided as part of the Linux SCTP [54] implementation. The purpose of with_sctp is to run a TCP or UDP program over SCTP instead, without making any changes to the program itself. The principle behind with_sctp is quite simple: it provides a small shared library implementing the BSD Sockets API function socket(). A program calls this function to create a socket, and the protocol parameter determines whether the socket is for TCP (IPPROTO_TCP) or UDP (IPPROTO_UDP). Since Linux SCTP provides a BSD Sockets API as well, the socket() function provided by with_sctp simply changes this protocol parameter to IPPROTO_SCTP and then calls the system's (libc) original socket() function to create an SCTP socket. When the with_sctp shared library is loaded with LD_PRELOAD, the program then simply uses SCTP wherever it expects a TCP or UDP socket. This works unless special features of the TCP or UDP implementation are used (e.g., certain socket options); a simple wrapper of setsockopt() for the frequently used TCP_NODELAY option is therefore provided as well. The goal of with_neat is to transfer the idea of with_sctp to NEAT: a wrapper maps the BSD Sockets API calls to NEAT Sockets API calls, making a "legacy" networking program NEAT-enabled. In particular, without any change to the program (and possibly even without access to its source code), a program could already utilise many benefits of NEAT:

• Multi-homing (e.g., with SCTP or MPTCP).

• Multi-path transport (e.g., with a CMT-capable SCTP implementation or MPTCP).

• QoS-aware flow configuration (automatically, or e.g. with the help of environment variables).
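The with_sctp principle can be sketched as an LD_PRELOAD shim. This is a simplified illustration, not the lksctp-tools source: it assumes Linux with glibc, maps only TCP stream sockets to SCTP (UDP would need the one-to-many style discussed below), and omits the TCP_NODELAY wrapper. `withsctp_map_protocol()` is a hypothetical helper name.

```c
#define _GNU_SOURCE /* for RTLD_NEXT */
#include <dlfcn.h>
#include <errno.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/* Chooses the protocol to request: IPv4/IPv6 stream sockets asking for
 * TCP (explicitly or via protocol 0) are switched to SCTP. */
int withsctp_map_protocol(int domain, int type, int protocol)
{
    int base_type = type & ~(SOCK_NONBLOCK | SOCK_CLOEXEC);

    if ((domain == AF_INET || domain == AF_INET6) &&
        base_type == SOCK_STREAM &&
        (protocol == 0 || protocol == IPPROTO_TCP))
        return IPPROTO_SCTP;
    return protocol;
}

/* Interposed socket(): when built as a shared object and loaded with
 * LD_PRELOAD, it shadows the C library's socket(), rewrites the protocol
 * and forwards the call to the next definition in the lookup order. */
int socket(int domain, int type, int protocol)
{
    static int (*real_socket)(int, int, int);

    if (real_socket == NULL)
        real_socket = (int (*)(int, int, int))dlsym(RTLD_NEXT, "socket");
    if (real_socket == NULL) {
        errno = ENOSYS;
        return -1;
    }
    return real_socket(domain, type,
                       withsctp_map_protocol(domain, type, protocol));
}
```

Built, for instance, with `gcc -shared -fPIC -o libwithsctp.so withsctp.c -ldl` and started as `LD_PRELOAD=./libwithsctp.so ./tcp_program`, an unmodified program would request SCTP sockets. As the list that follows explains, with_neat cannot stop at socket(), because every subsequent call must also be redirected to NEAT.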

12https://github.com/dreibh/rtpaudio/tree/with-neat.
13https://github.com/ossrs/srs/pull/988.
14https://github.com/dreibh/srs/tree/ossrs3.0-neat.


However, unlike for with_sctp, it is not sufficient to just wrap socket() and a small set of other functions to realise with_neat:

• All socket functions must be mapped to NEAT functions. That is, the complete set of functions provided by the BSD Sockets API needs mapping, since calling any unmapped function will cause the call to fail.

• The NEAT Sockets API uses its own socket descriptor space. In Unix, “everything is a file”, i.e., a socket descriptor is similar to a file descriptor. Consequently, many functions like poll(), select(), read() and write() work on both sockets and files. It is therefore necessary to wrap all file I/O functions that use file descriptors as well, including those operating on pipes and on the standard input, output and error descriptors.

• Modern Linux programs may use the epoll API, so with_neat would need to support epoll as well.

• For handling UDP, the NEAT Sockets API also has to implement a 1-to-many-style API. That is, UDP sockets, like SCTP one-to-many sockets, can be used with functions like sendto() to “just send data to a given destination”. While UDP can simply send such packets (by setting a destination IP address and port number), SCTP internally establishes and maintains associations. In particular, this means that SCTP also has to shut down unused associations automatically.
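The descriptor-space issue in the second item can be illustrated with a small C sketch. It assumes, purely for illustration, that the shim hands out NEAT descriptors from a reserved high range (the constant `NEAT_FD_BASE` is hypothetical) and it uses a stub in place of a real NEAT Sockets API call. The wrapper is named `shim_read()` rather than read() so that it can be compiled and exercised without LD_PRELOAD interposition. The point it shows is that every descriptor-based call needs the same ownership dispatch, otherwise files, pipes and the standard descriptors would break.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical: the shim hands out NEAT descriptors from this value upwards,
 * well above the descriptor numbers a process normally gets from the kernel. */
#define NEAT_FD_BASE 0x40000000

static int is_neat_fd(int fd)
{
    return fd >= NEAT_FD_BASE;
}

/* Stub standing in for the NEAT-side read (hypothetical); a real shim would
 * call into the NEAT Sockets API here. */
static ssize_t neat_backend_read(int neat_fd, void *buf, size_t len)
{
    static const char msg[] = "from-neat";
    size_t n = len < sizeof msg - 1 ? len : sizeof msg - 1;

    (void)neat_fd;
    memcpy(buf, msg, n);
    return (ssize_t)n;
}

/* Dispatch on descriptor ownership: NEAT descriptors go to NEAT, everything
 * else (files, pipes, stdin/stdout/stderr, kernel sockets) to the kernel. */
ssize_t shim_read(int fd, void *buf, size_t len)
{
    if (is_neat_fd(fd))
        return neat_backend_read(fd - NEAT_FD_BASE, buf, len);
    return read(fd, buf, len);
}
```

A reserved range is only one option; a shim could instead keep an explicit descriptor table. In either case, poll() and select() calls that mix kernel and NEAT descriptors would have to be split between the two back ends and their results merged.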

An experimental NEAT branch for with_neat is already available at the time of writing15.
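The last item in the list above, automatically shutting down unused SCTP associations behind a 1-to-many API, can be sketched as an association table with an idle timeout. For kernel SCTP, RFC 6458 defines the SCTP_AUTOCLOSE socket option for one-to-many sockets; the C sketch below only shows the bookkeeping a user-space shim might do. The function names, the fixed table size and the millisecond clock passed in by the caller are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define MAX_ASSOCS 8

/* One entry per peer that the application has sent data to. */
struct assoc_entry {
    int      in_use;
    char     peer[64];          /* e.g. "192.0.2.1:5000" */
    uint64_t last_used_ms;      /* caller-supplied clock */
};

static struct assoc_entry assoc_table[MAX_ASSOCS];

/* sendto()-style lookup: reuse the association to this peer, or set up a new
 * one. Returns the table index, or -1 if the table is full. */
int assoc_for_peer(const char *peer, uint64_t now_ms)
{
    int free_slot = -1;

    for (int i = 0; i < MAX_ASSOCS; i++) {
        if (assoc_table[i].in_use && strcmp(assoc_table[i].peer, peer) == 0) {
            assoc_table[i].last_used_ms = now_ms;
            return i;
        }
        if (!assoc_table[i].in_use && free_slot < 0)
            free_slot = i;
    }
    if (free_slot < 0)
        return -1;
    /* A real shim would establish the SCTP association here. */
    assoc_table[free_slot].in_use = 1;
    strncpy(assoc_table[free_slot].peer, peer, sizeof assoc_table[free_slot].peer - 1);
    assoc_table[free_slot].peer[sizeof assoc_table[free_slot].peer - 1] = '\0';
    assoc_table[free_slot].last_used_ms = now_ms;
    return free_slot;
}

/* Close associations idle for longer than idle_ms; returns how many. */
int reap_idle_assocs(uint64_t now_ms, uint64_t idle_ms)
{
    int closed = 0;

    for (int i = 0; i < MAX_ASSOCS; i++) {
        if (assoc_table[i].in_use && now_ms - assoc_table[i].last_used_ms > idle_ms) {
            /* A real shim would start an SCTP SHUTDOWN here. */
            assoc_table[i].in_use = 0;
            closed++;
        }
    }
    return closed;
}
```

Choosing the idle timeout is the delicate part: if it is too short, a peer that sends periodically pays for repeated association setup; if it is too long, idle associations accumulate.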

4 Conclusions

This deliverable summarises our work in WP3 on extended functionalities for NEAT’s transport system and on transparent support for non-NEAT applications. Extended transport functionalities have been the subject of four different activities in Task 3.2, one of which was developed after deliverable D3.1 was produced. First, we have compared multi-streaming with SCTP against the use of multiple TCP connections for the web. The key findings of these studies will serve as input to further web experiments to be done in Task 4.3 of WP4, “Demonstration and Experiments”. Second, as part of our research on an extended policy and transport-selection system, we have designed a passive measurement mechanism to assess the available access link capacity and its level of congestion. This mechanism is provided as a CIB source that enriches the information available to the Policy Manager, enabling it to take better decisions in some relevant scenarios. We have also showcased how NEAT and its policy mechanism could significantly reduce latency in a multi-access mobile scenario; this will also be pursued further in our experimental evaluation of NEAT as part of Task 4.3. Third, our work on integrating NEAT with SDN has resulted in a framework for application awareness in SDN environments. This framework would allow specific applications to be better supported in such managed networks. Fourth, our more recent activity on PvD has resulted in NEAT being able to leverage PvD information in its choice of transport services, providing the policy system with another CIB source on which to base policy decisions. All these extended functionalities support the different industry use cases identified and refined in WP1, and complement the essential functions and features implemented in the core NEAT System prototype developed in WP2. As part of our commitment to make key outcomes from WP3 public, these activities are reported in research papers published and/or demonstrated in relevant scientific venues; one of these earned a Best Demo Award at the IFIP Networking conference in 2017.

Our work on transparent support of non-NEAT applications in Task 3.3 has resulted in a NEAT proxy solution that makes it possible for legacy TCP applications to use NEAT without any migration effort. In addition, a middleware approach has been explored to integrate NEAT in a virtualised SDN environment supporting legacy (i.e., non-NEAT) applications. These solutions relate to two of the industry use cases (Celerway’s and EMC’s, respectively). Finally, a BSD-compatible sockets API for NEAT has been implemented as a shim layer on top of the callback-based NEAT User API. This shim enables legacy network applications to leverage some of NEAT’s features without requiring a full porting effort.

15 https://github.com/NEAT-project/neat/tree/dreibh/withneat.


References

[1] O. Alay, A. Lutu, R. García, M. Peón-Quiròs, V. Mancuso, T. Hirsch, T. Dely, J. Werme, K. Evensen, A. Hansen, S. Alfredsson, J. Karlsson, A. Brunstrom, A. S. Khatouni, M. Mellia, M. A. Marsan, R. Monno, and H. Lonsethagen, “Measuring and assessing mobile broadband networks with MONROE,” in 2016 IEEE 17th International Symposium on A World of Wireless, Mobile and Multimedia Networks (WoWMoM), June 2016, pp. 1–3.

[2] D. Anipko, “Multiple Provisioning Domain Architecture,” RFC 7556 (Informational), Internet Engineering Task Force, Jun. 2015. [Online]. Available: http://www.ietf.org/rfc/rfc7556.txt

[3] T. Auld, A. W. Moore, and S. F. Gull, “Bayesian neural networks for internet traffic classification,” IEEE Transactions on neural networks, vol. 18, no. 1, pp. 223–239, 2007.

[4] J. Babiarz, K. Chan, and F. Baker, “Configuration Guidelines for DiffServ Service Classes,” RFC 4594 (Informational), Internet Engineering Task Force, Aug. 2006, updated by RFC 5865. [Online]. Available: http://www.ietf.org/rfc/rfc4594.txt

[5] N. Baranasuriya, V. Navda, V. N. Padmanabhan, and S. Gilbert, “Qprobe: locating the bottleneck in cellular communication,” in Proceedings of the 11th ACM Conference on Emerging Networking Experiments and Technologies. ACM, 2015, p. 33.

[6] M. Belshe, R. Peon, and M. Thomson, “Hypertext Transfer Protocol Version 2 (HTTP/2),” RFC 7540 (Proposed Standard), Internet Engineering Task Force, May 2015. [Online]. Available: http://www.ietf.org/rfc/rfc7540.txt

[7] L. Bernaille, R. Teixeira, and K. Salamatian, “Early application identification,” in Proceedings of the 2006 ACM CoNEXT conference. ACM, 2006, p. 6.

[8] M. Bishop, “Hypertext Transfer Protocol (HTTP) over QUIC,” Internet Draft draft-ietf-quic-http, 09 2017, Work in Progress. [Online]. Available: https://tools.ietf.org/html/draft-ietf-quic-http-07

[9] Z. Bozakov, S. Mangiante, C. H. Benet, A. Brunstrom, R. Santos, A. Kassler, and D. Buckley, “A NEAT framework for enhanced end-host integration in SDN environments,” in Proc. of IEEE NFV-SDN 2017, Nov. 2017.

[10] Z. Bozakov, S. Mangiante, A. Brunstrom, D. Damjanovic, G. Fairhurst, A. Hansen, T. Jones, N. Khademi, A. Petlund, D. Ros, D. Stenberg, M. Tüxen, and F. Weinrank, “NEAT-based applications and first version of NEAT-based tools,” The NEAT Project (H2020-ICT-05-2014), Deliverable D4.1, Mar. 2017.

[11] M. Carbone and L. Rizzo, “Dummynet revisited,” SIGCOMM Comput. Commun. Rev., vol. 40, no. 2, pp. 12–20, 04 2010.

[12] S. Deng, R. Netravali, A. Sivaraman, and H. Balakrishnan, “WiFi, LTE, or Both?: Measuring multi-homed wireless internet performance,” in Proceedings of the 2014 Internet Measurement Conference, ser. IMC ’14, 2014, pp. 181–194.

[13] C. Dovrolis and R. Prasad, “Pathrate: A measurement tool for the capacity of network paths,” URL: www.cc.gatech.edu/fac/Constantinos.Dovrolis, 2004.


[14] T. Dreibholz, “Management of Layered Variable Bitrate Multimedia Streams over DiffServ with Apriori Knowledge,” University of Bonn, Institute for Computer Science, Masters Thesis, 02 2001. [Online]. Available: https://duepublico.uni-duisburg-essen.de/servlets/DerivateServlet/Derivate-29936/Dre2001.pdf

[15] ——, “Reliable Server Pooling – Evaluation, Optimization and Extension of a Novel IETF Architecture,” PhD thesis, University of Duisburg-Essen, Faculty of Economics, Institute for Computer Science and Business Information Systems, 03 2007. [Online]. Available: https://duepublico.uni-duisburg-essen.de/servlets/DerivateServlet/Derivate-16326/Dre2006_final.pdf

[16] ——, “Evaluation and Optimisation of Multi-Path Transport using the Stream Control Transmission Protocol,” Habilitation Treatise, University of Duisburg-Essen, Faculty of Economics, Institute for Computer Science and Business Information Systems, 03 2012. [Online]. Available: https://duepublico.uni-duisburg-essen.de/servlets/DerivateServlet/Derivate-29737/Dre2012_final.pdf

[17] ——, “NetPerfMeter: A Network Performance Metering Tool,” Multipath TCP Blog, 09 2015. [Online]. Available: http://blog.multipath-tcp.org/blog/html/2015/09/07/netperfmeter.html

[18] ——, “NEAT Sockets API,” IETF, Individual Submission, Internet Draft draft-dreibholz-taps-neat-socketapi-02, 10 2017. [Online]. Available: https://tools.ietf.org/id/draft-dreibholz-taps-neat-socketapi-02.txt

[19] T. Dreibholz, J. Selzer, and S. Vey, “Echtzeit-Audioübertragung mit QoS-Management in einem DiffServ-Szenario,” Universität Bonn, Institut für Informatik, Projektseminararbeit, 08 2000. [Online]. Available: https://www.uni-due.de/~be0001/rn/DSV00.pdf

[20] T. Dreibholz, M. Becke, H. Adhari, and E. P. Rathgeb, “Evaluation of A New Multipath Congestion Control Scheme using the NetPerfMeter Tool-Chain,” in Proceedings of the 19th IEEE International Conference on Software, Telecommunications and Computer Networks (SoftCOM), 09 2011, pp. 1–6, ISBN 978-953-290-027-9. [Online]. Available: https://www.wiwi.uni-due.de/fileadmin/fileupload/I-TDR/SCTP/Paper/SoftCOM2011.pdf

[21] N. Dukkipati, T. Refice, Y. Cheng, J. Chu, T. Herbert, A. Agarwal, A. Jain, and N. Sutin, “An argument for increasing TCP’s initial congestion window,” SIGCOMM Comput. Commun. Rev., vol. 40, no. 3, pp. 26–33, 06 2010. [Online]. Available: http://doi.acm.org/10.1145/1823844.1823848

[22] Y. Elkhatib, G. Tyson, and M. Welzl, “Can SPDY really make the web faster?” in 2014 IFIP Networking Conference, 06 2014, pp. 1–9.

[23] T. En-Najjary and G. Urvoy-Keller, “Pprate: A passive capacity estimation tool,” in End-to-End Monitoring Techniques and Services, 2006 4th IEEE/IFIP Workshop on. IEEE, 2006, pp. 82–89.

[24] G. Fairhurst, T. Jones, Z. Bozakov, A. Brunstrom, D. Damjanovic, T. Eckert, K. Riktor Evensen, K.-J. Grinnemo, A. Fosselie Hansen, N. Khademi, S. Mangiante, P. McManus, G. Papastergiou, D. Ros, M. Tuexen, E. Vyncke, and M. Welzl, “A New, Evolutive API and Transport-Layer Architecture for the Internet,” NEAT Project (H2020-ICT-05-2014), Deliverable D1.1, 9 2016.

[25] G. Fairhurst, T. Jones, A. Brunstrom, and D. Ros, “The NEAT Interface to Transport Services,” IETF, Individual Submission, Internet Draft draft-fairhurst-taps-neat-00, Oct. 2017. [Online]. Available: https://tools.ietf.org/id/draft-fairhurst-taps-neat-00.txt


[26] S. Ferlin, O. Alay, O. Mehani, and R. Boreli, “BLEST: Blocking Estimation-based MPTCP Scheduler for Heterogeneous Networks,” in IFIP Networking Conference (NETWORKING), May 2016.

[27] Google. SPDY: An Experimental Protocol For a Faster Web. [Online]. Available: http://www.chromium.org/spdy/spdy-whitepaper

[28] I. Grigorik, “Making the web faster with HTTP 2.0,” Commun. ACM, vol. 56, no. 12, pp. 42–49, 12 2013.

[29] K.-J. Grinnemo, Z. Bozakov, A. Brunstrom, M. I. Bueno, D. Damjanovic, K. Evensen, G. Fairhurst, A. Hansen, D. Hayes, P. Hurtig, N. Khademi, S. Mangiante, M. Althaff, M. Rajiullah, D. Ros, I. Rüngeler, R. Santos, R. Secchi, T. C. Tangenes, M. Tüxen, F. Weinrank, and M. Welzl, “Initial Report on the Extended Transport System,” NEAT Project (H2020-ICT-05-2014), Deliverable D3.1, 18 2017.

[30] K.-J. Grinnemo, A. Brunstrom, G. Fairhurst, D. Hayes, P. Hurtig, N. Khademi, D. Ros, I. T. M. Rüngeler, F. Weinrank, and M. Welzl, “Final Report on Transport Protocol Enhancements,” NEAT Project (H2020-ICT-05-2014), Deliverable D3.2, 24 2017.

[31] T. Hoeiland-Joergensen, P. McKenney, D. Taht, J. Gettys, and E. Dumazet, “The flowqueue-codel packet scheduler and active queue management algorithm,” Internet Draft draft-ietf-aqm-fq-codel-06, March 2016.

[32] P. Hurtig, S. Alfredsson, A. Brunstrom, K. Evensen, K.-J. Grinnemo, A. F. Hansen, and T. Rozensztrauch, “A neat approach to mobile communication,” in Proceedings of the Workshop on Mobility in the Evolving Internet Architecture, ser. MobiArch ’17. New York, NY, USA: ACM, 2017, pp. 7–12.

[33] T. Høiland-Jørgensen, P. Hurtig, and A. Brunstrom, “The good, the bad and the wifi,” Comput. Netw., vol. 89, no. C, pp. 90–106, 10 2015. [Online]. Available: https://doi.org/10.1016/j.comnet.2015.07.014

[34] P. N. II, “The Internet and the Millennium Problem (Year 2000),” RFC 2626 (Informational), Internet Engineering Task Force, Jun. 1999. [Online]. Available: http://www.ietf.org/rfc/rfc2626.txt

[35] J. Iyengar and I. Swett, “QUIC Loss Detection and Congestion Control,” Internet Draft draft-ietf-quic-recovery, 09 2017, Work in Progress. [Online]. Available: https://tools.ietf.org/html/draft-ietf-quic-recovery-06

[36] J. Iyengar and M. Thomson, “QUIC: A UDP-Based Multiplexed and Secure Transport,” Internet Draft draft-ietf-quic-transport, 10 2017, Work in Progress. [Online]. Available: https://tools.ietf.org/html/draft-ietf-quic-transport-07

[37] M. Jain and C. Dovrolis, “Pathload: A measurement tool for end-to-end available bandwidth,” in Proceedings of Passive and Active Measurements (PAM) Workshop. Citeseer, 2002.

[38] T. Jones, G. Fairhurst, and C. Perkins, Raising the Datagram API to Support Transport Protocol Evolution. IFIP, 6 2017.

[39] T. Jones, G. Fairhurst, and E. Vyncke, A Datagram API for Evolving Networks Beyond 5G. EUCNC, 6 2017.


[40] A. Jungmaier, “Das Transportprotokoll SCTP,” PhD thesis, Universität Duisburg-Essen, Institut für Experimentelle Mathematik, 08 2005. [Online]. Available: https://duepublico.uni-duisburg-essen.de/servlets/DerivateServlet/Derivate-13244/dissertation_jungmaier.pdf

[41] R. Kapoor, L.-J. Chen, L. Lao, M. Gerla, and M. Y. Sanadidi, “Capprobe: A simple and accurate capacity estimation technique,” ACM SIGCOMM Computer Communication Review, vol. 34, no. 4, pp. 67–78, 2004.

[42] T. Karagiannis, K. Papagiannaki, and M. Faloutsos, “Blinc: multilevel traffic classification in the dark,” in ACM SIGCOMM Computer Communication Review, vol. 35, no. 4. ACM, 2005, pp. 229–240.

[43] S. Katti, D. Katabi, C. Blake, E. Kohler, and J. Strauss, “Multiq: Automated detection of multiple bottleneck capacities along a path,” in Proceedings of the 4th ACM SIGCOMM conference on Internet measurement. ACM, 2004, pp. 245–250.

[44] N. Khademi, D. Ros, and M. Welzl, “The new aqm kids on the block: An experimental evaluation of codel and pie,” in 2014 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), April 2014, pp. 85–90.

[45] N. Khademi, Z. Bozakov, A. Brunstrom, O. Dale, D. Damjanovic, K. Evensen, G. Fairhurst, A. Fischer, K.-J. Grinnemo, T. Jones, S. Mangiante, A. Petlund, D. Ros, I. Rüngeler, D. Stenberg, M. Tüxen, F. Weinrank, and M. Welzl, “Final Version of Core Transport System,” NEAT Project (H2020-ICT-05-2014), Deliverable D2.3, 2017.

[46] H. Kim, K. C. Claffy, M. Fomenkov, D. Barman, M. Faloutsos, and K. Lee, “Internet traffic classification demystified: myths, caveats, and the best practices,” in Proceedings of the 2008 ACM CoNEXT conference. ACM, 2008, p. 11.

[47] G. Klyne and C. Newman, “Date and Time on the Internet: Timestamps,” RFC 3339 (Proposed Standard), Internet Engineering Task Force, Jul. 2002. [Online]. Available: http://www.ietf.org/rfc/rfc3339.txt

[48] K. Lai and M. Baker, “Measuring bandwidth,” in INFOCOM’99. Eighteenth Annual Joint Conference of the IEEE Computer and Communications Societies. Proceedings. IEEE, vol. 1. IEEE, 1999, pp. 235–245.

[49] ——, “Nettimer: A tool for measuring bottleneck link bandwidth.” in USITS, vol. 1, 2001, pp. 11–11.

[50] K.-C. Lan and J. Heidemann, “A Measurement Study of Correlations of Internet Flow Characteristics,” Computer Networks, vol. 50, no. 1, pp. 46–62, Jan. 2006.

[51] A. Langley and W. Chang. QUIC Crypto. [Online]. Available: https://docs.google.com/document/d/1g5nIXAIkNY-7XJW5K45IblHdL2f5LTaDUDwvZ5L6g/.

[52] M. Li, M. Claypool, and R. Kinicki, “Wbest: A bandwidth estimation tool for ieee 802.11 wireless networks,” in Local Computer Networks, 2008. LCN 2008. 33rd IEEE Conference on. IEEE, 2008, pp. 374–381.

[53] W. Lin, “SRS 3.0 Wiki,” 10 2017. [Online]. Available: https://github.com/ossrs/srs/wiki/v3_EN_Home


[54] LKSCTP, “Linux Kernel SCTP,” 10 2017. [Online]. Available: http://lksctp.sourceforge.net/

[55] A. W. Moore and K. Papagiannaki, “Toward the accurate identification of network applications.” in PAM, vol. 5. Springer, 2005, pp. 41–54.

[56] A. W. Moore and D. Zuev, “Internet traffic classification using bayesian analysis techniques,” in ACM SIGMETRICS Performance Evaluation Review, vol. 33, no. 1. ACM, 2005, pp. 50–60.

[57] P. Natarajan, P. D. Amer, and R. Stewart, “Multistreamed web transport for developing regions,” in ACM SIGCOMM Workshop on Networked Systems for Developing Regions (NSDR), 08 2008.

[58] P. Natarajan et al., “SCTP: An innovative transport layer protocol for the web,” in Proceedings of the 15th international conference on World Wide Web. ACM, 2006, pp. 615–624.

[59] T. T. Nguyen and G. Armitage, “Training on multiple sub-flows to optimise the use of machine learning classifiers in real-world ip networks,” in Local Computer Networks, Proceedings 2006 31st IEEE Conference on. IEEE, 2006, pp. 369–376.

[60] K. Nichols and V. Jacobson, “Controlling queue delay,” ACM Queue, vol. 10, no. 5, 05 2012. [Online]. Available: http://doi.acm.org/10.1145/2208917.2209336

[61] K. Papagiannaki, N. Taft, S. Bhattacharyya, P. Thiran, K. Salamatian, and C. Diot, “A Pragmatic Definition of Elephants in Internet Backbone Traffic,” in Proceedings of the 2nd ACM SIGCOMM Internet Measurement Workshop, 2002, pp. 175–176.

[62] G. Papastergiou, K.-J. Grinnemo, A. Brunstrom, D. Ros, M. Tüxen, N. Khademi, and P. Hurtig, “On the cost of using happy eyeballs for transport protocol selection,” in Proceedings of the 2016 Applied Networking Research Workshop, ser. ANRW ’16. New York, NY, USA: ACM, 2016, pp. 45–51. [Online]. Available: http://doi.acm.org/10.1145/2959424.2959437

[63] V. E. Paxson, “Measurements and analysis of end-to-end internet dynamics,” PhD thesis, University of California, Berkeley, 1997.

[64] QUIC Working Group. [Online]. Available: https://datatracker.ietf.org/wg/quic/charter/

[65] M. Rajiullah, A. Cader Mohideen, F. Weinrank, R. Secchi, and G. Fairhurst, Understanding Multistreaming for Web Traffic: An Experimental Study: IFIP FIT Workshop. IFIP, 6 2017.

[66] E. Rescorla, “The Transport Layer Security (TLS) Protocol Version 1.3,” Internet Draft draft-ietf-tls-tls13, 07 2017, Work in Progress. [Online]. Available: https://tools.ietf.org/html/draft-ietf-tls-tls13-21

[67] J. Roskind, “QUIC: Multiplexed stream transport over UDP,” Google working design document, 2013. [Online]. Available: https://docs.google.com/document/d/1jdKEQMlM7ThDMDalFYFR_9-Yw91PhoBmkAPQcCicX3s/pub

[68] R. Santos, Z. Bozakov, S. Mangiante, A. Brunstrom, and A. J. Kassler, “A NEAT framework for application-awareness in SDN environments,” in Proc. of IFIP Networking Conference, Jun. 2017, demo, Best Demo Award.

[69] S. Saroiu, P. K. Gummadi, and S. D. Gribble, “Sprobe: A fast technique for measuring bottleneck bandwidth in uncooperative environments,” in IEEE INFOCOM, 2002, p. 1.


[70] S. Savage, “Sting: A tcp-based network measurement tool.” in USENIX Symposium on Internet Technologies and Systems, vol. 2, 1999, pp. 7–7.

[71] R. Secchi, A. C. Mohideen, and G. Fairhurst, “Performance analysis of next generation web access via satellite,” International Journal of Satellite Communications and Networking, pp. n/a–n/a, 2016. [Online]. Available: http://dx.doi.org/10.1002/sat.1201

[72] R. Secchi, A. Mohideen, and G. Fairhurst, “Performance analysis of next generation access via satellite,” Int. J. Satell. Comm. N. (IJSCN), vol. 34, no. 6, 12 2016.

[73] W. R. Stevens, B. Fenner, and A. M. Rudoff, Unix Network Programming. Addison-Wesley Professional, 2003, ISBN 0-131-41155-1.

[74] R. Stewart, “Stream Control Transmission Protocol,” RFC 4960 (Proposed Standard), Internet Engineering Task Force, Sep. 2007, updated by RFCs 6096, 6335, 7053. [Online]. Available: http://www.ietf.org/rfc/rfc4960.txt

[75] R. Stewart, M. Tuexen, K. Poon, P. Lei, and V. Yasevich, “Sockets API Extensions for the Stream Control Transmission Protocol (SCTP),” RFC 6458 (Informational), Internet Engineering Task Force, Dec. 2011. [Online]. Available: http://www.ietf.org/rfc/rfc6458.txt

[76] J. Strauss, D. Katabi, and F. Kaashoek, “A measurement study of available bandwidth estimation tools,” in Proceedings of the 3rd ACM SIGCOMM conference on Internet measurement. ACM, 2003, pp. 39–44.

[77] M. Thomson and S. Turner, “Using Transport Layer Security (TLS) to Secure QUIC,” Internet Draft draft-ietf-quic-tls, 10 2017, Work in Progress. [Online]. Available: https://tools.ietf.org/html/draft-ietf-quic-tls-07

[78] thttpd with sctp support. [Online]. Available: https://github.com/nplab/thttpd/tree/multistream

[79] X. S. Wang, A. Balasubramanian, A. Krishnamurthy, and D. Wetherall, “How speedy is SPDY?” 2014, http://wprof.cs.washington.edu/spdy/tool/.

[80] ——, “How Speedy is SPDY?” in 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI 14), April 2014, pp. 387–399. [Online]. Available: https://www.usenix.org/conference/nsdi14/technical-sessions/wang

[81] M. Welzl, D. Damjanovic, G. Fairhurst, D. Hayes, T. Jones, D. Ros, M. Tuexen, and F. Weinrank, “Final Version of Services and APIs,” NEAT Project (H2020-ICT-05-2014), Deliverable D1.3, 30 2017.

[82] N. Williams, S. Zander, and G. Armitage, “A preliminary performance comparison of five machine learning algorithms for practical ip traffic flow classification,” ACM SIGCOMM Computer Communication Review, vol. 36, no. 5, pp. 5–16, 2006.

[83] D. Wing and A. Yourtchenko, “Happy Eyeballs: Success with Dual-Stack Hosts,” RFC 6555 (Proposed Standard), Internet Engineering Task Force, Apr. 2012. [Online]. Available: http://www.ietf.org/rfc/rfc6555.txt


A NEAT Terminology

This appendix defines terminology used to describe NEAT. These terms are used throughout this document.

Application An entity (program or protocol module) that uses the transport layer for end-to-end delivery of data across the network (this may also be an upper layer protocol or tunnel encapsulation). In NEAT, the application data is communicated across the network using the NEAT User API either directly, or via middleware or a NEAT Application Support API on top of the NEAT User API.

Characteristics Information Base (CIB) The entity where path information and other collected data from the NEAT System is stored for access via the NEAT Policy Manager.

NEAT API Framework A callback-based API in NEAT. Once the NEAT base structure has started, using this framework an application can request a connection (create NEAT Flow), communicate over it (write data to the NEAT Flow and read received data from the NEAT Flow) and register callback functions that will be executed upon the occurrence of certain events.

NEAT Application Support Module Example code and/or libraries that provide a more abstract way for an application to use the NEAT User API. This could include methods to directly support a middleware library or an interface to emulate the traditional Socket API.

NEAT Component An implementation of a feature within the NEAT System. An example is a “Happy Eyeballs” component to provide Transport Service selection. Components are designed to be portable (e.g. platform-independent).

NEAT Diagnostics and Statistics Interface An interface to the NEAT System to access information about the operation and/or performance of system components, and to return endpoint statistics for NEAT Flows.

NEAT Flow A flow of protocol data units sent via the NEAT User API. For a connection-oriented flow, this consists of the PDUs related to a specific connection.

NEAT Flow Endpoint The NEAT Flow Endpoint is a NEAT structure that has a similar role to the Transmission Control Block (TCB) in the context of TCP. This is mainly used by the NEAT Logic to collect the information about a NEAT Flow.

NEAT Framework The Framework components include supporting code and data structures needed to implement the NEAT User Module. They call other components to perform the functions required to select and realise a Transport Service. The NEAT User API is an important component of the NEAT Framework; other components include diagnostics and measurement.

NEAT Logic The NEAT Logic is at the core of the NEAT System as part of the NEAT Framework components and is responsible for providing functionalities behind the NEAT User API.

NEAT Policy Manager Part of the NEAT User Module responsible for the policies used for service selection. The Policy Manager is accessed via the (user-space) Policy Interface, portable across platforms. An implementation of the NEAT Policy Manager may optionally also interface to kernel functions or implement new functions within the kernel (e.g. relating to information about a specific network interface or protocols).


NEAT Selection Selection components are responsible for choosing an appropriate transport endpoint and a set of transport components to create a Transport Service Instantiation. This utilises information passed through the NEAT User API, and combines this with inputs from the NEAT Policy Manager to identify candidate services and test the suitability of the candidates to make a final selection.

NEAT Signalling and Handover Signalling and Handover components enable optional interaction with remote endpoints and network devices to signal the service requested by a NEAT Flow, or to interpret signalling messages concerning network or endpoint capabilities for a Transport Service Instantiation.

NEAT System The NEAT System includes all user-space and kernel-space components needed to realise application communication across the network. This includes the entire NEAT User Module and the NEAT Application Support Module.

NEAT User API The API to the NEAT User Module through which application data is exchanged. This offers Transport Services similar to those offered by the Socket API, but using an event-driven style of interaction. The NEAT User API provides the necessary information to allow the NEAT User Module to select an appropriate Transport Service. This is part of the NEAT Framework group of components.

NEAT User Module The set of all components necessary to realise a Transport Service provided by the NEAT System. The NEAT User Module is implemented in user space and is designed to be portable across platforms. It has five main groupings of components: Selection, Policy (i.e. the Policy Manager and its related information bases and default values), Transport, Signalling and Handover, and the NEAT Framework. The NEAT User Module is a subset of a NEAT System.

Policy Information Base (PIB) The rules used by the NEAT Policy Manager to guide the selection of the Transport Service Instantiation.

Policy Interface (PI) The interface to allow querying of the NEAT Policy Manager.

Stream A set of data blocks that logically belong together, such that uniform network treatment would be desirable for them. A stream is bound to a NEAT Flow. A NEAT Flow contains one or more streams.

Transport Address A transport address is defined by a network-layer address, a transport-layer protocol, and a transport-layer port number.

Transport Feature Short for Transport Service Feature.

Transport Service A set of end-to-end features provided to users, without an association to any given framing protocol, which provides a complete service to an application. The desire to use a specific feature is indicated through the NEAT User API.

Transport Service Feature A specific end-to-end feature that the transport layer provides to an application. Examples include confidentiality, reliable delivery, ordered delivery and message-versus-stream orientation.


Transport Service Instantiation An arrangement of one or more transport protocols with a selected set of features and configuration parameters that implements a single Transport Service. Examples include: a protocol stack to support TCP, UDP, or SCTP over UDP with the partial reliability option.


B Paper: Evaluating the Impact of Transport Mechanisms on Web Performance

The following research paper has been produced by project participants and is currently under preparation.


Evaluating the Impact of Transport Mechanisms on Web Performance

A. C. Mohideen*, R. Secchi*, G. Fairhurst*, M. Rajiullah†, F. Weinrank‡, and A. Brunstrom†
†Karlstad University, Karlstad, Sweden, {mohammad.rajiullah, anna.brunstrom}@kau.se
*University of Aberdeen, Aberdeen, U.K., {althaff, raffaello, gorry}@erg.abdn.ac.uk
‡FHM, Munster, Germany

Abstract—This paper explores the design trade-offs required for an Internet transport protocol to effectively support web access. It identifies a set of distinct transport mechanisms and explores their use, with a focus on multistreaming. The mechanisms are studied using a practical methodology that utilises the range of transport features provided by TCP and SCTP. The results demonstrate the relative benefit of key transport mechanisms and analyse how these impact web access performance. Our conclusions help identify the root causes of performance impairments and suggest appropriate choices to guide the design of a web transport protocol.

Keywords—TCP; SCTP; Web; performance analysis

I. INTRODUCTION

This paper explores the transport protocol mechanisms required to realise a modern, highly efficient web client. The original specification of HTTP/1.0 serialised the web requests onto a single transport connection, which was assumed to be offered by TCP, and originally supported simple web pages with text and a few images. However, most web pages have evolved into large, highly complex structures [1] comprising a collection of inter-dependent resources. Recent studies [2]–[4] have found that the dependency graph for web page resources (and the corresponding scheduling order) plays a significant role in determining the overall web performance. The order of delivery and processing can therefore be expected to impact the time to display a page, and it is important to understand how transport mechanisms contribute to overall performance. A problem known as head-of-line blocking (HoLB) occurs when the chain of processing is delayed while waiting for a critical resource to be received over a transport connection [5]. HoLB plagued the performance of early web clients.

To address these problems, various techniques have been employed to accelerate page download [5]. One approach increases the parallelism of resource download, i.e., requesting an HTTP resource while other resources are being downloaded. Since early specifications of HTTP/1.1, browsers have therefore used a number of TCP connections per server (e.g., the current default is six in Mozilla Firefox and Google Chrome) and have often adopted a proactive policy for connection management, including closing/reopening slow TCP connections and sometimes requesting the same resource over multiple connections. In addition, servers often choose to distribute webpages across multiple domains (even for the same origin content), a practice known as sharding. A client then opens multiple connections for the sharded content [6]. This further increases the required number of simultaneous transport connections.

Although parallelism has benefits, introducing a large number of transport connections is not without drawbacks. First, the client-server session may experience a large number of under-utilised connections (e.g., a connection may transfer only a small resource), which reduces efficiency due to the overhead required to open and maintain each connection. Second, breaking the transmission flow into many independent connections reduces the ability to provide congestion control, making web traffic more aggressive towards other competing traffic [7]–[9]. Even so, it is still common for HTTP/1.1 clients to use multiple parallel connections to the same web server [10]. One reason for the continued use of parallel connections stems from the stream-oriented design of the TCP transport protocol, which has no mechanism to support sending multiple objects over a single flow.

A number of TCP optimisations have also emerged to improve web performance (IW10 [11], TCP Fast Open [12], RACK [13], etc.). An alternative, multi-streaming transport emerged, originally to support signalling information: the Stream Control Transmission Protocol, SCTP [14]. Although SCTP has not been widely supported for web use, it does provide an alternative way to realise parallelism in the transport layer.

In its simplest form, HTTP closes each transport connection when the requested resource is received. HTTP/1.1 [15] also allowed a client to keep the transport connection open and reuse it for subsequent requests (known as HTTP persistence), but this was not widely realised until SPDY [16] and HTTP/2.0 [17] emerged. Persistence is also a feature of an SCTP association, enabling SCTP to model this transport behaviour with HTTP/1.1.

In contrast, a multi-streaming approach can identify sub-streams and relate these to the objects being transported. Unlike in TCP, this approach has become key to the parallelism provided in multi-streaming message protocols, such as SCTP [14], [18].

HTTP/2.0 also introduced a framing layer that supports bidirectional multiplexing of interleaved requests and responses carried over a persistent TCP connection [17]. Interleaving was only recently introduced to SCTP in the form of I-DATA [19], primarily to address HoLB issues associated with supporting WebRTC. These higher-layer aspects are not the focus of the current paper.
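The transport-level HoLB that multi-streaming is meant to avoid can be made concrete with a toy model. The Python sketch below is an illustration only and is not part of the paper's tooling: the function name, the arrival model (one packet per time unit) and the fixed recovery delay for a lost packet are all assumptions. It computes when each object can be handed to the application, either over one ordered byte stream (as with a single TCP connection) or over one ordered sub-stream per object.

```python
"""Toy model of transport-level head-of-line blocking (illustrative only)."""
from typing import Dict, List, Set, Tuple


def object_completion_times(objects: List[Tuple[str, int]],
                            lost: Set[int],
                            recovery_delay: float,
                            per_object_streams: bool) -> Dict[str, float]:
    """Return, for each object, when the receiver can deliver it in full.

    objects: (name, packet_count) pairs, sent back to back.
    lost: global packet indices that are lost once and retransmitted.
    recovery_delay: extra time a lost packet needs to arrive (assumed fixed).
    per_object_streams: False models one ordered byte stream;
    True models one ordered sub-stream per object.
    """
    # Record which object each packet belongs to, in sending order.
    owner: List[str] = []
    for name, count in objects:
        owner.extend([name] * count)

    # Assumption: packet i nominally arrives at time i + 1; lost ones arrive later.
    arrival = [i + 1 + (recovery_delay if i in lost else 0.0)
               for i in range(len(owner))]

    # In-order delivery: a packet is released only after every earlier
    # packet of the same (sub)stream has arrived.
    horizon: Dict[str, float] = {}
    done: Dict[str, float] = {}
    for i, name in enumerate(owner):
        stream = name if per_object_streams else "single"
        horizon[stream] = max(horizon.get(stream, 0.0), arrival[i])
        done[name] = horizon[stream]  # final value is set by the last packet
    return done


if __name__ == "__main__":
    page = [("a", 3), ("b", 3), ("c", 3)]
    print(object_completion_times(page, {1}, 10.0, per_object_streams=False))
    # {'a': 12.0, 'b': 12.0, 'c': 12.0}
    print(object_completion_times(page, {1}, 10.0, per_object_streams=True))
    # {'a': 12.0, 'b': 6.0, 'c': 9.0}
```

With the second packet of object a lost, the single ordered stream holds back b and c until the retransmission arrives, whereas per-object streams release them on time. The model deliberately ignores congestion control, so it does not capture the shared congestion window penalty that an SCTP association pays after a loss.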

The contribution of this paper is three-fold: (a) it uses a web traffic workload based on both a dependency graph and the processing time for HTTP objects at a web client to explore the benefits of multi-streaming; (b) it provides new data examining the impact of RTT and bottleneck capacity on web performance; and (c) it seeks to understand the contribution of buffering within the network and the use of active queue management (AQM) on transport parallelism and multi-streaming.

The remainder of this paper is organised as follows: Section II describes our web model and testing methodology. The experimental tools and setup are described in Section III, followed by the performance analysis in Section IV. The impact of AQM on the transport is discussed in Section V. The paper concludes in Section VII.

Figure 1: Distribution of number of resources by MIME type across the six size-ranks. (Plot: number of resources, log scale, for css, javascript, html and image resources in Groups A to F.)

II. WEB MODEL AND DATASET

Although the purpose of our analysis is to explore transport mechanisms, the analysis requires a representative workload.

This study utilised a publicly available web performance dataset [20]. This provides the number and size of HTTP resources (objects) from 170 recorded web pages. It also includes graphs representing the dependency between HTTP resources and their processing time at the client, enabling others to repeat our tests if required.

To characterise the web traffic workload, we categorised the web pages according to the total size of all resources in a page. This total was used to assign each page to one of six bins (size-ranks), labelled A to F, organised so that each size-rank held an equal number of web pages, forming statistically significant groups. Table I reports the interval of sizes for each size-rank in the second column, and the 5th, 50th and 95th percentiles of the total page size in the third, fourth and fifth columns. For each bin, the corresponding percentiles (5%, 50% and 95%) of the distribution of the number of resources are also reported in parentheses.

This data shows a correlation between the size of a page and the number of resources, although there is a wide distribution in the number of resources within each size-rank. For example, in the smallest size-rank (A) the number of resources per page varied between 1 and 39, whereas in the largest size-rank (F) it ranged between 49 and 228. This suggests that pages of similar size may have quite dissimilar compositions, and it may not be sufficient to characterise web pages only by their overall size. The size of the retrieved resources was also observed to be correlated with the total web page size, i.e., larger webpages tend to transport bigger (and often more complex) resources, such as video or interactive banners, and tend to cluster multiple items in a single resource, e.g., using a single JavaScript file to send multiple scripts. However, the distribution of resource size has less spread than the distribution of the number of resources.

For simplicity, our experiments consider only the webpage with median size for each size-rank.

Figure 1 categorises resources by their MIME type, showing the four most common types: text files (HTML), scripts (JavaScript), style-sheets (CSS) and images (the most common across all size-ranks). We observed very few image URLs, suggesting the dependency graph grows mainly horizontally (i.e., an increasing number of branches originating from a single resource). Other types, including Flash resources, octet-stream and fonts, contributed less than 2%.

Figure 2: Distribution of time to complete a transfer by MIME type across the six size-ranks. (Plot: time to complete transfer (ms), log scale, for css, javascript and html resources in Groups A to F.)

Figure 2 shows the distribution of the time spent by the client to complete the transfer of a resource (including computation time). This figure excludes images, because these are terminal nodes in the dependency graph. We observe that for a network path RTT of a few tens of milliseconds, the computation time component was often not negligible in comparison to the transfer time. In these datasets, the total time for web pages in the largest size-ranks (E, F) was around or above one second. This non-negligible latency impacts transport performance and is therefore discussed later in this paper.

Table I: Webpage size and 5th, 50th and 95th percentiles of the number of resources per size-rank.

Group   Size-rank (KB)   Size (KB) (# res.) at 5%   at 50%       at 95%
A       0.05-118         0.05 (1)                   23 (6)       109 (39)
B       119-565          129 (3)                    325 (21)     532 (67)
C       566-873          567 (6)                    690 (25)     846 (69)
D       874-1242         878 (6)                    964 (45)     1183 (82)
E       1243-1945        1286 (24)                  1546 (55)    1901 (119)
F       1946-3315        2070 (49)                  2454 (127)   3309 (228)
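The size-rank construction described in this section can be sketched as follows. This is a minimal Python illustration, assuming a mapping from page name to total page size in KB; the functions `size_ranks` and `median_page` are hypothetical names, and the paper does not state how uneven bin populations or even-sized ranks were handled. Here pages are sorted by total size and split into six contiguous groups whose populations differ by at most one page.

```python
"""Equal-population size-rank binning of web pages (illustrative sketch)."""
from typing import Dict


def size_ranks(page_sizes_kb: Dict[str, float], n_bins: int = 6) -> Dict[str, str]:
    """Label each page 'A', 'B', ... by total size, using equal-count bins.

    Pages are sorted by total size and split into n_bins contiguous groups
    whose populations differ by at most one page (an assumed tie rule).
    """
    ordered = sorted(page_sizes_kb, key=page_sizes_kb.__getitem__)
    n = len(ordered)
    # i * n_bins // n maps the i-th smallest page to a bin index 0..n_bins-1.
    return {page: chr(ord("A") + i * n_bins // n) for i, page in enumerate(ordered)}


def median_page(labels: Dict[str, str], page_sizes_kb: Dict[str, float], rank: str) -> str:
    """Return the median-sized page of one size-rank (upper median if even)."""
    members = sorted((p for p, r in labels.items() if r == rank),
                     key=page_sizes_kb.__getitem__)
    return members[len(members) // 2]


if __name__ == "__main__":
    sizes = {f"page{i}": float(i) for i in range(1, 13)}  # toy sizes in KB
    labels = size_ranks(sizes)
    print(labels["page1"], labels["page12"])  # A F
    print(median_page(labels, sizes, "A"))    # page2
```

With the 170 pages of the dataset, this rule would give ranks of 28 or 29 pages each; selecting one median page per rank mirrors the simplification of testing only the median-sized webpage of each size-rank.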


III. TOOLS AND EXPERIMENT SETUP

A. Experimental Testbed

Our performance analysis considered two scenarios: (1) a simple path with no competing traffic and predefined patterns of loss; and (2) a path with competing traffic through a network bottleneck. The former reveals the impact of transmission rate and propagation delay, while the latter also considers the impact of a bottleneck and the resulting interaction between transport congestion control and network buffering.

Our testbed used a set of three computers emulating a web client, the network, and a web server. All computers had a common hardware configuration of 4 GB RAM and an Intel Core 2 Duo processor (2.6 GHz). The network was emulated by the netem traffic shaper [21], configured with a bottleneck capacity, delay, buffer size, and packet loss rate (in scenario 1). The testbed configurations were controlled with an open source automation harness, Fabric [22].

Scenario 2 considered a bottleneck with the default FIFO queuing in Linux and the use of Active Queue Management (AQM), controlled via Traffic Control (tc) commands. The AQM testbed used the CoDel [23] and FQ-CoDel [24] queue management algorithms and followed best practices from the bufferbloat community [25]. We followed the methodology described in [26] for parameterising the AQM algorithms and for choosing the buffer size at the bottleneck (152 ms, corresponding to 127 full-sized packets at 10 Mbps capacity). The scenario 2 experiments were conducted to gather Realtime Response Under Load (RRUL) [27] measurements using the flent [28] tool. We used flent to create two competing bulk TCP flows that saturate the buffer at the bottleneck for the entirety of each web experiment. The competing flows used CUBIC congestion control. This setup helps us measure (at steady state) and study the impact of a congested bottleneck on PLTs, and the contribution of AQMs to transport in such scenarios.

Our analysis included experiments using a range of symmetric paths at 2 Mbps, 10 Mbps and 100 Mbps. Results for 100 Mbps indicated similar relative performance for different transport mechanisms, as also observed in an empirical study at Google [29]. Results for lower-rate paths, at or below 2 Mbps, are known to depend strongly on the speed of the bottleneck, the effect of competing traffic on performance, and flow scheduling methods at the link layer, and are not the focus of the present paper. The remainder of the paper therefore focuses on a 10 Mbps bottleneck. Similarly, we modelled a range of path RTTs representative of both desktop and mobile users, drawn from a distribution derived from an empirical study at Mozilla for both mobile and desktop clients (Table III).

The client and server supported TCP (Linux ver. 4.2.0-42 and BSD) and SCTP (under BSD). The same Initial Window (IW) was used for TCP and SCTP. The client used an IW of three packets, as recommended by the IETF and common for Windows users. The server used an IW of 10, which is common for Linux-based servers and is an experimental IETF specification. The maximum segment size was 1500 bytes.

The multi-streaming web server is described in Section III-C. A custom-made client emulated an HTTP/1.1 browser (Section III-B), allowing requests over either a number of parallel TCP connections (1, 6 and 18¹) or a single SCTP association (the number of streams is not a significant factor when using a multi-streaming protocol, and we allowed up to 100 parallel streams). The cost of opening streams is further discussed in Section IV-D. The key experiment parameters are summarised in Table II.

B. pReplay Web client

Web requests were generated using the pReplay tool², developed in C. This uses libcurl [30] to replay HTTP traces using HTTP/1.1 over TCP, or a modified version of phttpget [31] extended to support SCTP [32]. The tool used a dependency graph in JSON files that represented the resource requests and the computation times required to process JavaScript, CSS, etc. pReplay walks the dependency graph, starting from the first activity, which loads the root HTML. When a network activity is found, pReplay issues an HTTP request for the relevant URL. The tool optionally simulates computational activity by waiting for a time determined by the graph. Once an activity completes, pReplay checks whether all dependent activities have also completed and then commences the next activity. It finishes only when all activities in a dependency graph have been visited.

C. Lightweight Web Server

We used a server modified from the lightweight web server thttpd (tiny HTTP daemon) [33], supporting HTTP/1.1. This work is based on a patch that allowed thttpd to be run over SCTP [34], but only enabled web traffic to use a single stream for each SCTP association. This work was extended [34] to enable parallel multistreaming, with the possibility to introduce algorithms that share transmission opportunities between parallel streams (i.e., sender scheduling using round-robin or another algorithm), and support for interleaving large objects (i.e., SCTP I-DATA [19]).

IV. RESULTS

This section contains a systematic study of web page load time (PLT) using HTTP/1.1 over both TCP and SCTP. Our goal is to understand the conditions that benefit the use of multiple connections compared to multistreaming. pReplay was used to measure PLT, the time between making the first web request and the time either the last response is received or the last computation is completed. The results present data for an average of 30 runs, plotted with 95% confidence intervals.

Results are presented for websites at the 50th percentile from our web model (Section II), as described in Table IV. The dataset processing time [20] was used as an upper bound for analysing the impact of processing time. We expect that advances in client platforms and in the way resources are parsed would now result in a much lower processing time. We therefore also plot the load time with no additional processing time, to present a minimum bound.


Figure 3: PLT for 10 Mbps capacity, no loss, without processing time. Panels: (a) google, (b) dmm, (c) siteadvisor, (d) amazon, (e) pinterest, (f) mediafire. Each panel plots PLT [s] against RTT [ms] (log scale) for 1 TCP, 6 TCPs, 18 TCPs and 100s SCTP.

Table II: Experimental parameters

Category   Factor                 Range/value
Network    RTTs                   20, 50, 100, 200, 800 ms
Network    Bottleneck capacity    10 Mbps
Network    Packet loss            No loss, 1.5%, 3%
TCP/SCTP   IW                     client (IW 3), server (IW 10)
TCP/SCTP   CWND validation        no
TCP/SCTP   # parallel TCP flows   1, 6, 18
TCP/SCTP   # streams in SCTP      100

Table III: Path RTT from data provided by Mozilla

Percentile   Desktop RTTs (ms)   Mobile RTTs (ms)
5            1                   11
25           20                  44
50           79                  94
75           194                 184
95           800                 913

Table IV: Statistics for the web pages in the experiment

Page          Res. count   Page size (KB)   Av. res. size (≈KB)
Google        8            74               9
Dmm           21           330              15
Siteadvisor   40           701              17
Amazon        53           977              18
Pinterest     6            1548             258
Mediafire     75           2474             33

¹ Common browsers open up to six connections to a single domain, but sharding content across multiple web servers is also common.
² Based on Epload [20].

A. Impact of Parallelism at the Transport

Figure 3 shows the PLT for the selected numbers of parallel TCP connections compared to a single SCTP association.

The simplest case is when there is no processing time dependency and there is no emulated loss (tail-drop loss from router buffers was still observed in some experiments). Each transport pipe independently performed start-up, congestion control and loss recovery. Each TCP transport connection transferred just a single web resource.

In contrast, parallelism allows multiple transport pipes to each send a resource at any one time. This reduces the number of consecutive RTTs required to complete the web page, reducing the overall PLT. Our experiments considered two ways in which this parallelism could be introduced: first, using parallel TCP connections (each independently managing congestion control), or second, using multiple SCTP streams (where all streams share a single congestion controller).

Figure 3 shows that in most cases parallelism reduced the PLT. An exception may be seen in Figure 3e, where one, six and eighteen TCP connections and a multi-stream SCTP association have an almost similar PLT (up to the 100 ms RTT case). This is typical of pages of large size with few resources (large average object size): the Pinterest web page, with 6 objects of 258 KB average object size (see Table IV), shows similar PLTs even with parallelism. We discuss this special case later in the AQM section.

However, parallelism also came with a cost:

• For a transport protocol with an independently managed congestion control (TCP), a higher sending rate can induce congestion, leading to collateral damage to other flows sharing the bottleneck.


• For a multistreaming transport protocol that uses a shared congestion control (SCTP), each stream contributes to the capacity used by the association. This increases congestion window growth, reducing the PLT (Figure 3). When congestion is experienced, a multistreaming protocol will react to reduce collateral damage to other flows. However, this has a negative impact on the throughput.

In most cases (except for the Google site in Figure 3a), a multistreaming approach provided a smaller PLT than the N parallel TCP pipes, which incur more overhead in setting up multiple parallel connections and suffer self-induced congestion from concurrency. For small pages (e.g., Google in Figure 3a), the combined initial window (IW) provided by N TCP connections provides a benefit over the single IW for multistreaming, but again at the risk of more collateral damage.

All web pages in Figure 3 show a higher PLT for a larger RTT. However, multistreaming shows a benefit for the higher-RTT paths, where the connection overhead becomes important (e.g., in Figure 3f, when the RTT increases from 200 ms to 800 ms, the PLT increases by over 282% using 18 parallel TCP connections, compared to 229% using multi-streaming).

Web page structure also had an impact on the PLT. When there is no parallelism, the number of resources influences the PLT more than the overall page size. This may be seen in Figure 3d for 1 TCP, where the Amazon page (with a larger number of smaller resources) completes much later than the Pinterest page in Figure 3e (with fewer, larger resources; Table IV). Therefore, the number of resources and the average size of the objects have more impact on the overall web performance than the total webpage size. Parallelism alleviates this by reducing the delay from HoLB dependency for pages with many resources (e.g., the PLT for Amazon is lower than that for Pinterest when either multistreaming or N parallel TCP connections are used).

The larger PLT for Pinterest in Figure 3e is limited by the size of individual resources for the high-RTT scenario (800 ms), where additional parallelism can offer a benefit.

B. Impact of processing time at the client

This section examines the influence of processing time on the PLT (Figure 4). The additional processing time does not significantly increase the PLT when using a single connection (1 TCP), where the request overhead for each resource dominates. Parallelism eliminates this overhead, so the processing delay results in greater temporal dependency between resources from the web model [20] and can be observed to have a direct impact on the PLT (Figure 4). This demonstrates the importance of reducing processing delay when designing web clients, although the authors did not have any way to evaluate how the model for processing delay would have changed if a modern web client had been used instead. These results therefore present the upper and lower bounds.

C. Impact of loss

Our results also consider the impact of a simple loss model on the PLT (e.g., from link effects such as wireless interference), Figure 5. Loss for a single TCP flow (1 TCP) results in head-of-line retransmission delay and a reduced congestion window (lower throughput). Parallelism reduces the PLT when using TCP, since a loss only impacts the associated pipe, and the throughput of other parallel flows is unchanged.

When using multi-streaming, a loss only results in HoLB for the (sub)stream that experiences the loss. However, any loss also impacts the congestion window shared by all streams in an SCTP association. The shared congestion control reacts more conservatively and results in a higher PLT. If the loss had been a result of congestion, this result could have been different, since reducing the overall capacity consumed by a client could then help reduce future loss and reduce the PLT.

D. Discussion of the Experiment Setup

A key benefit of multistreaming is the lightweight cost of additional streams, which allows clients to open as many streams as they need. Our use of SCTP therefore considered a larger maximum number of streams (100) compared to the maximum number of TCP connections (18). The memory allocated by each TCP/SCTP connection consists of a Transmission Control Block (TCB) of about 700 B, which is more than needed for an SCTP stream (about 32 bytes) [35]. Although the TCB for an SCTP association can be twice as large as for TCP, this cost is amortised when multiple streams are used.

We did not consider alternative ways to serve the original content, such as domain sharding (to scatter the content across multiple servers) or image spriting. These can change the opportunities for parallelism, but reduce opportunities for multistreaming. (Using a single origin server is also recognised as best practice for HTTP/2 [17], to exploit the benefits of multistreaming.)

In this section, our performance analysis only considered a scenario with no loss or pseudo-random link loss, although we did observe some loss from self-induced congestion. We look at other scenarios in Section V. Our analysis in Section IV shows that:

• The number of web resources and the average size of a web resource impact the transport much more significantly than the total page size. The performance of short-lived flows (small objects) is therefore limited by the growth of the congestion window and is a direct function of path delay.
• Paths with a shorter RTT may be expected to experience more rapid loss recovery, e.g., TCP Cubic provides one recovery per pipe (no multi-streaming). However, for small resources there is also a pathology that can result in loss recovery based on RTO, which can significantly increase the PLT [36].
• The inter-dependency (and processing time) between web resources reduces web performance when using multi-streaming. Control block sharing [37] with shared bottleneck detection [38] could also result in similar behaviour for TCP.
• Repurposing webpage content (e.g., reducing the average object size) improves performance with multistreaming.

The Pinterest and Mediafire web pages illustrate these effects, and were therefore chosen to conduct the set of experiments in the next section.
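Two back-of-the-envelope models help interpret these findings: the time needed to drain a full FIFO bottleneck buffer (the 152 ms used in Section III for 127 full-sized packets at 10 Mbps), and the number of RTTs that idealised slow start needs to deliver an object of a given size from a given IW. The Python sketch below is illustrative only; the function names are hypothetical, and it assumes a loss-free path, a cwnd that doubles every RTT, no delayed ACKs or receive-window limit, and a default segment size of 1500 bytes to match the maximum segment size used in the testbed.

```python
"""Back-of-the-envelope timing models (illustrative; idealised assumptions)."""
import math


def fifo_drain_ms(packets: int, packet_bytes: int, rate_bps: float) -> float:
    """Milliseconds needed to serialise a full FIFO buffer onto the link."""
    return packets * packet_bytes * 8 / rate_bps * 1000


def slow_start_rounds(object_bytes: int, iw_segments: int, mss: int = 1500) -> int:
    """RTTs needed to send object_bytes in idealised slow start.

    Assumes no loss, a cwnd that starts at iw_segments and doubles every RTT,
    and no receive-window or application limit.
    """
    if iw_segments < 1:
        raise ValueError("iw_segments must be at least 1")
    segments_needed = math.ceil(object_bytes / mss)
    sent, cwnd, rounds = 0, iw_segments, 0
    while sent < segments_needed:
        sent += cwnd   # one full window is sent per round trip
        cwnd *= 2      # slow start doubles the window every RTT
        rounds += 1
    return rounds


if __name__ == "__main__":
    print(round(fifo_drain_ms(127, 1500, 10e6), 1))   # 152.4
    print(slow_start_rounds(33_000, iw_segments=10))  # 2
    print(slow_start_rounds(33_000, iw_segments=3))   # 4
```

Under these assumptions an average-sized Mediafire object (33 KB) needs two RTTs with IW10 but four with IW3, which illustrates why the delivery of small objects is a direct function of path delay. Because N independent connections that each start at IW10 together release N × 10 segments in the first RTT, calling `slow_start_rounds` with `iw_segments=60` gives a crude view of the combined IW of six parallel TCP connections, compared with the single IW shared by all streams of an SCTP association.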


Figure 4: PLT for 10 Mbps capacity, no loss, with processing time. Panels: (a) google, (b) amazon, (c) mediafire. Each panel plots PLT [s] against RTT [ms] (log scale) for 1 TCP, 6 TCPs, 18 TCPs and 100s SCTP.

Figure 5: PLT for 10 Mbps capacity, 1.5% packet loss, without processing time. Panels: (a) google, (b) dmm, (c) siteadvisor.com, (d) amazon, (e) pinterest, (f) mediafire. Each panel plots PLT [s] against RTT [ms] (log scale) for 1 TCP, 6 TCPs, 18 TCPs and 100s SCTP.

V. EXPLORING SHARED CONGESTION BOTTLENECKS

This section evaluates the impact of bottleneck congestion on PLT. In particular, we evaluate three instances of bottleneck buffer management: a drop-tail FIFO queue, a Controlled Delay (CoDel) queue [23], and a queue managed by flow-queuing CoDel (FQ-CoDel) [24]. Both CoDel and FQ-CoDel are forms of active queue management (AQM). The bottleneck is loaded by a long-running bulk TCP flow and a web page download.

A. Drop-Tail FIFO Bottleneck

In a FIFO buffer, the transmission delay of each packet in the queue contributes to the path latency. At 10 Mbps, assuming 1500 B transmission units, a 127-packet FIFO buffer requires around 150 ms to completely drain. The difference between the PLTs of the Pinterest webpage in Fig. 3e and Fig. 6a clearly shows the impact of the queuing delay on transport performance with an unmanaged buffer.

Figure 6: PLT for 10 Mbps capacity, Pinterest website, congested bottleneck. Panels: (a) FIFO, (b) CoDel, (c) FQ-CoDel. Each panel plots PLT [s] against RTT [ms] (log scale) for 1 TCP, 6 TCP, 18 TCP and SCTP.

Figure 7: PLT for 10 Mbps capacity, Mediafire website, congested bottleneck. Panels: (a) FIFO, (b) CoDel, (c) FQ-CoDel. Each panel plots PLT [s] against RTT [ms] (log scale) for 1 TCP, 6 TCP, 18 TCP and SCTP.

Figure 6a shows the PLT of Pinterest, which consists of 6 resources. Since the Pinterest website consists of a few relatively large objects (more than 300 kB), enough data per connection is sent to allow the congestion controller to reach steady state. Thus, the PLT is largely dominated by the available capacity, and only small performance differences are observed between 1, 6 and 18 TCP flows. The small difference in SCTP performance is instead due to a different interaction between the application and the transport.

1) Effect of the packet losses: The Mediafire page PLT in Fig. 7a illustrates the effects of packet losses on parallelism and multistreaming. Mediafire has many more objects than Pinterest, and they are smaller on average. As already observed in the unloaded bottleneck scenario, the PLT of Mediafire degrades as the RTT increases. However, a large RTT degrades multistreaming performance (the SCTP case) more than the performance of parallel TCP connections, leading to a situation that is reversed with respect to the one observed without competing traffic (Fig. 3f).

This effect can be attributed to packet drops that occur shortly after a flow starts. These result in a significant reduction of the cwnd. The reduced cwnd continues to have an impact for the remainder of the flow duration, increasing the total time to download the object and any subsequent object using the same stream. Thus, if the transport consists only of a single congestion-controlled stream, the entire transmission chain is slowed down. Conversely, a server that can choose among several parallel flows can schedule delivery of resources to make best use of the flows that were not penalised by early packet losses.

This effect is observed in SCTP, where the congestion control is not only shared between all concurrent flows, but also persistent across all objects of the same page. A similar effect would be seen if the TCP flows were used persistently to sequentially request multiple objects (e.g., as permitted with HTTP/1.1 or HTTP/2.0).

2) Effect of the flow congestion window: Before the loss, TCP is in slow-start and the cwnd increases exponentially with each RTT as long as data is available to send. The more data is sent, the larger the cwnd becomes. A flow sending large objects is likely to experience packet loss at some point during transmission. Hence, the cwnd is first reduced and then increased more slowly in congestion avoidance. In contrast, a flow consisting of a series of small HTTP transactions may not increase the window to the point of overflow and may spend more time in slow-start [8]. As a consequence, small transactions tend to have more packets delivered before loss and can further increase their cwnd.

3) Effect of a larger initial window: The size of the initial congestion window (IW) can have an important effect on performance. An IW of 10 segments (IW10) can transfer up to 15 KB of data. This can constitute an important PLT reduction with respect to an initial window of 3 segments (IW3), as illustrated by Table V. Important savings can also be obtained when the web page consists of many small objects, as in the Mediafire case (see Table V). However, a transport using multi-streaming shares one IW among all the (sub)streams, which diminishes the overall benefit. Thus, it is even more important to use a larger IW to compensate [8]. This paper limits the analysis to 10 segments, as releasing more than 10 segments into the network is believed to increase the risk of collateral damage [39].

B. AQM using CoDel

Active Queue Management (AQM) is a network-layer method to help control the delay experienced by flows sharing


a congested bottleneck. The Controlled Delay (CoDel) [23] algorithm limits the queuing delay by measuring the time spent by packets in the network buffer and comparing it against a target delay over a pre-set interval. If the queuing delay remains above the target for at least the pre-set interval, packets are dropped (at dequeue, i.e., from the head of the queue) until the queuing delay drops below the target. The default values for the target and interval are 5 ms and 100 ms, respectively.

1) Effect of CoDel on RTT: The PLT for both the Pinterest and Mediafire webpages is significantly improved for both TCP and SCTP when the bottleneck uses CoDel rather than FIFO. Indeed, the smaller path RTT under load with CoDel allows faster delivery of data and helps each flow grow its cwnd faster. As a result, the PLT observed for an RTT less than 200 ms (incidentally the propagation delay) is similar for all transports.

2) Effect of CoDel on loss recovery: Each packet loss requires the transport protocol to retransmit the missing packet. The retransmitted packet needs to be received before the receiver can deliver the object to the client, which results in HoLB. The greater the queuing in the network, the longer this takes. In a FIFO buffer, retransmissions are queued with all other data, and can hence equally be delayed by data sent by other parallel transport flows; the result is the same for TCP and a multi-streamed transport. In the experiments, CoDel reduces the impact of other flows on the progress of a specific flow. In this way, it can reduce the time to complete a retransmission when multiple TCP flows are being used, as demonstrated in Figures 6b and 7b. Moreover, when a single stream is used, two RTTs are required to grow the cwnd enough to send one object of the Mediafire webpage (33 KB on average). However, (FQ-)CoDel drastically reduces the queuing delay and hence the RTT. This allows faster growth of the cwnd and faster retransmissions, compensating for the more aggressive dropping policy of CoDel. The advantage of using a path with AQM is clearly visible for the Mediafire page.

While CoDel effectively improves performance with respect to FIFO, the PLT of Mediafire with one TCP connection is still high (about 20 s when the RTT is 200 ms). This confirms that parallelism or multistreaming is needed to improve the performance of webpages with many objects.

    Table V: Comparison of average PLTs with IW3 and IW10 for the Pinterest and Mediafire webpages

    Pinterest      1 TCP               6 TCP
    RTT (ms)       IW3       IW10      IW3       IW10
    0              8,926     7,662     8,813     8,044
    20             10,506    9,149     11,281    9,135
    50             10,620    9,633     11,868    9,807
    100            15,202    10,038    15,580    10,492
    200            16,664    12,689    20,502    12,305
    800            50,883    26,879    60,555    27,389

    Mediafire      1 TCP               6 TCP
    RTT (ms)       IW3       IW10      IW3       IW10
    0              19,522    20,214    16,697    14,125
    20             26,690    24,164    21,248    16,027
    50             29,099    34,074    22,626    16,672
    100            40,073    34,163    29,709    21,854
    200            47,146    41,992    39,243    27,888
    800            134,740   127,377   113,031   66,132

C. AQM using Flow Queuing with CoDel

FQ-CoDel [24] is a hybrid algorithm that implements the CoDel algorithm on the sub-queues of a flow queuing (FQ) scheduler. The scheduler uses a five-tuple hashing algorithm to enqueue packets onto sub-queues, and a deficit round robin scheduler to dequeue packets from the sub-queues. The FQ-CoDel mechanism therefore promotes byte-based fairness among parallel flows sharing a common bottleneck. In this respect, the method mirrors at the network layer the parallelism discussed previously at the transport layer.

1) Effect of FQ-CoDel on PLT: The PLTs for the Mediafire and Pinterest webpages are similar for an RTT below 200 ms when using both TCP and multi-streaming with some form of parallelism. Many of the differences evident when using FIFO or simple CoDel are reduced or eliminated when the bottleneck is controlled by FQ-CoDel. A single TCP flow does not significantly benefit, largely because HoLB dominates performance. A single SCTP association does derive benefit. In some cases this could be due to the lower RTT under load, but it is more likely to result from the absence of collateral damage from the traffic with which it shares the bottleneck.

Our results show that CoDel performs similarly to FQ-CoDel for web traffic. This indicates that flow queuing may not be essential to improve the PLT, a conclusion also found in previous research [26]. However, FQ-CoDel is beneficial in cases such as Mediafire, where either the number of objects is large or their size is small.

D. Summary

A range of mechanisms have been studied, and the approach taken has been to evaluate transport mechanisms to understand their contribution to web page load time. We used a data-driven workload because we already understood that the performance of a web transport would depend on the structure of the requested web page. Our results analysed how these transport mechanisms were impacted by the level of parallelism and by the RTT.

VI. FROM TRANSPORT MECHANISMS TO A NEW WEB TRANSPORT PROTOCOL

Our approach to analysis used established models as a way to better understand the desirable features of a transport protocol for web traffic. In taking this approach, we used open data to produce the workload models, but we readily acknowledge the much wider diversity of web content than we have been able to explore in the results presented in this paper. There is therefore an opportunity for future work that explores a richer set of web models. We also acknowledge that the transport techniques can drive different optimisations in the way in which content is structured and retrieved. This has been evident in some of the results presented (e.g., small objects can improve performance for TCP parallelism, but other aspects can benefit multi-streaming). In this respect, we expect that web content will continue to be optimised to match the capabilities of the transport over which it is to be transmitted. Despite these concerns, the results illustrate the key benefits and drawbacks of multi-streaming.

This work has evaluated mechanisms found in the TCP and SCTP protocols; however, the results are applicable to other


transports that also need to work across an Internet path. In particular, the results are presented at a time when the IETF is developing the base mechanisms for a new web transport, IETF QUIC [40]. This transport has its origins in work at Google and an experimental deployment of Google's own QUIC protocol [40]. However, the intention was not to standardise Google's work, but to develop appropriate techniques based on understood requirements. At the current stage, there are no implementations of the congestion control or loss recovery mechanisms for the IETF QUIC protocol, but it is clear that it will include key elements learned by the community since HTTP/1.1 was introduced over TCP.

The modern web has evolved through structural changes over recent decades. Examples include the transformation of early hypertext document formats into rich multimedia web pages and the emergence of dynamic web applications. Web content continues to evolve.

The results in this paper provide input to understanding some of the mechanisms that will be utilised by QUIC. Specifically, the new method is expected to favour a single multi-streamed approach, rather than a single stream (as originally proposed for TCP) or multiple parallel transport sessions (now the norm). The protocol will share one congestion state (as has been implemented for SCTP) with persistent reuse of open connections (as in SCTP, but also emerging in SPDY [16] and standardised in HTTP/2.0 [17]), and loss recovery will be designed to eliminate head-of-line blocking and closely integrate with the requirements for supporting HTTP/2.0. At the time of writing, almost 1/8 of web servers have introduced HTTP/2 support.

From this aspect, the paper also explores the interaction between the application performance and the interaction between transport mechanisms and the way in which buffers are managed in the network. New tools to support the deployment of new transport protocols have been proposed to the Internet community [41]. These techniques try to address the problem of Internet transport ossification, i.e., the inability to evolve the Internet transport layer due to the difficulty of replacing the current stack. The EU-funded NEAT project [42] contributed to exploring these techniques by addressing the lack of flexibility of the application programming interface and by optimising the use of the available transport services. A NEAT stack implements both TCP and SCTP and allows an application to flexibly choose between them. It also provides the opportunity to introduce new protocols, such as QUIC, and can therefore serve as a platform for accelerating the deployment of new protocols and mechanisms in the transport layer.

    Table VI: Experimented Transport Mechanisms

    Experiment                      Transport mechanisms                          System mechanism
                                                                                  TCP   SCTP  QUIC
    1. Transmission                 a. Fast-Open (RFC7413)                        X     -     X
                                    b. Multi-streaming (RFC4960)                  -     X     X
                                    c. Interleaved Multi-streaming (draft)        -     X     X
                                    d. Per stream flow control (RFC4960)          -     -     X
                                    e. Multi-Path (RFC6824)                       X     X     X
    2. Loss Detection and Recovery  a. SACK (RFC2018)                             X     X     X
       (RFC5681)
    3. CC Algorithm                 a. TCP Cubic                                  X     -     X
                                    b. TCP Reno (RFC5681)                         X     X     X
    4. Congestion Control           a. IW10 (RFC6928)                             X     X     X
                                    b. New Reno Fast Recovery (RFC3782, RFC6582)  X     X     X
                                    c. ECN (RFC3819)                              X     -     -

    Table VII: Transport and Application Mechanisms

    Transport mechanism                    Application mechanism
    Fast-Open (RFC7413, EXP)               0-RTT handshake
    Multi-streaming (RFC4960)              Multi-streaming over a single transport
    Interleaved Multi-streaming (draft)    Interleaving of large resources
    Per stream Flow control (RFC4960)      Flow control

A. Connection setup

A persistent transport can achieve a significantly lower PLT by using an existing connection to send successive HTTP requests. This benefit is especially visible when a path with an appreciable RTT is used to request many resources to complete a page. For example, loading the Mediafire web page over non-persistent TCP connections incurs 75 requests. On a path with a 1 s RTT, this introduces 75 s of overhead for connection setup and a further 75 s to send the GET requests. SCTP's four-way handshake would further increase the cost of connection setup. In contrast, a persistent single connection (one TCP connection or one SCTP association) incurs only one connection setup plus the time to send the requests.

TCP Fast Open (TFO) [12] helps to reduce the cost of subsequent connection setups to the same server, eliminating one RTT of delay per connection. This can make the connection setup cost of non-persistent use similar to that of persistent use. However, TFO has only a marginal effect on persistent use, because there is only a single connection setup. Enabling TFO also requires additional functions at the server.

VII. CONCLUSION AND FUTURE WORK

This paper has explored key transport mechanisms, including multistreaming, parallelism, and shared and individual congestion control, to evaluate their impact on web performance. The mechanisms were explored across a range of network and application scenarios using a tool developed to replay a set of pre-established web page models. This tool was used to evaluate the benefit of multistreaming. Multistreaming was shown to significantly improve overall web performance by enabling rapid utilisation of the available link capacity. It also reduced the load time for web pages with large objects or for larger web pages, benefiting from shared congestion control. However, multistreaming also has drawbacks when used over a path that experiences high rates of loss. Even a single lost packet in an SCTP connection stalls all of the multiplexed streams over that connection. This also applies to streams in HTTP/2 when a loss happens in the underlying TCP connection. QUIC [43] solves this by using UDP as the underlying transport and supporting out-of-order delivery: a single lost packet for N concurrent HTTP connections will only stall 1 out of N streams. Besides, losses experienced by the shared congestion control of multistreaming limit the growth of the congestion window and lead to an increase in the page load time.
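The head-of-line blocking argument above can be made concrete with a small counting model. The C sketch below was not used in the paper; it is a simplification under stated assumptions. Packets are listed in send order, and each one carries data for a single stream. Stream identifiers are assumed to lie below a hypothetical MAX_STREAMS bound. A stream counts as stalled if some of its data has to wait for a retransmission. With one shared delivery order, as for HTTP/2 over a single TCP connection, every stream with data at or after the first loss waits. With per-stream ordering, as in QUIC, only the streams that lost a packet wait.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_STREAMS 64  /* hypothetical bound on stream identifiers */

/* One in-order delivery sequence shared by all streams (e.g., HTTP/2 over a
   single TCP connection): once a packet is lost, every later packet waits
   for the retransmission, whichever stream it belongs to. */
size_t stalled_shared_order(const int *stream_of, const bool *lost, size_t n_pkts)
{
    bool stalled[MAX_STREAMS] = { false };
    bool blocked = false;
    size_t count = 0;

    for (size_t i = 0; i < n_pkts; i++) {
        assert(stream_of[i] >= 0 && stream_of[i] < MAX_STREAMS);
        if (lost[i])
            blocked = true;                 /* everything from here on waits */
        if (blocked && !stalled[stream_of[i]]) {
            stalled[stream_of[i]] = true;
            count++;
        }
    }
    return count;
}

/* Per-stream ordering (QUIC-style): a loss only holds back data of the
   stream that lost the packet. */
size_t stalled_per_stream(const int *stream_of, const bool *lost, size_t n_pkts)
{
    bool stalled[MAX_STREAMS] = { false };
    size_t count = 0;

    for (size_t i = 0; i < n_pkts; i++) {
        assert(stream_of[i] >= 0 && stream_of[i] < MAX_STREAMS);
        if (lost[i] && !stalled[stream_of[i]]) {
            stalled[stream_of[i]] = true;
            count++;
        }
    }
    return count;
}
```

As a usage example, take four streams that share eight packets in round-robin order, with the first packet lost. In that case stalled_shared_order() returns 4 and stalled_per_stream() returns 1, which mirrors the "N versus 1 out of N" comparison in the text. The model deliberately ignores the second effect mentioned above, namely that losses also shrink the shared congestion window.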


Future work could go beyond the currently standardised mechanisms to explore the impact of new mechanisms, such as RACK, BBR, and L4S. This analysis needs to be performed with care because, as in the presented analysis, the overall benefit is likely to depend on multiple factors. Some techniques offer significant benefits, but only in particular network scenarios and/or with particular web page constructions. The merits and demerits of combining specific mechanisms also need to be considered when defining a protocol. So does the way the transport protocol will be used and managed by the networks over which it needs to operate. In the meantime, a deeper understanding of the performance implications for HTTP/1.1 can also provide a good technical basis for examining how transport design impacts the performance of HTTP/2.0.

ACKNOWLEDGMENT

This work has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No. 644334 (NEAT). The views expressed are solely those of the authors.

REFERENCES

[1] Y. Elkhatib, G. Tyson, and M. Welzl, “Can SPDY really make the web faster?” in 2014 IFIP Networking Conference, Trondheim (Norway), Jun. 2014, pp. 1–9.
[2] M. Butkiewicz, H. V. Madhyastha, and V. Sekar, “Characterizing web page complexity and its impact,” IEEE/ACM Transactions on Networking, vol. 22, no. 3, pp. 943–956, Jun. 2014.
[3] C. A. Avram, K. Salem, and B. Wong, “Latency amplification: Characterizing the impact of web page content on load times,” in 2014 IEEE 33rd International Symposium on Reliable Distributed Systems Workshops, Oct. 2014, pp. 20–25.
[4] X. S. Wang et al., “Demystify page load performance with wprof,” in Proc. of the USENIX Conference on Networked Systems Design and Implementation, 2013.
[5] B. Briscoe et al., “Reducing internet latency: A survey of techniques and their merits,” IEEE Communications Surveys & Tutorials, vol. 18, no. 3, pp. 2149–2196, third quarter 2016.
[6] R. Secchi, A. Mohideen, and G. Fairhurst, Evaluating the Performance of Next Generation Web Access via Satellite. Cham: Springer International Publishing, 2015, pp. 163–176.
[7] N. Khademi, D. Ros, and M. Welzl, “The new AQM kids on the block: An experimental evaluation of CoDel and PIE,” in 2014 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), Apr. 2014, pp. 85–90.
[8] N. Dukkipati, T. Refice, Y. Cheng, J. Chu, T. Herbert, A. Agarwal, A. Jain, and N. Sutin, “An argument for increasing TCP's initial congestion window,” SIGCOMM Comput. Commun. Rev., vol. 40, no. 3, pp. 26–33, Jun. 2010. [Online]. Available: http://doi.acm.org/10.1145/1823844.1823848
[9] R. Secchi, A. Mohideen, and G. Fairhurst, “Performance analysis of next generation access via satellite,” Int. J. Satell. Comm. N. (IJSCN), vol. 34, no. 6, Dec. 2016.
[10] I. Grigorik, “Making the web faster with HTTP 2.0,” Commun. ACM, vol. 56, no. 12, pp. 42–49, Dec. 2013.
[11] J. Chu, N. Dukkipati, Y. Cheng, and M. Mathis, “Increasing TCP's Initial Window,” RFC 6928 (Experimental), Internet Engineering Task Force, Apr. 2013. [Online]. Available: http://www.ietf.org/rfc/rfc6928.txt
[12] Y. Cheng, J. Chu, S. Radhakrishnan, and A. Jain, “TCP Fast Open,” RFC 7413 (Experimental), Internet Engineering Task Force, Dec. 2014. [Online]. Available: http://www.ietf.org/rfc/rfc7413.txt
[13] Y. Cheng and N. Cardwell, “RACK: A Time-based Fast Loss Detection Algorithm for TCP,” Internet Draft draft-cheng-tcpm-rack-00, Oct. 2015, Work in Progress. [Online]. Available: https://tools.ietf.org/html/draft-cheng-tcpm-rack-00
[14] R. Stewart, “Stream Control Transmission Protocol,” RFC 4960 (Proposed Standard), Internet Engineering Task Force, Sep. 2007.
[15] R. Fielding et al., “Hypertext Transfer Protocol – HTTP/1.1,” RFC 2616 (Draft Standard), Internet Engineering Task Force, Jun. 1999.
[16] Google, SPDY: An Experimental Protocol For a Faster Web. [Online]. Available: http://www.chromium.org/spdy/spdy-whitepaper
[17] M. Belshe, R. Peon, and M. Thomson, “Hypertext Transfer Protocol Version 2 (HTTP/2),” RFC 7540 (Proposed Standard), Internet Engineering Task Force, May 2015.
[18] P. Natarajan, P. D. Amer, and R. Stewart, “Multistreamed web transport for developing regions,” in ACM SIGCOMM Workshop on Networked Systems for Developing Regions (NSDR), Seattle, Aug. 2008.
[19] R. Stewart et al., “Stream Schedulers and User Message Interleaving for the Stream Control Transmission Protocol,” Internet Draft draft-ietf-tsvwg-sctp-ndata, Jul. 2016, Work in Progress.
[20] X. S. Wang et al., “How Speedy is SPDY?” in 11th USENIX Symposium on Networked Systems Design and Implementation, Seattle, Apr. 2014, pp. 387–399.
[21] S. Hemminger, “Network emulation with netem,” in Linux Conf Au, 2005.
[22] Fabric. [Online]. Available: http://www.fabfile.org/
[23] K. Nichols and V. Jacobson, “Controlling queue delay,” ACM Queue, vol. 10, no. 5, May 2012. [Online]. Available: http://doi.acm.org/10.1145/2208917.2209336
[24] T. Hoeiland-Joergensen, P. McKenney, D. Taht, J. Gettys, and E. Dumazet, “The flowqueue-codel packet scheduler and active queue management algorithm,” Internet Draft draft-ietf-aqm-fq-codel-06, Mar. 2016.
[25] Best practices for benchmarking codel and fq-codel. [Online]. Available: http://goo.gl/FpSW5z
[26] T. Høiland-Jørgensen, P. Hurtig, and A. Brunstrom, “The good, the bad and the wifi,” Comput. Netw., vol. 89, no. C, pp. 90–106, Oct. 2015. [Online]. Available: https://doi.org/10.1016/j.comnet.2015.07.014
[27] D. Taht, Realtime response under load (RRUL). [Online]. Available: https://www.bufferbloat.net/projects/bloat/wiki/RRUL_Spec/
[28] T. Høiland-Jørgensen, Flent. [Online]. Available: https://flent.org
[29] More bandwidth doesn't matter (much). [Online]. Available: https://docs.google.com/a/chromium.org/viewer?a=v&pid=sites&srcid=Y2hyb21pdW0ub3JnfGRldnxneDoxMzcyOWI1N2I4YzI3NzE2
[30] libcurl: Client-side URL Transfers. [Online]. Available: https://curl.haxx.se/libcurl/c/libcurl.html
[31] phttpget - pipelined http get utility. [Online]. Available: http://www.daemonology.net/phttpget/
[32] phttpget - pipelined http get utility with sctp support. [Online]. Available: https://github.com/NEAT-project/HTTPOverSCTP/tree/multistream
[33] thttpd: Tiny/Turbo/Throttling HTTP Server. [Online]. Available: http://acme.com/software/thttpd/
[34] thttpd with sctp support. [Online]. Available: https://github.com/nplab/thttpd/tree/multistream
[35] P. Natarajan et al., “SCTP: An innovative transport layer protocol for the web,” in Proceedings of the 15th International Conference on World Wide Web. ACM, 2006, pp. 615–624.
[36] Y. Cheng, N. Cardwell, and N. Dukkipati, “RACK: a time-based fast loss detection algorithm for TCP,” Internet Draft draft-cheng-tcpm-rack, Mar. 2017, Work in Progress. [Online]. Available: https://tools.ietf.org/html/draft-ietf-tcpm-rack-02
[37] I. Rhee, L. Xu, S. Ha, A. Zimmermann, L. Eggert, and R. Scheffenegger, “CUBIC for fast long-distance networks,” Internet Draft draft-rhee-tcpm-cubic, Sep. 2017, Work in Progress. [Online]. Available: https://tools.ietf.org/html/draft-ietf-tcpm-cubic-06
[38] S. Ferlin, Ö. Alay, T. Dreibholz, D. A. Hayes, and M. Welzl, “Revisiting congestion control for multipath TCP with shared bottleneck detection,” in IEEE INFOCOM 2016 - The 35th Annual IEEE International Conference on Computer Communications, Apr. 2016, pp. 1–9.
[39] N. Dukkipati et al., “An argument for increasing TCP's initial congestion window,” ACM SIGCOMM Comput. Commun. Rev., vol. 40, no. 3, pp. 27–33, Jul. 2010.


[40] J. Roskind, “QUIC: Multiplexed stream transport over UDP,” Google working design document, 2013. [Online]. Available: https://docs.google.com/document/d/1jdKEQMlM7ThDMDalFYFR 9-Yw91PhoBmkAPQcCicX3s/pub
[41] G. Papastergiou et al., “De-ossifying the internet transport layer: A survey and future perspectives,” IEEE Commun. Surveys Tuts., 2016.
[42] G. Fairhurst, T. Jones, Z. Bozakov, A. Brunstrom, D. Damjanovic, T. Eckert, K. R. Evensen, K.-J. Grinnemo, A. F. Hansen, N. Khademi, S. Mangiante, P. McManus, G. Papastergiou, D. Ros, M. Tüxen, E. Vyncke, and M. Welzl, “NEAT Architecture,” NEAT Project (H2020-ICT-05-2014), Deliverable D1.1, Dec. 2015. [Online]. Available: https://www.neat-project.org/publications/
[43] Y. Cui et al., “Innovating transport with QUIC: Design approaches and research challenges,” IEEE Internet Computing, vol. 21, no. 2, pp. 72–76, 2017.


C Paper: Raising the Datagram API to Support Transport Protocol Evolution

The following research paper [38] was produced by project participants and presented at the FIT Workshop of the 16th International IFIP TC6 Networking Conference, Networking 2017.


Raising the Datagram API to Support Transport Protocol Evolution

Tom Jones and Gorry Fairhurst, University of Aberdeen, Aberdeen, U.K. Email: {tom, gorry}@erg.abdn.ac.uk
Colin Perkins, University of Glasgow, Glasgow, U.K.

Abstract: Some application developers can wield huge resources to build new transport protocols; for these developers, the present UDP Socket API is perfectly fine. They have access to large test beds and sophisticated tools. Many developers do not have these resources. This paper presents a new high-level Datagram API that is for everyone else. This API has the advantage of offering a clear evolutionary path to support new requirements. The new API is needed to move forward the base of the system, allowing developers with limited resources to evolve their applications while accessing new network services.

I. INTRODUCTION

The Berkeley Sockets Application Programming Interface (API) is the main interface to the network for developers. It has been hugely successful, with few changes to its core semantics over its 35-year history. The API has scaled to support applications running services over networks that could not have been envisaged during its inception, at a scale that could not have been imagined.

Application protocols using the socket API may choose to build on top of a stream protocol, running over TCP, or on a datagram protocol, using the User Datagram Protocol (UDP). UDP is the simplest transport [1]. It offers a minimal protocol over IP: service multiplexing using port numbers, optional checksums, and best-effort, unreliable delivery. The UDP socket API provides a very simple interface for applications to send and receive datagrams, with the ability to control the options/parameters required to build applications [2].

While this model has been a success, the socket API is now showing its age. It is becoming clear that it does not offer a clear evolutionary path to support new requirements as the needs of applications change and as the network beneath them changes. There are two major areas where this has become visible.

First, as more sophisticated applications are developed, and as the complexity of the network grows, we increasingly see that the Datagram socket API does not provide a sufficiently expressive interface. For some applications, the connectionless, unreliable datagram service is a core feature. Others would prefer more transport support, but must use UDP for the features that only UDP offers in the present API. These include partial reliability, control of transmission timing, non-standard congestion control, and multicast. The API is increasingly baroque (e.g., using obscure setsockopt() calls to control important semantics) but also overly simplistic, pushing applications to implement transport features themselves [3].

Secondly, there is an increasing trend to see UDP not as a transport protocol, but as a demultiplexing substrate layer that supports the deployment of new transport protocols [4] [3]. This is a reaction to ossification of the network. The intended transport demultiplexing point is the Protocol field in the IPv4 header, or the Next Header field in IPv6. In practice this is unusable, since firewalls drop packets whose values identify transports other than TCP and UDP. Accordingly, the transport demultiplex is moving up the stack, with dynamic binding of identifiers to transport protocols using UDP port numbers negotiated by out-of-band signalling [5].

With new sophisticated applications [4] driving a growing volume of UDP traffic in the Internet, and with the emergence of UDP as a core protocol for evolving Datagram transport [6], an important question emerges: is the current network transport API fit for purpose?

This paper explores whether the simple UDP socket API is sufficient for the next step in the evolution of Internet transport. The alternative is that applications benefit from a higher-level API that builds upon thirty years of experience of using the network. We examine whether this new API could open up access to network functions, such as Quality of Service (QoS), Explicit Congestion Notification (ECN), and control of packet size. We also examine whether it could help enable session-level functions, such as path selection for multihoming, mobility, and firewall punching. Finally, we consider whether it could provide greater support for fault reporting in increasingly complex network topologies.

This work is part of a larger effort to design, implement, and deploy new transport services and APIs. This effort includes the IETF Transport Services (TAPS) working group, which defines and documents the transport services available to applications. It also includes the IETF QUIC and RTCWEB working groups, which define new transports running over UDP. Research projects such as the EU NEAT project [7] contribute too; NEAT is building a system to enable transport evolution and ease deployment.

The next section identifies some issues applications have when using the UDP Socket API. Section III examines how changes in the protocol stack above and below the current API can help applications evolve. Section IV presents an API to address the issues raised. Section V discusses how enabling evolution is only possible with a new API. The paper concludes in Sections VI and VII with a brief look to the future and a discussion.

ISBN 978-3-901882-94-4 © 2017 IFIP
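The introduction's remark about obscure setsockopt() calls, and its mention of QoS and ECN, can be illustrated with a short C sketch. The sketch is not taken from the paper. It assumes a POSIX system on which IP_TOS is declared in <netinet/in.h>, as on Linux, and the helper names tos_byte() and mark_datagrams() are hypothetical. It sets the former IPv4 TOS byte of a UDP socket to a DSCP plus ECN codepoint. Success only means that the local stack accepted the value, not that the network path will honour or preserve it, so the application has to carry its own fallback.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

#define DSCP_EF   46    /* Expedited Forwarding DSCP */
#define ECN_ECT0  0x02  /* ECN-Capable Transport codepoint ECT(0) */

/* Hypothetical helper: the DSCP occupies the upper six bits of the former
   IPv4 TOS byte, and the ECN field the lower two bits. */
int tos_byte(int dscp, int ecn)
{
    assert(dscp >= 0 && dscp < 64);
    return (dscp << 2) | (ecn & 0x03);
}

/* Hypothetical helper: request a DSCP/ECN marking for datagrams sent on an
   IPv4 UDP socket. If the stack rejects the value, fall back to the default
   marking (best effort, Not-ECT). Returns 0 if either value was accepted.
   Acceptance says nothing about whether the path will preserve the marking. */
int mark_datagrams(int fd, int dscp, int ecn)
{
    int tos = tos_byte(dscp, ecn);

    if (setsockopt(fd, IPPROTO_IP, IP_TOS, &tos, sizeof tos) == 0)
        return 0;
    perror("setsockopt(IP_TOS)");
    tos = 0;   /* application-provided fallback */
    return setsockopt(fd, IPPROTO_IP, IP_TOS, &tos, sizeof tos);
}
```

A caller would typically create the socket with socket(AF_INET, SOCK_DGRAM, 0) and call mark_datagrams(fd, DSCP_EF, ECN_ECT0) before the first sendto(). Falling back to the default marking keeps the socket usable. It is also exactly the kind of per-application policy decision that the paper argues could move below a higher-level Datagram API.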


II. BACKGROUND

The Socket API closely models the file system API. Calls to send and receive are mapped to performing read and write calls on the socket for the network connection. Datagram-oriented protocols are modeled as atomic read and write socket operations that either succeed or fail depending on the buffer size. UDP is offered in this API as either a connected or an unconnected transport. The default, unconnected state allows a sender to send datagrams to an IP address given with each call. Connecting a UDP socket causes the socket to pass ICMP errors up to the application. Connections have no side effects on the wire, offering only a shortcut to applications by using the explicit connected address for datagrams [3].

The UDP API offers only a few methods to access its minimal services. Applications can create a socket, look up a host, connect, set options, and send and receive data, as illustrated by the code for a typical client in Listing 1.

    #include <netdb.h>
    #include <stdbool.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    #define SPORT "4950"      /* server port (example value) */
    #define MAXBUFLEN 100

    int main(int argc, char *argv[])
    {
        int sockfd = -1, rv;
        ssize_t numbytes;
        struct addrinfo hints, *servinfo, *p;
        struct sockaddr_storage their_addr;
        socklen_t addr_len;
        char buf[MAXBUFLEN];

        if (argc != 2) { fprintf(stderr, "usage: %s host\n", argv[0]); return 1; }
        /* Look up the remote host, accepting IPv4 or IPv6 datagram addresses. */
        memset(&hints, 0, sizeof hints);
        hints.ai_family = AF_UNSPEC;
        hints.ai_socktype = SOCK_DGRAM;
        if ((rv = getaddrinfo(argv[1], SPORT, &hints, &servinfo)) != 0) {
            fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(rv));
            return 1;
        }
        /* Choose the first returned address for which a socket can be created. */
        for (p = servinfo; p != NULL; p = p->ai_next)
            if ((sockfd = socket(p->ai_family, p->ai_socktype, p->ai_protocol)) != -1)
                break;
        if (p == NULL) { fprintf(stderr, "talker: failed to create socket\n"); return 2; }
        /* Send and receive datagrams until the application completes. */
        while (true) {
            if ((numbytes = sendto(sockfd, "hello", strlen("hello"), 0,
                                   p->ai_addr, p->ai_addrlen)) == -1) {
                perror("talker: sendto");
                exit(1);
            }
            addr_len = sizeof their_addr;
            if ((numbytes = recvfrom(sockfd, buf, MAXBUFLEN - 1, 0,
                                     (struct sockaddr *)&their_addr, &addr_len)) == -1) {
                perror("recvfrom");
                exit(1);
            }
        }
        freeaddrinfo(servinfo);
        close(sockfd);
        return 0;
    }

Listing 1. Example of a client application using the UDP Socket API. (The example client looks up the remote host, chooses an IP address, and settles into a loop of sending and receiving data until the application completes.)

An application may modify protocol options via the setsockopt/getsockopt API calls. These calls provide the only way to interact with the lower networking stack. Commonly used options control the differentiated services code point (DSCP), the ECN field, and the hop count for the IP datagram. Others control the link maximum transmission unit (MTU) and the "don't fragment" (DF) bit in the IP header. The setsockopt API allows applications to set options, but provides no mechanism for discovering whether they will work and no path for falling back to options known to always work. This can make it dangerous to use QoS or ECN: if the application has to provide fallback code, it is more likely to stick to a safe set of values. Further, the set of options has evolved over time, is inconsistent between platforms, and often presents variants of the same function. This complicates application portability between platforms. Furthermore, the same setsockopt API is used to control features that are semantically not socket options [2]. For example, the IP_ADD_MEMBERSHIP option triggers an IGMP join of a multicast group, with semantics closer to those of connect().

A. What features are missing from the Sockets API?

Establishing Connectivity: TCP-based applications tend to use one of a small number of protocols, e.g., HTTP, SSH, or FTP, and typically run in a client-server manner. The connect(), listen(), and accept() API fits this use case cleanly. It is also straightforward for Network Address Port Translation (NAPT) or firewall traversal. Ports are opened for the 5-tuple representing the connection in response to outgoing connection establishment packets. The traffic is inspected to ensure it looks like the corresponding protocol, and the connection is closed when a FIN is seen (or after a timeout).

There is a diverse set of applications built on UDP that themselves need to perform some form of connection establishment. Many more protocols are in use, and communication patterns are more varied and often peer-to-peer. In addition, because the transport protocol is stateless, middleboxes that track transport protocol state to maintain holes must resort to timers to keep the firewall open. This environment makes it likely that UDP-based applications will encounter connectivity issues. This is especially true when the remote endpoint is a peer that is also behind a NAPT. One solution to this problem combines several mechanisms. STUN [8] determines the NAPT binding and probes connectivity. TURN relays act as dynamically configured proxies for UDP [9] or TCP [10] flows. The Interactive Connectivity Establishment (ICE) algorithm [11] categorises network impediments and systematically probes connectivity. Finally, a relayed signalling protocol is used to rendezvous with the remote host and exchange candidate addresses for connectivity. While the signalling is likely inherently application specific, the other functions could be implemented generically, as a path layer below the socket API. This would avoid having each application implement the entire complex NAPT traversal stack.¹

¹ This can be viewed as a generalisation of the Happy Eyeballs connection racing technique [12] used by TCP applications to probe IPv4 and IPv6 connectivity. That, too, would benefit from a consistent implementation in the socket layer.

Support for multiple interfaces: An application running on a multihomed host has to account for the presence of multiple interfaces. It must also allow for these interfaces varying in properties and connectivity over time (mobility). The present API does not make it easy for applications to discover the local interfaces or their properties; applications must use their own methods to discover what is working. Applications must determine whether network information gathered on one interface is valid on another interface. Issues can arise with locality if name servers are used across domains for accessing resources. DNS resolution on multihomed systems can also be problematic [13]. The interface used for name resolution (getaddrinfo()) does not support multiple interfaces, and


complications can result with geographic load balancing and applications that require mobility.

Control over quality of service and reliability: TCP provides a reliable, ordered, byte-stream service that is subject to head-of-line blocking while waiting for retransmissions of lost packets. There is no portable way to inspect the receive buffer, or to access data out of order [14]. When needed, UDP-based applications must implement (partial) reliability above the API. The developer has to take responsibility for building a solid network system [3].

Congestion control: For TCP, congestion control is assumed to take place below the Sockets API, and there is no interface to select the algorithm or query congestion state. For UDP-based applications, congestion control must be implemented above the API, with no support from the socket.

In summary, the socket API has been a companion for developers writing application protocols for decades, but the interface is starting to show its age. It provides a poor API for many important features, and requires applications to implement other features in their entirety. Libraries can be integrated into an application to address these issues, but integrating such a library requires modifications to the code base, and the library's event model has to be made compatible with the application. Supporting each new feature means the application has to integrate with more libraries, with a corresponding increase in complexity and maintenance costs. A new, standard API is needed.

III. RAISE THE DATAGRAM API

The current socket API is too low level. Even applications that need direct access to the network can benefit from a higher-level Datagram API. By placing commonly needed functions below this API, applications can specify what they want from the stack, but allow the system below the API to perform the actions needed to realise a service. However, it is not immediately obvious which of the functions identified in Section II-A should lie below a new Datagram API, and which should remain in the application. On the one hand, part of the success of UDP was an API that lets the application choose how to interact with the network. On the other hand, applications increasingly require solutions to the same set of problems; implementing these below the API can benefit from the wider context of the network paths and interfaces, and allows mechanisms to evolve independently of applications.

We use the following principles to guide our choice of which functions should be placed below the Datagram API:
1) An application using the new API that does nothing new should receive at least a similar service to that of the socket API.
2) Commonly needed functions should be placed below the API when these are automatable (do not require application decisions).
3) Functions where the preference can be expressed as a policy can also be placed below the API.
4) Functions that rely on application algorithms or detailed knowledge of trade-offs relating to data should be implemented above the API [3] (e.g., choice of codec to meet congestion control constraints in a conferencing application, or how to trade loss vs. capacity constraints).

A richer API should allow an application to request a set of abstract properties for the transport service it desires (e.g., requiring a datagram service, whether high capacity is needed, whether there is benefit from low latency, whether low cost is preferred, etc.). Understanding application needs helps a Datagram API, because it can then automate functions that are hard for an application to optimise.

A. Below the Datagram API

A higher-level API can reduce the volume of code required to build an Internet application; it can also significantly reduce the complexity the application has to manage, and it provides a starting point to automate appropriate choices below the API. The system below the API needs to interpret properties from the application together with system-wide properties. Turning these into concrete actions requires a policy system to select protocol mechanisms, help discover interfaces and inform parameter choices. For example, a video streaming application could request properties that indicate a minimum capacity required for the datagram service and QoS preferences to minimise latency while constraining cost. Listing 2 shows an example JSON policy file that indicates a QoS Live Video precedence.

    {
      "transport": [
        { "value": "Datagram", "precedence": 1 }
      ],
      "qos": [
        { "value": "Interactive Video", "precedence": 1 },
        { "value": "Live Video", "precedence": 2 }
      ],
      "network": [
        { "value": "cost", "precedence": 1 },
        { "value": "capacity", "precedence": 2 }
      ]
    }

Listing 2. Example JSON file describing a NEAT Abstract Policy

To provide network context for functions below the API, information needs to be gathered about the properties of network paths and network service interfaces. This knowledge base can be related to policy and application requirements so that the application can rely on the system making good choices about how to use the network. At the simplest level, this implies understanding the available network service interfaces, by gathering information (e.g., MTU, line rate, address) about local physical and virtual interfaces (e.g., across tunnels or source addresses that bind to provisioning domains).

TCP maintains information about the paths that have been used from an endpoint, and similar data may be collected for use by UDP (such as the path MTU, capacity recently used, etc.). This can also help eliminate transport candidates that include protocols that are known not to be supported on a specific path. Further information may be gleaned from the experience of protocols using a path, including experienced round-trip time (RTT) and capacity insight from coupled congestion control [15]. Other functions could also be automated here, such as NAPT keep-alive and black-hole detection, easing the tasks of finding a candidate path, failing over between paths, and using multiple paths concurrently.

We also note that application developers and users need to be able to understand the decisions made on behalf of the application. While good decisions are expected to be taken most of the time, there is a need to understand why a particular policy or application property resulted in a particular choice. This supports troubleshooting and allows policies to be refined when needed; this in itself is valuable compared to the information currently made available by the UDP socket API.

B. Above the Datagram API

We recognise that some functions cannot be easily migrated below the API. While datagram congestion control can benefit from standard mechanisms/algorithms, the details are often linked to application design, so applications have to provide their own congestion control. This function is expected to remain above the API. In contrast, the system below the API could offer circuit breaker functions when required to control the envelope of the capacity consumed by an application [3].

NAPT traversal could be automated for simple cases, but many applications need complex processing to finally select amongst a set of transport candidates. This is often complicated by the need to interact with rendezvous points and signalling intermediaries, and to understand session-level negotiation dialogues. For these reasons, more sophisticated applications are likely to continue to use ICE libraries to perform NAPT traversal. Nonetheless, the availability of information from below the API (such as speed, cost, reliability) can help select candidates. The automation of path-related functions such as keep-alive and path MTU discovery can eliminate features that would otherwise need to be implemented above the API by an application.

The transport 5-tuple of source IP, source port, destination IP, destination port and transport protocol is used to identify datagrams forming a flow. If an application is multi-homed or mobile between multiple network interfaces, the 5-tuple cannot be used to identify the endpoint. Mobility between interfaces requires context (including a connection ID) beyond individual flows and can outlive transport usage; as such, it is primarily an application function, although such mechanisms may take advantage of context information gathered below the API.

C. Traversing the Datagram API

Some functions require cooperation between the application and transport to be effective, and straddle the Datagram API. An example might be congestion control for an interactive video conference. This has strict timing constraints: audio frames must be sent every 20 ms and video frames every 1/60th of a second, but there is some flexibility to change what is being sent, if not when; this requires the cooperation of the media codecs. Real-time performance requires the application to be tightly coupled with the congestion controller, both for the application to respect the congestion constraints and for the congestion control to respect application limitations.

IV. THE DATAGRAM API FOR THE NEAT SYSTEM

This section provides a concrete example of some of the API aspects discussed in the paper, based on the open-source NEAT System [16], developed as part of the EU NEAT project [17]. Designed as a replacement for the socket API, this provides a one-sided change to the transport API at the sender.

The new API offers applications access to abstract transport services. This allows selection between the available transport protocols, including TCP, SCTP, SCTP/UDP, UDP and UDP-Lite, via a single unified API. Mechanisms beneath the API provide many functions, including helping to discover the set of protocols that may work across an Internet path.

A simple example in Listing 3 illustrates the lifetime of an application using the NEAT System. The application creates a NEAT context, within which it then creates a NEAT flow, using application policies passed in JSON to describe the abstract properties it requires or desires. The Policy Manager combines these policies with a globally configured policy to inform its decisions, e.g., to generate a list of transport candidates. The NEAT Characteristic Information Base (CIB) is populated with information about the network interfaces and paths, allowing decisions to also consider network, path and transport statistics.

    static struct neat_flow_operations ops;
    static struct neat_ctx *ctx = NULL;
    static struct neat_flow *flow = NULL;

    ctx = neat_init_ctx();
    flow = neat_new_flow(ctx);
    prop = "(see Listing 2)";
    neat_set_property(ctx, flow, &prop);
    ops.on_writable = on_writable;
    ops.on_readable = on_readable;
    ops.on_error = on_error;
    neat_set_operations(ctx, flow, &ops);
    neat_open(ctx, flow, hostname, port);
    neat_start_event_loop(ctx, NEAT_RUN_DEFAULT);

    static neat_error_code
    on_writable(struct neat_flow_operations *opCB)
    {
        neat_write(opCB->ctx, opCB->flow, buf);
        return NEAT_OK;
    }

    static neat_error_code
    on_readable(struct neat_flow_operations *opCB)
    {
        neat_read(opCB->ctx, opCB->flow, buf);
        return NEAT_OK;
    }

Listing 3. NEAT Example Application listing

Rather than an imperative, polling-based socket API, an application uses a callback-based API to access the NEAT System. It therefore needs to provide a set of callback handlers for each NEAT Flow. In Listing 3, the application sets up the on_writable, on_readable and on_error callbacks. The application then calls neat_open with the name and port of a listening server.

The key difference between Listings 3 and 1 is that the NEAT System performs common network actions automatically on behalf of the application. The example policy in Listing 2 illustrates a high-level abstract transport request that can inform selection of an appropriate DSCP, and help identify transport candidates when multiple network interfaces are active. Once created, a NEAT flow is comparable to a socket, but offers much more utility, using the callback-based API to call an application on network events or data.

For connection-oriented protocols, the NEAT System can select an appropriate transport configuration for a flow using Happy Eyeballs selection logic [18] to choose between transport candidates and instantiate a concrete operating system socket. The process of selecting a transport is different for datagram services that do not have a connection setup (e.g., UDP or UDP-Lite). The NEAT System cannot know whether a datagram flow has suffered a connectivity failure (e.g., by expiry of NAPT state, routing changes, or from choice of a DiffServ Code Point that is not available). Such information is only known at the application layer, through the reception of receiver-generated feedback messages.

Datagram applications can use the Happy Applications mechanism to register a periodic callback. This allows mechanisms below the API to query the application, asking whether it is 'happy' with the progress of a NEAT flow, so that datagram applications can benefit from automatic selection and fallback handled by the NEAT System. Because the application decides what makes it 'happy', it can trigger selection mechanisms based on application-level criteria. If a certain capacity and latency were requested and the chosen transport candidate was not able to satisfy them, the system now has the correct signals to choose a second transport candidate.

On completion, the application may retrieve NEAT Flow parameters to determine the transport state (e.g., addresses, port numbers, DSCP, etc.).

The NEAT System leaves implementation of support for mobility and/or ICE to datagram applications. Similarly, libraries to support application-oriented functions, such as the Real-time Transport Protocol (RTP), can be used over the NEAT API.

V. SUPPORTING PROTOCOL EVOLUTION

One key advantage of raising the API is the ability to enable protocol evolution [16] in support of new application requirements. Datagram transport protocols can be developed independently of the kernel, and our more expressive API eases this evolution. It also enables innovation in the use of the network, by easing the introduction of new protocols and mechanisms, and by taking advantage of these when they become available in the network. A particular supporting protocol may only be available in a few networks or supported by limited equipment, which might be insufficient to justify inclusion in every application; but the features it offers may be attractive across a range of applications when it happens to be supported, which could justify inclusion below the API.

We consider three examples of mechanisms below the Datagram API: protocols to supply provisioning domain (PvD [19]) information, protocols to communicate path information (PLUS [20]), and an extension to UDP to permit transport options. All are work in progress in the IETF, available as Internet-Drafts, but none is yet fully specified or implemented.

There are a growing number of devices that are capable of connecting to multiple network services. These devices may have multiple physical interfaces, and could additionally support virtual interfaces (e.g., able to send using multiple IPv6 address prefixes). PvD is an architecture for endpoints in a multiple network interface environment to discover network configuration information. A PvD-aware endpoint can use a protocol to discover authoritative information such as source address prefixes, DNS server locations, HTTP proxy location and default gateway address, and this could be extended to include characteristics of the service (e.g., maximum capacity available, existence of supported QoS services, cost of using an interface, etc.). The new PvD protocols provide a way for an endpoint stack to select transport candidates and also to assist in configuring the protocols for the local network service. In addition, applications could benefit from an interface to the PvD information; in the NEAT System this type of function is provided by the Policy Manager.

Path Layer UDP Substrate (PLUS) is a proposed encapsulation header and protocol that provides bi-directional communication over UDP. This is intended to convey selected transport information to middleboxes on an Internet path, even in the face of pervasive application encryption. The transport-agnostic method assists state management for pinholes through NAPTs, firewalls and other boxes on the network path. Extensions to PLUS can provide path information (e.g., advice on MTU or available capacity) that may help a transport protocol and can inform selection of a suitable transport candidate. PLUS could also evolve to support non-data-related diagnostics, e.g., measuring flow progress, duplication/loss, relative fairness, etc.

Another proposal suggests adding options to UDP [21]. This uses the UDP length field, in a way resembling its use in UDP-Lite, but in this case to provide a field in which options can be attached to any datagram. UDP Options could be used by mechanisms below the Datagram API to provide a standard way to communicate control information. This receiver demultiplexing lets the transport extract control information not intended for the application. This would therefore allow the stack to send probes/measurements on behalf of the application, such as using an "echo this data" message for RTT measurement, keep-alive probes, Path MTU probes, etc.

It is too early to tell whether there is merit for the community at large in using new service discovery methods (such as PvD), a new encapsulation (such as PLUS) or an update to UDP (such as UDP Options). However, we do note that transition to support such transport evolution would be greatly eased by the presence of a higher-level Datagram API.
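The Happy Applications fallback described in Section IV can be simulated in a few lines of C. In this sketch, a loop stands in for the periodic timer, the callback signature and the app_state fields are hypothetical (this is not the NEAT callback API), and 'happiness' is reduced to a goodput threshold that the application measured itself. The point is only the division of labour: the application supplies the criterion, while the layer below the API performs the fallback to the next candidate in a ranked list.

```c
#include <assert.h>
#include <stddef.h>

/* The verdict an application returns when asked about its flow. */
enum happiness { APP_HAPPY, APP_UNHAPPY };

/* Hypothetical callback type: asked about the candidate currently in use. */
typedef enum happiness (*happy_cb)(size_t candidate, void *app_state);

/* Hypothetical application state: goodput the application itself measured
 * on each candidate (kbit/s), and the goodput it needs to be "happy". */
struct app_state {
    const unsigned *goodput_kbps;
    unsigned needed_kbps;
};

/* Example application criterion: happy if measured goodput meets the need. */
static enum happiness video_app_is_happy(size_t candidate, void *arg)
{
    const struct app_state *s = arg;
    return s->goodput_kbps[candidate] >= s->needed_kbps ? APP_HAPPY
                                                        : APP_UNHAPPY;
}

/* Stands in for the periodic timer below the API: the application is asked
 * `checks` times, and each APP_UNHAPPY answer moves the flow to the next
 * candidate in the ranked list. Returns the candidate in use afterwards,
 * or n if every candidate was rejected. */
size_t run_happy_checks(size_t n, happy_cb ask, void *app_state,
                        unsigned checks)
{
    size_t current = 0;
    assert(ask != NULL);
    for (unsigned t = 0; t < checks && current < n; t++) {
        if (ask(current, app_state) == APP_UNHAPPY)
            current++; /* fall back to the next transport candidate */
    }
    return current;
}
```

With measured goodputs of 800, 2500 and 3000 kbit/s on three candidates and a need of 2000 kbit/s, the first check reports 'unhappy', the flow moves to the second candidate, and it stays there because later checks report 'happy'. A real system would probably also want some hysteresis, so that a single bad interval does not trigger a switch.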

[Figure: two protocol stacks; block labels include Application, Socket API, Datagram API, Transport, Policy Manager, CIB, UDP, Transport Demux and IP]
Fig. 1. Stack Evolution: Left, the traditional socket API. Right, a new Datagram API utilising Policy and Transport-layer Demultiplexing.

VI. LOOKING TO THE FUTURE

The network has become ossified, and experience has shown it has been virtually impossible to deploy transport protocols with different IP protocol numbers. TCP development requires modifications in operating system kernels, needing a large effort for the developer to deploy enhancements. The time required to have enough hosts running an enhancement to see a benefit impedes iteration.

Accordingly, new protocol development is happening on top of UDP (Figure 1, left stack). This has several advantages. First, and most critically, it enables the permissionless end-to-end deployment of new transports: UDP is deployed widely enough that it can be expected to work in most networks. Secondly, a UDP demultiplexing substrate introduces minimal bandwidth and processing overhead. In addition, there is already at least some support in middlebox devices (NAPTs, firewalls) that can be used as a starting point for deployment. Finally, the UDP API is widely supported, allowing user-space stacks to access the network directly without requiring special privileges. The latter overcomes the time and effort required to integrate a new transport across a range of operating systems.

The history of the Stream Control Transmission Protocol (SCTP) illustrates the benefits of using UDP as a demultiplexing substrate. SCTP has an assigned IP protocol number (132), and is moderately widely implemented as a native transport, but has seen only limited deployment because it does not pass residential NATs/firewalls. When running over UDP, as in the WebRTC data channel [22], however, SCTP has seen worldwide deployment in web browsers, in part because of its ease of implementation in user space, and in part because it is not blocked by most firewalls/NATs.

Large developers are evolving their applications: we see efforts from Facebook, Google, Apple and others to develop new protocols on top of UDP. The new protocols offer a higher-level API to the applications, but this API is locked away under layers of application state. If you are not the browser vendor with the new HTTP transport protocol, you are playing catch-up to remain on a level with their networking stack. Developers that do not have the same scale of resources have access only to the socket API, and this is not sufficient to continue to evolve on the Internet.

VII. CONCLUSION

UDP is increasingly playing the role of a demultiplexing substrate layer, dynamically binding the transport protocol to a signalled "port" number. The UDP Sockets API needs to evolve to be less an application programming interface and more a transport protocol interface. The usefulness of the present UDP Sockets API has passed. It is time to raise the Datagram API to support transport protocol evolution.

The application interface must migrate up the stack, to provide a higher level of abstraction for applications, while allowing transport flexibility to meet their needs. This provides a substrate for new low-level transport protocol development, while providing the transport services needed by next-generation applications.

ACKNOWLEDGMENT

This work has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No. 644334 (NEAT). The views expressed are solely those of the author(s).

REFERENCES

[1] J. Postel, "User datagram protocol," IETF, RFC 768, August 1980.
[2] G. Fairhurst and T. Jones, "Features of the user datagram protocol (UDP) and lightweight UDP (UDP-lite) transport protocols," IETF, Work in progress, October 2016.
[3] L. Eggert, G. Fairhurst, and G. Shepherd, "UDP usage guidelines," IETF, RFC 8085 (BCP), 2017.
[4] J. Iyengar and M. Thomson, "QUIC: A UDP-based multiplexed and secure transport," IETF, Internet-Draft, January 2017.
[5] S. McQuistin and C. S. Perkins, "Reinterpreting the transport protocol stack to embrace ossification," in Proc. Workshop on Stack Evolution in a Middlebox Internet. Zürich, Switzerland: IAB, January 2015.
[6] T. Herbert, L. Yong, and O. Zia, "Generic UDP encapsulation," IETF, Internet-Draft, October 2016.
[7] NEAT, "NEAT Project," https://www.neat-project.org, 2017.
[8] J. Rosenberg, R. Mahy, P. Matthews, and D. Wing, "Session traversal utilities for NAT (STUN)," IETF, RFC 5389, October 2008.
[9] R. Mahy, P. Matthews, and J. Rosenberg, "Traversal using relays around NAT (TURN): Relay extensions to session traversal utilities for NAT (STUN)," IETF, RFC 5766, April 2010.
[10] S. Perreault and J. Rosenberg, "Traversal using relays around NAT (TURN) extensions for TCP allocations," IETF, RFC 6062, November 2010.
[11] J. Rosenberg, "ICE: A protocol for NAT traversal for offer/answer protocols," IETF, RFC 5245, April 2010.
[12] D. Wing and A. Yourchenko, "Happy eyeballs: Success with dual-stack hosts," IETF, RFC 6555, April 2012.
[13] T. Savolainen, J. Kato, and T. Lemon, "Improved recursive DNS server selection for multi-interfaced nodes," IETF, RFC 6731, December 2012.
[14] S. McQuistin, C. S. Perkins, and M. Fayed, "TCP Hollywood: An unordered, time-lined, TCP for networked multimedia applications," in Proc. Networking Conference. Vienna, Austria: IFIP, May 2016.
[15] S. Islam and M. Welzl, "Start me up: Determining and sharing TCP's initial congestion window," in Proc. IRTF ANRW. ACM, 2016.
[16] K.-J. Grinnemo, T. Jones, G. Fairhurst, D. Ros, A. Brunstrom, and P. Hurtig, "Towards a flexible internet transport layer architecture," in Proc. LANMAN. IEEE, June 2016.
[17] NEAT, "NEAT Source Code," https://github.com/NEAT-project, 2017.
[18] G. Papastergiou, K.-J. Grinnemo, A. Brunstrom, D. Ros, M. Tüxen, N. Khademi, and P. Hurtig, "On the cost of using happy eyeballs for transport protocol selection," in Proc. IRTF ANRW. ACM, 2016.
[19] D. Anipko, "Multiple provisioning domain architecture," IETF, RFC 7556, June 2015.
[20] B. Trammell and M. Kuehlewind, "Path layer UDP substrate specification," IETF, Work in progress, December 2016.
[21] J. Touch, "Transport options for UDP," IETF, Work in progress, February 2017.
[22] H. Alvestrand, "Transports for WebRTC," IETF, Work in progress, October 2016.

D Paper: A Datagram API for Evolving Networks Beyond 5G

The following research paper [39] has been produced by project participants and was presented at the European Conference on Networks and Communications, June 2017.

A Datagram API for Evolving Networks Beyond 5G

Tom Jones, Gorry Fairhurst (University of Aberdeen, Aberdeen, U.K.; Email: {tom, gorry}@erg.abdn.ac.uk); Eric Vyncke (Cisco, Brussels, Belgium)

Abstract—The UDP Socket API has matured to support a Application wide range of uses, while only offering minimal services to applications. This poster describes the high-level Datagram API Application NEAT API offered by the NEAT System and how this enables applications to take advantage of information about the available network Socket API NEAT Stack Policy Manager services, to effectively use next generation networks. UDP Transport Transport CIB 1 I.INTRODUCTION IP IP UDP PvD The current socket API presented to an application using 2 OPT 3 datagram services is very simple. This means it does not have IP Interface IP Interfaces the context to make effective choices to support use the most efficient use of the network. Network traffic can sometimes Fig. 1. Left, The traditional socket API, operating ove a single IP Interface. be sent over an interface that poorly matches the requirements Right, A higher API utilising Policy operating over many networks of the application, because the stack does not know which interface is best and the application cannot tell the stack what it needs. Changes are needed to both prevent wasted wireless A wealth of information is already becoming available spectrum and avoid poor application performance. to endpoints about the local network environment and path. It is particularly important for applications to make good However, this information is only available to applications decisions about the way they use the network, as applications using high level interfaces, e.g. an application on iOS using become more demanding (e.g., high definition TV, ultra-low nsurlconnection to automatically roam from WiFi to latency tactile Internet) and links have more diverse network 4G/5G as the device approaches the WiFi network edge. Each characteristics - ranging from robust low-speed services to application must incorporate specific support libraries. 
the large capacities of 802.11ac supplying above 1 Gbps (and We conclude that the current standard datagram API does promises higher capacity from 90 GHz technologies). not have the functionality for the network to evolve. Something Many modern devices have more than one interface. In- more is needed as the network continues to evolve, to more terface properties can differ significantly depending on un- effectively use the network. derlying network technology, loading and congestion. Re- cent advances in wireless technology further increase the III.RAISETHE DATAGRAM API heterogeneity of networks. Applications often face a brittle We introduce a higher-level datagram API that allows an experience, with extreme changes in network characteristics application to express requirements and expectations about the over short periods, and need to make decisions about which service to be provided while enabling mechanisms beneath the path is best for a specific application. API to automate decisions. This approach has been adopted This turns out to be a tricky balancing problem - on the in the system created by the New Evolutive API for Internet one hand, network operators have a clear understanding of the Transport (NEAT) [2] EU H2020 project. NEAT builds a high services provided by the technologies they use (often including level Open Source [3] callback driven API on the socket API. loading of the local network infrastructure), but there is no standard way to make this available to applications. On the A. The NEAT API other hand, applications can use congestion control and prob- The new API [4] offers applications access to abstract ing techniques to find out what can be usable, but its unfeasible transport services, via the NEAT API. A NEAT Flow is a and undesirable to simply probe all paths continuously. 
single abstract construct for accessing the network making it possible for an application to be given different underly- II.THE DATAGRAM SERVICE API ing transport protocols. The NEAT System guarantees API The minimal UDP API [1] provides no facilities to help compatibility making applications independent of underlying UDP applications. This minimal service set means that ap- network capabilities. plications have to commonly implement a set of higher level The NEAT API allows applications to specify policy. A transport mechanisms. NEAT Application can request a set of high level services 978–1–5386–3873–6/17/$31.00 c 2017 IEEE it needs from the network with selectable costs. A Policy

81 of 141 Project no. 644334 D3.3 Confidential Extended Transport System and Transparent Support of Non-NEAT Applications Rev. 1.0/ November 30, 2017

Manger (PM) provides candidates to choose transport and Using these interfaces a NEAT application requires a data- network parameters based on both the application and global gram service with a latency maximum of 20ms RTT and a system policy. These decisions utilise information stored in capacity of 2Mbit/s. The PM looks at the current CIB sources the Characteristic Information Base (CIB), which gathers (information from previous NEAT flows and connectivity char- information from CIB sources about the paths that NEAT acteristics signalled via PvD). Flows may use. Both the WiFi and the 5G links are able to match the We focus on three types of CIB source, figure 1: capacity requirements requested by the application, but the 5G 1) Local information other observed NEAT Flows or local network has a high price per byte. The NEAT Stack creates interface parameters. an initial flow for the application on the WiFi link and uses 2) Data provided network signalling protocols from the a UDP Option to measure the path RTT. By combining a local network operator. time stamp option with adding a datagram, a sender can then 3) Data determined using probing for path information with perform Packet Layer Path MTU discovery and hence allow an explicit control data to monitor path status application to efficiently segment its data even for a network path with multiple levels of tunnel encapsulation. B. Signalling Network Properties NEAT can support Abstract QoS mapped to DiffServ code A next generation network operator can provide Provision- point (DSCP), the CIB sources can provide a mapping relevant ing Domain (PvD) [5] information about networks it provides. to the current network. When supported the NEAT Stack can For each IP interface, this information could include IP address use UDP Options to validate the provided DSCP actually prefixes, HTTP default gateways, maximum throughput, cost, supplies the requested service, e.g. if a particular DSCP is etc. 
An endpoint can use the information to determine the best interface for an application to use from those available. For example, the cost characteristic could be used to direct bulk data transfers over WiFi, while using cellular services for applications that require constant availability, such as voice.

C. Probing for Datagram Path Information

UDP Options [6] is an example of introducing a new experimental transport mechanism. It extends UDP to support transport options. Option space is created by using the UDP Length field to indicate only the length of the UDP datagram carrying the payload, rather than the length of everything that follows the IP header, while the IP length indicates the length of the datagram plus the option space.

UDP Options make it possible for a higher-level API to implement new features for the Datagram service. For example, options make it possible to send a timestamp that is returned by the remote endpoint. This allows a NEAT System to determine the RTT of the UDP path. This sort of path information can contribute to the CIB, e.g., allowing a NEAT System to understand whether a particular path is operational and helping to ensure that application traffic is appropriately matched to the best path. This feature is important to allow the stack to evolve to support different types of network services, e.g., to utilise higher-speed opportunistic networks.

D. An Example of Using NEAT to Enable Network Evolution

This poster will provide examples to illustrate how a higher-level Datagram API enables applications using the NEAT System to specify policies that the PM can use to choose and configure a datagram transport service.

Consider, for example, an endpoint with both WiFi and 5G [7] interfaces. The WiFi IP interface is an open service with no QoS guarantees. The cellular service presents multiple (four in Figure 1) 5G IP interfaces. These use multiple IPv6 prefixes (one perhaps related to a user's personal subscription, one an open service, and others extending their employer's network), all simultaneously available. The PvD information from the mobile operator indicates the latency, capacity and cost.

blocked the NEAT Stack can fall back to using a working DSCP.

IV. CONCLUSION

The usefulness of the present UDP Sockets API has passed. It is time to raise the Datagram API to support evolution. A new API will allow the network stack to evolve to support the greater range of heterogeneity offered by 5G cellular systems and to gain advantage from new protocol developments (e.g., PvD and UDP Options). Applications can be enabled to specify preferences and requirements for the network service they wish to receive. Understanding the properties of the actual network services immediately enables more intelligent use of the best (cheapest, fastest, least delay, most reliable) network.

Below the new API, the stack can take advantage of network characteristic information to build a corpus of data about available paths and offer a safe fallback that allows applications to delegate the choice to use a feature/service to the stack. This becomes important as higher speeds (often less predictable) emerge, new services develop and new transport mechanisms become available.

ACKNOWLEDGMENT

This work has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No. 644334 (NEAT). The views expressed are solely those of the author(s).

REFERENCES

[1] G. Fairhurst and T. Jones, "Features of the user datagram protocol (UDP) and lightweight UDP (UDP-lite) transport protocols," IETF, Work in progress, October 2016.
[2] NEAT, "NEAT Project," https://www.neat-project.org, 2017.
[3] github, "NEAT Source Code," https://github.com/NEAT-project, 2017.
[4] K.-J. Grinnemo, T. Jones, G. Fairhurst, D. Ros, A. Brunstrom, and P. Hurtig, "Towards a flexible internet transport layer architecture," in Proc. LANMAN. IEEE, June 2016.
[5] B. Bruneau, P. Pfister, D. Schinazi, T. Pauly, and E. Vyncke, "Proposals to discover provisioning domains," IETF, Work in progress, March 2017.
[6] J. Touch, "Transport options for UDP," IETF, Work in progress, February 2017.
[7] P. S. Schmidt, T. Enghardt, R. Khalili, and A. Feldmann, "Socket Intents: Leveraging Application Awareness for Multi-Access Connectivity," in ACM CoNEXT.
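To make the option-space mechanism of Section C more concrete, the following is a minimal Python sketch of how a receiver could separate the UDP payload from the trailing option area. It assumes, as described above, that the UDP Length field covers only the UDP header and payload and that everything the IP length covers beyond that is option space. It deliberately ignores option encoding, alignment and checksums, and the function names are hypothetical; they are not part of NEAT or of the UDP Options proposal [6].

```python
import struct
from typing import Tuple

UDP_HEADER_LEN = 8  # source port, destination port, length, checksum


def build_udp_with_options(sport: int, dport: int, payload: bytes, options: bytes) -> bytes:
    """Build the bytes that follow the IP header: UDP header, payload, option area.

    The UDP Length field covers only the header and payload; the option bytes
    are appended after that, so only the (larger) IP length accounts for them.
    The checksum is left as zero to keep the sketch short.
    """
    udp_len = UDP_HEADER_LEN + len(payload)
    header = struct.pack("!HHHH", sport, dport, udp_len, 0)
    return header + payload + options


def split_udp_options(ip_payload: bytes) -> Tuple[bytes, bytes]:
    """Split the bytes following the IP header into (UDP payload, option area)."""
    if len(ip_payload) < UDP_HEADER_LEN:
        raise ValueError("truncated UDP header")
    _sport, _dport, udp_len, _csum = struct.unpack("!HHHH", ip_payload[:UDP_HEADER_LEN])
    if not UDP_HEADER_LEN <= udp_len <= len(ip_payload):
        raise ValueError("UDP Length is inconsistent with the IP length")
    payload = ip_payload[UDP_HEADER_LEN:udp_len]
    # Whatever the IP length covers beyond the UDP Length is option space.
    options = ip_payload[udp_len:]
    return payload, options
```

For example, a datagram carrying the payload `b"hello"` followed by three placeholder option bytes has a UDP Length of 13, while the bytes following the IP header total 16, and `split_udp_options()` returns the two parts separately. Stopping at the UDP Length is what is intended to let a legacy receiver ignore the option area.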


E Paper: A NEAT Approach to Mobile Communication

The following research paper [32] has been produced by project participants and was presented at the ACM SIGCOMM 2017 Workshop on Mobility in the Evolving Internet Architecture (MobiArch 2017), UCLA, CA, U.S.A., August 2017.


A NEAT Approach to Mobile Communication

Per Hurtig (Karlstad University), Stefan Alfredsson (Karlstad University), Anna Brunstrom (Karlstad University), Kristian Evensen (Celerway Communications), Karl-Johan Grinnemo (Karlstad University), Audun Fosselie Hansen (Celerway Communications), Tomasz Rozensztrauch (Celerway Communications)

ABSTRACT
The demand for mobile communication is ever increasing. Mobile applications are increasing both in number and in the heterogeneity of their requirements, and an increasingly diverse set of mobile technologies is employed. This creates an urgent need for optimizing end-to-end services based on application requirements, conditions in the network and available transport solutions; something which is very hard to achieve with today's internet architecture. In this paper, we introduce the NEAT transport architecture as a solution to this problem. NEAT is designed to offer a flexible and evolvable transport system, where applications communicate their transport-service requirements to the NEAT system in a generic, transport-protocol independent way. The best transport option is then configured at run-time based on application requirements, network conditions, and available transport options. Through a set of real-life mobile use case experiments, we demonstrate how applications with different properties and requirements could employ the NEAT system in multi-access environments, showing significant performance benefits as a result.

CCS CONCEPTS
• Networks → Transport protocols; Network experimentation; Mobile networks;

KEYWORDS
NEAT, transport selection, heterogeneity, multiple paths, policies, TCP, MPTCP, cellular, WLAN, LTE, 4G, 3G

ACM Reference format:
Per Hurtig, Stefan Alfredsson, Anna Brunstrom, Kristian Evensen, Karl-Johan Grinnemo, Audun Fosselie Hansen, and Tomasz Rozensztrauch. 2017. A NEAT Approach to Mobile Communication. In Proceedings of MobiArch '17, Los Angeles, CA, USA, August 25, 2017, 6 pages.
DOI: 10.1145/3097620.3097622

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
MobiArch '17, Los Angeles, CA, USA
© 2017 Copyright held by the owner/author(s). Publication rights licensed to ACM. 978-1-4503-5059-4/17/08...$15.00
DOI: 10.1145/3097620.3097622

1 INTRODUCTION
Mobile communication is becoming ever more prevalent, and mobile devices are the communication device of choice for most people. In 2016, the number of mobile devices grew to a total of 8 billion, exceeding the world's population, according to Cisco's Global Mobile Data Traffic Forecast [4]. Also, fourth generation (4G) traffic now exceeds third generation (3G) traffic and accounts for 69 % of the total mobile traffic [4]. Work on the next generation of cellular communications (5G) is ongoing (to be rolled out by 2020), aiming to meet new business and consumer demands [11].

The exponential growth in data demand on mobile networks, together with mobile link capacity approaching its theoretical limits, has made it necessary to find new solutions and innovative network designs that can handle the enormous amount of traffic. To solve the shortage of wireless capacity, mobile network operators have found heterogeneous networks (HetNets) to be the most effective solution: HetNets provide an adequate increase in capacity by utilizing a multi-tier architecture consisting of different types of small cells, including micro-, pico-, and femtocells. In a broader perspective, wireless and mobile operators have approached increasing traffic volumes by trying to converge different types of wireless and mobile networks, i.e., by introducing heterogeneity not only in terms of range but also in terms of employed network technologies. This increased heterogeneity has, in turn, spawned novel transport protocols like Multi-path TCP (MPTCP) [6], which tries to load-balance traffic over the available interfaces in a host to perform capacity aggregation. Still, heterogeneity in and of itself only provides part of the solution. In particular, it remains to be solved how a heterogeneous network infrastructure will be able to meet the transport-service demands of current and future applications, i.e., how such a network infrastructure will be able to accommodate heterogeneous transport-service requirements.

In this paper we introduce NEAT [8], a transport architecture that can help meet the increasing demands of heterogeneous mobile applications. The NEAT architecture and accompanying software stack [10] are designed to offer a flexible and evolvable transport system. Applications interface to the NEAT system through an enhanced API that effectively decouples them from the operation of the transport protocols and the network features being used. Applications are able to communicate their transport-service requirements to the NEAT system in a generic, transport-protocol independent way. This allows the best transport option to be configured at run-time based on application requirements, network conditions, and network path support. Policy information, as well as cross-layer information on network conditions, is used to dynamically optimize the communication without requiring a re-engineering of the involved applications.

The general idea behind NEAT is to tackle the problem of transport-layer ossification, which currently limits applications to using the transport services offered by either TCP [2, 16] or UDP [15]. The problem with moving beyond TCP and UDP is that alternative protocols have seen limited deployment, are therefore not supported on all systems, and have difficulties passing through middleboxes such as firewalls. To address this problem, the Internet Engineering Task Force (IETF) has chartered the Transport Services (TAPS) [20] working group to define a protocol-independent interface for transport services, thereby allowing applications to make use of new transport protocols and protocol features when and where they are available, without requiring the applications to be modified. NEAT is fully aligned with, and goes beyond, TAPS and is based on previous point solutions to, e.g., provide more expressive sockets [7, 18], transparent transport selection [19], and middlebox-proof transports [12]. For a detailed survey of previously proposed solutions, and their respective merits in combating transport-layer ossification, please see [14]. NEAT is, to the best of our knowledge, the only complete system encompassing TAPS.

In the remainder of this paper we illustrate the use of NEAT for mobile communication. Through several mobile use case experiments, we demonstrate how applications with different properties and transport-service requirements could employ the NEAT system in a multi-access WLAN and 4G/LTE environment and, in doing so, obtain a significantly better service than would otherwise have been the case. The background required for the considered multi-access use cases is introduced in Section 2 and the NEAT architecture in Section 3. The use cases and our experimental results are presented in Section 4, followed by concluding remarks in Section 5.

2 BACKGROUND
To support seamless mobility, portable devices are typically equipped with multiple network interfaces using different access technologies (e.g., LTE/WLAN), as shown in Figure 1. The way different platforms and architectures employ access technologies to provide both connectivity and good performance differs. Some solutions use bonding at the link layer to provide transparent capacity aggregation, while other solutions handle link management at the application layer. The common goal of such solutions is, nevertheless, to enable multi-path communication between networked devices. To date, a lot of research has been conducted on how to best accomplish multi-path data transmission. For example, Li et al. [9] list 90+ proposed solutions, divided among all layers in the networking stack. Although different in their approaches, they all address problems with using multiple paths simultaneously, including path asymmetry, compatibility, and fairness. To date, none of the proposed solutions has seen large-scale deployment. However, MPTCP, an extension to TCP, has recently attracted a lot of attention from both researchers and the industry; for instance, it is used by Apple's personal digital assistant, Siri.

Figure 1: Multi-path use case. (A client connected over WLAN and LTE (3G/4G), via the Internet, to a server.)

MPTCP allows multiple subflows within a single connection. These subflows are set up as regular TCP connections, except that they are bound into the MPTCP connection initiated by the first subflow. Data sent over different subflows in MPTCP are frequently received out of order, due to the different one-way delays of the subflows. This incurs head-of-line blocking, which causes delays at the application. If an MPTCP connection comprises several subflows, it is the MPTCP packet scheduler that selects on which subflow a segment should be sent and, considering the properties of the selected network path, which segment to send. A common practice is to select the subflow on the basis of the shortest RTT [13, 17]: segments are first sent on the subflow with the lowest smoothed RTT (SRTT). Only when the congestion window of this subflow has been filled are segments sent over the subflow with the next lowest SRTT.

Although MPTCP was designed as a general-purpose transport for multiple paths, it still suffers from some of the basic problems related to multi-path communication. The main performance-related problem is MPTCP's inability to provide good performance when network paths are asymmetric in terms of capacity and delay. For example, when small objects are transferred, MPTCP is unable to provide low latency, as it unnecessarily utilizes the slower path for data transmission [5]. In such scenarios, the most efficient solution is to use a regular single-path protocol like TCP over the path with the shortest delay. The use of multiple paths can, in fact, also be a problem for larger objects, if the paths are highly asymmetric [5]. Unfortunately, having access to asymmetric paths is a very common scenario that results from being simultaneously connected to, e.g., 3G and WLAN. When paths are comparable in terms of capacity and delay, but still slightly asymmetric, e.g., as is typically the case with LTE and WLAN, MPTCP has been shown to provide performance benefits compared to, e.g., TCP [3, 5, 21]. However, the selection of which interface to use for the first subflow plays an important role in achieving such performance benefits [3, 5]. The initial path is typically determined by the system's underlying routing system. In the past, this was mostly a good idea, as the default interface to route traffic over was usually the best in terms of performance. However, with the emergence of LTE as an access technology, it is hard to determine a priori which interface will perform best. Recent studies show that 40 % of the time, LTE performs better than WLAN when both can be used simultaneously [5].

Using multiple paths to communicate efficiently has proven to be very hard, and there is no definite answer on when to do it, for which traffic, or even in which way. Both the use of multiple paths and its configuration depend on factors such as traffic patterns, the level of asymmetry among paths, and their current configuration. Therefore, to make the best of solutions like MPTCP, there is a need for an architectural surrounding that can use knowledge of such properties to correctly choose and configure transport services.



Figure 2: The NEAT system architecture. (Components: NEAT User API; NEAT User Module containing the Policy Manager, PIB and CIB; transport protocols such as UDP, TCP, MPTCP, SCTP and new transports; IPv4/IPv6.)

Figure 3: NEAT uses application properties together with PIB and CIB information to select transport. (Example: application property PX = 100 KiB; CIB metadata CY = 5 ms; PIB policy "if PX <= 100KiB and CY < 10ms: TCP".)
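The selection illustrated in Figure 3 can be sketched as an ordered rule list that the PM evaluates against application properties and CIB data. This is a minimal, hypothetical Python sketch, not NEAT's actual policy format or API: the `Policy` class and `candidate_transports()` are invented for illustration, PX is assumed to be the transfer size in bytes, CY is assumed to be a latency metric in milliseconds taken from the CIB, and the fallback rule (MPTCP preferred, TCP as fallback) is borrowed from the bulk-transfer example in Section 3.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

KIB = 1024
Props = Dict[str, float]


@dataclass
class Policy:
    name: str
    matches: Callable[[Props, Props], bool]  # (application properties, CIB data)
    candidates: List[str]                    # transports, most preferred first


# Hypothetical PIB: rules are tried in order and the first match wins.
PIB = [
    # Figure 3: a small transfer over a low-latency path -> TCP.
    Policy("short-low-latency",
           lambda app, cib: app["PX"] <= 100 * KIB and cib["CY"] < 10,
           ["TCP"]),
    # Otherwise prefer MPTCP, with TCP as fallback (Section 3 bulk example).
    Policy("reliable-default", lambda app, cib: True, ["MPTCP", "TCP"]),
]


def candidate_transports(app: Props, cib: Props, pib: List[Policy] = PIB) -> List[str]:
    """Return the ranked transport candidates of the first matching policy."""
    for policy in pib:
        if policy.matches(app, cib):
            return list(policy.candidates)
    return []
```

For the values in Figure 3 (PX = 100 KiB, CY = 5 ms) the first rule matches and `["TCP"]` is returned, whereas a 1000 KiB transfer falls through to the default rule and yields `["MPTCP", "TCP"]`. First-match ordering is just one simple design choice: the more specific rule wins because it is listed first.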

3 NEAT ARCHITECTURE
Rather than requesting a particular protocol, applications can use the API provided by NEAT to specify their transport-service requirements. The requirements are then processed by NEAT and a matching transport service is created for the application. The transport service is created transparently, and at run-time. Another prominent feature of the NEAT system is that it is deployable in the current internet and does not rely on protocols or features that are only available on certain platforms.

As previously mentioned, heterogeneity is a notable feature of the current internet infrastructure and will most likely continue to be an important feature of the next-generation internet infrastructure. The desire to be connected "any time, anywhere, and any way" has led to an increasing array of heterogeneous communication systems, including a spectrum of wireless personal-, local-, and metropolitan-area networking systems, and a mix of cellular network technologies. One of the key goals for developing open APIs for the next-generation internet is to hide the heterogeneity of the underlying communication systems, something that aligns well with the NEAT system and its architecture, which decouples the transport service provided to an application from the transport protocol providing that service. The NEAT system architecture also embraces heterogeneity by being an evolvable architecture that integrates new communication systems fairly easily.

Figure 2 provides an overview of the NEAT system architecture. Applications provide information about the requirements for a desired transport service via the NEAT User API to the NEAT User Module, a module that is portable across different operating systems and network stacks. The NEAT User Module not only takes input from the application, but also consults local service-selection policies and collected metadata before creating a transport service.

As an example, consider a wireless communication scenario with access to both WLAN and 4G. Assume that a file transfer application on a user device which supports both TCP and MPTCP requests a transport service for a long bulk flow. The service-selection policies tell the NEAT User Module that both TCP and MPTCP are suitable transport solutions, but that priority should be given to MPTCP, since it can use all available interfaces simultaneously and thus achieve higher throughput than TCP. The NEAT User Module initiates an MPTCP connection with TCP as fallback.

The part of the NEAT User Module that is responsible for the management of the service-selection policies is called the Policy Manager (PM). The PM stores policy rules, i.e., rules linking sets of matching requirements to sets of preferred or mandatory transport characteristics, in the Policy Information Base (PIB). In the aforementioned wireless scenario, the PIB contains policies that map a request for a reliable bulk transport service to both MPTCP and TCP, with MPTCP having the highest priority. Policies can be specified on multiple levels. For instance, globally defined policies map reliable transport services to reliable protocols like TCP, while policies local to a specific system are more fine-grained. Such policies can, e.g., map transport services to protocols that are present on the local system but are generally rare, or enable/disable the use of costly networking technologies such as 4G.

In addition to the PIB, the PM uses a repository named the Characteristic Information Base (CIB). The CIB stores information about available interfaces, supported protocols towards accessed destination endpoints, network properties, and other information regarding previous connections between endpoints. In the process of identifying candidate transport solutions, the PM polls the CIB for relevant information, e.g., available interfaces and network path characteristics. On the basis of the policies fetched from the PIB and the supplementary information from the CIB, the PM puts together a list of candidate transport solutions, which is then used by the NEAT User Module to select a preferred transport solution, using the application service requirements as input.

In our wireless scenario, the CIB holds information about previously tested transport solutions over WLAN and 4G towards visited destinations, and also information about network properties such as capacity and Maximum Transmission Unit (MTU), and interface metadata such as signal strength and wireless technology. This information is combined with application requirements and PIB information to create a suitable transport service.

4 MOBILE USE CASES
To evaluate NEAT for mobile nodes we used the setup depicted in Figure 1. The remote server was a web server under our control and the mobile node was a MONROE mobile broadband (MBB) measurement node [1]. The MONROE node was equipped with dedicated software for measuring and experimenting in both WLAN and MBB networks, and was able to continuously collect information about the operator, various identifiers for the equipment, SIM card and cell towers (IMEI, ICCID, IMSI, CID), the device mode and submode, frequency bands, information about the signal quality (RSSI, RSCP, RSRP, RSRQ), and more. For WLAN interfaces, information about the base station name (ESSID), along with link quality, signal level, coding/modulation and link rate, is collected. All this metadata is continuously fed to the NEAT CIB. To make full use of both available interfaces (WLAN and LTE), the MONROE node is also equipped with version 0.91.2 of the Linux MPTCP kernel, as is the web server.

Figure 3 shows how the NEAT architecture can be used to create a transport service. The application provides, via the NEAT User API, a set of desired communication properties. The NEAT PM uses this information together with the metadata stored in the CIB and the policy information in the PIB to create a suitable transport service, in this example using TCP as transport. Next, we showcase three sets of real experiments where the NEAT system is used to compose suitable transport services for mobile communication, based on application knowledge, local hardware information and run-time link quality metrics.

4.1 From Application Knowledge to Transport Service
When transferring data, the amount of data to be sent plays an important role. For example, interactive applications transmitting small amounts of data are often more sensitive to latency than bulk traffic applications. Simply put, applications have different requirements on the underlying transport to perform well. While a longer transfer can benefit from capacity aggregation through, e.g., MPTCP, shorter transfers are typically hindered by such approaches, as performance is dominated by the worst path. NEAT is designed to deal with such differing requirements by composing transport services tailored to the needs of the particular application.

For this experiment, we conducted a series of transfers between a remote web server and the MONROE measurement node. The node was connected to the internet using both its WLAN interface and its LTE interface, with the WLAN interface acting as the default gateway. To determine whether NEAT is able to compose suitable transport services given different transfer sizes, we transmitted objects of different sizes using three transport-layer configurations: TCP, MPTCP and NEAT. For the TCP and MPTCP transfers, the respective protocol was used to download the objects. For NEAT, we created a policy that favored TCP over MPTCP when the amount of data to transfer was less than, or equal to, 100 KiB, and preferred MPTCP over TCP when the data size was greater than 100 KiB. The size of the transmissions was communicated from the application to NEAT via the NEAT User API, and the limit of 100 KiB was taken from previous work [5] on MPTCP over WLAN and LTE.

The results from this experiment are shown in Figure 4, where the y-axis shows the download time for each protocol relative to TCP, and the x-axis represents the object size. For each combination of protocol and object size, the experiment was repeated 30 times, and the variation among the experimental runs is represented in the graph by 95 % confidence intervals. The results confirm that MPTCP is a poor choice for transfers smaller than, or equal to, 100 KiB in this scenario. For instance, when transmitting 10 KiB, the time required for MPTCP to finish the transfer is approximately 18 % longer than for TCP. MPTCP performs worse because data is sent over the slower LTE path; staying exclusively on the faster WLAN path is clearly a faster alternative. However, when the amount of data to send increases, the benefit of using MPTCP, which load-balances the traffic over both paths, becomes evident. For the experiments with 1000 KiB and 10 000 KiB transfers, the gain of using MPTCP instead of TCP translates to a 50-55 % reduction of the transfer time. NEAT, on the other hand, selected the transport depending on the actual object size, and was therefore able to match the performance of the most suitable transport protocol in all experiments.

Figure 4: Relative download performance using TCP, MPTCP, and NEAT over WLAN and LTE. (y-axis: relative download time; x-axis: Size [KiB], from 1 to 10 000.)

4.2 Getting Help from the Hardware
Given the previous results, one might think that TCP is always preferable to MPTCP for small data transfers, and vice versa for longer ones. This is not the case, as the quality of the respective connection, WLAN and cellular, plays a central role. In the background section, it was explained that MPTCP does not work well given paths with significant asymmetry, e.g., in terms of capacity and delay. While LTE connections often have performance characteristics that are fairly similar to those of WLAN, this is not always the case. When cellular coverage is poor, a mobile broadband modem might fall back to 3G or 2G, increasing the asymmetry between the available paths and making MPTCP a poor choice of transport.

This problem is illustrated by another set of experiments, shown in Figure 5, where the underlying technology of the mobile broadband modem was varied between LTE and 3G. The graph to the left is identical to Figure 4, as the experimental setup was the same as in the previous experiment and the policy used by NEAT resulted in the same protocol configuration. Looking at the right graph, however, the effect of switching from LTE to 3G on the secondary interface is quite significant. For the shorter transfers, 1 KiB and 10 KiB, the effect of using MPTCP is similar to the LTE scenario. However, when the object size increases, so does the transmission time. For the largest objects, those of size 10 000 KiB, MPTCP actually requires 150 % more time to complete. This poor performance is due to MPTCP's mode of operation, which causes data to be sent over the slower 3G path as soon as it cannot send over the WLAN. This situation occurs frequently during a transfer, as MPTCP deems the WLAN path to be unavailable whenever the congestion window of that subflow is full. NEAT is able to circumvent this performance problem, on the one hand, by not choosing MPTCP for transfers shorter than, or equal to, 100 KiB. This is due to the aforementioned policy stating that TCP should be used for short transfers. Furthermore, NEAT continuously collected quality metrics from the MONROE framework. This metadata contained run-time information on, e.g., what technology was used by the mobile broadband modem. In this scenario, when the LTE interface was restricted to 3G, the CIB information in conjunction with a corresponding system policy indicated that the mobile interface was not suitable for MPTCP sessions, causing the PM to generate a transport service candidate based on TCP instead.

Figure 5: Relative download performance using TCP, MPTCP, and NEAT over WLAN and LTE/3G. (Left panel: WLAN and LTE; right panel: WLAN and 3G. y-axis: relative download time; x-axis: Size [KiB].)

4.3 Tuning the Transport
In the previous experiments, the application requirements, policies, and characteristics employed by NEAT led to different transport protocol choices. In many scenarios, the set of protocols to choose from is limited. For instance, a protocol might only be available at one of the peers, or the chosen protocol needs to be configured properly to meet application requirements. Let us consider the scenario that has been elaborated throughout the paper: a client with both WLAN and LTE capabilities using MPTCP to access a remote server. So far, we have only focused on the selection of transport depending on application properties and available access technologies. For instance, if both WLAN and LTE were present, it was shown that MPTCP was an eligible choice of transport. However, the configuration of the preferred transport protocol may have an impact on performance as well. For instance, consider a scenario where a client is trying to communicate using a very poor WLAN connection and a very good LTE connection. What will likely happen is that the WLAN interface is chosen as the interface over which to set up and start the transmission, as WLAN is most likely the default interface. In scenarios where the LTE connection performs much better than the WLAN connection, this has detrimental effects on communication performance.

This exact scenario is illustrated by the results shown in Figure 6. For this experiment we moved the MONROE node to the University library during an exam week. The library was crowded with students that used the WLAN, and at certain moments during the experimentation it was even hard to connect to the wireless access points in the building. In the result graph, we see the object download times using MPTCP with WLAN as the primary interface and LTE as the secondary interface (WLAN+LTE), and also MPTCP with LTE as primary and WLAN as secondary (LTE+WLAN). All results in the graph are relative to MPTCP (WLAN+LTE).

Figure 6: Relative download performance using MPTCP with different primary path configuration, and NEAT. (Series: MPTCP (WLAN+LTE), MPTCP (LTE+WLAN), NEAT. y-axis: relative download time; x-axis: Size [KiB].)

The results from this experiment clearly show that using MPTCP with WLAN as the primary interface works reasonably well as long as the transmission is small, i.e., 10 KiB or less. Sending just a few packets thus seemed to work fine. For the larger objects, this was no longer true. The mid-sized objects of 100 KiB finished approximately 35 % faster using LTE as the primary interface, and the largest objects (1000 KiB to 10 000 KiB) more than 80 % faster. NEAT does not suffer from these problems, as the CIB information provided by the MONROE framework and the application properties help the PM select the interface with the best quality as the primary interface over which to initiate the MPTCP connection.

Note that the configuration of the initial path is only an example of NEAT's ability to configure the transport to compose a good service. NEAT can configure any aspect of a transport system, as well as any other setting that may have an impact on the behavior and performance of the communication, as long as the appropriate policies and CIB information are available.

5 CONCLUSIONS
This paper describes the TAPS-compliant NEAT architecture and how it can be integrated in mobile systems to offer tailored transport services to applications. Through a series of real experiments it is shown that NEAT is able to use application knowledge and metadata collected from mobile nodes to optimize data transmission in a multi-access scenario involving WLAN and LTE/3G. Based on the available information, NEAT selects appropriate transport protocols and configures them accordingly. NEAT has the potential to optimize any type of network communication, but work is needed on defining policies and on how best to obtain useful information on network characteristics.

ACKNOWLEDGMENTS
This work has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No. 644334 (NEAT) and No. 644399 (MONROE), and the Norwegian RFF programme under grant agreement No. 245698. The views expressed are solely those of the authors.

REFERENCES
[1] Ö. Alay, A. Lutu, R. García, M. Peón-Quiròs, V. Mancuso, T. Hirsch, T. Dely, J. Werme, K. Evensen, A. Hansen, S. Alfredsson, J. Karlsson, A. Brunstrom, A. S. Khatouni, M. Mellia, M. A. Marsan, R. Monno, and H. Lonsethagen. 2016. Measuring and assessing mobile broadband networks with MONROE. In 2016 IEEE 17th International Symposium on A World of Wireless, Mobile and Multimedia Networks (WoWMoM). 1–3. DOI: https://doi.org/10.1109/WoWMoM.2016.7523537
[2] Mark Allman, Vern Paxson, and Ethan Blanton. 2009. TCP Congestion Control. RFC 5681 (Draft Standard). (Sept. 2009). http://www.ietf.org/rfc/rfc5681.txt
[3] Yung-Chih Chen, Yeon-sup Lim, Richard J. Gibbens, Erich M. Nahum, Ramin Khalili, and Don Towsley. 2013. A Measurement-based Study of MultiPath TCP Performance over Wireless Networks. In Proceedings of the 2013 Internet Measurement Conference (IMC '13). Barcelona, Spain, 455–468.
[4] Cisco Systems, Inc. 2017. Cisco Visual Networking Index: Global Mobile Data Traffic Forecast Update, 2016–2021 White Paper. Cisco Systems, Inc.
[5] Shuo Deng, Ravi Netravali, Anirudh Sivaraman, and Hari Balakrishnan. 2014. WiFi, LTE, or Both?: Measuring Multi-Homed Wireless Internet Performance. In Proceedings of the 2014 Internet Measurement Conference (IMC '14). Vancouver, BC, Canada, 181–194.
[6] Alan Ford, Costin Raiciu, Mark Handley, and Olivier Bonaventure. 2013. TCP Extensions for Multipath Operation with Multiple Addresses. RFC 6824 (Experimental). (Jan. 2013). http://www.ietf.org/rfc/rfc6824.txt
[7] Brett D Higgins, Azarias Reda, Timur Alperovich, Jason Flinn, Thomas J Giuli, Brian Noble, and David Watson. 2010. Intentional Networking: Opportunistic Exploitation of Mobile Network Diversity. In Proceedings of the ACM MOBICOM. ACM, New York, NY, USA, 73–84. DOI: https://doi.org/10.1145/1859995.1860005
[8] Naeem Khademi, David Ros, Michael Welzl, Zdravko Bozakov, Anna Brunstrom, Gorry Fairhurst, Karl-Johan Grinnemo, David Hayes, Per Hurtig, Tom Jones, Simone Mangiante, Michael Tüxen, and Felix Weinrank. 2017. NEAT: A Platform- and Protocol-Independent Internet Transport API. IEEE Communications Magazine (March 2017). Accepted for publication.
[9] Ming Li, Andrey Lukyanenko, Zhonghong Ou, Antti Ylä-Jääski, Sasu Tarkoma, Matthieu Coudron, and Stefano Secci. 2016. Multipath Transmission for the Internet: A Survey. IEEE Communications Surveys & Tutorials 18, 4 (Fourthquarter 2016), 2887–2925. DOI: https://doi.org/10.1109/COMST.2016.2586112
[10] NEAT. 2017. NEAT GitHub repository. https://github.com/NEAT-project/neat/. (2017).
[11] NGMN. 2015. NGMN 5G White Paper. Technical Report. Next Generation Mobile Networks Alliance.
[12] Michael F Nowlan, Nabin Tiwari, Janardhan Iyengar, Syed Obaid Aminy, and Bryan Ford. 2012. Fitting square pegs through round pipes: unordered delivery wire-compatible with TCP and TLS. In Proceedings of USENIX NSDI. USENIX Association, 383–398.
[13] Christoph Paasch, Simone Ferlin, Özgü Alay, and Olivier Bonaventure. 2014. Experimental Evaluation of Multipath TCP Schedulers. In ACM SIGCOMM Capacity Sharing Workshop (CSWS). ACM, Chicago, IL, USA.
[14] Giorgos Papastergiou, Gorry Fairhurst, David Ros, Anna Brunstrom, Karl-Johan Grinnemo, Per Hurtig, Naeem Khademi, Michael Tüxen, Michael Welzl, Dragana Damjanovic, and Simone Mangiante. 2017. De-Ossifying the Internet Transport Layer: A Survey and Future Perspectives. IEEE Communications Surveys & Tutorials 19, 1 (Firstquarter 2017), 619–639. DOI: https://doi.org/10.1109/COMST.2016.2626780
[15] Jon Postel. 1980. User Datagram Protocol. RFC 768 (INTERNET STANDARD). (Aug. 1980). http://www.ietf.org/rfc/rfc768.txt
[16] Jon Postel. 1981. Transmission Control Protocol. RFC 793 (INTERNET STANDARD). (Sept. 1981). http://www.ietf.org/rfc/rfc793.txt Updated by RFCs 1122, 3168, 6093, 6528.
[17] Costin Raiciu, Christoph Paasch, Sebastien Barre, Alan Ford, Michio Honda, Fabien Duchene, Olivier Bonaventure, and Mark Handley. 2012. How Hard Can It Be? Designing and Implementing a Deployable Multipath TCP. In Proceedings of USENIX NSDI. USENIX, San Jose, CA, USA, 399–412.
[18] Philipp S Schmidt, Theresa Enghardt, Ramin Khalili, and Anja Feldmann. 2013. Socket Intents: Leveraging Application Awareness for Multi-access Connectivity. In Proceedings of ACM CoNEXT. Santa Barbara, California, USA, 295–300.
[19] Abbas Ali Siddiqui and Paul Mueller. 2012. A Requirement-Based Socket API for a Transition to Future Internet Architectures. In 6th International Conference on Innovative Mobile and Internet Services in Ubiquitous Computing (IMIS). Palermo, Italy, 340–345.
[20] TAPS. 2017. Transport Services Working Group (taps). https://datatracker.ietf.org/wg/taps/about/. (2017).
[21] Kiran Yedugundla, Simone Ferlin, Thomas Dreibholz, Özgü Alay, Nicolas Kuhn, Per Hurtig, and Anna Brunstrom. 2016. Is multi-path transport suitable for latency sensitive traffic? Computer Networks 105 (2016), 1–21. DOI: https://doi.org/10.1016/j.comnet.2016.05.008

89 of 141 Project no. 644334 D3.3 Confidential Extended Transport System and Transparent Support of Non-NEAT Applications Rev. 1.0/ November 30, 2017

F Paper: A NEAT Framework for Enhanced End-Host Integration in SDN Environments

The following paper [9] has been produced by project participants and has been presented at the IEEE NFV-SDN conference, Berlin, November 2017.


A NEAT Framework for Enhanced End-Host Integration in SDN Environments

Zdravko Bozakov∗, Simone Mangiante∗, Cristian Hernandez Benet†, Anna Brunstrom†, Ricardo Santos†, Andreas Kassler†, Donagh Buckley∗
∗ Dell EMC, Ireland. Email: {name.surname}@dell.com
† Karlstad University, Sweden. Email: {name.surname}@kau.se

Abstract—SDN aims to facilitate the management of increasingly complex, dynamic network environments and to optimize the use of the resources available therein with minimal operator intervention. To this end, SDN controllers maintain a global view of the network topology and its state. However, the extraction of information about network flows and other network metrics remains a non-trivial challenge. Network applications exhibit a wide range of properties, posing diverse, often conflicting, demands towards the network. As these requirements are typically not known, controllers must rely on error-prone heuristics to extract them. In this work, we develop a framework which allows applications deployed in an SDN environment to explicitly express their requirements to the network. Conversely, it allows network controllers to deploy policies on end-hosts and to supply applications with information about network paths, salient servers and other relevant metrics. The proposed approach opens the door for fine-grained, application-aware resource optimization strategies in SDNs.

Fig. 1. Application-aware SDN architecture: An SDN controller interacts with end-host applications utilizing NEAT through a northbound interface (NBI) to optimize the allocation of flows within the network, influence parameters of transport connections and deploy policies at the end-hosts.

I. INTRODUCTION

The Software-Defined Networking (SDN) paradigm promises to facilitate the management and operation of datacenter networks by enabling an automated, centralized control and optimization of pooled network resources. To achieve this goal, network controllers strive to maintain a rich and up-to-date global view of the network topology and the resources available therein. Controllers require information about the properties of flows traversing the network in order to efficiently map network traffic to the physical network substrate. Flow level information may be extracted by monitoring the forwarding devices and used to obtain valuable metrics for traffic engineering. However, such information often does not offer the controller framework a full picture of the myriad of network applications deployed at the edges and their respective requirements. As an example of this issue, one may consider the wide range of network applications which tunnel traffic over TCP on port 80. In such cases, inferring whether a network flow is associated with a video conferencing application with low latency requirements or a low bandwidth instant messaging application is a challenging task.

Application-aware networking [19], [14] refers to the ability of an intelligent network to consider the requirements of applications connecting to it and, as a result, optimize the performance of the individual applications as well as that of the overall network. Programmability and centralized control in SDN are key mechanisms for achieving these goals. To realize application awareness, end-host applications may interact with the network controller to express their specific resource demands and obtain feedback about the current network conditions. Several recently proposed approaches utilize local agents in end-hosts dedicated to collecting metrics from applications and characterizing them. In [5] end-hosts detect elephant flows locally and inform the network controller; similarly, in [18], [22], and [28] end-hosts measure and classify local traffic to provide application feedback to the network controller. Further, using a global view of the network, an informed controller can influence end-host applications and tune transport protocol parameters, as proposed in [9] and [2]. The efforts above illustrate a clear demand for mechanisms that integrate application information into SDN environments. While the related work provides a number of innovative solutions tailored to specific use cases, these approaches lack a generic, overarching framework which is flexible enough to cover a large range of scenarios – equally benefiting applications deployed in SDN and in traditional environments.

Contribution: The contribution of this work is threefold: 1) we improve the visibility into the application requirements in SDNs by providing a generic end-host interface which SDN controllers can query, 2) we enhance the end-host's view of the attached network through controller input, and 3) we enable SDN controllers to deploy policies into the edge of the network. As a result, we increase the granularity at which network controllers are able to optimize the network and enable applications to perform more informed optimizations of their respective flows. Figure 1 depicts the high-level architecture of our approach.

As an exemplary scenario, consider a network controller aiming to efficiently orchestrate so-called elephant flows – the small percentage of network traffic flows which typically consume a considerable amount of the total network capacity – in a data center. In a traditional scenario the controller must first identify such flows by monitoring all network traffic and applying some classification heuristic over a predefined time-span. It is only after the elephant flows have been identified that the controller may apply its resource allocation strategy. Thus, SDN frameworks such as [1] can significantly benefit from a prompt and efficient elephant flow identification.

In this work we argue that end-host applications should be able to classify and express the requirements for any data transfer as part of the connection initiation. For example, using the proposed network stack, the flows of a backup client service that runs on an end-host and periodically transfers bulk data to a server in the same data center may be tagged as elephant flows. This can be done explicitly by the application, or through a suitable system policy that is triggered whenever the flow size exceeds a predefined threshold – assuming the application can supply the volume during the connection setup. In turn, the SDN controller could remotely install an end-host policy, e.g., mandating the use of MPTCP for all elephant flows. We outline some implementation strategies for this type of use case in Section V. However, we emphasize that the mechanisms presented in this paper extend beyond this simple use case and are applicable to a wide range of scenarios, including handling of flows with low latency requirements, dynamically selecting suitable transport protocols for specific services and providing end-hosts with information about the attached networks. Further, we note that in non-SDN environments the developed policy mechanisms may be configured by system administrators using traditional tools.

II. END-HOST TRANSPORT API

In this paper we leverage the NEAT architecture, which we outlined recently in [12]. We demonstrate that in an SDN context the NEAT application stack acts as a programmable, policy-driven end-host agent. In this work we focus on the ability of the NEAT system to offer a bidirectional communication channel with external SDN controllers. In our view, this opens a range of opportunities for optimizing the performance of individual applications as well as the efficiency of the network on which these are deployed.

[Figure: layered NEAT architecture. The Application sits on the NEAT API; the NEAT User Module contains the Policy Manager, the PIB, the CIB and user-space transports (TCP, SCTP, new transports); kernel transports (UDP, TCP, SCTP, new transports) run over IPv4/IPv6.]
Fig. 2. NEAT Architecture

A. NEAT

The NEAT transport architecture [12] and accompanying software stack [16] are being designed to offer a flexible and evolvable transport system. Applications interface to the NEAT system through an enhanced transport API that effectively decouples them from the operation of the transport protocols and the network features being used. This allows the best transport option to be configured at run time based on application requirements, current network conditions, and the supported transport services on the path. In particular, applications provide the NEAT system with information about their traffic requirements. Then, on the basis of these requirements, specified policies, and measured network conditions, NEAT establishes and configures appropriate connections.

The NEAT architecture is illustrated in Figure 2. Applications communicate using the NEAT User API. Based on the requested services and provided application requirements, the NEAT Policy Manager identifies a ranked list of suitable transport candidates. In its decisions, the Policy Manager makes use of a set of specified policies stored in a Policy Information Base (PIB) as well as any information gathered about the current end-host and network state stored in a Common Information Base (CIB). The identified transport candidates are then tried by the NEAT system using a transport layer happy eyeballs approach [27] to find the most suitable available transport option. As NEAT runs as a software library in user space, it can make use of transport protocol options available both in user and kernel space.

While the overall goal of NEAT is to combat transport service ossification and to provide enhanced transport services to a wider range of applications, the NEAT architecture also serves as an ideal basis for an end-host agent that integrates with SDN controllers.

The NEAT User API abstraction and Policy Manager component provide a structured way of obtaining knowledge about application requirements, enforcing policies at the network edge and exchanging resource information with external entities. The policy mechanisms developed in this work are an integral component of the evolving NEAT architecture.


In the following sections we present the architecture of the NEAT Policy Manager, its implementation and mechanisms for communicating with external entities.

B. Host-Controller interaction

By utilizing the NEAT stack, the presented work seeks to enable performance gains in individual applications, subject to their respective requirements, while facilitating the global optimization of network resources in software-defined data centers.

We define four interaction strategies between the network controller and NEAT-enabled end-hosts which we view as necessary to achieve the aforementioned goals:
1) The SDN controller queries the NEAT stack for currently open connections, their parameters and provided requirements or configured policies. Based on this information it may transparently provision new network resources as necessary or generate new policies for the end-host.
2) The SDN controller pushes policies or information about available network paths, supported protocols, remote hosts and other properties into the end-host stack. For each new application flow, the NEAT Policy Manager attempts to utilize all available information to identify the most suitable connection candidates.
3) A NEAT application explicitly requests information not available locally, such as path properties or supported protocol features, by sending a query to the SDN controller.
4) A NEAT application pushes its requirements to the SDN controller. The controller may choose to incorporate these hints, to better fulfill its optimization objectives and to augment its global view of the network, or to drop them.

We assume that the above interactions are asynchronous and that the delivery and processing of the involved messages is not guaranteed, i.e., controllers may choose to ignore information provided by hosts. Similarly, we assume a best-effort strategy for the fulfillment of application requests. In the following, we describe a framework which provides the mechanisms required to implement the aforementioned strategies.

III. NEAT POLICY MANAGER

The Policy Manager (PM) is an integral component of the NEAT system, responsible for constructing and ranking feasible transport connection candidates for a given application request. Moreover, in the context of this work it provides the main mechanism for exchanging information between the end-host applications and external network controllers.

Fig. 3. NEAT Policy Manager: each NEAT end-host runs an instance of the PM. Applications submit their flow requirements as a set of NEAT properties through the NEAT transport API. (1) Profiles expand high-level properties to concrete ones. (2) The result is filtered against properties collected in the local CIB, generating a set of feasible candidates. (3) Policies are applied to each generated candidate. (4) The ranked candidate list is used to initialize connections.

A. Workflow

The workflow executed by the NEAT Policy Manager for each application request is depicted in Fig. 3. For each received application request, the PM generates a ranked list of potential connection candidates which fulfill the application requirements and configured system policies, while taking into account all available information about the local system and the attached networks. Requirements and attributes in application requests, candidates and system policies are expressed using simple, flexible NEAT properties, described in Section IV, which function as the atomic unit of the policy component.

Each application request contains a destination name, port and the desired transport type, either stated explicitly (e.g., TCP, QUIC) or functionally (e.g., reliable, ordered). In the latter case, the transport requirement is subsequently mapped to a concrete protocol through a suitable system policy. In addition, the request may contain an arbitrary number of desired connection properties or high-level requirements, such as low_latency, bulk_transfer, cc_cubic or encrypted.

All candidates generated by the PM will contain at least the information required to establish a new connection: a local source interface and the associated IP address, the destination IP and port, as well as a transport protocol. In addition, the candidate may include an arbitrary number of properties that describe selected or supported protocol features, constraints imposed by the network path, etc. From the candidate list, a handle to the first successfully established connection is returned to the initiating application.

Specifically, the PM workflow involves the following steps:

1. Profile lookup: The initial lookup step is used to apply policies that transform high-level application requirements into concrete properties: e.g., low_latency may be expanded into properties indicating a specific wired interface, a maximum RTT delay requirement, and a request to use TCP fast open if available. The output of the profile lookup is an updated application request.

2. CIB lookup: In the next step the updated application request is compared against a list of potential candidates stored in the Common Information Base (CIB). Each row of the CIB is comprised of an arbitrary number of properties containing information and attributes associated with interfaces, network paths, supported protocols and remote hosts. The information is provided by so-called CIB sources, e.g., a local system monitor or an external SDN controller. The lookup generates a ranked list of candidates by merging copies of the updated request with any available CIB rows that match one or more of the request properties.

3. Policy lookup: Each candidate from the CIB candidate list is compared against a list of policies configured in the system's Policy Information Base (PIB). Policies¹ contain a set of match properties and a set of properties to be appended to the request. A policy is triggered whenever its match properties match a subset of the properties contained in a candidate. If a policy is triggered, its properties are appended to the respective candidate properties (optionally replacing the matched properties). Each policy is associated with a priority value determining the order in which policies are applied (ascending). Thus, properties added by low-priority policies may be overridden by any policy with a higher priority. After all matching policies have been applied, the candidate list is re-ranked to reflect the scores of the updated properties.

4. Selection: Finally, the NEAT system attempts to establish the connection. To this end, NEAT employs a happy eyeballs approach [27], opening multiple parallel connections with different protocol and interface parameters as specified in the PIB candidate list. The selection component returns a handle to a connection successfully established within a predefined time, taking the candidate ranking into account. Furthermore, it caches the outcome for the attempted candidate connections in the CIB for future reference.

¹ Policies and profiles are functionally equivalent, differing only in the stage of their execution.

At any point during the workflow the SDN controller is granted read and write access to the PIB and CIB repositories, enabling it to obtain a list of installed policies and the system's current view of the attached networks, and to augment these if deemed necessary. On the other hand, the application can communicate with the controller implicitly, by adding entries to the PIB/CIB, or explicitly, by instructing the NEAT system to generate a message to the controller containing the requested properties. The latter is implemented by including a special to_controller property within a request.

IV. IMPLEMENTATION

We implemented a reference prototype of the Policy Manager components outlined above. Our implementation is available at [16] under an open source license. We use the Python programming language and rely on Unix domain sockets for local exchange of policies and CIB entries. Additionally, we implement a REST interface for external access by the controller. We use the OpenDaylight controller to implement a northbound API for connecting to the NEAT interface.

In the sequel we describe some key mechanics behind the Policy Manager components. We highlight how these are incorporated into the framework and outline the proposed information exchange format.

A. Policy Manager Primitives

NEAT Properties: Our framework is designed around the notion of properties which act as a building block for defining network policy rules and describing the features of candidate flows. Each property is essentially a key/value pair with associated meta attributes. Each NEAT property is defined by a key – a unique, descriptive string – and an associated value of one of the following types: single value, set, range. Further, a NEAT property has a numeric precedence attribute indicating whether the value of the property is mutable (0) or immutable (1), i.e., optional or mandatory, respectively. If an immutable property cannot be satisfied, the associated candidate is invalidated, whereas optional properties imply a best-effort fulfillment. In addition, properties contain a numeric score attribute which is used to determine the importance of a particular property during the ranking stage.

In the following, we use the shorthand notation $p := [k|v]_s$ to denote a property $p$, where $k$ represents the associated key, $v$ the value and $s$ the score of the property, where needed. Further, square brackets $[\cdot]$ indicate that the property is mandatory, while round brackets $(\cdot)$ indicate that it is optional. We use commas to separate values in a set and a dash to indicate integer ranges, e.g., $p_1 := (\mathrm{MTU}|1500\text{-}9000)$.

Next, we define some key primitives and operations of NEAT properties.

Intersection: We define the intersection of two property values $v_i$, $v_j$, denoted $v_i \cap v_j$, as the overlap of the respective values, sets, ranges, or the empty set $\emptyset$. As an example, consider the intersection of the integer range 1500-9000 and the set {500, 1483, 2000}, which yields 2000. Similarly, the intersection of the ranges 1500-9000 and 500-2000 is 1500-2000.

Match: Any two properties are considered to match if they possess identical keys and the intersection of their respective values is non-empty. For example, using == to represent the match test, the operation [MTU|1500-9000]==(MTU|500, 1483, 2000) yields "true" because the value 2000 is common to both properties. Score and precedence attributes are ignored during a match.

Update: Using the primitives above, we define the update operation on property $p_i$ by a second property $p_j$ with an identical key, denoted $p_i \leftarrow p_j$. The operation returns a new, updated property $p_i^*$ with key $k_i$ and updated value, precedence and score attributes. If the two properties match, the value of $p_i^*$ is set to the intersection $v_i^* := v_i \cap v_j$. Further, the resulting property inherits the higher precedence of the two properties. The score $s_i^*$ of the updated property is set to $s_i + s_j$.

If $p_i$ and $p_j$ do not match, i.e., the intersection of their values is $\emptyset$, we take the precedence attributes into account. If the precedence of both properties is immutable (1), the update operation fails. Otherwise, if the precedence of $p_j$ is greater than or equal to the precedence of $p_i$, we set $p_i^* := p_j$.

    {
      "uid": "bulk_flow_policy",
      "priority": 14,
      "match": {
        "data_volume_Gb": {"value": {"start": 10, "end": "Inf"}}
      },
      "properties": {
        "transport": {"value": "MPTCP"},
        "wired_interface": {"value": true, "precedence": 1},
        "elephant_flow": {"value": true},
        "to_controller": {"value": 0}
      },
      "replace_matched": false,
      "expire": 1493307808.030206
    }
Listing 1. NEAT policy setting threshold for elephant flows

    {
      "uid": "reliable_transports",
      "priority": 1,
      "match": {
        "transport": {"value": "reliable"}
      },
      "properties": [
        {"transport": {"value": "TCP", "precedence": 2, "score": 1}},
        {"transport": {"value": "SCTP", "precedence": 2, "score": 3}},
        {"transport": {"value": "MPTCP", "precedence": 2, "score": 2}}
      ],
      "expire": -1
    }
Listing 2. NEAT profile for supported reliable transport protocols

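Because SLIM is plain JSON, policies such as those in the listings can be loaded with Python's standard json module. The sketch below is an illustration, not the NEAT implementation, and its function names are hypothetical. It decodes the range form {"start": ..., "end": ...} into a tuple, mapping the string "Inf" used in Listing 1 to math.inf (an assumption about how that string is meant), fills in the default attributes "precedence":0 and "score":1 that the paper allows to be omitted, and encodes in the reverse direction. The policy string is an abridged version of Listing 1 that keeps only the properties named in the text.

```python
import json
import math

DEFAULTS = {"precedence": 0, "score": 1}  # attributes that may be omitted

# Abridged policy modelled on Listing 1 (only properties named in the text).
POLICY_JSON = """
{"uid": "bulk_flow_policy", "priority": 14,
 "match": {"data_volume_Gb": {"value": {"start": 10, "end": "Inf"}}},
 "properties": {"transport": {"value": "MPTCP"},
                "elephant_flow": {"value": true},
                "to_controller": {"value": 0}},
 "replace_matched": false}
"""


def decode_value(v):
    """SLIM range {"start", "end"} -> (lo, hi) tuple; "Inf" -> math.inf."""
    if isinstance(v, dict) and set(v) == {"start", "end"}:
        return tuple(math.inf if x == "Inf" else x for x in (v["start"], v["end"]))
    return v


def decode_properties(obj: dict) -> dict:
    """{"key": {"value": ..., ...}} -> {key: (value, precedence, score)}."""
    out = {}
    for key, attrs in obj.items():
        full = {**DEFAULTS, **attrs}  # fill in omitted default attributes
        out[key] = (decode_value(full["value"]), full["precedence"], full["score"])
    return out


def encode_property(key: str, value, precedence: int = 0, score: float = 1) -> dict:
    """Inverse direction; attributes equal to their defaults are left out."""
    if isinstance(value, tuple):
        lo, hi = value
        value = {"start": lo, "end": "Inf" if hi == math.inf else hi}
    attrs = {"value": value}
    if precedence != DEFAULTS["precedence"]:
        attrs["precedence"] = precedence
    if score != DEFAULTS["score"]:
        attrs["score"] = score
    return {key: attrs}


policy = json.loads(POLICY_JSON)
print(decode_properties(policy["match"]))  # {'data_volume_Gb': ((10, inf), 0, 1)}
print(json.dumps(encode_property("MTU", (1500, 9000))))
# {"MTU": {"value": {"start": 1500, "end": 9000}}}
```

Omitting attributes that equal their defaults keeps the encoded MTU example as short as the abbreviated form the paper describes.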
An exemplary update operation using the properties above is $[\mathrm{MTU}|1500\text{-}9000]_1 \leftarrow (\mathrm{MTU}|1483,2000)_2 = [\mathrm{MTU}|2000]_3$.

NEAT Property Arrays: The Policy Manager employs so-called NEAT property arrays, comprised of an arbitrary number of NEAT properties, $A_i = \{p_1, p_2, \ldots, p_n\}$. NEAT arrays are used to represent application requests, policies' match and properties attributes, CIB source rows, as well as candidates generated by the CIB/PIB components. As above, interactions between arrays are defined through a number of operators. For brevity, here we only highlight the join operator $\leftarrow$ for two NEAT arrays $A_1$ and $A_2$, comprised of $n_1$ and $n_2$ properties, respectively. The operation $B := A_1 \leftarrow A_2$ yields a new array containing all properties whose keys are contained either only in $A_1$ or only in $A_2$, and the update result $p_{1,i} \leftarrow p_{2,i}$ for all properties with keys contained in both $A_1$ and $A_2$.

B. Encoding

As policy and CIB information needs to be easily exchanged between the policy manager and various local and external sources, we employ a simple JSON encoding. Details and enhanced features supported by our simple policy and information format (SLIM) are available in [16]. As an example, the property $p_1 := (\mathrm{MTU}|1500\text{-}9000)_1$ is encoded as {"MTU": {"value": {"start":1500,"end":9000},"precedence":0, "score":1}}. The default property attributes "precedence":0 and "score":1 may be omitted for brevity. To facilitate adoption, we are also considering implementing an adapter that maps our data model to YANG.

Listing 1 depicts a NEAT policy which appends four new properties to any flow request specifying that the application intends to transmit more than 10 Gb of data. The replace_matched policy attribute, set to false, indicates that the matched properties should not be removed from the candidate array.

V. EXAMPLE USE CASES

A main goal of this work is to demonstrate that the NEAT architecture provides a generic, flexible framework that enables a wide range of applications to seamlessly integrate into a software-defined infrastructure. Specifically, when deployed in a data center environment, applications developed with NEAT can benefit from interactions with the network control plane without requiring any additional modifications.

To illustrate the versatility of our approach, consider a controller that deploys end-host profiles which mandate that all connections to a specific range of destination IP addresses must be secured. To implement this, the controller will send a SLIM-encoded policy whose match attribute contains the destination IP range {"ip_dst": {"value": {"start":"10.10.1.0","end":"10.10.1.63"},"precedence":1}}, and whose properties attribute contains the {"encrypted": {"value": true}} property. Additional, higher-priority profiles may be pushed to enforce specific ciphers depending on the selected transport protocol. The approach facilitates the centralized management of data center security policies.

Next, we consider a NEAT-enabled backup service deployed in a managed data center. In such a scenario, the application is well aware of the amount of data to be transferred, and a typical transport connection request may be stated as {"ip_dst": {"value":"10.10.177.23","precedence":1},"transport": {"value":"reliable","precedence":1},"data_volume_Gb": {"value":15}}. The pre-configured system profile in Listing 2 will expand the "transport|reliable" property to generate a candidate for each transport protocol supported on the host. The "data_volume_Gb" property will trigger the policy in Listing 1, selecting the egress interface, expressing a preference to use MPTCP – if supported by the source and destination hosts – and tagging the connection as an elephant flow. Finally, the policy will cause the PM to notify the controller about this new flow using the to_controller property (setting the time to wait for a reply to 0 s). As a result of this notification, the controller may configure suitable switch forwarding entries to handle this particular flow (and potential subflows), or it may push an updated policy to enforce the use of a particular VLAN or DSCP field for this type of flow, or to update the threshold that defines what constitutes an elephant flow.

The above examples aim to highlight that the relatively simple structure of the NEAT policy component may be used to efficiently handle a wide range of network policies.
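The backup-service walk-through can be traced step by step with a simplified, self-contained model of the PM workflow. This is a hedged sketch, not the NEAT implementation, and every helper name in it is hypothetical. Property arrays are plain dictionaries, value intersection is reduced to equality and range overlap, the CIB lookup is skipped, the profile is assumed to replace the matched "reliable" property, and candidates are ranked by the sum of their property scores, which is only one plausible reading of "re-ranked to reflect the scores". The profile scores are those of Listing 2; the policy properties are the ones the text attributes to Listing 1.

```python
import math


def prop(value, precedence=0, score=1):
    """Attributes of one NEAT property (SLIM defaults: precedence 0, score 1)."""
    return {"value": value, "precedence": precedence, "score": score}


def values_match(v1, v2) -> bool:
    """Scalars match if equal; inclusive (lo, hi) ranges match if they overlap."""
    if isinstance(v1, tuple) or isinstance(v2, tuple):
        a, b = v1 if isinstance(v1, tuple) else (v1, v1)
        c, d = v2 if isinstance(v2, tuple) else (v2, v2)
        return max(a, c) <= min(b, d)
    return v1 == v2


def join(cand: dict, props: dict) -> dict:
    """Simplified join cand <- props: union of keys, per-key update on shared keys."""
    out = dict(cand)
    for key, new in props.items():
        old = out.get(key)
        if old is None:
            out[key] = new
        elif values_match(old["value"], new["value"]):
            # Simplification: keep the old value instead of the true intersection.
            out[key] = prop(old["value"], max(old["precedence"], new["precedence"]),
                            old["score"] + new["score"])
        elif new["precedence"] >= old["precedence"]:
            out[key] = new
        # Otherwise the existing higher-precedence property is kept; the
        # failure case for two conflicting immutable properties is omitted.
    return out


def triggers(cand: dict, match: dict) -> bool:
    """A policy triggers if each of its match properties matches the candidate."""
    return all(k in cand and values_match(cand[k]["value"], m["value"])
               for k, m in match.items())


def total_score(cand: dict):
    return sum(p["score"] for p in cand.values())  # assumed ranking metric


# Backup-service request from the text: 15 Gb to 10.10.177.23, reliable transport.
request = {"ip_dst": prop("10.10.177.23", precedence=1),
           "transport": prop("reliable", precedence=1),
           "data_volume_Gb": prop(15)}

# Step 1, profile lookup (modelled on Listing 2): one candidate per protocol;
# "reliable" is assumed to be replaced by each concrete protocol.
alternatives = [prop("TCP", 2, 1), prop("SCTP", 2, 3), prop("MPTCP", 2, 2)]
candidates = [{**request, "transport": alt} for alt in alternatives]
# Step 2 (CIB lookup) is skipped: all three protocols are assumed supported.

# Step 3, policy lookup (modelled on Listing 1; replace_matched false -> join).
policy_match = {"data_volume_Gb": prop((10, math.inf))}
policy_props = {"transport": prop("MPTCP"), "elephant_flow": prop(True),
                "to_controller": prop(0)}
candidates = [join(c, policy_props) if triggers(c, policy_match) else c
              for c in candidates]

# Re-rank by the updated scores; sorted() keeps ties in their original order.
for c in sorted(candidates, key=total_score, reverse=True):
    print(c["transport"]["value"], total_score(c))
# SCTP 7
# MPTCP 7
# TCP 5
```

Every candidate is tagged as an elephant flow, and the policy's optional MPTCP preference adds one point only to the MPTCP candidate, which then ties with SCTP. Whether such a preference changes the final order therefore depends on the scores configured in the PIB; the selection step would then race the candidates in this order.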


[Figure: cumulative probability versus detection time (s) for Mahout and NEAT with elephant flow thresholds of 1.0, 5.0, 10.0, 50.0 and 100.0 MB, and for Hedera with polling intervals of 0.1, 1.0, 2.5, 5.0 and 10.0 s.]
Fig. 4. Elephant flow detection time

Simple Elephant Flow Handling: To further illustrate the benefits of NEAT, we provide a brief evaluation of two previous proposals from the literature aiming to improve data center resource management through elephant flow detection. Using NEAT, the end-host application can explicitly inform the controller about initiated elephant flows through the northbound API. This approach substantially reduces the overhead of control messages related to network monitoring and significantly reduces the detection time. Mahout [5] identifies elephant flows by monitoring the end-host socket buffer, which already reduces the control message overhead compared to the in-network monitoring adopted by Hedera [1]. Besides being a point solution, the major drawback of monitoring the end-host socket buffers is the non-negligible elephant flow detection delay. This is a basic consequence of the time required to observe that the buffer reaches a given threshold. In contrast, using NEAT, elephant flows can be tagged immediately using application metrics, and the corresponding flow information may be promptly provided to the controller as part of the flow establishment phase.

In the following, we highlight the challenges of elephant flow detection using heuristics as employed in Mahout and Hedera. We analyzed a trace for 2000 nodes distributed across 100 racks sending traffic for 20 seconds, generated using [26]. For ease of exposition, we consider an available bandwidth of 1 Gbps. We evaluated the trace using the following polling intervals for Hedera: 0.1, 1, 2, 5, and 10 s. We set the elephant flow detection threshold for NEAT and Mahout to 1, 5, 10, 50 and 100 MB. Figure 4 illustrates that the flow detection time increases with the elephant flow threshold for Mahout and with the polling interval for Hedera. For NEAT, by contrast, the flow detection time is always zero, independently of the elephant flow threshold. Furthermore, as the elephant flow threshold can be controlled by the policy system (exemplified earlier in Listing 1), the control message and management overhead for handling the elephant flows can also be flexibly tuned.

VI. RELATED WORK AND FUTURE ENHANCEMENTS

We discussed relevant works on application awareness in SDN and host-controller interaction in Section I. Here we focus on activities related to network policies and future improvements of our approach.

Policy is an overarching term that describes network constraints, configurations and settings. It usually includes access control, QoS settings, firewall and forwarding rules, traffic load balancing and engineering. There has been an increasing interest in how SDN can facilitate the deployment and enforcement of network policies [4]. However, recent efforts have largely focused on network configuration [29], [3], middlebox traversal chaining [20], [6] and forwarding efficiency [17]. In this work, we introduce an end-host policy framework to drive networking configuration and optimization for applications running on end-hosts deployed in SDN environments.

Recent developments in control programs for SDN make it easier for network operators to express and deploy consistent, verifiable high-level policies. Examples of these works include Frenetic [8], Nettle [23], Procera [24], Maple [25] and Pyretic [15]. These approaches provide various abstraction mechanisms for programming OpenFlow networks, but do not consider the interaction with end-hosts and the optimization of applications deployed in the network.

Merlin [21] provides a declarative policy language and compiler to manage bandwidth provisioning, packet processing functions and forwarding rules in a network, taking into account constraints expressed by policies as well as the constraints of the physical topology. Merlin offers basic interaction with end-hosts implemented using standard Linux utilities, iptables and tc.

Conceptually, the PANE framework presented in [7] is most closely related to our work, as it provides mechanisms for hinting application requirements and enforcing potentially conflicting policies in an OpenFlow network. PANE uses a centralized architecture with a global network information base (NIB) and primarily focuses on advanced approaches for deploying consistent flow table rules within the network infrastructure. In contrast, NEAT offers an application-centric approach, in which policies are enforced at the end-host, with each host maintaining local PIB and CIB repositories. Our primary focus lies on providing a flexible, generic mechanism for expressing arbitrary application requirements, resource constraints and policies which can be elegantly integrated and mapped into any existing SDN control framework.

We argue that our work is complementary to the aforementioned efforts, and we believe that a tighter integration of NEAT with high-level network abstraction frameworks and DSLs is worth exploring in future works, to enable a concurrent control and optimization of both network infrastructures and end-host applications. Further, it is worth emphasizing that the benefits offered by the NEAT Policy Manager are not limited to hosts integrated in an SDN infrastructure, but extend to applications running in arbitrary network environments. With this work, our goal is to allow applications utilizing NEAT

96 of 141 Project no. 644334 D3.3 Confidential Extended Transport System and Transparent Support of Non-NEAT Applications Rev. 1.0/ November 30, 2017

to seamlessly adapt to the conditions of their deployment [6] S. K. Fayazbakhsh, L. Chiang, V. Sekar, M. Yu, and J. C. Mogul. environment without requiring any modification to their code. Enforcing network-wide policies in the presence of dynamic middlebox actions using flowtags. In Proc. of USENIX NSDI 2014, pages 543–546, In this paper, we omit the discussion of several important Apr. 2014. aspects due to space constraints. These include securing the [7] A. D. Ferguson, A. Guha, C. Liang, R. Fonseca, and S. Krishnamurthi. host-controller communication channels; implementing trust Participatory networking: An api for application control of sdns. In Proc. of ACM SIGCOMM 2013, pages 327–338, New York, NY, USA, and authentication mechanisms; mitigating attacks from ma- 2013. ACM. licious and misconfigured end-hosts, and parameterizing the [8] N. Foster, M. J. Freedman, R. Harrison, J. Rexford, M. L. Meola, and frequency of host-controller interactions in different scenarios. D. Walker. Frenetic: A high-level language for openflow networks. In Proc. of the Workshop on Programmable Routers for Extensible Services We strongly believe, that the mechanisms presented in this of Tomorrow, PRESTO ’10, pages 6:1–6:6, 2010. paper can efficiently address these challenges. For example, [9] M. Ghobadi, S. H. Yeganeh, and Y. Ganjali. Rethinking end-to-end controllers and administrators may deploy overriding policies congestion control in software-defined networks. In Proc. of the 11th ACM Workshop on Hot Topics in Networks, HotNets-XI, pages 61–66, that constrain the properties that applications may request, or 2012. limit the number of messages that reach the controller. These [10] G. Karagiannis, J. Bi, D. Romascanu, J. Strassner, M. Klyus, Q. Sun, crucial techniques will be addressed in separate works. and L. M. Contreras. Problem Statement for Simplified Use of Policy Abstractions (SUPA). 
Internet-Draft draft-bi-supa-problem-statement- Further planned enhancements include a seamless inte- 02, IETF, July 2016. Work in Progress. gration of the NEAT framework with legacy applications: [11] P. Kazemian, M. Chang, H. Zeng, G. Varghese, N. McKeown, and proxies running the NEAT stack could be deployed as virtual S. Whyte. Real time network policy checking using header space analysis. In Proc. of USENIX NSDI 2013, pages 99–111, 2013. appliances within data centers, or incorporated directly into [12] N. Khademi, D. Ros, M. Welzl, Z. Bozakov, A. Brunstrom, G. Fairhurst, the hypervisor layer. In terms of implementation, future work K.-J. Grinnemo, D. Hayes, P. Hurtig, T. Jones, S. Mangiante, M. Tuxen,¨ will aim to align our SLIM model to emerging initiatives such and F. Weinrank. NEAT: A platform- and protocol-independent internet transport API. IEEE Communications Magazine, June 2017. Accepted as SUPA [10], and to develop mechanisms for policy checking for publication. and validation, leveraging previous works in that space applied [13] H. Kim, J. Reich, A. Gupta, M. Shahbaz, N. Feamster, and R. Clark. to SDN [11], [13]. Kinetic: Verifiable dynamic network control. In Proc. of USENIX NSDI 2015, pages 59–72, May 2015. VII.CONCLUSION [14] H. Mekky, F. Hao, S. Mukherjee, Z.-L. Zhang, and T. Lakshman. Application-aware data plane processing in SDN. In Proc. of HotSDN In this paper we presented a versatile, policy-driven frame- 2014, pages 13–18, 2014. work aiming to enhance the integration of end-host appli- [15] C. Monsanto, J. Reich, N. Foster, J. Rexford, and D. Walker. Composing software-defined networks. In Proc. of USENIX NSDI 2013, pages 1–14, cations in SDN environments. We described the framework 2013. architecture and outlined key features of our implementation. [16] NEAT GitHub repository. https://github.com/NEAT-project/neat/, 2016. We highlighted exemplary use cases that demonstrate the flex- [17] O. Padon, N. Immerman, A. Karbyshev, O. Lahav, M. 
Sagiv, and S. Shoham. Decentralizing SDN policies. In ACM SIGPLAN Notices, ibility and effectiveness of our simple approach based around volume 50, pages 663–676, 2015. properties and policies, showing that applications employing [18] S. Paul and R. Jain. Openadn: Mobile apps on global clouds using the NEAT framework may seamlessly benefit from a wide openflow and software defined networking. In Globecom Workshops (GC Wkshps), 2012 IEEE, pages 719–723, Dec 2012. range of SDN optimizations. [19] Z. A. Qazi, J. Lee, T. Jin, G. Bellala, M. Arndt, and G. Noubir. Application-awareness in SDN. In Proc. of ACM SIGCOMM 2013, ACKNOWLEDGEMENT pages 487–488, 2013. [20] Z. A. Qazi, C.-C. Tu, L. Chiang, R. Miao, V. Sekar, and M. Yu. SIMPLE- This work has received funding from the European Union’s fying middlebox policy enforcement using SDN. In Proc. of ACM Horizon 2020 research and innovation programme under grant SIGCOMM 2013, pages 27–38, 2013. agreement No. 644334 (NEAT). The views expressed are [21] R. Soule,´ S. Basu, P. J. Marandi, F. Pedone, R. Kleinberg, E. G. Sirer, and N. Foster. Merlin: A language for provisioning network resources. solely those of the author(s). In Proc. of CoNEXT 2014, pages 213–226, New York, NY, USA, 2014. ACM. REFERENCES [22] P. Sun, M. Yu, M. J. Freedman, J. Rexford, and D. Walker. Hone: Joint [1] M. Al-Fares, S. Radhakrishnan, B. Raghavan, N. Huang, and A. Vahdat. host-network traffic management in software-defined networks. Journal Hedera: Dynamic flow scheduling for data center networks. In Proc. of of Network and Systems Management, 23(2):374–399, 2015. USENIX NSDI 2010, pages 19–19, 2010. [23] A. Voellmy and P. Hudak. Nettle: Taking the sting out of programming [2] H. Ballani et al. Enabling end-host network functions. In Proc. of ACM network routers. In PADL, pages 235–249, 2011. SIGCOMM 2015, pages 493–507, 2015. [24] A. Voellmy, H. Kim, and N. Feamster. Procera: A language for high- [3] M. F. Bari, S. R. Chowdhury, R. Ahmed, and R. 
Boutaba. Policycop: level reactive network control. In Proc. of HotSDN 2012, pages 43–48, an autonomic qos policy enforcement framework for software defined 2012. networks. In Future Networks and Services (SDN4FNS), 2013 IEEE [25] A. Voellmy, J. Wang, Y. R. Yang, B. Ford, and P. Hudak. Maple: SDN for, pages 1–7. IEEE, 2013. Simplifying sdn programming using algorithmic policies. In Proc. of [4] Y. Ben-Itzhak, K. Barabash, R. Cohen, A. Levin, and E. Raichstein. En- ACM SIGCOMM 2013, pages 87–98, New York, NY, USA, 2013. ACM. forSDN: Network policies enforcement with SDN. In 2015 IFIP/IEEE [26] P. Wette and H. Karl. DCT2Gen. Computer Communications, 80(C):45– International Symposium on Integrated Network Management (IM), 58, Apr. 2016. pages 80–88. IEEE, 2015. [27] D. Wing and A. Yourtchenko. Happy Eyeballs: Success with Dual-Stack [5] A. Curtis, W. Kim, and P. Yalagandula. Mahout: Low-overhead data- Hosts. RFC 6555 (Proposed Standard), Apr. 2012. center traffic management using end-host-based elephant detection. In [28] K.-K. Yap, T.-Y. Huang, B. Dodson, M. S. Lam, and N. McKeown. Proc. of IEEE INFOCOM 2011, pages 1629–1637, April 2011. Towards software-friendly networks. In Proc. of APSys 2010, pages 49–54, 2010. [29] Y. Yuan, D. Lin, R. Alur, and B. T. Loo. Scenario-based programming for SDN policies. In Proc. of CoNEXT 2015, pages 34:1–34:13, 2015.
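The trend in the elephant flow evaluation above (Fig. 4) can be checked with simple arithmetic. The sketch below is a simplified model and not the evaluation code behind the figure. It assumes the following: the flow sends at the full 1 Gbps used in the evaluation; 1 MB = 10^6 bytes; Hedera-style polls happen at fixed multiples of the polling interval. It ignores the time Hedera needs to estimate flow demand. The function names are illustrative.

```c
#include <assert.h>

/* Lower bound on the detection delay of any heuristic that can classify a
   flow only after threshold_bytes have been carried (for example a
   socket-buffer or byte-count detector): the bits must first be sent. */
static double byte_threshold_delay_s(double threshold_bytes, double rate_bps)
{
    assert(threshold_bytes >= 0.0 && rate_bps > 0.0);
    return threshold_bytes * 8.0 / rate_bps;   /* bytes -> bits, then divide by bit rate */
}

/* Delay until a periodic poller (polls at 0, T, 2T, ...) first observes an
   event that happened at time t_s. Simplified model of polling detection. */
static double polling_delay_s(double t_s, double interval_s)
{
    assert(t_s >= 0.0 && interval_s > 0.0);
    long k = (long)(t_s / interval_s);          /* index of last poll at or before t_s */
    double next_poll = (double)k * interval_s;
    if (next_poll < t_s)
        next_poll += interval_s;                /* event happened after poll k: wait for k+1 */
    return next_poll - t_s;
}

/* If the application declares the flow size when the flow is set up, the
   flow can be tagged before any data is sent. */
static double declared_size_delay_s(void)
{
    return 0.0;
}
```

For the thresholds used in the evaluation, byte_threshold_delay_s() gives 8 ms for 1 MB and 0.8 s for 100 MB at 1 Gbps. This illustrates why the detection time of a size-threshold heuristic grows with the threshold, and why a polling heuristic adds up to one polling interval on top of that. A flow whose size is declared at setup needs no observation period.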

G Demo: A NEAT framework for application-awareness in SDN environments

The following demo [68] was produced by project participants and presented at the IFIP Networking conference in Stockholm in June 2017, where it won the Best Demo Award.

A NEAT framework for application-awareness in SDN environments

Ricardo Santos∗, Zdravko Bozakov†, Simone Mangiante†, Anna Brunstrom∗ and Andreas Kassler∗
∗Karlstad University, Karlstad, Sweden, {name.surname}@kau.se
†Dell EMC Research Europe, Ovens, Ireland, {name.surname}@dell.com

Abstract—Software-Defined Networking (SDN) has led to a paradigm shift in the way networks are managed and operated. In SDN environments, the data plane forwarding rules are managed by logically centralized controllers operating on a global view of the network. Today, SDN controllers typically possess little insight into the requirements of the applications executed on the end-hosts. Consequently, they rely on heuristics to implement traffic engineering or QoS support. In this work, we propose a framework for application-awareness in SDN environments in which the end-hosts provide a generic interface for SDN controllers to interact with. As a result, SDN controllers may enhance the end-host's view of the attached network and deploy policies at the edge of the network. Further, controllers may obtain information about the specific requirements of the deployed applications. Our demonstration extends the OpenDaylight SDN controller so that it can interact with end-hosts running a novel networking stack called NEAT. We demonstrate a scenario in which the controller distributes policies and path information to manage bulk and low-latency flows.

ISBN 978-3-901882-94-4 © 2017 IFIP

I. INTRODUCTION

The Software-Defined Networking (SDN) paradigm promises to facilitate the management and operation of datacenter networks by enabling automated, centralized control and optimization of pooled network resources. To achieve this goal, network controllers strive to maintain a rich and up-to-date global view of the network topology and the resources available therein. To efficiently map network traffic to the physical network substrate, controllers require information about the properties of flows traversing the network. Flow-level information may be extracted by monitoring forwarding devices and used to obtain valuable metrics for traffic engineering. However, such information often does not give the SDN controller a full picture of the myriad network applications deployed at the edges and their respective requirements. For example, consider the wide range of network applications that tunnel traffic over TCP on port 80. In such cases, it is challenging to infer whether a network flow belongs to a conferencing application with low-latency requirements or to a low-bandwidth instant messaging application.

Application-aware networking [1], [2] refers to the ability of an intelligent network to consider the requirements of the applications connecting to it and, as a result, to optimize the performance of the individual applications as well as that of the overall network. To achieve application awareness, end-host applications may interact with a network controller to express their specific resource demands or to obtain feedback about the current network conditions. Several recent proposals use local agents in end-hosts that collect metrics from applications and characterize them (e.g., [3]). Further, controllers may use their global view of the network to configure the behavior of network connections initiated by end-host applications and to tune transport protocol parameters (e.g., [4]). The use-case-specific approaches in these and other related works illustrate a clear demand for mechanisms that integrate end-host applications into SDN environments.

Building on previous successes, in this demo we introduce a generic, overarching framework for end-host SDN integration, integrated into the NEAT transport architecture [5]. Our framework provides structured interfaces for communication between applications and external controllers. The framework includes an expressive policy system that can fulfill a large range of use-case requirements without requiring re-engineering of the involved applications. We introduce our framework in the next section, followed by a description of the demo in Section III.

II. END-HOST SDN INTEGRATION WITH NEAT

The NEAT transport architecture [5] and accompanying software stack [6] are designed to offer a flexible and evolvable transport system. Applications interface with the NEAT System through an enhanced API that effectively decouples them from the operation of the transport protocols and the network features being used. This allows the best transport option to be configured at run time based on application requirements, current network conditions, and the transport services supported on the path. Applications may supply the NEAT System with the requirements desired for each connection, as well as optional hints about the type of traffic that will be transferred.

A key component of the NEAT framework is the Policy Manager (PM). The PM matches application requirements with system policies and with available information about system and network state, which are stored in a Policy Information Base (PIB) and a Common Information Base (CIB), respectively. For each application request, NEAT configures and establishes the most suitable transport connection, based on a ranked list of feasible connection candidates generated by the PM. A PM runs on each NEAT-enabled end-host, so it provides an ideal hook that enables SDN controllers to influence the parameters of any connection instantiated through NEAT. To this end, NEAT exposes a REST API through which external entities can query and modify the contents of the PIB and CIB repositories. As a result, logically centralized controllers may supply end-hosts with information about network-wide or end-to-end path characteristics such as latency, loss or available bandwidth. The PM automatically uses this information to determine the connection candidates that best match specific application requirements. In addition, through suitable policies the PM can mandate which network paths should be used for certain traffic classes and influence the parameters of transport connections at specific end-hosts.

Fig. 1. Demonstration architecture. [Diagram labels: Host, NEAT Policies, Path information, Northbound API, NEAT Connector, Path Calculator, Flow Scheduler, Latency Monitor, SDN Controller, OpenFlow rules, bulk transfer path, low latency path.]

III. DEMONSTRATION

This demonstration showcases the interaction between NEAT-enabled end-hosts and an SDN controller using the NEAT framework. The scenarios illustrate the handling of flows with different management requirements. The NEAT-enabled end-hosts are interconnected through a network emulated using the CORE emulator [7] (Fig. 1). Inside CORE, we use nodes running Open vSwitch that are controlled by an OpenDaylight (ODL) controller using the OpenFlow protocol. We extended the ODL controller with a module that implements a northbound interface for communicating with NEAT-enabled end-hosts. In addition, the controller installs the flow rules in the managed switches.

In the demo, we show how an SDN controller may optimize the placement of bulk transfers and latency-sensitive flows for a given network topology. The controller uses NEAT policies to instruct applications to tag transfers as elephant flows with a dedicated DSCP pattern if the flow size exceeds a predefined threshold. In addition, the controller implements a path monitoring approach that creates probing packets to determine the latencies of different network paths [8]. The controller then provides end-to-end latency estimates to the end-hosts through each host's CIB repository.

As client applications, we implemented a simple file transfer client that runs on a host with the NEAT framework enabled, and a server that is not necessarily NEAT-enabled. For each transfer, the application on the NEAT-enabled host indicates the amount of data to be transferred through the NEAT API. In addition, we implemented a NEAT-enabled real-time monitoring agent, which transmits periodic sensor readings to another host. The agent also uses the NEAT API to indicate its low-latency requirement.

Whenever a new file transfer is initiated, its flow size may trigger the elephant flow policy installed by the controller. The NEAT stack then marks the packets with the DSCP value that the controller uses to map large flows to its high-capacity paths. Packets without the DSCP markings are instead forwarded along a pre-calculated path. The controller uses the Path Calculator and Flow Scheduler modules to install suitable forwarding rules on the switches between the source and destination nodes. Similarly, whenever the agent creates a new low-latency connection, the NEAT stack translates a predefined low-latency policy value into a system end-to-end latency requirement. It then automatically selects the most suitable network interface based on the path information that the controller pushed to the host's CIB. The effects of the installed policies and the supplied latency information on the flows can be monitored in real time through CORE's GUI. We will also demonstrate the impact of different network topologies and traffic specifications.

ACKNOWLEDGEMENT

This work has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No. 644334 (NEAT). The views expressed are solely those of the author(s).

REFERENCES

[1] Z. A. Qazi, J. Lee, T. Jin, G. Bellala, M. Arndt, and G. Noubir, "Application-awareness in SDN," in Proc. of ACM SIGCOMM 2013, 2013, pp. 487–488. [Online]. Available: http://doi.acm.org/10.1145/2486001.2491700
[2] H. Mekky, F. Hao, S. Mukherjee, Z.-L. Zhang, and T. Lakshman, "Application-aware data plane processing in SDN," in Proc. of HotSDN 2014, 2014, pp. 13–18. [Online]. Available: http://doi.acm.org/10.1145/2620728.2620735
[3] P. Sun, M. Yu, M. J. Freedman, J. Rexford, and D. Walker, "Hone: Joint host-network traffic management in software-defined networks," Journal of Network and Systems Management, vol. 23, no. 2, pp. 374–399, 2015.
[4] H. Ballani, P. Costa, C. Gkantsidis, M. P. Grosvenor, T. Karagiannis, L. Koromilas, and G. O'Shea, "Enabling end-host network functions," in Proc. of ACM SIGCOMM 2015, 2015, pp. 493–507.
[5] N. Khademi, D. Ros, M. Welzl, Z. Bozakov, A. Brunstrom, G. Fairhurst, K.-J. Grinnemo, D. Hayes, P. Hurtig, T. Jones, S. Mangiante, M. Tüxen, and F. Weinrank, "NEAT: A platform- and protocol-independent internet transport API," IEEE Communications Magazine, March 2017, accepted for publication.
[6] "NEAT GitHub repository," https://github.com/NEAT-project/neat, 2016.
[7] J. Ahrenholz, "Comparison of CORE network emulation platforms," in Military Communications Conference, 2010 - MILCOM 2010, 2010, pp. 166–171.
[8] K. Phemius and M. Bouet, "Monitoring latency with OpenFlow," in Network and Service Management (CNSM), 2013 9th International Conference on. IEEE, 2013, pp. 122–125.

H NEAT Sockets API: list of API function calls

C-language prototypes for the function calls comprising the NEAT Sockets API are listed below. They are grouped into eight categories, as discussed in Section 3.3.

int nsa_init();
void nsa_cleanup();
int nsa_map_socket(int systemSD, int neatSD);
int nsa_unmap_socket(int neatSD);

Listing 4: The NEAT Sockets API: initialisation and clean-up.

int nsa_socket(int domain, int type, int protocol, const char* properties);
int nsa_socketpair(int domain, int type, int protocol, int sv[2], const char* properties);
int nsa_close(int sockfd);
int nsa_fcntl(int sockfd, int cmd, ...);
int nsa_bind(int sockfd, const struct sockaddr* addr, socklen_t addrlen, struct neat_tlv* opt, const int optcnt);
int nsa_bindx(int sockfd, const struct sockaddr* addrs, int addrcnt, int flags, struct neat_tlv* opt, const int optcnt);
int nsa_bindn(int sockfd, uint16_t port, int flags, struct neat_tlv* opt, const int optcnt);
int nsa_connect(int sockfd, const struct sockaddr* addr, socklen_t addrlen, struct neat_tlv* opt, const int optcnt);
int nsa_connectx(int sockfd, const struct sockaddr* addrs, int addrcnt, neat_assoc_t* id, struct neat_tlv* opt, const int optcnt);
int nsa_connectn(int sockfd, const char* name, const uint16_t port, neat_assoc_t* id, struct neat_tlv* opt, const int optcnt);
int nsa_listen(int sockfd, int backlog);
int nsa_accept(int sockfd, struct sockaddr* addr, socklen_t* addrlen);
int nsa_accept4(int sockfd, struct sockaddr* addr, socklen_t* addrlen, int flags);
int nsa_peeloff(int sockfd, neat_assoc_t id);
int nsa_shutdown(int sockfd, int how);

Listing 5: The NEAT Sockets API: connection establishment and teardown.
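nsa_connectn() in Listing 5 takes a host name and port instead of a socket address, which leaves name resolution to the NEAT layer. For comparison, the sketch below shows what an application has to do itself with plain POSIX calls: resolve the name with getaddrinfo() and try each result in turn. It does not reproduce NEAT's candidate selection. connect_by_name() is a hypothetical helper and is not part of the NEAT Sockets API.

```c
#define _POSIX_C_SOURCE 200112L
#include <arpa/inet.h>
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Resolve name and try each returned address in order until one accepts a
   TCP connection. Returns a connected descriptor, or -1 on failure. */
static int connect_by_name(const char *name, unsigned short port)
{
    assert(name != NULL);
    char service[6];                          /* "65535" plus terminator */
    snprintf(service, sizeof(service), "%u", (unsigned)port);

    struct addrinfo hints;
    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;              /* accept IPv4 and IPv6 results */
    hints.ai_socktype = SOCK_STREAM;          /* TCP only in this sketch */

    struct addrinfo *res = NULL;
    if (getaddrinfo(name, service, &hints, &res) != 0)
        return -1;

    int fd = -1;
    for (struct addrinfo *ai = res; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break;                            /* first successful candidate wins */
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}
```

Trying the candidates one after another is the simplest strategy. When one address family is broken, it can stall for a full connection timeout per address, which is exactly the problem that a transport layer choosing among candidates on the application's behalf can hide.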

int nsa_getsockopt(int sockfd, int level, int optname, void* optval, socklen_t* optlen);
int nsa_setsockopt(int sockfd, int level, int optname, const void* optval, socklen_t optlen);
int nsa_opt_info(int sockfd, neat_assoc_t id, int opt, void* arg, socklen_t* size);

Listing 6: The NEAT Sockets API: options handling.
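nsa_getsockopt() and nsa_setsockopt() in Listing 6 take exactly the parameters of getsockopt() and setsockopt(). The sketch below uses the plain POSIX calls on an ordinary IPv4 socket to show what DSCP marking (as used for elephant flows in the demonstration appendix) amounts to at the socket level. The listing does not state whether the NEAT Sockets API honours IP_TOS. The code point 8 (CS1) in the usage check is an arbitrary choice, since the demonstration does not specify its DSCP value.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* DSCP occupies the upper six bits of the IPv4 TOS byte; the lower two bits
   are used for ECN and are left at zero here. Returns setsockopt()'s result. */
static int set_dscp(int sockfd, int dscp)
{
    assert(dscp >= 0 && dscp <= 63);
    int tos = dscp << 2;
    return setsockopt(sockfd, IPPROTO_IP, IP_TOS, &tos, sizeof(tos));
}

/* Read back the DSCP configured on an IPv4 socket, or -1 on error. */
static int get_dscp(int sockfd)
{
    int tos = 0;
    socklen_t len = sizeof(tos);
    if (getsockopt(sockfd, IPPROTO_IP, IP_TOS, &tos, &len) != 0)
        return -1;
    return (tos >> 2) & 0x3f;
}
```

Marking at the socket level classifies every packet of the flow from the first byte onwards, which is what allows a controller to steer a declared elephant flow without inspecting it first.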

ssize_t nsa_write(int fd, const void* buf, size_t len);
ssize_t nsa_writev(int fd, const struct iovec* iov, int iovcnt);
ssize_t nsa_pwrite(int fd, const void* buf, size_t len, off_t offset);
ssize_t nsa_pwritev(int fd, const struct iovec* iov, int iovcnt, off_t offset);
#ifdef _LARGEFILE64_SOURCE
ssize_t nsa_pwrite64(int fd, const void* buf, size_t len, off64_t offset);
ssize_t nsa_pwritev64(int fd, const struct iovec* iov, int iovcnt, off64_t offset);
#endif
ssize_t nsa_send(int sockfd, const void* buf, size_t len, int flags);
ssize_t nsa_sendto(int sockfd, const void* buf, size_t len, int flags, const struct sockaddr* to, socklen_t tolen);
ssize_t nsa_sendmsg(int sockfd, const struct msghdr* msg, int flags);
ssize_t nsa_sendv(int sockfd, struct iovec* iov, int iovcnt, struct sockaddr* to, int tocnt, void* info, socklen_t infolen, unsigned int infotype, int flags);
ssize_t nsa_read(int fd, void* buf, size_t len);
ssize_t nsa_readv(int fd, const struct iovec* iov, int iovcnt);
ssize_t nsa_pread(int fd, void* buf, size_t len, off_t offset);
ssize_t nsa_preadv(int fd, const struct iovec* iov, int iovcnt, off_t offset);
#ifdef _LARGEFILE64_SOURCE
ssize_t nsa_pread64(int fd, void* buf, size_t len, off64_t offset);
ssize_t nsa_preadv64(int fd, const struct iovec* iov, int iovcnt, off64_t offset);
#endif
ssize_t nsa_recv(int sockfd, void* buf, size_t len, int flags);
ssize_t nsa_recvfrom(int sockfd, void* buf, size_t len, int flags, struct sockaddr* from, socklen_t* fromlen);
ssize_t nsa_recvmsg(int sockfd, struct msghdr* msg, int flags);
ssize_t nsa_recvv(int sockfd, struct iovec* iov, int iovcnt, struct sockaddr* from, socklen_t* fromlen, void* info, socklen_t* infolen, unsigned int* infotype, int* msg_flags);

Listing 7: The NEAT Sockets API: input/output handling.
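Several calls in Listings 5 and 7, such as nsa_read(), nsa_write() and nsa_close(), take exactly the parameters of their POSIX counterparts. This suggests a simple porting technique: select the implementation at build time. In the sketch below, the macro USE_NEAT_SOCKETAPI and the header name <neat-socketapi.h> are assumptions. A NEAT build would presumably also have to call nsa_init() and nsa_cleanup() from Listing 4, which is not shown. Only the POSIX branch is exercised here.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Build-time switch (hypothetical macro USE_NEAT_SOCKETAPI): map the calls
   either to the NEAT Sockets API or to plain POSIX. Header name assumed. */
#ifdef USE_NEAT_SOCKETAPI
#include <neat-socketapi.h>
#define SOCK_READ  nsa_read
#define SOCK_WRITE nsa_write
#define SOCK_CLOSE nsa_close
#else
#define SOCK_READ  read
#define SOCK_WRITE write
#define SOCK_CLOSE close
#endif

/* Write the whole buffer, retrying on short writes and on EINTR.
   Returns 0 on success, -1 on error (in the POSIX build, errno is set by write()). */
static int write_all(int fd, const void *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    const char *p = buf;
    while (len > 0) {
        ssize_t n = SOCK_WRITE(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted by a signal: retry */
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

Only signature-identical calls are mapped. Calls such as nsa_socket(), which takes an extra properties argument, need explicit changes at the call site.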

int nsa_poll(struct pollfd* ufds, const nfds_t nfds, int timeout);
int nsa_select(int n, fd_set* readfds, fd_set* writefds, fd_set* exceptfds, struct timeval* timeout);
int nsa_epoll_create(int size);
int nsa_epoll_create1(int flags);
int nsa_epoll_ctl(int epfd, int op, int fd, struct epoll_event* event);
int nsa_epoll_wait(int epfd, struct epoll_event* events, int maxevents, int timeout);
int nsa_epoll_pwait(int epfd, struct epoll_event* events, int maxevents, int timeout, const sigset_t* ss);

Listing 8: The NEAT Sockets API: poll and select.
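nsa_poll() in Listing 8 takes the same parameters as poll(). The following sketch of a helper that waits for readability with a timeout uses poll() directly. In a NEAT build, the call would presumably be swapped for nsa_poll(). The helper name wait_readable() is illustrative.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Wait until fd becomes readable (data or end-of-file).
   Returns 1 if readable, 0 on timeout, -1 on error. An EINTR restarts the
   wait with the full timeout, which is acceptable for a sketch. */
static int wait_readable(int fd, int timeout_ms)
{
    assert(fd >= 0);
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    for (;;) {
        int rc = poll(&pfd, 1, timeout_ms);
        if (rc < 0 && errno == EINTR)
            continue;                       /* interrupted by a signal: retry */
        if (rc <= 0)
            return rc;                      /* 0: timeout, -1: error */
        return (pfd.revents & (POLLIN | POLLHUP)) ? 1 : -1;
    }
}
```

POLLHUP is treated as readable because a subsequent read then returns 0 (end-of-file). This is usually what the caller needs to learn.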

int nsa_getsockname(int sockfd, struct sockaddr* name, socklen_t* namelen);
int nsa_getpeername(int sockfd, struct sockaddr* name, socklen_t* namelen);
int nsa_getladdrs(int sockfd, neat_assoc_t id, struct sockaddr** addrs);
void nsa_freeladdrs(struct sockaddr* addrs);
int nsa_getpaddrs(int sockfd, neat_assoc_t id, struct sockaddr** addrs);
void nsa_freepaddrs(struct sockaddr* addrs);

Listing 9: The NEAT Sockets API: address handling.
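nsa_getladdrs() and nsa_getpaddrs() return addresses through a struct sockaddr** and are paired with free functions. This mirrors sctp_getladdrs() and sctp_freeladdrs() in the SCTP sockets API extensions (RFC 6458), which return the address count and a packed array of addresses. The following sketch rests on two assumptions not stated in the listing: that the NEAT functions use the same packed layout, and that their return value is the count. Under those assumptions, the entries cannot be indexed as addrs[i]; they must be walked using the size of each entry's address family. walk_packed_addrs() and the visitor are hypothetical helpers.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Walk a packed array in which IPv4 and IPv6 entries are stored back to
   back, each taking exactly the size of its own structure. Calls visit()
   for every entry; returns the number visited, or -1 on an unknown family
   (whose entry size, and hence the next entry's position, is unknown). */
static int walk_packed_addrs(const struct sockaddr *addrs, int count,
                             void (*visit)(const struct sockaddr *, void *),
                             void *ctx)
{
    assert(count >= 0 && visit != NULL);
    const char *p = (const char *)addrs;
    for (int i = 0; i < count; i++) {
        const struct sockaddr *sa = (const struct sockaddr *)p;
        size_t step;
        switch (sa->sa_family) {
        case AF_INET:
            step = sizeof(struct sockaddr_in);
            break;
        case AF_INET6:
            step = sizeof(struct sockaddr_in6);
            break;
        default:
            return -1;
        }
        visit(sa, ctx);
        p += step;                          /* advance by this entry's own size */
    }
    return count;
}

/* Example visitor: count IPv4 and IPv6 entries. */
struct family_count {
    int v4;
    int v6;
};

static void count_family(const struct sockaddr *sa, void *ctx)
{
    struct family_count *fc = ctx;
    if (sa->sa_family == AF_INET)
        fc->v4++;
    else if (sa->sa_family == AF_INET6)
        fc->v6++;
}
```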

int nsa_open(const char* pathname, int flags, mode_t mode);
int nsa_creat(const char* pathname, mode_t mode);
int nsa_lockf(int fd, int cmd, off_t len);
#ifdef _LARGEFILE64_SOURCE
int nsa_lockf64(int fd, int cmd, off64_t len);
#endif
int nsa_flock(int fd, int operation);
int nsa_fstat(int fd, struct stat* buf);
long nsa_fpathconf(int fd, int name);
int nsa_fchown(int fd, uid_t owner, gid_t group);
int nsa_fsync(int fd);
int nsa_fdatasync(int fd);
int nsa_syncfs(int fd);
int nsa_dup(int oldfd);
int nsa_dup2(int oldfd, int newfd);
int nsa_dup3(int oldfd, int newfd, int flags);
off_t nsa_lseek(int fd, off_t offset, int whence);
int nsa_ftruncate(int fd, off_t length);
#ifdef _LARGEFILE64_SOURCE
off64_t nsa_lseek64(int fd, off64_t offset, int whence);
int nsa_ftruncate64(int fd, off64_t length);
#endif
int nsa_pipe(int fds[2]);
int nsa_ioctl(int fd, int request, const void* argp);

Listing 10: The NEAT Sockets API: miscellaneous.

// !!! Work in Progress !!!
int nsa_set_secure_identity(int sockfd, const char* pem);

Listing 11: The NEAT Sockets API: security.

I Internet Draft: NEAT Sockets API

The following Internet Draft [18] has been produced by project participants.

Network Working Group                                      T. Dreibholz
Internet-Draft                               Simula Research Laboratory
Intended status: Experimental                          October 30, 2017
Expires: May 3, 2018

NEAT Sockets API
draft-dreibholz-taps-neat-socketapi-02.txt

Abstract

This document describes a BSD Sockets-like API on top of the callback-based NEAT User API. This facilitates porting existing applications to use a subset of NEAT’s functionality.
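The central idea of the abstract is a blocking, BSD-style call layered over a callback-based API. The single-threaded sketch below illustrates this idea: the blocking call drives the event loop until a callback has buffered data. All names are hypothetical. The scripted event source stands in for the NEAT User API, whose actual callbacks and threading model are not reproduced.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type, invoked when data arrives.
   Not the actual NEAT User API signature. */
typedef void (*on_readable_cb)(void *ctx, const char *data, size_t len);

/* Stand-in for the callback-based layer: scripted messages, delivered one
   per event-loop iteration. */
struct event_source {
    const char *pending[4];
    size_t count;
    size_t next;
    on_readable_cb cb;
    void *ctx;
};

/* One event-loop iteration: dispatch at most one message.
   Returns 1 if an event was dispatched, 0 if nothing is left. */
static int run_once(struct event_source *es)
{
    if (es->next >= es->count)
        return 0;
    const char *msg = es->pending[es->next++];
    es->cb(es->ctx, msg, strlen(msg));
    return 1;
}

/* Per-socket receive buffer that the callback fills. */
struct sock_buf {
    char data[256];
    size_t len;
};

static void buffer_data(void *ctx, const char *data, size_t len)
{
    struct sock_buf *sb = ctx;
    size_t room = sizeof(sb->data) - sb->len;
    if (len > room)
        len = room;                 /* this sketch simply truncates on overflow */
    memcpy(sb->data + sb->len, data, len);
    sb->len += len;
}

/* Blocking, read()-like call: drive the event loop until the callback has
   buffered data, then copy it out. Returns 0 (like EOF) when the source
   has no more events. */
static long blocking_read(struct event_source *es, struct sock_buf *sb,
                          char *out, size_t outlen)
{
    assert(es != NULL && sb != NULL && out != NULL);
    while (sb->len == 0) {
        if (!run_once(es))
            return 0;
    }
    size_t n = sb->len < outlen ? sb->len : outlen;
    memcpy(out, sb->data, n);
    memmove(sb->data, sb->data + n, sb->len - n);   /* keep any remainder */
    sb->len -= n;
    return (long)n;
}
```

A real implementation has to handle several sockets, writes and errors, and may run the event loop in a separate thread instead of inside the blocking call. The sketch only shows how a per-socket buffer decouples callback delivery from the caller's read.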

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on May 3, 2018.

Copyright Notice

Copyright (c) 2017 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust’s Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Dreibholz Expires May 3, 2018 [Page 1]

Internet-Draft NEAT Sockets API October 2017

Table of Contents

1. Introduction ...... 4
   1.1. Conventions ...... 4
2. Initialisation and Clean-Up ...... 4
   2.1. nsa_init() ...... 4
   2.2. nsa_cleanup() ...... 4
   2.3. nsa_map_socket() ...... 5
   2.4. nsa_unmap_socket() ...... 5
3. Connection Establishment and Teardown ...... 5
   3.1. nsa_socket() ...... 5
   3.2. nsa_socketpair() ...... 6
   3.3. nsa_close() ...... 6
   3.4. nsa_fcntl() ...... 7
   3.5. nsa_bind() ...... 7
   3.6. nsa_bindx() ...... 8
   3.7. nsa_bindn() ...... 9
   3.8. nsa_connect() ...... 9
   3.9. nsa_connectx() ...... 10
   3.10. nsa_connectn() ...... 11
   3.11. nsa_listen() ...... 11
   3.12. nsa_accept() ...... 12
   3.13. nsa_accept4() ...... 12
   3.14. nsa_shutdown() ...... 13
4. Options Handling ...... 13
   4.1. nsa_getsockopt() ...... 13
   4.2. nsa_setsockopt() ...... 14
   4.3. nsa_opt_info() ...... 14
5. Security ...... 15
   5.1. nsa_set_secure_identity() ...... 15
   5.2. ...... 15
6. Input/Output Handling ...... 15
   6.1. nsa_write() ...... 15
   6.2. nsa_writev() ...... 16
   6.3. nsa_pwrite() ...... 16
   6.4. nsa_pwrite64() ...... 16
   6.5. nsa_pwritev() ...... 17
   6.6. nsa_pwritev64() ...... 17
   6.7. nsa_send() ...... 17
   6.8. nsa_sendto() ...... 18
   6.9. nsa_sendmsg() ...... 18
   6.10. nsa_sendv() ...... 19
   6.11. nsa_read() ...... 20
   6.12. nsa_readv() ...... 20
   6.13. nsa_pread() ...... 21
   6.14. nsa_pread64() ...... 21
   6.15. nsa_preadv() ...... 21
   6.16. nsa_preadv64() ...... 21
   6.17. nsa_recv() ...... 22
   6.18. nsa_recvfrom() ...... 22
   6.19. nsa_recvmsg() ...... 23
   6.20. nsa_recvv() ...... 23
7. Poll and Select ...... 24
   7.1. nsa_poll() ...... 24
   7.2. nsa_select() ...... 25
8. Address Handling ...... 25
   8.1. nsa_getsockname() ...... 25
   8.2. nsa_getpeername() ...... 26
   8.3. nsa_getladdrs() ...... 26
   8.4. nsa_freeladdrs() ...... 27
   8.5. nsa_getpaddrs() ...... 27
   8.6. nsa_freepaddrs() ...... 28
9. Miscellaneous ...... 28
   9.1. nsa_open() ...... 28
   9.2. nsa_creat() ...... 28
   9.3. nsa_lockf() ...... 28
   9.4. nsa_lockf64() ...... 29
   9.5. nsa_flock() ...... 29
   9.6. nsa_fstat() ...... 29
   9.7. nsa_fpathconf() ...... 29
   9.8. nsa_fchown() ...... 30
   9.9. nsa_fsync() ...... 30
   9.10. nsa_fdatasync() ...... 30
   9.11. nsa_syncfs() ...... 30
   9.12. nsa_dup2() ...... 31
   9.13. nsa_dup3() ...... 31
   9.14. nsa_dup() ...... 31
   9.15. nsa_lseek() ...... 31
   9.16. nsa_lseek64() ...... 32
   9.17. nsa_truncate() ...... 32
   9.18. nsa_truncate64() ...... 32
   9.19. nsa_pipe() ...... 32
   9.20. nsa_ioctl() ...... 33
10. Code Examples ...... 33
11. Testbed Platform ...... 33
12. Security Considerations ...... 33
13. IANA Considerations ...... 33
14. Acknowledgments ...... 33
15. References ...... 34
   15.1. Normative References ...... 34
   15.2. Informative References ...... 34
   15.3. URIs ...... 36
Author's Address ...... 36

1. Introduction

The NEAT project [11], [12], [5], [3], [8] aims at a complete redesign of the way in which Internet applications interact with the network. Our goal is to allow network "services" offered to applications (such as reliability, low-delay communication or security) to be dynamically tailored based on application demands, current network conditions, hardware capabilities or local policies, and also to support the integration of new network functionality in an evolutionary fashion.

This document describes the NEAT Sockets API on top of the callback-based NEAT User API [4]. It provides a BSD Sockets-like API that facilitates porting existing applications to use a subset of NEAT’s functionality. For further information on NEAT, see also [11], [12], [13], [14], [15].

1.1. Conventions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [1].

2. Initialisation and Clean-Up

2.1. nsa_init()

nsa_init() is used to explicitly initialise the NEAT Sockets API. Usually, however, the NEAT Sockets API is initialised automatically when a NEAT socket is created. Explicit initialisation should only be necessary in a multi-threaded program, in order to avoid concurrent initialisation calls.

Function Prototype:

int nsa_init()

Return Value:

nsa_init() returns 0 in case of success, or -1 in case of error. The error code will be set in the errno variable.

2.2. nsa_cleanup()

nsa_cleanup() is used to free all resources allocated by NEAT. Note that the NEAT Sockets API is initialised automatically when a NEAT socket is created.

Function Prototype:

void nsa_cleanup()

2.3. nsa_map_socket()

nsa_map_socket() is used to map a system socket descriptor into the NEAT socket descriptor space. This makes it possible to use NEAT API functions as wrappers for calls on non-NEAT sockets. Mapped socket descriptors can be unmapped by using nsa_unmap_socket().

Function Prototype:

int nsa_map_socket(int systemSD, int neatSD)

Arguments:

systemSD: System socket descriptor.

neatSD: Desired NEAT socket descriptor; -1 for automatic allocation.

Return Value:

nsa_map_socket() returns the new NEAT socket descriptor, or -1 in case of error. The error code will be set in the errno variable.

2.4. nsa_unmap_socket()

nsa_unmap_socket() is used to unmap a system socket descriptor from the NEAT socket descriptor space.

Function Prototype:

int nsa_unmap_socket(int neatSD)

Arguments:

neatSD: NEAT socket descriptor.
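The mapping described in Sections 2.3 and 2.4 can be pictured as a small table from NEAT socket descriptors to system socket descriptors. The following toy model is a hypothetical illustration of that idea, not NEAT's implementation: the table size TOY_MAX_SD, the helper names toy_map_socket() and toy_unmap_socket(), the lowest-free-slot policy used when neatSD is -1, and the errno values on failure are all assumptions.

```c
#include <errno.h>

#define TOY_MAX_SD 16               /* hypothetical table size */

static int toy_table[TOY_MAX_SD];   /* NEAT-style SD -> system SD */
static int toy_used[TOY_MAX_SD];    /* 1 if the slot is occupied */

/* Map systemSD into the toy table. neatSD == -1 requests automatic
 * allocation (lowest free slot, an assumption of this toy model).
 * Returns the NEAT-style descriptor, or -1 with errno set. */
static int toy_map_socket(int systemSD, int neatSD)
{
   if (systemSD < 0) {
      errno = EBADF;
      return -1;
   }
   if (neatSD == -1) {
      for (int i = 0; i < TOY_MAX_SD; i++) {
         if (!toy_used[i]) {
            neatSD = i;
            break;
         }
      }
      if (neatSD == -1) {
         errno = EMFILE;           /* assumption: table is full */
         return -1;
      }
   } else if (neatSD < 0 || neatSD >= TOY_MAX_SD || toy_used[neatSD]) {
      errno = EBADF;               /* assumption: invalid or taken slot */
      return -1;
   }
   toy_table[neatSD] = systemSD;
   toy_used[neatSD]  = 1;
   return neatSD;
}

/* Remove a mapping. In this toy model the system socket stays open. */
static int toy_unmap_socket(int neatSD)
{
   if (neatSD < 0 || neatSD >= TOY_MAX_SD || !toy_used[neatSD]) {
      errno = EBADF;
      return -1;
   }
   toy_used[neatSD] = 0;
   return 0;
}
```

In this toy model, unmapping only releases the table slot; whether nsa_unmap_socket() also leaves the system socket open is not stated in this document.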

3. Connection Establishment and Teardown

3.1. nsa_socket()

nsa_socket() creates a new NEAT socket. The NEAT socket can either be a wrapper around the NEAT User API (if properties are specified) or be a wrapper around a system socket (if no properties are specified).

Function Prototype:

int nsa_socket(int domain, int type, int protocol, const char* properties)

Arguments:

domain: Domain for system socket (e.g. AF_INET).

type: Type for system socket (e.g. SOCK_SEQPACKET).

protocol: Protocol for system socket (e.g. IPPROTO_SCTP).

properties: Properties for NEAT Core socket (if none are specified, a system socket is wrapped instead).

Return Value:

nsa_socket() returns the NEAT socket descriptor in case of success, or -1 in case of error. The error code will be set in the errno variable.

See the socket() documentation for details.

3.2. nsa_socketpair()

nsa_socketpair() is a wrapper around the socketpair() call, returning NEAT socket descriptors instead. Note that socketpair() only supports AF_UNIX sockets, i.e. this function is just a wrapper around the system function.

Function Prototype:

int nsa_socketpair(int domain, int type, int protocol, int sv[2])

See the socketpair() documentation for details.

3.3. nsa_close()

nsa_close() closes a given NEAT socket.

Function Prototype:

int nsa_close(int sockfd)

Arguments:

sockfd: NEAT socket descriptor.

nsa_close() returns 0 in case of success, or -1 in case of error. The error code will be set in the errno variable.

See the close() documentation for details.

3.4. nsa_fcntl()

nsa_fcntl() manipulates a given NEAT socket.

Function Prototype:

int nsa_fcntl(int sockfd, int cmd, ...)

Arguments:

sockfd: NEAT socket descriptor.

cmd: Command.

...: Command-specific arguments.

nsa_fcntl() returns a command-specific value.

For NEAT sockets, the following commands are specified:

F_GETFL: Obtain value of the socket descriptor status flags. For NEAT sockets, the flag O_NONBLOCK specifies whether the socket is non-blocking. By default, it is blocking (i.e. O_NONBLOCK is not set).

F_SETFL: Set value of the socket descriptor status flags. For NEAT sockets, the flag O_NONBLOCK specifies whether the socket is non-blocking. By default, it is blocking (i.e. O_NONBLOCK is not set); F_SETFL can be used to change the blocking mode.

See the fcntl() documentation for details.

3.5. nsa_bind()

nsa_bind() binds a given NEAT socket to a given address. Note: this function is provided as a legacy wrapper, and it is RECOMMENDED to use nsa_bindn() instead. Note further that nsa_bind() supports a single address only (i.e. no multi-homing); nsa_bindx() SHOULD be used instead to support multi-homing.

Function Prototype:

int nsa_bind(int sockfd, const struct sockaddr* addr, socklen_t addrlen, struct neat_tlv* opt, const int optcnt)

Arguments:

sockfd: NEAT socket descriptor.

addr: Address to bind to.

addrlen: Length of the address structure "addr".

opt: NEAT options (NULL, if there are none).

optcnt: Number of NEAT options provided by "opt".

nsa_bind() returns 0 in case of success, or -1 in case of error. The error code will be set in the errno variable.

See the bind() documentation for details.

3.6. nsa_bindx()

nsa_bindx() binds a given NEAT socket to a given set of addresses. Note: this function is provided as a legacy wrapper, and it is RECOMMENDED to use nsa_bindn() instead.

Function Prototype:

int nsa_bindx(int sockfd, const struct sockaddr* addrs, int addrcnt, int flags, struct neat_tlv* opt, const int optcnt)

Arguments:

sockfd: NEAT socket descriptor.

addrs: Addresses to bind to.

addrcnt: Number of addresses in "addrs".

flags: Optional flags (0, if there are none).

opt: NEAT options (NULL, if there are none).

optcnt: Number of NEAT options provided by "opt".

nsa_bindx() returns 0 in case of success, or -1 in case of error. The error code will be set in the errno variable.

See the sctp_bindx() documentation for details.

3.7. nsa_bindn()

nsa_bindn() binds a given NEAT socket to a given port. NEAT takes care of handling local addresses.

Function Prototype:

int nsa_bindn(int sockfd, uint16_t port, int flags, struct neat_tlv* opt, const int optcnt)

Arguments:

sockfd: NEAT socket descriptor.

port: Port number to bind to.

flags: Optional flags (0, if there are none).

opt: NEAT options (NULL, if there are none).

optcnt: Number of NEAT options provided by "opt".

nsa_bindn() returns 0 in case of success, or -1 in case of error. The error code will be set in the errno variable.

3.8. nsa_connect()

nsa_connect() connects a given NEAT socket to a given remote address. Note: this function is provided as a legacy wrapper, and it is RECOMMENDED to use nsa_connectn() instead. Note further that nsa_connect() supports a single address only (i.e. no multi-homing); nsa_connectx() SHOULD be used instead to support multi-homing.

Function Prototype:

int nsa_connect(int sockfd, const struct sockaddr* addr, socklen_t addrlen, struct neat_tlv* opt, const int optcnt)

Arguments:

sockfd: NEAT socket descriptor.

addr: Address to connect to.

addrlen: Length of the address structure "addr".

opt: NEAT options (NULL, if there are none).

optcnt: Number of NEAT options provided by "opt".

nsa_connect() returns 0 in case of success, or -1 in case of error. The error code will be set in the errno variable.

See the connect() documentation for details.

3.9. nsa_connectx()

nsa_connectx() connects a given NEAT socket to a given set of remote addresses. Note: this function is provided as a legacy wrapper, and it is RECOMMENDED to use nsa_connectn() instead.

Function Prototype:

int nsa_connectx(int sockfd, const struct sockaddr* addrs, int addrcnt, neat_assoc_t* id, struct neat_tlv* opt, const int optcnt)

Arguments:

sockfd: NEAT socket descriptor.

addrs: Addresses to connect to.

addrcnt: Number of addresses in "addrs".

id: Pointer to store association ID to (not used yet, use NULL!).

opt: NEAT options (NULL, if there are none).

optcnt: Number of NEAT options provided by "opt".

nsa_connectx() returns 0 in case of success, or -1 in case of error. The error code will be set in the errno variable.

See the sctp_connectx() documentation for details.

3.10. nsa_connectn()

nsa_connectn() connects a given NEAT socket to a given remote name and port. The remote name is resolved by NEAT to corresponding remote addresses.

Function Prototype:

int nsa_connectn(int sockfd, const char* name, const uint16_t port, neat_assoc_t* id, struct neat_tlv* opt, const int optcnt)

Arguments:

sockfd: NEAT socket descriptor.

name: Remote name to connect to.

port: Remote port number to connect to.

id: Pointer to store association ID to (not used yet, use NULL!).

opt: NEAT options (NULL, if there are none).

optcnt: Number of NEAT options provided by "opt".

nsa_connectn() returns 0 in case of success, or -1 in case of error. The error code will be set in the errno variable.

3.11. nsa_listen()

nsa_listen() marks a given NEAT socket as a listening socket, i.e. a socket accepting incoming connections.

Function Prototype:

int nsa_listen(int sockfd, int backlog)

Arguments:

sockfd: NEAT socket descriptor.

backlog: Defines the maximum length to which the queue of pending connections for "sockfd" may grow.

nsa_listen() returns 0 in case of success, or -1 in case of error. The error code will be set in the errno variable.

See the listen() documentation for details.

3.12. nsa_accept()

nsa_accept() extracts the first connection request in the queue of pending connections for a listening NEAT socket, creates a new connected socket, and returns a new NEAT socket descriptor referring to that socket.

Function Prototype:

int nsa_accept(int sockfd, struct sockaddr* addr, socklen_t* addrlen)

Arguments:

sockfd: NEAT socket descriptor.

addr: Pointer to storage space to store the peer’s primary address to (or NULL, if address is not needed).

addrlen: Pointer to variable with size of the storage in "addr" (or NULL, if address is not needed).

nsa_accept() returns the new NEAT socket descriptor in case of success, or -1 in case of error. The error code will be set in the errno variable. In case of success, the peer’s primary address is stored in "addr", if there is sufficient space. The variable pointed to by "addrlen" will then contain the actual address size.

See the accept() documentation for details.

3.13. nsa_accept4()

nsa_accept4() extracts the first connection request in the queue of pending connections for a listening NEAT socket, creates a new connected socket, and returns a new NEAT socket descriptor referring to that socket. If successful, and flags!=0, nsa_accept4() furthermore makes the new socket non-blocking (SOCK_NONBLOCK flag) and/or close-on-exec (SOCK_CLOEXEC flag). For flags==0, the behaviour is identical to nsa_accept().

Function Prototype:

int nsa_accept4(int sockfd, struct sockaddr* addr, socklen_t* addrlen, int flags)

Arguments:

sockfd: NEAT socket descriptor.

addr: Pointer to storage space to store the peer’s primary address to (or NULL, if address is not needed).

addrlen: Pointer to variable with size of the storage in "addr" (or NULL, if address is not needed).

flags: SOCK_NONBLOCK and/or SOCK_CLOEXEC (0, if there are none).

nsa_accept4() returns the new NEAT socket descriptor in case of success, or -1 in case of error. The error code will be set in the errno variable. In case of success, the peer’s primary address is stored in "addr", if there is sufficient space. The variable pointed to by "addrlen" will then contain the actual address size.

See the accept() documentation for details.

3.14. nsa_shutdown()

nsa_shutdown() shuts down the connection of a given NEAT socket.

Function Prototype:

int nsa_shutdown(int sockfd, int how)

Arguments:

sockfd: NEAT socket descriptor.

how: Not used for NEAT sockets (set to SHUT_RDWR).

nsa_shutdown() returns 0 in case of success, or -1 in case of error. The error code will be set in the errno variable.

See the shutdown() documentation for details.
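The effect of a shutdown as seen from the peer can be sketched with a connected POSIX socket pair standing in for a NEAT connection: after one end shuts down, a read on the other end returns 0, which is the "connection shutdown" case documented for nsa_read() and nsa_recv(). The helper peer_sees_shutdown() is hypothetical, and the sketch assumes that nsa_shutdown() and nsa_read() behave like shutdown() and read() in this respect.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical demo: shut down one end of a connected stream pair and
 * report whether the peer observes end-of-stream (read() returning 0).
 * SHUT_RDWR matches the value this document gives for NEAT sockets.
 * Returns 1 if the shutdown was observed, 0 if not, -1 on error. */
static int peer_sees_shutdown(void)
{
   int  sv[2];
   char c;
   if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
      return -1;
   }
   if (shutdown(sv[0], SHUT_RDWR) < 0) {
      close(sv[0]);
      close(sv[1]);
      return -1;
   }
   ssize_t n = read(sv[1], &c, 1);   /* 0 => peer has shut down */
   close(sv[0]);
   close(sv[1]);
   return n == 0;
}
```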

4. Options Handling

4.1. nsa_getsockopt()

nsa_getsockopt() gets a socket option of a given NEAT socket.

Function Prototype:

int nsa_getsockopt(int sockfd, int level, int optname, void* optval, socklen_t* optlen)

Arguments:

sockfd: NEAT socket descriptor.

level: Option level.

optname: Option number.

optval: Buffer to store option value to.

optlen: Pointer to variable with length of the buffer in "optval".

nsa_getsockopt() returns 0 in case of success, or -1 in case of error. The error code will be set in the errno variable.

See the getsockopt() documentation for details.

4.2. nsa_setsockopt()

nsa_setsockopt() sets a socket option of a given NEAT socket.

Function Prototype:

int nsa_setsockopt(int sockfd, int level, int optname, const void* optval, socklen_t optlen)

Arguments:

sockfd: NEAT socket descriptor.

level: Option level.

optname: Option number.

optval: Buffer with option value to set.

optlen: Length of buffer with option value.

nsa_setsockopt() returns 0 in case of success, or -1 in case of error. The error code will be set in the errno variable.

See the setsockopt() documentation for details.

4.3. nsa_opt_info()

nsa_opt_info() gets a socket option of a given NEAT socket.

Function Prototype:

int nsa_opt_info(int sockfd, neat_assoc_t id, int opt, void* arg, socklen_t* size)

Arguments:

sockfd: NEAT socket descriptor.

id: Association identifier (0 in case of 1:1-style sockets).

opt: Option number.

arg: Buffer to store option value to.

size: Pointer to variable with length of the buffer in "arg".

nsa_opt_info() returns 0 in case of success, or -1 in case of error. The error code will be set in the errno variable.

See the sctp_opt_info() documentation for details.

5. Security

5.1. nsa_set_secure_identity()

TBD.

5.2. ...

TBD.

6. Input/Output Handling

6.1. nsa_write()

nsa_write() sends data over a given connected NEAT socket. For NEAT sockets, nsa_write() is equivalent to nsa_send() with "flags" set to 0.

Function Prototype:

ssize_t nsa_write(int fd, const void* buf, size_t len)

Arguments:

fd: NEAT socket descriptor.

buf: Data to send.

len: Length of data to send.

nsa_write() returns the number of sent bytes in case of success, or -1 in case of error. The error code will be set in the errno variable.

See the write() documentation for details.

6.2. nsa_writev()

nsa_writev() sends data over a given connected NEAT socket. The data is provided by an iovec structure.

Function Prototype:

ssize_t nsa_writev(int fd, const struct iovec* iov, int iovcnt)

Arguments:

fd: NEAT socket descriptor.

iov: Data to send provided as iovec structures.

iovcnt: Number of provided iovec structures.

nsa_writev() returns the number of sent bytes in case of success, or -1 in case of error. The error code will be set in the errno variable.

See the writev() documentation for details.

6.3. nsa_pwrite()

nsa_pwrite() is a wrapper around the pwrite() call, using a NEAT socket descriptor instead.

Function Prototype:

ssize_t nsa_pwrite(int fd, const void* buf, size_t len, off_t offset)

See the pwrite() documentation for details.

6.4. nsa_pwrite64()

nsa_pwrite64() is a wrapper around the pwrite64() call, using a NEAT socket descriptor instead.

Function Prototype:

ssize_t nsa_pwrite64(int fd, const void* buf, size_t len, off64_t offset)

See the pwrite64() documentation for details.

6.5. nsa_pwritev()

nsa_pwritev() is a wrapper around the pwritev() call, using a NEAT socket descriptor instead.

Function Prototype:

ssize_t nsa_pwritev(int fd, const struct iovec* iov, int iovcnt, off_t offset)

See the pwritev() documentation for details.

6.6. nsa_pwritev64()

nsa_pwritev64() is a wrapper around the pwritev64() call, using a NEAT socket descriptor instead.

Function Prototype:

ssize_t nsa_pwritev64(int fd, const struct iovec* iov, int iovcnt, off64_t offset)

See the pwritev64() documentation for details.

6.7. nsa_send()

nsa_send() sends data over a given connected NEAT socket.

Function Prototype:

ssize_t nsa_send(int sockfd, const void* buf, size_t len, int flags)

Arguments:

sockfd: NEAT socket descriptor.

buf: Data to send.

len: Length of data to send.

flags: Optional flags (0, if there are none).

nsa_send() returns the number of sent bytes in case of success, or -1 in case of error. The error code will be set in the errno variable.

See the send() documentation for details.

6.8. nsa_sendto()

nsa_sendto() is a wrapper around the sendto() call, using NEAT socket descriptors instead. Note: this function is provided as a legacy wrapper, and it is RECOMMENDED to use nsa_send() instead. On NEAT sockets, a provided destination address is ignored.

Function Prototype:

ssize_t nsa_sendto(int sockfd, const void* buf, size_t len, int flags, const struct sockaddr* to, socklen_t tolen)

Arguments:

sockfd: NEAT socket descriptor.

buf: Data to send.

len: Length of data to send.

flags: Optional flags (0, if there are none).

to: Address to send data to (ignored for NEAT sockets).

tolen: Length of address to send data to (ignored for NEAT sockets).

nsa_sendto() returns the number of sent bytes in case of success, or -1 in case of error. The error code will be set in the errno variable.

See the sendto() documentation for details.

6.9. nsa_sendmsg()

nsa_sendmsg() sends data over a given connected NEAT socket. The data and control information are provided by a msghdr structure. On NEAT sockets, a provided destination address is ignored.

Function Prototype:

ssize_t nsa_sendmsg(int sockfd, const struct msghdr* msg, int flags)

Arguments:

sockfd: NEAT socket descriptor.

msg: Data to send and corresponding control information as msghdr structure.

flags: Optional flags (0, if there are none).

nsa_sendmsg() returns the number of sent bytes in case of success, or -1 in case of error. The error code will be set in the errno variable.

See the sendmsg() documentation for details.
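The sketch below fills in a msghdr for a connected socket: msg_name stays NULL, since a destination address would be ignored on a NEAT socket anyway, and no control information is attached. It is written with POSIX sendmsg(); the helper send_two() is hypothetical, and it assumes nsa_sendmsg() interprets the structure the same way.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical helper: send two buffers as one message via sendmsg().
 * msg_name and msg_control stay NULL (connected socket, no control
 * data). Returns bytes sent, or -1 with errno set. */
static ssize_t send_two(int sockfd, const void* a, size_t alen,
                        const void* b, size_t blen)
{
   struct iovec  iov[2];
   struct msghdr msg;

   iov[0].iov_base = (void*)a;
   iov[0].iov_len  = alen;
   iov[1].iov_base = (void*)b;
   iov[1].iov_len  = blen;

   memset(&msg, 0, sizeof(msg));      /* NULL name, no control data */
   msg.msg_iov    = iov;
   msg.msg_iovlen = 2;
   return sendmsg(sockfd, &msg, 0);   /* flags = 0: none */
}
```

On a message-oriented socket, the two buffers are sent as a single message.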

6.10. nsa_sendv()

nsa_sendv() sends data over a given connected NEAT socket. The data and control information are provided by iovec and info structures. On NEAT sockets, a provided destination address is ignored.

Function Prototype:

ssize_t nsa_sendv(int sockfd, struct iovec* iov, int iovcnt, struct sockaddr* to, int tocnt, void* info, socklen_t infolen, unsigned int infotype, int flags)

Arguments:

sockfd: NEAT socket descriptor.

iov: Data to send provided as iovec structures.

iovcnt: Number of provided iovec structures.

to: Address(es) to send data to (ignored for NEAT sockets).

tocnt: Number of addresses to send data to (ignored for NEAT sockets).

info: Control information.

infolen: Length of control information.

infotype: Type of control information.

flags: Optional flags (0, if there are none).

nsa_sendv() returns the number of sent bytes in case of success, or -1 in case of error. The error code will be set in the errno variable.

See the sctp_sendv() documentation for details.

6.11. nsa_read()

nsa_read() reads data from a given connected NEAT socket. For NEAT sockets, nsa_read() is equivalent to nsa_recv() with "flags" set to 0.

Function Prototype:

ssize_t nsa_read(int fd, void* buf, size_t len)

Arguments:

fd: NEAT socket descriptor.

buf: Buffer to store read data to.

len: Length of the storage buffer.

nsa_read() returns the number of read bytes in case of success, 0 in case of connection shutdown, or -1 in case of error. The error code will be set in the errno variable.

See the read() documentation for details.

6.12. nsa_readv()

nsa_readv() reads data from a given connected NEAT socket into buffers provided by iovec structures.

Function Prototype:

ssize_t nsa_readv(int fd, const struct iovec* iov, int iovcnt)

Arguments:

fd: NEAT socket descriptor.

iov: Buffers to store read data to, provided as iovec structures.

iovcnt: Number of provided iovec structures.

nsa_readv() returns the number of read bytes in case of success, 0 in case of connection shutdown, or -1 in case of error. The error code will be set in the errno variable.

See the readv() documentation for details.

6.13. nsa_pread()

nsa_pread() is a wrapper around the pread() call, using a NEAT socket descriptor instead.

Function Prototype:

ssize_t nsa_pread(int fd, void* buf, size_t len, off_t offset)

See the pread() documentation for details.

6.14. nsa_pread64()

nsa_pread64() is a wrapper around the pread64() call, using a NEAT socket descriptor instead.

Function Prototype:

ssize_t nsa_pread64(int fd, void* buf, size_t len, off64_t offset)

See the pread64() documentation for details.

6.15. nsa_preadv()

nsa_preadv() is a wrapper around the preadv() call, using a NEAT socket descriptor instead.

Function Prototype:

ssize_t nsa_preadv(int fd, const struct iovec* iov, int iovcnt, off_t offset)

See the preadv() documentation for details.

6.16. nsa_preadv64()

nsa_preadv64() is a wrapper around the preadv64() call, using a NEAT socket descriptor instead.

Function Prototype:

ssize_t nsa_preadv64(int fd, const struct iovec* iov, int iovcnt, off64_t offset)

See the preadv64() documentation for details.

6.17. nsa_recv()

nsa_recv() reads data from a given connected NEAT socket.

Function Prototype:

ssize_t nsa_recv(int sockfd, void* buf, size_t len, int flags)

Arguments:

sockfd: NEAT socket descriptor.

buf: Buffer to store read data to.

len: Length of the storage buffer.

flags: Optional flags (0, if there are none).

nsa_recv() returns the number of read bytes in case of success, 0 in case of connection shutdown, or -1 in case of error. The error code will be set in the errno variable.

See the recv() documentation for details.

6.18. nsa_recvfrom()

nsa_recvfrom() reads data from a given connected NEAT socket. The sender address of the data (if possible and useful for the underlying transport protocol) is obtained as well. Note: this function is provided as a legacy wrapper, and it is RECOMMENDED to use nsa_recv() instead.

Function Prototype:

ssize_t nsa_recvfrom(int sockfd, void* buf, size_t len, int flags, struct sockaddr* from, socklen_t* fromlen)

Arguments:

sockfd: NEAT socket descriptor.

buf: Buffer to store read data to.

len: Length of the storage buffer.

flags: Optional flags (0, if there are none).

from: Pointer to storage space to store the peer’s primary address to (or NULL, if address is not needed).

fromlen: Pointer to variable with size of the storage in "from" (or NULL, if address is not needed).

nsa_recvfrom() returns the number of read bytes in case of success, 0 in case of connection shutdown, or -1 in case of error. The error code will be set in the errno variable. In case of success, the peer’s sending address (if possible and useful for the underlying transport protocol) may be stored in "from", if there is sufficient space. The variable pointed to by "fromlen" will then contain the actual address size.

See the recvfrom() documentation for details.

6.19. nsa_recvmsg()

nsa_recvmsg() reads data from a given connected NEAT socket. The data and control information buffers are provided by a msghdr structure.

Function Prototype:

ssize_t nsa_recvmsg(int sockfd, struct msghdr* msg, int flags)

Arguments:

sockfd: NEAT socket descriptor.

msg: Buffers for the received data and corresponding control information, as msghdr structure.

flags: Optional flags (0, if there are none).

nsa_recvmsg() returns the number of read bytes in case of success, 0 in case of connection shutdown, or -1 in case of error. The error code will be set in the errno variable.

See the recvmsg() documentation for details.

6.20. nsa_recvv()

nsa_recvv() reads data from a given connected NEAT socket. The data and control information buffers are provided by iovec and info structures.

Function Prototype:

ssize_t nsa_recvv(int sockfd, struct iovec* iov, int iovcnt, struct sockaddr* from, socklen_t* fromlen, void* info, socklen_t* infolen, unsigned int* infotype, int* msg_flags)

Arguments:

sockfd: NEAT socket descriptor.

iov: Buffers to store received data to, provided as iovec structures.

iovcnt: Number of provided iovec structures.

from: Pointer to storage space to store the peer’s primary address to (or NULL, if address is not needed).

fromlen: Pointer to variable with size of the storage in "from" (or NULL, if address is not needed).

info: Pointer to storage space for control information.

infolen: Pointer to variable with length of control information.

infotype: Pointer to variable for storing the control information type to.

msg_flags: Pointer to variable with optional flags.

nsa_recvv() returns the number of received bytes in case of success, or -1 in case of error. The error code will be set in the errno variable.

See the sctp_recvv() documentation for details.

7. Poll and Select

7.1. nsa_poll()

nsa_poll() waits for activity (input/output/error/...) on a set of given NEAT sockets.

Function Prototype:

int nsa_poll(struct pollfd* ufds, const nfds_t nfds, int timeout)

Arguments:

ufds: Array of pollfd structures, each specifying a NEAT socket descriptor and the requested activity for that socket.

nfds: Number of sockets given by "ufds".

timeout: Timeout in milliseconds.

nsa_poll() returns the number of NEAT sockets with activity in case of success, 0 in case of timeout, or -1 in case of error. The error code will be set in the errno variable.

See the poll() documentation for details.

7.2. nsa_select()

nsa_select() is a wrapper around the select() call, using NEAT socket descriptors instead. Note: this function is provided as a legacy wrapper, and it is RECOMMENDED to use nsa_poll() instead.

Function Prototype:

int nsa_select(int n, fd_set* readfds, fd_set* writefds, fd_set* exceptfds, struct timeval* timeout)

See the select() documentation for details.

8. Address Handling

8.1. nsa_getsockname()

nsa_getsockname() obtains the first local address of a socket. Note: this function is provided as a legacy wrapper; it is RECOMMENDED to use nsa_getladdrs() instead, in order to support multi-homed transport protocols.

Function Prototype:

int nsa_getsockname(int sockfd, struct sockaddr* name, socklen_t* namelen)

Arguments:

sockfd: NEAT socket descriptor.

name: Storage space for the address.

namelen: Pointer to variable with the storage space’s size.

Return Value:

nsa_getsockname() returns 0 in case of success (with the actual address size stored into the "namelen" variable), or -1 in case of error. The error code will be set in the errno variable.

See the getsockname() documentation for details.
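A typical use of nsa_getsockname() is to find out which port was assigned after binding to port 0. The sketch assumes that nsa_getsockname() behaves like getsockname(), as stated above. The #define is a hypothetical shim, and local_port() is an illustrative helper. Note that "namelen" is both an input (storage size) and an output (actual address size), and that for multi-homed associations only the first local address is reported.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical shim for building without the NEAT library. */
#define nsa_getsockname getsockname

/* Returns the local port (host byte order) of sockfd, or -1 on error. */
static int local_port(int sockfd)
{
   struct sockaddr_storage name;
   socklen_t               namelen = sizeof(name);   /* in: storage size */

   if (nsa_getsockname(sockfd, (struct sockaddr*)&name, &namelen) < 0) {
      return -1;                                      /* errno is set */
   }
   /* out: namelen now holds the actual address size */
   if ((name.ss_family == AF_INET) && (namelen >= sizeof(struct sockaddr_in))) {
      return ntohs(((const struct sockaddr_in*)&name)->sin_port);
   }
   if ((name.ss_family == AF_INET6) && (namelen >= sizeof(struct sockaddr_in6))) {
      return ntohs(((const struct sockaddr_in6*)&name)->sin6_port);
   }
   errno = EAFNOSUPPORT;
   return -1;
}
```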

8.2. nsa_getpeername()

nsa_getpeername() obtains the first remote address of a connected socket. Note: this function is provided as a legacy wrapper; it is RECOMMENDED to use nsa_getpaddrs() instead, in order to support multi-homed transport protocols.

Function Prototype:

int nsa_getpeername(int sockfd, struct sockaddr* name, socklen_t* namelen)

Arguments:

sockfd: NEAT socket descriptor.

name: Storage space for the address.

namelen: Pointer to variable with the storage space’s size.

Return Value:

nsa_getpeername() returns 0 in case of success (with the actual address size stored into the "namelen" variable), or -1 in case of error. The error code will be set in the errno variable.

See the getpeername() documentation for details.
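Since nsa_getpeername() returns a generic struct sockaddr, callers usually convert the result before logging it. The following sketch assumes getpeername() semantics for nsa_getpeername() and uses inet_ntop() for the conversion. The #define is a hypothetical shim, and format_address() and peer_to_string() are illustrative helpers.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

/* Hypothetical shim for building without the NEAT library. */
#define nsa_getpeername getpeername

/* Formats an IPv4/IPv6 address as "address:port" (IPv6 in brackets).
   Returns 0 on success, or -1 on unsupported family or short buffer. */
static int format_address(const struct sockaddr* sa, char* buf, size_t buflen)
{
   char host[INET6_ADDRSTRLEN];
   int  written;

   if (sa->sa_family == AF_INET) {
      const struct sockaddr_in* in4 = (const struct sockaddr_in*)sa;
      if (inet_ntop(AF_INET, &in4->sin_addr, host, sizeof(host)) == NULL) {
         return -1;
      }
      written = snprintf(buf, buflen, "%s:%u", host, (unsigned int)ntohs(in4->sin_port));
   }
   else if (sa->sa_family == AF_INET6) {
      const struct sockaddr_in6* in6 = (const struct sockaddr_in6*)sa;
      if (inet_ntop(AF_INET6, &in6->sin6_addr, host, sizeof(host)) == NULL) {
         return -1;
      }
      written = snprintf(buf, buflen, "[%s]:%u", host, (unsigned int)ntohs(in6->sin6_port));
   }
   else {
      errno = EAFNOSUPPORT;
      return -1;
   }
   /* Fail if snprintf() reported an error or truncated the output. */
   return ((written < 0) || ((size_t)written >= buflen)) ? -1 : 0;
}

/* Writes the primary peer address of a connected socket into buf. */
static int peer_to_string(int sockfd, char* buf, size_t buflen)
{
   struct sockaddr_storage name;
   socklen_t               namelen = sizeof(name);
   if (nsa_getpeername(sockfd, (struct sockaddr*)&name, &namelen) < 0) {
      return -1;   /* e.g. ENOTCONN if the socket is not connected */
   }
   return format_address((const struct sockaddr*)&name, buf, buflen);
}
```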

8.3. nsa_getladdrs()

nsa_getladdrs() obtains the local addresses of a socket. The storage space for the addresses will be automatically allocated and needs to be freed by nsa_freeladdrs().

Function Prototype:

int nsa_getladdrs(int sockfd, neat_assoc_t id, struct sockaddr** addrs)

Arguments:

sockfd: NEAT socket descriptor.

id: Association identifier (0 in case of 1:1-style sockets).

addrs: Pointer to a variable in which the pointer to the addresses is stored.

nsa_getladdrs() returns the number of addresses stored into a newly allocated space. The pointer to this space is stored into the variable provided by "addrs". In case of error, -1 is returned, and the error code will be set in the errno variable.

8.4. nsa_freeladdrs()

nsa_freeladdrs() frees addresses obtained by nsa_getladdrs().

Function Prototype:

void nsa_freeladdrs(struct sockaddr* addrs)

Arguments:

addrs: Pointer to addresses to be freed.

8.5. nsa_getpaddrs()

nsa_getpaddrs() obtains the remote addresses of a connected socket. The storage space for the addresses will be automatically allocated and needs to be freed by nsa_freepaddrs().

Function Prototype:

int nsa_getpaddrs(int sockfd, neat_assoc_t id, struct sockaddr** addrs)

Arguments:

sockfd: NEAT socket descriptor.

id: Association identifier (0 in case of 1:1-style sockets).

addrs: Pointer to a variable in which the pointer to the addresses is stored.

nsa_getpaddrs() returns the number of addresses stored into a newly allocated space. The pointer to this space is stored into the variable provided by "addrs". In case of error, -1 is returned, and the error code will be set in the errno variable.

8.6. nsa_freepaddrs()

nsa_freepaddrs() frees addresses obtained by nsa_getpaddrs().

Function Prototype:

void nsa_freepaddrs(struct sockaddr* addrs)

Arguments:

addrs: Pointer to addresses to be freed.

9. Miscellaneous

This section contains miscellaneous wrapper functions, mostly around file I/O. Since Unix file descriptors are used together with socket descriptors in functions like poll() and select(), the functions handling file descriptors have to be wrapped as well, so that they operate on NEAT socket descriptors too.

9.1. nsa_open()

nsa_open() is a wrapper around the open() call, returning a NEAT socket descriptor instead.

Function Prototype:

int nsa_open(const char* pathname, int flags, mode_t mode)

See the open() documentation for details.

9.2. nsa_creat()

nsa_creat() is a wrapper around the creat() call, returning a NEAT socket descriptor instead.

Function Prototype:

int nsa_creat(const char* pathname, mode_t mode)

See the creat() documentation for details.

9.3. nsa_lockf()

nsa_lockf() is a wrapper around the lockf() call, using a NEAT socket descriptor instead.

Function Prototype:

int nsa_lockf(int fd, int cmd, off_t len)

See the lockf() documentation for details.

9.4. nsa_lockf64()

nsa_lockf64() is a wrapper around the lockf64() call, using a NEAT socket descriptor instead.

Function Prototype:

int nsa_lockf64(int fd, int cmd, off64_t len)

See the lockf64() documentation for details.

9.5. nsa_flock()

nsa_flock() is a wrapper around the flock() call, using a NEAT socket descriptor instead.

Function Prototype:

int nsa_flock(int fd, int operation)

See the flock() documentation for details.

9.6. nsa_fstat()

nsa_fstat() is a wrapper around the fstat() call, using a NEAT socket descriptor instead.

Function Prototype:

int nsa_fstat(int fd, struct stat* buf)

See the fstat() documentation for details.
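As an example of the file wrappers, the sketch below obtains the size of a file through nsa_fstat(). It assumes that nsa_fstat() fills in struct stat exactly like fstat(). The #define is a hypothetical shim, and file_size() is an illustrative helper.

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical shim for building without the NEAT library. */
#define nsa_fstat fstat

/* Returns the size in bytes of the file behind descriptor fd,
   or -1 on error (errno set by nsa_fstat()). */
static off_t file_size(int fd)
{
   struct stat buf;
   if (nsa_fstat(fd, &buf) < 0) {
      return -1;
   }
   return buf.st_size;
}
```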

9.7. nsa_fpathconf()

nsa_fpathconf() is a wrapper around the fpathconf() call, using a NEAT socket descriptor instead.

Function Prototype:

long nsa_fpathconf(int fd, int name)

See the fpathconf() documentation for details.

9.8. nsa_fchown()

nsa_fchown() is a wrapper around the fchown() call, using a NEAT socket descriptor instead.

Function Prototype:

int nsa_fchown(int fd, uid_t owner, gid_t group)

See the fchown() documentation for details.

9.9. nsa_fsync()

nsa_fsync() is a wrapper around the fsync() call, using a NEAT socket descriptor instead.

Function Prototype:

int nsa_fsync(int fd)

See the fsync() documentation for details.

9.10. nsa_fdatasync()

nsa_fdatasync() is a wrapper around the fdatasync() call, using a NEAT socket descriptor instead.

Function Prototype:

int nsa_fdatasync(int fd)

See the fdatasync() documentation for details.

9.11. nsa_syncfs()

nsa_syncfs() is a wrapper around the syncfs() call, using a NEAT socket descriptor instead.

Function Prototype:

int nsa_syncfs(int fd)

See the syncfs() documentation for details.

9.12. nsa_dup2()

nsa_dup2() is a wrapper around the dup2() call, using a NEAT socket descriptor instead.

Function Prototype:

int nsa_dup2(int oldfd, int newfd)

See the dup2() documentation for details.

9.13. nsa_dup3()

nsa_dup3() is a wrapper around the dup3() call, using a NEAT socket descriptor instead.

Function Prototype:

int nsa_dup3(int oldfd, int newfd, int flags)

See the dup3() documentation for details.

9.14. nsa_dup()

nsa_dup() is a wrapper around the dup() call, using a NEAT socket descriptor instead.

Function Prototype:

int nsa_dup(int oldfd)

See the dup() documentation for details.
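With dup() semantics, the original and the duplicated descriptor refer to the same open file, so they share one file offset. The sketch assumes that nsa_dup(), nsa_lseek(), and an nsa_close() function (assumed to be part of the NEAT sockets API, but not described in this section) behave like their POSIX counterparts. The #defines are a hypothetical shim, and offset_seen_after_dup() is an illustrative helper.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical shim for building without the NEAT library. */
#define nsa_dup   dup
#define nsa_lseek lseek
#define nsa_close close

/* Duplicates fd, moves the file offset through the duplicate, and returns
   the offset then observed through the original descriptor (-1 on error).
   Because both descriptors share one offset, the result equals new_offset. */
static off_t offset_seen_after_dup(int fd, off_t new_offset)
{
   const int copy = nsa_dup(fd);
   if (copy < 0) {
      return -1;
   }
   off_t seen = -1;
   if (nsa_lseek(copy, new_offset, SEEK_SET) >= 0) {
      seen = nsa_lseek(fd, 0, SEEK_CUR);   /* query only, offset unchanged */
   }
   nsa_close(copy);
   return seen;
}
```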

9.15. nsa_lseek()

nsa_lseek() is a wrapper around the lseek() call, using a NEAT socket descriptor instead.

Function Prototype:

off_t nsa_lseek(int fd, off_t offset, int whence)

See the lseek() documentation for details.

9.16. nsa_lseek64()

nsa_lseek64() is a wrapper around the lseek64() call, using a NEAT socket descriptor instead.

Function Prototype:

off64_t nsa_lseek64(int fd, off64_t offset, int whence)

See the lseek64() documentation for details.

9.17. nsa_ftruncate()

nsa_ftruncate() is a wrapper around the ftruncate() call, using a NEAT socket descriptor instead.

Function Prototype:

int nsa_ftruncate(int fd, off_t length)

See the ftruncate() documentation for details.

9.18. nsa_ftruncate64()

nsa_ftruncate64() is a wrapper around the ftruncate64() call, using a NEAT socket descriptor instead.

Function Prototype:

int nsa_ftruncate64(int fd, off64_t length)

See the ftruncate64() documentation for details.

9.19. nsa_pipe()

nsa_pipe() is a wrapper around the pipe() call, returning NEAT socket descriptors instead.

Function Prototype:

int nsa_pipe(int fds[2])

See the pipe() documentation for details.

9.20. nsa_ioctl()

nsa_ioctl() is a wrapper around the ioctl() call, using a NEAT socket descriptor instead.

Function Prototype:

int nsa_ioctl(int fd, int request, const void* argp)

See the ioctl() documentation for details.
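One common ioctl() request on sockets and pipes is FIONREAD, which reports how many bytes can be read without blocking. FIONREAD is widely supported (e.g. on Linux and BSD) but is not specified by POSIX, and whether nsa_ioctl() forwards it for NEAT sockets is an assumption here. The #define is a hypothetical shim, and bytes_available() is an illustrative helper.

```c
#include <sys/ioctl.h>

/* Hypothetical shim for building without the NEAT library. */
#define nsa_ioctl ioctl

/* Returns the number of bytes that can be read from fd without
   blocking, using the FIONREAD request, or -1 on error (errno set). */
static int bytes_available(int fd)
{
   int count = 0;
   if (nsa_ioctl(fd, FIONREAD, &count) < 0) {
      return -1;
   }
   return count;
}
```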

10. Code Examples

Complete, runnable code examples can be found in the NEAT Git repository; some tutorial material is available in [10]:

URL: https://github.com/NEAT-project/neat

Branch: dreibh/neat-socketapi [2]

Directory: socketapi/examples/ [3]

11. Testbed Platform

NorNet is a large-scale, realistic Internet testbed platform that supports the multi-homing features of the underlying SCTP and MPTCP protocols. NorNet is described in [6] and [7]; further information can be found on the project website [9].

12. Security Considerations

Security considerations for the SCTP sockets API are described in [2].

13. IANA Considerations

This document does not require IANA actions.

14. Acknowledgments

The author would like to thank David Ros, Michael Welzl, and Xing Zhou for their support.

15. References

15.1. Normative References

[1] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.

[2] Stewart, R., Tuexen, M., Poon, K., Lei, P., and V. Yasevich, "Sockets API Extensions for the Stream Control Transmission Protocol (SCTP)", RFC 6458, DOI 10.17487/RFC6458, December 2011.

[3] Gjessing, S. and M. Welzl, "A Minimal Set of Transport Services for TAPS Systems", draft-gjessing-taps-minset-05 (work in progress), June 2017.

[4] Fairhurst, G., "The NEAT Interface to Transport Services", draft-fairhurst-taps-neat-00 (work in progress), October 2017.

[5] Welzl, M., Tuexen, M., and N. Khademi, "On the Usage of Transport Features Provided by IETF Transport Protocols", draft-ietf-taps-transports-usage-09 (work in progress), October 2017.

15.2. Informative References

[6] Dreibholz, T., "NorNet - Building an Inter-Continental Internet Testbed based on Open Source Software", Proceedings of the LinuxCon Europe, October 2016.

[7] Gran, E., Dreibholz, T., and A. Kvalbein, "NorNet Core - A Multi-Homed Research Testbed", Computer Networks, Special Issue on Future Internet Testbeds, Volume 61, Pages 75-87, ISSN 1389-1286, DOI 10.1016/j.bjp.2013.12.035, March 2014.

[8] Dreibholz, T., "NEAT - A New, Evolutive API and Transport-Layer Architecture for the Internet", Online: https://www.neat-project.org/, 2017.

[9] Dreibholz, T., "NorNet - A Real-World, Large-Scale Multi-Homing Testbed", Online: https://www.nntb.no/, 2017.

[10] Dreibholz, T., "A Practical Introduction to NEAT at Hainan University", Invited Talk at Hainan University, College of Information Science and Technology (CIST), April 2017.

[11] Weinrank, F., Grinnemo, K., Bozakov, Z., Brunstroem, A., Dreibholz, T., Hurtig, P., Khademi, N., and M. Tuexen, "A NEAT Way to Browse the Web", Proceedings of the ACM, IRTF and ISOC Applied Networking Research Workshop (ANRW), Pages 33-34, ISBN 978-1-4503-5108-9, DOI 10.1145/3106328.3106335, July 2017.

[12] Fairhurst, G., Jones, T., Bozakov, Z., Brunstroem, A., Damjanović, D., Eckert, K., Grinnemo, K., Hansen, A., Khademi, N., Mangiante, S., McManus, P., Papastergiou, G., Ros, D., Tuexen, M., Vyncke, E., and M. Welzl, "NEAT Architecture", Number D1.1, December 2015.

[13] Welzl, M., Brunstroem, A., Damjanović, D., Evensen, K., Eckert, T., Fairhurst, G., Khademi, N., Mangiante, S., Petlund, A., Ros, D., and M. Tuexen, "NEAT - First Version of Services and APIs", Number D1.2, March 2016.

[14] Khademi, N., Bozakov, Z., Brunstroem, A., Damjanović, D., Evensen, K., Fairhurst, G., Grinnemo, K., Jones, T., Mangiante, S., Papastergiou, G., Ros, D., Tuexen, M., and M. Welzl, "NEAT - First Version of Low-Level Core Transport System", Number D2.1, March 2016.

[15] Khademi, N., Bozakov, Z., Brunstroem, A., Dale, Oe., Damjanović, D., Evensen, K., Fairhurst, G., Grinnemo, K., Jones, T., Mangiante, S., Petlund, A., Ros, D., Stenberg, D., Tuexen, M., Weinrank, F., and M. Welzl, "NEAT - Core Transport System, with both Low-level and High-level Components", Number D2.2, March 2017.

15.3. URIs

[2] https://github.com/NEAT-project/neat/tree/dreibh/neat-socketapi

[3] https://github.com/NEAT-project/neat/tree/dreibh/neat-socketapi/socketapi/examples

Author’s Address

Thomas Dreibholz
Simula Research Laboratory, Network Systems Group
Martin Linges vei 17
1364 Fornebu, Akershus
Norway

Phone: +47-6782-8200
Fax: +47-6782-8201
URI: https://simula.no/people/dreibh

Disclaimer

The views expressed in this document are solely those of the author(s). The European Commission is not responsible for any use that may be made of the information it contains. All information in this document is provided “as is”, and no guarantee or warranty is given that the information is fit for any particular purpose. The user thereof uses the information at its sole risk and liability.
