Master of Science Thesis

Empirical analysis of traffic to establish a flow termination time-out

Leipzig, November 2012
presented by Juan Molina Rodríguez, Electronic Engineering Student

directed by Ralf Hoffmann, Ipoque GmbH

supervised by Josep Solé Pareta and Valentín Carela Español

Abstract

The inspection of the contents of packets flowing through the Internet, known as Deep Packet Inspection (DPI), is the main technology used for traffic classification and anomaly detection due to its reliability and accuracy. In recent years, the evolution of the Internet has driven DPI, and several applications based on it, deep into many scenarios.

The exponential increase of Internet bandwidth has made on-line DPI a highly demanding task. This technology has to face large amounts of data in real time, which poses a big challenge. To accomplish this, the process involved must be optimized. This implies not only efficient software but also exploiting the hardware elements. For that reason, both the scientific and the private community have become interested in recent years in optimizing this technology in several aspects (e.g. pattern searching or specific hardware architectures).

Delving into this topic, it is important to consider memory usage, since memory is not an unlimited resource. To properly analyse the traffic, DPI uses several parameters which have to be stored while the connections, or flows, are alive. Thus, in order to improve this process, it is necessary to know the expected time-out after which a flow can be considered finished, so that its related information can be deleted from memory.

Hence, to achieve this purpose, this MSc Thesis performs an empirical analysis of real Internet traffic. In order to obtain representative results, two completely different traces have been analysed: one captured in the core of a big ISP network and the other in a mobile operator scenario, near the edge. This not only makes the results more reliable, but also serves to characterize these two very different scenarios.

From these samples, a broad set of parameters has been extracted. Although many of them are not directly related to the final target, they provide a comprehensive characterization of real traffic behaviour. Results such as the proportion of traffic classified by groups, the RTT, the times between packets or the finalization statistics are presented and briefly analysed, yielding some interesting findings. Although there are studies covering specific issues discussed here, this work is, to the knowledge of the author, unique in the field of profiling traffic by protocol groups.

Based on these results, and as the main purpose of this work, an exhaustive time-out study has been elaborated considering the transport protocol (i.e. TCP and UDP), by protocol group and globally for all the traffic. From these results it has been proven, over a commercial DPI tool (Ipoque's PACE engine), that its standard global time-out can be reduced by a factor of three (it was initially set to 600 seconds) with almost no impact on the detection rate and effort, while reducing the memory requirements by 60%. This time-out can be even lower depending on the network characteristics. Moreover, the time-out for the subscriber information has also been evaluated. It is not as critical as the flow time-out, but it is also worthwhile and coherent to optimize this value in order to achieve better memory savings.

Altogether, this has the benefit of allowing more flows and subscribers to be studied, or of requiring fewer memory blocks, which would imply power and cost savings. In addition, based on the results obtained in this MSc Thesis, further work could be developed in several fields, such as network security or protocol design.

Resumen del Proyecto

La inspección de los contenidos que circulan por Internet, también conocida como Deep Packet Inspection (DPI), es la principal tecnología utilizada para la clasificación de tráfico y la búsqueda de anomalías, debido a su fiabilidad y precisión. Durante los últimos años, la evolución de Internet ha dado lugar a una profunda incursión de DPI, y de varias aplicaciones basadas en éste, en muchos escenarios.

El aumento exponencial de ancho de banda en Internet ha hecho del análisis on-line una tarea muy exigente. Esta tecnología tiene la función de hacer frente a grandes cantidades de datos en tiempo real, lo cual supone un gran reto. Para lograr esta tarea, es necesario optimizar el proceso involucrado, lo cual implica no sólo un uso eficiente del software sino también aprovechar los elementos hardware. Por esta razón, tanto la comunidad científica como la privada se han interesado en los últimos años en la optimización de este campo en varios aspectos (e.g. búsqueda de patrones o arquitecturas de hardware específicas).

Indagando en este asunto, es importante tener en cuenta el uso de memoria, ya que no es un recurso ilimitado. Para llevar a cabo un correcto análisis del tráfico, DPI utiliza varios parámetros que deben ser almacenados mientras que las conexiones o flujos están activos. Por lo tanto, con el fin de mejorar este proceso, es necesario saber cuál es el tiempo esperado para que un flujo finalice y por lo tanto eliminar su información en memoria.

Por ello, este proyecto tiene como objetivo realizar un análisis empírico sobre tráfico real de Internet. A fin de obtener resultados representativos, han sido analizadas dos trazas completamente diferentes, una capturada en el núcleo de un gran ISP y la otra en el ámbito de un operador de móvil, cerca del borde de la red. Esto aporta más fiabilidad a los resultados y sirve para caracterizar estos dos escenarios.

Se ha estudiado un amplio conjunto de parámetros. Aunque muchos de ellos no están directamente relacionados con el objetivo final, proporcionan una caracterización del comportamiento del tráfico. Resultados como la proporción de tráfico por grupos, los RTT, el tiempo entre paquetes o las estadísticas del modo de finalización se exponen y se analizan brevemente, obteniendo algunos resultados interesantes. Aunque hay algunos trabajos que abarcan temas específicos expuestos aquí, este trabajo es, hasta donde el autor conoce, único en el campo de clasificar el tráfico por grupos de protocolos.

Con base en estos resultados, y como objetivo principal de este trabajo, se ha elaborado un exhaustivo estudio de time-outs, teniendo en cuenta el protocolo de transporte (i.e. TCP o UDP), por grupos de protocolo y global para todo el tráfico. Los resultados se han evaluado con una herramienta DPI comercial (PACE de Ipoque). Su time-out global se puede reducir hasta tres veces (inicialmente establecido en 600 segundos) sin verse casi alterada la detección, pero reduciendo los requisitos de memoria en un 60%. Este tiempo de espera puede ser incluso menor en función de las características de la red. Por otra parte, también se ha evaluado el time-out para la información de abonado. Éste no es un factor tan crítico como el time-out de flujos, pero también es útil y coherente optimizarlo con el fin de lograr un mayor ahorro de memoria.

Con todo, se consigue permitir el estudio de más flujos y abonados, o que se requieran menos bloques de memoria, lo que implicaría un ahorro de potencia y costes. Además, con los resultados obtenidos en este trabajo podría ahondarse en otros aspectos, como la seguridad de la red o el diseño de protocolos, por ejemplo.

Resum del Projecte

La inspecció de continguts que circulen per Internet, també coneguda com Deep Packet Inspection (DPI), és la principal tecnologia utilitzada per a la classificació de trànsit i recerca d’anomalies, per la seva fiabilitat i precisió. Durant els últims anys, l’evolució a Internet ha donat lloc a una profunda incursió de DPI i diverses aplicacions basades en aquest en molts escenaris.

L’augment exponencial d’ample de banda a Internet ha fet l’anàlisi on-line una tasca molt exigent. Aquesta tecnologia té la funció de fer front a grans quantitats de dades en temps real, la qual cosa suposa un gran repte. Per aconseguir aquesta tasca, cal optimitzar el procés involucrat, la qual cosa implica no només un ús eficient de software sinó també aprofitar els elements hardware. Per això, tant la comunitat científica com la privada, s’han interessat en els últims anys en l’optimització d’aquest camp en diversos aspectes (e.g. recerca de patrons o arquitectures hardware específiques).

Indagant en aquest assumpte, és important tenir en compte l'ús de memòria, ja que no és un recurs il·limitat. Per dur a terme una anàlisi correcta del trànsit, DPI utilitza diversos paràmetres que han de ser emmagatzemats mentre les connexions o fluxos estan actius. Per tant, per tal de millorar aquest procés, cal saber quin és el temps esperat perquè un flux finalitzi i per tant eliminar la seva informació en memòria.

Per això, aquest projecte té com a objectiu realitzar una anàlisi empírica sobre trànsit real d'Internet. Per tal d'obtenir resultats representatius, han estat analitzades dues traces completament diferents, una capturada en el nucli d'un gran ISP i l'altra en l'àmbit d'un operador de mòbil, prop de la vora de la xarxa. Això aporta més fiabilitat als resultats i serveix per caracteritzar aquests dos escenaris.

S'ha estudiat un ampli conjunt de paràmetres. Encara que molts d'ells no estan directament relacionats amb l'objectiu final, proporcionen una caracterització del comportament del trànsit. Resultats com la proporció de trànsit per grups, els RTT, el temps entre paquets o les estadístiques del mode de finalització s'exposen i s'analitzen breument, obtenint alguns resultats interessants. Encara que hi ha alguns treballs que abasten temes específics exposats aquí, aquest treball és, fins on l'autor coneix, únic en el camp de classificar el trànsit per grups de protocols.

Amb base a aquests resultats, i com a objectiu principal d'aquest treball, s'ha elaborat un exhaustiu estudi de time-outs, tenint en compte el protocol de transport (i.e. TCP o UDP), per grups de protocol i global per a tot el trànsit. Els resultats s'han avaluat amb una eina DPI comercial (PACE d'Ipoque). El seu time-out global es pot reduir fins a tres vegades (inicialment establert en 600 segons) sense veure's gairebé alterada la detecció, però reduint els requisits de memòria en un 60%. Aquest time-out pot ser fins i tot menor en funció de les característiques de la xarxa. D'altra banda, també s'ha avaluat el time-out per a la informació d'abonat. Aquest no és un factor tan crític com el time-out de fluxos, però també és útil i coherent optimitzar-lo per tal d'aconseguir un major estalvi de memòria.

Amb tot, s'aconsegueix permetre l'estudi de més fluxos i abonats, o que es requereixin menys blocs de memòria, el que implicaria un estalvi de potència i costos. A més, amb els resultats obtinguts en aquest treball es podria aprofundir en altres aspectes, com la seguretat de la xarxa o el disseny de protocols, per exemple.

Acknowledgements

I want to show my gratitude to Ipoque generally, for giving me the opportunity to be part of this great company. Specifically, to Klaus Degner for selecting me to carry out my MSc Thesis here, to Ralf Hoffmann for guiding me and helping me with my work, and to all those colleagues who made me feel at home.

I also want to acknowledge Josep Solé Pareta and Valentín Carela Español for their involvement, support, advice and help.

For his selfless help with the appearance and spelling of this document I also want to thank Enric López Jara.

Lastly, on a personal level, I have to thank my partner, Cristina, for giving me all her support and for waiting for me all the time I have been away.

Contents

1 Introduction

2 Basics
  2.1 Transmission Control Protocol
    2.1.1 Connection process
    2.1.2 Timing parameters
  2.2 Statistical parameters
    2.2.1 Arithmetic average
    2.2.2 Standard deviation
  2.3 Probability distributions
    2.3.1 Gaussian distribution
    2.3.2 Gamma and exponential distributions
  2.4 Protocol and Application Classification Engine (PACE)

3 State of the art

4 Design and implementation
  4.1 Framework: PACE and PAT
  4.2 Application Module TCP Flow: TCP
    4.2.1 Flow started
    4.2.2 Process packet
    4.2.3 Flow ended
  4.3 Post-processing
    4.3.1 Final values
    4.3.2 Interpretation and representation
  4.4 Application Module TCP Flow: UDP
    4.4.1 Flow started
    4.4.2 Process packet
    4.4.3 Flow ended

5 Results
  5.1 Used traces: ISP_Core and ISP_Mob
  5.2 Generic view
    5.2.1 Flow usage
    5.2.2 Generic data
      5.2.2.1 ISP_Core
      5.2.2.2 ISP_Mob
  5.3 Initialization
    5.3.1 RTT times
      5.3.1.1 ISP_Core
      5.3.1.2 ISP_Mob
    5.3.2 Protocol response
      5.3.2.1 ISP_Core
      5.3.2.2 ISP_Mob
  5.4 Data transfer
    5.4.1 RTT times
      5.4.1.1 ISP_Core
      5.4.1.2 ISP_Mob
    5.4.2 Times between packets
      5.4.2.1 ISP_Core
      5.4.2.2 ISP_Mob
  5.5 Termination
    5.5.1 Standard termination times
      5.5.1.1 ISP_Core
      5.5.1.2 ISP_Mob
    5.5.2 Reset termination times
      5.5.2.1 ISP_Core
      5.5.2.2 ISP_Mob

6 Set up and evaluation of time-outs
  6.1 Flow tracking table time-outs
    6.1.1 TCP time-outs
    6.1.2 UDP time-outs
    6.1.3 Time-outs evaluation
      6.1.3.1 Single time-out
      6.1.3.2 Theoretical profiled time-outs
  6.2 Identity tracking table time-out

7 Summary and conclusions

Acronyms

List of Figures

2.1 TCP connection process
2.2 TCP timing parameters
2.3 Gaussian Distribution
2.4 Gamma Distribution

5.1 Capture point behaviour
5.2 Protocol response
5.3 Protocol response comparison
5.4 HTTP behaviour
5.5 Blocking scenarios RST behaviour
5.6 Flow termination proportions

6.1 PCK-PCK after RST process CDF
6.2 PCK-PCK for UDP flows CDF
6.3 Flow tracking table time-out variation impact core network
6.4 Flow tracking table time-out variation impact mobile network
6.5 Flow tracking table time-out variation impact mobile network extended
6.6 Protocol group time-out variation impact core network
6.7 Identity tracking table time-out variation impact core network
6.8 Identity tracking table time-out variation impact mobile network

List of Tables

5.1 Traces properties
5.2 Flow usage
5.3 Maximum number of flows in system
5.4 General TCP ISP_Core
5.5 General TCP ISP_Mob
5.6 Protocol groups proportions TCP ISP_Core
5.7 Protocol groups proportions TCP ISP_Mob
5.8 General UDP ISP_Core
5.9 General UDP ISP_Mob
5.10 Protocol groups proportions UDP ISP_Core
5.11 Protocol groups proportions UDP ISP_Mob
5.12 Initial times ISP_Core (ms)
5.13 Initial times ISP_Mob (ms)
5.14 Protocol response ISP_Core (ms)
5.15 Protocol response ISP_Mob (ms)
5.16 DAT-ACK times TCP ISP_Core (ms)
5.17 PCK-ACK times TCP ISP_Core (ms)
5.18 DAT-ACK times TCP ISP_Mob (ms)
5.19 PCK-ACK times TCP ISP_Mob (ms)
5.20 DAT-DAT times TCP ISP_Core (ms)
5.21 PCK-PCK times TCP ISP_Core (ms)
5.22 PCK-PCK times UDP ISP_Core (ms)
5.23 DAT-DAT times TCP ISP_Mob (ms)
5.24 PCK-PCK times TCP ISP_Mob (ms)
5.25 PCK-PCK times UDP ISP_Mob (ms)
5.26 Termination mode proportions ISP_Core
5.27 Termination mode proportions ISP_Mob
5.28 FIN-ACK times ISP_Core (ms)
5.29 FIN-ACK times ISP_Mob (ms)
5.30 RST times ISP_Core (ms)
5.31 RST times ISP_Mob (ms)

6.1 Time-out TCP ISP_Core (ms)
6.2 Time-out TCP ISP_Mob (ms)
6.3 PCK-RST time-out ISP_Core (ms)
6.4 PCK-RST time-out ISP_Mob (ms)
6.5 Time-out TCP (ms)
6.6 Time-out UDP ISP_Core (ms)
6.7 Time-out UDP ISP_Mob (ms)
6.8 Time-out UDP (ms)

1 Introduction

Deep Packet Inspection (DPI) has become, over the last half decade, an important technology for modern networks. It consists of a set of techniques aimed at a fine-grained identification of the traffic flowing through a network. These techniques can be very simple for open standard protocols, searching for specific strings or simple patterns. However, they can become very sophisticated when dealing with encrypted or obfuscated data, since identifying these types of traffic relies on empirical or heuristic analysis and commonly requires handling several parameters.

This technology is applied in many different scenarios, from small office networks to huge Internet Service Providers (ISPs), and for many different purposes, from network surveillance to traffic management. The biggest challenge faced by these applications normally stems from their requirements, since they are intended to deal with real-time traffic and often with relatively large amounts of data. In current networks (e.g. 10/40/100 Gbps) packets are received every few µs, and that is the time available for parsing them without requiring data buffering. In addition, some flow¹ and user information has to be stored in memory in order to make the system work.
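To put these figures in perspective, a back-of-the-envelope calculation (illustrative arithmetic, not a measurement from this work) gives the per-packet time budget at line rate, assuming full-size 1500-byte Ethernet frames:

```latex
t_{10\,\mathrm{Gbps}} = \frac{1500 \cdot 8\ \mathrm{bit}}{10^{10}\ \mathrm{bit/s}} = 1.2\ \mu\mathrm{s},
\qquad
t_{100\,\mathrm{Gbps}} = \frac{1500 \cdot 8\ \mathrm{bit}}{10^{11}\ \mathrm{bit/s}} = 120\ \mathrm{ns}
```

With minimum-size 64-byte frames at 10 Gbps the budget shrinks to roughly 51 ns (ignoring preamble and inter-frame gap), which is why per-packet memory accesses dominate the processing cost.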

To meet these challenges some issues have to be considered. Besides efficient software development, a main point is to have optimal hardware elements dedicated to this purpose. Even so, the handling of this hardware has to be optimized to achieve the best performance. One major aspect here is the usage of memory elements, since they have a limited capacity and accessing them normally requires several Central Processing Unit (CPU) cycles, meaning longer operation times.

Based on this idea, this Master of Science (MSc) Thesis has been proposed by Ipoque, a leading European company providing DPI solutions. The study is based on the observation of real Internet traffic behaviour and, concretely, on the characterization and establishment of flow termination time-outs². This is performed by gathering statistics of timing parameters. From these results, an optimization of Ipoque's basic DPI library, the Protocol and Application Classification Engine (PACE), has been achieved. This software tool is a key component of the company's product catalogue.

On this basis, some values not directly related to the final target but to the protocols and network behaviour are studied as well. This information is useful as a reference of the times observed in real scenarios and can be applied to many fields: for example, the improvement of some protocol implementations, the prevention of attacks like Denial of Service (DoS), or the search for anomalous phenomena. Hence, this document also aims to offer an extensive but easy-to-read overview of network and protocol behaviour.

¹ Stream of data between a specific origin and destination (IP address and port number)
² Time elapsed before a flow is considered finished

The characterization of Transmission Control Protocol (TCP) traffic is more comprehensive because of the intrinsic information that can be extracted from its way of proceeding. To that effect, several parameters are presented concerning the establishment, transfer and finalization of flows, among others. To the knowledge of the author, no closely similar work has been done before. In addition, some parameters of User Datagram Protocol (UDP) flows are studied as well in order to get a comprehensive characterization and interpretation of the results. Determining the UDP behaviour is also helpful to find the tracking time-out, besides providing useful information in itself.

Returning to memory optimization, it has to be considered that, to make efficient use of the memory resources, it is important to know the behaviour of the tracked connections. PACE requires access to stored flow-related information in order to proceed with the detection of protocols. For that purpose a tracking hash table is used, keyed by some characteristic parameters of the packets known as the 5-tuple³. This information must be kept while the flows are active. The question to consider, then, is when a flow is finished. UDP is connectionless, which means it does not include any signalling concerning this issue. On the other hand, TCP flows show a handshake process to finalize connections, but even with this information it is difficult to determine when a flow is really finished. There are situations where only one direction of the flow is seen; and even if both directions are tracked, seeing this handshake process at a middle point of a network does not guarantee that the end points are receiving the corresponding signals.

Therefore, the way of proceeding is that when a flow has remained inactive for a specified time-out it is considered finished, and the memory used to track the flow information is released. In addition, there is an option to restrict the memory occupied by this data instead of waiting for an interval of inactivity, dropping the oldest entries first. This can prevent memory saturation, but it also affects the detection rate since flow information may be stored for less time than required. In any case, with both approaches some flows which are really finished remain active in the system, while others may be dropped although they are not finished yet.
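As an illustration of the mechanism just described, the following minimal Python sketch (hypothetical names, not the PACE implementation) keeps a table keyed by the 5-tuple and evicts entries after a configurable inactivity time-out:

```python
# Minimal sketch of a flow tracking table with inactivity-based eviction.
# Keys are 5-tuples; values hold the last-seen timestamp plus whatever
# per-flow state the detection engine needs.

FLOW_TIMEOUT = 600.0  # seconds; the PACE default discussed in the text


class FlowTable:
    def __init__(self, timeout=FLOW_TIMEOUT):
        self.timeout = timeout
        self.flows = {}  # 5-tuple -> (last_seen, state)

    def touch(self, five_tuple, now, state=None):
        """Create or refresh a flow entry on packet arrival."""
        self.flows[five_tuple] = (now, state)

    def expire(self, now):
        """Drop flows that have been inactive longer than the time-out."""
        stale = [k for k, (last, _) in self.flows.items()
                 if now - last > self.timeout]
        for k in stale:
            del self.flows[k]
        return len(stale)


table = FlowTable(timeout=200.0)  # a reduced time-out, in the spirit of ch. 6
t = ("10.0.0.1", "10.0.0.2", 1234, 80, "TCP")
table.touch(t, now=0.0)
table.touch(t, now=150.0)          # still active: only 150 s of silence
expired = table.expire(now=400.0)  # 250 s idle > 200 s time-out -> evicted
```

A real engine would additionally bound the total number of entries and drop the oldest ones when full, trading memory for detection rate as the text explains.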

The current time established by PACE to drop a flow is 600 seconds (10 minutes). This time-out was selected without a solid basis: it is just a time considered long enough not to miss any packet in a flow. This procedure affects all the protocols equally, although they do not behave in the same way, making an inefficient use of the resources. As an outstanding finding, it is proven that, in a general situation, the time-out can be reduced by a factor of three without noticeably affecting the detection rate, while reducing the memory requirements by 60%. It can be even lower depending on the network characteristics. For instance, it is found that in a network with only Hypertext Transfer Protocol (HTTP) traffic it could be decreased to 50 seconds.

Finally, the impact of varying the time-out for the identity tracking table, which handles the information of each user, is also evaluated. This table is not as critical as the flow table, since the number of elements is considerably lower. However, it is consistent to study this parameter as well, since it is related to the flow tracking table. The initial time-out set for this parameter is also 600 seconds, and it is concluded that it can be drastically decreased.

³ {src IP, dst IP, src PORT, dst PORT, L4 protocol}

The document is organized as follows. Chapter 2 introduces some basic concepts that may be necessary to follow the rest of the document. Chapter 3 reviews and cites related work. Chapter 4 explains the process followed to extract the values and shows the main implementation pieces, discussing the problems faced and the differences between processing TCP and UDP traffic. Chapter 5 presents and discusses the results, and in chapter 6 the time-out values are established and evaluated. Finally, chapter 7 contains a summary and the conclusions.

2 Basics

Although this study is oriented to an empirical analysis, it is helpful as a starting point to set a theoretical basis. Hence, this chapter aims to give a brief introduction to some concepts that are necessary to understand the issues studied in this MSc Thesis. The concepts shown are the most relevant for a basic understanding of the subject. The intention is to give a conceptual idea rather than going into details which are not needed to understand and follow the document.

2.1 Transmission Control Protocol

TCP represents, together with the Internet Protocol (IP), the basis of the Internet paradigm, called TCP/IP. Its functions belong to the fourth level of the Open System Interconnection (OSI) stack, known as the transport level, which means it is responsible for end-to-end connections. In contrast to UDP, the other main transport level protocol of the Internet, TCP includes a set of complex mechanisms focused on keeping accurate control of the transmission parameters. This way of proceeding is described as connection-oriented, and implies better reliability at the expense of latency.

2.1.1 Connection process

For the current study it is interesting to briefly review the steps involved in the TCP process. For a more comprehensive understanding, see the corresponding Request For Comments (RFC) [1]. Basically, the connection process in a TCP flow goes through three phases.

• Initialization: This process is usually known as the 3-way handshake. The side which wants to start a connection sends a Synchronize (SYN) message. If the other party accepts the connection, this message is answered with a Synchronize/Acknowledgement (SYNACK). When the starting side receives the SYNACK confirmation it considers the connection started and sends an Acknowledgement (ACK) message to the other side, which also considers the connection established when this is received.

• Data transfer: Once the connection is established, the transmission of data can start. During this interval (in fact, also during the initialization and termination processes) a monitoring process is performed. Each TCP segment includes a field with a sequence number and an ACK number. The first is an identification number, while the second indicates the last set of data confirmed so far by the other side (by means of the sequence number). Although this procedure is not relevant in itself for the purpose of this study, it is the basis for the characteristic TCP control mechanisms, such as congestion control and flow control.

• Termination: The termination of a TCP flow is usually caused by a 4-way handshake, similar to the one employed in the initialization. The side which has finished with the connection sends a Finalization (FIN) message to the other end. When that end responds with an ACK, the first side considers the connection closed (it can still receive data but will not send any more). In fact, at this point the connection is "half-closed"; the process is completely finished when the same steps are completed from the other side. In some situations this 4-way handshake can be done in only three steps, if the ACK to the first FIN also contains the second FIN message. A connection can also be closed by the sending of a Reset (RST) message, which in a normal situation aims to restart the connection due to the occurrence of an error.
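The three phases above can be sketched with a toy classifier that assigns each observed packet to a phase based on its TCP flags. This is purely illustrative (names and flag strings are invented for the example; a real tracker must handle both directions, retransmissions and sequence numbers):

```python
# Toy sketch: map a sequence of observed TCP flags onto the three
# connection phases described in the text. "DAT" stands for a segment
# carrying application data.

def classify_phases(flags):
    """Return the phase of each packet: 'init', 'data' or 'term'."""
    phases = []
    established = False
    closing = False
    for f in flags:
        if f in ("SYN", "SYNACK"):
            phases.append("init")
        elif f in ("FIN", "RST"):
            closing = True
            phases.append("term")
        elif f == "ACK" and not established:
            established = True
            phases.append("init")   # final ACK of the 3-way handshake
        elif closing:
            phases.append("term")   # e.g. the ACKs of the FIN segments
        else:
            phases.append("data")
    return phases


# A normal flow: 3-way handshake, one data exchange, 4-way termination.
trace = ["SYN", "SYNACK", "ACK", "DAT", "ACK", "FIN", "ACK", "FIN", "ACK"]
phases = classify_phases(trace)
```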

Figure 2.1 – TCP connection process: initialization (red), data transfer (green) and termination (yellow)


2.1.2 Timing parameters

The procedures explained above have a variable duration depending on the network and the kind of service, protocol or application carried. The characterization of these times is the basis of this study. For TCP flows, the times studied are conceptually shown in figure 2.2 and described below:

• Synchronize-Synchronize/Acknowledgement (SYN-SYNACK): Time between the initial SYN and its response SYNACK.

• Synchronize/Acknowledgement-Acknowledgement (SYNACK-ACK): Elapsed time between the SYNACK and its ACK.

• Synchronize-Data (SYN-DAT): Time between the initial SYN and the first data packet seen, regardless of the direction.

• Finalization-Acknowledgement (FIN-ACK): Time between a FIN and its ACK.

• Reset-End (RST-END): Time between the sending of a RST and the last packet seen in the flow.

This first group of parameters belongs to the initialization and termination processes of the TCP flows. Note that they appear at most once per flow (with the exception of the FIN-ACK, which can appear twice), so from now on they will be cited as single values. In addition to these parameters, the following ones, related to the data transfer interval, are relevant too (they will be cited as combined values):

• Packet-Acknowledgement (PCK-ACK): Time between any packet sent and its ACK.

• Data-Acknowledgement (DAT-ACK): Time between any data packet sent and its ACK.

• Packet-Packet (PCK-PCK): Time between two consecutive packets.

• Data-Data (DAT-DAT): Time between two consecutive data packets.

The difference between data packets and just packets is small: the first term refers to those carrying application data (e.g. HTTP, Skype, etc.), while the second refers to the complete set of packets, including those carrying just TCP messages (e.g. ACK, FIN, etc.). This distinction is not applied to UDP traffic since it does not include connection messages.
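In the simplest case the combined values reduce to differences between consecutive timestamps. A minimal sketch (hypothetical helper, assuming packets are given as (timestamp in ms, payload bytes) pairs) illustrating the PCK-PCK / DAT-DAT distinction:

```python
# PCK-PCK: gaps between all consecutive packets.
# DAT-DAT: gaps between consecutive packets carrying application data
#          (payload > 0); pure TCP messages (ACK, FIN, ...) are skipped.

def inter_packet_times(packets):
    pck_pck = [t2 - t1 for (t1, _), (t2, _) in zip(packets, packets[1:])]
    data_ts = [t for t, payload in packets if payload > 0]
    dat_dat = [t2 - t1 for t1, t2 in zip(data_ts, data_ts[1:])]
    return pck_pck, dat_dat


# timestamps in ms; payload 0 marks pure control segments
trace = [(0, 0), (10, 1448), (12, 0), (50, 1448), (52, 0)]
pck_pck, dat_dat = inter_packet_times(trace)
```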

For all the fields in the second group, and also for the parameter defined as FIN-ACK, the corresponding times are considered in three different ways: associated to the Client to Server (C2S) direction, to the Server to Client (S2C) direction, and as the combined global value. At this point it is appropriate to make the following consideration.

Definition 2.1 (Direction setting TCP) The C2S direction will from now on be considered the starting side of the connection; in other words, the side which sends the SYN message. This convention is associated to the standard client-server model of the Internet, where in a normal situation the client requests data from the server.


Figure 2.2 – TCP timing parameters: examples of the defined timing parameters


2.2 Statistical parameters

Before going into the implementation and design of the software employed for the study, it is relevant to explain which statistical parameters are analysed and to understand their formulation, because this affects the software design and performance, as shown further on. Concretely, two well-known statistical values are calculated.

2.2.1 Arithmetic average

The average gives an idea of the middle value in a set of data. It is a first order statistic and its basic formulation is as follows.

Formula 2.1 (Average standard)

\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i \qquad (2.1)

However, for computational reasons this definition is sometimes not useful, for example in systems where the buffer capacity is limited. When this operation is coded as it appears in formula 2.1, the values x_i must be stored before performing the calculation. If the volume of data is relatively high, the Random Access Memory (RAM) capacity of a system can be saturated. To avoid this situation, another algorithm can be used (see formula 2.2). This is known as an on-line algorithm due to its capability to update the average with each new sample (first formulation) or set of samples (second formulation). The drawback of this method is that the computational time needed is higher.

Formula 2.2 (Average on-line)

\bar{x}_n = \frac{(n-1)\,\bar{x}_{n-1} + x_n}{n} \qquad (2.2)

\bar{x}_n = \frac{(n-m)\,\bar{x}_{n-m} + m\,\bar{x}_{m..n}}{n}, \quad n = n_0 + m \qquad (2.3)
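The first formulation of formula 2.2 can be sketched in a few lines of Python (illustrative helper name): the running mean is updated per sample and no buffer of past samples is needed.

```python
# On-line (incremental) arithmetic average, as in formula 2.2:
# the new mean is a weighted combination of the previous mean and
# the incoming sample, so past samples need not be stored.

def update_mean(mean_prev, n, x_n):
    """Incorporate the n-th sample (1-indexed) into the running mean."""
    return ((n - 1) * mean_prev + x_n) / n


samples = [4.0, 8.0, 6.0, 2.0]
mean = 0.0
for i, x in enumerate(samples, start=1):
    mean = update_mean(mean, i, x)
```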


2.2.2 Standard deviation

The Standard Deviation (STD) is a second order statistic and shows how much the samples of a set vary around the average. Together with the average, it is a good indicator of the statistics of a dataset. The following formulations are the best known (see formula 2.3).

Formula 2.3 (STD standard)
$$s^2 = \frac{\sum_{i=1}^{N}(x_i - \bar{x})^2}{N-1} \qquad (2.4)$$

$$s^2 = \frac{\sum_{i=1}^{N} x_i^2 - \left(\sum_{i=1}^{N} x_i\right)^2 / N}{N-1} \qquad (2.5)$$

However, if these formulas are coded as they appear above (see formula 2.3), the data can only be processed sample by sample, and the computation must either be done once the whole set of data is available (producing memory issues again, as for the average calculation) or recalculate the average for each new sample. It will be justified subsequently that it is sometimes useful to update the STD not from a single sample but from the average and the STD of a set of samples. It is then possible to derive the next formulation (see formula 2.4), which requires not the value of a single sample but the average and STD of a set of samples.

Formula 2.4 (STD dataset)
$$s_N^2 = \frac{\sum_{n} n\left(\bar{x}_n^2 + s_n^2\,\frac{n-1}{n}\right) - \bar{x}_N^2\, N}{N-1} \qquad (2.6)$$

Moreover, it is possible to use it in an on-line way (see formula 2.5). However, this can result in a high computational cost since the average has to be recalculated as well for every set of samples (see formula 2.2).

Formula 2.5 (STD dataset on-line)

$$s_n^2 = \frac{(n-m)\left(\bar{x}_{n-m}^2 + s_{n-m}^2\,\frac{n-m-1}{n-m}\right) + m\left(\bar{x}_{m..n}^2 + s_{m..n}^2\,\frac{m-1}{m}\right) - \bar{x}_n^2\, n}{n-1}, \qquad n = n_0 + m \qquad (2.7)$$

Therefore, another solution for an on-line algorithm is provided (see formula 2.6). Nonetheless, this method only allows a sample by sample calculation (i.e. it is not possible to employ the average and STD of sub-datasets).


Formula 2.6 (STD on-line)
$$s_n^2 = \frac{M_{2,n}}{n-1} \qquad (2.8)$$

$$M_{2,n} = \sum_{i=1}^{n}(x_i - \bar{x}_n)^2 = M_{2,n-1} + (x_n - \bar{x}_n)(x_n - \bar{x}_{n-1}) \qquad (2.9)$$


2.3 Probability distributions

Knowing the statistical time response of the network and protocols is essential for an in-depth analysis and a proper understanding of the dataset properties. Therefore, the distributions related to these items are used in some cases.

But firstly, it is necessary to review the concepts of Probability Density Function (PDF) and Cumulative Distribution Function (CDF), since these are the basic tools needed to easily understand the matter. It is not intended nor necessary to delve into the formulation (it is shown as a reference), but to take a conceptual overview.

Definition 2.2 (Probability Density Function) The PDF is a function f(x) that describes how the probability of a random event occurring in a given interval is distributed. The area under this function is equal to one.
$$P[a \le X \le b] = \int_a^b f(x)\,dx \qquad (2.10)$$

Definition 2.3 (Cumulative Distribution Function) The CDF F(x) describes the probability of a random event having occurred by a given instant. It can be formulated as the integral of the PDF. Its maximum value is one, corresponding to the highest probability for the event to have occurred.
$$F(x) = \int_{-\infty}^{x} f(t)\,dt \qquad (2.11)$$

Both the PDF and the CDF can be approximated by the histogram and cumulative histogram of a discrete distribution. The histogram simply expresses the frequency of occurrence of a sample in a specified interval. The cumulative histogram is similar, but adds to every interval the frequency of the intervals before it. This procedure is equivalent to an integration operation in a discrete domain. Obtaining the approximate CDF from the cumulative histogram is relatively easy: each interval has to be divided by the maximum of the distribution. However, getting the PDF is more complicated since it has to be adjusted so that the area under the function is one.
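The histogram-based approximation of the CDF described above can be sketched as follows, assuming a fixed number of equally sized intervals (the helper name and the INTERVALS constant are hypothetical):

```c
#include <assert.h>

#define INTERVALS 4  /* number of equally sized histogram bins */

/* Approximate the CDF of n samples in [0, max]: count samples per
 * interval, accumulate, and normalize by the total so that the last
 * bin equals one. */
void approx_cdf(const double *x, int n, double max, double cdf[INTERVALS])
{
    int hist[INTERVALS] = {0};

    /* Histogram: frequency of samples falling in each interval. */
    for (int i = 0; i < n; i++) {
        int bin = (int)(x[i] / max * INTERVALS);
        if (bin >= INTERVALS) bin = INTERVALS - 1; /* clamp x == max */
        hist[bin]++;
    }

    /* Cumulative histogram, normalized by the number of samples. */
    int accum = 0;
    for (int b = 0; b < INTERVALS; b++) {
        accum += hist[b];
        cdf[b] = (double)accum / n;
    }
}
```

Approximating the PDF from the same histogram additionally requires dividing each bin count by the bin width times the sample count, so that the total area equals one.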

2.3.1 Gaussian distribution

A good example to illustrate these two functions is the gaussian distribution. This distribution is well known due to its appearance in many natural processes. Its PDF is formulated as follows (see formula 2.7).


Formula 2.7 (Gaussian PDF)

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}, \qquad \mu = \bar{x}, \quad \sigma^2 = s^2\,\frac{n-1}{n} \qquad (2.12)$$

More interesting than the formulation is the shape of its PDF, commonly referred to as "The Bell Curve", and of its CDF (see figure 2.3). Two different realizations are plotted to give a better idea of the relationship with its statistical parameters, concretely with the STD. It would be equally easy to see how a variation of the average (µ) affects the curves.

Figure 2.3 – Gaussian Distribution


2.3.2 Gamma and exponential distributions

The gamma distribution, together with the exponential distribution, which can be seen as a simplification of the former, represents a widely employed tool for the modelling of waiting times. In fact, the exponential distribution is the basis of the field of Telematics known as "Queuing Theory". The more extended use of the exponential distribution is due to its simplicity in contrast with the gamma distribution. Nevertheless, for the purpose of this document what must be highlighted is its capability to model both the network and server responses [2].

Formula 2.8 (Gamma PDF)

$$f(x) = \frac{1}{\Gamma(k)\,\theta^k}\, x^{k-1} e^{-x/\theta} \qquad (2.13)$$

$$\Gamma(k) = (k-1)! \qquad (2.14)$$

$$k = \frac{\mu^2}{\sigma^2}, \qquad \theta = \frac{\sigma^2}{\mu} = \frac{\mu}{k}, \qquad \mu = \bar{x}, \quad \sigma^2 = s^2\,\frac{n-1}{n} \qquad (2.15)$$
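The moment relations in formula 2.8 suggest a direct way to fit the shape k and scale θ from the sample statistics, since the gamma mean is kθ and its variance kθ². A minimal sketch with a hypothetical helper name:

```c
#include <assert.h>
#include <math.h>

/* Moment matching for the gamma distribution (formula 2.8):
 * k = mu^2 / sigma^2 and theta = sigma^2 / mu, so that
 * k*theta reproduces the mean and k*theta^2 the variance. */
void gamma_fit_moments(double mean, double var, double *k, double *theta)
{
    *k = mean * mean / var;   /* shape */
    *theta = var / mean;      /* scale */
}
```

With a sample mean of 3 and variance of 9 this returns k = 1, the case in which the gamma distribution reduces to an exponential one.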

In the same fashion as done for the gaussian distribution, its formulation and appearance are shown (see figure 2.4). In this case, one of the realizations shows the case where the gamma distribution becomes an exponential one, which occurs specifically when the parameter k = 1 (see formula 2.8).

Figure 2.4 – Gamma Distribution


2.4 Protocol and Application Classification Engine (PACE)

As the basis of this work is built over PACE, it is appropriate to briefly describe it. This software library employs pattern matching and behavioural, heuristic and statistical analysis in real time. It consists of fully configurable DPI software and has been optimized for performance and classification reliability. It is highly flexible and can be integrated into any existing platform, such as firewalls, network security and policy management appliances, and lawful interception systems.

It is entirely developed in the C programming language and runs on any major Operating System (OS), including Mac, Solaris and Windows. Among other interesting characteristics, it is remarkable that it can handle throughputs up to 100 Gbps and faster in Symmetric Multi-Processing (SMP) environments. It is optimized to work in multi-core systems, taking advantage of multi-threaded processing. A further description of its properties can be found in [3], and some technical aspects are described later on (see Framework: PACE and PAT, section 4.1 on page 18).

PACE has the capability to recognise more than two hundred single protocols (e.g. HTTP, SIP, etc.) and one thousand applications (e.g. Twitter, etc.), besides classifying them into sub-protocols. Handling this large quantity of protocols separately is neither practical nor useful for this study. Many protocols behave in a similar way, and some others are not relevant enough to be studied alone. For that reason the extracted parameters are based on the following classification into groups of protocols. A further classification can be found in [4].

• generic: unknown, undetected, IPSEC_UDP

• p2p: , eDonkey, /Fasttrack, , WinMX, DirectConnect, AppleJuice, , Ares, , Mute, , Manolito, iMesh, , , Thunder/Webthunder, OFF, OpenFT, StealthNet, Aimini, ANtsP2P, Mojo

• gaming: Steam, Halflife2, XBOX, World of Warcraft, Quake, Second Life, Battlefield, HTTP Application QQGame, WARCRAFT3, ClubPenguin, Florensia, MapleStory, PS3, Dofus, WII, World of Kung Fu, Fiesta, SplashFighter, CrossFire, Guildwars, Armagetron, rFactor, GameKit

• tunnel: SSL, OpenVPN, IPSEC, GRE, VTUN, SoftEthernet, , HamachiVPN, IP in IP, GTP, L2TP, ISAKMP, PPP, PPTP, YourFreedom Tunnel, NetMotion, Teredo, , ComodoUnite, SSTP, JAP

• business: Citrix, SAP, HTTP Application Activesync, UltraBac, JBK3000, AdobeConnect, Lync

• voip: Skype, SIP, H323, IAX, MGCP, STUN, Iskoot, Fring, Generic Voice, VoipSwitch VOIP Tunnel, Truphone, TeamSpeak, Tunnelvoice, , Tango, Scydo, FiCall, Goober, Ventrilo, MyPeople

• im: XDCC, MSN, Yahoo, Oscar, IRC, Jabber, , QQ, POPO, Paltalk, HTTP application Google Talk, MEEBO, JABBER Application NIMBUZZ, IMplus, WhatsApp, IMO, eBuddy, MSRP

• streaming: ORB, Slingbox, Vcast, RTSP, FLASH, MMS, MPEG, QUICKTIME, WINDOWSMEDIA, REALMEDIA, TVANTS, SOPCAST, TVUPLAYER, PPSTREAM, PPLIVE,


ZATTOO, QQLive, Feidian, UUSEE, RTP, AVI, OGG, Move, Kontiki, , Shoutcast, HTTP Application VeohTV, Pandora, RTCP, IPTV, , Octoshape, RealDataTransport, , iPlayer, NetFlix

• mobile: MultimediaMessaging, Blackberry, OperaMini, BOLT

• remote_control: SSH, VNC, Telnet, Skinny, PCAnywhere, RDP, XDMCP, TeamViewer

• mail: POP, SMTP, IMAP, Mapi

• network_management: DNS, ICMP, DHCP, IGMP, EGP, SCTP, NTP, SSDP, OSPF, BGP, SNMP, Kerberos, RADIUS, Syslog, NETBIOS, MulticastDNS, IPP, LDAP, ICMPv6, DHCPv6, LDP

• database: PostgreSQL, MySQL, TDS, msSQL

• filetransfer: DirectDownloadLink, FTP, SMB/CIFS, NFS, TFTP, AFP, WebDAV

• web: HTTP, WebSocket, Wuala

• conference: ooVoo, Webex, CitrixGoTo

3 State of the art

As the trigger of this MSc Thesis is to establish a parameter of a particular commercial product, no highly correlated work has been found. That is an interesting point because it makes this work somehow exclusive, but it turns into a drawback since there is no comparative background. More generally, the statistical behaviour followed by the different protocols and groups of them is a field that has not been studied in depth either. This fact is particularly noticeable when trying to find related work concerning studies about times between packets.

However, some works are somehow related to this topic, but in a simplified way and considering just a few parameters. Most of them are focused on the TCP performance. Basically, these studies regard the network and server responses, but not the protocol behaviour itself.

A widely studied parameter for this purpose is the Round-Trip Time (RTT) in TCP connections, as in [5, 6, 7, 8, 9]. For example, in [5] the SYNACK-ACK time of a web server is tracked and used to study the RTT trends and their relationship with the TCP throughput. The main findings are the relationship with the time of day and the high variation in the RTT values. This last point is also concluded in [6], where the traces under consideration are collected in an ISP link. The work exposed in [7], and more recently in [8], is directly focused on the performance of TCP's congestion control and throughput. Further research on this topic can be found in [9].

Another issue commonly studied regarding TCP traffic is the operation of DoS attacks. In this sense there are some tools known as Intrusion Detection Systems (IDS), like Bro4 or Snort5, aimed at preventing these threats. For that purpose many methods are available, the main one being statistical analysis. For instance, in [10] a method is proposed to detect SYN flooding attacks by controlling the number of SYN and FIN or RST packets. A more recent version based on the same principle can be found in [11].

In any case, these studies are not directly related to the objective of this work and share only a small thematic overlap with it. It is not intended to provide a characterization of the RTT parameter, but to have an approximation of the common values seen for the times between packets and their corresponding ACKs.

Closer to this study is the work done in [12], where the lack of references in this field is highlighted. That work is intended to find out the influence of different kinds of traffic on the Internet. Notwithstanding, the classification only covers Peer to Peer (P2P) and HTTP traffic. They are compared in terms of volume of data and, more interestingly, in their connection establishment and finalization behaviour. The similarities

4 http://www.bro-ids.org/
5 http://www.snort.org/

with the results are exposed in the appropriate section (see Results, chapter 5 on page 38). The rest of the document is focused on the differences in traffic along different time periods. The traffic traces used were recorded during 2006.

Other works somehow related to the findings and objectives of this MSc Thesis are either outdated or share just a few similarities, so they are not included since they do not add any significant contribution. The only exception in this sense is the inclusion of [13]. Although this work is totally outdated (October 1995), it establishes a single time-out which can be compared to the findings of this MSc Thesis, besides presenting a methodology for profiling Internet traffic flows.

4 Design and implementation

This chapter describes the main aspects of the software developed and the process followed to perform the study. As a first step, the basic tool employed and the framework are described. Later, the most important parts of the software design are shown. The challenges faced during the analysis processing are also exposed.

4.1 Framework: PACE and PAT

The main software tool used for this study is developed over PACE (see Protocol and Application Classification Engine (PACE), section 2.4 on page 14). More concretely, the basis for the main program has been built over the PACE Analysing Tool (PAT). It is an internal tool intended to handle and analyse many parameters extracted from PACE, simplifying the interaction with it. It also makes available some additional features, like allowing different input and output formats or loading a configuration file, among others. Its structure is module based, making the development of new functions easier without the need to modify any existing code. It employs a set of callbacks which react to certain events. Those used for this project are listed below.

• configure: Called to set up the module.

• destroy: Called to disable the module, intended to free the used resources, etc.

• finalize: Called before disabling the module, where the last operations must be done.

• process_packet: Called when each packet enters the system.

• flow_started: Called when a new flow starts.

• flow_ended: Called when a flow is ended.

To add a new module it is only necessary to register it following the modular structure. The naming convention used for this purpose specifies the name of each module; the one developed here is called AM_TCPFLOW and its main file am_tcpflow.c.

A relevant and useful point of using PAT as a basis is the ease of accessing all the information extracted by PACE. These parameters are passed to the callbacks by means of the pat_packet struct (see illustration 4.1). It contains the structs pat_flow_user_data (line 18), pat_id_user_data (line 8) and pat_ip_address (line 1). As can be seen, a wide set of parameters is provided concerning the information of the packet, the flow and the subscriber to which it belongs.


Illustration 4.1 (Structs PAT)

 1  struct pat_ip_address {
 2      u32 is_ip_v6;
 3      union {
 4          u32 ipv4;
 5          struct in6_addr ipv6;
 6      } ip;
 7  };
 8
 9  struct pat_id_user_data {
10      struct pat_ip_address ip_addr;
11      u64 id_packets_up;
12      u64 id_packets_down;
13      u64 id_bytes_up;
14      u64 id_bytes_down;
15      void *am_data[PAT_MAX_ACTIVE_ANALYZING_MODULES];
16  };
17
18  struct pat_flow_user_data {
19      struct ipoque_flow_struct* flow_struct;
20      struct pat_ip_address src_ip;
21      struct pat_ip_address dst_ip;
22      u16 src_port;
23      u16 dst_port;
24      u8 l4_protocol;
25      u16 flow_protocol;
26      u16 flow_subprotocol;
27      u64 flow_packets;
28      u64 flow_packets_with_payload;
29      u64 flow_bytes;
30      u64 flow_packets_up;
31      u64 flow_packets_down;
32      u64 flow_bytes_up;
33      u64 flow_bytes_down;
34      u64 flow_number;
35      void *am_data[PAT_MAX_ACTIVE_ANALYZING_MODULES];
36  };
37
38  struct pat_packet {
39      u64 packet_number;
40      struct timeval ts;
41      u32 len;
42      u32 ether_len;
43      struct ether_header *ether;
44      u32 ip_len;
45      struct iphdr *ip;
46      struct ip6_hdr *ip6;
47      u8 *l4_ptr;
48      u32 l4_len;
49      u8 l4_protocol;
50      struct tcphdr *tcp;
51      struct udphdr *udp;
52      u32 l7_len;
53      u8 interface; // 0 for unknown, 1 or 2 when interface is known
54      u8 dir; // 0 or 1
55      u8 slowpath_used; // 0 or 1
56
57      u16 detected_protocol;
58      u16 detected_subprotocol;
59      struct pat_id_user_data *src;
60      struct pat_id_user_data *dst;
61      struct pat_flow_user_data *flow;
62
63      u8 data[0];
64  };

Set of structs employed by PAT containing information about the packet, the flow where it belongs and its subscriber

Regarding the purpose of this work, it is interesting to highlight the ts variable (line 40), which is set up with the timestamp of each packet. It is also worth commenting on the usage of the struct tcphdr (line 50), since it contains the flags of the TCP flows (e.g. ACK, FIN, etc.) which allow the packets to be identified. It is also useful to be able to directly access some parameters like flow_packets (line 27), flow_packets_with_payload (line 28) and flow_bytes (line 29).


4.2 Application Module TCP Flow: TCP

The am_tcpflow module is the main development created to extract the timing values (besides some other parameters). This section shows only the fundamental functions of its implementation, since the whole code is not included. As it is based on the PAT framework, it employs the information provided by the pat_packet struct, and the program operation is based on the events explained above (see section 4.1).

4.2.1 Flow started

When a flow starts, the function am_tcpflow_flow_started is called. If the flow is detected as TCP, the variables are initialized and the required memory is allocated. The program is designed to calculate the parameters already exposed (see Timing parameters, subsection 2.1.2 on page 6) for each flow and globally for the whole trace.

Therefore, a global data structure, named am_tcpflow_data, is maintained during the process. In addition, a data structure must be created for each flow, named extra_flow_data_tcp. Each one of these structs requires more than 1 kB of memory (1,376 bytes). This point has to be considered due to performance issues: as many structs as active flows per trace are created, and that can cause memory saturation. Moreover, to these memory requirements must be added the resources needed by PACE itself, such as the memory reserved for the hash tables used to track the flows and the subscriber information (440 bytes and 980 bytes respectively, for a standard configuration).

4.2.2 Process packet

Once a flow begins, each packet belonging to that flow entering the system triggers the callback am_tcpflow_process_packet. If there are no errors, this function is responsible for extracting the time parameters related to this packet.

However, before that happens a verification is done concerning the time-stamps, since this value can be corrupted. This can lead to the appearance of negative values in the time between packets, when the time-stamp is previous to the current one. To minimize this effect a correction is applied when it happens, assigning the most recent time seen so far to the incoming packet (see illustration 4.2). For that purpose, before proceeding to calculate any difference between times, the local variable detection_time (which initially corresponds to the ts parameter of the pat_packet struct defined above) is compared with flow_extra_tcp->past_pck (line 4). If it is smaller than the latter, the current detection time is discarded and overwritten with the flow_extra_tcp->past_pck value (line 6). Otherwise, the field flow_extra_tcp->past_pck is updated with the current value (line 10).

The extraction of timing parameters in TCP flows does not make sense for unidirectional traffic. Unidirectional flows are those tracked in only one direction. This can occur for many reasons; a common one is when traffic is sent over different links of a network. In this situation, the characteristics and relevant


Illustration 4.2 (Time correction)

 1  void am_tcpflow_process_packet(void *_data, const struct pat_packet *packet)
 2  {
 3      ...
 4      if (timercmp(&detection_time, &flow_extra_tcp->past_pck, <))
 5      {
 6          detection_time = flow_extra_tcp->past_pck;
 7      }
 8      else
 9      {
10          flow_extra_tcp->past_pck = detection_time;
11      }
12      ...
13  }

When a time-stamp error occurs the current time (detection_time) is set to the most recent time seen so far

values such as the handshake processes or the PCK-ACK and DAT-ACK times cannot be calculated. Hence, for this study, only bidirectional flows are tracked. Moreover, to simplify the tracking and processing, only the flows showing the SYN-SYNACK handshake (see Connection process, subsection 2.1.1 on page 4) are considered as started flows, and therefore processed.

This procedure implies keeping active the memory required for that kind of flows even though only one SYN packet has been seen. This is in some way hardly optimal, but necessary; it is important to remember that this situation is actually the one motivating this study. Therefore the program has to wait for a possible response to the initial SYN packet.

What really represents a non-optimal procedure is to maintain the memory resources for the non-started flows, since they are not taken into account in the calculation of values. However, although they occupy resources, their impact is really small and can be neglected when studying large traces (where the lifetime of a flow is much shorter than the trace duration).

Then, if the flow is considered started, the values of each studied field are stored or calculated, depending on their nature. For example, for absolute values like the time between packets it is only necessary to save the data, and the calculation can be done once the flow is finished. This is so because these times only appear once per flow. The SYN-SYNACK, SYN-DAT, FIN-ACK and RST-END times follow this procedure. The same principle applies to the establishment of the maximum and minimum values per flow.

On the other hand, for the statistical values it is more convenient to calculate them in an on-line mode. Otherwise, the memory requirements can grow too much and cause memory saturation, in combination with the needs seen in the previous section (see subsection 4.2.1). For the average calculation only the on-line version is performed (see Arithmetic average, formula 2.2 on page 8). Nevertheless, the option is given to perform the STD calculation in both the standard and the on-line way (see Standard deviation, subsection 2.2.2 on page 9). The option can be switched by means of the flag ONLINE_STD.


The rationale for this double option lies in the memory problems faced when performing the calculations in the standard way (see Standard deviation, formula 2.3 on page 9). Consider that applying the standard way requires storing all the time values for every parameter to be calculated (DAT-ACK, PCK-ACK, DAT-DAT and PCK-PCK) for every flow, for each direction and globally. Moreover, for the absolute STD values of the trace all the timestamps have to be stored. Since the number of packets processed is around 7,074 million (see Traces properties, table 5.1 on page 39), and each timestamp has a size of 64 bits, just storing the global times between packets (PCK-PCK) of that trace would need around 50 GB of memory, which is clearly excessive.
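As a quick sanity check of that figure, assuming one 64-bit (8-byte) timestamp per packet:

$$7{,}074\times10^{6}\ \text{timestamps} \times 8\ \text{B} \approx 56.6\times10^{9}\ \text{B} \approx 53\ \text{GiB},$$

which is indeed in the order of the 50 GB quoted above.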

In addition, although it was not measured, performing the on-line method is expected to be faster since it does not need to access the stored samples. Considering that the loss of accuracy is negligible, it is obvious that the processing of large traces has to be done in the on-line way (see Standard deviation, formula 2.6 on page 10).

The functions employed for both ways of calculating the STD are shown (see illustration 4.3). If ONLINE_STD (line 1) is not defined, the program runs the standard way (see Standard deviation, formula 2.3 on page 9). For that purpose, every time a new value has to be added to the list the function add_element_std_aux is called (line 3). A struct defined as std_list is used to create a dynamic list, so each new element contains the time value (line 6) and a reference to the previous element (lines 7, 8 and 9). When the trace is finished the function std_calculate (line 12) is called and the dynamic list is traversed (line 17) to accumulate the desired value (line 19) until there are no more elements in the list, so that the final value can be calculated (lines 25 and 26). On the other hand, if the ONLINE_STD flag is defined, the operation is performed in the on-line way (see Standard deviation, formula 2.6 on page 10). That is, for each new value the std_calculate_online function is called (line 33) and the STD is calculated (line 39). For this method a struct called std_online_val is defined, which includes the parameters needed by the corresponding formula and which are updated in every call (lines 42 and 43).

The last main task in this function is the tracking and calculation of the DAT-ACK and PCK-ACK times (see illustration 4.4). For every packet entering the program, the information needed to track its ACK status is stored in the struct follow_ack_list (line 1). These parameters are the sequence number (line 2), necessary to identify the packet being acknowledged, the amount of data carried by the packet (line 4), necessary to distinguish between DAT-ACK and PCK-ACK, and obviously the time-stamp (line 3). Besides saving its own information (line 13), it is checked whether this packet is the ACK of any of the packets from the other direction stored before (line 21). If so, the packet being acknowledged and the data concerning the packets before it are deleted, since the TCP paradigm establishes that they are acknowledged as well (line 26). If not, the next packet on the list is checked (lines 37 and 21). This operation is done over the whole list of stored packets (line 19).


Illustration 4.3 (STD functions)

 1  #ifndef ONLINE_STD
 2
 3  void add_element_std_aux(struct std_list **std_list, struct timeval *time)
 4  {
 5      struct std_list *new_std_aux = calloc(1, sizeof(struct std_list));
 6      new_std_aux->time = *time;
 7      if (*std_list == NULL) new_std_aux->next = NULL;
 8      else new_std_aux->next = *std_list;
 9      *std_list = new_std_aux;
10  }
11
12  void std_calculate(struct std_list *std_list, float *res, float average, u64 n)
13  {
14      float std = 0;
15      struct std_list *std_eval = std_list;
16
17      while (std_eval != NULL)
18      {
19          std += pow((timeval_to_usec(&std_eval->time) - average), 2);
20          struct std_list *std_aux = std_eval->next;
21          free(std_eval);
22          std_eval = std_aux;
23      }
24      std_list = NULL;
25      if (n > 1) std = std/(n-1);
26      std = sqrt(std);
27
28      *res = std;
29  }
30
31  #else
32
33  float std_calculate_online(struct timeval *time, float average, struct std_online_val *val,
34                             u64 num) {
35      float std = 0;
36      u64 timeval = timeval_to_usec(time);
37
38      if (val->b_first == 1)
39          std = sqrt((val->M + (average - timeval)*(val->average - timeval))/(num-1));
40
41      val->b_first = 1;
42      val->M = val->M + (average - timeval)*(val->average - timeval);
43      val->average = average;
44
45      return std;
46  }

The two first functions are used to create a dynamic list with all the time values and to calculate the STD in the standard way (see Standard deviation, formula 2.3 on page 9). The third function is used to perform the on-line calculation (see Standard deviation, formula 2.6 on page 10)


Illustration 4.4 (ACK times)

 1  struct follow_ack_list {
 2      u32 seq_num;
 3      struct timeval time;
 4      u32 l7_len;
 5      struct follow_ack_list *next;
 6  };
 7  ...
 8
 9  void am_tcpflow_process_packet(void *_data, const struct pat_packet *packet)
10  {
11      ...
12      // search packet and ack
13      add_element_follow_ack_list(flow_extra_tcp, data, packet);
14
15      struct follow_ack_list *ack_follow_eval = flow_extra_tcp->ack_follow_packet_ack[packet->dir^1];
16      struct follow_ack_list *last_ack_follow = NULL;
17      u8 b_first_packet = 1;
18
19      while (ack_follow_eval != NULL)
20      {
21          if (((htonl(packet->tcp->ack_seq) == ack_follow_eval->seq_num + 1) || ((htonl(packet->tcp->ack_seq) == ack_follow_eval->seq_num + ack_follow_eval->l7_len) && (ack_follow_eval->l7_len > 0))))
22          {
23              ...
24              //delete_element_follow_ack_list();
25
26              while (ack_follow_eval != NULL)
27              {
28                  struct follow_ack_list *aux_ack_follow_delete = ack_follow_eval->next;
29                  free(ack_follow_eval);
30                  ack_follow_eval = aux_ack_follow_delete;
31              }
32
33              if (last_ack_follow != NULL) last_ack_follow->next = NULL;
34              if (b_first_packet == 1) flow_extra_tcp->ack_follow_packet_ack[packet->dir^1] = NULL;
35
36          }
37          else
38          {
39              struct follow_ack_list *aux_ack_follow_eval = ack_follow_eval->next;  // save next field
40              last_ack_follow = ack_follow_eval;  // save last pointer to set next field to NULL when deleting
41              ack_follow_eval = aux_ack_follow_eval;  // overwrite pointer
42              b_first_packet = 0;
43          }
44      }
45      ...
46  }

When a packet is acknowledged, it is important to delete the data of the packets that arrived before it to avoid memory leaks


4.2.3 Flow ended

As explained before (see Introduction, chapter 1 on page 1), a flow is considered ended when an inactivity of 600 seconds (10 minutes) is observed. At this point the final absolute times are established. If the STD calculation is performed in the standard way, it is done here too (see illustration 4.3). The absolute maximum and minimum times are also updated with the results of the finished flow.

When a flow is finished, the output is also set up, by means of a feature that prepares the data to be displayed on the screen or printed to external files. The data format used for the second purpose is the Comma-Separated Value (CSV) format. At this point some problems concerning memory saturation can appear again, since the output framework of PAT buffers the information until the traces are finished (callback finalize). So, when big traces are processed, a feature in the program can avoid this problem: when activated, it prints the buffered data every time a specific number of entries (100 by default) is reached.
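The flush-on-threshold behaviour described above can be sketched as follows; the struct and function names (csv_buffer, csv_emit_line, csv_flush) are hypothetical and do not correspond to PAT's actual API:

```c
#include <assert.h>
#include <stdio.h>

#define FLUSH_THRESHOLD 100  /* default value mentioned in the text */

/* Sketch of a flush-on-threshold output buffer: instead of keeping every
 * CSV line in memory until the trace ends, the buffer is written out and
 * reset once a fixed number of lines has accumulated. */
struct csv_buffer {
    char lines[FLUSH_THRESHOLD][256];
    int count;
    FILE *out;
};

/* Write out and reset the buffer; returns the number of lines flushed. */
int csv_flush(struct csv_buffer *buf)
{
    for (int i = 0; i < buf->count; i++)
        fputs(buf->lines[i], buf->out);
    int flushed = buf->count;
    buf->count = 0;
    return flushed;
}

void csv_emit_line(struct csv_buffer *buf, const char *line)
{
    snprintf(buf->lines[buf->count], sizeof buf->lines[0], "%s\n", line);
    if (++buf->count == FLUSH_THRESHOLD)
        csv_flush(buf); /* bounded memory even for very large traces */
}
```

With this design, memory consumption for the output stays bounded by the threshold regardless of the trace size, at the cost of more frequent write operations.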


4.3 Post-processing

As this study aims to establish different results depending on the protocols or groups of them, some more steps are necessary after this first processing. For this purpose some scripts have been developed. The scripting languages employed are shell-scripting (Linux) and Ruby. In addition, the math tool Octave is used to plot the approximations of the PDF and CDF.

4.3.1 Final values

The first step to obtain the desired values is to separate the results from the CSV files according to their protocol or protocol group. The data output from the am_tcpflow module is printed as a single CSV line per flow, so this classification can be done with the help of the grep command.

With the data classified, the information available so far consists of different kinds of time values (see Timing parameters, subsection 2.1.2 on page 6). For the single parameters (those which appear once per flow: SYN-SYNACK, SYNACK-ACK, SYN-DAT, FIN-ACK and RST-END), single time values are available. However, for the other parameters (DAT-DAT, PCK-PCK, DAT-ACK, PCK-ACK) the available data are the average and STD within each flow. As the purpose of the study is to characterize the statistical parameters by protocol profiles, the processing of these two groups has to be different.

Thus, for the first group the average and the STD can be calculated in the standard way. For the other fields, however, the same values must be calculated with different algorithms or with modified versions of the standard ones. Moreover, an approximation of the PDF and the CDF is computed, and for the same reasons it has to be done differently for each group.

These operations are performed by a Ruby script. The main points and calculation methods are presented below, emphasizing the different ways of calculating. Some common methods are used for the different sets of data (see illustration 4.5). The class Array (line 1) defines the sum (line 3), mean (line 6), std (line 10) and intervals (line 16) methods. The functionality of the first three is clear. The intervals method is used to calculate the intervals for the histogram and distribution of the datasets, which are contained in the class Distribution (line 29); it creates a vector with the values for that purpose (line 21).

Single values Since the complete set of samples is available, the calculation of the final values for this group is based on the standard formulas for both the average (see Arithmetic average, formula 2.1 on page 8) and the STD (see Standard deviation, formula 2.3 on page 9). To perform this task a dedicated class has been created (see illustration 4.6). It defines its variables (line 5) and two methods: the first adds the data to the corresponding variables (lines 14, 15 and 16) and the second is called to obtain the final statistics (lines 20 and 21), using the common methods already defined in the class Array (see illustration 4.5).


Illustration 4.5 (Common methods)

     1  class Array
     2
     3    def sum
     4      self.inject(0){|accum, i| accum + i}
     5    end
     6    def mean
     7      self.sum/self.length.to_f
     8    end
     9
    10    def std (m)
    11      var = self.inject(0){|accum, i| accum + (i-m)**2 } /
    12            (self.length - 1).to_f
    13      return Math.sqrt(var)
    14    end
    15
    16    def intervals (max)
    17      res = Array.[]
    18      for i in 0..INTERVALS-1
    19        res[i] = 0
    20        self.each do |value|
    21          res[i] = max.to_f/INTERVALS*(i+1)
    22        end
    23      end
    24      return (res)
    25    end
    26    ...
    27  end
    28  ...
    29  class Distribution
    30
    31    attr_accessor :intervals, :histogram, :distribution
    32
    33    def initialize
    34      @intervals = []
    35      @histogram = []
    36      @distribution = []
    37    end
    38    ...
    39  end

These methods are used by both ways of calculating the STD


Illustration 4.6 (Class DataSingle)

     1  class DataSingle
     2
     3    attr_accessor :list, :min, :max, :avg, :std
     4
     5    def initialize
     6      @list = []
     7      @min = 0xFFFFFFFF
     8      @max = 0
     9      @avg = 0
    10      @std = 0
    11    end
    12
    13    def addData(val)
    14      @list << val.to_i if not val.to_i == 0
    15      @min = val.to_i if val.to_i < @min and not val.to_i == 0
    16      @max = val.to_i if val.to_i > @max and not val.to_i == 0
    17    end
    18
    19    def calculateStatistics
    20      @avg = @list.mean if @list.length > 0
    21      @std = @list.std(@avg) if @list.length > 1
    22    end
    23  end

This class performs the standard operations for the average (see Arithmetic average, formula 2.1 on page 8) and the STD (see Standard deviation, formula 2.3 on page 9)

To obtain the PDF and the CDF, two more methods are defined in the Array class (line 1) and a method calculate (line 29) in the Distribution class (line 27) (see illustration 4.7). The methods defined inside the Array class generate both vectors with the corresponding values when the available data are the single time values (see Probability distributions, section 2.3 on page 11). In the distribution (line 3) method, the counter of an interval is increased whenever the value is below the corresponding interval boundary (line 8), so the counts are accumulated. The histogram (line 14) method proceeds similarly, but the condition (line 19) now has two limits, since the values below the current interval are not accumulated.

Combined values As introduced above, for this group the way of calculating the parameters has to be adapted. The average is calculated employing the on-line formulation for a set of values (see Arithmetic average, formula 2.2 on page 8). The STD is also calculated following the formula for a set of values, but not in the on-line way (see Standard deviation, formula 2.4 on page 9). Therefore, another class is created for this group (see illustration 4.8). Note the differences in the addData (line 17) and calculateStatistics (line 27) methods, where, as already said, a different formulation is used.
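As a standalone check (not module code), the combined formulation can be verified against a direct computation over all samples: given only each flow's sample count, average and STD, it must reproduce the statistics of the full sample set. The function name is hypothetical; aux accumulates the same auxiliary term as illustration 4.8:

```ruby
# Pooled mean and sample STD from per-flow summaries [num, avg, std].
def pooled_stats(flows)
  n_total = flows.sum { |n, _, _| n }
  avg_sum = flows.sum { |n, avg, _| n * avg }
  # per-flow contribution: n*avg^2 + (n-1)*std^2, i.e. the sum of squares of that flow
  aux = flows.sum { |n, avg, std| n * (avg**2 + std**2 * ((n - 1.0) / n)) }
  mean = avg_sum / n_total
  std  = Math.sqrt((aux - mean**2 * n_total) / (n_total - 1))
  [mean, std]
end
```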

To obtain the PDF and CDF the code has to be modified again, since the samples are not individual values but averages (see illustration 4.9). The methods to calculate each distribution, distribution_avg (line 3) and histogram_avg (line 17), now receive the number of elements to consider for each interval of the distribution (in the previous case this was not needed, since it was one).


Illustration 4.7 (Single values distribution)

     1  class Array
     2    ...
     3    def distribution (max)
     4      res = Array.[]
     5      for i in 0..INTERVALS-1
     6        res[i] = 0
     7        self.each do |value|
     8          res[i] += 1 if (value.to_f <= max.to_f/INTERVALS*(i+1))
     9        end
    10      end
    11      return (res)
    12    end
    13
    14    def histogram (max)
    15      res = Array.[]
    16      for i in 0..INTERVALS-1
    17        res[i] = 0
    18        self.each do |value|
    19          res[i] += 1 if (value.to_f > max.to_f/INTERVALS*i) and (value.to_f <= max.to_f/INTERVALS*(i+1))
    20        end
    21      end
    22      return (res)
    23    end
    24    ...
    25  end
    26
    27  class Distribution
    28    ...
    29    def calculate(val_list, avg, std)
    30      if val_list.length > 0 then @intervals = val_list.intervals(avg+STD_VAL*std) else @intervals.set_zero end
    31      if val_list.length > 0 then @histogram = val_list.histogram(avg+STD_VAL*std) else @histogram.set_zero end
    32      if val_list.length > 0 then @distribution = val_list.distribution(avg+STD_VAL*std) else @distribution.set_zero end
    33    end
    34    ...
    35  end

The calculation of the values used to approximate the PDF and CDF (see Probability distributions, section 2.3 on page 11)


Illustration 4.8 (Class DataContent)

     1  class DataContent
     2
     3    attr_accessor :avg_list, :num_dist, :aux_avg, :aux_std, :aux_avg_num, :min, :max, :avg, :std
     4
     5    def initialize
     6      @avg_list = []
     7      @num_dist = []
     8      @aux_avg = 0
     9      @aux_std = 0
    10      @aux_avg_num = 0
    11      @min = 0xFFFFFFFF
    12      @max = 0
    13      @avg = 0
    14      @std = 0
    15    end
    16
    17    def addData(num, min, max, avg, std)
    18      @avg_list << avg.to_i if not avg.to_i == 0
    19      @num_dist << num.to_i if not avg.to_i == 0
    20      @aux_avg += (avg.to_f)*(num.to_f) if not avg.to_i == 0
    21      @aux_std += num.to_f*(avg.to_f*avg.to_f + std.to_f*std.to_f*((num.to_f-1)/num.to_f)) if not avg.to_i == 0
    22      @aux_avg_num += num.to_i if not avg.to_i == 0
    23      @min = min.to_i if min.to_i < @min and not min.to_i == 0
    24      @max = max.to_i if max.to_i > @max and not max.to_i == 0
    25    end
    26
    27    def calculateStatistics
    28      @avg = aux_avg.to_f / @aux_avg_num if @aux_avg_num > 0
    29      @std = Math.sqrt((@aux_std.to_f - @avg*@avg*@aux_avg_num)/(@aux_avg_num-1)) if @aux_avg_num > 1
    30    end
    31
    32  end

For the combined values the calculation is somewhat more complex. The average is calculated as a variation of the standard mode (see Arithmetic average, formula 2.1 on page 8) while the STD is calculated in a non-standard mode (see Standard deviation, formula 2.4 on page 9)

That is, instead of increasing the counter by one for each element, as in the single values case, it is increased by the number of samples composing each average, since the data available now are the average and the number of values it represents; this is what is used to approximate these distributions (lines 9 and 23).

Illustration 4.9 (Combined values distribution)

     1  class Array
     2    ...
     3    def distribution_avg (num_val, max)
     4      res = Array.[]
     5      for i in 0..INTERVALS-1
     6        res[i] = 0
     7        k = 0
     8        self.each do |value|
     9          res[i] += num_val[k].to_i if (value.to_f <= max.to_f/INTERVALS*(i+1))
    10          k = k + 1
    11        end
    12      end
    13      return (res)
    14    end
    15
    16
    17    def histogram_avg (num_val, max)
    18      res = Array.[]
    19      for i in 0..INTERVALS-1
    20        res[i] = 0
    21        k = 0
    22        self.each do |value|
    23          res[i] += num_val[k].to_i if (value.to_f > max.to_f/INTERVALS*i) and (value.to_f <= max.to_f/INTERVALS*(i+1))
    24          k = k + 1
    25        end
    26      end
    27      return (res)
    28    end
    29    ...
    30  end
    31  ...
    32  class Distribution
    33    ...
    34    def calculate_avg(avg_list, num, avg, std)
    35      if avg_list.length > 0 then @intervals = avg_list.intervals(avg+STD_VAL*std) else @intervals.set_zero end
    36      if avg_list.length > 0 then @histogram = avg_list.histogram_avg(num, avg+STD_VAL*std) else @histogram.set_zero end
    37      if avg_list.length > 0 then @distribution = avg_list.distribution_avg(num, avg+STD_VAL*std) else @distribution.set_zero end
    38    end
    39
    40  end

Since the data available for this group are not the samples but the averages, the approximation has to be modified

Note that the calculation of intervals for the PDF and CDF extraction is done only up to the value avg + STD_VAL*std in both the single values (see illustration 4.7) and the combined values (see illustration 4.9) implementations. The main reason for this bound is to avoid out-of-range values or "outliers" in the representation, which often appear, for different reasons, when performing statistical studies [14]. In this specific study their treatment is more related to simplifying the representation than to their actual "outlier" behaviour. This filtering is only done for the representation of the distribution, not for calculating the final average and STD, because filtering these values could affect the accuracy of the results, especially the STD (since it weights the samples quadratically). There are several definitions of when to consider a sample an "outlier"; a widely used one corresponds to setting STD_VAL = 2.5.

During this part of the process, the percentage of data used for the distribution representation has been tracked, in order to make sure that the exclusion of these samples does not have a strong impact on the statistical values, or at least that a high percentage of the samples is considered. These results are not shown in full, but some of them are presented when estimating the time-out parameter (see Set up and evaluation of time-outs, chapter 6 on page 75).
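A sketch of such a coverage check (the function name is hypothetical; the thesis' tracking code is not shown): it computes the fraction of samples falling below the avg + STD_VAL*std boundary used for the representation.

```ruby
STD_VAL = 2.5   # bound used for the representation, as stated in the text

# Percentage of samples that the bounded representation actually covers.
def coverage(samples, avg, std)
  limit = avg + STD_VAL * std
  kept  = samples.count { |v| v <= limit }
  100.0 * kept / samples.length
end
```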

4.3.2 Interpretation and representation

At the end of this process the final data is ready to be interpreted. The next and last step is to prepare the data to be readable and to obtain the graphics for the PDF and especially for the CDF. As the data obtained for that purpose are actually the histogram and the cumulative histogram, they have to be processed to get the approximations (see Probability distributions, section 2.3 on page 11). It is also interesting to compare the expected theoretical distributions (modelled with the calculated parameters) with those obtained directly from the samples (or the averages). For this purpose the Octave tool has been used. The scripts are not shown, since they do not have any particular point to highlight.
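Since the Octave scripts are not shown, the standard normalisation they would need can be sketched as follows (in Ruby, as an assumption of what is computed): the histogram is scaled so the PDF approximation integrates to one, and the cumulative histogram so the CDF approximation ends at one.

```ruby
# Turn a histogram (counts per bin) into a PDF approximation.
def to_pdf(histogram, bin_width)
  total = histogram.sum.to_f
  histogram.map { |c| c / (total * bin_width) }   # area under the curve becomes 1
end

# Turn a cumulative histogram into a CDF approximation.
def to_cdf(cumulative_histogram)
  total = cumulative_histogram.last.to_f
  cumulative_histogram.map { |c| c / total }      # last value becomes 1.0
end
```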


4.4 Application Module TCP Flow: UDP

As has been introduced, and as shown in the next chapter (see chapter 5), it is really useful to complement the characterization of TCP traffic with a parallel study of UDP. The operation of UDP traffic is simpler than that of TCP, since it does not include any kind of intrinsic control mechanism. For that reason the process for tracking these values is simpler than the one followed for TCP.

The UDP study can be enabled or disabled in am_tcpflow by defining a flag. The procedure is very similar to the one described for TCP traffic (see Application Module TCP Flow: TCP, section 4.2 on page 21). However, there are some aspects to comment on. The parameters studied in this case are just the times between packets; no distinction between packets and data packets is possible, as was done for the TCP flows. Moreover, the maximum and minimum values are not studied in this case. For that reason a new struct extra_flow_data_udp is declared to track the UDP flows with just the information needed (256 bytes). As for TCP traffic, the general information about the whole trace is maintained in the struct am_tcpflow_data.

The post-processing steps for this group of data, i.e. those focused on obtaining the final values, are almost the same as those followed for TCP (see section 4.3). In fact, the process is now simpler, since only one kind of value has to be handled (there are no ACKs in this scenario) and, in addition, the maximum and minimum values are not considered (see Combined values, illustration 4.8 on page 31).

4.4.1 Flow started

When a flow starts (callback am_tcpflow_flow_started) and it is a UDP one, the related struct is allocated. At this point a challenge appears in treating this traffic. For TCP flows it is easy to determine which are the C2S and S2C directions (see Timing parameters, definition 2.1 on page 6). However, UDP does not behave in the same way; in addition, in many scenarios it works in a Client-to-Client (C2C) fashion. As the main purpose of studying UDP flows is to complement the TCP results, it is not adequate to consider this C2C direction, so a criterion to determine what to do with these packets has to be established.

PACE includes a feature to characterize the direction of a flow, but most of the time it needs at least a few packets, which are not always available. Moreover, it is able to classify the C2C direction even though it will not be considered. Thus, after performing some tests, the following criterion seems adequate for gathering the statistical times.

Definition 4.1 (Direction setting UDP) The C2S direction is set as the direction of the first packet seen. However, if PACE is able to determine the actual C2S or S2C direction, it is corrected accordingly.

If PACE can determine the direction as C2S or S2C, the flow is marked accordingly. When the direction of the traffic is C2C, by applying the definition above (see definition 4.1) around half of the connections are determined as C2S and the other half as S2C. When PACE is not able to determine the direction, it is probably because there are not enough packets for that; however, the same principle as for the C2C case is applied. For standard client and server traffic it is quite likely that the side starting the connection is the client.
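The criterion of definition 4.1 can be summarized by a small Ruby sketch (the symbols and the pace_hint argument are assumptions for illustration, not PACE's API):

```ruby
# Hypothetical sketch of Definition 4.1: trust PACE's direction when it can
# classify C2S/S2C; otherwise the sender of the first packet is taken as the client.
def flow_direction(pace_hint)
  [:c2s, :s2c].include?(pace_hint) ? pace_hint : :c2s
end
```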

4.4.2 Process packet

The calculation of statistical values is mainly done in the same way as for TCP, except for one difference (see Process packet, subsection 4.2.2 on page 21). A problem faced in this scenario is that, unlike for TCP, the flow cannot be fully processed until it has ended, due to the direction-selection procedure: the direction cannot be determined with the first packet. This changes the way the statistical parameters for the absolute values are calculated, since they cannot be computed in on-line mode without knowing the direction. Hence, the absolute average and STD have to be calculated when the flow ends (see subsection 4.4.3).

4.4.3 Flow ended

As just explained, the main differences in the UDP calculation procedure take place here, the average (see illustration 4.10) being calculated from multiple values and not one by one. The formula (see Arithmetic average, formula 2.2 on page 8) is basically coded in the avg_calculate_multiple function (line 6).

Illustration 4.10 (Multiple samples average)

     1  float avg_calculate_multiple(float avg_sum, float last_avg, u64 last_num, u64 num)
     2  {
     3      float average;
     4
     5      if (num != 0)
     6          average = (last_avg*last_num + avg_sum*(num - last_num))/(num);
     7      else
     8          average = last_avg;
     9
    10      return average;
    11  }

Calculation of the average employing not single samples but the average of a set of samples (see Arithmetic average, formula 2.2 on page 8)

The STD (see illustration 4.11), calculated with the multiple-sample version (see formula 2.4), is the same one used in the Ruby script (see Combined values, illustration 4.8 on page 31). Every time a flow finishes, its average and STD are used to calculate, by means of the function std_calculate_multiple_aux (line 1), an intermediate value stored in data->std_udp_aux_dir[] (line 15). The final calculation is done when the complete data is available (callback finalize, line 21), coding the formula already referred to (line 23).


Illustration 4.11 (Multiple samples STD)

     1  double std_calculate_multiple_aux(double avg, double std, u64 num)
     2  {
     3      double aux = 0;
     4      double dnum = (double)num;
     5
     6      aux = dnum*(avg*avg + std*std*((dnum - 1)/dnum));
     7
     8      return aux;
     9  }
    10
    11  ...
    12
    13  void am_tcpflow_flow_ended(void *_data, struct pat_flow_user_data *flow, enum toh_removal_reason reason)
    14  { ...
    15      data->std_udp_aux_dir[DIR] += std_calculate_multiple_aux(flow_extra_udp->avg_packet_dir[flow_extra_udp->dir], flow_extra_udp->std_packet_dir[flow_extra_udp->dir], flow_extra_udp->num_packet_dir[flow_extra_udp->dir]);
    16  ...
    17  }
    18
    19  ...
    20
    21  void am_tcpflow_finalize(void *_data)
    22  { ...
    23      data->std_abs_udp_packet_dir[DIR] = sqrt((data->std_udp_aux_dir[DIR] - pow(data->avg_abs_udp_packet_dir[DIR],2)*data->abs_num_udp_packet_dir[DIR])/(data->abs_num_udp_packet_dir[DIR] - 1));
    24  ...
    25  }

Procedure to calculate the STD without having the samples, but only the average and STD of each set of samples (see Standard deviation, formula 2.4 on page 9). The last step is done inside the callback finalize


Moreover, another relevant issue is handled in this part, related to which flows are considered for the study. For the TCP procedure this point does not apply, since the condition of seeing a SYN-SYNACK exchange already implies having at least two packets. For UDP flows, however, it has to be specified when to consider a flow as started, and this is when at least two packets have been seen (otherwise no timing statistical parameter can be studied).

5 Results

After the process described, the amount of data collected is quite abundant and therefore laborious to interpret. Not all the calculated parameters are useful, and for some protocol groups it does not make sense to study them. Hence, this chapter aims to present and justify the main findings and results of this study in as simplified a form as possible, synthesizing the results without losing valuable information. Therefore some values are omitted; whenever this is done, a justification is given.

5.1 Used traces: ISP_Core and ISP_Mob

As the main objective is to characterize real-world traffic, the samples for this study need to be recorded in real scenarios. For testing and development some other traces have been used; however, those were mostly recorded to study specific protocols under specific conditions, and thus do not reflect real behaviour. Therefore, two main traces are used for extracting the final statistical values. As they have been recorded in two different network scenarios, they provide a good comparative basis for the results.

These traces were obtained under a confidentiality agreement, which means that specific details cannot be published. However, for the purpose of this study a description of some aspects is enough, such as the type of network where the data was captured, the size of the traces or their duration.

ISP_Core This trace is the main one to take into consideration. It was recorded on an internal link of a big Tier-1 ISP, after the access and aggregation sections and before the output router to the backbone. This type of network is often designed with multiple links for load-balancing and failure reasons. As a consequence, since this trace is captured on one of those links, much of the recorded traffic is asymmetric. The duration of this trace is around 38,160 seconds (10.6 hours) and its size is around 2.5 TB. It was recorded in 2009. It is a good sample for extracting the desired values, since it is composed of the aggregation of many different kinds of traffic. No data is omitted in this case for the TCP traffic: even for the less frequent protocol groups the number of samples is good enough to extract statistical parameters (the minimum number of flows studied is 1,458). For the UDP traffic the business, mobile, remote_control, filetransfer and conference groups are omitted, since the numbers of detected flows are not representative (74, 127, 89, 180 and 102 flows respectively).

ISP_Mob This trace is a good complement to the ISP_Core one, since it was recorded in a completely different scenario: a mobile operator network. It was captured somewhere between the Serving GPRS Support Node (SGSN) and the Gateway GPRS Support Node (GGSN), elements of a standard


General Packet Radio Service (GPRS) network. As it is near the edge, this trace is expected to contain a high proportion of symmetric traffic. As is common in these scenarios, the data is transmitted over the GPRS Tunneling Protocol (GTP); however, PACE includes features for decapsulating this tunnelling protocol. This trace has a duration of 2,700 seconds (45 minutes) and a size of 130 GB. It was recorded in 2010. For this trace the results of the remote_control, database and conference groups have been omitted from the TCP results, due to the low number of flows detected (63, 83 and 3 respectively). For UDP only the conference group has been omitted (only 1 flow detected).

            Duration (sec)   Size (MB)    Data Tx (MB)   Packets
ISP_Core    38,160           2,591,636    2,470,177      7,074,618,384
ISP_Mob     2,700            133,046      129,486        233,359,695

Table 5.1 – Traces properties

The table above (see table 5.1) shows some of the parameters given in the description of the traces and, in addition, the total amount of data and the number of packets transmitted during each trace.

The results concerning each trace are presented in separate sections for each set of parameters, for simplicity and ease of comparison. The information is structured so that the first section provides a general view of the traces and of each protocol group, classified by TCP or UDP, with the addition of some parameters defined below (see subsection 5.2.2). The rest of the data is presented in subsections according to the three general groups defined for TCP (see Connection process, subsection 2.1.1 on page 4). As the UDP data does not fit into this classification, it is shown together with the DAT-DAT and PCK-PCK times of the TCP flows, in order to compare them and extract the main findings.


5.2 Generic view

This section includes two differentiated sets of data which are not directly related to the timing analysis of the flows. They help to better characterize each of the traces and also show some interesting statistics about, among other things, the presence of protocols in the networks.

5.2.1 Flow usage

As a first part of the results, but still describing the traces, it is interesting to show some more information about them. The table below (see table 5.2) shows the number of flows contained in each trace, globally and separated into TCP and UDP. It also shows how many of these flows have been taken into account in the calculation of values, as explained in the chapters above. For TCP, the flows considered are those showing at least a SYN-SYNACK and its SYNACK-ACK response (see Process packet, subsection 4.2.2 on page 21), while for UDP the condition is to have at least two packets in any direction (see Application Module TCP Flow: UDP, section 4.4 on page 34).

            Total         TCP           UDP           TCP used     UDP used
ISP_Core    295,729,886   159,444,165   127,930,249   42,521,933   55,182,658
ISP_Mob     6,093,604     3,904,103     2,063,391     3,454,210    1,850,579

Table 5.2 – Flow usage

Note that the results are in line with what is expected for each trace, according to their characteristics. In the ISP_Core trace the number of used TCP flows (42,521,933) is around 27% of the total (159,444,165), which is attributed to the presence of many unidirectional flows in addition to ongoing flows (which are not used). On the other hand, since in the mobile scenario the traffic is symmetric, the ratio for the ISP_Mob trace is around 88%.

For the UDP flows these ratios are 43% for ISP_Core and 90% for ISP_Mob. Although in this situation it is more complex to give an explanation, the same reasoning can be applied. Remember that for UDP both unidirectional and bidirectional flows have been considered, the only exclusion condition being flows containing a single packet (see Flow ended, subsection 4.4.3 on page 35). The low ratio of the ISP_Core trace compared with the ISP_Mob one can then be justified because the proportion of bidirectional flows with only two packets is relatively high; this fact, combined with the asymmetric nature of the ISP_Core trace, could lead to missing many flows by seeing only a single packet per link.

At this point it is also worth showing some performance-related issues, continuing with the number of flows taken into account. The next table (see table 5.3) shows the maximum number of total and used active flows (flows maintained in the tracking table), globally and separately.


            Total        TCP         UDP         TCP used    UDP used
ISP_Core    5,596,839    2,958,085   2,535,594   891,997     1,094,663
ISP_Mob     1,463,571    959,272     513,498     823,228     454,719

Table 5.3 – Maximum number of flows in system

The number of flows inside the system gives an idea of the memory required to maintain the flows during a time interval (600 seconds). Although these values are maxima and do not represent the average behaviour with complete fidelity, they are a good approximation, because after a transitory period the system tends to be stable (i.e. the number of elements in memory reaches its maximum). Note that for both traces the sum of the TCP and UDP values approximately equals the total maximum, which suggests that this approach does not deviate much. Moreover, the ratio between total and used flows considering these maximum values is consistent with the values extracted from the flow usage (see table 5.2).

Performing this study under a time-out affects the results: deleting the flows that remain inactive over 600 seconds can reduce the precision of the study, because some time values may be counted as belonging to a new flow. However, as will be seen in the next sections, the impact of this limitation is not very relevant, since the statistical results are considerably lower than this time-out. Moreover, regarding the memory performance of the am_tcpflow module and the whole program, a higher value for this time-out would probably lead to memory saturation. Consider that the maximum number of active flows in the system for ISP_Core is 5,596,839 (see table 5.3). That implies a need for nearly 4.4 GB of RAM just for this module (each TCP flow requires 1,376 bytes of memory and each UDP flow 256 bytes), without taking into account the needs of PACE (see Flow started, subsection 4.2.1 on page 21).
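The 4.4 GB figure can be checked from the maxima in table 5.3 and the per-flow struct sizes just mentioned; the helper name below is hypothetical:

```ruby
# Worked check of the memory estimate: 1,376 bytes per tracked TCP flow
# plus 256 bytes per tracked UDP flow, converted to GiB.
TCP_FLOW_BYTES = 1_376
UDP_FLOW_BYTES = 256

def module_memory_gib(max_tcp_flows, max_udp_flows)
  bytes = max_tcp_flows * TCP_FLOW_BYTES + max_udp_flows * UDP_FLOW_BYTES
  bytes / (1024.0**3)
end
```

With the ISP_Core maxima (2,958,085 TCP and 2,535,594 UDP flows) this works out to about 4.4 GiB, matching the estimate in the text.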


5.2.2 Generic data

In this subsection some additional items have been calculated to provide more information about the dataset. They are not directly related to the timing parameters but are useful indicators about the traces. For example, an interesting extraction from these data is the proportion of flows, data and packets that each group represents in each sample. These parameters are:

• num_flows: Total number of flows studied.

• avg_pck & std_pck: Average and STD of the packets per flow.

• avg_dat & std_dat: Average and STD of the data transmitted per flow (expressed in bytes).

• avg_dur & std_dur: Average and STD of the duration of flows, expressed in milliseconds (ms).

group          num_flows    avg_pck   std_pck   avg_dat   std_dat     avg_dur     std_dur
generic        5,346,917    38        1,598     11,245    956,736     55,975      490,612
p2p            2,542,877    125       2,048     62,592    1,600,631   148,153     610,045
gaming         9,233        2,190     9,510     405,253   3,391,519   457,180     1,786,831
tunnel         4,533,297    43        1,297     18,985    1,026,681   42,712      371,814
business       5,995        16        49        2,359     16,554      7,750       31,595
voip           357,090      173       4,505     21,840    518,705     290,321     1,281,897
im             4,506,580    64        593       17,070    282,739     172,195     860,206
streaming      454,437      773       4,665     646,711   3,719,770   92,634      318,971
mobile         1,458        178       482       53,957    264,067     1,401,219   2,814,487
control        146,183      81        1,344     12,403    431.21      37,387      537,966
mail           6,025,310    44        456       17,217    344,242     22,010      141,982
management     3,853        11        8         649       2,157       14,157      50,711
database       16,537       32        239       5,574     62,588      21,264      320,123
filetransfer   294,396      219       3,687     187,437   3,328,188   85,011      616,347
web            18,275,149   42        1,105     23,816    636,207     45,654      155,234
conference     2,621        596       6,194     124,071   1,973,004   143,009     767,550

Table 5.4 – General TCP ISP_Core


group          num_flows   avg_pck   std_pck   avg_dat   std_dat     avg_dur   std_dur
generic        346,225     21        354       3,744     121,572     35,410    133,529
p2p            88,673      50        430       9,989     196,814     92,367    201,617
gaming         142         1,035     3,266     109,167   422,130     112,209   297,515
tunnel         320,885     61        816       32,340    722,587     83,842    220,126
business       2,516       20        34        6,236     22,688      48,133    95,591
voip           2,285       67        356       3,387     24,652      130,948   340,717
im             15,931      61        506       11,971    157,012     114,021   298,420
streaming      24,563      562       2,591     423,253   2,042,749   48,586    144,474
mobile         10,298      70        103       39,889    77,820      16,115    42,766
mail           51,951      83        492       37,820    396,681     86,761    143,702
management     501         41        367       6,305     57,335      12,586    53,682
filetransfer   10,285      40        503       12,711    464,896     16,532    81,736
web            2,579,806   30        351       14,018    317,162     20,129    59,418

Table 5.5 – General TCP ISP_Mob

group          num_flows   num_pck   data_tx
generic        12.57%      7.94%     4.67%
p2p            5.98%       12.45%    12.35%
gaming         0.02%       0.79%     0.29%
tunnel         10.66%      7.59%     6.68%
business       0.01%       0.00%     0.00%
voip           0.84%       2.42%     0.61%
im             10.60%      11.37%    5.97%
streaming      1.07%       13.75%    22.81%
mobile         0.00%       0.01%     0.01%
control        0.34%       0.46%     0.14%
mail           14.17%      10.27%    8.05%
management     0.01%       0.00%     0.00%
database       0.04%       0.02%     0.01%
filetransfer   0.69%       2.53%     4.28%
web            42.98%      30.34%    34.1%
conference     0.01%       0.06%     0.03%

Table 5.6 – Protocol groups proportions TCP ISP_Core


group          num_flows   num_pck   data_tx
generic        10.02%      5.61%     2.09%
p2p            2.57%       3.43%     1.43%
gaming         0.00%       0.11%     0.03%
tunnel         9.29%       15.28%    16.77%
business       0.07%       0.04%     0.03%
voip           0.07%       0.12%     0.01%
im             0.46%       0.76%     0.31%
streaming      0.71%       10.75%    16.80%
mobile         0.3%        0.56%     0.66%
mail           1.5%        3.37%     3.17%
management     0.01%       0.02%     0.01%
filetransfer   0.3%        0.32%     0.21%
web            74.69%      59.48%    58.43%

Table 5.7 – Protocol groups proportions TCP ISP_Mob

group        num_flows    avg_pck   std_pck   avg_dat     std_dat      avg_dur     std_dur
generic      7,813,767    36        15,940    18,439      22,347,512   102,364     1,074,390
p2p          7,620,500    8         457       1,602       160,753      124,362     714,940
gaming       13,888       489       6,086     64,851      971,718      97,705      518,377
tunnel       10,728       4,063     79,176    1,079,016   13,968,484   1,843,451   5,895,218
voip         1,071,958    594       6,615     40,162      811,670      181,479     1,438,980
im           195,566      839       8,671     112,823     1,838,863    126,630     568,679
streaming    32,190,709   15        1,211     2,046       235,752      62,368      451,268
management   6,264,970    6         1,058     416         67,884       51,270      638,630

Table 5.8 – General UDP ISP_Core

group        num_flows   avg_pck   std_pck   avg_dat   std_dat     avg_dur   std_dur
generic      244,569     145       4,010     165,193   5,588,515   94,625    420,181
p2p          244,546     9         601       5,634     840,822     47,226    214,829
gaming       1,408       237       3,538     38,371    680,024     299,491   116,563
tunnel       5,554       548       3,833     201,914   2,163,790   524,805   889,738
voip         19,521      44        2,004     5,139     279,284     56,095    210,270
im           858         772       5,928     245.95    2,362,482   148,346   474,546
streaming    7,842       1,465     4,961     477,612   2,057,325   177,642   315,538
mobile       32,211      41        232       17,531    180,320     444,101   595,108
management   1,294,069   4         21        520       4,074       12,287    137,803

Table 5.9 – General UDP ISP_Mob


group        num_flows   num_pck   data_tx
generic      14.16%      16.47%    47.57%
p2p          13.81%      3.65%     4.03%
gaming       0.03%       0.39%     0.30%
tunnel       0.02%       2.53%     3.82%
voip         1.94%       36.91%    14.21%
im           0.35%       9.52%     7.29%
streaming    58.33%      28.27%    21.75%
management   11.35%      2.00%     0.86%

Table 5.10 – Protocol groups proportions UDP ISP_Core

group       num_flows  num_pck  data_tx
generic     13.22%     58.9%    83.74%
p2p         13.21%     3.75%    2.86%
gaming      0.08%      0.55%    0.11%
tunnel      0.3%       5.07%    2.32%
voip        1.05%      1.42%    0.21%
im          0.05%      1.1%     0.44%
streaming   0.42%      19.11%   7.76%
mobile      1.74%      2.21%    1.17%
management  69.93%     7.9%     1.39%

Table 5.11 – Protocol groups proportions UDP ISP_Mob


5.2.2.1 ISP_Core

The first table (see table 5.4) shows the TCP general parameters. It is remarkable that the average number of packets per flow is quite similar for most of the protocol groups, while the gaming (2,190 packets), streaming (773 packets) and conference (596 packets) groups show relatively high values. Nevertheless, the average amount of data transmitted per flow is also relatively high for these groups in comparison with the others, so these values are consistent when taking into account the average data transmitted per packet (i.e. the average data divided by the average number of packets per flow).

It is also relevant to consider the average duration of each flow in order to understand its behaviour. For the three mentioned groups, for example, the average durations differ considerably. In the gaming group it is 457,153 ms, so the average transfer rate is close to 1 kB/s, since the average data per flow is near 400 kB. For the streaming group, however, which shows a similar amount of data transmitted per flow (around 600 kB), the approximate average transfer rate is 6 kB/s, since the average duration in this case is 92,634 ms.
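The rate estimate used above is simply the average data per flow divided by the average flow duration. A minimal sketch, using the figures quoted in the text for the ISP_Core TCP trace:

```python
# Rough transfer-rate estimate: average data per flow divided by
# average flow duration. The input figures are those quoted in the
# text (gaming: ~400 kB over 457,153 ms; streaming: ~600 kB over
# 92,634 ms).

def avg_rate_kbps(avg_data_kb: float, avg_dur_ms: float) -> float:
    """Average transfer rate in kB/s from per-flow averages."""
    return avg_data_kb / (avg_dur_ms / 1000.0)

gaming_rate = avg_rate_kbps(400, 457_153)    # ~0.9 kB/s
streaming_rate = avg_rate_kbps(600, 92_634)  # ~6.5 kB/s
print(f"gaming:    {gaming_rate:.2f} kB/s")
print(f"streaming: {streaming_rate:.2f} kB/s")
```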

The STDs of both the number of packets and the amount of data transmitted per flow behave proportionally to their averages within each protocol group. This confirms that there is a relationship between the number of packets and the data transmitted. Continuing with this parameter, the STD in relation to the average is particularly low for the gaming (9,519/2,190), business (49/16), mobile (482/178) and especially the network_management (8/11) groups. This could be interpreted as meaning that the flows of these protocol groups have a regular composition, while the rest are more variable. However, it could also be that these small STD values are due to the low proportion of such flows, so that they do not represent many scenarios.
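The regularity argument above can be made explicit as the STD-to-average ratio of the packets per flow (the coefficient of variation): a low ratio suggests flows of regular composition. A small sketch with the (std, avg) pairs quoted in the text:

```python
# STD-to-average ratio (coefficient of variation) of packets per flow.
# The (std, avg) pairs are the ones quoted in the text for ISP_Core.

packet_stats = {
    "gaming":   (9_519, 2_190),
    "business": (49, 16),
    "mobile":   (482, 178),
}

ratios = {name: std / avg for name, (std, avg) in packet_stats.items()}
for name, ratio in ratios.items():
    print(f"{name}: std/avg = {ratio:.1f}")
```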

Concerning the average duration of flows per protocol, there are not many disparities except for the mobile group. Its average duration, around 23 minutes (1,401,219 ms), is relatively long in comparison with the rest of the values, which range from a few seconds to no more than 8 minutes. Digging into this kind of traffic, it can be seen that almost all of it belongs to the Blackberry application. This protocol definition mixes different types of traffic (mail, HTTP, etc.), which implies a non-generic behaviour and explains the high value.

In addition to these parameters, it is also interesting to look at the percentages of traffic analysed by number of flows, by number of packets and by data transmitted. Comparing these values helps to understand the relationship between them and to see qualitatively some aspects, such as how dense a flow is in terms of packets or data. It also provides a useful indication of what proportions of traffic are present in a real network. The number of packets and the data transmitted are calculated by multiplying the absolute number of flows by the average number of packets and the average data per flow, respectively (which is valid if both values are considered statistically independent).
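The proportion calculation just described can be sketched as follows: estimate the absolute count per group as num_flows times the per-flow average, then normalise over all groups. The two-group input is purely illustrative, not taken from the tables:

```python
# Proportions per group: absolute count = num_flows * per-flow average,
# then normalised over all groups. The example input is illustrative.

def proportions(groups):
    """groups maps name -> (num_flows, avg_per_flow); returns percentages."""
    totals = {name: n * avg for name, (n, avg) in groups.items()}
    grand = sum(totals.values())
    return {name: 100.0 * t / grand for name, t in totals.items()}

example = {"web": (1_000_000, 50), "p2p": (200_000, 250)}
print(proportions(example))  # both carry 50e6 packets -> 50% each
```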

First of all, it is relevant to note that the undetection rate represents around 12.5% of the flows and 4.7% of the data (see table 5.6). Considering the complexity and distribution of the network, and that the trace is a snapshot of the real traffic, this can be seen as a good rate. A partial explanation for this undetermined portion could be that the average number of packets per flow is relatively low, not enough for the detection process (see table 5.4). If these flows are assumed to belong to the p2p or voip groups, which in most cases are detected after a behaviour analysis, the detection process would need to see a certain number of packets. Indeed, the average number of packets seen in those groups is notably higher than for the generic one. In addition, TCP flows seen without any payload are considered unknown, which is consistent with the relation between the percentage of flows (12.5%) and data (4.7%).

It is also interesting to note the behaviour of the streaming group, where around 1.1% of the flows represent 22.8% of the data transmitted. This implies large flows in both number of packets and data transmitted. An inverse example is the mail group, where 14.2% of the flows carry 8% of the data.

To complement this study with UDP flows, the same set of data is shown for this traffic (see table 5.8). Here, the average number of packets is less uniform than in the TCP case. The values obtained for the p2p and streaming groups are remarkable, since their averages are relatively low (8 and 15 packets) while their STDs are relatively high (457 and 1,211 packets). In both cases the number of flows is higher than that seen in the TCP traffic, particularly for streaming. A suitable explanation could be that the control flows of these groups are carried over UDP in addition to pure data flows, which would account for the big STD values. For the p2p group this approach is aligned with [12], where it is found that P2P flows over UDP carry fewer than three packets in 76% of all cases.

Continuing with the UDP traffic, it is also worth looking at the tunnel group, since it transfers a considerably higher amount of packets and data in comparison with the other groups; its duration is also notably higher. This is consistent with its expected behaviour, since many different flows and protocols are normally transmitted inside these flows. Notice, however, that this behaviour does not appear in the tunnel traffic over TCP. This could mean that the big tunnelled traffic pipes are carried over UDP rather than over TCP.

The proportion of each group in terms of number of flows, packets and data transmitted is also shown for the UDP data (see table 5.10). This view supports the assumption made for the streaming and p2p groups, since in both cases the proportion of flows is much higher than the proportion of data transmitted. From this table it is also easy to see that voip is normally carried over UDP, since the flows are composed of many packets and transmit a representative amount of data.

Regarding the generic group (remember that it is mainly composed of unknown traffic), it is normal to see a higher proportion in this scenario than for TCP. The main reason could be that the amount of UDP flows with few packets is quite relevant (although this value has not been extracted), making the detection process less accurate. Moreover, the existence of ongoing flows when the trace is captured is also a potential reason for this rate (in the TCP case ongoing flows are not considered).

5.2.2.2 ISP_Mob

The general results (see table 5.5) show a behaviour similar to that seen for the ISP_Core trace (see table 5.4). The average number of packets per flow is again quite regular, with the exception of the gaming (1,035) and streaming (562) groups, but coherent with the average amount of data transmitted per packet.

47 5.2 Generic view

Regarding the STD parameters (average packets and data per flow), it is complex to extract a general assessment. For most of the groups the STD is one order of magnitude above the average. These values show a proportional behaviour between themselves (i.e. a relationship between number of packets and data transmitted) and also a behaviour similar to the results obtained for the ISP_Core. It therefore seems that flow behaviour is not strongly associated with the network scenario and is more related to the protocol characteristics. As in the ISP_Core trace, the STD factors obtained for the gaming (3,266/1,035), business (34/20) and mobile (103/70) groups are particularly low, but again the reason could be that the number of flows considered is lower than for the other protocol groups.

Looking at the average duration per group, there are no peculiar values to comment on. In this scenario there seems to be, in general, a higher variability of durations compared with the ISP_Core, even though it is not excessively outstanding. In proportion to the average values, the STD is mainly about a factor of two or three.

The percentages of flows and data transmitted per protocol group are again shown (see table 5.7). The rate of undetected TCP flows in this scenario is around 10%, while in terms of data it is 2.1%. This is better than the value achieved in the ISP_Core trace, which is understandable since the complexity of the flows in this trace is lower. The proportion of typical file-sharing traffic such as p2p and filetransfer is notably lower than in the ISP_Core trace, while web traffic is almost double, which is commonly seen in mobile networks. What is peculiar is the low proportion of mail and im traffic in comparison with the core trace.

The general values tracked for the UDP traffic (see table 5.9) and the proportions (see table 5.11) are shown anew. The behaviour in this scenario is quite unrelated to the one seen for the ISP_Core. The first outstanding result is the big percentage of data seen in the generic group (84%), almost double that of the other studied trace. The reason for this difference could be that in this scenario the number of ongoing flows is relatively high, since the duration of the trace is notably smaller (see table 5.1).

Another big difference is seen in the streaming group, since the proportion of UDP flows (0.4%) is notably lower than in the ISP_Core trace (58%). The proportion of voip is also quite unequal. In contrast, the proportion of network_management flows is considerably higher, which could be reasonable considering the different nature of the network. On the other hand, the behaviour seen for the p2p group is quite similar in both cases.


5.3 Initialization

This section shows the values defined as SYN-SYNACK, SYNACK-ACK and SYN-DAT times (see Timing parameters, subsection 2.1.2 on page 6). The first two parameters are somehow equivalent to an RTT. Note that all these values depend on where the trace is recorded and on the network response, while the SYN-DAT time additionally depends on the protocol behaviour. However, it is complex to see the impact of each issue on these values.

The study of these times is useful to characterize where the trace has been recorded, which is important to understand and determine some of its properties. It is then possible to approximate, in terms of time, where in the network the traffic is analysed (see definition 5.1). Although this definition is included in this section, it is also applicable to the PCK-ACK times (see subsection 5.4.1).

Definition 5.1 (Capture point determination) By comparing the RTT times between the C2S and S2C directions it is possible to have a conceptual idea of where in the network the trace is recorded in terms of time. The concept is simple: if these values are similar, the capture point is centred in the network, while the more they differ, the nearer the capture point is to one extreme, the side with the bigger time being the farthest from the capture point (see figure 5.1).
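Definition 5.1 can be sketched as a small classifier. SYN-SYNACK approximates the capture-to-server round trip and SYNACK-ACK the capture-to-client one, so the smaller of the two points at the nearer side; the 25% tolerance for "centred" is an assumption of this sketch, not a value from the thesis:

```python
# Sketch of definition 5.1 (capture point determination). The tol
# threshold for declaring the point "centred" is an assumption.

def capture_point(syn_synack_ms, synack_ack_ms, tol=0.25):
    """Classify the capture point from the two handshake half-RTTs."""
    big = max(syn_synack_ms, synack_ack_ms)
    small = min(syn_synack_ms, synack_ack_ms)
    if big - small <= tol * big:
        return "centred"
    # the direction with the smaller round trip points at the nearer side
    return "near server" if syn_synack_ms < synack_ack_ms else "near client"

# business group in table 5.12: 69 ms vs 424 ms
print(capture_point(69, 424))  # -> near server
```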

Figure 5.1 – Capture point behaviour. Depending on where the traffic is analysed (dashed red line), the same measure can show different values for the RTT.

5.3.1 RTT times

This section contains the initialization times, that is, those defined as SYN-SYNACK, SYNACK-ACK and SYN-DAT. Studying these times separately (they could be considered as PCK-ACK times) can be useful for many reasons. For example, some TCP implementations, such as TCP Vegas, make use of an early minimum RTT estimation [5]. Also, knowing these times can contribute to preventing DoS attacks by deleting connections after a time-out.


group         avg_syn_ack  std_syn_ack  avg_asyn_ack  std_asyn_ack  avg_syn_dat  std_syn_dat
generic       639          2,698        756           4,599         2,263        10,289
p2p           680          1,726        704           2,873         2,917        10,260
gaming        224          220          186           769           537          1,599
tunnel        491          3,198        324           854           970          6,009
business      69           158          424           178           504          423
voip          362          704          428           2,298         976          3,497
im            240          469          259           1,034         669          4,034
streaming     222          852          195           690           499          2,676
mobile        235          242          489           1,135         1,300        4,447
control       63           328          317           389           448          1,816
mail          258          1,654        474           1,319         1,369        4,287
management    276          625          295           1,034         590          1,288
database      96           287          231           208           379          452
filetransfer  254          735          240           749           803          1,933
web           249          1,067        200           981           554          5,009
conference    200          357          285           749           653          1,410

Table 5.12 – Initial times ISP_Core (ms)

group         avg_syn_ack  std_syn_ack  avg_asyn_ack  std_asyn_ack  avg_syn_dat  std_syn_dat
generic       246          1,350        883           4,814         2,163        9,807
p2p           971          1,820        1,133         2,062         2,204        3,009
gaming        279          660          242           301           562          749
tunnel        71           225          357           879           560          2,357
business      30           200          545           1,473         966          2,674
voip          107          393          728           1,651         892          1,728
im            100          334          532           1,390         911          2,613
streaming     80           350          437           1,356         737          3,019
mobile        102          627          614           2,279         1,064        3,040
mail          65           547          362           846           581          2,442
management    73           368          311           556           514          737
filetransfer  120          387          405           576           995          1,740
web           66           382          655           2,284         1,067        3,159

Table 5.13 – Initial times ISP_Mob (ms)


5.3.1.1 ISP_Core

Analysing the SYN-SYNACK times (see table 5.12), it is relevant that the times seen for the business (69 ms), remote_control (63 ms) and database (96 ms) groups are relatively low. The cause of this disparity can be either that the time seen is really low or that the point of capture is asymmetric. The times seen in those groups match the second explanation: the relation of the average values between the C2S and S2C directions is notably unequal. The SYNACK-ACK times are notably bigger than the SYN-SYNACK ones, so the point where the times are being measured is near the server side, and that is basically why they are relatively small.

Following this concept, it can be approximated that most of the groups are recorded at a centred point, with the exception of the groups already mentioned. However, this conclusion is better extracted from the PCK-ACK times, and so it is done (see subsection 5.4.1).

The STD values (relative to the average) must be understood as representing the variability of the network and the protocol responses. It is a common phenomenon to see a high variability within flows, as affirmed in [5, 6] and confirmed by these results. In this sense, the relation between the different groups is quite stable, which makes sense since the behaviour of this parameter is independent of the protocol (it is just the initial TCP handshake).

On the other hand, the last parameter, SYN-DAT, shows a different behaviour across the protocol groups. This value stands for how long it takes from the start of the TCP handshake until the first data packet is sent, so it makes sense to think that not all protocols will need the same time. These values are analysed below (see subsection 5.3.2).

5.3.1.2 ISP_Mob

The SYN-SYNACK, SYNACK-ACK and SYN-DAT times for the mobile network trace (see table 5.13) show a different behaviour than in the ISP_Core one. The first outstanding consideration is that in general terms the capture point is nearer to the server side (except for the gaming group). Again, this concept is taken up when considering the PCK-ACK times (see subsection 5.4.1).

The variability seen in this scenario behaves similarly to the ISP_Core one. Notice as well that the SYN-DAT times are quite similar in absolute terms. Again, these values are analysed in the following subsection (see subsection 5.3.2).


5.3.2 Protocol response

An interesting usage of the initialization times is to establish the fastness of the protocol response. This concept can be approximated by looking at the relationship between the first RTT (SYN-SYNACK or SYNACK-ACK) and the SYN-DAT time (see definition 5.2).

Definition 5.2 (Protocol response characterization) A possible way to approximate the impact of the protocol response behaviour is to consider what multiple of the initial RTT (taken as the average of SYN-SYNACK and SYNACK-ACK) the SYN-DAT time is. As the first data packet would be sent after the last ACK packet of the initialization process, a value near 2 expresses a fast protocol response; the further from this value, the slower the protocol answers. This method is just an approximation and is only accurate if the point of capture is centred (see figure 5.2).

Figure 5.2 – Protocol response. Only if the capture point is centred does it make sense to apply this way of measuring the time response fastness.

According to the definition given for the protocol characterization (see definition 5.2), the response fastness can only be reliably calculated if the capture point is centred. However, since not all the groups meet this requirement, an approximation can be made by considering the SYN-SYNACK as half of the sum of the SYN-SYNACK and the SYNACK-ACK. Looking at the figure above (see figure 5.2), it is easy to see that this value is equivalent. Despite this, when the capture point is not centred there is a loss of reliability in this value. This is more pronounced when the capture point is close to one of the extremes and the first data packet comes from the opposite side (which could add, in the most extreme case, almost a whole RTT to the measure). This fact can be taken into account and even corrected for those protocol groups where the direction of the first packet is known (for example, in HTTP it is normally in the C2S direction).
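The ratio from definition 5.2 can be sketched in a few lines and checked against a row of the tables quoted in the text:

```python
# Sketch of definition 5.2: the protocol response ratio is the SYN-DAT
# time divided by the initial RTT (the average of the SYN-SYNACK and
# SYNACK-ACK times); a value near 2 indicates a fast protocol response.

def protocol_response(syn_synack_ms, synack_ack_ms, syn_dat_ms):
    rtt = (syn_synack_ms + synack_ack_ms) / 2.0
    return syn_dat_ms / rtt

# gaming row of table 5.12: 224, 186 and 537 ms
print(round(protocol_response(224, 186, 537), 2))  # -> 2.62, as in table 5.14
```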


group         avg_syn_ack  avg_asyn_ack  RTT  avg_syn_dat  response
generic       639          756           698  2,263        3.24
p2p           680          704           692  2,917        4.21
gaming        224          186           205  537          2.62
tunnel        491          324           407  970          2.38
business      69           424           246  504          2.04
voip          362          428           395  976          2.47
im            240          259           250  669          2.68
streaming     222          195           209  499          2.39
mobile        235          489           362  1,300        3.59
control       63           317           190  448          2.36
mail          258          474           366  1,369        3.74
management    276          295           286  590          2.07
database      96           231           164  379          2.32
filetransfer  254          240           247  803          3.25
web           259          200           225  554          2.47
conference    200          285           242  653          2.69

Table 5.14 – Protocol response ISP_Core (ms)

group         avg_syn_ack  avg_asyn_ack  RTT   avg_syn_dat  response
generic       246          883           564   2,163        3.83
p2p           971          1,133         1052  2,204        2.10
gaming        279          242           260   562          2.16
tunnel        71           357           214   560          2.62
business      30           545           288   966          3.36
voip          107          728           417   892          2.14
im            100          532           316   911          2.88
streaming     80           437           258   737          2.85
mobile        102          614           358   1,064        2.97
mail          65           362           214   581          2.72
management    73           311           192   514          2.68
filetransfer  120          405           262   995          3.79
web           66           655           361   1,067        2.96

Table 5.15 – Protocol response ISP_Mob (ms)


Figure 5.3 – Protocol response comparison. Protocol response for the ISP_Core trace (red) and the ISP_Mob one (yellow).

5.3.2.1 ISP_Core

According to the definition given for determining whether the capture point is centred, it can be seen that in this situation it is mostly centred. Therefore, in this scenario it is suitable to apply the protocol response definition (see definition 5.2). The value used is the one defined as RTT, which is the average of the SYN-SYNACK and SYNACK-ACK. A table with these values is shown above (see table 5.14).

It is interesting to highlight the low response ratio obtained in some protocol groups like network_management or business (although for the latter the ratio may not be a good approximation, since the capture point is not centred, as just explained). On the other hand, the p2p and mail groups are the slowest.

5.3.2.2 ISP_Mob

In this case it is not as reliable to apply the evaluation of the protocol response as in the ISP_Core trace, since the capture point is not centred for many of the protocol groups (see definition 5.2).

Nevertheless, as already justified, this result can be approximated since both the SYN-SYNACK and SYNACK-ACK times are available. In this case it is key to use the value defined as RTT in the table. These values are shown in the corresponding table (see table 5.15). Except for the p2p and business groups, the results are similar to those obtained for the ISP_Core trace (see figure 5.3).


Although this result is mainly notional, it reveals that this response time is independent of the network, which is coherent with the expected behaviour, since the time a server or application takes to send a data packet has nothing to do with the network.


5.4 Data transfer

This section is where the major part of the times are tracked, since it is where the transfer of data takes place. It has already been briefly introduced that within this stage there are two different timing parameters to be extracted (see Timing parameters, subsection 2.1.2 on page 6). The first concerns the RTT times and the other the times between packets. It is important to make this distinction because, although they are exposed in the same section, their nature and purpose are not related.

5.4.1 RTT times

This section includes the values obtained for the RTT, that is, those defined as PCK-ACK and DAT-ACK. Note that the first parameter also includes the values considered in the initialization and termination processes, that is, the SYN-SYNACK, SYNACK-ACK and FIN-ACK times. They represent a more exhaustive characterization of the network and protocol response than the handshake parameters, since not only the first RTT but all those seen per flow are considered.

group         avg  std    avg_c_s  std_c_s  avg_s_c  std_s_c
generic       300  1,054  295      1,196    302      1,002
p2p           807  2,882  710      2,779    874      2,949
gaming        227  287    214      100      235      353
tunnel        324  1,090  228      795      381      1,229
business      302  506    96       114      429      603
voip          344  1,066  337      875      349      1,185
im            414  1,981  314      1,102    481      2,393
streaming     197  687    189      468      197      692
mobile        517  2,137  363      1,587    653      2,518
control       229  531    162      337      293      658
mail          305  1,244  228      1,410    416      939
management    225  498    209      523      243      477
database      251  330    195      276      302      363
filetransfer  457  980    283      979      488      977
web           259  2,032  201      804      280      2,292
conference    265  414    212      109      324      586

Table 5.16 – DAT-ACK times TCP ISP_Core (ms)


group         avg  std    avg_c_s  std_c_s  avg_s_c  std_s_c
generic       338  1,700  377      2,038    322      1,532
p2p           798  2,926  711      2,822    860      2,997
gaming        227  289    215      106      234      356
tunnel        326  1,868  261      1,519    370      2,069
business      274  395    82       143      425      459
voip          348  1,149  339      934      354      1,286
im            396  1,994  309      1,156    458      2,416
streaming     195  885    197      757      195      889
mobile        510  2,111  360      1,565    644      2,489
control       224  538    152      409      293      629
mail          303  1,369  231      1,520    402      1,119
management    222  527    227      520      218      535
database      253  2,357  215      3,394    289      341
filetransfer  447  1,221  278      1,473    481      1,161
web           246  1,973  213      1,239    264      2,207
conference    265  601    213      609      323      587

Table 5.17 – PCK-ACK times TCP ISP_Core (ms)

group         avg    std    avg_c_s  std_c_s  avg_s_c  std_s_c
generic       496    1,897  195      1,560    712      2,080
p2p           1,089  3,767  865      3,485    1,304    4,006
gaming        243    1,286  191      1,839    290      335
tunnel        380    1,337  96       428      489      1,537
business      460    1,498  48       79       718      1,864
voip          323    914    166      285      484      1,249
im            397    1,920  159      679      576      2,458
streaming     470    1,262  94       764      476      1,267
mobile        931    2,689  69       868      2,232    3,767
mail          946    2,782  71       589      1,295    3,202
management    195    563    137      467      255      641
filetransfer  299    658    98       616      426      652
web           759    2,483  118      1,690    985      2,671

Table 5.18 – DAT-ACK times TCP ISP_Mob (ms)


group         avg    std    avg_c_s  std_c_s  avg_s_c  std_s_c
generic       451    2,422  215      1,637    687      2,990
p2p           1,075  3,691  911      3,662    1,235    3,712
gaming        244    1,286  194      1,839    289      335
tunnel        356    1,332  90       520      474      1,549
business      416    1,848  41       146      714      2,431
voip          317    900    160      303      481      1,227
im            381    1,852  153      684      560      2,382
streaming     464    1,256  88       639      474      1,267
mobile        848    2,548  68       832      1,997    3,582
mail          900    3,037  71       579      1,263    3,561
management    192    550    128      450      258      630
filetransfer  293    629    104      569      423      635
web           595    2,181  91       1,277    873      2,501

Table 5.19 – PCK-ACK times TCP ISP_Mob (ms)

5.4.1.1 ISP_Core

As would be expected, the difference between the values obtained for the DAT-ACK (see table 5.16) and PCK-ACK (see table 5.17) times is small. Remember that the only difference is that the DAT-ACK table considers only the ACKs for data packets, while the PCK-ACK one includes all of them (which basically corresponds to adding the initialization and termination processes). These data can be considered a wide sample of the SYN-SYNACK and SYNACK-ACK times and should therefore show a similar behaviour.

The only reference found that is somehow related to these results (PCK-ACK) is [5], but just for HTTP traffic, and although the scenario (the capture is done next to the server) and the date of that measure (August 2004) differ from the current study, the results are close to those seen in this trace: the average is 169 ms and the STD 402 ms (the averaged values for the web group here are 259 ms and 2,032 ms, respectively). The bigger STD is understandable, since the composition of the traffic is much more diverse in this situation (the scenario of [5] is a university web server).

Considering the assumption made for the capture point location (see definition 5.1), it can be determined that for most of the protocol groups it is mainly centred, as advanced for the initialization times (see subsection 5.3.1). Clear exceptions are the business, mobile, remote_control, mail and filetransfer groups; in all cases the capture point is closer to the server side. For the groups where the asymmetry is most extreme, it is reasonable to think that the related servers are located close to the network core. On the other hand, for typical C2C services like p2p or voip the results are consistent, since the observed point of capture is centred.

Linking these results with those obtained for the SYN-SYNACK and SYNACK-ACK (see subsection 5.3.1), a correlation between the PCK-ACK and the SYN-SYNACK or SYNACK-ACK times would be expected. Concretely, the C2S direction should be correlated with the SYN-SYNACK times and the S2C direction with the SYNACK-ACK ones, because the SYN-SYNACK always determines the C2S direction (see Timing parameters, definition 2.1 on page 6). This correlation is roughly achieved.

Some previous work found that the initial RTT is a good indicator of the average one, but only considering HTTP traffic [5]. To compare the RTTs, the behaviour already commented on regarding the capture point has to be considered (see definition 5.1). For that purpose the average value is taken, which has already been obtained to study the protocol response (see table 5.14). According to these values, it is corroborated that the initial RTT is a good indicator of the global one for most of the protocol groups.

In the cases where this does not hold, it is complex to find a general cause justifying why the relationship is not seen. However, it is possible that this fact is related to the protocol behaviour and/or the network response. That is, if the average of the RTT times is lower in the handshake than for the rest of the packets (PCK-ACK), it could mean that it is related to the protocol behaviour and/or the network treatment of it.6

5.4.1.2 ISP_Mob

In the same way as for the ISP_Core trace, the values have been evaluated to find out where the capture point is in each case (see definition 5.1). Looking either at the DAT-ACK (see table 5.18) or PCK-ACK (see table 5.19) times, it is seen that in this scenario, for all the protocol groups, the capture point is nearer to the server side, although physically it is nearer to the client side. This makes sense considering that this trace is recorded in a mobile network operator environment, where the longer times occur between the terminal and the SGSN.

Evaluating the average and the STD values, a behaviour similar to the ISP_Core trace is seen, for instance that the RTT times for the mobile and p2p groups are relatively high. However, in general terms and considering the average of both directions, the times are higher in this scenario, except for the gaming, tunnel, voip and im groups. It is complex to explain why this similitude appears for some protocol groups while for others the times are quite different, as in the mail and web groups. It may be that this is related to the terminal and edge network behaviour. Note that for some protocols the RTT in the S2C direction is significantly higher than in the C2S one, which could mean that the treatment given to that protocol by the terminal or the network is not prioritised. Conversely, when this difference is lower, that kind of traffic may be treated with some prioritisation. This theory is aligned with the relation seen for the gaming and voip groups, since in a normal situation they are the groups with the highest transmission priority.

Regarding the values seen for the STD, it is noticeable that there is no excessively higher variability in this scenario, which could be expected as the effect of being under a variable environment such as the mobile radio link. Comparing this value in terms of the average between the protocol groups, the relation is quite similar for each of them, as also happened for the initialization times (see subsection 5.3.1).

Regarding the correlation between the initial RTTs (see table 5.15) and the averaged ones (see table 5.19), the behaviour is similar to that seen for the ISP_Core trace.

6It has been attempted to find a relationship between the protocol response fastness and the lack of correlation between the PCK-ACK and the SYN-SYNACK, SYNACK-ACK and FIN-ACK times, but without any conclusive results.


5.4.2 Times between packets

This section shows the DAT-DAT and PCK-PCK times. From these values it is possible to extract relevant information for the characterization of protocols. The times between packets are also a relevant value for approximating an adjusted time for the inactivity period or time-out (see Introduction, chapter 1 on page 1). This parameter can be extracted from these values because they show the statistical time between packets per group, and therefore the expected time to wait until a new packet is seen can be calculated.
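As a hypothetical illustration only (the rule and the factor k below are assumptions of this sketch, not the thesis' final method), the inter-packet statistics can be turned into an inactivity time-out by waiting avg + k * std before expiring a flow:

```python
# Hypothetical time-out rule: expire a flow once avg + k * std ms pass
# without a new packet. The rule and the factor k are assumptions of
# this sketch; the example figures are from table 5.20 (mobile group).

def timeout_ms(avg_gap_ms, std_gap_ms, k=1.0):
    """Inactivity time-out candidate from per-group inter-packet stats."""
    return avg_gap_ms + k * std_gap_ms

# mobile group, DAT-DAT, ISP_Core: avg 11,910 ms, std 51,559 ms
print(timeout_ms(11_910, 51_559))  # -> 63469.0
```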

group         avg     std     avg_c_s  std_c_s  avg_s_c  std_s_c
generic       2,137   14,585  8,673    41,852   2,566    16,870
p2p           1,772   8,077   3,763    14,747   2,741    11,162
gaming        372     1,521   1,020    4,274    574      2,271
tunnel        1,334   13,057  3,465    24,198   1,584    23,618
business      465     8,114   1,239    14,705   678      10,647
voip          2,656   12,609  5,571    19,021   4,825    18,193
im            4,321   16,135  12,360   37,903   6,167    38,115
streaming     136     2,960   3,230    18,987   137      3,055
mobile        11,910  51,559  25,338   73,676   21,053   67,378
control       750     7,473   1,539    14,046   1,432    13,459
mail          746     10,423  826      11,717   1,945    19,036
management    1,278   10,442  7,825    33,821   2,634    13,823
database      1,159   18,511  2,643    28,363   2,122    27,929
filetransfer  533     46,120  2,944    130,181  589      49,281
web           880     9,269   4,714    24,456   911      9,843
conference    379     1,883   657      2,340    685      26,765

Table 5.20 – DAT-DAT times TCP ISP_Core (ms)

5.4.2.1 ISP_Core

The information from the DAT-DAT times (see table 5.20) is useful to see the cadence at which the protocol or application sends the data, while the PCK-PCK times (see table 5.21) also include the empty TCP packets (without payload). In this situation there is not a high correlation between these values. It would be


group         avg    std     avg_c_s  std_c_s  avg_s_c  std_s_c
generic       1,519  12,866  2,769    17,846   2,722    16,741
p2p           1,196  7,084   2,216    9,201    2,217    9,593
gaming        209    1,319   448      1,777    384      1,732
tunnel        1,029  11,080  2,004    15,412   1,737    14,706
business      513    5,906   720      5,722    1,099    8,915
voip          1,688  9,981   3,161    13,677   3,450    14,137
im            2,715  13,356  5,240    17,857   5,302    18,390
streaming     120    3,200   286      4,842    181      3,871
mobile        7,929  42,291  15,317   58,228   15,188   57,701
control       469    5,890   940      8,278    923      8,298
mail          518    8,259   802      10,435   1,040    11,959
management    1,494  12,774  2,430    17,382   3,313    19,552
database      685    13,822  1,455    20,215   1,336    19,476
filetransfer  390    36,037  855      55,946   647      46,807
web           1,127  13,383  2,294    18,757   1,827    17,586
conference    240    1,592   482      2,188    469      2,208

Table 5.21 – PCK-PCK times TCP ISP_Core (ms)

group       avg     std     avg_c_s  std_c_s  avg_s_c  std_s_c
generic     2,878   23,551  3,427    25,744   4,719    27,850
p2p         17,310  64,100  26,099   77,177   18,017   70,828
gaming      199     2,997   295      3,816    374      3,948
tunnel      454     7,215   572      8,284    1,061    12,091
voip        306     7,888   509      10,211   404      10,430
im          151     1,471   209      1,829    198      3,171
streaming   4,375   37,134  7,068    47,118   5,846    43,562
management  11,161  55,806  20,342   73,220   21,947   79,760

Table 5.22 – PCK-PCK times UDP ISP_Core (ms)


group         avg    std     avg_c_s  std_c_s  avg_s_c  std_s_c
generic       3,927  21,782  9,151    36,518   4,528    28,093
p2p           2,730  10,502  5,216    18,813   4,496    16,189
gaming        351    2,210   709      3,620    629      2,910
tunnel        2,002  25,592  9,389    55,821   2,202    27,633
business      4,595  26,668  2,943    20,096   995      10,400
voip          3,315  14,432  6,897    20,893   6,035    22,732
im            3,413  20,736  7,273    32,780   5,440    27,371
streaming     97     2,346   13,520   31,417   96       2,342
mobile        393    2,952   597      4,016    303      2,292
mail          1,624  17,720  5,401    37,683   1,442    13,898
management    389    5,092   794      8,145    486      2,539
filetransfer  500    13,365  1,334    23,793   694      16,078
web           610    5,418   4,503    15,401   572      4,995

Table 5.23 – DAT-DAT times TCP ISP_Mob (ms)

group         avg    std     avg_c_s  std_c_s  avg_s_c  std_s_c
generic       1,796  14,293  3,043    18,868   3,526    19,840
p2p           1,907  8,660   3,475    11,648   3,579    13,058
gaming        109    824     139      946      473      2,520
tunnel        1,396  20,782  2,778    29,860   2,414    27,576
business      2,560  20,718  5,133    30,109   4,555    24,701
voip          1,976  11,315  3,744    15,264   3,892    15,843
im            1,897  15,667  3,587    21,663   3,894    22,373
streaming     87     2,269   216      3,564    117      2,870
mobile        235    2,549   439      3,258    457      3,628
mail          1,057  13,678  2,135    20,005   1,957    18,696
management    316    4,374   604      6,055    682      6,527
filetransfer  427    9,819   700      13,992   808      14,469
web           704    7,349   1,364    9,793    1,117    8,861

Table 5.24 – PCK-PCK times TCP ISP_Mob (ms)


group       avg     std     avg_c_s  std_c_s  avg_s_c  std_s_c
generic     658     10,326  812      11,720   1,925    17,854
p2p         5,745   38,944  7,897    46,502   18,621   72,616
gaming      1,270   12,119  2,582    17,201   2,305    16,018
tunnel      959     10,502  1,884    14,705   1,844    15,588
voip        1,315   19,728  2,573    27,625   2,298    26,018
im          192     3,110   562      5,387    189      4,990
streaming   121     664     200      2,233    199      1,228
mobile      11,052  54,953  24,108   81,437   20,845   75,606
management  4,602   33,794  8,605    45,977   11,672   54,070

Table 5.25 – PCK-PCK times UDP ISP_Mob (ms)

expected to see longer times for the first measure, since it only considers the times between data packets, while the second considers the time between any pair of packets. This holds for most of the protocol groups, but not for business, network_management and, as the most relevant case, web.

A justification of this result (at least for the web group) comes from the S2C behaviour of this kind of traffic, which mainly corresponds to HTTP. In a standard exchange the client requests data, the server sends it and then waits for further requests or for a FIN message. This implies that the server side sends almost no ACK packets, resulting in a PCK-PCK time longer than the DAT-DAT one because of the times seen in the initialization and termination processes. This behaviour is illustrated in figure 5.4: the black intervals represent the DAT-DAT times, while the red intervals, in addition to the black ones, contribute to the PCK-PCK times. In this situation the DAT-DAT time is clearly smaller than the PCK-PCK one.

The STD values for both the DAT-DAT and PCK-PCK times reveal some interesting properties. For example, a STD that is high compared with the average can be a good indicator of periods of inactivity. A good example is the filetransfer case, where the STD is more than 80 times the average. This makes sense when thinking of protocols like the File Transfer Protocol (FTP), where connections can remain open for several hours even though there is no data to transmit, or of direct-download behaviour, where long relative waiting times before the data transfer starts are common.
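This dispersion check reduces to a simple ratio. In the sketch below the threshold factor is an illustrative assumption; only the filetransfer and gaming values come from table 5.20:

```python
def idle_period_hint(avg_ms, std_ms, factor=10.0):
    """True when the gap STD dwarfs the average, hinting at long idle
    periods inside otherwise active flows rather than a steady cadence.
    The factor of 10 is an arbitrary example threshold."""
    return avg_ms > 0 and std_ms > factor * avg_ms

# filetransfer DAT-DAT for ISP_Core (table 5.20): avg 533 ms, STD 46,120 ms
print(idle_period_hint(533, 46_120))  # STD/avg ~ 86 -> True, idle periods likely
# gaming: avg 372 ms, STD 1,521 ms -> ratio ~ 4, steady cadence
print(idle_period_hint(372, 1_521))   # False
```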

Comparing the DAT-DAT values in the C2S and S2C directions is useful to examine how symmetric the communication is. Note that the same exercise does not apply to the PCK-PCK times (i.e. comparing the PCK-PCK field between the C2S and S2C directions), mainly because of the acknowledgement processes; it could also fail due to the same principle described for the web group (see figure 5.4). In this context, clearly asymmetric groups are web, filetransfer and of course streaming, while clearly symmetric groups are p2p, voip and conference. There are no odd results in this sense.
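The symmetry comparison can likewise be expressed as the ratio between the slower and the faster direction; the helper name is illustrative:

```python
def direction_asymmetry(avg_c_s_ms, avg_s_c_ms):
    """Ratio of the slower to the faster direction's mean DAT-DAT time.
    Values near 1 suggest symmetric traffic; large values, asymmetric."""
    lo, hi = sorted((avg_c_s_ms, avg_s_c_ms))
    return hi / lo if lo else float("inf")

# DAT-DAT averages from table 5.20 (ISP_Core)
print(direction_asymmetry(3_230, 137))    # streaming: ~23.6, clearly asymmetric
print(direction_asymmetry(5_571, 4_825))  # voip: ~1.15, roughly symmetric
```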


Figure 5.4 – HTTP behaviour. The intervals marked in red are those producing large values for the PCK-PCK times

On a different matter, when analysing the times themselves instead of comparing the results, some expected particularities are not observed. The main inconsistency lies in the low cadence of data packets shown by protocol groups which would be expected to behave much more actively. A clear instance is the voip group, where the DAT-DAT times per side are around 5 seconds. The gaming and conference groups would also be expected to show lower values, although the disparity is not as striking as in the voip group.

As a first step towards a justification, the flows must be filtered by the volume of data they carry. This is done because many of the flows marked as voip do not seem to carry real conversations but signalling or control data, i.e. long-lasting flows with few packets or little data transmitted. There are also many flows carrying a significant amount of data but with large durations (even some hours), showing an uncommon pattern of DAT-DAT times, which can be understood as long periods of inactivity. Indeed, analysing arbitrary traces containing Skype and Session Initiation Protocol (SIP) conversations, relatively long periods of inactivity are seen in both cases. As a result, filtering for flows transmitting at least 5 kB of data does not improve the values (they stay around 5 seconds). However, analysing flow by flow, those behaving like real conversations show values of about 10 ms in this field.
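The 5 kB filtering step can be sketched as follows; the flow record layout and field names are assumptions for the example:

```python
def conversation_candidates(flows, min_bytes=5 * 1024):
    """Discard flows below min_bytes of payload, which in the voip group
    tend to be signalling/control flows rather than real conversations.
    Each flow is a dict; the 'bytes' key is an illustrative field name."""
    return [f for f in flows if f["bytes"] >= min_bytes]

flows = [
    {"id": "sip-signalling", "bytes": 800},
    {"id": "voip-call", "bytes": 120_000},
]
print([f["id"] for f in conversation_candidates(flows)])  # ['voip-call']
```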

Partially as a result of this fact, it is useful to complement the TCP study with the UDP flows (see Application Module TCP Flow: UDP, section 4.4 on page 34). This can help to better interpret the data and to see whether the behaviour of the voip group is similar over UDP. The PCK-PCK times are therefore shown (see table 5.22) and analysed later on.

Although the times over UDP for the voip group (around 0.5 seconds) are lower than over TCP, they are still not as low as would be expected. Again, analysing them separately under the same conditions as for TCP does not improve this result. It would be interesting to look further into this issue to see how real voip conversation flows behave; however, this is not the aim of this study and is left as further work.

Continuing with the times extracted from the UDP traffic (see table 5.22), the high values seen for the p2p, streaming and network_management groups are remarkable. The p2p case agrees with the assumption made when presenting the general parameters of the UDP traffic (see table 5.8): if UDP carries control flows (or specific information of P2P protocols such as tracking tables), it is not strange to see these high values. The values seen for the streaming group also do not seem to correspond to real data traffic, since lower values would be expected and indeed appear when looking at the TCP flows.

Leaving aside the findings about the times between packets themselves, it is also interesting to evaluate these numbers as a first approximation of the times expected between packets. This is strongly related to the time-out after which a flow should be considered finished following a period of inactivity, which is the topic that triggered this project.

Returning to the PCK-PCK values for both TCP (see table 5.21) and UDP (see table 5.22), at first glance it can be seen that the statistical values are far below the currently established time of 600 seconds. Considering the range of values inside the interval {average + 2.5 ∗ STD} and unidirectional flows (the maximum of the C2S and S2C directions), which represents the worst case, the maximum time to consider would be around 160 seconds for the mobile group or 140 seconds for the filetransfer one. However, these numbers are just a rough approximation; this issue is discussed in depth further on, taking into account the probability distribution curves of this data set (see Set up and evaluation of time-outs, chapter 6 on page 75).
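That rough bound can be reproduced directly from the table values; the helper below is a sketch of the calculation, not part of the thesis tooling:

```python
def rough_timeout_ms(avg_c_s, std_c_s, avg_s_c, std_s_c, k=2.5):
    """Worst-case inactivity bound: {average + k*STD} of the slower of the
    two directions, matching the unidirectional worst case in the text."""
    return max(avg_c_s + k * std_c_s, avg_s_c + k * std_s_c)

# mobile group, PCK-PCK TCP ISP_Core (table 5.21): around 160 seconds
t = rough_timeout_ms(15_317, 58_228, 15_188, 57_701)
print(t / 1000)  # ~160.9 seconds, the worst case quoted in the text
```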

5.4.2.2 ISP_Mob

Again, useful information for the purpose of this work can be extracted from the ISP_Mob trace through the DAT-DAT (see table 5.23) and PCK-PCK (see table 5.24) statistics. There is a quite strong relationship with the data obtained for the ISP_Core trace: with the exception of the business, mobile, mail and network_management groups, and allowing for small variations in the values, the explanations given for the other trace also apply here.

The difference seen for the business group is that the time between packets is considerably higher in this scenario. Looking into single flows, the behaviour of the main protocol in this group (HTTP Application Activesync) is mostly similar in both scenarios; the difference comes from some exceptions seen in the ISP_Mob trace, which include relatively high values compared with the rest. Indeed, the average duration of the business group in this scenario is notably higher than in the ISP_Core trace. This behaviour could be related to the performance of this protocol over this network.

The mail group also shows a higher time between packets in this scenario, the packets in the C2S direction being especially different from the ISP_Core trace. In this scenario the average duration is higher, so that could again be the reason for this difference.

The mobile group is mainly composed of Multimedia Messaging Service (MMS) flows, so the difference with the ISP_Core trace is understandable (remember that in that trace this group consisted basically of Blackberry application traffic). The last group showing notable differences is network_management, with lower times in this scenario. This is mainly due to the appearance in the ISP_Mob trace of many protocols with short times between packets, while in the ISP_Core trace this group is basically composed of Domain Name System (DNS) flows.

The process carried out above for the ISP_Core trace is repeated to see what an adjusted time-out could be. The results here are even further from the 600 seconds, with a worst case of around 80 seconds for the business group considering the {average + 2.5 ∗ STD} interval and unidirectional flows.

It is relevant to note that in this scenario the behaviour of the UDP traffic in the streaming group is completely different from that in the ISP_Core environment (see table 5.25). The proportion of this traffic is notably lower in this case, and the times between packets appear to be closer to real streaming data (no control flows). In contrast, the gaming and voip groups do not show the low values that would be expected. Nevertheless, the behaviour of the p2p group is similar.


5.5 Termination

Finally, this last set of results presents the termination times, both when the process is the standard FIN-ACK and when it happens after a RST message. It is also shown what proportion of flows finish after a standard FIN-ACK process or after a RST message. These data are important to consider when establishing the time-out for the tracking table.

In this last section it is necessary to introduce a particular behaviour seen in some flow terminations: when the flows are affected by a blocking scenario (see figure 5.5). It means that, even though the packets are seen in the trace, the communication behaves as if these packets were not arriving at the destination. The impact of this behaviour is analysed below (see subsection 5.5.2).

Figure 5.5 – Blocking-scenario RST behaviour. After a RST is seen, large values can appear for the time between packets (black braces), mainly in blocking scenarios like the one illustrated

In the following table the first three columns represent the ways of finishing in the two possible modes already described (see Connection process, subsection 2.1.1 on page 4), plus the flows that finish after the time-out or are still ongoing when the trace ends. The fourth column shows the proportion of RST processes behaving as in a blocking scenario (i.e. when one or both sides cannot receive messages) with respect to the total RSTs. Finally, the last column contains the total proportion of termination processes occurring after a RST in a blocking scenario; that is, applying the percentage of the fourth column to the value in the third column yields the value in the fifth one.

The procedure to collect these values is straightforward, considering that all the times have been tracked. A RST process is counted as a blocking scenario when, after a RST packet is seen, the time needed for the flow to actually finish is different from zero.
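A minimal sketch of that criterion, with illustrative timestamps in milliseconds:

```python
def is_blocking_rst(first_rst_ts_ms, last_pck_ts_ms):
    """A RST termination counts as a 'blocking scenario' when packets are
    still seen after the first RST, i.e. the RST-to-end time is non-zero."""
    return (last_pck_ts_ms - first_rst_ts_ms) > 0

print(is_blocking_rst(1_000, 45_000))  # True: traffic went on 44 s after the RST
print(is_blocking_rst(1_000, 1_000))   # False: the RST ended the flow at once
```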


group         std_fin  unclosed  rst     rst_block/rst  rst_block
generic       46.41%   14.01%    39.58%  33.51%         13.26%
p2p           45.91%   9.28%     44.81%  40.39%         18.10%
gaming        65.29%   10.87%    23.84%  52.25%         12.46%
tunnel        46.25%   17.02%    36.74%  28.06%         10.31%
business      97.16%   0.72%     2.12%   15.75%         0.33%
voip          66.24%   8.47%     25.28%  39.87%         10.08%
im            53.98%   10.95%    35.07%  10.96%         3.84%
streaming     32.50%   24.49%    43.00%  38.85%         16.71%
mobile        7.54%    16.12%    76.34%  8.00%          6.10%
control       82.55%   2.84%     14.61%  66.38%         9.70%
mail          50.10%   3.49%     46.41%  19.38%         8.99%
management    73.97%   6.93%     19.10%  7.74%          1.48%
database      98.37%   0.57%     1.06%   22.73%         0.24%
filetransfer  46.27%   27.34%    26.39%  42.14%         11.12%
web           50.73%   25.28%    23.99%  16.59%         3.98%
conference    76.92%   2.37%     20.72%  33.15%         6.87%

Table 5.26 – Termination mode proportions ISP_Core

5.5.1 Standard termination times

This category shows the RTT times seen in the finalization process (see Connection process, subsection 2.1.1 on page 4). As with the initialization times, they represent a small sample of the global PCK-ACK times, and some correlation with those values, as well as with the SYN-SYNACK and SYNACK-ACK times, would be expected.

5.5.1.1 ISP_Core

Analysing the results (see table 5.26), a generic but unexpected finding is that the proportion of flows terminated after a FIN-ACK process is not as high as would be expected, given that it is defined as the common way; in other words, the ratio of flows finished after a RST process is quite high. The finalization of flows after the time-out is also higher than expected, especially for the streaming (24.5%), filetransfer (27.3%) and web (25.3%) groups. This fact is hard to justify, since it is not easy to know whether it is due to the 600-second limitation, to the effect of truncating ongoing flows, or to the real behaviour of these protocol groups.


group         std_fin  unclosed  rst     rst_block/rst  rst_block
generic       31.69%   21.92%    46.39%  21.49%         9.97%
p2p           64.14%   10.43%    25.43%  25.97%         6.60%
gaming        50.00%   7.75%     42.25%  8.33%          3.52%
tunnel        25.79%   24.76%    49.46%  25.16%         12.44%
business      58.51%   14.19%    27.31%  10.33%         2.82%
voip          65.86%   13.74%    20.39%  43.99%         8.97%
im            36.78%   40.81%    22.40%  35.11%         7.87%
streaming     38.77%   12.62%    48.61%  75.15%         36.53%
mobile        48.78%   37.65%    13.58%  11.37%         1.54%
mail          30.93%   39.33%    29.74%  33.98%         10.11%
management    66.07%   3.59%     30.34%  33.55%         10.18%
filetransfer  40.60%   10.52%    48.88%  59.54%         29.10%
web           66.99%   18.73%    14.28%  30.00%         4.28%

Table 5.27 – Termination mode proportions ISP_Mob

group         avg_fin_c_s  std_fin_c_s  avg_fin_s_c  std_fin_s_c
generic       430          2,753        527          2,588
p2p           1,035        5,451        1,379        6,406
gaming        220          148          313          1,929
tunnel        314          1,449        497          1,864
business      73           181          425          88
voip          528          3,375        1,048        5,207
im            350          2,236        416          2,086
streaming     230          821          313          2,588
mobile        206          41           345          909
control       63           1,286        302          206
mail          236          2,470        411          2,441
management    200          444          352          2,036
database      217          496          267          411
filetransfer  231          4,243        293          835
web           241          1,227        297          1,713
conference    224          614          331          807

Table 5.28 – FIN-ACK times ISP_Core (ms)


Figure 5.6 – Flow termination proportions. Top: ISP_Core; bottom: ISP_Mob. Standard FIN (green), unclosed (yellow), RST (red) and RST in blocking scenario (intense red)


group         avg_fin_c_s  std_fin_c_s  avg_fin_s_c  std_fin_s_c
generic       260          2,195        677          5,286
p2p           2,437        15,268       1,296        2,701
gaming        853          2,880        276          386
tunnel        88           1,341        749          3,212
business      47           235          1,162        3,589
voip          135          616          693          2,475
im            140          1,286        671          2,363
streaming     119          940          471          1,988
mobile        19           58           410          1,535
mail          117          526          1,040        5,004
management    58           193          417          2,750
filetransfer  140          303          460          1,765
web           62           413          707          2,989

Table 5.29 – FIN-ACK times ISP_Mob (ms)

A comparison point for these results can be extracted from [12], where it is found that for HTTP connections 75% of cases are closed after a standard process (50.7% in the current trace), 15% after a RST (24% in the current trace), and the remaining 10% are mainly unclosed connections. It is also found that around 60% of P2P connections are closed in the standard way (45.9% in the current trace), while around 20% are finished after a RST (44.8% in the current trace).

In both cases the rate of finalization after a RST is higher in the obtained results (see table 5.26). This can be explained by considering that the traffic samples are about three years apart, which implies a considerable change in network behaviour and in the evolution of traffic management, in addition to the difference in the traffic scenarios.

The FIN-ACK times (see table 5.28) would be expected to show a relationship with the PCK-ACK times (see table 5.17) and also with the SYN-SYNACK and SYNACK-ACK times (see table 5.12). This is mostly fulfilled, the only exception being the FIN-ACK time in the S2C direction for the voip group, which shows a relatively high value.

5.5.1.2 ISP_Mob

Again, the values seen for the termination proportions (see table 5.27) are higher than expected, both for flows terminating after a time-out and after a RST process. In this case the behaviour is, apart from some particular differences, similar to that in the ISP_Core trace (see figure 5.6), but with a bigger proportion of unclosed connections. That is normal considering that the proportion of ongoing flows is relatively high because of the shorter duration of the trace.

Comparing the results again with those obtained in [12], the similarity is stronger in this scenario. Remember that the cited study concludes that for HTTP connections 75% of cases are closed after a standard process, 15% after a RST and the remaining 10% are mainly unclosed connections, while in ISP_Mob these percentages are 70%, 14.3% and 18.7%. For P2P connections, around 60% of the cases are closed in the standard way and around 20% are finished after a RST process; here they are 64.1% and 25.4%.

Considering the results for the FIN-ACK times (see table 5.29), there is again a relationship between these times and the RTTs exposed before (see subsection 5.3), with only some disparities. For instance, the FIN-ACK time in the C2S direction for the gaming group is notably higher than the average RTT. Another remarkable result is the low time seen for the mobile group in comparison with the average PCK-ACK times, being over 4 times smaller.


5.5.2 Reset termination times

These times are interesting to analyse, since they occur when the traffic behaves as in a blocking scenario (see figure 5.5). The retransmission times of a TCP connection grow when a packet has not been acknowledged [15], up to high limits (the recommended maximum is 100 seconds [16]). Therefore, in some scenarios it is normal to see high values, since one or both sides are not able to see the packets coming from the other side. In a regular situation, however, when one of the sides of the communication receives a FIN message it immediately stops sending packets.

Hence, choosing too small a value for the tracking-table time-out would affect the detection process: a single packet retransmitted after a long time could be considered a new flow, implying more processing time for detection since the PACE engine would try to identify it. On the other hand, a small time-out implies fewer active flows in the tracking table and therefore lower memory requirements. This trade-off is taken up when estimating the time-out for the tracking table (see Set up and evaluation of time-outs, chapter 6 on page 75). Since the RST-END values are calculated from the first RST to the last packet seen in the flow, they are not a good approximation; to estimate the time-out it is more useful to look at the PCK-PCK times after the first RST (see figure 5.5).

group         avg_rst_end  std_rst_end  avg_pck_rst  std_pck_rst
generic       30,602       242,820      8,632        43,455
p2p           34,775       106,033      9,910        27,270
gaming        9,758        90,741       428          10,405
tunnel        16,238       125,547      5,327        45,588
business      21,140       38,549       4,974        15,716
voip          65,313       596,669      11,058       31,845
im            57,783       252,651      7,191        62,414
streaming     22,994       77,307       1,938        13,264
mobile        295,740      1,120,126    22,026       74,212
control       2,721        27,363       1,304        17,690
mail          32,003       274,642      9,536        47,070
management    66,065       145,641      15,899       24,236
database      17,460       21,510       1,729        4,710
filetransfer  137,680      945,136      8,336        195,740
web           18,940       91,388       4,282        31,750
conference    3,598        19,559       933          9,769

Table 5.30 – RST times ISP_Core (ms)


group         avg_rst_end  std_rst_end  avg_pck_rst  std_pck_rst
generic       34,276       138,999      6,722        33,147
p2p           15,095       53,451       6,354        18,664
gaming        57,834       128,352      765          5,673
tunnel        13,618       96,933       5,618        33,423
business      23,395       92,961       8,432        42,427
voip          20,311       91,428       4,753        29,569
im            8,819        68,851       1,964        17,671
streaming     7,449        50,738       344          4,060
mobile        7,007        42,840       840          8,223
mail          29,384       126,616      9,357        38,504
management    182          259          157          239
filetransfer  11,967       86,210       3,768        45,231
web           11,534       59,228       1,996        13,051

Table 5.31 – RST times ISP_Mob (ms)

5.5.2.1 ISP_Core

The RST-END times (see table 5.30) are considerably higher than the rest of the obtained values, because of the effect just described (see subsection 5.5.2). Apart from some exceptions, the PCK-PCK times after a RST are around one order of magnitude below the RST-END ones.

Moreover, comparing these values with the PCK-PCK times in a normal situation (see table 5.21), they are higher in both average and STD, which is aligned with the behaviour expected in this scenario.

5.5.2.2 ISP_Mob

Finally, the times after a RST are shown (see table 5.31). Again, the PCK-PCK times in this situation are higher than in a standard one (see table 5.24), except for the network_management group, probably due to not having enough samples.

These values are somewhat related to those obtained for the ISP_Core trace (see table 5.30) for many of the protocol groups, but show a completely different behaviour for others, particularly for streaming, mobile and network_management; in all three the times are notably lower in this scenario.

6 Set up and evaluation of time-outs

It has already been introduced that the main goal of this work is to establish adjusted time-outs. This is not only an interesting value which has not been studied in depth, but also a beneficial parameter for PACE in order to achieve more efficient performance. When the time-out is too low, the memory requirements decrease but the absolute number of flows seen increases, which implies a higher effort for PACE to identify traffic since it has to start the detection process for each of those split flows; moreover, for the same reason some flows would not be detected at all. On the other hand, if the time-out is excessively big, the detection rate and effort improve but the memory requirements grow. The intention is therefore to find a good middle point.

Throughout the last section a small approximation was made, considering that the PCK-PCK times are the best indicator to establish that value. In this chapter these values are analysed in depth and an adjusted time-out is proposed for each protocol group. An evaluation of the results is also performed in order to see their real impact and to approximate a global value for both the flow and the identity tracking table time-outs. The only value comparable to the results in this section comes from [13], where a single time-out of 64 seconds is recommended; however, that work is old (1995) and its scenario and aims are quite distinct.

6.1 Flow tracking table time-outs

The use of the PCK-PCK statistics to estimate the time-out is natural: they indicate the expected time between packets and its variation. This parameter should always be established considering the worst case, which is why only the unidirectional values are used (i.e. the C2S or S2C fields); the averaged measure considers the times between packets seen in both directions, so its value is always lower. In addition, it would not apply to unidirectional flows; although these have not been considered when extracting the results, they should not modify them.

The first step to extract these times is to establish an interval in which the percentage of data considered is representative enough. As said before (see Final values, subsection 4.3.1 on page 27), an extended method is to consider an interval based on the average of the values and a multiple of the STD, concretely the {average + 2.5 ∗ STD} interval. The drawback of this method is the exclusion of some data; however, this data is expected to represent a low percentage of the total (i.e. the values considered as "outliers" [14]).

The absolute maximum times observed have been tracked in order to take them into account if necessary (only for the TCP flows). Nevertheless, these values are not useful to estimate the maximum value in most cases, since for almost all protocol groups they lie near 600 seconds, which is the inactivity limit after which PACE drops flows. An exception is the filetransfer group, since it allows longer inactivity times (for instance, FTP flows can remain active for several hours even though no packet is transmitted, and the time-out set by PACE for this group is 3 hours).

Below, for both directions and separated into TCP and UDP flows, the tables show the values calculated over the defined interval, together with the percentage of data used after discarding the values outside it. Note that this percentage refers to the proportion of time values inside the interval, not of flows; the flow-level proportion differs, since each flow contains several times and a single value outside the interval excludes the whole flow.

6.1.1 TCP time-outs

As the information available for TCP flows is wider than for UDP, the parameters are defined separately. It has been introduced that there is a limitation in scenarios where termination happens after a RST process (see Reset termination times, subsection 5.5.2 on page 73). Although the proportion of flows showing this behaviour is relatively small in both the ISP_Core (see Termination mode proportions ISP_Core, table 5.26 on page 68) and the ISP_Mob (see Termination mode proportions ISP_Mob, table 5.27 on page 69) traces, it is good practice to keep it in mind. Thus, the time-out should be adjusted approximately as follows (see formula 6.1), where t_time-out is the proposed time-out, while t_pck_pck and t_pck_rst are obtained by taking the largest value of the range {average + 2.5 ∗ STD} for the standard PCK-PCK times and for the PCK-PCK times after a RST, respectively.

Formula 6.1 (Time-out range)

t_time-out ≲ t_pck_rst (6.1)

t_time-out ≳ t_pck_pck (6.2)

The values seen for the TCP traffic are quite similar across both traces, except for some particular protocol groups. The differences are due to the different behaviour of each network, but this leads to a more reliable final result by always considering the worst time seen. The accuracy of the results is good, the rate of data used being over 99% for all groups except the control one in the ISP_Core trace.
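Formula 6.1 can be turned into numbers directly from the tables; the sketch below does so for the p2p group of the ISP_Core trace (the helper name is illustrative, and the k = 2.5 factor follows the interval used throughout this chapter):

```python
def timeout_range_ms(avg_pck, std_pck, avg_rst, std_rst, k=2.5):
    """Bounds for the tracking-table time-out per formula 6.1: the
    {average + k*STD} value of the standard PCK-PCK times (lower bound)
    and of the PCK-PCK times after a RST (upper, covering blocking cases)."""
    return avg_pck + k * std_pck, avg_rst + k * std_rst

# p2p C2S, ISP_Core: PCK-PCK inputs from table 6.1, PCK-RST from table 6.3
lo, hi = timeout_range_ms(2_216, 9_201, 9_910, 27_270)
print(lo, hi)  # close to the 25,219 and 78,086 ms listed in the tables
```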

Below, in the same way as for the time-out tables, the calculated times are shown considering the same interval for the PCK-PCK times after a RST, for the ISP_Core (see table 6.3) and the ISP_Mob (see table 6.4) traces.


group         avg_c_s  std_c_s  time_c_s  pctg_c_s  avg_s_c  std_s_c  time_s_c  pctg_s_c
generic       2,769    17,846   47,384    99.53%    2,722    16,741   44,574    99.42%
p2p           2,216    9,201    25,219    99.09%    2,217    9,593    26,200    99.13%
gaming        448      1,777    4,890     99.87%    384      1,732    4,714     99.87%
tunnel        2,004    15,412   40,535    99.45%    1,737    14,706   38,501    99.33%
business      720      5,722    15,024    99.58%    1,099    8,915    23,386    99.36%
voip          3,161    13,677   37,354    99.32%    3,450    14,137   38,792    99.12%
im            5,240    17,857   49,881    99.51%    5,302    18,390   51,276    99.55%
streaming     286      4,842    12,392    99.82%    181      3,871    9,859     99.85%
mobile        15,317   58,228   160,886   100.00%   15,188   57,701   159,441   99.34%
control       940      8,278    21,634    98.31%    923      8,298    21,667    98.92%
mail          802      10,435   26,889    99.64%    1,040    11,959   30,938    99.54%
management    2,430    17,382   45,886    99.42%    3,313    19,552   52,192    99.49%
database      1,455    20,215   51,993    99.99%    1,336    19,476   50,025    99.98%
filetransfer  855      55,946   140,720   99.87%    647      46,807   117,664   99.90%
web           2,263    18,216   47,804    99.56%    1,800    17,075   44,488    99.52%
conference    482      2,188    5,953     99.83%    469      2,208    5,988     99.89%

Table 6.1 – Time-out TCP ISP_Core (ms)

group         avg_c_s  std_c_s  time_c_s  pctg_c_s  avg_s_c  std_s_c  time_s_c  pctg_s_c
generic       3,043    18,868   50,214    99.38%    3,526    19,840   53,126    99.23%
p2p           3,475    11,648   32,593    99.84%    3,579    13,058   36,225    99.67%
gaming        139      946      2,503     99.63%    473      2,520    6,772     99.34%
tunnel        2,778    29,860   77,426    99.61%    2,414    27,576   71,353    99.46%
business      5,133    30,109   80,406    99.66%    4,555    24,701   66,308    99.87%
voip          3,744    15,264   41,904    99.94%    3,892    15,843   43,500    99.87%
im            3,587    21,663   57,744    99.81%    3,894    22,373   59,826    99.81%
streaming     216      3,564    9,125     99.60%    117      2,870    7,292     99.72%
mobile        439      3,258    8,584     99.81%    457      3,628    9,526     99.73%
mail          2,135    20,005   52,147    99.95%    1,957    18,696   48,697    99.89%
management    604      6,055    15,740    99.65%    682      6,527    16,999    99.72%
filetransfer  700      13,992   35,681    99.81%    808      14,469   36,980    99.77%
web           1,364    9,793    25,845    99.51%    1,117    8,861    23,269    99.43%

Table 6.2 – Time-out TCP ISP_Mob (ms)
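Although formula 6.1 is defined earlier in the thesis and not reproduced here, the tabulated time columns are consistent with taking the average plus 2.5 standard deviations of the PCK-PCK times. A minimal Python sketch, checked against a few rows of table 6.1 (the function name and structure are illustrative, not part of PACE):

```python
def flow_timeout_ms(avg_ms: float, std_ms: float, k: float = 2.5) -> float:
    """Time-out bound: mean inter-packet time plus k standard deviations."""
    return avg_ms + k * std_ms

# Rows from Table 6.1 (ISP_Core, C2S direction): group, avg, std, time
rows = [
    ("generic", 2769, 17846, 47384),
    ("p2p",     2216,  9201, 25219),
    ("gaming",   448,  1777,  4890),
]
for group, avg, std, time in rows:
    # the tabulated 'time' column matches avg + 2.5 * std up to rounding
    assert abs(flow_timeout_ms(avg, std) - time) <= 1
```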


group avg std time pctg
generic 8,632 43,455 117,269 99.67%
p2p 9,910 27,270 78,086 99.30%
gaming 428 10,405 26,440 99.88%
tunnel 5,327 45,588 119,298 99.78%
business 4,974 15,716 44,263 97.65%
voip 11,058 31,845 90,671 99.74%
im 7,191 62,414 163,225 99.84%
streaming 1,938 13,264 35,098 99.54%
mobile 22,026 74,212 207,556 99.83%
control 1,304 17,690 45,529 99.85%
mail 9,536 47,070 127,211 99.86%
management 15,889 24,236 76,480 100.00%
database 1,729 4,710 13,504 99.75%
filetransfer 8,386 195,740 497,736 99.86%
web 4,282 31,750 83,657 99.79%
conference 933 9,769 25,355 99.71%

Table 6.3 – PCK-RST time-out ISP_Core (ms)

As can be seen, the PCK-PCK times after a RST are generally higher than the standard PCK-PCK times. Nonetheless, it is important to remember that these flows represent only a small, though non-negligible, proportion of the total. The largest case is 18.1% for the ISP_Core (see Termination mode proportions ISP_Core, table 5.26 on page 68) and 36.5% for the ISP_Mob (see Termination mode proportions ISP_Mob, table 5.27 on page 69).

As these values limit the establishment of a time-out (see formula 6.1), at this point it is interesting to see what proportion of the data a given time-out covers. Therefore, treating the data outside the considered interval as "outliers", the data necessary to approximate the CDF of these times has been extracted (see Probability distributions, section 2.3 on page 11).
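The outlier-trimmed empirical CDF described above can be sketched as follows; this is an illustrative reconstruction (the sample data and cutoff are hypothetical), not the tool actually used in the thesis:

```python
def empirical_cdf(times_ms, cutoff_ms):
    """Empirical CDF of inter-packet times, discarding as outliers the
    values beyond the considered interval (avg + 2.5 * std)."""
    kept = sorted(t for t in times_ms if t <= cutoff_ms)
    # F(t_i) = i / n computed over the retained samples only
    return [(t, (i + 1) / len(kept)) for i, t in enumerate(kept)]

# Hypothetical gaps in ms; 90,000 ms falls outside the generic C2S
# interval of Table 6.1 (47,384 ms) and is dropped as an outlier
cdf = empirical_cdf([10, 50, 200, 90000], cutoff_ms=47384)
```

Note that the retained samples are renormalized to 100%, which is why the text below remarks that "100%" of the values is in fact the percentage column of the tables.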

Only a couple of representations are shown, enough to see that in some cases selecting a lower value is a good approximation while in others it is not (see figure 6.1). However, the variation for each group is listed in an approximate way. A strong variation means that selecting a lower time-out has a strong impact on the proportion of packets inside the interval; a soft variation means just the opposite: the time-out can be decreased without a significant change in the percentage of data covered.

Hence, a strong variation is seen for the p2p, voip, streaming, remote_control, network_management and database groups; a medium variation for the generic, business, tunnel, im, mobile, mail and web groups; and, finally, a soft variation for the gaming, filetransfer and conference groups.


group avg std time pctg
generic 6,722 33,147 89,589 99.52%
p2p 6,354 18,664 53,014 98.35%
gaming 765 5,673 14,947 97.62%
tunnel 5,618 33,423 89,177 98.57%
business 8,432 42,427 114,498 97.97%
voip 4,753 29,569 78,675 99.89%
im 1,964 17,671 46,142 99.48%
streaming 344 4,060 10,495 99.01%
mobile 840 8,223 21,397 100.00%
mail 9,357 38,504 105,618 99.05%
management 157 239 755 98.31%
filetransfer 3,768 45,231 116,846 100.00%
web 1,996 13,051 34,623 98.53%

Table 6.4 – PCK-RST time-out ISP_Mob (ms)

Figure 6.1 – CDF of the PCK-PCK times after a RST process. The variation of values for some groups, such as p2p, is fast, while for others, like filetransfer, it is softer


In all cases the CDF can be approximated by an exponential distribution (see Gamma and exponential distributions, subsection 2.3.2 on page 13). Notice that what is being considered as 100% of the values is in fact the percentage obtained for the ISP_Core (see table 6.3) and for the ISP_Mob (see table 6.4) in the tables above. In any case, the target of this interpretation is to determine which groups can be approximated without losing accuracy on the data under consideration.
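Under that exponential approximation, the fraction of inter-packet gaps covered by a candidate time-out can be estimated directly from the mean gap; a small sketch using the standard CDF F(t) = 1 − exp(−t/mean) (the values used are illustrative, and the long tails seen in the tables mean this is only a rough approximation for real traffic):

```python
import math

def exp_coverage(timeout_ms: float, mean_gap_ms: float) -> float:
    """Exponential CDF, F(t) = 1 - exp(-t / mean): the fraction of gaps
    shorter than the time-out if gaps were exponentially distributed."""
    return 1.0 - math.exp(-timeout_ms / mean_gap_ms)

# A time-out equal to the mean covers ~63.2% of exponential gaps
assert abs(exp_coverage(1000, 1000) - 0.632) < 0.001
```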

Thus, the table below proposes a time for each group considering the available information (see table 6.5). Note that in almost all cases the limiting value comes from the PCK-PCK times after a RST. However, the proposed values are adjusted in violation of the definition made (see formula 6.1). This is done considering the percentage of flows terminating in this way in both the ISP_Core (see Termination mode proportions ISP_Core, table 5.26 on page 68) and the ISP_Mob (see Termination mode proportions ISP_Mob, table 5.27 on page 69), and the impact of choosing a lower value as described above. Of course this is just an approximation: depending on the level of detection pursued, the higher the value chosen the better it will perform, needing, on the other hand, more memory. This trade-off is studied when choosing a single time-out value (see subsubsection 6.1.3.1).

group pck_rst pck_pck time-out
generic 117,269 53,126 60,000
p2p 78,086 36,225 60,000
gaming 26,440 6,772 10,000
tunnel 119,298 77,426 80,000
business 114,498 80,406 90,000
voip 90,671 43,500 70,000
im 163,225 59,826 60,000
streaming 35,098 12,392 30,000
mobile 21,397 9,526 10,000
control 45,529 21,667 30,000
mail 127,211 52,147 70,000
management 76,480 52,192 70,000
database 13,504 51,993 60,000
filetransfer 497,736 140,720 150,000
web 83,657 47,804 50,000
conference 25,355 5,988 10,000

Table 6.5 – Time-out TCP (ms)

It is important to highlight that these time-outs are just guideline values; depending on each application scenario they should be adjusted to the corresponding traffic. To better understand the criteria used to select the rounded time-out values, the p2p and im groups can be compared, since in both cases the time-out is set to 60 seconds although their PCK-PCK parameters differ. For the p2p group the time-out is closer to the RST limitation (∼78 seconds) because the proportion of flows behaving as in a blocking scenario is high, around 18% (see Termination mode proportions ISP_Core, table 5.26 on page 68), and the variation of the PCK-PCK times after a RST around the time-out is strong. On the other hand, for the im group, the RST limitation (∼163 seconds) is neglected because the percentage of flows behaving like in a blocking scenario is less than 4% (see Termination mode proportions ISP_Core, table 5.26 on page 68), and the variation of the PCK-PCK times after a RST is considered medium.

The value chosen for the mobile group follows the data extracted from the ISP_Mob trace, since the values seen in the ISP_Core are not representative for this group. Remember that in the ISP_Core this group is basically composed of Blackberry application traffic and the number of flows seen is low (see Generic data, subsection 5.2.2 on page 42).


6.1.2 UDP time-outs

In this situation, establishing a global time is easier than in the TCP case, since it is just extracted from the PCK-PCK parameters. Below are shown the values for both the ISP_Core (see table 6.6) and the ISP_Mob (see table 6.7) traces.

group avg_c_s std_c_s time_c_s pctg_c_s avg_s_c std_s_c time_s_c pctg_s_c
generic 3,427 25,744 67,785 98.50% 4,719 27,850 74,343 98.09%
p2p 26,099 77,177 219,043 97.85% 18,017 70,828 195,088 97.52%
gaming 295 3,816 9,836 99.33% 374 3,948 10,243 99.14%
tunnel 572 8,284 21,284 99.57% 1,061 12,091 31,289 99.71%
voip 509 10,211 26,037 99.67% 404 10,430 26,478 99.74%
im 209 1,829 4,782 99.35% 198 3,171 8,125 99.47%
streaming 7,068 47,118 124,862 98.32% 5,846 43,562 114,751 98.44%
management 20,342 73,220 203,393 97.36% 21,947 79,760 221,346 96.90%

Table 6.6 – Time-out UDP ISP_Core (ms)

group avg_c_s std_c_s time_c_s pctg_c_s avg_s_c std_s_c time_s_c pctg_s_c
generic 812 11,720 30,113 99.43% 1,925 17,854 46,561 98.95%
p2p 7,897 46,502 124,151 98.16% 18,621 72,616 200,162 97.16%
gaming 2,582 17,201 45,584 96.31% 2,305 16,018 42,350 96.58%
tunnel 1,884 14,705 38,645 99.57% 1,844 15,588 40,814 99.60%
voip 2,573 27,625 71,636 98.84% 2,298 26,018 67,344 98.96%
im 562 5,387 14,030 98.59% 189 4,990 12,665 99.73%
streaming 200 2,233 5,782 99.78% 199 1,228 3,269 97.91%
mobile 24,108 81,437 227,699 99.22% 20,845 75,606 209,860 99.17%
management 8,605 45,977 123,546 99.15% 11,672 54,070 146,846 97.97%

Table 6.7 – Time-out UDP ISP_Mob (ms)

Although the percentage of data used is still good, it is lower than in the TCP analysis (96.3% in the worst case). This lower rate is reasonable considering that UDP traffic is less structured than TCP. From these tables the final proposed values are again extracted, taking the larger value in each situation and considering the sensitivity to a variation of the time-out. For that purpose, as done for the PCK-PCK times after a RST, the variation of the PCK-PCK times in the UDP scenario is studied. The p2p, mobile and network_management groups show a strong variation; the generic, tunnel and streaming groups a medium variation; and, lastly, the gaming, voip and im groups a soft variation.


Figure 6.2 – CDF of the PCK-PCK times for UDP flows. The variation of values for some groups, such as mobile, is fast, while for others, like voip, it is softer

Although only one direction (the worst-case one) is used to set the time-out, both are represented. Again, the CDF can be approximated by an exponential distribution (see Gamma and exponential distributions, subsection 2.3.2 on page 13). Considering these variations, the times below are proposed (see table 6.8). Again, they represent an approximation and, depending on the compromise between detection and memory requirements, should be adjusted.

group pck_pck time-out
generic 74,343 70,000
p2p 219,043 220,000
gaming 45,584 40,000
tunnel 40,814 40,000
voip 71,636 70,000
im 14,030 10,000
streaming 124,862 120,000
mobile 227,699 230,000
management 221,346 230,000

Table 6.8 – Time-out UDP (ms)
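PACE itself applies a single global time-out (as discussed in the next subsection), but a flow table applying the per-group values of tables 6.5 and 6.8 could look like this hypothetical sketch (only a subset of groups is shown; the names, structure and fallback value are illustrative, not part of PACE):

```python
# Proposed per-group time-outs in ms (subset of Tables 6.5 and 6.8)
TCP_TIMEOUT_MS = {"generic": 60_000, "gaming": 10_000, "filetransfer": 150_000}
UDP_TIMEOUT_MS = {"generic": 70_000, "p2p": 220_000, "im": 10_000}
DEFAULT_MS = 60_000  # fallback for unlisted groups (illustrative choice)

def flow_expired(last_pkt_ms: int, now_ms: int, group: str, proto: str) -> bool:
    """True if the flow can be evicted from the tracking table."""
    table = TCP_TIMEOUT_MS if proto == "tcp" else UDP_TIMEOUT_MS
    return now_ms - last_pkt_ms > table.get(group, DEFAULT_MS)
```

For example, a gaming TCP flow idle for 15 seconds would be evicted (10 second time-out), while a p2p UDP flow with the same idle time would be kept (220 second time-out).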


6.1.3 Time-outs evaluation

Having a single value for each protocol group is useful since, depending on the scenario where the detection process takes place, it may be convenient to set one value or another. A good evaluation process would then have been to set up the proposed time-outs for each protocol group and measure their performance. However, PACE does not include that feature and making it available would imply a big modification. Thus, an alternative method is proposed below (see subsubsection 6.1.3.2) to get a rough approximation of the impact of different time-outs on each protocol group.

On the other hand, for a generic scenario, and given the current configuration of PACE, it is more appropriate to establish a single value that summarizes all the considerations. It is intended to substitute the current value of 600 seconds. With this configuration it is easy to see the performance impact.

6.1.3.1 Single time-out

It is complex to set a general optimal value since three major variables have to be considered: the detection rate, the detection effort and the memory requirement. Thus, based on the range of values obtained, an evaluation is performed of the real impact of modifying the time-out in terms of detection and memory.

For that purpose, the traces have been processed varying the time-out from 20 seconds to 300 seconds in steps of 20 seconds. In addition, the values obtained for the initial time-out (600 seconds) are included. This range is selected with a margin big enough to include all the values obtained for the TCP time-outs (see table 6.5) and the UDP time-outs (see table 6.8). Once more, this procedure is performed over both the ISP_Core (see figure 6.3) and the ISP_Mob (see figure 6.4) traces.

Considering these results, it is interesting to notice the dependency of the detection rate on the total number of elements (which is directly proportional to the detection effort). The memory required increases linearly with the time-out, while the detection rate improves following nearly an inverse exponential curve. This implies the detection rate has an asymptotic behaviour: increasing the time-out beyond a certain minimum value has no significant impact. That value is the one which optimizes the time-out, and its determination is the most important result of this MSc Thesis. The total number of elements (i.e. number of flows considered) is included to provide a complementary view of the time-out effect.
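The idea of picking the smallest time-out beyond which the detection rate no longer improves significantly can be expressed as a simple rule; a hypothetical sketch (the threshold eps and the sample curve are assumptions, not values from the evaluation):

```python
def smallest_sufficient_timeout(timeouts_s, detection_rates, eps=0.1):
    """Smallest time-out such that no further increase gains more than
    eps percentage points of detection rate."""
    for i in range(len(timeouts_s)):
        remaining = detection_rates[i + 1:]
        if all(r - detection_rates[i] < eps for r in remaining):
            return timeouts_s[i]
    return timeouts_s[-1]

# Synthetic saturating curve, similar in shape to figure 6.3
ts = [20, 60, 100, 140, 200, 300, 600]
rates = [70.0, 80.0, 84.0, 85.5, 85.9, 85.95, 85.99]
assert smallest_sufficient_timeout(ts, rates) == 200
```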

For both scenarios it is interesting to notice the relationship between the total number of elements and the detected flows. This fact highlights that some detections need to keep a certain number of packets per flow to match the protocol, and the difficulty of detecting some protocols in a flow that has already started.

In the graph obtained for the ISP_Core trace (see figure 6.3) the interpretation of the results and the extraction of an optimal time-out is quite straightforward. In this scenario, with a high density of flows, the relation of memory elements to time-out (the slope of the curve) is around 8,500 elements per second. Looking at


Figure 6.3 – Flow tracking table time-out variation impact, core network. Impact of the time-out variation on the flow tracking table for the ISP_Core trace

Figure 6.4 – Flow tracking table time-out variation impact, mobile network. Impact of the time-out variation on the flow tracking table for the ISP_Mob trace

the detection curve, a time-out of 200 seconds can be chosen without practically affecting the detection rate. Nonetheless, it implies a memory reduction of around 60% with respect to the initial value of 600 seconds.

On the other side, the results obtained for the ISP_Mob (see figure 6.4) are not as decisive. Although the detection rate experiences its major improvement in the range below 100 seconds (as in the ISP_Core case), it does not stabilize before reaching the 600 seconds time-out. In this situation, choosing a time-out value of 200 seconds would imply a loss of around 3% in the detection rate, passing from an absolute rate of 82.5% to 80%. The memory reduction is equivalent to the previous case, that is, 60%. It is interesting to extract the slope in this scenario, which has a considerably lower traffic density: it increases at around 2,400 elements per second.

Initially, there are two reasons that could justify why the detection rate does not level off in this case. The first is that there is a bigger proportion of TCP flows in this scenario (see Flow usage, table 5.2 on page 40), with a bigger relative quantity of terminations after a RST process in a blocking environment (see Termination mode proportions ISP_Mob, table 5.27 on page 69). The second is that most (70%) of the UDP flows belong to the network_management group (see Protocol groups proportions UDP ISP_Mob, table 5.11 on page 45), which has a relatively high time-out and represents a relatively low proportion of data in the S2C direction (see table 6.7). To clarify this issue, some more points have been added to the performance graphs of the ISP_Mob, and it turns out that the detection rate hardly stabilizes, and only for a time-out value higher than the initial 600 seconds (see figure 6.5).

Figure 6.5 – Flow tracking table time-out variation impact, mobile network, extended. Impact of the time-out variation on the flow tracking table for the ISP_Mob trace with additional time-out values


It is in the range beyond the 1,000 seconds time-out where both the detection rate and the total number of flows start to be somewhat asymptotic. This result illustrates that in a mobile network environment the times have a bigger variability than in a fixed network. This fact has an influence over the results extracted from the ISP_Mob trace, since that limitation concerns the flow classification. However, the difference in both the detection rate and the number of flows between the 600 seconds time-out and the maximum processed (2,000 seconds) is small, and thus the possible variation in the final values would be expected to be small too.

6.1.3.2 Theoretical profiled time-outs

In the sections above, specific time-outs have been proposed for each protocol group and for both the TCP (see table 6.5) and UDP (see table 6.8) transport protocols. These are approximations, and therefore it would be interesting to see the real impact of using a different time-out in each case.

Thus, to get an idea of that aspect, the evaluation is repeated with the same steps considered in the section above (see subsubsection 6.1.3.1). In this case, the graph represents the relative number of flows for each protocol group with respect to the maximum seen for that group, which always occurs for the 20 seconds time-out. By doing this exercise, it can be seen when the number of flows tends to be stable and therefore when the time-out is appropriate for that protocol group. This exercise is only performed for the ISP_Core (see figure 6.6) and only the more representative groups are considered (excluding business, mobile, remote_control, database and conference).

Besides the results themselves, this graph shows some peculiarities. The first and most noteworthy is the behaviour seen for the voip group, where for a time-out of 120 seconds the number of flows detected increases instead of decreasing. This behaviour is explained by an internal detection method used by PACE. Notice that it also happens, to a lesser extent, for the p2p group. Another unusual behaviour is the one seen for the filetransfer group, where the proportion of flows is essentially the same over the whole range of time-outs. This is justified by the usage of a special time-out for FTP connections and, again, by the detection methods of PACE.

Analysing these results, it is difficult to correlate the behaviour seen with the time-outs obtained in the sections above. This happens because the traffic here mixes TCP and UDP, and it is not an easy task to evaluate each one separately. Moreover, there are some detections that do not work if the starting part of a flow is missing. Thus, if a flow is split in two due to a short time-out, for some protocols the second part of the flow would remain as unknown traffic.

To illustrate this disparity it is good to pick an example. For instance, looking at the tunnel group curve, 120 seconds could be a good time-out, while the theoretical values obtained are 80 seconds for TCP and 40 seconds for UDP.


Figure 6.6 – Protocol group time-out variation impact, core network. Impact of the time-out variation on the number of flows for each protocol group in the ISP_Core trace

6.2 Identity tracking table time-out

This section is intended to empirically study the relation of the memory and the detection rate with the subscriber identity time-out. Although its impact is lower than that introduced by the flow tracking table, it is good to optimize this memory usage as well. As the flow table can be reduced, and each subscriber is expected to generate many flows, this time is likely to be lower than the flow tracking table time-out.

As in the last section, an evaluation of the performance has been carried out varying this parameter from 20 seconds to 300 seconds in steps of 20 seconds, plus the initial value of 600 seconds, for the ISP_Core (see figure 6.7) and ISP_Mob (see figure 6.8) traces. The flow tracking table time-out is set to 200 seconds, since the last section showed that it represents a good option (see subsubsection 6.1.3.1).

The results corroborate that the dependency of the detection rate on this time-out is softer than that seen for the flow tracking table. Instead, the total number of subscribers shows a strong dependency on this time-out. However, this fact has no impact on the detection rate itself and therefore it is not important for the traffic identification (it should be considered when studying subscriber usage statistics, for example).


Figure 6.7 – Identity tracking table time-out variation impact, core network. Impact of the time-out variation on the identity tracking table for the ISP_Core trace

Figure 6.8 – Identity tracking table time-out variation impact, mobile network. Impact of the time-out variation on the identity tracking table for the ISP_Mob trace


In the ISP_Core case it is seen that the detection rate improves linearly with the time-out until it reaches a value of around 60 seconds, which is 10% of the initial one. The memory usage for the subscriber information increases at a rate of 3,000 elements per second. That means that choosing a loose time-out value of 100 seconds, the detection rate is the same as for the 600 seconds one, while the memory requirements decrease by 75%.

Similarly, in the ISP_Mob the detection rate improves almost imperceptibly beyond a time-out value of around 60 seconds. In this scenario the linear relation between the number of subscribers in memory and the time-out is around 450 elements per second. So again, choosing a 100 seconds time-out achieves a reduction of around 70% in the memory required.

7 Summary and conclusions

The needs faced by modern DPI systems usually require high performance. An important issue in this sense is efficient memory usage. In current DPI applications this represents a limitation because of the big amount of data to process. Thus, the lower the memory usage, the better the system performance.

The PACE library needs to store information about the subscribers and about the flows to detect the different traffic protocols. Hence, it is important to characterize the traffic behaviour to optimize this memory usage while keeping good detection parameters. This can be achieved by knowing the expected time to wait before deleting a flow or a subscriber from the memory tracking table without losing valuable information.

For that purpose, two representative traces with real-world traffic have been studied. Since this information can be useful for many applications, some parameters not directly related to the final purpose of this MSc Thesis have also been tracked and exposed, but without digging deeply into them.

Therefore, in Chapter 5 (see Results, chapter 5 on page 38) a broad set of parameters extracted from these two traces is exposed, presented from an informative point of view. Notwithstanding, they are briefly discussed and the outstanding results are commented. Interesting results are found about traffic proportions, characterization and behaviour. All this information can be useful for fields other than the establishment of a flow time-out. For example, the time needed for each protocol group to send the first data packet is observed, which can be useful to prevent DoS attacks; also the differences in time between packets, how the RTT behaves, and the proportion of termination modes for each protocol group. However, the application of this information in other fields is out of the scope of this MSc Thesis and remains as future work.

Thus, using some of the data extracted from these parameters, the optimal time-out in terms of detection and memory usage has been studied (see Set up and evaluation of time-outs, chapter 6 on page 75). That is, the time after which it can be considered that a flow (or a subscriber) has closed a connection (both initially set to 600 seconds). On the one hand, only the statistical information about the time between packets (PCK-PCK) has been considered, in combination with some particular RST behaviours. By applying this procedure, specific time-outs have been proposed for each protocol group (split into TCP and UDP). The proposed times are in general considerably lower than the initial one; for some protocol groups a 10 seconds time-out could theoretically be established, which implies a reduction of around 98% of the initial time-out (see Flow tracking table time-outs, section 6.1 on page 75).

On the other hand, a range of time-outs has been assessed globally for all the protocol groups to see the real impact of decreasing this parameter on the memory usage and the detection rate. Although the results in this case are not as low as those obtained statistically, it can be concluded from them that a reduction of the time-out to 200 seconds decreases the memory usage by around 60% while the detection rate is almost not affected (see Single time-out, subsubsection 6.1.3.1 on page 84). In addition, although PACE does not allow setting a different time-out for each protocol group, the impact of a range of time-outs on each protocol group has been evaluated.

As a secondary issue, the impact of reducing the subscriber or identity tracking table time-out has also been evaluated. In this case the empirical evaluation reveals that reducing that value to 100 seconds implies a memory reduction of around 75%.

Further work based on this MSc Thesis could arise in many areas. An interesting one would be to study the impact on detection and memory usage of setting up the per-application time-outs found. Also, the impact in terms of power saving of reducing the memory usage by almost 60% could be evaluated. Furthermore, once it is seen that a specific time-out can be set depending on the traffic statistics, the next step could be to implement a dynamic time-out by taking live statistics from each network. Finally, knowing some of the statistical values published here could help to prevent and identify traffic irregularities such as, for example, malware attacks.

To conclude, this MSc Thesis represents, to the author's knowledge, the first work in the literature that studies the behaviour of these application time-outs. The set of assessments carried out, the results obtained and their discussion are thus a first step in this field, showing that their proper use can bring great benefits in terms of efficiency, and potentially also in other areas.

Acronyms

ACK Acknowledgement. 4–6, 16, 20, 23, 34, 52, 58, 63

C2C Client to Client. 34, 35, 58

C2S Client to Server. 6, 34, 49, 51, 52, 58, 59, 63, 65, 72, 75

CDF Cumulative Distribution Function. 11, 12, 27, 29–33, 78, 80, 83

CPU Central Processing Unit. 1

CSV Comma-Separated Values. 26, 27

DAT-ACK Data-Acknowledgement. 6, 22, 23, 27, 56, 58, 59

DAT-DAT Data-Data. 6, 23, 27, 39, 60, 63–65

DNS Domain Name System. 66

DoS Denial of Service. 1, 16, 49, 91

DPI Deep Packet Inspection. III–V, 1, 14, 91

FIN Finalization. 5, 6, 16, 20, 63, 70, 73

FIN-ACK Finalization-Acknowledgement. 6, 22, 27, 56, 59, 67, 68, 71, 72

FTP File Transfer Protocol. 63, 76, 87

GGSN Gateway GPRS Support Node. 38

GPRS General Packet Radio Service. 39

GTP GPRS Tunneling Protocol. 39

HTTP Hypertext Transfer Protocol. 2, 16, 46, 52, 58, 59, 63, 65, 71

IDS Intrusion Detection System. 16

IP Internet Protocol. 4

ISP Internet Service Provider. III–V, 1, 16, 38


MMS Multimedia Messaging System. 65

ms milliseconds. 42, 46, 47, 51, 58, 64

MSc Master of Science. III, VI, 1, 4, 16, 17, 84, 91, 92

OS Operating System. 14

OSI Open System Interconnection. 4

P2P Peer to Peer. 16, 47, 65, 71, 72

PACE Protocol and Application Classification Engine. III–V, 1, 2, 14, 18, 21, 34, 39, 41, 73, 75, 76, 84, 87, 91, 92

PAT PACE Analysing Tool. 18, 19, 21, 26

PCK-ACK Packet-Acknowledgement. 6, 22, 23, 27, 49, 51, 56, 58, 59, 68, 71, 72

PCK-PCK Packet-Packet. 6, 23, 27, 39, 51, 60, 63–65, 73–76, 78, 80–82, 91

PDF Probability Density Function. 11, 12, 27, 29–33

RAM Random Access Memory. 8, 41

RFC Request For Comments. 4

RST Reset. 5, 6, 16, 67, 68, 70–74, 76, 78, 80–82, 86, 91

RST-END Reset-End. 6, 22, 27, 73, 74

RTT Round-Trip Time. III, IV, 16, 49, 52, 54, 56, 59, 68, 72, 91

S2C Server to Client. 6, 34, 49, 51, 58, 59, 63, 65, 71, 75, 86

SGSN Serving GPRS Support Node. 38, 59

SIP Session Initiation Protocol. 64

SMP Symmetric Multi-Processing. 14

STD Standard Deviation. 9, 12, 22–24, 26, 27, 29, 31, 33, 35, 36, 42, 46–48, 51, 58, 59, 63, 74, 75

SYN Synchronize. 4, 6, 16, 22

SYN-DAT Synchronize-Data. 6, 22, 27, 49, 51, 52

SYN-SYNACK Synchronize-Synchronize/Acknowledgement. 6, 22, 27, 37, 40, 49, 51, 52, 54, 56, 58, 59, 68, 71

SYNACK Synchronize/Acknowledgement. 4, 6


SYNACK-ACK Synchronize/Acknowledgement-Acknowledgement. 6, 16, 27, 40, 49, 51, 52, 54, 56, 58, 59, 68, 71

TCP Transmission Control Protocol. III–V, 2–6, 16, 20, 21, 23, 34, 35, 37–41, 46–49, 51, 60, 64, 65, 73, 75, 76, 82, 84, 86, 87, 91

UDP User Datagram Protocol. III–V, 2–4, 6, 34, 35, 37–41, 47, 48, 64–66, 76, 82, 84, 86, 87, 91
