Rethinking the Building Block: a Profiling Methodology for UDP Flows
Jing Cai (1,2,3), Zhibin Zhang (1,3), Peng Zhang (1,3), Xinbo Song (1,3)
(1) Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
(2) Graduate University of Chinese Academy of Sciences, Beijing, China
(3) National Engineering Laboratory for Information Security, Beijing, China
[email protected]

Abstract: With the increase of network bandwidth, more and more new applications such as audio, video, and online games have come to make up the main body of network traffic. For real-time reasons, these new applications mostly use UDP as their transport layer protocol, which directly increases UDP traffic. However, traditional studies hold that TCP dominates Internet traffic, and previous traffic measurements were generally based on TCP while UDP was ignored.
In view of this, we mainly discuss the profiling methodology of UDP flows in this paper. First, from the view of the network layer, we present a profiling methodology on the basis of Claffy's parameter flow model. Next, because of the significant differences among applications, we find that using a unified methodology and ignoring application-layer information is not appropriate for UDP flows. Finally, we conclude that for different applications, such as DNS and QQ[16], a corresponding profiling methodology based on their specific characteristics must be used.

I. INTRODUCTION

The main purpose of network traffic measurement is to improve people's understanding of traffic characteristics. Traffic measurement based on the network layer started in the 1980s. Earlier studies took the packet as the Building Block, but because of its small granularity it could not meet many needs. Based on the locality of network traffic in time and space, Claffy et al.[1][2] first proposed a parameter flow model. Since then, flow-based traffic measurement has gradually become a hot issue in the field of network measurement.

In the past, however, people generally believed that TCP traffic made up the main body of network traffic and that UDP traffic was negligible, and therefore ignored the measurement of UDP flows. The situation has since changed tremendously. With the increase of network bandwidth, traditional networking services based on images and text can no longer satisfy people's needs. Audio, video, and online games have gradually become the main body of network traffic. These applications mostly use UDP as their transport layer protocol[3], which directly results in the increase of UDP traffic. CAIDA[4] analyzed traces collected in the period 2002-2009 on several backbone links located in the US and Sweden and found that the ratio of UDP to TCP in packets, bytes, and flows had increased greatly. In China, as shown in Fig. 1, a statistical analysis of the 24-hour data we collected from a backbone router shows that the proportion of UDP packets is around 40% and at times reaches as high as 80%. The proportion of UDP bytes fluctuates greatly, but it generally exceeds 40% and is sometimes even above 80%.

[Fig. 1: The proportion of the UDP's packets and bytes. Curves: UDP-pkt / IP-pkt and UDP-byte / IP-byte; x-axis: time; y-axis: the ratio between UDP and IP.]

With the increase of UDP traffic, more and more people have started to pay attention to it. However, compared with TCP, we found at least two major differences.

First, TCP is a connection-oriented protocol; it has control flags such as FIN and RST that explicitly identify the end of a flow. UDP, in contrast, is a connectionless protocol. In essence, there is no concept of "flow" in UDP. At present, people commonly refer to a UDP flow as the series of packets required to complete a data transfer from the view of the application layer. Therefore, Claffy's flow model is still applicable to UDP, but the parameter settings are not the same as before.

Second, compared with TCP, the composition of UDP is more complicated. In past measurements of TCP flows from the view of the network layer, people did not care about the specific application services, so the parameters of the flow model were generally identical. In UDP, however, the characteristics of different applications often show significant differences. Using uniform parameters to profile flows without considering the application layer is feasible for TCP flows, but it brings many uncertainties to connectionless UDP flows.

Because of these two differences, earlier flow-based network measurement mostly focused on TCP flows, while UDP flows were ignored. The study of UDP flows is nearly blank. In view of this, we mainly discuss the profiling methodology of UDP flows in this paper. To the best of our knowledge, we are the first to do so. There are two main contributions in our paper.

∙ Through comprehensive consideration of each evaluation criterion, we build an appropriate profiling methodology on the basis of Claffy's parameter flow model from the view of the network layer.
∙ Taking the complexity of UDP into account, and given the significant differences between application protocols, we ask whether a unified profiling methodology is enough to measure the characteristics of different applications. Our statistics and analysis show that there are indeed significant differences between the profiling methodologies of different applications. We were unable to use a unified methodology to process UDP flows as a whole. Instead, a corresponding profiling methodology based on the characteristics of each application must be used.

The remainder of this paper is organized as follows. Section II presents related work on the profiling methodologies of flows. In Section III, we discuss the profiling methodology from the view of the network layer. In Section IV, from the view of the application layer and considering the great differences between applications, we examine whether there exists a unified profiling methodology to process UDP flows. Finally, we conclude this paper and give some suggestions in Section V.

II. RELATED WORKS

Earlier traffic measurement studies took the packet as the Building Block. But as D. Clark[5] pointed out, traffic measurements based on packets cannot reflect the relation between the packets and higher-level information. Thus, they cannot meet the need for understanding network traffic in many ways.

Jain[6] proposed a model named Packet Train. In this model, a series of packets that have the same source and destination address is identified as a packet train. If the interval between two packets exceeds a specified fixed timeout value, the packets are said to belong to separate trains.

Claffy et al.[1][2] first proposed a parameter flow model, in [...]

However, earlier studies held that TCP dominates Internet traffic, so they were generally based on TCP flows while UDP flows were ignored. Therefore, in this paper we present our profiling methodology of UDP flows on the basis of Claffy's parameter flow model.

III. PROFILING METHODOLOGY FROM THE VIEW OF THE NETWORK LAYER

A. Data set

We collected the data trace from a backbone router in China. The basic information of this trace is given in Table I. We did not use the CAIDA data set because its payload information has been encrypted, and in this paper we must use the payload information to classify the different applications.

TABLE I: The basic information of the trace

Id   Begin time        End time          Bytes   Packets
I    2009.5.5,14:57    2009.5.6,00:30    275G    2805 (million)

B. Profiling methodology

In flow-based measurement, how to decide the boundary of a flow is critical. There are two main approaches. In the first, control flags (e.g., FIN and RST) are used to terminate the flow[7]. The second is the timeout strategy: if a flow remains inactive beyond a given timeout, it is deemed to have ended and should be removed from memory. The timeout-based method does not rely on explicit protocol labels in the packet header, so it can deal with both TCP flows and UDP flows. In this case, although UDP is a connectionless protocol, Claffy's parameter flow model is still applicable. What we need to do is find the most appropriate parameters.

We formally define a UDP flow as a series of consecutively arriving UDP packets that have the same five-tuple (source address, destination address, source port, destination port, transport protocol) and whose packet inter-arrival times do not exceed a specified fixed timeout. In this paper, we assume our scenario is located at a backbone router.

Among the above six parameters, the selection of the timeout is the most difficult and important. The timeout determines the boundary of the flow, so it directly affects the accuracy of flow measurement. If a smaller timeout value is selected, more long flows are segmented into multiple short flows. If a larger timeout is selected, it may result in more unnecessary measurement overhead. TCP has control flags that can make up for the impact of an inappropriate timeout. But for UDP, the selection of the timeout is critical.
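The flow definition above (same five-tuple, inter-arrival time not exceeding a fixed timeout) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Packet` record, the `split_flows` function, the sample addresses, and the timeout values are all hypothetical and chosen only to show how a smaller timeout segments one long flow into several short ones.

```python
from collections import namedtuple

# Hypothetical packet record: timestamp in seconds plus the five-tuple fields.
Packet = namedtuple("Packet", "ts src dst sport dport proto size")


def split_flows(packets, timeout):
    """Group packets into flows by five-tuple.

    A packet whose gap to the previous packet of the same five-tuple
    exceeds `timeout` closes the current flow and starts a new one.
    """
    active = {}    # five-tuple -> list of packets in the currently open flow
    finished = []  # flows already closed by a timeout
    for pkt in sorted(packets, key=lambda p: p.ts):
        key = (pkt.src, pkt.dst, pkt.sport, pkt.dport, pkt.proto)
        flow = active.get(key)
        if flow is not None and pkt.ts - flow[-1].ts > timeout:
            finished.append(flow)  # inactive for too long: flow has ended
            flow = None
        if flow is None:
            flow = []
            active[key] = flow
        flow.append(pkt)
    finished.extend(active.values())  # flush flows still open at end of trace
    return finished


# Three packets of one five-tuple, arriving at t = 0 s, 10 s, and 100 s.
sample = [
    Packet(0.0, "10.0.0.1", "10.0.0.2", 5000, 53, "udp", 60),
    Packet(10.0, "10.0.0.1", "10.0.0.2", 5000, 53, "udp", 60),
    Packet(100.0, "10.0.0.1", "10.0.0.2", 5000, 53, "udp", 60),
]
for timeout in (5, 64, 200):  # illustrative values, not taken from the paper
    print(timeout, len(split_flows(sample, timeout)))
```

With these sample packets, a 5-second timeout yields three flows, a 64-second timeout yields two, and a 200-second timeout yields one, which mirrors the trade-off described above: a short timeout fragments flows, while a long one keeps idle flow state in memory for longer.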