Classification of Internet Video Traffic Using multi-Fractals

Pingping TANG 1,2, Yuning DONG 1, Zaijian WANG 2, Lingyun YANG 1

1. College of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing, China. 210003
2. College of Physics and Electronic Information, Anhui Normal University, Wuhu, China. 2410002

e-mail: [email protected]; [email protected]; [email protected]; [email protected]

Abstract: Video traffic on the Internet is booming, and its types are numerous, so effective classification of video traffic is both necessary and urgent. Existing methods of classifying video traffic depend heavily on extracted features, which are obtained statistically from given samples and are thus destined to be ineffective for other types of video traffic. Therefore, in this paper, we propose a novel classification method based on the theory of multi-fractals, which relies on fractal characteristics rather than statistical features to classify video traffic. A number of experiments are performed to demonstrate the feasibility of the proposed method and its adaptability to new environments. The results show that video traffic classification with multi-fractals can effectively mitigate the defects of statistical features and achieve superior performance.

Index Terms: classification; feature; multi-fractals; video traffic.

I. INTRODUCTION

Video has become one of the most popular network services with the innovation of 4G and 5G technologies, and it is currently growing rapidly on a tremendous scale [1]. There are so many varieties of video traffic that it is necessary to build an effective classification system to manage resources and to provide technical support for guaranteeing quality of service (QoS) [2]. For example, video conferencing and telemedicine systems have critical real-time requirements, and any unexpected delay may result in considerable economic loss and decision errors [3]. Therefore, classifying video traffic is urgent, and it has become a research focus in the fields of multimedia communications and network traffic classification [4].

In current research on traffic classification [5], traffic is first divided into flows. A flow is defined as a set of packets that have the same properties and is usually described by a five-tuple [6]. Flows are then classified into video conference, telemedicine system, electronic commerce, and so on. That is, classification of video traffic is actually classification of flows.

Existing methods of classifying video flows mostly focus on statistical methods, and the process can be generally described as follows: first, flow samples are observed and analyzed, and then useful features, such as flow size, transmission rate, duration, packet number and average packet size, are extracted based on statistics. We call these the statistical features from here on. Flows can then be classified by these features together with a support vector machine (SVM), decision tree, neural network, etc.

The above methods have achieved some preliminary results. For example, the work in [7] concentrated on classifying P2P-TV with SVM, in which the extracted feature is the number of packets during a given period of time, and P2P-TV is divided into four categories: PPlive, SopCast, TVAnts and Joost. The work in [8] provided deeper insight into flows by utilizing active and passive technology; bit rate, packet interval time and packet size are then designated as significant features to distinguish different videos. The work in [9] also designated packet size as a feature when analyzing a large volume of video flows, such as HTTP video flows generated by YouTube. With machine learning and the Gaussian mixture model, HTTP video flows are eventually divided into five categories: cartoons, news, advertising, music and sports.

However, in-depth research has exposed some issues:
1) Sometimes, a large number of features can identify only a few categories. In [10], more than 12 features were extracted to identify HTTP flows with the fast feature selection algorithm (FFSA).
2) The features that effectively identify one set of flow samples generally may not do so for the next set, so those features need to be adjusted to the actual situation, which leads to a series of problems. For example, [11] tried to classify flows by KNN (k-Nearest Neighbor). However, its K parameter can only be adjusted manually as the features change.
3) Features extracted from given flow samples are destined to be ineffective for unknown flows.
4) Generally, better performance can be achieved by adding more features; however, computation and storage costs will grow exponentially [12]. Moreover, there is evidence of a strong correlation between the features, and more features will lead to more redundancy, which will greatly reduce the classification accuracy and efficiency [13].
5) A large number of test samples are needed to extract effective features based on statistics, and the process is time-consuming [9].

These problems with statistical features have obstructed the development of video flow classification. So it is necessary to conduct further research on video flows and to explore new methods. Accordingly, multi-fractals are introduced in this paper to cope with the issues of statistical features.

The major contributions of this paper are summarized as follows: 1) based on multi-fractals, we propose a novel
classification method using fractal characteristics to classify video flows; 2) the proposed method does not require statistical features, and thus avoids the lengthy process of extracting features based on statistics as well as the other limitations caused by statistical features; 3) multi-fractal theory is improved in this paper, which should help the theory to be used in other research fields.

The rest of the paper is organized as follows. Section II introduces fractal concepts for video flow classification. Section III presents a detailed description of our method based on multi-fractals. Section IV contains a series of experiments and comparisons. Finally, Section V concludes this paper with a discussion of future work.

II. BACKGROUND KNOWLEDGE

Multi-fractals involve an infinite set of singularity exponents that describe the variability of random, non-stationary data. They are useful for finding order in a state of disorder and for revealing specific principles in complex, broken and chaotic phenomena.

According to multi-fractal theory, different regions of the same fractal material generally have the same fractal characteristics, and these characteristics describe the complex relationship in multiple dimensions, so fractal characteristics are expected to distinguish materials. For example, in [14] fingerprints were identified by fractal theory. Leland and others also introduced multi-fractals to analyze network traffic [15], for purposes such as predicting traffic, controlling traffic congestion, and investigating traffic variability [16]. Just as the fractal characteristics of materials can help to distinguish them, we expect that the fractal characteristics of traffic can be used to classify video flows.

According to multi-fractal theory, μ_k(ε), which represents the measure of unit k at scale ε, satisfies the relation:

$$\mu_k(\varepsilon) \sim \varepsilon^{\alpha_k} \tag{1}$$

where α is the Hölder exponent, or singularity exponent.

The objective space is divided into several subsets according to α_k, and, if each subset has the fractal characteristic f(α_k), then the fractal spectrum f_G(α) can be described as:

$$N(\alpha) \sim \varepsilon^{-f_G(\alpha)} \tag{2}$$

where N(α) represents the number of subsets that have the value α.

Note that a flow {X(t)} is a stochastic process, which is sampled in the computer and converted into a discrete sequence. The increment process of the flow {X(t)} undergoes the same sampling, and the discrete sequence can be described as (defining N as the resolution, with N = 2^n):

$$\Delta_{N/m}^{k}[X] = X\left((k+1)\,2^{-N/m}\right) - X\left(k\,2^{-N/m}\right) \tag{3}$$

where k = 0, 1, ..., 2^n - 1 and m = 0, 1, ..., n.

According to (1) and (2), if the sequence Δ^k_{N/m}[X] satisfies:

$$\alpha_{N/m}^{k} = \lim_{n\to\infty} -\frac{m}{N}\,\ln\left|\Delta_{N/m}^{k}[X]\right| \tag{4}$$

then the multi-fractal spectrum can be described as:

$$f_G(\alpha) = \lim_{\varepsilon\to 0}\,\lim_{n\to\infty}\,\frac{1}{n}\,\ln N^{n}(\alpha), \qquad \alpha_{N/m}^{k} \in (\alpha-\varepsilon,\ \alpha+\varepsilon) \tag{5}$$

where f_G(α) is the multi-fractal spectrum of the flow sequence.

The multi-fractal spectrum of the flow sequence, f_G(α), which can describe the complex characteristics of flows, is expected to distinguish different video flows.

Our proposed method does not rely on statistical features and classifies flows without feature extraction, which leads to superior performance in a real network with highly dynamic variation.

III. FRACTAL CLASSIFICATION METHOD

In this section, we present a detailed description of our classification method based on multi-fractals.

A. The estimated fractal spectrum

In the field of multi-fractals, the multi-fractal spectrum f_G(α) is the mathematical representation of complex fractal characteristics, and it is difficult to calculate accurately by (5). Generally, an estimate of f_G(α) is relatively easy to obtain by numerical analysis [15]. Moreover, flows form a discrete sequence after sampling in a computer, so the estimated spectrum based on the Legendre transformation is used to model the multi-fractal spectrum of flows.

Normalize the flow sequence {X(k; m, N)} according to (3):

$$\Delta_{N}^{k}[X] = \frac{\Delta_{N/m}^{k}[X]}{\sum_{j}\Delta_{N/m}^{j}[X]} \tag{6}$$

Define the partition function:

$$S_m(q) = \sum_{k=1}^{N/m}\left|\sum_{j=1}^{m}\Delta_{N}^{(m(k-1)+j)}[X]\right|^{q} \tag{7}$$

Define the scaling function:

$$\tau(q) = \lim_{m\to\infty} -\frac{1}{m}\,\log S_m(q) \tag{8}$$

According to (4) and (7), the partition function S_m(q) can be written as:

$$S_m(q) = \sum_{k=0}^{2^{n}-1}\left(2^{-n\,\alpha_{N/m}^{k}}\right)^{q} \approx 2^{-n\left(q\alpha - f_G(\alpha)\right)} \tag{9}$$

According to (8) and (9), the relationship between τ(q) and f_G(α) can be obtained as:
$$\tau(q) = f_G^{*}(q) = \inf_{\alpha}\left(q\alpha - f_G(\alpha)\right) \tag{10}$$

where f_G^*(·) denotes the Legendre transformation. According to (10), the relationship between τ(q) and f_G(α) can be further expressed as:

$$\begin{cases} f_G(\alpha) = q\alpha - \tau(q) \\[4pt] \alpha = \dfrac{d\tau(q)}{dq} \end{cases} \tag{11}$$

According to (11), τ(q), like f_G(α), can be used to describe the fractal characteristics of flows. So in this article, the estimated spectrum τ(q) obtained by (6) to (8) is used to classify flows, since f_G(α) is very difficult to calculate by (5).

B. Core domain

In (11), taking the limit of the derivative of τ(q) with respect to q gives the extreme values of the Hölder exponent α:

$$\begin{cases} \alpha_{\min} = \left.\dfrac{d\tau(q)}{dq}\right|_{q\to+\infty} \\[8pt] \alpha_{\max} = \left.\dfrac{d\tau(q)}{dq}\right|_{q\to-\infty} \end{cases} \tag{12}$$

According to (12), the range of q ought to be (-∞, +∞); however, in practical calculation, the workload increases exponentially with the absolute value of q. Moreover, when q is higher than a certain level, the corresponding calculation has no significant effect on the results. Therefore, the range of q can be reasonably narrowed, although, of course, not overly narrowed. If the range of q is too small, then the rate of change of Δτ(q) caused by Δq varies drastically, which leads to serious curve defects, and the fragment of the curve τ(q) cannot describe enough details of the fractal characteristics carried by the flow sequence.

Therefore, the core domain is defined as Q|(-q,+q), where the value of q should be selected to reduce the workload of the calculation sharply while maintaining enough details of the fractal characteristics carried by the flow sequence.

v(q), which describes the rate of change of Δτ(q) caused by Δq, is defined by:

$$v(q) = \left|\frac{d^{2}\tau(q)}{dq^{2}}\right|_{q=+q} + \left|\frac{d^{2}\tau(q)}{dq^{2}}\right|_{q=-q} \tag{13}$$

According to (13), the function v(q) is even. Moreover, from (13) and from (11), which shows the relationship between the scaling function τ(q) and the estimated fractal spectrum f_G(α), it can be deduced that v(q)→0 when q→±∞.

Meanwhile, the workload of the calculation increases exponentially with the absolute value of q:

$$c(q) \propto e^{|q|} \tag{14}$$

where c(q) is the workload of the calculation, and the parameters of c(q) can be obtained by curve fitting.

The question now is how to choose q to balance v(q) and c(q). Specifically, c(q) is expected to be as small as possible, that is, q should be small; on the other hand, v(q) is also expected to be as small as possible, that is, v(q)→ε, which means q should be large. Therefore, using the weighted sum of squares (WSOS), the optimization model (15) is proposed in order to achieve a balance between v(q) and c(q):

$$q = \arg\min\left[\lambda_j\left(v(q) - V\right)^{2} + \lambda_k\left(c(q) - C\right)^{2}\right] \tag{15}$$

where λ_j and λ_k are weighting coefficients that satisfy λ_j + λ_k = 1 and λ_j >> λ_k. V and C are the desired target values of v(q) and c(q), which come from experience. The optimal value of q is thus deduced by (15), and, therefore, the core domain of the estimated multi-fractal spectrum Q|(-q,+q) is exactly determined by q.

C. Calculating the value of differences between spectra

In this section, the difference between spectra is calculated by grey correlation, which is generally used to measure the similarity between curves quantitatively. The coefficient of difference between spectra is thus defined as:

$$\varphi = \left(\frac{1}{n^{2}}\sum_{i=1}^{n}\sum_{j=1}^{n}\xi_{ij}(k)\right)^{-1} \tag{16}$$

where

$$\xi_{ij}(k) = \frac{\min\limits_{1\le i\le m}\,\min\limits_{1\le k\le m}\Delta_{ij}(k) + \beta\,\max\limits_{1\le i\le m}\,\max\limits_{1\le k\le m}\Delta_{ij}(k)}{\Delta_{ij}(k) + \beta\,\max\limits_{1\le i\le m}\,\max\limits_{1\le k\le m}\Delta_{ij}(k)} \tag{17}$$

$$\Delta_{ij}(k) = \sum_{l=1}^{L^{*}}\left|\tau_{i}^{a}(q_l) - \tau_{j}^{b}(q_l)\right| \tag{18}$$

β is the resolution factor with a range of [0,1], which represents the proportion of the difference; generally β = 0.5 is adopted. φ is the coefficient of difference between spectra with a range of (1,+∞), and the smaller the value of φ, the greater the similarity of the two spectra.

In summary, φ is calculated by (16). If φ falls within the threshold, then the two flows are assigned to the same category; otherwise, if φ exceeds the threshold, then the two flows are assigned to different categories.

The threshold is determined by (19) as the maximum coefficient of difference between spectra across the given samples, that is, the largest coefficient of difference between the spectra of any given sample q and sample p from class v.

IV. EXPERIMENTS

Three datasets are used in the experiments. The NJUPT dataset is captured by Wireshark (http://www.wireshark.org) in the campus network of Nanjing University of Posts and
Telecommunications. The UCI dataset (http://www.ics.uci.edu/) and the ITA dataset (http://ita.ee.lbl.gov) contain many types of data and are widely used for academic research in the field. Detailed information about the three datasets is listed in TABLE I:

TABLE I
DATA SETS
Dataset   Flows   Video classes
NJUPT     50k     QQ, PPlive, GAME, TVant, Skype, Tudou
UCI       8k      Skype, eBuddy, FTP, YahooVideo, GAME, FTP, BT
ITA       9k      eDonkey, AOL Messenger, Lime Wire, Gnutella, Fast Track, Direct Connect

A. Evaluating the τ(q) spectrum of a single flow

This experiment uses QQ video flows (http://www.qq.com/), which are obtained by Wireshark from the campus network of Nanjing University of Posts and Telecommunications. During the capture, the duration is set to 1000 ms, and the resolution N is set to 10000. Then, the curves of lnSm(q)~lnm are drawn for different aggregation lengths m and different moment orders q, as shown in Fig. 1(a).

[Figure: (a) lnSm(q) versus ln m for q = -4, -2, 2, 4, 6, 8; (b) τ(q) versus q for the QQ flow, PPlive flow and Game flow]
Fig. 1. (a) lnSm(q)~lnm lines; (b) τ(q)~q curves

For different values of q, approximately straight lines with different slopes appear, which represent the multi-fractal characteristics of the flow sequence. If the flow sequence is not fractal, then there are no such straight lines. The slopes of these lines form the scaling function τ(q), and according to (8), the curve of τ(q)~q can be plotted by the least square method (LSM), as shown in Fig. 1(b).

In this paper, flows are normalized by (6) to simplify calculations, and the corresponding influence on the curves of lnSm(q)~lnm and τ(q)~q can be described as follows: 1) when q→1, the approximately straight line of lnSm(q)~lnm is extremely close to the horizontal coordinate axis; 2) whatever the value of q, when m→N, lnSm(q)→0, the slope of the line is positive when q is positive, and the slope of the line is negative when q is negative; 3) regardless of the type of flows, when q=0, the slope of the line lnSm(q)~lnm is the same; that is, the curves of τ(q)~q of all flows intersect at q=0.

The monotone curves τ(q)~q of QQ video flows, PPlive flows and GAME flows, which are quite different from each other, are drawn in Fig. 1(b). These different monotone curves indicate different fractal spectra of flows.

B. Classification performance

3000 flows are randomly selected from the NJUPT dataset, including QQ video flows, PPlive flows, GAME flows, TVant flows, Tudou video flows, and Skype video flows (http://www.qq.com/; http://www.pptv.com/; http://www.pcgames.com.cn/; https://videos.en.softonic.com/; http://www.tudou.com/; http://www.skype.com/), with 500 flows for each class. In this experiment, 10-fold cross-validation is carried out on these flows; that is, the 3000 flows are divided into 10 parts, one part is chosen as the testing samples and the others as training samples, and the parts are rotated in turn.

The experimental results can be described by a confusion matrix. As shown in TABLE II, the numbers of QQ video flows that are identified as QQ video flows, PPlive flows, GAME flows, TVant flows, Tudou video flows and Skype video flows are 478, 4, 3, 3, 4, 8, and the corresponding ratios are 95.6%, 0.8%, 0.6%, 0.8%, 0.8%, 1.6%. The numbers of PPlive flows that are identified as QQ video flows, PPlive flows, GAME flows, TVant flows, Tudou video flows and Skype video flows are 4, 475, 5, 8, 3, 5, and the corresponding ratios are 0.8%, 95.0%, 1.0%, etc.

TABLE II
CONFUSION MATRIX
          QQ      PPlive  GAME    Tudou   TVant   Skype
QQ        0.956   0.008   0.006   0.008   0.008   0.016
PPlive    0.008   0.950   0.010   0.016   0.006   0.010
GAME      0.010   0.016   0.922   0.016   0.022   0.014
Tudou     0.008   0.014   0.008   0.952   0.010   0.008
TVant     0.012   0.010   0.014   0.010   0.946   0.008
Skype     0.018   0.010   0.012   0.014   0.012   0.934

C. Horizontal comparison with other methods

In this experiment, another 3000 flows are selected from the NJUPT dataset to train and test different classification methods, including Bayes [13], SVM [7], HMM [2] and DT [8]. For the Bayes, SVM, HMM and DT methods, the features are carefully selected according to [2], [7], [8] and [13] for the given NJUPT dataset. For example, six statistical features including packet number, average packet interval, transmission rate, average size of packets and maximum packet size are used in the Bayes method [13]. In the SVM method, five statistical features are selected to classify the NJUPT dataset. 10-fold cross-validation is also carried out here. The classification results are compared, and the recognition rates are listed in TABLE III.

TABLE III
COMPARISON OF RECOGNITION RATES WITH OTHER METHODS
          QQ      PPlive  GAME    Tudou   TVant   Skype
Bayes     93.2%   92.7%   88.4%   87.5%   91.5%   89.1%
SVM       91.7%   89.3%   90.4%   89.8%   92.5%   88.9%
HMM       89.9%   89.3%   92.1%   92.3%   88.6%   93.4%
DT        92.7%   88.5%   92.9%   93.7%   89.8%   89.8%
Fractals  95.5%   95.3%   92.9%   94.8%   94.6%   93.5%

According to TABLE III, the average recognition rate of our method using fractals can reach around 94.6%, which is slightly higher than that of the other methods. Generally, all of the above methods show excellent performance with their own optimal statistical
features, due to the small number of classes in the NJUPT dataset (see TABLE I). However, the features extracted from the NJUPT dataset might be less effective for a different dataset. Next, we randomly select new flows from the UCI dataset to test how well these methods respond to new flows.

D. The ability of responsiveness

In this experiment, 6000 flows are selected from the NJUPT and UCI datasets. The numbers of NJUPT and UCI samples are the same, with 500 flows for each class, including QQ, PPlive, GAME, TVant, Skype, Tudou, Skype, eBuddy, FTP, YahooVideo, FTP, BT.

TABLE IV
OVERALL RECOGNITION RATES UNDER DIFFERENT FEATURE SETS
               Bayes   SVM     HMM     DT      Fractals
Feature set A  81.9%   84.6%   85.7%   83.1%   93.9%
Feature set B  89.4%   90.7%   90.2%   88.5%

As shown in TABLE IV, this experiment is carried out with Feature set A and with Feature set B. Feature set A is obtained from the flow sample set of Experiment C, and Feature set B is obtained from the flow sample set of Experiment D.

The features that effectively identify the previous set of flow samples generally may not do so for the next set, so the overall recognition rates generally decline for the Bayes, SVM, HMM and DT methods. For example, the overall recognition rate of Bayes in Experiment C is 92.2% according to TABLE III, and it falls to 81.9% with the same features. When the features are adjusted to the new flow samples, giving Feature set B, the overall recognition rate rises considerably, to 89.4%.

However, even if the feature set is adjusted according to the new flow samples, the overall recognition rate still declines from 92.2% to 89.4%, because the number of classes has increased sharply. Moreover, the more classes there are, the greater the decline, even if the feature set is adjusted according to the new flow samples, because the statistical features lead to more redundancy, which greatly reduces the classification accuracy and efficiency, as shown in the next experiment.

Here, totally new video flows are selected from the above three datasets, including eDonkey, AOL Messenger, Lime Wire, Gnutella, Fast Track, Kazaa, Direct Connect, etc. (http://www.donkey.org; http://www.corp.aol.com/; http://www.limewire.com; http://www.oschina.net/p/gnutella/; http://fasttrack.microsoft.com/; http://kazaa.brothersoft.com/; http://directconnectps.com/). The dataset is composed of 1500 flows belonging to 15 categories. For the Bayes, SVM, HMM and DT methods, we adjust the features according to [2], [7], [8] and [13].

As shown in Fig. 2, the horizontal coordinate axis represents the labels of the 15 categories, and the vertical coordinate axis represents the recognition rates of the classification methods for the corresponding categories. It is clear that the recognition rates of our method with fractals for the 15 categories are generally higher than the others, and the average recognition rate of our method is maintained around 92%, while the Bayes, SVM, HMM, and DT methods always fail to recognize some video flows well; for example, the recognition rates of Bayes and HMM for Category 6 are less than 75%.

[Figure: recognition rate versus categories for fractals, Bayes, SVM, HMM and DT]
Fig. 2. Statistics of tests on the ability of responsiveness

In this paper, the theory of multi-fractals is introduced to classify video flows. The new method uses fractal characteristics rather than statistical features to classify video flows, and, therefore, responds better to new environments.

V. CONCLUSIONS

The existing methods for classification of video flows fall short because of statistical features. Thus multi-fractals are introduced in this paper to break through the limitations of feature extraction. The results show that classification with multi-fractals, which takes data variability as content, can achieve superior performance when classifying video flows. There are still some issues to be explored in the future, and the next step of our work is to study multiple dimensions of fractals, including fractal flows and fractal packets, which may help to establish an accurate classification model and thus further improve the performance.

ACKNOWLEDGMENT

This work was supported in part by the National Natural Science Foundation of China (No. 61271233, 61401004); the Huawei Innovation Research Program (HIRP); the Innovation Project of Jiangsu Province (No. KYLX16_0653).

REFERENCES

[1] Q. M. Qadir, "Mechanisms for QoE optimisation of Video Traffic: A review paper," Analytical Biochemistry, 2015, 1(1):40-42.
[2] J. Kim, J. Hwang, K. Kim, "High-Performance Internet Traffic Classification Using a Markov Model and Kullback-Leibler Divergence," Mobile Information Systems, 2016, 2016(1):1-13.
[3] N. Khan and M. G. Martini, "Hysteresis based rate adaptation for scalable video traffic over an LTE downlink," IEEE International Conference on Communication Workshop, IEEE, 2015:1434-1439.
[4] T. P. Fries, "Fuzzy clustering of network traffic features for security," Large Data Analysis and Visualization, IEEE, 2015:127-128.
[5] R. Alshammari, A. N. Zincir-Heywood, "Identification of VoIP encrypted traffic using a machine learning approach," Journal of King Saud University - Computer and Information Sciences, 2015, 27(1):77-92.
[6] A. B. Mohd, S. B. M. Nor, "Towards a flow-based internet traffic classification for bandwidth optimization," International Journal of Computer Science & Security, 2014, 3(2):146-153.
[7] S. Hao, J. Hu, S. Liu, et al, "Improved SVM method for internet traffic classification based on feature weight learning," International Conference on Control, Automation and Information Sciences, IEEE, 2015, pp. 102-106.
[8] L. T. Hu and L. J. Zhang, "Real-time internet traffic identification based on decision tree," World Automation Congress (WAC), IEEE, 2012:1-3.
[9] H. H. Aghdam, E. J. Heravi and D. A. Puig, "Practical approach for detection and classification of traffic signs using Convolutional Neural Networks," Robotics & Autonomous Systems, 2016, 84:97-112.
[10] M. Iliofotou, H. C. Kim, M. Faloutsos, et al, "Graption: A graph-based P2P traffic classification framework for the internet backbone," Computer Networks the International Journal of Computer & Telecommunications Networking, 2014, 55(8):1909-1920.
[11] T. Wiradinata and P. A. Suryaputra, "Clustering and principal feature selection impact for internet traffic classification using K-NN," in Proceedings of Second International Conference on Electrical Systems, Technology and Information 2015 (ICESTI 2015), pp. 75-81.
[12] A. Hajjar, J. Khalife and J. Díaz-Verdejo, "Network traffic application identification based on message size analysis," Journal of Network & Computer Applications, 2015, 58:130-143.
[13] J. Zhang, C. Chen, Y. Xiang, et al, "Internet traffic classification by aggregating correlated naive bayes predictions," IEEE Transactions on Information Forensics & Security, 2013, 8(1):5-15.
[14] J. Ezeobiejesi, B. Bhanu, "Latent Fingerprint Image Segmentation Using Fractal Dimension Features and Weighted Extreme Learning Machine Ensemble," IEEE Conference on Computer Vision and Pattern Recognition Workshops, IEEE Computer Society, 2016:214-222.
[15] W. E. Leland, M. S. Taqqu, W. Willinger, et al, "On the self-similar nature of Ethernet traffic (extended version)," IEEE/ACM Transactions on Networking, 1994, 2(1):1-15.
[16] H. He, J. Wang, H. Wei, et al, "Fractal behavior of traffic volume on urban expressway through adaptive fractal analysis," Physica A Statistical Mechanics & Its Applications, 2016, 7:518-525.
[17] Y. Li, L. F. Wang, S. D. Zeng, et al, "Local Fractional Laplace Variational Iteration Method for Fractal Vehicular Traffic Flow," Advances in Mathematical Physics, 2014, 2014(2014):7.
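
As a supplementary illustration of Section III-A and Experiment A, the estimation of the scaling function τ(q) can be sketched in Python as follows: the increments are normalized so that they sum to one, the partition function Sm(q) is computed for several aggregation lengths m, and τ(q) is taken as the least-squares slope of the lnSm(q)~lnm line. This is a minimal sketch under stated assumptions, not the authors' implementation. The function names (`partition_function`, `least_squares_slope`, `estimate_tau`), the use of absolute increments, the skipping of empty blocks, and the choice of aggregation lengths are all hypothetical choices made for illustration.

```python
import math
import random


def partition_function(x, m, q):
    """S_m(q): sum over non-overlapping blocks of length m of (block sum)**q.

    x is assumed to be a non-negative sequence that sums to one.
    Empty (zero) blocks are skipped so that negative q stays defined
    (an assumption of this sketch).
    """
    n_blocks = len(x) // m
    total = 0.0
    for k in range(n_blocks):
        block = sum(x[k * m:(k + 1) * m])
        if block > 0:
            total += block ** q
    return total


def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through the points (xs, ys)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((a - mean_x) ** 2 for a in xs)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    return sxy / sxx


def estimate_tau(increments, qs, lengths):
    """Estimate tau(q) for each q as the slope of ln S_m(q) against ln m.

    increments: flow increments (at least one must be non-zero)
    qs:         moment orders q to evaluate
    lengths:    aggregation lengths m (at least two distinct values)
    """
    # Normalization: absolute increments divided by their total (assumption).
    total = sum(abs(v) for v in increments)
    x = [abs(v) / total for v in increments]
    log_m = [math.log(m) for m in lengths]
    tau = {}
    for q in qs:
        log_s = [math.log(partition_function(x, m, q)) for m in lengths]
        tau[q] = least_squares_slope(log_m, log_s)
    return tau


if __name__ == "__main__":
    # Synthetic increments used only to exercise the functions.
    random.seed(0)
    sample = [random.expovariate(1.0) for _ in range(1024)]
    spectrum = estimate_tau(sample, qs=[-4, -2, 2, 4, 6, 8],
                            lengths=[1, 2, 4, 8, 16, 32, 64])
    for q, value in sorted(spectrum.items()):
        print(q, round(value, 3))
```

For a perfectly uniform sequence the sketch gives τ(q) = q - 1, so the line for q = 1 is horizontal and the slope at q = 0 is the same for every flow, in line with observations 1) and 3) of Experiment A. Two τ(q) curves obtained this way could then be compared with the coefficient of difference of Section III-C.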