Towards An Explanatory Model for Network Traffic
Jorge Gonzalez Joshua Clymer Chad Bollmann Dept. of Mathematical Sciences SEAP Intern Dept. of Electrical & Computer Engineering Florida Atlantic University Naval Postgraduate School Naval Postgraduate School Boca Raton, FL, USA Monterey, CA, USA Monterey, CA, USA [email protected] [email protected]
Abstract—This work presents two explanatory mathematical tailed distributions per the Generalized Central Limit Theorem models explaining how network traffic features that display Gaus- (GCLT). sian tendencies in single devices and small networks aggregate The primary contribution of this work is outlining our to alpha-stable processes in larger networks. The first model shows how self-similarity originates from an impulsive-noise- explanation of device traffic aggregation to an alpha-stable based representation of individual processes. A second model or Gaussian result. We use this theory to develop a basic uses renewal processes to justify impulsive process aggregation aggregation model that leads to alpha-stable distributed traffic. to alpha-stable or Gaussian end states and permits estimating network traffic alpha-stable rates of convergence. We develop II.THEIMPULSEMODEL a model based on this first method to empirically validate this aggregation approach. The goal of this model is to explain the aggregation of traffic Index Terms—alpha-stable, computer network traffic model, by looking inside the sub-windows where aggregations occur. heavy-tail, self-similar We define an impulse as a group of time stamps (packets) within a sub-window that are related by a unique source and I.INTRODUCTION destination IP pair (ip0, ip1). Different impulses are assumed We propose explanatory, complementary models for ag- to be independent. gregated network traffic, starting from the device level. Our Impulses are ordered in such a way that P(Yi = a) does purpose is to understand how network traffic aggregates from not depend on i, where Yi is the volume of the ith impulse. individual sources and investigate the tendency of certain The total volume of traffic in a sub-window is given by network traffic features including packet rate to trend towards the sum of the impulses within it, the number of which is alpha-stable [1], [2] in larger networks, while the same features described by a distribution E. V denotes the distribution of exhibit non-parametric or Gaussian distributions in smaller the volumes of all impulses. Traffic in a generic sub-window networks. is given by S = Y1 + Y2 + ... + Ye, where e sampled The central theme of our approach is an impulse-based from E. The Yi’s are assumed to be independent since they model of traffic from individual processes on a single device. represent the volume of different processes within a sub- We observe that these traffic processes frequently are relatively window and are also identically distributed by construction. periodic and consistent in terms of volume, as illustrated in Figure 1, a rate plot of traffic to a single, centrally- managed device. The impulse characterization of traffic is 9000 justified by the fact that even long transmissions (e.g., a video on YouTube) are sent as short bursts of traffic rather than 7500 constant-rate, lengthier transmissions. Intuitively, the aggregation and perturbation of these traffic 6000 impulses from hundreds of devices leads to the frequently- observed self-similarity in aggregated network traces [3], [4]. Rate [packets/s] Rate 4500 Present throughout the trace: Music Buffering These models have historically relied on Levy´ processes Network and Windows Backups 3600-5200 pps 80, 200 pps {Xt : t ≥ 0}, including Poisson and fractional Brownian motion [5]; we propose an alternative Levy´ process, the alpha- 3000 stable, due to its deep theoretical connections with well- YouTube Buffering 600 - 1200 pps Local Routines documented characteristics of traffic. More specifically, Levy´ 1500 1200 and 200 pps processes are SS if and only if they are alpha-stable [6] and are the domain of attraction of random variables with heavy- 0 750 900 0 150 300 Time [s] 450 600
This work was funded by the NPS Naval Research Program under proposal NRP-18-039A. Joshua Clymer is with the ASEE Science and Engineering Figure 1. Rate plot of several minutes of traffic received at local host with Apprenticeship Program. periods of music and video streaming. Their common distribution is approximated by V . We provide and inactivity periods are shadowed by Power distributions empirical evidence that V is well approximated by heavy- for a considerably long period of time, nevertheless a shard tailed functions belonging to the domain of attraction of alpha deviation from this trend is clearly expected at some point. stable distributions. The asymptotic behavior of the aggregation S can now be studied using the GCLT. Specifically, we can estimate the convergence rate under the stronger assumption that Y1 lies in the strong domain of attraction of a stable distribution [7]. 1 The convergence rate serves to estimate the expected error 1 1 in the stable fits.