Queuing Theory with Heavy Tails and Network Traffic Modeling
Yu Li*
*Faculty of Science, Technology and Communication, University of Luxembourg

To cite this version: Yu Li. Queuing theory with heavy tails and network traffic modeling. 2018. hal-01891760
HAL Id: hal-01891760
https://hal.archives-ouvertes.fr/hal-01891760
Preprint submitted on 9 Oct 2018

Abstract

Traditional queuing theory fails to model network traffic because of the different nature of the Internet. More precisely, Internet traffic exhibits a heavy-tail phenomenon: the inter-arrival time is not exponential and the traffic volume is not Poissonian. Most network traffic models are established empirically, without a mathematical description. In this paper we establish a queuing theory with heavy tails and present a mathematical model of network traffic. In this model the traffic ratio is Pareto distributed and the traffic volume has a shifted logarithm Erlang distribution or, more generally, a logarithm Gamma distribution. Furthermore, we derive the distribution of inter-arrival times.

Keywords. heavy-tailed distribution, truncated power law, Pareto distribution, logarithm Erlang distribution, maximal entropy, queuing theory, traffic models

1 Introduction

Computer networks have been studied intensively for decades.
The goal of network traffic modeling is to provide simple but accurate methods for network analysis, network design, network management, service evaluation and protocol improvement. Because of the high complexity and randomness of Internet traffic, traditional models fail to capture its behavior and traditional queuing theory does not apply: the packet arrival process is not Poissonian and the inter-arrival times are not exponentially distributed [8].

The heavy-tail phenomenon in network traffic has been observed in various studies and is of great importance for network capacity planning and for the analysis of traffic characteristics. To describe the heavy-tailedness of network traffic, the self-similar model is the most widely used model that captures the heavy tail [6, 13, 11]. In the self-similar model, the traffic process is independent of the time scale. However, real-world Internet traffic exhibits burstiness in a statistical sense only over several time scales, and hence the self-similar model cannot capture the essential nature of network traffic [8]. The Autoregressive Integrated Moving Average (ARIMA) process has also been used to model the traffic process [9]. However, the ARIMA model is short-tailed. The FARIMA model can capture both short and heavy tails, but it is difficult to reproduce [12]. Recently, the LogPh model was established empirically, based on a study of WiFi networks [4]. In [2] the double Pareto lognormal model was proposed, which exhibits double Pareto lognormal distributions.

Most studies of network traffic are based on empirical observations rather than on mathematical explanation. They seem credible but do not explain the mechanism behind the heavy-tail phenomenon in network traffic.
In this paper we present a mathematical model of network traffic based on two simple assumptions, and derive that in this model the traffic ratio is Pareto distributed and the traffic volume is logarithm Erlang distributed. Furthermore, we derive the distribution of the inter-arrival time.

2 Queuing theory and heavy tails

In queuing theory, inter-arrival time and traffic volume are the two most important concepts, and they form a natural duality. The inter-arrival time is the time interval between the arrivals of two consecutive packets. It is calculated for each data packet after the first and is often averaged to obtain the mean inter-arrival time. In classical queuing theory, the inter-arrival time $\tau$ is modeled with an exponential distribution

$$P[\tau < x] = 1 - e^{-\lambda x}.$$

But problems have appeared over time with this model. Network traffic exhibits burstiness over a wide range of time scales. Various investigations have demonstrated that the packet inter-arrival time follows a truncated power law. Over small time intervals it follows a power law and can be modeled with a Pareto distribution. Over large time scales, the lognormal distribution fits real-world data better than the Pareto distribution [1, 3, 10].

In classical queuing theory the volume process $x_t$ is modeled as a Poisson process, whose defining characteristic is that the time intervals between two consecutive packets are exponentially distributed:

$$P[x_t = n] = \frac{(\lambda t)^n}{n!} e^{-\lambda t}.$$

However, recent studies have shown that the statistical assumptions underlying this queuing theory may not always be satisfied in practice, and traditional queuing theory fails to model network traffic. In the real world, the volume process exhibits heavy-tailedness.

Heavy-tailedness is a long-observed phenomenon in network traffic, and numerous studies provide evidence of heavy tails in network traffic. Roughly speaking, heavy-tailed distributions are those distributions whose tails do not decay exponentially.
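To make the contrast between exponential and power-law decay concrete, the following minimal sketch compares the tail probability $P[X > x]$ of an exponential distribution with that of a Pareto distribution. The function names and the parameter values (`lam = 1.0`, `alpha = 1.5`, `x_min = 1.0`) are illustrative assumptions and are not taken from the paper.

```python
import math


def exponential_tail(x, lam):
    """P[X > x] for an exponential distribution with rate lam (x >= 0)."""
    return math.exp(-lam * x)


def pareto_tail(x, alpha, x_min=1.0):
    """P[X > x] for a Pareto distribution with tail index alpha and scale x_min."""
    if x < x_min:
        return 1.0
    return (x_min / x) ** alpha


# Illustrative parameters (assumptions, not values from the paper).
lam, alpha = 1.0, 1.5
for x in (1.0, 10.0, 100.0):
    print(f"x={x:6.1f}  exponential={exponential_tail(x, lam):.3e}  "
          f"pareto={pareto_tail(x, alpha):.3e}")
```

For large $x$ the Pareto tail exceeds the exponential tail by many orders of magnitude, which is the sense in which such a tail is called heavy.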
In other words, they have heavier tails than the exponential distribution. Mathematically speaking, a random variable $X$ is said to have a heavy tail if there exists a positive parameter $\alpha$ such that

$$P[X > x] = x^{-\alpha} L(x), \tag{1}$$

where $\alpha$ is called the tail index and $L$ denotes a slowly varying function, that is,

$$\lim_{x \to +\infty} \frac{L(tx)}{L(x)} = 1, \quad \forall t > 0.$$

3 A stochastic model of network traffic

Central limit theorem and lower bound

The central limit theorem (CLT) is one of the most important theorems in probability. It states that the suitably normalized sum of a large number of independent, identically distributed variables from a finite-variance distribution tends to be normally distributed. The mean of all samples from the same population will be approximately equal to the mean of the population. However, the central limit theorem "erases" the trace of a lower bound [7]. Consider a sequence of independent and identically distributed random variables $\{X_n\}_n$ that are all bounded from below, $X_n \ge -a$. By the central limit theorem, the random variable

$$\frac{\sum_{k=1}^{n} X_k}{\sqrt{n}}$$

is (after centering) asymptotically normally distributed, and the limiting normal distribution is not bounded from below even though all components $X_k$ are bounded below by $-a$. Nevertheless, the mean is bounded below by $-a$ even if $n$ is arbitrarily large:

$$\frac{\sum_{k=1}^{n} X_k}{n} \ge -a, \quad \forall n > 0.$$

In order to recover this lost lower bound, we present a stochastic model with a fixed lower bound.

Traffic volume

We fix a time unit and let $\{x_t\}_t$ denote the traffic process with respect to time $t$, with $x_0 = 1$, and let $r_t$ denote the traffic ratio $r_t = x_t / x_{t-1}$. Furthermore, we make two simple assumptions on the logarithm traffic ratio $h_t = \ln(x_t / x_{t-1})$:

Assumption 1. $h = \{h_t\}_t$ are independent and identically distributed with finite mean.

Assumption 2. $h = \{h_t\}_t$ are all bounded below by a negative number: $-a \le h_t < +\infty$.

The boundedness assumption is not very restrictive, because the lower bound $-a$ can be chosen arbitrarily. We shall see that the characteristics of the heavy tail in network traffic can be derived from these two simple assumptions.
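The setup above can be sketched in a few lines of code. As an assumption made purely for illustration, the log-ratios are drawn here as shifted exponentials, $h_t = -a + \mathrm{Exp}(\lambda)$, which is one distribution satisfying both assumptions. The function names, the seed and the parameter values are hypothetical and not part of the paper.

```python
import math
import random


def sample_log_ratios(t, lam, a, rng):
    """Draw t i.i.d. log-ratios h_i = -a + Exp(lam); each satisfies h_i >= -a."""
    return [-a + rng.expovariate(lam) for _ in range(t)]


def traffic_path(log_ratios):
    """Return [x_0, ..., x_t] with x_0 = 1 and x_k = x_{k-1} * exp(h_k)."""
    path = [1.0]
    for h_k in log_ratios:
        path.append(path[-1] * math.exp(h_k))
    return path


rng = random.Random(0)        # fixed seed, for reproducibility only
lam, a, t = 3.0, 0.5, 200     # illustrative values (assumptions)
h = sample_log_ratios(t, lam, a, rng)
x = traffic_path(h)

# The lower bound survives averaging: the sample mean of h stays >= -a.
sample_mean = sum(h) / t
print("min h >= -a:", min(h) >= -a)
print("mean h >= -a:", sample_mean >= -a)
print("x_t:", x[-1])
```

Because every $h_k \ge -a$, each step satisfies $x_k \ge e^{-a} x_{k-1}$, so the simulated volume can shrink by at most a factor $e^{-a}$ per time unit, while upward jumps are unbounded.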
The unique maximum-entropy distribution of $h_t$ is the exponential distribution of the form (see appendix)

$$P[h_t < x] = 1 - e^{-\lambda (x + a)}, \tag{2}$$

and the traffic ratio $r_t$ satisfies, for $x \ge e^{-a}$,

$$P[r_t < x] = P[h_t < \ln x] = 1 - x^{-\lambda} e^{-\lambda a}, \tag{3}$$

with density function

$$f(x) = \lambda e^{-\lambda a} x^{-\lambda - 1}.$$

The mean and variance of the ratio are

$$E[r_t] = \frac{\lambda e^{-a}}{\lambda - 1}, \qquad \sigma^2(r_t) = \frac{\lambda e^{-2a}}{(\lambda - 2)(\lambda - 1)^2}, \quad \lambda > 2. \tag{4}$$

Since the sum of independent exponentially distributed random variables has an Erlang distribution, it follows from (2) that $\sum_{i=1}^{t} h_i$ is Erlang distributed with a shift term $at$:

$$P\left[\sum_{i=1}^{t} h_i < x\right] = 1 - \sum_{i=0}^{t-1} \frac{\lambda^i (x + at)^i}{i!} e^{-\lambda (x + at)}.$$

Then for the traffic volume $x_t$,

$$P[x_t > x] = P\left[e^{\sum_{i=1}^{t} h_i} > x\right] = P\left[\sum_{i=1}^{t} h_i > \ln x\right] = x^{-\lambda} \underbrace{\sum_{i=0}^{t-1} \frac{\lambda^i (\ln x + at)^i}{i!} e^{-\lambda a t}}_{L(x)}. \tag{5}$$

Since $\ln x_t = \sum_{i=1}^{t} h_i$ is Erlang distributed, $x_t$ can be called logarithm Erlang distributed. The function $L(x)$ in (5) is a polynomial in $\ln x$ and hence a slowly varying function, so (5) has the form (1) and $x_t$ has a heavy tail. Erlang distributions can be expressed in terms of Gamma functions (see appendix B), and formula (5) is equivalent to

$$P[x_t < x] = \frac{\gamma(t, \lambda(\ln x + at))}{\Gamma(t)}, \tag{6}$$

where $\gamma$ is the incomplete Gamma function

$$\gamma(k, x) = \int_0^x s^{k-1} e^{-s} \, ds$$

and $\Gamma$ the complete Gamma function

$$\Gamma(k) = \int_0^{\infty} s^{k-1} e^{-s} \, ds.$$

[Figure 1: LogNormal distribution and LogGamma distribution (horizontal axis: x)]

So the traffic volume $x_t$ is logarithm Gamma distributed, with mean and second-order moment

$$E[x_t] = E\left[\prod_{i=1}^{t} r_i\right] = \left(\frac{\lambda e^{-a}}{\lambda - 1}\right)^t,$$

$$E[x_t^2] = E\left[\prod_{i=1}^{t} r_i^2\right] = \left(\frac{\lambda}{\lambda - 2}\right)^t e^{-2at}, \quad \lambda > 2. \tag{7}$$

We have now reached the following theorem:

Theorem 3.1.