<<

Cognitive Network Management and Control with Significantly Reduced State Sensing

Arman Rezaee, Student Member, Vincent W.S. Chan, Life Fellow IEEE, Fellow OSA Claude E. Shannon Communication and Network Group, RLE Massachusetts Institute of Technology, Cambridge MA, USA Emails: armanr, chan @mit.edu { }

Abstract—Future networks have to accommodate an increase should be used to connect different origin-destination (OD) of 3-4 orders of magnitude in data rates with very heterogeneous pairs. In practice, various shortest path algorithms (Bellman- session sizes and sometimes with strict time deadline require- Ford, Dijkstra, etc.) are used to identify the optimal shortest ments. The dynamic nature of scheduling of large transactions and the need for rapid actions by the Network Management path. The correctness of these algorithms requires the assump- and Control (NMC) system, require timely collection of network tion that after some period of time the system will settle state information. Rough estimates of the size of detailed net- into its steady state. We refer interested readers to [4] for a work states suggest large volumes of data with refresh rates thorough explanation of these issues. With the dynamic nature commensurate with the coherence time of the states (can be of current and future networks, the steady state assumption is as fast as 100 ms), resulting in huge burden and cost for the network transport (∼300 Gbps/link) and computation resources. particularly not valid, nonetheless the basic functional unit of Thus, judicious sampling of network states is necessary for these algorithms can be adapted to address the new challenges. a cost-effective network management system. In this paper, Shortest path routing algorithms assign a length/weight to we consider a construct of an NMC system where sensing each link of the network; this length is usually a proxy for and routing decisions are made with cognitive understanding traffic congestion on the link but can also incorporate other of the network states and short-term behavior of exogenous offered traffic. We have studied a small but realistic example factors. Depending on the specific implementation, the length of adaptive monitoring based on significant sampling techniques. may depend on the number of packets waiting in the queue, This technique balances the need for accurate and updated state loading of the output line, or the average packet delay on information against the updating cost and provides an algorithm the link during a pre-specified amount of time. Neighboring that yields near optimum performance with significantly reduced nodes exchange their estimated shortest distances to all other burden of sampling, transport and computation. We show that our adaptive monitoring system can reduce the NMC overhead nodes periodically and new information will be disseminated by a factor of 100 in one example. The essential spirit of the throughout the network after a few rounds of exchanges. cognitive NMC is that it collects network states ONLY when Suppose the routing table for each node contains the they matter to the network performance. available paths to each destination and the latest estimates of their lengths, as shown in Table I. Given this table, a I.INTRODUCTION node can identify the shortest path to all destinations, but We are in the midst of a major technological storm that more importantly it can use the relative difference between will change the landscape of networking for years to come. the length of the shortest path and that of other paths to According to Cisco, IP video traffic will be 82 percent of determine the likelihood that the identity of the shortest path all IP traffic by 2021, up from 73 percent in 2016 [1], [2]. changes in the near future. As a result, as opposed to The same reports forecast live video to grow 15-fold while virtual reality and augmented reality traffic will increase 20- Table I: Partial routing table maintained by node A fold in the same period. The bursty and dynamic nature of the Destination Path Length B → C → D → Y 2 traffic generated by these applications require quick (100 ms- Y E → F → G → Y 3 1 s) network adaptation to maintain quality of service and H → I → Y 8 experience [3]. Unfortunately, current Network Management J → K → Z 4 Z and Control (NMC) systems are much too slow and their L → M → Z 7 operational paradigm does not scale well with network size N → O → P → Z 9 and traffic intensity. updating the whole table at a minimum rate required for all To illustrate the efficacy of our cognitive management dynamic situations, each entry of the routing table will be approach, we will use an example that primarily focuses on updated only when it can be of significant value to the optimal challenges involved with ‘shortest path’ routing in dynamic operation of the network. It is in this sense that we refer networks. In this context, an NMC system has to monitor the to our technique as significant sampling. To illustrate this state of queues throughout the network to decide which path(s) principle, we will focus primarily on a simplified model where This work is sponsored by NSF NeTS program, Grant No. #6936827 and the an OD pair is connected via exactly two independent paths HKUST/MIT programs. with time varying lengths/weights. Within this framework, we develop an adaptive monitoring system that determines the X t value of updating the length/weight of a given path as well X(t) ( n) the cost associated with the updating process. This allows X t the NMC system to optimally allocate its resources to collect ( 0) and disseminate information regarding the state of network elements according to their importance. The rest of the paper is organized as follows: Section II introduces the general model and establishes the framework through which the monitoring process is optimized. t t τ

Section III-A assumes a Wiener process as the end-to-end t0 t1 t2 ··· tn−1 n delay process and evaluates the benefits and shortcomings ··· T1 T2 Tn of this model. Section III-B applies the framework to the n τ much richer Ornstein-Uhlenbeck process. Section IV extends Figure 2: Illustration of samples taken during a period . the results to general networks. A summary and concluding X(ti−1) > 0 and suppose that the OD pair uses P1 as the route remarks are provided in Section V. during [ti−1,ti]. Then the cost of error during this period is simply the of X(t) after the process experienced a II. PROBLEM SETUP –GENERAL MODEL sign change ti Suppose that an OD pair is connected via two independent − C[t ,t ] = − X (t) dx P P i−1 i paths 1 and 2 as illustrated in Figure 1, and denote the ti−1 stochastically evolving weight of by Xi(t). 1 where X−(t) is the negative part of the function defined as X−(t)=(X(t) −|X(t)|)/2 3; W . When the stochastic nature of the underlying process is known, a distribution is induced over possible sample paths of X(t), and we can use this 2 D distribution to compute the expected cost of error during this E C period, denoted by [ti−1,ti] . 3; W We shall now introduce the notion of updating cost to Figure 1: An OD pair with two independent paths. capture the efforts required to collect and disseminate state Let us use X(t) to denote the that information throughout the network so that all routing tables results from subtracting the weights of the two paths, i.e., are up-to-date. Google Maps offers a great example of various X(t)=X2(t) − X1(t). Clearly, continuous optimal routing costs associated with such efforts, as the system gathers can be achieved if we know whether or not X1(t) 0 P2 Recall that is the optimal route if and is the For a given time horizon [0,τ], an optimal offline sampling X(t) < 0 optimal route if . So the communicating OD pair strategy will identify the optimal number of samples and their X(t) will experience an excess cost if the process changes corresponding epochs to minimize the expected total cost. sign between two sampling epochs and the transmission route Alternatively, we may choose to minimize the cost per unit is not adapted. Let us use C[t ,t ] to denote the cost of i−1 i time for an infinite time horizon. These possible approaches [ti−1,ti] such errors during . Without loss of generality, assume can be summarized as 1X (t) i can be the rate of transmission (messages/s) times the expected argmin E CTotal Pi delay/message on , which would give it units of delay/s. n,{t1<···

C[ti−1,ti](x) to denote the cost of error during [ti 1, ti] given (i.e., as ρ 1) can be approximated by a one-dimensional − → that X(ti 1) = x. Then the most informative formulation reflected Wiener process, also known as the reflected Brownian − ∗ computes the optimal updating period Ti as motion [6], [7]. Modeling the waiting time of customers as   a Wiener process satisfies our modeling criteria because 1) ∗ c + E C[ti−1,ti](x) T (x) = argmin Wiener process is stable, hence the waiting time of customers i T Ti>0 i in cascaded series of independent links would itself be a Furthermore, if X(t) is Markovian, then the Wiener process, 2) uncertainty in the realization of a sample distribution of its trajectory is only a function of path of a Wiener process grows with time. its last observed value. Hence, for Markov processes We shall define and note a few properties of the Wiener     process and refer the interested reader to [8] for a thorough E C[ti,ti+τ](x) = E C[0,τ](x) . As a result, for such processes we can treat every sampled value as if it had investigation of general properties of the Wiener process. occurred at time zero. Using C (x) as a shorthand for T Definition 1. A real valued stochastic process W (t): t 0 C (x), we can rewrite our optimization as { ≥ } [0,T ] is a Wiener process with a start at x R if the following hold:   ∈ ∗ c + E CT (x) W (0) = x T1 (x) = argmin (1) • T >0 T the process has . • for all t 0 and h > 0, the increments W (t+h) W (t) This concludes our discussion of the general setting of • ≥ − the problem and the relevant optimization formulations. The are normally distributed with zero mean and h. the function W (t) is continuous almost surely. following section focuses on appropriate delay models. • A Wiener process with a start at III.DELAY MODELS Lemma 1. W (t): t 0 0, has following pdf and cdf functions{ ≥ } Queues constitute one of the basic building blocks of a  2  communication network and have been extensively studied 1 x fW (t)(x) = exp − for decades [5]. While queuing theory has provided immense √2πt 2t insight into the operation of data networks, it has struggled 1   x  FW (t)(x) = 1 + erf in providing tractable expressions that deal with transient 2 √2t behavior of queues (which is of immense importance to us!). 2 where 2 R β e z . Referring to the transient behavior of an M/M/1 queue, erf(β) = √π 0 − dz Kleinrock notes: “This last expression is disheartening. What Recall that we used X(t) to denote the stochastic process it has to say is that an appropriate model for the simplest that results from subtracting the weights of the two paths, i.e. interesting queueing system [i.e. the queue] leads to M/M/1 X(t) = X (t) X (t). If each X (t) is approximated as an an ugly expression for the time-dependent behavior of its state 2 1 i independent Wiener− process, we can see that X(t) is another probabilities” [5]. Hence, we advocate an approach whereby Wiener process with a start at X(0) = X2(0) X1(0) and a operational insights can be provided through judiciously cho- variance equal to the sum of the of the− two processes. sen alternative models which lend themselves to easier analysis Without loss of generality, suppose that X(t) has a variance and can be insightful as engineering guidelines for the design of α2t for some α R, and assume that X(0) = x > 0. This of future networks. Desirable transient models of delay in data is equivalent to assuming∈ X(t) = x + αW (t). As before, we networks should have the following characteristics: will assume that the OD pair uses P1 as the shortest route Stability - the stochastic model of delay should be stable • until the next update time at t1. Noting that Wiener process so that linear combination of two or more such processes is Markovian and using CT (x) as a shorthand for C[0,T ](x), will remain in the same family of distributions. we can see that routing through P1 for T seconds will incur Stochasticity - capture the rise in uncertainty about state • the following cost of error of the network as more time passes from last observation. Simplicity - provide a formulation that is amenable to Z T Z T • analysis and/or numerical computation. CT (x) = X−(t) dx = min X(t), 0 dt − 0 − 0 { } It is often beneficial to approximate the process with an ap- Z T  x propriate process. The inherent structure of diffusion = xT α min W (t), − dt − − 0 α with an of h i Z T   x E CT (x) = xT α E min W (t), − dt 10 − − 0 α Since x/α is a constant and is independent of W (t), we can 5 compute the distribution of the minimum as t      x 20 40 60 80 100 120 140 x FW (t)(x) for x < −α Pr min W (t), − x = x α ≤ 1 for x − ≥ α -5 Hence, -10   x √t  x2  E min W (t), − = exp 2 α −√2π −2tα -15 x   x  1 + erf − 2α α√2t Figure 3: Sample path of a Wiener process, and associated Putting it all together we have samples for c = 1. √ 2 2  2  ∗ h i T 2T α + x x which can be solved numerically to obtain the optimal T1 E CT (x) = exp 3α√2π −2α2T for any given x. It can be shown that solving Eq. (3) is  x  xT x3  approximately equal to solving the following equation for T erfc + − α√2T 2 6α2  T 3  x2 = T ln which can be used with Eq. (1), as restated below, to get the 18πc2 optimal sampling period Figures 3 depicts a sample path of X(t) = 1 + W (t) (i.e.   ∗ c + E CT (x) X(0) = 1 and α = 1). Red vertical lines are drawn to specify T1 (x) = argmin T >0 T the updating epochs as computed numerically according to Eq. (3) for a sampling cost of c = 1. Notice that we should Interestingly, the shortest possible sampling period can be update more frequently when the process is close to zero. This computed by noting that C (x) is strictly decreasing in x, E T is because when X(t) 0 the delay on both paths are very and thus for any T > 0 the expected cost of error is minimized similar and any small≈ perturbation can change the identity at x = 0. Solving the first order optimality condition gives us of the shortest path. Much more insight can be gained from 1  2  3 ∗ ∗ 18πc Figure 3, but for brevity and to avoid repetition we’ll postpone min T1 (x) = T1 (0) = (2) x α2 this discussion to Section III-B. This is an important quantity as it constitutes a maximum up- B. Ornstein-Uhlenbeck Process dating frequency for all “fixed-period” (i.e. uniform) updating The Wiener process, W (t), discussed in Section III-A strategies. In other words, if the NMC updates occur any faster, lacks stationarity and rapidly wanders to infinity. The absence the cost of updates will be larger than the potential gains from of stationarity makes it impossible to reason about long- identifying the optimal route. Furthermore, Eq. (2) shows that term average behavior of our algorithm. To address this increasing the updating cost c by a factor of γ reduces the ∗ deficiency, we will leverage the heavy-traffic results of Halfin frequency of updates (i.e. ) by a factor of 2/3; while 1/T1 γ and Whitt [7], which show that when service times are expo- increasing parameter has the inverse of that effect. α nentially distributed, the sequence of appropriately normalized Noting the complexity of the general optimization, let us queue lengths will converge to the Ornstein-Uhlenbeck (OU) use a simple first order method to approximate the optimal ∗ process. It is well known that the OU process is the only non- updating period . Furthermore, to simplify our notation, we T1 trivial process that is simultaneously Gaussian, Markov, and will assume that in the remainder of this section. α = 1 stationary; all of which are ideal for our purposes. Ornstein-  2  √T 2T + x2 exp x Uhlenbeck process is the continuous time analogue of the   − 2T E CT (x) α = 1 = discrete time auto-regressive AR(1) process and satisfies the | 3√2π following stochastic :  x  xT x3  erfc + − √ 2 6 dX(t) = θ(µ X(t)) dt + σ dW (t) 2T − hence, where σ > 0 is the standard deviation of the process and  ! ∂ c + E CT (x) α = 1 W (t) denotes a standard Wiener process. Parameter µ is the = ∂T T long-term mean of the process, and θ > 0 signifies the mean-

2 reversion speed. Parameter θ may seem obscure at first glance, 2 x 3  ! 1 √T T x e− 2T x x and the reader may wonder which network characteristic is c + − + erfc = 0 (3) T 2 − 3√2π 6 √2T captured by this parameter. To answer this question, note that many mechanisms are employed to steer the network towards Using the same process used in Section III-A to compute the a desirable stable state. As an example, consider the role expectation, we obtain of congestion control in TCP, which regulates the rate of  2  h n oi √y −h h  h  packet transmission to achieve a high level of link utilization E min W (y), h = e 2y + erfc −√2π 2 √2y while maintaining fairness and low delay. The magnitude of θ roughly captures the strength and speed of such mechanisms which simplifies the expected cost of error to and the behavior of exogenous traffic. We should note that θT    x e− 1 E CT (x) = − linear combination of independent OU processes with the θ 2θT same θ results in another OU process with the same θ, while e 1  2  σ Z − y θx variance and long-term mean parameters would be added in a √ + 3 exp 2 dy linear fashion. 4θ√θπ 0 (1 + y) 2 −yσ e2θT 1 ! Let us suppose that weights of paths P1 and P2 can be x Z − 1 √θx approximated by independent OU processes with the same θ; + 3 erfc dy 4θ (1 + y) 2 −√yσ then X(t) = X (t) X (t) is another OU process. A full 0 2 − 1 treatment of the general OU process is possible but due to It comes as no surprise that E[CT (x)] is the central quantity length limitations and to simplify our notation we shall focus that dictates the behavior of the system, yet it presents a on the case where the difference process, X(t), has a long- formidable challenge to intuition. Fortunately, E[CT (x)] lends term mean of µ = 0. This corresponds to the case where itself to a piecewise linear approximation that sheds light on the long-term mean delays of both paths are similar/equal. its behavior. Let us start by evaluating E[CT (x)] when x = 0, Without loss of generality suppose X(0) = x > 0 and assume     θT 2θT 2θT σ ln e +√e 1 √1 e− that the OD pair use P1 as the optimal route until the first   − − − E CT (0) = update epoch at T . Then the solution to the aforementioned 2θ√θπ stochastic differential equation can be written recursively as despite the relative simplicity of this expression, we cannot Z t find an analytic solution to the following objective function, X(t) = xe θt + σ e θ(t s) dW − − − s   0 ∗ c + E CT (0) which can be represented (conditioned on X(0) = x) via a T1 (0) = argmin T >0 T time-scaled Wiener process as θt to obtain the first order optimality condition, let us differentiate θt σe− 2θt  X(t) = xe− + W e 1 the aforementioned expression and expand it as a Taylor series, √ 2θ −  ! ∂ c + E CT (0) c σ p As a result, the cost of error associated with mis-identifying = + + (T ) the shortest path during this period can be computed as ∂T T −T 2 3√2π√T O Z t1 Z T setting the sum of the first two terms equal to zero gives us, CT (x) = X−(t) dx = min X(t), 0 dt 1 − 0 − 0 { }  2  3 θT  ∗ 18πc x e− 1 T (0) = = − 1 σ2 θ Z T  θt  incidentally, this expression is identical to Eq. (2) which σe− 2θt  θt min W e 1 , xe− dt captured the shortest sampling period of a Wiener process. − √2θ − − 0 This should not be surprising because at x = 0, the OU process and is not experiencing any mean reversion and is effectively Z T  θt  σe− 2θt  θt indistinguishable from a Wiener process. min W e 1 , xe− dt − 0 √2θ − − The aforementioned result can be used to create a piecewise n o 2θT x√2θ linear approximation to [CT (0)]. We can avoid the need for Z e 1 min W (y), − E σ − σ an excessive number of linear segments by noting that we are = − 3 3 dy (4) ∗ 2 2 (2θ) 0 (1 + y) only interested in the behavior of the function near T1 (0), and e2θT 1 thus the simplest such approximation can be stated as σ Z − min W (y), h = − 3 { 3 } dy (5)   ∗  (2θ) 2 0 (1 + y) 2   0 for T 0,T E CT (0) ∗  ∈ ∗ Equation (4) is the result of a simple change of variable ≈ m T T for T > T 2θt − for y = e 1, and (5) simplifies the notation by defining ∗ −2θT ∗ ∗ ∂E[CT (0)] σ√1 e √ − where m = = − and T = T (0). h = x 2θ/σ. Hence, ∂T 2√θπ 1 − T =T ∗ θT    x e− 1 E CT (x) = − To generalize our results to non-zero values of x, note that θ 2θT ! e 1  ∗  Z ∂E CT (x) σ − E [min W (y), h ]  ∗   ∗  dy E CT (x) E CT (0) + x 3 { 3 } ≈ ∂x − (2θ) 2 0 (1 + y) 2 x=0 where ∗   θT ∂E CT ∗ (x) e− 1 6 = − ∂x 2θ x=0 4

Note that E[CT (x)] is a non-negative quantity, and is 2 decreasing in x. As a result, we only need to consider the t effects of x on the non-zero portion of the approximation, and 20 40 60 80 100 120 140

find its corresponding interception point with the horizontal -2 axis. Putting it all together we have:  -4   0 for T [0, f(x)] E CT (x)  ∈ ≈ m T f(x) for T > f(x) -6 − ∗ ∗ 1 e−θT √π where f(x) = T + − ∗ σθ x √1 e−2θT Figure 4: Sample path of an OU process with σ = 0.5, θ = This approximation shows− that the initial routing decision 0.025, x = 1, c = 0.1. will remain correct for approximately f(x) seconds; con- 0 sequently the expected cost of error during this period is lines are drawn to specify the updating epochs as suggested approximately 0. As we pass the threshold of f(x) seconds, by our algorithm and the red dots denote the sampled values it becomes likely that the originally selected path is no longer of the function. Notice that when the process is far from zero, optimal and routing erroneously through it will incur a cost indicating that the delay of one of the paths is significantly of m units per second. Continuing with our approximation, less than the other, we do not need to sample their weights  c frequently. This matches our intuition because in this scenario   for T [0, f(x)] c + E CT (x)  T ∈ it is unlikely for the shorter/better path to get worse than the c mf(x) T ≈  m + − for T > f(x) second path in a short period of time. On the other hand, T when the process is close to zero, indicating that the weight of The approximate cost shows that if the update both paths are very similar, we require more frequent samples. epoch T occurs at or before f(x), we will only incur the The varying frequency of updates is revealed via the changing updating cost c, amounting to a cost rate of c/T . However, if density of the vertical red lines in the figure. the updating epoch occurs after f(x), then the cost includes the We should additionally note that the shortest updating updating cost c, as well as the cost of error which accrues at period occurs when the two paths have identical weights, a rate of m units per second. Mathematically speaking, when i.e., x = 0. This corresponds to an updating period of ∗ 2 21/3 T [0, f(x)], the objective function is decreasing with T and T1 (0) = 18πc /σ . Comparing our adaptive updating reaches∈ its minimum at T = f(x). On the other hand, when method to a uniform one that samples at this rate, we see T > f(x), an increase or decrease in the objective function a gain of depends on the sign of c mf(x). If this expression is greater ∗ ∗   R ∞ − E T1 (x) 0 T1 (x)f X(t) (x) dx than zero, it will reach its infimum at infinity; otherwise the G = ∗ = ∗ | | minimum will occur at T = f(x). Putting it all together we T1 (0) T1 (0) have where f X(t) (x) denotes the pdf of the (one-sided) OU    process.| Recalling| that the OU process is Gaussian we have c + E CT (x) f(x) if c < mf(x) argmin r  2  T >0 T ≈ else θ θx ∞ f X(t) (x) = 2 exp for x 0 This gives us a clear description of the approximate al- | | πσ2 − σ2 ≥ gorithm: if the cost of updating routing information is small After some algebra we get ∗ θT enough, i.e., if c < mf(x), then we should update the routing 1 1 e− 1 G = 1 + − (7) tables by sampling the process at T = f(x). Otherwise, the ∗ p ∗ T 1 e 2θT θ√θ cost of updating is too large and we should continue our − − 1 σ 1/3  1 1/6 previous routing decisions without any new samples. 1 + As a result, if we constrain ourselves to cases were the ≈ θ c 144π ∗ updating cost c < mf(x), we get a simple expression for the where the approximation is accurate when θT is small. time until the next updating epoch, T , as a function of the last Note that the gain is inversely proportional to θ, and observed value of the process, x, namely as such it improves unboundedly as θ goes to zero. This ∗ θT is due to the fact that θ represents the strength of mean- ∗ ∗ 1 e √π − (6) T1 (x) = T + p − ∗ x reversion for this process (and is inversely proportional to the 1 e 2θT σθ − − coherence time of the process). As a result, when θ is small the Figures 4 depicts a sample path of an OU process with σ = process is not strongly attracted to its long-term mean, which 0.5, θ = 0.025 and an initial value of X(0) = 1. Red vertical significantly reduces the chances of crossing the horizontal 100 pairs, the algorithm simply picks the smallest sampling time from those computed for all OD pairs. 50 Not only is the proposed algorithm simple to understand and σ = 0.5 implement, it also addresses an often-overlooked byproduct of traditional routing protocols. It was shown in [9] that routing σ = 5 10 updates can inadvertently become synchronized, causing insta-

Gain bility as well as untimely and unmanageable bursts of traffic. 5 To address such issues, a whole host of ad-hoc randomization procedures have been incorporated into commercial routers. In contrast, our algorithm requires each node to independently 1 measure the variance of delay on each of its outgoing links. Given the unique geographical location of each node within the 0.01 0.05 0.1 0.5 1 5 θ network and expected difference in their measurements, it is Figure 5: Gain of our adaptive sampling strategy (c = 1). unlikely for the routing updates to synchronize, thus avoiding the need for explicit randomization procedures. axis. For processes with small θ, our adaptive algorithm will automatically adopt a lower sampling rate resulting in V. CONCLUSION a significant reduction in sampling and updating costs. This In this paper, we introduced a new algorithm that allows makes sense since processes with long coherence time are very us to capture the tradeoff between monitoring and optimal predictable and do not require much sampling. The gain will operation of a network. We introduced the concept of sam- also increase, though at a lower rate, when σ increases, which pling/updating cost to capture the cost associated with the amplifies the wandering behavior of the process. Figure 5 collection and dissemination of routing information within a depicts the gain as computed by Eq. (7). network. We further studied two stochastic models of delay, namely the Wiener process and the Ornstein-Uhlenbeck pro- IV. GENERALIZATIONS TO LARGE NETWORKS cess and showed that we can dynamically adjust the sampling Our analysis has so far focused on a single OD pair with times of each link based on their instantaneous significance two independent paths. In this section we will show that the to network management. The gain (as reduction in number basic operation can be extended to general networks. Recall of samples) over the traditional uniform sampling was 100 that the shortest path can be identified through a sequence in one example. We concluded our remarks by extending our of pairwise comparisons of available paths. Furthermore, the notions to general networks and suggest that a network can length of each path can be updated at a time determined by be operated in a decentralized fashion at high performance our algorithm. To see the operation of such a system, let us using significant sampling and reporting of its network states, augment the routing table of I with an additional column to allowing the NMC system to be scalable. The essential spirit track the variance of delay on each path, as shown in Table II. of the cognitive NMC is that it collects network states ONLY when they matter to the network performance.

Table II: Partial routing table maintained by node A REFERENCES [1] “The zettabyte era: Trends and analysis,” June 2017. [Online]. Avail- Destination Path Length Variance able: https://www.cisco.com/c/en/us/solutions/collateral/service-provider/ B → C → D → Y 2 4 visual-networking-index-vni/vni-hyperconnectivity-wp.html Y E → F → G → Y 3 5 [2] “Cisco vni: Forecast and methodology, 2016-2021,” June H → I → Y 8 1 2017. [Online]. Available: https://www.cisco.com/c/en/us/ J → K → Z 4 2 solutions/collateral/service-provider/visual-networking-index-vni/ Z L → M → Z 7 4 complete-white-paper-c11-481360.html N → O → P → Z 9 7 [3] V. W. Chan and E. Jang, “Cognitive all-optical fiber network architecture,” in Transparent Optical Networks (ICTON), 2017 19th International Conference on. IEEE, 2017, pp. 1–4. Given the routing table in Table II, node A can query nodes [4] D. Bertsekas and R. Gallager, Data Networks (Second Edition.). Upper Saddle River, NJ, USA: Prentice-Hall, Inc., 1992. B and E, which are the first hops on the two shortest paths to [5] L. Kleinrock, Queueing Systems Volume I: Theory. New York: John node Y , about their respective distances to node Y according Wiley & Sons, 1975, vol. 1. to our sampling formula [6] J. F. C. Kingman, “The single server queue in heavy traffic,” Mathematical Proceedings of the Cambridge Philosophical Society, vol. 57, no. 4, pp. ∗ 1 θT  2  3 902–904, 1961. ∗ 1 e− √π ∗ 18πc [7] S. Halfin and W. Whitt, “Heavy-traffic limits for queues with many T1(x) = T + p − ∗ x, T = 1 e 2θT σθ σ2 exponential servers,” Operations research, vol. 29, no. 3, 1981. − − [8] P. Morters¨ and Y. Peres, . Cambridge University Press, where x = 3 2 = 1, σ = √4 + 5 = 3, and θ is a global 2010. − [9] S. Floyd and V. Jacobson, “The synchronization of periodic routing parameter that depends on network protocols and network size. messages,” IEEE/ACM transactions on networking, vol. 2, no. 2, pp. 122– The same procedure can be used to compute the updating time 136, 1994. for all other paths. When a link is shared between multiple OD