arXiv:1804.07051v1 [cs.SY] 19 Apr 2018

Multi-Timescale Online Optimization of Network Function Virtualization for Service Chaining

Xiaojing Chen, Wei Ni, Tianyi Chen, Iain B. Collings, Fellow, IEEE, Xin Wang, Senior Member, IEEE, Ren Ping Liu, Senior Member, IEEE, and Georgios B. Giannakis, Fellow, IEEE

Abstract: Network Function Virtualization (NFV) can cost-efficiently provide network services by running different virtual network functions (VNFs) at different virtual machines (VMs) in a correct order. This can result in strong couplings between the decisions of the VMs on the placement and operations of VNFs. This paper presents a new fully decentralized online approach for optimal placement and operations of VNFs. Building on a new stochastic dual gradient method, our approach decouples the real-time decisions of VMs, asymptotically minimizes the time-average cost of NFV, and stabilizes the backlogs of network services with a cost-backlog tradeoff of $[\epsilon, 1/\epsilon]$, for any $\epsilon > 0$. Our approach can be relaxed into multiple timescales to have VNFs (re)placed at a larger timescale and hence alleviate service interruptions. While proved to preserve the asymptotic optimality, the larger timescale can slow down the optimal placement of VNFs. A learn-and-adapt strategy is further designed to speed up the placement with an improved tradeoff $[\epsilon, \log^2(\epsilon)/\sqrt{\epsilon}]$. Numerical results show that the proposed method is able to reduce the time-average cost of NFV by 30% and reduce the queue length (or delay) by 83%, as compared to existing benchmarks.

Index Terms: Network Function Virtualization, virtual machine, distributed optimization, stochastic approximation.

Work in this paper was supported by the National Natural Science Foundation of China grant 61671154, the National Key Research and Development Program of China grant 2017YFB04034002, and the Innovation Program of Shanghai Municipal Science and Technology Commission grant 17510710400; US NSF grants 1509005, 1508993, 1423316, and 1442686.

Part of this paper has been presented in [1] without detailed proofs and analyses. Apart from providing the details, this paper has significant extensions on the placement of VNFs at different timescales and general application scenarios where there can be multiple VNFs installed per VM.

X. Chen and X. Wang are with the Shanghai Institute for Advanced Communication and Data Science, Key Laboratory for Information Science of Electromagnetic Waves (MoE), Dept. of Communication Science and Engineering, Fudan University, 220 Handan Road, Shanghai, China. Emails: {13210720095, xwang11}@fudan.edu.cn. X. Chen is also with the School of Engineering, Macquarie University, Sydney, NSW 2109, Australia.

W. Ni is with the Commonwealth Scientific and Industrial Research Organization (CSIRO), Sydney, NSW 2122, Australia. Email: [email protected].

T. Chen and G. B. Giannakis are with the Dept. of Electrical and Computer Engineering and the Digital Technology Center, University of Minnesota, Minneapolis, MN 55455 USA. Emails: {chen3827, georgios}@umn.edu.

I. B. Collings is with the School of Engineering, Macquarie University, Sydney, NSW 2109, Australia. Email: [email protected].

R. P. Liu is with the School of Electrical and Data Engineering, University of Technology Sydney, Sydney, NSW 2007, Australia. Email: [email protected].

I. INTRODUCTION

Decoupling dedicated hardware from network services and replacing them with programmable virtual machines (VMs), Network Function Virtualization (NFV) is able to provide critical network functions on top of optimally shared physical infrastructure [2], [3]. This can avoid disproportional hardware investments on short-lived functions, and quickly adapt as network functions evolve [4]. Particularly, a virtual network function (VNF) is a virtualized task formerly carried out by proprietary and dedicated hardware, which moves network functions out of dedicated hardware devices and into software [4].

A network service (or service chain) can consist of multiple VNFs, which need to be run in a predefined order at different VMs running different VNF instances (i.e., software) [5]. Challenges arise from making optimal online decisions on the placement of VNFs, and the processing and routing of network services at each VM, especially in large-scale network platforms. On one hand, given the sequence of VNFs to be executed per network service, the optimal decisions of individual VMs are coupled. On the other hand, stochasticity prevails in the arrivals of network services, and in the link capacity between VMs stemming from concurrent traffic [6]. Prices for the service of a VM can also vary, depending on the pricing policy of the service providers. The temporal variability of resources implies the possibility of leveraging the couplings of the optimal decisions over time [7], [8]. Other challenges also include limited scalability resulting from centralized designs [9].

These are open problems and have not been captured in previous works on VNF placement. The work in [10] focused on the placement of VNFs under the assumption of persistent arrivals of network services, where the network services were instantly processed at the VMs admitting them and service chains cannot be supported. The work in [11] addressed the placement of VNFs in a capacitated cloud network. The placement problem was formulated as a generalization of the Facility Location and Generalized Assignment problem; near-optimal solutions were provided with bi-criteria constant approximations. However, the model in [11] cannot account for function ordering or flow routing optimization.

Taking network service chains into account, recent works studied optimal decision-makings on processing and routing network services, under the assumption of persistent service arrivals [12]. NP-complete mixed integer linear programming (MILP) was formulated to minimize the delay of service chains [9]. A heuristic genetic approach was developed to solve MILP by sacrificing optimality [9]. Greedy algorithms were developed to minimize flowtime or cost, or maximize revenue at a snapshot of the network [13]. These heuristic methods need to run in a centralized manner, thereby limiting scalability. Moreover, none of them have taken random service arrivals or dynamic pricing into account.

In this paper, we propose a new approach to distributed online optimization of NFV, where asymptotically optimal decisions on the placement and operation of NFV are spontaneously generated at individual VMs. Accounting for random service arrivals and time-varying prices, a stochastic dual gradient method is developed to decouple optimal decision-makings across different VMs and different time slots. It minimizes the time-average cost of NFV while stabilizing the queues of VMs. The gradients can be interpreted as the backlogs of the queues at every VM, and updated locally by the VM. With a proved cost-delay tradeoff $[\epsilon, 1/\epsilon]$, the proposed method is able to asymptotically approach the global optimum, which would be generated offline at a prohibitive complexity, by tuning the coefficient $\epsilon$.

Another important contribution is that we extend the distributed online optimization of NFV to two timescales, where the placement of VNFs is carried out at the VMs at a much larger time interval than the operations of the VNFs. This can effectively alleviate the interruptions that the placement/installation of VNFs can cause to network service provisions. We prove that the optimality loss of the two-timescale placement and operation of NFV is upper bounded, and the asymptotic optimality of the proposed distributed online optimization is preserved. Another contribution is that we further speed up the placement of VNFs and improve the cost-delay tradeoff to $[\epsilon, \log^2(\epsilon)/\sqrt{\epsilon}]$ by designing a learn-and-adapt approach, where the statistics of network dynamics are learned from history with increasing accuracy. Corroborated by numerical results, the proposed approach is able to reduce the time-average cost of NFV by 30% and reduce the queue lengths (or delays) by 83%, as compared to existing non-stochastic approaches. Accounting for service chaining in stochastic NFV scenarios, the proposed algorithm is important and practical.

In a different yet relevant context, stochastic optimization has been developed for single- or multi-timescale resource allocation [14]–[20], routing [21], [22], and service offloading [23] in queueing systems to deal with stochastic arrivals of workloads or energy. In particular, backpressure routing algorithms were developed in [21], [22] to maximize throughput or minimize time-average costs in distributed (wireless) mesh networks, while stabilizing the queues across the networks. Yet, the backpressure algorithms only focused on the routing of workloads, and did not involve any workload processing. An energy-efficient offloading algorithm was proposed for mobile computing in [23], which can dynamically offload part of an application's computation request to a dedicated server. None of the existing approaches [14]–[23] have taken sequentially chained network services into account. Distinctively different from the existing methods, our approach is able to process and offload/forward sequentially chained network services, which strongly couple the optimal decisions of VMs on routing and processing in time and space (i.e., among VMs). In particular, our approach decouples the strong coupling, optimizes both the processing and offloading of chained network services, and substantially reduces the queue lengths (or delays) by taking a learn-and-adapt strategy.

The rest of this paper is organized as follows. In Section II, the system model is described. In Section III, the distributed online optimization of the placement of VNFs and the processing/routing of network services is developed. Optimal placement and operation of NFV are investigated at two different timescales in Section IV to prevent its interruptions to network service provisions. Numerical tests are provided in Section V, followed by concluding remarks in Section VI. Notations in the paper are listed in Table I.

Fig. 1. An illustration on the processing and routing procedure of network services within VM $n$, where $f^*$ indicates processed VNFs.

TABLE I
VARIABLES

Basic Variables
$Q^i_{kn}$: Queue length of type-$i$ network services to be processed by VNF $f_k$ at VM $n$
$q^i_{kn}$: Queue length of type-$i$ network services after processed at VM $n$ and to be further processed by VNF $f_k$ at downstream VMs
$R^{i,t}_{kn}$: Arrival rate of new type-$i$ network services to be processed by VNF $f_k$ at VM $n$
$\alpha^t_n$: Time-varying price for service processing at VM $n$
$\beta^t_{[a,b]}$: Time-varying price for service routing over link $[a,b]$

Decision Variables
$e_{kn}$: Ability of VM $n$ to process $f_k$
$u^i_{k,ab}$: Transmit rate of $Q^i_{ka}$ over link $[a,b]$
$v^i_{k,ab}$: Transmit rate of $q^i_{ka}$ over link $[a,b]$
$p^i_{kn}$: Processing rate of VM $n$ for type-$i$ network services to be processed by VNF $f_k$

II. SYSTEM MODEL

Consider a platform consisting of $N$ VMs, supporting $K$ VNFs, and operating in a (possibly infinite) scheduling horizon consisting of $T$ slots with a normalized slot duration "1". The VMs can be located separately at different host servers, or co-located at the same host server. Assume that every VM can admit network services, and output the results; see Fig. 1. This is consistent with existing designs of NFV systems. Let

$\mathcal{N} = \{1,\ldots,N\}$ collect the $N$ VMs. Let $f_k$ ($k = 1,\ldots,K$) denote the $k$-th VNF, which can only be processed at the VMs running the corresponding software.

Assume that every VM runs a single VNF. (The extended scenario with a VM running multiple VNFs will be discussed in Section V-B.) Let $e_{kn}(t) \in \{0,1\}$, $\forall k,n,t$, denote the ability of VM $n$ to process $f_k$ at time $t$. Let $e_{kn}(t) = 1$ if VM $n$ is installed with VNF $f_k$, and $e_{kn}(t) = 0$ otherwise. We have $\sum_k e_{kn}(t) = 1$, $\forall n,t$.

Let $\mathcal{I}$ collect all possible types of network services, where each type of network service is to be processed by a permuted sequence of $\{f_1,\ldots,f_K\}$ or its subset. A network service needs to traverse multiple VMs until it is processed by the related VNFs in the correct order.

We design up to $2|\mathcal{I}|K$ service queues at each VM $n$. (For analytic tractability, we assume here that the VMs have sufficiently large memories, and the queues do not overflow. Nevertheless, one of the constraints we consider in this paper is that the time-average lengths of the queues are finite. As will be proved in Section III-A, the time-average lengths of the queues can be adjusted and reduced through parameter reconfiguration.) Half of the queues buffer network services to be processed by $f_k$ ($k = 1,\ldots,K$); the other half buffer the results of the first half after being processed by $f_k$ and to be routed to downstream VMs for further processing. Let $Q^i_{kn}(t)$ denote the queue length of type-$i$ network services to be processed by VNF $f_k$ at VM $n$ per slot $t$. Let $q^i_{kn}(t)$ denote the queue length of type-$i$ network services after being processed by VNF $f_{k'}$ at VM $n$, and to be processed by VNF $f_k$ at downstream VMs per slot $t$. Here, $f_{k'}$ denotes the VNF which needs to be run prior to $f_k$ for type-$i$ network services; $Q(t) = \{Q^i_{kn}(t), \forall n \in \mathcal{N}, i \in \mathcal{I}, k = 1,\ldots,K\}$, $q(t) = \{q^i_{kn}(t), \forall n \in \mathcal{N}, i \in \mathcal{I}, k = 1,\ldots,K\}$, and $A(t) = \{Q(t), q(t)\}$. Let $R^{i,t}_{kn} \le R^{\max}$ denote the arrival rate (in the number of services per slot) of new type-$i$ network services to be processed by VNF $f_k$ at VM $n$, where $R^{\max}$ is the maximum arrival rate.

We assume that any VM $n$ or directional link $[a,b]$ ($\forall a,b \in \mathcal{N}$) can only process or transmit a single type of network service per slot. Let $u^i_{k,ab}(t)$ denote the transmit rate (in the number of services per slot) of $Q^i_{ka}(t)$ over directional link $[a,b]$ at time slot $t$. Let $v^i_{k,ab}(t)$ denote the transmit rate of $q^i_{ka}(t)$ over directional link $[a,b]$ at time slot $t$. It is easy to see that at any time $t$, we have

\[ u^i_{k,ab}(t) \ge 0,\quad v^i_{k,ab}(t) \ge 0,\quad \forall i,k,[a,b]; \qquad \sum_{i,k}\big[u^i_{k,ab}(t) + v^i_{k,ab}(t)\big] = u^{i^*}_{k^*,ab}(t)\ \big(\text{or } v^{i^*}_{k^*,ab}(t)\big) \le l^{\max}_{ab}, \tag{1} \]

where $l^{\max}_{ab}$ is the maximum transmit rate of link $[a,b]$, and a type-$i^*$ service, which is to be processed by VNF $f_{k^*}$, is selected to be forwarded.

Let $p^i_{kn}(t)$ denote the processing rate of VM $n$ (in the number of services per slot) for the type-$i$ network services to be processed by VNF $f_k$ at time slot $t$. We have

\[ p^i_{kn}(t) \ge 0,\qquad \sum_{i,k} p^i_{kn}(t)\, e_{kn}(t) = p^{i^*}_{k^*n}(t) \le p^{\max}_n,\quad \forall i,k,n, \tag{2} \]

where $p^{\max}_n$ is the maximum processing rate of VM $n$, and a type-$i^*$ service, which is to be processed by $f_{k^*}$, is selected to be processed.

Therefore, the queue length of type-$i$ network services to be processed by VNF $f_k$ at VM $n$ follows, $\forall i,k,n,t$,

\[ Q^i_{kn}(t+1) = \max\Big\{Q^i_{kn}(t) - \sum_{b\in\mathcal{N}} u^i_{k,nb}(t) - p^i_{kn}(t)\, e_{kn}(t),\ 0\Big\} + \sum_{a\in\mathcal{N}} u^i_{k,an}(t) + \sum_{c\in\mathcal{N}} v^i_{k,cn}(t) + R^{i,t}_{kn}. \tag{3} \]

The queue length of type-$i$ network services after being processed by VNF $f_{k'}$ at VM $n$, and to be processed by VNF $f_k$ at downstream VMs, follows, $\forall i,k,n,t$,

\[ q^i_{kn}(t+1) = \max\Big\{q^i_{kn}(t) - \sum_{d\in\mathcal{N}} v^i_{k,nd}(t),\ 0\Big\} + p^i_{k'n}(t)\, e_{k'n}(t), \tag{4} \]

where $q^i_{kn}(t) = 0$ ($k \in \emptyset$) in the case that the type-$i$ network services complete processing at the terminal VM $n$ and are output from the platform.

We define the network to be stable if and only if the following is met [24]:

\[ \lim_{T\to\infty} \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}\big[|A(t)|\big] < \infty. \tag{5} \]

Considering the processing and routing cost of network services in the platform, we define the total cost of routing services over all links and running VNFs on all VMs per slot $t$, as given by:

\[ \Phi^t(e^t, p^t, u^t, v^t) := \phi^t_1(e^t, p^t) + \phi^t_2(u^t, v^t), \tag{6} \]

where $e^t = \{e_{kn}(t), \forall k,n\}$, $p^t = \{p^i_{kn}(t), \forall i,k,n\}$, $u^t = \{u^i_{k,ab}(t), \forall i,k,[a,b]\}$, and $v^t = \{v^i_{k,ab}(t), \forall i,k,[a,b]\}$. In (6), $\phi^t_1(e^t,p^t) = \sum_{i,k,n} \alpha^t_n \big(p^i_{kn}(t)\, e_{kn}(t)\big)^2$ and $\phi^t_2(u^t,v^t) = \sum_{a,b,i,k} \beta^t_{[a,b]} \big[(u^i_{k,ab}(t))^2 + (v^i_{k,ab}(t))^2\big]$ are the costs that the network service provider charges for usages of VMs and links, respectively, e.g., following a quadratic pricing policy [25], [26]. For non-elastic network services, e.g., video streaming and multimedia applications, those with urgent demand for substantial resources (i.e., bandwidth and CPU) would be charged high prices [25]. Here, $\alpha^t_n$ is the time-varying price for service processing at VM $n$ and $\beta^t_{[a,b]}$ is the time-varying price for service routing over link $[a,b]$.

III. DISTRIBUTED ONLINE OPTIMIZATION OF PLACEMENT AND OPERATION OF NFV

The objective of this paper is to minimize the time-average cost of NFV on a network platform while preserving the stability of the platform [cf. (5)], under random network service arrivals and prices. This can be achieved by making stochastically optimal decisions on processing or routing network services at every VM and time slot in a distributed fashion. Let $x^t := \{e^t, p^t, u^t, v^t\}$ and $\mathcal{X} := \{x^t, \forall t\}$. The problem of interest is to solve

\[ \Phi^* = \min_{\mathcal{X}} \lim_{T\to\infty} \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}\{\Phi^t(x^t)\} \quad \text{s.t. } (1), (2), (3), (4), (5),\ \forall t, \tag{7} \]

where the expectation of $\Phi^t(x^t)$ is taken over all randomnesses. The service arrival rates $\{R^{i,t}_{kn}, \forall i,k,n,t\}$ and the routing and processing prices $\{\alpha^t_n, \beta^t_{[a,b]}, \forall [a,b],n,t\}$ are all random.

A. Dual gradient and asymptotic optimality

It is difficult to solve (7) since we aim to minimize the average cost over an infinite time horizon. In particular, the queue dynamics in (3) and (4) couple the optimization variables in time, rendering intractability for traditional solvers. Combining (3) and (4) with (5), however, it can be shown that in the long term, the service processing and routing rates must satisfy the following necessary conditions of queue stability [24]:

\[ \lim_{T\to\infty} \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}\Big[\sum_{a\in\mathcal{N}} u^i_{k,an}(t) + \sum_{c\in\mathcal{N}} v^i_{k,cn}(t) + R^{i,t}_{kn} - \sum_{b\in\mathcal{N}} u^i_{k,nb}(t) - p^i_{kn}(t)\, e_{kn}(t)\Big] \le 0,\quad \forall i,k,n, \tag{8a} \]

\[ \lim_{T\to\infty} \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}\Big[p^i_{k'n}(t)\, e_{k'n}(t) - \sum_{d\in\mathcal{N}} v^i_{k,nd}(t)\Big] \le 0,\quad \forall i,k,k',n. \tag{8b} \]

As a result, (7) can be relaxed as

\[ \tilde{\Phi}^* = \min_{\mathcal{X}} \lim_{T\to\infty} \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}\{\Phi^t(x^t)\} \quad \text{s.t. } (1), (2), (8),\ \forall t. \tag{9} \]

Compared to (7), problem (9) eliminates the time coupling among variables $\{A(t), \forall t\}$ by replacing (3), (4) and (5) with (8). Since (9) is a relaxation of (7) with its optimal objective $\tilde{\Phi}^* \le \Phi^*$, if one solves (9) instead of (7), it is prudent to derive the optimality bound of $\Phi^*$, provided that the solution $\mathcal{X}$ for (9) is feasible for (3), (4) and (5), as will be shown in Theorem 1.

We can take a stochastic gradient approach to solving (9) with an asymptotic optimality guarantee. Concatenate the random parameters into a state vector $s^t := [R^{i,t}_{kn}, \alpha^t_n, \beta^t_{[a,b]}, \forall [a,b],i,k,n]$. For analytic tractability, $s^t$ is assumed to be independent and identically distributed (i.i.d.) across slots. (In practice, $s^t$ can be non-i.i.d. and even correlated. In such a case, the algorithm proposed in this paper can be readily applied. Yet, performance analyses of the non-i.i.d. case can be obtained by generalizing the delayed Lyapunov drift techniques [24].) Then, we can rewrite (9) as

\[ \tilde{\Phi}^* = \min_{\mathcal{X}} \mathbb{E}\{\Phi^t(\mathcal{X}(s^t); s^t)\} \tag{10a} \]
\[ \text{s.t. } u^i_{k,ab}(s^t) \ge 0,\ v^i_{k,ab}(s^t) \ge 0,\quad \sum_{i,k}\big[u^i_{k,ab}(s^t) + v^i_{k,ab}(s^t)\big] = u^{i^*}_{k^*,ab}(s^t)\ \big(\text{or } v^{i^*}_{k^*,ab}(s^t)\big) \le l^{\max}_{ab}, \tag{10b} \]
\[ p^i_{kn}(s^t) \ge 0,\quad \sum_{i,k} p^i_{kn}(s^t)\, e_{kn}(s^t) = p^{i^*}_{k^*n}(s^t) \le p^{\max}_n, \tag{10c} \]
\[ \mathbb{E}\Big[\sum_{a\in\mathcal{N}} u^i_{k,an}(s^t) + \sum_{c\in\mathcal{N}} v^i_{k,cn}(s^t) + R^{i,t}_{kn} - \sum_{b\in\mathcal{N}} u^i_{k,nb}(s^t) - p^i_{kn}(s^t)\, e_{kn}(s^t)\Big] \le 0, \tag{10d} \]
\[ \mathbb{E}\Big[p^i_{k'n}(s^t)\, e_{k'n}(s^t) - \sum_{d\in\mathcal{N}} v^i_{k,nd}(s^t)\Big] \le 0, \tag{10e} \]

where $p^i_{kn}(s^t) := p^i_{kn}(t)$, $e_{kn}(s^t) = e_{kn}(t)$, $u^i_{k,ab}(s^t) := u^i_{k,ab}(t)$, $v^i_{k,ab}(s^t) := v^i_{k,ab}(t)$, $\forall [a,b],i,k,n$, and $\Phi^t(\mathcal{X}(s^t); s^t) := \Phi^t(x^t)$. Formulation (10) explicitly indicates the dependence of the decision variables $\{e^t, p^t, u^t, v^t\}$ on the realization of $s^t$.

Let $\mathcal{F}^t$ denote the set of $\{e^t, p^t, u^t, v^t\}$ satisfying constraints (1) and (2) per $t$, while $\lambda^i_{kn,1}$ and $\lambda^i_{kn,2}$ denote the Lagrange multipliers associated with the constraints (10d) and (10e). With a convenient notation $\lambda := \{\lambda^i_{kn,1}, \lambda^i_{kn,2}, \forall i,k,n\}$, the partial Lagrangian function of (10) is given by

\[ L(\mathcal{X}, \lambda) := \mathbb{E}\big[L^t(x^t, \lambda)\big], \tag{11} \]

where the instantaneous Lagrangian is given by

\[ L^t(x^t, \lambda) := \Phi^t(x^t) + \sum_{i,k,n} \lambda^i_{kn,1}(t) \Big(\sum_{a\in\mathcal{N}} u^i_{k,an}(t) + \sum_{c\in\mathcal{N}} v^i_{k,cn}(t) + R^{i,t}_{kn} - \sum_{b\in\mathcal{N}} u^i_{k,nb}(t) - p^i_{kn}(t)\, e_{kn}(t)\Big) + \sum_{i,k,k',n} \lambda^i_{kn,2}(t) \Big(p^i_{k'n}(t)\, e_{k'n}(t) - \sum_{d\in\mathcal{N}} v^i_{k,nd}(t)\Big). \tag{12} \]

Notice that the instantaneous objective $\Phi^t(x^t)$ and the instantaneous constraints associated with $\lambda$ are parameterized by the observed state $s^t$ at time $t$; thus the instantaneous Lagrangian can be written as $L^t(x^t, \lambda) = L(\mathcal{X}(s^t), \lambda; s^t)$, and $L(\mathcal{X}, \lambda) = \mathbb{E}[L(\mathcal{X}(s^t), \lambda; s^t)]$.

As a result, the Lagrange dual function is given by

\[ D(\lambda) := \min_{\{x^t \in \mathcal{F}^t\}_t} L(\mathcal{X}, \lambda), \tag{13} \]

and the dual problem of (9) is $\max_{\lambda \ge 0} D(\lambda)$, where "$\ge$" is defined entry-wise.

For the dual problem, we can take a standard gradient method to obtain the optimal $\lambda^*$ [27]. This amounts to running the following iterations slot by slot:

\[ \lambda^i_{kn,1}(t+1) = \big[\lambda^i_{kn,1}(t) + \epsilon\, g_{\lambda^i_{kn,1}}(t)\big]^+,\quad \forall i,k,n, \tag{14a} \]
\[ \lambda^i_{kn,2}(t+1) = \big[\lambda^i_{kn,2}(t) + \epsilon\, g_{\lambda^i_{kn,2}}(t)\big]^+,\quad \forall i,k,n, \tag{14b} \]

where $\epsilon > 0$ is an appropriate stepsize. The gradient $g(t) := [g_{\lambda^i_{kn,1}}(t), g_{\lambda^i_{kn,2}}(t), \forall i,k,n]$ can be expressed as

\[ g_{\lambda^i_{kn,1}}(t) = \mathbb{E}\Big[\sum_{a\in\mathcal{N}} u^i_{k,an}(t) + \sum_{c\in\mathcal{N}} v^i_{k,cn}(t) + R^{i,t}_{kn} - \sum_{b\in\mathcal{N}} u^i_{k,nb}(t) - p^i_{kn}(t)\, e_{kn}(t)\Big], \tag{15a} \]
\[ g_{\lambda^i_{kn,2}}(t) = \mathbb{E}\Big[p^i_{k'n}(t)\, e_{k'n}(t) - \sum_{d\in\mathcal{N}} v^i_{k,nd}(t)\Big], \tag{15b} \]

where $x^t := \{e^t, p^t, u^t, v^t\}$ is given by

\[ x^t = \arg\min_{x^t \in \mathcal{F}^t} L^t(x^t, \lambda). \tag{16} \]

Note that a challenge associated with (15) is sequentially taking expectations over the random vector $s^t$ to compute the gradient $g(t)$. This would require high-dimensional integration over an unknown probabilistic distribution function of $s^t$; or equivalently, computing the corresponding time-averages over an infinite time horizon. Such a requirement is impractical since the computational complexity could be prohibitively high.

To bypass this impasse, we propose to rely on a stochastic dual gradient approach, which is able to combat randomness in the absence of a-priori knowledge on the statistics of variables. Specifically, dropping $\mathbb{E}$ from (15), we propose the following iterations:

\[ \tilde{\lambda}^i_{kn,1}(t+1) = \Big[\tilde{\lambda}^i_{kn,1}(t) + \epsilon\Big(\sum_{a\in\mathcal{N}} u^i_{k,an}(t) + \sum_{c\in\mathcal{N}} v^i_{k,cn}(t) + R^{i,t}_{kn} - \sum_{b\in\mathcal{N}} u^i_{k,nb}(t) - p^i_{kn}(t)\, e_{kn}(t)\Big)\Big]^+, \tag{17a} \]
\[ \tilde{\lambda}^i_{kn,2}(t+1) = \Big[\tilde{\lambda}^i_{kn,2}(t) + \epsilon\Big(p^i_{k'n}(t)\, e_{k'n}(t) - \sum_{d\in\mathcal{N}} v^i_{k,nd}(t)\Big)\Big]^+, \tag{17b} \]

where $\tilde{\lambda}^t = \{\tilde{\lambda}^i_{kn,1}(t), \tilde{\lambda}^i_{kn,2}(t), \forall i,k,n\}$ collects the stochastic estimates of the variables in (14), and $x^t(\tilde{\lambda})$ is obtained by solving (16) with $\lambda$ replaced by $\tilde{\lambda}^t$, $\forall i,k,n$.

Note that the interval of updating (17) coincides with slots. In other words, the update of (17) is an online approximation of (14) based on the instantaneous decisions $x^t(\tilde{\lambda}^t)$ per slot $t$. This stochastic approach becomes possible due to the decoupling of optimization variables over time in (9).

Relying on the so-called Lyapunov optimization technique in [24], we can formally establish that:

Theorem 1. If $s^t$ is i.i.d. over slots, then the time-average cost of (10) with the multipliers updated by (17) satisfies

\[ \Phi^* \le \lim_{T\to\infty} \frac{1}{T} \sum_{t=0}^{T-1} \mathbb{E}\big[\Phi^t(x^t)\big] \le \Phi^* + \epsilon B, \tag{18a} \]

where $B = \frac{9}{2}(N^{\max} l^{\max})^2 + \frac{3}{2}\big((R^{\max})^2 + (p^{\max})^2\big)$; $N^{\max}$ is the maximum degree of VMs, $l^{\max} = \max_{[a,b]} l^{\max}_{ab}$ and $p^{\max} = \max_n p^{\max}_n$; $\Phi^*$ is the optimal value of (7) under any feasible control policy (i.e., the processing and routing decisions per VM), even if that relies on knowing future realizations of random variables.

Assume that there exists a stationary policy $\mathcal{X}$ such that $\mathbb{E}\big[\sum_{a\in\mathcal{N}} u^i_{k,an}(t) + \sum_{c\in\mathcal{N}} v^i_{k,cn}(t) + R^{i,t}_{kn} - \sum_{b\in\mathcal{N}} u^i_{k,nb}(t) - p^i_{kn}(t)\, e_{kn}(t)\big] \le -\zeta$, and $\mathbb{E}\big[p^i_{k'n}(t)\, e_{k'n}(t) - \sum_{d\in\mathcal{N}} v^i_{k,nd}(t)\big] \le -\zeta$, where $\zeta > 0$ is a slack vector constant. Then all queues are stable, and the time-average queue length satisfies:

\[ \lim_{T\to\infty} \frac{1}{T} \sum_{t=1}^{T} \sum_{i,k,n} \mathbb{E}\big[Q^i_{kn}(t) + q^i_{kn}(t)\big] = \mathcal{O}\Big(\frac{1}{\epsilon}\Big). \tag{18b} \]

Proof. See Appendices A and B.

Theorem 1 asserts that the time-average cost of (10) obtained by the stochastic dual gradient approach converges to an $\mathcal{O}(\epsilon)$ neighborhood of the optimal solution, where the region of neighborhood vanishes as the stepsize $\epsilon \to 0$. The typical tradeoff from stochastic network optimization holds in this case [24]: an $\mathcal{O}(1/\epsilon)$ queue length is necessary when an $\mathcal{O}(\epsilon)$ close-to-optimal cost is achieved. Different from [24], here the Lagrange dual theory is utilized to simplify the arguments, as shown in Appendices A and B.

Remark 1. The asymptotic approximation of the proposed distributed online approach to the cost lower bound achieved offline in a posterior manner is rigorously proved. The lower bound corresponds to the assumption that all the randomnesses are precisely known a priori and the optimal decisions over the infinite time horizon are all derived. This lower bound would violate causality and be computationally prohibitive to achieve, even in an offline fashion, given an infinite number of variables. Theorem 1 indicates that the proposed approach can increasingly approach the lower bound by increasing the tolerance to queue backlogs or delays.

B. Distributed online implementation

The dual iteration (17) coincides with (3) and (4) for $\tilde{\lambda}^i_{kn,1}(t)/\epsilon = Q^i_{kn}(t)$ and $\tilde{\lambda}^i_{kn,2}(t)/\epsilon = q^i_{kn}(t)$, $\forall i,k,n,t$; this can be interpreted by using the concept of virtual queue of this parallelism [24]. With $\tilde{\lambda}^i_{kn,1}(t)$ substituted by $\epsilon Q^i_{kn}(t)$ and $\tilde{\lambda}^i_{kn,2}(t)$ substituted by $\epsilon q^i_{kn}(t)$, we can obtain the desired $x^t(A(t))$ by solving the following problem:

\[ \min_{x^t \in \mathcal{F}^t}\ \frac{1}{\epsilon}\Phi^t(x^t) + \sum_{i,k,n} Q^i_{kn}(t)\Big[\sum_{a\in\mathcal{N}} u^i_{k,an}(t) + \sum_{c\in\mathcal{N}} v^i_{k,cn}(t) + R^{i,t}_{kn} - \sum_{b\in\mathcal{N}} u^i_{k,nb}(t) - p^i_{kn}(t)\, e_{kn}(t)\Big] + \sum_{i,k,k',n} q^i_{kn}(t)\Big[p^i_{k'n}(t)\, e_{k'n}(t) - \sum_{d\in\mathcal{N}} v^i_{k,nd}(t)\Big]. \tag{19} \]

Through rearrangement, (19) is equivalent to

\[ \min_{x^t \in \mathcal{F}^t} \sum_{i,k,n,b\in\mathcal{N}} \big[f_1(e^t, p^t) + f_2(u^t) + f_3(v^t)\big], \tag{20} \]

where

\[ f_1(e^t, p^t) = \Big[\frac{\alpha^t_n}{\epsilon}\big(p^i_{kn}(t)\big)^2 - \big(Q^i_{kn}(t) - q^i_{k''n}(t)\big)\, p^i_{kn}(t)\Big] e_{kn}(t); \tag{21a} \]
\[ f_2(u^t) = \frac{\beta^t_{[n,b]}}{\epsilon}\big(u^i_{k,nb}(t)\big)^2 - \big(Q^i_{kn}(t) - Q^i_{kb}(t)\big)\, u^i_{k,nb}(t); \tag{21b} \]
\[ f_3(v^t) = \frac{\beta^t_{[n,b]}}{\epsilon}\big(v^i_{k,nb}(t)\big)^2 - \big(q^i_{kn}(t) - Q^i_{kb}(t)\big)\, v^i_{k,nb}(t). \tag{21c} \]

Here, $f_{k''}$ denotes the VNF to be processed after $f_k$ for type-$i$ network services.

Problem (20) can be readily solved by decoupling between $e_{kn}(t)$, $p^i_{kn}(t)$, $u^i_{k,nb}(t)$ and $v^i_{k,nb}(t)$, and between the VMs. Specifically, (20) can be decoupled into the following subproblems per VM or per inter-VM link:

\[ \min_{e^t, p^t} f_1(e^t; p^t), \tag{22a} \]
\[ \min_{u^t} f_2(u^t); \tag{22b} \]
\[ \min_{v^t} f_3(v^t). \tag{22c} \]

Problem (22a) is a mixed integer program. Its solution can be obtained by comparing the minimums of $f_1(e^t, p^t)$ separately achieved under $e_{kn}(t) = 0$ and $1$. In the case of $e_{kn}(t) = 0$, $f_1(e^t, p^t) = 0$. In the case of $e_{kn}(t) = 1$, (22a) becomes the minimization of a quadratic function of $p^i_{kn}(t)$, where the optimal solution is given by

\[ p^{i*}_{kn}(t) = \min\Big\{\max\Big\{\frac{\epsilon\big(Q^i_{kn}(t) - q^i_{k''n}(t)\big)}{2\alpha^t_n},\ 0\Big\},\ p^{\max}_n\Big\},\quad \forall i,k. \tag{23} \]

Then, the optimal objective of (22a) can be obtained by substituting (23) into $f_1(e^t, p^t)$, as given by

\[ P^i_{kn} := \begin{cases} -\dfrac{\epsilon\big(Q^i_{kn}(t) - q^i_{k''n}(t)\big)^2}{4\alpha^t_n}, & \text{if } Q^i_{kn}(t) - q^i_{k''n}(t) > 0; \\ 0, & \text{if } Q^i_{kn}(t) - q^i_{k''n}(t) \le 0. \end{cases} \tag{24} \]

Since every VM only runs a single VNF (i.e., $\sum_k e_{kn}(t) = 1$, $\forall n,t$), we have

\[ e^*_{kn}(t) = \begin{cases} 1, & \text{if } k = \arg\min_k P^i_{kn}; \\ 0, & \text{otherwise.} \end{cases} \tag{25} \]

Problems (22b) and (22c) are the minimizations of quadratic functions of $u^t$ and $v^t$, respectively. Like (22a) under $e_{kn}(t) = 1$, the optimal solutions for (22b) and (22c) are

\[ u^{i*}_{k,nb}(t) = \min\Big\{\max\Big\{\frac{\epsilon\big(Q^i_{kn}(t) - Q^i_{kb}(t)\big)}{2\beta^t_{[n,b]}},\ 0\Big\},\ l^{\max}_{nb}\Big\},\quad \forall i,k; \qquad v^{i*}_{k,nb}(t) = \min\Big\{\max\Big\{\frac{\epsilon\big(q^i_{kn}(t) - Q^i_{kb}(t)\big)}{2\beta^t_{[n,b]}},\ 0\Big\},\ l^{\max}_{nb}\Big\},\quad \forall i,k; \tag{26} \]

with their corresponding objectives given by

\[ U^i_{k,nb} := \begin{cases} -\dfrac{\epsilon\big(Q^i_{kn}(t) - Q^i_{kb}(t)\big)^2}{4\beta^t_{[n,b]}}, & \text{if } Q^i_{kn}(t) - Q^i_{kb}(t) > 0; \\ 0, & \text{if } Q^i_{kn}(t) - Q^i_{kb}(t) \le 0; \end{cases} \qquad V^i_{k,nb} := \begin{cases} -\dfrac{\epsilon\big(q^i_{kn}(t) - Q^i_{kb}(t)\big)^2}{4\beta^t_{[n,b]}}, & \text{if } q^i_{kn}(t) - Q^i_{kb}(t) > 0; \\ 0, & \text{if } q^i_{kn}(t) - Q^i_{kb}(t) \le 0. \end{cases} \tag{27} \]

Recall that any VM $n$ or directional link $[a,b]$ can only process or transmit a single network service per slot. At each slot, a VM can prioritize the queues of different service types to be processed by different VNFs, and process or route services from the queue with the highest priority. The priority is ranked based on the objectives $P^i_{kn}$, $U^i_{k,nb}$ and $V^i_{k,nb}$ in (24) and (27). For this reason, we refer to $P^i_{kn}$, $U^i_{k,nb}$ and $V^i_{k,nb}$ as queue-price objectives. The processing and routing decisions can be made by one-to-one mapping between the queues and outgoing links/processor to minimize the total of selected non-zero objectives, as summarized in Algorithm 1.

Algorithm 1 Distributed Online Optimization of NFV
1: for $t = 1, 2, \ldots$ do
2: Each VM $n$ observes the queue lengths of its own and its one-hop neighbors.
3: Install VNF $f_k$ at VM $n$ based on (25).
4: Repeatedly send network services to the VM processor or outgoing links with the minimum non-zero queue-price objectives in (24) and (27), using the optimal rates derived in (23) and (26), until either the processor and all outgoing links are scheduled or the remaining objectives are all zero.
5: Update $Q^i_{kn}(t)$ and $q^i_{kn}(t)$ for all nodes and services via the dynamics (3) and (4).
6: end for

Fig. 2. An illustration on VMs running multiple VNFs, where VNFs can be interpreted as "VMs" and VMs can be interpreted as "clusters of VMs." Then the VM-based online optimization developed in this paper can be readily applied to the "VMs."

Note that Algorithm 1 is decentralized, since every VM only needs to know the queue lengths of its own and its immediate neighbors. Optimal decisions of a VM, locally made by comparing the queue-price objectives, comply with (17) and therefore preserve the asymptotic optimality of the entire network, as dictated in Theorem 1. With decentralized decision making, Algorithm 1 can readily provide improved flexibility and scalability, alleviate signaling burden, and reduce service latency for practical NFV systems.
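As a worked check of how (24) follows from (23), consider the case $e_{kn}(t) = 1$ with the cap $p^{\max}_n$ inactive. The shorthands $d$ and $\alpha$ below are introduced here for brevity only and are not notation from the paper.

```latex
% Shorthand (assumed here): d := Q^i_{kn}(t) - q^i_{k''n}(t) > 0, \alpha := \alpha^t_n.
\begin{align*}
f_1(p) &= \frac{\alpha}{\epsilon}\,p^2 - d\,p,
&
\frac{\mathrm{d} f_1}{\mathrm{d} p} &= \frac{2\alpha}{\epsilon}\,p - d = 0
\;\Longrightarrow\; p^* = \frac{\epsilon d}{2\alpha}, \\
f_1(p^*) &= \frac{\alpha}{\epsilon}\cdot\frac{\epsilon^2 d^2}{4\alpha^2} - d\cdot\frac{\epsilon d}{2\alpha}
= \frac{\epsilon d^2}{4\alpha} - \frac{\epsilon d^2}{2\alpha}
= -\frac{\epsilon d^2}{4\alpha}.
\end{align*}
```

When $d \le 0$ the quadratic is non-decreasing on $p \ge 0$, so the minimizer is $p^* = 0$ with objective $0$, which is the second branch of (24). The same steps with $\beta^t_{[n,b]}$ in place of $\alpha^t_n$ give (26) and (27). If the cap is active, the attained value is $\frac{\alpha}{\epsilon}(p^{\max}_n)^2 - d\,p^{\max}_n$, which is no smaller than $-\epsilon d^2/(4\alpha)$.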

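The per-slot decisions of a single VM can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: the function names, the list-based inputs, and the restriction to one service type are assumptions made for the example. The objective is evaluated at the clipped rate, which coincides with (24) and (27) whenever the rate cap is inactive.

```python
def clipped_rate(eps, backlog_diff, price, cap):
    """Minimizer of (price/eps)*r**2 - backlog_diff*r over 0 <= r <= cap,
    i.e. the closed forms (23) and (26)."""
    return min(max(eps * backlog_diff / (2.0 * price), 0.0), cap)


def queue_price_objective(eps, backlog_diff, price, cap):
    """Value of the quadratic at the clipped rate.
    Equals -eps*backlog_diff**2/(4*price) as in (24)/(27) when the cap is inactive."""
    rate = clipped_rate(eps, backlog_diff, price, cap)
    return (price / eps) * rate ** 2 - backlog_diff * rate


def choose_vnf(eps, alpha, p_max, Q, q_next):
    """Pick the VNF index with the smallest objective, mirroring (25).
    Q[k] stands for Q_kn and q_next[k] for q_{k''n} (hypothetical inputs,
    single service type)."""
    objectives = [
        queue_price_objective(eps, Qk - qk, alpha, p_max)
        for Qk, qk in zip(Q, q_next)
    ]
    k_star = min(range(len(objectives)), key=objectives.__getitem__)
    return k_star, objectives
```

For instance, `choose_vnf(0.5, 1.0, 10.0, [4.0, 10.0], [2.0, 2.0])` selects index 1, the VNF whose input queue exceeds its downstream queue by the larger margin. A smaller `eps` shrinks every rate, which is the mechanism behind the cost-backlog tradeoff discussed after Theorem 1.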
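Also note that Algorithm 1 can be readily extended to general scenarios where a VM runs multiple VNFs; see Fig. 2. In this case, all VNFs can be first interpreted as separate "VMs" in the context of the baseline case of a VNF per VM, and colocated VNFs at a VM then become a cluster of multiple "VMs." No cost is incurred on the connections between the "VMs" within the cluster. The only difference from the baseline scenario of a VNF per VM, as described in Algorithm 1, is that, between the clusters, only a pair of "VMs" which are respectively from the two clusters and the most cost-effective to transmit workloads can be activated. This can be achieved by comparing the price weights of the links to pick the most cost-effective link between clusters. The optimal decisions of processing at each of the "VMs" stay unchanged.

IV. OPTIMAL PLACEMENT AND OPERATION OF NFV AT DIFFERENT TIMESCALES

In this section, we consider a practical scenario where the placement of VNFs is carried out at the VMs at a much larger time interval, i.e., at time $\tau = mT_\Delta$ ($m = 1, 2, \ldots$), rather than on a slot basis. This is because the installation of VNFs at the VMs could cause interruptions to network service provisions. We prove that if the placement and the operation of NFV are jointly optimized at two different timescales, the aforementioned asymptotic optimality of the proposed approach can be preserved.

A. Two-timescale placement and operation

By evaluating (22) at two different timescales, the placement of VNFs, and the processing and routing of network services, can be carried out as follows:

• Placement of VNFs at a $T_\Delta$-slot interval: At time slot $\tau = mT_\Delta$, each VM $n$ decides on the VNF to install to minimize the expectation of the sum of $f_1(e^t, p^t)$ in (22a) over the time window $t = \{\tau, \ldots, \tau + T_\Delta - 1\}$, i.e., $\mathbb{E}\big\{\sum_{t=\tau}^{\tau+T_\Delta-1} f_1(e^t, p^t)\big\}$, as given by

\[ \min_{e^t}\ \mathbb{E}\Big\{\sum_{t=\tau}^{\tau+T_\Delta-1} \sum_{i,k} \Big[\frac{\alpha^t_n}{\epsilon}\big(p^i_{kn}(t)\big)^2 - \big(Q^i_{kn}(t) - q^i_{k''n}(t)\big)\, p^i_{kn}(t)\Big] e_{kn}(\tau)\Big\}. \tag{28} \]

• Processing and routing of network services per slot $t$: Per slot $t$, each VM processes and routes network services, following Algorithm 1, given the placement decisions of VNFs given by (28).

• Queue update: Each VM updates its queues $Q^i_{kn}(t)$ and $q^i_{kn}(t)$ at every slot $t$ based on (3) and (4).

Note that the optimal solutions to (28) require future knowledge on service arrivals $\{R^{i,t}_{kn}, t = \tau, \ldots, \tau + T_\Delta - 1\}$, and the prices of service processing and routing $\{\alpha^t_n, \beta^t_{[a,b]}, t = \tau, \ldots, \tau + T_\Delta - 1\}$. This would violate causality. We propose to take an approximation by setting the future queue backlogs as their current backlogs at slot $\tau = mT_\Delta$, as given by

\[ \hat{Q}^i_{kn}(t) = Q^i_{kn}(\tau); \tag{29a} \]
\[ \hat{q}^i_{kn}(t) = q^i_{kn}(\tau),\quad \forall t = \tau, \ldots, \tau + T_\Delta - 1, \tag{29b} \]

where $\hat{Q}^i_{kn}(t)$ and $\hat{q}^i_{kn}(t)$ are the approximated queue backlogs. Taking the approximation of (29) and assuming the stochastic variables $s^t := [R^{i,t}_{kn}, \alpha^t_n, \beta^t_{[a,b]}, \forall [a,b],i,k,n]$ to be invariant in the coming time window, (28) can be reduced to the per-slot problem, as given in (22a). Therefore, at every time slot $\tau = mT_\Delta$, the VNFs can be installed at the VMs based on (25), as summarized in Algorithm 1.

B. Optimality loss of VNF placement

We next analyze the optimality loss due to the approximation of (29). We can prove that the optimality loss is bounded and does not compromise the asymptotic optimality of the proposed approach. To improve tractability, an upper bound of the optimality loss is evaluated in the case where all the variables $\{e^t, p^t, u^t, v^t\}$ are optimized under the assumption of the availability of the full knowledge on the future $T_\Delta$ time slots, rather than taking the approximation of (29).

Based on (29), we first establish the following lemma:

Lemma 1. At any slot $t$, the differences between the approximated and actual queue backlogs in (29) are bounded by

\[ \big|Q^i_{kn}(t) - \hat{Q}^i_{kn}(t)\big| \le T_\Delta\, \omega_Q, \tag{30a} \]
\[ \big|q^i_{kn}(t) - \hat{q}^i_{kn}(t)\big| \le T_\Delta\, \omega_q, \tag{30b} \]

where the constants $\omega_Q = \max\{N^{\max} l^{\max} + p^{\max},\ 2N^{\max} l^{\max} + R^{\max}\}$, and $\omega_q = \max\{N^{\max} l^{\max},\ p^{\max}\}$.

Proof. For any two consecutive slots $t$ and $t+1$, the difference of the queue backlogs is bounded, i.e., $|Q^i_{kn}(t+1) - Q^i_{kn}(t)| \le \omega_Q$, where $\omega_Q$ is the maximum difference between the departure and the arrival of $Q(t)$, denoted by $\max\{N^{\max} l^{\max} + p^{\max},\ 2N^{\max} l^{\max} + R^{\max}\}$. According to (29) and the inequality $|a + b| \le |a| + |b|$, we have $|Q^i_{kn}(t) - \hat{Q}^i_{kn}(t)| = |Q^i_{kn}(t) - Q^i_{kn}(\tau)| = \big|\sum_{t_0=\tau}^{t-1} [Q(t_0 + 1) - Q(t_0)]\big| \le (t - \tau)\,\omega_Q \le T_\Delta\, \omega_Q$, where $\tau = mT_\Delta$ and $t = \tau, \ldots, \tau + T_\Delta - 1$. Therefore, (30a) is proved. Likewise, (30b) can be proved.

Based on Lemma 1, we are ready to obtain the following theorem:

Theorem 2. The optimality loss of the solution for (20) due to the approximation of the queue backlogs in (29) is bounded, i.e.,

\[ \sum_{i,k,n,b\in\mathcal{N}} \big[f_1(\hat{e}^t, \hat{p}^t) + f_2(\hat{u}^t) + f_3(\hat{v}^t)\big] - \sum_{i,k,n,b\in\mathcal{N}} \big[f_1(e^t, p^t) + f_2(u^t) + f_3(v^t)\big] \le \epsilon C, \tag{31} \]

where $C := \epsilon T_\Delta^2 N^2 |\mathcal{I}| K \big[\big(\frac{1}{2\alpha^{\max}} + \frac{1}{2\beta^{\max}}\big)(\omega_Q + \omega_q)^2 + \frac{2}{\beta^{\max}}\, \omega_Q^2\big]$; $\alpha^{\max} = \max_{n,t} \alpha^t_n$, and $\beta^{\max} = \max_{[a,b],t} \beta^t_{[a,b]}$.

Proof. See Appendix C.

We can see from Theorem 2 that the optimality loss of the two-timescale control increases quadratically with the time interval of VNF placement, $T_\Delta$. This allows us to balance between the optimality loss and the cost of VNF placement.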
τ+T∆−1 t E αn i 2 i min [ (pkn(t)) (Qkn(t) Based on Lemma 1, we are ready to obtain the following et ǫ − t=τ i,k theorem: n X X i i q ′′ (t))p (t)]ekn(τ) . (28) Theorem 2. The optimality loss of the solution for (20) due to − k n kn the approximation of the queue backlogs in (29) is bounded, • Processing and routing of network serviceso per slot t: Per i.e., slot t, each VM processes and routes network services, t t t t following Algorithm 1, given the placement decisions of [f1(eˆ , pˆ )+ f2(uˆ )+ f3(vˆ )] VNFs given by (28). i,k,n,b∈N i X • Queue update: Each VM updates its queues Qkn(t) and t t t t i [f1(e , p )+ f2(u )+ f3(v )] ǫC, (31) qkn(t) at every slot t based on (3) and (4). − ≤ i,k,n,b∈N Note that the optimal solutions to (28) require future knowl- X i,t 2 2 1 1 2 edge on service arrivals R ,t = τ,...,τ + T 1 , and where C := ǫT∆ N K[( max + max )(ωQ + ωq) + { kn ∆ − } |I| 2α 2β the prices of service processing and routing αt ,βt ,t = 2 ω2 ]; αmax = max αt , and βmax = max βt . { n [a,b] βmax Q n,t n [a,b],t [a,b] τ,...,τ + T 1 . This would violate causality. We propose ∆ Proof. See Appendix C. to take an approximation− } by setting the future queue backlogs as their current backlogs at slot τ = mT , as given by We can see from Theorem 2 that the optimality loss of ˆi i the two-timescale control increases quadratically with the time Qkn(t)= Qkn(τ); (29a) interval of VNF placement, T∆. This allows us to balance qˆi (t)= qi (τ), t = τ,...,τ + T 1, (29b) kn kn ∀ ∆ − between the optimality loss and the cost of VNF placement. 8
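The bound of Lemma 1 can be checked numerically on a toy queue. The sketch below is only an illustration under stated assumptions. It models a single non-negative backlog whose per-slot change is drawn uniformly from $[-\omega_Q, \omega_Q]$, which is a hypothetical stand-in for the actual queue dynamics in (3). It freezes the backlog at the start of every $T_\Delta$-slot window as in (29) and records the largest gap between the actual and the frozen backlog. The function name and all parameters are assumptions made for this example.

```python
import random


def max_frozen_backlog_error(t_delta, omega_q, num_windows=200, seed=0):
    """Largest |Q(t) - Q_hat(t)| observed when Q_hat is frozen per window as in (29).

    The per-slot change of the toy backlog is bounded by omega_q in magnitude,
    which is the only property that the proof of Lemma 1 relies on.
    """
    rng = random.Random(seed)
    backlog = 0.0
    worst = 0.0
    for _ in range(num_windows):
        frozen = backlog  # Q_hat(t) = Q(tau) for every slot of this window
        for _ in range(t_delta):
            worst = max(worst, abs(backlog - frozen))
            # Bounded random change; clipping at zero cannot enlarge the step.
            backlog = max(backlog + rng.uniform(-omega_q, omega_q), 0.0)
    return worst
```

Because at most $T_\Delta - 1$ bounded steps separate any slot of a window from its start, the returned value cannot exceed $T_\Delta\,\omega_Q$, in line with (30a). The same reasoning applies to (30b) with $\omega_q$.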

As dictated in Theorems 1 and 2, the total optimality loss of the two-timescale approach for problem (7) is upper bounded, as given by

$$\Phi(\hat{\mathbf{x}}^t) \le \Phi^{*} + \epsilon (B + C), \tag{32}$$

where $\Phi(\hat{\mathbf{x}}^t)$ is the time-average cost under the two-timescale approach. In other words, the two-timescale placement and operation of NFV preserves the asymptotic optimality with approximated queue backlogs.

C. Learn-and-adapt for placement acceleration

While proved to preserve the asymptotic optimality, the larger timescale can slow down the optimal placement of VNFs. We propose to speed up the placement through a learning and adaptation method [26]. The Lagrange multipliers $\tilde{\lambda}_{kn,1}^{i}(t)$ and $\tilde{\lambda}_{kn,2}^{i}(t)$ play the key roles in the proposed distributed online optimization of NFV in (17). We can incrementally learn these Lagrange multipliers from observed data and speed up the convergence of the multipliers driven by the learning process.

In the proposed learn-and-adapt scheme, with the online learning of $\tilde{\lambda}_{kn,1}^{i}(t)$ and $\tilde{\lambda}_{kn,2}^{i}(t)$, $\forall n, i$, at each slot $t$, two stochastic gradients are updated using the current $\mathbf{s}^t$. The first gradient $\boldsymbol{\gamma}^t$ is designed to minimize the instantaneous Lagrangian for optimal decision making on processing or routing network services, as given by [cf. (16)]

$$\mathbf{x}^t(\boldsymbol{\gamma}^t) = \arg\min_{\mathbf{x}^t \in \mathcal{F}^t} L^t(\mathbf{x}^t, \boldsymbol{\gamma}^t) \tag{33}$$

which depends on what we term the effective multiplier $\boldsymbol{\gamma}^t := \{\gamma_{kn,1}^{i}(t), \gamma_{kn,2}^{i}(t), \forall n, i\}$, as given by

$$\underbrace{\boldsymbol{\gamma}^t}_{\text{effective multiplier}} = \underbrace{\hat{\boldsymbol{\lambda}}^t}_{\text{statistical learning}} + \underbrace{\epsilon \mathbf{A}(t)}_{\text{online adaptation}} - \boldsymbol{\theta}, \tag{34}$$

where $\hat{\boldsymbol{\lambda}}^t := \{\hat{\lambda}_{kn,1}^{i}(t), \hat{\lambda}_{kn,2}^{i}(t), \forall i, k, n\}$ is the empirical dual variable, and $\boldsymbol{\theta}$ controls the bias of $\boldsymbol{\gamma}^t$ in the steady state and can be judiciously designed to achieve the improved cost-delay tradeoff, as will be shown in Theorem 3.

To better illustrate the effective multiplier in (34), we call $\hat{\boldsymbol{\lambda}}(t)$ the statistically learnt dual variable, which is used to obtain the exact optimal argument of the dual problem $\max_{\boldsymbol{\lambda} \succeq \mathbf{0}} D(\boldsymbol{\lambda})$. We call $\epsilon \mathbf{A}(t)$ (which is exactly $\boldsymbol{\lambda}$ as shown in (17)) the online adaptation term, since it can track the instantaneous change of system statistics. The control variable $\epsilon$ tunes the weights of these two factors.

The second gradient is designed to simply learn the stochastic gradient of (13) at the previous empirical dual variable $\hat{\boldsymbol{\lambda}}^t$, and implement a gradient ascent update as

$$\begin{aligned}
\hat{\lambda}_{kn,1}^{i}(t+1) &= \hat{\lambda}_{kn,1}^{i}(t) + \eta(t)\Big[\sum_{a\in\mathcal{N}} u_{k,an}^{i}(\hat{\lambda}_{kn,1}^{i}(t)) + \sum_{c\in\mathcal{N}} v_{k,cn}^{i}(\hat{\lambda}_{kn,1}^{i}(t)) + R_{kn}^{i,t} \\
&\qquad - \sum_{b\in\mathcal{N}} u_{k,nb}^{i}(\hat{\lambda}_{kn,1}^{i}(t)) - p_{kn}^{i}(\hat{\lambda}_{kn,1}^{i}(t))\, e_{kn}(\hat{\lambda}_{kn,1}^{i}(t))\Big]^{+} \\
\hat{\lambda}_{kn,2}^{i}(t+1) &= \hat{\lambda}_{kn,2}^{i}(t) + \eta(t)\Big[p_{k'n}^{i}(\hat{\lambda}_{k'n,2}^{i}(t))\, e_{k'n}(\hat{\lambda}_{k'n,1}^{i}(t)) - \sum_{d\in\mathcal{N}} v_{k,nd}^{i}(\hat{\lambda}_{kn,2}^{i}(t))\Big]^{+}
\end{aligned} \tag{35}$$

where $\eta(t)$ is a proper diminishing stepsize, and the “virtual” allocation $\mathbf{x}^t(\hat{\boldsymbol{\lambda}}^t)$ can be found by solving

$$\mathbf{x}^t(\hat{\boldsymbol{\lambda}}^t) = \arg\min_{\mathbf{x}^t \in \mathcal{F}^t} L^t(\mathbf{x}^t, \hat{\boldsymbol{\lambda}}^t). \tag{36}$$

Algorithm 2 Distributed Learn-and-Adapt NFV Optimization
1: for $t = 1, 2, \ldots$ do
2:   Online processing and routing (1st gradient):
3:   Construct the effective dual variable via (34), observe the current state $\mathbf{s}^t$, and obtain placement, processing and routing decisions $\mathbf{x}^t(\boldsymbol{\gamma}^t)$ by minimizing the online Lagrangian (33).
4:   Update the instantaneous queue lengths $Q(t+1)$ and $q(t+1)$ with $\mathbf{x}^t(\boldsymbol{\gamma}^t)$ via the queue dynamics (3) and (4).
5:   Statistical learning (2nd gradient):
6:   Obtain the variable $\mathbf{x}^t(\hat{\boldsymbol{\lambda}}^t)$ by solving the online Lagrangian minimization with sample $\mathbf{s}^t$ via (36).
7:   Update the empirical dual variable $\hat{\boldsymbol{\lambda}}^{t+1}$ via (35).
8: end for

With learn-and-adapt incorporated, Algorithm 2 adds a learning step to Algorithm 1, i.e., (35), which adopts gradient ascent with diminishing stepsize $\eta(t)$ to find the “best empirical” dual variable from all observed network states. In the transient stage, the extra gradient evaluations and empirical dual variables accelerate the convergence of Algorithm 1; while in the steady stage, the empirical dual variable approaches the optimal multiplier, which significantly reduces the steady-state queue lengths.

Using the learn-and-adapt approach, we are ready to arrive at the following theorem [26, Theorems 2 and 3].

Theorem 3. Suppose that the assumptions in Theorem 1 are satisfied. Then with $\boldsymbol{\gamma}^t$ defined in (34) and $\boldsymbol{\theta} = \mathcal{O}(\sqrt{\epsilon}\log^{2}(\epsilon))$, Algorithm 2 yields a near-optimal solution for (7) in the sense that

$$\Phi^{*} \le \lim_{T\to\infty} \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}\big[\Phi^t\big(\mathbf{x}^t(\boldsymbol{\gamma}^t)\big)\big] \le \Phi^{*} + \mathcal{O}(\epsilon). \tag{37}$$

The long-term average expected queue length satisfies

$$\lim_{T\to\infty} \frac{1}{T} \sum_{t=1}^{T} \sum_{i,k,n} \mathbb{E}\big[Q_{kn}^{i}(t) + q_{kn}^{i}(t)\big] = \mathcal{O}\Big(\frac{\log^{2}(\epsilon)}{\sqrt{\epsilon}}\Big), \tag{38}$$

where $\mathbf{x}^t(\boldsymbol{\gamma}^t)$ denotes the real-time operations obtained from the Lagrangian minimization (33).

Theorem 3 asserts that by setting $\boldsymbol{\theta} = \mathcal{O}(\sqrt{\epsilon}\log^{2}(\epsilon))$, Algorithm 2 is asymptotically $\mathcal{O}(\epsilon)$-optimal with an average queue length of $\mathcal{O}(\log^{2}(\epsilon)/\sqrt{\epsilon})$. This implies that the algorithm is able to achieve a near-optimal cost-delay tradeoff $[\epsilon, \log^{2}(\epsilon)/\sqrt{\epsilon}]$; see [26]. Compared with the standard tradeoff $[\epsilon, 1/\epsilon]$ under Algorithm 1, the learn-and-adapt design of Algorithm 2 remarkably improves the delay performance.
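The two gradients of the learn-and-adapt scheme can be illustrated on a toy problem with one queue. The sketch below is not the NFV algorithm itself. It assumes a hypothetical per-slot Lagrangian $\alpha p^2 + \lambda (R - p)$ with a service rate $p \in [0, p^{\max}]$, so that the minimizer has a closed form similar to (26). It also applies the projection $[\cdot]^{+}$ to the whole updated multiplier, as in standard projected dual ascent, which is an assumption about how (35) is meant to be read. All function names and numeric values are illustrative.

```python
def effective_multiplier(lam_hat, backlog, eps, theta):
    """Effective multiplier as in (34): learnt dual + eps * backlog - bias."""
    return lam_hat + eps * backlog - theta


def service_rate(multiplier, alpha, p_max):
    """Minimizer of the toy Lagrangian alpha * p**2 + multiplier * (arrival - p) on [0, p_max]."""
    return min(max(multiplier / (2.0 * alpha), 0.0), p_max)


def learn_step(lam_hat, arrival, alpha, p_max, step):
    """Toy analogue of (35): ascent along (arrival - virtual rate), projected onto lam >= 0."""
    virtual_rate = service_rate(lam_hat, alpha, p_max)
    return max(lam_hat + step * (arrival - virtual_rate), 0.0)


def learn_and_adapt_slot(lam_hat, backlog, arrival, alpha, p_max, eps, theta, t):
    """One slot in the spirit of Algorithm 2, for slot index t >= 1."""
    # 1st gradient: act on the effective multiplier and update the real queue.
    gamma = effective_multiplier(lam_hat, backlog, eps, theta)
    rate = service_rate(gamma, alpha, p_max)
    new_backlog = max(backlog + arrival - rate, 0.0)
    # 2nd gradient: learn the empirical dual variable with stepsize 1/sqrt(t).
    new_lam_hat = learn_step(lam_hat, arrival, alpha, p_max, step=1.0 / t ** 0.5)
    return new_lam_hat, new_backlog, rate
```

Note that the real queue is driven by the effective multiplier, whereas the empirical dual variable is updated with the virtual rate computed at the learnt multiplier itself, which mirrors steps 3 to 7 of Algorithm 2.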

Fig. 3. Comparison of time-average costs and instantaneous queue lengths, where N = 7, ǫ = 0.1 and average arrival rate is 14 services/sec. [Panels: (a) average cost versus time slot (×10^4); (b) instantaneous queue length versus time slot (×10^4). Curves: Heu, Algorithm 1, Algorithm 2.]

Fig. 5. Comparison of steady-state costs and queue lengths under different price variances, where N = 7, ǫ = 0.1 and average arrival rate is 14 services/sec. [Panels: (a) steady-state cost versus price variance; (b) steady-state queue length versus price variance. Curves: Heu, Algorithm 1, Algorithm 2.]

Fig. 4. Comparison of steady-state costs and queue lengths under different ǫ, where N = 7 and average arrival rate is 14 services/sec. [Panels: (a) steady-state cost versus parameter ǫ; (b) steady-state queue length (×10^4) versus parameter ǫ. Curves: Heu, Algorithm 1, Algorithm 2.]

Fig. 6. Comparison of average processing and routing rates for all network services on all VMs under different price variances, where N = 7, ǫ = 0.1 and average arrival rate is 14 services/sec. [Panels: (a) average processing rate versus price variance; (b) average routing rate versus price variance. Curves: Heu, Algorithm 1, Algorithm 2.]

V. NUMERICAL TESTS

Numerical tests are provided to validate our analytical claims and demonstrate the merits of the proposed algorithms. Two types of network services are considered on the platform with $N = 7$ VMs. The first type of network service is $\{f_1, f_2, f_3\}$ and the second type of network service is $\{f_3, f_1, f_2\}$. Suppose that each service has a size of 1 KB. The processing and routing prices $\alpha_n^t$ and $\beta_{[a,b]}^t$ are uniformly distributed over $[0.1, 1]$ by default; $p_n^{\max}$ and $l_{ab}^{\max}$ are generated from a uniform distribution within $[10, 20]$. The default arrival rate of network services is uniformly distributed with a mean of 14 services/sec. The stepsize is $\eta(t) = 1/\sqrt{t}$, $\forall t$, the tradeoff variable is $\epsilon = 0.1$, and the bias correction vector is chosen as $\boldsymbol{\theta} = 2\sqrt{\epsilon}\log^{2}(\epsilon)$. The algorithms are evaluated in a two-timescale scenario, where the placement of VNFs is carried out every $T_\Delta = 5$ sec. In addition to the proposed Algorithms 1 and 2, we also simulate a heuristic algorithm (Heu) as the benchmark, which decides the placement of VNFs and the processing/routing rates in the same way as Algorithm 1, but with the prices in (23) and (26) replaced by their means. Therefore, the decisions of Heu are made based only on queue differences, with no price considerations.

Fig. 3 compares the three algorithms in terms of the time-average cost and the instantaneous queue length. It can be seen from Fig. 3(a) that the time-average cost of Algorithm 2 converges to a slightly higher value than that of Algorithm 1, while the time-average cost of Heu is about 30% larger. Furthermore, Algorithm 2 exhibits faster convergence than Algorithm 1 and Heu, as its time-average cost quickly reaches the optimal steady-state value by leveraging the learning process. Fig. 3(b) shows that Algorithm 2 incurs the shortest queue lengths among the three algorithms, followed by Algorithm 1. In particular, the aggregated instantaneous queue length of Algorithm 2 is about 76% and 83% smaller than those of Algorithm 1 and Heu, respectively. Clearly, the learn-and-adapt procedure reduces delay without markedly compromising the time-average cost.

Fig. 4 compares the steady-state cost and queue length of the three algorithms under different stepsizes (tradeoff coefficients) $\epsilon$. It is observed that as $\epsilon$ grows, the steady-state costs of all three algorithms increase and the steady-state queue lengths decline. This validates our findings in Theorems 1 and 3.

The steady-state cost and queue length are also compared under different price variances in Figs. 5(a) and (b). Here, processing and routing prices are generated with a mean of 0.55 and a variance from $3.3 \times 10^{-5}$ to $8.3 \times 10^{-2}$. The costs and queue lengths of Algorithms 1 and 2 decrease as the price variance increases, while those of Heu remain unchanged. This is because Heu adopts price-independent processing and routing rates, while Algorithms 1 and 2 are able to minimize the cost by taking advantage of price differences among VMs and links. As further shown in Figs. 6(a) and (b), the average processing and routing rates of Algorithms 1 and 2 rise with the growth of price variance, since the algorithms choose either a lower priced link with a higher routing rate, or a lower priced VM with a higher processing rate.

An interesting finding is that the average backlog of Algorithm 2 is insusceptible to price variances; see Fig. 5(b). This is due to the fact that the algorithm, aiming to reduce the backlog of unfinished network services, achieves this aim by avoiding routing network services among VMs of the same type. This is also evident from Figs. 6(a) and (b), where VNFs are typically processed at the first encountered corresponding VMs, even at higher processing rates, hence reducing routing rates.

Fig. 7 plots the steady-state costs of Algorithms 1 and 2, and Heu, as the network size $N$ (i.e., the number of VMs) increases. It can be observed in Fig. 7 that under the same arrival rate of services, the costs decline as the network becomes large. This is due to the increased connectivity of each VM, which helps increase the diversity of choosing cost-effective routing links and neighboring VMs with low prices and, in turn, reduce the costs. The costs increase as the average arrival rate of services increases, since more resources are required to accommodate the increased arrivals.

Fig. 7. Comparison of steady-state costs under different network sizes, where ǫ = 0.1. [Steady-state cost versus network size, for arrival rates of 14 and 28 services/sec. Curves: Heu, Algorithm 1, Algorithm 2.]

In Fig. 8, we plot the steady-state queue lengths of Algorithms 1 and 2, and Heu, as the network size grows. We can see that Algorithms 1 and 2 are able to reduce the queue length of the network, as compared to Heu. The reduction of Algorithm 1 is increasingly large, especially when the arrival rate of services is large. In addition, the queue length of Algorithm 2 under the arrival rate of 28 services/sec can become lower than that under the arrival rate of 14 services/sec, as the network becomes large. We can conclude that the gain of Algorithm 2 diminishes as the network size grows with a relatively light arrival rate of services. Nevertheless, Algorithm 2 is more efficient under heavier traffic arrivals in a large network.

Fig. 8. Comparison of steady-state queue lengths under different network sizes, where ǫ = 0.1. [Steady-state queue length versus network size, for arrival rates of 14 and 28 services/sec. Curves: Heu, Algorithm 1, Algorithm 2.]

VI. CONCLUSIONS

In this paper, a new distributed online optimization was developed to minimize the time-average cost of NFV, while stabilizing the function queues of VMs. Asymptotically optimal decisions on the placement of VNFs, and the processing and routing of network services were instantly generated at individual VMs, adapting to the topology and stochasticity of the network. A learn-and-adapt approach was further proposed to speed up stabilizing the VMs and achieve a cost-delay tradeoff $[\epsilon, \log^{2}(\epsilon)/\sqrt{\epsilon}]$. Numerical results show that the proposed method is able to reduce the time-average cost of NFV by 30% and reduce the queue length by 83%.

APPENDIX

A. Proof of (18a) in Theorem 1

Proof. From the recursions in (3), we have

$$(Q_{kn}^{i}(t+1))^{2} = \Big[Q_{kn}^{i}(t) + \sum_{a\in\mathcal{N}} u_{k,an}^{i}(t) + \sum_{c\in\mathcal{N}} v_{k,cn}^{i}(t) + R_{kn}^{i,t} - \sum_{b\in\mathcal{N}} u_{k,nb}^{i}(t) - p_{kn}^{i}(t)\, e_{kn}(t)\Big]^{2}$$

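$$\le (Q_{kn}^{i}(t))^{2} + 2Q_{kn}^{i}(t)\Big[\sum_{a\in\mathcal{N}} u_{k,an}^{i}(t) + \sum_{c\in\mathcal{N}} v_{k,cn}^{i}(t) + R_{kn}^{i,t} - \sum_{b\in\mathcal{N}} u_{k,nb}^{i}(t) - p_{kn}^{i}(t)\, e_{kn}(t)\Big] + \underbrace{8(N^{\max} l^{\max})^{2} + 3(R^{\max})^{2} + 2(p^{\max})^{2}}_{2B_1}$$

where $N^{\max}$ is the maximum degree of VMs, $l^{\max} = \max_{[a,b]} l_{ab}^{\max}$ and $p^{\max} = \max_n p_n^{\max}$. Similarly, we also have

$$\begin{aligned}
(q_{kn}^{i}(t+1))^{2} &= \Big[q_{kn}^{i}(t) + p_{k'n}^{i}(t)\, e_{k'n}(t) - \sum_{d\in\mathcal{N}} v_{k,nd}^{i}(t)\Big]^{2} \\
&\le (q_{kn}^{i}(t))^{2} + 2q_{kn}^{i}(t)\Big[p_{k'n}^{i}(t)\, e_{k'n}(t) - \sum_{d\in\mathcal{N}} v_{k,nd}^{i}(t)\Big] + \underbrace{(N^{\max} l^{\max})^{2} + (p^{\max})^{2}}_{2B_2}.
\end{aligned}$$

Considering now the Lyapunov function $\Upsilon(t) := \frac{1}{2}\big[\sum_{i,k,n} (Q_{kn}^{i}(t))^{2} + \sum_{i,k,n} (q_{kn}^{i}(t))^{2}\big]$, it readily follows that

$$\begin{aligned}
\Delta\Upsilon(t) := \Upsilon(t+1) - \Upsilon(t) \le{} & \sum_{i,k,n} \Big\{Q_{kn}^{i}(t)\Big[\sum_{a\in\mathcal{N}} u_{k,an}^{i}(t) + \sum_{c\in\mathcal{N}} v_{k,cn}^{i}(t) + R_{kn}^{i,t} - \sum_{b\in\mathcal{N}} u_{k,nb}^{i}(t) - p_{kn}^{i}(t)\, e_{kn}(t)\Big]\Big\} \\
& + \sum_{i,k,n} \Big\{q_{kn}^{i}(t)\Big[p_{k'n}^{i}(t)\, e_{k'n}(t) - \sum_{d\in\mathcal{N}} v_{k,nd}^{i}(t)\Big]\Big\} + B,
\end{aligned}$$

where $B := B_1 + B_2$ is a constant. Taking expectations and adding $\frac{1}{\epsilon}\mathbb{E}[\Phi^t(\mathbf{x}^t)]$ ($\mathbf{x}^t$ is the optimal policy obtained by solving (16)) to both sides, we arrive at

$$\begin{aligned}
\mathbb{E}[\Delta\Upsilon(t)] + \frac{1}{\epsilon}\mathbb{E}[\Phi^t(\mathbf{x}^t)] \le{} & B + \frac{1}{\epsilon}\mathbb{E}\Big\{\Phi^t(\mathbf{x}^t) + \epsilon \sum_{n,i} \Big[Q_{kn}^{i}(t)\Big(\sum_{a\in\mathcal{N}} u_{k,an}^{i}(t) + \sum_{c\in\mathcal{N}} v_{k,cn}^{i}(t) + R_{kn}^{i,t} - \sum_{b\in\mathcal{N}} u_{k,nb}^{i}(t) - p_{kn}^{i}(t)\, e_{kn}(t)\Big)\Big] \\
& + \epsilon \sum_{n,i} \Big[q_{kn}^{i}(t)\Big(p_{k'n}^{i}(t)\, e_{k'n}(t) - \sum_{d\in\mathcal{N}} v_{k,nd}^{i}(t)\Big)\Big]\Big\} \\
={} & B + \frac{1}{\epsilon} L\big(\mathcal{X}(\epsilon\mathbf{A}(t)), \epsilon\mathbf{A}(t)\big) \\
={} & B + \frac{1}{\epsilon} D(\epsilon\mathbf{A}(t)) \le B + \frac{1}{\epsilon}\tilde{\Phi}^{*}
\end{aligned}$$

where we use the definition of $L(\mathcal{X}, \boldsymbol{\lambda})$ in (11); $\mathcal{X}(\epsilon\mathbf{A}(t))$ denotes the optimal primal variable set given by (16) for $\boldsymbol{\lambda} = \epsilon\mathbf{A}(t)$ (hence, $L(\mathcal{X}(\epsilon\mathbf{A}(t)), \epsilon\mathbf{A}(t)) = D(\epsilon\mathbf{A}(t))$); $\tilde{\Phi}^{*}$ denotes the optimal value for problem (9); and the last inequality is due to weak duality: $D(\boldsymbol{\lambda}) \le \tilde{\Phi}^{*}$, $\forall \boldsymbol{\lambda}$.

Summing over all $t$, we then have

$$\sum_{t=0}^{T-1} \mathbb{E}[\Delta\Upsilon(t)] + \frac{1}{\epsilon}\sum_{t=0}^{T-1} \mathbb{E}[\Phi^t(\mathbf{x}^t)] = \mathbb{E}[\Upsilon(T)] - \Upsilon(0) + \frac{1}{\epsilon}\sum_{t=0}^{T-1} \mathbb{E}[\Phi^t(\mathbf{x}^t)] \le T\Big(B + \frac{1}{\epsilon}\tilde{\Phi}^{*}\Big)$$

which leads to

$$\frac{1}{T}\sum_{t=0}^{T-1} \mathbb{E}[\Phi^t(\mathbf{x}^t)] \le \tilde{\Phi}^{*} + \epsilon\Big(B + \frac{\Upsilon(0)}{T}\Big) \le \Phi^{*} + \epsilon\Big(B + \frac{\Upsilon(0)}{T}\Big).$$

(18a) follows by taking $T \to \infty$.

B. Proof of (18b) in Theorem 1

Assume that there exists a stationary policy $\mathcal{X}$, under which $\mathbb{E}\big[\sum_{a\in\mathcal{N}} u_{k,an}^{i}(t) + \sum_{c\in\mathcal{N}} v_{k,cn}^{i}(t) + R_{kn}^{i,t} - \sum_{b\in\mathcal{N}} u_{k,nb}^{i}(t) - p_{kn}^{i}(t)\, e_{kn}(t)\big] \le -\zeta$, and $\mathbb{E}\big[p_{k'n}^{i}(t)\, e_{k'n}(t) - \sum_{d\in\mathcal{N}} v_{k,nd}^{i}(t)\big] \le -\zeta$, where $\zeta > 0$ is a slack vector constant. We then have the following lemma.

Lemma 2. If the random state $\mathbf{s}^t$ is i.i.d., there exists a stationary control policy $\mathcal{P}^{\text{stat}}$, which is a pure (possibly randomized) function of the realization of $\mathbf{s}^t$, satisfying (1) and (2), and providing the following guarantees per $t$:

$$\begin{aligned}
& \mathbb{E}[\Phi^{\text{stat}}(\mathbf{x}^t)] = \tilde{\Phi}^{*}, \\
& \mathbb{E}\Big[\sum_{a\in\mathcal{N}} u_{k,an}^{i,\text{stat}}(t) + \sum_{c\in\mathcal{N}} v_{k,cn}^{i,\text{stat}}(t) + R_{kn}^{i,t} - \sum_{b\in\mathcal{N}} u_{k,nb}^{i,\text{stat}}(t) - p_{kn}^{i,\text{stat}}(t)\, e_{kn}^{\text{stat}}(t)\Big] \le -\zeta, \\
& \mathbb{E}\Big[p_{k'n}^{i,\text{stat}}(t)\, e_{k'n}^{\text{stat}}(t) - \sum_{d\in\mathcal{N}} v_{k,nd}^{i,\text{stat}}(t)\Big] \le -\zeta,
\end{aligned} \tag{39}$$

where $\Phi^{\text{stat}}(\mathbf{x}^t)$ denotes the resultant cost, $\{e_{kn}^{\text{stat}}(t), p_{kn}^{i,\text{stat}}(t), u_{k,ab}^{i,\text{stat}}(t), v_{k,ab}^{i,\text{stat}}(t), \forall [a,b], i, k, n\}$ denote the routing and processing rates under policy $\mathcal{P}^{\text{stat}}$, and expectations are taken over the randomization of $\mathbf{s}^t$ and (possibly) $\mathcal{P}^{\text{stat}}$.

Proof. The proof argument is similar to that in [24, Theorem 4.5]; hence, it is omitted for brevity.

It is worth noting that (39) not only assures that the stationary control policy $\mathcal{P}^{\text{stat}}$ achieves the optimal cost for (9), but also guarantees that the resultant expected cost per slot $t$ is equal to the optimal time-averaged cost (due to the stationarity of $\mathbf{s}^t$ and $\mathcal{P}^{\text{stat}}$).

Now from (27) we have

$$\begin{aligned}
\mathbb{E}[\Delta\Upsilon(t)] + \frac{1}{\epsilon}\mathbb{E}[\Phi^t(\mathbf{x}^t)] \le{} & B + \frac{1}{\epsilon}\mathbb{E}\Big\{\Phi^{\text{stat}}(\mathbf{x}^t) + \epsilon \sum_{n,i} \Big[Q_{kn}^{i}(t)\Big(\sum_{a\in\mathcal{N}} u_{k,an}^{i,\text{stat}}(t) + \sum_{c\in\mathcal{N}} v_{k,cn}^{i,\text{stat}}(t) + R_{kn}^{i,t} - \sum_{b\in\mathcal{N}} u_{k,nb}^{i,\text{stat}}(t) - p_{kn}^{i,\text{stat}}(t)\, e_{kn}^{\text{stat}}(t)\Big)\Big] \\
& + \epsilon \sum_{i,k,n} \Big[q_{kn}^{i}(t)\Big(p_{k'n}^{i,\text{stat}}(t)\, e_{k'n}^{\text{stat}}(t) - \sum_{d\in\mathcal{N}} v_{k,nd}^{i,\text{stat}}(t)\Big)\Big]\Big\} \\
\le{} & B + \frac{1}{\epsilon}\Phi^{*} - \zeta \sum_{i,k,n} \mathbb{E}\big[Q_{kn}^{i}(t) + q_{kn}^{i}(t)\big],
\end{aligned} \tag{40}$$

where the first inequality holds since Algorithm 1 minimizes the instantaneous Lagrangian $L^t$ in (12) among all policies satisfying (1) and (2), including $\mathcal{P}^{\text{stat}}$; and the last inequality is due to Lemma 2.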

Summing over all $t$, we then have

$$\begin{aligned}
\sum_{t=0}^{T-1} \mathbb{E}[\Delta\Upsilon(t)] + \frac{1}{\epsilon}\sum_{t=0}^{T-1} \mathbb{E}[\Phi^t(\mathbf{x}^t)] &= \mathbb{E}[\Upsilon(T)] - \Upsilon(0) + \frac{1}{\epsilon}\sum_{t=0}^{T-1} \mathbb{E}[\Phi^t(\mathbf{x}^t)] \\
&\le T\Big(B + \frac{\Phi^{*}}{\epsilon}\Big) - \zeta \sum_{t=0}^{T-1} \sum_{i,k,n} \mathbb{E}\big[Q_{kn}^{i}(t) + q_{kn}^{i}(t)\big]
\end{aligned}$$

which leads to

$$\frac{1}{T}\sum_{t=0}^{T-1} \sum_{i,k,n} \mathbb{E}\big[Q_{kn}^{i}(t) + q_{kn}^{i}(t)\big] \le \frac{1}{\zeta}\Big(B + \frac{\Phi^{*}}{\epsilon}\Big) + \frac{\Upsilon(0)}{\zeta T}. \tag{41}$$

(18b) follows by taking $T \to \infty$.

C. Proof of Theorem 2

From (21a), we can get

$$\begin{aligned}
f_1(\hat{\mathbf{e}}^t, \hat{\mathbf{p}}^t \mid \mathbf{A}) &= \Big[\frac{\alpha_n^t}{\epsilon}(\hat{p}_{kn}^{i}(t))^{2} - \big(Q_{kn}^{i}(t) - q_{k''n}^{i}(t)\big)\hat{p}_{kn}^{i}(t)\Big]\hat{e}_{kn}(t) \\
&= \Big[\frac{\alpha_n^t}{\epsilon}(\hat{p}_{kn}^{i}(t))^{2} - \big(\hat{Q}_{kn}^{i}(t) - \hat{q}_{k''n}^{i}(t)\big)\hat{p}_{kn}^{i}(t)\Big]\hat{e}_{kn}(t) \\
&\quad + \big[(\hat{Q}_{kn}^{i}(t) - Q_{kn}^{i}(t)) - (\hat{q}_{k''n}^{i}(t) - q_{k''n}^{i}(t))\big]\hat{p}_{kn}^{i}(t)\,\hat{e}_{kn}(t).
\end{aligned} \tag{42}$$

Recalling that $\{\hat{\mathbf{e}}^t, \hat{\mathbf{p}}^t\}$ is the optimal solution under the approximate $\hat{\mathbf{A}}$, we have

$$\begin{aligned}
f_1(\hat{\mathbf{e}}^t, \hat{\mathbf{p}}^t \mid \hat{\mathbf{A}}) &= \Big[\frac{\alpha_n^t}{\epsilon}(\hat{p}_{kn}^{i}(t))^{2} - \big(\hat{Q}_{kn}^{i}(t) - \hat{q}_{k''n}^{i}(t)\big)\hat{p}_{kn}^{i}(t)\Big]\hat{e}_{kn}(t) \\
&\le f_1(\mathbf{e}^t, \mathbf{p}^t \mid \hat{\mathbf{A}}) \\
&= \Big[\frac{\alpha_n^t}{\epsilon}(p_{kn}^{i}(t))^{2} - \big(\hat{Q}_{kn}^{i}(t) - \hat{q}_{k''n}^{i}(t)\big)p_{kn}^{i}(t)\Big]e_{kn}(t) \\
&= \Big[\frac{\alpha_n^t}{\epsilon}(p_{kn}^{i}(t))^{2} - \big(Q_{kn}^{i}(t) - q_{k''n}^{i}(t)\big)p_{kn}^{i}(t)\Big]e_{kn}(t) \\
&\quad + \big[(Q_{kn}^{i}(t) - \hat{Q}_{kn}^{i}(t)) - (q_{k''n}^{i}(t) - \hat{q}_{k''n}^{i}(t))\big]p_{kn}^{i}(t)\, e_{kn}(t) \\
&= f_1(\mathbf{e}^t, \mathbf{p}^t \mid \mathbf{A}) + \big[(Q_{kn}^{i}(t) - \hat{Q}_{kn}^{i}(t)) - (q_{k''n}^{i}(t) - \hat{q}_{k''n}^{i}(t))\big]p_{kn}^{i}(t)\, e_{kn}(t).
\end{aligned} \tag{43}$$

Combining (42) and (43), we have

$$\begin{aligned}
\big|f_1(\hat{\mathbf{e}}^t, \hat{\mathbf{p}}^t \mid \mathbf{A}) - f_1(\mathbf{e}^t, \mathbf{p}^t \mid \mathbf{A})\big| &\le \big|\big[(\hat{Q}_{kn}^{i}(t) - Q_{kn}^{i}(t)) - (\hat{q}_{k''n}^{i}(t) - q_{k''n}^{i}(t))\big]\big[\hat{p}_{kn}^{i}(t)\,\hat{e}_{kn}(t) - p_{kn}^{i}(t)\, e_{kn}(t)\big]\big| \\
&\le \big[|\hat{Q}_{kn}^{i}(t) - Q_{kn}^{i}(t)| + |\hat{q}_{k''n}^{i}(t) - q_{k''n}^{i}(t)|\big] \cdot \frac{\epsilon}{2\alpha_n^t}\big[|\hat{Q}_{kn}^{i}(t) - Q_{kn}^{i}(t)| + |\hat{q}_{k''n}^{i}(t) - q_{k''n}^{i}(t)|\big] \\
&\le \frac{\epsilon}{2\alpha_n^t}(T_\Delta\omega_Q + T_\Delta\omega_q)^{2}.
\end{aligned} \tag{44a}$$

The second inequality holds due to (23) and the inequality $|a - b| \le |a| + |b|$. Likewise, we can get the optimality loss of (21b) and (21c), as given by

$$|f_2(\hat{\mathbf{u}}^t) - f_2(\mathbf{u}^t)| \le \frac{2\epsilon}{\beta_{[n,b]}^t}(T_\Delta\omega_Q)^{2}, \tag{44b}$$
$$|f_3(\hat{\mathbf{v}}^t) - f_3(\mathbf{v}^t)| \le \frac{\epsilon}{2\beta_{[n,b]}^t}(T_\Delta\omega_Q + T_\Delta\omega_q)^{2}. \tag{44c}$$

Adding up (44), we prove the theorem.

REFERENCES

[1] X. Chen, W. Ni, T. Chen, I. B. Collings, X. Wang, R. P. Liu, and G. B. Giannakis, “Distributed stochastic optimization of network function virtualization,” in Proc. IEEE GLOBECOM, Singapore, Dec. 2017.
[2] Y. Li and M. Chen, “Software-defined network function virtualization: A survey,” IEEE Access, vol. 3, pp. 2542–2553, Dec. 2015.
[3] B. Han, V. Gopalakrishnan, L. Ji, and S. Lee, “Network function virtualization: Challenges and opportunities for innovations,” IEEE Commun. Mag., vol. 53, no. 2, pp. 90–97, Feb. 2015.
[4] R. Mijumbi, J. Serrat, J. L. Gorricho, N. Bouten, F. De Turck, and R. Boutaba, “Network function virtualization: State-of-the-art and research challenges,” IEEE Commun. Surveys Tuts., vol. 18, no. 1, pp. 236–262, 2016.
[5] V. Eramo, E. Miucci, M. Ammar, and F. G. Lavacca, “An approach for service function chain routing and virtual function network instance migration in network function virtualization architectures,” IEEE/ACM Trans. Netw., pp. 1–18, Mar. 2017.
[6] R. Riggio, A. Bradai, D. Harutyunyan, and T. Rasheed, “Scheduling wireless virtual networks functions,” IEEE Trans. Netw. Service Manag., vol. 13, no. 2, pp. 240–252, June 2016.
[7] L. Mashayekhy, M. M. Nejad, D. Grosu, and A. V. Vasilakos, “An online mechanism for resource allocation and pricing in clouds,” IEEE Trans. Comput., vol. 65, no. 4, pp. 1172–1184, Apr. 2016.
[8] W. Chen, I. Paik, and Z. Li, “Cost-aware streaming workflow allocation on geo-distributed data centers,” IEEE Trans. Comput., vol. 66, no. 2, pp. 256–271, Feb. 2017.
[9] L. Qu, C. Assi, and K. Shaban, “Delay-aware scheduling and resource optimization with network function virtualization,” IEEE Trans. Commun., vol. 64, no. 9, pp. 3746–3758, Sept. 2016.
[10] F. Z. Yousaf, P. Loureiro, F. Zdarsky, T. Taleb, and M. Liebsch, “Cost analysis of initial deployment strategies for virtualized mobile core network functions,” IEEE Commun. Mag., vol. 53, no. 12, pp. 60–66, Dec. 2015.
[11] R. Cohen, L. Lewin-Eytan, J. S. Naor, and D. Raz, “Near optimal placement of virtual network functions,” in Computer Communications, 2015, pp. 1346–1354.
[12] B. Addis, D. Belabed, M. Bouet, and S. Secci, “Virtual network functions placement and routing optimization,” in Proc. IEEE CloudNet, Canada, 5–7 Oct. 2015.
[13] R. Mijumbi, J. Serrat, J. L. Gorricho, N. Bouten, F. De Turck, and S. Davy, “Design and evaluation of algorithms for mapping and scheduling of virtual network functions,” in Proc. IEEE Conf. Netw. Softwarization (NetSoft), London, UK, 13–17 Apr. 2015.
[14] X. Chen, W. Ni, T. Chen, I. B. Collings, X. Wang, and G. B. Giannakis, “Real-time energy trading and future planning for fifth-generation wireless communications,” IEEE Wireless Commun., vol. 24, no. 4, pp. 24–30, Aug. 2017.
[15] X. Wang, Y. Zhang, T. Chen, and G. B. Giannakis, “Dynamic energy management for smart-grid-powered coordinated multipoint systems,” IEEE J. Sel. Areas Commun., vol. 34, no. 5, pp. 1348–1359, May 2016.
[16] X. Wang, X. Chen, T. Chen, L. Huang, and G. B. Giannakis, “Two-scale stochastic control for integrated multipoint communication systems with renewables,” IEEE Trans. Smart Grid, vol. PP, no. 99, pp. 1–1, 2016.
[17] X. Wang, T. Chen, X. Chen, X. Zhou, and G. B. Giannakis, “Dynamic resource allocation for smart-grid powered MIMO downlink transmissions,” IEEE J. Sel. Areas Commun., vol. 34, no. 12, pp. 3354–3365, Dec. 2016.
[18] Y. Yao, L. Huang, A. B. Sharma, L. Golubchik, and M. J. Neely, “Power cost reduction in distributed data centers: A two-time-scale approach for delay tolerant workloads,” IEEE Trans. Parallel Distrib. Syst., vol. 25, no. 1, pp. 200–211, Jan. 2013.

[19] S. Sun, M. Dong, and B. Liang, “Distributed real-time power balancing in renewable-integrated power grids with storage and flexible loads,” IEEE Trans. Smart Grid, vol. 7, no. 5, pp. 2337–2349, Sept. 2016.
[20] R. Alihemmati, M. Dong, B. Liang, G. Boudreau, and S. Seyedmehdi, “Multi-channel resource allocation towards ergodic rate maximization for underlay device-to-device communication,” IEEE Trans. Wireless Commun., vol. 17, no. 2, pp. 1011–1025, Feb. 2018.
[21] M. J. Neely, “Optimal backpressure routing for wireless networks with multi-receiver diversity,” Ad Hoc Networks, vol. 7, no. 5, pp. 862–881, 2009.
[22] L. Huang and M. J. Neely, “The optimality of two prices: Maximizing revenue in a stochastic communication system,” IEEE/ACM Trans. Netw., vol. 18, no. 2, pp. 406–419, Apr. 2010.
[23] D. Huang, P. Wang, and D. Niyato, “A dynamic offloading algorithm for mobile computing,” IEEE Trans. Wireless Commun., vol. 11, no. 6, pp. 1991–1995, June 2012.
[24] M. J. Neely, “Stochastic network optimization with application to communication and queueing systems,” Synthesis Lectures on Communication Networks, vol. 3, no. 1, pp. 1–211, 2010.
[25] Q. Zhu and R. Boutaba, “Nonlinear quadratic pricing for concavifiable utilities in network rate control,” in Proc. IEEE GLOBECOM, New Orleans, LA, Dec. 2008.
[26] T. Chen, Q. Ling, and G. B. Giannakis, “Learn-and-adapt stochastic dual gradients for network resource allocation,” IEEE Trans. Contr. Netw. Syst., Available online: https://arxiv.org/pdf/1703.01673v1.pdf, 2017.
[27] D. M. Himmelblau, Applied nonlinear programming. McGraw-Hill Companies, 1972.