Latency-Driven Cooperative Task Computing in Multi-User Fog-Radio Access Networks

Ai-Chun Pang$^{1,2,3}$, Wei-Ho Chung$^{2}$, Te-Chuan Chiu$^{1}$, and Junshan Zhang$^{4}$
$^{1}$Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan
$^{2}$Research Center for Information Technology Innovation, Academia Sinica, Taipei, Taiwan
$^{3}$Graduate Institute of Networking and Multimedia, National Taiwan University, Taipei, Taiwan
$^{4}$School of Electrical, Computer and Energy Engineering, Tempe, AZ 85287, USA

Abstract: Fog computing is emerging as a promising solution to meet the increasing demand for ultra-low latency services in wireless networks. Taking a forward-looking perspective, we propose a Fog-Radio Access Network (F-RAN) model, which utilizes the existing infrastructure, e.g., small cells and macro base stations, to achieve ultra-low latency by joint computing across multiple F-RAN nodes and near-range communications at the edge. We treat the low latency design as an optimization problem, which characterizes the tradeoff between communication and computing across multiple F-RAN nodes. Since this problem is NP-hard, we propose a latency-driven cooperative task computing algorithm with a one-for-all concept for simultaneous selection of the F-RAN nodes to serve, with proper heterogeneous resource allocation for multi-user services. Considering the limited heterogeneous resources shared among all users, we advocate the one-for-all strategy, in which every user takes the others' situation into consideration and seeks a "win-win" solution. The numerical results show that low latency services can be achieved by F-RAN via latency-driven cooperative task computing.

Index Terms: Fifth-generation (5G) cellular networks, Fog computing, Ultra-low latency.

I. INTRODUCTION

Recently, the 5G wireless technology for the next generation cellular system has garnered much attention. It aims at fulfilling the requirements of massive machine-type communications, enhanced mobile broadband, and ultra-reliable and low latency communications. Specifically, many applications (such as augmented/virtual reality and vehicle automation) are demanding in terms of high bandwidth and low latency. These applications need intensive computations to accomplish object tracking, content analytics and intelligent decision making for better accuracy, performance and user experience. Cloud computing can utilize abundant computing resources for handling complex tasks, but one significant challenge therein is to achieve ultra-low latency, due to the possibly large network delay incurred when time-sensitive data traffic traverses the Internet backbone [1].

To tackle these challenges, a new paradigm, called fog computing, is emerging. It is an architecture that extends cloud computing to the edge of the network so that ultra-low latency can be achieved at the edge [2]. Indeed, there have been recent efforts by certain academic/industry projects and standardization activities, e.g., Cloudlet [3], Mobile edge computing [4], FP7 European Project (TROPIC) [5], and OpenFog Consortium [6]. In our recent work [7], we have proposed the Fog-Radio Access Network (F-RAN), which leverages the resources of the current infrastructures in the radio access network (RAN), such as base stations and small cells, to promptly respond to low latency requests from mobile devices. In the F-RAN, the F-RAN nodes handle wireless connectivity as well as application service provisioning, which creates a potential new business model for telecommunication operators to cooperate with application/service providers.

There are a number of critical challenges in the Fog system [8]–[13]. Ottenwalder et al. propose a placement and migration scheme to guarantee end-to-end latency and reduce the network overhead by making migration decisions earlier in an environment where cloud and fog coexist [9]. Sardellitti et al. design a computation offloading algorithm to minimize the overall users' energy consumption by shifting the workloads to the remote powerful cloud server [10]. Deng et al. consider the cooperation between the cloud and fog, and tackle the workload allocation problem to minimize the power consumption of the cloud server [11]. Intharawijitr et al. propose to minimize the blocking probability ratio among all requested workloads in the entire system by analyzing feasible selection policies of assisting fog nodes [12]. Nishio et al. also propose a framework for heterogeneous resource sharing by taking all heterogeneous resources such as CPUs, communication bandwidth, and storage into consideration from the perspective of "time" [13]. However, the above related studies consider the hybrid cloud and fog scenario only.

In our prior work [14], F-RAN is proposed to achieve ultra-low latency by joint computing across multiple F-RAN nodes and near-range communications among F-RAN nodes. With high-bandwidth wireless access such as millimeter wave, the large amount of application data delivered from one F-RAN node to another does not need to traverse backhaul links, leading to a significant reduction in network latency. By distributing computing-intensive tasks to multiple F-RAN nodes, the computing latency can be substantially reduced, which nevertheless comes at the cost of communication delay. Intuitively, the more F-RAN nodes are selected for the computing task, the smaller the computing latency, but the larger the communication delay would be. The joint consideration of distributed computing and wireless networking naturally gives rise to the computing and communication tradeoff. It is worth noting that [14] considers the single-user scenario only and presents some preliminary results on F-RAN cooperative computing.

In this paper, we turn our attention to a multi-user F-RAN, where the computing and communications resources are inherently heterogeneous, making it challenging to quantify the tradeoff therein. To achieve ultra-low latency in such a scenario, we propose to consider the framework where multiple F-RAN nodes jointly execute distributed computing after receiving the assigned computing tasks from one coordinator, called the master F-RAN node, which communicates with each F-RAN node wirelessly. This architecture targets a teamwork scenario in which there is a joint computing task and every cooperative F-RAN node is responsible for a subtask. In this way, the master F-RAN node should intelligently decide which F-RAN nodes to select, considering the limited computing power and communication resources of each F-RAN node. Specifically, more cooperative F-RAN nodes provide higher computing power and hence reduce the total computing latency. However, each cooperative F-RAN node then obtains fewer radio resources from the master F-RAN node, and as a result the total communication latency increases. Therefore, one main issue of cooperative task computing is how to strike a good balance between computing power and communication resources, both of which contribute to the total service latency.

Moreover, our target F-RAN scenario aims to serve multiple users simultaneously, which requires heterogeneous resource allocation among all users. The latency-driven cooperative task computing problem is first cast as an optimization problem, and an algorithm based on dynamic programming, namely CTC-DP, is proposed for cooperative task computing in the special case with a single user. Next, we design a heuristic algorithm, CTC-All, which combines the CTC-DP approach with the "one-for-all" concept to provide an approximate solution for both heterogeneous resource allocation and cooperative task computing in the general case with multiple users. In the multi-user CTC-All algorithm, the communication and computing resources for each user are pre-allocated by heterogeneous resource allocation, and then the single-user CTC-DP with the "one-for-all" concept is applied to solve cooperative task computing among all users based on the assigned heterogeneous resources. Since the total service latency is decided by the bottleneck of the last user finishing his/her cooperative task computing, every user should be considerate of the others and seek a "win-win" solution, which is the strategy of "one-for-all". We conduct a series of experiments, based on practical parameter settings, to evaluate the proposed algorithm in comparison with four baseline approaches. The simulation results show that the proposed scheme can significantly reduce the total service latency of the cooperative task computing operation and properly deal with heterogeneous resource allocation while handling the tradeoff between communication resource allocation and computing task assignment in the time domain.

The remainder of this paper is organized as follows. Section II presents the system model and the formal formulation of the optimization problem. In Section III, we show that the problem is NP-hard and propose an efficient algorithm for the special/general case with evaluated time complexity. Simulation results and useful insights are discussed in Section IV. Section V concludes this work.

II. SYSTEM MODEL AND PROBLEM FORMULATION FOR LATENCY MINIMIZATION

A. System Model

We consider a scenario with densely deployed F-RAN nodes to serve ultra-low latency and computing-intensive services, e.g., Augmented Reality (AR). Since a single F-RAN node has only limited computing power, and often requires a longer time to complete extensive computing tasks, one potential solution is to execute the tasks via distributed computing by multiple F-RAN nodes. With this motivation, we propose to utilize multiple F-RAN nodes to accelerate joint data processing and transmission for ultra-low temporal latency.

In the scenario of multiple F-RAN nodes as shown in Fig. 1, the target users first send their data to the closest F-RAN node, also known as the master F-RAN node, which coordinates with other F-RAN nodes. The master F-RAN node decides which F-RAN nodes are selected for service provision and assigns the individual processing data/computing tasks. Upon task completion on all F-RAN nodes, the master F-RAN node collects, unifies, and sends back the outcomes to the target users. Finally, the target users execute the applications in their end-devices within ultra-low latency. Compared with the input data size for each F-RAN node, the output data size is smaller and its transmit time can be omitted. Consequently, our first priority is to deal with the most time-consuming part, in which all computing-intensive and dividable tasks are distributively conducted among multiple F-RAN nodes. Specifically, the total data are split into different fragments and are transmitted to different F-RAN nodes through wireless transmissions. Thus the total service latency consists of two main parts: the communication delay and the computing delay.

Fig. 1. Scenario of ultra-low latency service with F-RAN (figure labels: IoT Device, Master F-RAN Node, F-RAN Node).

In this paper, the design goal is to pursue ultra-low service latency in completing the cooperative computing task among multiple users, including the transmission delay from the master F-RAN node to each associated F-RAN node and the computing delay for each associated F-RAN node. Due to the distributed architecture, the total service latency of cooperative task computing for a single user is dominated by the longest service time, i.e., that of the last F-RAN node to complete its assigned computing task. Besides, the master F-RAN node should deal with multiple users' service requests simultaneously, and the total service latency of cooperative task computing for multiple users is dominated by the longest service time, i.e., that of the final user to have his/her service completed by multiple F-RAN nodes. Since each F-RAN node may join different users' cooperative task computing at the same time, we propose a conceptual unit, i.e., the "computing resource unit", to quantify each F-RAN node's computing power such that each F-RAN node can decide the amount of effort dedicated to different users, i.e., the computing resource allocation.

Therefore, the master F-RAN node needs to recruit a suitable combination of F-RAN nodes with consideration of all possible radio resource allocations, processing data distributions, computing resource allocations and computing task assignments. In fact, there exists a tradeoff between communication and computing delay. Pursuing the min-max total service latency of the cooperative task computing among multiple users is an interesting and non-trivial challenge, which is the major focus of this paper. The system model under consideration is formulated as follows.

B. Problem Formulation

In this paper, we study joint heterogeneous resource allocation and cooperative task computing for ultra-low latency service provision in Fog-Radio Access Networks. The objective is to minimize the total service latency among multiple users via offloading dividable computing tasks to multiple F-RAN nodes, while meeting the data requirements of each user and the availability constraints in communication/computing resources of each F-RAN node. For brevity, we omit "$\forall$" wherever no confusion arises.

In a network, the set of users in the service area is denoted as $N$. Each user $n$ first sends his/her latency service request to the closest serving F-RAN node, which by default is its master F-RAN node. Then, user $n$ sends its processing data, represented as $D^n$, which is to be transformed into total dividable computing tasks (e.g., in units of CPU instructions), denoted as $C^n$, and to be executed by multiple F-RAN nodes. From the viewpoint of the master F-RAN node, the set of F-RAN nodes in the coverage area is denoted as $F$, and the master F-RAN node has at most $\delta$ radio resource blocks. When the master F-RAN node is associated with F-RAN node $f$, the master F-RAN node always adopts the highest-rate achievable modulation-coding scheme that F-RAN node $f$ can receive, depending on the signal-to-noise ratio; thus, a radio resource block can provide data rate $\gamma_f$ for F-RAN node $f$. As for F-RAN node $f$, its total computing power is measured by the total number of computing resource units, represented as $\theta_f$, whose computing ability $\rho_f$ represents the number of instructions per second per single computing resource unit.

For cooperative task computing of user $n$, the master F-RAN node decides whether F-RAN node $f$ is selected via an indicator function $I_f^n$, i.e., one if selected and zero if not selected for serving user $n$. Besides, the master F-RAN node also decides 1) the number of radio resource blocks $\delta_f^n$ allocated to each associated F-RAN node $f$ and the amount of delivered processing data $D_f^n$ for F-RAN node $f$; 2) the number of computing resource units $\theta_f^n$ from each joined F-RAN node $f$ allocated to each user $n$ and the amount of assigned computing tasks $C_f^n$ for F-RAN node $f$. The output for a set of F-RAN nodes in the associated states needs to be a feasible solution which meets the following constraints.

1) Communication Resource Feasibility: For the master F-RAN node, the sum of allocated radio resource blocks over all cooperating F-RAN nodes cannot exceed the total available radio resource blocks, i.e.,

$$\sum_{\forall n \in N} \sum_{\forall f \in F} I_f^n \times \delta_f^n \le \delta. \tag{1}$$

2) Computing Resource Feasibility: For each cooperating F-RAN node, the sum of allocated computing resource units over all served users cannot exceed the total available computing resource units, i.e.,

$$\sum_{\forall n \in N} I_f^n \times \theta_f^n \le \theta_f, \quad \forall f. \tag{2}$$

3) Processing Data Assurance: For all of the F-RAN nodes involved in the cooperative task computing of user $n$, the sum of received processing data should be no less than the original processing data, i.e.,

$$\sum_{\forall f \in F} I_f^n \times D_f^n \ge D^n, \quad \forall n. \tag{3}$$

4) Computing Tasks Assurance: For all the F-RAN nodes involved in the cooperative task computing of user $n$, the sum of assigned computing tasks should be no less than the original total computing tasks, i.e.,

$$\sum_{\forall f \in F} I_f^n \times C_f^n \ge C^n, \quad \forall n. \tag{4}$$

The above two constraints, (3) and (4), ensure that the cooperating F-RAN nodes receive sufficient data to complete all required computing tasks.

The Latency-driven Cooperative Task Computing Problem

Input instance: Among the set of users $N$, let user $n$ transmit processing data $D^n$ (which is transformed into a total of $C^n$ computing tasks) to the master F-RAN node, which has $\delta$ radio resource blocks. Consider the set of F-RAN nodes $F$ in which each F-RAN node $f$ has $\theta_f$ computing resource units (with computing rate $\rho_f$), and the master F-RAN node can provide data rate $\gamma_f$ in a radio resource block for F-RAN node $f$ when F-RAN node $f$ is associated with the master F-RAN node and the highest modulation-coding scheme is adopted.

Objective: Our objective is to pursue the min-max of the total service latency via finding a feasible set of F-RAN nodes $F$ for each user $n$, i.e., $I_f^n = 1$ or $0$, $\forall f \in F, \forall n \in N$, the number of allocated radio resource blocks $\delta_f^n$, the amount of delivered processing data $D_f^n$, the number of allocated computing resource units $\theta_f^n$ and the amount of assigned computing tasks $C_f^n$ of each associated F-RAN node $f$ for cooperative task computing of user $n$ in the network. We state our objective function formally as

$$\min \max_{\forall n \in N} \left( \min \max_{\forall f \in F} I_f^n \times \left( \frac{D_f^n}{\delta_f^n \times \gamma_f} + \frac{C_f^n}{\theta_f^n \times \rho_f} \right) \right),$$

subject to constraints (1) to (4), where $\left( \frac{D_f^n}{\delta_f^n \times \gamma_f} + \frac{C_f^n}{\theta_f^n \times \rho_f} \right)$ indicates the service latency for F-RAN node $f$ serving user $n$. Then $\min \max_{\forall f \in F} I_f^n \times (\cdot)$ represents the total service latency for the set of cooperating F-RAN nodes $F$ serving user $n$ (when the last F-RAN node $f$ finishes the cooperative task computing), and finally $\min \max_{\forall n \in N} (\cdot)$ accounts for the total service latency for the set of users $N$ (when the last user $n$ completes its cooperative task computing among multiple F-RAN nodes). Table I summarizes the notations used in the problem formulation.

TABLE I
SUMMARY OF NOTATIONS

Symbol    Description
$N$    The set of users
$F$    The set of F-RAN nodes
$D^n$    The amount of total processing data of user $n$
$C^n$    The amount of total computing tasks of user $n$
$\delta$    The number of total radio resource blocks of the master F-RAN node
$\gamma_f$    The data rate per radio resource block for F-RAN node $f$ received from the master F-RAN node with the highest modulation-coding scheme
$\theta_f$    The number of total computing resource units of F-RAN node $f$
$\rho_f$    The computing rate per computing resource unit for F-RAN node $f$
$I_f^n$    An indicator function, which is 1 if F-RAN node $f$ is chosen for serving user $n$, and 0 otherwise
$\delta_f^n$    The number of radio resource blocks allocated to F-RAN node $f$ for delivering processing data of user $n$ by the master F-RAN node
$D_f^n$    The amount of delivered processing data of user $n$ for F-RAN node $f$
$\theta_f^n$    The number of computing resource units from F-RAN node $f$ allocated to user $n$ in cooperative task computing
$C_f^n$    The amount of assigned computing tasks of user $n$ for F-RAN node $f$

III. LATENCY-DRIVEN COOPERATIVE TASK COMPUTING

In this section, we consider the latency-driven cooperative task computing problem. In Section III-A, we show the NP-hardness of the problem. For simplicity, in Section III-B, we consider a special case with only a single user in cooperative task computing and propose a polynomial-time optimal algorithm based on dynamic programming to minimize the total service latency provided by multiple F-RAN nodes for a single user. Then, in Section III-C, we present an efficient and effective algorithm, which relies on the algorithm presented in Section III-B, for the general case with multiple users, and deal with joint heterogeneous resource allocation and cooperative task computing to minimize the total service latency among all users.

A. Problem Hardness

Theorem 1. The latency-driven cooperative task computing problem is NP-hard.

Proof: The NP-hardness can be proved by a reduction from the partition problem, which is known to be NP-complete [15]. The proof is omitted due to lack of space.

B. Special Case

1) A Polynomial Time Optimal Algorithm:

Next, we consider a special case of the target problem in which there is a single user served by cooperative task computing of multiple F-RAN nodes. Since there is only one user, the target user can leverage all the available radio resource blocks from the master F-RAN node and all computing resource units of each F-RAN node, and the problem is thus feasible for the set of F-RAN nodes in cooperative task computing of the target user. In other words, an algorithm is optimal if it can derive the minimum service latency provided by multiple F-RAN nodes for the target user. For the problem formulation of the single-user version, we omit "$n$" when the meaning is clear from the context (e.g., $D$, $C$, $\theta_f$, $I_f$, $\delta_f$, $D_f$, $C_f$).

For this single-user case, we propose an optimal algorithm with the polynomial time property based on dynamic programming, named the cooperative task computing with dynamic-programming (CTC-DP) algorithm. It determines which F-RAN nodes should be selected to serve the target user, the number of allocated radio resource blocks, the amount of delivered processing data, and the amount of assigned computing tasks. The proposed algorithm is based on the recursive formula given in Equation (5). Let $g(r, c, f)$ be the minimum service latency achieved by the first $f$ F-RAN nodes, where F-RAN node $f$ can be allocated any number of radio resource blocks within the $r$ total available radio resource blocks and can execute any possible amount of computing tasks within the $c$ total required computing tasks.

$$g(r, c, f) = \begin{cases} 0, & \text{if } c = 0 \\ \infty, & \text{else if } r = 0 \text{ or } f = 0 \\ \min\left( \min_{\hat{r} \in [1, r],\, \hat{c} \in [1, c]} \max\left( g(r - \hat{r}, c - \hat{c}, f - 1),\; t^f_{\hat{r}, \hat{c}} \right),\; g(r, c, f - 1) \right), & \text{otherwise} \end{cases} \tag{5}$$

There exist three possible cases in Equation (5).

(1) If $c = 0$, $g(r, c, f)$ is set as 0. That is, the total service latency is zero because there are no computing tasks to be executed.

(2) If $r = 0$ or $f = 0$, then $g(r, c, f)$ is set as $\infty$. That is, no service can be provided because there are no available radio resource blocks or no joined F-RAN node to do cooperative task computing.

(3) Otherwise, F-RAN node $f$ is either dedicated or not dedicated to joining cooperative task computing. 1) If it is dedicated to executing $\hat{c}$ computing tasks for the target user with $\hat{r}$ allocated radio resource blocks, it will consume $t^f_{\hat{r}, \hat{c}}$ service latency for its working part. Following the per-node latency in the objective function, with $\delta_f = \hat{r}$, $C_f = \hat{c}$ and $D_f = \frac{\hat{c}}{C} \times D$, this corresponds to $t^f_{\hat{r}, \hat{c}} = \frac{D_f}{\hat{r} \times \gamma_f} + \frac{\hat{c}}{\theta_f \times \rho_f}$. Then, the remaining available radio resource blocks, i.e., $r - \hat{r}$, can be used by the first $f - 1$ F-RAN nodes, which are responsible for the remaining required computing tasks, i.e., $c - \hat{c}$, and the first $f - 1$ F-RAN nodes will consume $g(r - \hat{r}, c - \hat{c}, f - 1)$ service latency for completing the remaining working part. In this case, the total service latency is the larger of the two, $\max(g(r - \hat{r}, c - \hat{c}, f - 1), t^f_{\hat{r}, \hat{c}})$, as the total service completion time. Since there are many combinations of allocated resource blocks ($\hat{r} \in [1, r]$) and assigned computing tasks ($\hat{c} \in [1, c]$) for F-RAN node $f$, the formula tries every possible case and records each as a candidate solution. 2) In contrast, if F-RAN node $f$ is not dedicated to providing cooperative task computing, the target user will seek the assistance of the first $f - 1$ F-RAN nodes such that all computing tasks $c$ are assigned to the first $f - 1$ F-RAN nodes with the $r$ total available radio resource blocks, and the total service latency is $g(r, c, f - 1)$. Finally, the algorithm compares all candidate solutions from 1) and 2), and the smallest one is the final service latency with the best strategy of radio and computing resource allocation.

Algorithm 1, represented as CTC-DP, conducts the dynamic programming in Equation (5). First, the algorithm initializes all variables to zero. Then, a 3-dimensional table $g[]$ is created, each entry of which stores the solution derived by $g(r, c, f)$. Procedure FILL-TABLE() simply fills in the corresponding table $g[]$ according to Equation (5). Upon the completion of the table, Procedure BACK-TRACE() is invoked to select the feasible set of F-RAN nodes with the allocated radio resource blocks, delivered processing data and assigned computing tasks such that the total service latency of the target user is minimized by back tracing the table, after which the algorithm returns the solution $I_f$, $D_f$, $\delta_f$, and $C_f$.

Algorithm 1 CTC-DP
Input: $F, D, C, \delta, \gamma_f, \theta_f, \rho_f$
Output: $I_f, \delta_f, D_f, C_f$
1: $I_f \leftarrow 0$; $\delta_f \leftarrow 0$; $D_f \leftarrow 0$; $C_f \leftarrow 0$, $\forall f$
2: FILL-TABLE($\delta, C, F$)
3: BACK-TRACE($\delta, C, F$)
4: return $I_f$, $\delta_f$, $D_f$, and $C_f$, $\forall f$

Procedure 1: FILL-TABLE($\delta, C, F$)
1: for $r \leftarrow 0$ to $\delta$ do
2:   for $f \leftarrow 0$ to $F$ do
3:     for $c \leftarrow 0$ to $C$ do
4:       if $c = 0$ then
5:         $g[r, c, f] \leftarrow 0$
6:       else if $r = 0$ or $f = 0$ then
7:         $g[r, c, f] \leftarrow \infty$
8:       else
9:         $g[r, c, f] \leftarrow \min\left( \min_{\hat{r} \in [1, r],\, \hat{c} \in [1, c]} \max(g[r - \hat{r}, c - \hat{c}, f - 1], t^f_{\hat{r}, \hat{c}}),\; g[r, c, f - 1] \right)$
10:       end if
11:     end for
12:   end for
13: end for

Procedure FILL-TABLE() takes $\delta, C, F$ as inputs and fills in table entry $g[r, c, f]$ based on Equation (5) (Lines 4-9). The computation of a table entry may refer to some other entries, and thus the table entries are computed sequentially, i.e., from $c = 0$ to $C$ first; then from $f = 0$ to $F$; and finally from $r = 0$ to $\delta$.

Procedure BACK-TRACE() also takes $\delta, C, F$ as inputs, and selects a feasible set of F-RAN nodes $F$ (i.e., $I_f = 1$, $\forall f$), with allocated radio resource blocks $\delta_f$ and assigned processing data/computing tasks ($D_f$, $C_f$), to pursue the lowest service latency for the target user by back tracing table $g[]$. We begin with the last entry (i.e., $g[\delta, C, F]$) by setting the three indexes $r$, $c$ and $f$ as $\delta$, $C$ and $F$, respectively (Lines 1-3). During Procedure FILL-TABLE(), $g[r, c, f]$ is set as the minimum among $g[r, c, f - 1]$ and $\max(g[r - \hat{r}, c - \hat{c}, f - 1], t^f_{\hat{r}, \hat{c}})$. We discuss the two cases. If the minimum is the first term, then no radio resource blocks and no computing tasks are allocated/assigned to F-RAN node $f$. That is, the first $f - 1$ F-RAN nodes will be responsible for cooperative task computing of the target user with the total available radio resource blocks. Hence, $f$ is updated to $f - 1$ and the other output variables are updated (i.e., $I_f$, $\delta_f$, $D_f$, $C_f$ are set as 0) (Lines 5-7). If the minimum is the second term, then F-RAN node $f$ is selected with allocated radio resource blocks $\hat{r}$ and assigned computing tasks $\hat{c}$ among all possible combinations of radio and computing resource allocation (Lines 9-10). Therefore, $I_f$ is set as 1, $\delta_f$ is set as $\hat{r}$, $D_f$ is set as $\frac{\hat{c}}{C} \times D$ (which means computing tasks are transformed into processing data), $C_f$ is set as $\hat{c}$, and the three indexes are updated to their corresponding values (Lines 11-15). Then, we start with the entry indexed by the updated $r$, $c$, and $f$, and repeat the above process until all the F-RAN nodes have been examined.

Procedure 2: BACK-TRACE($\delta, C, F$)
1: $r \leftarrow \delta$
2: $c \leftarrow C$
3: $f \leftarrow F$
4: while $f > 0$ do
5:   if $g[r, c, f] = g[r, c, f - 1]$ then
6:     $I_f, \delta_f, D_f, C_f \leftarrow 0$
7:     $f \leftarrow f - 1$
8:   else
9:     for $\hat{r} \leftarrow 1$ to $r$ do
10:       for $\hat{c} \leftarrow 1$ to $c$ do
11:         if $g[r, c, f] = \max(g[r - \hat{r}, c - \hat{c}, f - 1], t^f_{\hat{r}, \hat{c}})$ then
12:           $I_f \leftarrow 1$; $\delta_f \leftarrow \hat{r}$; $D_f \leftarrow \frac{\hat{c}}{C} \times D$; $C_f \leftarrow \hat{c}$
13:           $r \leftarrow r - \hat{r}$
14:           $c \leftarrow c - \hat{c}$
15:           $f \leftarrow f - 1$
16:         end if
17:       end for
18:     end for
19:   end if
20: end while

2) The Properties of Algorithm 1:

Theorem 2. The time complexity of Algorithm 1 is $O(|\delta|^2 |C|^2 |F|)$.

Proof: Due to the sequential computing order, the time complexity of Algorithm 1 is a function of the number of table entries and the time required for obtaining each entry. The 3-dimensional table is composed of $|\delta||C||F|$ table entries. Moreover, the value of each entry $g[]$ refers to at most $|\delta||C|$ other entries and requires $O(|\delta||C|)$ complexity. Within a single round of operation, a derived entry value will not be changed. Therefore the table can be completed in $O(|\delta|^2 |C|^2 |F|)$. Besides, constructing the corresponding allocation via back tracing the table evaluates at most $|C||F|$ other entries, and each evaluation requires at most $O(|\delta||C|)$ complexity. Thus, the time complexity of Algorithm 1 is $O(|\delta|^2 |C|^2 |F|)$, which is a pseudo-polynomial time function of the total computing tasks requirement $C$ of the target user.

Theorem 3. Algorithm 1 yields the minimum service latency for a single user in cooperative task computing.

Proof: We prove this theorem via two-dimensional mathematical induction on the indexes $r$ and $c$. For the induction basis, $c = 0$, there are no computing tasks and no need for cooperative task computing. Thus, the minimum service latency is 0, i.e., $g(r, 0, f) = 0$, $\forall r, f$. For induction, suppose that the formula $g(0, c - 1, f)$, $\forall f$, holds for some positive integer $c$. We intend to show that the formula $g(0, c, f)$, $\forall f$, also holds. When $r = 0$, there are no available radio resource blocks and it is impossible to provide any cooperative task computing. Thus, the minimum service latency is $\infty$, i.e., $g(0, c, f) = \infty$, $\forall c \ge 1, f$. The theorem is correct when $r = 0$.

Next, we consider the case when $r = 1$. For induction, we suppose that the formula $g(0, c - 1, f)$, $\forall f$, holds for some positive integer $c$. We intend to show that the formula $g(1, c, f)$, $\forall f$, also holds. If only one radio resource block is available, the minimum service latency must be achieved by a single cooperative F-RAN node handling the total computing tasks. By the induction hypothesis and the claim proved to hold when $r = 0$, the minimum service latency is $g(1, c, f) = \min\left( \min_{\hat{c} \in [1, c]} \max(g(0, c - \hat{c}, f - 1), t^f_{1, \hat{c}}),\; g(1, c, f - 1) \right) = \min(t^f_{1, c}, g(1, c, f - 1))$. Among the first $f - 1$ F-RAN nodes, an arbitrary node that is assigned part of the computing tasks but has no available radio resource blocks will not be able to do cooperative task computing and will incur a total service latency of $\infty$ (i.e., $g(0, c, f - 1) = \infty$, $\forall c \ge 1, f$). Therefore, the total computing tasks are assigned either to the target F-RAN node $f$ or to the first $f - 1$ F-RAN nodes with one allocated radio resource block. That is, the minimum service latency is the smaller of the two values, i.e., $\min(t^f_{1, c}, g(1, c, f - 1))$. Thus, the claim holds when $r = 1$. The validity of the theorem with $r \ge 2$ can be proved to hold via a similar procedure. Finally, we conclude that the formula $g(r, c, f)$, $\forall r, c, f$, holds correctly.

C. The General Case

1) Algorithm Description:

In this section, we propose an efficient algorithm, named the cooperative task computing one-for-all (CTC-All) algorithm, to solve the general case of cooperative task computing among multiple users. Since there are multiple users to be served, the users jointly utilize the available radio resource blocks from the master F-RAN node and the computing resource units from each F-RAN node. The problem thus includes two subproblems, 1) heterogeneous resource allocation and 2) cooperative task computing, to minimize the total service latency among all users.

In the first part, the heterogeneous resource allocation problem, the master F-RAN node directly allocates communication resources to each user based on the weight of his/her processing data such that every user roughly experiences the same communication delay. However, the same concept may not be applicable to computing resource allocation, since each F-RAN node may not actually serve every user in the distributed computing architecture. Compared with reserving computing resources for each user from each F-RAN node in advance, it is better to leverage dynamic computing resource allocation in determining each user's available computing resources from each cooperating F-RAN node.

Next, in the second part, the cooperative task computing problem, we rely on the dynamic-programming algorithm presented in Section III-B to derive a feasible set of F-RAN nodes that minimize the total service latency for all users with cooperative task computing. Then, based on Algorithm 1, CTC-DP, we propose the "one-for-all" concept such that each user selects the suitable set of cooperating F-RAN nodes while taking the others' penalty into consideration. Since dynamic computing resource allocation may instantly update the available computing resources for each user in the same F-RAN node's serving group when a new user joins, the cooperative task computing of such a new user without the "one-for-all" concept may result in additional latency for the original users. We note that the total service latency is decided by the bottleneck, i.e., the longest time period taken by the last user to complete his/her cooperative task computing. Every user should compromise with the others and seek a "win-win" solution, which is the strategy of "one-for-all". Therefore, our proposed solution jointly deals with heterogeneous resource allocation and cooperative task computing to minimize the total service latency among all users.

Algorithm 2 CTC-All
Input: $N, F, D^n, C^n, \delta, \gamma_f, \theta_f, \rho_f$
Output: $I_f^n, \delta_f^n, D_f^n, \theta_f^n, C_f^n$
1: $I_f^n \leftarrow 0$; $\delta_f^n \leftarrow 0$; $D_f^n \leftarrow 0$; $\theta_f^n \leftarrow 0$; $C_f^n \leftarrow 0$; $\delta^n \leftarrow 0$; $N_f \leftarrow 0$, $\forall n, f$
2: Sort $N$ in a decreasing order by the value of $D^n$
3: for $\hat{n} \leftarrow 1$ to $N$ do
4:   $\delta^{\hat{n}} \leftarrow \frac{D^{\hat{n}}}{\sum_{\forall n \in N} D^n} \times \delta$
5:   for all $f \in F$ do
6:     $\theta_f^{\hat{n}} \leftarrow \frac{C^{\hat{n}}}{\sum_{\forall n \in N_f} C^n + C^{\hat{n}}} \times \theta_f$
7:   end for
8:   $I_f^{\hat{n}}, \delta_f^{\hat{n}}, D_f^{\hat{n}}, C_f^{\hat{n}}, \forall f \leftarrow$ Algorithm 1 ($\hat{n}$)
9:   for all $f \in F$ do
10:     if $I_f^{\hat{n}} = 1$ then
11:       $N_f = N_f + \{\hat{n}\}$
12:       for all $n \in N_f$, $n \ne \hat{n}$ do
13:         $\theta_f^n \leftarrow \frac{C^n}{\sum_{\forall n \in N_f} C^n} \times \theta_f$
14:       end for
15:     else
16:       $\theta_f^{\hat{n}} \leftarrow 0$
17:     end if
18:   end for
19: end for
20: return $I_f^n, \delta_f^n, D_f^n, \theta_f^n, C_f^n$, $\forall n, f$.

Finally, for serving each user [...] users served by the F-RAN node $f$. Then, we sort the set of users $N$ by processing data value $D^n$ from the largest to the smallest in Line 2. For each user, we first deal with his/her heterogeneous resource allocation, followed by his/her cooperative task computing (Lines 3-19). In the first part, heterogeneous resource allocation of user $\hat{n}$ (Lines 4-7), the number of allocated radio resource blocks is in proportion to the ratio of user $\hat{n}$'s processing data value ($D^{\hat{n}}$) to the sum of all users' processing data values ($\sum_{\forall n \in N} D^n$). Specifically, this lets the master F-RAN node balance all users' radio resources and approximately achieve the same communication latency. On the other hand, since each F-RAN node may not serve every user for cooperative task computing, the number of allocated computing resource units from each F-RAN node $f$ to user $\hat{n}$ depends only on the users actually being served ($N_f$), and is in proportion to the ratio of user $\hat{n}$'s computing tasks value ($C^{\hat{n}}$) to the sum of all served users' computing tasks values ($\sum_{\forall n \in N_f} C^n + C^{\hat{n}}$). That is, we instantly update the currently available computing resources of each F-RAN node for serving user $\hat{n}$ and also let each F-RAN node balance all served users' computing resources according to the workload.

In the second part, cooperative task computing of user $\hat{n}$ (Lines 8-18), we use Algorithm 1 with input instance $\langle F, D^{\hat{n}}, C^{\hat{n}}, \delta^{\hat{n}}, \gamma_f, \theta_f^{\hat{n}}, \rho_f \rangle$ to derive a feasible set of F-RAN nodes (i.e., $I_f^{\hat{n}} = 1$, $\forall f$) with the number of allocated radio resource blocks ($\delta_f^{\hat{n}}$), the amount of delivered processing data ($D_f^{\hat{n}}$) and the amount of assigned computing tasks ($C_f^{\hat{n}}$) such that the total service latency of user $\hat{n}$ is minimized in Line 8. In fact, CTC-DP applies the one-for-all concept such that each table entry records not only user $\hat{n}$'s service latency but also the other users' additionally increased latency. For example, if user $\hat{n}$ chooses F-RAN node $\hat{f}$ for cooperative task computing, user $\hat{n}$ will jointly utilize the computing resources of F-RAN node $\hat{f}$. However, the other users already in the serving user group $N_{\hat{f}}$ are forced to reduce their allocated computing resource units, and this results in additional latency as the penalty. Therefore, user $\hat{n}$ needs to consider the other users in his/her own decision of whether or not to choose F-RAN node $\hat{f}$, in order to minimize the total service latency among all users. Once the feasible set of F-RAN nodes $F$ for user $\hat{n}$ is decided, we need to add user $\hat{n}$ into each cooperating F-RAN node $f$'s serving user group $N_f$ (Lines 9-11). Besides, each user $n$ already in the serving user group $N_f$ also needs to update his/her currently available computing resource units $\theta_f^n$ (Lines 12-14).
On the other hand, for those F-RAN node f, the master F-RAN node will decide which F-RAN node to which does not join cooperative task computing of user nˆ n nˆ be selected (i.e., If = 1, ∀n, f), the suitable number of radio (i.e., I 6= 1, ∀f) should release their computing resource n f resource blocks δf , the amount of delivered processing data units reserved previously for user nˆ in Line 16. Finally, n n Df , the actual number of computing resource units θf and after all users finish their heterogeneous resource allocation n the amount of assigned computing tasks Cf . and cooperative task computing, the algorithm returns the Algorithm 2, represented as CTC-All, starts with the initial- n n n n n derived instances hIf , δf ,Df , θf ,Cf i in Line 20. Based on ization of parameters. In Line 1, all of the output parameters the returned instances, the master F-RAN node can calculate n n n n n n hIf , δf ,Df , θf ,Cf i with two additional parameters hδ ,Nf i for total service latency among all users. n are initialized as zero. The parameter δ represents the allo- 2) The Properties of Algorithm 2: cated number of radio resource blocks from the master F-RAN node to each user n while the set Nf represents the set of Theorem 4. The time complexity of Algorithm 2 is 2 2 n O(|δ| |C| |F ||N|), where δ = maxn∈N |δ | and C = redefined as a special case problem mentioned in section III-B. n maxn∈N |C |. Therefore, we can directly use CTC-DP to derive an optimal Proof: For each user n, Algorithm 1 is invoked once solution for the virtual giant and take AO as the performance and takes O(|δn|2|Cn|2|F |) time to derive a feasible set of upper bound in our problem. The performance metric was the F-RAN nodes for cooperative task computing, as analyzed in total service latency of cooperative task computing including Theorem 2. 
Thus, it takes O(|δ|2|C|2|F ||N|) time for coop- the communication delay (the time for the master F-RAN node n erative task computing of all users, where δ = maxn∈N |δ | to transmit processing data of each user to all joined F-RAN n and C = maxn∈N |C |. Besides, the heterogeneous resource nodes) and computing delay (the time for all joined F-RAN allocation includes two parts: 1) radio resource allocation of nodes to conduct their computing tasks for each user) when each user takes O(1) time (see Line 4) and 2) computing the last user finishes his/her cooperative task computing. resource allocation of each user before executing their cooper- For simulations, we use commonly practiced and practi- ative task computing takes O(|F |) time (see Line 6). Thus, the cal configurations and use Augmented Reality (AR) as the total heterogeneous resource allocation takes O(|F ||N|) time. representative ultra-low latency application [16]. Our pro- Moreover, after finishing cooperative task computing of each posed solution attempts to accelerate the AR tracking, i.e., user, we need to update each related user’s current computing the most computation-intensive component in AR [17], with resource units and takes O(|F ||N||Nˆ|) time, where Nˆ = cooperative task computing among multiple F-RAN nodes. maxf∈F |Nf | (see Line 13). Therefore, the time complexity of The parameters in the simulation model are based on the LTE Algorithm 2 is bounded by the time complexity of cooperative standard for a 20 MHz spectrum [18]. The network consists task computing and takes O(|δ|2|C|2|F ||N|), which also is a of multiple users with one closest F-RAN node as his/her pseudo-polynomial time function of the maximum computing master F-RAN node and other F-RAN nodes. F-RAN nodes tasks requirement C among all users. are randomly located in the master F-RAN node proximity, at a distance between 50 to 200 meters [19] from the master IV. PERFORMANCE EVALUATION F-RAN node. 
The user sends its AR request to the master A. Simulation Settings F-RAN node. The standard AR codec uses QCIF resolution 176×144 pixels [20], [21] per frame, and data size of each In this section we develop simulations to evaluate our frame ranges from 0.64 - 1 ratio of standard AR frame size, proposed algorithm, abbreviated as CTC-All (cooperative task which will transform into the total computing tasks in the computing one-for-all algorithm), and compare with four ap- unit of cpu instructions. The AR tracking video frames are proaches including Single, Reservation (RESV), CTC-SELF encoded by H.264 in gray scale with 8 bits per pixel. The and All-in-One (AO). The comparison baseline Single is set as master F-RAN node has at most 100 radio resource blocks a single powerful F-RAN node to serve all users’ cooperative to communicate with other F-RAN nodes and the number task computing. Because Single’s architecture is not able to of F-RAN node candidates ranges from 20 - 40. The path execute distributed computing, the F-RAN node will conduct loss model is PL(dB) = 35.2 + 35 log10(d1), where d1 is in each user’s task in sequential. Thus, we take Single as the meters [22]. The F-RAN node’s signal-to-noise ratio (SNR) is performance lower bound in our problem and show the ad- derived based on its distance to the master F-RAN node and vantages of leveraging multiple F-RAN nodes for cooperative the path loss model. Each F-RAN node uses the best feasible task computing. The RESV approach is designed to reserve modulation-coding scheme according to Table II [22]. Path all radio and computing resources of each user according to loss, shadowing, and multi-path fading are all conducted in their workloads in advance, which can avoid the resource our simulations. To obtain empirical computing capability of starvation problem for those users in behind of user serving a F-RAN node, we conduct experiments in ARToolkit [23] list. 
However, the side-effect is the utilization degradation of with i7 Dual core 2.5 GHz CPU, 8G RAM platform each user’s performance since most of resources are not in use and analyze the computing capability by Valgrind [24]. The actually. As a result, the comparison baseline RESV is selected results of the real computing throughput range from 700-1700 to demonstrate the side-effect of reservation in distributed million instructions/second in different state of work loads. At computing architecture. Next, the comparison baseline CTC- the beginning, the master F-RAN node associates with each SELF, without using “one-for-all” concept, is selected to F-RAN node via the best feasible modulation-coding scheme demonstrate the policy of only pursuing each user’s own best feasible set of F-RAN nodes may lead to worse side effects. That is, other users may suffer penalty of additional increasing latency due to the policy’s lack of a whole consideration TABLE II MODULATION-CODING SCHEMESWITH DIFFERENT SNR of minimizing total service latency for all users. Finally, the comparison baseline AO is designed to use “all-in-one” Modulation Coding rate γf (kbps) SNR range (dB) concept, with which we gather all users into one virtual giant QAM16 1/2 9.6 [9.6598,12.361) and let their service requests merge as one huge request. QAM16 3/4 14.4 [12.361,16.6996) Since this virtual giant can leverage all radio and computing QAM64 2/3 19.2 [16.6996,17.9629) resources from all F-RAN nodes now, the problem can be QAM64 3/4 21.6 [17.9629,+∞) within its reachable region. The F-RAN node is set to possess 250 Single RESV distinct computing capability in the above mentioned range. CTC-SELF 200 CTC-All The results are obtained by averaging over 5000 independent AO runs. 150 B. Total Service Latency 100 Figure 2 shows the impact of the number of users on the Latency (ms) total service latency. 
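To make the quantities behind this comparison concrete, the following minimal Python sketch illustrates the two proportional rules in Lines 4 and 6 of Algorithm 2 and the bottleneck rule under which the slowest user sets the total service latency. It is an illustration under stated assumptions, not the authors' implementation: the function names and example numbers are hypothetical, shares are left fractional (the source does not specify rounding), and the per-user latencies are taken as given inputs rather than derived from Algorithm 1.

```python
from typing import List


def radio_share(data_sizes: List[float], total_rbs: float) -> List[float]:
    """Line 4 of Algorithm 2: split the master node's radio resource blocks
    in proportion to each user's processing data value D^n.
    Assumption: shares stay fractional; no rounding rule is applied."""
    total_data = sum(data_sizes)
    return [d / total_data * total_rbs for d in data_sizes]


def compute_share(new_tasks: float, serving_tasks: List[float], theta_f: float) -> float:
    """Line 6 of Algorithm 2: tentative share of node f's computing units
    theta_f for the incoming user, weighted by its computing tasks C against
    the tasks of the users that node f already serves (the set N_f)."""
    return new_tasks / (sum(serving_tasks) + new_tasks) * theta_f


def total_service_latency(per_user_latency_ms: List[float]) -> float:
    """Bottleneck rule: the last user to finish decides the total latency.
    Assumption: per-user latencies (communication plus computing delay)
    are supplied by the caller."""
    return max(per_user_latency_ms)


if __name__ == "__main__":
    # Hypothetical example: two users, 100 resource blocks at the master node.
    print(radio_share([30.0, 10.0], 100.0))
    # Hypothetical example: node f already serves two users with equal tasks.
    print(compute_share(2.0, [1.0, 1.0], 1000.0))
    # Hypothetical per-user latencies in milliseconds.
    print(total_service_latency([12.5, 30.0, 28.1]))
```

Because the total is a maximum rather than a sum, lowering one user's latency helps only when that user is the bottleneck, which is why a user's node selection has to account for the slowdown it imposes on users already served by the same node.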
As the total number of users increases, we compare the performance of our proposed scheme CTC-All with the other baseline approaches. In this figure, a larger total number of users leads to more competitive heterogeneous resource allocation for each user and more complicated cooperative task computing among multiple F-RAN nodes. However, our proposed scheme CTC-All can effectively deal with the communication and computing tradeoff in the time domain and successfully select a proper set of F-RAN nodes for each user at the cost of slightly increasing the total service latency. We have the following observations. 1) In terms of performance, CTC-All is lower bounded by Single (4.2x), which uses only one powerful F-RAN node to execute all computing tasks of all users. The result shows that our proposal of leveraging multiple F-RAN nodes is a possible solution for achieving the ultra-low latency of within 40 ms for AR tracking [25]. 2) On the other hand, although CTC-All does not perform as well as the upper bound solution AO, CTC-All follows the same increasing slope as AO does, and the best service latency of CTC-All is about 30 ms per frame with 10 users. Besides, the total program execution time for CTC-All is about 11 seconds, while for AO it is about 5 hours (1636x more than CTC-All). Therefore, CTC-All is observed to be a feasible solution for practical F-RAN nodes. 3) CTC-All reduces service latency further than RESV (1.5x) because dynamic computing resource allocation avoids the utilization degradation problem, while all serving users, regardless of location, can be jointly considered and resource starvation can be avoided under the management of the master F-RAN node. 4) CTC-All with the one-for-all concept reduces service latency by 24% compared with CTC-SELF, which demonstrates the rationale for taking other nodes' penalty into consideration in cooperative task computing. To further demonstrate the operational effects on individual users, we show the results of various performance metrics under the scenario of 10 users in the following figures.

Fig. 2. Impacts of number of users on total service latency. (Plot of latency in ms versus number of users for Single, RESV, CTC-SELF, CTC-All, and AO.)

The individual conditions of each user and F-RAN node, as shown in Fig. 3, demonstrate the complete heterogeneous resource distribution of our proposed scheme and the other baseline approaches with three different evaluation metrics: total service latency, percentage of allocated radio resources, and percentage of assigned computing tasks. First, Fig. 3(a) shows the total service latency of each user with the different approaches. Since the total service latency is decided by the longest one among all users, our proposed scheme CTC-All intelligently selects each user's cooperating set of F-RAN nodes with the one-for-all concept to achieve a win-win situation among all users. Here, the index number of a user represents its service order in the user list. Since CTC-SELF only concerns the target user's optimized feasible set of F-RAN nodes without considering the penalty imposed on others, users at the front of the user list (e.g., users 1, 2, and 3) may suffer from other users' side effects, which lead to additional latency. Therefore, we can observe that the one-for-all rationale is beneficial for cooperative task computing among multiple users. Next, as shown in Fig. 3(b), most approaches, except Single and AO, allocate individual users different percentages of radio resources in proportion to their workloads. Because Single deals with each user sequentially while AO serves all users as one virtual giant, both approaches serve one user in each round and can utilize all available radio resources. In fact, CTC-All follows the same radio resource allocation policy as RESV does but performs better cooperative task computing than RESV. Therefore, we can conclude that dynamic computing resource allocation is an important key to effective cooperative task computing. Finally, Fig. 3(c) shows the percentage of assigned computing tasks of each F-RAN node in the different approaches. We observe that AO leverages all F-RAN nodes for cooperative task computing, while our proposed scheme CTC-All follows the same trend of leveraging nearly all F-RAN nodes. This is because CTC-All with the one-for-all concept takes others' penalty into consideration, which leads to a higher possibility of selecting many other F-RAN nodes as better choices. Therefore, CTC-All also demonstrates the property of load balancing among all F-RAN nodes and pursues the goal of minimizing the total service latency for all users. In summary, Fig. 3 shows that dynamic computing resource allocation and the one-for-all concept are crucial techniques in the cooperative task computing operation among multiple users.

Fig. 3. Operational metrics for each user and F-RAN node. (a) Total service latency of each user. (b) Percentage of allocated RBs of each user. (c) Percentage of assigned tasks of each F-RAN node. (Each panel compares Single, RESV, CTC-SELF, CTC-All, and AO.)

V. CONCLUSIONS

In this paper, we studied the latency-driven cooperative task computing problem in Fog-Radio Access Networks. To enable F-RAN for temporally low latency operations within limited computing and communication resources, we introduce the concept of leveraging multiple F-RAN nodes which operate separately on different parts of the computing tasks. In the multi-user scenario, this work deals with a more complicated heterogeneous resource management for each user and aims to achieve low total service latency. We first formulate the problem as an optimization problem, which is shown to be NP-hard, and then we propose a latency-driven cooperative task computing algorithm to select the joining F-RAN nodes for each user request. Our proposed framework targets the joint consideration of communication resource allocation and computing task assignment in the time domain. The simulations show that our scheme with the one-for-all concept can significantly reduce the total service latency of cooperative task computing and reach a win-win situation for each user. We have also observed that leveraging the dynamic-programming design approach can efficiently reduce the total running time as the number of users increases, which validates the feasibility and scalability of our proposed scheme.

ACKNOWLEDGMENT

This work was supported in part by Ministry of Science and Technology under Grant 105-2221-E-002-144-MY3 and Grant 105-2218-E-390-003-MY2, by Ministry of Education under Grant 105M7109, and by Information and Communications Research Laboratories of the Industrial Technology Research Institute (ICL/ITRI).

REFERENCES

[1] S. Choy, B. Wong, G. Simon, and C. Rosenberg, "A Hybrid Edge-Cloud Architecture for Reducing Ondemand Gaming Latency," Multimedia Systems, vol. 20, pp. 503–519, 2014.
[2] F. Bonomi, R. Milito, J. Zhu, and S. Addepalli, "Fog Computing and its Role in the Internet of Things," in Proc. of ACM SIGCOMM Workshop on Mobile Cloud Computing, 2012, pp. 13–16.
[3] M. Satyanarayanan, P. Bahl, R. Caceres, and N. Davies, "The Case for VM-based Cloudlets in Mobile Computing," IEEE Pervasive Computing, vol. 8, no. 4, pp. 14–23, 2009.
[4] "ETSI First Meeting of New Standardization Group on Mobile-Edge Computing." [Online]. Available: http://www.etsi.org
[5] "FP7 European Project (TROPIC)." [Online]. Available: http://www.ict-tropic.eu
[6] "OpenFog." [Online]. Available: https://www.openfogconsortium.org
[7] Y.-Y. Shih, W.-H. Chung, A.-C. Pang, T.-C. Chiu, and H.-Y. Wei, "Enabling Low-Latency Applications in Fog-Radio Access Network," IEEE Network, 2017.
[8] S. Yi, C. Li, and Q. Li, "A Survey of Fog Computing: Concepts, Applications and Issues," in Proc. of ACM Mobidata, 2015, pp. 37–42.
[9] B. Ottenwalder, B. Koldehofe, K. Rothermel, and U. Ramachandran, "Migcep: Operator Migration for Mobility Driven Distributed Complex Event Processing," in Proc. of ACM DEBS, 2013, pp. 183–194.
[10] S. Sardellitti, G. Scutari, and S. Barbarossa, "Joint Optimization of Radio and Computational Resources for Multicell Mobile-Edge Computing," IEEE Transactions on Signal and Information Processing over Networks, vol. 1, no. 2, pp. 89–103, 2015.
[11] R. Deng, R. Lu, C. Lai, T. H. Luan, and H. Liang, "Optimal Workload Allocation in Fog-Cloud Computing Towards Balanced Delay and Power Consumption," IEEE Internet of Things Journal, 2016.
[12] K. Intharawijitr, K. Iida, and H. Koga, "Analysis of fog model considering computing and communication latency in 5G cellular networks," in Proc. of IEEE PerCom Workshops, 2016.
[13] T. Nishio, R. Shinkuma, T. Takahashi, and N. B. Mandayam, "Service-Oriented Heterogeneous Resource Sharing for Optimizing Service Latency in Mobile Cloud," in Proc. of ACM MobileCloud, 2013, pp. 19–26.
[14] T.-C. Chiu, W.-H. Chung, A.-C. Pang, Y.-J. Yu, and P.-H. Yen, "Ultra-Low Latency Service Provision in 5G Fog-Radio Access Networks," in Proc. of IEEE PIMRC, 2016.
[15] M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, 1st ed. W. H. Freeman, Jan. 1979.
[16] ETSI, Mobile Edge Computing - A key technology towards 5G, White Paper no. 11 v1.0, Sep. 2015.
[17] D. Wagner, G. Reitmayr, A. Mulloni, T. Drummond, and D. Schmalstieg, "Real-time detection and tracking for augmented reality on mobile phones," IEEE Trans. Vis. Comput. Graphics, vol. 16, no. 3, pp. 355–368, 2010.
[18] 3GPP, Evolved Universal Terrestrial Radio Access (E-UTRA); Physical channels and modulation, TS 36.211 v10.0.0, Dec. 2010.
[19] S. Zhou, J. Gong, Z. Yang, Z. Niu, and P. Yang, "Green Mobile Access Network with Dynamic Base Station Energy Saving," in Proc. of ACM Mobicom, 2009, pp. 10–12.
[20] "Video Sequences." [Online]. Available: http://trace.eas.asu.edu/yuv
[21] J. Ha, K. Cho, F. Rojas, and H. S. Yang, "Real-time Scalable Recognition and Tracking Based on the Server-Client Model for Mobile Augmented Reality," in Proc. of IEEE ISVRI, 2011, pp. 1–5.
[22] P. Li, H. Zhang, B. Zhao, and S. Rangarajan, "Scalable Video Multicast in Multi-Carrier Wireless Data Systems," in Proc. of IEEE ICNP, 2009, pp. 141–150.
[23] "ARtoolKit." [Online]. Available: http://artoolkit.sourceforge.net
[24] "Valgrind." [Online]. Available: http://valgrind.org
[25] G. Klein and D. Murray, "Parallel Tracking and Mapping for Small AR Workspaces," in Proc. of ACM ISMAR, 2007, pp. 1–10.