Data Center Resource Mapping Algorithm Based on the Ant Colony Optimization¹
A.V. Plakunov, V.A. Kostenko and R.L. Smelyanskiy Moscow State University, Faculty of Computational Mathematics and Cybernetics, Leninskie Gory 1, Moscow, Russia [email protected], {kost, smel}@cs.msu.su
Keywords: Ant Colony Optimization, Data Centers
Abstract: This paper describes a data center resource scheduling algorithm based on ant colony optimization. The algorithm can be used in data centers that have a joint scheduler for all resource types. It uses the ant colony optimization approach to map resource requests to physical computational nodes and data storages. A graph shortest path algorithm is used to map virtual channels to the data center’s physical network channels and network switches.
1 INTRODUCTION

We consider the problem of resource usage efficiency, which is crucial for Infrastructure as a Service data centers. Here efficiency is measured by the number of requests that can be mapped onto the same physical resources. The problem is complex because several resource request types (virtual machines, storage requests, virtual channels) must be mapped onto several physical resource types (computational nodes, data storages, and network resources, i.e. channels and switches).

Existing algorithms don’t consider the mapping of all three resource types [1,2,3,4]. In this paper we show how all three types of resource mapping can be implemented using an algorithm based on ant colony optimization.

We propose to use an algorithm based on ant colony optimization to map virtual machines onto computational nodes and storage requests onto data storages. After that, virtual channels are mapped onto physical ones using a graph shortest path algorithm based on greedy strategies. The advantage of this scheme is that the ant colony optimization approach allows the algorithm to adjust itself automatically to a particular instance of the problem (particular input data values) through additional marking of the input data, which is used to build a solution on each iteration. The solutions improve as the total number of iterations grows. This mechanism can provide high quality solutions on a wide class of input data.

2 PROBLEM DEFINITION

The data center physical resource model is defined as a weighted graph:

$H = (P \cup M \cup K, L)$,

where P is a set of computational nodes, M is a set of data storages, K is a set of network switches, and L is a set of network channels. The weights are defined as follows:

• $vh(p)$, defined on the set P, is the performance of the computational node $p \in P$ (operations per second);
• $uh(m)$, defined on the set M, is the capacity of the data storage $m \in M$ (bytes);
• $bh(k)$, defined on the set K, is the bandwidth of the network switch $k \in K$ (bytes per second). The network switch bandwidth is defined as the maximum total bandwidth of the virtual channels passing through the switch. We consider all input and output switch ports to have equal priority;
• $rh(l)$, defined on the set L, is the bandwidth of the network channel $l \in L$ (bytes per second).

A resource request is defined as a weighted graph:

$G = (W \cup S, E)$,
where W is a set of virtual machines, S is a set of storage requests, and E is a set of virtual channels between virtual machines and storage requests. The weights are defined as follows:

• $v(w)$, defined on the set W, is the required performance of the virtual machine $w \in W$ (operations per second);
• $u(s)$, defined on the set S, is the required capacity of the storage request $s \in S$ (bytes);
• $r(e)$, defined on the set E, is the bandwidth of the virtual channel $e \in E$ (bytes per second).

A resource request mapping is defined as follows:

$A: G \to H = \{W \to P,\ S \to M,\ E \to \{K, L\}\}$.

Given:
1. A set of requests $Z = \{(G_i, T_i)\}$ received by the data center scheduler, where $T_i$ is the time span for which the resources are requested by the request $G_i$.
2. The data center physical resource model $H = (P \cup M \cup K, L)$.

The problem is to define the resource request mappings subject to the following constraints:

1. $\sum_{w \in W_p} v(w) \le vh(p)$,
2. $\sum_{e \in E_l} r(e) \le rh(l)$,
3. $\sum_{e \in E_k} r(e) \le bh(k)$,
4. $\sum_{s \in S_m} u(s) \le uh(m)$.

Here $W_p$, $E_l$, $E_k$ and $S_m$ denote the request elements mapped to the computational node p, the network channel l, the network switch k and the data storage m, respectively.

¹ Supported by EMC International Company Grant №01.08.12 and Skolkovo Fund Grant №79

3 PROPOSED ALGORITHM

3.1 Algorithm Common Scheme

The initial problem can be divided into three subproblems:
1. Mapping virtual machines to physical computational nodes.
2. Mapping storage requests to physical data storages.
3. Mapping virtual channels to physical network channels.

Subproblems 1 and 2 can be solved by an algorithm based on ant colony optimization. Since the ant colony optimization approach is intended for optimization problems represented in the form of the graph shortest path problem, subproblems 1 and 2 should be reduced to it (the graph these subproblems are reduced to is denoted as A). The reduction process is described in section 3.2.1.

After the virtual machines are mapped to computational nodes and the storage requests are mapped to data storages, the problem of mapping the virtual channels to physical ones can be solved by a greedy algorithm.

The algorithm’s common scheme is as follows:
1. Building the graph A. The graph is built so that a path in the graph determines how virtual machines are mapped to computational nodes and how storage requests are mapped to data storages.
2. Building the paths $P_i$ in the graph A. Each path is built according to the restrictions on the maximum computational node performance $vh(p)$ and the maximum data storage memory volume $uh(m)$.
3. For each $P_i$, mapping the virtual channels to physical ones given that virtual machines and storage requests are mapped according to the path $P_i$.
4. Calculating the target function $T_i$ for each path $P_i$.
5. Updating the pheromone values on the arcs of the graph A depending on the target function values $T_i$.

If the stop condition isn’t satisfied, go to stage 2.

3.2 Basic Algorithm Operations

3.2.1 Building Graph

Let N be the number of computational nodes in the data center, S be the number of data storages, and R be the number of resource requests to be mapped. Each request consists of $n_i$, $i \in 1..R$, virtual machines and $s_i$, $i \in 1..R$, storage requests.

The vertices $V_1, ..., V_N$ and $S_1, ..., S_S$ are added, where the vertex $V_i$ corresponds to the computational node with the number i, and the vertex $S_j$ corresponds to the data storage with the number j. For each $k \in 1..n$, $n = \sum_{i=1}^{R} n_i$, let one graph vertex correspond to the k-th virtual machine: this vertex $V_0^k$ is connected to each of the vertices $V_1, ..., V_N$ by two differently directed arcs.
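As an illustration, the vertex and arc sets of the graph A described in this subsection (including the arcs between request vertices and the starting vertex O introduced in the remainder of the subsection) can be sketched as follows. This is a minimal sketch, not the authors’ implementation: the function name `build_graph_a`, the tuple labels used for vertices, and the arcs between storage request vertices and data storage vertices (inferred from the word "similarly" in the text) are assumptions.

```python
from typing import Hashable, Set, Tuple

Arc = Tuple[Hashable, Hashable]


def build_graph_a(num_nodes: int, num_storages: int,
                  num_vms: int, num_storage_requests: int) -> Set[Arc]:
    """Return the directed arcs of a graph shaped like A (illustrative only)."""
    arcs: Set[Arc] = set()

    def connect(a: Hashable, b: Hashable) -> None:
        # "Two differently directed arcs" between a and b.
        arcs.add((a, b))
        arcs.add((b, a))

    nodes = [("V", i) for i in range(1, num_nodes + 1)]            # V_1..V_N
    stores = [("S", j) for j in range(1, num_storages + 1)]        # S_1..S_S
    vms = [("V0", k) for k in range(1, num_vms + 1)]               # V_0^k
    reqs = [("S0", k) for k in range(1, num_storage_requests + 1)]  # S_0^k

    for idx, vm in enumerate(vms):
        for node in nodes:            # VM vertex <-> every computational node
            connect(vm, node)
        for other in vms[idx + 1:]:   # VM vertex <-> every other VM vertex
            connect(vm, other)

    for idx, req in enumerate(reqs):
        for store in stores:          # assumption: mirrors the VM case
            connect(req, store)
        for other in reqs[idx + 1:]:  # request vertex <-> other request vertices
            connect(req, other)

    for vertex in vms + reqs:         # starting vertex O <-> all request vertices
        connect("O", vertex)

    return arcs
```

For example, with 3 computational nodes, 2 data storages, 2 virtual machines and 1 storage request, this construction yields 24 directed arcs.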
For each $k \in 1..n$ the vertex $V_0^k$ is connected to each of the vertices $V_0^l$, $l \in 1..n$, $l \neq k$, by two differently directed arcs (figure 1).

Similarly, for each $k \in 1..s$, $s = \sum_{i=1}^{R} s_i$, let one graph vertex correspond to the k-th storage request. For each $k \in 1..s$ the vertex $S_0^k$ is connected to each of the vertices $S_0^l$, $l \in 1..s$, $l \neq k$, by two differently directed arcs.

Figure 1. Graph G structure

Let’s also add a vertex O connected to each of the vertices $V_0^m$, $m \in 1..n$, and $S_0^l$, $l \in 1..s$, by two differently directed arcs. This vertex is the starting vertex for each ant. Ants can only choose this vertex when they have no other vertices to choose.

The following two values correspond to each arc in the graph: $\tau_{ij}$ is the amount of pheromone on the arc (i,j) and $\eta_{ij}$ is a heuristic function set for the arc (i,j). The current load of the i-th computational node corresponds to each of the vertices $V_1, ..., V_i, ..., V_N$, and the current amount of free memory on the i-th data storage corresponds to each of the vertices $S_1, ..., S_i, ..., S_S$.

3.2.2 Building Paths in the Graph

Each ant starts its path in the vertex O. The same ant can’t traverse the same arc twice. The ant chooses the next vertex by a probabilistic rule. The probability for the k-th ant to travel from vertex i to vertex j on iteration t depends on the list of visited arcs and on the amount of pheromone and the heuristic values on the available arcs:

$$P_{ij,k}(t) = \begin{cases} \dfrac{[\tau_{ij}(t)]^{\alpha} [\eta_{ij}(t)]^{\beta}}{\sum_{l \in J_k} [\tau_{il}(t)]^{\alpha} [\eta_{il}(t)]^{\beta}}, & (i,j) \notin L_k, \\ 0, & (i,j) \in L_k. \end{cases}$$

Here $\tau_{ij}(t)$ is the amount of pheromone on the arc (i,j), $\eta_{ij}(t)$ is a heuristic function on the arc (i,j), $\alpha \ge 0$ and $\beta \ge 0$ are the algorithm’s parameters determining the importance of the pheromone and of the heuristic in the process of choosing the arc, and $L_k$ is the list of arcs visited by the k-th ant.

When building a path from the vertex $S_0^k$ to the vertex $S_u$, the type of the storage request that corresponds to the vertex $S_0^k$ is compared to the type of the u-th data storage that corresponds to the vertex $S_u$. If the types don’t match, the ant can’t choose this arc.

When building a path from the vertex $V_0^k$ to the vertex $V_0^u$, the ant can’t choose the arc if it has already visited the vertex $V_0^u$.

After the ant has chosen a vertex $V_u$, the virtual machine that corresponds to the vertex $V_0^k$ is added to the set $W_u$. The current load of the u-th computational node is increased by the value $v(k)$. The same actions are performed with the set $S_u$ when a storage request is chosen.

One of the possible ways to set the function $\eta_{ij}(t)$ is as follows:

• $\eta_{ij}(t) = \dfrac{v(w)}{\max_{m \in W} v(m)}$ for $j = V_0^k$, $i, k \in 1..n$;
• $\eta_{ij}(t) = \dfrac{u(s)}{\max_{r \in S} u(r)}$ for $j = S_0^k$, $i, k \in 1..s$.

Here $v(w)$ and $u(s)$ are, respectively, the requested performance of a virtual machine and the requested amount of memory.

If $i = V_0^k$, $j = V_m$, $m \in 1..N$, $k \in 1..n$, the function $\eta_{ij}(t)$ is calculated as:

$\eta_{ij}(t) = vh(j) - \sum_{w \in W_j} v(w) - v(i)$.
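The probabilistic rule of Section 3.2.2 can be illustrated with a short sketch. It assumes that pheromone and heuristic values are stored in dictionaries keyed by arc, that arcs already visited by the ant receive zero probability, and that the heuristic of a mapping arc is the capacity left on the target node after the placement, clipped at zero. All function and parameter names are hypothetical and are not taken from the paper.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

Arc = Tuple[Hashable, Hashable]


def remaining_capacity(capacity: float, mapped: Iterable[float],
                       demand: float) -> float:
    """Heuristic for a mapping arc: capacity left after placing `demand`.

    Negative values are set to zero, so an infeasible placement
    ends up with zero probability.
    """
    return max(0.0, capacity - sum(mapped) - demand)


def transition_probabilities(current: Hashable,
                             candidates: Iterable[Hashable],
                             tau: Dict[Arc, float],
                             eta: Dict[Arc, float],
                             visited: Set[Arc],
                             alpha: float = 1.0,
                             beta: float = 1.0) -> Dict[Hashable, float]:
    """Probability of moving from `current` to each candidate vertex."""
    weights: Dict[Hashable, float] = {}
    for j in candidates:
        arc = (current, j)
        if arc in visited:
            weights[j] = 0.0  # an ant never reuses an arc
        else:
            weights[j] = (tau[arc] ** alpha) * (eta[arc] ** beta)

    total = sum(weights.values())
    if total == 0.0:
        # No feasible arc: the caller treats the request as non-mapped.
        return weights
    return {j: w / total for j, w in weights.items()}
```

An ant could then draw its next vertex with `random.choices`, passing these probabilities as weights.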
If $i = S_0^k$, $j = S_m$, $m \in 1..S$, $k \in 1..s$, the function $\eta_{ij}(t)$ is calculated as:

$\eta_{ij}(t) = uh(j) - \sum_{s \in S_j} u(s) - u(i)$.

Here $W_j$ ($S_j$) is the set of virtual machines (storage requests) currently mapped to the computational node (data storage) corresponding to the vertex j. If $\eta_{ij}(t) < 0$, then $\eta_{ij}(t)$ is set to zero.

So if a virtual machine (storage request) can’t be mapped to a computational node (data storage) because the performance (memory) restrictions would be violated, the probability for the ant to choose the arc is zero. If the probability is zero for all the arcs available at the moment, the ant skips the step and the entire request is considered non-mapped: all the virtual machines and storage requests that belong to the same request are removed from the mapping. After the function $\eta_{ij}(t)$ is calculated for all the arcs of one vertex, the $\eta_{ij}(t)$ values for these arcs are normalized:

$\eta_{ij}(t) = \dfrac{\eta_{ij}(t)}{\max \eta_{ij}(t)}$,

where the maximum is taken over the arcs of that vertex.

3.2.3 Virtual Channel Mapping Algorithm

The physical resource graph H is temporarily modified before the channel $l_{ij}$ is mapped:
1. All the arcs (p,q) where the vertex q is a network switch and p isn’t, and $p \neq i$, are deleted.
2. All the arcs (p,q) where the vertex p is a network switch and q isn’t, and $q \neq j$, are deleted.
3. Arcs connecting two network switches are duplicated and set to different directions.
4. Arcs connecting a computational node or a data storage to a network switch are directed towards the network switch.

Let’s consider the connected component C of the modified graph H that contains the vertex i. By construction, this connected component also contains the vertex j and contains no other vertices that aren’t network switches. The connected component C is used to map the virtual channel $l_{ij}$ as follows:

1. A weight is assigned to all the remaining arcs (p,q) in the graph C using one of the following rules (a rule is chosen at the start of the algorithm and the same rule is used for all the arcs):
1.1. The weight equals

$(bh(q) - r(l_{ij})) + (rh(l_{pq}) - r(l_{ij}))$

if the vertex p is a network switch, and equals

$rh(l_{pq}) - r(l_{ij})$

in other cases. If one of the weights or one of the summands in the formula is below zero, the corresponding arc is temporarily deleted from the graph. The weights are chosen so that all the arcs to which the channel can’t be mapped are deleted. The less capacity remains on the network channel after the virtual one is mapped, the smaller the weight of the corresponding arc.
1.2. All the weights are equal to one, so we are searching for the path with the fewest arcs.

Dijkstra’s algorithm is used to build the shortest path from the vertex i to the vertex j in the weighted graph.

3.2.4 Pheromone Update Rule

After the target functions are calculated, the pheromone values for each arc in the graph are updated. The additional pheromone value for an arc depends on the target function value that corresponds to the path this arc is included in:

$$\Delta\tau_{ij,k}(t) = \begin{cases} T_k, & (i,j) \in P_k(t), \\ 0, & (i,j) \notin P_k(t). \end{cases}$$

Here $P_k(t)$ is the path built by the k-th ant and $T_k$ is the target function, equal to the number of successfully mapped requests divided by the total number of requests.

A pheromone evaporation coefficient $p \in [0;1]$ defines how much pheromone is left after previous iterations. So the total value of the pheromone on the arc (i,j) after iteration t is calculated as follows:
$\tau_{ij}(t+1) = (1 - p) \cdot \tau_{ij}(t) + \sum_{k=1}^{m} \Delta\tau_{ij,k}(t)$.

4 EXPERIMENTAL RESEARCH

This section presents an experimental study of the proposed algorithm. The purpose of the study is to show how many requests can be mapped by the algorithm on one of the typical data center network topologies.

4.1 Research Methodology

The research was conducted using the following input data parameter values:

Standard data center topology “fattree” (figure 2) with 100 computational nodes and 100 data storages with performance (capacity) of 1000 conventional units. The second fattree level has the following parameter values: network channels’ bandwidth is 1000 conventional units, network switches’ bandwidth is 3000 conventional units. The first fattree level has the following parameter values: network channels’ bandwidth is 3000 conventional units, network switches’ bandwidth is 5000 conventional units. The root fattree level has the following parameter values: network channels’ bandwidth is 5000 conventional units, network switches’ bandwidth is 10000 conventional units.

Figure 2. Standard data center topology

We also consider two request set types:

Loosely coupled:

$[(\{W_i\}_{i=1}^{N}, \{S_j\}_{j=1}^{K}),\ E = \emptyset]$

Tightly coupled:

$[(\{W_i\}_{i=1}^{N}, \{S_j\}_{j=1}^{K}),\ E = \{(W \in \{W_i\}_{i=1}^{N},\ S \in \{S_j\}_{j=1}^{K})\}]$

In a particular input data set all the requests must be either loosely coupled or tightly coupled. So we consider two separate resource mapping problems:
1. Map 10 loosely coupled resource requests, each consisting of 10 virtual machines and 3 storage requests.
2. Map 10 tightly coupled resource requests, each consisting of 10 virtual machines, 3 storage requests and 5 virtual channels between storage requests and virtual machines.

4.2 Research Results

Figure 3 demonstrates the number of mapped requests depending on the network load for different fixed values of the computational nodes load and the data storages load.

Figure 3. The number of mapped requests depending on the network load

The graph shows that when the network load is less than 0.55 and the computational nodes and data storages load is less than 0.8, the algorithm successfully maps all 10 requests. As the network load grows, the number of mapped requests decreases to an average of 6.

Figure 4 demonstrates the number of mapped requests depending on the computational nodes load for different fixed values of the data storages load and a fixed network load equal to 0.6.
Figure 4. The number of mapped requests depending on the computational nodes load with fixed network load

When the computational nodes load and the data storages load are less than 0.85, the algorithm successfully maps all the requests. So the algorithm maps virtual machines and storage requests better than it maps virtual channels.

Figure 5 demonstrates the number of mapped loosely coupled requests depending on the computational nodes load and the data storages load. The graph shows that for the loosely coupled requests the algorithm has always successfully mapped all the requests.

Figure 5. The number of mapped loosely coupled requests

5 CONCLUSIONS

This paper proposes a data center resource mapping algorithm based on ant colony optimization. The algorithm maps three resource request types (virtual machines, storage requests, virtual channels) to the data center’s physical resources. An experimental study of the algorithm was conducted. It has shown that the algorithm based on ant colony optimization works efficiently when the network load is less than 0.8, and that the number of mapped requests depends weakly on the computational nodes load and the data storages load. The drawback of the algorithm is its running time. Future work should focus on speeding the algorithm up and on increasing the number of mapped requests when the network load is high.

REFERENCES

[1] Bhuvan U., Rosenberg A., Shenoy P., 2004. Application Placement on a Cluster of Servers. Department of Computer Science, University of Massachusetts, Amherst.
[2] Bein D., Bein W., Venigella S., 1998. Cloud Storage and Online Bin Packing. Intelligent Distributed Computing V.
[3] Lischka J., Karl H., 2009. A Virtual Network Mapping Algorithm Based on Subgraph Isomorphism Detection. Proceedings of the 1st ACM workshop on Virtualized infrastructure systems and architectures.
[4] Nagendram S., Vijaya J., Venkata D., Rao N., Naha Jyothi, 2011. Efficient Resource Scheduling in Data Centers using MRIS. Indian Journal of Computer Science and Engineering, Vol.2, No.5.