QoS-Assured Service Composition in Managed Service Overlay Networks

Xiaohui Gu, Klara Nahrstedt Rong N. Chang, Christopher Ward Department of Computer Science Network Hosted Application Services

University of Illinois at Urbana-Champaign IBM T.J. Watson Research Center

¡ ¡ ¢ xgu, klara ¢ @ cs.uiuc.edu rong, cw1 @ us.ibm.com

Abstract Service Provider: XXX.com service Service Overlay Network instance X service instance (SON) service Z Many value-added and content delivery services are instance being offered via service level agreements (SLAs). These SON Y Portal services can be interconnected to form a service overlay network (SON) over the . Service composition in SON Access SON has emerged as a cost-effective approach to quickly Domain creating new services. Previous research has addressed the reliability, adaptability, and compatibility issues for com- posed services. However, little has been done to manage SON generic quality-of-service (QoS) provisioning for composed SON Portal Portal services, based on the SLA contracts of individual ser- SON Access SON Access Domain vices. In this paper, we present QUEST, a QoS assUred Domain composEable Service infrasTructure, to address the prob- lem. QUEST framework provides: (1) initial service com- Figure 1. Illustration of the Service Overlay position, which can compose a qualified service path under Network Model. multiple QoS constraints (e.g., response time, availability). If multiple qualified service paths exist, QUEST chooses the best one according to the load balancing metric; and (2) dynamic service composition, which can dynamically re- and peer-to-peer file sharing overlays [2]. Beyond this, we compose the service path to quickly recover from service envision the emergence of service overlay networks (SON), outages and QoS violations. Different from the previous illustrated by Figure 1. Each SON provides not only work, QUEST can simultaneously achieve QoS assurances application-level data but also a set of value-added and good load balancing in SON. services (e.g., media transcoding, data encryption). Each service component is offered via a service level agreement (SLA) contract [15]. Service composition in SON has become necessary in order to cost-effectively create new 1 Introduction Internet services [8]. Thus, a challenging problem is to provide a service infrastructure to enable efficient service The Internet has evolved to become a commercial in- composition with quality-of-service (QoS) assurances. frastructure of service delivery instead of merely providing Much research work has addressed the service compo- host connectivity. Different forms of overlay networks have sition problem. The SAHARA project [8] addressed the been developed to provide attractive service provisioning fault-resilience problem in wide-area service composition. solutions, which are difficult to be implemented and de- The CANS project [9] addressed the adaptability problem ployed in the IP-layer, such as content delivery overlays [1] in service composition. The SPY-Net [17, 18] framework £ This work was supported in part by the NASA grant under contract addressed the problem of resource contention while finding number NASA NAG 2-1406, National Science Foundation under contract a multimedia service path. In the Gaia project [11], we number 9870736, 9970139, and EIA 99-72884EQ. Any opinions, findings, addressed the QoS consistency and load partition issues for and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science composing service path in ubiquitous computing environ- Foundation (NSF), NASA or U.S. Government. ments. However, little has been done to support generic

1 QoS provisioning for composed services, based on the SLA contracts of individual service components. In this paper, we present QUEST, a QoS assUred composEable Service infrasTructure, which can provide both QoS assurances under multiple QoS constraints, and load balancing in SON. Service composition is performed by the service composer (SC) using a composed service provisioning protocol. The protocol is designed based on a network-centric client-SERVICE model [4]. Instead of contacting service providers directly, the client contacts SC through a well-known address and specifies its desired services and QoS requirements. Then, SC, as an interme- diate agent, composes and instantiates a qualified service path in SON. The key algorithms used by SC include: (1) initial service composition, and (2) dynamic service composition. The initial service composition algorithm properly chooses and composes the service instances, based on their SLA contracts and current performances in order to best match the QoS constraints of the user. QUEST achieves not only QoS assurances but also load balancing in SON by comprehensively considering both QoS and resources of different service instances. Moreover, QUEST provides dynamic service composition, which is used during Figure 2. Illustration of the QUEST’s service runtime when service outages or QoS violations occur. The composition model. algorithm finds an alternative service path in SON, which can quickly recover the failed composed service delivery. Extensive simulation results show that QUEST can pro- vide much better QoS assurances and load balancing for define the management boundary of QoS provisioning for composed services in SON than other common heuristics. composed applications. We call such a SON with SON The rest of the paper is organized as follows. Section 2 in- portals a managed SON. troduces the overall system design. Section 3 describes the The QUEST service composition model includes two design details and the key service composition algorithms. mapping steps, illustrated by Figure 2 (a). For each Section 4 presents the performance evaluations. Section user request, the service composer (SC) first maps it to a 5 discusses related work. Finally, the paper concludes in composite service template, and then maps the template to Section 6. an instantiated service path. The mapping from the user request to different composite service templates (mapping- 2 System Overview 1) is constrained by the user’s application-specific quality requirements and different pervasive client devices, such as PDAs and cell-phones. Mapping-1 has been addressed by In QUEST, SON consists of various service compo- several research work [11, 9]. It can be performed based nents1, called SON nodes. Each SON node represents a on the application-specific QoS specifications [12] or using service component, which is managed by individual compo- automatic composition plan tools [13]. The mapping from nent service provider (CSP). The connections between SON the composite service template to an instantiated service nodes are called SON links, which are application-level path (mapping-2) is constrained by distributed performance virtual links. We assume that each CSP can manage and (e.g., response time) and resource availability conditions. control the quality levels of its own services in accordance Little research has addressed the mapping-2 that is the focus to the SLA contract with the portal service provider (PSP). of this paper. On the other hand, the PSP has an SLA contract with the user for each composed service. In order to allow PSP In QUEST, the composed service delivery, within SON, to monitor and manage the quality levels of the composed starts from the entrance portal, traverses through the chosen service, we introduce SON portals that serve as the en- service instances, ends at the exit portal, illustrated by

Figure 2 (b). We assume that each service instance ¢¡

trance/exit points of the SON. In QUEST, SON portals £

(or overlay link ¡ ) is associated with an SLA contract 1several service instances can co-located on the same physical host. specifying its QoS assurances such as response time

2

¢¡¤£¦¥ ¥ £ ¥ ¥

¨

(or delay §©¨ ) and availability (or ). The service process on a lightly loaded host. Similarly, we

P

 ¦ ¦"!#%$ 

availability is calculated by define the term bandwidth ratio for the service link as

ž

¡

¡IL L

& '(*)+¦&,-&/.01¤'(('(¦2'435¦2" ¦687:9 *+;2 (<*= ¦

Š5 ‘ ’ Š: “

L

9¡3{'(1¤¡ M< Œ‰{•‰ ‰– 

. NV>Ÿ3{'(1¤ V¦¡ Œ

LON L N . With the

The connection between two service instances is called above notations, we can formulate the QoS-assured service ¡

the service link  , which is mapped to an overlay routing composition problem as follows,

£DFE GHGIGJE £K

¡?>A@CB MLON

path  . The availability of

¥

HP Q ¨

the service link ( ) is calculated as ¥

¨

¥ (QSC) Problem Suppose we are given a directed graph

L P

§ N  T §©¨ ¥

delay of the service link ( ) is derived using R/S . representing a service overlay network (SON) topology, G ¨

Each CSP provides an SLA contract to the PSP and is = (V, E), where V and E are the sets of N SON nodes and ¡ responsible for managing its own service components or M SON links, respectively. Suppose also each SON node

is characterizedD by nonnegative values of 2 additive QoS

U

£

0¡ ¥

overlay links. The composed service is also associated £

¥

¢¤£ ¡ with an SLA contract specifying the QoS assurance that the attributes ( 21 , ), i = 1...N. Each SON link is

PSP promises to the user. also characterizedD by nonnegative values of 2 additive QoS

¥

¥

¢;¥

¤1 §

attributes ( , ¨ ), i = 1...M. Given the template for a

D ¢”¦¨§ ©¦ª¦«¦ composed service X and user QoS requirements ¤1

3 Initial Service Composition ¬

0¡ ˆ2‰{ŠŽ‹<ˆ

and Y , the problem of QoS-assured service compo-

D E GIGHG¯E

­ ®

In this section, we present the initial service composition sition is to compose a service path p \

D

D D

\/]

solution, which are used during the setup phase of the from Z (entrance portal) to (exit portal), such that

0¡ 0¡

˜ ˜

¢

¢”¦¨§ ©¦ª¦«2¦

¤1 ¤1 ˆ2‰{ŠŽ‹<ˆ Y

composed service delivery. Suppose each service instance Y

¬ (equ.(1)) and (equ.

¬

J£¦¥ ž

˜ ˜

¡ P 

(or service link ) is offered to the PSP via an SLA

L

 N

£¦¥ (2)), and also 1, i = 0... n+1, and 1, j =

contract specifying its provisioning QoS: availability

¢¡ ¥

£ 0...n. §XLON (or VLWN ) and response time (or delay ). The QoS

¢¡ We now prove that the QSC problem is NP-complete. Y

assurances (availability: ¢Y and response time: ) of

D D

E GIGHG[E

U Z \/] the composed application : can be Theorem 1 QSC problem is NP-complete.

derived as follows 2, Proof: We prove this by showing that the Multiple Con- f

fg;h strained Path selection (MCP) problem, which is known to

i j i

^H_C` ^H_q`

^I_a` be NP-complete [10] maps directly to a special case of the

¥on

b-m b-r

bdc e (1)

N

p kl

kl QSC problem. The detailed proof is omitted due to the page

fg;h f

limitation. °

i j i

¥

c m r

sdt sdt

n e

N (2) Besides its NP-completeness, the above QSC problem

p kl8u

kl definition also neglects several important practical issues.

_4 _8  ^„   ^‡

vdwyx{zx}| x z€ x‚(ƒz € | x{ y† ‚yƒz €

fg;h ~ l-~ First, in the real world SLA contract, the QoS attributes are often specified using average values measured over a long Suppose also that the PSP has an SLA contract with the time period such as a month or a year [15]. However, the user to specify its provisioning QoS for the composed ser-

0¡ composed service session can last for only several minutes

ˆ2‰{ŠŽ‹<ˆ ¢ˆ2‰{ŠŒ‹<ˆ Y vice X (availability Y and response time ). or hours. Hence, we need to consider not only SLA contract In order to guarantee a successful service delivery, the values but also recent performance conditions of service resource requirements of the instantiated service path have instances/links. Second, the PSP can concurrently serve to be satisfied. For simplicity, we only consider the CPU thousands of user requests by composing available service resource for end hosts and bandwidth for overlay links. instances. To achieve best aggregate QoS assurances for all

We define the term cpu ratio for the service instance ¢¡

¥

£

£¦¥ £¦¥

£¦¥ SON users, we also need to consider the load balancing

¡ ¡IL L ¡

>¦);Š:Œ‘ ’ Š5 “ 9, )”‰{•‰ ‰–  ¦);Š5 ‘ ’ Š: “ as  . The problem. Third, because SON is highly dynamic, QoS

represents the required CPU resource for running a new

£¦¥

IL L

¡ violations can happen sometimes. Commercial SLA con-

¡

¦)¤‰{•‰ ‰–  process. The represents the available CPU

—£¦¥ tracts usually make the PSP lose money when QoS violation

¡  resource in the physical hosting environment of . If

˜ happens [15]. Hence, the goal of our QSC algorithms is to 1, then the service request can be admitted. Otherwise,

™£¦¥ compose a service path that can best avoid QoS violations the service request will be denied. The smaller the  , or minimize the QoS violation degree if a violation occurs. the more advantageous we choose the service instance in We now provide a polynomial heuristic algorithm, called terms of CPU load balancing because we start the new QSC-basic, for the QSC problem. The basic idea is to use a modified Dijkstra algorithm by comprehensively 2In order to make the metric availability becomes additive and

minimum-optimal, we apply the logrithm and inverse operations on the considering multiple constraints (e.g., SLA contracts, re-

¥

m

r

š”› œ metric availability. We assume that and N are measured for an cent performance, system load). The QSC-basic primarily application data unit (e.g., a video frame). involves two steps: (1) Generate the weighted candidate

3 c(S 11, S 2 c(S 1) 21, c(S11, c( ) S 3 S21) S2 on the “pressure” of different QoS constraints. Intuitively, 1 1) 1, S 1 ) 31 S 1 ) , 1 0 S11 c S S (S , c if the current accumulated value of a QoS attribute (e.g., ( 3 0 S11 ( c S21 1 S S , ( 3 S31 S c S21 1, 4) S31 S 4) response time) approaches its constraint, we increase its S0 S12 S4 S0 S12 S4 SONc S22 weight in hope that its accumulation will catch up in the (S SONc S22 0 S32 (S entrance, SON 0 S32

S ,

) ¡ SON 1 4 entrance S later stage of the service path composition. Suppose 4 , S 1 4) portal ) S13 3 exit 4 S 3 ) S13 , exit ) (S portal 33 23 c portal ) (S , S S23 23 c portal is the current chosen node by Extract Min, whose final 14 S33 , S S23 c(S 14 S33

c(S

shortest path from the source Z is just determined. We de- S14 S14

(a) (b) fine the response time pressure “ IKJ/L ”, availability pressure

¡WL ‰{•‰

“ I ”, and the weight adjustment functions as follows,

j

^I_ h

¥

¥

Q1T

m!Q1R m

r

s t

£ £

Figure 3. Illustration of the QSC-basic algo- 

MON%P M

e ¨ e

. .

^H_ h

c S sdt

rithm. j (4)

¦¨§ ©¦ª¦«¦

 "#£



¬

r

j



M

vV] vV^

` `V\ \

£ Z

vVU vXW v v r

h

l

e e e e

Y[Z M M

N%P (5)

n ¨

graph. Instead of searching the service path in the entire j



M

N%P

vV] vV^

` `V\ \

Z

X_ vV` vVa vXb r

SON, we first generate a candidate graph, illustrated by v

e e e e

Y[Z M M

N%P (6)

n ¨ Figure 3 (a), to minimize the searching range. The " Œ

column in the candidate graph includes all the service Both QSC-basic and QSC-enhanced algorithms have the < Œ

instance candidates providing the service function in ®

ed 7 d same computational complexity c , where is the the application template. The “cost value” on the edge

number of nodes in the candidate graph.

¡£¢ P ¢ ¡¨¢ P

¡ 0> 7¥¤§¦ ¤ ©  is defined using the following

integrated metric:

^H_Ch h

^H_ 4 Dynamic Service Composition

m

sdt

r

£



 

N

N

 ¤^ 

c c

c

c

n n n

e

s t

^H_ h ^H_ h

u

£

j

 

  

 SON is a highly dynamic system compared to the IP

£  

u

m



f*) ¥+-, ) f . )!.

f network infrastructure. First, unlike routers, hosts can

£ ! "#£ %$ ! ¨&'£ ( ' (

) f*.

) f*.

h

^I_ h

^H_ dynamically join or leave SON over long time scales.

) f*.

sdt

) f . m

r

£

 ¨

 

/ (

¨

N

N

(

c c c

c Second, hosts or underlying IP network path can experience

n n n n

sdt

^H_ h ^H_ h

u

£

01   

 performance failures, outages, or degradations over short

 £ 

u

f . f*)

) time scales [3]. Hence, during runtime, an established

( $ ! ¨&'¨ (

m

2

s

r

4Ms

 service path can become broken or violate QoS constraints,

N

c

c

n n

2

s 4Ms j

j (3) particularly for the long-lived application session such as

3

£  

 r

m multimedia streaming.

) f f .

' 5 '

'(5 When the service instance (or link) experiences outage or

¬

§ 9 Intuitively, the ratio 6 ( represents any of the above significant quality degradations, the SC is notified. It recov- parameters) represents687 the normalized cost of selecting a ers all the affected sessions using the dynamic service com- portion of the service path in terms of one specific factor position algorithms. We have designed two dynamic service (e.g., QoS assurances or load balancing); and (2) Run the composition algorithms: (1) DQSC-complete, which com- Dijkstra algorithm to find the shortest path, which is re- pletely re-composes the service path without considering turned as the result of the QoS-assured service composition, the original service path, to recover from failures; and illustrated by Figure 3 (b). (2) DQSC-partial, which partially re-composes the service However, the above QSC-basic algorithm does not con- path based on the original service path. sider each individual QoS constraint while composing a Figure 4 illustrates the complete service re-composition service path. We now present an enhanced algorithm, called algorithm DQSC-complete. Figure 4 (a) shows the can-

QSC-enhanced. After generating the candidate graph, we didate graph and the original service path, on which the

D

¢ ¢ ¢

 > ¡ P*7:¤;¦ ¡ P<¤=©

associate each edge with a ®

service instance is failed and the service link between D

“cost value”, which modifies the equation (3) by multiply- D

B

®

f

and is broken. The DQSC-complete algorithm first

˜ ˜A@

¡ ¢ T ¡

?>  >FE7

ing a non-negative value with modifies the candidate graph by removing those failed or

¡DC

Z poorly-performing service instances, and also replacing the

¬

§

¡ each ratio 6 . The weight represents the significance

6G7 broken service links with alternate SON routing paths when

D

"HI Œ

of the  factor while selecting the service instance during ® each EXTRACT Min step in the Dijkstra algorithm. The it is possible, illustrated by Figure 4 (b). In this example,

is removed from the second column of the candidate graph.

¡ (HI Œ D

higher the value, the more important the factor. D

® Different from the QSC-basic, the QSC-enhanced algorithm The failure service link between and f is replaced

dynamically changes the importance of different factors by with an alternative SON path. The recovered service link

¡ ¡ adjusting accordingly. The adjustment of is based is illustrated as a dotted line in Figure 4 (b). Then, we use

4 Recovered Service Link quickly re-compose a new service path to recover from S11 S21 S11 S11 S31 S21 S21 S31 S31 failures or QoS violations. However, each of them has both S12 SON S22 SON S22 entrance S32 SON SON S22 advantages and disadvantages. The advantage of the DQSC- entrance S32 SON S32 portal exit entrance SON S13 portal exit portal S13 portal S13 exit S23 portal portal S33 S23 S23 complete algorithm is that it can re-compose a better service S33 S33 S14 S14 S14 path in terms of QoS assurances than the DQSC-partial (a) (b) (c) algorithm, since it has more choices of service instances. The disadvantage of the DQSC-complete algorithm is that Figure 4. Illustration of the complete service the service re-composition takes longer time since it re- re-composition algorithm DQSC-complete. composes the entire service path. On the other hand, the advantage of the DQSC-partial algorithm is that it is

Recovered quicker and easier to implement since it only changes part Service Link S11 of the service path. However, its disadvantage is that S21 S11 S31 S21 S11 S31 S21 S12 S31 the new composed service path may not be optimal. We SON S22 SON entrance S32 SON entrance SON SON portal exit entrance SON will further compare these two different dynamic service S13 portal S13 exit portal portal portal S13 exit S23 portal S33 composition approaches in the next section. S14 S14 S14 (a) (b) (c) 5 Performance Evaluation Figure 5. Illustration of the partial service re- composition algorithm DQSC-partial. 5.1 Evaluation methodology

We evaluate the performance of the initial and dynamic the QSC-enhanced algorithm, described in Section 3.1, to QoS-assured service composition algorithms using exten- compose a new qualified service path. Thus, the service sive simulations. We first use the degree-based Internet session can quickly recover from failures or QoS violations topology generator Inet 3.0 [16] to generate a power-law by switching from the old service path to the new one 3. random graph topology with 3200 nodes to represent the Contrasting with the DQSC-complete algorithm, DQSC- Internet topology. We then randomly select 500 nodes as partial algorithm partially re-composes the service path the SON nodes and 40 other nodes as the SON portals. based on the original service path. Figure 5 (a) shows We assume an equal-degree random graph topology for the

the same service path example as Figure 4 (a). In the SON. Each SON node is randomly assigned 5 other SON

D D E

E nodes as its neighbors. Hence, the probing overhead of each

® ®

¡

original service path SON entrance portal ¡

D E DdE

>8>J> E£¢

SON node is within 9 .

®

f SON exit portal, service instance is failed and D

D The initial resource availability of each IP link and

® service link between and f is broken. In Figure 5 (b), however, we modify the candidate graph by not only service instance is uniformly distributed in a certain range. removing the poorly-performing or failed service instances The SLA values of each IP link or service instance are also

D uniformly distributed within certain range. Different values ®

(e.g., ) and recovering the broken service links (e.g., the D

D reflect the heterogeneity and diversified quality guarantees

® service link between and f ), but also removing, in the

column where the old service instance is good, the other in SON. Moreover, to simulate the performance variation D

D in the real world, the QoS attributes of each IP link and

®

f candidate service instances. In this example, and

service instance are set by uniform distribution functions,

®Ž® ®

f still perform well. Thus, we remove and in the

with SLA values as the mean values. We assume the

®

f f1f third column of the candidate graph, and and in the fourth column. Then, we use the QSC-enhanced algorithm Dijkstra shortest path algorithm for both the IP layer and to compose a new service path on the modified candidate overlay layer routing, using the instantaneous value of delay graph, illustrated by Figure 5 (c). The purpose of such an as the routing metric. The bandwidth of an overlay link is approach is to keep those original well-performing service the bottleneck bandwidth along the IP network path. The instances in the new service path. Hence, we can reduce the delay of an overlay link is the addition of the delays along migration overhead for switching from the old service path the IP network path. to the new one. The computational complexity of DQSC- During each minute, certain number of user requests

® are generated. The user request is represented by any

ed 7 d complete and DQSC-partial is still c , where is the number of nodes in the candidate graph. of 40 composite service templates that comprise 2 to 6 Both DQSC-complete and DQSC-partial algorithms can services. Each user session lasts from 15 to 60 minutes. The metrics we use for evaluating the QoS assurances include 3We assume that the states of all the service instances can be recovered QoS violation rate and QoS violation degree. The QoS by software. violation rate is measured by the ratio of the sessions during

5 40 which QoS violation happens over the total sessions. For Fixed Random each session, the QoS violation is said to happen if the 35 QSC-Basic QSC-Enhanced measured average QoS attribute values (i.e., availability, 30 DQSC-Partial DQSC-Complete response time) is worse than that specified in the SLA 25 contract. The QoS violation degree measures that if a QoS 20 violation occurs, how severe the QoS violation is. It is 15 measured by the ratio of difference between the measured 10 QoS attribute value and its target value, over the target Response Time Violation Rate [%] 5 value. Those two metrics are often associated with the 0 financial refund/penalty policies specified in the real world 50 100 150 200 250 300 350 400 450 500 SLA contracts. The minimization of those two metrics Load: Session Request Rate [sessions/min] means to reduce the financial loss of the service provider. Figure 7. Average response time QoS viola- The metric we use for evaluating the load balancing tion rate under different system load. is the provisioning success rate. A composed service provisioning is said to be successful if and only if during its entire session, all the service instances and links’ resource requirements on the service path are always satisfied. The different system workload put on the SON. The Y axis composed service provisioning success rate is defined as the shows the average QoS violation rate for the availability number of successful requests over the total number of all attribute, achieved by the fixed, random and our four requests. Given the total amount of resource in SON, higher QSC (QoS-assured service composition) algorithms. QSC- provisioning success rate represents better load balancing in Basic and QSC-Enhanced represent the two initial service SON. composition algorithms. Both of them do not include any For comparison, we also implement two common heuris- dynamic service re-composition mechanisms. Both DQSC- tic algorithms for composing service path: fixed and random Complete and DQSC-Partial use the QSC-Enhanced for the algorithms.The fixed algorithm always chooses the same initial service composition and also dynamically recovers service instances for a composed application. The random from the service outage/quality degradations by completely algorithm randomly chooses service instances to compose or partially re-composing the service path. Each average the service path. D availability QoS violation rate ( ) value is calculated and averaged over a period of 200 minutes for all successfully

70 composed sessions. The results show that all the four D 60 QSC algorithms achieve much lower than the fixed 50 and random algorithms. The QSC-Enhanced has as much

40 Fixed as 20% improvements than the QSC-Basic. Both DQSC- Random D QSC-Basic Complete and DQSC-Partial further reduce ( ) to almost 30 QSC-Enhanced DQSC-Partial 0% lower. The reason is that the application-level service 20 DQSC-Complete outage recovery can quickly finish in a few seconds while Availability Violation Rate [%] 10 the IP-layer Internet path recovery may take several minutes

0 or even hours [3]. However, the performance difference 50 100 150 200 250 300 350 400 450 500 between DQSC-Complete and DQSC-Partial is very small. Load: Session Request Rate [sessions/min]

Similarly, Figure 7 shows the results of the QoS vio- ®

Figure 6. Average availability QoS violation lation rate for the response time ( ). Again, the QSC ® rate under different system load. algorithms achieve much lower than fixed and ran-

dom. The performance order of different QSC algorithms ¡ is QSC-Basic ¡ (worse than) QSC-Enhanced DQSC-

Partial ¡ DQSC-Complete. The reason why the DQSC 5.2 Results and analysis algorithms are better than the QSC-Enhanced is that service re-composition always uses the most recent performance Figure 6 and Figure 7 show the simulation results about and load information, which allows it to make better service the violation rates of two QoS attributes: availability instance choices in terms of response time. and response time, respectively. In Figure 6, the X axis Figure 8 and Figure 9 show the results about the QoS vio-

represents different session request rate, calculated by the lation degree. Figure 8 shows the availability QoS violation f number of composed service session requests per minute. degree ( ). Once again, the QSC algorithms consistently The range of session request rate is selected to reflect achieve better performance than fixed and random. Figure

6 16 100 Fixed 90 Random 14 QSC-Basic 80 QSC-Enhanced 12 DQSC-Partial 70 DQSC-Complete 10 60 8 Fixed 50 Random 6 QSC-Basic 40

QSC-Enhanced Success Rate [%] DQSC-Partial 30 4 DQSC-Complete

Average Violation Degree [%] 20 2 10 0 0 50 100 150 200 250 300 350 400 450 500 50 100 150 200 250 300 350 400 450 500 Load: Session Request Rate [sessions/min] Load: Session Request Rate [sessioins/min]

Figure 8. Average availability QoS violation Figure 10. Average provisioning success rate degree under different system load. under different system load (load balancing).

25 Fixed Random 6 Related Work QSC-Basic 20 QSC-Enhanced DQSC-Partial DQSC-Complete Besides to the related work mentioned in the Introduc- 15 tion, much other research work has also addressed the service composition problem. The SWORD project at Stan- 10 ford [13] provided a developer toolkit for the web service

5 composition. It can automatically generate a functional composition plan given the functional requirements for the Average Response Time Violation Degree [%] 50 100 150 200 250 300 350 400 450 500 composed application. However, SWORD only addressed Load: Session Request Rate [sessions/min] the mapping-1 problem defined in our framework. The eFlow project at HP labs [7] provided an adaptive and dy- Figure 9. Average response time QoS viola- namic service composition mechanism for the commercial tion degree under different system load. e-business process management. It did not address the end- to-end QoS assurances for the composed service. Our work is also different from the traditional IP-layer QoS routing 9 shows similar results for the response time QoS violation problem [5] because: (1) The goal of IP-layer QoS routing degree. Hence, the simulation results further validate our is to find a network routing path satisfying QoS constraints algorithms by showing that QSC algorithms can not only while QUEST addresses two mapping problems to achieve greatly reduce the violation rate but also achieve lower not only QoS assurances but also load balancing and fault- violation degree when QoS violations occur. tolerance; and (2) IP-layer QoS routing only considers the Figure 10 shows the results about the composed service network resource while QUEST considers not only network provisioning success rate. Similar to the above experiments, resources but also end-system resources (e.g., CPU). each provisioning success rate value is calculated and aver- Other closely related work includes various overlay net- aged over a period of 200 minutes. The results show that works. Anderson et al. proposed a resilient overlay network all four QSC algorithms can similarly achieve much higher (RON) architecture [3] to allow distributed applications to provisioning success rate, namely better load balancing, quickly detect and recover from the Internet path failure. than the fixed and random. RON can recover from network path outage within several In all of the above experiments, we observe that the seconds using the application-level routing. RON is useful performance gains of the DQSC-Complete are very small and beneficial to QUEST although it only solved a subset compared to the DQSC-Partial. The reason is that the initial of the dynamic service composition problems. In [6], Duan service composition algorithm QSC-Enhanced is already et. al. also proposed the concept of SON, which can very good. Hence, the selected service instances in the provide value-added services with QoS assurances to the old service path are still, by large probability, the best ones user via SLA contracts. However, they only addressed the when we re-compose the service path. Hence, it is a near bandwidth provisioning problem for SON, while QUEST optimal solution that we keep the old good service instances considers not only resource provisioning (e.g., bandwidth in the new service path, which is exactly the DQSC-Partial and CPU), but also various service QoS (e.g., response time algorithm. and availability). The OverQoS [14] proposed an architec-

7 ture to provide Internet QoS (e.g., statistical bandwidth and [6] Z. Duan, Z.-L. Zhang, and T. Hou. Service Overlay net- loss rate assurances) using overlay networks. QUEST is works : SLAs, QoS and Bandwidth Provisioning. Proc. different from OverQoS by providing QoS assurances for of 10th IEEE International Conference on Network Proto- composed services, based on SLA contracts of individual cols(ICNP2002), Paris, France, November 2002. service components. [7] B. Raman et. al. The SAHARA Model for Service Com- position Across Multiple Providers. Proc. of International Conference on Pervasive Computing (Pervasive 2002), Aug 7 Conclusion 2002. [8] F. Casati et. al. Adaptive and Dynamic Service Composition We have presented a QoS-assured composed service in eFlow. HP technical Report HPL-2000-39, March 2000. delivery framework, called QUEST, for a managed service [9] X. Fu, W. Shi, A. Akkerman, and V. Karamcheti. CANS: overlay network (SON). The major contributions of this Composable, Adaptive Network Services Infrastructure . paper include: (1) formally define the QoS-assured service Proc. of 3rd USENIX Symposium on Internet Technologies composition problem and prove that it is NP-complete. and Systems, March 2001. We then design efficient approximate optimal algorithms [10] M. R. Garey and D. S. Johnson. Computers and Intractabil- to compose service paths under multiple QoS constraints. ity. A Guide to the Theory of NP-Completeness, Freeman, Moreover, QUEST can achieve sound load balancing in San Francisco, 1979. SON to provide best possible QoS for all SON users; [11] X. Gu and K. Nahrstedt. Dynamic QoS-Aware Multimedia (2) provide both partial and complete dynamic service re- Service Con®guration in Ubiquitous Computing Environ- composition algorithms, which can quickly recover the ments. Proc. of IEEE 22nd International Conference on service path from failures or QoS violations. We have Systems (ICDCS 2002), July 2002. implemented a large-scale simulation test-bed and our ex- [12] X. Gu, K. Nahrstedt, W. Yuan, D. Wichadakul, and D. Xu. tensive simulation results show that QUEST can provide An XML-based Enabling Language for both QoS assurances and load balancing for composed the Web. Journal of Visual Language and Computing, services in SON. The simulation results also indicate that Special Issue on Multimedia Language for the Web, 13(1), our partial dynamic service re-composition algorithm can pp. 61-95, February 2002. achieve almost the same level of QoS assurance as the [13] S.R. Ponnekanti and A. Fox. SWORD: A Developer Toolkit complete re-composition algorithm, but with much lower for Building Composite Web Services. Proc. of the Eleventh overhead. World Wide Web Conference (Web Engineering Track), Hon- olulu, Hawaii, May 2002. 8 Acknowledgment [14] L. Subramanian, I. Stoica, H. Balakrishnan, and R. H. Katz. OverQoS: Offering QoS using Overlays. Proc. of First Workshop on Hop Topics in Networks (HotNets-I), We would like to thank Dr. Eric Wu at IBM T.J. Watson Princeton, New Jersey, October 2002. research center for his helpful input to our work. We wish to thank anonymous reviewers for their helpful suggestions. [15] C. Ward, M. Buco, R. Chang, and L. Luan. A Generic SLA Semantic Model for the Execution Management of e- Bussiness Outsourcing Contracts. Proc. of 3rd International References Conference on e-Commerce and Web Technologies (EC-Web 2002), September 2002. [1] Akamai Inc. http://www.akamai.com/. [16] J. Winick and S. Jamin. Inet3.0: Internet Topology Generator. Tech Report UM-CSE-TR-456-02 [2] . http://gnutella.wego.com/. (http://irl.eecs.umich.edu/jamin/), 2002. [3] D. Andersen, H. Balakrishnan, F. Kaashoek, and R. Morris. [17] D. Xu and K. Nahrstedt. Finding Service Paths in a Media Resilient Overlay Networks. Proc. of 18th ACM SOSP 2001, Service Proxy Network. Proc. of SPIE/ACM Multimedia Banff, Canada, October 2001. Computing and Networking Conference, San Jose, CA, Jan- [4] R. Chang and C. Ravishankar. A Service Acquisition uary 2002. Mechamism for Server-Based Heterogeneous Distributed [18] D. Xu, K. Nahrstedt, and D. Wichadakul. QoS and Systems. IEEE Transactions on Parallel and Distributed Contention Aware Multi-Resource Reservation. Cluster Systems, 5(2), February 1994. Computing, the Journal of Networks, Software Tools and [5] S. Chen and K. Nahrstedt. An Overview of Quality-of- Applications, 4(2), Kluwer Academic Publishers, 2001. Service Routing for the Next Generation High-Speed Net- works: Problems and Solutions. IEEE Network Magazine, Special Issue on Transmission and Distribution of Digital Video, 12(6), pp. 64-79, 1998.

8