1

Start-time Fair Queuing: A Algorithm for

Integrated Services Networks

Pawan Goyal, Harrick M. Vin, and Haichen Cheng

Distributed Multimedia Computing Laboratory Department of Computer Sciences, University of Texas at Austin

Taylor Hall 2.124, Austin, Texas 78712-1188 g E-mail: fpawang,vin,hccheng @cs.utexas.edu, Telephone: (512) 471-9732, Fax: (512) 471-8885 URL: http://www.cs.utexas.edu/users/dmcl

Abstract To determine the characteristics of a suitable scheduling algo- rithm, consider the requirements of some of the principal applica- We present Start-time Fair Queuing (SFQ) algorithm that is com- tions envisioned for integrated services networks: putationally ef®cient, achieves fairness regardless of variation in a server capacity, and has the smallest fairness measure among all  Audio applications: To maintain adequate interactivity for known fair scheduling algorithms. We analyze its throughput, sin- such applications, scheduling algorithms must provide low gle server delay, and end-to-end delay guarantee for variable rate average and maximum delay. Fluctuation Constrained (FC) and Exponentially Bounded Fluctua-

 Video applications: Variable bit rate (VBR) video sources, tion (EBF) servers. We show that SFQ is better suited than Weighted which are expected to impose signi®cant requirements on Fair Queuing for integrated services networks and it is strictly better network resources, have unpredictable as well as highly vari- than Self Clocked Fair Queuing. To support heterogeneousservices able bit rate requirement at multiple time-scales [10, 12, 14]. and multiple protocol families in integrated services networks, we These features impose two key requirements on network re- present a hierarchical SFQ scheduler and derive its performance source management: bounds. Our analysis demonstrates that SFQ is suitable for inte- grated services networks since it: (1) achieves low average as well ± Due to the dif®culty in predicting the bit rate require- as maximum delay for low-throughput applications (e.g., interac- ment of VBR video sources, video channels may utilize tive audio, telnet, etc.); (2) provides fairness which is desirable more than the reserved bandwidth. As long as the ad- for VBR video; (3) provides fairness, regardless of variation in a ditional bandwidth used is not at the expense of other server capacity, for throughput-intensive, ¯ow-controlled data ap- channels (i.e., if the channel utilizes idle bandwidth), plications; (4) enables hierarchical link sharing which is desirable it should not be penalized in the future by reducing its for managing heterogeneity; and (5) is computationally ef®cient. bandwidth allocation.

± Due to multiple time-scale variation in the bit rate re- Intro duction 1 quirement of video sources, to achieve ef®cient utiliza-

tion of resources, a network will have to over-book Motivation 1.1 available bandwidth. Since such over-booking may Integrated services networks are required to support a variety of yield persistent congestion, a network should provide applications (e.g., audio and video conferencing, multimedia in- some QoS guarantees even in the presence of conges- formation retrieval, ftp, telnet, WWW, etc.) with a wide range of tion. (QoS) requirements. Whereas continuous media Unfair scheduling algorithms, such as Virtual Clock [24], applications such as audio and video conferencing require the net- Delay EDD [23], etc., penalize channels for the use of idle work to provide QoS guarantees with respect to bandwidth, packet bandwidth and do not provide any QoS guarantee in the pres- delay, and loss; applications such as telnet and WWW require low ence of congestion [18]. Fair scheduling algorithms, on the packet delay and loss. Throughput intensive applications like ftp,on other hand, guarantee that, regardless of prior usage or con- the other hand, require network resources to be allocated such that gestion, bandwidth would be allocated fairly[18]. Hence, fair the throughput is maximized. A network meets these requirements scheduling algorithms are desirable for video applications.

primarily by appropriately scheduling its resources.



 Data applications: To support low-throughput, interactive

hwas supp orted in part by IBM Graduate Fellowship,

This researc data applications (e.g., telnet), scheduling algorithms must

IBM Faculty DevelopmentAward, Intel, the National Science Foun-

h Initiation Award CCR-9409666), NASA, Mitsubishi

dation (Researc provide low average delay. On the other hand, to support h Lab oratories (MERL), and Sun Microsystems Inc. Electric Researc throughput-intensive, ¯ow-controlled applications in hetero- geneous, large-scale, decentralized networks, scheduling al- gorithms must allocate bandwidth fairly [5, 15, 19]. Due to the coexistence of VBR video sources and data sources in integrated services networks, the bandwidth available to data applications may vary signi®cantly over time. Consequently, the fairness property of the scheduling algorithm must hold To appear in the Proceedings of SIGCOMM 1996. regardless of variation in server capacity.

2

(t) Hence, in summary, a suitable scheduling algorithm for inte- a bit-by-bit manner. Formally, v is de®ned

grated services networks should: (1) achieve low average as well as:

(t) C

as maximum delay for low throughput applications (e.g., interac- dv

P

= (3)

dt r

tive audio, telnet, etc.); (2) provide fairness for VBR video; and j

2B (t)

(3) provide fairness, regardless of variation in server capacity, for j

B (t) throughput-intensive, ¯ow-controlled data applications. Further- where C is the capacity of the server and is the set of back- more, since such networks will support a wide variety of services logged ¯ows at time t in the bit-by-bit round robin server. WFQ and multiple protocol families, the scheduling algorithm should fa- then schedules packets in the increasing order of their ®nish tags.

cilitate hierarchical link sharing [6, 20]. Finally, to facilitate its Observe that implementation of WFQ requires computation of

(t) implementation in high-speed networks, it should be computation- v , which in turn requires simulation of bit-by-bit round robin ally ef®cient. A scheduling algorithm that achieves all of these server in real time. This simulation is computationally expensive. objectives is the subject of investigation in this paper. Furthermore, as the following examples illustrate, since WFQ is based on the assumption that the capacity of a server is constant, it

fails to provide fairness over servers with time varying capacity.

1.2 Relation To Previous Work

Each unit of data transmission at the network level is a packet. Example 1 Let the capacity of the server that WFQ is emulating

C>1

We refer to the sequence of packets transmitted by a source as a be C pkts/s, . Let the actual servercapacity be 1 pkt/s in [0,1)

f m

¯ow [24]. Each packet within a ¯ow is serviced by a sequence of and C pkt/s in [1,2). Consider two ¯ows and both of which have

C +1

servers (or switching elements) along the path from the source to unit length packets and weights of 1 pkt/s. Let ¯ow f send

j

f F (p )=j 1j C +1

the destination in the network. Before we describe fair scheduling packets at time 0. Hence, for ¯ow , ;.

f

t =1 algorithms that may be employed by the servers, let us consider the Let ¯ow m become backlogged at and be backlogged during

precise meaning of fair allocation of link bandwidth. the interval [1,2]. Since only ¯ow f is backloged during [0,1),

1

)=C +1 v (1) = C m F (p

Intuitively, allocation of link bandwidth is fair if equal band- using (3), we get . Hence, for ¯ow , m .

width is allocated in every time interval to all the ¯ows. This Since WFQ schedules packets in the increasing order of ®nish tags,

C 1  W (1; 2)  C W (1; 2)  1 m

concept generalizes to weighted fairness in which the bandwidth we get: f and . However, for

W (1; 2) W (1; 2) m

must be allocated in proportion to the weights associated with the fair allocation of bandwidth, f and should both

C=2 C

r f W (t ;t )

f 1 2 ¯ows. Formally, if f is the weight of ¯ow and be .Since can be chosen arbitrarily, this example illustrates

is the aggregate service (in bits) received by it in the interval the unfairness that can result when the actual capacity is lower than

[t ;t ] [t ;t ]

2 1 2

1 , then an allocation is fair if, for all intervals in which the capacity being assumed.

W (t ;t )

W (t ;t )

1 2

f

m

1 2

=0 m

both ¯ows f and are backlogged, .

r r m f Example 2 Let the capacity of the server that WFQ is emulating Clearly, this is an idealized de®nition of fairness as it assumes be 1 pkts/s. Let the actual server capacity be C pkt/s, C > 1, in

that ¯ows can be served in in®nitesimally divisible units. The

f m

[0,1) and 1 pkt/s in [1,C). Consider two ¯ows and both of C objective of fair packet scheduling algorithms is to ensure that 3

which have weights of 1 pkt/s. Let ¯ow f send packets at time

2

W (t ;t )

W (t ;t )

1 2

f

m

1 2

j

3C

j

j is as close to 0 as possible. How-

0 f F (p 1 j  m )= j

r r

m

f . Hence, for ¯ow , ;. Let ¯ow send

f

2

1 f

ever, it has been shown in [7] that if a packet scheduling algorithm C packets at time . Since only ¯ow is backloged during [0,1),

W (t ;t ) j

W (t ;t )

1 2

f

m

1 2

v (1) = 1 m F (p )= j+1 1  j  C

j j H (f; m)

guarantees that for all inter- . Hence, for ¯ow , m ;.Since

r r

m

f

 

max

max WFQ schedules packets in the increasing order of ®nish tags, while

l

l

1 f

m

[t ;t ] H (f; m)  + H (f; m)

1 2 W (1;C +1)  1 C 1  W (1;C +1)  C m

vals then ,where f , . However, for

2 r r

m

f

max

W (1;C+1) W (1;C+1) m

fair allocation of bandwidth, f and both

f m l

is a function of the properties of ¯ows and and f and

max

2 C

should be C= . Since, can be chosen arbitrarily, this example

l f m

m denote the maximum lengths of packets of ¯ow and , respectively. illustrates the unfairness that can result when the actual capacity

is greater than the capacity being assumed.

(f; m) Several fair scheduling algorithms that achieve value of H close to the lower bound have been proposed in the literature. The earliest known fair scheduling algorithm is Weighted Fair Queu- Thus, WFQ fails to provide fairness over variable rate servers. ing (WFQ) [5] (also referred to as Packet-by-Packet Generalized As we will outline in Section 3, to be useful for hierarchical link Processor Sharing (PGPS) [18]). WFQ was designed to emulate sharing [6, 20], a scheduling algorithm must provide fairness over a hypothetical bit-by-bit weighted round robin server in which the variable rate servers. Consequently, WFQ fails to meet two key number of bits of a ¯ow served in a round is proportional to the requirements (i.e., fairness over variable rate servers and support weight of the ¯ow. Since packets can not be serviced a bit at a time, for hierarchical link sharing) of a fair scheduling algorithm for inte- WFQ emulates bit-by-bit round robin by scheduling packets in the grated services networks. Observe that it may be possible to extend

increasing order of their departure times in the hypothetical server. WFQ to provide fairness over variable rate servers by changing the

(t)

To compute this departure order, WFQ associates two tags, a start de®nition of v as given in (3) to be a function of time vary-

C (t)

j ing server capacity . However, due to the unpredictable and tag and a ®nish tag, with every packet of a ¯ow. Speci®cally, if p

f multiple time-scale variation in VBR video bit rate, it may not be

j

th

l j f

(t) and denote the packet of ¯ow and its length, respectively, C

f possible to accurately estimate . Furthermore, this would make

j j

v (t)

A(p p

and if ) denotes the arrival time of packet at the server, then the computation of even more expensive, thereby making WFQ

f f

j j

j infeasible for high speed networks.

S (p F (p p )

start tag ) and ®nish tag of packet are de®ned as:

f f

f Fair Queuing based on Start-time (FQS), proposed in [13], com-

j j 1

j putes start tag and ®nish tag of a packet exactly as in WFQ. However,

S (p ) = maxfv (A(p ));F(p )g j  1 (1)

f f f instead of scheduling packets in the increasing order of ®nish tags,

it schedules packets in the increasing order of start tags. Though

j

l

f j

j FQS has advantages for processor scheduling, it is not known to

j  1 (2) F (p )= S(p )+

f f r

f have any advantage over WFQ for scheduling packets in a net-

v (t)

0 work. Moreover, since it utilizes as de®ned in (3), it has all the

F (p )=0 v (t) where f and is de®ned as the round number that disadvantages of WFQ. would be in progress at time t if the packets were being serviced in 3

Self Clocked Fair Queuing (SCFQ), originally proposed in [4] analysis is simple and conceptually elegant. We demonstrate that and later analyzed in [7], was designed to reduce the computational the hierarchical SFQ scheduler, in addition to supporting hetero- complexity of fair scheduling algorithms like WFQ. SCFQ also geneity, can be used to achieve separation of delay and throughput

schedules packets in the increasing order of ®nish tags. However, allocation as well as delay shifting (i.e., reduction of delay of a set

(t) it achieves ef®ciency over WFQ by approximating v with the of ¯ows while increasing the delay of other ¯ows).

®nish tag of the packet in service at time t. It has been shown that Our analysis demonstrates that: (1) SFQ is better suited than

max

max

l

l f

m WFQ for integrated services networks since it ef®ciently achieves

) (f; m) ( +

the value of H for SCFQ is , which is only a

r r m f fairness over variable rate servers and provides signi®cantly smaller factor of two away from the lower bound [7]. The main limitation average and maximum packetdelay to low-throughput applications; of SCFQ is that it increases the maximum delay incurred by the (2) SFQ is strictly better than SCFQ since maximum packet delay

packets signi®cantly. Speci®cally, if Q is the set of ¯ows served in SFQ is considerably smaller than in SCFQ and both have same f by a server and C its capacity, then packets of ¯ow may incur

P the fairness measure and implementation complexity; (3) SFQ is

max

l

n

2Q^n6=f n more delay in SCFQ than in WFQ [8]. This may strictly better than FQS since it has lower complexity and achieves be unacceptablyC large in many cases. fairness over variable rate servers without increasing the maximum WFQ and SCFQ sort and schedule packets in the increasing packet delay; and (4) SFQ provides considerably better fairness order of ®nish tags. Hence, per packet computational complexity properties and smaller maximum delay than DRR.

We demonstrate that SFQ is suitable for integrated services net-

(logQ) Q is O where is the number of ¯ows served by the server. To reduce this per packet computational complexity, De®cit Round works since it: (1) achieves low average as well as maximum delay Robin (DRR) was proposed in [21]. It is a derivative of weighted for low-throughput applications (e.g., interactive audio, telnet, etc.); round robin algorithm designed to accommodate variable length (2) provides fairness which is desirable for VBR video; (3) provides packets of a ¯ow. Though the per packet computational complexity fairness, regardless of variation in server capacity, for throughput-

intensive, ¯ow-controlled data applications; (4) enables hierarchical (1) of DRR is O per packet, it has the following two limitations:

link sharing which is desirable for managing heterogeneity; and (5)

 

max

max

l l

f is computationally ef®cient.

m

(f; m) 1+ +

1. The value of H for DRR is when

r r m

f The rest of the paper is structured as follows. We present SFQ

min r =1

2Q n n , which can deviate arbitrarily away from algorithm and analyze its fairness, throughput, single server de-

that of fair scheduling algorithms like SCFQ. For instance, if lay guarantee, and end-to-end delay guarantee in Section 2. We

max max

=1 = l r = r = 100 l H (f; m)

f m m

and f ,then for discuss hierarchical link sharing in Section 3 and present our im-

:02

DRR is 1 , which is 50 times larger than the corresponding plementation of SFQ for an ATM network interface in Solaris 2.4

:02 0 value for SCFQ. Clearly, appropriate choice of weights environment in Section 4. Finally, Section 5 summarizes our results.

can make this factor as high as desired. Since absolute values

(f; m)

of H have no physical meaning, the relative values are

Start-time Fair Queuing important. 2 2. The maximum delay incurred by packets serviced by a DRR In Start-time Fair Queuing (SFQ) algorithm, two tags, a start tag server can be arbitrarily high. To observe this, consider a and a ®nish tag, are associated with each packet. However, unlike WFQ and SCFQ, packets are scheduled in the increasing order of

weighted round robin server with all packets of size l (DRR is

(t) equivalent to weighted round robin server in such a scenario). the start tags of the packets. Furthermore, v is de®ned as the

It is easily observed that a lower bound on maximum delay start tag of the packet in service at time t. The complete algorithm

P

r l is de®ned as follows:

n

n2Q^n6=f

of a packet of ¯ow f is . Since weights can

C

j j

p S (p

1. On arrival, a packet is stamped with start tag ),

f be arbitrary, the lower bound can be arbitrarily high. f computed as:

In summary, the design of a fair scheduling algorithm that is: (1)

j j1

computationally ef®cient, (2) provides fairness over variable rate j

S (p ) = max fv (A(p ));F(p )g j  1 (4)

f f

servers, (3) facilitates hierarchical link sharing, and (4) has good f j

delay properties is an open problem. j

F (p p

where ), the ®nish tag of packet ,isde®nedas:

f f

j

1.3 Research Contributions of This Pap er

l

f

j j

F (p j  1 (5) )=S(p )+

f f

r In this paper, we present Start-time Fair Queuing (SFQ) algorithm f

that is computationally ef®cient and allocates bandwidth fairly re-

0

F (p )=0 r f

gardless of variation in a server rate. We show that it has a fairness f

where f and is the weight of ¯ow .

max

max

l

l

f

m

+ )

measure of ( , which is only a factor two away from

r r m

f 2. Initially the server virtual time is 0. During a busy period, the

v (t)

the lower bound and is at least as good as the fairness measure server virtual time at time t, , is de®ned to be equal to the 1

of all known fair scheduling algorithms. We analyze the through- start tag of the packet in service at time t .Attheendofa

(t)

put, single server delay, and end-to-end delay guarantee of SFQ. busy period, v is set to the maximum of ®nish tag assigned 2

To accommodate links whose capacity ¯uctuates over time (for ex- to any packets that have been serviced by time t . ample, ¯ow-controlled and broadcast medium links), this analysis is carried out for servers which can be modeled as either Fluctua- 3. Packets are serviced in increasing order of the start tags; ties tion Constrained (FC) or Exponentially Bounded Fluctuation (EBF) are broken arbitrarily (some tie breaking rules may be more servers [17]. To the best of our knowledge, this is the ®rst analysis desirable than others and are discussed in Section 2.3).

of a fair or a real-time scheduling algorithm for such servers. To

1

e that server virtual time changes only when a packet n-

support hierarchical link sharing, we present a hierarchical SFQ Observ ishes service.

scheduler. We build upon the analysis of FC and EBF servers and 2

We set v (t) to the maximum of nish tags of the packets at the

y of pro ofs; all the start tags as well

analyze the throughput, delay and end-to-end delay guarantees of end of busy p erio d only for clarit

er virtual time can b e equivalently set to zero. a ¯ow when the link bandwidth is hierarchically partitioned. The as the serv

4

max

v (t) W (t ;t )  0 r (v v ) l  0

1 2 f 2 1

Proof: f

As is evident from the de®nition, the computation of in SFQ is Since ,if f , (8) holds

max

r (v v ) l > 0

f 2 1

inexpensive since it only involves examining the start tag of packet trivially. Hence, consider the case where f ,

max

l k

in service. Hence, the computational complexity of SFQ is same f

v >v + p f

2 1

i.e., . Let packet f be the ®rst packet of ¯ow ,

r

f

(logQ) Q

as SCFQ, which is O per packet, where is the number of

(v ;v ) 2 ¯ows at the server. that receives service in the open interval 1 . To observe that

Traditionally, scheduling algorithms have been analyzed only such a packet exists, consider the following two cases:

n n

for servers whose service rate does not vary over time. However, n

 p S (p ) v

1 1

f f Packet f such that and ex-

service rate of ¯ow-controlled, broadcast medium and wireless links

f [t ;t ] 2

ists: Since ¯ow is backlogged in 1 , we conclude +1

may ¯uctuate over time. Fluctuation in service rate may also occur n

v (A(p ))  v

1 . From (4) and (5), we get:

due to variability in CPU capacity available for processing packets f

+1 n

(for example, a CPU constrained IP router may not have suf®cient n

S (p )=F(p ) (9) f

CPU capacity to process packets when routing updates occur). If a f

max

l

n f n

server is shared by multiple types of traf®c with some traf®c types n

F (p )  S (p )+ S (p )

1

f f Since f and ,weget: r being given priority over the other, then for lower priority traf®c, f

the link appears as a server with ¯uctuating service rate. In order max

l

f

n+1

S (p ) < v + (10) 1

to accommodate such scenarios, we analyze SFQ for servers with f r

bounded ¯uctuation in service rate. f

< v (11)

Two server models,termed Fluctuation Constrained (FC) server 2

n+1

and Exponentially Bounded Fluctuation (EBF) server, that have n

S (p )=F(p ) >v

1 Since f , using (11), we conclude

bounded ¯uctuation in service rate and are suitable for modeling f

n+1

S (p ) 2 (v ;v ) 2 1 .

many variable rate servers have been introduced in [17]. A FC f

C

n n

server has two parameters; average rate (bits/s) and burstiness n

 p S (p )=v p

1

f f Packet f such that exists: may ®nish ser-

(C )

 (bits). Intuitively, an FC server, in any interval during a busy

t

vice at time 1or . In either case, since ¯ow

 (C ) +1

period, does at most less work than an equivalent constant n

f [t ;t ] v(A(p ))  v

1 2

is backlogged in , 1 . Hence,

f

rate server. Formally, max

l

n+1 n n f n

S (p F (p )  S (p )+ )= F(p )

f f f .Since and

f r

De®nition 1 A server is a Fluctuation Constrained (FC) server f

n

S (p )=v

1

(C; (C )) [t ;t ]

f ,weget: 2

with parameters , if for all intervals 1 inabusype-

W (t ;t )

max 2

riod of the server,the work done by the server,denoted by 1 ,

l

f

n+1

S (p (12)  v +

satis®es: )

1

f

r

f

W (t ;t )  C(t t ) (C) (6)

< v (13)

1 2 2 1

2

n+1

EBF server is a stochastic relaxation of FC server. Intuitively, n

S (p )=F(p ) >v

1 Since f , using (13), we conclude

the probability of work done by an EBF server deviating from the f

n+1

3

S (p ) 2 (v ;v ) 2

1 .

f average rate by more than , decreases exponentially with .

Formally, Since either of the two cases always holds, we conclude that packet

k k

p S (p ) 2 (v ;v )

1 2

f f such that exists. Furthermore, from (10) and

De®nition 2 A server is an Exponentially Bounded Fluctuation (12), we get:

max

C ; B ; ;  (C ))

(EBF) server with parameters ( , if for all intervals

l

f

k

S (p )  v + (14)

[t ;t ]

1

1 2

in a busy period of the server, the work done by the server, f

r

f

W (t ;t ) 2

denoted by 1 , satis®es:

k +m

Let p be the last packet to receive service in the virtual time

f

(v ;v )

1 2

interval . Hence,

P (W (t ;t )

1 2 2 1

k +m

F (p )  v (15) 2 In what follows, we analyze the fairness of SFQ for any variable f

rate server, and its throughput and delay guarantees for FC and From (14) and (15), we conclude:

(C; 0)

EBF servers. Since a FC server is a constant rate server, the max

l

f

k +m k

) S (p )  (v v ) F (p following analysis is also valid for constant rate servers. We present (16)

2 1

f

f r the proofs of only few results; the rest of the proofs are presented f

in [11].

f (v ;v ) 2 But since ¯ow is backlogged in the interval 1 , from (4) and

(5) we know:

2.1 Fairness Guarantee

n=m

k +n

X

l

f

W (t ;t ) k +m k

1 2

f

F (p ) = S (p )+ (17)

j f

To prove that SFQ is fair, we need to prove a bound on f

r

f

r

f

W (t ;t ) n=0

m

1 2

f m

j for any interval in which both ¯ows and are

r

n=m m

k +n

X

l

f

+m k

backlogged. We achieve this objective by establishing a lower and k

F (p ) S (p ) = (18)

f

f

W (t ;t )

f 1 2

an upper bound on in Lemmas 1 and 2, respectively. r

f

n=0

f [t ;t ] 2 Lemma 1 If ¯ow is backlogged throughout the interval 1 ,Hence, from (18) and ( 16), we get:

then in a SFQ server:

n=m

k +n

max

X

l

l

f max f

r (v v ) l  W (t ;t ) (8)

 (v v ) (19)

f 2 1 f 1 2

2 1 f

r r

f f

n=0

v = v (t ) v = v (t )

1 2 2

where 1 and .

n=m

X

k +n max

3

l  r (v v ) l (20)

f 2 1 The EBF server as presented here has an extra parameter  (C ).

f

f

However, this parameter do es not change the de nition signi cantly.

n=0

5

k +m k +m

S (p )

Since 2, packet is guaranteed to have been

f f

P

n=m

+n

k 1

l t W (t ;t ) 

f 1 2

transmitted by 2 . Hence, ,andthe

f

n=0 lemma follows.

2 Switch 2.5 Mb/s

[t ;t ] 2 Lemma 2 In a SFQ server, during any interval 1 :

Destination

max

W (t ;t )  r (v v )+l (21)

f 1 2 f 2 1

f 3

v = v (t ) v = v (t )

1 2 2 where 1 and .

Proof: From the de®nition of SFQ, the set of ¯ow f packets served Figure 1 : The network topology used for the simulations

[v ;v ] v v

2 1 2 in the interval 1 have service tag at least and at most . Hence, the set can be partitioned into two sets: 700 WFQ: Source 2 SFQ: Source 2

SFQ: Source 3 v  WFQ: Source 3

Set D consisting of packets that have service tag at least 1 600 v

and ®nish time at most 2 . Formally,

500

k k

D = fk jv  S (p )  v ^ F (p )  v g (22)

1 2 2

f f 400 From (4) and (5), we conclude:

300

X k

Packet Number Received

 r (v v ) (23) l

f 2 1 f

200

k 2D

100

 v

Set E consisting of packets that have service tag at most 2

v 0 and ®nish time greater than 2 . Formally, 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 1.1

Time

k k

E = fk jv  S (p )  v ^ F (p ) >v g (24)

1 2 2 f f Figure 2 : Comparison of the number of packets received by

Clearly, at most one packet can belong to this set. Hence, sources2and3inWFQandSFQ

X

k max

l  l (25) f

f were TCP Reno sources and used 200 bytes packets. The link ca-

2E k pacity between the switch and the destination was 2.5 Mb/s. The From (23) and (25) we conclude that (21) holds. scheduling algorithm at the switch gave higher priority to source Since unfairness between two ¯ows in any interval is maximum 1 packets and scheduled source 2 and 3 packets using either WFQ when one ¯ow receives maximum possible service and the other or SFQ. Consequently, the output link at the switch appeared as a minimum service, Theorem 1 follows directly from Lemmas 1 and variable rate server to sources 2 and 3. The WFQ implementation 2. used the link capacity to compute the ®nish tags. Source 3 was

made active 500 ms after sources 1 and 2, and the network was

[t ;t ] f m 2 Theorem 1 For any interval 1 in which ¯ows and are simulated for one second. Figure 2 plots the sequence number of backlogged during the entire interval, the difference in the service packets of sources 2 and 3 received by the destination when WFQ received by two ¯ows at a SFQ server is given as: and SFQ were used. As the ®gure demonstrates, source 2 received

unfair advantage over source 3 when WFQ was used. Speci®cally,

max

max

l

W (t ;t ) W (t ;t ) l

f 1 2 m 1 2 f

m when WFQ was used, during the 500ms interval after source 3 be-

j j + (26)

r r r r

m f m f came active, whereas the destination received only 48 packets from source 3, it received 331 packets from source 2. On the other hand, when SFQ was used, it received 189 packets and 190 packets from Theorem 1 demonstrates that fairness measure of SFQ is at sources 2 and 3, respectively. Furthermore, during the ®rst 435 ms most a factor of 2 away from an optimal fair packet scheduling after source 3 was started, the number of source 3 packets received algorithm. Furthermore, it demonstrates that SFQ has the smallest by the destination in WFQ and SFQ was 2 and 145, respectively. fairness measure among all the known scheduling algorithms [22]. This experimentally validates the superiority of SFQ over WFQ for Observe that to establish Theorem 1, we did not make any achieving fair allocation of bandwidth over variable rate servers. assumptions about the service rate of the server. Hence, Theo-

rem 1 holds regardless of the characteristics of the server. This

Throughput Guarantee demonstrates that SFQ achieves fair allocation of bandwidth over 2.2 variable rate servers, and thus meets a fundamental requirement of In the previous sections, we have not assigned any interpretation

fair scheduling algorithms for integrated services networks. This is to the weight of a ¯ow. To establish the throughput and delay r an important advantage over WFQ that does not allocate bandwidth guarantee of a ¯ow, we will henceforth interpret f as the rate

fairly over variable rate servers. assigned to ¯ow f . Theorems 2 and 3 establish the throughput To experimentally evaluate the relative performance of SFQ and guaranteed to a ¯ow by a SFQ FC and EBF server, respectively, WFQ, we simulated 3 ¯ows with the same destination traversing a when appropriate admission control procedures are used. single switch using REAL network simulator (see Figure 1). Source

Theorem 2 If Q is the set of ¯ows served by a SFQ FC server

1 transmitted a MPEG compressed VBR video sequence with av- P

4

r  C (C; (C ))

erage rate 1.21 Mb/s using 50 bytes packets . Sources 2 and 3 with parameters , and n , then for all inter-

n2Q

[t ;t ] f 2

vals 1 in which ¯ow is backlogged throughout the interval,

4

The video sequence was derived by digitizing and compressing

television serial Frasier.

6

W (t ;t )

1 2 f is given as: The throughput guarantees of other fair scheduling algorithms

P have not been established. However, it can be shown that for

max

l

n

 (C )

n2Q

max constant rate servers, when similar admission control procedure is

W (t ;t )  r (t t ) r r l

f 1 2 f 2 1 f f

f C C used, the throughput guarantee of neither WFQ nor SCFQ is better

(27) than that of SFQ. Observe that throughput guarantees derived in Theorems 2 and

c 3 for SFQ have a recursive structure. Speci®cally, the through-

W (v ;v ) v (t )=v

1 2 1 Proof: Let 1 and let denote the aggregate length of packets served by the server in the virtual time interval put guaranteed to a ¯ow by an FC or an EBF SFQ server is also

¯uctuation constrained or has exponentially bounded ¯uctuation,

[v ;v ] 2 1 . Then, from Lemma 2 we conclude:

respectively. We will exploit this recursive structure to analyze

X X max

c performance bounds when a link bandwidth is hierarchically parti-

W (v ;v )  r (v v )+ l (28)

1 2 n 2 1

n tioned.

n2Q n2Q

P

r  C

Since n ,

2.3 Delay Guarantee

n2Q

X max

c SFQ algorithm, as de®ned so far, only allocates constant rate to the

(29) W (v ;v )  C(v v )+ l

1 2 2 1

n packets of a ¯ow. However, due to the multiple time-scale variation

2Q

n of VBR video, to achieve ef®cient utilization of network resources, v

De®ne 2 as: a server may be required to allocate variable rate to packets of P

max a video ¯ow. To support variable rate allocation, we generalize

l

n

 (C )

j n2Q

r

v = v + t t (30)

1 2 1

2 SFQ by extending the de®nition of service tags. Let be the

f

C C

j j j

p F (p )

rate assigned to packet p . Then ®nish tag of packet , is

f f f

Then from (29), we conclude: de®ned as:

P

j

max

l

l

f

j j

n

n2Q

F (p )= S(p )+ j  1 (41)

c

f f

W (v ;v )  C(v + t t (31) j

1 2 1 2 1

r

C

f

X

 (C )

max Start tag of a packet and the system virtual time are de®ned as

v )+ l

1 n

C before. We now prove the delay guarantee of the generalized SFQ

2Q

n algorithm.

 C (t t )  (C ) (32) 1 2 Clearly, a server can provide a bound on delay only if its ca-

pacity is not exceeded. However, when variable rate is allocated to

b b

t v (t )=v T (w )

2 2 Let 2 be such that . Also, let be the time taken by the packets of a ¯ow, the intuitive meaning of capacity not being server to serve packets with aggregate length w in its busy period.

Then, exceeded needs to be de®ned precisely. To do so, let rate function

f v R (v )

for ¯ow at virtual time , denoted by f , be de®ned as the

c

v

b

 t + T (W (v ;v )) (33)

t rate assigned to the packet that has start tag less than and ®nish

2 1 1 2

tag greater than v . Formally,

 t + T (C (t t )  (C )) (34)

1 2 1





j j j

r 9j 3 S (p )  v

From the de®nition of Fluctuation Constrained server, we get: if )

f f f

R (v )= f

0 otherwise

w  (C )

T (w )  + (35)

C C

Let Q be the set of ¯ows served by the server. Then a FC or EBF

From (34) and (35) we get: server with average rate C , is de®ned to have exceeded its capacity

P

v R (v ) >C

at virtual time if n . If the capacity of a SFQ

n2Q

C (t t )  (C )  (C )

2 1

b

t  t + + (36) 1

2 server is not exceeded, then it guarantees a deadline to a packet C

C based on its expected arrival time. Expected arrival time of packet

 t (37)

2

j j j j

p r EAT (p ;r

that has been assigned rate , denoted by ),is

f f f f

From Lemma 1 we know that: de®ned as:

max

j1

b

W (t ; t )  r (v v ) l (38)

f 1 2 f 2 1

f

l

f

j j1 j1

max fA(p g j  1 (42) );EAT(p ;r )+

f f f

j1

b

t  t r 2

Since 2 , using (30) we get:

f

P

max

l

0 0 n

 (C )

n2Q

max

EAT (p ;r )=1

f where f . A deadline guarantee based on ex-

W (t ;t )  r (t t ) r r l

f 1 2 f 2 1 f f

f C C pected arrival time has been referred to as delay guarantee [8, 16]. (39) Theorems 4 and 5 establish the delay guarantee of SFQ for FC and EBF servers, respectively.

Theorem 3 If Q is the set of ¯ows served by a SFQ EBF server P

Theorem 4 If Q is the set of ¯ows served by a SFQ FC server with

P

(C ; B ; ;  (C )) r  C

with parameters , and n , then for

n2Q

R (v )  C (C; (C )) v

parameters and n for all , then the

n2Q

[t ;t ] f 2

all intervals 1 in which ¯ow is backlogged throughout the

j j

p L (p )

SF Q

(t ;t )

W departure time of packet at the server, denoted by ,is

f 1 2 f interval, is given as: f

P given by:

max

l

n

 (C )

n2Q

P (W (t ;t ) < r (t t ) r r j

f 1 2 f 2 1 f f

max

X

l

l  (C )

C C

f

n j j j

L (p )  EAT (p ;r )+ + + (43)

SF Q

f f f

max

C C C

r )  Be 0  (40) l

f

f

n2Q^n6=f C

7

T (w ) Proof: Let set H be de®ned as follows: Let be the time taken by server to serve packets with aggregate

length w in a busy period. From the de®nition of Fluctuation

m m

))g (44) )= v(A(p H = fmjm> 0^S(p f

f Constrained server, we get:

k

k  j H v = v (A(p ))

1 w  (C )

Let be largest integer in . Also, let f

T (w )  + (55)

j

v = S (p )

C C

and 2 . Observe that as the server virtual time is set f

to the maximum ®nish tag assigned to any packet at the end of a

j

j

k

p v

Since packet departs at system virtual time 2 and all the packets

f

p p

busy period, packet f and are served in the same busy period

f

[v ;v ] 2 served in the virtual time interval 1 are served in the same of a server. From the de®nition of SFQ, the set of ¯ow f packets

busy period of the server, we get:

[v ;v ] v

2 1

served in the interval 1 have start tag at least and at most v

2 . Hence, the set can be partitioned into two sets:

j k

c

)+ T(W(v ;v ))  L (p ) (56) A(p

1 2 SF Q

f

f

 v

This set consists of packets that have start tag at least 1 and

v n

®nish tag at most 2 . Formally the set of packets of ¯ow , D

denoted by n ,inthissetis:

k+n

P P max

l

n=j k 1 l

k

f

n

A(p ) + +

m m

f

k+n

C

n=0 n2Q^n6=f

r

D = fmjv  S (p )  v ^ F (p )  v g (45)

n 1 2 2

n n

f

j

l

m m

 (C ) j

f

R (v ) F (p ;r ) n

Then, from the de®nition of and n , we know

f

+ +  L (p ) (57)

SF Q

f

C C

that the cumulative length of such ¯ow n packets served by

[v ;v ] 2

the server in the virtual time interval 1 , denoted by From (42) we get

AP (v ;v )

1 2

n , is given as:

j

max

X

l

Z

 (C )

l

f

v

j j n j

2

EAT (p ;r )+ + +  L (p ) (58)

SF Q

f f f

C C C

R (v)dv (46) AP (v ;v ) 

n n 1 2

n2Q^n6=f

v 1

Hence, aggregate length of packets in this set,

P

AP (v ;v )

1 2

n ,is given as:

n2Q

Theorem 5 If Q is the set of ¯ows served by a SFQ EBF server

P

Z

v

2 (C ; B ; ;  (C )) R (v )  C

X X

with parameters and n for all

n2Q

j

R (v)dv (47) AP (v ;v ) 

n n 1 2 p

v , then the departure time of packet at the server, denoted by

f

v

1

n2Q n2Q

j

L (p ) SF Q

Z , is given by:

f

v

2

X

R (v )dv (48) 

n

max

X

l

j j j n

v

1

P ( L (p )  EAT (p ;r )+

n2Q

SF Q

f f f

C

Z

v

2

n2Q^n6=f

Cdv (49) 

j

l

 (C )

f

v

1

+ + + )  1 Be 0  (59)

C C C

 C (v v ) (50)

2 1

j j

v = S (p ;r ) k v v =

2 1

But since 2 , from the de®nition of ,

f f

5

k+n

P

l The delay guarantee derived in Theorems 4 and 5 is indepen-

n=j k 1

f

+n

k . Hence,

=0 n dent of a tie breaking rule that a SFQ server may use when more r

f than one packet have the same start tag. Though a tie breaking rule

=jk1

n does not effect the delay guarantee, it can be used by a server to

k +n

X X l

f achieve different objectives. For example, a tie-breaking rule may

AP (v ;v )  C (51)

n 1 2

k +n

r give higher priority to interactive, low-throughput applications to

f

2Q n=0 n reduce the average delay.

Theorems 4 and 5 can be used to determine delay guarantee

v 

This set consists of packets that have start tag at most 2 and even when a server has ¯ows with different priorities and services v

®nish tag greater than 2 . Formally, the set of packets of ¯ow them in the priority order (such a scenario may occur in an inte-

n E

, denoted by n ,inthissetis grated services network with different traf®c types). To illustrate, m

m consider a server that services ¯ows with two priorities and uses

E = fmjv  S (p )  v ^ F (p ) >v g (52)

n 1 2 2 n n SFQ to schedule the packets of lower priority ¯ows. If the band-

Clearly, at most one packet of ¯ow n can belong to this set. width requirement of ¯ows that are given higher priority can be



E = fj g

Furthermore, f . Hence, the maximum aggregate characterized by a leaky bucket with average rate and burstiness

length of packets in this set is:  (such a characterization may be enforced by shaping the higher

priority ¯ows through a leaky bucket), then the residual bandwidth

X

max j

l + l

(53) available to the lower priority ¯ows can be modeled as ¯uctuation

n

f

C ;  )

constrained with parameters ( . Hence, Theorem 4 can be

2Q^n6=f n used to determine the delay guarantee of the lower priority ¯ows. Hence, the aggregate length of packets served by the server in Similarly, if the aggregate arrival process of the high priority ¯ows

can be modeled as poisson process, then the residual bandwidth

c

[v ;v ] W (v ;v )

1 2 2 the interval , denoted by 1 ,is: can be modeled as EBF server [17] and Theorem 5 can be used to

determine the delay guarantee.

n=jk1

k +n

X X

l

f

max j

c

5

(54) W (v ;v )  C + l + l

1 2

n

The pro of metho ds of Theorem 4 and 5 can b e used to derive

f

k +n

r

f delay guarantee of FC and EBF SCFQ servers.

n=0 n2Q^n6=f 8

Flows =100 Theorem 4 demonstrates that maximum delay of a packet in 50 Flows =150 SFQ is signi®cantly smaller than in SCFQ. Speci®cally, a tight Flows =200 bound on the departure time of a packet at a constant rate server 40 employing SCFQ, given in [9], is:

30

j

max

X

l

l

f

j j j

n

(60) L (p )  EAT (p ;r )+ +

SC F Q

f f f

j 20

C

r

f

n2Q^n6=f Difference in delay(s)

10

(C )=0 Since  for a constant rate server, the difference in maxi- mum delay that a packet may incur at servers employing SCFQ and 0

SFQ is:

j j l

l -10 f

f 0 200 400 600 800 1000

(61) Rate(Kb/s)

j

C

r f

Clearly, maximum delay in SFQ is much smaller than in SCFQ. Figure 3 : Difference in maximum delay in WFQ and SFQ

j j

= l =

To illustrate numerically, when r 64Kb/s, 200 bytes and f f 60 SFQ C=100Mb/s, the difference is 24.4ms. If there are K servers on WFQ

the path of a ¯ow, this difference increases by a factor of K .For

50 =5 instance, the difference increases to 122ms for K . Similarly, the difference increases linearly with the packet size. Theorem 4 demonstrates that SFQ does not couple bandwidth 40 and delay allocation. This enables it to provide lower maximum delay to low-throughput applications as compared to WFQ that 30 inversely couples delay and bandwidth allocation. To observe this, Delay (ms)

consider the difference in the maximum delay experienced by packet 20

j j

(p )

p , denoted by , in WFQ and SFQ. Since WFQ guarantees

f f

j 10

l

j j j

l f

max

p EAT (p + ;r )+

that packet will be transmitted by j

f f f

C

r f 0

l 0.7 0.75 0.8 0.85 0.9 0.95 where max is the maximum packet length at the server, we get:

Average Utilization

j j

max

X

l l

l l

max f f n

j Figure 4 : Comparison of average delay in WFQ and SFQ

(p + (62) )=

f

j

C C C

r

f

n2Q^n6=f j

j the increasing order of start tags, and thereby schedules packets at

(p ) l = l =

To gain a qualitative understanding of ,let max f

f the earliest possible instant; WFQ schedules packets in increasing

j

max

l = l r = r f

n and . Then,

f order of ®nish tag, and thus delays a packet as long as possible.

To validate this hypothesis, we simulated a switch that was shared

l (jQj1)l

j by high and low throughput ¯ows carrying poisson traf®c. The

(p )= (63)

f C

r link capacity was 1Mb/s and the packet size was 200 bytes. Seven f

high-throughput ¯ows with average rate 100Kb/s shared the switch

j

p )  0 Hence, ( if: with varying number of low-throughput ¯ows with average rate

f 32Kb/s. The number of low-throughput ¯ows was varied from r

1 2 to 10 and the switch was simulated for 1000 seconds. Figure

f (64)

 4 compares the average packet delay of low-throughput ¯ows in

Q j1 C j WFQ and SFQ at varying levels of link utilization. As the ®gure This shows that maximum delay of packets of a ¯ow in SFQ is illustrates, the average delay of low-throughput ¯ows is signi®cantly smaller than in WFQ if the fraction of the link bandwidth used by higher in WFQ than in SFQ. In fact, at 80.81% link utilization,

the ¯ow is at most 1 , which is expected to be true for low- the average delay is 53% higher in WFQ than in SFQ. Hence,

jQj1 throughput applications. This is also illustrated by Figure 3, which SFQ provides lower maximum as well as average delay to low- plots the reduction in delay in SFQ for different number of ¯ows throughput applications as compared to WFQ. and ¯ow rates, assuming 200 bytes packet and link capacity of As is evident from the de®nition of the expected arrival time, 100Mb/s. As the ®gure shows, the reduction is higher for lower two key properties of the delay guarantee of SFQ for a ¯ow are: throughput ¯ows. To compare the delay performance of WFQ and (1) it is independent of the behavior of other sources at the server, SFQ, consider a network link that is servicing 70 ¯ows (possibly and thereby isolates the ¯ow; and (2) it is independent of a traf®c video ¯ows) with rate 1Mb/s and 200 ¯ows (possibly audio ¯ows) characterization. Whereas the isolation property enables a server with rate 64Kb/s. In such a scenario, whereas the maximum delay to provide stronger guarantees to the ¯ow and is highly desirable of the packets of ¯ow with rate 64 Kb/s reduces by 20.39ms in when sources may be malicious [2,5, 16, 19], independence of delay SFQ, the maximum delay of 1Mb/s ¯ows increases by 2.48 ms. guarantee from traf®c characterization enables a server to provide Hence, SFQ reduces the maximum delay of low throughput ¯ows various QoS guarantees to ¯ows conforming to any speci®cation signi®cantly without an appreciable increase in the delay of high [9]. To enable a network of servers to provide similar guarantees, throughput ¯ows. in what follows, we derive end-to-end delay guarantee. SFQ not only reducesthe maximum delay as comparedto WFQ, but is also expected to lower the average delay of low-throughput applications. This is because whereas SFQ schedules packets in

9

K j j

2.4 End-to-End Delay Guarantee (p ) p K

where L is the time at which packet leaves server , and

j j;n

br = min br

2[1::K ] n . The objective is to determine the deadline guarantee of a network

of servers based on the expected arrival time of a packet at the ®rst As Corollary 1 illustrates, the end-to-end guarantee, like the th

server on the path of a ¯ow [9]. To do so, let the i server along the single server guarantee, is based on the expected arrival time of K path of a ¯ow be denoted as server i. Also, let there be servers a packet and hence is independent of the behavior of other ¯ows on the path of a ¯ow and let each of the server guarantee a deadline and any particular ¯ow characterization. Moreover, as illustrated to a packet based on its expected arrival time. Then, the network in [9], it can be used to determine various QoS parameters like

guarantees a deadline to a packet based on its expected arrival time upper bound on end-to-end delay, packet loss probability and buffer th

at the K server. Observe that the expected arrival time of a packet requirement for any traf®c speci®cation. Hence, such a guarantee

K 1 at server K is dependenton departure time of packetat server , is highly desirable.

which, in turn, is dependenton expected arrival time of the packetat To derive Corollary 1, we have only required the scheduling

1 server K . Using this argument recursively, a network of servers algorithm at each server to satisfy (66). Hence, any scheduling can guarantee a deadline to a packet based on the expected arrival algorithm that satis®es (66) (for example, Virtual Clock, WFQ, and time of the packet at the ®rst server. This method has been used in SCFQ) can interoperate to provide end-to-end guarantee. Further- [9] to derive end-to-end delay guarantee of a network of servers that more, Corollary 1 can be used for an internetwork of FC and EBF employ algorithms in the class of Guaranteed Rate (GR) scheduling servers. Finally, the proof method of Theorem 6 and Corollary 1 algorithms. However, the end-to-end delay guarantee presented in can be used to derive end-to-end delay guarantee even when packet [9] assumes that each of the server provides a deterministic bound may be fragmented and reassembled in the network. Hence, SFQ on the departure time of a packet. Consequently, even though SFQ can provide guarantees in heterogeneous internetworking environ- belongs to GR, the guarantee is not applicable to a network which ments. may have some SFQ EBF servers. To analyze such networks, we

generalize the method presented in [9]. Discussion Observe that SFQ delay guarantee for both FC and EBF servers 2.5

can be rewritten as: SFQ borrows the notion of ªself-clockingº, i.e., computing system 

virtual time based on a tag of a packet in service, from SCFQ to

j j j j 

P L (p )  EAT (p ;r )+ +  1 Be (65)

SF Q

f f f f achieve ef®ciency. However, it provides signi®cantly lower delay

than SCFQ while achieving the same fairness measure. Hence,

j

max

P

l

l

j  (C ) f

n using fairness, throughput, delay, and computational complexity as

 0 = + +

where . Substituting ,

f

C C C

2Q^n6=f

n performance metrics, we conclude that SFQ is strictly better than

=0  = 1 B and , yields the delay guarantee for FC server.

j SCFQ.

P max

l

l

 (C )

j

f

n

+ +  = C

Substituting = and ,on SFQ is similar in spirit to FQS which also schedules packets in

f

C C C

n2Q^n6=f

(t) the other hand,yields the delay guarantee for EBF server. Hence,we the increasing order of start tags. However, as FQS uses v as will use (65) to derive the end-to-end delay guarantee. Furthermore, de®ned in (3), it is computationally expensive and fails to allocate to facilitate interoperability with other scheduling algorithms, we bandwidth fairly over variable rate servers. Furthermore, as Theo- will only require each server on the path of a ¯ow to guarantee a rem 1 shows, its fairness measure is no smaller than that of SFQ. deadline which is similar to (65). We ®rst relate the expected arrival Finally, since in FQS, all Q ¯ows can becomeactive simultaneously, time of a packet at adjacent servers in Theorem 6 and then use it to and consequently Q packets can have the same start tag, the bound

derive end-to-end delay guarantee in Corollary 1. on the departure time of a packet in FQS cannot be smaller than in +1 i;i SFQ. Hence, SFQ has many advantages but no disadvantages over

Let  be an upper bound on the propagation delay between

i +1 i

servers i and . Also, let all the variables of server be FQS.

j j;i

j A key advantage of SFQ over WFQ, in addition to lower im-

r

identi®ed by superscript i, i.e., and are identi®ed as

f f f

j;i plementation complexity, is that it achieves fairness over variable and r , respectively. Henceforth in this section, we will refer to a

f rate servers while WFQ does not. Moreover, it provides consid- f single ¯ow f , and, hence drop the subscript from all the variables. erably lower maximum and average delay to low-throughput ap- plications than WFQ. Since low-throughput applications like audio

Theorem 6 If scheduling algorithm at server i guarantees that:

and telnet are more delay sensitive than high-throughput applica-



i

j i j j;i j;i i 

i tions like video and ftp, this feature is highly desirable. In case

L (p )  EAT (p ;r )+ +  1 B e (66)

P delay guarantee of WFQ is required, SFQ can be combined with

j j

i non work-conserving Virtual Clock to derive Fair airport scheduling

(p ) p i where L is the time at which packet departs server , then:

algorithm that provides the delay guarantee of WFQ and ef®ciently

+1 j j;i i j j;i

i achieves fairness even over variable rate servers [11]. Thus, since

P ( EAT (p ; br )  EAT (p ; br )

SFQ addresses the drawbacks of WFQ while achieving its delay

i

n;i i;i+1 i 

f g +  + )  1 B e

+ max guarantee if desired, it is better suited than WFQ for integrated

2[1::j ]

n services networks.

j;i j;i+1

j;i Finally, since SFQ provides a bound on maximum delay that

r  min fr ;r g  0 where b and . does not depend on the weights of other ¯ows and has a bounded Corollary 1 If scheduling algorithm at each server on the path of deviation from an optimal fair scheduling algorithm, it has better

a ¯ow satis®es (66), and there are K serverson the path of the ¯ow, fairness and delay properties than DRR. then: To summarize, we have shown that SFQ: (1) achieves low av-

erage as well as maximum delay for low-throughput applications;

n=K

X (2) provides fairness, regardless of variation in a server rate; (3) has

m;n K j 1 j j

g max f (p )  EAT (p ; br )+ ( L

P a fairness measure that is at least as good as that of all the known

m2[1::j ] =1

n fair scheduling algorithms; and (4) is computationally ef®cient. In

1

n=K n=K 1

P

X X the next section, we show that it enables hierarchical link sharing,

n=K

1

n n;n+1

n



n=1

B )e  + )  1 ( + and thus meets all the requirements of a scheduling algorithm for

integrated services networks.

n=1 n=1

10

3 Hierarchical Link Sharing f  Throughput Guarantee: Consider a class that is a sub-

class of the root class. Let the link be FC server with param-

C; (C ))

Hierarchical link sharing is an ideal mechanism for managing het- eters ( and let the set of the subclasses of the root f

erogeneity in integrated services networks [6, 20]. It can be used class be denoted by Q. Then, if class has been assigned r by a network to support services that provide heterogeneous QoS rate f , from Theorem 2 we conclude that the virtual server

as well as multiple protocol families that support different traf®c corresponding to f is a FC server with parameters:

types and/or congestion control mechanisms. For example, a net-

P

 

max l

work can support hard and soft real-time, as well as best effort n

 (C )

n2Q

max

(67) r ;r + r + l

f f f

services by partitioning the link bandwidth between them as per f C the expected requirements of each of the service. To support high C and low reliability soft real-time services, the bandwidth of soft Similarly, using Theorem 3, we conclude that if the link is real-time service may be further partitioned. Similarly, the band-

an EBF server, then the virtual server corresponding to f is width of the best effort services may be further partitioned between a EBF server. Hence, if the link is an FC or EBF server, throughput intensive and interactive services. then the virtual servers corresponding to the subclasses of A key advantage of hierarchical link sharing is that it provides the root class are FC or EBF servers, respectively. Using the isolation between different services while enabling similar services argument recursively, we conclude that if the link is a FC or to share resources. Hence, incompatible congestion control algo- EBF server, then each of the virtual server in the hierarchical rithms can coexist while compatible algorithms reap the advantages structure is a FC or EBF server, respectively. Consequently, of sharing. For example, while high and low reliability soft real- Theorems 2 and 3 can be used to determine the throughput time services get the bene®ts of sharing, the hard real-time service is guarantee of the ¯ows. isolated from the overbooking that may occur in soft real-time ser-

vices, and the congestion control algorithm that may be used by the  Delay Guarantee: Since each of the virtual server is either best effort services. Hierarchical link sharing also facilitates use of FC or EBF server, Theorems 4 and 5 can be used to determine different resource allocation methods for different services. This is the single server delay guarantee of the ¯ows. desirable as hard real-time services may use a scheduling algorithm that performs well when there is no overbooking; soft real-time  End-to-End Delay Guarantee: Since the single server services may prefer to use a scheduling algorithm that provides delay guarantee when a ¯ow is hierarchically scheduledsatis- QoS guarantees and/or minimizes deadline violations in presence ®es (66), Corollary 1 can be used to determine the end-to-end of overbooking [1]; and best effort services may use a fair scheduler delay guarantee. for throughput intensive ¯ow-controlled data applications. The requirements of hierarchical link sharing is speci®ed by The elegance of the above analysis is in its simplicity. This a tree, referred to as link-sharing structure, in which each node, simplicity demonstrates the generality of the analysis of SFQ servers other than possibly leaf nodes, denotes an aggregation of ¯ows [6]. presented in Section 26 . Each node in the tree is referred to as a class and has a weight Hierarchical SFQ scheduler not only achieves the objectives of associated with it. The objective of a mechanism implementing hierarchical link sharing, but can also be used to achieve several hierarchical link sharing is to distribute the bandwidth allocated to other objectives. For example it can be used to achieve: a class among its subclasses fairly, i.e., in proportion to the weights

 Separation of delay and throughput allocation: Observe that [20]. This objective can be achieved by a hierarchical scheduler SFQ does not allocate delay and throughput separately. How- that considers each class, other than the leaf classes, as a virtual ever, it may be desirable to do so for some ¯ows. This can server and uses a fair scheduler to schedule the virtual servers. be achieved by aggregating the ¯ows for which separation of However, as the following example illustrates, the scheduler used delay and throughput is desirable into one class and then us- must allocate bandwidth fairly even over variable rate servers. ing a scheduling algorithm that achieves such a separation for that class. Though conceptually simple, since the throughput Example 3 Consider a link sharing structure in which classes A of a class ¯uctuates over time, the algorithm used must be and B are subclasses of the root class. Let classes C and D be able to achieve the separation over variable rate servers. In subclasses of class A and let each class have weight 1. Initially, Theorem 7, we show that Delay EDD can achieve this over let there be no packets in class B. Hence, class A gets the full a FC server. Since the throughput of a class is ¯uctuation link bandwidth. When class B also becomes active, the bandwidth constrained, Delay EDD can be used to achieve the objective. available to class A (and hence to subclasses C and D) reduces to

50% of the link bandwidth. Consequently, to fairly partition the We ®rst de®ne Delay EDD and then prove its delay guarantee

j f

bandwidth of class A between subclasses C and D, the scheduler for a FC server. On arrival of packet p of ¯ow , Delay f

must be able to allocate bandwidth fairly over variable rate servers. j

D (p

EDD assigns it a deadline, denoted by ), and schedules

f

j

(p ) packets in increasing order of deadline. D is de®ned as: SFQ is the only scheduling algorithm that has been demonstrated f

to allocate bandwidth fairly even over variable rate servers. In what

j j

D (p )= EAT (p ;r )+d (68) f

follows, we present a hierarchical SFQ scheduler for link sharing. f f f

Hierarchical SFQ scheduler is simple. It uses SFQ to schedule

j

f r = r

each class; treating each subclass as a ¯ow. The scheduling of d f

where f is the deadline of ¯ow packets, ,and f

packets occurs recursively: the scheduler for root class schedules j

l = l

f . the subclasses; the scheduler of subclasses in turn schedule their f

subclasses. Since SFQ fairly allocates bandwidth regardless of the 6

Wewould liketowarn the reader of a p otential pitfall. It is

y reach the conclusion that this analysis would

server behavior, this simple recursive hierarchical scheduling en- p ossible that some ma

en if the throughput of a server was mo deled by a Service

sures that bandwidth allocated to a class is fairly allocated between hold ev

er which is similar to a FC server [3]. However, even

the subclasses and thereby achieves the objective of hierarchical Burstiness serv

teed to a owby a Virtual Clo ck server

link sharing. Moreover, in contrast to link sharing mechanism in though the throughput guaran

conforms to Service Burstiness, it can b e shown that Virtual Clo ck

hical link sharing provides no guarantees. [6], it provides bounds on various performance measures: when used for hierarc 11

Theorem 7 If Q is the set of ¯ows serviced by the server To experimentally validate the implementation of the scheduler,

and we initiated three connections with weights 1, 2, and 3. Each of 

 the connection terminated after transmitting 500,000, 4KB packets.

X

(t d )r l

n n

n Figure 5 shows the throughput received by each connection. As

8t>0: max f0; gt (69)

l C

n it demonstrates, when all the three connections were active, they

2Q n received throughput in the ratio 1:2:3. When the connection with

weight 3 terminated, the throughput of the other two connections

C; (C )) and the server is a ( Fluctuation Constrained Delay

EDD server,then the time at which the transmission of packet increased but still remained in the ratio 1:2. Finally, when only j

j one connection remained, it received the full link bandwidth. The

p L (p )

is completed, denoted by EDD ,is: f f throughput of the interface in this experiment was 48Mb/s which

was the same without the SFQ scheduler (i.e., the scheduler did not

 (C )

l

max

j j

L (p )  D(p )+ + (70)

EDD impose any overhead). Observe from Figure 5 that SFQ scheduler

f f C C achieved fair allocation even though the realizable bandwidth of the interface varied over time. This demonstrates the feasibility of employing SFQ for scheduling network interface in operating sys- Due to high computational complexity, it may not be feasible tems where the processingcapacity available for a network interface to employ (69) as the schedulability test. Hence, conditions varies over time. stronger than (69) which have lower computational complex-

ity have been developed in [25]. The theorem holds under

Concluding Remarks the stronger conditions as well. 5 In this paper, we presented Start-time Fair Queuing (SFQ) algo-  Delay shifting: This involves the reduction of the maximum delay of certain ¯ows at the expense of increasing the delay rithm that is computationally ef®cient, achieves fairness regardless of others. To illustrate, let the server be a FC server with of variation in a server capacity, and has the smallest fairness mea-

sure among all known fair scheduling algorithms. We analyzed

C; (C )) parameters ( and let the set of ¯ows served by it be its throughput, single server delay, and end-to-end delay guarantee denoted by Q. For ease of exposition, let the packet length

j for variable rate Fluctuation Constrained (FC) and Exponentially p of each ¯ow be l . Hence, using Theorem 4, for packet : f Bounded Fluctuation (EBF) servers. This is the ®rst analysis of

any fair or real-time scheduling algorithm for such servers. Our

 (C ) l (jQj1)l

j j

L (p + + )  EAT (p ;r )+

SF Q f f

f analysis demonstrated that SFQ is better suited than WFQ for in-

C C C tegrated services networks and it is strictly better than SCFQ and

(71) FQS. To support heterogeneous services and multiple protocol fam-

Q K Q ; :::; Q K Now, let the set be partitioned into sets 1 and ilies in integrated services networks, we presented a hierarchical let them be hierarchically scheduled. Let ¯ow f belong to

SFQ scheduler. We derived performance bounds for ¯ows that are

Q Q n partition i and the rate assigned to partition be denoted

hierarchically scheduled using a conceptually simple and elegant

C f

by n . To determine delay guarantee of ¯ow when it is hierarchically scheduled, observe from (67) that the virtual method. Finally, we presented an implementation of SFQ sched-

uler and demonstrated that it achieves fair allocation of bandwidth. Q

server corresponding to partition i is a FC server with pa-

C ( (C )+Kl)

i In summary, we demonstrated that SFQ: (1) achieves low av-

(C ; + l)

rameters i . Hence, the departure time

C erage as well as maximum delay for low throughput applications

j j

d

p f L (p )

of packet of ¯ow , denoted by SF Q , is given as: f f (e.g., interactive audio, telnet, etc.); (2) provides fairness which is

desirable for VBR video; (3) provides fairness, regardless of vari-

(jQ j+1)l

i

j j

d ation in server capacity, for throughput-intensive, ¯ow-controlled

L (p )  EAT (p ;r )+

SF Q f

f f C

i data applications; (4) enables hierarchical link sharing which is

(C)+ Kl

 desirable for managing heterogeneity; and (5) is computationally (72) + ef®cient. Thus, SFQ meets the requirements of a suitable schedul- C ing algorithm for integrated services networks. From (71) and (72), we conclude that the bound on the de- parture time would be smaller when the ¯ow is hierarchically

scheduled if: References

[1] S. Baruah, G. Koren, B. Mishra, A. Raghunathan,

(jQ j +1)l jQjl (C) (C)+Kl

i

< 0 (73) +

L. Rosier, and D. Shasha. On-line Scheduling in the

C C C C

i

erload. In

Presence of Ov Proceeding of Foundations of

, pages 100{110, 1991.

jQ j +1

C

i

i Computer Science

) < (74)

jQj K C

[2] D.D. Clark, S. Shenker, and L. Zhang. Supp orting Real-

tegrated Services Packet

Hence, by ensuring that (74) is satis®ed for the partitions Time Applications in an In

, pages 14{26, work. In

which require lower delay, delay shifting can be achieved. Net Proceedingsof ACM SIGCOMM

August 1992.

4 Implementation

[3] R.L. Cruz. Service Burstiness and Dynamic Burstiness

ramework. , Measures: A F Journal of High Speed Networks We have implemented SFQ scheduler for a FORE Systems ATM

network interface in Solaris 2.4 as a streams module and driver 2:105{127, 1992.

vin and A. Heyb ey. A Simulation Study of Fair

(see Figure 5). The driver is used to maintain a hierarchical link [4] J. Da

olicy Enforcement. sharing structure, created via ioctl() calls, for a network interface. Queueing and P Computer Communi-

The module, on the other hand, is used to schedule packets. We cation Review , 20(5):23{29, Octob er 1990. have modi®ed the FORE API for opening a connection to include the weight of a connection and its class as parameters. 12

60 Weight =1 Weight =2 Weight =3 Stream Stream Stream 50 Head Head Head

40

30 SFQ SFQ SFQ SFQ Module Module Module Driver Rate(Mb/s) 20

10

ATM Device Driver 0 0 200 400 600 800 1000 Time(s)

Figure 5 : (a) SFQ scheduler implementation (b) Throughput of the connections

[5] A. Demers, S. Keshav, and S. Shenker. Analysis and [16] S.S. Lam and G.G. Xie. Burst Scheduling: Architecture

ulation of a Fair Queueing Algorithm. In and Algorithm for Switching Packet Video. In

Sim Proceed- Proceed-

b er 1989. , April 1995.

ings of ACM SIGCOMM, pages 1{12, Septem ings of INFOCOM'95

[6] S. Floyd and V. Jacobson. Link-sharing and Resource [17] K. Lee. Performance Bounds in Communication Net-

t Mo dels for Packet Networks. works With Variable-Rate Links. In

Managemen IEEE/ACM Proceedings of ACM , pages 126{136, 1995.

Transactions on Networking, 3(4), August 1995. SIGCOMM'95

ked Fair Queueing Scheme [18] A.K. Parekh.

[7] S.J. Golestani. A Self-Clo c A Generalized Processor Sharing Approach to . PhD thesis,

for High Sp eed Applications. In Proceedings of INFO- Flow Control in Integrated Services Networks

Department of Electrical Engineering and Computer

COM'94 , 1994.

Science, MIT, 1992.

[8] P.Goyal, S. S. Lam, and H. M. Vin. Determining

End-to-End Delay Bounds In Heterogeneous Networks. [19] S. Shenker. Making Greed Work in Networks: A Game-

Theoretic Analysis of Switch Service Discipli nes. In

In ACM/Springer-Verlag Multimedia Systems Journal (to ap- , pages 47{57, 1994.

pear), 1996. Also app eared in the Pro ceedings of the Proceedings of ACM SIGCOMM'94

Workshop on Network and Op erating System Supp ort

[20] S. Shenker, L. Zhang, and D. Clark. A Scheduling Ser-

for Digital Audio and Video, April 1995.

vice Mo del

[9] P.Goyal and H. M. Vin. Generalized Guaranteed Rate and a Scheduling Architecture for an Integrated Ser-

Scheduling Algorithms: A Framework. Technical Re- vices Packet Networks. Available via anonymous ftp

p ort TR-95-30, The UniversityofTexas at Austin, July from ftp://ftp.parc.xerox.com/pub/arch n .ps, 1995.

1995.

[21] M. Shreedhar and G. Varghese. EcientFair Queu-

.Goyal and H. M. Vin. Network Algorithms and Pro- ing Using De cit Round Robin. In

[10] P Proceedings of ACM

to col for Multimedia Servers. In

Proceedings of INFO- SIGCOMM'95, pages 231{242, 1995. h 1996.

COM'96 , pages 1371{1379, Marc

[22] D. Stiliadis and A. Varma. Design and Analysis of

[11] P.Goyal, H. M. Vin, and H. Cheng. Start-time Frame-based Fair Queueing: A New Trac Scheduling

air Queuing: AScheduling Algorithm for Inte- Algorithm for Packet Switched Networks. In

F Proceed-

acket Switching Networks. Tech- , 1996.

grated Services P ings of SIGMETRICS'96 (to appear)

nical Rep ort TR-96-02, The University of Texas

[23] H. Zhang and S. Keshav. Comparison of Rate-Based

at Austin, January 1996. Available via URL Service Discipli nes. In

Proceedings of ACM SIGCOMM,

http://www.cs.utexas.edu/users/dmcl.

pages 113{121, August 1991.

[12] P.Goyal, H. M. Vin, C. Shen, and P.J. Shenoy. A

[24] L. Zhang. VirtualClo ck: A New Trac Control Algo-

Reliable, Adaptive Proto col for Video Transp ort. In

acket Switching Networks. In

rithm for P Proceedings of h Proceedings of INFOCOM'96, pages 1080{1090, Marc

ACM SIGCOMM'90 , pages 19{29, August 1990.

1996.

[25] Q. Zheng and K. Shin. On the Abilitiy of Estab-

[13] A. Greenb erg and N. Madras. HowFair is Fair Queuing.

lishing Real-Time Channels in Point-to-PointPacket-

The Journal of ACM, 39(3):568{598, July 1992.

hing Networks.

switc IEEE Transactions on Communica-

[14] M. Grossglauser, S. Keshav, and D. Tse. RCBR: A h 1994.

tions, 42(3):1096{1105, Marc

Simple and Ecient Service for Multiple Time-Scale

rac. In , 1995.

T Proceedings of ACM SIGCOMM'95

[15] S. Keshav. A Control-Theoretic ApproachtoFlow Con- , pages 3{15,

trol. In Proceedings of ACM SIGCOMM'91 1991.