A Proportional Share Resource Allocation Algorithm for
Real-Time, Time-Shared Systems

Ion Stoica*   Hussein Abdel-Wahab†   Kevin Jeffay‡   Sanjoy K. Baruah§
Johannes E. Gehrke¶   C. Greg Plaxton‖
* Supported by a GAANN fellowship. Dept. of CS, Old Dominion Univ., Norfolk, VA 23529-0162, [email protected].
† Supported by NSF grant CCR 95-9313857. Dept. of CS, Old Dominion Univ., Norfolk, VA 23529-0162.
‡ Supported by a grant from the IBM & Intel corps. and NSF grant CCR 95-10156. Dept. of CS, Univ. of North Carolina at Chapel Hill, Chapel Hill, NC 27599-3175, [email protected].
§ Supported by NSF under Research Initiation Award CCR-9596282. Dept. of CS, Univ. of Vermont, Burlington, VT 05405.
¶ Dept. of CS, Univ. of Wisconsin-Madison, Madison, WI 53706-1685, [email protected].
‖ Supported by NSF grant CCR-9504145, and the Texas Advanced Research Program under grant No. ARP-93-00365-461. Dept. of CS, Univ. of Texas at Austin, Austin, TX 78712-1188.

Abstract

We propose and analyze a proportional share resource allocation algorithm for realizing real-time performance in time-shared operating systems. Processes are assigned a weight which determines a share (percentage) of the resource they are to receive. The resource is then allocated in discrete-sized time quanta in such a manner that each process makes progress at a precise, uniform rate. Proportional share allocation algorithms are of interest because (1) they provide a natural means of seamlessly integrating real- and non-real-time processing, (2) they are easy to implement, (3) they provide a simple and effective means of precisely controlling the real-time performance of a process, and (4) they provide a natural means of policing so that processes that use more of a resource than they request have no ill effect on well-behaved processes. We analyze our algorithm in the context of an idealized system in which a resource is assumed to be granted in arbitrarily small intervals of time, and show that our algorithm guarantees that the difference between the service time that a process should receive in the idealized system and the service time it actually receives in the real system is optimally bounded by the size of a time quantum. In addition, the algorithm provides support for dynamic operations, such as processes joining or leaving the competition, and for both fractional and non-uniform time quanta. As a proof of concept we have implemented a prototype of a CPU scheduler under FreeBSD. The experimental results show that our implementation performs within the theoretical bounds and hence supports real-time execution in a general purpose operating system.

1 Introduction

Currently there is great interest in providing real-time execution and communication support in general purpose operating systems. Indeed, applications such as desktop video conferencing, distributed shared virtual environments, and collaboration-support systems require real-time computation and communication services to be effective. At present the dominant approach to providing real-time support in a general purpose operating system is to embed a periodic thread or process model into an existing operating system kernel and to use a real-time scheduling algorithm such as rate-monotonic scheduling to schedule the processes. In such a system, aperiodic and non-real-time activities are typically scheduled either as background processes or through the use of a second-level scheduler that is executed quasi-periodically as a server process by the real-time scheduler.

In general, this framework can be quite effective at integrating real-time and non-real-time computing. However, we observe that this approach has yet to be embraced by the larger operating systems community. We believe that this is due in part to the rigid distinctions made between real-time and non-real-time. Real-time activities are programmed differently than non-real-time ones (e.g., as periodic tasks) and real-time activities receive hard and fast guarantees of response time (if admission control is performed). Non-real-time
activities are subservient to real-time ones and receive no performance guarantees. While this state of affairs is entirely acceptable for many mixes of real-time and non-real-time activities, for many it is not. Consider the problem of supporting real-time video conferencing on the desktop. This is clearly a real-time application; however, it is not one for which hard and fast guarantees of real-time performance are required. For example, it is easy to imagine situations in which one would like to explicitly degrade the performance of the video conference (e.g., degrade the rate at which the video is displayed) so that other activities, such as searching a large mail database for a particular message, can complete more quickly. Ideally, a general purpose operating system that supports real-time execution should not a priori restrict the basic tenor of performance guarantees that any process is capable of obtaining.

To address this issue we investigate an alternate approach to realizing real-time performance in time-shared operating systems, namely the use of proportional share resource allocation algorithms for processor scheduling. In a proportional share allocation system, every process in the system is guaranteed to make progress at a well-defined, uniform rate. Specifically, each process is assigned a share of the processor, i.e., a percentage of the processor's total capacity. If a process's share of the processor is s, then in any interval of length t the process is guaranteed to receive (s − ε)t units of processor time, where 0 ≤ ε ≤ ε_0 for some constant ε_0. In a proportional share system, resource allocation is flexible and the share received by a process can be changed dynamically. In this manner a process's real-time rate of progress can be explicitly controlled.

Proportional share resource allocation algorithms lie between traditional general purpose and real-time scheduling algorithms. On the one hand, proportional share resource allocation is a variant of the pure processor sharing scheduling discipline, in which during each time unit each process receives 1/n of the processor's capacity, where n is the number of active processes. Thus each process appears as if it is making uniform progress on a virtual processor that has 1/n of the capacity of the physical processor. On the other hand, traditional real-time scheduling disciplines for periodic tasks can be viewed as coarse approximations of proportional share allocation. For example, if a periodic task requires c units of processor time every p time units, then a rate-monotonic scheduler guarantees that for all k ≥ 0, in each interval [kp, (k+1)p], the periodic task will indeed receive a share of the processor equal to c/p. Specifically, in each of the above intervals, the process will receive p(c/p) = c units of processor time.

Our proportional share resource allocation policy differs from traditional methods of integrating real- and non-real-time processes in that here all processes, real- and non-real-time, are treated identically. In a proportional share system, real-time and non-real-time processes can be implemented very much like traditional processes in a time-shared operating system. Thus in terms of the process model, no "special" support is required to support real-time computing: there is only one type of process. Moreover, like many scheduling algorithms used in time-shared systems, our algorithm allocates resources in discrete units or quanta, which makes it easier to implement than traditional real-time policies, which are typically event-driven and require the ability to preempt processes at potentially arbitrary points. In addition, because resources are allocated in discrete quanta in a proportional share system, one can better control and account for the overhead of the scheduling mechanism, as well as tune the system to trade off fine-grain, real-time control for low scheduling and system overhead. Finally, proportional share algorithms provide a natural means of uniformly degrading system performance in overload situations.

In this paper we present a proportional share scheduling algorithm and demonstrate that it can be used to ensure predictable real-time response to all processes. Section 2 presents our process model and formally introduces the concept of a share and the requirement for predictable execution with respect to a share. Section 3 discusses related work in scheduling. Section 4 presents a deadline-based, virtual-time scheduling algorithm that is used to ensure processes receive their requested share of the processor. Section 5 introduces a key technical problem to be solved in the course of applying our algorithm, namely that of dealing with the dynamic creation and destruction of processes. Section 6 outlines the proof of correctness of our algorithm, and Section 7 presents some experimental results using our proportional share system in the FreeBSD operating system and demonstrates how "traditional" real-time processes such as periodic tasks can be realized in a proportional share system.

2 The Model

We consider an operating system to consist of a set of processes (real-time and non-real-time) that compete for a time-shared resource such as a CPU or a communications channel. To avoid confusion with terminology used in the experimental section of the paper, we use the term client to refer to computational entities (i.e., processes). A client is said to be active while it is competing for the resource, and passive otherwise. We assume that the resource is allocated in time quanta of size at most q. At the beginning of each time quantum a client is selected to use the resource. Once the client acquires the resource, it may use it either for the entire time quantum, or it may release it before the time quantum expires. Although simple, this model captures the basic mechanisms traditionally used for sharing common resources, such as processor and communication bandwidth. For example, in many preemptive operating systems (e.g., UNIX, Windows NT), the CPU scheduler allocates the processing time among competing processes in the same fashion: a process uses the CPU until its time quantum expires or another process with a higher priority becomes active, or it may voluntarily release the CPU while it is waiting for an event to occur (e.g., an I/O operation to complete). As another example, consider a communication switch that multiplexes a set of incoming sessions on a packet-by-packet basis. Since usually the transmission of a packet cannot be preempted, we take a time quantum to be the time required to send a packet on the output link. Thus, in this case, the size q of a time quantum represents the time required to send a packet of maximum length.

Further, we associate a weight with each client that determines the relative share of the resource that it should receive. Let w_i denote the weight associated to client i, and let A(t) be the set of all clients active at time t. We define the instantaneous share f_i(t) of an active client i at time t as

    f_i(t) = w_i / Σ_{j∈A(t)} w_j.    (1)

If the client's share remains constant during a time interval [t, t+Δt], then it is entitled to use the resource for f_i(t)Δt time units. In general, when the client share varies over time, the service time that client i should receive in a perfect fair system, while being active during a time interval [t_0, t_1], is

    S_i(t_0, t_1) = ∫_{t_0}^{t_1} f_i(τ) dτ    (2)

time units. The above equation corresponds to an ideal fluid-flow system in which the resource can be granted in arbitrarily small intervals of time.¹ Unfortunately, in many practical situations this is not possible. One of the reasons is the overhead introduced by the scheduling algorithm itself and the overhead in switching from one client to another: taking time quanta of the same order of magnitude as these overheads could drastically reduce the resource utilization. Another reason is that some operations cannot be interrupted, i.e., once started they must complete in the same time quantum. For example, once a communication switch begins to send a packet of one session, it cannot serve any other session until the entire packet is sent. As another example, a process cannot be preempted while it is in a critical section. Thus, in the first example we can choose the size of a quantum q as being the time required to send a packet of maximum length, while in the second example we can choose q as being the maximum duration of a critical section.

    ¹ A similar model was used by Demers et al. [4] in studying fair-queuing algorithms in communication networks.

Due to quantization, in a system in which the resource is allocated in discrete time quanta it is not possible for a client to always receive exactly the service time it is entitled to. The difference between the service time that a client should receive at a time t and the service time it actually receives is called the service time lag (or simply lag). Let t_0^i be the time at which client i becomes active, and let s_i(t_0^i, t) be the service time the client receives in the interval [t_0^i, t] (here, we assume that client i is active in the entire interval [t_0^i, t]). Then the lag of client i at time t is

    lag_i(t) = S_i(t_0^i, t) − s_i(t_0^i, t).    (3)

Since the lag quantifies the allocation accuracy, we use it as the main parameter in characterizing our proportional share algorithm. In particular, in Section 6 we show that our proportional share algorithm (1) provides bounded lag for all clients, and that (2) this bound is optimal in the sense that it is not possible to develop an algorithm that affords better bounds. Together, these properties indicate that our algorithm will provide real-time response guarantees to clients and that, with respect to the class of proportional share algorithms, these guarantees are the best possible.

3 Related Work

Tijdeman was one of the first to formulate and analyze the proportional share allocation problem [15]. The original problem, an abstraction of diplomatic protocols, was stated in terms of selecting a union chairman every year, such that the accumulated number of chairmen from each state of the union is proportional to its weight. As shown in [2], Tijdeman's results can be easily applied to solve the proportional share allocation problem. In the general setting, the resource is allocated in fixed time quanta, while the clients' shares may change at the beginning of every time quantum. In this way dynamic operation can be easily accommodated. Tijdeman proved that if the clients' shares are known in advance there exists a schedule with the lag bound less than or equal to 1 − 1/(2n − 2), where n represents
the total number of clients. Note that when n → ∞ the lag bound approaches unity. Although he gives an optimal algorithm for the static case (i.e., when the number of clients does not change over time), he does not give any explicit algorithm for the dynamic case. Furthermore, we note that, even in the general setting, the problem formulation does not accommodate fractional or non-uniform quanta.²

    ² The difference between fractional and non-uniform quanta is that while in the first case the fraction of the time quantum that the client will actually use is assumed to be known in advance, in the non-uniform quanta case this fraction is not known.

Recently, the proportional share allocation problem has received a great deal of attention in the context of operating systems and communication networks. Our algorithm is closely related to weighted fair queueing algorithms previously developed for bandwidth allocation in communication networks [4, 5, 10], and to general purpose proportional share algorithms, such as stride scheduling [17, 18]. Demers, Keshav, and Shenker were the first to apply the notion of fairness to a fluid-flow system that models an idealized communication switch in which sessions are serviced in arbitrarily small increments [4]. Since in practice a packet transmission cannot be preempted, the authors proposed an algorithm, called Packet Fair Queueing (PFQ), in which the packets are serviced in the order in which they would finish in the corresponding fluid-flow system (i.e., in increasing order of their virtual deadlines). By using the concept of virtual time, previously introduced by Zhang [19], Parekh and Gallager have analyzed PFQ when the input traffic stream conforms to leaky-bucket constraints [10, 11]. In particular, they have shown that no packet is serviced T_max later than it would have been serviced in the fluid-flow system, where T_max represents the time to transmit a packet of maximum size. However, as shown in [3, 13, 18], the lag bound can be as large as O(n), where n represents the number of active sessions (clients) in the system. Moreover, in PFQ the virtual time is updated when a client joins or leaves the competition in the ideal system, and not in the real one. This requires one to maintain an additional event queue, which makes the implementation complex and inefficient. As a solution, Golestani has proposed a new algorithm, called Self-Clocked Fair Queueing (SCFQ), in which the virtual time is updated when the client joins/leaves the competition in the real system, and not in the idealized one [5]. Although this scheme can be more efficiently implemented, this does not come for free: the lag bounds increase to within a factor of two of the ones guaranteed by PFQ.

Recently, Waldspurger and Weihl have developed a new proportional share allocation algorithm, called stride scheduling [17, 18], which can be viewed as a cross-application of fair queueing to the domain of processor scheduling. Stride scheduling relies on the concept of a global pass (which is similar to virtual time) to measure the work progress in the system. Each client has an associated stride that is inversely proportional to its weight, and a pass that measures the progress of that client. The algorithm allocates a time quantum to the client with the lowest pass, which is similar to the PFQ policy. However, by grouping the clients in a binary tree, and recursively applying the basic stride scheduling algorithm at each level, the lag is reduced to O(log n). Moreover, stride scheduling provides support for both uniform and non-uniform quanta.

Goyal, Guo and Vin have proposed a new algorithm, called Start-time Fair Queueing (SFQ), for hierarchically partitioning a CPU among various application classes [6]. While this algorithm supports both uniform and non-uniform quanta, the delay bound (and implicitly the lag) increases linearly with the number of clients. However, we note that when the number of clients is small, in terms of delay this algorithm can be superior to classical fair queueing algorithms.

In contrast to the above algorithms, by making use of both virtual eligible times and virtual deadlines, the algorithm we develop herein achieves constant lag bounds, while providing full support for dynamic operations. We note that two similar algorithms were independently developed in parallel to our original work [13]: by Bennett and Zhang in the context of allocating bandwidth in communication networks [3], and by Baruah, Gehrke and Plaxton in the context of processor scheduling for fixed time quanta [2]. In addition to introducing the concept of virtual eligible time (which was also independently introduced in [2] and [3]), our work makes several unique key contributions.

First, by "decoupling" the request size from the size of a time quantum we generalize the previously known theoretical results [10]. Moreover, our analysis can be easily extended to preemptive systems as well. For example, we can derive lag bounds for a fully preemptive system by simply taking time quanta to be arbitrarily small. Similarly, by taking the size of a time quantum to be the maximum duration of a critical region, we can derive lag bounds for a preemptive system with critical regions. Finally, this decoupling gives a client the possibility of trading between allocation accuracy and scheduling overhead (see Section 6).

Second, we address the problem of a client leaving the competition before using the entire service time it has requested. This is an important extension since in an operating system it is typically not possible to predict exactly how much service time a client will use for
the next request. We note that this problem does not occur (and consequently has not been addressed) in the context of network bandwidth allocation; in this case, the length of a message (and therefore its transmission time) is assumed to be known upon its arrival. The only previously known algorithms that address this problem are lottery and stride scheduling [17, 18]. However, the lag bounds guaranteed by stride scheduling are as large as O(n), where n represents the number of active clients (being a randomized algorithm, lottery does not guarantee tight bounds). In comparison, our algorithm (described next) guarantees optimal lag bounds of one time quantum.

Third, we propose a new approximation scheme for maintaining virtual time, in which update operations are performed when the events (e.g., a client leaving or joining) occur in the real system, and not in the ideal one. This simplifies the implementation and eliminates the need to keep an event queue. It is worth mentioning that, unlike other previous approximations [5], ours guarantees optimal lag bounds.

Besides the class of fair queuing algorithms, a significant number of other proportional share algorithms have recently been developed [1, 9, 12, 16]. Although none of them guarantees constant lag bounds in a dynamic system, we note that the PD algorithm of Baruah, Gehrke, and Plaxton [1] achieves constant lag bounds in a static system.

The idea of applying fair queueing algorithms to processor scheduling was first suggested by Parekh in [11]. Waldspurger and Weihl were the first to actually develop and implement such an algorithm (stride scheduling) for processor scheduling [17, 18].³ Finally, to our best knowledge we are the first to implement and test a proportional share scheduler which guarantees constant lag bounds.

    ³ We note that they have also applied stride scheduling to other shared resources, such as critical section lock accesses.

4 The EEVDF Algorithm

In order to obtain access to the resource, a client must issue a request in which it specifies the duration of the service time it needs. Once a client's request is fulfilled, it may either issue a new request or become passive. For uniformity, throughout this paper we assume that the client is the sole initiator of the requests. For flexibility we allow the requests to have any duration. Note that a client may request the same amount of service time by generating either fewer longer requests, or many shorter ones. For example, a client may ask for one minute of computation time either by issuing 60 requests with a duration of one second each, or by issuing 600 requests with a duration of 100 ms each. As we will show in Section 6, shorter requests guarantee better allocation accuracy, while longer requests decrease system overhead. This affords a client the possibility of trading between allocation accuracy and scheduling overhead.

We formulate our scheduling algorithm in terms of the behavior of an ideal, fluid-flow system that executes clients in a virtual-time domain [19, 10]. Abstractly, the virtual fluid-flow system executes each client for w_i real-time units during each virtual-time unit. More concretely, virtual time is defined to be the following function of real time:

    V(t) = ∫_0^t 1 / (Σ_{j∈A(τ)} w_j) dτ.    (4)

Note that virtual time increases at a rate inversely proportional to the sum of the weights of all active clients. That is, when the competition increases virtual time slows down, while when the competition decreases it accelerates. Intuitively, the flow of virtual time changes to "accommodate" all active clients in one virtual-time unit. That is, the size of a virtual-time unit is modified such that in the corresponding fluid-flow system each active client i receives w_i real-time units during one virtual-time unit. For example, consider two clients with weights w_1 = 2 and w_2 = 3. Then the rate at which virtual time increases relative to real time is 1/(w_1 + w_2) = 0.2, and therefore a virtual-time unit equals five real-time units. Thus, in each virtual-time unit the two clients should receive w_1 = 2 and w_2 = 3 time units, respectively.

Ideally we would like our proportional share algorithm to approach the behavior of the virtual fluid-flow system. Thus, since in the fluid-flow system at all points in time a client is best characterized by the service time it has received up to the current time, to compare our approach with the ideal we must be able to compute the service time that a client should receive in the fluid-flow system. By combining Eq. (1) and (2) we can express the service time that an active client i should receive in the interval [t_1, t_2) as

    S_i(t_1, t_2) = w_i ∫_{t_1}^{t_2} 1 / (Σ_{j∈A(τ)} w_j) dτ.    (5)

Once the integral in the above equation is computed, we can easily determine the service time that any client i should receive during the interval [t_1, t_2), by simply multiplying the client's weight by the integral's value. Next, from Eq. (5) and (4) it follows that

    S_i(t_1, t_2) = (V(t_2) − V(t_1)) w_i.    (6)
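As an illustration of Eq. (4)-(6), the following Python sketch (ours, not part of the paper; the names `virtual_time`, `ideal_service`, and `phases` are our own) computes V(t) for a schedule in which the set of active clients, and hence the total weight, is piecewise constant, and uses Eq. (6) to obtain a client's ideal service time:

```python
# Hypothetical sketch: virtual time V(t) and ideal service time S_i(t1, t2)
# when the total weight of active clients is piecewise constant.

def virtual_time(t, phases):
    """V(t) per Eq. (4): integral of 1/(sum of active weights) up to t.

    `phases` is a sorted list of (start, end, total_weight) intervals.
    """
    v = 0.0
    for start, end, total_w in phases:
        if t <= start:
            break
        v += (min(t, end) - start) / total_w
    return v

def ideal_service(w_i, t1, t2, phases):
    """S_i(t1, t2) per Eq. (6): w_i * (V(t2) - V(t1))."""
    return w_i * (virtual_time(t2, phases) - virtual_time(t1, phases))

# Setting from the example in Section 4: client 1 (weight 2) alone during
# [0, 1), then clients 1 and 2 (weights 2 and 2) active during [1, 7).
phases = [(0.0, 1.0, 2.0), (1.0, 7.0, 4.0)]

print(virtual_time(1.0, phases))             # 0.5: half the rate of real time
print(virtual_time(3.0, phases))             # 1.0
print(ideal_service(2.0, 1.0, 3.0, phases))  # 1.0: client 1's ideal share of [1, 3)
```

With both clients active, each unit of real time adds only 1/4 of a virtual-time unit, so each client's ideal service over [1, 3) is w_i · 0.5 = 1 time unit, matching the uniform-rate intuition above.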
To better interpret the above equation, consider a much simpler model in which the number of active clients is constant and the sum of their weights is one (Σ_{i∈A} w_i = 1), i.e., the share of a client i is f_i = w_i. Then, in this model, the service time that client i should receive during an interval [t_1, t_2) is simply S_i(t_1, t_2) = (t_2 − t_1) w_i. Next, notice that by replacing the real times t_1 and t_2 with the corresponding virtual times V(t_1) and V(t_2) we arrive at Eq. (6). Thus, Eq. (6) can be viewed as a generalization for computing the service time S_i(t_1, t_2) in a dynamic system: one in which clients are dynamically joining and leaving the competition.

Our scheduling algorithm uses measurements made in the virtual-time domain to make scheduling decisions. For each client's request we define an eligible time e and a deadline d, which represent the starting and finishing times, respectively, for the request's service in the corresponding fluid-flow system. Let t_0^i be the time at which client i becomes active, and let t be the time at which it initiates a new request. Then, a request becomes eligible at a time e when the service time that the client should receive in the corresponding fluid-flow system, S_i(t_0^i, e), equals the service time that the client has already received in the real system, s_i(t_0^i, t), i.e., S_i(t_0^i, e) = s_i(t_0^i, t). Note that if at time t client i has received more service time than it was supposed to receive (i.e., lag_i(t) < 0), then it will be the case that e > t and hence the client should wait until time e before the new request becomes eligible. In this way a client that has received more service time than its share is "slowed down", while giving the other active clients the opportunity to "catch up". On the other hand, if at time t client i has received less service time than it was supposed to receive (i.e., its lag is positive), then it will be the case that e ≤ t, i.e., the new request is immediately eligible at time t. By using Eq. (6) the virtual eligible time V(e) is

    V(e) = V(t_0^i) + s_i(t_0^i, t) / w_i.    (7)

Similarly, the deadline of the request is chosen such that the service time that the client should receive between the eligible time e and the deadline d equals the service time of the new request, i.e., S_i(e, d) = r, where r represents the length of the new request. By using again Eq. (6), we derive the virtual deadline V(d) as

    V(d) = V(e) + r / w_i.    (8)

Notice that although Eq. (7) and (8) give us the virtual eligible time V(e) and the virtual deadline V(d), they do not necessarily give us the values of the real times e and d! To see why, consider the case in which e is larger than the current time t. Then e cannot be computed exactly from Eq. (4) and (7), since we do not know how the slope of the virtual-time mapping will vary in the future (it changes dynamically while clients join and leave the competition). Therefore we will formulate our algorithm in terms of virtual eligible times and deadlines (and not of the real times). With this, the Earliest Eligible Virtual Deadline First (EEVDF) algorithm can be simply stated as follows:

EEVDF Algorithm. A new quantum is allocated to the client which has the eligible request with the earliest virtual deadline.

Since EEVDF is formulated in terms of virtual times, in the remainder of this paper we use ve and vd to denote the virtual eligible time and virtual deadline, respectively, whenever the corresponding real eligible time and deadline are not given. Let r^(k) denote the length of the k-th request made by client i, and let ve^(k) and vd^(k) denote the virtual eligible time and the virtual deadline associated to this request. If each client's request uses the entire service time it has requested, then by using Eq. (7) and (8) we obtain the following recurrence, which computes both the virtual eligible time and the virtual deadline of each request:

    ve^(1) = V(t_0^i),    (9)
    vd^(k) = ve^(k) + r^(k) / w_i,    (10)
    ve^(k+1) = vd^(k).    (11)

Next, we consider the more general case in which the client does not use the entire service time it has requested. Since a client never receives more service time than requested, we need to consider only the case when the client uses the resource for less time than requested. Let u^(k) denote the service time that client i actually receives during its k-th request. Then the only change in Eq. (9)-(11) will be in computing the eligible time of a new request. Specifically, Eq. (11) is replaced by

    ve^(k+1) = ve^(k) + u^(k) / w_i.    (12)
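The recurrences above translate directly into code. The following Python sketch is our own illustration, not the paper's FreeBSD prototype: it assumes unit quanta, requests that always consume their full requested service time (so Eq. (9)-(11) apply), and a tie-breaking rule that favors the later-joining client, matching the worked example below.

```python
# Hypothetical EEVDF sketch (ours): unit quanta, fully used requests.
# Virtual time advances by 1/(total active weight) per quantum, per Eq. (4).

class Client:
    def __init__(self, name, weight, request_len):
        self.name = name
        self.weight = weight
        self.request_len = request_len  # r: length of every request
        self.ve = None                  # virtual eligible time, Eq. (9)/(11)
        self.vd = None                  # virtual deadline, Eq. (10)
        self.received = 0               # service received on current request

    def join(self, v_now):
        self.ve = v_now                                      # Eq. (9)
        self.vd = self.ve + self.request_len / self.weight   # Eq. (10)

def eevdf_schedule(arrivals, quanta):
    """arrivals: {real_time: [clients joining then]}; returns allocation list."""
    v, active, schedule = 0.0, [], []
    for t in range(quanta):
        for c in arrivals.get(t, []):
            c.join(v)
            active.append(c)
        # EEVDF rule: among eligible requests (ve <= V(t)), the earliest
        # virtual deadline wins; ties favor the later-joining client.
        eligible = [c for c in active if c.ve <= v + 1e-9]
        chosen = min(eligible, key=lambda c: (c.vd, -active.index(c)))
        schedule.append(chosen.name)
        chosen.received += 1
        if chosen.received == chosen.request_len:  # request fulfilled:
            chosen.ve = chosen.vd                  # Eq. (11)
            chosen.vd = chosen.ve + chosen.request_len / chosen.weight
            chosen.received = 0
        v += 1.0 / sum(c.weight for c in active)   # Eq. (4), unit quantum
    return schedule

c1 = Client("client 1", weight=2, request_len=2)
c2 = Client("client 2", weight=2, request_len=1)
print(eevdf_schedule({0: [c1], 1: [c2]}, 7))
# The two clients alternate, as in the example that follows.
```

A real implementation would also handle partially used quanta via Eq. (12) and dynamic leaves (Section 5); this sketch only exercises the basic recurrence.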
In this w (0.5, 1) (1, 1.5) (1.5, 2) (2, 2.5) system, during every virtual-time time unit, client1 0 0.5 1 1.5 2 virtual time receives w = 2 real time units. Next, after the second 1 tenters the comp etition, the rate of virtual-time clien 0 123456 7 time 1 =0:25. Hence, in the slows down further to w +w 1 2 ideal system, during one virtual-time time unit, each Figure 1. An example of EEVDF scheduling in- client will receive 2 real time units since w = w = 2. 1 2 volving two clients with equal weights w = w = 1 2 Next, assume that client 2 issues its rst request just 2. Al l the requests generated by client 1 have b efore the second quantum is allo cated. Then at real length 2, and al l of the requests generated by client time t = 1 there are two p ending requests: one of client 2 have length 1. Client 1 becomes active at time 1 with the virtual deadline 1 whichwaits for another 0 virtual-time 0, while client 2 becomes active time quantum to ful ll its request, and one of client at time 1 virtual-time 0:5. The vertical arrows 2 which has the same virtual deadline, i.e., 1. In this represent the times when the requests are initi- situation we arbitrarily break the tie in favor of client ated the pair associatedtoeach arrow represents 2, which therefore receives the second quantum. Since the virtual eligible time and the virtual dead line this quantum ful lls the current request of client2, of the corresponding request. The shadedregions client 2 issues immediately, at real time 3 virtual-time in the background show the the durations of ser- 1, a new request. From Eq. 11 and 10 the vir- vicing successive requests of the same client in tual eligible time and the virtual deadline of the new the uid- ow system. 
request are 1 and 1.5, resp ectively.Thus, at real time t = 2 virtual-time 0.75 the single eligible request is main issues of implementing dynamic op erations, rst the one of client 1, which therefore receives the next recall that the client's lag represents see Eq. 3 quantum. Further, at real time t = 3 virtual-time 1 the di erence b etween the service time that the client there are again two eligible requests: the one of client should receive and the service time it has actually re- 2 that has just b ecome eligible, and the new request ceived. An imp ortant prop erty of the EEVDF algo- issued by client1. Since the virtual deadline of the rithm is that at any time the sum of the lags of all active second client's request 1.5 is earlier than the one of clients is zero see Lemma 2 in [14]. Thus, if a client the rst client 2, the fourth quantum is allo cated to leaves the comp etition with a negative lag i.e., after client2.Further, Figure 1 shows how the next four receiving more service time than it was supp osed to, quanta are allo cated. the remaining clients should have received less service Note the uniform progress of the two clients in Fig- time than they were entitled to. In short, a gain for one ure 1. Although the uniformity is p erfect in this con- client translates into a loss for the other active clients. trived example, we show in Section 6 that in fact the Similarly, when a client with p ositive lag leaves, this deviation of a client's progress from the p erfectly uni- translates into a gain for the remaining clients. The form rate i.e., its rate of progress in the ideal uid- ow main question here is how to distribute this loss/gain system is b ounded and that these b ounds are the b est among the remaining clients. In [13]we answered this p ossible. This shows that for a given quanta q , the question by distributing it in proportion to the clients' EEVDF algorithm provides the b est p ossible guaran- weights. 
In the remaining of this section we show that tees of real-time progress. the same answer is obtained by approaching the prob- lem from a di erent angle. The basic observation is that this problem do es not 5 Fairness in Dynamic Systems o ccur as long as a client with zero lag leaves the comp e- tition, b ecause there is nothing to distribute. Since in In this section we address the issue of fairness in the corresp onding uid- ow system the lag of any client dynamic systems. Throughout this pap er, we assume is always zero, a simple solution would b e to consider that a dynamic system provides supp ort for client join- 4 the time when the client leaves to b e the time when ing and leaving the comp etition .To understand the it leaves in the corresp onding uid- ow system, and 4 Note that with these two op erations, changing a client's join op eration [13]. weight can b e easily implemented as a leave followed by a re- not in the real system. Unfortunately, this solution the comp etition while having a p ositive lag. Then the has two ma jor drawbacks. First, in many situations, client will b e simply delayed, while continuing to re- suchasscheduling incoming packets in a high sp eed ceive service time, until its lag b ecomes zero, i.e., until 0 networking switch, maintaining the events in the uid- time t .Ifwe assume that the slop e of virtual-time 1 ow system is to o exp ensive in practice [5]. Second with resp ect to real-time do es not change b etween t 1 0 0 and more imp ortant, this solution assumes implicitly and t , then from Eq. 5 and 6 we obtain S t ;t = 1 1 1 1 0 0 that the service time that a client will use is known V t V t w = w t t =w + w + w . Fur- 1 1 1 1 1 2 3 1 1 in advance. While this is generally true in the case ther, by using Eq. 
3 and 5, and the fact that 0 0 of the communication switch, where the length of a s t ;t =t t we can compute the virtual-time 1 1 1 1 1 0 message and consequently its service time is assumed at t as 1 lag t 1 1 0 to b e known when the packet arrives, in the pro ces- 13 V t = V t + 1 1 w + w 2 3 sor case this is not always p ossible. To see why this is a p otential problem, consider the previous example The main drawback of this approach is that client1 see Figure 1 in which the rst client leaves the com- 0 continues to receive service time b etween t and t , al- 1 1 p etition after using only 1.1 time-units of the second though it do es not need it since it has already nished request, i.e., at time 6.1 in the real system and the cor- using the resource! Thus, this service time will b e resp onding virtual time 1.775. However, according to wasted, which is unacceptable. Our solution is to sim- Eq. 12, in the ideal system the client should com- ply let any client with p ositive lag leave immediately, plete its service and therefore leave the comp etition at while correctly up dating the value of virtual-time to 2 2 virtual time 1.55 = ve + u =w , which in our ex- 1 account for the change see Figure 2b. In this way ample corresp onds to the real time 5.2. Unfortunately, the virtual-times corresp onding to the times when a since at this p ointwe do not know for how long client client decides to leave and when it actually leaves are 1 will continue to use the resource we know only that the same in b oth systems. More precisely consider a it has made a request for two time-units of execution client k leaving the comp etition at time t with a p os- k and has actually executed for only one time unit we itive lag i.e., lag t > 0. Then, by generalizing Eq. k k cannot up date the virtual time correctly! 
13, the value of virtual-time is up dated as follows Next, we present our solution to this problem for lag t k k P V t =V t + ; 14 k k a dynamic system in which the following two reason- w j j 2At nfk g k able restrictions hold: 1 all the clients that join the comp etition are assumed to have zero lag, and 2 a where At represents the set of all active clients just k client has to leave the comp etition as so on as it is n- before client k leaves. For example, in Figure 2b ished using the resource i.e., when a client terminates At = f1; 2; 3g.Thus, At nfk g represents the set 1 k it is not allowed to remain in the system. We con- of all active clients just after client k leaves the comp e- sider two cases dep ending on whether the client's lag tition. Further note that according to Eq. 3 the lag is negative or p ositive. From Eq. 3, 4, 6 it fol- of any remaining client i 2At nfk g changes to k lows that the client's lag increases as long as the client lag t k k P : 15 lag t = lag t +w receives service time, and decreases otherwise. Thus, i k i k i w j j 2At nfk g k when a client with negative lag wants to leave, we can simply delay that client without allo cating any service Thus the lag of client i is proportional ly distributed time to it until its lag b ecomes zero. This can b e sim- among the remaining clients, which is consistent with ply accomplished by generating a dummy request of our interpretation of fairness in dynamic systems, i.e., zero length. However, note that since a request cannot any gain or loss is prop ortionally distributed among b e pro cessed b efore it b ecomes eligible, and since the the remaining clients. virtual eligible time of the dummy request is equal to Since virtual-time is up dated only when the events its deadline see Eq. 8, this request cannot b e pro- actually o ccur in the real system as opp osed to when cessed earlier than its deadline. 
In this way,wehave they o ccur in the ideal one, the EEVDF algorithm can reduced the rst case to the second one, in which the b e easily and eciently implemented. Even in a sys- client leaving the comp etition has a p ositive lag. Our tem in which the service times are known in advance, solution is based on the same idea as b efore: the client it is theoretically p ossible to up date virtual-time as in is delayed until its lag b ecomes zero. the ideal system, however, in practice this is hard to For clarity, consider the example in Figure 2a, achieve. Mainly, this is b ecause we need to implement where three clients b ecome simultaneously active. an event queue which has to balance the trade-o b e- Next, supp ose that at time t , client 1 decides to leave tween timer granularity and scheduling overhead. As 1 virtual time virtual time t 1 t’1 t 2 t’2 time t 1 t 2 + (t’1 − t1 ) time client 1 client 1 client 2 client 2 client 3 client 3 (a) (b) Figure 2. Three clients become active at the same time, after which client 1 and client 2, both with positive lags, leave the competition. In a clients are al lowedtoleave only after their lags become zero; in b clients are al lowedtoleave immediately. The shadedregions in a represents the time intervals during which the system al locates service time to the clients until their lags become zero. In both cases the virtual-time just before a client wants to leave and just after it has actual ly left areequal. 6 Fairness Results wehave shown in [13] all the basic op eration required to implement the EEVDF algorithm, i.e., inserting and deleting a request, and nding the eligible request with The prop ortional share scheduling algorithm we the earliest deadline can b e implemented in O log n, have prop osed executes clients at a precise rate. One where n represents the numb er of active clients. 
can determine if a client has a desired real-time re- sp onse time prop ertyby simply computing the amount of service time it is to receive during the intervals of time of interest using either Eq. 5 or 6. However, b ecause service time is allo cated in discrete quanta, this We note that in the worst case it may b e p ossible computation is o by the client's lag. Thus, in order that all the dummy requests o ccur at the same time. In to use our prop ortional share algorithm for real-time this situation, the scheduler should p erform O n dele- computing, wemust demonstrate that the lag incurred tions b efore the next \real" request is serviced. Al- byany client is b ounded at all times. This is done next. though, this is a p otential problem in the case of a The problem is stated as that of demonstrating that communication switch, where the selection of the next the EEVDF algorithm is fair in the sense that all clients packet to b e serviced is assumed to b e done during make progress according to their weights. By demon- the transmission of the current packet, it do es not sig- strating that the lag of each client is b ounded at all ni cantly increase the complexity of CPU scheduling. times we conclude that our algorithm is fair. Here we This is mainly b ecause a pro cessor, b esides servicing sketch the argument that lags are b ounded. The com- the active clients pro cesses, also executes the schedul- plete pro of of each result are given in the extended ing algorithm, as well as other related op erating system version of this pap er [14]. functions e.g., starting a new pro cess, or terminating Theorem 1 shows that any request is ful lled no an existing one. Consequently, in a complete mo del we latter than q time units after its deadline in the cor- need to account for these overheads anyway. A simple resp onding uid- ow system, where q represents the solution would b e to charge each pro cess for the related maximum size of a time quantum. 
Theorem 2 gives overheads. For example, the time to select the next tight b ounds for the lag of any client in a system in pro cess to receive a time quantum should b e charged which all the clients that join and leave the comp eti- to that client. In this way, from the pro cessor's p er- tion have zero lags. Similarly, Theorem 3 gives tight sp ective, a dummy request is no longer a 0-duration re- b ounds for the client's lag in a system in which a client quest since it should account at least for the scheduling with p ositive lag may leaveatany time. Finally,asa overhead and eventually for the pro cess termination. corollary we show that in a dynamic system in which In the current mo del we ignore these overheads, which, no client request is larger than the maximum size q of as the exp erimental results suggest see Section 7, is a time quantum the lag of any client is b ounded by an acceptable approximation for many practical situa- q . Moreover, this result is optimal with resp ect to any tions. However, we plan to address this asp ect in the prop ortional share algorithm. We b egin by de ning future. formally the systems we are analyzing. request to b e no greater than several tens of millisec- onds, due to the delay constraints. Theorem 2 shows De nition 1 A steady system S-system for short that EEVDF can accommo date clients with di erent is a system in which the lag of any client that joins, or requirements, while guaranteeing tight b ounds for the leaves the competition is zero. lag of each client, which are indep endent of the other clients. As the next theorem shows this is not true for The next de nition is a formal characterization of the PS-systems. In this case the lag of a client can b e as system describ ed in Section 5 see Figure 2b. large as the maximum request issued by any clientin the system. 
Definition 2 A pseudo-steady system (PS-system for short) is a system in which the lag of any client that joins is zero, and the lag of any client that leaves is positive. Moreover, when a client with positive lag leaves, the value of virtual time is updated according to Eq. (14).

Theorem 3 Let r be the size of the current request issued by client k in a PS-system with quantum q. Then the lag of client k at any time t while the request is pending is bounded as follows