Computer Networks 31 (1999) 169-189

Quality-of-service in packet networks: basic mechanisms and directions

R. Guerin *, V. Peris

IBM, T.J. Watson Research Center, P.O. Box 704, Yorktown Heights, NY 10598, USA

Abstract

In this paper, we review the basic mechanisms used in packet networks to support Quality-of-Service (QoS) guarantees. We outline the various approaches that have been proposed, and discuss some of the trade-offs they involve. Specifically, the paper starts by introducing the different scheduling and buffer management mechanisms that can be used to provide service differentiation in packet networks. The aim is not to provide an exhaustive review of existing mechanisms, but instead to give the reader a perspective on the range of options available and the associated trade-offs between performance, functionality, and complexity. This is then followed by a discussion on the use of such mechanisms to provide specific end-to-end performance guarantees. The emphasis of this second part is on the need for adapting mechanisms to the different environments where they are to be deployed. In particular, fine grain buffer management and scheduling mechanisms may be neither necessary nor cost effective in high speed backbones, where ``aggregate'' solutions are more appropriate. The paper discusses issues and possible approaches to allow coexistence of different mechanisms in delivering end-to-end guarantees. © 1999 Elsevier Science B.V. All rights reserved.

Keywords: QoS mechanisms; Real-time traffic; Scheduling; Buffer management; Aggregation

1. Introduction

The Internet is a continuously and rapidly evolving entity, that is being used by an increasingly large number of applications with diverse requirements. IP telephony is one of many new applications driving the need for substantial changes in the current Internet infrastructure. Specifically, the emergence of applications with very different throughput, loss, or delay requirements is calling for a network capable of supporting different levels of services, as opposed to a single, best-effort level of service which is the rule in today's Internet.

Providing different levels of service in the network, however, introduces a number of new requirements, that can be classified along two major axes: control path and data path. New data path mechanisms are the means through which different services will be enforced. They are responsible for classifying and mapping user packets to their intended service class, and controlling the amount of network resources that each service class can consume. New control path mechanisms are needed to allow the users and the network to agree on service definitions, identify which users are entitled to a given service, and let the network appropriately allocate resources to each service.

* Corresponding author. Current address: Department of Electrical Engineering, Rm. 367 GRW, 200 South 33rd Street, Philadelphia, PA 19104-6390, USA. E-mail: [email protected].


Data path mechanisms are the basic building blocks on which network QoS is built. They implement the actions that the network needs to be able to take on user packets, in order to enforce different levels of service. As a result, the paper's first goal is to provide a broad overview of the generic data path approaches that have been developed to control access to resources in packet networks.

Resources that networks need to manage in order to support service differentiation primarily include buffers and bandwidth. The corresponding mechanisms consist of buffer management schemes and scheduling algorithms, respectively. Buffer management schemes decide which packets can be stored as they wait for transmission, while scheduling mechanisms control the actual transmission of packets. The two are obviously related, e.g., scheduling frequent packet transmissions for a given flow, i.e., allocating it more bandwidth, can help reduce its need for buffers, and conversely limiting the amount of buffer space a flow can use impacts the amount of bandwidth it is able to consume.

This paper does not attempt an exhaustive review of all possible scheduling and buffer management schemes. Instead, its aim is to identify the basic approaches that have been proposed and classify them according to the design trade-offs they represent. In particular, it is important to understand how different schemes fare in terms of fairness (access to excess capacity), isolation (protection from excess traffic from other users), efficiency (number of flows that can be accommodated for a given level of service), and complexity (both in terms of implementation and control overhead).

For example, certain buffer management schemes maintain per flow buffer counts and use that information to determine if a new packet should be accommodated. Alternatively, other schemes base such decisions on more global information such as buffer content thresholds and service type indicators in packet headers. The finer granularity of per flow information has benefits in terms of fairness and efficiency, but comes at the cost of greater complexity. Scheduling algorithms face similar trade-offs. For example, schedulers based on the weighted fair queueing algorithm [14,47] can be used to give rate and delay guarantees to individual flows, while class based mechanisms, e.g., CBQ [19], provide aggregated service guarantees to the set of flows mapped into the same class. Here, the trade-off is again between fairness and efficiency, and complexity.

In Sections 2 and 3, we illustrate, through a selected set of scheduling and buffer management schemes, the many alternatives available to network designers to control how user packets access network resources. As mentioned above, the emphasis is just as much on reviewing the basic methods that have been developed as on highlighting trade-offs.

Trade-offs are not the prerogative of data path mechanisms, and very similar issues exist on the control path. This applies to both the type of services available and how they can be requested. For example, services can vary greatly in the type of guarantees they provide. Service guarantees can be quantitative and range from simple throughput guarantees to hard bounds on end-to-end delay and lossless transmissions. Alternatively, some proposed service guarantees are qualitative in nature, and only stipulate ``better'' treatment for some packets, e.g., they will go through a higher priority queue and, therefore, experience lower delay, or will be given a lower discard threshold in case of congestion. Similarly, requests for service can be dynamic and extend all the way to applications, e.g., using the RSVP protocol [6], or may be handled only through static configuration information that defines bandwidth provisioning between specific subnets.

In general, control path trade-offs are along two dimensions: the processing required to support the service, and the amount of information that needs to be maintained. For example, the complexity of a call admission decision to determine if a new request can be accommodated depends on both the service definition itself and how it maps onto the underlying scheduling and buffer management mechanisms. Similarly, a service such as the Guaranteed Service [54] involves a number of parameters used to compute the end-to-end delay bound and buffer size that the service mandates. Examples illustrating some of these issues and the associated trade-offs are briefly reviewed in Section 4, which also describes and contrasts the two services currently being standardized [54,60] to provide QoS in IP networks.

In general, which trade-off is desirable or even feasible is a function of the scalability requirements of each environment, i.e., the total amount of information and processing that can be accommodated. This is a decision that involves both the data path and the control path. For example, per flow scheduling, buffer management, and state maintenance and processing may be appropriate in a local campus, but can create scalability problems in the core of a backbone where the number of individual flows will be orders of magnitude larger. In such an environment, scalability is of prime concern and while per flow service mechanisms remain desirable, approaches that rely on service aggregation may be needed. We review such issues in Section 5, where we highlight some of the approaches that have been proposed so far and point to some of the challenges being faced. In particular, we discuss the need for interoperability between fine grain and coarse grain solutions, and outline possible approaches.

2. Scheduling mechanisms

The nature of the scheduling mechanism employed on network links greatly impacts the QoS guarantees that can be provided by the network. The basic function of the scheduler is to arbitrate between the packets that are ready for transmission on the link. Based on the algorithm used for scheduling packets, as well as the traffic characteristics of the flows multiplexed on the link, certain performance measures can be computed. These can then be used by the network to provide end-to-end QoS guarantees. First, we describe how the user traffic is characterized at the ingress into the network.

2.1. Traffic specification

Both ATM and IP networks are similar in their characterization of user traffic. Rather than describe both, we concentrate on IP networks, where the specification of user traffic is currently done through a TSpec [55], which consists of the following parameters¹: a token bucket plus a peak rate (p), a minimum policed unit (m), and a maximum datagram size (M). The token bucket has a bucket depth, b, and a bucket rate, r. The token bucket, the peak rate, and the maximum datagram size together define the conformance test that identifies the user packets eligible for service guarantees. The conformance test defines the maximum amount of traffic that the user can inject in the network, and for which it can expect to receive the service guarantees it has contracted. This maximum amount of traffic is expressed in terms of a traffic envelope A(t) that specifies an upper bound on the amount of traffic generated in any time interval of duration t:

A(t) ≤ min(M + p·t, b + r·t).   (1)

Eq. (1) simply states that the user can send up to b bytes of data at its full peak rate of p, but must then lower its rate down to r. The presence of the factor M in the first term is because of the packet nature of transmissions, where up to one maximum size packet worth of data can be sent at link speed.

The above model is essentially that of a traditional ``leaky bucket'', which lets users specify a long term transmission rate (r) while preserving their ability to burst at a higher speed (p) for limited periods of time (up to b/(p − r)). The full TSpec further includes a minimum policed unit m, which counts any packet of size less than m as being of size m. This allows for accounting of per packet processing or transmission overhead, e.g., time needed to schedule packet transmissions.

¹ Rates are in units of bytes/s, while packet sizes and bucket depth are in bytes.
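To make the traffic envelope of Eq. (1) concrete, the sketch below (ours, not part of the original paper) implements A(t) and a simple token bucket conformance test; the class and method names are illustrative, and peak-rate policing is omitted for brevity.

```python
# Illustrative sketch (not from the paper): TSpec envelope of Eq. (1) and a
# token-bucket conformance check. All names are hypothetical.

class TSpec:
    def __init__(self, r, b, p, M, m):
        self.r = r  # token (bucket) rate, bytes/s
        self.b = b  # bucket depth, bytes
        self.p = p  # peak rate, bytes/s
        self.M = M  # maximum datagram size, bytes
        self.m = m  # minimum policed unit, bytes

    def envelope(self, t):
        """Upper bound A(t) on the traffic allowed in any interval of length t (Eq. (1))."""
        return min(self.M + self.p * t, self.b + self.r * t)

class TokenBucketPolicer:
    """Accepts a packet as conformant if enough tokens have accumulated."""
    def __init__(self, tspec):
        self.tspec = tspec
        self.tokens = tspec.b          # bucket starts full
        self.last = 0.0                # time of the last update

    def conformant(self, arrival_time, length):
        size = max(length, self.tspec.m)   # minimum policed unit
        # Accumulate tokens at rate r, capped at the bucket depth b.
        self.tokens = min(self.tspec.b,
                          self.tokens + self.tspec.r * (arrival_time - self.last))
        self.last = arrival_time
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False
```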

2.2. Scheduling policies

We now turn our attention to the different scheduling policies that can be used to select packets for transmission on a link. There are several aspects that need to be considered when choosing a scheduling algorithm. Some of the main ones are:
• The nature of the performance measures guaranteed by the scheduling algorithm. For instance, does it enable the network to provide a per-flow rate guarantee, or does it provide a simple priority structure so that one class of traffic receives priority over another class?
• The efficiency of the algorithm in enforcing service guarantees. Basically, how many calls with given characteristics can be packed on the link. This is often measured in terms of the call admission or schedulability region of the algorithm.

• The complexity of the algorithm. Given the high link speeds that are in use today and the even higher ones waiting in the wings, the amount of time available to schedule a single packet is continuously decreasing. Therefore, it is very important for the algorithm to require a small number of operations per packet.
• The basic mechanism and parameters used by the algorithm to make scheduling decisions. Of specific interest is whether the algorithm supports separate delay and rate guarantees, e.g., [11,24,62], or combines the two, e.g., [29,46,58]. Separate rate and delay guarantees allow greater flexibility.
• The flexibility of the algorithm in handling traffic in excess of the amount for which the service guarantees have been requested. Some algorithms easily handle excess traffic while others require additional mechanisms. The flexibility of an algorithm in handling excess traffic is often described in terms of a fairness index, that attempts to measure how equally the algorithm redistributes available resources across flows (see [3,4,28] for discussions on this issue).
We try to cover each of these aspects as we review different scheduling policies.

2.3. First come first served

This is one of the simplest scheduling policies, whose operation is as its name suggests, i.e., packets are served in the order in which they are received. While this may seem like a fairly poor way of providing differentiated service, it is quite simple to implement. In particular, insertion and deletion from the queue of waiting packets is a constant time operation and does not require any per-flow state to be maintained. Not surprisingly, the First Come First Served (FCFS) policy is one of the most commonly implemented scheduling policies.

The FCFS policy does not lend itself readily to providing delay or rate guarantees. One way to provide a delay bound is to limit the size of the queue of waiting packets. That way, once a packet is queued up for transmission it is guaranteed to be sent out in the time it takes to serve a full queue of packets. However, packets that arrive when the queue is full have to be dropped. To ensure that the drop probability is below a certain threshold only a limited number of flows should be accepted. A simple mechanism for performing this decision is to compute an effective bandwidth that quantifies the amount of link capacity that is required for each flow [25,31,40].

Given that the FCFS policy is one of the least sophisticated in its operation, it does not explicitly provide any mechanism for fair sharing of link resources. However, with the help of some buffer management mechanisms it is possible to control the sharing of bandwidth, and we describe some of these mechanisms in Section 3.

2.4. Fixed priority scheduler

This policy offers some ability to provide different grades of service with a rather coarse granularity. Traffic is classified as belonging to one of a fixed number of static priorities. The link multiplexer maintains a separate queue for each priority and serves a packet in a lower priority queue only if all the higher priority queues are empty. Each queue is served in a FCFS manner, and so this scheduling mechanism is almost as simple as the FCFS policy with the added complexity of having to maintain a few queues. Selecting a packet for transmission is a fixed cost operation that only depends on the number of priority levels and is independent of the number of flows that are being multiplexed.

While the priority scheduler does offer a certain amount of service differentiation capability, it still does not readily allow end-to-end performance guarantees to be provided on a per-flow basis. Rather, it only provides for one class of traffic to receive better service than another class of traffic at a single link. As with FCFS, the provision of end-to-end delay guarantees with a fixed priority scheduler can be achieved by limiting buffer sizes. However, one important difference, that has to be accounted for, is the fact that a lower priority packet will be served only after all the packets from the higher priority queues are transmitted. This is bound to affect the variability in the delay that is observed by the lower priority packets. In general, both the FCFS and fixed priority schedulers are not ideal candidates when it comes to providing guaranteed end-to-end delay bounds.

Another problem with the priority scheduler is that different switches may have different levels of priority, and matching these levels from an end-to-end perspective is no easy feat. In Section 5 we touch upon some aspects of priority levels in the context of the Type of Service (TOS) bit semantics, which are currently being discussed at the IETF.

2.5. Fair queueing schedulers

The Weighted Fair Queueing (WFQ) service discipline and its many variants have been very popular in the current decade. Part of its popularity stems from the fact that it overcomes some of the limitations of the FCFS and priority schedulers by allowing for a fine grain control over the service received by individual flows. This allows the network to provide end-to-end delay guarantees on a per-flow basis. In addition, WFQ serves excess traffic in a fair manner, where fairness is measured relative to the amount of resources that are reserved for each flow [26].

Most variants of the WFQ discipline are compared to the Generalized Processor Sharing (GPS) scheduler, which is a theoretical construct based on a form of processor sharing [47]. The operation of a GPS scheduler can be elegantly defined for a fluid model of traffic, and tight end-to-end delay bounds can be computed for a flow that traverses multiple links which are all served by GPS schedulers. The basic idea behind GPS is as follows: a weight φ_i is associated with flow i, i = 1,...,N, and the link capacity is shared among the active flows in direct proportion to their weights. In other words, if g were to denote the link speed, then flow i is guaranteed to obtain a minimum service rate of φ_i·g / Σ_{j=1}^{N} φ_j, i = 1,...,N. However, at any given time it is quite likely that some flows do not have a backlog of traffic waiting to be transmitted on the link. This will translate into some unutilized link capacity that can be shared among the backlogged (active) flows. The GPS scheduler shares this excess capacity among the backlogged flows in proportion to their respective weights. If B(t) denotes the set of flows that are backlogged at time t ≥ 0, then flow i is guaranteed to receive a minimum service rate of r_i(t) given by

r_i(t) = φ_i·g / Σ_{j∈B(t)} φ_j,   if i ∈ B(t),
r_i(t) = 0,   otherwise.

In reality we do not have fluid flows, so the main use of GPS is as a reference model. By simulating this reference model it is possible to define service disciplines that closely follow the GPS policy. This is precisely the way in which Packetized Generalized Processor Sharing (PGPS) [47], also referred to as Weighted Fair Queueing (WFQ) [14], is defined. A packet arrival in the real system is modeled as the arrival of a certain volume of fluid in the reference system. A packet is considered to be transmitted in the reference system when the fluid volume corresponding to that packet has been completely served by the reference GPS scheduler. The PGPS policy simulates the reference GPS scheduler using the fluid arrival process as described above and computes the departure times of the packets in the reference system. The PGPS scheduler then selects for transmission the packet that would have departed the earliest in the reference GPS system.

One of the reasons for the popularity of the GPS policy is its fair handling of excess traffic. If two flows are backlogged during any interval of time they receive service in direct proportion to their weights, regardless of the amount of excess traffic that any of them may be generating. Let W_i(t_1,t_2) denote the amount of flow i traffic served in the interval (t_1,t_2), i = 1,...,N. The GPS policy ensures that for any two flows i, j backlogged during the interval (t_1,t_2), the following relation holds:

W_i(t_1,t_2)/φ_i = W_j(t_1,t_2)/φ_j.

Indeed, it is this very relation that is one of the measures used to quantify fairness among the many approximations of GPS. A bound on |W_i(t_1,t_2)/φ_i − W_j(t_1,t_2)/φ_j| for any two flows that are backlogged in the interval (t_1,t_2) can be used to compare the relative fairness of different scheduling policies [26]. The GPS scheduler has a fairness measure of 0.
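As an illustration of the GPS fluid model, the fragment below (ours, not the authors') computes the instantaneous service rate r_i(t) of each flow from the weights and the backlogged set B(t); the example at the end shows the proportional-fairness relation quoted above.

```python
# Illustrative sketch: instantaneous GPS service rates for backlogged flows.
# 'weights' maps flow id -> phi_i; 'backlogged' is the set B(t); 'g' is the link speed.

def gps_rates(weights, backlogged, g):
    total = sum(weights[j] for j in backlogged)
    return {i: (weights[i] * g / total if i in backlogged else 0.0)
            for i in weights}

# Example: a 1 Gb/s link shared by three flows, two of them backlogged.
rates = gps_rates({"f1": 1, "f2": 3, "f3": 1}, {"f1", "f2"}, g=1e9)
# rates["f1"]/1 == rates["f2"]/3: service accumulates in proportion to the
# weights for all backlogged flows, i.e., W_i/phi_i = W_j/phi_j as stated above.
```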

Most schedulers that are based on GPS can be implemented using the notion of a virtual time function. The virtual time is relevant only during a busy period, which is defined as a maximal interval of time during which the server is not idle. For GPS the virtual time, V(t), at the start of a busy period is defined to be 0. During any busy period V(t) advances at the rate of

∂V/∂t = g / Σ_{i∈B(t)} φ_i.

Based on this virtual time, one can define for each packet of a flow, say the kth packet of flow i, its virtual start time, s_i^k, and virtual finish time, f_i^k, that correspond to its start and finish times, respectively, in the GPS reference system. If we further denote its arrival time by a_i^k, and its length by l_i^k, we then have the following relations:

f_i^0 = 0,
s_i^k = max{ V(a_i^k), f_i^{k−1} },   k = 1, 2, ...,
f_i^k = s_i^k + l_i^k/φ_i,   k = 1, 2, ....

The previously mentioned simulation of GPS on which PGPS relies is based on this notion of virtual time. It is implemented as follows: the scheduler always picks the packet with the smallest virtual finish time for transmission on the link (ties can be arbitrarily broken). This can be efficiently performed using a priority queue based on the finish times of the packets. The complexity of the insertion and deletion operations from this queue is O(log N). Note, however, that there is additional complexity in computing the advancement of virtual time since B(t) is not likely to remain the same during the entire busy period. In the worst case the virtual time computation can be O(N), as shown in [26].

Another advantage of the PGPS policy is that an end-to-end delay bound can be computed based on the weight assigned to the flow. Let g^h, h = 1,...,H, denote the speed of the links traversed by flow i. For simplicity, assume that the traffic envelope for flow i is given by A_i(t) = b_i + r_i·t, t ≥ 0, and let R_i be the minimum rate guaranteed to flow i by each of the links that it traverses. Assuming that the stability condition, R_i ≥ r_i, holds, the end-to-end delay guarantee for flow i is given by

D̂_i = b_i/R_i + (H − 1)·M_i/R_i + Σ_{h=1}^{H} L_max^h/g^h,   (2)

where M_i denotes the maximum packet length for flow i, and L_max^h denotes the maximum packet length at link h. Notice how the b_i/R_i term appears only once regardless of the number of hops that are traversed. The reason for this is that once a bottleneck link is encountered, the PGPS scheduler effectively smooths out the burst in the flow, so that the burst delay (b_i/R_i) is no longer encountered downstream. Also notice how the end-to-end delay bound is independent of the number of other connections that are multiplexed at each of the links traversed by flow i. This property makes the PGPS policy one of the best in terms of providing tight end-to-end delay guarantees.

2.6. Variants of fair queueing

One of the main drawbacks of PGPS is the complexity involved in computing the virtual time function. Recently, several scheduling policies with simpler computations of virtual time have been proposed. These policies can guarantee end-to-end delay bounds that have a form similar to that of Eq. (2), and we briefly highlight some of them next.

Self Clocked Fair Queueing (SCFQ) was proposed in [26] as a simpler alternative to PGPS. Rather than use the GPS reference model for the computation of the virtual time, it bases the evolution of V(t) on the virtual start time of the packet currently in service. This greatly reduces the amount of computation needed to keep track of the virtual time. This reduction in complexity results in a larger end-to-end delay bound than PGPS. Roughly, it adds a delay term of the form Σ_{i=1}^{N} M_i/g^h at each of the links that are traversed [27]. The fact that the end-to-end delay now depends on the number of flows that are multiplexed on the link is the main drawback of this policy.
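The sketch below (ours) shows the common skeleton of these virtual-time schedulers: packets are tagged with virtual start and finish times per the relations above, and the packet with the smallest finish tag is transmitted next. For simplicity the virtual time is advanced to the tag of the packet selected for service, in the spirit of the self-clocked variants; an exact PGPS/WFQ implementation would instead simulate the fluid GPS reference. Names are illustrative.

```python
# Illustrative sketch of a packetized fair-queueing scheduler driven by
# virtual start/finish tags (simplified, self-clocked-style virtual time).

import heapq

class FairQueueing:
    def __init__(self, weights):
        self.weights = weights                   # flow id -> phi_i
        self.virtual_time = 0.0
        self.last_finish = {i: 0.0 for i in weights}
        self.queue = []                          # heap of (finish, seq, flow, length, start)
        self.seq = 0

    def enqueue(self, flow, length):
        start = max(self.virtual_time, self.last_finish[flow])
        finish = start + length / self.weights[flow]
        self.last_finish[flow] = finish
        heapq.heappush(self.queue, (finish, self.seq, flow, length, start))
        self.seq += 1

    def dequeue(self):
        """Pick the packet with the smallest virtual finish tag (ties by arrival order)."""
        if not self.queue:
            return None
        finish, _, flow, length, start = heapq.heappop(self.queue)
        # Simplified advancement of the virtual time (self-clocked flavor).
        self.virtual_time = max(self.virtual_time, start)
        return flow, length
```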

Start-time Fair Queueing (SFQ) was proposed in [28] as another fair queueing policy. It is very similar to SCFQ, with the main difference being that the SFQ scheduler picks the packet with the smallest virtual start time (as opposed to the smallest virtual finish time) for transmission on the link. The end-to-end delay of SFQ is very close to that of SCFQ, but is slightly smaller.

While PGPS tracks the GPS policy, it only uses the finish times of the packets in the reference system to select the next packet for transmission. Thus packets which have not yet started service in the reference system may be selected for service simply because their finish times are earlier than all the other packets waiting to be transmitted. This can result in a significant difference between packet departures in PGPS and the reference system, with packets departing much earlier in PGPS than in the reference system. This discrepancy can be measured using the worst-case fairness index as defined in [4]. The Worst-case Fair Weighted Fair Queueing (WF²Q) scheduler proposed in [4] uses both the start and finish times of packets in the reference GPS system to achieve a more accurate emulation of GPS. The WF²Q policy selects the packet with the smallest virtual finish time in the reference system provided that its virtual start time is less than the current time. This prevents it from running too far ahead of the reference system. The end-to-end delay bound for a WF²Q scheduler is identical to that of PGPS.

There are a few more scheduling policies that also seek to emulate fair queueing. The interested reader is referred to [56] for a description of Deficit Round Robin, which extends Weighted Round Robin [39] by taking into account variable sized packets. Frame-based Fair Queueing [57] is another scheduling policy that provides the same end-to-end delay guarantees as PGPS but is simpler to implement.

2.7. Earliest deadline first

The Earliest Deadline First (EDF) scheduler is a form of a dynamic priority scheduler where the priorities for each packet are assigned as it arrives. Specifically, a deadline is assigned to each packet, which is given by the sum of its arrival time and the delay guarantee associated with the flow that the packet belongs to. The EDF scheduler selects the packet with the smallest deadline for transmission on the link, and hence the name. The dynamic nature of the priority in the EDF scheduler is evident from the fact that the priority of a packet increases with the amount of time it spends in the system. This ensures that packets with loose delay requirements obtain better service than they would in a static priority scheduler, without sacrificing the tight delay guarantees that may be provided to other flows. It is well known that for any packet arrival process where a deadline can be associated with each packet, the EDF policy is optimal in terms of minimizing the maximum lateness of packets [22]. Here, lateness is defined as the difference between the deadline of a packet and the time it is actually transmitted on the link.

As mentioned earlier, the GPS policy guarantees a delay bound on a per-flow basis based on the weight that is assigned to the flow. These weights are closely coupled to a reserved rate, and for a flow to receive a small end-to-end delay guarantee it is necessary that it be allocated a relatively large rate. This can lead to an inefficient use of resources, particularly if a low bandwidth flow requires tight end-to-end delay guarantees. One of the main attractions of the EDF policy is that it allows for the separation of delay and throughput guarantees for a flow.

In terms of implementation the EDF policy is more complex than the FCFS or the static priority scheduler. The complexity arises because the scheduler has to pick the packet with the smallest deadline for transmission on the link. This involves keeping a priority list of deadlines, and so insertion or deletion from this list has a complexity of O(log K) operations, where K corresponds to the number of packets awaiting transmission. An obvious optimization is to only keep a single packet from each of the flows in the priority list of deadlines, since packets belonging to the same flow must leave in the order in which they arrive. As a result, the complexity is reduced to O(log N), where N is the number of flows multiplexed onto the link.
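The sketch below (ours, with illustrative names) implements the optimization just described: per-flow FIFO queues plus a heap that holds only the head-of-line deadline of each flow, so that each scheduling decision costs O(log N). The deadline of a packet is its arrival time plus the local delay guarantee D_i of its flow.

```python
# Illustrative EDF sketch: per-flow FIFOs plus a heap of head-of-line deadlines.

import heapq
from collections import deque

class EDFScheduler:
    def __init__(self, delay_bounds):
        self.delay_bounds = delay_bounds          # flow id -> local delay bound D_i (s)
        self.fifos = {i: deque() for i in delay_bounds}
        self.heads = []                           # heap of (deadline, flow)

    def enqueue(self, flow, arrival_time, packet):
        deadline = arrival_time + self.delay_bounds[flow]
        was_empty = not self.fifos[flow]
        self.fifos[flow].append((deadline, packet))
        if was_empty:                             # packet becomes head-of-line
            heapq.heappush(self.heads, (deadline, flow))

    def dequeue(self):
        """Transmit the packet with the earliest deadline among head-of-line packets."""
        while self.heads:
            deadline, flow = heapq.heappop(self.heads)
            if self.fifos[flow] and self.fifos[flow][0][0] == deadline:
                _, packet = self.fifos[flow].popleft()
                if self.fifos[flow]:              # expose the new head-of-line packet
                    heapq.heappush(self.heads, (self.fifos[flow][0][0], flow))
                return packet
        return None
```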

Assuming that each of these flows is characterized by a traffic envelope denoted by A_i(t), i = 1,...,N, it is known [22] that the EDF scheduler can provide a delay guarantee of D_i to flow i, i = 1,...,N, provided the following feasibility condition is satisfied:

Σ_{i=1}^{N} A_i(t − D_i) + L_max ≤ g·t,   t ≥ 0,   (3)

where A_i(t) := 0 for t < 0, and L_max denotes the maximum transmission unit for the link.

2.7.1. Rate controlled service discipline

The EDF policy by itself cannot be used to provide efficient end-to-end delay guarantees [24]. This is because Eq. (3) only provides a feasibility check for the delay guarantees at a single link. To compute the feasibility check at a link that is downstream we need to know the traffic characterization at that link. The very act of scheduling perturbs the flow, and so the traffic characterization of the flow is not likely to remain the same as it traverses the network. It is possible to compute a bound on the traffic characterization of a flow at the link output and use that to compute a feasible delay guarantee for the flow at the downstream link. However this approach does not result in efficient end-to-end delay guarantees, since the bound on the traffic characterization at the output link can be rather pessimistic [24]. Alternatively, one could reshape the traffic at each node to a pre-specified envelope before it is made eligible for scheduling. Coupled with traffic shapers, the EDF policy can be used to provide efficient end-to-end delay guarantees on a per-flow basis. We refer to this combination as a Rate-Controlled Service (RCS) discipline [24,62] and briefly discuss some of its implications.

The basic idea of the RCS discipline is to reshape traffic from a flow at every link that it traverses. The flow need not be reshaped to its original specification; in fact this turns out to be a particularly bad idea [24]. With the appropriate choice of reshapers it can be shown that the RCS discipline is the most efficient in terms of guaranteeing end-to-end delays of all the scheduling policies that are known today. In [24], it is shown that with an appropriate choice of shaper envelopes the RCS discipline can provide the same delay guarantees as PGPS and in addition accept some more flows. As mentioned at the beginning of this section, the efficiency of scheduling algorithms can be measured in terms of the number of flows they can support with given delay guarantees. This corresponds to the notion of a schedulable region, and it can be shown [24] that the RCS discipline has the largest schedulable region among all scheduling policies known today. An example illustrating the schedulable region of different scheduling policies can be found in [48].

2.8. Hierarchical link sharing

As discussed in Section 4, a network may offer several different services over a single link. The link will, therefore, have to be partitioned to support the different service classes. Alternatively, a single link in the network may be shared by several different organizations or departments within an organization. Each of these may want to receive a guaranteed portion of the link capacity but be willing to allow other departments or organizations to borrow unutilized link resources. The hierarchical structure of organizations suggests a hierarchical partitioning of the link resources. A hierarchical link sharing structure consisting of classes that correspond to some aggregation of traffic is suggested in [19] and is often referred to as Class Based Queueing (CBQ). Each class is associated with a link-sharing bandwidth, and one of the goals of CBQ is to roughly guarantee this bandwidth to the traffic belonging to the class. Excess bandwidth should be shared in a fair manner among the other classes. There is no requirement to use the same scheduling policy at all levels of a link sharing hierarchy, and it is conceivable that classes carrying interactive traffic may benefit from a simple priority scheduling policy as opposed to rate-based schedulers (see [19] for details).

This being said, the GPS policy can be used in a hierarchical manner to provide both link sharing and individual QoS guarantees to flows. At the top level of the hierarchy, the weights reflect the link sharing requirements, while the lower level weights are used to provide individual QoS guarantees. Whenever an interior node receives service it is distributed among its child nodes in direct proportion to their weights. Note that there may be more than two levels in the hierarchy. Except for leaf nodes, other nodes only receive a logical service. The interested reader is referred to [3] for a detailed description of hierarchical packet fair queueing algorithms.
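The fragment below is a minimal, fluid-level sketch of the weight propagation just described: an interior node's share of the link is recursively split among its backlogged children in proportion to their weights, and only leaves receive actual service. It is ours, with hypothetical names; real CBQ or hierarchical packet fair queueing implementations ([3,19]) are considerably more involved.

```python
# Minimal sketch of hierarchical (fluid) link sharing by weight propagation.

class ShareNode:
    def __init__(self, name, weight, children=None):
        self.name = name
        self.weight = weight
        self.children = children or []   # empty list -> leaf (an actual class/flow)
        self.backlogged = True           # assume all classes active, for simplicity

def distribute(node, rate, allocation):
    """Recursively split 'rate' among backlogged children in proportion to weights."""
    active = [c for c in node.children if c.backlogged]
    if not active:                       # leaf: actual service
        allocation[node.name] = rate
        return
    total = sum(c.weight for c in active)
    for c in active:
        distribute(c, rate * c.weight / total, allocation)

# Example: a 100 Mb/s link shared 70/30 between two departments, each of which
# splits its share 3:1 (or 1:1) between an interactive and a bulk class.
link = ShareNode("link", 1, [
    ShareNode("deptA", 7, [ShareNode("A-interactive", 3), ShareNode("A-bulk", 1)]),
    ShareNode("deptB", 3, [ShareNode("B-interactive", 1), ShareNode("B-bulk", 1)]),
])
alloc = {}
distribute(link, 100e6, alloc)   # e.g., A-interactive gets 70 * 3/4 = 52.5 Mb/s
```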

3. Buffer management

While the scheduling policy does play a big role in the QoS provided by the network, it is only effective if there are sufficient buffers to hold incoming packets. Since link speeds are currently in the order of Gigabits per second, the amount of memory required to buffer traffic during transient periods of congestion can be large and exceed the amount of memory that most routers and switches have. Packets that arrive during these transient periods will have to be dropped. If the end-to-end transport protocol is a reliable one like TCP, then a lost packet causes a retransmission as well as a reduction in the size of the congestion window. The net result will be a lower end-to-end QoS for the flow concerned.

In order to avoid a haphazard behavior when a link experiences congestion, several different buffer management schemes have been proposed. These can be roughly classified along the following two dimensions:
1. When are packet discard decisions made? Typically, packet discard decisions are made either upon the arrival of a new packet, or at the onset of congestion, where currently stored packets may then be discarded to accommodate a new, higher priority packet.
2. What information is used to make packet discard decisions? The main aspect is the granularity of the information, i.e., is per flow buffer accounting done and used to discard packets from individual flows, or is only global, per class, information kept and used.
Different designs correspond to different trade-offs between performance and complexity along these two dimensions, where the performance of a buffer management scheme is measured in terms of its ability to control traffic fairly and efficiently during periods of congestion. Efficient treatment means that the buffer management mechanism should avoid violation of a service agreement by losing many ``high priority'' (conformant) packets during periods of congestion. For example, this can happen if too many low priority packets are allowed in, and as a result some packets from an arriving high priority burst are lost. One solution is to allow the high priority packets to ``push out'' [42] the low priority ones, but this capability adds to the implementation complexity of the scheme. In addition, it may not always be sufficient (see [7] for a discussion of this issue), and additional mechanisms may be needed to try to avoid congestion. In this section, we describe a few such buffer management mechanisms, like Early Packet Discard (EPD) [51,59] and Random Early Discard (RED) [18], where, as a preventive measure, packets are discarded before the onset of congestion.

Detecting the onset of congestion and the associated preventive measures vary with the link technology being used. For example, on ATM links an IP packet may be segmented into smaller units, e.g., 53 byte cells. In this case, congestion is measured with respect to the buffer's ability to store cells. A single cell loss will result in an entire packet being rendered useless, so that it is advantageous to discard an entire packet's worth of cells at a time. One obvious approach is to drop all subsequent cells from a packet if there is no room in the buffer for an arriving cell. This is termed Packet Tail Discarding in [59] or Partial Packet Discard (PPD) in [51]. While such a technique helps avoid unnecessary buffer waste, substantial packet throughput degradation still occurs during periods of very high congestion, e.g., offered traffic above twice the link capacity. This is because in such cases almost every packet will lose at least one cell, and flows with large packets will be even further penalized.

A possible approach to remedy this last problem is to predict the onset of congestion and discard all cells of packets that are expected to experience a cell loss. For example, this can be achieved using a simple threshold on the buffer occupancy to decide whether to accept or drop a packet at the time its first cell is received. This is referred to as Early Packet Discard (EPD) [51] and has the benefit that link capacity is not wasted in the transmission of partial packets. EPD performs better than PPD and results in an effective throughput of close to the link capacity with relatively fewer buffers [51,59].

However, it turns out that EPD can be unfair to low bandwidth flows. In particular, if the buffer occupancy is hovering around the threshold value, then high rate flows are likely to have more opportunities at getting their packets accepted (assuming identical packet sizes). Once a packet is accepted it pushes the buffer occupancy above the threshold, and a subsequent packet from a low bandwidth flow is then likely to be dropped. Some of these issues are discussed in [59], which presents a number of enhancements to the basic EPD scheme aimed at improving not only goodput, i.e., the number of complete packets transmitted, but also fairness in selecting flows that can start buffering a new packet.

In general, it is desirable to devise buffer management schemes that preserve the goodput benefits of EPD while also ensuring fairness in how this goodput is distributed across flows. The fairness of a buffer management scheme is a function of how it penalizes packets from non-conformant flows, i.e., flows sending at a higher rate than they are entitled to. Ideally, the scheme should ensure that no single flow can grab a disproportionate amount of the buffer space, thereby affecting the performance level of other flows. If per flow buffer accounting is performed, it is relatively easy to identify misbehaving flows and take appropriate actions to ensure a fair allocation of the buffer space. However, there is a cost associated with per flow buffer accounting, and in some environments where scalability is a concern, buffer management may need to be done at a coarser level. The penalty is that it is harder to ensure fairness when many flows share a common buffer. In some instances, i.e., with adaptive applications such as TCP, it is, however, possible to ensure some level of per flow fairness without maintaining per flow state. This was one of the goals of the Random Early Drop (RED) [18] mechanism, which relies on random dropping decisions when the buffer content exceeds a given threshold, so that heavy flows experience a larger number of dropped packets in case of congestion. Hence RED aims at penalizing flows in proportion to the amount of traffic they contribute, and therefore at preventing any of them from grabbing a disproportionate amount of resources.

Specifically, assuming that flows use TCP as their transport protocol, RED provides a feedback mechanism so that sources can be notified of congestion in the router and accordingly reduce the amount of traffic that they inject into the network. First the average queue size is computed using a low-pass filter with an exponentially weighted moving average, so that the router does not react too quickly to transient phenomena. Next, this queue size is compared to a maximum and a minimum threshold, often denoted by max_th and min_th. If the average queue size is below min_th or above max_th, all packets are accepted or dropped, respectively². When the average queue size is between min_th and max_th, an arriving packet is dropped with a probability that depends on the average queue size.

² Packets can alternatively be marked, instead of being dropped, if packet marking is supported by the network and the end-stations.

The probability that a packet from a particular flow is dropped is roughly proportional to the flow's share of the link bandwidth. Thus a flow that is utilizing a larger share of the link bandwidth is forced to reduce its rate rather quickly. One of the main advantages of RED is that it does not require per flow state to be maintained in the router. As a result, RED is relatively simple to implement and can be used in conjunction with the simple FCFS scheduler described in Section 2 to reduce congestion in the network. This can be of significance in the Internet backbone, where there may be hundreds of thousands of flows on a given link.

While RED does not discriminate against particular types of flows, it should be noted that it does not always ensure all flows a fair share of bandwidth. In fact, it turns out that RED is unfair to low speed TCP flows. This is because RED randomly drops packets when the maximum threshold is crossed, and it is possible that one of these packets belongs to a flow that is currently using less than its fair share of bandwidth. Since TCP reacts rather strongly to packet loss, the lost packet will force a further reduction in the congestion window, resulting in an even lower rate. The Fair Random Early Drop (FRED) mechanism is presented in [44] as a modification to RED in order to reduce some of its unfairness. In addition to the RED thresholds, FRED maintains thresholds for individual flows as well as the current buffer occupancies for each of the active flows. This is used to decide which flows are using a larger amount of bandwidth than they are entitled to, so that a selective discard mechanism can be effected. This per-flow state also allows FRED to detect misbehaving flows, and it can use this information to prevent them from consistently monopolizing the buffers.
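The sketch below (ours) captures the RED dropping decision just described: an exponentially weighted moving average of the queue size, unconditional acceptance below min_th, unconditional dropping (or marking) above max_th, and a linearly increasing drop probability in between. Refinements from the full RED algorithm, such as spacing drops with a packet counter or idle-time handling, are omitted, and the parameter names are illustrative.

```python
# Illustrative RED sketch: EWMA average queue size plus threshold-based drops.

import random

class RED:
    def __init__(self, min_th, max_th, max_p=0.1, weight=0.002):
        self.min_th = min_th      # thresholds, in packets (or bytes)
        self.max_th = max_th
        self.max_p = max_p        # drop probability reached at max_th
        self.weight = weight      # EWMA weight (low-pass filter)
        self.avg = 0.0

    def should_drop(self, current_queue_len):
        # Low-pass filter so transient bursts do not trigger drops.
        self.avg = (1 - self.weight) * self.avg + self.weight * current_queue_len
        if self.avg < self.min_th:
            return False                      # accept
        if self.avg >= self.max_th:
            return True                       # drop (or mark)
        # Linearly increasing drop probability between the two thresholds.
        p = self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)
        return random.random() < p
```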

In some environments, notably ATM, it is quite natural to maintain state information for each flow or virtual circuit. Also, the fixed cell size of ATM facilitates hardware implementation of complex scheduling and buffer management algorithms. For ATM, there are many single chip solutions that perform per-flow scheduling and buffer management, e.g., IBM's CHARM chip [13]. Here each virtual circuit can be guaranteed to receive a certain rate regardless of the other flows that are multiplexed on the link. However, as mentioned earlier, this rate guarantee is good only if there is sufficient memory to buffer incoming cells. One might argue for the buffers to be strictly partitioned among each of the virtual circuits, but this is not necessarily a good strategy. In [38] it is shown that a complete partitioning of the buffers results in a higher packet loss probability than when the buffers are shared among all the flows. Of course, complete sharing comes with its own problems, particularly when a diverse set of flows are to be multiplexed on the link. A good compromise is to guarantee a few buffers to each individual flow, while retaining a certain fraction of the total buffer space for sharing across the different flows [38]. As a matter of fact, the relationship between buffer allocation and rate guarantee can be made precise. In [33], it is shown that even with an FCFS scheduler, rate guarantees can be provided by appropriately allocating buffers to flows. In particular, [33] establishes the intuitive result that rate guarantees are in proportion to buffer allocation.

Overall, the buffer management schemes currently being deployed should ensure that service guarantees of applications are met, i.e., packets that conform to a service contract will rarely be dropped. However, substantial differences may exist in how excess traffic is being supported. In particular, networks that do not differentiate between adaptive and non-adaptive applications and do not support per flow buffer accounting are likely to provide only relatively poor (unfair) performance to excess traffic.

4. Service definitions and associated requirements

In this section, we review the two services that have been standardized by the IETF Integrated Services Working Group, the Controlled Load (CL) service [60] and the Guaranteed Service (GS) [54]. We also discuss briefly a recent IETF effort to introduce less rigorous service specifications than those of CL and GS, to facilitate support for service aggregation. We expand further on this last issue in Section 5.

4.1. Controlled load service

The definition of the Controlled Load service is qualitative in nature. It aims at approximating the service a user would experience from an unloaded network, where unloaded is not meant to indicate the absence of any other traffic, but instead the avoidance of conditions of heavy load or congestion. Of significance is the fact that while the Controlled Load service specification is qualitative in nature, it does require a quantitative user traffic specification in the form of the TSpec of Section 2.1. This is of importance as it represents a key input to the call admission process, which is required to limit the number of flows the network is willing to accept and, therefore, ensure the desired ``unloaded'' network behavior.

The guarantees that the Controlled Load service provides are essentially a guaranteed sustained transmission rate equal to the token bucket rate r, and the possibility to send occasional bursts (of size limited by the token bucket depth b) at its peak rate p. There are no explicit delay or even loss guarantees associated with the service, and this allows for some flexibility in the call admission process and the associated scheduling and buffer management mechanisms.

For example, Controlled Load flows on a given network link can be supported through a basic FCFS scheduler whose load is controlled using an effective bandwidth based call admission control, e.g., [25,31,40], together with a simple threshold based buffer management scheme. The benefits of such an approach are its simplicity in terms of both data path and control path. Neither the scheduling nor the buffer management require awareness of individual flows, and the call admission process only involves additions and subtractions as flows come and go. Further efficiency in the call admission control can be achieved, at the cost of some added complexity, by using better models for computing the effective bandwidth of a flow based on its TSpec as suggested in [34,35], or by using measurement based call admission control techniques as described in [30,37].

It should be noted that the approach just outlined assumes that excess traffic from individual flows can be readily identified, e.g., using explicit marking as suggested in [9]. If it is not possible to explicitly mark and, therefore, easily identify excess traffic from individual flows, more complex data path mechanisms may be required to properly support the Controlled Load service. This is required to ensure that excess traffic from one flow does not impact the service guarantees of other flows. One possible alternative is to rely on per flow buffer accounting mechanisms as described in Section 3. The tighter control of the amount of buffer space each flow can occupy provides the necessary protection against excess traffic, but comes at the cost of maintaining per flow buffer state.

A further refinement to per flow buffer state is to replace the FCFS scheduler by one of the schedulers of Section 2 that is capable of providing rate guarantees to individual flows. This can be a Weighted Fair Queueing (WFQ) scheduler [14,45-47], or even a Self Clocked Fair Queueing (SCFQ) scheduler [26,27], since strict delay guarantees are not critical for Controlled Load flows. This increases the amount of per flow state by adding transmission state to buffer state, but offers a number of additional benefits.

It improves the accuracy of the buffer management mechanism by ensuring that packets from each flow get regularly transmitted, so that any increase in a flow's buffer occupancy can be readily associated with a traffic increase for the flow. In contrast, a FCFS scheduler provides a much less regular service to individual flows, e.g., a flow whose packets are all at the back of the queue must wait until all packets ahead of them have been sent. As a result, it is more difficult to determine if variations in the buffer occupancy of a flow are caused by an increase in traffic or are due to service fluctuations.

In general, the relatively loose definition of the guarantees provided by the Controlled Load service allows a relatively wide range of implementation trade-offs. In particular, most of the mechanisms described in Sections 2 and 3 can be used to build a system that abides by the Controlled Load specifications [60]. As we shall see in the next section, this is less so for the Guaranteed Service because of its stricter service definition.

4.2. Guaranteed service

The Guaranteed Service shares with the Controlled Load service the use of a TSpec to specify the user traffic to which the service guarantees apply, but this is where any resemblance stops. While the Controlled Load service offers qualitative service guarantees, the Guaranteed Service gives not only quantitative but also hard (deterministic) service guarantees. Those guarantees include, for conformant packets, lossless³ transmission and an upper bound on their end-to-end delay. The goal of the service is to emulate, over a packet-switched network, the guarantees provided by a dedicated rate circuit. Clearly, such guarantees are not warranted for all applications because of their cost, but they are important for applications with hard real-time requirements, e.g., remote process control, tele-medicine, haptic rendering in distributed CAVEs [20], etc.

³ Except for losses due to line errors.

Requesting the Guaranteed Service is carried out using a one pass with advertisement (OPWA) approach as supported by the RSVP [6] protocol. The process starts with the sender specifying its traffic in the form of a TSpec, which is then sent in an associated signalling message, e.g., using an RSVP PATH message, towards the intended receiver(s). As the message propagates towards the receiver(s), it is updated by each node on the path so that the characteristics of the path are recorded [55] and, therefore, available to the receiver once the message reaches it. Characteristics of particular interest to the Guaranteed Service are captured in an ADSPEC, which is used to record the delay ``contribution'' of the scheduler used at each node. This contribution is in the form of two ``error terms,'' C and D, that account for different aspects of a scheduler's behavior and whose impact is additive along the path of the flow. In other words, the cumulative impact of the schedulers of all the nodes on path P is itself represented by two error terms C_tot and D_tot of the form:

C_tot = Σ_{i∈P} C_i,   D_tot = Σ_{i∈P} D_i.
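The additive composition of the error terms can be sketched as follows. This is a hypothetical helper, not the actual RSVP ADSPEC encoding; it only mirrors the arithmetic of the formula above, with illustrative units in the example.

```python
# Illustrative sketch of how per-node error terms compose along a path,
# mirroring C_tot = sum of C_i and D_tot = sum of D_i (not the ADSPEC format).

def accumulate_error_terms(path):
    """path: iterable of (C_i, D_i) pairs, one per node on path P."""
    c_tot = sum(c for c, _ in path)
    d_tot = sum(d for _, d in path)
    return c_tot, d_tot

# Example: three hops, each contributing (C_i in bytes, D_i in seconds).
c_tot, d_tot = accumulate_error_terms([(1500, 0.002), (9000, 0.0005), (1500, 0.001)])
```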

The Guaranteed Service is built around the model from the traffic envelope and the service curve, of rate-based schedulers as first investigated inwx 47 respectively. and discussed in Section 2. Rate based schedulers Specifically, a user with a TSpec of the form attempt to approximate a perfect fluid server, that Ž.b,r, p, M , and which has been allocated a ``service guarantees delay by ensuring the user a minimal rate'' of R at each of a set of schedulers character- service rate at any time, i.e., the GPS model of ized by cumulative error terms Ctotand D tot,is Section 2.5. Under such a model, a scheduler can be guaranteed an upper bound Dà on its end-to-end represented by how it deviates from the behavior of queueing delay of the formwx 23,54 : this perfect fluid server. The error terms C and D of ° y y the Guaranteed Service are used to capture those Ž.Ž.b MpR if p)R, deviationsŽ seewx 47,54 for details. . RpŽ.yr The behavior of a Guaranteed Services scheduler MC s~ qqtot q can be accurately described using the notion of Dà Dtot Ž.4 serÕice curÕes wx11,12 . Analogous to the way a traf- RR fic envelope bounds the amount of input traffic, a MCtot ¢ qqD if pFR. service curve can be used to provide a lower bound RRtot on the amount of service received by a flow at an individual network elementŽ seewx 12,43,53 for defi- A similar expression is available to determine the nitions of service curves. . The Guaranteed Services necessary buffer space at a given node, so as to scheduler can be represented as having a service avoid any packet losswx 23,54 . curve of the rate- formwx 43 , with a latency of In the context of the Guaranteed Service, Eq.Ž. 4 CrRqD, for all values of the rate R. The end-to-end is used by the receiver to determine which service service curve for a given flow is computed using a rate R to request in order to obtain the desired convolution of the service curves at each of the end-to-end delay bound, given the TSpec specified network elements traversed by the flow. An advan- by the user and the Ctotand D tot values associated tage of the service curve methodology is that it with the schedulers on the path. This request for a enables a simple graphical representation of the service rate R is then sent back towards the sender, worst-case delay and buffer requirements. For exam- e.g., using an RSVP RESV message, and processed ple, as shown in Fig. 1, the worst-case delay and at each node which determines if it has enough buffer requirements for a flow at network element i capacity to accept the request, i.e., performs call are computed as the horizontal and vertical distance admission. Call admission varies with the type of scheduler used, and this is one of the areas where different trade-offs between efficiency and complex- ity are possible. Before proceeding further with this issue, we briefly review a number of interesting properties of the Guaranteed Service. First, while the use of a rate-based model intro- duces some limitations in terms of efficiencyŽ see wx24 for an example. , it also affords significant sim- plicity. In particular, the use of a single service rate to be guaranteed at each scheduler on the path, avoids altogether the complex problem of trying to determine and assign individual resource require- ments at each scheduler. Such a ``one size fits all'' approach precludes a lightly loaded scheduler from compensating for a more heavily loaded one, e.g., as is feasible with a delay based scheduler such as EDF, Fig. 1. 
Delay and buffer calculations for an Ž.b,r, p, M flow. where a large deadline at one node can be compen- 182 R. Guerin, V. PerisrComputer Networks 31() 1999 169±189 sated by a small one at another node. However, left the service curve at node i. This shift helps exploiting such potential is a challenging problem 4. reduce not only the delay, but also the buffering Second, the rate-based model does not preclude needed at node i. Furthermore, this reduction in the use of other types of schedulers. For example, a buffering also applies to downstream nodes, at least delay based scheduler such as EDF can be used until the next reshaping pointŽ seewx 23 for a discus- instead, albeit with some constraints on how the sion on the benefits of reshaping. . As a result, the deadlines are to be chosen at each nodeŽ seewx 23 for choice of a particular scheduler needs to take into details. . consideration not only the cost of the scheduler Third, as mentioned earlier, there is a cost associ- itself, but also the cost of other system wide re- ated with the deterministic guarantees that the Guar- sources such as link bandwidth and buffer space. anteed Service provide. For example, as is shown in These trade-offs need to be further combined with wx23 an average load of only 40% is not uncommon control path issues, which can also contribute signifi- on a link carrying Guaranteed Service flows. The cant differences in terms of complexity and effi- remaining bandwidth will most certainly not be left ciency. In order to better illustrate the issues, we unused, i.e., it will be assigned to lower priority focus the discussion on the differences between traffic, but this indicates that the network is likely to rate-based, e.g., WFQ, and delay-based, e.g., EDF, charge a premium for Guaranteed Service. The hard schedulers. Delay-based schedulers have been shown, guarantees it provides may be important to certain e.g.,wx 22,24 , to yield larger schedulable regions than applications, but the cost may not be justified for all. rate-based ones and to, therefore, afford more effi- cient use of link bandwidth when it comes to provid- 4.2.1. Trade-offs in implementing guaranteed serÕice ing hard delay guarantees. However, this comes at We now briefly discuss some of the trade-offs the cost of a more complex call admission process. that are available when implementing the Guaranteed In the case of a rate-based scheduler such as Service. As for the Controlled Load service, trade- WFQ, the call admission decision amounts to deter- offs involve both the data path and the control path. mining if the requested service rate R is available. Data path trade-offs are primarily in terms of This is easily accomplished by keeping track of the scheduler efficiency versus implementation complex- sum of the service rate requests that have been ity, where efficiency is measured through the sched- granted, and ensuring that this sum remains smaller uler's schedulable regionwx 22,24Ž the larger, the than the link speed after the new request is added. In better. , and complexity is a function of the opera- other words, call admission amounts to a simple tions performed when scheduling the transmission of addition and comparison. a packet. For example, as was discussed in Section The case of a delay-based scheduler such as EDF 2, the tighter schedulingŽ. smaller error terms pro- is more involved. 
The case of a delay-based scheduler such as EDF is more involved. First, in the context of the Guaranteed Service, the requested service rate R needs to be translated into a local deadline which the EDF scheduler can use. As mentioned earlier, this first task is easily performed, e.g., using the approach of [23], so that the bulk of the complexity lies in determining if the new request with its associated deadline and TSpec can be accepted. The exact procedure is as given in Eq. (3) (see also [23]), but is more easily described using an example.

Fig. 2 provides a graphical representation of the call admission procedure used with an EDF scheduler. It essentially amounts to ensuring that the ``sum'' of the shifted traffic envelopes (as specified by Eq. (1)) of all the flows remains at least one maximum size packet below the line corresponding to the maximum number of bits that can be transmitted at the link rate. The shift applied to each traffic envelope is equal to its local deadline at the EDF scheduler. Fig. 2 shows the case where two flows (flow 1 and flow 2) are already in, and a third one (flow 3) is being added. The complexity of the procedure is that it requires both storing the current sum of all shifted traffic envelopes, e.g., by keeping the associated inflexion points, and checking that adding a new shifted envelope does not cause any point of the new curve to cross the link capacity line (this can again be done by checking only at inflexion points). There have been some proposals, e.g., [17], with simplified procedures, but the call admission procedure does remain more complex than the simple addition of rate-based schedulers. As a result, it is again important to consider this cost in the context of an overall system design.

Fig. 2. Call admission rule for EDF scheduler.
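The sketch below gives a simplified version of this test. It follows the graphical check of Fig. 2, evaluating the sum of the shifted TSpec envelopes only at their inflexion points, together with two side conditions (long-term rates must fit on the link, and the smallest deadline must cover the transmission of one maximum size packet). It is meant as an illustration rather than a faithful transcription of the exact condition of Eq. (3) [23]; all names are ours.

    from dataclasses import dataclass

    @dataclass
    class Flow:
        b: float   # token bucket depth (bits)
        r: float   # token rate (bits/s)
        p: float   # peak rate (bits/s)
        M: float   # maximum packet size (bits)
        d: float   # local deadline (s) at this EDF scheduler

    def shifted_envelope(f, t):
        # TSpec traffic envelope, shifted right by the flow's local deadline.
        if t <= f.d:
            return 0.0
        x = t - f.d
        return min(f.M + f.p * x, f.b + f.r * x)

    def inflexion_points(flows):
        # Times at which some shifted envelope changes slope.
        pts = set()
        for f in flows:
            pts.add(f.d)
            if f.p > f.r:
                pts.add(f.d + (f.b - f.M) / (f.p - f.r))
        return sorted(pts)

    def edf_admit(existing, new, link_rate, max_packet):
        """Accept 'new' if the sum of shifted envelopes stays at least one
        maximum size packet below the link capacity line C*t, checked only
        at inflexion points."""
        flows = existing + [new]
        if sum(f.r for f in flows) > link_rate:
            return False                      # long-term rates do not fit
        if min(f.d for f in flows) < max_packet / link_rate:
            return False                      # non-preemption of a packet in transmission
        return all(
            sum(shifted_envelope(f, t) for f in flows) + max_packet <= link_rate * t
            for t in inflexion_points(flows)
        )

Because the sum of shifted envelopes is piecewise linear, checking at the inflexion points together with the aggregate rate condition is enough to cover the whole capacity line, which is exactly why storing those points suffices.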

4.3. Differentiated service

In this section, we briefly discuss some recent efforts started in the IETF to introduce ``looser'' service specifications than those that have been discussed in the two previous sections. It is actually interesting to note that this represents somewhat of a backwards evolution. Specifically, some of the earlier works in the area of QoS support had focused on simple rules for call admission, coupled with aggregate priority-based structures for service differentiation (see, for example, [1,35] for the description of a general framework based on those premises). The main reason was that the data path technology was at the time not considered capable of finer granularity (per flow) control. However, these initial works were quickly followed by many proposals aimed at strengthening service guarantees through the use of more sophisticated scheduling and buffer management algorithms as described in Sections 2 and 3. This evolution was fueled in part by the availability of new components⁵ capable of such advanced functions, but also by the desire to better control the service experienced by individual flows (see [8] for an example of potential discrepancies between global and per flow service guarantees).

⁵ For example, the CHARM chip [13] can reshape and schedule up to 64,000 individual flows at speeds up to OC-12, and there are today many instances of components with similar capabilities.

Those activities and efforts notwithstanding, there has been a renewed interest and focus on ``simple'' QoS guarantees, and in some instances guarantees that are even more qualitative than the Controlled Load service specification. For example, many of the recent IETF discussions on these issues⁶ have focused on defining services that map to different levels of ``sensitivities'' to delay and loss, i.e., without being associated with explicit values or guarantees. There are many reasons that have been mentioned to motivate a return to those simple solutions.

⁶ Those discussions are currently being carried out in the Diff-Serv working group and its mailing list at [email protected].

One of them is clearly the difficulty of upgrading an infrastructure of the size of today's Internet. The sheer size of the Internet means that it is likely that the path of any flow will, at least for some time, cross portions of the network which are either not QoS capable, or only support crude service differentiation capabilities. This weakens the case for strong per flow end-to-end guarantees. Similarly, support for the signalling capabilities, e.g., RSVP, required to let users and the network negotiate precise service definitions may also not be widely deployed initially.

There are, however, more fundamental arguments that have been put forward. The more prevalent one is that there is no real need for strong and explicit QoS guarantees. Instead, proper network engineering and broad traffic classification, e.g., into a small number of ``priority'' classes, coupled with the adaptive nature of many applications, should be sufficient to offer the necessary functionality.
The basic premises and reasoning behind this argument are that only a small fraction of applications have the need for strong guarantees. As a result, adequate provisioning for their peak traffic load, together with protection (through coarse classification) from lower priority traffic, will provide them the desired level of service. Adequate provisioning will also ensure that the rest of the traffic experiences, most of the time, adequate service, possibly again with some coarse differentiation. In case of congestion, flows will adapt their traffic to the available resources, and continue operating albeit at a lower level of service. The benefits are higher overall efficiency, i.e., more flows getting through, and greater simplicity, i.e., minimal signalling support and simple data path mechanisms.

The analogy most often used to illustrate this model is a comparison between IP telephony and the circuit-switched telephone system. In cases of unusually high congestion, a circuit-switched system will block new calls and may even operate at a lower level of throughput because of resources held by calls that are being blocked. In contrast, upon detecting congestion, IP telephones may be able to adjust their coding rate and even be able to still operate with higher packet losses and delays. Assuming that this is achieved without preventing emergency calls from getting through, e.g., by assigning them to a higher priority class, the claim is that this will allow more calls to get through.

There are obviously many assumptions behind the above model, e.g., level of over-provisioning needed, range of adaptation, etc., and it is not the intent of this paper to provide a comprehensive investigation of these issues. This would in itself be the topic of an entire paper. However, there are several other less controversial perspectives for why a coarser QoS may be of value. In particular, the same level of QoS control may not be required in all environments. For example, it may be warranted to provide tight per flow delay control in an access switch or router, but such fine control may be an overkill in a backbone switch running at OC-48 (2.4 Gbps) speed or higher. Similarly, maintaining awareness of individual 16 kbps flows on an OC-48 link translates into a very substantial and potentially costly amount of state information (there could be up to 150,000 such flows), as well as a significant signalling load. As a result, it may be worth considering different QoS models in different environments. In particular, the potential for aggregation offered by a ``differentiated service'' model can be of benefit in the backbone of the Internet, while the finer grain ``integrated service'' and RSVP model may be more appropriate for access and campus networks. The main challenge is then to ensure inter-operability between these two environments. In the next section, we explore some of these issues and possible approaches to inter-operability.
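To make the notion of coarse classification concrete, the following sketch maps packets onto a handful of forwarding classes using the precedence bits of the TOS octet [49], in the spirit of the proposals in [15,16,36,41]. The class names, the mapping, and the function are purely illustrative assumptions, not a defined service.

    # Hypothetical mapping of the three TOS precedence bits to a small
    # number of forwarding classes; the classes themselves are ours.
    CLASS_OF_PRECEDENCE = {
        0: "best-effort",      1: "best-effort",
        2: "low-delay",        3: "low-delay",
        4: "low-loss",         5: "low-loss",
        6: "network-control",  7: "network-control",
    }

    def classify(tos_octet):
        precedence = (tos_octet >> 5) & 0x7   # top three bits of the TOS field
        return CLASS_OF_PRECEDENCE[precedence]

    print(classify(0b10100000))   # precedence 5 -> "low-loss"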
5. End-to-end QoS: aggregation issues and models

As was mentioned in the previous section, scalability requirements are likely to introduce the need for aggregation. This is especially applicable to the core of the backbone, where the potentially large number of flows and the high speeds of the links can stress cost and complexity. This need for aggregation is by no means a prerogative of IP networks. For example, the hierarchical structure of the VPI and VCI fields in the header of ATM cells [52] was primarily intended to support both data path and control path aggregation. Specifically, forwarding and service differentiation decisions can be made by looking only at the shorter (8 to 12 bits) VPI field instead of the full 28 header bits, and similarly setup and take down of individual VCs within a VP does not require processing of individual signalling messages.

In the case of an Int-Serv/RSVP IP network, the granularity of QoS requests is currently determined through filters that specify destination address and port number, as well as source address and port number in some instances (see [6] for details). This corresponds to a very fine granularity of QoS guarantees, and while this provides end-users with accurate control, it has been identified as a potential problem for scalability. As a result, the goal of aggregation is to allow such individual end-to-end QoS guarantees, but without requiring awareness of individual flows on each and every segment of their path. This goal translates into a number of specific requirements that must be satisfied by any aggregation technique. Those requirements are best understood through an example.

Consider a network topology consisting of three separate Autonomous Systems (AS), with the two edge AS, say, AS1 and AS3, corresponding to local AS and the middle one (AS2) representing a backbone interconnecting the two. For the purpose of our discussion, scalability is of concern only in the backbone AS2, i.e., AS1 and AS3 are capable of maintaining individual flow state information. Aggregation of individual flows in AS2 should then satisfy the following requirements:

R1. AS2 should not have to maintain awareness of individual flows between AS1 and AS3. Instead, AS2 should be able to map individual flows onto a few internal service ``classes''.
R2. Isolation between flows should be maintained in AS2, i.e., even when flows are aggregated into a common service class, the excess traffic of one flow should not affect the performance guarantees of another flow.
R3. AS2 should ensure that it satisfies the QoS requirements of individual flows, e.g., the resources allocated to a service class in AS2 should be consistent with the aggregate resources required by all the individual flows mapped onto it.
R4. Aggregation in AS2 should not prevent support for individual flow reservations in AS1 and AS3.

Requirement R1 is the core scalability requirement expressed by AS2. Requirements R2 and R3 specify properties that have to be satisfied by the mapping of individual RSVP flows onto the coarser ``classes'' of AS2. Requirement R2 states that the QoS guarantees provided to an individual flow must remain limited to its conformant packets to avoid affecting the service guarantees of other flows. This in turn implies the need for some means to identify non-conformant packets even after flows have been merged, e.g., packet marking as mentioned in Section 4.1. Requirement R3 expresses the need for some coupling between the resources (bandwidth and buffer) and level of service (priority) assigned to a class in AS2, and the aggregation of the individual flows mapped onto that class. Requirement R4 expresses the important constraint that satisfying scalability in AS2 should not come at the expense of functionality in AS1 and AS3. In other words, the aggregation of control and data path information in AS2 should be reversible, so that reservations in AS1 and AS3 can default back to individual flows after crossing AS2.

The last two requirements, R3 and R4, are primarily control path issues and represent the main inter-operability challenge that QoS aggregation faces. As a result, we focus the discussion in the rest of this section on this specific issue, and assume, as suggested in [9,10,15,16,36,41], that data path aggregation is achieved through the use of specific bit patterns in the IP header, e.g., the type-of-service (TOS) octet field [49] is used to specify different service classes and drop precedence⁷.

⁷ Note that another possible approach is to rely on tunnels (layer 2 or layer 3), but we do not discuss this alternative here, and refer the reader to [32] for more details.

5.1. End-to-end control path interoperability

In the context of aggregation, a key aspect of service guarantees is that the resources allocated to a set of aggregated flows should reflect the ``sum'' of the reservation requests conveyed in individual reservation messages. There are two generic solutions that can be used to satisfy such a requirement, and they can be broadly categorized as static and dynamic. Static solutions rely on provisioning and support of different classes of service on backbone links, so that ``soft'' guarantees can be provided between given ingress and egress points. Dynamic solutions involve the ability to adjust the level of (aggregate) reservation of each service class on individual backbone links as a function of the actual offered traffic. This allows for ``harder'' service guarantees.
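The sketch below illustrates the dynamic approach under assumptions of our own: an ingress router tracks the sum of the individual reservations mapped onto one backbone class and only adjusts the aggregate reservation in bulk quanta, so that the interior of the backbone never sees per-flow signalling. The class, its methods, and the quantum-based policy are illustrative, not a defined protocol.

    import math

    class AggregateReservation:
        """Ingress-side bookkeeping for one backbone service class."""

        def __init__(self, quantum_bps=10e6):
            self.quantum = quantum_bps   # granularity of aggregate adjustments
            self.individual_sum = 0.0    # sum of per-flow reserved rates (bps)
            self.aggregate = 0.0         # rate currently reserved in the backbone (bps)

        def _target(self):
            # Round the required rate up to a multiple of the quantum; a real
            # system would add hysteresis to avoid oscillating around a boundary.
            return math.ceil(self.individual_sum / self.quantum) * self.quantum

        def flow_joined(self, rate_bps):
            self.individual_sum += rate_bps
            if self._target() != self.aggregate:
                self.aggregate = self._target()   # one aggregate update towards the backbone

        def flow_left(self, rate_bps):
            self.individual_sum -= rate_bps
            if self._target() != self.aggregate:
                self.aggregate = self._target()

Because aggregate updates are triggered only when a quantum boundary is crossed, the signalling load seen by the backbone grows with the aggregate traffic rather than with the number of individual flows.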
Static solutions are obviously simpler and also easily satisfy requirement R4, as dynamic reservation requests, e.g., RSVP PATH and RESV messages, are not processed in the backbone. As a result, such individual setup messages can be directly propagated through the backbone, and require minimal additional processing at the ingress and egress points. For example, the update of ADSPEC fields in PATH messages can be done using pre-configured information for each ingress. However, static solutions also have limitations in how they handle changes in the backbone. In particular, route changes in the backbone affect the accuracy of provisioning on backbone links, as well as the ability to properly update the ADSPEC field to characterize the path actually taken through the backbone.

As a result, even if static solutions are likely to be the first ones deployed, it is desirable to provide solutions that allow a more accurate control of service guarantees while still enjoying the benefits of aggregation. Satisfying this goal has several requirements and implications in the context of Int-Serv/RSVP services:
1. Disable processing of individual RSVP messages in the backbone, while still allowing their identification when they arrive at egress or ingress routers.
2. Identify transparently, i.e., without relying on continuous interactions with routing, the egress routers corresponding to individual flows entering the backbone at an ingress router.
3. Properly update the ADSPEC of RSVP PATH messages of individual flows at egress routers.
4. Reserve the appropriate amount of resources on backbone links to satisfy the QoS requirements of individual flows routed over them.

The first two items are protocol specific, and can be handled either by using a separate signalling protocol for setting up aggregate reservations, e.g., see [2,5], or by reusing the RSVP protocol itself, e.g., [32]. In either case, individual RSVP reservations are not processed (hidden) inside the backbone, where separate (aggregate) messages are used to reserve resources based on the requirements of the individual flows being aggregated on each link. As far as QoS guarantees are concerned, the main challenges lie in properly specifying aggregate reservations and in ensuring that end-to-end service guarantees of individual flows are preserved. The complexity of those tasks varies with the type of the service, e.g., Controlled Load and Guaranteed Service, and is discussed further in the rest of this section.

In the case of an aggregated Controlled Load reservation, it is only necessary that the queues assigned to Controlled Load traffic on backbone links be allocated the proper service rate. Since rate is an additive quantity, aggregate reservations can be based on the sum of the TSpec's of individual flows (see [61] for a discussion on how TSpec's are summed), although the potential benefits of statistical multiplexing gain when aggregating many flows may allow a lower reservation level. Overall, the situation is similar to what is described in Section 4.1 when using a FCFS scheduler to support Controlled Load flows, i.e., the main requirements are to allocate a sufficient aggregate service rate and to control the buffer space that excess traffic can occupy.
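As a small illustration of such a summation, the sketch below combines the (b, r, p, M) parameters of the flows mapped onto one backbone class in the spirit of the rules of [61]: bucket depths, token rates and peak rates add, while the maximum packet size is the largest of the individual ones. The data structure and function are ours, and the result is a conservative envelope; as noted above, statistical multiplexing would often justify reserving less than the summed rate.

    from dataclasses import dataclass

    @dataclass
    class TSpec:
        b: float   # token bucket depth (bits)
        r: float   # token rate (bits/s)
        p: float   # peak rate (bits/s)
        M: float   # maximum packet size (bits)

    def sum_tspecs(tspecs):
        """Aggregate TSpec for a set of flows mapped onto one backbone class."""
        return TSpec(
            b=sum(t.b for t in tspecs),
            r=sum(t.r for t in tspecs),
            p=sum(t.p for t in tspecs),
            M=max(t.M for t in tspecs),
        )

    # Example: three identical 1 Mb/s flows yield a 3 Mb/s aggregate reservation.
    print(sum_tspecs([TSpec(100e3, 1e6, 2e6, 12e3)] * 3))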
The situation is more complex for Guaranteed Service flows. The main issue is that the provision of individual delay guarantees is tightly coupled with the ability to precisely guarantee a specific clearing (service) rate to a flow. Aggregation affects this ability, as characterizing how individual flows share the total rate allocated to them is a difficult, if not impossible, task. Such a characterization is feasible when aggregation is limited to flows with a common ingress and egress point (see [32,50] for details), but not when aggregation is done locally at each backbone link, e.g., all Guaranteed Service flows are assigned to the same queue. In the latter case, the aggregate service rate on a given backbone link is not even known to either the ingress or egress routers associated with the individual flows whose route through the backbone includes that link.

In order to support aggregated Guaranteed Service flows, it is then necessary to change the ``node model'' used to represent the backbone. A similar problem exists when crossing ATM networks, and this has been addressed by the ISSLL working group [21]. The approach used by ISSLL is to characterize an entire ATM network as a single Int-Serv node that contributes only to the D error term, i.e., to represent an ATM network as a delay-only node. The value selected for the D term of the ATM network is an estimate of the delay bound available from the ATM network between the ingress and egress points. A similar approach can be used here by representing each IP backbone router as a delay-only node, which removes the previously mentioned dependency on the unknown aggregate service rates. The main difference with the case considered by ISSLL is that backbone IP routers can process and update the ADSPEC field⁸. The characteristics of the path through the backbone are, therefore, based on the nodes actually traversed, instead of relying on an estimate of the network characteristics between the ingress and egress points.

⁸ This assumes that RSVP is the signalling protocol used to set up aggregate reservations.
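To see how the two node representations combine, the sketch below accumulates the error terms advertised along a path in which per-flow capable access routers export both a C and a D term, while backbone routers handling aggregated Guaranteed Service traffic are modelled as delay-only nodes (C = 0, with the local delay estimate folded entirely into D); the totals are the C_tot and D_tot that enter Eq. (4). The data layout and the numbers are assumptions made for the example.

    def accumulate_adspec(path):
        """Accumulate ADSPEC-like error terms along a path.

        path: list of (C_bits, D_seconds) tuples, one per hop; a delay-only
        hop simply contributes (0, D_estimate).
        """
        C_tot = sum(c for c, _ in path)
        D_tot = sum(d for _, d in path)
        return C_tot, D_tot

    # Example: two per-flow access hops followed by three backbone hops,
    # each backbone hop modelled as a delay-only node with a 1 ms estimate.
    path = [(12e3, 0.5e-3), (12e3, 0.5e-3), (0.0, 1e-3), (0.0, 1e-3), (0.0, 1e-3)]
    print(accumulate_adspec(path))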
6. Summary

As described in Sections 2 and 3, a wide range of data path mechanisms exist that enable support for QoS guarantees in packet networks. They offer a broad choice of trade-offs between complexity, performance, and strength of the guarantees being provided. New services have been and are being defined based on the availability of these mechanisms. However, deployment of these services requires more than data path innovation. Enhanced control path mechanisms are needed to enable the specification of service guarantees and the identification of who they apply to. As described in Section 4, progress has also been made in developing such control path mechanisms, but their deployment is facing a number of challenges.

In particular, the control path and data path cost of QoS can vary greatly as a function of the environment where QoS is to be provided. As a result, it is desirable to allow the use of different control path and data path mechanisms in different environments. Specifically, it should be possible to support QoS guarantees using either per flow or aggregated mechanisms based on the characteristics and capabilities of the environment where those guarantees are to be provided. The challenge is then to allow interoperability of those different mechanisms, so that end-to-end guarantees can be provided. Issues and possible approaches to satisfy this goal were discussed in Section 5.

There are many unanswered questions when it comes to determining the appropriate QoS model for each environment, and this paper certainly did not attempt to answer them all. Instead, it tried to clarify trade-offs and requirements in the hope that this would help focus future investigations.

Acknowledgements

The authors would like to acknowledge their many colleagues at the IBM T.J. Watson Research Center for many spirited discussions on the topic of QoS. Their inputs have helped shape much of the material in this paper.

References

[1] A. Ahmadi, P. Chimento, R. Guerin, L. Gun, B. Lin, R. Onvural, T. Tedijanto, NBBS traffic management overview, IBM Systems Journal 34 (4) (December 1995) 604-628.
[2] W. Almesberger, J.-Y. Le Boudec, T. Ferrari, Scalable resource reservation for the Internet, in: Proc. PROMS-MmNet'97, Santiago, Chile, November 1997.
[3] J.C.R. Bennett, H. Zhang, Hierarchical packet fair queueing algorithms, in: Proc. SIGCOMM, Stanford University, CA, August 1996, pp. 143-156.
[4] J.C.R. Bennett, H. Zhang, WF2Q: worst-case fair weighted fair queueing, in: Proc. INFOCOM, San Francisco, CA, March 1996, pp. 120-128.
[5] S. Berson, S. Vincent, Aggregation of Internet integrated services state, Internet Draft, draft-berson-classy-approach-01.txt, November 1997, Work in progress.
[6] R. Braden (Ed.), L. Zhang, S. Berson, S. Herzog, S. Jamin, Resource reSerVation Protocol (RSVP) version 1, functional specification, Request for Comments (Proposed Standard) RFC 2205, Internet Engineering Task Force, September 1997.
[7] I. Cidon, R. Guerin, A. Khamisy, Protective buffer management policies, IEEE/ACM Trans. Networking 2 (3) (June 1994) 240-246.
[8] I. Cidon, R. Guerin, A. Khamisy, K. Sivarajan, Cell versus message level performances in ATM networks, Telecommun. Sys. 5 (1996) 223-239.
[9] D. Clark, Adding service discrimination to the Internet, Telecommunications Policy 20 (3) (April 1996) 169-181.
[10] D. Clark, J. Wroclawski, An approach to service allocation in the Internet, Internet Draft, draft-clark-diff-svc-alloc-00.txt, July 1997, Work in progress.
[11] R.L. Cruz, Service burstiness and dynamic burstiness measures: a framework, J. High Speed Networks 1 (2) (1992) 105-127.
[12] R.L. Cruz, Quality of service guarantees in virtual circuit switched networks, IEEE J. Select. Areas Commun. 13 (6) (August 1995) 1048-1056.
[13] G. Delp, J. Byrn, M. Branstad, K. Plotz, P. Leichty, A. Slane, G. McClannahan, M. Carnevale, ATM function specification: CHARM introduction - version 3.0, Technical Report ROCH431-CHARM Intro, IBM AS/400 Division, LAN Development, February 1996.
[14] A. Demers, S. Keshav, S. Shenker, Analysis and simulation of a fair queueing algorithm, Journal of Internetworking: Research and Experience 1 (January 1990) 3-26.
[15] E. Ellesson, S. Blake, A proposal for the format and semantics of the TOS byte and traffic class byte in IPv4 and IPv6 headers, Internet Draft, draft-ellesson-tos-00.txt, November 1997, Work in progress.
[16] P. Ferguson, Simple differential services: IP TOS and precedence, delay indication, and drop preference, Internet Draft, draft-ferguson-delay-drop-00.txt, November 1997, Work in progress.

[17] V. Firoiu, J. Kurose, D. Towsley, Efficient admission control for EDF schedulers, in: Proc. INFOCOM, Kobe, Japan, March 1997.
[18] S. Floyd, V. Jacobson, Random early detection gateways for congestion avoidance, IEEE/ACM Trans. Networking 1 (4) (August 1993) 397-413.
[19] S. Floyd, V. Jacobson, Link-sharing and resource management models for packet networks, IEEE/ACM Trans. Networking 3 (4) (August 1995) 365-386.
[20] I. Foster, C. Kesselman (Eds.), The Grid: A Blueprint for the New Computing Infrastructure, Morgan Kaufman, San Francisco, CA, 1998.
[21] M.W. Garrett, M. Borden, Interoperation of controlled-load service and guaranteed service with ATM, Internet Draft, draft-ietf-issll-atm-mapping-04.txt, November 1997, Work in progress.
[22] L. Georgiadis, R. Guerin, A. Parekh, Optimal multiplexing on a single link: delay and buffer requirements, IEEE Trans. Infor. Theory 43 (5) (September 1997) 1518-1535.
[23] L. Georgiadis, R. Guerin, V. Peris, R. Rajan, Efficient support of delay and rate guarantees in an Internet, in: Proc. SIGCOMM, San Francisco, CA, August 1996, pp. 106-116.
[24] L. Georgiadis, R. Guerin, V. Peris, K.N. Sivarajan, Efficient network QoS provisioning based on per node traffic shaping, IEEE/ACM Trans. Networking 4 (4) (August 1996) 482-501.
[25] R.J. Gibbens, P.J. Hunt, Effective bandwidths for the multi-type UAS channel, Queueing Systems 9 (September 1991) 17-28.
[26] S.J. Golestani, A self-clocked fair queueing scheme for broadband applications, in: Proc. INFOCOM, Toronto, Canada, June 1994, pp. 636-646.
[27] S.J. Golestani, Network delay analysis of a class of fair queueing algorithms, IEEE J. Select. Areas Commun. 13 (6) (August 1995) 1057-1070.
[28] P. Goyal, H. Chen, H. Vin, Start-time fair queueing: a scheduling algorithm for integrated services packet switching networks, in: Proc. SIGCOMM, Stanford University, CA, August 1996, pp. 157-168.
[29] P. Goyal, H. Vin, Generalized guaranteed rate scheduling algorithms: a framework, Technical Report TR-95-30, Department of Computer Sciences, University of Texas at Austin, 1995.
[30] M. Grossglauser, D. Tse, Framework for robust measurement-based admission control, in: Proc. SIGCOMM, Sophia Antipolis, France, September 1997, pp. 237-248.
[31] R. Guerin, H. Ahmadi, M. Naghshineh, Equivalent capacity and its application to bandwidth allocation in high-speed networks, IEEE J. Select. Areas Commun. SAC-9 (7) (September 1991) 968-981.
[32] R. Guerin, S. Blake, S. Herzog, Aggregating RSVP-based QoS requests, Internet Draft, draft-guerin-aggreg-rsvp-00.txt, November 1997, Work in progress.
[33] R. Guerin, S. Kamat, V. Peris, R. Rajan, Scalable QoS provision through buffer management, in: Proc. SIGCOMM, Vancouver, British Columbia, Canada, September 1998, pp. 29-40.
[34] L. Gun, An approximation method for capturing complex traffic behavior in high speed networks, Performance Evaluation 19 (1) (January 1994) 5-23.
[35] L. Gun, R. Guerin, Bandwidth management and congestion control framework of the broadband network architecture, Computer Networks and ISDN Systems 26 (1) (September 1993) 61-78.
[36] J. Heinanen, Use of the IPv4 TOS octet to support differential services, Internet Draft, draft-heinanen-diff-tos-octet-00.txt, October 1997, Work in progress.
[37] S. Jamin, P.B. Danzig, S.J. Shenker, L. Zhang, Measurement-based admission control algorithm for integrated service packet networks, IEEE/ACM Trans. Networking 5 (1) (February 1997) 56-70.
[38] F. Kamoun, L. Kleinrock, Analysis of shared finite storage in a computer network node environment under general traffic conditions, IEEE Trans. Commun. COM-28 (1980) 992-1003.
[39] M. Katevenis, S. Sidiropoulos, C. Courcoubetis, Weighted round-robin cell multiplexing in a general-purpose ATM switch chip, IEEE J. Select. Areas Commun. SAC-9 (8) (October 1991) 1265-1279.
[40] F.P. Kelly, Effective bandwidth at multi-class queues, Queueing Systems 9 (September 1991) 5-16.
[41] K. Kilkki, Simple integrated media access (SIMA), Internet Draft, draft-kalevi-simple-media-access-01.txt, June 1997, Work in progress.
[42] H. Kroner, G. Hebuterne, P. Boyer, A. Gravey, Priority management in ATM switching nodes, IEEE Trans. Commun. COM-9 (3) (April 1991) 418-427.
[43] J.-Y. Le Boudec, Application of network calculus to guaranteed service networks, IEEE Trans. Infor. Theory 44 (3) (May 1998).
[44] D. Lin, R. Morris, Dynamics of random early detection, in: Proc. SIGCOMM, September 1997, pp. 127-137.
[45] A.K. Parekh, R.G. Gallager, A generalized processor sharing approach to flow control in integrated services networks: the single-node case, IEEE/ACM Trans. Networking 1 (3) (June 1993) 344-357.
[46] A.K. Parekh, R.G. Gallager, A generalized processor sharing approach to flow control in integrated services networks: the multiple node case, IEEE/ACM Trans. Networking 2 (2) (April 1994) 137-150.
[47] A.K.J. Parekh, A generalized processor sharing approach to flow control in integrated services networks, Ph.D. thesis, Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, Cambridge, MA 02139, February 1992, No. LIDS-TH-2089.
[48] V. Peris, Architecture for guaranteed delay service in high speed networks, Ph.D. thesis, University of Maryland, College Park, 1997.
[49] J. Postel, Internet protocol, Request for Comments (Standard) RFC 791, Internet Engineering Task Force, September 1981.
[50] S. Rampal, R. Guerin, Flow grouping for reducing reservation requirements for Guaranteed Delay service, Internet Draft, draft-rampal-flow-delay-service-01.txt, July 1997, Work in progress.

[51] A. Romanow, S. Floyd, Dynamics of TCP traffic over ATM networks, IEEE J. Sel. Areas Commun. 13 (4) (May 1995) 633-641.
[52] P. Samudra (Ed.), ATM User-Network Interface (UNI) signalling specification, version 4.0, ATM Forum Signalling Working Group, July 1996.
[53] H. Sariowan, A service curve approach to performance guarantees in integrated service networks, Ph.D. thesis, University of California, San Diego, 1996.
[54] S. Shenker, C. Partridge, R. Guerin, Specification of guaranteed quality of service, Request for Comments (Proposed Standard) RFC 2212, Internet Engineering Task Force, September 1997.
[55] S. Shenker, J. Wroclawski, General characterization parameters for integrated service network elements, Request for Comments (Proposed Standard) RFC 2215, Internet Engineering Task Force, September 1997.
[56] M. Shreedhar, G. Varghese, Efficient fair queueing using deficit round-robin, IEEE/ACM Trans. Networking 4 (3) (1996) 375-385.
[57] D. Stiliadis, A. Varma, Frame-based fair queueing: a new traffic scheduling algorithm for packet-switched networks, in: Proc. SIGMETRICS'96, 1996.
[58] D. Stiliadis, A. Varma, Latency-rate servers: a general model for analysis of traffic scheduling algorithms, in: Proc. INFOCOM, San Francisco, CA, April 1996, pp. 111-119.
[59] J. Turner, Maintaining high throughput during overload in ATM switches, in: Proc. INFOCOM, San Francisco, CA, April 1996, pp. 287-295.
[60] J. Wroclawski, Specification of the controlled-load network element service, Request for Comments (Proposed Standard) RFC 2211, Internet Engineering Task Force, September 1997.
[61] J. Wroclawski, The use of RSVP with IETF integrated services, Request for Comments (Proposed Standard) RFC 2210, Internet Engineering Task Force, September 1997.
[62] H. Zhang, D. Ferrari, Rate-controlled service disciplines, J. High Speed Networks 3 (4) (1995) 389-412.

Roch Guerin received the ``Diplôme d'Ingénieur'' from the École Nationale Supérieure des Télécommunications, Paris, France, in 1983, and his M.S. and Ph.D. from the California Institute of Technology, both in Electrical Engineering, in 1984 and 1986 respectively. He recently joined the Department of Electrical Engineering of the University of Pennsylvania, where he is Professor and holds the chair in telecommunications. Before joining the University of Pennsylvania, he had been with IBM at the Thomas J. Watson Research Center, Yorktown Heights, New York, since 1986, where he was the Manager of the Network Control and Services department prior to his departure. His current research interests are in the general area of Quality-of-Service support in high-speed networks, and in particular QoS routing and scheduling and buffer management mechanisms. He is also interested in aggregation techniques, in terms of both protocols and service models, for scalable service deployment.

Dr. Guerin is a member of Sigma Xi and a Senior member of the IEEE Communications Society. He is an editor for the IEEE/ACM Transactions on Networking, and an area editor for the IEEE Communications Surveys. He is the current chair of the IEEE Technical Committee on Computer Communications, served as the General Chair for the IEEE INFOCOM'98 conference, and has been active in the organization of many conferences and workshops such as the ACM SIGCOMM conference, the IEEE ATM Workshop, etc. He was an editor for the IEEE Transactions on Communications and the IEEE Communications Magazine. In 1994 he received an IBM Outstanding Innovation Award for his work on traffic management in the BroadBand Services Network Architecture. His email address is: [email protected].

Vinod Peris obtained the B. Tech. degree in Electrical Engineering from the Indian Institute of Technology, Kanpur, in 1989 and the M.S. and Ph.D. degrees in Electrical Engineering from the University of Maryland, College Park, in 1992 and 1997 respectively. Since January 1995 he has been with IBM at the T.J. Watson Research Center, Yorktown Heights, New York, where he is a Research Staff Member in the networking department. His research interests span a variety of topics in networking ranging from network security to Quality of Service. He was a co-recipient of the Best Paper Award at the ACM SIGMETRICS'94 conference.