
By Tarek F. Abdelzaher, John A. Stankovic, Chenyang Lu, Ronghua Zhang, and Ying Lu

Software systems are becoming larger and more complex. At the same time, they are being deployed in applications where performance guarantees are required. Traditional approaches to providing these performance guarantees are not effective for a large class of software systems. Recently, control theory has been identified as a promising theoretical foundation for performance control in complex software applications, such as real-time scheduling, Web servers, multimedia control, storage managers, power control in CPUs, and routing in computer networks. This article describes advances in the application of control theory to software systems. We demonstrate the formulation of software performance assurance problems as those of feedback control. We describe modeling the software system and provide an example of performance control in contemporary Web servers. We embody the control-theoretical methodology for software quality-of-service (QoS) provisioning into a middleware called ControlWare. This middleware provides a generic interface between the computing and control subsystems of a software application and automates many parts of the feedback control design and implementation for software systems. Middleware solutions such as ControlWare are needed to resolve the challenges and provide analytic foundations to enable efficient software performance control in next-generation performance-assured computing systems.

Abdelzaher ([email protected]), Stankovic, Zhang, and Y. Lu are with the Department of Computer Science, University of Virginia, Charlottesville, VA 22904, U.S.A. C. Lu is with Washington University in St. Louis, MO 63130, U.S.A.

0272-1708/03/$17.00 ©2003 IEEE. IEEE Control Systems Magazine, June 2003.

Software Performance Control

Although engineered physical systems carefully address quality assurance, software system design evolved in a more ad hoc fashion, with less rigorous guarantees on performance and quality. Most software engineering research is concerned with tools and paradigms that facilitate the development of functionally correct software. Functional correctness was implicitly assumed to be an adequate software quality metric.

There are several notable exceptions to the assumption that functional correctness is adequate. For example, in most embedded control systems, an important factor affecting quality of software is the timeliness of software response. Here, a delayed, but functionally correct, reaction to physical events in the environment can be as devastating as a physical equipment malfunction. This observation gives rise to research that attempts to design software with predictable nonfunctional properties such as timeliness, security, availability, and performance. These properties are often referred to as software QoS attributes.

Traditional performance analysis of embedded control applications relies on worst-case estimates of load and resource availability. A system whose performance is guaranteed in worst-case conditions will not violate its performance specifications under more favorable circumstances. The notion of QoS guarantees, however, is needed today in a much larger class of applications, which, unlike closed embedded systems, operate in open, unpredictable environments. The worst-case scenario in such environments is not known a priori or is very pessimistic, rendering traditional worst-case analysis impractical.

The need for QoS guarantees in open systems is fueled in part by modern global communication networks such as the Internet and by the increasing proliferation of globally accessible digital services such as online banking, trading, and distance learning. Such services represent points of massive aggregation, which may suffer from unpredictable loads, potential bottlenecks, and security breaches such as denial-of-service attacks. Failure to meet acceptable performance specifications may result in loss of customers, financial damage, or liability violations. Existing approaches for designing performance-guaranteed computing systems that rely on a priori workload and resource knowledge are no longer applicable. Instead, the predominant practice for providing QoS assurances today is overdesign. This practice results in costly systems with uncertain assurances regarding performance. Putting performance guarantees on a solid analytic foundation in such systems is an important challenge.

In this article, we show how classical feedback control offers a solution to the problem of achieving performance guarantees for this new category of applications and discuss the challenges involved in realizing this approach. The main challenges in implementing a feedback-based QoS control solution in computing systems are i) analyzing the software architecture to model it as a feedback control system, ii) mapping the particular QoS control problem into a system of feedback loops, iii) choosing a proper actuator that can affect server resource allocation and a monitor that can measure current performance reliably, and iv) designing a controller for the server. Solving these challenges permits the mathematical basis of control theory to support the performance guarantees of software systems. We also demonstrate that a software system can be approximated by a linearized model and describe the needed software actuators and sensors. Importantly, we create a taxonomy of the most significant QoS assurance problems in the software literature and describe how they can be translated into a feedback control formulation.

We also present some experimental results for a Web-based application that show that the control-theoretic approach is feasible for software systems. Although our approach can be applied to other services as well (and we provide brief explanations of those), we have chosen to focus on controlling the performance of Web applications due to the increasing importance of the Web, spurred by the phenomenal growth of the Internet. The Web is the largest and most visible client-server application in existence today. It is an excellent vehicle for investigating fundamental problems of distributed client-server computing such as overload protection and the provision of performance guarantees.

In the rest of this article, we describe the internal architecture of a typical Web server, present its dynamic model, and elaborate on the equivalents of sensors and actuators in this computing system. We then map the most important software performance assurance problems into those of feedback control and piece these elements together in a case study on Web-server performance control. We also describe briefly other examples of the use of control theory for QoS guarantees. Finally, we introduce a middleware service that generalizes the control-theoretic approach, implementing a new paradigm for service performance guarantees in software systems. The article concludes with a brief discussion of important results and remaining challenges.

Inside a QoS-Aware Service

A successful architecture for controlling the performance of a computing system begins with an understanding of the dynamics that affect performance and the mechanisms available to manipulate those dynamics. Here, we are concerned specifically with performance attributes that involve a notion of time, since they lend themselves more naturally to a feedback-control framework. Generally, the most common of these performance attributes are classified into two categories depending on whether they are directly proportional or inversely proportional to time. The first category includes performance metrics such as queuing delays, execution latencies, and service response times. The second category includes metrics such as connection bandwidth, service throughput, and packet rate. We call the two categories delay metrics and rate metrics, respectively. There are also derived metrics defined as ratios between other metrics, for example, the relative delay of two traffic classes or the hit ratio of a cache (i.e., the ratio of the hit rate to the total request rate).

Time-related performance attributes can generally be controlled by adjusting resource allocation. Queuing theory has often been used to predict performance, given a particular resource allocation, or to determine how resources should be allocated to yield a particular performance level [1]. In general, if the service is modeled as a queuing system, allocating more resources to the queue's server will reduce the mean service time, decrease the mean queuing delays, and increase the average service rate. Unfortunately, queuing theory generally requires assumptions about the input traffic arrival pattern that are not always accurate, leading to potentially poor predictions regarding performance. For example, a significant body of queuing literature applies to Poisson arrivals, whereas many common arrival patterns, such as those of Web requests, are known to follow a heavy-tailed distribution [2], [3]. Even when all assumptions are accurate, queuing theory offers only predictions on average behavior. In a QoS-aware system, stronger guarantees are often required. For example, it is not enough that the frame rate of a video stream be 30 frames/s on average. Instead, guarantees are usually required on maximum deviation and recovery time from transient perturbations around the nominal rate.

Service Architecture

To design feedback loops for software performance control, we need to understand the basic components of a software system and how they interact. We focus on an important category of software systems where feedback control is particularly important, namely, software servers. In this section, we present the typical design of a multithreaded server and describe how this design yields a system model suitable for feedback control.

Consider a distributed client-server system in which a succession of requests arrives at a server from clients across a communication network. The server acts as a point of aggregation of client requests. The performance observed for a request at the server depends on the path the request takes inside the server until it is served. In a typical multithreaded server, independently schedulable entities called worker threads execute the arriving sequence of client requests. Each thread implements a loop that processes incoming requests and generates responses.

Client requests are first read from an input queue (such as the server Transmission Control Protocol (TCP) socket queue) by a kernel entity that hands each request to a different worker thread. Worker threads that have requests to serve become runnable and are queued for the CPU. The order in which threads get the CPU to execute is determined by the CPU scheduling policy. This policy maintains a priority queue called the ready queue. The thread at the top of this queue executes for a particular time quantum or until it is blocked. Request processing by a worker thread typically entails access to one or more auxiliary server resources, the most notable being disk input/output (I/O). For example, in a Web server, disk I/O is usually needed to read the requested Web page from disk. Access to auxiliary resources blocks the calling thread, at which time it is queued for I/O until the awaited resource becomes available. Each resource usually has a queue that determines the order in which accesses are served. The resource is made available to the thread at the top of the queue, at which time the corresponding thread becomes runnable again and reenters the CPU ready queue. When request processing is done, the worker thread sends a response back to the client. Sending the response entails enqueuing data into the outgoing packet queue for transmission on the network.

Figure 1(a) illustrates the aforementioned main components of a software server, including the input client request queue, the CPU ready queue, the resource I/O queue, and the outgoing network queue. Numbered arrows depict the progress of requests from one queue to another. Worker threads are also shown. The black circle on each thread represents the current position of thread execution. We are especially interested in the case where the server is operating at a nontrivial load. Hence, response time is dominated by queuing delays rather than service times. For the server to yield acceptable performance, the order of request service times must be much smaller than the order of tolerable server response times. If it takes $C_i$ units of time to serve request $i$, and if $D_i$ is the maximum tolerable server response time, then typically $\forall i, j: C_i \ll D_j$. We call this condition the liquid task model. The model is representative of high-performance servers that handle many thousands of requests per second. For example, in the case of Web servers, a worst-case tolerable response time would be of the order of seconds to tens of seconds, whereas a typical request service time would be of the order of hundreds of microseconds to single milliseconds. The liquid task model represents the limiting case in which tolerable response times are generally finite, yet the service times are practically infinitesimal. More formally, in the liquid task model, $\forall i: C_i \to 0$ and $\forall i: C_i / D_i \to 0$. Although this model is an idealization, results based on this model hold well in systems exhibiting a large number of small requests.
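As a rough illustration of the liquid task condition, the following Python sketch computes the ratio of the largest service time to the smallest tolerable response time and compares it with a cutoff. The function names and the 0.01 threshold are assumptions made for this example only; the article gives no numeric cutoff for "much smaller."

```python
def liquid_ratio(service_times, response_bounds):
    """Worst-case ratio C_i / D_j over all pairs (largest C over smallest D)."""
    return max(service_times) / min(response_bounds)


def is_liquid(service_times, response_bounds, threshold=0.01):
    """Heuristic check of C_i << D_j; the threshold is an assumed cutoff."""
    return liquid_ratio(service_times, response_bounds) <= threshold


# Service times of 0.2 to 1 ms against tolerable response times of 1 to 10 s,
# matching the orders of magnitude quoted for Web servers above.
service_times = [0.0002, 0.001, 0.0005]   # seconds
response_bounds = [1.0, 10.0]             # seconds
print(liquid_ratio(service_times, response_bounds))
print(is_liquid(service_times, response_bounds))
```

With these figures the ratio is three orders of magnitude below one, which is the regime in which the fluid approximation is expected to hold.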

In a high-performance server approximated by a liquid task model, the progress of requests through the server queues resembles a fluid flow. The service rate, $dN_k(t)/dt$, of stage $k$ defines the amount of flow through that stage, where $N_k(t)$ is the total number of requests served by this stage by time $t$. The different queues in Figure 1(a) can therefore be modeled as capacities that accumulate the corresponding flows. The number of requests queued at stage $k$, denoted $V_k$, is a quantity akin to volume, given by $V_k = \int_{-\infty}^{t} (F_{in} - F_k)\,dt$, where $F_k$ is the service rate of stage $k$ (i.e., $F_k = dN_k(t)/dt$) and $F_{in}$ is the request arrival rate to that stage. Queues also offer points at which flows, $F_k$, can be manipulated. Figure 1(b) depicts the server from a control perspective where capacities are represented by water tanks. Observe that "valves" in Figure 1(b) represent points of control (i.e., actuators, which manipulate the service rates $F_k$). We assume in this analogy that flow through the valve depends only on valve opening and not on the liquid level. Thus, the arrangement is perhaps more akin to a pump.

Figure 1. Server architecture: (a) the computing model and (b) the control-oriented representation.

Note that the fluid flow analogy does not make assumptions on how individual requests are prioritized. It simply allows us to describe the dynamics of request flows and queue fill levels. In contrast, queuing theory and real-time scheduling theory have well-understood foundations for relating aggregate metrics such as queue length, total workload, or total utilization to delays seen by individual requests under a particular prioritization policy. Hence, combining control theory with real-time scheduling or queuing analysis, one can develop feedback loops to maintain appropriate queue fill levels that are guaranteed to produce the desired client delays under different request prioritization schemes. Most of today's servers implement first-in, first-out (FIFO) queuing on resources (such as socket queues and semaphore queues) and are composed of pools of same-priority threads or processes. Hence, in this article, we assume that queues are FIFO. Multiple client classes may exist, however, each with its own queue of a different priority.

Service Modeling

Internet servers described by the liquid task model have dynamics akin to those of fluid flow due to their intrinsic queuing structures. This observation motivates the use of difference equations to model Internet servers. Let $y(k)$ denote the average performance (e.g., delay or throughput) reported at the $k$th sampling instant. This variable reflects a measurement carried out during the most recent sampling interval. Let $u(k)$ denote the control input at that sampling instant. A server can be modeled by an $n$th order ARMA difference equation relating $u(k)$ to $y(k)$. The system order $n$ represents the length of history that determines the current server performance. Figure 1(b) suggests that the difference equation can be derived from a state space representation of the server model

\[
\begin{aligned}
x(k) &= A\,x(k-1) + b\,u(k) \\
y(k) &= C\,x(k),
\end{aligned}
\]

where $x(k)$ is the state vector and $A$, $b$, and $C$ describe the system model. Representing software by difference equation models has recently gained much popularity. For example, in [4] a model of a Web proxy cache is derived, and in [5] a model of TCP dynamics is presented in the presence of network active queue management based on RED gateways [6].
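To make the state space form concrete, the sketch below iterates $x(k) = A\,x(k-1) + b\,u(k)$, $y(k) = C\,x(k)$ in plain Python. The function name and the first-order parameter values are hypothetical, chosen only so the arithmetic is easy to follow; they do not describe any measured server.

```python
def simulate_state_space(A, b, C, inputs, x0=None):
    """Iterate x(k) = A x(k-1) + b u(k), y(k) = C x(k) and return the outputs.

    A is an n-by-n list of lists, b and C are length-n lists, and inputs is
    the sequence u(1), u(2), ... applied at successive sampling instants.
    """
    n = len(A)
    x = list(x0) if x0 is not None else [0.0] * n
    outputs = []
    for u in inputs:
        # State update uses the previous state and the current input.
        x = [sum(A[i][j] * x[j] for j in range(n)) + b[i] * u for i in range(n)]
        outputs.append(sum(C[i] * x[i] for i in range(n)))
    return outputs


# Assumed first-order example: y(k) = 0.5 y(k-1) + u(k), driven by a unit step.
step_response = simulate_state_space([[0.5]], [1.0], [1.0], [1.0, 1.0, 1.0])
print(step_response)  # [1.0, 1.5, 1.75]
```

For this assumed model the step response approaches 1 / (1 - 0.5) = 2, which shows how the recent history of inputs shapes the current output.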

Derivation of computing system dynamics can be difficult in the absence of complete knowledge of software system code due to hidden interactions between different components in such systems. For example, a semaphore can create a waiting queue that affects system dynamics. Yet the presence of such a semaphore might not be discovered without access to system code. Furthermore, deriving system dynamics from first principles may be impractical in some situations for administrative reasons. For example, the model parameters may need to be recomputed every time the server's hardware or software configurations are changed (e.g., due to a system upgrade), since such changes may affect platform speed and consequently the rates and capacities in the system. Moreover, system administrators usually lack the expertise to perform control-theoretic modeling. Therefore, a more practical approach may be to perform automatic parameter estimation (e.g., using a least-squares estimator). System identification requires adding software modules to instrument a running software system and iteratively estimate the parameters of difference equation models based on system input and output. A successful application of the approach is described in [7] for a Web proxy cache to determine the relation between disk allocation and cache hit ratio.

A key prerequisite to system modeling is to define the actuators and their manipulated variables (i.e., the valves in Figure 1(b)). These actuators represent the main interface between the control subsystem and the computing subsystem in the feedback control loop. In the next section, we describe the types of actuators in computing systems and their principles of operation.

Resource Allocation for QoS Guarantees

Most time-related performance metrics in software systems can be tied to the state of the queues depicted in Figure 1. Delay metrics such as response time and latency are generally related to the queue (tank) fill levels. Higher fill levels imply longer delays and vice versa. Rate metrics such as service throughput are generally related to the dequeue rates (i.e., flows through the valves). Higher rates imply higher throughput. Both sets of metrics can be affected by controlling the different flows in the system. The flow (i.e., the service rate) of a stage in the software service depends on the amount of computing resources allocated to this stage and the operation being performed. Allocating more resources will generally increase the flow the same way opening a valve would. Thus, to control performance, we need actuators that manage resource allocation or alter the functionality of the server in a way that manipulates the rate at which work is done. The feasibility of implementing such actuators in a computing system is one basic reason why control theory is applicable. Actuators in computing systems fall into three basic types as described below.

Input Flow Actuators

An input flow actuator manipulates the input workload of a server, $F_{in}$. In software systems, admission control is a primary input flow actuator that affects queue fill levels in the server. In the simplest case, admission control limits the number of clients who access the server concurrently by denying service to some of them, hence reducing the load seen by the server. The actuator accepts as input a control variable $m$ that determines the desired average flow. This is analogous to controlling the opening of input valve V1 in Figure 1(b) and discarding all requests that overflow. A more intelligent admission control decides, on a request-by-request basis, whether service should be granted (request admitted) or denied (request rejected), perhaps depending on client identity, queue fill level, and server utilization. Input flow actuators are used when some clients are inherently more important than others such that denying service to less important clients is acceptable in favor of higher priority clients. Examples of the control-theoretic formulation of admission control schemes in real-time systems are described in [8] and [9].

Quality Actuators

A more flexible way of controlling flow is to alter the processing requirements of the request. For example, reducing the processing requirements increases service rate (i.e., increases $F_k = dN_k/dt$), which is desired when the server approaches overload. To reduce processing requirements, the service offers intermediate "degraded" service levels. In a Web server, a degraded level of service can correspond to providing an abbreviated version of content or a lower quality version of images and sound. The approach is often followed manually by editors of various sports and news Web sites such as the Cable News Network (CNN), www.cnn.com, upon important breaking news. An example was the obvious abbreviation of the CNN front page in the first hours following the September 11 events in 2001. Two important questions are i) how much content should be abbreviated and ii) what fraction of clients need to receive the abbreviated version for the overload to be avoided. An automatic Web content adaptation scheme that answers these questions by using feedback control theory is described in [10]. The scheme dynamically selects a suitable content version for each client on a per-request basis depending on load conditions. Content adaptation has also been discussed at length in multimedia applications. A good survey is found in [11].

Generally, a quality adaptation actuator offers a flexible tradeoff between delay and quality. Since servers don't have unlimited queue space, some compromise is often unavoidable to prevent server queue overflow at extreme load conditions. An advantage of quality adaptation actuators over admission control schemes is that service is provided to all clients, albeit potentially at a degraded level. For effective load reduction, it is important that the actuator control the bottleneck resource.

For example, if the bottleneck resource is the processor, CPU-intensive operations should be reduced. If the bottleneck is the network, the number of sent bytes should be reduced. The identity of the bottleneck can be determined dynamically by measuring the utilization of different resources.

To approximate flow, a quality adaptation actuator should provide a continuous range of service rates, $F_k$, even when the server exports only a finite (small) number of different service levels. Consider a general server with $M$ discrete service levels that differ in their consumption of the bottleneck resource. Let these levels be numbered $1, \ldots, M$ from lowest quality to highest quality. The level 0 may be added to denote the special case of request rejection. The actuator accepts as input the control variable $m$ in the range $[0, M]$. An essential challenge in actuator design is that of a unique mapping from $m$ to the manipulated variables; namely, the fraction of clients served at each QoS level to produce smooth flow control.

To resolve the aforementioned challenge, the authors of [12] propose to decompose the fractional value of input, $m$, into an integral part $I$ and a fraction $F$, such that $m = I + F$. The two nearest integers to $m$ (namely, $I$ and $I + 1$, where $I < m < I + 1$) determine the two most appropriate service levels at which clients must be served under the given load conditions. The fractional part $F$ determines the fraction of clients served at each of the two levels. In effect, $m$ is interpreted to mean that a fraction $1 - F$ of clients must be served at level $I$ and a fraction $F$ at level $I + 1$. The scheme works well when the input average request arrival rate does not change quickly. Server utilization increases (possibly nonlinearly) when $m$ is increased and vice versa. At the upper extreme, $m = M$, all requests are given highest quality service. At the lower extreme, $m = 0$, all requests are rejected. Hence, the actuator changes the amount of load on a server with discrete service levels in a continuous fashion, depending on its input $m$.

Resource Reallocation Actuators

A different way of controlling flow (other than quality adaptation) is to alter the amount of resources available for the processing of the request queue. This important category of resource management actuators typically arises in servers with multiple classes of clients. In such servers, it is often desired to maintain some constant ratio between the performance of different client classes. The computing resources are usually partitioned among these classes such that the sum of all partitions is constant and equal to the total resource capacity of the service. We call this condition the constant-sum invariant. Individual actuators can alter the partitioning, but the total resource allocation must remain constant. We call such actuators resource reallocation actuators.

In a multiclass server, a separate control loop is often associated with each client class. An actuator in each loop alters the resources allocated to it. An important design objective of this approach is to ensure that individual loops may operate in isolation without violating the constant-sum invariant. We demonstrate how the above is achieved with an example. Consider a multiclass service that offers some classes of requests better performance than others by allocating resource space among different request classes appropriately. Let the measured performance of request class $i$ be $H_i$. One can think of $H_i$ as the output of the $i$th control loop. A very common model for discrimination in a multiclass environment is the relative differentiated services model [13], [14]. In this model, a differentiation policy specifies that the measured performance of different classes should be related by the expression $H_1 : H_2 : \ldots : H_n = C_1 : C_2 : \ldots : C_n$, where $C_i$ is the importance weight of class $i$. In a control-theoretic formulation, we define the relative performance, $R_i$, of class $i$ as $R_i = H_i / (H_1 + H_2 + \cdots + H_n)$. It determines how the class is performing relative to other classes. The desired relative performance of class $i$ should be $R_i^{desired} = C_i / (C_1 + C_2 + \cdots + C_n)$. This represents the set point for the class. This formulation is akin to ratio control [15]. The difference $R_i^{desired} - R_i$ is the performance error $e_i$ of class $i$. Note that the aggregate performance error of the system is always zero, because

\[
\sum_{1 \le i \le n} e_i = \sum_{1 \le i \le n} \left( R_i^{desired} - R_i \right) = \frac{\sum_{1 \le i \le n} C_i}{C_1 + C_2 + \cdots + C_n} - \frac{\sum_{1 \le i \le n} H_i}{H_1 + H_2 + \cdots + H_n} = 1 - 1 = 0.
\]

The aforementioned problem formulation was used in the differentiated caching services architecture described in [4] and [7]. The authors developed a Web proxy cache that can give some Web content preferential treatment. In this example, the performance metric being controlled is cache hit ratio and the resource being allocated is disk space. There are $n$ content classes in the system. Each class of content $i$ is assigned a different amount of cache storage $s_i$, such that $\sum_i s_i$ is the total size of the cache. A controller is invoked at fixed intervals for each class, at which it corrects resource allocation to that class based on the measured performance error. To compute the correction $\delta s_i[k]$ in resource allocation, the controller uses a linear function $f(e_i)$, where $f(0) = 0$. If the computed correction $\delta s_i[k]$ is positive, the space allocated to class $i$ is increased by $|\delta s_i[k]|$. Otherwise, it is decreased by that amount. Since the function $f$ is linear, $\sum_i f(e_i[k]) = f(\sum_i e_i[k])$. Since we showed that $\sum_i e_i[k] = 0$, it follows that $\sum_i f(e_i[k]) = f(0) = 0$. Hence, the sum of corrections across all classes is zero (i.e., the constant-sum invariant is maintained).

Many computing systems with multiple client classes use similar per-class feedback loops [16], [17]. The difference between these systems lies generally in the resource being allocated by the actuators. For example, instead of manipulating storage allocation, actuators can be built to manipulate the number of worker processes (and hence CPU capacity) allocated to a class [16] or the fraction of network link bandwidth allocated to a flow [17].

This section discussed the natural existence of actuators in computing systems, which makes it possible to implement the "valves" that appear in Figure 1(b). Another important cornerstone of applying feedback control to computing systems is the existence of a natural translation from common QoS assurance problems into those of feedback control. This topic is covered in the next section.

QoS Mapping

A cornerstone of a control-theoretic paradigm for QoS guarantees in software systems lies in the ability to convert common resource management and software performance assurance problems into feedback control problems. One can think of each QoS control problem as having a corresponding control-loop instantiation that describes how this particular QoS control problem is solved using feedback control. We call such an instantiation a control-loop template. Here we describe control-loop templates for the main QoS control problems such as absolute convergence guarantees, performance isolation, statistical multiplexing, prioritization, relative differentiated service guarantees, and optimization guarantees. The fundamental building block in these templates is one that implements the basic (absolute) convergence guarantee. Interconnecting such blocks can lead to formulating more complex guarantees such as relative guarantees, prioritization, and optimization as feedback control problems.

The Absolute Convergence Guarantee

Since it is impossible to achieve absolute guarantees in a system where load and resources are not known a priori, we define the absolute guarantee problem as one of convergence to a specified performance. The statement of the problem is to ensure that a performance metric, $R$, i) converges within a specified exponentially decaying envelope to a fixed value, $R_{desired}$, and that ii) the maximum deviation $R_{desired} - R$ is bounded at all times, as shown in Figure 2(a).
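The convergence specification can be expressed as a simple check on a sampled trajectory. The Python sketch below assumes one particular envelope shape, $|R_{desired} - R[k]| \le D_{max} e^{-\alpha k}$, so that a single bound captures both the maximum deviation and the decay. It also assumes a hypothetical static plant $R[k] = g\,u[k]$ driven by an integral controller. None of these choices come from the article; they only illustrate what meeting or violating the specification looks like.

```python
import math


def within_envelope(samples, r_desired, max_dev, decay_rate):
    """Check |r_desired - R[k]| <= max_dev * exp(-decay_rate * k) for all k."""
    return all(
        abs(r_desired - r) <= max_dev * math.exp(-decay_rate * k) + 1e-12
        for k, r in enumerate(samples)
    )


def simulate_integral_loop(r_desired, gain, plant_gain, steps):
    """Assumed plant R[k] = plant_gain * u[k]; controller u[k+1] = u[k] + gain * e[k]."""
    u, trace = 0.0, []
    for _ in range(steps):
        r = plant_gain * u
        trace.append(r)
        u += gain * (r_desired - r)   # accumulate the performance error
    return trace


# With plant_gain * gain = 0.5 the error halves at every sampling instant.
trace = simulate_integral_loop(r_desired=1.0, gain=0.25, plant_gain=2.0, steps=20)
print(within_envelope(trace, 1.0, max_dev=1.0, decay_rate=0.5))   # True
print(within_envelope(trace, 1.0, max_dev=1.0, decay_rate=1.0))   # False
```

The same trajectory satisfies the looser envelope and violates the tighter one, which shows that the guarantee is a statement about the transient response and not only about the final value.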

[Figure 2 diagrams not reproduced. Panel (a) plots the actual performance R against time, together with the set point R_desired, the specified maximum deviation, and the specified decay envelope. Panels (b) through (d) are block diagrams built from controller, actuator (resource allocator), software system, and performance sensor blocks.]

Figure 2. Control-loop templates: (a) the absolute guarantee specification, (b) basic loop, (c) prioritization, and (d) excess capacity management.
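The basic loop of Figure 2(b), in which a sensor reading is compared with the set point and the error drives a resource-allocation actuator, can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the PI control law, the gains `kp` and `ki`, the toy first-order system (`a`, `b`), and the function name `run_basic_loop` are all assumptions made for the example.

```python
def run_basic_loop(set_point, steps=200, kp=0.4, ki=0.2):
    """Simulate a basic convergence loop on a toy first-order system.

    All numbers here are illustrative assumptions, not values from the article.
    """
    a, b = 0.5, 0.4        # assumed plant: R(k+1) = a*R(k) + b*allocation(k)
    lo, hi = 0.0, 1.0      # actuator limits (fraction of the resource)
    performance = 0.0      # value reported by the performance sensor
    integral = 0.0         # accumulated error for the integral term
    history = []
    for _ in range(steps):
        error = set_point - performance                    # set point minus measurement
        candidate = kp * error + ki * (integral + error)   # PI control law
        allocation = min(max(candidate, lo), hi)           # actuator saturation
        if allocation == candidate:
            integral += error      # antiwindup: integrate only when unsaturated
        performance = a * performance + b * allocation     # system responds
        history.append(performance)
    return history
```

With these assumed numbers the system can deliver at most 0.8, so `run_basic_loop(0.6)` settles at the set point, whereas `run_basic_loop(0.9)` saturates the actuator and settles at 0.8 without the integral term growing unboundedly.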

The absolute convergence guarantee is translated into the control loop shown in Figure 2(b). The loop samples the measured performance, compares it to the desired value $R_{desired}$, and uses the difference to induce changes in resource allocation via the actuator AR(). The absolute convergence guarantee loop is the elementary building block from which all other QoS assurances follow. In the context of time-related performance metrics, it is interesting to classify the convergence guarantee loops depending on the performance variable being controlled. As is the case with physical plants, the controlled output of interest affects the model of the system and whether or not the control loop is linear.

• Rate and queue-length control: To a first approximation, rate metrics and queue length are easiest to control because they result in linear feedback loops. The (flow) rate can be controlled directly by the actuators. Queue length can be linearly controlled by controlling the flow. The simplest example of a loop of this category is a server utilization control loop. Such a loop, for instance, can be used to maintain high server utilization while avoiding overload. In this section we make several uses of utilization control as a basic building block for other types of guarantees.
• Delay control: Delay guarantees are more difficult to provide. This is because delay is inversely proportional to flow. If a request arrives at a queue of length $Q$, with a dequeue rate of $r$, the queuing delay $d$ of the request is $d = Q/r$. Generally, since the rate $r$ changes over time, the delay of this request is $d = \int_0^Q dq/r$. The inverse relation between the manipulated variable (rate) and the delay makes the control loop nonlinear. Note that, although level control (i.e., queue fill-level control) is a very common problem in physical plants, one generally does not care how long it takes a particular liquid molecule to leave the tank. Thus, the nonlinear delay control problem is less commonly encountered in physical systems. We believe that an efficient solution to this type of nonlinearity will have great impact on software QoS-control research.

The Resource Reservation Guarantee

In some applications, it is necessary to guarantee that certain resources be reserved. The absolute guarantee solution, described above, is particularly useful for resource reservation for the purpose of isolating applications that share common resources. For example, in a Web-hosting application, the Web server may offer a “virtual estate,” such as a percentage of server capacity, for sale to hosted Web sites. The particular capacity allocation purchased by a given site can be enforced using the control loop in Figure 2(b). The amount of server capacity owned by the site will represent the utilization control-loop set point. The actual amount of resources used can be measured dynamically. The difference controls the actuator, which in this case can be an admission controller for requests for the particular site. One control loop is needed per Web site.

The Prioritization Guarantee

To see how prioritization can be implemented using feedback control, consider a sampled system in which a large number of new service requests is introduced at each sampling time. Requests are classified by priority and are enqueued in a separate queue depending on their class. The scheduling policy depletes one queue at a time in priority order during the sampling interval. By the end of the sampling interval, some queues will be fully depleted, at most one queue will be partially depleted (the one the scheduler was working on when the next sampling time arrived), and the rest will remain untouched. To emulate this effect with control loops, imagine each queue being a separate pipe with its own control valve. Given server capacity, the valves of queues that can be fully depleted are saturated in the open position. The valve of the next queue is controlled to dequeue exactly the right fraction of requests within the sampling interval to fully utilize any leftover server capacity. All the remaining (lower priority) valves are closed. Below we illustrate a feedback control scheme that achieves the above. Note that the scheme is unconventional in that at most one loop is operating at any given time outside of saturation limits. The rest are saturated in either the fully open or the fully closed position as explained above. Integrator antiwindup is used to prevent unreasonable integral action buildup in the controller during saturation periods.

Let there be $M$ priority classes defined within a server such that priority 1 is highest and priority $M$ is lowest. Collectively, clients of the server are allocated a target utilization $U^*$. This capacity should be made available to clients in priority order. A resource allocation loop can be created that gives the entire server capacity to the highest priority class. The unused capacity of each class is measured and treated as the set point for the resource allocation to lower priority classes. If this capacity is not enough, these clients are degraded or rejected accordingly by the actuator in their utilization control loop. The architecture is described for a two-class server in Figure 2(c). One control loop is needed per class.

For illustration, consider the operation of a system with two priority classes. When the high-priority class is getting more than its resource share, the controller error of the high-priority loop is negative. In this case, i) the class is controlled to reduce its consumption to the allotted share, and ii) the lower priority class has a zero resource allocation. The first condition is ensured by the loop of the higher priority class. The second is ensured by the other loop since its set point will be negative in this case, causing its actuator (e.g., the admission controller) to saturate at a fully closed position. Conversely, when the higher priority class does not consume all its allocated resources, the error (leftover resources) is positive, which is the set point to the lower priority loop. The actuator of the high-priority class is saturated at a fully open position, admitting all requests of this class. The loop for the lower priority class makes sure only as many requests are admitted as permitted by the leftover resources from the high-priority class. Hence, the loops emulate priority semantics.

The Statistical Multiplexing Guarantee

One shortcoming of resource reservation is that it can lead to unnecessary resource underutilization when some resource owner does not use all the resources allocated to it. This is objectionable if some other owner is starving for resources in the meantime. Hence, a statistical multiplexing scheme is needed whenever spare capacity exists on the machine. Individual resource allocations should be enforced only when the entire machine is overloaded. These two requirements can be simultaneously satisfied by the control loop set in Figure 2(d).

To describe how statistical multiplexing is achieved, assume that the controller output in the utilization control loop of a server $i$ is $m_i$. All surplus requests are treated as a lower priority server, called the best effort server. Let the controller output of the utilization control loop of the best effort server be $m_b$. Sharing of spare capacity is achieved using a variation of a min-max selector scheme [15] in which the objective is to minimize degradation of premium traffic. In this scheme, the service level of a request for a given first-class server $i$ is determined by the actuator of premium traffic if $m_i > m_b$ and by the actuator of best effort traffic otherwise. Thus, the request is handled according to the higher of $m_i$ and $m_b$. When the first-class server is overloaded while the machine as a whole is not, $m_b > m_i$. Consequently, incoming requests are served with quality determined by $m_b$, which is higher than that warranted by $m_i$, thus utilizing excess machine capacity. On the other hand, if the machine is overloaded, $m_b < m_i$. Consequently, the quality of content delivered by server $i$ is determined by $m_i$. Thus, the individual server is prevented from exceeding its capacity allocation. The mechanism allows smooth and informed switching between a mode of operation where an individual server $i$ is allowed to exceed its capacity allocation and a mode of operation where it is required to stay within its original capacity.

The Relative Guarantee

In some applications, it is desirable that the ratio between the performance of different classes of work be fixed; for example, it may be that the delays of two traffic classes in a network should be fixed at a ratio of 3:1. This fixed ratio is a good candidate for the performance set point, $R$. We can therefore pose relative guarantees as a variation of the ratio control problem [15]. Generally, if there are $n$ request classes in the system, and if $H_i$ is the measured performance of class $i$, the relative guarantee specifies that $H_1 : H_2 : \ldots : H_n = C_1 : C_2 : \ldots : C_n$, where $C_i$ is the weight of class $i$. One way to achieve this guarantee is to formulate per-class relative performance set points in the form $R_{desired,i} = C_i/(C_1 + C_2 + \cdots + C_n)$ and compare each against $R_i = H_i/(H_1 + H_2 + \cdots + H_n)$. Note that this loop (same as in Figure 2(b), but with the set point defined above) is nonlinear, since the sensor divides outputs of different loops to provide feedback on the relative performance of individual classes, as well as because flow is inversely proportional to delay.

It is possible to formulate the relative guarantee problem in a way that avoids the former nonlinearity. In this case, each output $H_i$ is multiplied by the constant $W_i = C_1 C_2 \cdots C_n / C_i$. Note that when the performance metric is satisfied (i.e., when $H_1 : H_2 : \ldots : H_n = C_1 : C_2 : \ldots : C_n$), the ratio $H_i/C_i$ is equal for all classes. Thus, the products $H_i W_i$ are equal for each class $i$. Let the set point be the average of these products (i.e., $R_{desired} = \sum_{i=1}^{n} (H_i W_i)/n$). Hence, when the performance metric is satisfied, all $H_i W_i$ are equal to the set point. When the metric is not satisfied, the error $e_i$ of class $i$ is given by $e_i = R_{desired} - H_i W_i$. A control loop is designed per class with the objective of driving the error of that class to zero. The control loop in this case is linear, since $W_i$ for a given class is constant. However, the control loops of different classes are coupled through their common set point, which is a weighted sum of the outputs of all loops.

Finally, yet another way of defining the control loops is to define set points on pairwise relative performance metrics $C_i/C_{i-1}$. Only $n - 1$ loops are then needed. Each loop reduces the corresponding error $e_i = C_i/C_{i-1} - H_i/H_{i-1}$ to zero. In this case, each loop decides on the ratio of resource allocation between two adjacent classes. Resources are then globally reallocated such that i) all pairwise ratios are satisfied and ii) the sum of allocated resources is equal to the total resource capacity.

The Utility Optimization Guarantee

Another type of performance guarantee addressable using a control-theoretic framework is that of utility optimization. Following a microeconomic model [18], consider a computing service that produces an amount of work $w$. Let the benefit per unit of work be $k$. Hence, the total utility $U$ produced by the service is $U = kw$. Let the resource consumption of the service be some nonlinear function, $g(w)$, which represents a measure of cost. It is desired to achieve the maximum net profit (i.e., maximize $kw - g(w)$). Assuming a convex cost function, $g(w)$ (i.e., one with increasing marginal cost), the profit is maximized when the marginal utility is equal to the marginal cost, or when $dg(w)/dw = k$. The equation can be solved for $w$, which then becomes the control set point, $R$. In a computing example, $w$ may be the desired server utilization, the desired workload size, or other metric, depending on the problem formulation. The resulting feedback loop is illustrated in Figure 2(b), where the set point is derived as described above.

In summary, we have identified the main types of performance guarantees that cover the performance requirements

of many software systems. For each type of guarantee, we have specified a feedback control-loop template that can be used as a basis for providing that type of guarantee. We believe that these templates, and the nonlinearities they exhibit, provide an interesting opportunity for exploring new control techniques that target the characteristic couplings and nonlinearities of loops implementing guarantees on time-related software performance metrics.

Feedback-Based Solutions

In this section, we bring together the concepts introduced above by describing several successful applications of the control-theoretic approach in the software domain. We first give a detailed example involving Web servers and then briefly introduce other applications on Internet servers, CPU scheduling, networking, and microprocessor architectures. The success of feedback control theory in these different domains demonstrates the generality and strength of the control-theoretic framework when applied to computing systems.

A Detailed Example of Relative Delay Guarantees in Web Servers

As the Internet becomes commercialized, it generates an incentive for e-businesses to provide guaranteed shorter service delay to premium customers, either because they pay higher fees or because they are more important to the business than the other customers. Web servers have to handle disturbances caused by highly unpredictable workload. For example, the population in each customer class often varies dramatically at runtime, yet offered performance must remain constant.

In this section, we present an example of applying feedback control theory to provide relative delay guarantees on Web servers. Every HTTP request over the Internet belongs to a class $i$. A desired relative delay $W_i$ is assigned to each class $i$. The service delay $C_i(k)$ of class $i$ is defined as the average delay of established connections of class $i$ measured within the $k$th sampling period $((k-1)T, kT)$, where $T$ is the constant length of the sampling period. A relative delay guarantee requires that $C_j(k)/C_i(k) = W_j/W_i$ for any classes $j$ and $i$. For example, if class 0 has a desired relative weight $W_0 = 1$ and class 1 has a desired relative weight $W_1 = 2$, the relative delay guarantee requires that the delay of class 0 be half that of class 1.

Sensors and Actuators

The first step in the feedback control design is to identify the controlled and manipulated variables in Web server systems. We focus on the Apache server and HTTP 1.1 requests. Apache is currently the most popular Web server software. The server runs a pool of server processes listening to a common TCP port. Each server process can only handle one TCP connection at any time instant. The HTTP 1.1 protocol works as follows: 1) a client (e.g., a Web browser) establishes a TCP connection with a server process; 2) the client submits an HTTP request to the server over the TCP connection; 3) the server processes the request and generates a response; 4) the server sends the response back to the client over the TCP connection; 5) to amortize the overhead of TCP connection establishment, the TCP connection is left open after the response has been transmitted in anticipation that the same TCP connection can be reused by a following HTTP request from the same client; if a new HTTP request arrives within a configurable keep-alive interval (e.g., 15 s), the TCP connection is kept open; otherwise the connection is closed. This feature is called “persistent connections.”

From the client’s perspective, the delay of an HTTP request includes three components: the connection delay on the server for establishing a TCP connection with a server process, the processing delay on the server for local computations (network protocol processing, open/read files, and running CGI scripts), and the network delays for transmitting the TCP connection request, the HTTP request, and the response over an established TCP connection. In this example, we focus on controlling the connection delay. The connection delay can be significant in overload conditions when all server processes are tied up with existing connections. When the server is highly loaded, all new TCP connection requests are queued in the TCP listen queue until they are accepted by a server process. The connection delay of a service class can be controlled by manipulating the process budget $B_i(k)$ (i.e., the number of server processes allocated to class $i$ in the sampling period $(kT, (k+1)T)$). Increasing the process budget of a class leads to a shorter connection delay for this class.

A relative guarantee scheme is used to provide relative delay guarantees in the Apache server. The connection delay ratio of two adjacent classes $i$ and $i-1$ is controlled by a controller $CR_i$. For $CR_i$, the controlled variable is the connection delay ratio $V_i(k) = C_i(k)/C_{i-1}(k)$ between classes $i$ and $i-1$, and the reference is the desired delay ratio $W_i/W_{i-1}$. The manipulated variable is the process budget ratio $U_i(k) = B_{i-1}(k)/B_i(k)$ between class $i-1$ and $i$. Note that the class of the denominator and the numerator is reversed in our notations of $V_i(k)$ and $U_i(k)$ due to the inverse relation between service rate and delay. At runtime, the controller $CR_i$ periodically updates the process ratio $U_i(k)$ and invokes a resource reallocation actuator to allocate server processes to different classes according to $U_i(k)$. The goal is to control the delay ratio $V_i(k)$ to remain close to the reference $W_i(k)/W_{i-1}(k)$.

System Identification

As an instance of the server architecture described earlier, the Web server has dynamics caused by the queuing of TCP connection requests. To compute the dynamic model between the connection delay ratio and the process budget ratio, we explore the system identification approach. We approximate the server with an ARMA difference equation:

$$V(k) = \sum_{j=1}^{n} a_j V(k-j) + \sum_{j=1}^{n} b_j U(k-j).$$

In an $n$th-order model, $2n$ parameters $\{a_j, b_j \mid j = 1, \ldots, n\}$ need to be estimated. We estimate the open-loop dynamics of the Apache Web server using pseudorandom digital white noise as input. The input signal randomly changes the process ratio between two different levels. A least-squares estimator is invoked periodically to estimate model parameters at every sampling instant.

We conduct system identification experiments to model a real Apache Web server on five Linux PCs connected with a 100-Mb/s Ethernet. A Linux PC runs an Apache server together with our system identification software, and four Linux PCs run a SURGE workload generator [19] to simulate 400 clients. Figure 3(a) shows the estimated parameters of a second-order model at successive sampling instants in a 30-min run (the system identification is started 2 min after the run starts to give SURGE time to fully start up). The estimates of the parameters $(a_1, a_2, b_1, b_2)$ converge to (0.74, −0.37, 0.95, −0.12).

To verify the accuracy of the model, we reran the experiment using a white noise input with a different seed and compared the actual delay ratio to that predicted by the estimated model. Figure 3(b) shows that the prediction of the estimated model is consistent with the actual relative delay throughout the 30-min run. These results demonstrate that the real Apache server can be modeled as a second-order difference equation. The results also show that an estimated first-order model had a larger prediction error (Figure 3(c)) than the second-order model, whereas an estimated third-order model (Figure 3(d)) did not improve the modeling accuracy. We chose the second-order model as a tradeoff between the model accuracy and complexity.

Note that although the general server model shown in Figure 1(b) suggests a high-order system, the server is usually dominated by two important bottlenecks. The first is the client request queue on the server’s input port. Hundreds of requests are often queued there until they are dequeued by some server process. The second queue is the CPU ready queue. More than a hundred server processes are usually dequeuing requests concurrently. The ready queue thus has a large number of entries. Together the two queues give rise to a second-order system, as verified by the results above.

[Figure 3 plots not reproduced. Panel (a) plots the estimated parameters a1, a2, b1, and b2 against time [s]; panels (b) through (d) plot the actual and estimated delay ratio against time [s], over 0 to 1,800 s.]

Figure 3. System identification results for an Apache server: (a) estimated model parameters (second-order model), (b) modeling error (second-order model), (c) modeling error (first-order model), and (d) modeling error (third-order model).

Evaluation of the Closed-Loop Server

We use the digital root-locus method to design a PI controller for the difference equation model. The PI controller guarantees stability and zero steady-state error and has a settling time of 4.5 min. The feedback control loop is implemented by modifying the Apache server software. The detailed system identification, design, and implementation are described in [16].

Due to the highly unpredictable Internet client access patterns, it is critical for a Web server to achieve robust relative delay guarantees in the face of disturbances caused by changing workloads. We run experiments on the Linux PC testbed to compare the performance of the open-loop server with the closed-loop server for changing client populations. Both the open-loop and the closed-loop servers are tested with the same workload. Each experiment starts with the nominal workload generated by 200 basic (class 1) clients and 100 premium (class 0) clients. To test the servers’ disturbance rejection capabilities, the number of premium clients is suddenly increased from 100 to 200 at 870 s.

The open-loop server has a fixed process budget ratio that is hand-tuned to 0.83 through extensive system profiling based on a nominal workload. The process budget ratio of the closed-loop server is arbitrarily initialized to 1 without hand tuning. The reference to the controller is set to 3, which requires the connection delay of the basic clients to be three times that of the premium clients.

The performance of the open-loop server is shown in Figure 4(a). The open-loop server performs well in the beginning of the experiment because its process budget ratio has been fine-tuned for the nominal workload. However, after the number of premium clients suddenly increases to 200 at 870 s, the connection delay of the premium clients increases significantly. Consequently, the connection delay ratio drops from the reference and remains below 1 (i.e., the basic clients receive a shorter delay than the premium clients!) in the rest of the experiment. This result shows that the open-loop server cannot reject disturbances caused by a changing workload.

The performance of the closed-loop server is shown in Figure 4(b). In the beginning of the experiment, the connection delay ratio of the closed-loop server converges to around the reference by adjusting its process budget ratio. In response to the population change at 870 s, the feedback control loop allocates more processes to the premium clients while reducing the number of processes for the basic clients. By 1,140 s, the connection delay ratio settles around the reference and stays close to it for the rest of the run. This result demonstrates that the closed-loop server is able to reject the disturbance caused by instantaneous changes in client populations. The recovery time is close to the designed settling time of 4.5 min. The server remains stable throughout the run, and the connection delay ratio remains close to the reference at steady state.

In summary, we have shown the effectiveness of feedback control theory in achieving QoS guarantees in a representative Internet server. We successfully developed effective software sensors and actuators to instantiate the relative guarantee control template. We have shown that the dynamics of a complex Internet server can be modeled through system identification. The implementation and evaluation on real server systems demonstrate that feedback control loops can provide robust QoS guarantees and disturbance rejection capabilities in Internet servers.

[Figure 4 plots not reproduced. Each panel plots the reference, V(k), and U(k) against time [s], over 0 to 1,800 s.]

Figure 4. Comparing the performance of the open-loop and closed-loop servers: connection delay ratio $V(k) = C_1(k)/C_0(k)$; process budget ratio $U(k) = B_0(k)/B_1(k)$; (a) open-loop server and (b) closed-loop server.

Other Applications in Computing Systems

Feedback control theory has been successfully applied to a broad spectrum of computing systems. The application domains cover different resources that need to be controlled (network bandwidth, CPU cycles, storage spaces, and I/O bandwidth), as well as different metrics and types of QoS guarantees. We now briefly summarize some recent application examples, providing references for further detail.

Internet Servers

In addition to the Web example presented earlier, several recent papers [10], [12], [20] present a control-theoretical approach to Web server resource management through Web-content adaptation. Feedback control loops were developed to achieve absolute convergence guarantees, capacity reservation, and prioritization on request rate and delivered bandwidth for different service classes. In [21], multi-input, multi-output (MIMO) control was designed to achieve absolute convergence guarantees on both memory and CPU utilization in Web servers. In other work, feedback control and fuzzy control were applied to control the input queue length in Lotus e-mail servers with admission control [22], [23].

CPU Scheduling

Feedback control theory has been used for CPU scheduling in real-time systems, such as multimedia and embedded

June 2003 IEEE Control Systems Magazine 85 control systems. Steere et al. [24] developed a feed- veloped for various software performance assurance prob- back-based CPU scheduler that coordinates the CPU cycles lems. A natural question is whether any generalization of these allocated to the consumer and supplier threads to guaran- solutions is possible. Indeed it seems that a small set of QoS tee the fill level of buffers. In [9] and [25], feedback control guarantee types covers a wide range of applications, and for real-time scheduling were developed to provide each of these classes of guarantees, we can define control-the- deadline miss ratio guarantees for real-time applications oretic templates that are amenable to online solutions. In com- with unknown task execution times. Feedback control puter systems where many applications can make use of real-time scheduling has also been extended to handle dis- similar services, these services are often provided in terms of tributed systems [8]. All the above scheduling algorithms middleware. In our work, we have developed a middleware are designed for absolute convergence guarantees. called ControlWare [29] that supports multiple types of QoS guarantees, each based on feedback control. Storage Management Storage is another critical resource in many server systems. ControlWare In [4], a relative guarantee template was developed to pro- ControlWare is a middleware QoS-control architecture vide a relative hit ratio guarantee in a Web proxy cache based on control theory, motivated by the needs of perfor- through a resource reallocation actuator that dynamically mance-assured Internet services. ControlWare allows the changes the disk space allocation for different service user to express QoS specifications offline, maps these speci- classes. 
Recently, adaptive control was applied to the same Web proxy cache to improve its portability and robustness via automatic controller tuning [7].

The Aqueduct system [26] featured a feedback control loop that regulated the speed of background data migration in enterprise storage servers while bounding the performance impact on front-end applications. Aqueduct provided absolute convergence guarantees on the I/O latency of front-end applications during data migration.

Network Routers
At the network layer, control theory was applied to packet flow control in Internet routers. Hollot et al. [5] applied control theory to analyze the RED active queue management on IP routers. Their control analysis provided guidance for tuning the RED algorithm, which had previously been a difficult problem. Recently, a feedback-based algorithm was developed for quantitative assured forwarding services [17] to provide absolute and proportional differentiation of loss, service rates, and packet delays on Internet routers. Li and Nahrstedt [27] developed a hierarchical architecture that integrates an upper layer fuzzy control and a lower layer packet rate control to achieve absolute convergence guarantees on tracking precision in distributed visual tracking systems.

Microprocessor Architecture
Feedback control theory has also been applied in microprocessor architectures. In [28], absolute convergence guarantees were given on CPU chip temperatures by applying control-theoretic techniques to microprocessor thermal management.

Middleware for QoS Control
As shown earlier, over the last several years, an increasing number of individual feedback control solutions have been de-

fications into appropriate feedback control-loop sets, tunes loop controllers analytically to guarantee convergence to specifications, and connects loops to the right performance sensors and actuators in the application such that the desired QoS is achieved. A main novelty of our middleware lies in isolating the software application programmer from control-theoretic concerns while utilizing this theory to achieve the desired QoS guarantees. At the same time, ControlWare isolates the control engineer from the software task of interfacing the controller to the controlled software system and designing software performance sensors and actuators. A conceptual representation of ControlWare components used in the process of designing software performance-assurance loops is shown in Figure 5(a).

ControlWare contains a library of macros written in our topology description language, each formulating a particular type of QoS guarantee as a feedback control problem. The library is extensible in that a control engineer can transform a new guarantee type into a macro that describes the corresponding loop interconnection topology and stores that macro in the middleware’s library. Currently, the library includes macros for absolute convergence guarantees, relative differentiated service guarantees, prioritization, and optimization guarantees. Each macro, like a block diagram, includes components such as sensors, actuators, and controllers. ControlWare contains a library of common sensors and actuators that can be used in these software control loops. The library is extensible in that it is possible for an application programmer to add new sensor and actuator types.

Overall, ControlWare can express many common guarantee types required in performance-assurance software by casting them appropriately as feedback control problems. Once the control loops are instantiated from the QoS specification, the middleware uses textbook techniques to estimate system models and determine appropriate feedback controller parameters for guaranteed convergence of the control loops to the specified performance.
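As an illustration of what such textbook techniques could look like (a minimal sketch under our own assumptions, not ControlWare's actual implementation), the Python fragment below fits a first-order difference model y(k+1) = a y(k) + b u(k) to logged input/output samples by least squares and then places the closed-loop poles of a PI controller analytically. The function names, the first-order model structure, the chosen pole locations, and the numeric plant values are all hypothetical; u could stand for a resource quota and y for a measured performance metric.

```python
import random


def identify_first_order(u, y):
    """Least-squares fit of y[k+1] = a*y[k] + b*u[k]; expects len(y) == len(u) + 1."""
    syy = syu = suu = sy1y = sy1u = 0.0
    for k in range(len(u)):
        syy += y[k] * y[k]
        syu += y[k] * u[k]
        suu += u[k] * u[k]
        sy1y += y[k + 1] * y[k]
        sy1u += y[k + 1] * u[k]
    # Solve the 2x2 normal equations by Cramer's rule.
    det = syy * suu - syu * syu
    a = (sy1y * suu - sy1u * syu) / det
    b = (syy * sy1u - syu * sy1y) / det
    return a, b


def tune_pi(a, b, p1, p2):
    """PI gains that place the closed-loop poles of the first-order model at p1, p2.

    Controller (incremental form): u[k] = u[k-1] + kp*(e[k] - e[k-1]) + ki*e[k].
    Closed-loop polynomial: z^2 + (b*(kp + ki) - 1 - a)*z + (a - b*kp).
    """
    kp = (a - p1 * p2) / b
    ki = (1.0 - p1) * (1.0 - p2) / b
    return kp, ki


def run_loop(plant_a, plant_b, kp, ki, set_point, steps):
    """Simulate the loop from rest and return the final plant output."""
    y = u = e_prev = 0.0
    for _ in range(steps):
        e = set_point - y                  # sensor reading vs. specification
        u += kp * (e - e_prev) + ki * e    # controller output (e.g., a quota)
        e_prev = e
        y = plant_a * y + plant_b * u      # hypothetical plant response
    return y


if __name__ == "__main__":
    # Excite a hypothetical plant (a=0.8, b=0.5) with white noise and log its output.
    rng = random.Random(0)
    u_log = [rng.uniform(-1.0, 1.0) for _ in range(200)]
    y_log = [0.0]
    for uk in u_log:
        y_log.append(0.8 * y_log[-1] + 0.5 * uk)

    a_hat, b_hat = identify_first_order(u_log, y_log)
    kp, ki = tune_pi(a_hat, b_hat, 0.5, 0.5)
    print("model:", a_hat, b_hat, "gains:", kp, ki)
    print("output after 80 samples:", run_loop(0.8, 0.5, kp, ki, 1.0, 80))
```

With both poles placed inside the unit circle, the tracking error of the identified model decays geometrically, which is the sense in which convergence to the set point is guaranteed here. Real workloads deviate from any such model, so the pole choice trades response speed against robustness.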

SoftBus: The ControlWare Backbone
Several implementation challenges had to be overcome to create ControlWare, including interoperability. To promote interoperability, the engineering community standardized open-layered interface architectures such as the Fieldbus [30], which greatly simplify the interconnection of sensors, actuators, and controllers in a system. Similarly, a crucial step in developing an open middleware layer for software QoS control is to provide a similar generic application programming interface (API) and communication backbone to interface the computing and control subsystems. This backbone must be appropriate for distributed software rather than for a physical bus. We call this backbone a SoftBus.

ControlWare implements a SoftBus that provides a common interface for efficient information exchange between software performance sensors, actuators, and controllers across machines and address spaces. The sensors, actuators, and controllers are viewed as interchangeable plug-in modules. Note that these modules are not physical devices (such as a temperature sensor), but rather software components that conceptually act like a sensor or actuator (e.g., a software load sensor might invoke an operating system call to measure CPU utilization). Modules connected to SoftBus need not know each other’s locations and need not worry about distributed communication. Underneath the common API, different information exchange mechanisms are developed for different situations. This layered architecture is depicted in Figure 5(b).

Figure 5. ControlWare: (a) development methodology and (b) interfacing to the SoftBus.

Plug-In Sensors and Actuators
In SoftBus, we support two types of software sensors and actuators: passive and active. A passive sensor or actuator is just a function or software component that returns sample data or accepts a command when called by the controller. An active sensor or actuator, in contrast, is a process or thread that may be running in its own address space. It is usually awakened periodically by the operating system scheduler to perform sensing or actuation. For example, an idle-CPU-time sensor may be implemented as an active sensor process that runs at the lowest priority and computes the percentage of time it has been executing. If a controller needs to communicate with such a sensor, some kind of IPC should be used instead of a direct function call. Controllers are designed completely independent of the identity of the sensors and actuators. From the controller’s perspective, the software system being controlled acts like a regular physical plant.

Sensors typically amount to a modest instrumentation of application code. For example, a sensor measuring the request rate on a particular site can be implemented as a simple counter. A sensor measuring delay can be implemented as a moving average of the difference between two time stamps. Often the measured metric is already available as a variable maintained by the controlled software service (e.g., queue length) or the operating system (such as CPU utilization). To implement the sensor, one only needs to pass the value to the middleware. The main challenge, however, is the design of the actuator. To meet this challenge, our middleware includes a generic resource manager (GRM) that can be thought of as an all-purpose actuator or a “generic control valve.” The actuator interfaces to any resource queue (i.e., “pipe”) such that resource allocation (i.e., “flow”) can be controlled in a simple, unified, yet customizable manner. In the following, we shed some light on the generic resource manager in ControlWare.

Generic Resource Manager
Our GRM is designed for use with Internet servers such as Web servers, DNS servers, mail servers, and proxy cache servers. As mentioned earlier, in these applications, it is the order and quantity of resource consumption that decide the quality of the service. Therefore, the actuator must control the access to the resources quantitatively. The actuator uses quota, which is one of the knobs it exports, to represent the resources allocated to each class. It can change the QoS of different classes by changing the quota dynamically at runtime. The definition of “quota” depends on the type of resource under consideration. Generally, server resources can be categorized into two categories according to their access patterns:

• Time-multiplexed resources. Such resources as CPUs cannot be accessed by multiple users at the same time. Instead, access has to be serialized and multiplexed in time. For such resources, the GRM maintains per-class queues. A classifier classifies the requests and puts them into the different queues. The quota refers to the dequeue rate of each class as a fraction of server capacity.
• Space-multiplexed resources. Memory and disk space are examples of such resources. They can be used by multiple users at the same time. Typically, each class has a quota indicating the amount of the resource dedicated to it. A queue may or may not be necessary in this scenario. In many cases, requests are admitted until the quota is used up, at which point further requests are simply rejected. An interesting alternative is that when some class’s quota is used up and a new request from this class arrives, a previously allocated resource amount can be revoked from other clients and transferred to this new request. Cache management is an example of this phenomenon, where newly cached pages replace older pages in the cache.

In summary, the generic resource manager understands the notion of traffic classes and exports the abstraction of resource quota to represent the amount of logical resources allocated to a particular class. The action of the manager lies in controlling resource quota allocations.

The structure of the generic resource manager is shown in Figure 6(a). In the figure, the Classifier and Resource Allocator are provided by the application. The Resource Allocator does resource allocation. The Queue Manager maintains one queue for each class, governed by a certain queuing policy. The Quota Manager maintains a resource quota for each class.

To use GRM, the application must export three interfaces: allocProc, rejectProc, and revokeProc. allocProc is called to execute application-specific resource allocation methods. When a previously arrived request has to be rejected, rejectProc is called to do cleanup work (for example, close the network connection). revokeProc is called when a resource previously allocated to some clients needs to be revoked (such as the case with cache replacement policies).

Figure 6(b) summarizes the interaction between GRM and the application. When some resource is requested by the application, the request is first classified by the Classifier. After that, the request is passed to GRM by calling insertRequest. GRM controls resource allocation by checking the request against two constraints: the queue length and the quota constraint. If the queue for the given class is empty and the class has quota, the request is satisfied immediately via the function call allocProc to the resource allocator, and the quota is updated accordingly. If the request can’t be satisfied immediately, it will be buffered in its queue. When some resource becomes available, the application calls resourceAvailable to notify GRM, which will try to satisfy as many pending requests as possible.

Figure 6. Generic resource manager: (a) structure of GRM and (b) resource allocation procedure.
(a) Blocks shown: Request, Classifier, Queue Manager, Queue Policy, Quota Manager, Resource Allocator, Resource, Actuator.
(b) Pseudocode shown:

    Application:
        class = classify(request)
        insertRequest(..., class, ...)

    GRM:
        insertRequest(.....) {
            if (queue of this class is not empty) {
                buffer this request
                return
            }
            if (this class still has quota) {
                allocProc(...)
                update quota usage
            } else {
                buffer this request
            }
        }

        resourceAvailable(.....) {
            update quota usage
            get requests from class that still has quota
            allocProc(...)
        }

    Resource Allocator:
        allocProc(...) {
            do resource allocation
        }
        when some resource available:
            resourceAvailable(...)

It is important to mention that quota is a purely logical concept. Unlike the traditional resource reservation system, in our middleware the mapping of quota to physical resource consumption need not be known. In effect, the GRM is a logical queuing, admission control, and resource allocation policy interface with a backend that is capable of executing a primitive service function such as assigning a request to a service process. The GRM generalizes the expression of various resource allocation policies in a common framework and makes it possible to control logical quota allocations by simple feedback controllers such that performance constraints are met. The control loop is guaranteed to converge because of the way controllers are designed, which is the advantage of using a control-theoretic approach. Most important, the physical mapping of quota to actual resource consumption need not be known for correct operation, which separates this approach from resource reservation systems.

In summary, ControlWare is novel middleware that supports QoS guarantees based on control theory. Although the middleware exists, many open questions remain and many improvements are possible. For example, control templates for distributed systems, for adaptive controllers, and fault conditions are some of the interesting issues that must be answered and then added to the ControlWare libraries.

Conclusions
In this article, we have presented the necessary foundations for using a control-theoretic framework to achieve QoS guarantees in software systems. With the growing popularity of the Internet, providing performance guarantees in open software systems has become increasingly important. We illustrated the successful application of the control framework by practical examples in which guarantees were provided within a software service. We analyzed the general architecture of software servers for purposes of feedback control. We have shown that high-performance servers can be approximated in practice by a liquid task model. The model gives rise to a fluid representation of workload in which the server is modeled as a cascaded flow through a pipeline of tanks of different capacities. We identified typical actuators that control these flows within the server and illustrated software protocols that implement these actuators. We also presented a taxonomy of the most important QoS assurance problems in the software literature and described how they can be translated into a feedback control formulation. Having shown the feasibility of a control-theoretic formulation of software assurance problems, the feasibility of linear modeling, and the feasibility of actuation in practical application scenarios, we believe that the foundation is now in place for applying the wealth of control-theoretic results in the Internet application domain.

Many remaining issues and challenges warrant further research; for example, how to model nonlinearities peculiar to computing systems when they cannot be adequately handled by the linearization techniques we employed? And how can these nonlinearities be accounted for in controller tuning? The most widespread nonlinearity in software systems is the inverse relation between rate and delay that complicates the analysis of delay control loops, a key step toward reasoning about and controlling temporal behavior. Although we focused on fixed-parameter controllers, of significant interest is the possible application of adaptive control and robust control techniques to handle parameter variations and load uncertainties. Another interesting possibility is the use of a predictive control framework, or one where queuing-theoretic prediction is integrated with feedback-based correction to achieve the desired software performance. Finally, further examples, theoretical foundations, experimental evidence, and practical experience are needed to establish the true potential and limitations of applying feedback performance control to different computing systems. This remains an important focus for our future research.

Acknowledgments
This work was supported, in part, by the National Science Foundation under grants ANI-0105873, CCR-0093144, and CCR-0098269, by the DARPA NEST program under grant F33615-01-C-1905, and by the MURI award N00014-01-1-0576.

References
[1] T.G. Robertazzi, Computer Networks and Systems: Queuing Theory and Performance Evaluation, 3rd ed. New York: Springer Verlag, 2000.
[2] V. Paxson and S. Floyd, “Wide-area traffic: The failure of Poisson modeling,” IEEE/ACM Trans. Networking, vol. 3, pp. 226-244, June 1995.
[3] M.E. Crovella and A. Bestavros, “Self-similarity in World Wide Web traffic: Evidence and possible causes,” IEEE/ACM Trans. Networking, vol. 5, pp. 55-79, Dec. 1997.
[4] Y. Lu, A. Saxena, and T. Abdelzaher, “Differentiated caching services: A control-theoretical approach,” in Proc. 2001 Int. Conf. Distributed Computing Systems, 2001, pp. 615-622.
[5] C.V. Hollot, V. Misra, D. Towsley, and W. Gong, “A control theoretic analysis of RED,” in Proc. IEEE Infocom, Anchorage, AK, Apr. 2001, pp. 1510-1519.
[6] S. Floyd and V. Jacobson, “Random early detection gateways for congestion avoidance,” IEEE/ACM Trans. Networking, vol. 1, no. 4, pp. 397-413, Aug. 1993.
[7] Y. Lu, T. Abdelzaher, C. Lu, and G. Tao, “An adaptive control framework and its application to differentiated caching services,” in Proc. Int. Conf. Quality of Service, Miami Beach, FL, May 2002, pp. 23-32.
[8] J.A. Stankovic, T. He, T.F. Abdelzaher, M. Marley, G. Tao, S.H. Son, and C. Lu, “Feedback control scheduling in distributed systems,” in Proc. IEEE Real-Time Systems Symp., London, U.K., Dec. 2001, pp. 59-70.
[9] C. Lu, J.A. Stankovic, G. Tao, and S.H. Son, “Feedback control real-time scheduling: Framework, modeling, and algorithms,” J. Real-Time Syst., vol. 23, no. 1/2, May 2002.
[10] T.F. Abdelzaher, K.G. Shin, and N. Bhatti, “Performance guarantees for Web server end-systems: A control-theoretical approach,” IEEE Trans. Parallel Distrib. Syst., vol. 13, pp. 80-96, Jan. 2002.
[11] C. Aurrecoechea, A. Campbell, and L. Hauw, “A survey of QoS architectures,” in Proc. 4th IFIP Int. Conf. Quality of Service, Paris, France, Mar. 1996, pp. 138-151.

[12] T.F. Abdelzaher and N. Bhatti, “Web server QoS management by adaptive content delivery,” in Proc. Int. Workshop on Quality of Service, London, U.K., June 1999, pp. 216-225.
[13] C. Dovrolis, D. Stiliadis, and P. Ramanathan, “Proportional differentiated services: Delay differentiation and packet scheduling,” in Proc. SIGCOMM, 1999, pp. 109-120.
[14] C. Dovrolis and P. Ramanathan, “Proportional differentiated services, Part II: Loss rate differentiation and packet dropping,” in Proc. Int. Workshop Quality of Service, Pittsburgh, PA, June 2000, pp. 53-61.
[15] K.J. Åström and T. Hägglund, PID Controllers: Theory, Design, and Tuning, 2nd ed. Instrument Society of America, 1995.
[16] C. Lu, T. Abdelzaher, J. Stankovic, and S. Son, “A feedback control approach for guaranteeing relative delays in Web servers,” in Proc. IEEE Real-Time Technology and Applications Symp., Taipei, Taiwan, June 2001, pp. 51-62.
[17] N. Christin, J. Liebeherr, and T.F. Abdelzaher, “A quantitative assured forwarding service,” in Proc. IEEE Infocom, New York, NY, June 2002, pp. 864-873.
[18] W.A. McEachern, , 5th ed. South-Western College Publishing, 1999.
[19] P. Barford and M.E. Crovella, “Generating representative Web workloads for network and server performance evaluation,” in Proc. Performance ‘98/ACM SIGMETRICS ‘98, Madison, WI, 1998, pp. 151-160.
[20] T.F. Abdelzaher, “An automated profiling subsystem for QoS-aware services,” in Proc. IEEE Real-Time Technology and Applications Symp., Washington, D.C., June 2000, pp. 208-217.
[21] Y. Diao, N. Gandhi, J.L. Hellerstein, S. Parekh, and D.M. Tilbury, “MIMO control of an Apache Web server: Modeling and controller design,” in Proc. American Control Conf., Anchorage, AK, May 2002, pp. 4922-4927.
[22] S. Parekh, N. Gandhi, J. Hellerstein, D. Tilbury, T. Jayram, and J. Bigus, “Using control theory to achieve service level objectives in performance management,” in IFIP/IEEE Int. Symp. Integrated Network Management, Seattle, WA, May 2001, pp. 841-854.
[23] Y. Diao, J.L. Hellerstein, and S. Parekh, “Using fuzzy control to maximize profits in service level management,” IBM Syst. J., vol. 41, no. 3, pp. 403-420, 2002.
[24] D.C. Steere, A. Goel, J. Gruenberg, D. McNamee, C. Pu, and J. Walpole, “A feedback-driven proportion allocator for real-rate scheduling,” in Operating Systems Design and Implementation, 1999, pp. 145-158.
[25] C. Lu, J.A. Stankovic, G. Tao, and S.H. Son, “The design and evaluation of a feedback EDF control scheduling algorithm,” in Proc. IEEE Real-Time Systems Symp., Phoenix, AZ, Dec. 1999, pp. 56-67.
[26] C. Lu, G.A. Alvarez, and J. Wilkes, “Aqueduct: Online data migration with performance guarantees,” in Proc. USENIX Conf. File and Storage Technologies, Monterey, CA, Jan. 2002, pp. 219-230.
[27] B. Li and K. Nahrstedt, “A control-based middleware framework for quality of service ,” IEEE J. Select. Areas Commun., vol. 17, pp. 1632-1650, Sept. 1999.
[28] K. Skadron, T. Abdelzaher, and M. Stan, “Control-theoretic techniques and thermal RC modeling for accurate and localized dynamic thermal management,” in Proc. Int. Symp. High Performance Computer Architecture, Cambridge, MA, Feb. 2002, pp. 17-28.
[29] R. Zhang, C. Lu, T.F. Abdelzaher, and J.A. Stankovic, “Controlware: A middleware architecture for feedback control of software performance,” in Proc. 2002 Int. Conf. Distributed Computing Systems, Vienna, Austria, July 2002, pp. 301-310.
[30] A. Chatha, “Fieldbus: The foundation for field control systems,” Contr. Eng., vol. 41, no. 6, pp. 47-50, May 1994.

Tarek F. Abdelzaher received his B.Sc. and M.Sc. degrees in electrical and computer engineering from Ain Shams University, Cairo, Egypt, in 1990 and 1994, respectively. He received his Ph.D. in 1999 from the University of Michigan in computer science with a specialization in real-time systems. He is an assistant professor in the Department of Computer Science at the University of Virginia. His current research interests include real-time systems, sensor networks, quality-of-service control, networking, multimedia applications, next-generation Web architecture, fault tolerance, and dependable computing. He is the author and co-author of more than 45 refereed publications. He received the Distinguished Achievement Award in Computer Science and Engineering from the University of Michigan in 1999 and the NSF CAREER Award in 2001. He is a co-editor of IEEE Distributed Systems Online and a guest editor of Computer Communication and the Journal of Real-Time Systems. He has also served on many conference committees and is the designated inventor of a European patent on adaptive Web servers.

John A. Stankovic is the BP America Professor and Chair of the Computer Science Department at the University of Virginia. He is a Fellow of both the IEEE and the ACM. He received the IEEE Real-Time Systems Technical Committee's Award for Outstanding Technical Contributions and Leadership. He also serves as treasurer for the Board of Directors of the Computer Research Association. Before joining the University of Virginia, he taught at the University of Massachusetts, where he won an outstanding scholar award. He has also held visiting positions in the Computer Science Department at Carnegie Mellon University, at INRIA in France, and Scuola Superiore S. Anna in Pisa, Italy. He was the editor-in-chief for IEEE Transactions on Distributed and Parallel Systems and is a co-editor-in-chief for the Real-Time Systems Journal. His research interests are in distributed computing, real-time systems, operating systems, and ad hoc networks for pervasive computing. He received his Ph.D. from Brown University.

Chenyang Lu is an assistant professor in the Department of Computer Science and Engineering at Washington University in St. Louis. He received a B.S. in computer science from the University of Science and Technology of China in 1995, a Master’s in computer science from the Chinese Academy of Sciences in 1997, and a Ph.D. in computer science from the University of Virginia in 2001. His current research interests include real-time systems and middleware, wireless sensor networks, Internet servers, and network storage. A theme of his research is developing adaptive distributed software systems that provide performance-assured services in unpredictable environments.

Ronghua Zhang received a B.S. degree from Huazhong University of Science and Technology, Wuhan, China, and an M.S. degree in computer science from the Institute of Software, Chinese Academy of Sciences, Beijing, China. Currently, he is working toward his Ph.D. in computer science at the University of Virginia, Charlottesville.

Ying Lu received her B.S. in computer science from Southwest Jiaotong University, Chengdu, China, in 1996, and her M.S. in computer science from Jinan University, Guangzhou, China, in 1999. She is currently working on her Ph.D. degree in computer science at the University of Virginia. Her research interests include applying control theory to Internet-based services, adaptive QoS architecture design, content distribution networks, and storage systems.
