Predicting the CPU Availability of Time-shared Unix Systems on the Computational Grid

Rich Wolski, Neil Spring, and Jim Hayes

Supported by NSF grant ASC-9701333 and Advanced Research Projects Agency/ITO under contract #N66001-97-C-8531. email: [email protected], [email protected], [email protected]

Abstract

In this paper we focus on the problem of making short and medium term forecasts of CPU availability on time-shared Unix systems. We evaluate the accuracy with which availability can be measured using the Unix load average, the Unix utility vmstat, and the Network Weather Service CPU sensor that uses both. We also examine the autocorrelation between successive CPU measurements to determine their degree of self-similarity. While our observations show a long-range autocorrelation dependence, we demonstrate how this dependence manifests itself in the short and medium term predictability of the CPU resources in our study.

1 Introduction

Improvements in network technology have made the distributed execution of performance-starved applications feasible. High-bandwidth, low-latency local-area networks form the basis of low-cost distributed systems such as the HPVM [14], the Berkeley NOW [9], and various Beowulf [25] configurations. Similarly, common carrier support for high-performance wide-area networking is fueling an interest in aggregating geographically dispersed and independently managed computers. These large-scale metacomputers are capable of delivering greater execution performance to an application than is available at any one of their constituent sites [4]. Moreover, performance-oriented distributed software infrastructures such as Globus [12], Legion [18], Condor [26], NetSolve [7], and HPJava [6] are attempting to knit vast collections of machines into computational grids [13] from which compute cycles can be obtained in the way electrical power is obtained from an electrical power utility.

One vexing quality of these ensemble systems is that their performance characteristics vary dynamically. In particular, it is often economically infeasible to dedicate a large collection of machines and networks to a single application, particularly if those resources belong to separate organizations. Resources, therefore, must be shared, and the contention that results from this sharing causes the deliverable performance to vary over time. To make the best use of the resources at hand, an application scheduler (be it a program or a human being) must make a prediction of the performance that will be available from each.

In this paper, we examine the problem of predicting available CPU performance on Unix time-shared systems for the purpose of building dynamic schedulers. In this vein, the contributions it makes are:

- an exposition of the measurement error associated with popular Unix load measurement facilities with respect to estimating the available CPU time a process can obtain,
- a study of the one-step-ahead forecasting performance obtained by the Network Weather Service [29, 30] (a distributed, on-line performance forecasting system) when applied to CPU availability measurements,
- an analysis of this forecasting performance in terms of the autocorrelation present between consecutive measurements of CPU availability, and
- verification of this analysis through its application to longer-term forecasting of CPU availability.

Our results are somewhat surprising in that they demonstrate the possibility of making short and medium term predictions of available CPU performance despite the presence of long-range autocorrelation and potential self-similarity. Since chaotic systems typically exhibit self-similar performance characteristics, self-similarity is often interpreted as an indication of unpredictability. The predictions we obtained, however, exhibited a mean absolute error of less than 10%, typically making them accurate enough for use in dynamic process scheduling.

In the next section (Section 2), we detail the accuracy obtained by three performance measurement methodologies when applied to a collection of compute servers and workstations located in the U.C. San Diego Computer Science and Engineering Department. Section 3 briefly describes the Network Weather Service (NWS) [30, 29], a distributed performance forecasting system designed for use by dynamic schedulers; the NWS treats measurement histories as time series and uses simple statistical techniques to make short-term forecasts. Section 3 also presents an analysis of the forecasting errors generated by the NWS forecasting system for the resources we monitored in this study. Finally, we conclude with a discussion of these results in the context of dynamic scheduling for metacomputing and computational grid systems, and point to future research directions in Section 4.

2 Measurements and measurement error

For dynamic process scheduling, the goal of CPU prediction is to gauge the degree to which a process can be expected to occupy a processor over some fixed time interval. For example, if "50% of the CPU" is available from a time-shared CPU, a process should be able to obtain 50% of the time-slices over some time interval. Typically, the availability percentage is used as an expansion factor [11, 23, 15, 1] to determine the potential execution time of a process. If only 50% of the time-slices are available, for example, a process is expected to take twice as long to execute as it would if the CPU were completely unloaded¹. We have used CPU availability to successfully schedule parallel programs in shared distributed environments [24, 2]. Other scheduling systems such as Prophet [27], Winner [1], and MARS [15] use the Unix load average to accomplish the same purpose.

In this section, we detail the error that we observed while measuring CPU availability. We report results gathered by monitoring a collection of workstations and compute servers in the Computer Science and Engineering Department at UCSD, using the CPU monitoring facilities currently implemented as part of the Network Weather Service [30]. Each series of measurements spans a 24-hour period of "production" use during August of 1998. Despite the usual summertime hiatus from classwork, the machines in this study experienced a wide range of loads, as many of the graduate students took the opportunity to devote themselves to research that had been neglected during the school year. As such, we believe that the data is a representative sample of departmental load behavior.

¹ This relationship assumes that the execution interval is sufficiently large with respect to the length of a time-slice so that possible truncation and round-off errors are insignificant.

2.1 Measurement methods

For each system, we use the NWS CPU monitor to periodically generate three separate measurements of current CPU availability. The first is based on the Unix load average metric, which is a one-minute smoothed average of run queue length. Almost all Unix systems gather and report load average values, although the smoothing factor and sampling rates vary across implementations. The NWS sensor uses the utility uptime to obtain a load average reading without special access privileges². Using a load average measurement, we compute the available CPU percentage as

    availability_load_average = 100% / (load average + 1)                  (1)

indicating the percentage of CPU time that would be available to a newly created process.

Fearing the load average to be insensitive to short-term load variability, we implemented a second measurement technique based on the utility vmstat, which provides periodically updated readings of idle CPU time, consumed user time, and consumed system time, presented as percentages. From these readings we compute

    availability_vmstat = idle + (user + w * system) / (r + 1)             (2)

where r is a smoothed average of the number of running processes over the previous set of measurements, and w is a weighting factor equal to the user time percentage. The rationale is that a process is entitled to the current idle time, and to a fair share of the user and system time. However, if a machine is used as a network gateway (as was the case at one time in the UCSD CSE Department), user-level processes may be denied CPU time as the kernel services network-level packet interrupts. In our experience, the percentage of system time that is shared fairly is directly proportional to the percentage of user time, hence the w factor.
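To make the two estimates concrete, the sketch below (in Python, with illustrative function names, and with Equation 2 used as reconstructed above) computes both availability values from a load average reading and from vmstat-style idle/user/system fractions. It is a sketch of the formulas only, not the NWS sensor code.

```python
# Illustrative sketch (not the NWS implementation) of Equations 1 and 2.
# All quantities are expressed as fractions in [0, 1] rather than percentages.

def availability_from_load_average(load_average):
    """Eq. 1: fraction of the CPU a newly created process could expect,
    given a one-minute smoothed run-queue length."""
    return 1.0 / (load_average + 1.0)

def availability_from_vmstat(idle, user, system, r):
    """Eq. 2 (as reconstructed here): the idle time plus a fair share of the
    user and system time, where r is a smoothed count of running processes
    and the weighting factor w is taken to be the user-time fraction."""
    w = user                 # fraction of system time assumed to be shared fairly
    return idle + (user + w * system) / (r + 1.0)

# Example: a load average of 1.0 implies 50% availability; a half-idle
# machine with one running process yields roughly 72% by the vmstat estimate.
print(availability_from_load_average(1.0))            # 0.5
print(availability_from_vmstat(0.5, 0.4, 0.1, 1.0))   # ~0.72
```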

Lastly, we implemented a hybrid sensor that combines Unix load average and vmstat measurements with a small probe process. The probe process occupies the CPU for a short period of time (currently 1.5 seconds) and reports the ratio of the CPU time it used to the wall-clock time that passed as a measure of the availability it experienced. The hybrid runs its probe process much less frequently than it measures availability_load_average and availability_vmstat, as these quantities may be obtained much less intrusively. The method (vmstat or load average) that reports the CPU availability measure closest to that experienced by the probe is chosen to generate all measurements until the next probe is run. In the experiments described in this paper, the NWS hybrid sensor calculated availability_load_average and availability_vmstat every 10 seconds (6 times per minute), and probed once per minute. The method that was most accurate with respect to the probe was used for the subsequent 5 measurements each minute.

² The current implementation of the NWS runs completely without privileged access, both to reduce the possibility of breaching site security and to provide data that is representative of the performance an "average" user can obtain. For a more complete account of the portability and implementation issues associated with the NWS, see [30].

We are interested in the deliverable performance available to a full-priority Unix process that occupies the CPU for an appreciable amount of time. However, both the Unix load average and the vmstat method are unable to sense the presence of lower-priority or "nice" processes. It is our experience that users frequently run low-priority background processes, particularly on shared resources, to try to soak up any unused CPU time without arousing the ire of the departmental system administrators. To overcome this problem, the hybrid sensor uses the difference between the probe process and the most accurate method as a bias value and adjusts all subsequent measurements by this value. The assumption is that the probe will not be affected by lower-priority processes and hence will be able to correct the skewed measurements derived from the Unix load average or vmstat.

The advantage of using load average, vmstat, and the NWS hybrid to derive measurements of CPU availability is that they are relatively non-intrusive. Both vmstat and uptime read protected Unix devices to access performance statistics maintained in the kernel. Presumably, these are not heavy-weight operations in most Unix implementations. Indeed, we notice little difference in CPU availability if two instances of either method are executing, which indicates that the load they generate is not measurable given their relative sensitivities. The NWS hybrid, however, uses a short-term spinning probe process which (in this study) executes once per minute. We have determined through experimentation that the shortest probe duration that is useful is 1.5 seconds. The overhead, then, is 1.5 seconds out of every 60, or 2.5%.
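A minimal sketch of the hybrid sensor's selection-and-bias step, as described above, is given below. The function name and decision logic are our illustration of the described behavior, not the NWS source code.

```python
# Sketch of the hybrid sensor's per-probe decision (illustrative only).

def hybrid_step(probe_value, load_avg_estimate, vmstat_estimate):
    """probe_value is the CPU fraction obtained by a short (1.5 s) spinning
    probe; the other two arguments are the passive readings taken at the
    same time.  Returns the method to trust for the next five 10-second
    readings and the bias to add to them."""
    if abs(load_avg_estimate - probe_value) <= abs(vmstat_estimate - probe_value):
        chosen, reading = "load_average", load_avg_estimate
    else:
        chosen, reading = "vmstat", vmstat_estimate
    bias = probe_value - reading
    return chosen, bias

# Example: a nice-19 background job makes both passive methods report a busy
# CPU, but the full-priority probe still obtains ~95% of the processor.
print(hybrid_step(0.95, 0.50, 0.55))   # ('vmstat', bias of about 0.40)
```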

2.2 Measurement accuracy

To determine the accuracy of these three methods, we compare the readings they generate with the percentage of CPU cycles obtained by an independent ten-second, CPU-bound process which we will refer to as the test process. The test process executes and then reports the ratio of the CPU time it received (obtained through the getrusage() system call) to its total execution time (measured in wall-clock time) as the percentage of the CPU it was able to obtain. Measurement error, then, is defined as

    error_t = availability_t - testprocessvalue_t                          (3)

where availability_t is the measurement of CPU availability taken at time t, and testprocessvalue_t is the availability observed by a test process at time t. To avoid possible contention between the sensors and the test process, we use the measurement taken most immediately before the test process executes as availability_t.

Table 1 details the measurement errors we observed for different hosts at UCSD. Each column shows the mean absolute difference between the CPU availability percentage quoted by the corresponding measurement method and the availability percentage observed by the test process.

Table 1. Mean Absolute Measurement Errors during a 24-hour, mid-week period

Host Name    Load Average   vmstat   NWS Hybrid
thing2       9.0%           11.2%    11.1%
thing1       6.4%           7.5%     6.1%
conundrum    34.1%          32.7%    4.4%
beowulf      6.3%           6.5%     7.5%
gremlin      4.0%           3.2%     4.1%
kongo        12.8%          12.9%    41.3%
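A test process of the kind described above can be sketched as follows, assuming a Unix system where the Python resource module exposes getrusage(); the measurement error of Equation 3 is then the sensor reading minus the value this process reports. This is an illustration of the technique, not the exact program used in the study.

```python
# Sketch of a CPU-bound test process: spin for about ten seconds and report
# the ratio of CPU time received (via getrusage()) to wall-clock time elapsed.

import resource
import time

def test_process(duration=10.0):
    start_wall = time.time()
    start_cpu = resource.getrusage(resource.RUSAGE_SELF)
    x = 0.0
    while time.time() - start_wall < duration:
        x += 1.0                       # busy work to keep the CPU occupied
    end_cpu = resource.getrusage(resource.RUSAGE_SELF)
    cpu_used = (end_cpu.ru_utime - start_cpu.ru_utime) + \
               (end_cpu.ru_stime - start_cpu.ru_stime)
    wall = time.time() - start_wall
    return cpu_used / wall             # observed CPU availability, 0.0 to 1.0

if __name__ == "__main__":
    observed = test_process()
    print("test process observed availability: %.1f%%" % (100.0 * observed))
```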

The hosts thing1, thing2, and conundrum are interactive workstations used for research by graduate students, while beowulf, gremlin, and kongo are general departmental servers available to faculty and students. Most of the errors are reasonably small and fairly equivalent across methods, given the dynamic nature of the systems we monitored. An error of 10% or less, for example, is considered useful for scheduling [23]. The notable exceptions are conundrum and kongo. On conundrum, a background process was running with a Unix nice priority of 19 in an attempt to use otherwise unused CPU cycles. The test process, however, runs with full priority, pre-empting the background process. The Unix load average and vmstat methods do not consider process priority and record the system as being busy. The probe bias used by the NWS hybrid method, however, correctly recognizes the priority difference and yields a more accurate measurement.

On kongo the NWS hybrid performs dismally. During the monitoring period, a long-running, full-priority process was executing on kongo. Typical Unix systems increase the rate at which process priority degrades while executing as a function of CPU occupancy. A long-running process, therefore, will be temporarily evicted in favor of a short-running, full-priority process like the probe used by the NWS hybrid sensor. The 1.5-second execution time of the probe is not long enough for it to contend with the long-running process, so the NWS hybrid method does not sense its presence. The ten-second test process, however, executes long enough to share the processor with the resident long-running process and, consequently, receives a fraction better measured by both load average and vmstat. It is possible to increase the probe time of the NWS hybrid sensor, with a corresponding increase in intrusiveness. We are working on less intrusive techniques to address this problem.

For the purposes of predicting availability, the measurement error we observe serves as an upper bound on the accuracy of our forecasts. That is, we do not expect to forecast with greater accuracy than we can measure. In general, at UCSD, the measurement errors we can obtain from these three methodologies are small enough that the measurements prove useful for scheduling. However, obtaining an accurate measure is complicated by the process priority mechanisms employed by Unix, and care must be taken when choosing a measurement methodology.

3 Forecasting

Forecasting, in this setting, is the prediction of the CPU availability that the test process will observe. We treat the histories of measurements generated by each of the methods described in Section 2 as statistical time series. In this section, we discuss our methodology for using these time series to predict CPU availability, and then compare the predictions generated with both subsequent measurements and subsequent test process observations to understand the error involved in the processes of prediction and forecasting. In Section 3.1 we discuss the autocorrelation and self-similar characteristics of these time series, and describe the effect of these characteristics on the accuracy of the predictions. In Section 3.2 we discuss the implications these characteristics have for predictions made over a longer time frame, and present additional results to show the increase in prediction error.

In previous work describing the NWS [29, 30, 31], we have proposed a methodology for making one-step-ahead predictions using computationally inexpensive time-series analysis techniques. Rather than use a single forecasting model, the NWS applies a collection of forecasting techniques to each series, and dynamically chooses the one that has been most accurate over the recent set of measurements. This method of dynamically identifying a forecasting model has been shown to yield forecasts that are equivalent to, or slightly better than, the best forecaster in the set [29]. To be efficient, each of the techniques must be relatively cheap to compute. We have borrowed heavily from methodologies used by the digital signal processing community [19] in our implementation. A complete description of each method and its relative advantages is provided in [29], [19], and [16]. Briefly summarized, each method uses a "sliding window" over previous measurements to compute a one-step-ahead forecast based on some estimate of either the mean or the median of those measurements.

To evaluate the accuracy of each forecast, we examine two forms of error. The first, given by Equation 4, compares a forecast for a specific time frame to the test process observation that is eventually made in that time frame. We term this form of error the true forecasting error, as it represents the actual error a scheduler would observe. Note that, in the one-step-ahead case, the time at which the forecast is generated occurs immediately before the time frame in which the test process runs, hence the subscripts on the forecast_(t-1) and testprocessvalue_t terms respectively. To distinguish the amount of error that results from measurement inaccuracy from the error introduced by prediction, we also compute the one-step-ahead prediction error, as given by Equation 5. This error represents the inaccuracy in predicting the next measurement that will be gathered in a particular series, capturing the predictability of the series.

    trueerror_t = forecast_(t-1) - testprocessvalue_t                      (4)

    predictionerror_t = forecast_(t-1) - availability_t                    (5)

where forecast_(t-1) is the NWS forecast of CPU availability made at time t-1 for time t, and availability_t and testprocessvalue_t are defined in Section 2.

Table 2 shows the mean true forecasting error together with the corresponding mean measurement error (defined in Equation 3 and presented in Table 1) in parentheses. If the true forecasting errors and measurement errors are approximately the same, the process of predicting what the next measurement will be is not introducing much error. Table 3 illustrates this observation further. In it, we show the mean one-step-ahead prediction error, using the NWS forecasting techniques, for each measurement method on each of the systems that we studied. On each of these systems, the one-step-ahead prediction error is less than 5%. It is somewhat surprising that the one-step-ahead prediction error does not contribute more to the overall inaccuracy associated with predicting the test process values.

Table 2. Mean True Forecasting Errors and Corresponding Measurement Errors (in parentheses) for UCSD Hosts during a 24-hour, mid-week period

Host Name    Load Average    vmstat          NWS Hybrid
thing2       8.9% (9.0%)     8.6% (11%)      10% (11%)
thing1       6.4% (6.4%)     7.0% (7.5%)     5.3% (6.1%)
conundrum    34% (34%)       32% (32%)       4.3% (4.4%)
beowulf      6.2% (6.3%)     6.8% (6.5%)     6.9% (7.5%)
gremlin      4.0% (4.0%)     2.6% (3.2%)     3.0% (4.1%)
kongo        12% (12%)       12% (12%)       41% (41%)

The instances in which forecast accuracy is better than measurement accuracy are curious. An analysis of the measurement and forecasting residuals is inconclusive with respect to the significance of this difference. Since the effect is generally small, however, we omit that analysis in favor of brevity and make the less precise observation that measurement and forecasting accuracy are approximately the same.

Table 3. Mean Absolute One-step-ahead Prediction Errors during a 24-hour, mid-week period

Host Name    Load Average   vmstat   NWS Hybrid
thing2       1.2%           4.9%     1.8%
thing1       1.7%           3.1%     2.8%
conundrum    0.4%           0.2%     0.2%
beowulf      1.8%           3.1%     3.5%
gremlin      1.0%           2.1%     2.0%
kongo        0.1%           0.1%     0.1%

that the Hurst parameter of a series is likely to be be- of CPU availability measurements using Unix load average ¡

tween and . To estimate for the data we gath- ! "! # taken from thing1 and thing2. In Figure 2 we show the first # 360 autocorrelations for each series. ered as part of this study, we use R/S analysis [21, 3]

and pox plots [20]. Briefly, for a given set of observa-

. . %

From both the time series and the plot of the autocor- %

¢¡¤£¦¥¨§  ¡ 

tions  , with sample mean

© © ! !!© .

relations, it is clear that events occurring even hours apart % 

and sample variance  , the rescaled adjusted range

. . %

are correlated. However, the slow rate of decay in the au- %

¢

   

statistic or R/S statistic is calculated as 

 . . . %

tocorrelation function is suggestive of self-similarity, and %

¢

£ £

8 ¥'  A A A 8 ¤  A A A 



# © © © ! ! ! # © © ©/!! !

 

. . .

self-similarity is often a manifestation of an unpredictable, %

£

£  ¡ -¡ - -¡¤£ § ¡  § 

where A .

! ! ! #



 . . 

% %"!

chaotic series [5]. Recent studies of network packet traf- %

¢

§   

The expected value  for some some

%$# % 1 1 constant as where is the Hurst parame- 0.9 0.9

0.8 0.8 ter for the series. By partitioning the series of length &

0.7 0.7 +

0.6 0.6 into non-overlapping segments of length , and calculating

. . .

¢

¨+  +

0.5 0.5 ¨+

 &

 for each segment, as we obtain ¨' '

( . . . . .

¢ ¢

0.4 0.4 ¢

+ ¨+ ¨+ ¦¨)3, £,+¢ ¨+ ¨+

   

0.3 0.3 &*) samples of . Plotting

.

£,+"¨+ 0.2 0.2 ¦9)3, 0.1 0.1 versus for each of these samples yields a pox plot 0 0 1

1 for the series. Figure 3 shows pox plots of CPU availabil- 24 47 70 93 24 47 70 93 116 139 162 185 208 231 254 277 300 323 346 116 139 162 185 208 231 254 277 300 323 346 ity measurements using Unix load average for thing1 and Figure 2. CPU Availability Autocorrelations using thing2 over a one-week long period. Unix Load Average for thing1 (left) and thing2 (right). The two dotted lines in each figure depict slopes of

¡ and . By inspection, any sort of “best fit” line for

# ! ¢! # ¡ Host Est. Load Average vmstat NWS Hybrid this data is likely to have a slope greater that and less ! #! orig. 300s orig. 300s orig. 300s

than , hence we can conclude that the Hurst parame- # "! thing2 0.70 0.0348 0.0338 0.0431 0.0351 0.0321 0.0315

ter falls somewhere in this range. In the figures, we thing1 0.70 0.0081 0.0062 0.0103 0.0048 0.0147 0.0090

show a least-squares regression line (solid) for the average conund. 0.79 0.0002 0.0001 0.0003 0.0000 0.0006 0.0009

. . .

. beowulf 0.82 0.0058 0.0039 0.0063 0.0019 0.0151 0.0057

¢

¦¨)3, £,+" ¨+ ¨+ ¦¨)3, £,+¢9+   value for each value of . The gremlin 0.71 0.0038 0.0023 0.0034 0.0011 0.0032 0.0001 slope of this line estimates the Hurst parameter as 0.70 for kongo 0.69 0.0001 0.0001 0.0001 0.0001 0.0004 0.0008 both thing1 and thing2 in the figure. In the second column of Table 4, we give the Hurst parameter estimations for each Table 4. Variance of Original Series and 5 Minute of the hosts in our study using this technique. The perva- Averages (in bold face) siveness of these observations across our set of experiments supports the previous work of Dinda and O’Halloran. We, Host Name Load Average vmstat NWS Hybrid therefore, surmise that the CPU availability exhibits long- thing2 2.4% (1.2%) * 1.7% (4.9%) * 1.3% (1.8%) range autocorrelation and is either self-similar (as noted thing1 4.9% (1.7%) 3.5% (3.1%) 3.9% (2.8%) conundrum 0.7% (0.4%) 0.2% (0.2%) 0.3% (0.2%) in [10]) or short-term self-similar as described in [17]. beowulf 3.4% (1.8%) * 2.3% (3.1%) 4.5% (3.5%) gremlin 2.6% (1.0%) * 1.2% (2.1%) * 1.3% (2.0%) 3.2 Longer term prediction of CPU availability kongo 0.2% (0.1%) 0.1% (0.1%) 0.2% (0.1%)

Table 5. Mean Absolute One-step-ahead Prediction Despite the long-range autocorrelation present in CPU Errors for 5 Minutes Aggregated during a 24-hour, availability measurements, the data in Tables 2 and 3 mid-week period. Unaggregated error, from Table 3, show that one-step-ahead CPU availability is relatively pre- is parenthesized. dictable. The slowly decaying autocorrelation between measurements means that recent history is often a good pre- dictor of the short-term future. That is, self-similarity does Note that the decrease in variance resulting from aggre- not imply short-term unpredictability. gation does not necessarily imply that the aggregated se- Self-similarity does mean, however, that averaging val- ries is more predictable. Table 5 shows the mean abso- ues over successively larger time scales will not produce lute one-step-ahead prediction error (Equation 4) for each

time series which are dramatically smoother. For a self- of the aggregated series using the NWS forecasting method-

¡ £ ¡ ¡ 

similar series , with Hurst parameter , and ology. The one-step-ahead prediction for the aggregated se- © ©/!! !

:£¢



¡¡

aggregation level 8 , the averaged series has variance ries is typically less accurate than for the original series.

:¦¢ ¡¨§

¤ # %

¢

©

¥4£¢¡¥ 8  8

as where For the cases denoted by a * in the table, however, the ag-

©

:£¢

:¦¢

     ¡¥ ¡ §  £

is the series : . In gregated prediction is more accurate than the corresponding ©

:£¢ ¡ other words, the variance of the average values of ¡ de- one-step-ahead, 10 second, prediction. We hypothesize that

creases more slowly than the aggregation level 8 increases smoothing may be more effective for certain time frames

# % as 8 . (aggregation levels) than for others. The prediction error, therefore, may improve for these aggregation levels, and the It is an estimate of average CPU availability (and not smoothed series may be predicted more accurately. This hy- a one-step-ahead prediction for the next 10 second time pothesis supports the similar observations made in [10] and frame) that is most useful to a scheduler, as process exe- [17] regarding the smoothness of aggregated series. In gen- cution time may be span minutes, hours, or days. By the eral, however, the improvement should be small and there relationship shown above, we would expect the variance as- is no trend as a function of aggregation level that we can sociated with a prediction of the average availability over

8 detect. interval to be no worse than that for a one-step-ahead To gauge the true forecasting error in the aggregated

Table 4 compares the variance of the aggregated series X^(m), where m corresponds to a five-minute interval, with that of the original series for each of the hosts in our study. Except for kongo and the NWS hybrid sensor running on conundrum, the variance in each aggregated series is lower, as expected³.

³ The conundrum case is curious, as it appears self-similar but the variance increases as the series is aggregated. We are studying this case to determine why this experiment defies the conventional analysis. The kongo value for the NWS hybrid sensor, however, is due to the leading constant in the expression c * m^(2H-2). Additional aggregation of this series reveals a decreasing variance.

Note that the decrease in variance resulting from aggregation does not necessarily imply that the aggregated series is more predictable. Table 5 shows the mean absolute one-step-ahead prediction error (Equation 5) for each of the aggregated series using the NWS forecasting methodology. The one-step-ahead prediction for the aggregated series is typically less accurate than for the original series. For the cases denoted by a * in the table, however, the aggregated prediction is more accurate than the corresponding one-step-ahead, 10-second prediction. We hypothesize that smoothing may be more effective for certain time frames (aggregation levels) than for others. The prediction error, therefore, may improve for these aggregation levels, and the smoothed series may be predicted more accurately. This hypothesis supports similar observations made in [10] and [17] regarding the smoothness of aggregated series. In general, however, the improvement should be small, and there is no trend as a function of aggregation level that we can detect.

Table 5. Mean Absolute One-step-ahead Prediction Errors for 5-Minute Aggregates during a 24-hour, mid-week period. The unaggregated error, from Table 3, is shown in parentheses; a * marks cases where the aggregated prediction is more accurate.

Host Name    Load Average    vmstat           NWS Hybrid
thing2       2.4% (1.2%)     * 1.7% (4.9%)    * 1.3% (1.8%)
thing1       4.9% (1.7%)     3.5% (3.1%)      3.9% (2.8%)
conundrum    0.7% (0.4%)     0.2% (0.2%)      0.3% (0.2%)
beowulf      3.4% (1.8%)     * 2.3% (3.1%)    4.5% (3.5%)
gremlin      2.6% (1.0%)     * 1.2% (2.1%)    * 1.3% (2.0%)
kongo        0.2% (0.1%)     0.1% (0.1%)      0.2% (0.1%)

To gauge the true forecasting error in the aggregated case, we examine the difference between the forecasted value and the value observed by a test process. This new test process runs for 5 minutes at a time, every 60 minutes. The new forecasted value is derived from the averaged series Xbar, which is calculated as

    Xbar_t = (1/30) * sum over i = t-29 .. t of availability_i

for t varying from 1 to the number of entries in each trace, counting by 30. Since we obtain a measurement every 10 seconds, each Xbar_t value is an average of the measurements taken over five minutes. We then consider a one-step-ahead forecast of each Xbar_t value as a prediction of the average availability during the succeeding five-minute period. To calculate the aggregate true forecasting error, shown in Table 6, we again use Equation 4, where t now indexes five-minute periods. Again, a problem with the bias value used by the NWS hybrid sensor causes the large discrepancy on kongo.

Note that we execute the test process only once every 60 minutes, to prevent the load it induces from driving away potential contention. We feared that more frequent execution of the test process might cause other users to abandon the hosts we were monitoring in favor of more lightly loaded alternatives.

Figure 4 shows the Unix load average CPU availability measurements during the 24-hour experimental period for hosts thing1 and thing2 as example traces. From this figure, it is clear that the systems experienced load during the test period (the apparent periodic signal results from the intrusiveness of the 5-minute test process). Despite the variance in each series, however, the average true forecasting error for a program that occupies the CPU for five minutes is between 5% and 6%.

Table 6. Mean True Forecasting Errors for 5-Minute Average CPU Availability

Host Name    Load Average   vmstat   NWS Hybrid
thing2       6.6%           5.3%     6.5%
thing1       5.6%           5.2%     6.7%
conundrum    3.0%           7.4%     10.1%
beowulf      6.0%           11.4%    11.1%
gremlin      4.3%           2.9%     8.3%
kongo        2.1%           1.9%     28.5%

[Figure 4. 5-Minute Aggregated CPU Availability using Unix Load Average from thing1 (left) and thing2 (right).]
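For reference, the aggregation used throughout this section amounts to the following sketch (function names are illustrative); it averages every 30 consecutive 10-second readings into one 5-minute value and compares the variance of the aggregated series with that of the original, as reported in Table 4.

```python
# Sketch of the five-minute aggregation and variance comparison (illustrative).

from statistics import mean, pvariance

def aggregate(series, m=30):
    """X^(m): non-overlapping averages of m consecutive measurements."""
    return [mean(series[i:i + m]) for i in range(0, len(series) - m + 1, m)]

def variance_report(series, m=30):
    """Return (variance of the original series, variance of the m-aggregate)."""
    return pvariance(series), pvariance(aggregate(series, m))

# For a short-range dependent series the aggregated variance falls roughly as
# 1/m; with H near 0.7-0.8 (Table 4) it falls more slowly, as m**(2H - 2).
```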

4 Conclusions and future work

From the data presented in Sections 2 and 3 we make the following observations about the workstations and computational servers in the UCSD Computer Science and Engineering Department during the experimental period:

- Using conventional, non-privileged Unix utilities, the greatest source of error in making a one-step-ahead prediction of CPU availability comes from the process of measuring the availability of the CPU, and not from predicting what the next measurement value will be.
- Traces of CPU availability exhibit long-range autocorrelation structure and are potentially self-similar.
- Short-term (10-second) and medium-term (5-minute) predictions of CPU availability (including all forms of error) can be obtained whose average error is between 5% and 12%.

In the context of process scheduling, these results are encouraging and somewhat surprising. Often, researchers assume that CPU loads vary to such a degree as to make dynamic scheduling difficult or impossible. While we certainly observe variation which is sometimes large, the series that are generated are fairly predictable. Moreover, the measurement and forecast error combined are small enough that effective scheduling is possible. In [24], for example, we used considerably less accurate measurements to achieve performance gains that were better than 100% in some cases.

Another important realization is that long-range autocorrelation and self-similarity do not necessarily imply short-term unpredictability. Much of the previous excellent analysis work has focused on identifying and explaining self-similar performance behavior, particularly of networks. In those domains, short-term predictability may not be as important as predicting the long term. While it is true that long-term predictions would be useful in a process scheduling context, short-term predictability also has utility.

Lastly, our observations coincide with those made recently by Dinda and O'Halloran [10] with respect to observed autocorrelation structure and Unix load average measurements. This is fortuitous, since several large-scale metacomputing systems [18, 12, 26] use the Unix load average to perceive system load. We extend this previous work by attempting to quantify the measurement error inherent in using load average as a measure of CPU availability, and by quantifying the effectiveness of the current NWS forecasting techniques on this type of data.

In future studies, we wish to expand the types of resources we consider to shared-memory multiprocessors and to collections of workstations that are combined using specialized networks (e.g. the Berkeley NOW [9]). We will also extend our set of experimental subjects to include workstations and computational servers in different production environments.

References

[1] O. Arndt, B. Freisleben, T. Kielmann, and F. Thilo. Scheduling parallel applications in networks of mixed uniprocessor/multiprocessor workstations. In Proceedings of the ISCA 11th Conference on Parallel and Distributed Computing, September 1998.
[2] F. Berman, R. Wolski, S. Figueira, J. Schopf, and G. Shao. Application level scheduling on distributed heterogeneous networks. In Proceedings of Supercomputing 1996, 1996.
[3] J. Beran, R. Sherman, and M. Taqqu. Long-range dependence in variable-bit-rate video traffic. IEEE Transactions on Communications, March 1995.
[4] S. Burnett and S. Fitzgerald. Metacomputing supports large-scale distributed simulations, 1998. Available from http://www.cacr.caltech.edu/sfexpress/sc98.html.
[5] A. Cambel. Applied Chaos Theory. Academic Press, 1993.
[6] B. Carpenter, G. Zhang, G. Fox, X. Li, and Y. Wen. HPJava: Data parallel extensions to Java. In ACM Workshop on Java for High-Performance Network Computing, 1998.
[7] H. Casanova and J. Dongarra. NetSolve: A network server for solving computational science problems. In Proceedings of Supercomputing '96, Pittsburgh. Department of Computer Science, University of Tennessee, Knoxville, 1996.
[8] M. Crovella and A. Bestavros. Self-similarity in World Wide Web traffic: Evidence and possible causes. IEEE/ACM Transactions on Networking, 5, December 1997.
[9] D. Culler, A. Arpaci-Dusseau, R. Arpaci-Dusseau, B. Chun, S. Lumetta, A. Mainwaring, R. Martin, C. Yoshikawa, and F. Wong. Parallel computing on the Berkeley NOW. In 9th Joint Symposium on Parallel Processing, 1997.
[10] P. Dinda and D. O'Halloran. The statistical properties of host load. In the Fourth Workshop on Languages, Compilers, and Run-time Systems for Scalable Computers (LCR98), and CMU Tech. Report CMU-CS-98-143, 1998.
[11] S. Figueira and F. Berman. Modeling the effects of contention on the performance of heterogeneous applications. In Proc. 6th IEEE Symp. on High Performance Distributed Computing, August 1997.
[12] I. Foster and C. Kesselman. Globus: A metacomputing infrastructure toolkit. International Journal of Supercomputer Applications, 1997. To appear.
[13] I. Foster and C. Kesselman, editors. The Grid: Blueprint for a New Computing Infrastructure. Morgan Kaufmann, 1998.
[14] B. Ganguly, G. Bryan, M. Norman, and A. Chien. Exploring structured adaptive mesh refinement (SAMR) methods with the Illinois Concert system. In Proceedings of the SIAM Conference on Parallel Processing, March 1997.
[15] J. Gehring and A. Reinefeld. MARS: A framework for minimizing the job execution time in a metacomputing environment. Proceedings of Future Generation Computer Systems, 1996.
[16] C. Granger and P. Newbold. Forecasting Economic Time Series. Academic Press, 1986.
[17] S. Gribble, S. Manku, D. Roselli, and E. Brewer. Self-similarity in file systems. Available from http://www.cs.berkeley.edu/~gribble.
[18] A. S. Grimshaw, W. A. Wulf, J. C. French, A. C. Weaver, and P. F. Reynolds. Legion: The next logical step toward a nationwide virtual computer. Technical Report CS-94-21, University of Virginia, 1994.
[19] R. Haddad and T. Parsons. Digital Signal Processing: Theory, Applications, and Hardware. Computer Science Press, 1991.
[20] W. Leland, M. Taqqu, W. Willinger, and D. Wilson. On the self-similar nature of Ethernet traffic. IEEE/ACM Transactions on Networking, February 1994.
[21] B. B. Mandelbrot and M. S. Taqqu. Robust R/S analysis of long-run serial correlation. In Proceedings of the 42nd Session of the ISI, volume 48, pages 69-99, 1979.
[22] K. Park, G. Kim, and M. Crovella. On the effect of traffic self-similarity on network performance. In Proceedings of the SPIE International Conference on Performance and Control of Network Systems, November 1997.
[23] J. Schopf and F. Berman. Performance prediction in production environments. In Proceedings of IPPS/SPDP 1998, 1998. Available from http://www.cs.ucsd.edu/users/jenny/TechReports/ipps98.ps.
[24] N. Spring and R. Wolski. Application level scheduling: Gene sequence library comparison. In Proceedings of the ACM International Conference on Supercomputing, July 1998.
[25] T. Sterling, D. Becker, and D. Savarese. Beowulf: A parallel workstation for scientific computation. In Proceedings of the International Conference on Parallel Processing, 1995.
[26] T. Tannenbaum and M. Litzkow. The Condor distributed processing system. Dr. Dobb's Journal, February 1995.
[27] J. Weissman and A. Grimshaw. A framework for partitioning parallel computations in heterogeneous environments. Concurrency: Practice and Experience, 7(5), August 1995.
[28] W. Willinger, M. Taqqu, R. Sherman, and D. Wilson. Self-similarity through high-variability: Statistical analysis of Ethernet LAN traffic at the source level. In SIGCOMM '95 Conference on Communication Architectures, Protocols, and Applications, 1995.
[29] R. Wolski. Dynamically forecasting network performance using the Network Weather Service. Cluster Computing, 1998. Also available from http://www.cs.ucsd.edu/users/rich/publications.html.
[30] R. Wolski, N. Spring, and J. Hayes. The Network Weather Service: A distributed resource performance forecasting service for metacomputing. Future Generation Computer Systems (to appear), 1998.
[31] R. Wolski, N. Spring, and C. Peterson. Implementing a performance forecasting system for metacomputing: The Network Weather Service. In Proceedings of Supercomputing 1997, November 1997.