Evaluation Issues in Autonomic Computing
Total Page:16
File Type:pdf, Size:1020Kb
Evaluation issues in Autonomic Computing Julie A. McCann, Markus Huebscher Department Of Computing, Imperial College London {jamm, mch1} @doc.ic.ac.uk Abstract • Self-optimization: an autonomic comput- ing system optimises its use of resources. Autonomic Computing is a concept that brings It may decide to initiate a change to the together many fields of computing with the system proactively (as opposed to reactive purpose of creating computing systems that are behaviour) in an attempt to improve per- reflective and self-adaptive. In this paper we formance. draw upon our experience of this field to dis- • Self-healing: an autonomic computing cuss how we can attempt to evaluate auto- system detects and diagnoses problems. nomic systems. By looking at the diverse sys- What kinds of problems are detected can tems that describe themselves as autonomic, be interpreted broadly: they can be as low- we provide an introduction to the concepts of level as a bit-error in a memory chip Autonomic Computing and describe some (hardware failure) or as high-level as an achievements that have already been made. We erroneous entry in a directory service then discuss this work in terms of what is nec- (software problem) [3]. If possible, it essary to evaluate and compare such systems. should attempt to fix the problem, for ex- We conclude with a definitive set of metrics, ample by switching to a redundant compo- which we believe are useful to evaluate nent or by downloading and installing soft- autonomicity. ware updates. However, it is important that as a result of the healing process the system is not further harmed, for example 1. Introduction by the introduction of new bugs or the loss Autonomic computing is generally consid- of vital system settings. Fault-tolerance is ered to be a term first used by IBM in 2001 to an important aspect of self-healing. Typi- describe computing systems that are said to be cally, an autonomic system is said to be self-managing [1]. However the concept of reactive to failures or early signs of a pos- self-management and adaptation in computing sible failure. systems has been around for some time. The • Self-protection: an autonomic system pro- event of the combination of object-oriented tects itself from malicious attacks but also programming paralleled with component-based from end users who inadvertently make software engineering essentially paved the way software changes, e.g. by deleting an im- toward autonomic computing [49]. That is, it portant file. The system autonomously can be argued that without the concepts of tunes itself to achieve security, privacy dynamic re-binding of components a system and data protection. Thus, security is an cannot effectively reconfigure itself to adapt to important aspect of self-protection, not just improve its service. in software, but also in hardware (e.g. When reviewing the current state-of-the art TCPA [23]). A system may also be able to in autonomic systems, the concept of self- anticipate security breaches and prevent management usually groups into having four them from occurring in the first place. basic properties: self-configuration, self- Self-management requires that a system optimization, self-healing and self-protection. monitor its components (internal knowledge) Here is a brief description of these properties and its environment (external knowledge), so (for more information, see [1] and [2]): that it can adapt to changes that may occur, • Self-configuration: an autonomic comput- which may be known changes or unexpected ing system configures itself according to changes where a certain amount of artificial high-level goals, i.e. by specifying what is intelligence may be required. desired, not necessarily how to accomplish However, there is no agreed definition of it. This can mean being able to install itself what an Autonomic system is, their evaluation based on the needs of a given platform and and moreover comparison, is difficult. Fur- the user. thermore the very emergent nature of such systems adds further complexity to the evalua- tion of such systems. This paper is an attempt to look at Autonomic computing and try to way, as many settings such as web proxy highlight areas, which can be used to compare server addresses or VPN security settings must performance and derive some form of metrics. still be entered manually). PCs on a LAN can The structure of this paper is as follows. discover other devices, such as printers, and Initially, in section 2 we survey the area of install their drivers automatically (under the Autonomic computing to attempt to build a right conditions). Further many consider data- map of the subject. To this end we provide an base query optimisers an early form of auto- introduction to the concepts of Autonomic nomic computing [4]. Company Title Salient feature IBM eLiza servers and mainframes respond to unexpected changes and system components’ failure autonomously [3]. LEO autonomic query optimiser for IBM’s DB2 database systems [38]. LEO monitors queries as they execute and compares its estimates of the cost for each step in the query execution plan with actual results. The detection of estimation errors can also trigger reoptimisation of a query in mid-execution. Blue Gene/L IBM’s a supercomputer to be released in 2005, will implement self-healing and self- (BG/L), management [3] using application programmers placing checkpoints in code. IceCube The server tracks its components’ health and maximum capacity. It can then autono- mously distribute data among the available nodes. Failed nodes (bricks) are worked around automatically, [17] Sun Jini network allow Systems experiencing failure of a component could find components on the network with the required functionality that are still functioning and then reallocate resources to them autonomously Grid Engine Deadline policies can be used to make sure that projects nearing the deadline receive Enterprise Edi- a greater share of resources. Then share based policies consider the accumulated tion resource usage of a user, so that if a user “over-uses” resources, then the grid will lower the entitlement of that user to resources for a certain period of time [39]. HP Superdome adding self-healing properties to replace a processor if there are too many errors to automatically fix Figure 1. Sample Commercial Autonomic Effort Computing and describe some research that is However our definition of what we mean taking place in various fields of computing and by autonomic computing is that of a self- some achievements that have already been adaptive system as opposed to an adaptive made, section 3. After that in section 4, we system. Therefore, here standard query opti- concentrate on research in the field of software misers would not be considered as providing engineering and describe projects that focus on autonomicity. However if while a query was adding autonomic behaviour to software sys- running and the DBMS was monitoring the tems. Finally, in sections 5 and 6 we combine query’s execution and deciding on a different this work together with a discussion on per- query plan, then we would consider that auto- formance evaluation and benchmarking, taking nomic. Nevertheless, we realise that the into account our experiences of measuring boundary from adaptive to self-adaptive sys- autonomic systems and provide some initial tems is fuzzy. ideas on how such systems can be compared. 3. Why Autonomic computing 2. Autonomic computing today In trying to understand how to evaluate an The ideas behind autonomic computing are not autonomic system one much understand the new. In fact, it is possible to find some aspects reason we would want such a system. This of autonomic computing already in today’s allows us to compare whether or not the objec- software products [2]. For instance, Windows tive has been met. XP optimises its user interface (UI) by creating The main reason for large blue-chip com- a list of most often used programs in the start panies, like IBM, being interested in auto- menu. Thus, it is self-configuring in that it nomic computing is the need to reduce the cost adapts the UI to the behaviour of the user, and complexity of owning and operating an IT although in a fairly basic way, by monitoring infrastructure [4]. In particular, there is a need what programs are called most often. It can to alleviate the complexity with which system also download and install new critical updates administrators of IT services are faced today. without user intervention, sometimes without The aim is to allow administrators to specify restarting the system. Therefore, it also exhib- high-level policies that define the goals of the its basic self-healing properties. DHCP and autonomic system, and let the system manage DNS services allow devices to self-configure itself to accomplish these goals. At present, to access a TCP/IP network (albeit in a limited system administrators must tweak hundreds of settings and often spend weeks before getting a in particular adhoc networking [52]. For ex- system to run optimally. Autonomic systems ample, Liu and Martonosi [22] discuss the are also faster at adapting to changes to the problem of propagating software updates in a environment, e.g. by distributing its resources wireless network of devices that are spread differently when a critical-project requires over a large area and are not all reachable from more CPU processing power. Furthermore, as a base station. Sensors co-operate to propagate information systems in enterprises grow larger, software updates