REJUVENATION

A TERM PAPER FOR CSC 532 SOFTWARE ENGINEERING

Submitted By: Swetha Kendyala

ABSTRACT:
Software rejuvenation is the concept of gracefully terminating an application and immediately restarting it at a clean internal state. In a client-server application, where the server is intended to run perpetually to provide a service to its clients, rejuvenating the server process periodically during its most idle time increases the availability of that service. In a long-running, computation-intensive application, rejuvenating the application periodically and restarting it at a previous checkpoint increases the likelihood of successfully completing its execution. Uses of software rejuvenation at AT&T are described. They include a billing data collection subsystem of a telecommunications operations system, other continuously running systems, and scientific applications. Since rejuvenation typically involves an overhead, an important research issue is to determine when and how often the software should be rejuvenated. Performability modeling of software rejuvenation enables us to answer this question.

INTRODUCTION:
As software continues to become larger and more complex, it is becoming the dominant source of system failures. All software-induced system failures have their cause in design faults. Even so, the sheer complexity of modern software, along with inherent limitations in testing, makes it practically impossible to produce truly fault-free software. Among the various kinds of software faults, "bugs" of a particularly elusive nature have come to light. These bugs, commonly called "Heisenbugs", are characterized by their non-deterministic activation: a second execution of the software, even with the same data, may not result in a failure. Transient software failures of this nature are reported in many instances in the field.

The reason behind the Heisenbug's elusiveness, during testing as well as in the operational phase, is that its activation depends on the operational environment. The exact operational environment that led to the error and failure is unlikely to be reproduced, so the failure is avoided upon a second execution. This is especially true if the environment is deliberately changed or "cleaned". Following this reasoning, software rejuvenation has recently been proposed to avoid failures caused by Heisenbugs. Software rejuvenation is the "periodic preemptive rollback of continuously running applications to prevent failures in the future". Implementing this idea involves "cleaning up the in-memory data structures, respawning the processes at the initial state, logging administrative records, etc."

A typical example where rejuvenation can be beneficial is when the software experiences "memory leaks", which cause a continuous reduction in the amount of free memory. Rejuvenation would then consist of garbage collection, or a hardware reboot in the worst case, to reclaim memory.

SOFTWARE REJUVENATION:
To counteract the phenomenon of software aging, a proactive approach to fault management, called "software rejuvenation", is introduced. It involves occasionally terminating an application or a system, cleaning its internal state and restarting it. This process removes the accumulated errors and frees up resources. It thereby proactively prevents the unplanned and potentially expensive system outages caused by software aging. The preventive action can be done at optimal times, for example when the load on the system is low. It therefore reduces the cost of system downtime compared to reactive recovery from failure.

Thus, software rejuvenation is a cost-effective technique for dealing with software faults. It provides protection not only against hard failures but also against performance degradation.
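The question of when and how often to rejuvenate can be made concrete with a simple cost model. The sketch below rests on several assumptions and is not the performability model mentioned above.

- It substitutes the classical age-replacement cost model.
- It assumes the time to an aging-related failure follows a Weibull distribution. A shape parameter greater than 1 means the failure rate grows with uptime.
- It charges a hypothetical cost c_r per planned rejuvenation and c_f per unplanned failure, with c_f greater than c_r.

Rejuvenating every T hours, or at failure if that comes first, gives a long-run cost per hour of (c_f F(T) + c_r (1 - F(T))) divided by the expected uptime per cycle, which is the integral of 1 - F(t) from 0 to T. Downtime durations are ignored for simplicity. All function names and parameter values are illustrative.

```python
import math


def weibull_cdf(t: float, shape: float, scale: float) -> float:
    """Probability that an aging-related failure has occurred by time t (assumed Weibull)."""
    return 1.0 - math.exp(-((t / scale) ** shape))


def expected_uptime(T: float, shape: float, scale: float, steps: int = 2000) -> float:
    """Expected length of one cycle: integral of 1 - F(t) over [0, T], trapezoidal rule."""
    h = T / steps
    total = 0.0
    for i in range(steps + 1):
        survival = 1.0 - weibull_cdf(i * h, shape, scale)
        # Endpoints get half weight in the trapezoidal rule.
        total += survival / 2 if i in (0, steps) else survival
    return total * h


def cost_rate(T: float, c_r: float, c_f: float, shape: float, scale: float) -> float:
    """Long-run cost per unit time when rejuvenating at age T or recovering at failure."""
    p_fail = weibull_cdf(T, shape, scale)
    # A cycle ends in an unplanned failure with probability F(T), otherwise in rejuvenation.
    expected_cost = c_f * p_fail + c_r * (1.0 - p_fail)
    return expected_cost / expected_uptime(T, shape, scale)


def best_interval(c_r, c_f, shape, scale, candidates):
    """Pick the candidate rejuvenation interval with the lowest long-run cost rate."""
    return min(candidates, key=lambda T: cost_rate(T, c_r, c_f, shape, scale))


if __name__ == "__main__":
    # Hypothetical parameters: time in hours, costs in arbitrary units.
    shape, scale, c_r, c_f = 2.0, 1000.0, 1.0, 20.0
    candidates = [float(T) for T in range(50, 3001, 50)]
    T_best = best_interval(c_r, c_f, shape, scale, candidates)
    print("best interval (hours):", T_best)
    print("cost rate with rejuvenation:", cost_rate(T_best, c_r, c_f, shape, scale))
    # A very large T approximates never rejuvenating.
    print("cost rate, effectively never:", cost_rate(10000.0, c_r, c_f, shape, scale))
```

With these made-up values, the search settles on an interval of roughly 250 hours. Its cost rate is well below that of effectively never rejuvenating. This illustrates why a preventive action scheduled at a chosen time can be cheaper than reacting to failures.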
APPROACHES TO SOFTWARE REJUVENATION:
Software rejuvenation can be divided broadly into two approaches, as follows.

Open-loop approach: In this approach, rejuvenation is performed without any feedback from the system. Rejuvenation in this case can be based just on elapsed time (periodic rejuvenation) and/or the instantaneous or cumulative number of jobs on the system.

Closed-loop approach: In the closed-loop approach, rejuvenation is performed based on information about the system's "health". The system is monitored continuously (in practice, at small deterministic intervals). Data is collected on operating system resource usage and system activity. This data is then analyzed to estimate the time to exhaustion of a resource, which may lead to degradation or a crash of a component or of the entire system. This estimation can be based purely on time, or on both time and system workload. Another way to estimate the optimal time to rejuvenate could be based on system failure data.

The closed-loop approach can also be classified by whether the data analysis is done off-line or on-line. Off-line data analysis uses system data collected over a period of time (usually weeks or months) to estimate the time to rejuvenation. This off-line analysis approach is best suited for systems whose behavior is fairly deterministic.

The on-line closed-loop approach, on the other hand, performs on-line analysis of system data collected at deterministic intervals. The analysis is repeated after every new set of data is collected, to estimate the time to rejuvenation. This approach is very general and can work with systems whose behavior is unpredictable or cannot be easily determined. In this case, future system behavior is computed from the current system parameter values and weighted historical values. This classification of approaches to rejuvenation is shown in the adjacent figure.
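The difference between the open-loop and on-line closed-loop approaches can be sketched as follows.

- The open-loop check looks only at elapsed time and a job count.
- The closed-loop estimator receives one free-memory sample per fixed interval. In the spirit of "weighted historical values", it keeps an exponentially weighted average of the observed depletion rate, so recent samples count more than older ones. It then extrapolates linearly to the time when free memory would reach zero.

The weighting factor alpha, the linear extrapolation, the lead time and all names (open_loop_due, ExhaustionEstimator, closed_loop_due) are hypothetical choices for illustration. They are not the estimation method of any particular system.

```python
from dataclasses import dataclass
from typing import Optional


def open_loop_due(elapsed_s: float, jobs_done: int, period_s: float, max_jobs: int) -> bool:
    """Open-loop trigger: no health feedback, only elapsed time and/or job count."""
    return elapsed_s >= period_s or jobs_done >= max_jobs


@dataclass
class ExhaustionEstimator:
    """Hypothetical on-line estimator of time until free memory is exhausted."""
    interval_s: float            # fixed sampling interval (assumed deterministic)
    alpha: float = 0.3           # weight of the newest rate vs. history (assumed)
    rate: Optional[float] = None # smoothed depletion rate, memory units per second
    last: Optional[float] = None # previous free-memory sample

    def update(self, free_mem: float) -> Optional[float]:
        """Feed one sample; return estimated seconds until exhaustion, or None."""
        if self.last is not None:
            observed = (self.last - free_mem) / self.interval_s
            # Exponentially weighted average of current and historical depletion rates.
            if self.rate is None:
                self.rate = observed
            else:
                self.rate = self.alpha * observed + (1 - self.alpha) * self.rate
        self.last = free_mem
        if self.rate is None or self.rate <= 0:
            return None  # no depletion trend observed yet
        return free_mem / self.rate


def closed_loop_due(time_to_exhaustion_s: Optional[float], lead_time_s: float) -> bool:
    """Closed-loop trigger: rejuvenate once exhaustion is predicted within the lead time."""
    return time_to_exhaustion_s is not None and time_to_exhaustion_s <= lead_time_s


if __name__ == "__main__":
    est = ExhaustionEstimator(interval_s=60.0)
    # Hypothetical free-memory samples (MB), leaking 100 MB per minute.
    for free in [1000.0, 900.0, 800.0, 700.0]:
        tte = est.update(free)
        print(free, tte, closed_loop_due(tte, lead_time_s=450.0))
```

With a steady leak, the estimates shrink by 60 seconds per sample. Only the last sample falls under the 450-second lead time and triggers rejuvenation. Replacing the update rule (for example, with a regression over a sliding window) would change only ExhaustionEstimator.update, while the trigger logic stays the same.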
CONCLUSION:
The software rejuvenation process removes the accumulated errors and frees up operating system resources. It thereby proactively prevents the unplanned and potentially expensive system outages caused by software aging. The preventive action can be done at optimal times, for example when the load on the system is low. It therefore reduces the cost of system downtime compared to reactive recovery from failure. Thus, software rejuvenation is a cost-effective technique for dealing with software faults, providing protection not only against hard failures but also against performance degradation. Numerous examples of software rejuvenation exist in real-life applications. More recently, rejuvenation has been implemented in IBM's xSeries servers to improve performance and availability.

REFERENCES:
1. "Software rejuvenation: analysis, module and applications", by Huang, Y.; Kintala; Kolettis; Fulton, N.D.; Fault-Tolerant Computing, 1995. FTCS-25. Digest of Papers, Twenty-Fifth International Symposium on, 27-30 June 1995, Page(s): 381-390.

2. "Optimal software rejuvenation policy with discounting", by Dohi, T.; Danjou, T.; Okamura, H.; Dependable Computing, 2001. Proceedings. 2001 Pacific Rim International Symposium on, 17-19 Dec. 2001, Page(s): 87-94.

3. "Analysis of software cost models with rejuvenation", by Dohi, T.; Goseva-Popstojanova, K.; Trivedi, K.S.; High Assurance Systems Engineering, 2000, Fifth IEEE International Symposium on, HASE 2000, 15-17 Nov. 2000, Page(s): 25-34.

4. "A dynamic programming algorithm for software rejuvenation scheduling under distributed computation circumstance", by Okamura, H.; Iwamoto, K.; Dohi, T.; Parallel and Distributed Systems, 2005. Proceedings. 11th International Conference on, Volume 2, 20-22 July 2005, Page(s): 493-497.

5. "Performance analysis of software rejuvenation", by Fan Xin-yuan; Xu Guo-zhi; Ying Ren-dong; Zhang Hao; Jiang Le-tian; Parallel and Distributed Computing, Applications and Technologies, 2003. PDCAT'2003. Proceedings of the Fourth International Conference on, 27-29 Aug. 2003, Page(s): 562-566.

6. "Estimation of discrete-time software rejuvenation schedule based on the cost-effectiveness", by Dohi, T.; Iwamoto, K.; Okamura, H.; Kaio, N.; Electronics & Communications in Japan, Part 3: Fundamental Electronic Science, Vol. 87, Issue 6, June 2004, Page(s): 23-31.