Rebootless Kernel Patching with Ksplice Uptrack at BNL Christopher Hollowell James Pryor Jason Smith

Introduction Ksplice/Oracle Uptrack is a software tool and update subscription service which allows system administrators to apply security and bug fix patches to the running on servers/workstations without rebooting them. The software is actively being used at the RHIC/ATLAS Computing Facility (RACF) at Brookhaven National Laboratory (BNL), on roughly 2,000 hosts running Scientific Linux and Enterprise Linux.

Uptrack At RACF Primarily being utilized on our processor farm.

Figure 1 - Ksplice Uptrack Web Status GUI Also installed on critical hosts where reboots are highly disruptive. Oracle Acquisition Oracle acquired Ksplice on July 21, 2011. Rebooting our processor farm systems requires advance . Continued support for existing customers' Red Hat -Most of our users do not checkpoint the progress of Enterprise and Scientific Linux installations. their batch or interactive jobs. -No impact on our operation so far. -Many jobs executed on our farm are long running: -New customers may only obtain licenses for often 24 hours or more. Uptrack use on Oracle Enterprise Linux. -A large percentage of the nodes have dual roles as -If enough people in the HEP/NP community dCache, XROOTD, or HDFS data servers. request RHEL/SL Uptrack licenses, perhaps we can change Oracle's position? Scheduling reboots, and implementing them in a “rolling” manner (subsets of the farm at a time) can take Still possible to download the “raw” Ksplice utility days or weeks. -Software released under the GNU GPL. -Our facility supports many experiments: requires -Allows one to create and insert their own interaction and compromise with several different rebootless kernel patches. computing coordinators. -Assuming there are no changes affecting the -Takes a considerable amount of administrative time layout or semantics of data structures, source to implement and execute. patches from RHEL can be used without -Systems are exposed to critical kernel vulnerabilities modification. until they are rebooted. -According to Ksplice's engineers, this is usually the case for security updates. Experience With Uptrack -http://oss.oracle.com/ksplice/software We've had a positive experience with the software since we began using it in April 2011. How Ksplice Works -Updates generally become available on or to Compiles the kernel from the original source with and the day they are released by the Scientific Linux without the specified . (SL) developers. -Object files are compared for differences. -Occasionally Uptrack updates are available even -”-ffunction-sections” passed to gcc to simplify object before they are released for SL. code analysis: each function is assigned its own ELF -Uptrack has minimized the time we are exposed to section. critical kernel vulnerabilities, and eliminated user Kernel modules are impact. inserted which contain -Uptrack updates have caused no instabilities/crashes the modified functions, on our hosts. and code which -Ksplice/Oracle has been very flexible with us. replaces the first -They modified their software such that it would instruction of each function with a locally caching site proxy. All our original function in systems obtain their updates through this host. memory with a “jmp” -While only security updates are automatically instruction to the provided through the Uptrack subscription patched version. service, a bug fix update was provided when we requested it: Before the running -RHEL Bugzilla #596548: kernel is modified, it is dentry_stat->nr_unused corruption quiesced by calling “stop_machine()”. Applying Updates Checks are also Very straightforward. performed to verify that -Run “uptrack-upgrade -y” on the hosts to be updated. the stack traces of all -Status monitoring via a supplied web interface. kernel threads do not -”uptrack-show” displays local updates. contain references to Figure 2 – Using the “raw” Ksplice utility to create and replaced functions. install a custom rebootless patch to sys_open() on a machine at BNL. When loaded, the change causes the kernel to log calling name, pid, and filename each time the () system call is executed.