Non-intrusive, Out-of-band and Out-of-the-box Systems Monitoring in the Cloud
Sahil Suneja, University of Toronto ([email protected])
Canturk Isci, IBM T.J. Watson Research ([email protected])
Vasanth Bala, IBM T.J. Watson Research ([email protected])
Eyal de Lara, University of Toronto ([email protected])
Todd Mummert, IBM T.J. Watson Research ([email protected])
ABSTRACT

The dramatic proliferation of virtual machines (VMs) in datacenters and the highly-dynamic and transient nature of VM provisioning has revolutionized datacenter operations. However, the management of these environments is still carried out using re-purposed versions of traditional agents, originally developed for managing physical systems, or most recently via newer virtualization-aware alternatives that require guest cooperation and accessibility. We show that these existing approaches are a poor match for monitoring and managing (virtual) systems in the cloud due to their dependence on guest cooperation and operational health, and their growing lifecycle management overheads in the cloud. In this work, we first present Near Field Monitoring (NFM), our non-intrusive, out-of-band cloud monitoring and analytics approach that is designed based on cloud operation principles and to address the limitations of existing techniques. NFM decouples system execution from monitoring and analytics functions by pushing monitoring out of the target systems' scope. By leveraging and extending VM introspection techniques, our framework provides simple, standard interfaces to monitor running systems in the cloud that require no guest cooperation or modification, and have minimal effect on guest execution. By decoupling monitoring and analytics from target system context, NFM provides "always-on" monitoring, even when the target system is unresponsive. NFM also works "out-of-the-box" for any cloud instance as it eliminates any need for installing and maintaining agents or hooks in the monitored systems. We describe the end-to-end implementation of our framework with two real-system prototypes based on two virtualization platforms. We discuss the new cloud analytics opportunities enabled by our decoupled execution, monitoring and analytics architecture. We present four applications that are built on top of our framework and show their use for across-time and across-system analytics.

Categories and Subject Descriptors

K.6.4 [Management of Computing and Information Systems]: System Management—Centralization/decentralization; D.4.7 [Operating Systems]: Organization and Design—Distributed systems; C.5.0 [Computer System Implementation]: General; C.4 [Performance of Systems]: Design studies

Keywords

Virtualization; Virtual Machine; Cloud; Data Center; Monitoring; Analytics; Agentless; VMI

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
SIGMETRICS'14, June 16–20, 2014, Austin, Texas, USA.
Copyright 2014 ACM 978-1-4503-2789-3/14/06 ...$15.00.
http://dx.doi.org/10.1145/2591971.2592009

1. INTRODUCTION

Cloud computing and virtualization technologies are dramatically changing how IT systems operate. What used to be a relatively static environment, with fixed physical nodes, has quickly transformed into a highly-dynamic environment, where (clusters of) virtual machines (VMs) are programmatically provisioned, started, replicated, stopped and deprovisioned with cloud APIs. VMs have become the processes of the cloud OS, with short lifetimes and a rapid proliferation trend [24].

While the nature of data center operations has changed, the management methodology of these (virtual) machines has not adapted appropriately. Tasks such as performance monitoring, compliance and security scans, and product discovery, amongst others, are carried out using re-purposed versions of tools originally developed for managing physical systems, or via newer virtualization-aware alternatives that require guest cooperation and accessibility. These approaches require a communication channel, i.e., a hook, into the running system, or the introduction of a software component, i.e., an agent, within the system runtime. There are two key problems with the existing approaches.

First, the proliferation of VMs and their ephemeral, short-lived nature makes the cost of provisioning and maintaining hooks and agents a major pain point. Moreover, the monitoring extensions "pollute" the end-user system, interfere with guest execution, and potentially open up new points of vulnerability. A recent observation from Amazon Web Services [3] highlights how agent operation and maintenance issues can impact managed systems. In this case, an incomplete maintenance update for DNS configuration in some of the agents, coupled with a memory leak issue, led to a performance degradation for part of Amazon's storage services. This observation drives the first research question we address in our work: How can we perform IT operations (monitoring, compliance, etc.) without relying on guest cooperation or in-VM hooks?

Second, existing techniques work only so long as the monitored systems function properly, and they fail once a system becomes unresponsive—exactly when such monitoring information is the most valuable. A recent Google outage [12] presents a prime example of the effect of this limitation, where a significant portion of Google's production systems became unresponsive due to a dynamic loader misconfiguration. As a result, none of the in-system agents could publish data outside, nor was it possible to log in to the impacted systems for manual diagnosis. Thus, it was extremely difficult to get system information when it was most crucial. This observation drives our second research question: How can we monitor and manage systems even when they become unresponsive or are compromised?

To address these research challenges, this paper introduces Near Field Monitoring (NFM), a new approach for system monitoring that leverages virtualization technology to decouple system monitoring from system context. NFM extends VM introspection (VMI) techniques, and combines these with a backend cloud analytics platform to perform monitoring and management functions without requiring access into, or cooperation of, the target systems. NFM crawls VM memory and disk state in an out-of-band manner, from outside the guest's context, to collect system state which is then fed to the analytics backend. The monitoring functions simply query this systems data, instead of accessing and intruding on each running system. In stark contrast with existing techniques, NFM seamlessly works even when a system becomes unresponsive (always-on monitoring), and does not require the installation and configuration of any hooks or agents in the target system (it works out-of-the-box). Unlike the in-VM solutions that run within the guest context and compete for resources allocated to the guest VMs, NFM is non-intrusive and does not steal guests' cycles or interfere with their actual operation. We believe our approach lays the foundation for the right way of monitoring systems in the cloud, very much like how we monitor processes in an OS today. NFM is better suited for responding to the ephemeral nature of VMs and further opens up new opportunities for cloud analytics by decoupling VM execution and monitoring.

There are three primary contributions of our work. First, we present our novel non-intrusive, out-of-band cloud monitoring and analytics framework, which we fully implement on real systems. It provides a service to query live as well as historical information about the cloud, with cloud monitoring and analytics applications acting as clients of this service. Our work presents how we can treat systems as documents and leverage familiar paradigms from the data analytics domain, such as document differencing and semantic annotations, to analyze systems. We also develop methods for low-latency, live access to VM memory with optional consistency support, as well as optimizations that enable subsecond monitoring of systems. Second, we build several applications on top of our NFM framework based on actual enterprise use cases. These include (i) a cloud topology discovery and evolution tracker application; (ii) a cloud-wide realtime resource monitor providing a more accurate and holistic view of guests' resource utilization; (iii) an out-of-VM console-like interface enabling administrators to query system state without having to log into guest systems, as well as a handy "time travel" capability for forensic analysis of systems; and (iv) a hypervisor-paging-aware out-of-VM virus scanner that demonstrates how across-stack knowledge of system state can dramatically improve the operational efficiency of common management applications like virus scanning. Finally, we present a quantitative evaluation showcasing NFM's high accuracy, monitoring frequency, reliability and efficiency, as well as its low impact on monitored systems.

The rest of the paper is organized as follows. Section 2 summarizes the techniques employed today for enterprise virtual systems monitoring. Section 3 gives a high-level view of our solution architecture and discusses its benefits over existing alternatives. Section 4 gives the implementation details. Section 5 describes the applications we have built to demonstrate NFM's capability. Section 6 evaluates NFM's performance. Section 7 presents related work and Section 8 offers our conclusions.

2. EXISTING TECHNIQUES

System monitoring has been a major part of enterprise IT operations. We categorize the various techniques employed today as follows:

1. Application-specific in-VM agents for monitoring guest systems.
2. Application-specific hooks for remotely accessing and monitoring systems.
3. Limited black-box metrics collected by the virtualization layer without guest cooperation.
4. General-purpose agents or hooks that provide generic in-VM information through the virtualization layer.

Existing cloud monitoring and management solutions employ one or more of the above methods to deliver their services. For example, Amazon's CloudWatch [2] service falls under the third category in its base operation, while it can be extended by the end users with in-VM data providers (as in the first category) to provide deeper VM-level information. Other solutions, such as Dell Quest/VKernel Foglight [14], are a combination of all three—guest remote access, in-VM agents, and hypervisor-level metrics exported by VM management consoles like VMware vCenter and Red Hat Enterprise Management. To mitigate the limitations, and in particular the intrusiveness, of custom in-VM techniques, an emerging approach has been the use of general-purpose agents or backdoors inside the VMs that supply in-VM state information to the virtualization layer (fourth category). Various virtualization extensions such as VMware Tools [48], VMSafe API [50], the VMCI driver [47] and vShield Endpoint [49] follow this approach. Other solutions also employ similar techniques and are summarized in Section 7.

There are important caveats with these options. First, in-VM solutions are only as good as the monitored system's operational health. This poses an interesting conundrum, where the system information becomes unavailable exactly when it is most critical—when the VM hangs, is unresponsive or is subverted. Second, in-VM solutions face an uphill battle against the emerging cloud operation principles—ephemeral, short-lived VMs—and increasing VM proliferation. Their maintenance and lifecycle management has become a major pain point in enterprises. Furthermore, both agents and hooks modify the target systems, interfere with their operation, consume end-user cycles, and are prime candidates for security vulnerabilities due to their external-facing nature. Third, even with generic-agent-based approaches, the problems of guest cooperation and intrusion do not completely disappear—although they are mitigated to some extent. However, these approaches create a new challenge that goes against one of the key aspects of cloud computing: portability. Generic agents/hooks require specialization of VMs for the virtualization layer providing the monitoring APIs, which leads to vendor lock-in. Additionally, custom solutions need to be designed to work with each such agent provider.

3. NFM'S DESIGN

[Figure 1: Introspection and analytics architecture]

NFM's key differentiating value is its ability to provide monitoring and management capabilities without requiring guest system cooperation and without interfering with the guest's runtime environment. Architecturally, our overall framework, depicted in Figure 1, can be viewed as an introspection frontend and an analytics backend, which enables the decoupling of VM execution and monitoring. The frontend provides an out-of-band view into the running systems, while the backend extracts and maintains their runtime state. The analogy is that of a Google-like service running atop the cloud, enabling a query interface to seek live, as well as historical, information about the cloud. Cloud monitoring and analytics applications then simply act as the clients of this service. We develop introspection techniques that build upon and extend traditional VMI approaches [20, 30] to gain an out-of-band view of VM runtime state. The analytics backend triggers VM introspection and accesses exposed VM state via Crawl APIs. Through these APIs, our Crawl Logic accesses raw VM disk and memory structures. VM disk state is accessed only once, before crawling a VM for the first time, to extract a small amount of information on the VM's OS configuration. The Crawl Logic uses this information to access and parse raw VM memory to create a structured, logical view of the live VM state. We refer to this crawl output as a frame, representing a point-in-time view of the VM state. All frames of all VMs are stored in a Frame Datastore that the cloud monitoring and management applications run against, to perform their core functions, as well as to run more complex, cloud-level analytics across time or across VMs. By decoupling monitoring and management from the VM runtime, we enable these tasks to proceed without interfering with VM execution. The VMs being monitored are never modified in our approach, nor is the hypervisor or the VMM.

NFM is designed to alleviate most of the issues with existing solutions (Section 2). We follow three key principles to achieve this in a form that is tailored to cloud operation principles:

1. Decoupled Execution and Monitoring: By decoupling target system execution from our monitoring and analytics tasks, we eliminate any implicit dependency between the two. NFM operates completely out of band, and can continue tracking system state even when a system is hung, unresponsive or compromised.

2. No Guest Interference or Enforced Cooperation: Guest context and cycles are precious and belong to the end user. Unlike most agent- or hook-based techniques that are explicitly disruptive to system operation, our design is completely non-intrusive. NFM does not interfere with guest system operation at all, nor does it require any guest cooperation or access to the monitored systems.

3. Vendor-agnostic Design: Our design is based on a generic introspection frontend, which provides standard crawl APIs. Our contract with the virtualization layer covers only base VMI functions—commonly available across hypervisors—exposing VM state in its rawest form (disk blocks and memory bytes). We do not require any custom APIs or backdoors between the VM and the virtualization layer in our design. All the data interpretation and custom intelligence are performed at the analytics backend, which also means simplified manageability as opposed to maintaining in-VM agents or hooks across several VMs.

In addition to alleviating some of the limitations of existing solutions, our design further opens up new opportunities for cloud monitoring and analytics. First, the decoupling of VM execution and monitoring inherently achieves computation offloading, where the monitoring/analysis computation is carried out outside the VM. This enables us to run some heavy-handed, complex analytics, such as full compliance scans, with no impact on the actual systems. Second, many existing solutions actually track similar system features. By providing a single data provider (the backend datastore) for all analytics applications, we eliminate redundancy in information collection. Third, by shifting the scope of collected systems data from individual VMs to multiple cloud instances at the backend, we enable across-VM analytics, such as VM patterns and topology analysis, with simple analytics applications running against the datastore. Fourth, along with VM-level metrics, NFM is also exposed to host-level resource accounting measures, enabling it to derive a holistic and true view of VMs' resource utilization and demand characteristics.

NFM supports arbitrarily complex monitoring and analytics functions on cloud VMs with no VM interference and no setup prerequisites for the users. Our framework is designed to serve as the cornerstone of an analytics-as-a-service (AaaS) cloud offering, where users can seamlessly subscribe and unsubscribe to various out-of-the-box monitoring and analytics services, with no impact on their execution environments. These services span a wide range, from simple resource and security monitoring to across-the-cloud (anonymous) comparative systems analysis. Users would have the choice to opt into this service, paying for the cycles the hypervisor and the monitoring/analytics subsystem spend on their behalf. One consideration for such a service is privacy for end users; however, guests already relinquish the same level of control to the existing agents running inside their systems. We only argue in favour of the same level of trust, without the downside of having a potentially insecure foreign entity installed inside their runtime environment.

One limitation of NFM is a by-product of its reliance on VMI to crawl VM memory state, which involves interpreting kernel data structures in memory. These data structures may vary across OS versions and may not be publicly documented for proprietary OSes. We discuss the tractability of this in our implementation discussion (Section 4.2). Also, as NFM focuses on the OS-level system view, application-level (e.g., a MapReduce worker's health) or architectural (e.g., hardware counters) state is not immediately observable via NFM unless exposed by the guest OS, although the hypervisor might allow measuring certain architectural states (e.g., perf counters) for a guest from outside.

4. IMPLEMENTATION

We implement NFM in two real-system prototypes based on two virtualization platforms, Xen and KVM. Currently we only support Linux guests; support for other OSes is discussed at the end of Section 4.2. Our overall implementation can be broken into four parts: (i) exposing VM runtime state, i.e., accessing VM memory and disk state and making them available over the crawl APIs; (ii) exploiting VM runtime state, i.e., interpreting the memory and disk images with the Crawl Logic to reconstruct the guest's runtime information; (iii) persisting this VM state information in the Frame Datastore for the subscribed applications; and (iv) developing high-level applications that build on top of the extracted VM runtime state.

4.1 Exposing VM State

Most hypervisors provide different methods to obtain a view of guest VM memory. VMware provides the VMSafe APIs [50] to access guest memory. Xen [7] provides a userspace routine (xc_map_foreign_range), via its Xen Control library (libxc), to re-map a guest VM's memory into a privileged VM. We use this technique to obtain a live read-only handle on the VM's memory. KVM [26] does not have default support for live memory handles. Although other techniques [8, 22] are able to provide guest memory access by modifying QEMU, our goal is to use stock hypervisors for generality. For KVM, options exist to dump VM memory to a file, via QEMU's pmemsave memory snapshotting, migrate-to-file, or libvirt's dump. While the VM's memory can be exposed in this manner, the associated overheads are non-trivial, as the guest gets paused for the duration of the dump. Therefore, for KVM, we develop an alternative solution to acquire a live handle on VM memory, while inducing negligible overhead on the running VM. As KVM is part of a standard Linux environment, we leverage Linux memory management primitives and access VM memory via the QEMU process' /proc/ [...] decoupled from VM execution, and the upstream logic is transparent to the underlying virtualization technology.

Enhancements. (i) While our methods for exposing VM state generally have negligible latencies for most cloud monitoring and analytics applications, we build further optimizations into our techniques to reduce their latencies. Among these, one notable optimization is selective memory extraction for applications that rely on near-realtime information. The key insight behind this is that most applications actually require access to only a tiny fraction of the VM memory space, and therefore we can trade off completeness for speed. After an initial full crawl, we identify the VM memory regions that hold the relevant information, and in subsequent iterations we opportunistically crawl only those sections. For dump-based handles, we batch the relevant distributed memory regions together, coalescing the many regions that contain the relevant kernel data structures into a small number of larger-sized chunks, to optimize the dump size and minimize dump cycles. Since some data structures, like task lists, grow dynamically, we add buffering and resizing heuristics to accommodate them. Overall, with such optimizations, we further reduce our overall latency impact by an order of magnitude. For example, for one of our realtime applications, CTop, this approach enables subsecond-granularity realtime system monitoring even with heavyweight, dump-based methods, reducing overall latencies from a few seconds to milliseconds.

(ii) We do not expect the guest OS to be in a steady state while extracting its live runtime state from outside the guest context. Since we avoid making any monitoring-specific changes to the guest and do not enforce any guest cooperation, the lack of synchronization with the guest OS can potentially lead to inconsistency between the actual runtime state that exists inside the VM and what gets reconstructed from the view exposed outside, e.g., a process terminating inside the guest while its memory mappings (mm_struct) are being traversed outside. To tackle this, we build additional routines for (optional) consistency support for KVM guest memory access via ptrace()-attach/detach on the QEMU container process. This trades off minor VM stuns for increased consistency. Our experiments show that even with a heavy (10 times/s) fork-and-kill workload consuming significant CPU, memory and network resources inside a VM, inconsistency of extracted state occurs rarely—in about 2% of crawler iterations—while extracting full system runtime state (Section 6.1).

4.2 Exploiting VM State

The backend attaches to the VM view exposed by the
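The live-handle idea from Section 4.1—reading a KVM guest's memory through the QEMU process's /proc entry without pausing the VM—can be illustrated with a minimal sketch. This is not the paper's implementation: for simplicity it reads the monitoring process's own address space (attaching to a foreign QEMU process additionally requires ptrace permissions), but the /proc/<pid>/mem mechanism is the same.

```python
import ctypes
import os

def read_process_memory(pid, addr, length):
    """Read `length` bytes at virtual address `addr` of process `pid`
    through the /proc/<pid>/mem pseudo-file (Linux). For a foreign
    process this requires ptrace permission; an NFM-style crawler would
    target the QEMU process hosting the guest's physical memory."""
    with open(f"/proc/{pid}/mem", "rb", buffering=0) as mem:
        mem.seek(addr)
        return mem.read(length)

# Self-contained demonstration: place a known value in our own address
# space and read it back "out of band" via /proc, with no cooperation
# from the code that owns the buffer.
buf = ctypes.create_string_buffer(b"NFM live-handle demo")
data = read_process_memory(os.getpid(), ctypes.addressof(buf), 20)
print(data)  # b'NFM live-handle demo'
```

The read is zero-copy from the guest's perspective: unlike pmemsave-style dumps, the target process keeps running while its pages are inspected.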
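The contributions describe treating systems as documents and applying document differencing across point-in-time frames. A minimal sketch of that idea over two hypothetical frames (the real frame format is far richer than the flat pid-to-command map assumed here):

```python
def diff_frames(old_frame, new_frame):
    """Compare two point-in-time crawl frames and report which entries
    (here: processes keyed by pid) appeared, disappeared, or changed --
    document-style differencing over extracted system state."""
    added = {k: new_frame[k] for k in new_frame.keys() - old_frame.keys()}
    removed = {k: old_frame[k] for k in old_frame.keys() - new_frame.keys()}
    changed = {k: (old_frame[k], new_frame[k])
               for k in old_frame.keys() & new_frame.keys()
               if old_frame[k] != new_frame[k]}
    return added, removed, changed

# Two hypothetical frames of one VM's process table, crawled out of band.
frame_t0 = {1: "init", 812: "sshd", 997: "nginx"}
frame_t1 = {1: "init", 812: "sshd", 997: "nginx -s reload", 1201: "cron"}
added, removed, changed = diff_frames(frame_t0, frame_t1)
print(added)    # {1201: 'cron'}
print(changed)  # {997: ('nginx', 'nginx -s reload')}
```

Because frames live in the Frame Datastore rather than inside the guests, the same comparison runs across time for one VM or across VMs, with no access to the running systems.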