The Ganglia Distributed Monitoring System: Design, Implementation, and Experience
Matthew L. Massie (Univ. of California, Berkeley) · Brent N. Chun (Intel Research Berkeley) · David E. Culler (Univ. of California, Berkeley)
[email protected], [email protected], [email protected]

Abstract

Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids. It is based on a hierarchical design targeted at federations of clusters. It relies on a multicast-based listen/announce protocol to monitor state within clusters and uses a tree of point-to-point connections amongst representative cluster nodes to federate clusters and aggregate their state. It leverages widely used technologies such as XML for data representation, XDR for compact, portable data transport, and RRDtool for data storage and visualization. It uses carefully engineered data structures and algorithms to achieve very low per-node overheads and high concurrency. The implementation is robust, has been ported to an extensive set of operating systems and processor architectures, and is currently in use on over 500 clusters around the world. This paper presents the design, implementation, and evaluation of Ganglia along with experience gained through real world deployments on systems of widely varying scale, configurations, and target application domains over the last two and a half years.

1 Introduction

Over the last ten years, there has been an enormous shift in high performance computing from systems composed of small numbers of computationally massive devices [18, 12, 11, 19] to systems composed of large numbers of commodity components [3, 9, 4, 7, 6]. This architectural shift from the few to the many is causing designers of high performance systems to revisit numerous design issues and assumptions pertaining to scale, reliability, heterogeneity, manageability, and system evolution over time. With clusters now the de facto building block for high performance systems, scale and reliability have become key issues, as many independently failing and unreliable components need to be continuously accounted for and managed over time. Heterogeneity, previously a non-issue when running a single vector supercomputer or an MPP, must now be designed for from the beginning, since systems that grow over time are unlikely to scale with the same hardware and software base. Manageability also becomes of paramount importance, since clusters today commonly consist of hundreds or even thousands of nodes [7, 6]. Finally, as systems evolve to accommodate growth, system configurations inevitably need to adapt. In summary, high performance systems today have sharply diverged from the monolithic machines of the past and now face the same set of challenges as large-scale distributed systems.

One of the key challenges faced by high performance distributed systems is scalable monitoring of system state. Given a large enough collection of nodes and the associated computational, I/O, and network demands placed on them by applications, failures in large-scale systems become commonplace. To deal with node attrition and to maintain the health of the system, monitoring software must be able to quickly identify failures so that they can be repaired either automatically or via out-of-band means (e.g., rebooting). In large-scale systems, interactions amongst the myriad computational nodes, network switches and links, and storage devices can be complex. A monitoring system that captures some subset of these interactions and visualizes them in interesting ways can often lead to an increased understanding of a system's macroscopic behavior. Finally, as systems scale up and become increasingly distributed, bottlenecks are likely to arise in various locations in the system. A good monitoring system can assist here as well by providing a global view of the system, which can be helpful in identifying performance problems and, ultimately, in capacity planning.

Ganglia is a scalable distributed monitoring system that was built to address these challenges. It provides scalable monitoring of distributed systems at various points in the architectural design space, including large-scale clusters in a machine room, computational Grids [14, 15] consisting of federations of clusters, and, most recently, an open, shared planetary-scale application testbed called PlanetLab [21]. The system is based on a hierarchical design targeted at federations of clusters. It relies on a multicast-based listen/announce protocol [29, 10, 1, 16] to monitor state within clusters and uses a tree of point-to-point connections amongst representative cluster nodes to federate clusters and aggregate their state. It leverages widely used technologies such as XML for data representation, XDR for compact, portable data transport, and RRDtool for data storage and visualization. It uses carefully engineered data structures and algorithms to achieve very low per-node overheads and high concurrency. The implementation is robust, has been ported to an extensive set of operating systems and processor architectures, and is currently in use on over 500 clusters around the world.

This paper presents the design, implementation, and evaluation of the Ganglia distributed monitoring system along with an account of experience gained through real world deployments on systems of widely varying scale, configurations, and target application domains. It is organized as follows. In Section 2, we describe the key challenges in building a distributed monitoring system and how they relate to different points in the system architecture space. In Section 3, we present the architecture of Ganglia, a scalable distributed monitoring system for high-performance computing systems. In Section 4, we describe our current implementation of Ganglia, which is currently deployed on over 500 clusters around the world. In Section 5, we present a performance analysis of our implementation along with an account of experience gained through real world deployments of Ganglia on several large-scale distributed systems. In Section 6, we present related work, and in Section 7, we conclude the paper.

2 Distributed Monitoring

In this section, we summarize the key design challenges faced in designing a distributed monitoring system. We then discuss key characteristics of three classes of distributed systems where Ganglia is currently in use: clusters, Grids, and planetary-scale systems. Each class of systems presents a different set of constraints and requires making different design decisions and trade-offs in addressing our key design challenges. While Ganglia's initial design focus was scalable monitoring of a single cluster, it has since naturally evolved to support other classes of distributed systems as well. Its use on computational Grids and its recent integration with the Globus metadirectory service (MDS) [13] is a good example of this. Its application on PlanetLab is another, one which has also resulted in a reexamination of some of Ganglia's original design decisions.

2.1 Design Challenges

Traditionally, high performance computing has focused on scalability as the primary design challenge. The architectural shift towards increasingly distributed and loosely coupled systems, however, has raised an additional set of challenges. These new challenges arise as a result of several factors, including increased physical distribution and the scaling and evolution of systems over time. The scaling and evolution of systems over time implies that hardware and software will change; this, in turn, requires addressing issues of extensibility and portability.

The key design challenges for distributed monitoring systems thus include:

- Scalability: The system should scale gracefully with the number of nodes in the system. Clusters today, for example, commonly consist of hundreds or even thousands of nodes. Grid computing efforts, such as TeraGrid [22], will eventually push these numbers out even further.

- Robustness: The system should be robust to node and network failures of various types. As systems scale in the number of nodes, failures become both inevitable and commonplace. The system should localize such failures so that it continues to operate and deliver useful service in the presence of failures.

- Extensibility: The system should be extensible in the types of data that are monitored and the manner in which such data is collected. It is impossible to know a priori everything that might ever need to be monitored. The system should allow new data to be collected and monitored in a convenient fashion.

- Manageability: The system should incur management overheads that scale slowly with the number of nodes. For example, managing the system should not require a linear increase in system administrator time as the number of nodes in the system increases. Manual configuration should also be avoided as much as possible.

- Portability: The system should be portable to a variety of operating systems and CPU architectures. Despite the recent trend towards Linux on x86, there is still wide variation in the hardware and software used for high performance computing. Systems such as Globus [14] further facilitate use of such heterogeneous systems.

- Overhead: The system should incur low per-node overheads for all scarce computational resources, including CPU, memory, I/O, and network bandwidth. For high performance systems, this is particularly important since applications often have enormous resource demands.
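The multicast-based listen/announce idea mentioned above can be sketched in a few lines: each node periodically announces its own state on a well-known multicast group, every node listens on that same group, and membership falls out of soft-state timeouts rather than manual configuration. The sketch below is illustrative only; the group address, port, message format, and timeout values are invented for the example and are not Ganglia's actual wire protocol.

```python
# Minimal sketch of a multicast listen/announce membership protocol.
# All constants and the JSON message format are illustrative assumptions.
import json
import socket

GROUP, PORT = "239.2.11.71", 8649      # hypothetical multicast group/port
ANNOUNCE_INTERVAL = 10.0               # seconds between heartbeats
EXPIRE_AFTER = 4 * ANNOUNCE_INTERVAL   # soft-state timeout

def make_announcement(host, metrics):
    """Encode one heartbeat: the node's name plus current metric values."""
    return json.dumps({"host": host, "metrics": metrics}).encode()

def apply_announcement(cluster, packet, now):
    """Merge a received heartbeat into the soft-state cluster view."""
    msg = json.loads(packet.decode())
    cluster[msg["host"]] = {"metrics": msg["metrics"], "last_heard": now}

def expire_dead_nodes(cluster, now):
    """Drop nodes that have missed several heartbeats (presumed failed)."""
    for host in [h for h, s in cluster.items()
                 if now - s["last_heard"] > EXPIRE_AFTER]:
        del cluster[host]

def announce_socket():
    """UDP socket configured to send heartbeats to the multicast group."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
    return s

# The soft-state logic, exercised without the network: a node announces,
# is seen, then silently fails and is expired after the timeout.
cluster = {}
apply_announcement(cluster, make_announcement("node17", {"load_one": 0.42}), now=0.0)
assert "node17" in cluster
expire_dead_nodes(cluster, now=EXPIRE_AFTER + 1)
assert "node17" not in cluster
```

Because failed nodes simply stop announcing and are timed out, no central configuration has to be updated when nodes join or die, which is one way a design like this addresses the manageability and robustness challenges listed above.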
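The text notes that Ganglia uses XDR for compact, portable data transport. XDR (RFC 4506) encodes all values big-endian in 4-byte aligned units, which keeps messages small and byte-order independent across heterogeneous nodes. The sketch below hand-encodes a hypothetical (metric name, value) pair the way an XDR encoder would; the field layout is an assumption for illustration, not Ganglia's real message format.

```python
# Hand-rolled XDR-style encoding (RFC 4506): big-endian, 4-byte alignment.
# The (name, value) message layout is an illustrative assumption, not
# Ganglia's actual wire format.
import struct

def xdr_string(s: str) -> bytes:
    """XDR string: 4-byte big-endian length, then bytes zero-padded to 4."""
    raw = s.encode()
    pad = (4 - len(raw) % 4) % 4
    return struct.pack(">I", len(raw)) + raw + b"\x00" * pad

def xdr_float(x: float) -> bytes:
    """XDR single-precision float: 4 bytes, big-endian."""
    return struct.pack(">f", x)

def encode_metric(name: str, value: float) -> bytes:
    """Pack one hypothetical (name, value) metric sample."""
    return xdr_string(name) + xdr_float(value)

msg = encode_metric("load_one", 0.5)
# "load_one" is 8 bytes: 4 (length) + 8 (data) + 0 (pad) + 4 (float) = 16
assert len(msg) == 16
assert msg[:4] == b"\x00\x00\x00\x08"
```

A 16-byte binary sample like this is far more compact than the equivalent XML, which is why a design can reasonably use XDR on the frequently traveled intra-cluster path and reserve XML for the less frequent federation and presentation path.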
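The federation scheme described above, a tree of point-to-point connections amongst representative cluster nodes, amounts to rolling subtree state up toward the root so that the top of the tree holds an aggregate view of the whole federation. A minimal sketch of that roll-up (node structure and metric names are illustrative assumptions, not Ganglia's data model):

```python
# Sketch of tree-structured aggregation across federated clusters.
# Node and metric names are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    local_metrics: dict                           # state monitored directly
    children: list = field(default_factory=list)  # lower-level representatives

def aggregate(node):
    """Roll additive metrics (e.g., node counts, total load) up the tree."""
    total = dict(node.local_metrics)
    for child in node.children:
        for k, v in aggregate(child).items():
            total[k] = total.get(k, 0) + v
    return total

clusterA = Node("clusterA", {"num_nodes": 128, "load_one": 40.0})
clusterB = Node("clusterB", {"num_nodes": 64, "load_one": 12.0})
grid = Node("grid", {"num_nodes": 0, "load_one": 0.0}, [clusterA, clusterB])
assert aggregate(grid) == {"num_nodes": 192, "load_one": 52.0}
```

Each level of the tree only talks to its immediate children, so a failure below one representative is localized to that subtree, which is the robustness property the design challenges above call for.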