J Grid Computing (2016) 14:283–297 DOI 10.1007/s10723-015-9357-4

IaaSMon: Monitoring Architecture for Public Cloud Computing Data Centers

Juan Gutierrez-Aguado · Jose M. Alcaraz Calero · Wladimiro Diaz Villanueva

Received: 2 June 2014 / Accepted: 23 November 2015 / Published online: 4 March 2016 © The Author(s) 2016. This article is published with open access at Springerlink.com

Abstract Monitoring of cloud computing infrastructures is an imperative necessity for cloud providers and administrators to analyze, optimize and discover what is happening in their own infrastructures. Current monitoring solutions do not fit this purpose well, mainly due to the new set of requirements imposed by cloud infrastructures. This paper describes in detail the main reasons why current monitoring solutions do not work well. It also provides an innovative monitoring architecture that enables the monitoring of the physical and virtual machines available within a cloud infrastructure in a non-invasive and transparent way, making it suitable not only for private but also for public cloud computing infrastructures. This architecture has been validated by means of a prototype integrating an existing enterprise-class monitoring solution, Nagios, with the control and data planes of OpenStack, a well-known stack for cloud infrastructures. As a result, our new monitoring architecture is able to extend the existing functionalities to fit the monitoring of cloud infrastructures. The proposed architecture has been designed, implemented and released as open source to the scientific community. The proposal has also been empirically validated in a production-level cloud computing infrastructure running a test bed with up to 128 VMs where overhead and responsiveness have been carefully analyzed.

Keywords Cloud computing · Monitoring · Distributed monitoring · Network management · Infrastructure-as-a-service

1 Introduction

Cloud computing is radically changing the way in which businesses, governments, researchers and consumers use computational power. Cloud computing enables them to make better use of their own computational resources (private cloud) and to rent computational resources from third parties on demand (public cloud) to satisfy their constantly changing computational requirements. Public cloud infrastructures are associated with scenarios where cloud users do not have any control over the management of the physical topology of the infrastructures where they are renting virtual machines (aka VMs), i.e. virtual topologies.

However, users do have control over their rented VMs and over what services are executed therein.

From the point of view of the cloud provider, it is imperative to analyze, optimize and discover what is happening in the entire infrastructure. To do so, monitoring tools are essential to identify anomalies, analyze behavior, optimize infrastructure resources, provide feedback to consumers, check infrastructure health, perform auto-scaling of resources [11], allocate resources in multi-clouds [9, 15], or monitor Service Level Agreements [3]. The vast majority of monitoring tools available in the market do not fit the purpose of monitoring cloud infrastructures. On the one hand, their architectural design usually imposes the installation of a software agent in the resources to be monitored in order to extract metrics. However, cloud users are especially reluctant to run any kind of service/software in their rented VMs that provides information to third parties, so this requirement is simply not acceptable in cloud environments. On the other hand, current monitoring tools do not deal efficiently with the new life-cycle associated with virtual topologies. For example, current monitoring tools assume that monitored resources will always keep the same IP address; if such an address stops responding, it is assumed that the resource is shut down or failing. However, this is not true at all in cloud infrastructures, where IP addresses are heavily reused and dynamically assigned to different VMs in a matter of seconds. Traditional monitoring solutions will not notice this re-use of IP addresses and will consider the monitored resource to always be the same one, even when they are in fact monitoring a completely different resource. These facts make it difficult for cloud providers to implement effective monitoring solutions for their infrastructure, and they require the design of novel non-intrusive monitoring solutions that run in a transparent way from the point of view of the cloud users while providing accurate information for the cloud provider. This is exactly the main motivation of this research work. Our contribution is, to the best of our knowledge, the first attempt to integrate both the control and data planes of a cloud computing infrastructure with an existing monitoring tool to fit the monitoring of the completely new life-cycle associated with virtual cloud infrastructures. This novel integration provides an effective monitoring solution for public cloud infrastructures allowing a transparent monitoring of the customer's VMs and, at the same time, an agent-based monitoring of the rest of the resources such as physical machines, hard disks, etc. The following list of requirements summarizes the key and unique features that make the monitoring of public cloud computing infrastructures a challenge. Some of these items are already available in almost all current monitoring solutions; however, some others are simply not supported nowadays:

– R1. To perform a transparent monitoring of VMs (no tools installed in the customer's VM).
– R2. To perform an agent-based monitoring of the management VMs.
– R3. To perform an agent-based monitoring of the physical machines.
– R4. To enable correlation between metrics from VMs and the physical machines where they are allocated.
– R5. To quickly adapt to frequent changes in the virtual topology using efficient auto-discovery protocols.
– R6. To quickly adapt to IP address re-assignation.
– R7. To quickly adapt to changes in VM state (VM life-cycle).
– R8. To integrate the monitoring architecture with the management plane of the cloud stack to keep the status of the cloud stack and the monitoring tool continuously synchronized.
– R9. To integrate the monitoring architecture with the data plane of the cloud stack to keep the status of the VMs and the monitoring tool continuously synchronized.
– R10. To be highly scalable, suitable for monitoring a large amount of resources efficiently.

More than 50 different monitoring solutions have been analyzed in this work. None of them meets all these requirements simultaneously. It is not our purpose to provide a completely new monitoring framework designed from scratch. An analysis of existing monitoring tools has been done (later explained in detail in Section 2) to select a good candidate to be extended and adapted to fulfill all these requirements, thus making it suitable for the monitoring of cloud infrastructures.

According to a recent study performed by Dataloop (What we learnt talking to 60 companies about monitoring, http://blog.dataloop.io/2014/01/30/what-we-learnt-talking-to-60-companies-about-monitoring/), Nagios is the dominant monitoring tool for cloud infrastructures in the market, even if it has not yet been completely adapted for such a purpose. Nagios has been selected as the base monitoring software to be extended in this research work due to its flexibility, world-wide acceptance, suitability for large-scale deployments, the incredibly large number of extensions available and, especially, the fact that the auto-discovery of new resources is completely customizable.

The proposed architecture described in this contribution provides support for all the above list of requirements. To achieve it, the architecture is based on the integration between the monitoring tool, and its resource discovery protocol, and the control and data planes of the cloud computing infrastructure. This integration is in fact our main contribution. The architecture has been prototypically implemented by means of the integration between OpenStack, a well-known enterprise-class cloud computing stack used for both private and public cloud computing infrastructures, and Nagios, an enterprise-class distributed monitoring tool. When users interact with OpenStack, a series of messages is interchanged between the different modules available in the cloud stack. Our architecture intercepts all these messages and extracts topology information that is later integrated in Nagios. The proposal has been implemented and released as an open source project to the community. It has also been validated in a production-level cloud computing infrastructure running a mid-size test bed with 128 VMs.

To describe the contribution, this paper has been organized as follows: Section 2 provides an updated state-of-the-art in monitoring solutions for cloud infrastructures. After that, Section 3 shortly describes the basic concepts about the different components involved in the cloud computing infrastructure to keep the contribution self-contained. Then, Section 4 provides an overview of a traditional highly scalable distributed monitoring solution to enable the reader to realize what changes are required to fit cloud computing infrastructures. After that, Section 5 focuses on the main contribution, explaining the design of the proposed monitoring architecture. Then, Section 6 provides implementation details of the proposed architecture. Section 7 describes the different test beds carried out and the intensive testing done in order to validate both the architecture and the prototype presented. Finally, Section 8 draws some conclusions about this contribution.

2 Related Works

Open source monitoring solutions for cloud infrastructures are really scarce and only a few proofs of concept are available. Brandt et al. [2] provide OVIS, a distributed monitoring infrastructure for High Performance Computing (HPC). Wuhib and Stadler [18] also provide a distributed monitoring framework for large cloud infrastructures. Kertesz [8] provides an architecture where different cloud providers collect information to decide where to allocate new resources in a federated environment. Dhingra et al. [6] also provide a solution for monitoring resources in cloud infrastructures. Although these works are good attempts, they cannot really fit public cloud computing infrastructures because they impose the installation of software in the customer's VMs. This is not acceptable in production-level scenarios. Would you enable other people to install software in your machine?

Other approaches such as Moses et al. [14], Romano et al. [17], Massonet et al. [12] and Koenig et al. [10] are designed to monitor specific metrics to provide an added value in the cloud infrastructure, like efficient VM migrations, QoS, data location and cross-layer monitoring, respectively. Although they are good contributions to the state-of-the-art, they are optimized for specific purposes and do not provide support for: i) monitoring of the virtualization layer; ii) monitoring for large-scale infrastructures; iii) transparent monitoring from the customer's perspective; iv) general purpose and extensible monitoring.

An alternative is to take traditional network monitoring tools for IT infrastructures and adapt them to the new requirements imposed by the public cloud computing environment. Nagios (http://www.nagios.org/), Icinga (https://www.icinga.org/), Zenoss (http://www.zenoss.com/), Zabbix (http://www.zabbix.com/) and OpenNMS (http://www.opennms.org/) are good examples of traditional free tools, whereas Nimsoft Monitoring Solution (http://www.nimsoft.com/solutions/nimsoft-monitor.html) and LiveAction (http://www.actionpacked.com/products/qos-monitor) are good examples of commercial ones. More than 50 tools have been analyzed. Those that impose the use of monitoring agents, like, for example, Pandora FMS, Ganglia and XyMon, have been discarded due to the fact that a transparent monitoring approach is required for the customer's VMs. Commercial solutions, and those that do not provide the source code required to enable the integration and/or extension of the monitoring tool to fit cloud infrastructures, have also been discarded. As a result, the five free tools previously indicated were identified as the best candidates. None of these tools is suitable for monitoring cloud infrastructures, mainly because they do not detect frequent changes in the topology of the virtual infrastructure, IP address re-assignation to different resources, or the destruction of virtual resources. Concretely, Zabbix, Zenoss and OpenNMS come with auto-discovery protocols that are not suitable for virtual topologies for the same reasons previously described. The architecture described in this contribution is based on the integration of the control and data planes of the cloud infrastructure providing a new auto-discovery protocol. This fact requires the selection of a monitoring architecture that enables the customization of such a protocol, and thus these three solutions were discarded. Between Icinga and Nagios, Nagios seems to be the better candidate due to its incredible dominance in the market and due to the fact that it has been proven to be suitable for monitoring very large enterprise-class infrastructures. This is the main reason why we have selected Nagios.

Aceto et al. [1] have recently provided a complete survey about monitoring architectures for cloud computing infrastructures. This survey describes a number of commercial solutions like CloudWatch (http://aws.amazon.com/es/cloudwatch/), AzureWatch (http://www.paraleap.com/azurewatch), CloudKick (http://www.rackspace.com/cloudkick/) and CloudStatus (https://status.rackspace.com/), to name a few. However, these commercial vendors have not published how they internally implement their monitoring architecture. In fact, the main intention of this paper is to describe an open-source prototype which possibly provides most of the services available in such commercial products. The survey also describes several open-source and commercially downloadable monitoring architectures like the OpenNebula Monitoring Subsystem [13], Nimbus Project (http://www.nimbusproject.org/downloads/), CloudStack ZenPack (http://wiki.zenoss.org/ZenPack:CloudStack), Hyperic-HQ (http://www.hyperic.com/downloads), PCMONS [5], Sensu (http://sensuapp.org/), Nagios and DARGOS [16]. DARGOS [16] comes with a small set of built-in sensors, which is a clear disadvantage when compared with the number of plugins already developed for Nagios. Extensibility is a property that monitoring systems should fulfil to be suitable for cloud infrastructures. There are some approaches in the literature that follow the idea of adapting Nagios to fit cloud infrastructures. Aparecida de Chaves et al. [5] developed an architecture designed for running in private clouds, so the solution does not fulfill the requirements indicated in Section 1. For instance, they install scripts into the VMs, leading to a solution valid only for private clouds where all the VMs belong to the same organization. However, this is not admissible in public clouds, where VMs belong to customers. G. Katsaros et al. [7] proposed a component for monitoring cloud infrastructures that is notified through the Nagios Event Broker API. This component exposes the data received from Nagios through a REST service. The idea is good; however, the authors do not describe how Nagios is adapted to reflect the allocation/de-allocation of VMs and IP addresses, which is one of the main adaptations required to fit cloud infrastructures. Also, there is no empirical validation of the proposed architecture, so it is impossible to validate and reproduce it. Finally, M. Barbosa et al. [4] proposed a set of Nagios plugins to make Nagios aware of VMs. They proposed two strategies to discover VMs: an active check, where a plugin is installed in each physical machine and Nagios executes it periodically using NRPE; and a passive check, in which each physical machine runs a cron task that informs the Nagios server through NSCA. Their solution also allows the mapping between physical and virtual machines using the Nagios Event Broker. This solution involves a significant generation of network traffic during the VM discovery phase. Also, it makes Nagios aware of new VMs and any topology changes with a significant delay, determined by the time interval in active checks or the cron task interval in passive checks, which is a clear trade-off with respect to the network traffic generated. In our solution, VMs are registered in Nagios even before they can be reached by ICMP ping and no additional network traffic is generated in the VM discovery phase, which is a clear advantage and differentiating point.

Ceilometer is an optional OpenStack module that provides services to collect measurements of the utilization of the physical and virtual resources deployed in the cloud infrastructure. Although this module has been designed for billing purposes, it could be considered a monitoring tool suitable for cloud infrastructures. Ceilometer is integrated in the control plane of the cloud infrastructure. It can retrieve a number of metrics directly from the physical machines. It can also extract metrics from the control plane about the virtual machines by interrogating the hypervisor, thus achieving a transparent monitoring. However, Ceilometer has not been designed as a monitoring tool and thus it has not been integrated with the data plane of the cloud infrastructure. In consequence, it does not have real connectivity with the VMs available in the infrastructure. This fact limits significantly the number of metrics Ceilometer can extract from the VMs. Our proposed architecture goes a step forward in the state of the art, achieving a complete integration with both control and data planes and, thereby, a wide range of metrics gathered directly from the VMs using an agent-less monitoring approach. This is a differentiating aspect of our proposed architecture.

Table 1 shows a complete analysis of the requirements that a monitoring tool should fulfill to be suitable for cloud infrastructures (previously listed in Section 1), compared against all the relevant related works indicated in this section, and shows how the proposed architecture goes a step forward in the state of the art. As the reader can see, IaaSMon is the only one simultaneously providing support for all the requirements identified.
3 Cloud Computing Architecture

Figure 1 shows the basic components available in a cloud computing stack for managing cloud infrastructures. Each component in this figure has in parentheses the name of the corresponding service inside OpenStack; this information clarifies the correspondence between design and implementation. In summary, there is a graphical interface where users can ask for new computational resources (see 1 in Fig. 1). These requests are received by the cloud API (see 2 in Fig. 1), which inserts the request into the communication middleware (see 3 in Fig. 1). This middleware interconnects all the modules available in the cloud stack. Depending on the type of request, the messages will travel along different modules in order to achieve their original goal. There are modules for: managing volume storage (see 4 in Fig. 1) deployed in specialized storage nodes; registering new physical machines (see 5 in Fig. 1) that are later used for placing VMs; managing the networking of VMs (see 6 in Fig. 1); managing users and authentication schemes (see 7 in Fig. 1); and, finally, deciding where to place new VMs (see 8 in Fig. 1) by means of scheduling algorithms. We refer the reader to the OpenStack documentation for further details about such modules.

Cloud computing systems are designed to work in highly distributed environments, so the communication middleware provides mechanisms for asynchronously decoupling the different modules of the architecture. Generally, this decoupling is achieved using message buses based on queues used to send/receive messages to/from the different modules. These modules are distributed across the physical machines where the infrastructure is deployed. Besides, the physical machines run a virtualization layer that manages VMs. The complete overview of this virtualization stack can be seen in the lower-right side of Fig. 1.
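The communication middleware is, in practice, a queue-based message broker (RabbitMQ, as detailed in Section 6). The following minimal Java sketch illustrates the asynchronous decoupling described above by publishing a request message to a topic exchange; the exchange name, routing key, broker host and payload are illustrative placeholders and do not reproduce the actual OpenStack message format.

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import java.nio.charset.StandardCharsets;

// Minimal sketch: a client-side module hands a request to the message bus
// and returns immediately, leaving the receiving module to process it later.
public class RequestPublisher {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("controller.example.org");   // hypothetical broker host

        try (Connection connection = factory.newConnection();
             Channel channel = connection.createChannel()) {

            // Illustrative topic exchange standing in for the cloud stack's message bus.
            channel.exchangeDeclare("cloud.requests", "topic", true);

            // Illustrative payload; real OpenStack RPC messages are JSON envelopes.
            String payload = "{\"request\":\"create_vm\",\"tenant\":\"ACME\"}";
            channel.basicPublish("cloud.requests", "compute.create",
                    null, payload.getBytes(StandardCharsets.UTF_8));

            // The caller does not wait for the consumer: modules stay decoupled.
            System.out.println("Request queued");
        }
    }
}
```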

Table 1 Comparative Analysis of current monitoring tools against the proposed architecture

Monitoring Platform R1 R2 R3 R4 R5 R6 R7 R8 R9 R10

Brandt et al. [2] × ××××××
Wuhib and Stadler [18] × ××× ××
Kertesz et al. [8] × ××× ××
Dhingra et al. [6] × ××× ××
PCMONS [5] ×× ×
Nagios ×××××
Ganglia × ××××××
DARGOS [16] × ×  ×
Zabbix ×××××
Sensu × ×××
Hyperic-HQ × ×  ××××
Ceilometer  × ×
Azure Watch c ×  × ×
CloudKick c ×  × ×
IaaSMon

The column titles match the requirement IDs listed in Section 1. ∗ means that only a set of metrics is directly provided by the hypervisor

4 Traditional Monitoring Architecture

The lower-left part of Fig. 1 depicts the conceptual architecture of enterprise-class distributed monitoring. In essence, there is a core service in charge of the monitoring capabilities. This core is divided into, at least, four different modules. The Notification module notifies admins when some monitored event occurs (see 9 in Fig. 1); the Event module processes actions when a monitoring alert occurs (see 10 in Fig. 1); the Performance module controls the performance of the monitoring software, balancing the load between peers (see 11 in Fig. 1); and the Monitoring module carries out the monitoring of the machines (see 12 in Fig. 1). These modules generate the database that represents the current status of all the monitored machines and services (status file) and the log files with all the actions done internally in the monitoring architecture (log files). A graphical interface shows the current status of all the monitored machines and services using the information available in such databases.

Metrics are gathered by a set of plugins. These plugins gather metrics locally and/or remotely from the different components of the infrastructure. In turn, the remote monitoring of infrastructures and services can be done following an active or a passive monitoring scheme, usually associated with push and pull methods, respectively. It can also be done using either an agent-based or an agent-less approach. The agent-based approach relies on the installation of agent software inside the monitored components, in charge of gathering the metrics in the device and sending them back to the monitoring architecture. The agent-less approach relies on different techniques to gather metrics without the need of installing an agent in the monitored component, which in turn leads to transparent monitoring approaches. Some examples of agent-less monitoring are: i) the scanning of a particular port in order to determine if a service is responding correctly; ii) the usage of ICMP echo requests to get reachability and round-trip time metrics; iii) the usage of plugins in the hypervisor of the VMs to gather performance information about the VMs.

The monitoring software uses the config files to determine what machines and services have to be monitored. These files also specify how to gather these metrics. The Configuration Management Interface, NConf (http://nconf.org/dokuwiki/doku.php?id=main) for Nagios, enables the dynamic definition of the resources to be monitored by means of an API interface.
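As a minimal illustration of agent-less techniques i) and ii) above, the following Java sketch probes a TCP port and performs a reachability test without installing anything on the target; host addresses, ports and timeouts are illustrative values only.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;

// Agent-less probes: nothing is installed on the monitored resource.
public class AgentlessChecks {

    // i) Port scan style check: does the service accept TCP connections?
    static boolean tcpCheck(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    // ii) Reachability check (ICMP echo or TCP echo fallback, JVM-dependent).
    static boolean reachabilityCheck(String host, int timeoutMs) {
        try {
            return InetAddress.getByName(host).isReachable(timeoutMs);
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String vm = "10.0.0.15"; // illustrative VM address
        System.out.println("ssh port open: " + tcpCheck(vm, 22, 2000));
        System.out.println("reachable:     " + reachabilityCheck(vm, 2000));
    }
}
```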

Fig. 1 Proposed Monitoring Architecture for Cloud Computing

5 Proposed Monitoring Architecture for Cloud Computing

The architecture proposed in this paper integrates Nagios with the control and data planes of OpenStack. This integration is achieved by means of the insertion of new architectural components. On the one hand, the Infrastructure Monitoring Manager (IMM), highlighted in a different color in Fig. 1 (see 13 in Fig. 1), is the first innovation and a clear differentiating point of our architecture with respect to other ones.

As can be seen in Fig. 1, the IMM is attached to the communication middleware of OpenStack, so it receives the messages interchanged in the control plane between the components of the cloud infrastructure. As soon as the IMM captures the messages from the communication middleware, they are processed, producing a set of actions that dynamically adapt the configuration of the monitoring architecture with up-to-date information. Notice that the messages can be processed by the IMM concurrently with the processing done by the associated modules in OpenStack. This simultaneous processing significantly reduces the overhead time of the monitoring architecture and substantially increases its responsiveness. This message interception enables the IMM to capture changes in the topology, changes in the IP re-assignation and changes in the VM states as they are happening, i.e. close to real time. Table 2 shows a detailed description of the different messages intercepted and the associated actions designed to be performed in the new IMM architectural component.

The IMM receives messages when new VMs are created (#1-#4 in Table 2) or deleted (#5 in Table 2) and also when new physical machines are inserted or removed in the data center (#6 in Table 2).
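A minimal sketch of how such an interception can be wired up in Java with the RabbitMQ client used by OpenStack is shown below; the exchange name, binding key and queue settings are assumptions for illustration and do not reproduce the exact topology used by IaaSMon.

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import java.nio.charset.StandardCharsets;

// Sketch of a passive listener attached to the control-plane message bus.
public class ControlPlaneListener {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("controller.example.org");   // hypothetical broker host

        Connection connection = factory.newConnection();
        Channel channel = connection.createChannel();

        // Bind a private queue to the compute exchange so that every control-plane
        // message is also delivered to the monitoring manager ("nova" and "#" are
        // illustrative; the real exchanges/keys depend on the OpenStack release).
        String queue = channel.queueDeclare().getQueue();
        channel.queueBind(queue, "nova", "#");

        channel.basicConsume(queue, true, (consumerTag, delivery) -> {
            String body = new String(delivery.getBody(), StandardCharsets.UTF_8);
            // The body is inspected concurrently with OpenStack's own processing,
            // so listening here adds no latency to the cloud stack itself.
            System.out.println("intercepted: " + body);
        }, consumerTag -> { /* consumer cancelled */ });
    }
}
```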

Table 2 Messages intercepted by the Infrastructure Monitoring Manager (IMM) and the action done accordingly

# | Message Type / Name in OpenStack | Destination | Description | Action to be performed in the IMM

1 | Run VM (run_instance) | Scheduler | This message asks the scheduler for the best place to allocate a VM | -
2 | Run VM (run_instance) | Compute | This message asks the physical machine to allocate/create a VM | Get user, tenant and VM name and store them
3 | Allocate VM (allocate_for_instance) | Multiple | This message asks for resources for a VM (storage, networking) | Get IP and MAC of the VM and store them
4 | Get New Instance Info (get_instance_nw_info) | Multiple | This message provides status information about the VMs | The VM is registered in the monitoring architecture using the information previously stored about it
5 | Deallocate VM (deallocate_for_instance) | Multiple | This message asks for destroying a VM | The VM is unregistered in the monitoring architecture and the IP address is released
6 | Update Physical Machine (update_service_capabilities) | Multiple | This message is a heartbeat to update the status of physical machines | The physical machine is registered and/or updated in the monitoring architecture

When these messages are received in the IMM, it immediately registers the new changes in the monitoring software by means of invocations to the Configuration Management Interface of the monitoring software. So, the IMM acts as a highly scalable auto-discovery process for both physical and virtual topologies which fits perfectly in public cloud computing. It enables the monitoring of all the physical and virtual machines in the cloud infrastructure without any manual configuration of the monitoring architecture, quickly detecting any change in the virtual topology, any re-assignation of IP addresses and any change in the VM states, and it also keeps the status of the complete cloud stack and the monitoring tool synchronized. This integration provides an effective coupling of the control planes of the monitoring framework and the cloud infrastructure.
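A compact sketch of how the intercepted message types of Table 2 could be mapped to IMM actions is given below; the handler methods and the way the method name and arguments are extracted from the message envelope are assumptions for illustration, not the prototype's actual class layout.

```java
import java.util.Map;

// Dispatches intercepted control-plane messages to the actions of Table 2.
public class ImmDispatcher {

    void handle(String method, Map<String, Object> args) {
        switch (method) {
            case "run_instance":                 // messages #1 and #2
                rememberVmIdentity(args);        // user, tenant and VM name
                break;
            case "allocate_for_instance":        // message #3
                rememberVmAddresses(args);       // IP and MAC of the VM
                break;
            case "get_instance_nw_info":         // message #4
                registerVmInMonitoring(args);    // push stored info to Nagios/NConf
                break;
            case "deallocate_for_instance":      // message #5
                unregisterVm(args);              // remove VM, release IP
                break;
            case "update_service_capabilities":  // message #6
                registerOrUpdatePhysicalMachine(args);
                break;
            default:
                // All other control-plane traffic is ignored by the IMM.
        }
    }

    // Hypothetical action stubs; the real prototype invokes the NConf API here.
    void rememberVmIdentity(Map<String, Object> args) { }
    void rememberVmAddresses(Map<String, Object> args) { }
    void registerVmInMonitoring(Map<String, Object> args) { }
    void unregisterVm(Map<String, Object> args) { }
    void registerOrUpdatePhysicalMachine(Map<String, Object> args) { }
}
```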
However, the integration needs to be done in the data plane as well. Concretely, the monitoring tool also needs connectivity in the data plane to enable the communication with all the VMs and physical machines of the cloud infrastructure. The reader may be aware that communications between VMs are isolated from each other, using different VLANs, different IP ranges, etc. Thus, only VMs belonging to the same cloud user can communicate between them. This is a challenge because the monitoring tool needs to interact in a secure way with all the VMs of the cloud infrastructure (with different IP ranges) in a transparent way. To achieve this integration in the data plane, another novel architectural component has been provided, the Cloud Monitoring Checks (CMC) (see 14 in Fig. 1). The CMC enables the monitoring tool to communicate with any VM of the cloud infrastructure regardless of its location and/or owner. OpenStack networking (Neutron) makes use of Linux namespaces to create different communication networks and secure boundaries between all the VMs associated to a particular cloud user and to isolate these networks from each other. So, the CMC enables the monitoring tool to communicate with any Linux namespace, achieving communication with all the VMs to be monitored. The novel CMC module is another clear differentiating point of our contribution and gives the monitoring tool networking admin privileges to efficiently monitor all the machines of the architecture. We have defined two types of Nagios checks: those that collect metrics from physical machines (performing the checking through the control plane), and those that collect metrics from virtual machines (performing the checking through the data plane, using the namespace of the network where the VM is located).

In scenarios where there are different types of resources (physical and virtual machines) to be monitored, it is important to be able to define hierarchical groups of resources to display and manage them as a whole, for example, assigning a monitoring metric to a whole group in order to make the management of the monitoring architecture easier. Figure 2 represents the hierarchical structure proposed for the monitoring of public cloud infrastructures and services. Figure 2 places all the machines at the top of the tree. These machines are subdivided into three logical groups, representing the physical machines of the infrastructure (see 3 in Fig. 2), the virtual machines used internally by the cloud infrastructure itself for specific purposes, usually referred to as Management VMs (see 4 in Fig. 2), and the customer's VMs (see 2 in Fig. 2) belonging to the different customers (indistinctly referred to as tenants) (see 1 in Fig. 2). White nodes in Fig. 2 represent those automatically generated and managed by the IMM as part of the dynamic auto-discovery process, and grey nodes are statically created as part of the bootstrapping process of the proposed architecture. This division enables us to assign different metrics to each logical group. For example, the ping service shown in Fig. 2 and attached to the all machines group means that all the machines will be pinged to know their status, whereas the free disk service shown in Fig. 2 is also attached to Management VMs. Thus, the hierarchical groups of resources that are under the complete control of the public cloud provider can be exhaustively monitored using an agent-based approach, since installing software inside such resources is not a problem. Also, the customer's VMs can be monitored using only and exclusively transparent and non-intrusive techniques like ICMP connections, TCP/UDP connections, metrics extracted directly from the hypervisor, etc.

When the IMM captures a message from the message bus, it recognizes the type of resource to be monitored from the type of message being intercepted, being able to differentiate between physical and virtual machines. Also, management VMs are identified thanks to the usage of special and reserved names. Thus, the name of the VM is used to discriminate between customer's VMs and management VMs. With this information it is possible to insert the identified machine into the associated logical group of the hierarchy. For the sake of simplicity, Fig. 2 only shows a couple of monitored metrics per group; however, the monitoring architecture manages hundreds of them, enabling a complete customization and management of the logical groups. It is worth mentioning that the way in which the IMM dynamically assigns monitored resources within the proposed tree of Fig. 2 is also another innovation of the proposed architecture.
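The data-plane checks that the CMC performs through Neutron's Linux namespaces can be pictured with the following Java sketch, which wraps an ordinary reachability check in `ip netns exec`; the namespace naming scheme and the ping command line are illustrative and deployment-dependent.

```java
import java.util.Arrays;

// Sketch of a data-plane check executed inside a Neutron network namespace.
// Must run on the network node with rights to enter namespaces (e.g. root/sudo).
public class NamespaceCheck {

    // Runs "ping" inside the namespace that owns the tenant network of the VM.
    static boolean pingInNamespace(String namespace, String vmIp) throws Exception {
        ProcessBuilder pb = new ProcessBuilder(Arrays.asList(
                "ip", "netns", "exec", namespace,
                "ping", "-c", "1", "-W", "2", vmIp));
        pb.inheritIO();                       // show ping output for debugging
        Process p = pb.start();
        return p.waitFor() == 0;              // exit code 0 means the VM answered
    }

    public static void main(String[] args) throws Exception {
        // "qdhcp-<network-id>" is a typical (but deployment-specific) namespace name.
        boolean up = pingInNamespace("qdhcp-3f1a2b7c", "10.0.0.15");
        System.out.println("VM reachable through data plane: " + up);
    }
}
```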

Fig. 2 Logical Grouping for Monitoring Cloud Computing Machines

Fig. 3 Sequence Diagram for the use case in which a customer is creating the first VM in the Cloud Provider

6 Implementation

The architecture presented in the previous sections has been implemented and released to the community as an open source project under the GPL license. IaaSMon can be downloaded together with the documentation and the source code (http://sourceforge.net/projects/iaasmon/) to validate, reproduce and use this research work. IaaSMon has been developed completely in Java. This module is independent of the rest of the services in the cloud stack and only requires to be attached to the RabbitMQ (http://www.rabbitmq.com) message bus used to interconnect the different OpenStack modules. The prototype subscribes to the relevant exchanges in RabbitMQ in order to receive messages about the creation and elimination of VMs and also the registration of physical machines. The module consists of approximately 3000 lines of Java code, complemented with another 3000 lines of scripting for the automatic installation and configuration of the complete monitoring architecture. IaaSMon has the innovative capability to update Nagios in two different ways: either as soon as the infrastructure knows the future IP address of the VM, or once the VM has already been successfully created. The former approach is optimized for responsiveness whereas the latter is optimized for consistency, using respectively message #3 and message #4 in Table 2 to update Nagios. To use IaaSMon, it is assumed that there is an OpenStack infrastructure and a Nagios monitoring tool already installed in the cloud infrastructure. On top of that, IaaSMon provides an installation script that performs the initial configuration and set up of the new architectural components proposed, i.e. the IMM and the CMC.

Fig. 4 Nagios Snapshots showing the mapping between physical and virtual machines dynamically discovered by IaaSMon

In order to better explain the behavior of IaaSMon, Fig. 3 shows a sequence diagram for three different case studies where: A) a customer is registered in the cloud provider; B) a customer creates and launches a new VM (we assume that this VM is the first one created by the user in the public cloud provider, so that IaaSMon does not have any previous information available for this customer); and C) the physical machines periodically update their status. The names of the methods shown in Fig. 3 do not match exactly the implemented ones; we have decided to provide more descriptive names for readability purposes. Figure 3 assumes that IaaSMon is set up in the mode optimized for responsiveness.

For case study A), it is worth seeing in Fig. 3 how the creation of a new tenant is intercepted by our IMM, producing the creation of a new logical group associated to such a tenant in the tree structure previously explained. For case study B), where the user is creating a new VM, notice that the IMM intercepts the createVM message and then performs HTTP calls to NConf with the new information to be configured in Nagios. Then, NConf generates the new config files and updates Nagios accordingly. IaaSMon supports the usage of a predefined VM name prefix and/or a predefined tenant name to discriminate different types of VMs (customer VMs and management VMs). Finally, see how the IMM periodically intercepts the status of the physical machines available within the cloud infrastructure in case study C) by means of the periodic heartbeat sent to update their status in the cloud infrastructure.

By default, each logical group of machines (shown in Fig. 2) comes with a wide set of predefined monitoring metrics. Moreover, simple OpenStack deployments do not use any VM for management purposes, so this group could always be empty in some deployments. We decided to include it to provide further support to other open cloud stacks like CloudStack (http://incubator.apache.org/cloudstack), which make use of management VMs. IaaSMon has been designed to allow cloud providers to combine automatic configuration with fine-grain manual configuration of the different monitored resources. So, the information enforced automatically by IaaSMon can be extended and fully customized manually using the NConf web page. This is very powerful and another differentiating point because it enables the manual customization of the monitoring system while the whole automated discovery technique is in place.
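The HTTP registration step of case study B) could look roughly like the following Java sketch. The NConf endpoint path and the form parameters used here are hypothetical placeholders (the actual calls made by IaaSMon are part of the released prototype); only the general pattern of pushing the discovered host and then letting NConf regenerate the Nagios configuration is what matters.

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch: push a freshly discovered VM to the configuration management interface.
public class NconfRegistration {

    static int registerHost(String nconfBaseUrl, String hostName, String ip, String group)
            throws Exception {
        // Hypothetical endpoint and parameter names, for illustration only.
        String form = "name=" + URLEncoder.encode(hostName, "UTF-8")
                + "&address=" + URLEncoder.encode(ip, "UTF-8")
                + "&hostgroup=" + URLEncoder.encode(group, "UTF-8");

        HttpURLConnection con = (HttpURLConnection)
                new URL(nconfBaseUrl + "/api/host").openConnection();
        con.setRequestMethod("POST");
        con.setDoOutput(true);
        con.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        try (OutputStream out = con.getOutputStream()) {
            out.write(form.getBytes(StandardCharsets.UTF_8));
        }
        return con.getResponseCode();   // NConf then regenerates the Nagios config files
    }

    public static void main(String[] args) throws Exception {
        int status = registerHost("http://monitor.example.org/nconf",
                "ACME-vm-01", "10.0.0.15", "Customer VMs");
        System.out.println("NConf replied: " + status);
    }
}
```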

Fig. 5 Nagios Snapshots: A) Mapping of physical/virtual machines dynamically discovered. B) Monitoring logical groups and monitoring metrics associated to each group

For the agent-based monitoring approach, the monitoring architecture relies on NRPE, Nagios Remote Plugin Executor, and NSCA, Nagios Service Check Acceptor, acting respectively as active and passive agents to gather metrics remotely. We also rely on the wide set of Nagios plugins available to gather a vast number of metrics using both agent-based and agent-less approaches.

Figure 4 shows a screenshot of Nagios monitoring 128 VMs allocated in 8 physical nodes using OpenStack and IaaSMon. This figure shows the correlation between physical and virtual machines graphically, by means of a two-layer graph visualizing workload distributions, system failures, etc. This is achieved by means of the correlation of the information intercepted in the messages of the OpenStack control plane, providing useful and valuable information to the administrator of the cloud provider. This correlation is detected and enforced by the IMM.

Figure 5 shows a screenshot where the cloud administrator can see: i) the logical groups previously described in Fig. 2; ii) all the VMs grouped by tenant (ACME and UV are two example tenants which contain some VMs each), which have been automatically discovered by IaaSMon; and iii) users and VMs, among other monitoring information. For each VM there is a number of services and metrics being monitored and, for each metric, Nagios provides different graphical and numerical information shown to the user.

In terms of scalability, both OpenStack and Nagios have intensively proved their scalability given the number of large-scale deployments done so far with each of these tools. OpenStack addressed scalability in its own design, where it is deployed in a complex distributed environment. Nagios manages scalability by means of DNX, the Distributed Nagios Executor (http://dnx.sourceforge.net). DNX provides Nagios with a distributed master-slave infrastructure where Nagios can scale the monitoring of thousands of services and resources efficiently.

7 Evaluation

The IaaSMon prototype has been validated empirically in a mid-size production-ready cloud infrastructure composed of 8 high-density Bull R424-E3 blades, each with 2 Xeon E5-2650 2 GHz processors, 1 TB SATA III and 32 GB @ 1600 MHz, wired with two gigabit networks, over which we installed an OpenStack Folsom single-host deployment, Nagios 3.4.4, NConf 1.3.0 and our prototype.

We have created 128 VMs, one by one, at a constant rate. This rate has been fixed between 16 and 18 seconds. Smaller values make the infrastructure fail to create at least one VM due to the high stress on the system, even without the monitoring architecture running. Bigger values make the monitoring overhead negligible, and the main purpose of this experiment is to analyze the behavior of the monitoring architecture under the worst conditions in order to enable the reader to see the worst-case scenario. Times have been measured with and without our prototype running in order to analyze the overhead imposed by IaaSMon on the system. It is worth mentioning that 128 VMs is exactly the number of cores available in the blades used. It has been decided not to increase this number in order not to bias the results provided: a higher number of VMs would cause much slower VMs, leading to a lower overhead for the monitoring architecture, and we are focused on providing information for the worst-case scenario. Figure 6 shows the average of 5 executions, measuring the time from when the script initiates the creation of the first VM to the arrival of the first ICMP echo reply received for all the launched VMs. 20 different metrics are monitored for each VM and 40 different metrics are monitored for each physical machine. As seen in Fig. 6, in the worst scenario (T=16), where the infrastructure is really stressed, our monitoring module only imposes around 2 % of overhead time from the customer's perspective, whereas this percentage becomes negligible (near 0 %) for less stressed situations. These numbers clearly validate the scalability, feasibility and good performance of our architecture. Notice that higher stress levels would automatically lead to failures in the creation of VMs even without any monitoring tool running. It means that the worst-case scenario imposes a 2 % overhead.

Figure 7 shows the most and least stressed scenarios, with request intervals of 16 and 18 seconds in the top and bottom subfigures, respectively. These figures show how our system behaves along time when the number of VMs is increased at a constant rate. The following times have been gathered: API Time represents the time between the client creating a request and that request being inserted in the communication bus; OpenStack Time is the time required to process the message in OpenStack; Monitoring Time is the time required by IaaSMon for starting the monitoring of all the services of the new VM in Nagios; and finally, VM Ping Time is the time when the machine answers the first ICMP echo request. Both subfigures in Fig. 7 show a linear increase in the total time needed to process VMs when increasing the number of VMs, mainly due to the increase in the workload of the whole system.

Fig. 6 OpenStack performance comparison with and without our architecture when creating 128 VMs at different arrival rates of creation requests

Fig. 7 Scalability of the monitoring architecture when ranging the number of VMs between 1 and 128 for a constant arrival interval time of create VM requests: 16 seconds (top) and 18 seconds (bottom)

For this experiment, IaaSMon has been optimized for responsiveness and thus it reacts as soon as the infrastructure knows the future IP address. The mean time required by the IMM to update Nagios is 8.7 and 6.9 seconds for the 16 and 18 second request intervals, respectively. This represents on average around 10 % more of the overall time required by OpenStack in both cases. Note that even with 10 % of overhead time from the OpenStack perspective, there is no overhead at all from the customer perspective, since the VMs are being booted simultaneously to the configuration of the monitoring system and, in consequence, the VMs are always reached (pinged) after the finalization of the configuration of the monitoring system. This parallelization is the reason for the low overhead (2 % in the worst-case scenario tested) shown in Fig. 6 and is a clear sign of the scalability of the proposed architecture.

In the two scenarios shown in Fig. 7, the monitoring system is configured on average 15 seconds before the VMs are reachable. This fact demonstrates the good responsiveness of our proposal even in the worst-case scenario.

8 Conclusion

In this contribution, the advantages in terms of overhead and scalability associated with the early reaction in the configuration of Nagios have been proven, thanks to the integration of the monitoring solution with the control and data planes of the public cloud stack, which leads to a rapid and adaptive monitoring solution suitable for monitoring public clouds. A complete distributed and highly scalable monitoring solution suitable for monitoring public cloud infrastructures has also been described. The solution fulfills all the requirements identified for public cloud infrastructures. A novel auto-discovery process for both virtual and physical infrastructure has also been provided, which allows the correlation between physical and virtual machines, later used for correlating monitored metrics. It has been demonstrated how Nagios can be adapted to fit cloud computing scenarios, achieving all the benefits of years of development already available in the monitoring field. A new component for OpenStack has been provided which enables other researchers and practitioners to monitor their cloud infrastructure even in public cloud scenarios.

As future work, it is planned to extend the integration of the control plane of the cloud stack and the monitoring solution in order to cover advanced scenarios such as migration of VMs, correlation of VMs sharing the same golden image, etc.

Acknowledgments This work is supported by the European Commission Horizon 2020 Programme under grant agreement number H2020-ICT-2014-2/671672 - SELFNET (Framework for Self-Organized Network Management in Virtualized and Software Defined Networks) and by the Universitat de Valencia Research Infrastructure Grant 2014.

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

References

1. Aceto, G., Botta, A., de Donato, W., Pescapè, A.: Cloud monitoring: a survey. Comput. Netw. 57(9), 2093–2115 (2011)
2. Brandt, J., Gentile, A., Mayo, J., Pebay, P., Roe, D., Thompson, D., Wong, M.: Resource monitoring and management with OVIS to enable HPC in cloud computing environments. In: IEEE International Symposium on Parallel & Distributed Processing, Rome (2009)
3. Cuomo, A., Modica, G.D., Distefano, S., Puliafito, A., Rak, M., Tomarchio, O., Venticinque, S., Villano, U.: An SLA-based broker for cloud infrastructures. J. Grid Comput. 11, 1–25 (2013)
4. Barbosa de Carvalho, M., Zambenedetti Granville, L.: Incorporating virtualization awareness in service monitoring systems. In: IFIP/IEEE International Symposium on Integrated Network Management (2011)
5. Aparecida de Chaves, S., Brundo Uriarte, R., Becker Westphall, C.: Toward an architecture for monitoring private clouds. IEEE Commun. Mag. 49, 130–137 (2009)
6. Dhingra, M., Lakshmi, J., Nandy, S.K.: Resource usage monitoring in clouds. In: ACM/IEEE 13th International Conference on Grid Computing (2012)
7. Katsaros, G., Kubert, R., Gallizo, G.: Building a service-oriented monitoring framework with REST and Nagios. In: IEEE International Conference on Services Computing (2011)
8. Kertesz, A., Kecskemeti, G., Marosi, A., Oriol, M., Franch, X., Marco, J.: Integrated monitoring approach for seamless service provisioning in federated clouds. In: 20th Euromicro International Conference on Parallel, Distributed and Network-based Processing, Munich, Germany (2012)
9. Kertesz, A., Kecskemeti, G., Oriol, M., Kotcauer, P., Acs, S., Rodriguez, M., Merce, O., Marosi, A.C., Marco, J., Franch, X.: Enhancing federated cloud management with an integrated service monitoring approach. J. Grid Comput. 11, 699–720 (2013)
10. Koenig, B., Alcaraz Calero, J.M., Kirschnick, J.: Elastic monitoring framework for cloud infrastructures. IET Commun. 6, 1306–1315 (2011)
11. Lorido-Botran, T., Miguel-Alonso, J., Lozano, J.A.: A review of auto-scaling techniques for elastic applications in cloud environments. J. Grid Comput. 12, 559–592 (2015)
12. Massonet, P., Naqvi, S., Ponsard, C., Latanicki, J., Rochwerger, B., Villari, M.: A monitoring and audit logging architecture for data location compliance in federated cloud infrastructures. In: IEEE International Parallel & Distributed Processing Symposium, Anchorage, Alaska (2011)
13. Milojicic, D., Llorente, I.M., Montero, R.S.: OpenNebula: a cloud management tool. IEEE Internet Comput. 15(2), 11–14 (2011)
14. Moses, J., Iyer, R., Illikkal, R., Srinivasan, S., Aisopos, K.: Shared resource monitoring and throughput optimization in cloud-computing datacenters. In: IEEE International Parallel & Distributed Processing Symposium, Anchorage, Alaska (2011)
15. Petcu, D.: Consuming resources and services from multiple clouds: from terminology to cloudware support. J. Grid Comput. 12, 321–345 (2014)
16. Povedano-Molina, J., Lopez-Vega, J.M., Lopez-Soler, J.M., Corradi, A., Foschini, L.: DARGOS: a highly adaptable and scalable monitoring architecture for multi-tenant clouds. Futur. Gener. Comput. Syst. 29(8), 2041–2056 (2013)
17. Romano, L., De Mari, D., Jerzak, Z., Fetzer, C.: A novel approach to QoS monitoring in the cloud. In: First International Conference on Data Compression, Communications and Processing, Palinuro, Italy (2011)
18. Wuhib, F., Stadler, R.: Distributed monitoring and resource management for large cloud environments. In: IFIP/IEEE International Symposium on Integrated Network Management, Dublin, Ireland (2011)