Setup of a Ganglia Monitoring System for a Grid Computing Cluster

Aufbau eines Ganglia-Überwachungssystems für ein Grid-Computing-Cluster

von Oleg Ivasenko

Bachelorarbeit in Physik vorgelegt der Fakultät für Mathematik, Informatik und Naturwissenschaften der RWTH Aachen im September, 2015 angefertigt im Physikalischen Institut IIIb bei Prof. Dr. Achim Stahl

Zusammenfassung

Das RWTH Grid Cluster ist ein Tier-2/3-Rechenzentrum im WLCG mit über zweihundert Rechnern. Diese müssen bei Betrieb überwacht werden, um aufkommende Probleme analysieren zu können. Da das aktuell laufende System gewisse Inkompatibilitäten zu neuen Versionen des laufenden Betriebssystems zeigt, soll es durch Ganglia ersetzt werden. In dieser Arbeit wird der Aufbau des Ganglia Monitoring Systems und dessen Anpassung an die Besonderheiten des RWTH Grid Clusters beschrieben. Das Ganglia Monitoring System überwacht das gesamte Cluster und zeigt sich weiterhin skalierbar für zukünftige Erweiterungen.

Abstract

The RWTH Grid Cluster is a Tier 2/3 grid site in the WLCG with over two hundred computers. Those have to be monitored while operating in order to analyze occurring problems. The currently running monitoring system shows incompatibilities with newer versions of the running operating system and thus should be replaced by Ganglia. In this thesis the setup of the Ganglia Monitoring System and its customization to suit the specific needs of the RWTH Grid Cluster are detailed. The Ganglia Monitoring System monitors the whole cluster and proves to be scalable for future extensions.

Contents

List of Figures

List of Tables

List of Listings

1 Introduction
  1.1 Worldwide LHC Computing Grid
  1.2 RWTH Grid Cluster

2 Monitoring
  2.1 Ganglia Monitoring System
    2.1.1 gmond
    2.1.2 RRDtool
    2.1.3 gmetad
    2.1.4 gweb

3 Deployment
  3.1 Installation
  3.2 Configuration
    3.2.1 Database
    3.2.2 Network Topology
    3.2.3 IO Optimisations
  3.3 Modules
    3.3.1 Torque
    3.3.2 Maui
    3.3.3 dCache
  3.4 Ganglia Web
  3.5 Extensions
    3.5.1 Job Monarch

4 Conclusion

List of Figures

1.1 CERN's site plan [6]
1.2 Hierarchical structure of the WLCG
1.3 RWTH Grid Cluster overview

2.1 Ganglia Monitoring System overview
2.2 Structural overview of gmond
2.3 File structure of a Round Robin Database
2.4 Normalization and consolidation of reported data
2.5 Structural overview of gmetad. LT: listening threads, IP: interactive ports
2.6 Structure of the directory containing the RRDs
2.7 Main navigation tabs of gweb
2.8 Views of the main tab of Ganglia Web

3.1 Impact of the monitoring system on the hosting node
3.2 Optimized metric_callback function
3.3 Update function for node statistics
3.4 Update function for general job statistics
3.5 Update function for jobs by queue statistics
3.6 Update function for jobs by owner statistics
3.7 Update function for job efficiency by owner statistics
3.8 Update function for reservations statistics
3.9 Update function for fair share statistics
3.10 dCache pool usage collection query
3.11 dCache queues collection query
3.12 Graph generated from the nodes statistics
3.13 dCache view from the `Views' tab
3.14 Job Monarch user interface

4.1 Fully implemented Ganglia Monitoring System


List of Tables

1.1 List of the 13 Tier 1 grid sites[29]

2.1 List of default metrics: reformatted output of gmond -m, part 1
2.2 List of default metrics: reformatted output of gmond -m, part 2

3.1 RWTH Grid Cluster network topology

List of Listings

2.1 gmond global settings
2.2 gmond cluster settings (Worker Nodes)
2.3 gmond host settings
2.4 gmond udp send settings
2.5 gmond udp receive settings
2.6 gmond tcp accept settings
2.7 gmond modules settings
2.8 gmond collection group settings
2.9 Skeleton python module[18]
2.10 gmetad settings
3.1 Output of showq -r
3.2 Output of showres -n
3.3 Output of mdiag -f (truncated)
3.4 Node Report definition
3.5 dCache view definition
3.6 Job Monarch configuration


1| Introduction

Built by the European Organization for Nuclear Research (CERN), the Large Hadron Collider (LHC) is the largest experimental facility in existence. It is located in a tunnel beneath the Swiss-French border near Geneva, on the former site of the Large Electron-Positron Collider (LEP). In contrast to the LEP, the LHC is able to accelerate much heavier particles and achieve significantly higher collision energies. The LHC was built with collision energies of 14 TeV in mind, which corresponds to protons traveling at 99.9999991% of the speed of light. Initially it was operated at much lower energies, and even this year, as phase two of the experiment starts, it still operates at 13 TeV. Higher collision energies enable physicists to observe physics on a much smaller scale, with the possibility to generate even heavier and possibly previously unobserved particles. Just this month (August 2015) a paper[2] on the first observation of a pentaquark was submitted for peer review.

Figure 1.1: CERN's site plan [6]

Inside the LHC, two beams of protons or lead ions in two separate tubes are accelerated to nearly the speed of light in opposite directions. A beam consists of 2808 bunches and each bunch contains about 10^11 protons. 9593 superconducting magnets[5] keep the two beams on a circular trajectory of 27 km in circumference and affect the form of the beams. To maximize the collision rate the beams are highly concentrated at the four experiment sites (fig. 1.1). These are the only sections of the LHC where the two tubes intersect and collisions are possible.

1. The Compact Muon Solenoid (CMS)[7] is a general purpose detector with a solenoid magnet at its core that forces charged particles onto a circular trajectory. From the curvature of the trajectory the momentum of the particle can be determined. The goals of the CMS experiment are the study of the standard model of physics, the search for extra dimensions and the search for new particles.

2. ATLAS[4] has the same purpose as CMS, but is built up differently. The magnetic field in ATLAS is weaker and engulfs fewer of the sub-detectors.

3. Large Hadron Collider beauty (LHCb)[17] is used to examine the differences between particles and antiparticles, particularly for the b quark, with the goal of explaining the absence of antimatter in the universe.


Figure 1.2: Hierarchical structure of the WLCG (Tier 0 to Tier 3)

Country               Tier 1 grid site
Canada                TRIUMF
Germany               KIT
Spain                 PIC
France                IN2P3
Italy                 INFN
Nordic countries      Nordic Datagrid Facility
Netherlands           NIKHEF / SARA
Republic of Korea     GSDC at KISTI
Russian Federation    RRC-KI and JINR
Taipei                ASGC
United Kingdom        GridPP
US                    Fermilab-CMS
US                    BNL ATLAS

Table 1.1: List of the 13 Tier 1 grid sites[29]

4. A Large Ion Collider Experiment (ALICE)[3] examines quark-gluon plasma, a phase that matter is believed to have been in just after the Big Bang.

There are two more, smaller experiments: TOTal Elastic and diffractive cross section Measurement (TOTEM), which is placed in the same cavern as the CMS detector, and Large Hadron Collider forward (LHCf), which is placed in the same cavern as the ATLAS detector. Both measure the same events as CMS and ATLAS respectively, but at a much more acute angle of diffraction. As of 2015 the data from TOTEM and CMS will be combined to increase accuracy[32].

As soon as protons collide in one of the interaction points, new particles are formed and decay into other particles. The resulting cascade of particles interacts with the detectors. To be able to detect different particles, the large detectors like CMS consist of a multitude of smaller sub-detectors that each have the ability to detect certain types of particles. Each collision is considered as one event. About 600 million events occur every second[16], which is why the data has to be filtered with hardware and software filters to retain only the events deemed interesting (e.g. where four muons were detected, which is a typical signature of a certain Higgs boson decay). Even with proper selection those experiments collect as much as 15 petabytes of data per year, or 1 gigabyte per second. In phase two runs, which start this year, data rates of 6 GB/s are expected on average, and peak rates of up to 10 GB/s[36] are possible. Storing, distributing and analyzing this amount of data goes beyond the scope of what a single institution could provide in resources, which is why an international network of computing facilities that share their resources was initiated.

1.1 Worldwide LHC Computing Grid

The Worldwide LHC Computing Grid (WLCG)[35] is the world's largest computing grid. Combining the resources from multiple international and regional grids, it provides the necessary resources to process and store the data generated by the LHC experiments. The contributors are mostly universities and other research facilities, or single departments of those. Since their partnership cannot accurately be described as an organization, the term `virtual organization' was coined.


The term has no distinct definition but reflects the operational parameters of the cooperation in a sense, since the contributors share computational resources, as a single organization would, while remaining fully detached and autonomous otherwise. While the computing grid makes it possible for all contributors to benefit from a more efficient utilization of the resources, it also introduces new challenges. The resources are spread over the globe and vary greatly in type, but still have to work harmoniously. A whole software stack, the so called `middleware', had to be built to make the resource management and distribution fair and efficient. On the other hand, having the resources de-localized reduces the risk of data loss and of a full failure of the system.

The WLCG is organized in centralized hierarchical levels, Tier 0 to Tier 3 (fig. 1.2).

• Tier 0 consists solely of the facility at CERN. There all of the collected data is stored and backed up on magnetic tape. CERN's computing facilities provide about 20% of the WLCG's computing power.

• Tier 1 facilities (tab. 1.1) are connected directly to CERN via the LHC Optical Private Network (LHCOPN)[29]. Each of the 13 facilities stores a subset of the experimental data related to the experiment they are affiliated with. They make that data available to scientists to analyze at their respective sites but also distribute it further to Tier 2 facilities.

• Tier 2 facilities are mostly universities and national resources that provide their storage and computing power to the WLCG. There are about 160 of them. Tier 2 facilities store an even smaller subset of the data. Their primary task is analysis and simulation of the experiments.

• Tier 3 comprises resources not controlled by the experiments. Those can be private networks or even single computers that provide resources to the WLCG. They are mainly an entry point for scientists to interact with the grid.

The RWTH Grid Cluster is considered to be both Tier 2 and Tier 3.

1.2 RWTH Grid Cluster

The RWTH Grid Cluster is built up of over 200 worker nodes (WN) that provide over 3700 cores that constantly crunch jobs, and 51 storage servers (SE) that provide over 2 petabytes of disk space to store both raw and processed data. Neither the worker nodes nor the storage servers are homogeneous. 129 WN are 8-core machines, 50 are 24-core machines and 48 are 32-core machines. The machines with more cores provide more internal storage and memory as well. Each storage server contains multiple storage pools, with the actual number varying between 1 and 5.

Figure 1.3: RWTH Grid Cluster overview


All computers in the RWTH Grid Cluster are running Scientific Linux 6.5[28], a distribution of Red Hat Enterprise Linux optimized for scientific use.

Apart from the WN and storage servers, a few special nodes that run the middleware are necessary to run the RWTH Grid Cluster. Figure 1.3 shows how these nodes interact inside the RWTH Grid Cluster and connect it to the WLCG.

The first special node is the computing element (CE). The CE's tasks are job scheduling and resource management within the RWTH Grid Cluster. The TORQUE Resource Manager[31] is used to monitor and distribute the resources, while the Maui Cluster Scheduler[19] manages the job queues. This means TORQUE decides whether the resources to run a job are available and Maui decides when and on which machine a job runs. User groups defined inside Maui have fair share targets and quality of service rankings set that define how Maui schedules the jobs. The CE can receive jobs both from a central job management system in the WLCG and from user interfaces directly.

The dCache headnode (dCache) is running the dCache[10] server. dCache is a storage management system: it combines all the storage servers into a single distributed file system. It reserves storage on all storage servers and separates it into so called pools. On the headnode itself a table of every file in every pool is maintained. If a file request is issued, dCache decides where the file is to be retrieved from or written to. When a file is requested frequently, dCache can create a copy of the file in a different pool on a different storage server to increase throughput and offload some of the traffic. dCache provides different so called doors to access the data. Each door represents a different access type (e.g. NFS, dCap, SRM, GSI-FTP) to the data. It can transfer files both to the WLCG and from it.

Both the dCache headnode and the computing element report the current state of the RWTH Grid Cluster to the node running the Berkeley Database Information Index (BDII) which publishes it to the WLCG. This data is used in the WLCG to check whether the RWTH Grid Cluster can currently run jobs or provide storage.

2| Monitoring

To keep the grid operating it is important to monitor it. Arising problems need to be recognized early on to prevent loss of data, unnecessary maintenance and recovery.

The most popular tool to monitor networks is Nagios[21]. It already surveys every single computer in the RWTH Grid Cluster in the form of the software Check_MK[14]. In case measured values reach a threshold, an alarm is triggered and the administrator is notified via email. Nagios is well suited for real-time monitoring of single computers, but not for the grid as a whole. To compensate for that weakness the monitoring tool LEMON[15] is running alongside it. LEMON is historically the first monitoring system deployed in the RWTH Grid Cluster and was kept running after Check_MK was introduced, as it complements it well. Since it is based around an Oracle[23] database, a commercial and non-open product, and, more importantly, causes problems when the OS is upgraded, it is necessary to find a more open alternative that can eventually replace LEMON and ideally improve functionality as well.

There is a multitude of tools that are suited to fill LEMON's role, the most popular of which are MonALISA[20] (MONitoring Agents using a Large Integrated Services Architecture) and Ganglia[11]. Both provide a highly modular, extensible structure and can even be deployed alongside each other. The open source monitoring system Ganglia seems the most fitting for this task as it provides a simple yet appealing web interface as the user front end and a very slim and modular monitoring agent for the collection of data. The tool was created with a very small footprint to minimize the performance impact on the monitored machine and with very few dependencies, so it can be compiled to run on multiple platforms. Ganglia is fully configurable solely through modifications of configuration files, which makes it exceptionally easy to adapt and deploy on large systems like the RWTH Grid Cluster.

2.1 Ganglia Monitoring System

First, it is important to note that in a Ganglia environment the terms grid and cluster are used differently than in the context of the RWTH Grid Cluster and the last chapter. In Ganglia terms a cluster is a group of monitored hosts that share similarities in configuration. The WN can and should be grouped as a cluster. A grid is a hierarchical level above the cluster, so the RWTH Grid Cluster as a whole is considered a grid within Ganglia. The Ganglia Monitoring System consists of three components (fig. 2.1).

• gmond The ganglia monitoring daemon is installed on every single host and collects metric data. It also distributes the data within each cluster to provide the data to the poller with a single connection per cluster.

• gmetad The ganglia meta daemon polls every cluster for metric data and writes that data into the database using the RRDtool.

• gweb is the web based user front end for Ganglia. It accesses the database and draws graphs using the RRDtool but also polls gmetad for extra data that is not metric and not stored in the database (e.g. the OS of a host).


Figure 2.1: Ganglia Monitoring System overview

Every component of the Ganglia Monitoring System has a limited purpose and is operable without the others. The general structure of the monitoring system is clear, but to deploy it, it is important to understand how each component works in detail.

2.1.1 gmond

gmond is Ganglia's monitoring agent, although it is conceptually quite different from traditional agents such as the one present in LEMON. It does not just wait for a polling daemon to invoke a request to collect the data but does so autonomously, and it even centralizes the data for all hosts within a cluster on a single machine, or on multiple machines. It has to be installed on every single host that needs to be monitored. The installation process is the same for every host, as the complete setup happens via configuration files. Similar hosts are grouped into clusters and share most if not all configuration files.

Internally gmond can be separated into two parts, as can be seen in fig. 2.2. The modules each collect multiple metrics in a multithreaded fashion and send that data via UDP[34]. Separately from the modules, a hash table is maintained in memory that contains metric data about every host that sends data to this gmond instance. Even the locally collected data has to be transferred over UDP, as there is no internal link in gmond to transfer that data. By default gmond sends collected data to a multicast address and receives data from other hosts through it as well (see fig. 2.2a). This setup makes Ganglia work without modifications on smaller networks, but it is not fit for large networks. Receiving data costs CPU cycles and memory. As a monitoring tool gmond should

Figure 2.2: Structural overview of gmond. (a) Multicast configuration; (b) unicast configuration with one gmond (right) deaf


use as few of both as possible. To reduce the impact that collecting the data of hundreds of hosts would cause, gmond can be configured as deaf to make it stop listening, or mute to make it stop collecting and sending metric data. A deaf node will not listen to any incoming data, not even the data that was collected locally.

Another aspect is network traffic. It is far more efficient to define single hosts or instances of gmond that are responsible for the collection of data and then send the data directly to them instead of going through a multicast group.

As can be seen in figure 2.2, gmond provides access to the metric data via TCP as well, which is used by gmetad when polling for the cluster's metric data and can be used to get an XML dump of that data using telnet with the command telnet <host> <port>, which is very helpful when debugging.
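The same XML dump can also be read programmatically. The following minimal Python sketch assumes a gmond instance listening on localhost:8649 with the default tcp_accept_channel and the standard HOST/METRIC element names of the gmond XML output; host, port and the printed metric are illustrative choices.

    import socket
    import xml.etree.ElementTree as ET

    def dump_gmond(host="localhost", port=8649):
        """Read the full XML dump that gmond serves on its tcp_accept_channel."""
        chunks = []
        with socket.create_connection((host, port), timeout=10) as sock:
            while True:                      # gmond closes the connection after the dump
                data = sock.recv(4096)
                if not data:
                    break
                chunks.append(data)
        return ET.fromstring(b"".join(chunks))

    if __name__ == "__main__":
        root = dump_gmond()
        for host in root.iter("HOST"):       # one HOST element per monitored node
            metrics = {m.get("NAME"): m.get("VAL") for m in host.iter("METRIC")}
            print(host.get("NAME"), metrics.get("load_one"))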

The default location of the gmond configuration file is /etc/ganglia/gmond.conf. The configurations are grouped in sections: global, cluster, host, modules and collection groups. Initially all parameters are defined within a single file, but for convenience it is practical to split them by group into separate files. The global section defines the local behavior of gmond.

globals {
  daemonize = yes               /* When false gmond will run in foreground */
  setuid = yes                  /* Set UID to user */
  user = ganglia                /* define user */
  debug_level = 0               /* Run gmond in foreground with debug messages */
  max_udp_msg_len = 1472        /* Max size of messages to other gmond */
  mute = no                     /* Disable collecting and sending metric data */
  deaf = no                     /* Disable listening to metric data */
  allow_extra_data = yes        /* Metrics used by gweb and other tools */
  host_dmax = 86400             /* secs. Removes host from web interface in 1 day */
  host_tmax = 20                /* secs. Host considered down after 4x this time */
  cleanup_threshold = 300       /* secs. Data retention time */
  gexec = no                    /* gexec available */
  send_metadata_interval = 0    /* secs. Interval to announce metadata about metrics (must be >0 for unicast) */
}
include ("/etc/ganglia/conf.d/*.conf")
Listing 2.1: gmond global settings

The final line imports all configuration files from the indicated sub-directory. Splitting the configuration up into multiple files is wise as it provides more structure to the settings and makes it possible to quickly change settings by overwriting files without affecting all settings.

The next defining section is cluster. This section is always the same for every node within a cluster and defines which cluster a gmond instance belongs to. Apart from the name attribute, all settings are only relevant for the gweb interface.

cluster {
  name = "Worker Nodes"             /* Name of the cluster */
  owner = "Physik"                  /* Owner of the cluster */
  latlong = "50.778370 6.048490"    /* Coordinates of the cluster */
  url = "unspecified"               /* Hyperlink for more information on the cluster */
}
Listing 2.2: gmond cluster settings (Worker Nodes)

The host section provides one single attribute. It only affects a key in gweb. If the location string is not formatted properly, gweb will display the location as unknown. The machines are not bound to a physical location, so the location is left as is.

host {
  location = "unspecified"    /* gweb expects a ",,"-separated location string */
}
Listing 2.3: gmond host settings

7 2.1. GANGLIA MONITORING SYSTEM CHAPTER 2. MONITORING

The UDP send channel defines which hosts or multicast groups gmond should share the data it collects with. It is possible to define multiple channels, but each channel has to be either unicast or multicast; the attributes mcast_join and host cannot both be used within the same send channel. The ttl (time to live) is an optional parameter, but important for multicast configurations. It limits the number of hops data can make and still remain valid, to prevent old data from being passed back and forth. A hop is the transmission over a network element. The lower the ttl, the sooner a data packet is considered outdated and no longer passed on.

udp_send_channel {
  bind_hostname = yes         /* use the hostname instead of the IP address */
  mcast_join = 239.3.11.71    /* multicast address to send data to */
  /* host = */                /* hosts to send data to */
  port = 8649                 /* port to send data through to other gmond */
  ttl = 1                     /* time-to-live. Limits the amount of hops */
}
Listing 2.4: gmond udp send settings

The UDP receive section is analogous to the send section. Additionally it is possible to define an access control list, to make sure not every node is able to write to the current node.

udp_recv_channel {
  mcast_join = 239.3.11.71    /* multicast address to accept data from */
  /* host = */                /* hosts to accept data from */
  port = 8649                 /* port to receive data through from other gmond */
  bind = 239.3.11.71          /* address to bind to */
}
Listing 2.5: gmond udp receive settings

To make the metric data accessible to gmetad it is important to define at least one TCP accept channel. Apart from the port, all parameters are optional. It is possible to define an access control list and the interface to be used.

tcp_accept_channel {
  port = 8649    /* Port for access to XML data. Used by gmetad */
}
Listing 2.6: gmond tcp accept settings

The remaining sections define the metrics collected by gmond and the modules that provide them. Out of the box gmond offers and pre-configures the modules seen in tables 2.1 and 2.2. Each module provides a set of metrics. The command gmond -m displays all currently collected metrics. To load a module, a module element needs to be added to the modules section. The only non-optional attributes are name and path. If the module was not written in C/C++, the language needs to be specified as well.

modules {
  module {
    name = "example_module"    /* Name of the module */
    language = "C/C++"         /* Language the module was written in */
    enabled = "yes"            /* Status of the module */
    path = "modexample.so"     /* Path to the module */
    params = "Example"         /* Extra parameter */
    param ParamName {          /* Parameters passed on to the module */
      value = 25
    }
  }
}
Listing 2.7: gmond modules settings

To make gmond collect metric data it is not enough to load the module. It is necessary to define collection groups. Each collection group can contain multiple metrics. The attributes collect_once and collect_every are mutually exclusive. If the metric value does not change unless the machine reboots (e.g. the number of cores), there is no point in collecting that data more often than once, when gmond starts up. In that case collect_once is set to `yes'. Otherwise


module          metric              description

core_metrics    gexec               gexec available
                location            Location of the machine
                heartbeat           Last heartbeat

cpu_module      cpu_steal           Percentage of time spent waiting for CPU access (relevant in VMs)
                cpu_intr            Percentage of CPU utilization for hardware interrupts
                cpu_sintr           Percentage of CPU utilization for software interrupts
                cpu_idle            Percentage of time that the CPU or CPUs were idle and the system did not have an outstanding disk I/O request
                cpu_nice            Percentage of CPU utilization that occurred while executing at the user level with nice priority
                cpu_user            Percentage of CPU utilization that occurred while executing at the user level
                cpu_aidle           Percent of time the CPU has been idle since boot
                cpu_system          Percentage of CPU utilization that occurred while executing at the system level
                cpu_wio             Percentage of time that the CPU or CPUs were idle during which the system had an outstanding disk I/O request
                cpu_num             Total number of CPUs
                cpu_speed           CPU speed in MHz

disk_module     disk_free           Total free disk space
                part_max_used       Maximum percent used for all partitions
                disk_total          Total available disk space

load_module     load_one            One minute load average
                load_five           Five minute load average
                load_fifteen        Fifteen minute load average

mem_module      mem_sreclaimable    Amount of reclaimable slab memory
                mem_total           Total amount of memory displayed in KBs
                mem_cached          Amount of cached memory
                swap_total          Total amount of swap space displayed in KBs
                mem_free            Amount of available memory
                mem_buffers         Amount of buffered memory
                mem_shared          Amount of shared memory
                swap_free           Amount of available swap memory

proc_module     proc_run            Total number of running processes
                proc_total          Total number of processes

Table 2.1: List of default metrics: reformatted output of gmond -m, part 1

module        metric          description

sys_module    os_release      Operating system release date
              mtu             Network maximum transmission unit
              os_name         Operating system name
              boottime        The last time that the system was started
              sys_clock       Time as reported by the system clock
              machine_type    System architecture

net_module    pkts_in         Packets in per second
              bytes_in        Number of bytes in per second
              bytes_out       Number of bytes out per second
              pkts_out        Packets out per second

Table 2.2: List of default metrics: reformatted output of gmond -m, part 2

the frequency with which the metric is collected is set by collect_every. Every time the metric is collected, it is checked whether time_threshold or value_threshold are met and if that is the case the metric data will be sent via UDP.

collection_group {
  collect_once = yes         /* Collect metric only once on start up */
  /* collect_every = 60 */   /* Interval of collection */
  time_threshold = 80        /* Max time to wait before sending collected data */
  metric {                   /* Start of metric definition */
    name = "example"         /* Name of the metric */
    /* name_match = "example_([a-z]+)([0-9]+)" */
    value_threshold = "1"    /* Min deviation from earlier value to send collected data */
  }
}
Listing 2.8: gmond collection group settings

The name of the metric can be dened either as a string or a regular expression.
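Conceptually, the send decision can be modeled as follows. This is a simplified Python model of the threshold logic described above, not gmond's actual C implementation; class and attribute names are chosen for illustration.

    import time

    class MetricState:
        """Decide whether a freshly collected numeric value should be sent over UDP."""
        def __init__(self, value_threshold, time_threshold):
            self.value_threshold = value_threshold
            self.time_threshold = time_threshold
            self.last_sent_value = None
            self.last_sent_time = 0.0

        def should_send(self, value, now=None):
            now = time.time() if now is None else now
            if self.last_sent_value is None:
                send = True                                            # first sample is always sent
            elif abs(value - self.last_sent_value) >= self.value_threshold:
                send = True                                            # value changed enough
            elif now - self.last_sent_time >= self.time_threshold:
                send = True                                            # too long since last send
            else:
                send = False
            if send:
                self.last_sent_value = value
                self.last_sent_time = now
            return send

    # Usage example: a metric with value_threshold = 1 and time_threshold = 80 s
    state = MetricState(value_threshold=1, time_threshold=80)
    print(state.should_send(42), state.should_send(42.2))   # True, False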

The default metrics cover most aspects one generally wants to inspect on a host, but since there are special nodes in the grid, monitoring their additional metrics requires the creation of custom modules. To do so there is the option of writing a module in C++[8] or Python[26].

Alternatively one could write a complete and independent program that collects the data and injects it into Ganglia using the command line tool gmetric. In earlier releases that was the only way to include custom metrics into gmond, but now it is no longer necessary, as we can let gmond take care of the scheduling and use the thresholds to reduce data throughput. The ability to collect data with a Python script is provided by a module itself, modpython.so. Depending on the repository used to install gmond, it might need to be installed separately.

modpython.so

To add the python module, the following lines are added to the modules section.

module {
  name = "python_module"
  path = "modpython.so"
  params = ""
}


To better structure the configuration files, the convention is to name the configuration files belonging to a python module with the ending `.pyconf', so an additional include statement is needed as well. Those .pyconf files contain the collection group settings for every module.

The skeleton module to collect metrics with a python module looks as follows:

descriptors = list()

def My_Callback_Function(name):
    '''Return a metric value.'''
    return 0

def metric_init(params):
    '''Perform all necessary initialization here.'''
    global descriptors

    if 'Param1' in params:
        p1 = params['Param1']

    d1 = {'name': 'My_First_Metric',         # Name used in configuration
          'call_back': My_Callback_Function, # Callback function queried by gmond
          'time_max': 90,                    # Maximum metric value gathering interval
          'value_type': 'uint',              # Data type (string, uint, float, double)
          'units': 'N',                      # Units label
          'slope': 'both',                   # 'zero' constant values, 'both' numeric values
          'format': '%u',                    # String formatting ('%s', '%u', '%f')
          'description': 'Example module metric',
          'groups': 'example'}               # Group in gweb

    descriptors = [d1]
    return descriptors

def metric_cleanup():
    '''Clean up the metric module.'''
    pass

# This code is for debugging and unit testing
if __name__ == '__main__':
    params = {'Param1': 'Some_Value'}
    metric_init(params)
    for d in descriptors:
        v = d['call_back'](d['name'])
        print 'value for %s is %u' % (d['name'], v)
Listing 2.9: Skeleton python module[18]

When the module is loaded, the metric_init function is called with the parameters specified in the configuration file. To guarantee that Ganglia receives the collected data and knows what to do with it, the definitions of the metrics need to be thorough. Apart from description and groups, all properties need to be set properly. The list descriptors contains the definitions of all metrics collected by the module. Afterwards gmond calls the specified callback function for every metric every time data should be collected. This is where the actual code that collects the data is executed. On shutdown the metric_cleanup function is called to clean up resources that were allocated but not yet released.
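As a hypothetical illustration of this interface, the following module reports the available kernel entropy from /proc. The metric name, units and group are chosen for the example only and are not part of the thesis setup; the module otherwise follows the skeleton above.

    descriptors = []

    def entropy_callback(name):
        # Read the current amount of available entropy from the kernel
        with open('/proc/sys/kernel/random/entropy_avail') as f:
            return int(f.read().strip())

    def metric_init(params):
        global descriptors
        d1 = {'name': 'entropy_avail',          # hypothetical metric name
              'call_back': entropy_callback,
              'time_max': 90,
              'value_type': 'uint',
              'units': 'bits',
              'slope': 'both',
              'format': '%u',
              'description': 'Available kernel entropy',
              'groups': 'example'}
        descriptors = [d1]
        return descriptors

    def metric_cleanup():
        pass

    if __name__ == '__main__':
        for d in metric_init({}):
            print('%s = %u' % (d['name'], d['call_back'](d['name'])))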

2.1.2 RRDtool

The RRDtool[30] by Tobias Oetiker is an open source tool to create and manage Round Robin Databases (RRD), as well as to access the data they contain and graph it. An RRD is tailored to store time-series data for fixed periods of time. On creation of the RRD the necessary storage for a fixed amount of data points is allocated and subsequently filled in a round robin manner. Once the last empty allocated block is filled, the oldest one is overwritten instead of allocating more storage to write new data. This way, no matter how long the monitoring system runs, the storage needed to contain the data never increases, provided no additional metrics or new hosts are introduced.


Figure 2.3: File structure of a Round Robin Database

To cover a large time window of monitoring data, RRDtool creates multiple Round Robin Archives (RRA) within the RRD which cover different time spans with different resolutions, e.g. one RRA can store new data every 10 seconds and contain a day's worth of data, while another can store a data point every hour and cover a whole year.

The RRD file itself is separated into three main sections (fig. 2.3):

• The header contains the basic information on the RRD: the rrd_version, the time the last update of the RRD occurred, the size of the header itself and most importantly the step. The step determines the time interval to which the collected data should be rasterized. Each step contains a single primary data point (PDP). The PDP is a normalized and rasterized data point.

• The datasources array describes every data source (DS) and the way incoming data from that source should be handled within the database. These flags are all set by gmetad, but they are still important to understand how the data flows within the database. For each DS a type and a minimal_heartbeat have to be defined. The type determines what to do with incoming data. Possible settings are: GAUGE, COUNTER, ABSOLUTE, DERIVE and COMPUTE. gmetad always sets the type to GAUGE, which means the value is taken as is. COUNTER is for continuously rising values. For that type RRDtool takes the difference between the current and previous value and stores that. ABSOLUTE does the same thing, but expects the source to reset every time the value is collected. DERIVE is the same as COUNTER but allows decreasing values as well. COMPUTE allows defining a custom mathematical function to modify the reported value. The minimal_heartbeat determines how long RRDtool should wait before declaring the measurement unknown for the time interval between the last reported measurement and the next occurring one. The minimal_heartbeat is set to 8 × step by gmetad, but could also be set smaller than step when creating the RRD manually. When minimal_heartbeat is longer than step, a single reported value can contribute to multiple PDPs. If it is shorter, multiple reported values will be necessary to create a single PDP. min and max are thresholds that validate the reported data. If a reported value falls outside of them, it is dismissed.

• In the final section are the RRAs that store the collected data to disk. A separate RRA is created for every time span and consolidation function (CF). Each RRA contains the information on how many PDPs should be consolidated into a single consolidated data point (CDP), which CF (MIN, MAX, AVERAGE) to use for every data source in DS, and how many rows should be stored. A row contains one CDP for every DS for a single point in time. A small creation example follows after this list.
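As an aside, the same structure can be created by hand. The sketch below assumes the rrdtool Python bindings are installed and mirrors the GAUGE data source and AVERAGE RRAs described above; the file name and the DS name are illustrative assumptions, since in the actual setup gmetad creates these files itself.

    import rrdtool  # python-rrdtool bindings (assumed to be installed)

    # One GAUGE data source with heartbeat = 8 * step, plus two RRAs:
    # a day at 60 s resolution and a year at 1 h resolution.
    rrdtool.create(
        'example_metric.rrd',
        '--step', '60',
        'DS:sum:GAUGE:480:U:U',        # no min/max validation
        'RRA:AVERAGE:0.5:1:1440',      # 1440 rows * 60 s  = 1 day
        'RRA:AVERAGE:0.5:60:8760',     # 8760 rows * 1 h   = 1 year
    )
    rrdtool.update('example_metric.rrd', 'N:42')   # store one sample "now"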

The entire data processing happens in three steps, as visualized in figure 2.4.

1. Processing incoming data according to the set type.

2. Ideally RRDtool would be provided a new sample (black bar in fig. 2.4) to store every step seconds. But this cannot be guaranteed, which is why RRDtool has to normalize and reconstruct the incoming data into PDPs.


Figure 2.4: Normalization and consolidation of reported data

PDP 1 can be determined from the incoming value, while PDP 2 is interpolated. PDP 3, while containing a measurement, consists of more than 50% unknown data and thus results in an unknown PDP. PDP 4 ends up unknown as well. PDPs 6, 7 and 9 are directly gleaned from the measured values, while PDP 8 has to be interpolated. Interpolation is only possible as long as a new value arrives within the minimal_heartbeat interval. Whenever a new reported value arrives, it is checked whether the last value was received longer than minimal_heartbeat seconds ago. If it wasn't, a PDP is created for the steps that have already passed. If it was, the area up to the previous sample is marked as unknown (see fig. 2.4, PDP 3 and 4) and then RRDtool tries to create PDPs. A step cannot consist of more than 50% unknown data, otherwise the resulting PDP is unknown as well.

3. The last step is the application of the CF to the PDPs to write the data to the RRA.

2.1.3 gmetad

gmetad's task is polling the clusters for the collected data and storing it in a database. It also summarizes the collected data for every cluster and for the whole grid, which makes the visualization of the complete state of the grid possible. For every polling process it creates a separate thread to make sure that, in case a cluster does not respond, the polling of the remaining clusters is unaffected. gmetad itself only handles the communication between the clusters and the database. The actual creation and management of the database are passed on to the RRDtool.

Figure 2.5: Structural overview of gmetad. LT: listening threads, IP: interactive ports

The configuration for gmetad is not as extensive as for gmond and is manageable within a single file. The syntax is a simple list containing key-value pairs separated by white space. The most important part is to establish the source of the data, the way the data should be stored and where it should be stored.


data_source "Cluster Name" [step] host[:port] [host2[:port]]

The values in square brackets are optional. The step determines the frequency with which gmetad polls the cluster and is passed on to RRDtool when the database is created. The parameter defaults to 20 s when not set. When no port is set, gmetad will use 8649. gmetad will always try to connect to the first host in the list and ignore the rest, unless the first host is not responding.

RRAs "RRA:CF:xff:steps:rows" ["RRA:CF:xff:steps:rows"]

Next are the parameters that are passed on to RRDtool to create the RRAs. xff is a factor that determines how many of the PDPs necessary to create a CDP may be unknown. The default of 0.5 means that if more than half of the PDPs are unknown, the resulting CDP will be unknown as well. rows is the number of CDPs to store.

t_min = steps × step        t_span = t_min × rows        (2.1)

Every RRA contains a data point for every tmin seconds and covers a time span of tspan.
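As a quick check of eq. (2.1), the following lines compute the coverage of one of the RRAs used later in the deployment (section 3.2.1), "RRA:AVERAGE:0.5:10:4320" with a step of 60 s; the Python itself is just illustrative arithmetic.

    # Worked example of eq. (2.1)
    step = 60        # seconds, set by the data_source line
    steps = 10       # PDPs consolidated into one CDP
    rows = 4320      # CDPs stored in the RRA

    t_min = steps * step           # 600 s -> one CDP every 10 minutes
    t_span = t_min * rows          # 2,592,000 s
    print(t_min, t_span / 86400)   # 600 30.0  -> the RRA covers 30 days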

rrd_rootdir "/var/lib/ganglia/rrds"
umask 0

rrd_rootdir determines where the RRD files are stored and umask who can access them. It is important that both gmetad and gweb can access the RRD files.

Figure 2.6: Structure of the directory containing the RRDs

Within this directory gmetad creates a separate RRD file for every metric of every node (fig. 2.6). While technically an RRD can contain multiple data sources, once the RRD is created it is not meant to be modified structurally. It is not impossible, but it is not something that can just be done on the fly, which contradicts Ganglia's idea of being able to add new metrics and/or hosts as necessary. This flexibility does not come without downsides. The data is split up on the disk, which increases the amount of IO operations.

# unsummarized_metrics diskstat CPU

Per default gmetad will create summaries for every metric, which can be unnecessary when a metric is only available on a limited number of hosts or when the whole cluster consists of a single machine. By setting unsummarized_metrics, single metrics and entire ranges can be excluded from the summary. The remaining parameters manage how gmetad communicates with other instances of gmetad and with gweb.

scalable on
authority "http://hostname/ganglia/"

When scalable is on, the XML output of gmetad is put in GRID tags; that way it is recognized as a grid by other gmetad instances. The authority link points gweb and other gmetads to where the graphs for this gmetad instance are located. Both settings are relevant in multi-gmetad setups.

all_trusted off
trusted_hosts 127.0.0.1 169.229.50.165 my.gmetad.org

xml_port 8651
interactive_port 8656
server_threads 4
Listing 2.10: gmetad settings


The next settings determine who can connect to the gmetad instance and which ports can be used to do so. The interactive_port is used to perform queries on the data that gmetad stores. It is used by gweb to access the data that is not stored in RRDs. There are a few more parameters that can be set, but they are only relevant if gmetad is used in combination with alternative software. For example, the tool Graphite[12] can be used instead of RRDtool to store and graph the collected data.

2.1.4 gweb

Ganglia Web is a PHP[25] based web front end for Ganglia. It uses the RRDtool to access the database and graph the data. Since some metrics are polled only once at startup of the daemon, like the boot time, or are non-numeric, like the version and type of the currently running OS, they are not suited to be stored in an RRD. Ganglia Web accesses this data directly from gmetad through the interactive port.

Figure 2.7: Main navigation tabs of gweb

The top level navigation elements of Ganglia Web are the tabs (fig. 2.7) at the top of the interface. The `Main' tab is the entry point to the website and provides multiple levels of detail to observe the state of the grid. The views can be paired up into grid level, cluster level and host level views. In figure 2.8 these pairs are at the same height, with the graphical view on the left side and the text-based one on the right side.

The graphical views display the Load, Memory, CPU and Network reports for the grid, cluster or host and look largely the same. The text-based views are different for every level:

• The Tree View (fig. 2.8b) provides an overview of the structure of the grid. It is helpful for navigation in multi-grid environments.

• The Physical View (fig. 2.8d) displays load, CPU clock and core count, and memory for every host within a cluster.

• The Node View (fig. 2.8f) displays mostly information accessed from gmetad directly: the version and type of the OS, as well as hardware information for the host.

They summarize the most crucial information of every level into a readable form.

The remaining tabs are mostly self-explanatory. The `Views' tab contains customized views. In the `Aggregate Graphs' tab it is possible to create graphs that stack up a metric over a range of hosts. This visualizes the combined amount of the metric but also the individual contribution of every host that matches a regular expression. In the `Events' tab an event with a start and end date can be defined. This is useful to annotate why measured data in certain time frames is irregular; events are marked in all graphs. `Automatic Rotation' rotates the graphs from a custom view regularly, while `Live Dashboard' displays a summarized version of a view, similar to the Physical View in the `Main' tab. These views are useful if the graphs should be shown on an always-on display. The `Cubism' tab provides an alternative way to graph the data using the Cubism.js[9] library. The resulting graphs are essentially time-series heat maps that are useful for displaying many metrics simultaneously.

Ganglia Web provides a wide range of customization. In fact the configuration file is over 150 lines long, not counting comments, which is far too extensive to cover here. The configuration file is located in the Ganglia Web main directory:

/var/www/html/ganglia/conf_default.php


Any changes have to be written to a separate file called conf.php that has to be created in the same directory. The only settings that are necessary to make the web interface work are:

$conf['rrds'] = "/var/lib/ganglia/production";
$conf['ganglia_ip'] = "127.0.0.1";
$conf['ganglia_port'] = 8656;

The first line points gweb to the root directory where the RRDs are stored by gmetad, and the other two define how to connect to gmetad itself. The port has to be the interactive_port defined in the gmetad configuration file.

(a) Grid View: grid level, graphical view
(b) Tree View: grid level, text-based view
(c) Cluster View: cluster level, graphical view
(d) Physical View: cluster level, text-based view
(e) Host View: host level, graphical view
(f) Node View: host level, text-based view

Figure 2.8: Views of the main tab of Ganglia Web

3| Deployment

3.1 Installation

In the Linux world, programs are installed and managed with dedicated package managers. Red Hat distributions, like Scientific Linux, deploy the package manager RPM[33]. An aiding tool to manage installed packages and resolve dependency issues when installing new packages is yum. It is the go-to command when packages should be installed. Accessing every single node in a network as big as the RWTH Grid Cluster manually to manage packages and configurations would be a time consuming and tedious task. The deployment system Quattor[27] is used to avoid this. It provides an additional level of abstraction to the package management and enables the administrator to manage the packages and configurations on all nodes remotely by modifying them within Quattor. It overwrites any local changes with the configurations set in Quattor: packages installed with yum locally on a machine will be removed, and settings, like which services are permitted to run, will be reset. This needs to be kept in mind when modifications are performed.

All components of Ganglia, as well as the RRDtool in its most recent version, are installed on the Ganglia Node¹. On all other hosts within the system only gmond, and gmond-python to deploy python modules, need to be installed.

3.2 Configuration

Given Ganglia's highly modular approach, it only makes sense to build up the Ganglia Monitoring System in a basic setup and then extend it to monitor the specific metrics of different clusters. But before beginning the actual setup, there are a few guidelines that have to be taken into account.

1. The Ganglia Monitoring System should complement Check_MK as much as possible and not replicate already available functionality.
2. The impact on the monitored machines should be minimized.
3. Within a cluster all machines should be equal and interchangeable.

3.2.1 Database

The default configuration for the acquisition and storage of data is:

RRAs "RRA:AVERAGE:0.5:1:5856" "RRA:AVERAGE:0.5:4:20160" "RRA:AVERAGE:0.5:40:52704"

With the default step of 15 s this corresponds to a day in 15 s intervals, two weeks in 1 min intervals and a year in 10 min intervals. For real-time monitoring and collection of data over short periods of time Check_MK is already running. Ganglia should not replicate these services, so the step is bumped to 60 s and the RRAs are modified to store data for a period of 10 years.

RRAs "RRA:AVERAGE:0.5:1:2880" "RRA:AVERAGE:0.5:10:4320" "RRA:AVERAGE:0.5:60:8760" "RRA:AVERAGE:0.5:240:21900"

¹ Currently it resides on the host grid-tmp3. gweb is accessible through the URL grid-tmp3/ganglia.


The new setting corresponds to two days in 1 min intervals, a month in 10 min intervals, a year in one hour intervals and 10 years in 4 hour intervals. At 8 bytes per data point, 37,860 data points per metric, and the default 29 metrics that are written to RRD files, the projected disk space needed per node is

8 bytes × 37,860 × 29 = 8,578 kibibytes        (3.1)

For 227 WN and 51 storage servers this amounts to 2.27 gibibytes². Since each RRD also contains some additional header information, the space necessary per node is roughly 8.5 mebibytes. For the summary of all the metrics an additional 8.5 mebibytes of data is necessary per cluster. Including the 1.5% correction for the header information and the summary data, Ganglia will allocate 2.32 gibibytes of disk space in the most basic configuration. Disk space will not be an issue; even if the number of metrics and/or hosts doubles, there would be enough space to store more data. With respect to disk space, the monitoring system is fully scalable.
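The numbers above can be reproduced with a few lines of Python; this is only a restatement of eq. (3.1) and the totals from the text, not part of the monitoring code, and it ignores the ~1.5% header overhead discussed above.

    # Rough re-computation of the disk-space estimate (eq. 3.1)
    rows = [2880, 4320, 8760, 21900]           # rows of the four configured RRAs
    points_per_metric = sum(rows)              # 37,860 data points per metric
    metrics_per_host = 29                      # default metrics written to RRDs
    hosts = 227 + 51                           # worker nodes + storage servers

    bytes_per_host = 8 * points_per_metric * metrics_per_host
    print(bytes_per_host / 1024)               # ~8,578 KiB per host
    print(bytes_per_host * hosts / 1024**3)    # ~2.27 GiB for all hosts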

3.2.2 Network Topology

All the hosts in the RWTH Grid Cluster have to be grouped into clusters first (tab. 3.1).

Cluster              Nodes                          TCP/UDP Port
Computing Element    grid-ce                        8661
Worker Nodes         grid-wn001, grid-wn002, ...    8662
dCache Headnodes     grid-dcache, grid-dcache2      8664
Storage Elements     grid-se001, grid-se002, ...    8663

Table 3.1: RWTH Grid Cluster network topology

Looking at the Worker Nodes cluster, the default multicast environment would imply that 221 copies of the data would be kept within the cluster and each node would have to process the incoming data from 221 nodes, wasting CPU cycles and causing network traffic. That approach does not satisfy the previously established guidelines and needs to be reworked. A switch to a unicast topology with a dedicated receptor reduces both the impact on the network and the load on every node. Making one of the nodes within the cluster the receptor would make it stand out from the others and no longer interchangeable, which is why the receptor is placed on the monitoring node in the form of a mute gmond instance. This way all nodes are monitored, as long as the database node is online. For the remaining hosts the switch to unicast has the advantage of not having to reserve the IP address that is used as the multicast address. The Computing Element cluster has a single host; the UDP data can simply be directed to localhost. The dCache Headnode hosts share their collected data with each other, that is, a single UDP send and receive channel per host in addition to localhost.

3.2.3 IO Optimisations

As hosts are added to the monitoring system, a sharp rise in IO wait time is observed on the node hosting the Ganglia Monitoring System. As can be seen in table 2.1, the WIO is the percentage of time the CPU idles because it waits for IO operations to complete.

By gradually raising the number of nodes in the system, the dependency of the IO wait percentage on the number of nodes is measured. The result can be seen in figure 3.1a. It is clear that if the trend continues, the monitoring system will not be able to scale to all nodes. The solution is to reduce the amount of IO operations by caching them. RRDcached is

² 1 gibibyte = 1024³ bytes, not to be confused with 1 gigabyte = 1000³ bytes. Analogously: kibibyte and mebibyte.


Figure 3.1: Impact of the monitoring system on the hosting node. (a) IO wait dependency on the number of nodes; (b) IO wait dependency on the number of nodes

part of the RRDtool and can be used as a buffer for the incoming data, which is collected and only written to disk when enough has accumulated. The difference in required CPU usage can be seen in figure 3.1b. The subsequent peaks in that figure are caused by gweb requesting the cached data to be written to disk and drawn. The linear rise of required IO operations no longer occurs, so the system's scalability is not hindered by IO throughput.

Using RRDcached can lead to loss of data when the system is rebooted. Luckily, the default priorities exactly match the needed order when starting the system up or shutting it down. This means the command chkconfig can be used, and the rrdcached service will be started and stopped before gmetad does, so no data is lost. Also, Quattor will not reset that setting, as neither tool is part of the defined configuration.

3.3 Modules

There is one thing all modules have in common: to acquire their data they need to connect to some external tool or database. Under heavy load, response times of over a minute have been observed, which is a problem when collecting time-series data. Performing queries with a frequency of once every minute for every single metric would cause a huge load on the monitored system or simply time out. On the other hand, those queries usually return data that can be used for multiple metrics simultaneously, so by creating a local database of the metrics that is filled on initialization and then updated regularly, the stress on the monitored system is significantly reduced. This is realized by defining global dictionaries in the modules. Dictionaries are containers that store key-value pairs.

All dictionaries are created in the following form:

metric_group = {
    'metric_name1': 0,
    'metric_name2': 0
}

This approach brings two big advantages. Since every metric is in such a dictionary and grouped by type, the definitions for the dictionary that is returned to gmond on initialization can be done in a more compact way by looping over the procedure. The other advantage is that in some instances new metrics can be added to the monitoring system simply by adding a new key to the dictionary. The updated flow of the data collection can be seen in figure 3.2 and is implemented in all modules. The actual querying for data happens in the update_dictionary() function, which accumulates the metric data for every group of metrics in a different way. These specific update functions are discussed in the following sections.
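The following Python sketch illustrates this caching pattern as described by figure 3.2; the POLLING_INTERVAL name and the placeholder update are illustrative choices, not code taken from the deployed modules.

    import time

    POLLING_INTERVAL = 60      # seconds between queries to the external tool
    _last_update = 0.0

    metric_group = {
        'metric_name1': 0,
        'metric_name2': 0,
    }

    def update_dictionary():
        """Placeholder for the single expensive query (pbs_python, dCache, ...)."""
        metric_group['metric_name1'] += 1   # illustrative only
        metric_group['metric_name2'] += 2

    def metric_callback(name):
        """Callback handed to gmond: refresh the cache only when it is stale."""
        global _last_update
        now = time.time()
        if now - _last_update >= POLLING_INTERVAL:
            update_dictionary()
            _last_update = now
        return metric_group[name]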


Figure 3.2: Optimized metric_callback function

3.3.1 Torque

Torque is the batch system of the RWTH Grid Cluster. It manages the jobs and available resources. There is a number of metrics that need to be observed, and a handy API called pbs_python³[24] to access most of them from Python.

Node Statistics

First it is important to know what resources are available. A node can report its state as one of the following: Free, Busy, Job-Exclusive, Offline or Down.

Figure 3.3: Update function for node statistics

Using pbs_python a list of nodes is generated and then analyzed in a loop, as can be seen in figure 3.3. Each node has an attached attribute list. The attribute that is needed has the key state. For each node in the list that has the attribute state, the corresponding counter in the dictionary is increased by 1. The resulting graph can be seen in figure 3.12.
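A simplified sketch of this loop is given below. The helper fetch_nodes() is hypothetical and stands in for the actual pbs_python query, which is not reproduced here; it is assumed to yield one attribute dictionary per node.

    # Counters for the node states listed above (lowercase keys for the sketch)
    NODE_STATES = ['free', 'busy', 'job-exclusive', 'offline', 'down']
    node_stats = {state: 0 for state in NODE_STATES}

    def update_node_stats(fetch_nodes):
        for state in node_stats:            # reset all counters before recounting
            node_stats[state] = 0
        for node in fetch_nodes():          # one dict of attributes per node
            state = node.get('state')
            if state in node_stats:
                node_stats[state] += 1

    # Usage example with two fake nodes
    update_node_stats(lambda: [{'state': 'free'}, {'state': 'down'}])
    print(node_stats['free'], node_stats['down'])   # 1 1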

Job Statistics

The second group of metrics for PBS are job statistics. Just like every node, every job has certain attributes. Among others, it has a state (Running, Queued, ...), an owner (CMS, CMSpi, ...), a queue (CMS, Short, ...), and an efficiency, all of which should be tracked.

Figure 3.4: Update function for general job statistics

To gain the general job statistics, a pbs_python query for the server information is performed. It returns a list containing a single element, since there is a single computing element in the RWTH Grid Cluster (fig. 3.4). Using the key state_count, access to a sub-list containing the key-value pairs for the states is gained; those are then written to the dictionary in a loop. A sketch of this update is given below. The exception is the Slots metric. The Slots metric represents the number of jobs that can run on the RWTH Grid Cluster simultaneously and has to be derived from the node statistics. It corresponds to the number of CPU cores of the currently accessible nodes plus the number of those nodes themselves, since there is one additional slot per node for the Short queue. The Short queue is a measure to maximize efficiency: when a job finishes before it hits its walltime (the time reserved for the job), a job from the Short queue can be executed in the remaining time.
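A sketch of this update, assuming PBSQuery's get_serverinfo() call and a state_count attribute of the form 'Transit:0 Queued:12 Held:0 Waiting:0 Running:240 Exiting:0'; the derivation of the Slots metric from the node statistics is omitted here.

from PBSQuery import PBSQuery

job_stats = {'Transit': 0, 'Queued': 0, 'Held': 0,
             'Waiting': 0, 'Running': 0, 'Exiting': 0}

def update_job_stats():
    # the query returns one entry per server; the RWTH Grid Cluster has a single CE
    server = list(PBSQuery().get_serverinfo().values())[0]
    for entry in server['state_count'][0].split():
        state, count = entry.split(':')
        if state in job_stats:
            job_stats[state] = int(count)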

The next metric is the number of jobs per queue. To gain these metrics, pbs_python provides a convenient list of queues. In this case there is no need to know which state the jobs are in; it is only differentiated between running and non-running jobs, with two dictionaries storing the data for each case (fig. 3.5). For each queue the function goes through every state and either writes the Running value into the running dictionary or sums the others up into the non-running dictionary. A sketch of this loop is given below.

Figure 3.5: Update function for jobs by queue statistics

Additionally, the jobs should also be displayed sorted by owner.

Figure 3.6: Update function for jobs by owner statistics

The last possible query with pbs_python is the job list. It returns a list containing every job currently in the system, with a lot of attributes; the attribute needed is Job_Owner. There are hundreds of users, but they can be grouped: jobs initiated by CMS users all have the owner cmsnnn, where nnn is a 3-digit number. So-called CMS pilot jobs, jobs where a job slot is reserved but the actual job is inserted later on, all have the owner cmspinnn. Additionally, some user types can be combined: OPS contains ops, opss, opspilot, dte, dtep and dtes jobs; the owners ending in s, like opss, are software managers. Again two separate dictionaries are created for running and non-running jobs. A sketch of the owner grouping is given below. Additionally, the time a job spent in the queue is calculated by subtracting the attribute qtime (time when the job entered the queue) from start_time (time when the state switched to `Running') and written into a third dictionary. The job that entered the state `Running' most recently is remembered, and its queue time is stored as the time that the last started job spent queuing.
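A sketch of the owner grouping only; the regular expressions and group labels are illustrative, and the queue-time bookkeeping described above is omitted.

import re

# hypothetical grouping rules derived from the owner patterns described above
OWNER_GROUPS = [
    (re.compile(r'^cms\d{3}$'), 'CMS'),
    (re.compile(r'^cmspi\d{3}$'), 'CMSpilot'),
    (re.compile(r'^(ops|opss|opspilot|dte|dtep|dtes)$'), 'OPS'),
]

def group_owner(job_owner):
    user = job_owner.split('@')[0]        # Job_Owner is usually 'user@submit-host'
    for pattern, group in OWNER_GROUPS:
        if pattern.match(user):
            return group
    return user                           # ungrouped owners keep their own name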

Job Efficiency

The job efficiency is not part of the attributes returned with the pbs_python job list. To gain access to that data, the output of the command showq -r needs to be parsed.

JobName    S  Par  Effic   XFactor  Q   User     Group  MHost       Procs  Remaining   StartTime

23132198   R  DEF  81.85       1.0  cm  cms032   cms    grid-wn278      1    21:10:25  Mon Aug 31 15:53:21
...
23171055   R  DEF  ------      1.0  dc  dcms084  cms    grid-wn044      1  3:00:00:00  Wed Sep  2 19:42:27

3579 Jobs    3579 of 3833 Processors Active (93.37%)

Listing 3.1: Output of showq -r

The output is essentially a whitespace-separated table of values. It is important to note that the relevant data is located between two empty lines.

Figure 3.7: Update function for job efficiency by owner statistics

For each owner a list is created and filled with the efficiency values of the jobs run by that user (fig. 3.7). The output is parsed until the second empty line is reached; then the mean and the standard deviation are calculated using the NumPy [22] library and written into the dictionary. A sketch of this parsing is given below.
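A sketch of this parsing, assuming the column layout of listing 3.1 (efficiency in the fourth column, user in the seventh); how the mean and standard deviation are keyed in the dictionary is an illustrative choice.

import subprocess
import numpy

def update_efficiency_stats(efficiency_stats):
    output = subprocess.check_output(['showq', '-r']).decode().splitlines()
    per_user = {}
    in_table = False
    for line in output:
        if not line.strip():
            if in_table:
                break                     # second empty line: end of the job table
            in_table = True               # first empty line: job table starts
            continue
        fields = line.split()
        if in_table and len(fields) > 6:
            try:
                efficiency = float(fields[3])   # 'Effic' column
            except ValueError:
                continue                  # '------' for jobs without a value yet
            per_user.setdefault(fields[6], []).append(efficiency)
    for user, values in per_user.items():
        efficiency_stats[user + '_mean'] = numpy.mean(values)
        efficiency_stats[user + '_std'] = numpy.std(values)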

3.3.2 Maui

Maui is the scheduler used in the RWTH Grid Cluster. The most important metrics to be retrieved from Maui are the reservations. Whenever a downtime is scheduled, no jobs should be running on the nodes; to achieve that, the nodes have to be reserved. The reservations can be separated into normal jobs, standing reservations (Short queue, software manager) and user reservations, i.e. reservations that do not fit either of the aforementioned types. A downtime or general maintenance can be indicated by a large number of user reservations.

The only other metric tracked is the user fair share. For each user a fair share target is defined in the Maui configuration files, and Maui schedules jobs in a way that meets that target: when a user falls below the targeted fair share, he is provided more computing time by being prioritized in the queues.

Reservations

To display the reservations for every node, the command showres -n is called.

reservations on Wed Sep  2 14:11:19

NodeName                          Type  ReservationID  JobState  Task  Start      Duration    StartTime

grid-wn001.physik.rwth-aachen.de  Job   23160653       Running   1     -5:56:19   3:00:00:00  Wed Sep  2 08:15:00
...
grid-wn345.physik.rwth-aachen.de  Job   23157448       Running   1     -15:21:34  3:00:00:00  Tue Sep  1 22:49:45

3833 nodes reserved

Listing 3.2: Output of showres -n

The ReservationID is `Job' for all normal job reservations. Standing reservations have the ReservationID `User' and are in a JobState that ends with `.0.0'. All other reservations with the ReservationID `User' fall into the `User' category. For every reservation there is one less free slot on the host, and if there are more reservations than slots, the number of surplus reservations is tracked as the `Overcommit' metric.

Figure 3.8: Update function for reservations statistics

The reservations module is slightly different from the other modules. Apart from the update function it contains a method that reads the nodes file located in /var/spool/pbs/server_priv/nodes to create a list of all hosts and their available slots on initialization. When updating, the showres -n output is first parsed into a dictionary called host_list (fig. 3.8); a sketch of the per-line bookkeeping is given below. Once done, the values are summarized. Since every node has at least the standing reservations, a host in the host_list without a single reservation is down, and its slots have to be removed from the summary.
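A sketch of the per-line bookkeeping, following the classification described above. The column positions follow listing 3.2; which column carries the `.0.0' marker is taken from the text and may need adjustment for a given showres version. host_list is assumed to be pre-filled from the nodes file with per-host counters.

def account_reservation(host_list, line):
    # expected columns: NodeName, Type, ReservationID, JobState, ... (listing 3.2)
    fields = line.split()
    if len(fields) < 4 or fields[0] not in host_list:
        return
    host, res_type, job_state = fields[0], fields[1], fields[3]
    if res_type == 'Job':
        category = 'Job'                  # normal job reservation
    elif job_state.endswith('.0.0'):
        category = 'Standing'             # Short queue, software manager
    else:
        category = 'User'                 # candidate for downtime/maintenance
    host_list[host][category] += 1
    host_list[host]['Free'] -= 1          # every reservation consumes one slot

A negative `Free' count after the loop would then correspond to the surplus tracked as the `Overcommit' metric.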

Fair Share

The fair share statistics have their own dedicated command; they can be accessed by calling mdiag -f.

FairShare Information

Depth: 7 intervals   Interval Length: 2:00:00:00   Decay Rate: 0.90

FS Policy: DEDICATEDPS
System FS Settings:  Target Usage: 0.00   Flags: 0

FSInterval        %  Target        0        1         2         3        4        5        6
FSWeight     ------          1.0000   0.9000    0.8100    0.7290   0.6561   0.5905   0.5314
TotalUsage   100.00  ------ 62361.5  61952.2  154415.7  125900.4  57783.3  26056.6  31909.2

Listing 3.3: Output of mdiag -f (truncated)

The output is checked line by line for the first value (fig. 3.9). If it matches one of the keys in the dictionary, it is used to insert the second value into the dictionary. A sketch of this matching is given below.
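A sketch of this matching; the keys of the fairshare dictionary are illustrative, and the second column is assumed to be directly convertible to a number.

import subprocess

fairshare = {'cms': 0.0, 'dcms': 0.0, 'ops': 0.0}   # illustrative keys

def update_fairshare():
    output = subprocess.check_output(['mdiag', '-f']).decode()
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] in fairshare:
            try:
                fairshare[fields[0]] = float(fields[1])
            except ValueError:
                pass                      # ignore rows whose second column is not numeric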

3.3.3 dCache

dCache is the aforementioned storage management system of the RWTH Grid Cluster. It decides where each file is stored, creates on-demand copies of files on multiple pools to maximize throughput and, when used in conjunction with a tape system, automatically backs up everything to tape.

The file transfers to and from a pool are called movers. The number of movers per pool is limited, and if the number of requests surpasses that limit, the new requests are queued. To prioritize certain requests, multiple queues are created for the different types of file transfer requests; the individual queues are listed below.


Figure 3.9: Update function for fair share statistics

• P2P Server (pool to pool, sending a file)
• P2P Client (pool to pool, receiving a file)
• Regular (transfers using dCap)
• NFS (transfers using NFS)

Technically there are also Store and Restore queues for writing data to and retrieving it from tape, but since the RWTH Grid Cluster has no tape system they are always at 0.

Besides the pool queues, another relevant metric is the used and free disk space in the pools. dCache highlights files that are only available in a single copy. They are considered `precious' and make up the majority of the used space, since files are not backed up to tape and are only kept in multiple copies for performance reasons.

dCache provides the means to monitor the system by writing logs either to text files or into an SQL database. Both options are disabled, since the RWTH Grid Cluster has a centralized monitoring system. Nevertheless, dCache runs a service called webadmin that provides a web interface displaying the current state of the system. It is accessible on the head node at http://grid-dcache:2288/webadmin/ and is used as the source of metric data.

Pool Usage

To access the storage information, the possibility to dump information from webadmin in XML form is used.

BASE_URL = 'http://grid-dcache:2288/webadmin/info?statepath='
STATEPATH = 'summary'
URL = BASE_URL + STATEPATH

The returned XML is only a few lines long and is generated in less than 10 s. With it the pool_usage dictionary is updated as displayed in figure 3.10.

Figure 3.10: dCache pool usage collection query

The URL is opened and the returned XML is stored and parsed into a tree using the minidom library. Then the leaves are checked one by one for the tag `metric'. If a leaf has it, it also carries the attribute `name', which is used as the key for the dictionary, and a value that is stored under that key. A sketch of this query is given below.
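A sketch of this query, assuming the info XML exposes metric elements with a name attribute and the value as text content; element and attribute names follow the description above and may differ in detail.

from urllib2 import urlopen              # urllib.request.urlopen on Python 3
from xml.dom import minidom

BASE_URL = 'http://grid-dcache:2288/webadmin/info?statepath='
URL = BASE_URL + 'summary'

pool_usage = {'free': 0, 'used': 0, 'precious': 0}

def update_pool_usage():
    tree = minidom.parseString(urlopen(URL).read())
    for leaf in tree.getElementsByTagName('metric'):
        name = leaf.getAttribute('name')
        if name in pool_usage and leaf.firstChild is not None:
            pool_usage[name] = int(leaf.firstChild.data)
    # store only the non-precious part under 'used'
    pool_usage['used'] -= pool_usage['precious']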

Pool Queues

The pool queues could be queried the same way as the pool usage, but since webadmin already provides a tabular overview that summarizes the data, it is more efficient to parse it from the HTML page generated by the webadmin service. Furthermore, this has a shorter response time.

Figure 3.11: dCache queues collection query

`Total' is the last entry in the table and summarizes all the information. The HTML is stripped of white space and looped through until the container holding the text `Total' is found. From there the file is searched line by line for containers holding a simple integer number, and the found values are appended to a list. Their order is always the same, which is leveraged here to match them to the dictionary keys. A sketch of this parsing is given below.
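A sketch of this parsing; the URL of the queue overview page and the order of the dictionary keys are illustrative assumptions, and the cell pattern is tightened slightly so that only cells whose entire content is an integer are matched.

import re
from urllib2 import urlopen              # urllib.request.urlopen on Python 3

QUEUE_URL = 'http://grid-dcache:2288/webadmin/poolqueues'   # hypothetical path to the queue table
QUEUE_KEYS = ['regular', 'p2p_server', 'p2p_client', 'nfs']  # illustrative order

def update_queues(queues):
    cell = re.compile(r'>\s*(-?\d+)\s*<')     # integer that is the whole cell content
    values = []
    in_total = False
    for raw in urlopen(QUEUE_URL).read().splitlines():
        line = raw.strip()
        if not in_total:
            in_total = 'Total' in line        # skip everything before the Total row
            continue
        match = cell.search(line)
        if match:
            values.append(int(match.group(1)))
    for key, value in zip(QUEUE_KEYS, values):
        queues[key] = value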

3.4 Ganglia Web

With the modules written and included into gmond, and the collection groups defined, the metrics are now collected and written to the database. Using gweb the data can be accessed and graphed, but the interface displays the metrics as simple single-metric graphs. The full usefulness of the metrics is only revealed when they are displayed in context with their group. To do so, custom reports have to be defined. A report is a graph holding multiple metrics, like the one in figure 3.1b. Reports are constructed by creating a .json file in the gweb sub-directory that contains all the reports:

/var/www/html/ganglia/graph.d/name_report.json

That JSON file has a container called series that lists all the metrics that should appear in the report.

1 { 2 "report_name" : "node_report", 3 "title" : "Node Report", 4 " s e r i e s " : [ 5 {"metric" : "PBS_nodes_job−exclusive", "color" : "008200", "type" : "stack", "label" : "Job−Exclusive"}, 6 {"metric" : "PBS_nodes_free", "color" : "00dd00", "type" : "stack", "label" : "Free"}, 7 {"metric" : "PBS_nodes_busy", "color" : "faa800", "type" : "stack", "label" : "Busy"}, 8 {"metric" : "PBS_nodes_offline", "color" : "b7b7b7", "type" : "stack", "label" : "Offline"}, 9 {"metric" : "PBS_nodes_down", "color" : "003a05", "type" : "stack", "label" : "Down"}, 10 {"metric" : "PBS_nodes_stete−unknown", "color" : "d900c4", "type" : "stack", "label" : "Unknown"} 11 ]

25 3.5. EXTENSIONS CHAPTER 3. DEPLOYMENT

12 } Listing 3.4: Node Report denition

The resulting report can be seen in figure 3.12.

Figure 3.12: Graph generated from the nodes statistics

Because of the longer metric names in the custom reports, it is necessary to increase the default graph width; otherwise some information will be cut off. Ganglia Web draws reports in the size `medium' by default, so the width has to be modified in the `medium' array in the conf.php file.

'medium' => array(
    'height'  => 100,
    'width'   => 375,
    'fudge_0' => 0,
    'fudge_1' => 14,
    'fudge_2' => 28
),

The reports are then grouped into views. A view each for the dCache, Maui and job statistics is built by creating the file /var/lib/ganglia-web/conf/view_name.json.

1 { 2 "view_name": "dCache", 3 "view_type":"standard", 4 " items " : [ 5 {"hostname":"grid −dcache.physik.rwth−aachen.de", "graph":"dcache_pools_report "}, 6 {"hostname":"grid −dcache.physik.rwth−aachen.de", "graph":" dcache_active_movers_report"}, 7 {"hostname":"grid −dcache.physik.rwth−aachen.de", "graph":" dcache_queued_movers_report"} 8 ] 9 } Listing 3.5: dCache view denition

The resulting view is displayed in figure 3.13.

3.5 Extensions

Ganglia is a popular monitoring tool, and as organizations adapt it to their needs, new extensions are created and published. One of those extensions, Job Monarch [13] (Job Monitoring and Archiving), is compatible with the RWTH Grid Cluster, as it can provide information on the currently running jobs through Torque/PBS.

Figure 3.13: dCache view from the `Views' tab

3.5.1 Job Monarch

Job Monarch provides the most important information on currently running jobs at a glance (fig. 3.14). The monitoring agent is a stand-alone Python-based daemon running on the CE that accumulates the information from the batch system and injects the collected data via gmetric into gmond's network stream. The data is not stored in RRDs by gmetad but instead parsed by the Job Monarch add-on directly. In essence, Job Monarch is a separate program that leverages the Ganglia infrastructure to transport and display the job information. As the name suggests, Job Monarch also provides the means to archive the job information, but it does so in a separate PostgreSQL database, which results in a duplication of data and additional IO operations. Since the Grid Job Monitoring Tool [1] already tracks and archives the jobs, replicating that functionality in Job Monarch would not provide any advantage.

Monitoring daemon

On the host, the jobmond.py daemon needs to be placed in the /usr/local/sbin/ directory and the configuration file in the /etc/ directory.

DEBUG_LEVEL          : 10
DAEMONIZE            : 1
BATCH_API            : pbs
BATCH_SERVER         : grid-ce.physik.rwth-aachen.de
BATCH_POLL_INTERVAL  : 60
GMETRIC_TARGET       : 134.61.24.241:8662
USE_SYSLOG           : 1
SYSLOG_LEVEL         : 0
SYSLOG_FACILITY      : DAEMON
DETECT_TIME_DIFFS    : 1
BATCH_HOST_TRANSLATE :

Listing 3.6: Job Monarch configuration

BATCH_API and BATCH_SERVER determine what kind of batch system is in operation and on which node it runs. GMETRIC_TARGET points the gmetric output towards the receiving gmond instance. The remaining parameters are self-explanatory.

Web interface

The web interface is installed in three steps.

1. Copy the content of the template folder to the directory /var/www/html/ganglia/templates/
2. Create the directory /var/www/html/ganglia/addons and copy the content of the add-ons folder into it.


Figure 3.14: Job Monarch user interface

To get Job Monarch running, two variables need to be set. The first one points gweb to the Job Monarch template. The line

1 $template_name ="job_monarch"; is added in /var/www/html/ganglia/conf.php The second variable is to direct Job Monarch to it's source of data, the non-interactive port of gmetad, therefore the variable

$DATA_SOURCE = '127.0.0.1:8655';

is set in /var/www/html/ganglia/addons/job_monarch/conf.php.

4 Conclusion

The full Ganglia Monitoring System operates with the Ganglia Node, currently the host grid-tmp3, at its core; gweb is accessible through the URL grid-tmp3/ganglia. All important nodes are monitored, as well as the middleware that keeps the grid operational. Figure 4.1 displays the full system. To summarize: the WN are grouped into a cluster with a receptive, mute gmond instance on the grid-tmp3 node. The structure for the SE is analogous, but to separate the two data streams to the Ganglia Node, they transfer data using different ports.

Figure 4.1: Fully implemented Ganglia Monitoring System

The Computing Element cluster contains a single node, the grid-ce node. It runs a single instance of gmond and the Job Monarch monitoring daemon called `jobmon', which injects job information into gmond via gmetric. The dCache Headnodes cluster is built up of the dCache headnode grid-dcache and a secondary dCache node, grid-dcache2, that is used to offload some of the tasks from grid-dcache. On each machine an instance of gmond collects the data and sends it to the other node and to itself. The gmetad instance running on the Ganglia Node polls all clusters via TCP connections and stores the data in the locally kept database through RRDtool. Ganglia Web is located on the same node and has access to both the database and gmetad.

The system was checked for scalability in terms of storage and IO operations and proved adequate, leaving room for extension.

In the future the system could be further extended by integrating the local network into the monitoring system in a multi-grid setup, as well as by further integration with Check_MK to trigger alarms on global statistics (e.g. no network traffic in a cluster, or more than a certain number of nodes down in the entire RWTH Grid Cluster).

Acknowledgement

First and foremost I would like to express my sincere gratitude to my supervisor Dr. Andreas Nowack, who aided me immensely with many helpful answers regarding grid computing in general, as well as the RWTH Grid Cluster specifically.

Furthermore, I would like to thank Prof. Dr. Achim Stahl for providing me the opportunity to work on this thesis.

Special thanks to Galina Bessonov, who provided her support in improving many of the graphics.

Last but not least I would like to thank my family for their continuous support.

Declaration of Authorship

Ich versichere, dass ich die Arbeit selbständig verfasst und keine anderen als die angegebenen Quellen und Hilfsmittel benutzt, sowie Zitate kenntlich gemacht habe.

I hereby declare that the thesis submitted is my own unaided work. All direct or indirect sources used are acknowledged as references.

Aachen, den 08.09.2015

Bibliography

[1] A Grid Job Monitoring Tool. 2012. url: https://sarkar.web.cern.ch/sarkar/doc/jobmon.html (visited on 08/31/2015).
[2] Roel Aaij et al. "Observation of J/ψp resonances consistent with pentaquark states in Λb0 → J/ψK−p decays". In: (2015). arXiv: 1507.03414 [hep-ex].
[3] ALICE | CERN. 2015. url: http://home.web.cern.ch/about/experiments/alice (visited on 09/03/2015).
[4] ATLAS | CERN. 2015. url: http://home.web.cern.ch/about/experiments/atlas (visited on 09/03/2015).
[5] Martin Beech. The Large Hadron Collider: Unraveling the Mysteries of the Universe. Astronomer's Universe. New York, NY: Springer Science+Business Media LLC, 2010. isbn: 9781441956675. doi: 10.1007/978-1-4419-5668-2. url: http://dx.doi.org/10.1007/978-1-4419-5668-2.
[6] CERN-Brochure-2009-003-Eng LHC: the guide. 2009. url: http://cds.cern.ch/record/1165534/files/CERN-Brochure-2009-003-Eng.pdf (visited on 08/21/2015).
[7] CMS | CERN. 2015. url: http://home.web.cern.ch/about/experiments/cms (visited on 09/03/2015).
[8] cplusplus.com - The C++ Resources Network. 2015. url: http://www.cplusplus.com/ (visited on 08/21/2015).
[9] Cubism.js. 2013. url: https://square.github.io/cubism/ (visited on 09/05/2015).
[10] dCache. 2015. url: https://www.dcache.org/ (visited on 08/21/2015).
[11] Ganglia Monitoring System. 2015. url: http://ganglia.sourceforge.net/ (visited on 08/21/2015).
[12] Graphite - Scalable Realtime Graphing - Graphite. 2015. url: http://graphite.wikidot.com/ (visited on 08/21/2015).
[13] Job Monarch. 2015. url: https://oss.trac.surfsara.nl/jobmonarch (visited on 08/31/2015).
[14] Mathias Kettner. Check_MK. 2015. url: https://mathias-kettner.de/check_mk.html (visited on 07/10/2015).
[15] LEMON - LHC Era Monitoring. 2015. url: http://lemon-monitoring.web.cern.ch/ (visited on 08/21/2015).
[16] LHC Collisions. 2008. url: http://lhc-machine-outreach.web.cern.ch/lhc-machine-outreach/collisions.htm (visited on 09/03/2015).
[17] LHCb | CERN. 2015. url: http://home.web.cern.ch/about/experiments/lhcb (visited on 09/03/2015).
[18] Matt Massie et al. Monitoring with Ganglia: Tracking Dynamic Host and Application Metrics at Scale. 1st ed. Beijing: O'Reilly, 2012. isbn: 9781449329709.
[19] Maui - Adaptive Computing. 2015. url: http://www.adaptivecomputing.com/products/open-source/maui/ (visited on 08/21/2015).
[20] MonALISA. 2015. url: http://monalisa.caltech.edu/monalisa.htm (visited on 08/21/2015).
[21] Nagios - The Industry Standard in IT Infrastructure Monitoring. 2015. url: https://www.nagios.org/ (visited on 08/21/2015).
[22] NumPy. 2013. url: http://www.numpy.org/ (visited on 09/07/2015).
[23] Oracle | Integrated Cloud Applications and Platform Services. 2015. url: http://www.oracle.com/index.html (visited on 08/21/2015).
[24] pbs_python. 2013. url: https://oss.trac.surfsara.nl/pbs_python (visited on 09/07/2015).
[25] PHP: Hypertext Preprocessor. url: https://secure.php.net/ (visited on 08/21/2015).
[26] Python. 2015. url: https://www.python.org/ (visited on 08/21/2015).
[27] Quattor. 2015. url: http://www.quattor.org/ (visited on 07/10/2015).
[28] Scientific Linux. 2015. url: https://www.scientificlinux.org/ (visited on 08/21/2015).
[29] The Grid: A system of tiers | CERN. 2015. url: http://home.web.cern.ch/about/computing/grid-system-tiers (visited on 06/30/2015).
[30] Tobias Oetiker. RRDtool - The Time Series Database. 2015. url: http://oss.oetiker.ch/ (visited on 07/14/2015).
[31] TORQUE Resource Manager - Adaptive Computing. 2015. url: http://www.adaptivecomputing.com/products/open-source/torque/ (visited on 07/10/2015).
[32] TOTEM | CERN. 2015. url: http://home.web.cern.ch/about/experiments/totem (visited on 09/03/2015).
[33] Wikipedia, ed. RPM Package Manager. 2015. url: https://de.wikipedia.org/w/index.php?oldid=143668602 (visited on 09/04/2015).
[34] Wikipedia, ed. - Wikipedia. 2015. url: https://en.wikipedia.org/w/index.php?oldid=670558512 (visited on 07/12/2015).
[35] Worldwide LHC Computing Grid. 2015. url: http://wlcg-public.web.cern.ch/ (visited on 07/13/2015).
[36] Worldwide LHC Computing Grid | About. 2015. url: http://wlcg-public.web.cern.ch/about (visited on 07/13/2015).
