MASARYK UNIVERSITY FACULTY OF INFORMATICS

IRC control bot for Zabbix monitoring system

BACHELOR'S THESIS

Filip Zachar

Brno, Fall 2015

Declaration

Hereby I declare that this paper is my original authorial work, which I have worked out on my own. All sources, references, and literature used or excerpted during elaboration of this work are properly cited and listed in complete reference to the due source.

Filip Zachar

Advisor: RNDr. Adam Rambousek, Ph.D.

Acknowledgement

I would like to thank my supervisor, RNDr. Adam Rambousek, Ph.D., who supported me while I was writing this thesis and provided useful feedback. I would also like to thank Marek Mahut for advice and consultation on the technical details of the Zabbix monitoring system and for feedback provided during the testing stage of this work.

Abstract

Applications these days run as distributed systems consisting of many parts working together. This thesis discusses the necessity of monitoring these parts and describes the design and implementation of an IRC control bot for the Zabbix monitoring system that provides extended features through the IRC communication network.

Keywords

Zabbix, monitoring, IRC, DevOps, Ruby

Contents

1 Introduction
2 Monitoring
  2.1 Monitoring system
  2.2 MRTG - Multi Router Traffic Grapher
  2.3 Cacti
  2.4 Nagios
  2.5 Zabbix
    2.5.1 Architecture
    2.5.2 Data collection
    2.5.3 Data Visualization
    2.5.4 Alerts and Triggers
    2.5.5 Maintenance
    2.5.6 API
3 IRC - Internet Relay Chat
  3.1 Architecture
  3.2 Conferencing
  3.3 Bot
4 Zabbirc
  4.1 Goal
  4.2 Architecture
    4.2.1 Zabbix API
    4.2.2 IRC API
  4.3 Implementation
    4.3.1 Zabbix component
    4.3.2 Services component
    4.3.3 IRC component
  4.4 Features
    4.4.1 Events
    4.4.2 Hosts
    4.4.3 Maintenance
    4.4.4 Settings
  4.5 Installation
5 Conclusion
Index
Bibliography

List of Tables

3.1 Operator privileged actions
3.2 Operator privileged actions

List of Figures

2.1 Screenshot of MRTG web page [4]
2.2 Screenshot of RRDtool generated graph
2.3 Screenshot of Cacti web interface
2.4 Screenshot of Nagios web interface
2.5 Zabbix deployment model for large environments
2.6 Zabbix deployment model using proxy servers
2.7 Graph showing three items in stacked format
2.8 Map representing physical infrastructure of a web service
2.9 Screen showing map with appropriate graphs alongside
2.10 Trigger rules
2.11 JSON-RPC API login request
3.1 IRC network
4.1 Zabbirc basic architecture
4.2 Zabbirc components interaction
4.3 Definition of the matchers for Zabbirc commands
4.4 Event notification and acknowledgment
4.5 Host status reporting
4.6 Installation steps for Zabbirc

1 Introduction

Over 3 billion people use the Internet on a daily basis and the number is still growing [1]. The Internet, however, no longer consists of just the simple hypertext pages of the early World Wide Web era. It is full of dynamic web applications and services such as social networks, video streaming services, file storage and exchange services, and even office applications that used to exist mainly as desktop software. These applications serve millions of people at once. To reach this scale of availability they are no longer implemented on a single server but as more sophisticated distributed systems.

A distributed system consists of many moving parts working together, so more points of failure are present. The whole system can be designed to operate with some parts missing, but to achieve the best reliability the parts should be monitored in order to prevent possible failures or to react to those that have occurred as soon as possible.

In Chapter 2 we describe what monitoring of a system consists of and discuss several existing monitoring solutions. Later in that chapter the thesis focuses on the Zabbix monitoring system, its capabilities and the options for extending it. Chapter 3 describes the IRC¹ protocol, which is used for communication on a larger scale. It describes the architecture of an IRC network and discusses the applicability of IRC bots.

The main goal of this thesis is to design and implement an IRC control bot that serves as a gateway for controlling a Zabbix monitoring system. The architecture, implementation and features of the bot are described in Chapter 4. The chapter also justifies the technologies and libraries that were used to create the bot.

1. Internet Relay Chat

2 Monitoring

Most of the applications we use on a daily basis offer a simple interface for accomplishing the task at hand, yet they are much more complicated under the hood when we look at their architectural and implementation details. What started as a single-server service maintained by one system administrator can easily grow into a distributed system that sits in the cloud. This more complex distributed architecture provides the redundancy, availability and speed that allow service providers to handle the enormous number of customers using various Internet services these days. In such a distributed environment it is really important to keep all the required units in a healthy state.

A company that is creating a product has to make an important decision about its deployment.

• It can build its own bare metal infrastructure to run the product. With this option the company has to provide all the maintenance the infrastructure needs to ensure the product's availability. To accomplish this, the use of monitoring systems is recommended.

• It can use one of the existing cloud platforms to host its application and thus add a layer of abstraction over the infrastructure. The company defines the required services in a declarative way and the cloud platform takes care of fulfilling the defined requirements.

Regardless of which option the company chooses, there must be a bare metal infrastructure somewhere underneath. The devices in this infrastructure form a computer network that faces several challenges. The amount of data flowing through the network grows almost constantly. Application data, media streams, backups, database queries and replication tend to saturate bandwidth just as much as they eat up storage space. To avoid network congestion and nodes running out of storage space, system administrators need a good overview of the infrastructure status, obtained by visualizing the right data in the right way. This can be achieved by using a monitoring system.

In this chapter we describe what a monitoring system is and what it should provide. We look at several existing monitoring systems and briefly discuss their advantages, disadvantages and main characteristics.

2.1 Monitoring system

A monitoring system is a piece of software that collects data from several sources, analyzes the data and presents it in a sophisticated visual form.

The data source can be any component in the network. A monitoring system usually supports data collection by standard methods such as SNMP¹, so any device that implements this protocol is a potential data source for the monitoring system. It can also use a custom API² to retrieve information from the monitored devices using the agent approach. An agent is a small program running on the monitored device that gathers information about the device and communicates with the monitoring system using an agreed protocol. Agents are mostly used to monitor computers in the network, since they require more computational power.

The collected data are analyzed using the rules configured in the monitoring system. The monitoring system checks whether any threshold has been crossed and takes the appropriate action to respond to the event.

The data visualization consists of various graphs generated from the data. These graphs present the data in a more informative way and also show the history of each metric.

Several monitoring systems are available with good community and enterprise support. This thesis takes four of them into consideration: MRTG, Cacti, Nagios and Zabbix.

2.2 MRTG - Multi Router Traffic Grapher

Multi Router Traffic Grapher (MRTG) was initially just a script that used external utilities to perform SNMP queries and create GIF images for display on HTML pages (Figure 2.1). The script was executed every 5 minutes and showed the accumulated data in graphs for the last day, week, month and year. It was written in 1995 by Tobias Oetiker, who was working as a System Administrator Trainee at De Montfort University, Leicester, in the United Kingdom. At that time the university had a 64 kbit/s Internet link and the management was not planning to upgrade it any time soon. The performance data provided by MRTG proved to be a key argument in convincing the management of the need for a faster Internet link. [2, 3]

1. Simple Network Management Protocol
2. Application Programming Interface

Figure 2.1: Screenshot of MRTG web page [4]

One of the main problems of the first version of MRTG was performance. Monitoring 10 switch links worked fine, but in larger environments MRTG reached its limits. The reason was the way MRTG handled its data: they were stored in an ASCII file that was rewritten every time the script collected data. Further development therefore focused on solving this issue. In the subsequent version of MRTG the time-critical parts were implemented in C, while the glue of the package remained in Perl. A whole new mechanism for storing data and generating graphs was introduced and extracted into a separate package called Round Robin Database Manager (RRDtool). An example of a graph generated using RRDtool is shown in Figure 2.2. This component was the biggest asset of MRTG.

Despite its web interface, which provides an index of the generated graphs, MRTG is configurable only through plain text files. This makes it less popular among people with less UNIX experience.
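The storage problem described above, a data file that is rewritten on every poll, is what a round-robin store avoids: it keeps a fixed number of slots and overwrites the oldest one. The following Ruby sketch illustrates only that idea. The class name RoundRobinStore and its methods are hypothetical and are not part of RRDtool.

```ruby
# Fixed-size circular buffer: a minimal illustration of round-robin storage.
# RoundRobinStore is a hypothetical name, not an RRDtool API.
class RoundRobinStore
  def initialize(slots)
    @slots = Array.new(slots)  # pre-allocated, never grows
    @next = 0                  # index of the slot to overwrite next
    @count = 0                 # number of samples stored (capped at the size)
  end

  # Store a sample, overwriting the oldest one when the buffer is full.
  def add(value)
    @slots[@next] = value
    @next = (@next + 1) % @slots.size
    @count = [@count + 1, @slots.size].min
    self
  end

  # Samples ordered from oldest to newest.
  def to_a
    return @slots.first(@count) if @count < @slots.size
    @slots.rotate(@next)
  end
end

store = RoundRobinStore.new(3)
[10, 20, 30, 40].each { |value| store.add(value) }
p store.to_a  # the oldest sample (10) has been overwritten
```

Because the number of slots is fixed, the storage never grows and each update touches a single slot, regardless of how long the monitoring has been running.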

[Graph: "2-Shaper - 8 CPU Utilization", showing the mean CPU and CPU0 to CPU7 with their current, minimum and maximum values]

Figure 2.2: Screenshot of RRDtool generated graph

2.3 Cacti

The Cacti project uses the RRDtool package written by Tobias Oetiker, mentioned in Section 2.2. It enriches this tool with a new web interface that provides detailed configuration options. The primary web interface is a PHP application that allows configuration of these aspects:

• Gathering the data from the hosts

• Generating graphs

• Threshold levels for alert generation

• Granular user right management

• Network configuration backups

A MySQL database is used for storing the configuration options and RRDtool for managing the logged data. Cacti uses a cron-based poller to gather the data from different sources. It supports all three versions of SNMP and is extensible by external scripts and commands.


Figure 2.3: Screenshot of Cacti web interface


2.4 Nagios

Nagios is a highly configurable and extensible monitoring system. It has a rich web interface and is ready for enterprise deployments. The main features of the system are:

• Hosts and services - a host is any physical or virtual device on the network that is supposed to be monitored. A host can be assigned to several groups. A service is a particular functionality the host provides or a resource to be monitored, for example CPU load, memory usage or an SSH server.

• Contacts - the people who should be contacted when some event occurs. Contacts can be grouped and each contact can be a member of several contact groups.

• Time periods - definitions of time spans that influence the execution of an operation.

Figure 2.4: Screenshot of Nagios web interface

• Notifications - define which contacts should be notified about particular events. For example, the production-administrators contact group should be notified about errors on hosts in production-servers during working hours; otherwise critsit-team should be notified.

• Escalations - extensions to notifications that define what other action should be performed after a service has been in the same state for a specific period of time. For example, if the web service is unavailable for 5 hours, IT management should be notified.

Nagios converts any metric value into one of four distinct states: Ok, Warning, Critical and Unknown. This allows administrators to ignore the monitored values themselves and only decide on the warning and critical limits.
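The conversion of a metric value into one of the four states can be sketched in a few lines of Ruby. This is only an illustration of the idea: the function name check_state, the keyword arguments and the example thresholds are assumptions, not part of Nagios or its plugin interface.

```ruby
# Map a numeric metric onto Nagios-style states.
# check_state and its thresholds are illustrative assumptions.
def check_state(value, warning:, critical:)
  return :unknown unless value.is_a?(Numeric)  # no usable measurement
  return :critical if value >= critical
  return :warning if value >= warning
  :ok
end

puts check_state(0.7, warning: 4.0, critical: 8.0)
puts check_state(5.2, warning: 4.0, critical: 8.0)
puts check_state(9.9, warning: 4.0, critical: 8.0)
puts check_state(nil, warning: 4.0, critical: 8.0)
```

The sketch assumes a metric where higher values are worse, such as CPU load. A metric like free disk space would need the comparisons reversed.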

2.5 Zabbix

Zabbix is a very powerful and effective open source monitoring system for networks and applications. It is suitable for both small and large environments. As an open source project it is easy to extend and customize, either through scripts and custom probes or through the Zabbix API.

2.5.1 Architecture

The Zabbix system consists of three main components:

• Database server - provides storage for all the collected data and the configuration. Zabbix supports most of the major database management systems. It can be configured to use MySQL, PostgreSQL, Oracle, SQLite or IBM DB2.

• Zabbix server - performs the actual monitoring. It is a daemon³ that executes the monitoring procedures, collects the data and stores them on the database server. It is also responsible for evaluating triggers and generating alerts and notifications. It is written in C in order to provide the best performance with the lowest possible resource usage.

• WEB server - provides the graphical user interface, accessible through HTTP(S). It is used to configure the whole system and to present the collected data in the form of graphs and maps. It talks to the database server directly in order to retrieve the collected data. It is written in PHP.

3. A long-running background process

These three components talk to each other and together form a monitoring system that provides all the necessary features. There are several deployment models to consider, depending on the size of the monitored environment.

For small environments the simplest solution is to set up all the parts (database server, Zabbix server and web server) on the same machine. This solution is simple, because there is no need to set up a network for communication between the servers. However, the more hosts the system monitors, the more computing power it needs, and with all three parts on the same machine we can reach its limits really quickly.

[Diagram: Zabbix Server, Database Server and WEB Server as separate machines]

Figure 2.5: Zabbix deployment model for large environments

The administrators of larger environments should consider another deployment model. By placing each component of the system on a dedicated machine they can scale up the necessary components more easily. While it is usually not necessary to scale up the database server, giving more computational power to the Zabbix server allows it to monitor more than a thousand hosts. This deployment model is shown in Figure 2.5.

However, there are also scenarios in which, besides handling a huge number of monitored hosts, other aspects of the environment have to be taken into consideration. The monitored environment can be spread over distinct geographical areas that are far from each other, for example data centers on different continents. These areas can be connected by an unreliable and slow network and protected by a firewall that does not allow the system to talk to every host in the area.

A Zabbix proxy is another component of the system, which sits between the Zabbix server and the monitored hosts. It is similar to the Zabbix server: it gathers monitoring data from the hosts under its management, but it sends the data on to the main Zabbix server. The database on the Zabbix proxy is used to store the collected data temporarily when the network connection to the main Zabbix server is down. After the connection is restored, the data are sent to the main Zabbix server to be processed. By using Zabbix proxies we also scale out the Zabbix server, because the proxies are additional nodes that perform the data collection. This deployment model is shown in Figure 2.6.
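The store-and-forward behaviour of the proxy can be illustrated with a small Ruby sketch. It models only the buffering idea from the paragraph above. The class BufferingProxy, its methods and the uplink callable are hypothetical and do not correspond to the Zabbix proxy implementation or protocol.

```ruby
# Hypothetical sketch of store-and-forward buffering, the idea behind the
# temporary database of a Zabbix proxy. All names are illustrative only.
class BufferingProxy
  attr_reader :buffer

  # uplink is any object responding to call(batch). It returns true on
  # success and false when the main server cannot be reached.
  def initialize(uplink)
    @uplink = uplink
    @buffer = []
  end

  # Record a collected value and try to forward everything buffered so far.
  def collect(host, item, value)
    @buffer << { host: host, item: item, value: value }
    flush
  end

  # Send the buffered values and keep them if the server is unreachable.
  def flush
    return true if @buffer.empty?
    sent = @uplink.call(@buffer.dup)
    @buffer.clear if sent
    sent
  end
end

online = false
received = []
uplink = ->(batch) { online ? (received.concat(batch); true) : false }

proxy = BufferingProxy.new(uplink)
proxy.collect("web1", "cpu.load", 0.4)  # server down: the value stays buffered
online = true
proxy.collect("web1", "cpu.load", 0.6)  # server back: both values are sent
puts received.size
```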

[Diagram: two data centers, each with a Zabbix proxy and several servers, connected to the WEB server, database server and Zabbix server]

Figure 2.6: Zabbix deployment model using proxy servers

2.5.2 Data collection

Once the Zabbix system is set up and some hosts are ready to be monitored, there are several ways to get the metrics from a monitored host into the system. The available options differ from host to host. There are many options for collecting data from a machine running the Linux operating system, whereas a thermometer that does not run any sophisticated operating system usually has just one option, established by the manufacturer.

• Zabbix agent - a piece of software that runs on the monitored host. It collects the data and uses the Zabbix protocol to send them to the server. This is the easiest way to monitor a host, but the host must have the computational power to run the agent. The agent can operate in active or passive mode.

  - Active - the agent asks the server which items should be monitored, measures the data and sends them back to the server.
  - Passive - the server contacts the agent periodically in order to collect the measured data.

Zabbix provides native agents for most Unix-like and Windows operating systems.

• Simple Network Management Protocol (SNMP) - By using SNMP Zabbix can monitor devices that do not allow the installation of the Zabbix agent or do not have enough computational power to run it. SNMP is designed to be simple, as its name says. It is supported by the majority of network switches, thermometers, etc.

• Intelligent Platform Management Interface (IPMI) - By using IPMI Zabbix can monitor IPMI-aware devices independently of the operating system on the device.

• Secure Shell (SSH) - This option is useful for devices that do not support the Zabbix agent and for vendor-specific appliances that have a limited operating system but support SSH. The Zabbix server can be configured to run a remote command on the device to extract and collect the required data. This option is server-triggered.

[Graph legend: CPU user time [avg], CPU system time [avg], Incoming network traffic on eth0 [avg]]

Figure 2.7: Graph showing three items in stacked format

• Database (via ODBC) - Zabbix can use ODBC to contact the monitored database server and run SQL queries that calculate the monitored data.

• Java Management Extensions (JMX) - JMX can be used when monitoring Java applications. When configured, Zabbix starts the Java poller that is responsible for communication with Java applications using JMX protocol.

The data collected from the hosts through any of the methods mentioned above are stored in the relational database of the Zabbix system. The object that describes a specific metric is called an item. Items are stored in the database in raw form, that is, no calculations are made before saving. By storing the raw data Zabbix can perform many operations on the data afterwards. Every item is associated with the host it was collected from. Hosts can be organized into host groups, which allows the system to be managed in a more sophisticated way.


Figure 2.8: Map representing physical infrastructure of a web service

2.5.3 Data Visualization

The stored data by themselves do not provide much meaningful information about the status of the monitored environment. The collected data therefore need to be presented in a more meaningful view.

The most common way to visualize the data is through graphs. Plotting an item's value on a graph and viewing how the value develops can be used to find patterns in the data. Graphs in Zabbix are fully configurable. Administrators can either use the default graphs or create custom ones. By putting several items on one graph, more complex correlations between the individual items can be observed. Figure 2.7 shows the correlation between CPU utilization and network traffic.

Sometimes looking at how an item's value changes over time is not exactly what the administrator needs. A topological view of the infrastructure is important in situations where the administrator has to figure out which part of the system is causing a problem. Using maps, Zabbix can be configured to show a graphical representation of the physical infrastructure with the corresponding item values. The administrator can see which device is hitting its limits and how that affects the surrounding devices. Figure 2.8 is an example of a graphical representation of a web service infrastructure using a map.

[Screen: a map alongside graphs, including "MongoDB Database Operations" with the legend entries Query Ops, Insert Ops and Update Ops]

Figure 2.9: Screen showing map with appropriate graphs alongside

To give the most practical view of the complete picture, Zabbix offers screens. A screen is a page on which many different elements can be placed in order to group related things together. Figure 2.9 is an example of a screen that combines the map of the service topology with the history of database performance data. This kind of view can shorten the time needed to find the cause of a problem.

2.5.4 Alerts and Triggers

The monitoring system should be able to make some calculations, compare the data to threshold limits and notify the administrator about the situation. In Zabbix the mechanism that evaluates conditions and derives a state from them is called a trigger.

A trigger is a set of rules that evaluates to a boolean result. Therefore a trigger is either in the PROBLEM or in the OK state. It is not tied to a specific item. The rules can combine many different items and any of the data stored on the server. One trigger can even contain rules involving items from different hosts. The rules in Figure 2.10 check whether either of two web servers has had a CPU load over 98% in the last hour. Every trigger has a severity attribute, which can be one of six possible values: Not Classified, Information, Warning, Average, High, Disaster.

{web1.company.com:cpu_idle_hour.last(0)} < 2 or {web2.company.com:cpu_idle_hour.last(0)} < 2

Figure 2.10: Trigger rules

When a trigger changes state, Zabbix generates an event describing the situation that occurred. It can also perform an action that executes a repair mechanism for the problem. Restarting the httpd process via SSH after several unsuccessful connections to the web server is a good example.
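To make the boolean nature of a trigger concrete, the rule from Figure 2.10 can be re-expressed in plain Ruby. This is a sketch for illustration only: the Trigger struct, the layout of the latest hash and the sample values are assumptions and do not reflect how Zabbix parses or evaluates trigger expressions internally.

```ruby
# Illustrative evaluation of the rule from Figure 2.10 in plain Ruby.
# The data layout and all names are assumptions, not Zabbix internals.
SEVERITIES = ["Not Classified", "Information", "Warning",
              "Average", "High", "Disaster"].freeze

# A trigger is a boolean expression, so it has exactly two states.
Trigger = Struct.new(:description, :severity, :rule) do
  def state(values)
    rule.call(values) ? :problem : :ok
  end
end

# Latest value of each item, keyed by [host, item key] (made-up numbers).
latest = {
  ["web1.company.com", "cpu_idle_hour"] => 1.5,
  ["web2.company.com", "cpu_idle_hour"] => 40.0
}

cpu_trigger = Trigger.new(
  "CPU load over 98% on a web server",
  SEVERITIES.index("High"),
  lambda do |v|
    v[["web1.company.com", "cpu_idle_hour"]] < 2 ||
      v[["web2.company.com", "cpu_idle_hour"]] < 2
  end
)

puts cpu_trigger.state(latest)
```

With the sample values above the first host is idle less than 2% of the time, so the trigger is in the PROBLEM state even though the second host is healthy.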

2.5.5 Maintenance

Zabbix supports scheduling maintenance periods in order to prevent false positive events from being generated. These maintenance periods can be set per host or per host group. During such a period the alerts generated by the hosts under maintenance are handled differently.
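A minimal sketch of the underlying check, whether a given host is inside a maintenance period at a given time, might look as follows. The Maintenance struct and the in_maintenance? helper are hypothetical names chosen for this example. They do not mirror the Zabbix data model, and host groups are left out for brevity.

```ruby
# Hypothetical maintenance check. The structure is an assumption made for
# illustration and does not mirror the Zabbix data model.
Maintenance = Struct.new(:hosts, :from, :till) do
  # True when the host is covered by this period at the given time.
  def covers?(host, at)
    hosts.include?(host) && at >= from && at < till
  end
end

def in_maintenance?(periods, host, at)
  periods.any? { |period| period.covers?(host, at) }
end

window = Maintenance.new(["web1.company.com"],
                         Time.utc(2015, 12, 1, 22, 0),
                         Time.utc(2015, 12, 1, 23, 0))
now = Time.utc(2015, 12, 1, 22, 30)

puts in_maintenance?([window], "web1.company.com", now)
puts in_maintenance?([window], "web2.company.com", now)
```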

2.5.6 API

In addition to the extensibility that Zabbix provides through scripts and actions, it also offers a way of interacting with its internal objects. Every item, host, event and almost any other Zabbix entity can be accessed through the JSON-RPC API, which is exposed as a URL endpoint of the WEB interface. The API uses the HTTP(S) protocol and the endpoint is available at the URL https://zabbix.example.com/zabbix/api_jsonrpc.php, where zabbix.example.com is the hostname of the Zabbix WEB interface.

    {
        "jsonrpc": "2.0",
        "method": "user.login",
        "params": {
            "user": "MyUser",
            "password": "my_password"
        },
        "auth": null,
        "id": 0
    }

Figure 2.11: JSON-RPC API login request

The request message shown in Figure 2.11 is an example of an API call that performs the user.login method on the Zabbix server using the parameters user and password specified in the message body. Other methods can be called by replacing the method attribute in the message body and supplying the parameters they require, in order to obtain the requested entities or perform various tasks. The complete list of the available methods can be found on the Zabbix documentation site⁴.
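The request from Figure 2.11 can be built and sent with the Ruby standard library alone. The sketch below is a minimal illustration and is not the code used by Zabbirc. The helper names zabbix_request_body and zabbix_call are hypothetical, and the credentials are the placeholder values from the figure.

```ruby
require "json"
require "net/http"
require "uri"

# Build the JSON-RPC 2.0 body for a Zabbix API call.
# auth is nil for user.login and the session token for later calls.
def zabbix_request_body(method, params, auth: nil, id: 0)
  JSON.generate({ "jsonrpc" => "2.0", "method" => method,
                  "params" => params, "auth" => auth, "id" => id })
end

# POST the body to the API endpoint and return the parsed response.
# Not invoked here, because it needs a reachable Zabbix server.
def zabbix_call(url, body)
  uri = URI.parse(url)
  response = Net::HTTP.post(uri, body, "Content-Type" => "application/json-rpc")
  JSON.parse(response.body)
end

body = zabbix_request_body("user.login",
                           { "user" => "MyUser", "password" => "my_password" })
puts body
```

Against a real server, the call would be zabbix_call("https://zabbix.example.com/zabbix/api_jsonrpc.php", body). Calling another method only requires a different method name and params hash, plus the token returned by the login as auth.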

4. https://www.zabbix.com/documentation/2.4/manual/api

3 IRC - Internet Relay Chat

IRC is an application layer protocol that provides communication in the form of text. It was created by Jarkko Oikarinen in August 1988 in order to replace an existing communication solution at the University of Oulu [8].

3.1 Architecture

IRC uses the client-server model and can be deployed in a distributed manner. It uses TCP as the underlying network protocol and an IRC server usually listens on port 6667. It consists of two main components: a server and a client.

The server, or a network of servers, forms the backbone of IRC. It stores global state information about the clients. The server is the point to which clients connect in order to talk to each other. It also provides a point of extension: another server can connect to it and thereby expand the network. The server is responsible for relaying the messages sent by the clients. The only allowed network configuration is a spanning tree, so there is exactly one path between any two clients connected to the network. This path is also the shortest one.

The client is a program run by a user and is used to carry out the communication itself. Initially it talks to the server in order to register the user in the network. No authentication is required by default, but every client must choose a nickname so that the server can refer back to it. Besides the nickname, the server needs to know the real name of the host the client is running on and the username of the client on that host.

Every server in the network needs to know which clients are connected to which servers. This global information is synchronized between the servers. When a client wants to send a message to another client, it has to contact the server, which relays the message. No direct client-to-client communication is available. The server uses the global information about the clients and sends the message to the target client, or to the appropriate server if the sender and the receiver are not connected to the same server. Although the information about the connected clients is global, the message itself is sent only through the servers that are necessary for its delivery. Referring to Figure 3.1, if User 1 wants to send a message to User 2, the message is seen only by Server A, and if User 1 wants to send a message to User 3, the message is seen by Servers A, B and D.

[Diagram: User 1, User 2 and User 3 connected to a network of Servers A, B, C and D]

Figure 3.1: IRC network
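The routing example can be reproduced with a short Ruby sketch that finds the single path between two servers in a tree. The server links and the placement of the users are assumptions chosen to be consistent with the example in the text. In particular, the position of Server C is a guess, and the function names are hypothetical.

```ruby
# Spanning-tree routing sketch. The links and the user placement are
# assumptions consistent with the example in the text, not a real network.
LINKS = {
  "A" => ["B", "C"],
  "B" => ["A", "D"],
  "C" => ["A"],
  "D" => ["B"]
}.freeze
HOME = { "User 1" => "A", "User 2" => "A", "User 3" => "D" }.freeze

# Depth-first search. In a tree there is exactly one path between two servers.
def server_path(from, to, visited = [])
  return [from] if from == to
  LINKS[from].each do |neighbour|
    next if visited.include?(neighbour)
    rest = server_path(neighbour, to, visited + [from])
    return [from] + rest if rest
  end
  nil
end

# Servers that relay, and therefore see, a message between two users.
def relay_servers(sender, receiver)
  server_path(HOME[sender], HOME[receiver])
end

p relay_servers("User 1", "User 2")
p relay_servers("User 1", "User 3")
```

Server C never appears in either result, which matches the statement that a message travels only through the servers needed for its delivery.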

3.2 Conferencing

The main goal of IRC was to provide conference-like communication (one-to-many conversations). One of the ways to address this demand is to use channels. A channel is a way of grouping clients together. It has a name that must be unique across all servers. It is created automatically when the first client joins it and lasts until the last client leaves it.

Every message addressed to a channel is sent to all the clients on the channel. The server is responsible for duplicating the message and distributing it to the appropriate clients. If the message has to traverse several servers, it is sent only once in server-to-server communication and the duplication is handled at the final server.

The server is responsible for delivering the message, but the message is not stored on the server. The message is delivered to all clients that are currently in the channel, but when a new client joins the channel, it cannot see the conversation that took place before it joined. It is similar to people talking in a room: somebody who is not in the room cannot hear what the others are saying and cannot refer back to it later unless somebody repeats it for them.

When a channel is created, the client that created it becomes a channel operator. It is a form of ownership of the channel. The channel operator can perform privileged actions in order to keep control of the channel. The available privileged actions are described in Table 3.1.

Action   Description
KICK     force a client to leave the channel
MODE     change the mode parameters of the channel
TOPIC    change the topic of the channel
INVITE   invite a client to the channel when the channel's mode is invite-only

Table 3.1: Operator privileged actions

The behavior of a channel can be modified using channel modes. The modes determine how clients can communicate in the channel and can even prevent clients from finding or joining the channel. Every mode is represented by a single letter. The list of available modes is in Table 3.2.
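The channel semantics described in this section (the first client becomes the operator, messages reach only the current members, nothing is stored) can be modelled in memory. The Channel class below is a hypothetical illustration written for this text. Real IRC servers implement this logic inside the server daemon.

```ruby
# In-memory model of an IRC channel. Class and method names are hypothetical.
class Channel
  attr_reader :name, :members, :operators

  def initialize(name)
    @name = name
    @members = []
    @operators = []
  end

  # The first client to join an empty channel becomes its operator.
  def join(client)
    @operators << client if @members.empty?
    @members << client unless @members.include?(client)
  end

  # Only an operator may force another client to leave (KICK).
  def kick(operator, target)
    unless @operators.include?(operator)
      raise ArgumentError, "#{operator} is not a channel operator"
    end
    @members.delete(target)
    @operators.delete(target)
  end

  # Deliver text to everyone currently present except the sender.
  # Nothing is stored, so later joiners never see earlier messages.
  def deliver(sender, text)
    (@members - [sender]).map { |client| [client, text] }
  end
end

ops = Channel.new("#ops")
ops.join("alice")
p ops.deliver("alice", "disk full on web1")  # nobody else is present yet
ops.join("bob")
p ops.deliver("alice", "restarting httpd")
p ops.operators
```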

3.3 Bot

An IRC bot is a program that connects to the IRC network as a client but does not provide an interface for a user to interact with the network. Instead it uses a set of scripts to perform automated actions.

Mode  Description
i     the channel is invite-only
p     the channel is private; the topic of the channel is hidden when a client asks the server for the list of all channels and their topics
s     the channel is secret; it is excluded from the list of all channels
t     the topic can be set only by the channel operator
m     the channel is moderated
l     the number of users on the channel is limited

Table 3.2: Operator privileged actions
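Mode letters are set and cleared with mode strings such as "+im-t", where "+" switches to adding and "-" to removing flags. The Ruby sketch below applies such a string to a list of flags. The function apply_modes is a hypothetical helper, and it handles only the parameterless flags from the table. The user-limit mode takes a numeric argument and is left out.

```ruby
# Apply an IRC mode string such as "+im-t" to a list of channel flags.
# apply_modes is a hypothetical helper. Only parameterless flags are
# handled, and unknown characters are ignored.
FLAGS = %w[i p s t m].freeze

def apply_modes(current, mode_string)
  flags = current.dup
  adding = true
  mode_string.each_char do |char|
    case char
    when "+" then adding = true
    when "-" then adding = false
    else
      next unless FLAGS.include?(char)
      adding ? (flags |= [char]) : (flags -= [char])
    end
  end
  flags
end

p apply_modes(["t"], "+im-t")
```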

The bot listens on the channel, parses the messages and responds to them accordingly. It can be used for calculating channel statistics, logging the conversation or even for providing text-based games. By connecting an IRC bot to artificial intelligence, a bot that behaves like a real human on the channel can be created. Any computable task can be addressed by a bot.
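The listen, parse and respond cycle can be sketched without a network connection by working on raw protocol lines. The example below parses a channel message of the form ":nick!user@host PRIVMSG #channel :text" and produces a reply line. The "!ping" command and the function names are made up for illustration, and the regular expression covers only this one message form.

```ruby
# Parse one raw IRC line of the form
#   :nick!user@host PRIVMSG #channel :message text
# and decide how a bot might answer. The "!ping" command is made up.
PRIVMSG_LINE = /\A:(?<nick>[^!\s]+)!\S+ PRIVMSG (?<target>\S+) :(?<text>.*)\z/

def parse_privmsg(line)
  match = PRIVMSG_LINE.match(line.chomp)
  return nil unless match
  { nick: match[:nick], target: match[:target], text: match[:text] }
end

# Return the raw line the bot would send back, or nil to stay silent.
def respond(line)
  msg = parse_privmsg(line)
  return nil unless msg && msg[:text].strip == "!ping"
  "PRIVMSG #{msg[:target]} :#{msg[:nick]}: pong"
end

puts respond(":alice!alice@example.org PRIVMSG #ops :!ping\r\n")
```

A real bot would read such lines from a TCP socket in a loop and write the returned line back to the same socket.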

4 Zabbirc

This chapter describes the IRC bot created by the author of this thesis. The bot is called Zabbirc and brings some features of the Zabbix monitoring system into IRC itself.

4.1 Goal

Nowadays a lot of companies run complex distributed systems in the cloud. To achieve high availability, such a system is heavily monitored and the operations team consists of several members. These operators oversee the system in order to keep it running properly and to be able to react to situations as soon as they occur.

Communication between the members of the operations team is needed. A company IRC network may be used for this purpose. The operators sit on the channel daily, discussing the system's health and proposing solutions. A monitoring system with a web interface helps the operators with their task. Bringing the features of the monitoring system into the place where the operators spend most of their time makes their work even more effective.

Therefore the main goal of Zabbirc is to bring notifications from the Zabbix monitoring system into the IRC environment. The events that occur are often topics of discussion, and Zabbirc enables operators to react to them as part of the discussion. Zabbirc also provides a way of interacting with the Zabbix monitoring system: it can be used to set up maintenance periods, acknowledge events or review the status of a particular host. All the features of Zabbirc are described in Section 4.4.

4.2 Architecture

Zabbirc was designed to run as a background process. It connects two systems, Zabbix and IRC, and an interaction can start in either of them: an event occurring in Zabbix invokes an action that sends a message into the IRC channel, and a message received from the IRC channel invokes an action that reads or modifies data in Zabbix. Zabbirc communicates with Zabbix using the Zabbix API, accessible through the Zabbix web interface, and with the IRC server using the IRC protocol.

Figure 4.1: Zabbirc basic architecture. Zabbirc sits between Zabbix and IRC; it uses the Zabbix API for data collection and manipulation, and the IRC protocol for reading and sending messages.

Ruby was chosen as the programming language of the bot. Ruby is a heavily object-oriented, interpreted programming language whose standard library offers good support for network features. In Ruby terminology a Ruby gem is a package available through the RubyGems package manager. Several libraries from the RubyGems package manager were used to create the bot. The bot itself is shipped as a Ruby gem, so it follows the conventions for creating a Ruby gem in terms of directory structure and naming [10].

4.2.1 Zabbix API

Zabbix provides the JSON-RPC API described earlier in this thesis on page 15. Zabbirc uses this API for communicating with Zabbix. The RubyGems package manager contains several gems supporting communication with the Zabbix API. These gems usually handle the creation of the HTTP request to the Zabbix API and the parsing of its response, offering the developer Ruby objects that provide this functionality. In the initial stage of Zabbirc development two gems were considered.

• zabbixapi¹ - provides the ZabbixApi class, which creates a connection object from three input arguments: the Zabbix API URL, the Zabbix user and the password for authentication. This connection object receives Ruby method calls and creates the appropriate requests to Zabbix.

1. See https://rubygems.org/gems/zabbixapi for the library page.


• zabbix-client² - similarly to the previous gem, it provides a class, Zabbix::Client, that creates a connection object which can be interacted with using Ruby method calls. Its implementation, however, was cleaner than that of the zabbixapi gem. The main principle is simply to translate Ruby method calls into generated JSON objects using Ruby's metaprogramming abilities. As a result, the library itself does not need to change in order to adapt to changes in the Zabbix API.

The zabbix-client gem was eventually chosen for use with Zabbirc.
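The principle of translating method calls into JSON-RPC requests can be illustrated by the following minimal sketch. It is not the implementation of the zabbix-client gem: the class names RpcClient and RpcNamespace are hypothetical, the transport is a block supplied by the caller instead of a real HTTP request, and the authentication token that Zabbix expects is omitted for brevity.

```ruby
require "json"

# Only plain lowercase names are translated, so that Ruby's own implicit
# calls (such as to_ary) are not mistaken for API methods.
VALID_NAME = /\A[a-z]+\z/

# Hypothetical sketch: represents one API namespace such as "host".
class RpcNamespace
  def initialize(name, client)
    @name = name
    @client = client
  end

  # namespace.get(params) becomes the JSON-RPC method "<name>.get"
  def method_missing(action, params = {})
    return super unless VALID_NAME.match?(action.to_s)
    @client.call("#{@name}.#{action}", params)
  end

  def respond_to_missing?(action, include_private = false)
    VALID_NAME.match?(action.to_s) || super
  end
end

# Hypothetical sketch: builds JSON-RPC 2.0 requests and hands the encoded
# body to the transport block, which returns the response.
class RpcClient
  def initialize(&transport)
    @transport = transport
    @request_id = 0
  end

  def call(method_name, params)
    @request_id += 1
    request = { jsonrpc: "2.0", method: method_name, params: params, id: @request_id }
    @transport.call(JSON.generate(request))
  end

  # client.host returns a namespace object; no list of namespaces is needed
  def method_missing(namespace, *args)
    return super unless args.empty? && VALID_NAME.match?(namespace.to_s)
    RpcNamespace.new(namespace, self)
  end

  def respond_to_missing?(namespace, include_private = false)
    VALID_NAME.match?(namespace.to_s) || super
  end
end
```

With such a client, a call like client.host.get(output: "extend") produces a request for the method host.get without the library knowing that this method exists, which is why a change in the API does not require a change in the library.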

4.2.2 IRC API

The IRC protocol defines a straightforward interface for communicating with the server by exchanging messages over TCP. However, accomplishing even simple tasks using the protocol may require longer sequences of messages. To hide the complexities of the protocol, another library was used to handle basic IRC tasks by providing a Ruby-friendly interface through exposed objects. Two libraries were considered.

• autumn³ - a full-featured framework for making IRC bots. It uses the MVC⁴ approach influenced by Ruby on Rails and prescribes directory structure conventions the developer should stick to. It is quite complex and does not fit the Zabbirc requirements.

• cinch⁵ - a simple framework for making IRC bots. In contrast to the autumn gem it does not prescribe a directory structure. It provides a simple interface using several Ruby classes and can be extended by creating plugins.

The cinch gem was used in the implementation of Zabbirc to handle the underlying communication with the IRC server. A part of Zabbirc itself is implemented as a cinch plugin.

2. See https://rubygems.org/gems/zabbix-client for the library page.
3. See https://rubygems.org/gems/autumn for the library page.
4. Model View Controller architecture pattern
5. See https://rubygems.org/gems/cinch for the library page.
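To illustrate why even a simple task takes a sequence of protocol messages, the following sketch lists the raw lines a client would send to register, join a channel and post a single message, together with the reply to a server PING. The helper names irc_session_lines and pong_for are hypothetical; a library such as cinch hides these details.

```ruby
# Builds the raw protocol lines for: register, join a channel, post one
# message. Every IRC message is terminated by CR LF (RFC 1459).
def irc_session_lines(nick:, channel:, text:)
  [
    "NICK #{nick}",
    # The two middle parameters are placeholders in this sketch
    "USER #{nick} 0 * :#{nick}",
    "JOIN #{channel}",
    "PRIVMSG #{channel} :#{text}"
  ].map { |line| "#{line}\r\n" }
end

# Servers probe clients with PING; the client answers with a PONG carrying
# the same token. Returns nil for any other line.
def pong_for(line)
  token = line.chomp[/\APING (.+)\z/, 1]
  token && "PONG #{token}\r\n"
end
```

In a real client these lines would be written to a TCP socket connected to the IRC server, and the responses of the server would have to be read and checked between the individual steps.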


Figure 4.2: Zabbirc components interaction. The diagram contains the nodes Zabbirc::Service, Cinch::Bot, Zabbirc::Services::Events, Zabbirc::Services::Ops, Zabbirc::Irc::Plugin, Zabbirc::Zabbix, Zabbix::Client and Cinch::Plugin, connected by «spawn» and «use» relations. The legend distinguishes external libraries from parts of the Zabbirc implementation.

4.3 Implementation

Under the hood Zabbirc is separated into several modules in order to achieve good extensibility and maintainability of the code. These parts interact with each other by sending messages between Ruby objects. Figure 4.2 shows which modules interact and what kind of interaction it is. Three main components can be recognized in the figure by their colors: the IRC component in yellow, the Zabbix component in red and the Services component in blue.

4.3.1 Zabbix component

This component consists of the Zabbirc::Zabbix module and its classes. It provides an object-relational mapping (ORM) for the entities in Zabbix, which are accessed through the Zabbix::Client class from the zabbix-client gem. The Zabbirc::Zabbix module implements an abstract class, Zabbirc::Zabbix::Resource, which underlies the ORM capabilities. The Zabbix entities are consequently handled as Ruby objects instead of JSON objects represented as Ruby Hashes⁶.
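The idea of wrapping the Hash returned by the API in a Ruby object can be sketched as follows. The classes Resource and Event below are hypothetical simplifications and do not reproduce Zabbirc::Zabbix::Resource; the attribute names eventid, acknowledged and clock, and their string values, are assumed to follow the Zabbix event object.

```ruby
# Hypothetical base class: exposes the keys of the wrapped Hash as methods.
class Resource
  def initialize(attributes)
    @attributes = attributes
  end

  def method_missing(name, *args)
    key = name.to_s
    return super unless args.empty? && @attributes.key?(key)
    @attributes[key]
  end

  def respond_to_missing?(name, include_private = false)
    @attributes.key?(name.to_s) || super
  end
end

# Hypothetical entity: adds domain methods on top of the raw attributes.
class Event < Resource
  # Assumes the API reports the flag as the string "1" or "0"
  def acknowledged?
    acknowledged == "1"
  end

  # Assumes clock holds a Unix timestamp encoded as a string
  def occurred_at
    Time.at(clock.to_i)
  end
end
```

The rest of the application can then call methods such as acknowledged? on an entity instead of comparing raw Hash values in many places.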

4.3.2 Services component

This component is the main component of Zabbirc. It handles the responsibilities of a background process, such as reacting to the signals sent by the operating system and keeping Zabbirc running so that it can react to events as they occur.

The Zabbirc::Service class is the starting point of the Zabbirc lifetime as a system process. It starts the Cinch::Bot service, which handles the connection to the IRC server, alongside two other services, Zabbirc::Services::Events and Zabbirc::Services::Ops. Each service runs in its own thread. Zabbirc::Service then enters an endless sleep loop in order to keep the main process running.

The Zabbirc::Services::Events service is responsible for notifying the users on the IRC channel about events reported by Zabbix. The service connects to the Zabbix server and repeatedly asks for new events.

The Zabbirc::Services::Ops service is responsible for authenticating Zabbix operators on the IRC channel. Zabbirc does not implement any comprehensive authentication mechanism. It delegates this responsibility to the administrator of the IRC channel, who should set up an authentication mechanism on the channel and prevent users with inappropriate usernames from connecting. The Zabbirc::Services::Ops service then compares the IRC usernames with the list of users in the Zabbix system. This process is executed repeatedly to keep the data fresh.
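The pattern of a service that runs in its own thread and repeatedly asks for new events can be sketched as follows. This is a hypothetical illustration and not the Zabbirc code: the names EventPoller and spawn_service are invented, the events are supplied by a block instead of the Zabbix API, and remembering the highest event id seen so far is only one assumed way of recognizing new events.

```ruby
# Hypothetical sketch: returns only the events that were not seen by an
# earlier poll, based on the highest event id encountered so far.
class EventPoller
  def initialize(&fetch_events)
    @fetch_events = fetch_events # block returning an array of event Hashes
    @last_seen_id = 0
  end

  def poll_once
    fresh = @fetch_events.call.select { |event| event["eventid"].to_i > @last_seen_id }
    @last_seen_id = fresh.map { |event| event["eventid"].to_i }.max || @last_seen_id
    fresh
  end
end

# Runs the given block repeatedly in its own thread and returns the thread.
def spawn_service(interval, &work)
  Thread.new do
    loop do
      work.call
      sleep interval
    end
  end
end
```

A process could start one such thread per service, for example spawn_service(10) { poller.poll_once.each { |event| report(event) } }, where report is a hypothetical method sending the notification to the channel.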

4.3.3 IRC component

The Cinch::Bot instance started by Zabbirc::Service connects to the IRC server and keeps the connection alive by repeatedly sending keep-alive messages. It loads Zabbirc::Irc::Plugin and registers the matchers for Zabbirc commands. A matcher is defined by a regular expression that is compared against every message and by the method that is invoked when the regular expression matches the message.

6. A Hash is the Ruby implementation of a dictionary-like data structure, see http://ruby-doc.org/core-2.2.0/Hash.html

    class Zabbirc::Irc::Plugin
      include Cinch::Plugin

      # Messages matching the regular expression invoke maintenance_command
      match /maint(?: (.*))?\Z/, method: :maintenance_command

      # The work itself is delegated to a dedicated command class
      def maintenance_command(m, cmd)
        cmd = MaintenanceCommand.new(ops, m, cmd)
        cmd.run
      end
    end

Figure 4.3: Definition of the matchers for Zabbirc commands

Every Zabbirc command is implemented in the way shown in figure 4.3. The goal is to keep the plugin class as simple as possible and to extract the functionality of every command into a separate class. The method referenced by the match call is executed in a separate thread. The main Cinch::Bot thread continues to listen to the messages in the channel and to compare them against the matchers, which allows Zabbirc to handle several commands simultaneously.

Zabbirc::Irc::Plugin registers a matcher for every command Zabbirc provides, and a separate class is defined to handle the functionality of each command. This separation follows the single responsibility principle and makes Zabbirc easier to test.
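The matcher mechanism can be approximated without any IRC library by the following sketch. The class CommandDispatcher is hypothetical and far simpler than the cinch framework; it only shows a regular expression paired with a handler that runs in its own thread, so that the dispatching code is free to process the next message.

```ruby
# Hypothetical sketch of regular expression matchers with threaded handlers.
class CommandDispatcher
  Matcher = Struct.new(:pattern, :handler)

  def initialize
    @matchers = []
  end

  # Registers a pattern together with the block to run when it matches
  def match(pattern, &handler)
    @matchers << Matcher.new(pattern, handler)
  end

  # Compares the message against every matcher and starts one thread per
  # match. Returns the started threads so that a caller may wait for them.
  def dispatch(message)
    threads = @matchers.map do |matcher|
      match_data = matcher.pattern.match(message)
      next unless match_data
      Thread.new { matcher.handler.call(*match_data.captures) }
    end
    threads.compact
  end
end
```

A registration such as dispatcher.match(/\A!maint(?: (.*))?\z/) { |arguments| ... } then plays the role of the match call from figure 4.3, with the optional capture group carrying the rest of the command.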

4.4 Features

Building on the implementation described in the previous section, Zabbirc provides several features supporting interaction with Zabbix through the IRC channel.

4.4.1 Events

Zabbirc checks Zabbix repeatedly in order to keep track of newly occurred events. These events are reported to the channel, with the interested operators addressed in the message.

    zabbirc: operator: |8J4| 20 Dec 03:06 [high] User CPU high on host1 - problem
    operator: !ack 8J4 I'm on it
    zabbirc: operator: Event |8J4| 20 Dec 03:06 [high] User CPU high on host1 - problem acknowledged with message: I'm on it

Figure 4.4: Event notification and acknowledgment

    operator: !status host1
    zabbirc: operator: Host: host1 - status: 2 problems
    zabbirc: operator: status: 12 Oct 12:02 [warning] Lack of free swap space on host1 - problem
    zabbirc: operator: status: 20 Dec 03:38 [high] User CPU high on host1 - problem

Figure 4.5: Host status reporting

The operator can access the latest events using the !events command. The command accepts two optional arguments: a priority of the event, mapped to the severity of the trigger that generated the event (described earlier in this document on page 15), and a text that is used as a filter for the host name associated with the event.

An event can be acknowledged using the Zabbix acknowledge feature. Zabbirc generates and keeps track of short event ids to make the !ack command easier to type. An example of acknowledging an event using Zabbirc is shown in figure 4.4.
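One possible way of keeping track of short event ids is sketched below. The class ShortIdRegistry and the scheme it uses (a counter written in base 36 and padded to three characters) are assumptions made for illustration; the way Zabbirc actually derives ids such as 8J4 is not described here.

```ruby
# Hypothetical sketch: maps long Zabbix event ids to short ids and back.
class ShortIdRegistry
  def initialize
    @event_ids = {} # short id => Zabbix event id
    @short_ids = {} # Zabbix event id => short id
    @counter = 0
  end

  # Returns the short id of the event, assigning a new one on first use
  def short_id_for(event_id)
    @short_ids[event_id] ||= begin
      @counter += 1
      short_id = @counter.to_s(36).upcase.rjust(3, "0")
      @event_ids[short_id] = event_id
      short_id
    end
  end

  # Resolves what the operator typed; returns nil for an unknown short id
  def event_id_for(short_id)
    @event_ids[short_id.upcase]
  end
end
```

The registry gives the same short id to an event every time it is reported, so that the id an operator sees in a notification can later be used for the acknowledgment.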

4.4.2 Hosts

After several event notifications have been received, it can become difficult to keep an overall view of the situation. To give an overview of the host status, Zabbirc provides the !status command, which prints the host status with the appropriate triggers sorted by severity. An example of this use case is shown in figure 4.5. The !latest command lists the last N events that occurred for the specified host, where N is an argument of the command.
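Sorting the triggers of a host by severity can be sketched as follows. The function status_lines is hypothetical and prints a simplified form of the output; the trigger attributes priority and description and the six severity levels are assumed to follow the Zabbix trigger object, and the ascending order of the sort is an assumption as well.

```ruby
# Zabbix severities by numeric priority, from 0 (lowest) to 5 (highest)
SEVERITY_LABELS = %w[not-classified information warning average high disaster].freeze

# Hypothetical sketch: builds a header line followed by one line per
# trigger, ordered from the least to the most severe.
def status_lines(host, triggers)
  sorted = triggers.sort_by { |trigger| trigger["priority"].to_i }
  header = "Host: #{host} - status: #{sorted.size} problems"
  details = sorted.map do |trigger|
    label = SEVERITY_LABELS.fetch(trigger["priority"].to_i)
    "[#{label}] #{trigger['description']}"
  end
  [header] + details
end
```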


4.4.3 Maintenance

The !maint command can be used to list the active maintenance periods, create new ones or delete existing ones. When creating a maintenance period the operator can specify the affected hosts or host groups. The other commands that report events recognize when a host associated with an event is included in a maintenance period and print an appropriate label.

4.4.4 Settings

Every operator can customize how Zabbirc interacts with them. The !settings command provides a way of showing and modifying these user-specific settings:

• notify - whether the user wants to be notified when an event occurs

• notify_recoveries - whether the user wants to be notified when the trigger's state changes back from PROBLEM to OK and the recovery event is generated

• events_priority - the minimum priority of the events that the user should be notified about

Every setting can be adjusted globally for the specific user or on a per host group basis. This allows operators from different teams to avoid notifications about parts of the network that are not under their supervision.
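The interplay of global and host group settings can be sketched as follows. The class OperatorSettings, its default values and the rule that a host group value overrides the global one are assumptions made for illustration; only the three setting names come from the list above.

```ruby
# Hypothetical sketch of per-operator settings with host group overrides.
class OperatorSettings
  # Default values are an assumption of this sketch
  DEFAULTS = { notify: true, notify_recoveries: true, events_priority: 0 }.freeze

  def initialize
    @global = DEFAULTS.dup
    @per_group = Hash.new { |hash, group| hash[group] = {} }
  end

  def set(name, value, host_group: nil)
    raise ArgumentError, "unknown setting #{name}" unless DEFAULTS.key?(name)
    target = host_group ? @per_group[host_group] : @global
    target[name] = value
  end

  # A value set for the host group wins over the global value
  def get(name, host_group: nil)
    @per_group.fetch(host_group, {}).fetch(name) { @global.fetch(name) }
  end

  # Decides whether the operator should be notified about an event
  def notify?(priority:, recovery: false, host_group: nil)
    return false unless get(:notify, host_group: host_group)
    return false if recovery && !get(:notify_recoveries, host_group: host_group)
    priority >= get(:events_priority, host_group: host_group)
  end
end
```

An operator who sets notify to false for one host group while keeping the global value still receives notifications about hosts in all other groups.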

4.5 Installation

Zabbirc is published as a Ruby gem in the RubyGems package manager⁷. It can be installed using the RubyGems command line tool. The package manager takes care of installing all the dependencies on the system.

7. https://rubygems.org/gems/zabbirc


Once the Zabbirc gem is installed, a configuration file has to be generated before the bot can run. This configuration file contains authentication credentials for the Zabbix API and for the IRC server. Zabbirc contains two executable scripts. The first one, zabbirc-install, generates the configuration file from the template provided with the package. The second one, zabbirc, starts Zabbirc itself. It needs the path to the configuration file as an argument. It also traps the SIGINT and SIGTERM signals from the operating system and performs a clean shutdown.

$ gem install zabbirc

$ zabbirc-install path/to/zabbirc/installation/

$ zabbirc path/to/zabbirc/installation/zabbirc_config.rb

Figure 4.6: Installation steps for Zabbirc

Figure 4.6 shows the steps to install Zabbirc from the command line in a terminal. Assuming Ruby and RubyGems are already installed on the operating system, Zabbirc can be installed as a gem using the first command. The second command then generates the configuration file, which should be filled in with authentication credentials, and the last command runs the bot using the configuration file.
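The signal handling performed by the zabbirc script can be approximated by the following sketch. It does not reproduce the script itself: the class ShutdownFlag and the function run_until_shutdown are hypothetical, and the sketch only shows the common pattern of a trap handler that sets a flag while the main thread sleeps in a loop until the flag is set.

```ruby
# Hypothetical sketch: a flag that signal handlers can set.
class ShutdownFlag
  def initialize
    @requested = false
  end

  def request!
    @requested = true
  end

  def requested?
    @requested
  end

  # The handlers only flip the flag; the actual cleanup happens later in
  # the main thread, outside the signal handler.
  def trap_signals(*names)
    names.each { |name| Signal.trap(name) { request! } }
  end
end

# Keeps the calling thread asleep until a shutdown is requested, then runs
# the optional cleanup block.
def run_until_shutdown(flag, poll_interval: 0.1)
  sleep poll_interval until flag.requested?
  yield if block_given?
end
```

A main program would call flag.trap_signals("INT", "TERM") after starting its services and then run_until_shutdown(flag) { ... } with a block that stops the services.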

5 Conclusion

This thesis described the need for a monitoring system in production scenarios and discussed several options for meeting this requirement. The focus was on the Zabbix monitoring system, which was explored in order to understand its architecture and the interfaces it provides for interaction with the system. The IRC architecture was described along with the Ruby libraries that provide a convenient way of working with the IRC infrastructure.

The IRC control bot for the Zabbix monitoring system was designed and implemented in the Ruby programming language. The bot was named Zabbirc. Its features were described in the thesis along with installation and usage instructions.

The author of this thesis believes that the official requirements were fulfilled. Zabbirc is ready to run in production and has a passing test suite.

Bibliography

[1] Internet Live Stats. [online], [cit. 2016-01-03]. URL: http://www.internetlivestats.com/internet-users/.
[2] Tobias Oetiker CV. [online], [cit. 2015-12-11]. URL: https://tobi.oetiker.ch/vita.html.
[3] Tobias Oetiker. MRTG The Multi Router Traffic Grapher. Dec. 1998. URL: https://www.usenix.org/legacy/publications/library/proceedings/lisa98/full_papers/oetiker/oetiker.pdf.
[4] Graphing Performance Info With MRTG. [online], [cit. 2015-12-22]. URL: https://assets.nagios.com/downloads/nagioscore/docs/nagioscore/3/en/mrtggraphs.html.
[5] Thomas Urban. Cacti 0.8 beginner's guide: learn Cacti and design a robust network operations center. Birmingham, UK: Packt Pub./Open Source, 2011. ISBN: 978-1-849513-92-0.
[6] Wojciech Kocjan. Learning Nagios 4: learn how to set up Nagios 4 in order to monitor your systems efficiently. Birmingham, UK: Packt Pub, 2014. ISBN: 978-1-78328-864-9.
[7] Andrea Vacche. Mastering Zabbix: learn how to monitor your large IT environments using Zabbix with this one-stop, comprehensive guide to the Zabbix world. Birmingham, UK: Packt Publishing, 2015. ISBN: 978-1-78528-926-2.
[8] Jarkko Oikarinen. Founding IRC. [online], [cit. 2015-12-17]. Dec. 2015. URL: http://www.mirc.com/jarkko.html.
[9] Jarkko Oikarinen and Darren Reed. Internet Relay Chat Protocol. RFC 1459. RFC Editor, May 1993. URL: http://www.rfc-editor.org/rfc/rfc1459.txt.
[10] RubyGems Guides. [online], [cit. 2015-12-19]. URL: http://guides.rubygems.org/.
