Check MK Documentation Documentation Release
Total Page:16
File Type:pdf, Size:1020Kb
Check_MK Documentation Documentation Release Marius Pana August 12, 2019 Contents 1 What is Monitoring all about?3 2 Why monitor things? 5 2.1 States...................................................5 2.2 Events..................................................5 2.3 What is a monitoring system?......................................5 2.4 What is a check?.............................................6 2.5 Active vs. Passive Checks........................................6 3 Check_MK Architecture 7 3.1 Check_MK Components.........................................7 3.2 Configuration & Check Engine.....................................8 3.3 Livestatus.................................................8 3.4 Multisite.................................................8 3.5 WATO..................................................8 3.6 Notify...................................................9 3.7 Business Intelligence...........................................9 3.8 Mobile..................................................9 3.9 Event Console.............................................. 10 4 Check_MK Quickstart 11 4.1 Prerequisites............................................... 11 4.2 Operating System............................................ 11 4.3 Disk Space................................................ 11 4.4 SMTP for outgoing emails........................................ 11 4.5 Time services............................................... 12 4.6 Repository................................................ 12 4.7 Downloading Check_MK........................................ 12 4.8 Installation................................................ 12 4.9 Confirmation............................................... 13 5 Create a Site 15 5.1 OMD command............................................. 15 5.2 OMD config command.......................................... 16 5.3 Creating the site............................................. 18 6 Contacts 19 6.1 Roles................................................... 20 6.2 Contact Groups.............................................. 20 i 7 Adding a Linux host 23 7.1 Agent bakery............................................... 23 7.2 Prebuilt package............................................. 23 7.3 Xinetd.................................................. 24 7.4 Adding host to OMD........................................... 24 8 Adding a Windows host 25 8.1 Agent bakery............................................... 25 8.2 Prebuilt package............................................. 25 8.3 Windows installer............................................ 25 8.4 Adding host to OMD........................................... 26 9 Multisite 27 9.1 Views................................................... 27 9.2 Distributed monitoring.......................................... 27 ii Check_MK Documentation Documentation, Release Welcome to our Check_MK documentation. The documentation provided here is meant as a quick introduction to the Checkmk Enterprise Editi (CEE). The ma- jority of the information here should apply to the open source (RAW) versions as well. If you find mistakes or feel you could contribute in helping us provide better documentation please dive in by visiting this github page. The official Check_MK documentation is available at the following link https://checkmk.com/cms.html. Contents: Contents 1 Check_MK Documentation Documentation, Release 2 Contents CHAPTER 1 What is Monitoring all about? There are many definitions of what monitoring is and where it might be useful. For [https://spearhead.systems](our) purposes, as an IT service provider, the following defition fits well: “Monitoring is a process by which we obtain information that we use to determine availbility and perfor- mance of systems”. Systems are a bit harder to define and generally mean anything that has an IP address but it gets blurry as we start to look at services, processes, etc. Suffice it to say we monitor all aspects of modern IT infrastructures and all that they entail (containers, functions, clouds, physica/virtual servers, cooling units, storages, routers, etc.). 3 Check_MK Documentation Documentation, Release 4 Chapter 1. What is Monitoring all about? CHAPTER 2 Why monitor things? Keeping an eye on IT systems provides many benefits for the entire business. Availability, or the time the system is available and functional within normal parameters, is of interest to everyeone: any deviation from normal functionality could have impact to more than just computer systems, it could mean loss of money, damages or other negative results. Through monitoring we obtain useful information about what these systems are doing and we use information that to help us better optimize and plan for the future (we call this future planning “capacity planning”). An immediate effect of monitoring is that we get almost instant alerts when something is not functioning or is allto- gether missing. Longer term however we get insight into the systems and applications via trends, historical information and graphics that quickly identify patterns that may have otherwise never been observed. With relevant and timely information you can move to a more ordered and predictable state. For some this means a shift in methodology from a “putting out the fires” mentality to a keeping them from happening in the first place. So what can monitoring do for us? Lets take a look at it from the checkmk perspective and define some standard terminology to help us along. 2.1 States States are determined by a measurement or a check. This verification or polling needs to be carried out regularly. States give us information such as: is the system running, how much memory is being used? 2.2 Events Events are more dynamic in their nature and therefore are not easily identified by a regular polling interval. Events may also be errors which only occur once making them that much more difficult to identify by regular checks. Examples of events are: disk I/O errors or hot-plug/add of a device. 2.3 What is a monitoring system? A monitoring system is a piece of hardware or software that offers monitoring facilities. Usually such a system is state based that retrieves information from the monitored hosts via some mechanism. This information often called the check result is then processed, stored and possibly archived. The system will also provide methods to retrieve and display the check results. 5 Check_MK Documentation Documentation, Release 2.4 What is a check? A check is usually a piece of software running on a computer that does the measurement of a service. An example of a check is: opening a TCP connection to a web server and verifying the result. The check determines the state of a service. 2.5 Active vs. Passive Checks In monitoring there is the concept of active and passive checks. An active check is triggered by the monitoring server usually through a poll or request. A passive check is sent directly by the monitored server and the server has no influence on when this result is sent. Furthermore a passive check may not send a result during a specified time-frame and the monitoring server can raise an alarm at that point. Now that we’ve got that out of the way let’s take a look at the checkmk architecture. 6 Chapter 2. Why monitor things? CHAPTER 3 Check_MK Architecture You should always know the architecture of your systems, networks, applications, etc. The more you know about the moving parts the easier it is to debug or troubleshoot something. Having detailed information about the inner workings of anything is extremely useful in identifying potential issues before or after an incident. Below we will take a look at the Checkmk architecture so that you may know all of the components involved. This information will help you in scaling or identifying eventual performance issues as you should be able to spot the symptoms and associate them with functional components. 3.1 Check_MK Components There are many components that make up the Checkmk monitoring system. Below you will find information about each of the major components. 7 Check_MK Documentation Documentation, Release 3.2 Configuration & Check Engine The Checkmk CCE provides an elegant method for configuring your monitoring platform independently of the mon- itoring core you are using. You do not have to deal directly with typical configuration files since Checkmk takes care of automatic service discovery and the actual configuration. Checkmk provides a highly efficient method for doing checks: your hosts are contacted only once per Check Interval. The results are then sent to the monitoring core as passive checks. This results in a substantial reduction in resource usage. Compare the above to how typical nagios (and most monitoring platforms) function: For every Check Interval (usually once per minute) you poll your hosts for every metric that you are interested in. So if you have one CPU and one Hard Disk you would have two checks every minute (one for the CPU and one for Hard Disk). Checkmk obtains both metrics in one efficient check. Imagine now that you have thousands of hosts and up to hundreds of thousands of checks, check_mk would save you considerable resources. The Checkmk CCE takes care of creating complete configuration data necessary for your monitoring system. You can focus on gathering the information that is important to you and less on configuration files. Official Checkmk CCE documentation is available here https://checkmk.com/cms.html 3.3 Livestatus Livestatus provides a direct connection to Status Data on Hosts and Services via