Local Service Monitoring Status of Linux Operating Systems

MASARYK UNIVERSITY
FACULTY OF INFORMATICS

Local service monitoring status of Linux operating systems

BACHELOR THESIS

Jakub Svoboda

Brno, Spring 2012

Declaration

Hereby I declare that this paper is my original authorial work, which I have worked out on my own. All sources, references and literature used or excerpted during elaboration of this work are properly cited and listed in complete reference to the due source.

Jakub Svoboda

Advisor: Mgr. Pavel Tuček

Acknowledgement

I'd like to thank my advisor Mgr. Pavel Tuček for his patience, guidance, invaluable assistance and encouragement. I'd also like to thank Jan Konečný for programming advice in the course of designing the application.

Abstract

The theoretical part of the thesis analyzes methods of monitoring the Linux operating system and the monitoring requirements of the Institute of Computer Science. In the practical part of the thesis, a Linux monitoring application is designed and implemented. The application is developed as part of the ICS's Large Enterprise Monitoring (Lemon) project.

Keywords

Linux, monitoring, Mono, Lemon, LinMon

Contents

1 Introduction
2 Operating system monitoring in general
  2.1 Operating system purpose
  2.2 Reliability of operating system
  2.3 Reasons for monitoring
  2.4 Existing GNU/Linux-compatible solutions
    2.4.1 SYSSTAT
    2.4.2 Dstat
    2.4.3 vmstat
    2.4.4 Collectd
    2.4.5 Munin
    2.4.6 Nagios, Shinken and Icinga
    2.4.7 PCP
    2.4.8 Xymon
3 System monitoring at the Institute of Computer Science
  3.1 Lemon project
    3.1.1 Lemon architecture
      Event generators
      Transport system
      Processing system
      Web service and presentation application
  3.2 Monitoring of operating systems
  3.3 Requirements for monitoring of GNU/Linux machines
    3.3.1 Report format
    3.3.2 Scope of monitoring
    3.3.3 Runtime requirements
    3.3.4 Packaging requirements
  3.4 Suitability of existing applications
4 Implementation of GNU/Linux monitoring application
  4.1 Chosen goals
    4.1.1 Operating system and programming language
    4.1.2 Configuration
    4.1.3 Types of reports
    4.1.4 Monitored areas
      Disk usage
      Iptables
      Network interfaces
      Users
      Groups
      Recent logins (both physical and remote)
      Recent physical logins
      Recent login attempts over ssh (both successful and unsuccessful)
      Installed packages
      Available package updates
      Recent reboots
      Time of last reboot
      Processes using CPU above a set limit
      Information about operating system
      System installation date
  4.2 Application architecture
    4.2.1 Classes and interfaces
    4.2.2 Error handling
    4.2.3 File access and permissions
    4.2.4 Settings file and alternative settings
    4.2.5 Report production
    4.2.6 Report comparison
      Comparison without a primary key
      Comparison with a primary key
5 Testing of LinMon
  5.1 Testing environment
    5.1.1 Personal computers
      GNU/Linux
      Other than GNU/Linux
    5.1.2 ICS server
      Problems that occurred during testing
  5.2 LinMon in production environment
6 Conclusion
Bibliography

1 Introduction

Masaryk University operates a large computer network. The need to manage the network effectively resulted in a computer monitoring project called Lemon. Lemon is a system that collects data about computers using agents installed on the computers and processes the data on centralized servers. It allows Masaryk University to check computer health, localize problems and prevent misuse. Lemon is modular and consists of several components. At this time Lemon can monitor only Windows systems; Linux support is planned. This thesis deals with the Linux monitoring agent.

The second chapter of the thesis describes monitoring of operating systems in general and Linux-compatible monitoring applications. The third chapter presents system monitoring at the Institute of Computer Science, the requirements for the monitoring application and the suitability of existing applications. The fourth chapter presents the chosen goals and describes the architecture of the developed application. The fifth chapter discusses testing and its results. The last chapter evaluates the developed application and its impact on system monitoring at Masaryk University.

The thesis is accompanied by three appendices. Appendix A lists all source code, Appendix B contains class diagrams and Appendix C is an Administrator's Guide which helps administrators with using LinMon.

2 Operating system monitoring in general

2.1 Operating system purpose

Computers and the software they run are complex mechanisms with an inherent risk of failure¹.
Most computers today are run with an operating system so that multiple applications can run at once, use the operating system routines instead of accessing the bare hardware, and use safe mechanisms of inter-process communication.²

2.2 Reliability of operating system

The apparent advantage of running an operating system comes at a cost. A failure in the operating system can result in all the applications being unable to run properly or to run at all. (Not that an application running on the bare metal cannot fail, but an OS adds another possible point of failure apart from the application itself.)

Microkernel and nanokernel operating systems try to solve this problem by minimizing the amount of code in the very core of the operating system and moving the rest into independent modules that can be restarted without affecting the rest of the system in case of failure³. There are even operating systems capable of taking snapshots (containing the state of all running programs, memory and necessary data) and then recovering to the latest snapshot in case of failure, with no need to restart programs or check filesystems, effectively restarting quickly and losing only the last few minutes of data. This property is called orthogonal persistence⁴.

However, general-purpose operating systems are not made in such a safe and uninterruptible way. They are made of interdependent components, and a failure can propagate throughout the system.

The majority of workstations and a large share of servers run general-purpose operating systems nowadays⁵. When a failure occurs, manual intervention may be required. The system administrator⁶ must collect all the information they need to find and resolve the problem. This might include:

• Errors logged in a system log.
• Disk usage information. (Is there a full disk that consequently caused the system to fail?)
• Installed software and changes in installed software. (Has new software or a software update caused the failure?)
• Users, groups and changes in users and groups. (Is there a new user who may have run something that caused the failure?)
• Logins to the system. (The system administrator can see who was logged in at the time of the failure and limit the scope of the investigation.)
• Network interfaces. (Was there a change in the network configuration that caused the system to fail?)
• Firewall rules. (Was there a change of the firewall rules or does a rule collide with something?)
• Reboots. (If and when the machine was rebooted.)
• The exact version of the operating system. (There might be a bug specific to this OS version.)

¹ Proving the correctness of software ranges from very hard to nearly impossible. Bruce Schneier has an informative post with a discussion on his blog [1].
² Solutions going against this approach do exist, such as running a single application on bare hardware or using an exokernel (such as MIT Exokernel Operating System [2]) merely for multiplexing bare hardware and isolating the applications from each other. However, they are either limited to microcontrollers and embedded systems, as in the case of a single application, or simply very rare, as in the case of an exokernel.
³ Such an operating system is QNX Neutrino [3].
⁴ Such operating systems are KeyKOS [4] and EROS [5], for instance.
⁵ According to W3Techs statistics from March 25th 2012, “Unix” and Windows web servers together have 100% market share. http://w3techs.com/technologies/overview/operating_system/all
⁶ I will call anyone who is experienced in maintaining and repairing an operating system a “system administrator” for the purpose of this thesis.

2.3 Reasons for monitoring

The need for frequent and repetitive collection of information makes automation relevant. And not only that. There are cases where it is useful to monitor operating system state. A history of the states can be inspected