
WMLUG July 2015
Nagios, PNP4Nagios, and NConf
by Patrick TenHoopen

What is Nagios?

Nagios is an IT infrastructure monitoring and alerting tool. The free Nagios DIY Core provides the central monitoring engine and the basic web interface.

Current Version: 4.0.8 (2014-08-12)
Download: https://assets.nagios.com/downloads/nagioscore/releases/nagios-4.0.8.tar.gz

Nagios Demo

Demo

Installation Prerequisites

● gcc
● apache2
● perl
● php
● rrdtool
● php5-gd
● php5-zlib
● php5-socket

Installation

Follow the Quick-Start Guides:
https://assets.nagios.com/downloads/nagioscore/docs/nagioscore/4/en/quickstart.html

After installing, remember to configure the firewall on the Nagios server (if one is running) to allow HTTP access.

Installation, cont.

    tar xf nagios-4.0.8.tar.gz
    cd nagios-4.0.8
    ./configure --with-command-group=nagcmd
    make all
    make install
    make install-init
    make install-config
    make install-commandmode
    make install-webconf
    htpasswd2 -c /usr/local/nagios/etc/htpasswd.users nagiosadmin

Nagios Plugins

Download: http://nagios-plugins.org/download/nagios-plugins-2.0.3.tar.gz

    tar xf nagios-plugins-2.0.3.tar.gz
    cd nagios-plugins-2.0.3
    ./configure --with-nagios-user=nagios --with-nagios-group=nagios
    make
    make install

Configuration

Nagios ships with a default configuration (localhost.cfg) for monitoring the machine it is installed on, plus some other examples. The configuration files are stored in /usr/local/nagios/etc/objects/ and are plain text files written in Nagios's own object definition format.
Detailed description of configuration files and options:
https://assets.nagios.com/downloads/nagioscore/docs/nagioscore/4/en/objectdefinitions.html

Default Configuration Files

● commands.cfg - Check commands that are used in service definitions
● contacts.cfg - Who to contact if an alert is generated
● hosts.cfg - Hosts to monitor
● localhost.cfg - Basic config for the Nagios host
● printer.cfg - Sample config for printers
● services.cfg - Things on hosts to monitor
● switch.cfg - Sample config for switches
● templates.cfg - Definition templates used by hosts, services, etc.
● timeperiods.cfg - Notification times/hours of alerting
● windows.cfg - Sample config for a Windows machine

Configuration File Organization

You don't have to split the definitions across separate files; a single large configuration file also works. The cfg_file lines in /usr/local/nagios/etc/nagios.cfg control which files are loaded.

Note: If you want to import existing Nagios conf files into NConf (discussed later), it works better if they are separated by function/type.

Templates

Templates provide default values for the settings in configuration definitions. This keeps the actual definitions smaller and easier to update: if you modify a template, every definition that uses it picks up the change.

Generic Linux Host Template

    # Linux host definition template - This is NOT a real host, just a template!
    define host{
        name                   linux-server     ; The name of this host template
        use                    generic-host     ; Inherits other values from generic-host template
        check_period           24x7             ; By default, Linux hosts are checked round the clock
        check_interval         5                ; Actively check the host every 5 minutes
        retry_interval         1                ; Schedule host check retries at 1 minute intervals
        max_check_attempts     10               ; Check each Linux host 10 times (max)
        check_command          check-host-alive ; Default command to check Linux hosts
        notification_period    workhours        ; Only notify during the day
                                                ; Note that the notification_period variable is being
                                                ; overridden from the value that is inherited from the
                                                ; generic-host template!
        notification_interval  120              ; Resend notifications every 2 hours
        notification_options   d,u,r            ; Only send notifications for specific host states
        contact_groups         admins           ; Notifications get sent to the admins by default
        register               0                ; DONT REGISTER THIS DEFINITION
    }

Generic Service Template

    # Generic service definition template - This is NOT a real service, just a template!
    define service{
        name                          generic-service ; The 'name' of this service template
        active_checks_enabled         1               ; Active service checks are enabled
        passive_checks_enabled        1               ; Passive service checks are enabled/accepted
        parallelize_check             1               ; Active service checks should be parallelized
                                                      ; (disabling this can lead to major performance problems)
        obsess_over_service           1               ; We should obsess over this service (if necessary)
        check_freshness               0               ; Default is to NOT check service 'freshness'
        notifications_enabled         1               ; Service notifications are enabled
        event_handler_enabled         1               ; Service event handler is enabled
        flap_detection_enabled        1               ; Flap detection is enabled
        process_perf_data             1               ; Process performance data
        retain_status_information     1               ; Retain status information across program restarts
        retain_nonstatus_information  1               ; Retain non-status information across program restarts
        is_volatile                   0               ; The service is not volatile
        check_period                  24x7            ; The service can be checked at any time of the day
        max_check_attempts            3               ; Re-check the service up to 3 times in order to determine its final (hard) state
        normal_check_interval         10              ; Check the service every 10 minutes under normal conditions
        retry_check_interval          2               ; Re-check the service every two minutes until a hard state can be determined
        contact_groups                admins          ; Notifications get sent out to everyone in the 'admins' group
        notification_options          w,u,c,r         ; Send notifications about warning, unknown, critical, and recovery events
        notification_interval         60              ; Re-notify about service problems every hour
        notification_period           24x7            ; Notifications can be sent out at any time
        register                      0               ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
    }

Commands

Nagios comes with several commands for checking services, and more are installed with the Nagios plugins. The check programs are located in the /usr/local/nagios/libexec/ directory.
Some examples: check_disk, check_http, check_log, check_nt, check_ping

Community Check Commands

You can download plugins and command definitions created by the Nagios community by browsing the plugin exchange at:
http://exchange.nagios.org/directory/Plugins

Custom Commands

You can also create custom check commands using scripts or custom programs. The script or program just needs to exit with one of the statuses that Nagios expects:

● OK = 0
● WARNING = 1
● CRITICAL = 2
● UNKNOWN = 3

Example Check Definition

$USER1$, $HOSTADDRESS$, $ARG1$, and $ARG2$ are Nagios macros. When the check runs, Nagios replaces them with actual values: $ARG1$ and $ARG2$ come from the check_command line of the service definition, $HOSTADDRESS$ comes from the host's address setting, and $USER1$ comes from the resource file (by default, the plugin directory).

-w is the warning threshold. -c is the critical threshold.

    # 'check_ping' command definition
    define command{
        command_name  check_ping
        command_line  $USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5
    }

Example Host Definition

    define host{
        use        linux-server ; Name of host template to use
                                ; This host definition will
                                ; inherit all variables that are
                                ; defined in (or inherited by)
                                ; the linux-server host template
                                ; definition.
        host_name  localhost
        alias      localhost
        address    127.0.0.1
    }

Example Service

Note that the command parameters are delimited by an "!". The parameters are used in the check definition ($ARG1$, $ARG2$, etc.).

    # Define a service to "ping" the local machine
    define service{
        use                  local-service ; Name of service template to use
        host_name            localhost
        service_description  PING
        check_command        check_ping!100.0,20%!500.0,60%
    }

Host Groups

Host groups let you set up a check for a whole set of hosts with one service definition. You can put the group definitions in a new config file, for example hostgroups.cfg (remember to add a cfg_file line for it in nagios.cfg).
    define hostgroup{
        hostgroup_name  linux-servers                 ; Name of the hostgroup
        alias           Linux Servers                 ; Long name of the group
        members         localhost,linuxbox1,linuxbox2 ; Comma separated list of
                                                      ; hosts that belong to this group
    }

    define service{
        use                  local-service ; Name of service template to use
        hostgroup_name       linux-servers
        service_description  PING-LINUX-HOSTS
        check_command        check_ping!100.0,20%!500.0,60%
    }

Parent/Child Relationships

When you define which other hosts a host is reached through, Nagios can distinguish between DOWN and UNREACHABLE states for that host.

For example, if Nagios monitors a host that sits behind a switch and the switch goes down, Nagios can no longer ping the host. Nagios then reports the switch as DOWN and the host behind it as UNREACHABLE rather than DOWN, so you are pointed at the switch instead of being told that both are down.

Parents Setting

When defining a host, use the "parents" setting to establish the parent/child relationship.

    define host{
        host_name  Nagios ; Nagios host has no parent
    }

    define host{
        host_name  Switch1
        parents    Nagios
    }

    define host{
        host_name  OtherHost
        parents    Switch1
    }

Parent/Child Relationship Picture

Pictorial representation:
https://assets.nagios.com/downloads/nagioscore/docs/nagioscore/4/en/networkreachability.html

NSClient++

With the NSClient++ add-on, you can easily set up checks on Windows servers.
http://exchange.nagios.org/directory/Addons/Monitoring-Agents/NSClient%2B%2B/details

NRPE

The NRPE (Nagios Remote Plugin Executor) add-on runs checks on a remote Linux host. NSClient++ can also act as an NRPE listener on a Windows server.
http://exchange.nagios.org/directory/Addons/Monitoring-Agents/NRPE--2D-Nagios-Remote-Plugin-Executor/details

File Count Example

Using NSClient++ and a community-created command (Check Filecount), you can monitor the number of files in a directory on a Windows computer.
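
Custom Check Sketch

As a rough sketch only (this is not the community Check Filecount plugin, and it counts files on the machine where it runs rather than going through NSClient++), a custom check that follows the exit-status convention from the Custom Commands slide could look something like the Python below. The function names, argument order, and output wording are assumptions made for illustration.

```python
"""Hypothetical custom check: count the regular files in a directory."""
import os

# Exit statuses that Nagios expects from a check
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_file_count(directory, warn, crit):
    """Return (status, message) based on the number of regular files."""
    try:
        with os.scandir(directory) as entries:
            count = sum(1 for entry in entries if entry.is_file())
    except OSError as err:
        # Missing or unreadable directory: the check itself could not run
        return UNKNOWN, f"FILECOUNT UNKNOWN - {err}"
    if count >= crit:
        return CRITICAL, f"FILECOUNT CRITICAL - {count} files in {directory}"
    if count >= warn:
        return WARNING, f"FILECOUNT WARNING - {count} files in {directory}"
    return OK, f"FILECOUNT OK - {count} files in {directory}"


def main(argv):
    """Parse DIRECTORY WARN CRIT (assumed order) and return the exit status."""
    if len(argv) != 4:
        print("FILECOUNT UNKNOWN - usage: DIRECTORY WARN CRIT")
        return UNKNOWN
    try:
        warn, crit = int(argv[2]), int(argv[3])
    except ValueError:
        print("FILECOUNT UNKNOWN - WARN and CRIT must be integers")
        return UNKNOWN
    status, message = check_file_count(argv[1], warn, crit)
    print(message)  # one line of status text for Nagios to display
    return status


# To use this as a standalone script, end the file with:
#     import sys
#     sys.exit(main(sys.argv))
```

The threshold logic lives in a function that returns the status instead of exiting, so it can be tried out by hand before it is wired into a command definition. The warning and critical values would be passed in from the service definition in the same way as $ARG1$ and $ARG2$ in the check_ping example.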
File Count Example - Service Definition

From services.cfg:

    # Service definition
    define service {
        use                  generic-service
        host_name            WINSERVER
        service_description  Temp File Count
        check_command        check_temp_files
    }

File Count Example - Command Definition

From commands.cfg:

    # 'check_temp_files' command definition
    define command{
        command_name  check_temp_files
        command_line