Status Reporting Basics

S. Teige
February 2, 2016

1 Determination of the status of a service

The status of a service [1] monitored by the service monitor probe is reported in a form similar to the following:

    OK
    timestamp:1454357761
    generated:Mon Feb 1 20:16:01 UTC 2016
    uptime:516949.70
    kernel:2.6.32-573.12.1.el6.x86_64
    filesystem_use:/var at 42%

Only the first two fields are mandatory; everything else is optional and/or informative. This report file is generated on the machine being monitored. Any desired logic that can be reduced to a status can be implemented as described below. The service history [2] and availability metrics [3] are derived from this reporting.

[1] For example, the service status: http://tinyurl.com/gqanunl
[2] History: http://tinyurl.com/j59e5lc
[3] Availability: http://tinyurl.com/jb28a7s

1.1 Detection

The RSV infrastructure at the GOC accesses the above information either via a wget command (for http-capable services) or via a shared file system (for other services). The status reported is given by the first (required) line of the status report. It may have one of three values: OK, WARNING or CRITICAL. The logic used to determine this value is implemented entirely on the machine being monitored. The scripts are:

    /net/nas01/Public/status/<machine name>/status_stamp.sh

These are run on the monitored machine via an entry in /etc/cron.d, typically every fifteen minutes. Any desired diagnostic test is implemented by modifying these scripts. They should be publicly editable, so change them as you see fit. The first two lines of the generated report must be present; the remaining content is unconstrained. An example is the bare-bones script currently monitoring perfsonar1:

    overall_status=0

    ## system stuff
    kern=`uname -r`
    utime=`cat /proc/uptime | awk '{ print $1 }'`
    now=`date +%s`
    dire=`hostname | awk -F. '{print $1}'`   # short host name

    # Highest filesystem usage in percent, ignoring the shared nas01 mount.
    # The source line is cut off after "sort -n"; "| tail -1" is assumed
    # here so that the largest value is selected.
    max_frac=`df -h | grep -v nas01 | awk '{print $5}' | sed s/%//g | grep -v Use | sort -n | tail -1`
    max_sys=`df -h | grep $max_frac% | awk '{print $6" at "$5}'`

    # 0 = OK, 1 = WARNING (above 90% full), 2 = CRITICAL (above 95% full)
    if [ $max_frac -gt 90 ]; then overall_status=1; fi
    if [ $max_frac -gt 95 ]; then overall_status=2; fi

    # First line of the report: the status itself.
    if [ $overall_status -eq 0 ]; then
        echo "OK" > /net/nas01/Public/status/$dire/stamp
    fi
    if [ $overall_status -eq 1 ]; then
        echo "WARNING" > /net/nas01/Public/status/$dire/stamp
    fi
    if [ $overall_status -eq 2 ]; then
        echo "CRITICAL" > /net/nas01/Public/status/$dire/stamp
    fi

    # Second line: the mandatory timestamp. The rest is informative.
    echo "timestamp:$now" >> /net/nas01/Public/status/$dire/stamp
    now=`date`
    echo "generated:$now" >> /net/nas01/Public/status/$dire/stamp
    echo "uptime:$utime" >> /net/nas01/Public/status/$dire/stamp
    echo "kernel:$kern" >> /net/nas01/Public/status/$dire/stamp
    echo "filesystem_use:$max_sys" >> /net/nas01/Public/status/$dire/stamp

1.2 Reporting

The RSV infrastructure at the GOC obtains the reported status file from the monitored machine via wget or from the shared file system. If it cannot obtain this file, the reported status is UNKNOWN.

If the file is obtained successfully, the status reported is determined from the first and second fields of the status report. The timestamp line is checked to ensure that the status report was generated recently, where "recently" is configurable. Typical values are a report age of twenty minutes to generate a WARNING status and thirty minutes to generate a CRITICAL status.

If the timestamp is current, the status reported is given by the first field of the status file. That is, RSV becomes a simple pass-through of the status diagnosed remotely.
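
The reporting rules above can be sketched as a small bash function. This is only an illustration, not the actual RSV probe code, which the note does not show. The function name evaluate_status and the variables WARN_AGE and CRIT_AGE are hypothetical; the twenty and thirty minute thresholds are the "typical values" quoted above. Two behaviours are assumptions: a malformed report is treated like a missing one, and a stale timestamp replaces whatever status the file itself reports.

```shell
#!/bin/bash
# Sketch of the collector-side evaluation (hypothetical names; not RSV code).

WARN_AGE=$((20 * 60))   # seconds; typical value quoted in the text
CRIT_AGE=$((30 * 60))   # seconds; typical value quoted in the text

# evaluate_status FILE [NOW]
# Prints OK, WARNING, CRITICAL or UNKNOWN for the status file FILE.
# NOW is the current time in epoch seconds (defaults to `date +%s`).
evaluate_status() {
    local file=$1
    local now=${2:-$(date +%s)}
    local status stamp age

    # The file could not be obtained: the status is UNKNOWN.
    [ -r "$file" ] || { echo "UNKNOWN"; return; }

    status=$(sed -n '1p' "$file")                  # first field
    stamp=$(sed -n '2s/^timestamp://p' "$file")    # second field

    # Assumption: a malformed report is handled like a missing one.
    case $status in
        OK|WARNING|CRITICAL) ;;
        *) echo "UNKNOWN"; return ;;
    esac
    case $stamp in
        ''|*[!0-9]*) echo "UNKNOWN"; return ;;
    esac

    # Assumption: a stale report overrides the status written in the file.
    age=$((now - stamp))
    if [ "$age" -gt "$CRIT_AGE" ]; then
        echo "CRITICAL"
    elif [ "$age" -gt "$WARN_AGE" ]; then
        echo "WARNING"
    else
        echo "$status"      # current timestamp: simple pass-through
    fi
}
```

For a service on the shared file system, a call might look like evaluate_status /net/nas01/Public/status/<machine name>/stamp. For an http-capable service, the file would first have to be fetched with wget into a local file, and a failed download then falls into the UNKNOWN branch because the file is not readable.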
