Wednesday, July 21, 2010

Munin and Alerting: Method 3


Integration With Nagios: Via a Nagios Plugin

If you don't want to use passive checks, you can use the check_munin_rrd plugin.

Munin-node data is stored on the Munin server as usual, and Nagios reads that stored data to check the status of the node.

$ /usr/lib/nagios/plugins/check_munin_rrd.pl --help

Monitor server via Munin-node pulled data
Usage: /usr/lib/nagios/plugins/check_munin_rrd.pl -H HOST -M MODULE
[-D DOMAIN] -w INTEGER -c INTEGER [-V]
-h, --help
      print this help message
-H, --hostname=HOST
      name or IP address of host to check
-M, --module=MUNIN MODULE
      Munin module value to fetch
-D, --domain=DOMAIN
      Domain as defined in munin
-w, --warn=INTEGER
      warning level
-c, --crit=INTEGER
      critical level
-v      --verbose
      Be verbose
-V, --version
      prints version number
check_munin_rrd.pl (nagios-plugins 1.4.2) 0.9
The nagios plugins come with ABSOLUTELY NO WARRANTY. You may redistribute
copies of the plugins under the terms of the GNU General Public License.
For more information about these matters, see the file named COPYING.

The previous implementation ran a check from Nagios directly against Munin-node, which is overkill since the Munin server already collects the data via cron.
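
To make the idea concrete, here is a minimal Python sketch of the kind of threshold test a plugin like this applies to the latest value Munin has stored. It is not the plugin's actual code (the plugin is a Perl script); the function name `evaluate` and the assumption that higher values are worse and thresholds are inclusive are mine. Only the exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) are the standard Nagios plugin convention.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate(value, warn, crit):
    """Map a stored Munin reading to a Nagios (exit_code, label) pair.

    Assumptions (hypothetical, not taken from the plugin source):
    higher values are worse, and thresholds are inclusive.
    A missing reading (None) is reported as UNKNOWN.
    """
    if value is None:
        return UNKNOWN, "UNKNOWN"
    if value >= crit:
        return CRITICAL, "CRITICAL"
    if value >= warn:
        return WARNING, "WARNING"
    return OK, "OK"


if __name__ == "__main__":
    # Example readings for a disk-usage style module with -w 75 -c 90.
    for reading in (42.0, 80.5, 93.1, None):
        code, label = evaluate(reading, warn=75, crit=90)
        print(f"reading={reading} -> {label} (exit {code})")
```

Nagios only looks at the exit code and the first line of output, so any script that follows this convention can serve as a check.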

You need to define the following:

  • new command :
define command{
     command_name check_munin
     command_line /usr/lib/nagios/plugins/check_munin_rrd.pl -H $HOSTALIAS$ -M $ARG1$ -w $ARG2$ -c $ARG3$
     }
  • new service template :
# generic service template definition check via munin
define service{
       name                            generic-munin-service ; The 'name' of this service template
       active_checks_enabled           1       ; Active service checks are enabled
       passive_checks_enabled          0       ; Passive service checks are disabled
       parallelize_check               1       ; Active service checks should be parallelized (disabling this can lead to major performance problems)
       obsess_over_service             1       ; We should obsess over this service (if necessary)
       check_freshness                 0       ; Default is to NOT check service 'freshness'
       notifications_enabled           1       ; Service notifications are enabled
       event_handler_enabled           1       ; Service event handler is enabled
       flap_detection_enabled          1       ; Flap detection is enabled 
       failure_prediction_enabled      1       ; Failure prediction is enabled
       process_perf_data               1       ; Process performance data
       retain_status_information       1       ; Retain status information across program restarts
       retain_nonstatus_information    1       ; Retain non-status information across program restarts
       notification_interval           0       ; Only send notifications on status change by default.
       is_volatile                     0
       check_period                    24x7
       normal_check_interval           5             ; This directive is used to define the number of "time units" to wait before scheduling the next "regular" check of the service.
       retry_check_interval            3       ; This directive is used to define the number of "time units" to wait before scheduling a re-check of the service.
       max_check_attempts              2             ; This directive is used to define the number of times that Nagios will retry the service check command if it returns any state other than an OK state. Setting this value to 1 will cause Nagios to generate an alert without retrying the service check again.
       notification_period             24x7
       notification_options            w,u,c,r
       contact_groups                  admins
       register                        0       ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
       }

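Don't set normal_check_interval below 5 (minutes, with the default Nagios time unit): Munin only updates its data every 5 minutes, so more frequent checks would just re-read the same values.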

  • new service example :
# check the disk usage via munin
define service{
       hostgroup_name                  web-servers
       service_description             disk-usage
       check_command                   check_munin!df!75!90
       use                             generic-munin-service
     }
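
To see how the pieces above fit together, the following Python sketch mimics how Nagios turns a `check_command` value into the final command line: the text before the first `!` picks the command definition, and the remaining `!`-separated fields fill `$ARG1$`, `$ARG2$`, and so on. This is only an illustration of the substitution, not Nagios code; the function name `expand_check_command` and the host alias `web01.example.com` are hypothetical.

```python
# command_line taken from the command definition above.
COMMAND_LINE = (
    "/usr/lib/nagios/plugins/check_munin_rrd.pl "
    "-H $HOSTALIAS$ -M $ARG1$ -w $ARG2$ -c $ARG3$"
)


def expand_check_command(command_line, check_command, host_alias):
    """Return command_line with $HOSTALIAS$ and $ARGn$ macros filled in.

    check_command has the form "name!arg1!arg2!...". The name only
    selects which command definition to use, so it is discarded here.
    """
    _name, *args = check_command.split("!")
    expanded = command_line.replace("$HOSTALIAS$", host_alias)
    for index, arg in enumerate(args, start=1):
        expanded = expanded.replace(f"$ARG{index}$", arg)
    return expanded


if __name__ == "__main__":
    # Hypothetical host alias; use whatever name Munin knows the node by.
    print(expand_check_command(COMMAND_LINE, "check_munin!df!75!90",
                               "web01.example.com"))
```

Running the resulting command by hand as the Nagios user is a quick way to confirm that the host name and module match what Munin has stored before enabling the service.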

