On an up-to-date Debian Wheezy server, I am using the backports versions of the following packages:
- Nagios : nagios3 (3.4.1-5~bpo7+1)
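- Munin: munin (2.0.25-1~bpo70+1)
together with nsca (2.9.1-2), which passes data from Munin to Nagios so that Nagios can handle the alerts.
Nagios works fine with the Munin services configured as follows: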
# generic service template definition
define service{
name generic-munin-service ; The 'name' of this service template
use generic-service
check_command return-unknown!"No Data from passive check"
active_checks_enabled 0 ; Active service checks are disabled
passive_checks_enabled 1
parallelize_check 1
notifications_enabled 1
event_handler_enabled 1
is_volatile 1
notification_interval 120
notification_period 24x7
notification_options w,u,c,r
check_freshness 1
freshness_threshold 360
flap_detection_options n
max_check_attempts 2
register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
;first_notification_delay 6 ; Delay first notification for false positives (will execute 2 checks : munin sends 1 check every 5 minutes)
}
define service {
hostgroup_name munin
service_description Disk latency per device :: Average latency for /dev/sda
use generic-munin-service
notification_interval 0 ; set > 0 if you want to be renotified
}
define service {
service_description Disk latency per device :: Average latency for /dev/sdb
use generic-munin-service
notification_interval 0 ; set > 0 if you want to be renotified
}
define service {
hostgroup_name munin
service_description Disk usage in percent
use generic-munin-service
notification_interval 0 ; set > 0 if you want to be renotified
}
define service {
hostgroup_name munin
service_description Inode usage in percent
use generic-munin-service
notification_interval 0 ; set > 0 if you want to be renotified
}
define service {
hostgroup_name munin
service_description File table usage
use generic-munin-service
notification_interval 0 ; set > 0 if you want to be renotified
}
However, when I add other services that are also available on all monitored hosts, they are flagged as UNKNOWN in Nagios:
define service {
hostgroup_name munin
service_description Memory usage
use generic-munin-service
notification_interval 0 ; set > 0 if you want to be renotified
}
define service {
hostgroup_name munin
service_description CPU usage
use generic-munin-service
notification_interval 0 ; set > 0 if you want to be renotified
}
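I have already found that, depending on the format of a Munin plugin's graph title, Nagios may not understand the incoming data. That is why I updated the packages on the server to the Wheezy backports versions, since Munin 2.0.7 is supposed to clean up all titles.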
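For reference, here is a minimal sketch of the line that send_nsca reads on standard input for a passive service check: host name, service description, return code and plugin output, separated by tabs. Nagios matches a passive result to a service by host name and service description, so the description sent must be identical to `service_description` in the service definition. The function name `format_passive_result` and the example values are my own illustration, not something taken from Munin or NSCA.

```python
def format_passive_result(host, service_description, return_code, plugin_output):
    """Build one tab-separated passive service check line for send_nsca.

    Hypothetical helper for illustration only.
    """
    # Nagios return codes: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN
    if return_code not in (0, 1, 2, 3):
        raise ValueError("return_code must be 0, 1, 2 or 3")

    fields = [host, service_description, str(return_code), plugin_output]

    # Tabs and newlines are the field and record separators,
    # so they must not appear inside a field.
    for field in fields:
        if "\t" in field or "\n" in field:
            raise ValueError("fields must not contain tabs or newlines")

    return "\t".join(fields) + "\n"


if __name__ == "__main__":
    # Example values only; the description must equal the
    # service_description configured in Nagios.
    print(repr(format_passive_result("HostIJZI4", "Memory usage", 0, "OK")))
```

Comparing the description in such a line against the configured `service_description`, character by character, is one way to check whether a title mismatch is involved.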
I also tried debugging with a higher debug level, and the log shows:
[1434122043] SERVICE ALERT: HostIJZI4;Memory usage;UNKNOWN;HARD;2;INCONNU
But I will probably need your help to get any further.
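To make the log entry above easier to read, here is a small sketch that splits a Nagios `SERVICE ALERT` line into its semicolon-separated fields: host, service, state, state type, attempt number and plugin output. The regular expression and the name `parse_service_alert` are my own assumptions for illustration, based only on the layout of the line shown above.

```python
import re

# Layout assumed from the log line in this post:
# [timestamp] SERVICE ALERT: host;service;state;state_type;attempt;output
SERVICE_ALERT = re.compile(
    r"^\[(?P<timestamp>\d+)\] SERVICE ALERT: "
    r"(?P<host>[^;]*);(?P<service>[^;]*);(?P<state>[^;]*);"
    r"(?P<state_type>[^;]*);(?P<attempt>\d+);(?P<output>.*)$"
)


def parse_service_alert(line):
    """Return the fields of a SERVICE ALERT line as a dict, or None if it does not match."""
    match = SERVICE_ALERT.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the numeric fields so they can be compared or sorted.
    fields["timestamp"] = int(fields["timestamp"])
    fields["attempt"] = int(fields["attempt"])
    return fields


if __name__ == "__main__":
    entry = "[1434122043] SERVICE ALERT: HostIJZI4;Memory usage;UNKNOWN;HARD;2;INCONNU"
    print(parse_service_alert(entry))
```

For the entry above, this gives the state `UNKNOWN`, the state type `HARD`, attempt 2 and the plugin output `INCONNU`.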