
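I have configured monit to check whether my IRCd and its services are running. Recently, the instance that runs all of this rebooted, but monit did not do its job.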

It is configured to start at boot.

[root@ip-172-31-21-162 ec2-user]# chkconfig --list monit
monit           0:off   1:off   2:on    3:on    4:on    5:on    6:off

The control file:

[root@ip-172-31-21-162 ec2-user]# cat /etc/monit.conf
set httpd port 2812
  allow 127.0.0.1
set daemon 60
  include /etc/monit.d/*

check process ircd with pidfile /home/ec2-user/inspircd/run/pid
  start program = "/usr/bin/perl /home/ec2-user/inspircd/run/inspircd start" 
    as uid "ec2-user" and gid "ec2-user"
    with timeout 30 seconds 

check process services with pidfile /home/ec2-user/anope/run/data/services.pid
  depends on ircd
  start program = "/bin/sh /home/ec2-user/anope/run/bin/anoperc start"
    as uid "ec2-user" and gid "ec2-user"
    with timeout 30 seconds

According to the documentation, the syntax looks fine...

<START | STOP | RESTART> [PROGRAM] = "program"
    [[AS] UID <number | string>]
    [[AS] GID <number | string>]
    [[WITH] TIMEOUT <number> SECOND(S)]

And checking the control file says the same:

[ec2-user@ip-172-31-29-142 ~]$ sudo monit -t 
Control file syntax OK

However, the log says that no start method is defined for these monitored processes!

[UTC May 14 04:39:51] error    : 'ircd' process is not running
[UTC May 14 04:39:51] error    : monit: Start or stop method not defined -- process ircd
[UTC May 14 04:39:51] error    : 'services' process is not running
[UTC May 14 04:39:51] error    : monit: Start or stop method not defined -- process services

For some reason, starting the processes manually through monit works:

[root@ip-172-31-21-162 ec2-user]# monit start ircd
[root@ip-172-31-21-162 ec2-user]# monit status
The Monit daemon 5.2.5 uptime: 7h 14m 

Process 'ircd'
  status                            running
  monitoring status                 monitored
  pid                               26483
  parent pid                        1
  uptime                            3m 
...
  data collected                    Sat May 14 02:49:57 2016

Process 'services'
  status                            running
  monitoring status                 monitored
  pid                               26488
  parent pid                        1
  uptime                            3m 
...
  data collected                    Sat May 14 02:49:57 2016

This is strange. When I stop the checked processes and restart monit with debug logging enabled, I can see that it reports the start program:

Process Name          = ircd
 Pid file             = /home/ec2-user/inspircd/run/pid
 Monitoring mode      = active
 Start program        = '/home/ec2-user/inspircd/run/inspircd start' as uid 500 as gid 500 timeout 30 second(s)
 Existence            = if does not exist 1 times within 1 cycle(s) then restart else if succeeded 1 times within 1 cycle(s) then alert
 Pid                  = if changed 1 times within 1 cycle(s) then alert
 Ppid                 = if changed 1 times within 1 cycle(s) then alert

 Process Name          = services
 Pid file             = /home/ec2-user/anope/run/data/services.pid
 Monitoring mode      = active
 Start program        = '/home/ec2-user/anope/run/bin/anoperc start' as uid 500 as gid 500 timeout 30 second(s)
 Existence            = if does not exist 1 times within 1 cycle(s) then restart else if succeeded 1 times within 1 cycle(s) then alert
 Depends on Service   = ircd
 Pid                  = if changed 1 times within 1 cycle(s) then alert
 Ppid                 = if changed 1 times within 1 cycle(s) then alert

Any idea what in Glob's name is going on here?


1 Answer


According to monit's documented behavior, a stop method must also be defined for a process that is not running to be started correctly.

In active mode (the default), Monit will actively monitor a service and, in case of problems, raise alerts and/or restart the service.

-- Monit documentation; Service modes

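When a process is not running, the action Monit executes is always "restart", but since there was no independent "restart program" (until Monit 5.7), a stop + start sequence is used.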

-- Monit issue; restart instead of start when a process is not running

So the solution is to add a stop program line to each checked process in the control file. Apparently, if you are running a version >= 5.7, you can also use restart program.
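
As a rough sketch of what that could look like for the two checks in the question: the stop commands below are assumptions, not taken from the original control file. They suppose that the inspircd and anoperc wrapper scripts accept a "stop" argument in the same way they accept "start", so verify that against your own installation before using it.

```
check process ircd with pidfile /home/ec2-user/inspircd/run/pid
  start program = "/usr/bin/perl /home/ec2-user/inspircd/run/inspircd start"
    as uid "ec2-user" and gid "ec2-user"
    with timeout 30 seconds
  # Assumption: the inspircd wrapper script accepts "stop"
  stop program = "/usr/bin/perl /home/ec2-user/inspircd/run/inspircd stop"
    as uid "ec2-user" and gid "ec2-user"
    with timeout 30 seconds

check process services with pidfile /home/ec2-user/anope/run/data/services.pid
  depends on ircd
  start program = "/bin/sh /home/ec2-user/anope/run/bin/anoperc start"
    as uid "ec2-user" and gid "ec2-user"
    with timeout 30 seconds
  # Assumption: the anoperc script accepts "stop"
  stop program = "/bin/sh /home/ec2-user/anope/run/bin/anoperc stop"
    as uid "ec2-user" and gid "ec2-user"
    with timeout 30 seconds
```

After editing, running monit -t again and then restarting monit should show whether the "Start or stop method not defined" errors go away.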

Answered 2016-05-14T05:36:13.630