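We have been using Sphinx (2.0.4) for about 4 years, but recently (about a month ago) search started going down once every hour.
We run a large update of all the data overnight, and that causes no problems. Then, at the start of every hour, we run an indexing job that usually adds 20 to 30 rows to the index.
So the cronjob runs every hour: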
#!/bin/bash
# Hourly cron job: reindex newCompanies, then stop and restart searchd.
LOG=/cyber/indexer-new.log

echo "$(date) : Starting indexation..." >> "$LOG"
/usr/local/bin/indexer -c /etc/sphinx.conf newCompanies --noprogress --rotate
echo "$(date) : Indexation ended" >> "$LOG"

echo "$(date) : Restart searchd" >> "$LOG"
/usr/local/bin/searchd -c /etc/sphinx.conf --stopwait
exitCode=$?   # capture the exit status directly
echo "Exit code (--stopwait) : $exitCode" >> "$LOG"
/usr/local/bin/searchd -c /etc/sphinx.conf
exitCode=$?
echo "Exit code (restart sphinx) : $exitCode" >> "$LOG"
echo "$(date) : searchd restarted.." >> "$LOG"
exit 0
The result used to look like this (searchd.log), except with about 70,000 lines of binlog replay (1,560 binlog files), all of which appear to be empty:
[Wed May 18 10:01:14.541 2016] [11983] rotating indices (seamless=1)
[Wed May 18 10:01:14.545 2016] [11983] caught SIGTERM, shutting down
[Wed May 18 10:01:16.953 2016] [11983] shutdown complete
[Wed May 18 10:01:16.954 2016] [11982] Child process 11983 has been finished, exit code 0. Watchdog finishes also. Good bye!
[Wed May 18 10:01:16.957 2016] [13505] Child process 13506 has been forked
[Wed May 18 10:01:16.958 2016] [13506] listening on all interfaces, port=9312
[Wed May 18 10:01:16.958 2016] [13506] listening on all interfaces, port=9306
[Wed May 18 10:01:36.131 2016] [13506] rotating index 'newCompanies': success
[Wed May 18 10:01:36.133 2016] [13506] binlog: replaying log /usr/local/var/data/binlog.001
[Wed May 18 10:01:36.133 2016] [13506] binlog: replay stats: 0 rows in 0 commits; 0 updates; 0 indexes
But now it looks like this:
[Sun Jul 31 10:00:57.285 2016] [28792] rotating indices (seamless=1)
[Sun Jul 31 10:00:57.292 2016] [28792] caught SIGTERM, shutting down
[Sun Jul 31 10:00:57.294 2016] [28792] rotating index 'newCompanies': started
[Sun Jul 31 10:00:57.404 2016] [28792] rotating index 'newCompanies': success
[Sun Jul 31 10:00:57.404 2016] [28792] rotating index: all indexes done
[Sun Jul 31 10:01:00.166 2016] [28792] shutdown complete
[Sun Jul 31 10:01:00.167 2016] [28791] Child process 28792 has been finished, exit code 0. Watchdog finishes also. Good bye!
[Sun Jul 31 10:01:00.175 2016] [29781] Child process 29782 has been forked
[Sun Jul 31 10:01:00.175 2016] [29782] listening on all interfaces, port=9312
[Sun Jul 31 10:01:00.175 2016] [29782] listening on all interfaces, port=9306
[Sun Jul 31 10:03:01.946 2016] [29782] binlog: replaying log /usr/local/var/data/binlog.001
[Sun Jul 31 10:03:01.953 2016] [29782] binlog: replay stats: 0 rows in 0 commits; 0 updates; 0 indexes
[Sun Jul 31 10:03:01.953 2016] [29782] binlog: finished replaying /usr/local/var/data/binlog.001; 0.0 MB in 0.000 sec
Note the 2-minute gap between searchd starting to listen (10:01:00) and the binlog replay (10:03:01).
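Gaps like this can be located mechanically rather than by eye. The following is a minimal sketch, not part of the original setup: the function name `log_gaps` is made up for illustration, and it assumes every line starts with a timestamp in the `[Www Mmm DD HH:MM:SS.mmm YYYY]` form shown above and that consecutive lines fall on the same day (there is no handling for midnight).

```shell
#!/bin/bash
# Hypothetical helper: read searchd.log lines on stdin and print every line
# that was written at least min_gap seconds after the line before it.
# Assumption: field 4 of each line is the time of day as HH:MM:SS.mmm.
log_gaps() {
    local min_gap=${1:-60}
    awk -v min="$min_gap" '
        {
            split($4, t, ":")                      # t[1]=HH, t[2]=MM, t[3]=SS.mmm
            now = t[1] * 3600 + t[2] * 60 + t[3]   # seconds since midnight
            if (NR > 1 && now - prev >= min)
                printf "%.0f s gap before: %s\n", now - prev, $0
            prev = now
        }'
}
```

It would be used as `log_gaps 60 < searchd.log`, with the path adjusted to wherever searchd.log lives on the machine.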
The cronjob log looks like this:
Mon Aug 1 11:00:01 EDT 2016 : Starting indexation...
Mon Aug 1 11:01:17 EDT 2016 : Indexation ended
Mon Aug 1 11:01:17 EDT 2016 : Restart searchd
Exit code (--stopwait) : 0
Exit code (restart sphinx) : 0
Mon Aug 1 11:04:40 EDT 2016 : searchd restarted..
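It seems searchd now takes 2 to 3 minutes to restart, unlike a month ago (and nothing has changed). The exit code of 0 indicates that no error occurred during either --stopwait or the searchd start.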
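The cron log only records one timestamp after both searchd commands have returned, so it does not show whether the time is spent in `--stopwait` or in the start. One way to separate the two, sketched here under assumptions, is to wrap each step in a small timing helper. The name `timed` and the `LOG` variable are invented for this example. Note also that it measures how long each command takes to return, which is not necessarily when searchd is ready to serve queries.

```shell
#!/bin/bash
# Hypothetical helper: run a command, then append its exit code and the
# elapsed wall-clock seconds to the log file named by $LOG.
LOG="${LOG:-/cyber/indexer-new.log}"

timed() {
    local label=$1
    shift
    local start=$SECONDS          # bash built-in: seconds since shell start
    "$@"
    local rc=$?                   # exit status of the wrapped command
    echo "$(date) : $label finished, exit code $rc, took $((SECONDS - start))s" >> "$LOG"
    return "$rc"
}

# Intended use inside the cron script (left commented out here):
# timed "stopwait" /usr/local/bin/searchd -c /etc/sphinx.conf --stopwait
# timed "start"    /usr/local/bin/searchd -c /etc/sphinx.conf
```

With one log line per step, the 2 to 3 minutes can be attributed to a single command instead of the pair.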
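I can't find what is causing these binlogs to be created, and I no longer know where to look. Any idea what the problem might be? Which logs have I not checked that could tell me more about it?
Thanks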