0

我们正在使用 2 节点 cratedb 集群(v2.3.4)。它运行了一个多月,没有任何问题。最近我们了解到,一个节点在没有任何外部干扰的情况下消失了。我们无法找到此事件的根本原因。

以下是日志。请帮忙。

Apr 12 23:47:04 STATS-DB-M crate[162556]: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_131]
Apr 12 23:47:04 STATS-DB-M crate[162556]: at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
Apr 12 23:47:04 STATS-DB-M crate[162556]: [2018-04-12T23:47:04,027][WARN ][o.e.c.a.s.ShardStateAction] [crate3] [online_dlr_report_cache_20180412][7] received shard failed for shard id [[online_dlr_report_cache_20180412][7]], allocation id [NahsM0yfRPaHA5waOpu5OA], primary term [2], message [mark copy as stale]
Apr 12 23:47:04 STATS-DB-M crate[162556]: [2018-04-12T23:47:04,027][WARN ][o.e.c.a.s.ShardStateAction] [crate3] [online_dlr_report_cache_20180412][1] received shard failed for shard id [[online_dlr_report_cache_20180412][1]], allocation id [haMsWkQGTe-yTIfGSkLbHw], primary term [2], message [mark copy as stale]
Apr 12 23:47:04 STATS-DB-M crate[162556]: [2018-04-12T23:47:04,026][WARN ][o.e.c.a.s.ShardStateAction] [crate3] [online_dlr_report_cache_20180412][1] received shard failed for shard id [[online_dlr_report_cache_20180412][1]], allocation id [ZfHGc1DiTZmJ2JQ3YoA_Yg], primary term [1], message [failed to perform indices:crate/data/write/upsert on replica [online_dlr_report_cache_20180412][1], node[1RRQy42EQ8meT7S40loaEw], [R], s[STARTED], a[id=ZfHGc1DiTZmJ2JQ3YoA_Yg]], failure [RemoteTransportException[[crate3][192.168.1.50:4300][indices:crate/data/write/upsert[r]]]; nested: IllegalStateException[active primary shard cannot be a replication target before  relocation hand off [online_dlr_report_cache_20180412][1], node[1RRQy42EQ8meT7S40loaEw], [P], s[STARTED], a[id=ZfHGc1DiTZmJ2JQ3YoA_Yg], state is [STARTED]]; ]
Apr 12 23:47:04 STATS-DB-M crate[162556]: org.elasticsearch.transport.RemoteTransportException: [crate3][192.168.1.50:4300][indices:crate/data/write/upsert[r]]
Apr 12 23:47:04 STATS-DB-M crate[162556]: Caused by: java.lang.IllegalStateException: active primary shard cannot be a replication target before  relocation hand off [online_dlr_report_cache_20180412][1], node[1RRQy42EQ8meT7S40loaEw], [P], s[STARTED], a[id=ZfHGc1DiTZmJ2JQ3YoA_Yg], state is [STARTED]
Apr 12 23:47:10 STATS-DB-M systemd[1]: crate.service: main process exited, code=exited, status=126/n/a
Apr 12 23:47:10 STATS-DB-M systemd[1]: Unit crate.service entered failed state.
Apr 12 23:47:10 STATS-DB-M systemd[1]: crate.service failed.
4

1 回答 1

0

日志没有提示节点为何关闭。你有其他信息吗?通常,我们建议使用最少 3 个节点的集群,以便在节点出现故障时能够拥有仲裁。如果您有更多信息,请告诉我们。谢谢,乔

于 2018-04-25T07:27:00.730 回答