When I add new nodes to the Elasticsearch cluster, I run into a problem with an unassigned searchguard shard. The cluster runs in a public cloud with allocation awareness enabled via node.awareness.attributes: availability_zone. Search Guard's replica-count auto-expansion is enabled (it is on by default). The problem reproduces when I have three nodes in one zone and one node in each of the other two:
- eu-central-1a = 3 nodes
- eu-central-1b = 1 node
- eu-central-1c = 1 node
I understand that this cluster layout is somewhat unbalanced; it is just a reproduction of a production issue. I simply want to understand the logic of Elasticsearch and Search Guard, and why it leads to this problem. Here is my configuration:
{
"cluster_name" : "test-cluster",
"status" : "yellow",
"timed_out" : false,
"number_of_nodes" : 8,
"number_of_data_nodes" : 5,
"active_primary_shards" : 1032,
"active_shards" : 3096,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 1,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 99.96771068776235
}
Indices
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open searchguard GuL6pHCUTUKbmygbIsLAYw 1 4 5 0 131.3kb 35.6kb
Allocation explanation
"deciders" : [
{
"decider" : "same_shard",
"decision" : "NO",
"explanation" : "the shard cannot be allocated to the same node on which a copy of the shard already exists [[searchguard][0], node[a59ptCI2SfifBWmnmRoqxA], [R], s[STARTED], a[id=d3rMAN8xQi2xrTD3y_SUPA]]"
},
{
"decider" : "awareness",
"decision" : "NO",
"explanation" : "there are too many copies of the shard allocated to nodes with attribute [aws_availability_zone], there are [5] total configured shard copies for this shard id and [3] total attribute values, expected the allocated shard count per attribute [3] to be less than or equal to the upper bound of the required number of shards per attribute [2]"
}
]
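As far as I can tell, the awareness decider's arithmetic can be reproduced by hand. A minimal sketch of my understanding (this is just the ceiling division the explanation describes, not Elasticsearch source), assuming 5 total copies and 3 availability-zone values:

```python
import math

total_copies = 5      # 1 primary + 4 replicas of [searchguard][0]
attribute_values = 3  # eu-central-1a / eu-central-1b / eu-central-1c

# Upper bound the decider enforces: ceil(total copies / attribute values)
upper_bound = math.ceil(total_copies / attribute_values)
print(upper_bound)  # 2

# eu-central-1a already holds 3 copies (one per node), which exceeds the bound
copies_in_zone_a = 3
print(copies_in_zone_a <= upper_bound)  # False -> decision "NO"
```

This matches the message: the allocated shard count per attribute [3] must be less than or equal to the upper bound [2].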
searchguard settings
{
"searchguard" : {
"settings" : {
"index" : {
"number_of_shards" : "1",
"auto_expand_replicas" : "0-all",
"provided_name" : "searchguard",
"creation_date" : "1554095156112",
"number_of_replicas" : "4",
"uuid" : "GuL6pHCUTUKbmygbIsLAYw",
"version" : {
"created" : "6020499"
}
}
}
}
}
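If I read auto_expand_replicas : 0-all correctly, the replica count follows the number of data nodes (replicas = data nodes - 1, so that every data node can hold one copy). A quick sanity check of that assumption against my cluster:

```python
def expected_replicas(data_nodes: int) -> int:
    # Assumption: "0-all" auto-expands replicas to (data nodes - 1)
    return max(0, data_nodes - 1)

print(expected_replicas(5))      # 4 replicas, matching "number_of_replicas": "4"
print(expected_replicas(5) + 1)  # 5 total shard copies (primary + replicas)
```

With 5 data nodes this gives 4 replicas, which is exactly what the index settings show.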
My questions:
- the searchguard settings say
"number_of_replicas" : "4",
but the allocator explanation says there are [5] total configured shard copies. So is 5 the primary plus the replicas? Even if that is the case...
- what is wrong with placing all of those shards (3) in a single zone (eu-central-1a)? Even if that zone goes down, we would still have two copies in the other zones. Isn't that enough to recover?
- how does Elasticsearch calculate this condition:
required number of shards per attribute [2]
Given this limit, the most I can grow my cluster to is 2*zones_count (2*3 = 6). That is really not much. It seems like there should be a way to overcome this limit.
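For reference, here is my back-of-the-envelope model of why exactly one shard ends up unassigned with this node layout (my own reasoning, not Elasticsearch source): the same_shard decider forbids two copies on one node, and the awareness decider caps each zone at the ceiling bound.

```python
import math

nodes_per_zone = {"eu-central-1a": 3, "eu-central-1b": 1, "eu-central-1c": 1}
total_copies = 5  # 1 primary + 4 auto-expanded replicas

# Per-zone cap from the awareness decider
bound = math.ceil(total_copies / len(nodes_per_zone))  # 2

# Each zone can host at most min(nodes in zone, per-zone cap) copies
allocatable = sum(min(n, bound) for n in nodes_per_zone.values())
print(allocatable)                 # 4 copies can be placed
print(total_copies - allocatable)  # 1 copy left unassigned
```

That would explain the cluster health output: 4 copies placed, 1 unassigned shard, status yellow.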