sql - How to downscale a CrateDB cluster?

Question

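For testing, I wanted to shrink my 3-node cluster to 2 nodes, so that I can later do the same for my 5-node cluster.

However, after following the best practice for shrinking a cluster: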

1. Back up all tables.

2. For all tables: alter table xyz set (number_of_replicas=2) if it was less than 2 before.

3. SET GLOBAL PERSISTENT discovery.zen.minimum_master_nodes = <half of the cluster + 1>;
   3a. If the data check should always be green, set min_availability to 'full': https://crate.io/docs/reference/configuration.html#graceful-stop

4. Initiate a graceful stop on one node.

5. Wait for the data check to turn green.

6. Repeat from 3.

7. When done, persist the node configuration in crate.yml: gateway.recover_after_nodes: n, discovery.zen.minimum_master_nodes: (n/2) + 1, gateway.expected_nodes: n

My cluster never went back to "green" again, and I also have a critical node check failing.

What went wrong here?

crate.yml:

  ... 
  ################################## Discovery ##################################

  # Discovery infrastructure ensures nodes can be found within a cluster
  # and master node is elected. Multicast discovery is the default.

  # Set to ensure a node sees M other master eligible nodes to be considered
  # operational within the cluster. Its recommended to set it to a higher value
  # than 1 when running more than 2 nodes in the cluster.
  #
  # We highly recommend to set the minimum master nodes as follows:
  #   minimum_master_nodes: (N / 2) + 1 where N is the cluster size
  # That will ensure a full recovery of the cluster state.
  #
  discovery.zen.minimum_master_nodes: 2

  # Set the time to wait for ping responses from other nodes when discovering.
  # Set this option to a higher value on a slow or congested network
  # to minimize discovery failures:
  #
  # discovery.zen.ping.timeout: 3s
  #

  # Time a node is waiting for responses from other nodes to a published
  # cluster state.
  #
  # discovery.zen.publish_timeout: 30s

  # Unicast discovery allows to explicitly control which nodes will be used
  # to discover the cluster. It can be used when multicast is not present,
  # or to restrict the cluster communication-wise.
  # For example, Amazon Web Services doesn't support multicast discovery.
  # Therefore, you need to specify the instances you want to connect to a
  # cluster as described in the following steps:
  #
  # 1. Disable multicast discovery (enabled by default):
  #
  discovery.zen.ping.multicast.enabled: false
  #
  # 2. Configure an initial list of master nodes in the cluster
  #    to perform discovery when new nodes (master or data) are started:
  #
  # If you want to debug the discovery process, you can set a logger in
  # 'config/logging.yml' to help you doing so.
  #
  ################################### Gateway ###################################

  # The gateway persists cluster meta data on disk every time the meta data
  # changes. This data is stored persistently across full cluster restarts
  # and recovered after nodes are started again.

  # Defines the number of nodes that need to be started before any cluster
  # state recovery will start.
  #
  gateway.recover_after_nodes: 3

  # Defines the time to wait before starting the recovery once the number
  # of nodes defined in gateway.recover_after_nodes are started.
  #
  #gateway.recover_after_time: 5m

  # Defines how many nodes should be waited for until the cluster state is
  # recovered immediately. The value should be equal to the number of nodes
  # in the cluster.
  #
  gateway.expected_nodes: 3

score 1 · Accepted Answer

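So there are two things that are important:

The number of replicas is essentially the number of nodes you can lose in a typical setup (2 is recommended, so that you can scale down and lose a node in the process and still be OK).
The procedure is recommended for clusters with more than 2 nodes ;)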

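CrateDB automatically distributes the shards across the cluster in such a way that no replica shares a node with its primary. If that is not possible (which is the case if you have 2 nodes and 1 primary with 2 replicas), the data check will never return to "green". So in your case, set the number of replicas to 1 to get the cluster back to green (alter table mytable set (number_of_replicas = 1)).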
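To make the arithmetic concrete, here is a minimal Python sketch of that rule. It is a simplified model, not CrateDB code: the function name `can_turn_green` is made up for illustration, and it only assumes that every copy of a shard (the primary plus each replica) needs its own node, ignoring any other allocation constraints.

```python
def can_turn_green(node_count: int, number_of_replicas: int) -> bool:
    """Simplified model: all copies of a shard must sit on different nodes.

    Hypothetical helper for illustration; not part of CrateDB.
    """
    copies_per_shard = 1 + number_of_replicas  # one primary plus its replicas
    return copies_per_shard <= node_count


# (nodes, replicas) combinations from the question and the suggested fix
for nodes, replicas in [(3, 2), (2, 2), (2, 1)]:
    status = "can be fully allocated" if can_turn_green(nodes, replicas) else "stays under-replicated"
    print(f"{nodes} nodes, number_of_replicas={replicas}: {status}")
```

With 3 nodes and 2 replicas every copy finds a node. After shrinking to 2 nodes, 2 replicas would need 3 nodes, so one replica stays unassigned until number_of_replicas is lowered to 1.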

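The critical node check fails because the cluster has not received an updated crate.yml: your file still contains the configuration for a 3-node cluster, hence the message. Since CrateDB only loads expected_nodes at startup (it is not a runtime setting), a restart of the whole cluster is required to complete the downscaling. This can be done with a rolling restart, but be sure to set SET GLOBAL PERSISTENT discovery.zen.minimum_master_nodes = <half of the cluster + 1>; properly, otherwise the consensus will not work...

Also, it is recommended to scale down one node at a time, to avoid overloading the cluster with rebalancing and accidentally losing data.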
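As a rough illustration of which values in the posted crate.yml are stale, the following Python sketch compares them with the values from the checklist in the question (recover_after_nodes: n, minimum_master_nodes: (n/2) + 1, expected_nodes: n). The helper names `expected_settings` and `stale_settings` are hypothetical, and the division is assumed to be integer division.

```python
def expected_settings(n: int) -> dict:
    """Settings the question's checklist prescribes for an n-node cluster."""
    return {
        "gateway.recover_after_nodes": n,
        "discovery.zen.minimum_master_nodes": n // 2 + 1,  # (n/2) + 1, integer division assumed
        "gateway.expected_nodes": n,
    }


def stale_settings(current: dict, n: int) -> dict:
    """Return {setting: (current value, wanted value)} for every mismatch."""
    wanted = expected_settings(n)
    return {
        key: (current.get(key), value)
        for key, value in wanted.items()
        if current.get(key) != value
    }


# Values taken from the crate.yml posted in the question
posted = {
    "discovery.zen.minimum_master_nodes": 2,
    "gateway.recover_after_nodes": 3,
    "gateway.expected_nodes": 3,
}

for key, (have, want) in stale_settings(posted, 2).items():
    print(f"{key}: is {have}, should be {want} for a 2-node cluster")
```

For a 2-node target this flags gateway.recover_after_nodes and gateway.expected_nodes (both still 3), while discovery.zen.minimum_master_nodes is already 2.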
