
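As a test, I wanted to scale my 3-node cluster down to 2 nodes, so that I can later do the same for my 5-node cluster.

However, after following the best practice for scaling down a cluster: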

  1. Back up all tables
  2. For all tables: alter table xyz set (number_of_replicas=2) if it was less than 2 before
  3. SET GLOBAL PERSISTENT discovery.zen.minimum_master_nodes = <half of the cluster + 1>;
    3a. If the data check should always stay green, set min_availability to 'full': https://crate.io/docs/reference/configuration.html#graceful-stop
  4. Initiate a graceful stop on one node
  5. Wait for the data check to turn green
  6. Repeat from 3.
  7. When finished, persist the node configuration in crate.yml:
       gateway.recover_after_nodes: n
       discovery.zen.minimum_master_nodes: (n/2) + 1
       gateway.expected_nodes: n
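The `<half of the cluster + 1>` value in step 3 and the `(n/2) + 1` value in step 7 are the same quorum formula. As a minimal sketch, assuming integer division is intended (as the crate.yml comment `(N / 2) + 1` suggests), the helper below, whose name `quorum` is made up for illustration, prints the value for a few cluster sizes:

```python
def quorum(master_eligible_nodes: int) -> int:
    """Return (N / 2) + 1 using integer division."""
    if master_eligible_nodes < 1:
        raise ValueError("a cluster needs at least one node")
    return master_eligible_nodes // 2 + 1


# Quorum for each cluster size on the way from 5 nodes down to 2.
for nodes in (5, 4, 3, 2):
    print(f"{nodes} nodes -> discovery.zen.minimum_master_nodes = {quorum(nodes)}")
```

Note that a 2-node cluster needs both nodes to form a quorum, so it cannot tolerate losing either of them.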

my cluster never got back to "green", and my critical node check is failing as well.

What went wrong here?

crate.yml:

  ... 
  ################################## Discovery ##################################

  # Discovery infrastructure ensures nodes can be found within a cluster
  # and master node is elected. Multicast discovery is the default.

  # Set to ensure a node sees M other master eligible nodes to be considered
  # operational within the cluster. Its recommended to set it to a higher value
  # than 1 when running more than 2 nodes in the cluster.
  #
  # We highly recommend to set the minimum master nodes as follows:
  #   minimum_master_nodes: (N / 2) + 1 where N is the cluster size
  # That will ensure a full recovery of the cluster state.
  #
  discovery.zen.minimum_master_nodes: 2

  # Set the time to wait for ping responses from other nodes when discovering.
  # Set this option to a higher value on a slow or congested network
  # to minimize discovery failures:
  #
  # discovery.zen.ping.timeout: 3s
  #

  # Time a node is waiting for responses from other nodes to a published
  # cluster state.
  #
  # discovery.zen.publish_timeout: 30s

  # Unicast discovery allows to explicitly control which nodes will be used
  # to discover the cluster. It can be used when multicast is not present,
  # or to restrict the cluster communication-wise.
  # For example, Amazon Web Services doesn't support multicast discovery.
  # Therefore, you need to specify the instances you want to connect to a
  # cluster as described in the following steps:
  #
  # 1. Disable multicast discovery (enabled by default):
  #
  discovery.zen.ping.multicast.enabled: false
  #
  # 2. Configure an initial list of master nodes in the cluster
  #    to perform discovery when new nodes (master or data) are started:
  #
  # If you want to debug the discovery process, you can set a logger in
  # 'config/logging.yml' to help you doing so.
  #
  ################################### Gateway ###################################

  # The gateway persists cluster meta data on disk every time the meta data
  # changes. This data is stored persistently across full cluster restarts
  # and recovered after nodes are started again.

  # Defines the number of nodes that need to be started before any cluster
  # state recovery will start.
  #
  gateway.recover_after_nodes: 3

  # Defines the time to wait before starting the recovery once the number
  # of nodes defined in gateway.recover_after_nodes are started.
  #
  #gateway.recover_after_time: 5m

  # Defines how many nodes should be waited for until the cluster state is
  # recovered immediately. The value should be equal to the number of nodes
  # in the cluster.
  #
  gateway.expected_nodes: 3

1 Answer


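So two things are important here:

  • The number of replicas is essentially the number of nodes you can lose in a typical setup (2 is recommended, so that you can scale down, lose a node in the process, and still be fine)
  • The procedure is recommended for clusters of more than 2 nodes ;)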

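CrateDB automatically distributes the shards across the cluster so that no replica shares a node with its primary. If that is not possible (for example, with 2 nodes and a primary that has 2 replicas), the data check will never return to "green". So in your case, set the number of replicas to 1 to get the cluster back to green (alter table mytable set (number_of_replicas = 1)).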
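To make the arithmetic concrete, here is a rough sketch. It assumes the only constraint is that every copy of a shard (the primary plus each replica) needs a node of its own, and it ignores any other allocation rules. The function names are made up for illustration and are not part of CrateDB:

```python
def unassigned_copies(nodes: int, number_of_replicas: int) -> int:
    """Copies of each shard that cannot be placed if every copy needs its own node."""
    copies_per_shard = 1 + number_of_replicas  # one primary plus its replicas
    return max(0, copies_per_shard - nodes)


def data_check_can_be_green(nodes: int, number_of_replicas: int) -> bool:
    """True when every primary and replica can be placed on a separate node."""
    return unassigned_copies(nodes, number_of_replicas) == 0


print(data_check_can_be_green(2, 2))  # False: one replica per shard stays unassigned
print(data_check_can_be_green(2, 1))  # True: primary and replica fit on two nodes
```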

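The critical node check fails because the cluster has not yet received an updated crate.yml: your file still contains the configuration for a 3-node cluster, hence the message. Since CrateDB only loads expected_nodes at startup (it is not a runtime setting), the whole cluster has to be restarted to complete the scale-down. This can be done as a rolling restart, but make sure that SET GLOBAL PERSISTENT discovery.zen.minimum_master_nodes = <half of the cluster + 1>; is set properly, otherwise the consensus won't work...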
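As an illustration, the three values from the question's crate.yml can be compared against what step 7 of the procedure prescribes for a given node count. This is only a sketch: `stale_settings` is a hypothetical helper, the settings are typed in by hand rather than parsed from the file, and it does not reproduce CrateDB's actual node checks.

```python
def stale_settings(config: dict, nodes: int) -> dict:
    """Return {setting: (found, expected)} for values that do not match `nodes`."""
    expected = {
        "gateway.recover_after_nodes": nodes,
        "gateway.expected_nodes": nodes,
        "discovery.zen.minimum_master_nodes": nodes // 2 + 1,
    }
    return {
        key: (config.get(key), wanted)
        for key, wanted in expected.items()
        if config.get(key) != wanted
    }


# Values copied from the crate.yml in the question.
question_config = {
    "discovery.zen.minimum_master_nodes": 2,
    "gateway.recover_after_nodes": 3,
    "gateway.expected_nodes": 3,
}

for key, (found, wanted) in stale_settings(question_config, 2).items():
    print(f"{key}: found {found}, expected {wanted}")
```

For a 2-node target this flags gateway.recover_after_nodes and gateway.expected_nodes, while discovery.zen.minimum_master_nodes is already 2.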

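Also, it is recommended to scale down one node at a time, to avoid overloading the cluster with rebalancing and accidentally losing data.

answered 2016-12-08T15:25:53.737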