cassandra - Datastax Enterprise 5.0 cluster rebalance attempt fails

Question

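We have a DSE 5.0 cluster of 4 machines. During data ingestion, one machine ended up storing most of the data (100 GB), while the other three store far less (about 15 GB each). I don't know why this happened; I plan to investigate and may raise it in a separate question.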

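Now I am trying to rebalance the cluster. The only way I know to do this is to click Cluster Actions -> Rebalance in OpsCenter. The rebalance starts and reproducibly aborts after about 5 minutes with the following error: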

Rebalance Failed: java.rmi.UnmarshalException: Error unmarshaling return header; nested exception is: 
java.net.SocketTimeoutException: Read timed out

Some of the data was transferred as suggested in the rebalance preview, but most of it was not.

Event log:

Error   Rebalance failed: java.rmi.UnmarshalException: Error unmarshaling return header; nested exception is: java.net.SocketTimeoutException: Read timed out       admin
Info    Moving node xx.xx.xx.xx from token 5848419665553670365 to 2542108353485192999   NODE-04 
Info    Starting rebalance

What could be the cause, and how can I investigate and resolve it?

The cluster is deployed on 4 dedicated machines on Azure.

score 0 · Accepted Answer

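You should not have to rebalance the cluster after loading data. You may want to dig deeper into your data model and make sure that your partition key distributes the data evenly around the ring. In this case, I suspect a hot spot.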
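To illustrate why the partition key matters more than the token layout, here is a minimal Python sketch of a 4-node ring. It rests on several assumptions that are not from the question: MD5 stands in for Cassandra's Murmur3 partitioner, each node owns a single evenly spaced token, and the key names (`sensor-N`, the two date strings) are hypothetical. It only shows the general effect, not how your cluster actually places data.

```python
import hashlib
from bisect import bisect_left
from collections import Counter

RING_SIZE = 2**64  # size of the simulated token space (assumption)


def token_for(partition_key: str) -> int:
    """Hash a partition key to a token in [0, 2**64).

    MD5 is only a stand-in here; a real cluster would use its
    configured partitioner (typically Murmur3Partitioner).
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def build_ring(node_count: int) -> list:
    """Return one evenly spaced token per node, in ascending order."""
    return [(i + 1) * RING_SIZE // node_count - 1 for i in range(node_count)]


def owner(token: int, ring: list) -> int:
    """Index of the first node whose token is >= the given token."""
    return bisect_left(ring, token) % len(ring)


def rows_per_node(partition_keys, ring) -> list:
    """Count how many rows land on each node, one row per key occurrence."""
    counts = Counter(owner(token_for(key), ring) for key in partition_keys)
    return [counts.get(node, 0) for node in range(len(ring))]


ring = build_ring(4)

# High-cardinality partition key: 100,000 distinct values.
even = rows_per_node([f"sensor-{i}" for i in range(100_000)], ring)

# Low-cardinality partition key: 100,000 rows but only two distinct
# values, with 90% of the rows in a single partition.
skewed = rows_per_node(
    ["2016-08-01" if i % 10 else "2016-08-02" for i in range(100_000)], ring
)

print("high-cardinality key:", even)
print("low-cardinality key: ", skewed)
```

With the high-cardinality key the rows spread across all four nodes roughly equally. With the low-cardinality key, at least 90% of the rows sit on whichever node owns the large partition, and moving tokens around cannot split a single partition across nodes.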
