cassandra - Cassandra multi-datacenter replication lag

Question

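I have set up a Cassandra cluster spanning two data centers (AWS us-east and us-west). Writes happen only on the us-east ring, and I can see the data being replicated to the other ring. However, the lag is high.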

On DC1
cqlsh:ks> select count(*) from cf1 limit 1000000;

 count
--------
 225568

On DC2
cqlsh:ks> select count(*) from cf1 limit 1000000;

 count
--------
 139964
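A quick way to put a number on the gap between the two counts above is to compare them directly. Treat the result as a rough snapshot only: `count(*)` scans the whole table (capped here by `LIMIT 1000000`) at whatever consistency level cqlsh is using, and writes keep arriving while it runs. The helper name `replication_gap` is hypothetical.

```python
# Row counts reported by `select count(*) ...` in the question.
DC1_COUNT = 225568  # us-east, where all writes happen
DC2_COUNT = 139964  # us-west, the replica DC


def replication_gap(source_count: int, replica_count: int) -> tuple[int, float]:
    """Return (rows missing on the replica, percent of source rows present)."""
    if source_count <= 0:
        raise ValueError("source_count must be positive")
    missing = source_count - replica_count
    present_pct = 100.0 * replica_count / source_count
    return missing, present_pct


missing, pct = replication_gap(DC1_COUNT, DC2_COUNT)
print(f"DC2 is missing {missing} rows ({pct:.2f}% present)")
```

With the numbers from the question this prints `DC2 is missing 85604 rows (62.05% present)`.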

--

Why is this happening, and what does it depend on?
Is there a tool I can use to see the lag? Can it be seen in OpsCenter?

score 2 · Accepted Answer

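Since your two DCs are in different AWS regions, you can expect some lag between them. How much depends mainly on the volume of data that has to be replicated across DCs: large column families and/or a high write rate simply mean more data to ship between regions. Using LOCAL_QUORUM is the right choice for keeping writes local: the coordinator waits only for a quorum of replicas in the local DC to acknowledge, while the replicas in the remote DC are still written but not waited for. You can use a lower consistency level if needed; as a general rule of thumb, if data consistency matters, always write at a higher consistency level than you read.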
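To make the consistency-level reasoning concrete, here is a minimal sketch assuming a hypothetical `NetworkTopologyStrategy` keyspace with 3 replicas in each DC (the question does not state the replication factor). It counts how many replica acknowledgements each level waits for and checks the usual overlap condition, read acks + write acks > replicas, either across the whole cluster or within a single DC. Only ONE, LOCAL_QUORUM, QUORUM and ALL are modeled, and all function names are illustrative, not part of any Cassandra API.

```python
# Hypothetical replication settings: RF=3 in each DC (not given in the question).
RF = {"us-east": 3, "us-west": 3}


def quorum(n: int) -> int:
    """Acknowledgements needed for a majority of n replicas."""
    return n // 2 + 1


def total_acks(level: str, rf: dict, coordinator_dc: str) -> int:
    """Replica acknowledgements the coordinator waits for at `level`."""
    total = sum(rf.values())
    if level == "LOCAL_QUORUM":
        return quorum(rf[coordinator_dc])
    return {"ONE": 1, "QUORUM": quorum(total), "ALL": total}[level]


def guaranteed_acks_in_dc(level: str, rf: dict, coordinator_dc: str, dc: str) -> int:
    """Fewest acknowledgements that must come from replicas in `dc`."""
    if level == "LOCAL_QUORUM":
        # Only local replicas are waited for; remote ones are written asynchronously.
        return quorum(rf[dc]) if dc == coordinator_dc else 0
    # Worst case: as many acks as possible come from the other DCs.
    others = sum(rf.values()) - rf[dc]
    return max(0, total_acks(level, rf, coordinator_dc) - others)


def read_sees_write(write_level: str, write_dc: str,
                    read_level: str, read_dc: str, rf: dict) -> bool:
    """True if the acknowledging replica sets are guaranteed to overlap."""
    # Cluster-wide pigeonhole: R + W > total replicas.
    if total_acks(write_level, rf, write_dc) + total_acks(read_level, rf, read_dc) > sum(rf.values()):
        return True
    # Per-DC pigeonhole: guaranteed acks inside one DC exceed its RF.
    return any(
        guaranteed_acks_in_dc(write_level, rf, write_dc, dc)
        + guaranteed_acks_in_dc(read_level, rf, read_dc, dc)
        > rf[dc]
        for dc in rf
    )


# Write in us-east at LOCAL_QUORUM, then read at LOCAL_QUORUM in each DC.
print(read_sees_write("LOCAL_QUORUM", "us-east", "LOCAL_QUORUM", "us-east", RF))  # True
print(read_sees_write("LOCAL_QUORUM", "us-east", "LOCAL_QUORUM", "us-west", RF))  # False
```

A `False` here does not mean the us-west read will miss the write; it only means nothing forces us-west to have it yet, which is exactly the window in which the counts in the two DCs can differ.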

Besides the usual OS-level tools, Cassandra ships with the nodetool utility. For monitoring, you can use the following nodetool commands:

nodetool netstats - (shows whether the node is streaming data) http://www.datastax.com/documentation/cassandra/2.0/cassandra/tools/toolsNetstats.html

nodetool cfstats - (shows per-column-family statistics, such as read and write latency) http://www.datastax.com/documentation/cassandra/2.0/cassandra/tools/toolsCFstats.html

nodetool proxyhistograms - (shows request latency statistics from the coordinator node's perspective) http://www.datastax.com/documentation/cassandra/2.0/cassandra/tools/toolsProxyHistograms.html
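If you want to collect these periodically rather than by hand, one minimal sketch is to wrap the three commands with Python's `subprocess` module. It assumes `nodetool` is on the `PATH` of the machine running the script and uses nodetool's `-h` option to choose which node to query; `build_commands` and `collect` are hypothetical helper names.

```python
import shutil
import subprocess
from typing import Dict, List, Optional

# The three monitoring commands listed above (Cassandra 2.0 names).
COMMANDS = ["netstats", "cfstats", "proxyhistograms"]


def build_commands(host: Optional[str] = None) -> List[List[str]]:
    """Build one argv list per command; `-h` selects the node to connect to."""
    prefix = ["nodetool"] + (["-h", host] if host else [])
    return [prefix + [cmd] for cmd in COMMANDS]


def collect(host: Optional[str] = None) -> Dict[str, str]:
    """Run each command and return its output keyed by command name.

    Returns an empty dict when nodetool is not installed on this machine.
    On a non-zero exit code, stderr is stored instead of stdout.
    """
    if shutil.which("nodetool") is None:
        return {}
    results: Dict[str, str] = {}
    for argv in build_commands(host):
        proc = subprocess.run(argv, capture_output=True, text=True, check=False)
        results[argv[-1]] = proc.stdout if proc.returncode == 0 else proc.stderr
    return results
```

For example, calling `collect()` with the address of one node in each DC lets you compare their `netstats` output side by side.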

There are many other useful nodetool commands you can use:

http://www.datastax.com/documentation/cassandra/2.0/cassandra/tools/toolsNodetool_r.html

I'm assuming you are using Cassandra 2.0, but many of the nodetool commands are the same in other versions.

As a side note, you can also use OpsCenter, which gives you a graphical view of the cluster. For more information, see: http://www.datastax.com/documentation/opscenter/5.0/opsc/about_c.html