cassandra - Elassandra replication information and rack configuration

Question

I recently started working with an Elassandra cluster with two data centers which have been configured using NetworkTopologyStrategy.

Cluster details : Elassandra 6.2.3.15 = Elasticsearch 6.2.3 + Cassandra 3.11.4

Datacenter: DC1
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load       Tokens       Owns    Host ID                               Rack
UN  <ip1>         50 GiB  256          ?       6cab1f4c-8937-437d-b010-0a5677443dc3  rack1
UN  <ip2>         48 GiB  256          ?       6c9e7ad5-a642-4c0d-8b77-e78d821d904b  rack1
UN  <ip3>         50 GiB  256          ?       7e493bc6-c8a5-471e-8eee-3f3fe985b90a  rack1
Datacenter: DC2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load       Tokens       Owns    Host ID                               Rack
UN  <ip4>         47 GiB  256          ?       c49c1203-cc38-41a2-b9c8-2b42bc907c17  rack1
UN  <ip5>         67 GiB  256          ?       0d9f31bc-9690-49b6-9d88-4fb30c1b6c0d  rack1
UN  <ip6>         88 GiB  256          ?       80c4d60d-185f-457a-ae9b-2eb611735f07  rack1

schema info
CREATE KEYSPACE my_keyspace WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': '3', 'DC2': '3'} AND durable_writes = true;

The DC2 is kind of a Disaster Recovery site and in an ideal world, we should be able to use only that in case of a disaster.

With the very limited knowledge I have, I strongly suspect that we need to modify the rack configuration to have a 'proper' D/R cluster (so that data in DC1 gets replicated in DC2), or am I getting this wrong? If so, is there a standard guideline to follow?
When there are multiple DCs, does Cassandra automatically replicate the data regardless of rack configuration? (Are racks a kind of additional failsafe?)
DC2 has more data than DC1. Is this purely related to the hash function?
Are there any other things that can be rectified in this cluster?

Many thanks!

score 2 · Accepted Answer

These replication settings mean that the data for your keyspace is replicated in real time between the 2 DCs, with each DC holding 3 copies (replicas):

CREATE KEYSPACE my_keyspace WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'DC1': '3',
  'DC2': '3'
};

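Replication in Cassandra happens in real time: any write sent to one DC is sent to all other DCs at the same time. Unlike a traditional RDBMS, or configurations with a primary/secondary or active/DR setup, Cassandra replication is immediate.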
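To make the numbers concrete, here is a minimal Python sketch that reads the replication map from the keyspace definition above and counts the replicas. The node counts are taken from the nodetool status output in the question. The helper names (replicas_per_dc, effective_ownership) are made up for illustration and are not part of any Cassandra API.

```python
# Replication settings copied from the keyspace definition above.
replication = {"class": "NetworkTopologyStrategy", "DC1": "3", "DC2": "3"}

# Node counts per DC, as shown by nodetool status in the question.
nodes_per_dc = {"DC1": 3, "DC2": 3}


def replicas_per_dc(replication):
    """Return {datacenter: replica count}, skipping the 'class' entry."""
    return {dc: int(rf) for dc, rf in replication.items() if dc != "class"}


def effective_ownership(rf, node_count):
    """Average fraction of the DC's data that each node holds."""
    return min(rf, node_count) / node_count


per_dc = replicas_per_dc(replication)
total_copies = sum(per_dc.values())

print("copies of each row across the cluster:", total_copies)
for dc, rf in per_dc.items():
    share = effective_ownership(rf, nodes_per_dc[dc])
    print(f"{dc}: {rf} replicas, each node holds about {share:.0%} of the data")
```

Under these assumptions every write goes to six replicas, three in each DC, and with 3 nodes and a replication factor of 3 per DC each node ends up holding a full copy of the keyspace.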

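The logical Cassandra racks provide an additional redundancy mechanism. If you have C* nodes deployed in different (a) physical racks or (b) public cloud availability zones, Cassandra distributes replicas to separate racks so that each rack has a full copy of the data (this holds when the number of racks matches the replication factor). With a replication factor of 3 in the DC, if one rack goes down for whatever reason, there are still full copies of the data in the remaining 2 racks, and read/write requests with a consistency of LOCAL_QUORUM (or lower) are not affected.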
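The rack behaviour described above can be sketched in a few lines of Python. This is a simplified model, not Cassandra's actual NetworkTopologyStrategy code: it walks a toy ring of (node, rack) pairs, prefers racks it has not used yet, and falls back to skipped nodes when there are fewer racks than replicas. The six-node, three-rack layout is hypothetical; the single-rack layout mirrors the cluster in the question.

```python
def place_replicas(ring, start, rf):
    """Walk the ring from `start`, preferring nodes on racks not used yet."""
    chosen, seen_racks, skipped = [], set(), []
    for i in range(len(ring)):
        if len(chosen) == rf:
            break
        node, rack = ring[(start + i) % len(ring)]
        if rack not in seen_racks:
            chosen.append(node)
            seen_racks.add(rack)
        else:
            skipped.append(node)
    # Fewer racks than replicas: fill up with the nodes skipped earlier.
    for node in skipped:
        if len(chosen) == rf:
            break
        chosen.append(node)
    return chosen


def local_quorum_survives(ring, replicas, failed_rack, rf):
    """True if enough replicas remain for LOCAL_QUORUM (rf // 2 + 1)."""
    rack_of = dict(ring)
    alive = [n for n in replicas if rack_of[n] != failed_rack]
    return len(alive) >= rf // 2 + 1


# Hypothetical layout: six nodes spread over three racks.
three_racks = [("n1", "rack1"), ("n2", "rack1"), ("n3", "rack2"),
               ("n4", "rack2"), ("n5", "rack3"), ("n6", "rack3")]
# Layout from the question: every node in the DC reports rack1.
single_rack = [("n1", "rack1"), ("n2", "rack1"), ("n3", "rack1")]

spread = place_replicas(three_racks, start=0, rf=3)
packed = place_replicas(single_rack, start=0, rf=3)

print("three racks:", spread,
      "survives loss of rack1:", local_quorum_survives(three_racks, spread, "rack1", 3))
print("single rack:", packed,
      "survives loss of rack1:", local_quorum_survives(single_rack, packed, "rack1", 3))
```

In this model the replicas land on one node per rack when three racks exist, so losing one rack still leaves two replicas for LOCAL_QUORUM. With every node on rack1 the rack setting adds no extra protection inside the DC, but it does not stop data from being replicated between DC1 and DC2.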

I've explained this in a bit more detail in this post: https://community.datastax.com/questions/1128/.

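If you are new to Cassandra, we recommend https://www.datastax.com/dev, which links to short hands-on tutorials where you can quickly learn the basic concepts of Cassandra, all free. This tutorial is a good place to start: https://www.datastax.com/try-it-out. Cheers!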
