I recently started working with an Elassandra cluster that has two data centers configured with NetworkTopologyStrategy.
Cluster details: Elassandra 6.2.3.15 (Elasticsearch 6.2.3 + Cassandra 3.11.4)
`nodetool status` output:
```
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address  Load    Tokens  Owns  Host ID                               Rack
UN  <ip1>    50 GiB  256     ?     6cab1f4c-8937-437d-b010-0a5677443dc3  rack1
UN  <ip2>    48 GiB  256     ?     6c9e7ad5-a642-4c0d-8b77-e78d821d904b  rack1
UN  <ip3>    50 GiB  256     ?     7e493bc6-c8a5-471e-8eee-3f3fe985b90a  rack1

Datacenter: DC2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address  Load    Tokens  Owns  Host ID                               Rack
UN  <ip4>    47 GiB  256     ?     c49c1203-cc38-41a2-b9c8-2b42bc907c17  rack1
UN  <ip5>    67 GiB  256     ?     0d9f31bc-9690-49b6-9d88-4fb30c1b6c0d  rack1
UN  <ip6>    88 GiB  256     ?     80c4d60d-185f-457a-ae9b-2eb611735f07  rack1
```
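To compare the two data centers at a glance, here is a small Python sketch that sums the Load column per datacenter from `nodetool status` text. The regex assumes the column layout shown above and the binary units nodetool prints (bytes, KiB, MiB, GiB, TiB); it is an illustration, not an official parser, and the sample text is the output above with the Host ID and Rack columns trimmed.

```python
import re
from collections import defaultdict

# Convert nodetool's binary size units to GiB (assumed unit set).
UNIT_TO_GIB = {"bytes": 1 / 1024**3, "KiB": 1 / 1024**2, "MiB": 1 / 1024, "GiB": 1.0, "TiB": 1024.0}

# A node line starts with the two-letter Status/State code (e.g. "UN"),
# followed by Address and Load (value + unit).
NODE_LINE = re.compile(r"^([UD][NLJM])\s+(\S+)\s+([\d.]+)\s+(bytes|KiB|MiB|GiB|TiB)\b")


def load_per_dc(status_text):
    """Sum the Load column of `nodetool status` output per datacenter, in GiB."""
    totals = defaultdict(float)
    dc = None
    for raw in status_text.splitlines():
        line = raw.strip()
        if line.startswith("Datacenter:"):
            dc = line.split(":", 1)[1].strip()
            continue
        match = NODE_LINE.match(line)
        if match and dc is not None:
            totals[dc] += float(match.group(3)) * UNIT_TO_GIB[match.group(4)]
    return dict(totals)


# Node lines from the output above (Host ID and Rack columns trimmed).
STATUS = """
Datacenter: DC1
UN  <ip1>  50 GiB  256  ?
UN  <ip2>  48 GiB  256  ?
UN  <ip3>  50 GiB  256  ?
Datacenter: DC2
UN  <ip4>  47 GiB  256  ?
UN  <ip5>  67 GiB  256  ?
UN  <ip6>  88 GiB  256  ?
"""

totals = load_per_dc(STATUS)
print(totals)  # {'DC1': 148.0, 'DC2': 202.0}
```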
Schema:
```
CREATE KEYSPACE my_keyspace WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': '3', 'DC2': '3'} AND durable_writes = true;
```
DC2 is intended as a disaster recovery (DR) site: ideally, we should be able to run on DC2 alone in case of a disaster.
- With my limited knowledge, I suspect we need to modify the rack configuration to get a 'proper' DR cluster (so that data in DC1 is replicated to DC2). Am I getting this wrong? If I am right, is there a standard guideline to follow?
- When there are multiple DCs, does Cassandra automatically replicate data across them regardless of the rack configuration? (Are racks just an additional layer of fault tolerance?)
- DC2 holds more data than DC1, and its nodes are unevenly loaded (47, 67 and 88 GiB). Is this purely due to the hash function?
- Are there any other things in this cluster that should be fixed?
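To reason about the replication questions, here is a minimal Python sketch of replica placement under a simplified model of NetworkTopologyStrategy. Assumptions: random vnode tokens (256 per node, matching the Tokens column), a single rack per DC (every node above reports rack1), and uniformly random key tokens standing in for Murmur3 hashes of partition keys. This is not Cassandra's actual implementation, it ignores Elasticsearch index data, and the node names are hypothetical placeholders for `<ip1>`..`<ip6>`.

```python
import bisect
import random
from collections import Counter

TOKEN_MIN, TOKEN_MAX = -2**63, 2**63 - 1  # Murmur3Partitioner token range


def build_ring(nodes_by_dc, vnodes=256, seed=0):
    """Give every node `vnodes` random tokens; return the ring sorted by token."""
    rng = random.Random(seed)
    ring = [(rng.randint(TOKEN_MIN, TOKEN_MAX), dc, node)
            for dc, nodes in nodes_by_dc.items()
            for node in nodes
            for _ in range(vnodes)]
    ring.sort()
    return ring


def replicas_for(token, ring, tokens, rf_by_dc):
    """Simplified NetworkTopologyStrategy with one rack per DC: walk the ring
    clockwise from `token` and take the first RF distinct nodes of each DC."""
    chosen = {dc: [] for dc in rf_by_dc}
    start = bisect.bisect_left(tokens, token)  # first vnode token >= key token
    for step in range(len(ring)):
        _, dc, node = ring[(start + step) % len(ring)]
        picked = chosen.get(dc)
        if picked is not None and len(picked) < rf_by_dc[dc] and node not in picked:
            picked.append(node)
        if all(len(chosen[d]) == rf_by_dc[d] for d in rf_by_dc):
            break
    return [node for dc_nodes in chosen.values() for node in dc_nodes]


def ownership(nodes_by_dc, rf_by_dc, samples=10_000, seed=1):
    """Fraction of sampled partitions for which each node stores a replica."""
    ring = build_ring(nodes_by_dc)
    tokens = [t for t, _, _ in ring]
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(samples):
        key_token = rng.randint(TOKEN_MIN, TOKEN_MAX)  # stands in for Murmur3(partition key)
        counts.update(replicas_for(key_token, ring, tokens, rf_by_dc))
    return {node: counts[node] / samples
            for nodes in nodes_by_dc.values() for node in nodes}


# Hypothetical node names standing in for <ip1>..<ip6>.
nodes_by_dc = {"DC1": ["ip1", "ip2", "ip3"], "DC2": ["ip4", "ip5", "ip6"]}

# my_keyspace as defined above: RF 3 in each DC.
shares = ownership(nodes_by_dc, {"DC1": 3, "DC2": 3})
print(shares)  # every node: 1.0

# For contrast, RF 2 in DC1 spreads two replicas over three nodes.
shares_rf2 = ownership(nodes_by_dc, {"DC1": 2, "DC2": 3})
print(shares_rf2)  # DC1 nodes should each come out near 2/3
```

Under this model, RF 3 with three nodes per DC places a replica of every partition of `my_keyspace` on every node, whereas RF 2 in DC1 would leave each DC1 node holding roughly two thirds of the partitions.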
Many thanks!